Visualizing proteins & their evolution.
Abstract: We present a tutorial for Cn3D, a molecular visualization program that allows students to see the tertiary structure of a protein and compare it with the primary structure of the same protein (Sayers et al., 2009). Students can also use the program to visualize two major evolutionary mechanisms: duplication and divergence, and exon shuffling.

Key Words: Cn3D; molecular visualization; protein simulation; evolution.
Subject: Amino acids (Tests, problems and exercises)
Proteins (Tests, problems and exercises)
Authors: Offner, Susan
Pohlman, Robert F.
Pub Date: 08/01/2010
Publication: Name: The American Biology Teacher Publisher: National Association of Biology Teachers Audience: Academic; Professional Format: Magazine/Journal Subject: Biological sciences; Education Copyright: COPYRIGHT 2010 National Association of Biology Teachers ISSN: 0002-7685
Issue: Date: August, 2010 Source Volume: 72 Source Issue: 6
Geographic: Geographic Scope: United States Geographic Code: 1USA United States
Accession Number: 245037768
Full Text:

* Procedure

First, install Cn3D on your computer. You can download it for free from the NCBI Web site (http://www.ncbi.nlm.nih.gov). Click on "Domains & Structures" for a page with a link for downloading Cn3D. Follow the installation directions.

I. Getting Acquainted with Cn3D

1. Go to http://www.ncbi.nlm.nih.gov/structure/MMDB/mmdb.shtml (case sensitive).

2. On the top line, type "1D5R" in the Structure search box. This is Pten tumor suppressor protein, chosen because it contains alpha-helices, beta-pleated sheets, and coils. Click on "Go." Click on "1D5R."

3. Click on "Structure View in Cn3D" below the picture of the protein.

4. Either Cn3D will open directly or a file will download. If Cn3D has not opened, double click the downloaded file to open Cn3D.

5. The protein is visible in two windows:

a. The square window, called the structure window, shows the tertiary (three-dimensional) structure.

b. The rectangular window below, called the sequence/alignment viewer, shows the primary structure (order of amino acids). Single-letter abbreviations are used for amino acids.

6. The amino acids are the same colors in both windows, correlating primary, secondary, and tertiary structures.

7. In the default settings of Cn3D:

a. Alpha-helices are green cylinders; arrows point from the free amino end to the free carboxyl end.

b. Beta-pleated sheets (strands) are orange arrows pointing from the free amino end to the free carboxyl end.

c. Random coils are blue.

8. To rotate the molecule, place the cursor in the structure window and move it.

9. View/Restore or View/Reset restores the original rendering.

10. The sequence/alignment viewer shows that this protein contains one amino acid chain. Select some amino acids with your cursor. Looking carefully, observe that they turn yellow in both windows. You can highlight any amino acids and see where they occur in the three-dimensional structure of the protein. Hold down the shift key to highlight more than one part of the protein at the same time. Highlight several alpha-helices (green). The cylinders remain green, but the protein backbone turns yellow.

11. In the sequence/alignment viewer, the free amino end is on the left and the free carboxyl end is on the right. The gray amino acids at each end are not in the final protein.

12. Highlight some amino acids at the free amino end of the protein. In the structure window, the free amino end of the protein turns yellow. Repeat with the free carboxyl end.

13. Place the cursor on the last amino acid at the free carboxyl end. The lower left-hand corner of the sequence/alignment viewer indicates that there are 324 amino acids in this protein. You can find the number of any amino acid by placing the cursor on it.

14. Highlight several parts of the beta-pleated sheets. They turn yellow in the structure window; the arrows remain orange.

15. In the View menu, click on "Zoom In" and then "Zoom Out." View/Restore or View/Reset restores the original setting.

16. Rotate your molecule by clicking on "Spin" in the View/Animation menu. Click on "Stop" to stop.

17. In the Style/Edit Global Style menu, uncheck "alpha-helix"; the cylinders disappear. Uncheck "Strands"; the arrows disappear. To color different parts of the molecule, set the toggle at "User Color" for the part of the molecule you want to color; manipulate the color wheel. Click on "Solvents" to see the water molecules surrounding the protein. Click on "Protein Side Chains" to see the amino acid side chains.

18. In the Style/Rendering shortcuts menu, try different rendering styles. Return to "Worms," the original rendering style.

19. In the Style/Coloring shortcuts menu, try different coloring shortcuts.

20. To see other molecules, return to the NCBI Structure home page by clicking on "Structure" in the top menu bar. Type the PDB/MMDB code into the box at the top of the page and click on "Go." To get a PDB/MMDB code, search for the protein by name from the NCBI Structure home page. Select the protein you are interested in from the many responses.
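The sequence/alignment viewer uses single-letter abbreviations for amino acids. The short Python sketch below is not part of Cn3D; it simply spells out a one-letter sequence using the standard three-letter codes and numbers each position from the free amino end. The fragment at the bottom is hypothetical and is not copied from 1D5R.

```python
# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def spell_out(sequence):
    """Return (position, three-letter code) pairs, numbering from 1."""
    return [
        (position, ONE_TO_THREE[letter.upper()])
        for position, letter in enumerate(sequence, start=1)
    ]


# Hypothetical fragment, for illustration only.
fragment = "MKVLE"
for position, name in spell_out(fragment):
    print(position, name)
```

Students can type in a stretch of letters they have highlighted and check the names against a textbook table of amino acids.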

II. Hemoglobin

1. On the NCBI Structure home page, enter "2DN2" (human hemoglobin). How many amino acid chains are in hemoglobin? (Answer: 4.)

2. One hemoglobin molecule has two alpha-chains that are identical to each other and two beta-chains that are identical to each other. The alpha-chains are on lines A and C, and the beta-chains are on lines B and D. Hold down the Shift key and highlight the two alpha-chains; observe them in the structure window. Repeat for the beta-chains.

3. Find the sixth amino acid, glutamic acid (e), in the two beta-chains (lines B and D). Holding down the Shift key, highlight them both.

4. Rotating the molecule in the structure window, locate the glutamic acids on the outside of the protein. Glutamic acid is negatively charged; charged amino acids are often on the outside of proteins, where they make the protein more soluble in water. In sickle cell anemia, these glutamic acids are replaced by valines, large nonpolar amino acids. The sickle cell hemoglobins aggregate into chains thousands of molecules long, as valines on adjacent hemoglobin molecules are attracted by hydrophobic interactions. These long chains of hemoglobins cause sickle-shaped red blood cells.

5. Observe the four heme groups in the molecule. Each heme group carries one oxygen molecule (O2), so one hemoglobin molecule carries four O2. In the Style/Edit Global Style window, click on "Heterogens"; the heme groups disappear. Clicking on "Heterogens" again makes them reappear.
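The sickle cell substitution described in step 4 can also be explored outside Cn3D. The sketch below is an illustration under stated assumptions: it uses the first eight residues of the mature human beta chain (compare them with lines B and D in the viewer), the Kyte-Doolittle hydropathy scale (not mentioned in the tutorial; higher values are more hydrophobic), and a simplified charge model that counts only Asp, Glu, Lys, and Arg side chains.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges near neutral pH (histidine is ignored).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def substitute(sequence, position, new_residue):
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


def net_charge(sequence):
    """Sum the simplified side-chain charges over the sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence)


normal = "VHLTPEEK"                  # start of the mature beta chain
sickle = substitute(normal, 6, "V")  # glutamic acid 6 replaced by valine

print(normal, "->", sickle)
print("hydropathy at position 6:",
      KYTE_DOOLITTLE[normal[5]], "->", KYTE_DOOLITTLE[sickle[5]])
print("net side-chain charge:", net_charge(normal), "->", net_charge(sickle))
```

The single substitution removes one negative charge and replaces a strongly hydrophilic side chain with a strongly hydrophobic one, which is consistent with the aggregation described in step 4.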

III. Using VAST to Compare Two Proteins

VAST (Vector Alignment Search Tool) shows the tertiary structures of two or more proteins superimposed on each other while the sequence/alignment viewer displays the primary structure of each protein. Compare the beta-chain of human hemoglobin with sperm whale myoglobin. Myoglobin carries oxygen in muscle cells.

1. Go to the Structure Summary page for 2DN2, human hemoglobin. Click on "VAST" (after Related Structure). Click on chain B, "entire chain"; this is human beta-hemoglobin. This screen shows all the proteins that can be compared with human beta-hemoglobin. Check the box to the left of 1A6M_A, sperm whale myoglobin. Click on "View 3D Alignment" at the top left of your screen.

2. Human beta-hemoglobin and sperm whale myoglobin are superimposed in the structure window. The sequence/alignment viewer shows the primary structure of both proteins. Highlight differences in the primary structures, and find them in the structure window. Small differences in primary structure lead to subtle differences in tertiary structure; these cause important differences in the properties of the protein. In the Edit Global Style window in the Style menu, check "protein side chains." Look at the amino acid side chains, concentrating on places where the proteins are different. Rotate the proteins in the structure window. You are comparing the entire whale myoglobin molecule with one beta-chain of human hemoglobin.
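To put a number on how similar two aligned primary structures are, students can compute percent identity from the two rows of the sequence/alignment viewer. The sketch below is a minimal illustration, not the method VAST itself uses. It assumes the two rows have been copied as equal-length strings with "-" marking gaps; the example rows are hypothetical and are not the real hemoglobin and myoglobin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both rows match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either row has a gap
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned rows, for illustration only.
row_1 = "ACDEFGHIK"
row_2 = "ACDQFG-IK"
print(percent_identity(row_1, row_2))
```

Here 7 of the 8 gap-free columns match, so the function returns 87.5.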

IV. Other Proteins

INSULIN--2G4M. How many amino acid chains are in an insulin molecule? (Answer: 2.) In Style/Coloring Shortcuts/Domain, color each chain a different color. Notice the three disulfide bonds (gold).

NUCLEOSOME WITH DNA--2NZD. The nucleosome contains histone proteins and DNA. It forms when chromosomes coil up during mitosis and meiosis. How many histone proteins are in one nucleosome? (Answer: 8, labeled A through H. I and J are the strands of the DNA molecule.) In the Style/Edit Global Style menu, color the proteins magenta and the DNA white. (Suggested settings: Uncheck the "helix objects" box; view both the protein and nucleotide backbone in "wire worms" rendering; click on "Nucleotide Side Chains" to see the DNA base pairs; for color, in "user selection," select magenta for proteins and white for nucleotide backbone and nucleotide side chains; uncheck "Heterogens.") Rotate the molecule; DNA winds around each nucleosome twice. DNA is negatively charged; histones are positively charged because they contain many positively charged lysines and arginines. Holding down the Shift key, highlight some lysines (k) and arginines (r); find them in the tertiary structure of the proteins.

AQUAPORIN--3D9S. Aquaporin carries water across cell membranes. It is a homotetramer (4 identical amino acid chains) with one pore in each chain. Highlight one amino acid chain. In the Style menu, click on "Edit Global Style." Uncheck "helix objects" in the Show column to see the four amino acid chains. The chain you highlighted is yellow. Rotate the molecule 90° so that you no longer see the pores. You see the alpha-helices that cross the cell membrane. Holding down the Shift key, highlight all nine alpha-helices (green) in one amino acid chain. They are not all the same length and are in different places in the protein, but most are in the part of the protein that crosses the cell membrane.

CATALASE--1DGB. Catalase, a homotetramer, catalyzes the breakdown of hydrogen peroxide. How many amino acids are in each chain? (Answer: 498.) Holding down the Shift key, highlight one of the chains and observe its location. In the Style menu, selecting Coloring Shortcuts/Molecule colors each amino acid chain a different color.

TUBULIN--1TUB (pig). Tubulin is a globular (roughly spherical) protein that makes up microtubules in the cytoskeleton. Cilia, flagella, and the mitotic spindle are made of microtubules; many vesicles move along microtubules. Tubulin molecules occur as dimers containing one alpha-tubulin (1TUB_A) and one beta-tubulin (1TUB_B) molecule. Highlight one chain to see alpha-tubulin and beta-tubulin individually. Compare alpha- and beta-tubulin using VAST. Click on "VAST," then click on chain A, "entire chain." Click on "1TUB_B," and click on "View 3D alignment." Alpha-tubulin contains 440 amino acids and beta-tubulin contains 427. Highlight an area where the two proteins are different (amino acids nos. 43-58 in alpha-tubulin) to see a difference in the tertiary structure of the proteins.
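For the nucleosome exercise above, finding every lysine and arginine by eye can be slow. The sketch below is an optional aid that lists the positions of K and R in a one-letter sequence and reports the fraction of positively charged residues. The fragment shown is illustrative only and was not copied from 2NZD; paste in a chain from the sequence/alignment viewer to use it with real data.

```python
def basic_positions(sequence):
    """Return 1-based positions of lysine (K) and arginine (R)."""
    return [
        position
        for position, residue in enumerate(sequence, start=1)
        if residue.upper() in ("K", "R")
    ]


def basic_fraction(sequence):
    """Fraction of residues that are lysine or arginine."""
    if not sequence:
        return 0.0
    return len(basic_positions(sequence)) / len(sequence)


# Illustrative fragment only; not taken from the 2NZD structure file.
fragment = "SGRGKGGKGL"
print(basic_positions(fragment))
print(basic_fraction(fragment))
```

The positions returned can be highlighted in the sequence/alignment viewer with the Shift key held down, as described in the nucleosome paragraph.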

V. Duplication & Divergence

Genes in organisms today are descended from approximately 1200 genes that existed around the time of the universal ancestor (Nusslein-Volhard, 2006). Each of these early genes coded for a protein motif. In the ensuing 3.5 billion years, these genes have given rise to the genes that exist today through two major evolutionary mechanisms: duplication and divergence, and exon shuffling.

In duplication and divergence, an ancestral gene is duplicated through a copying error. This occurs frequently over evolutionary time, although rarely in individuals. After a gene has duplicated, one copy of the gene performs its usual function, whereas the second copy can accumulate mutations. Occasionally, the second copy accumulates mutations and codes for a useful protein that is preserved by natural selection. This is divergence.

Myoglobin and hemoglobin are similar because they evolved by duplication and divergence from an ancestral globin gene. Alpha- and beta-tubulin are similar because they evolved by duplication and divergence from another ancestral gene.
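Duplication and divergence can be mimicked with a toy simulation. The sketch below is a deliberately simplified model under several assumptions: it works on a random, made-up protein sequence rather than a real gene, it changes amino acids directly instead of mutating DNA, it ignores natural selection, and the sequence length (60) and number of substitutions (15) are arbitrary choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def diverge(sequence, substitutions, rng):
    """Return a copy of sequence with changes at distinct random positions."""
    residues = list(sequence)
    for index in rng.sample(range(len(residues)), substitutions):
        # Always pick a residue different from the current one.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[index]]
        residues[index] = rng.choice(choices)
    return "".join(residues)


def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(60))

copy_kept = ancestor                     # copy that keeps its usual function
copy_free = diverge(ancestor, 15, rng)   # copy that accumulates changes

print("kept copy identity:", identity(ancestor, copy_kept))
print("free copy identity:", identity(ancestor, copy_free))
```

Because the 15 changes fall on distinct positions, the diverged copy is 75% identical to the ancestor while the kept copy remains 100% identical. Increasing the number of substitutions shows how two copies of one gene can drift apart while still being recognizably related.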

Human FSH and HCG are proteins containing one alpha-chain and one beta-chain. The alpha-chains are identical; the beta-chains are different. They arose by duplication and divergence from an ancestral gene. FSH, follicle stimulating hormone, is produced by the anterior pituitary gland. In females, it induces maturation of an egg; in males, it stimulates sperm production. HCG, human chorionic gonadotropin, is produced by fertilized eggs and prevents the breakdown of the corpus luteum, allowing the fertilized egg to implant. HCG is measured in pregnancy tests.

1. Go to 1HRP, the structure page for HCG. Observe the alpha- and beta-chains. Highlight each to see its location.

2. Click on "VAST."

3. Click on B, "entire chain."

4. Click on "1FL7_B, beta-FSH."

5. Click on "View 3D Alignment."

6. Tertiary structures of beta-FSH and beta-HCG are superimposed in the structure window. There are 145 amino acids in beta-HCG and 111 in beta-FSH.

7. In the View menu, click on "Zoom Out" to see the entire molecule.

8. Highlight all the amino acids in 1HRP_B to turn beta-HCG yellow. Repeat for beta-FSH (1FL7_B).

9. Highlight areas where the primary structures differ (amino acids nos. 72-79 and 46-41 in beta-HCG). Observe how the tertiary structures also differ.
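Stretches where two aligned primary structures differ, like those highlighted in step 9, can also be located automatically. The sketch below is a minimal helper, assuming the two rows of the sequence/alignment viewer have been copied as equal-length strings. It reports alignment columns, which are not necessarily the same as residue numbers when gaps are present. The example rows are hypothetical and are not the beta-HCG and beta-FSH sequences.

```python
def difference_runs(aligned_a, aligned_b):
    """Return 1-based (start, end) column ranges where the two rows differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    runs = []
    start = None
    for column, (a, b) in enumerate(zip(aligned_a, aligned_b), start=1):
        if a != b:
            if start is None:
                start = column          # a new run of differences begins
        elif start is not None:
            runs.append((start, column - 1))
            start = None
    if start is not None:               # the rows differ through the last column
        runs.append((start, len(aligned_a)))
    return runs


# Hypothetical aligned rows, for illustration only.
row_1 = "ACDEFGHIKL"
row_2 = "ACDQWGHIRL"
print(difference_runs(row_1, row_2))
```

For these rows the function reports columns 4-5 and column 9 as the places where the sequences differ.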

VI. Exon Shuffling

Exon shuffling is another major evolutionary mechanism. One form of exon shuffling occurs when exons from different genes combine to form a new gene; the new gene codes for a protein containing motifs from several different proteins.

Notch, a protein found in all animals, has many functions during early embryonic development. Human Notch contains 2703 amino acids, with an extracellular domain outside the cell, one transmembrane segment passing through the cell membrane, and an intracellular domain inside the cell (Figure 1).

The free amino end of the protein is outside the cell. The extracellular domain contains 36 EGF (epidermal growth factor) repeats, followed by 3 NL domains, then two NOD regions. The transmembrane segment, an alpha-helix between amino acids nos. 1745-1767, is next.

The intracellular domain contains six ankyrin repeats, the final motif found in Notch. When Notch is activated, this part of the protein is cleaved, enters the nucleus, and binds to specific DNA sequences, turning genes on. The molecule binding to the outside of Notch causes genes to be expressed even though it never enters the cell (Barrick & Kopan, 2006).

Observe these parts of the Notch protein:

1TOZ--Human Notch 1. The ligand binding region shown here contains three EGF repeats, each 38 amino acids long. There are 12 of these regions in one Notch protein, each containing three beta-pleated sheets. Each EGF repeat contains one two-stranded beta-pleated sheet. Six cysteines in each EGF repeat form three disulfide bonds (gold).

1PB5--NL domain. One NL domain contains 35 amino acids, no alpha-helices, and no beta-pleated sheets. There are three disulfide bonds (gold). Each Notch protein contains three NL domains.

2F8Y--ankyrin repeat. The six ankyrin repeats combined contain two nearly identical amino acid chains 223 amino acids long. Each amino acid chain has 12 alpha-helices and either four or six beta-pleated sheets. Each beta-pleated sheet has two strands.

How did Notch arise by exon shuffling? Choanoflagellates are one-celled eukaryotes that are the closest relatives of animals (Figure 2). They are widely distributed in aquatic environments (King, 2005). See Choano-Wiki (http://www.choano.org/wiki/ChoanoWiki) for a movie (choanoflagellate gallery). Because choanoflagellates do not have a Notch protein and all animals do, we can infer that Notch evolved after the last common ancestor of animals and choanoflagellates, and before the earliest animal (Figure 2; King et al., 2008). Choanoflagellates have genes coding for three separate proteins, each of which has parts found in Notch. Choanoflagellate protein N1 contains six ankyrin repeats. Choanoflagellate protein N2 contains two NL domains, and choanoflagellate protein N3 contains 36 EGF repeats (Figure 3).

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

Early in animal evolution, over half a billion years ago, the genes that code for these three proteins in choanoflagellates recombined by exon shuffling to form the gene that codes for Notch (King et al., 2008).
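The domain counts given in this section can be tabulated to see which parts of Notch have counterparts in the three choanoflagellate proteins. The sketch below only records the numbers stated in the text; it does not model the recombination events themselves, and the variable and function names are made up for this illustration.

```python
# Domain architecture of human Notch, from free amino end to free carboxyl end,
# using the counts given in the text.
NOTCH = [
    ("EGF repeat", 36),
    ("NL domain", 3),
    ("NOD region", 2),
    ("transmembrane segment", 1),
    ("ankyrin repeat", 6),
]

# Domains reported in the three choanoflagellate proteins.
CHOANOFLAGELLATE = {
    "N1": {"ankyrin repeat": 6},
    "N2": {"NL domain": 2},
    "N3": {"EGF repeat": 36},
}


def sources_for(domain):
    """Names of choanoflagellate proteins that contain the given domain."""
    return sorted(
        name for name, domains in CHOANOFLAGELLATE.items() if domain in domains
    )


for domain, count in NOTCH:
    donors = sources_for(domain)
    label = ", ".join(donors) if donors else "none listed in the text"
    print(f"{domain} x{count}: {label}")
```

The output shows that the EGF repeats, NL domains, and ankyrin repeats each match one choanoflagellate protein, while the text names no choanoflagellate counterpart for the NOD regions or the transmembrane segment.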

VII. Exporting Your Structure

1. Open the protein structure.

a. In the File menu, select "Export PNG."

b. Name the file, for example "protein.png."

c. If the saved file name does not already end in ".png," add ".png" to the end.

2. Dragging this file onto a Word or PowerPoint file adds the protein to your document.
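If an exported image will not open or will not drop into a document, the file name or the file contents may be at fault. The sketch below is an optional check, not a Cn3D feature: one helper adds the ".png" ending when it is missing, and the other tests whether a file begins with the standard eight-byte PNG signature. The file name "protein" is just the example name used above.

```python
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def with_png_suffix(filename):
    """Return the file name as a Path that ends in .png."""
    path = Path(filename)
    if path.suffix.lower() == ".png":
        return path
    return path.with_name(path.name + ".png")


def looks_like_png(path):
    """Return True if the file at path starts with the PNG signature."""
    with open(path, "rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


print(with_png_suffix("protein"))
```

Calling looks_like_png on the exported file confirms that it really is a PNG image before it is dragged into Word or PowerPoint.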

* Acknowledgments

The idea for this tutorial and some of the initial procedures came from a bioinformatics workshop for high school biology teachers presented by Harvard University's Life Sciences--HHMI Outreach Program, and the accompanying handout entitled "Bioinformatics Lab: Viewing Proteins" by Rob Kulathinal of Harvard University and Brian Bettencourt of University of Massachusetts/Lowell. We also thank Raymond S. Broadhead, Brooks School, Massachusetts, and Leone Castles Rochelle, Ridgeview High School, South Carolina. Special thanks to Nicole King for helpful conversations and to Nadav Kupiec for expert preparation of artwork. We thank our students who endured many trial runs of this tutorial and contributed many thoughtful suggestions.

DOI: 10.1525/abt.2010.72.6.12

References

Barrick, D. & Kopan, R. (2006). The Notch transcription activation complex makes its move. Cell, 124, 883-885.

King, N. (2005). Choanoflagellates. Current Biology, 15, R113-R114.

King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J. & others. (2008). The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature, 451, 783-788.

Nusslein-Volhard, C. (2006). Coming to Life: How Genes Drive Development. Carlsbad, CA: Kales Press.

Sayers, E.W., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V. & others. (2009). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 37, D5-D15.

SUSAN OFFNER (soffner@ix.netcom.com) and ROBERT F. POHLMAN (rpohlman@sch.ci.lexington.ma.us) are biology teachers at Lexington High School, 251 Waltham Street, Lexington, MA 02421.