Protein Modeling

Protein Modeling was introduced as a trial in 2009 and as a National Event in 2010/2011.

Description
Students will explore protein structure and function by building 3D models of selected proteins based on computer models- one prior to competition, and one on site at the competition- and taking a test on both general protein-related concepts and information specific to the protein chosen for that year. For 2011, students will model proteins involved in reprogramming adult cells to become stem cells.

Pre-Build Model
For the pre-build model, a section of a certain protein will be specified from the Protein Data Bank. Students will be provided beforehand with a Mini-Toober of the correct length, at a scale of 2 cm per amino acid residue, and red and blue plastic end-caps representing the carboxy and amino termini of the protein chain respectively. The model must be correctly folded, and the end-caps placed at the correct ends of the protein chain, but the model must also include "creative additions" that showcase the function of the selected protein. Students must decide for themselves which sidechains and/or ligands are important to the protein's function and should be displayed. These additions must be explained on a two-sided, 3x5 index card, along with a paragraph explanation of how they are relevant to the protein's function, and submitted with the model, typically at impound.
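
Because the scale is fixed at 2 cm per residue, the Mini-Toober length follows directly from the residue range being modeled. Here is a minimal sketch of that arithmetic; the function name and the residue range are made up for illustration and are not an official assignment.

```python
CM_PER_RESIDUE = 2  # scale stated in the event description


def toober_length_cm(first_residue: int, last_residue: int) -> int:
    """Mini-Toober length for residues first..last, counted inclusively."""
    if last_residue < first_residue:
        raise ValueError("last_residue must not be smaller than first_residue")
    residue_count = last_residue - first_residue + 1
    return residue_count * CM_PER_RESIDUE


# Hypothetical range: residues 10 through 59 is 50 residues
print(toober_length_cm(10, 59))  # 100
```

Note that the range is inclusive, so residues 10 through 59 make 50 residues, not 49.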

On-Site Model
For the on-site model, a file will be specified from the Protein Data Bank, but the specific section of this protein that students are to build will not be stated until the competition. At the competition, students will be given a Mini-Toober of the correct length, at the same scale of 2 cm per amino acid residue, foam representations of important sidechains or associated ligands, and red and blue plastic end-caps representing the carboxy and amino termini of the protein chain respectively. The model must be correctly folded, the end-caps placed at the correct ends of the protein chain, and the given sidechains or ligands attached to the Mini-Toober at the correct locations and with the correct orientations.

Written Test
The written test consists of multiple choice and short answer questions on three kinds of material: general protein topics (e.g., "What is the full name of the N-terminus of a protein chain?"), information specific to the selected protein (e.g., for the 2009-10 hemagglutinin, "Why are pigs important to the spread of influenza viruses?"), and information extremely specific to the PDB file (e.g., "What is the resolution in Angstroms of the PDB file 1HTM.pdb?").

These attributes are consistent across all levels of competition, because the Center for Biomolecular Modeling at MSOE provides all tests for the Protein Modeling event. However, the questions (at least ostensibly) get more challenging at each successive level of competition.
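
For the PDB-file-specific questions, it helps to know that a PDB file is plain text and that the resolution is normally recorded on a "REMARK   2 RESOLUTION." line. The sketch below pulls that value out of the file text. The function name and the sample text are hypothetical, and the sample is not taken from any real entry.

```python
import re
from typing import Optional


def resolution_angstroms(pdb_text: str) -> Optional[float]:
    """Return the resolution from the REMARK 2 record, or None if no number is given."""
    for line in pdb_text.splitlines():
        if line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+(?:\.\d+)?)\s+ANGSTROMS", line)
            if match:
                return float(match.group(1))
    return None


# Made-up file fragment for illustration only
sample = (
    "HEADER    EXAMPLE ENTRY\n"
    "REMARK   2\n"
    "REMARK   2 RESOLUTION.    2.50 ANGSTROMS.\n"
)
print(resolution_angstroms(sample))  # 2.5
```

Structures solved by methods that have no resolution (such as NMR) list "NOT APPLICABLE" on that line, in which case this sketch returns None.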

Scoring
40% of the score is determined by the pre-build model, 30% by the on-site model, and 30% by the written test.
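
As a rough illustration of how the three weights combine, the sketch below turns a percentage earned on each part into an overall percentage. It assumes each part is first converted to a percentage of its own maximum; the official scoring procedure may differ, and the names and sample numbers are made up.

```python
# Weights stated above, as whole percentages
WEIGHTS = {"pre_build": 40, "on_site": 30, "written": 30}


def overall_percent(part_percents: dict) -> float:
    """Combine the percentage earned on each part into one weighted percentage."""
    return sum(WEIGHTS[part] * part_percents[part] for part in WEIGHTS) / 100


# Hypothetical team: 90% on the pre-build, 80% on-site, 50% on the test
print(overall_percent({"pre_build": 90, "on_site": 80, "written": 50}))  # 75.0
```

The example shows why the pre-build model matters most: a ten-point swing there moves the total by four points, against three for either of the other parts.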

Pre-build Model
The pre-build model is scored based on:
 * Accuracy and correct placement of secondary structures (alpha helices and beta sheets)
 * Accuracy of tertiary structure (3D arrangement of secondary structures- e.g., two strands of a beta sheet being correctly placed next to each other, and a nearby helix being perpendicular to both)
 * Relevance and correct placement of sidechains or ligands you've chosen to display (and clarity/accuracy of the explanation on your card), as well as correct placement of end-caps
 * Relevance and creativity of "creative additions" (e.g., labeling the "fusion peptide" on hemagglutinin would get you points, but displaying every sidechain wouldn't, since that doesn't show any understanding of the protein's function)

For an example of how specific it has to be (and of what constitutes good "creative additions"), see the [[Media:2010_SONT_Prebuild.pdf|2010 National Pre-build Rubric]].

On-Site Model
The on-site model is scored based on:
 * Accuracy and correct placement of secondary structures (alpha helices and beta sheets)
 * Accuracy of tertiary structure (3D arrangement of secondary structures- e.g., two strands of a beta sheet being correctly placed next to each other, and a nearby helix being perpendicular to both)
 * Correct placement and orientation of given sidechains and ligands, as well as correct placement of end-caps

For an example of what exactly they look for in the on-site model, see the [[Media:2010_SONT_Onsite.pdf|2010 National On-site Rubric]].

Written Test
There are relatively few questions on the test- for example, the 2010 National test consisted of 10 multiple choice and 5 short answer questions. Each multiple choice question was worth one point, and the short answer questions were worth 20 points total (each question was assigned its own point value, which was printed on the test). A similar number of questions and mode of scoring can be expected on future tests. Some of the short answer questions will be selected as tiebreaker questions, and will be designated as such. These still count normally toward your regular score on the test portion, but will be weighted more in the event of a tie.

Materials
As of 2009-10, ten two-sided 8.5x11 sheets of reference materials from any source, typed, handwritten, or graphics, were allowed, along with a scientific calculator and writing utensils (a Sharpie or other permanent marker is recommended for marking the Mini-Toober for the on-site model, as well as normal pens/pencils for taking the test).

Content
While this information may appear on the written test, it's also important to know for the construction of your models- particularly the pre-build model, on which you have to decide what's vital to show or not.

Protein Structure
"Structure equals function" is the basic tenet of Protein Modeling: i.e., it's important to know what a protein's structure is like because its function is determined by its structure.

There are four different types of protein structure: primary, secondary, tertiary, and quaternary.

Primary Structure
Primary structure is the sequence of amino acid residues in a protein chain. They're called residues because they're no longer complete amino acids: in bonding through dehydration synthesis, each loses a hydrogen from its amino group and a hydroxyl group from its carboxylic acid group, and the two leave together as water. There are 20 main varieties of amino acid, which differ only in their sidechain (sometimes called an "R group").

Each residue in a chain is given a number. Numbering starts with 1 at the amino terminus (the end that still has a free amino group) and increases toward the carboxy terminus (the end that still has a free carboxyl group).
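
The numbering convention can be sketched in a few lines. This is only an illustration: the function names and the four-residue sequence (written in one-letter codes) are made up.

```python
def number_residues(sequence: str) -> list:
    """Pair each one-letter residue code with its position, counting from 1
    at the amino (N) terminus."""
    return [(position, code) for position, code in enumerate(sequence, start=1)]


def peptide_bond_count(sequence: str) -> int:
    """A chain of n residues is joined by n - 1 peptide bonds,
    each formed with the loss of one water molecule."""
    return max(len(sequence) - 1, 0)


peptide = "MKCW"  # made-up chain: Met, Lys, Cys, Trp
numbered = number_residues(peptide)
print(numbered[0])   # amino-terminal residue: (1, 'M')
print(numbered[-1])  # carboxy-terminal residue: (4, 'W')
print(peptide_bond_count(peptide))  # 3
```

When a PDB file contains only a fragment of a protein, the residue numbers in the file may start higher than 1, so always check the numbers in the file itself.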

Secondary Structure
Secondary structure is the first level of folding in a protein. Patterns called "motifs", such as alpha helices and beta sheets (by far the two most common), are caused by hydrogen bonding between backbone atoms of the residues: the hydrogen on one residue's backbone amide nitrogen bonds to the oxygen of another residue's backbone carbonyl group. Alpha helices are slightly more common in proteins overall than beta sheets. These helices are tightly coiled single strands, kept in place by hydrogen bonds between nearby residues (each residue bonds to the one four positions further along the chain). They can be anywhere from only a few residues long to over 100 Angstroms long in some proteins. They tend to form the base of protein "stalks" (such as that of 2009-10's influenza hemagglutinin).
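
To relate helix lengths given in residues to lengths given in Angstroms, the sketch below uses the textbook rise of about 1.5 Angstroms per residue for an ideal alpha helix. Real helices deviate somewhat from this value, and the function names are made up for illustration.

```python
import math

RISE_PER_RESIDUE_ANGSTROM = 1.5  # textbook value for an ideal alpha helix


def helix_length_angstrom(residue_count: int) -> float:
    """Approximate length along the helix axis for a given number of residues."""
    return residue_count * RISE_PER_RESIDUE_ANGSTROM


def residues_for_length(length_angstrom: float) -> int:
    """Smallest number of residues whose ideal helix reaches the given length."""
    return math.ceil(length_angstrom / RISE_PER_RESIDUE_ANGSTROM)


print(helix_length_angstrom(10))  # 15.0
print(residues_for_length(100))   # 67
```

By this estimate, a helix over 100 Angstroms long contains at least 67 residues.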

Beta sheets, on the other hand, are made up of multiple beta strands: extended, zigzagging stretches of residues connected by loops. These strands line up side by side, either parallel (adjacent strands point in the same direction) or, as is often the case, antiparallel (adjacent strands point in opposite directions). Direction matters because residues are numbered from the amino terminus to the carboxy terminus. Multiple hydrogen bonds form between adjacent strands. Beta sheets are very strong as protective or support layers (such as the "beta-barrel" exterior of GFP).

Tertiary Structure
Tertiary structure is the position in three dimensions of the secondary structures (motifs). It is determined by the secondary structures present, as well as the properties of the sidechains. Hydrophilic sidechains such as glutamine will move to the "outside" when the protein is folded in a watery environment, while hydrophobic sidechains such as tryptophan will cluster "inside" the protein, protected by other sections of the protein, to prevent their exposure to water. Oppositely charged sidechains come together, forming salt bridges (ionic bonds), while sidechains with the same charge repel each other. Cysteine, which contains sulfur, bonds covalently with other cysteines to form strong disulfide bonds. The interaction of all these attractions and repulsions causes the protein to develop a unique shape in 3D, called a "conformation".

The protein's tertiary structure also depends on the environment in which it is folded: in the human body, which is a watery environment, the hydrophobic (nonpolar) sidechains end up on the inside, as stated above. However, if a protein is folded in a nonpolar solvent (such as, say, vegetable oil), the hydrophilic (polar) sidechains end up on the inside.
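
The rules above can be summarized in a small lookup sketch. It is a deliberate simplification: the residue groupings cover only some common one-letter codes (histidine, for example, is left out of the charged set), and the function names are made up for illustration.

```python
POSITIVE = {"K", "R"}  # lysine, arginine
NEGATIVE = {"D", "E"}  # aspartate, glutamate
HYDROPHOBIC = {"A", "V", "L", "I", "M", "F", "W"}  # includes tryptophan (W)


def sidechain_interaction(a: str, b: str) -> str:
    """Name the interaction expected between two sidechains (simplified)."""
    if a == "C" and b == "C":
        return "disulfide bond"
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return "salt bridge"
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return "repulsion"
    return "no charge-based interaction"


def likely_location(residue: str, solvent: str = "water") -> str:
    """Guess where a sidechain ends up; any solvent other than "water"
    is treated as nonpolar."""
    hydrophobic = residue in HYDROPHOBIC
    if solvent == "water":
        return "inside" if hydrophobic else "outside"
    return "outside" if hydrophobic else "inside"


print(sidechain_interaction("K", "E"))  # salt bridge
print(likely_location("W"))             # inside
print(likely_location("Q", "oil"))      # inside
```

A lookup like this is handy when choosing which sidechains to show on the pre-build model, but always confirm the actual contacts in the structure itself.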

Quaternary Structure
Quaternary structure is the arrangement of the individual subunits (monomers) of a multi-unit (multimeric) protein. These subunits, or "chains" as they are often called, each have their own amino and carboxy termini, and are not joined to each other through the peptide backbone. Instead, they are held together by bonds between sidechains, which can be disulfide or ionic (more commonly the latter), and arranged together in a specific conformation. Multimers are quite common, and may contain several distinct chains or simply several copies of the same one (or few).

2011-Specific Info
The 2011 pre-build protein is Klf4, a promoter/repressor of DNA transcription studied for its role in cell proliferation, differentiation, and survival, and found under the Protein Data Bank ID 2WBU. At Regional competitions, the on-site model will be a selected region of Oct4 (PDB ID 1GTO), which is critically involved in the self-renewal of undifferentiated embryonic stem cells. At States, it will be a section of Nanog (PDB ID 2KT0), which is another transcription factor involved in self-renewal of undifferentiated embryonic stem cells.

Klf4
IN PROGRESS

Oct4
IN PROGRESS

Nanog
IN PROGRESS

Using Jmol
Jmol is the software used to create computer models of the proteins for the Protein Modeling event. It is open-source and can be downloaded here or used online (this is the 2010 pre-build Jmol environment; once the 2011 protein selection is announced, a pre-build Jmol environment for it will presumably be posted).

Getting Started
IN PROGRESS

Quick Guide to Jmol Commands
These are the commands you're most likely to use frequently while modeling your protein in Jmol:


 * select (object): Selects the specified objects, making them the subject of future commands (e.g., "select *a" would select chain A of your protein: * is a wildcard that matches every residue, and the letter after it names the chain; Jmol also accepts the form "select :a").


 * restrict (object): Selects the specified objects and hides everything else (e.g., "restrict *a" would leave you with a screen showing only chain A, which would also be selected).


 * center (object): Centers the field of view on the specified objects. This is an incredibly useful command after a "restrict", because otherwise, every time you try to look at the other side of your protein fragment, it will swing out of your field of view.


 * color [R,G,B]: Colors the currently selected objects with the RGB values specified (put a number from 0 to 255 in place of each of R, G, and B to specify the color you want). There's also the easy way: instead of [R,G,B], just put in a color name. Jmol only knows some names, though (for instance, it knows "yellow", but probably doesn't know "tangerine"; it might, though, since it has a pretty extensive and somewhat bizarre color library).
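
The commands above are usually typed one after another. As a rough sketch of a typical sequence, the helper below assembles the text of a short script that isolates one chain, centers on it, and colors it. The helper itself is hypothetical and is not part of Jmol; only the three command lines it produces come from the list above.

```python
def jmol_isolate_chain(chain: str, rgb: tuple) -> str:
    """Build Jmol commands that show only one chain, center on it, and color it."""
    if not all(0 <= value <= 255 for value in rgb):
        raise ValueError("each RGB value must be between 0 and 255")
    red, green, blue = rgb
    target = f"*{chain}"  # wildcard residue plus chain letter, as in "select *a"
    commands = [
        f"restrict {target}",                 # hide everything else; chain stays selected
        f"center {target}",                   # keep the chain in view while rotating
        f"color [{red},{green},{blue}]",      # color the current selection
    ]
    return "\n".join(commands)


print(jmol_isolate_chain("a", (255, 0, 0)))
# restrict *a
# center *a
# color [255,0,0]
```

The order matters: "restrict" leaves the chain selected, so the "color" command that follows applies only to that chain.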

IN PROGRESS

Helpful Tips
IN PROGRESS

Folding the Mini-Toober Model
IN PROGRESS