Protein Modeling

Protein Modeling is a Division C event for the 2016 season. It was introduced as a trial event in 2009 and has been an official national event in the 2010-2011, 2011-2012, 2014-2015, and current 2015-2016 seasons. For the event, students use computer visualization and online resources to guide the construction of physical models of proteins and to understand how protein structure determines function. For 2015, students will model proteins used to edit the human genome.

Components
40% of the score is determined by the pre-build model, 30% by the on-site model, and 30% by the written test.
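As a rough illustration of how these weights combine, the following sketch assumes each component has already been converted to a fraction of its maximum points. The function name and the example fractions are hypothetical; the actual scoring procedure is defined by the event supervisors.

```python
# Weights taken from the text: 40% pre-build, 30% on-site, 30% written test.
WEIGHTS = {"prebuild": 0.40, "onsite": 0.30, "written": 0.30}


def composite_score(fractions):
    """Combine per-component fractions (0.0 to 1.0) into one weighted total."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical team: 90% on the pre-build, 80% on-site, 70% on the test.
example = {"prebuild": 0.90, "onsite": 0.80, "written": 0.70}
print(f"{composite_score(example):.2f}")
```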

Pre-Build Model
For the pre-build model, a section of a certain protein will be specified from the Protein Data Bank. Students will be provided beforehand with a Mini-Toober of the correct length, at a scale of 2 cm per amino acid residue, and red and blue plastic end-caps representing the carboxy and amino termini of the protein chain respectively. The model must be correctly folded, and the end-caps placed at the correct ends of the protein chain, but the model must also include "creative additions" that showcase the function of the selected protein. Students must decide for themselves which sidechains and/or ligands are important to the protein's function and should be displayed. These additions must be explained on a two-sided, 3x5 index card, along with a paragraph explanation of how they are relevant to the protein's function, and submitted with the model, typically at impound.
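The 2 cm per residue scale makes it easy to work out how long the Mini-Toober should be and where to mark a given residue. The sketch below is only a convenience calculation; the residue range is hypothetical, and measuring each residue from the start of its 2 cm segment is an assumption rather than an official convention.

```python
CM_PER_RESIDUE = 2  # scale stated in the event description


def toober_length_cm(first_residue, last_residue):
    """Length of Mini-Toober needed for an inclusive residue range."""
    n_residues = last_residue - first_residue + 1
    return n_residues * CM_PER_RESIDUE


def residue_mark_cm(first_residue, residue):
    """Distance from the amino-terminal end at which a residue's segment starts."""
    return (residue - first_residue) * CM_PER_RESIDUE


# Hypothetical section covering residues 41 through 60.
print(toober_length_cm(41, 60))
print(residue_mark_cm(41, 50))
```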

The pre-build model is scored based on:
 * Accuracy and correct placement of secondary structures (alpha helices and beta sheets)
 * Accuracy of tertiary structure (3D arrangement of secondary structures- e.g., two strands of a beta sheet being correctly placed next to each other, and a nearby helix being perpendicular to both)
 * Relevance and correct placement of sidechains or ligands you've chosen to display (and clarity/accuracy of the explanation on your card), as well as correct placement of end-caps
 * Relevance and creativity of "creative additions" (i.e., labeling the "fusion peptide" on hemagglutinin would get you points, but displaying every sidechain wouldn't, since that doesn't show any understanding of the protein's function)

For an example of how specific the pre-build model has to be (and of what constitutes good "creative additions"), see the [[Media:2010_SONT_Prebuild.pdf|2010 National Tournament Pre-build Rubric]] and the [[Media:2011_SONT_Prebuild.pdf|2011 National Tournament Pre-build Rubric]].

On-Site Model
For the on-site model, a file will be specified from the Protein Data Bank, but the specific section of this protein that students are to build will not be stated until the competition. At the competition, students will be given a Mini-Toober of the correct length, at the same scale of 2 cm per amino acid residue, foam representations of important sidechains or associated ligands, and red and blue plastic end-caps representing the carboxy and amino termini of the protein chain respectively. The model must be correctly folded, the end-caps placed at the correct ends of the protein chain, and the given sidechains or ligands attached to the Mini-Toober at the correct locations and with the correct orientations.

The on-site model is scored based on:
 * Accuracy and correct placement of secondary structures (alpha helices and beta sheets)
 * Accuracy of tertiary structure (3D arrangement of secondary structures- e.g., two strands of a beta sheet being correctly placed next to each other, and a nearby helix being perpendicular to both)
 * Correct placement and orientation of given sidechains and ligands, as well as correct placement of end-caps

For an example of what exactly they look for in the on-site model, see the [[Media:2010_SONT_Onsite.pdf|2010 National Tournament On-site Rubric]] and the [[Media:2011_SONT_Onsite.pdf|2011 National Tournament On-site Rubric]].

Written Test
The written test consists of multiple choice and short answer questions about general protein topics (e.g., "What is the full name of the N-terminus of a protein chain?"), information specific to the selected protein (e.g., for the 2009-10 hemagglutinin, "Why are pigs important to the spread of influenza viruses?"), and information extremely specific to the PDB file (e.g., "What is the resolution in Angstroms of the PDB file 1HTM.pdb?").
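File-specific facts such as the resolution can be read straight out of the PDB text file, which records it in a "REMARK 2" line. The sketch below is a minimal parser, assuming the usual "REMARK   2 RESOLUTION.  x.xx ANGSTROMS." layout; the sample text and its value are made up for illustration and are not taken from any real entry.

```python
import re

# Matches lines such as "REMARK   2 RESOLUTION.    2.50 ANGSTROMS."
_RESOLUTION = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS"
)


def resolution_angstroms(pdb_text):
    """Return the resolution as a float, or None if no such line is found."""
    for line in pdb_text.splitlines():
        match = _RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None


# Hypothetical header excerpt (not a real PDB entry).
SAMPLE = """HEADER    HYPOTHETICAL EXAMPLE
REMARK   2
REMARK   2 RESOLUTION.    2.50 ANGSTROMS.
"""
print(resolution_angstroms(SAMPLE))
```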

There are relatively few questions on the test- for example, the 2010 National test consisted of 10 multiple choice and 5 short answer questions. Each multiple choice question was worth one point, and the short answer questions were worth 20 points total (each question was assigned its own point value, which was printed on the test). A similar number of questions and mode of scoring can be expected on future tests. Some of the short answer questions will be selected as tiebreaker questions, and will be designated as such. These still count normally toward your regular score on the test portion, but will be weighted more in the event of a tie.

See the Test Exchange for exams and answer keys from the 2010 and 2011 National Tournaments.

These attributes are consistent across all levels of competition, because the Center for Biomolecular Modeling at MSOE provides all tests for the Protein Modeling event. However, the questions (at least ostensibly) get more challenging at each successive level of competition.

Materials
This year, five two-sided 8.5x11 sheets of reference materials from any source, typed, handwritten, or graphics, are allowed, along with a scientific calculator and writing utensils (a Sharpie or other permanent marker is recommended for marking the Mini-Toober for the on-site model, as well as normal pens/pencils for taking the test).

Content
While this information may appear on the written test, it's also important to know for the construction of your models- particularly the pre-build model, on which you have to decide what's vital to show or not.

Restriction Endonucleases
Restriction endonucleases are proteins that can cut DNA at a specific point in a specific sequence, allowing genome editing. They are termed "restriction enzymes" because they restrict the infection of bacteriophages. Bacteria are under constant attack by bacteriophages (e.g. bacteriophage phiX174).

To protect themselves, many types of bacteria have developed a method to chop up any foreign DNA, such as that of an attacking phage. Bacteria build an endonuclease (an enzyme that cuts DNA), which is allowed to circulate in the bacterial cytoplasm, waiting for phage DNA. Each type of restriction enzyme seeks out a single DNA sequence and precisely cuts it in one place.

For example, EcoRI cuts the sequence GAATTC between the G and the A. Roving endonucleases can be dangerous, so bacteria protect their own DNA by modifying it with methyl groups. These groups are added to adenine or cytosine bases (depending on the particular type of bacteria) in the major groove. Methyl groups block the binding of restriction enzymes but don’t block the normal reading and replication of the genomic information stored in the DNA. DNA from an attacking bacteriophage won’t have protective methyl groups and will be destroyed.

Each particular type of bacteria has a restriction enzyme (or several different ones) that cuts a specific DNA sequence, paired with a methyl-transferase enzyme that protects this same sequence in the bacterial genome.
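To make the idea of sequence-specific cutting concrete, here is a small sketch that scans one DNA strand for the EcoRI site GAATTC and cuts between the G and the A, as described above. It is a simplification: it models a single strand only and ignores methylation, and the example sequence is hypothetical.

```python
SITE = "GAATTC"   # EcoRI recognition sequence from the text
CUT_OFFSET = 1    # the cut falls between the G and the first A


def ecori_fragments(seq):
    """Split one DNA strand at every EcoRI site and return the fragments."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical sequence containing two EcoRI sites.
print(ecori_fragments("TTGAATTCAAGAATTCGG"))
```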

Endonuclease FokI
The specific nuclease FokI occurs naturally in bacteria as a defense mechanism against invading viruses. It is an enzyme derived from Flavobacterium okeanokoites (or Planomicrobium okeanokoites).

This protein has two domains (functional parts): the cleavage domain (nuclease) and the DNA-binding domain. It is commonly used in designing genome editing nucleases. The nuclease of FokI is typically removed from its natural DNA-binding domain and attached to new binding domains, such as zinc fingers, to create a new specialized restriction enzyme.

The nuclease functions solely as a dimer, meaning it requires two copies (one attached to each strand of DNA) in order to successfully cleave the DNA. It recognizes specific DNA sequences (5’GGATG3’ and 5’CATCC3’) and cleaves both DNA strands: 14 bases after the first G of 5’GGATG3’ and 13 bases before the first C of 5’CATCC3’. It requires a cofactor, Mg2+.
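The two recognition sequences quoted above are really the same site read on opposite strands, which a short reverse-complement check makes clear. This is only an illustrative sketch; the helper name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the opposite strand of a DNA sequence, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("GGATG"))
```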

Zinc Finger Proteins
The zinc finger protein has a tetra-coordinated zinc ion at its core, which stabilizes the structure.

Some scientists experimented with the idea of replacing the zinc coordination with other interactions. This exercise led to the design of a peptide that could adopt the same shape and structure as the DNA binding zinc finger domain but had a completely different rationale for its stability.

Zinc finger nucleases are sequence-specific DNA-binding proteins. Each finger binds three bases and is composed of a short alpha helix and a 2-stranded beta sheet. Zinc fingers were first identified in a frog transcription factor (transcription factor IIIA); this protein structure was found to bind both 5S RNA and its cognate DNA. Over the years, zinc fingers have been identified in many other proteins and are among the most common protein domains that bind to specific DNA/RNA sequences.

Each zinc finger domain has ~30 amino acids. In addition to its hydrophobic core, it is stabilized by a zinc ion coordinated by the side chains of four cysteines, four histidines, or a combination of these. Most zinc finger-containing proteins have a series of these domains linked to each other. These domains bind to the major groove of the DNA. Specific amino acid side chains reach out from these domains to "read" the DNA sequence by interacting with specific DNA bases.
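Because each finger reads three bases, the number of linked fingers needed for a target follows directly from its length. The sketch below is a deliberately simplified picture that assumes each finger reads its triplet independently; the target sequence and function names are hypothetical.

```python
BASES_PER_FINGER = 3  # from the text: each finger binds three bases


def fingers_needed(target):
    """Number of zinc fingers needed to cover a DNA target, rounded up."""
    return -(-len(target) // BASES_PER_FINGER)


def split_into_triplets(target):
    """Break a target into the 3-base stretches each finger would read."""
    return [
        target[i:i + BASES_PER_FINGER]
        for i in range(0, len(target), BASES_PER_FINGER)
    ]


# Hypothetical 9-base target sequence.
print(fingers_needed("GATGAGGAT"))
print(split_into_triplets("GATGAGGAT"))
```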

CCR5 (Chemokine Receptor 5)
CCR5 is a membrane receptor protein found in human immune cells that is used by HIV to enter the host cell. It is an HIV co-receptor: it cooperates with the host cell's primary receptor, CD4, to allow the initial docking of HIV onto T-cells and subsequent infection.

The CD4-bound HIV envelope spike protein uses this molecule as a co-receptor to enter and infect host cells. In some instances, HIV uses another similar chemokine receptor, CXCR4, as the co-receptor for entry into host cells. A naturally occurring deletion in CCR5 enables a cell to become resistant to HIV, since the virus is unable to properly bind and insert its genetic information.

CCR5 is normally 352 amino acids long and folds up into a structure composed of 7 transmembrane alpha helices, with structural homology to the family of G protein-coupled receptors (GPCRs). Its primary role is receiving chemical signals called chemokines and recruiting other immune cells to help the immune system function.

In some ethnic groups (Caucasians), a 32-nucleotide deletion in the gene results in a corresponding deletion in the mRNA. However, this variation is homozygous recessive, meaning both alleles must carry the deletion for its resistant properties to be fully expressed.

Because the genetic code is a triplet code, and 32 isn’t a multiple of 3, the deletion results in 1) the deletion of 11 amino acids and 2) a switch in the translational reading frame, resulting in a scrambled amino acid sequence after the deletion site. 31 additional amino acids are added as a result of the deletion before a stop codon is met by the ribosome. This prematurely terminated CCR5 protein is 215 amino acids long.
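The arithmetic behind the frameshift can be checked in a few lines. This sketch uses only the numbers quoted in the text (a 32-nucleotide deletion, 31 scrambled residues, and a 215-residue product); the variable names are hypothetical.

```python
DELETION_NT = 32        # size of the deletion, from the text
CODON_SIZE = 3          # the genetic code is read in triplets
TRUNCATED_LENGTH = 215  # length of the truncated protein, from the text
SCRAMBLED_RESIDUES = 31  # residues added after the frameshift, from the text

# 32 nucleotides cover 10 whole codons with 2 nucleotides left over.
whole_codons, leftover_nt = divmod(DELETION_NT, CODON_SIZE)

# Any leftover means everything downstream is read in a shifted frame.
causes_frameshift = leftover_nt != 0

# Residues of the truncated protein that still match normal CCR5.
native_residues = TRUNCATED_LENGTH - SCRAMBLED_RESIDUES

print(whole_codons, leftover_nt, causes_frameshift, native_residues)
```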

CCR5 normally dimerizes and is phosphorylated in the endoplasmic reticulum and is then efficiently trafficked through the Golgi to the cell membrane. In contrast, Δ32 CCR5 is not phosphorylated, and is not trafficked to the cell membrane. Δ32 CCR5 retains its ability to dimerize with wild type CCR5, leading to a transdominant negative effect on the delivery of the functional CCR5 to the cell surface.

Approximately 15-20% of the northern European population is heterozygous for a naturally occurring 32 base pair deletion in their CCR5 gene, which gives them partial protection against HIV infection. Approximately 1% of European Caucasians are homozygous for this mutation and are highly resistant to HIV infection. Based on the functional cure of the Berlin patient, it appears that introducing the CCR5 delta 32 mutation may make host cells resistant to HIV. Using an engineered nuclease, such as a zinc finger nuclease, to specifically target and disrupt the CCR5 gene in HIV patients may make the patient’s endogenous T-cells resistant to further infection.

Since HIV infection is persistent, making the host cells resistant may provide a functional cure for HIV infected individuals. Sangamo Biosciences (a biotech company specializing in the development of therapeutic zinc finger nucleases) has developed a zinc finger nuclease that is targeted to disrupt the CCR5 gene.

This nuclease is currently being tested in a Phase 2 clinical trial with HIV/AIDS patients by Sangamo Biosciences in collaboration with groups from the University of Pennsylvania School of Medicine and the Albert Einstein College of Medicine.

HIV
The Human Immunodeficiency Virus (HIV) is an RNA virus that can infect specific immune cells in our body, called T helper cells. The RNA genome of HIV is encased in a capsid, which is in turn covered by an envelope derived from the host cell membrane. The structures and functions of most of HIV’s proteins are now known. We are still learning about the accessory and regulatory proteins of HIV that exploit the host cell’s machinery for the virus's own advantage.

Life Cycle

Attachment: The HIV spike or envelope protein, gp120, attaches to the host cell protein CD4 on specific types of T-cells.

Fusion and entry: Binding of gp120 and CD4 rearranges their structures, allowing the complex to bind another host cell receptor, a chemokine receptor called CCR5. In some cases an alternate receptor called CXCR4 may replace CCR5 in this interaction. This in turn allows the stalk of the HIV spike (the protein gp41) to penetrate the host cell membrane and fuse the viral envelope with the host cell membrane.

Reverse transcription: Upon entry, HIV sheds its capsid and the 2 single strands of viral RNA are converted to a double stranded DNA by a special viral enzyme called Reverse transcriptase.

Integration: The double stranded DNA, or proviral DNA, enters the host cell nucleus and is integrated in the cell’s genome by another special viral enzyme called Integrase.

Transcription and translation: The proviral DNA is transcribed and translated like any other host cell gene using host cell machinery (RNA polymerase, ribosomes, etc.).

Assembly and budding: The various viral proteins and RNA come together to assemble the virus. At this stage some of the viral proteins are still linked to each other as part of the polyprotein synthesized by the virus. Various HIV proteins and RNA are packaged into an immature viral particle that buds off from the host cell encased in its membrane.

Maturation of viral particle: With action of the viral protease the various HIV proteins are cut and separated, free to perform their specific functions. This rearrangement or maturation helps the HIV become a mature infectious particle ready to infect another cell.

All the steps of the viral lifecycle are presented in the HHMI Biointeractives animation, narrated by HHMI investigator, Bruce Walker, MD.

The approaches currently used to treat HIV infections include:
 * Viral enzyme inhibitors: block the actions of some critical enzymes in the HIV lifecycle.
 * Reverse transcriptase inhibitors (RTIs): block the initial conversion of viral RNA to the proviral DNA that is integrated into the host cell genome, either by mimicking the enzyme substrate and directly binding to the active site (nucleoside RTIs) or by binding to a site near the enzyme active site and blocking its function (non-nucleoside RTIs).
 * Integrase inhibitors: block integration of proviral DNA into the host cell genome, preventing permanent infection of the host cells.
 * Protease inhibitors: block cleavage of the viral polyprotein, preventing maturation of HIV into infectious particles.
 * Entry inhibitors: block interaction of the CD4-gp120 complex with the chemokine co-receptor, preventing entry of HIV into the host cell.
 * Fusion inhibitors: block the structural changes in the stalk of the HIV spike (gp41) that are needed for the viral envelope and host cell membranes to fuse.

Upcoming Approaches: Making the host cells resistant to HIV: Currently, researchers are using zinc finger nucleases to target the CCR5 gene in stem cells that give rise to blood cells and introduce a deletion or disruption in the gene. As a result, these cells are unable to make a functional CCR5 protein and become resistant to HIV infection. A treatment protocol using this approach is currently in a Phase II clinical trial conducted by a group from the University of Pennsylvania School of Medicine, the Albert Einstein College of Medicine and Sangamo Biosciences (a biotech company specializing in the development of therapeutic zinc finger nucleases).

Seek out and destroy all the integrated proviral DNA: A recent research report has suggested the possibility of using a gene therapy approach to specifically identify and edit out the integrated proviral HIV-1 DNA. While there is a long way to go before this can even be tested as a treatment option, it offers the hope that gene therapy can be used for dealing with tough diseases like HIV/AIDS.

Protein Structure
"Structure equals function" is the basic tenet of Protein Modeling: i.e., it's important to know what a protein's structure is like because its function is determined by its structure.

There are four different types of protein structure: primary, secondary, tertiary, and quaternary.

Primary Structure
Primary structure is the sequence of amino acid residues in a protein chain (they're called residues, by the way, because they're not individual amino acids anymore, having lost a hydrogen off their amino groups and a hydroxyl group off their carboxylic acid groups in the process of bonding through dehydration synthesis; TL;DR the ends of the amino acids are missing because they're connected, so we call them "residues" instead of "amino acids"). There are 20 main varieties of amino acid, which differ only in their sidechain (sometimes called an "R group").

Different residue sidechains have different properties; for example, the red sidechains in the diagram are negatively charged, and the blue ones are positively charged. These properties determine how the protein folds (i.e., the secondary and tertiary structure), because certain types of residues attract, repel or bond to other types of residues. Also, the types of residues present can determine how the protein interacts with other molecules such as DNA- for example, serine can form hydrogen bonds, and therefore is often found at binding sites in a protein.

Charged sidechains repel like charges and attract opposite charges. Hydrophilic, or polar, sidechains usually end up on the outside of a folded structure, because most proteins fold in a watery environment and the polar sidechains interact well with water, which is also polar. For the same reason, hydrophobic, or non-polar, sidechains usually end up on the inside of the structure, because they do not interact well with water. Cysteine, which is shown in green in the diagram, forms very strong covalent disulfide bonds with other cysteines.

Each residue in a chain is given a number, starting at the amino terminus (that is, the end that has an amino group still present) with the lowest number (which is not always 1, depending on the numbering conventions for the particular family of proteins) and going up to the carboxy terminus (the end that has a carboxyl group still present).

Secondary Structure
Secondary structure is the first level of folding in a protein. Patterns called "motifs", such as alpha helices and beta sheets (by far the two most common), are caused by hydrogen bonding between backbone groups of the residues (the amide N-H of one residue and the carbonyl C=O of another), rather than between sidechains. Alpha helices are slightly more common in proteins overall than beta sheets. These helices are tightly coiled single strands, kept in place by hydrogen bonds between nearby residues. They can be anywhere from only a few residues in length to over 100 Angstroms in some proteins. They tend to be the base of protein "stalks" (such as that of 2009-10's influenza hemagglutinin).

Beta sheets, on the other hand, are made up of many beta strands- kinked sequences of residues separated by loops. These strands line up alongside each other, either parallel or, very commonly, antiparallel, which means that adjacent strands point in opposite directions (direction matters, remember, because of the numbering of residues from the amino terminus to the carboxy terminus)- with multiple hydrogen bonds between adjacent strands. They are very strong as protective or support layers (such as the "beta-barrel" exterior of GFP).

Tertiary Structure
Tertiary structure is the position in three dimensions of the secondary structures (motifs). It is determined by the secondary structures present, as well as the properties of the sidechains. Hydrophilic sidechains such as glutamine will move to the "outside" when the protein is folded in a watery environment, while hydrophobic sidechains such as tryptophan will cluster "inside" the protein, protected by other sections of the protein, to prevent their exposure to water. Oppositely charged sidechains come together, forming salt bridges (ionic bonds), while sidechains with the same charge repel each other. Cysteine, which contains sulfur, bonds covalently with other cysteines to form strong disulfide bonds. The interaction of all these attractions and repulsions causes the protein to develop a unique shape in 3D, called a "conformation".

The protein's tertiary structure also depends on the environment in which it is folded: in the human body, which is a watery environment, the hydrophobic (nonpolar) sidechains end up on the inside, as stated above. However, in a protein folded in a hydrophobic environment (such as a protein embedded in a phospholipid cell membrane), the hydrophilic (polar) sidechains end up on the inside.

Quaternary Structure
Quaternary structure is the arrangement of each of the individual pieces (monomers) of a multi-unit (multimeric) protein. These subunits, or "chains" as they are often called, each have their own amino and carboxy terminus, and are not joined to each other through the peptide backbone. However, they are held together by bonds- which can be disulfide or ionic, although more commonly the latter- and arranged together in a specific conformation. Multimers are quite common, and may contain several distinct chains or simply several copies of the same one (or few).

Using Jmol
Jmol is the software used to create computer models of the proteins for the Protein Modeling event.

Downloading Jmol
Jmol is open-source and can be downloaded here or used online. This is the 2015 pre-build online Jmol environment; it offers the same features as the online Jmol environment you will be using in competition to model the on-site protein, so it's good to use it at least once before competition to become familiar with the interface. You can also use the 2015 practice environment, which presents a similar task as you will encounter for the onsite build.

However, it's definitely still worthwhile to download Jmol onto your computer. One of the main reasons for this is that the pre-build online environment only lets you work with that one file, so if you want to familiarize yourself with the on-site protein files before competition, you'll need the Jmol application on your computer.

Downloading Jmol can be a little tricky, though. The download page linked to above gives specific instructions, but to summarize:

Click the link at the top of the page, which will take you to a list of files to download. The file you want is the one called "Jmol-12.2.14-binary.zip", which should have a bunch of different icons next to it (Apple, Windows, Linux, etc).

Your computer will then unarchive the file (it should do this automatically when you try to open the file; if it doesn't, you're missing the unarchiving software and need to ask for help from someone who knows more about computers), giving you a folder full of a ton of different files, pretty much all with "Jmol" in the name. The only one you need is Jmol.jar.

[If you're using a Unix-based OS and are familiar with the command line, you can also download the .tar.gz file, use "tar -xzvf" to extract, and then take Jmol.jar from the resulting directory.]

Put this one in its own new folder, as close to your home directory as possible, and name the folder something that doesn't have any spaces in it (Jmol can be stupid about saving and retrieving files; spaces in the file path sometimes confuse it, so not only should you not put spaces in the folder name, you should put that folder in your home directory or in a folder in your home directory that has no spaces in the name, to make sure there are no spaces higher in the file path). You can delete everything else from the .zip file.

Save every PDB file you download and every script you create to this folder. Again, Jmol can be stupid; it sometimes has trouble opening files that aren't saved to the same folder that it's in.

Downloading PDB Files
If you are using Jmol on your own computer, rather than using the online pre-build environment, you will need the PDB file(s) on your computer as well (note: technically you only need the file for the pre-build; however, to do as well as possible in competition, it helps to be familiar with the on-site beforehand as well. This is one of the main reasons it's worthwhile to download Jmol, so you can look at any protein file you want).

To download a PDB file, go to its structure summary page (linked to in the Year-Specific Information section) and click the "Download Files" option in the upper-right corner. A drop-down menu will appear; you want the PDB File (Text).



Remember to save this file in the same folder as Jmol.jar!
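If you would rather script the download than click through the menu, something like the following sketch may work. The URL pattern is an assumption about RCSB's file download service and is not part of the original instructions; the function names and folder are hypothetical, and 1HTM is used only because it is the file named in the written test example.

```python
from pathlib import Path
from urllib.request import urlretrieve


def pdb_download_url(pdb_id):
    """Build the assumed RCSB download URL for a four-character PDB ID."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, folder):
    """Save the PDB text file into the given folder and return its path."""
    target = Path(folder) / f"{pdb_id.upper()}.pdb"
    urlretrieve(pdb_download_url(pdb_id), str(target))
    return target


print(pdb_download_url("1htm"))
```

Calling download_pdb("1htm", folder) with the folder that holds Jmol.jar would keep the PDB file next to the application, as recommended above.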

Entering Commands
There are two ways to enter commands in Jmol- through the command prompt or by selecting options from the drop-down menus. Although people who have never done any programming or used a command line setup before might be more comfortable with the menus, it is vastly easier to use the command prompt, and not that difficult once you get used to it. If you're using the online Jmol environment instead of downloading it yourself, you must use the command prompt (most of the drop-down menus aren't available), and since everyone has to use the online environment in competition, it is necessary to know how to use the command line. If you've downloaded Jmol onto your computer, to open the command prompt, go to the File menu and click "Output Console". Some useful commands are listed in the section below.

Quick Guide to Jmol Commands
These are the commands you're most likely to use frequently while modeling your protein in Jmol:


 * select (object): Selects the specified objects, making them the subject of future commands (e.g., "select *a" would select chain A of your protein, since * is the symbol for chain).
 * restrict (object): Selects the specified objects and makes everything else disappear (e.g., "restrict *a" would leave you with a screen showing only chain A, which would also be selected).
 * center (object): Centers the field of view on the specified objects. This is an incredibly useful command after a "restrict", because otherwise, every time you try to look at the other side of your protein fragment, it will swing out of your field of view.
 * undo: Undoes the last command.
 * redo: Redoes the last undone command.
 * write [scriptname].spt: Saves your current display state as a file called "[scriptname].spt" in the same folder where Jmol is. This can be useful for creating a file that shows all the sidechains you're modeling for your prebuild, for example. You can then load it by going to File -> Open and selecting the script.

Objects you can select include:


 * chains: As mentioned above, the symbol for a chain is *. It should work as *[letter] or * [letter] (i.e., with or without the space).
 * specific residues: You can specify residues using their residue numbers- individual or a range- or three-letter codes. For example, "select 41" will select residue 41, "select 41-60" will select all residues from 41 to 60 inclusive, and "select his" will select ALL histidines. If you want to select all histidines between residues 41 and 60 only, you can string together ranges and residue types using Boolean operators (see below).
 * DNA: DNA works the same way as proteins do in Jmol- in that you can select specific parts, like the backbone or nucleotides- but can also be selected as a whole with "select dna".
 * non-protein stuff: Everything that isn't protein (or DNA) is "hetero", as in "select hetero". This includes any ligands (such as ions or small molecules) associated with the protein, as well as water (which has its own designation as well- "hoh"). Because of the method by which protein structure data is obtained, all the surrounding water molecules are also captured. They can be quite distracting, so you may want to remove the water or just not show it at all ("select hetero and not hoh"- see below for more information about Boolean operators).
 * sidechains: When you view a given residue, you probably don't want to see the entire residue, including the amino and carboxy groups (or what's left of them); in most cases, you only want to see the sidechain sticking out from the alpha-carbon backbone. You can "select sidechain" (which will select all sidechains of all residues) or "select [residue] and sidechain" (see below for more information about Boolean operators) to select just the sidechain of a particular residue.
 * alpha helices or beta sheets: When folding your Mini-Toober model, it can be helpful to see where the helices and sheets are. You can select the alpha helices with "select helix" and the beta sheets with "select sheet".

Once you have something selected, you can modify the way it is displayed in various ways.


 * backbone [number]: Displays the selection in backbone mode- where only the alpha carbons of the amino acid residues are shown- with the number being the radius. This is the display mode you'll start with- the Mini-Toober represents this part of the protein. A radius of about 300 works well.
 * wireframe [number]: Displays the selection in wireframe mode- where the connections between atoms (not just alpha carbons) are shown as lines/cylinders- with the number being the radius of the cylinders. This display mode is the most common way of viewing sidechains; a radius of about 200 works well. To remove wireframe from the selection, use "wireframe off".
 * spacefill [number]: Displays all atoms in the selection as spheres, the number being the radius of the spheres. If you don't put any number, Jmol will show the full Van der Waals radius of the atoms. This is another way of viewing sidechains or, more commonly, ligands such as ions. For ions, typically no number is needed (the Van der Waals radius works just fine), but if you're using it for your sidechains, you most likely want something slightly larger than your wireframe (225-250). To remove spacefill from the selection, use "spacefill off".
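The commands above are usually typed in the same order each time, so it can help to generate them for a given section of the protein. The sketch below simply assembles command text using the syntax and radii described in this guide; the function name, chain, residue range, and residue choices are hypothetical, and the output has not been checked against any particular Jmol version.

```python
def prebuild_view_script(chain, first, last, sidechain_residues):
    """Build Jmol commands that isolate a section and show chosen sidechains."""
    region = f"*{chain} and {first}-{last}"
    lines = [
        f"restrict {region}",  # hide everything outside the section
        f"center {region}",    # keep the section in view while rotating
        "backbone 300",        # the part the Mini-Toober represents
    ]
    for residue in sidechain_residues:
        lines.append(f"select {region} and {residue} and sidechain")
        lines.append("wireframe 200")
    return "\n".join(lines)


# Hypothetical section: residues 41-60 of chain A, showing his and cys.
script = prebuild_view_script("a", 41, 60, ["his", "cys"])
print(script)
```

Each printed line can be entered at the command prompt one at a time, which makes it easy to see what every command changes.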

Coloring Modes

 * color cpk:Colors the selection according to the "CPK" color system, which has a specific color for each element. Carbon is gray, oxygen is red, nitrogen is blue, and sulfur is yellow (these are the most common ones you'll see). Note: if you have the entire protein in "backbone" mode, the entire protein will appear gray, because only the alpha-carbon backbone is being displayed.


 * color [R,G,B]:Colors the currently selected objects with the RGB values specified (you put numbers in place of R, G, and B to specify what color you want). There's also the easy way: instead of putting in [R,G,B], just put in a color name. Jmol only knows some names, though: for instance, it knows "yellow", but probably doesn't know "tangerine" (although it might, since it has a pretty extensive, and somewhat bizarre, color library).


 * color structure:Colors the alpha-helices (pinkish-red) and beta pleated sheets (yellow). DNA is purple under this coloring system, and everything that has no specific secondary structure is white. Depending on the version of Jmol you're using, 3-10 helices may be the same color as alpha-helices, or may be a more purplish color.


 * color group:Colors the protein to display the amino and carboxy termini. The amino terminus is colored blue, and the carboxy terminus is red. Everything in between is displayed in rainbow gradations, so the part shown in green is closer to the amino terminus, while the part shown in orange is closer to the carboxy terminus.


 * color chain:Assigns a different color to each chain of the protein. This is useful mainly for determining how many chains are present in a given PDB file, or where one chain ends and another begins if they are closely associated.

Boolean Operators
Boolean operators are connecting words like "and" and "or", which determine how Jmol combines the different groups you're telling it to act on.


 * and:If you use "and" to connect two things, Jmol will act only on things that fit both criteria. For example, "select his and sidechain" selects only the sidechains of all the histidine residues. You can also string together several "and"s, and Jmol will act only on things that fit all the criteria: "restrict *a and 1-40 and cys" selects only the cysteine residues among residues 1 to 40 of chain A.


 * or:If you use "or" to connect two things, Jmol will act on anything that fits either of your criteria. For example, "select cys or his" will select all cysteine and all histidine residues. Note: "or" is often what you want when intuitively you'd think "and": to select cysteines AND histidines, you type "select cys or his". "select cys and his" does nothing, because there are no residues that are both cysteine and histidine.


 * not:This tells Jmol to act on everything that doesn't fit a given criterion. For example, "select not dna" selects everything in the file that isn't DNA. It can be combined with other operators, as in "select hetero and not hoh", which selects all non-protein, non-DNA matter that also isn't water.


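 * xor:If you use "xor" to connect two things, Jmol will act on anything that fits exactly one of your criteria, but not both. For example, "select *a xor his" selects everything in chain A except its histidines, plus all histidines that are not in chain A.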
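
If the four operators are hard to keep straight, it can help to think of each selection as a set of residues. The sketch below is only an analogy written in Python, not Jmol itself: the six residues are made up for illustration, and the variable names are hypothetical. Python's set operators "&", "|", "-", and "^" play the roles of "and", "or", "not", and "xor".

```python
# Toy model: each residue is (chain, number, three-letter name).
# These residues are invented for illustration; they do not come from a PDB file.
residues = {
    ("A", 1, "MET"), ("A", 2, "CYS"), ("A", 3, "HIS"),
    ("B", 1, "CYS"), ("B", 2, "HIS"), ("B", 3, "GLY"),
}

# Basic selections, analogous to "cys", "his", and "*a".
cys = {r for r in residues if r[2] == "CYS"}
his = {r for r in residues if r[2] == "HIS"}
chain_a = {r for r in residues if r[0] == "A"}

# "and" keeps only what is in both sets; no residue is both CYS and HIS.
cys_and_his = cys & his

# "or" keeps what is in either set: all cysteines plus all histidines.
cys_or_his = cys | his

# "not" keeps everything outside the set: here, everything not in chain A.
not_chain_a = residues - chain_a

# "xor" keeps what is in exactly one of the two sets.
chain_a_xor_his = chain_a ^ his

for label, selection in [
    ("cys and his", cys_and_his),
    ("cys or his", cys_or_his),
    ("not *a", not_chain_a),
    ("*a xor his", chain_a_xor_his),
]:
    print(label, "->", sorted(selection))
```

The empty result for "cys and his" is the same trap described in the "or" entry: when you want both kinds of residue, you need "or".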

Combining Operators
Parentheses can be used to combine different Boolean operators in different ways. If all your operators are the same (e.g., "*a and 1-40 and his" or "his or cys or gly"), you don't need any parentheses; however, if you have a combination of "and"s and "or"s or want a "not" to apply to some combination of criteria, you need parentheses to tell Jmol which operators should be grouped together.

For example, "select (*a or *b) and his" will select all histidines in chain A and all histidines in chain B, whereas "select *a or (*b and his)" will select ALL of chain A and all histidines in chain B. If you leave out all parentheses, "and" takes precedence over "or", so Jmol will assume the latter case.

It can also be confusing to figure out what your "not" applies to. If there are no parentheses present, it modifies only the object immediately following it. For example, "select not *a or his" selects everything that isn't part of chain A, as well as all histidines, including those in chain A: one criterion is "not *a", the other is "his", and everything that fits at least one of them gets selected. To have your "not" apply to more than one criterion, use parentheses after the "not"; for example, "select not (*a or his)" will select everything that is neither in chain A nor a histidine. It has the same effect as "select not *a and not his".
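
The grouping rules above can be checked on a toy example. The following is a minimal sketch in Python rather than Jmol, assuming a made-up list of six residues and hypothetical helper functions (in_a, in_b, is_his, select). Python's own "not", "and", and "or" happen to follow the same order described here ("not" binds tightest, then "and", then "or"), so the unparenthesized expressions behave the way this section says Jmol's do.

```python
# Toy residues as (chain, number, name); invented for illustration only.
RESIDUES = [
    ("A", 1, "MET"), ("A", 2, "HIS"), ("A", 3, "GLY"),
    ("B", 1, "HIS"), ("B", 2, "CYS"), ("B", 3, "GLY"),
]

def in_a(r):
    return r[0] == "A"

def in_b(r):
    return r[0] == "B"

def is_his(r):
    return r[2] == "HIS"

def select(rule):
    """Return the set of residues for which rule(residue) is True."""
    return {r for r in RESIDUES if rule(r)}

# Like "(*a or *b) and his": only the histidines, from either chain.
grouped = select(lambda r: (in_a(r) or in_b(r)) and is_his(r))

# Like "*a or *b and his" with no parentheses: "and" is evaluated first,
# so this means "*a or (*b and his)": all of chain A plus chain B histidines.
ungrouped = select(lambda r: in_a(r) or in_b(r) and is_his(r))

# Like "not *a or his": the "not" applies only to the chain A test,
# so chain A histidines are still selected.
not_first_only = select(lambda r: not in_a(r) or is_his(r))

# Like "not (*a or his)" and "not *a and not his": these two always agree.
not_group = select(lambda r: not (in_a(r) or is_his(r)))
not_each = select(lambda r: not in_a(r) and not is_his(r))

print("grouped:", sorted(grouped))
print("ungrouped:", sorted(ungrouped))
print("not *a or his:", sorted(not_first_only))
print("same result:", not_group == not_each)
```

When in doubt, add the parentheses yourself; they cost nothing and make it obvious which criteria belong together.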

Folding the Mini-Toober Model
At competitions, for the on-site models, you will be given a kit consisting of Mini-Toobers, end caps, cross-linkers, and foam sidechains for the important residues they ask you to highlight. For the pre-build, it is possible to buy a similar kit (minus the foam sidechains, as you have to figure out on your own what to highlight). Alternatively, you can make your model out of any other similar material (12-gauge dimensional house wire is given as an example in the rules, but anything that is sufficiently flexible and will hold its shape will suffice).

During competition, and if you get the pre-build kit, you will see that the Toober, end caps, and any sidechains that are given are all required parts of your model – their placement and presence count toward your score.

The cross-linkers, however, are for stability purposes only, and do not have to represent any particular structure. You do not have to use any or all of them, although during competition, it makes sense to use all the ones you are given just to make your model as structurally sound as possible (the judges will be handing your on-site model back and forth, and you'd prefer that it not distort while they are grading it). For your pre-build, you can use them just for stability, the way you would in the on-site, or you can have them represent particular bonds (as long as you specify this on your card), or you can leave them out entirely and stabilize your model some other way.

Your Mini-Toober model will be based on your Jmol model, but unfortunately, the only way to convert the latter into the former is to use the Jmol image as a sort of "map" showing how to fold the toober.

Fortunately, CBM has an excellent step-by-step video tutorial of how to fold the Mini-Toober based on what you have in Jmol. This should answer any questions you have about creating your Mini-Toober model.

Useful Links

 * Mini-Toober folding tutorial (mentioned above)
 * Protein Data Bank
 * Zinc fingers Molecule of the Month article
 * General protein info (this link appears to be broken; the original site can no longer be found, and although many protein sites still link to it, none of them carry the same information)
 * Information on genetic coding and the mutation of XIAP