Protein Modeling/CRISPR-Cas9

The CRISPR-Cas9 and anti-CRISPR proteins are the topic of Protein Modeling for the 2018-2019 season. The CRISPR complex known as Cascade can be found under the Protein Data Bank (PDB) ID 4QYZ, and the Cas9 protein can be found under the ID 4OO8. The pre-build model is the anti-CRISPR protein AcrIIA4, which can be found under the ID 5VW1.
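As a convenience for looking these structures up, the short sketch below builds RCSB Protein Data Bank links for the three IDs named above. The dictionary and the function name `rcsb_urls` are hypothetical helpers made up for this page; the sketch only formats URLs and does not download anything.

```python
# PDB IDs named in this article, with the labels used on this page.
PDB_ENTRIES = {
    "4QYZ": "Cascade (CRISPR complex)",
    "4OO8": "Cas9",
    "5VW1": "AcrIIA4 (pre-build model)",
}


def rcsb_urls(pdb_id):
    """Return the RCSB entry page and PDB-format file link for a PDB ID."""
    pdb_id = pdb_id.upper()
    return {
        "page": f"https://www.rcsb.org/structure/{pdb_id}",
        "file": f"https://files.rcsb.org/download/{pdb_id}.pdb",
    }


if __name__ == "__main__":
    for pdb_id, label in PDB_ENTRIES.items():
        print(label, rcsb_urls(pdb_id)["page"])
```

The downloaded `.pdb` file can then be opened in a molecular viewer such as Jmol.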

Function
CRISPR stands for clustered regularly interspaced short palindromic repeats, and it refers to a family of DNA sequences found in prokaryotes that help defend the organism from bacteriophages. The Cas9 (CRISPR-associated protein 9) enzyme uses CRISPR sequences as a guide to cut specific DNA strands complementary to the CRISPR sequence. In prokaryotic organisms with CRISPR genes, the Cas9 enzymes cut viral DNA so the organism does not get infected by it. When paired with custom guide sequences, Cas9 enzymes can also be used to edit the genes of other organisms. RNA-guided Cas proteins can also be used to cut RNA strands. CRISPR sequences are found in approximately 50% of bacteria and 90% of archaea that have been sequenced.

Anti-CRISPR proteins can be used by phages to deactivate the CRISPR system, protecting the viruses from having their DNA destroyed. These proteins mimic DNA so that the Cas9 protein binds to them, which prevents Cas9 from cutting the real DNA. The anti-CRISPR proteins target specific spots on Cas9 so that it cannot function after the proteins bind to it. This also means that once a gene is edited with CRISPR-Cas9 technology, the process can be stopped so that no other parts of an organism's DNA are harmed by the system.

Protospacer adjacent motifs (PAMs) are also important for understanding how anti-CRISPR proteins work. Cas9 will not cut the DNA of an invading virus unless the target sequence is immediately followed by a PAM sequence. In genetic editing, this requirement can be used to target a specific mutation without affecting other alleles.

CRISPR-Cas9
When a bacterium survives an attack from a phage, the Cas1 and Cas2 proteins remove a 20-bp snippet of viral genetic material and add it to the bacterium's CRISPR array. Later, when the virus attacks again, the CRISPR array is transcribed and the resulting RNA strand pairs up with a tracrRNA through base pairing, forming the guide RNA (in laboratory systems the two RNAs are fused into a single-guide RNA, or sgRNA, the term used below). The sgRNA then forms a complex with the Cas9 protein. The Cas9 complex searches the viral DNA for a sequence complementary to the sgRNA, unwinding a section of DNA at a time. If the sgRNA can bind to the section and Cas9 detects a PAM sequence, then Cas9 makes a double-stranded cut. The PAM sequence is a 3-bp sequence (NGG, where N is any nucleotide) in the viral DNA, adjacent to the 20-bp target sequence. The Cas9 protein uses the PAM sequence to make sure that it is cutting the viral DNA and not the bacterium's own CRISPR array, which doesn't contain the PAM sequence. The structure of the Cas9-dsDNA-sgRNA complex can be found in PDB file 5F9R.
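The search described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the guide is written as a DNA string, only exact matches count (real Cas9 tolerates some mismatches), and the sequences and function names are invented for illustration. It reports a site only when the 20-base match is immediately followed by an NGG PAM, which is why the same sequence sitting inside a CRISPR array is left alone.

```python
# Simplified model of the Cas9 target search: an exact guide match that is
# immediately followed by an NGG PAM. All names here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_ngg_pam(seq, start):
    """True if the three bases starting at `start` read N-G-G."""
    pam = seq[start:start + 3]
    return len(pam) == 3 and pam[1:] == "GG"


def find_cut_sites(dna, spacer):
    """Return (strand, index) pairs where `spacer` is followed by a PAM.

    Both strands are scanned; the index is the start of the match when
    that strand is read 5' to 3'.
    """
    dna, spacer = dna.upper(), spacer.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        pos = seq.find(spacer)
        while pos != -1:
            if has_ngg_pam(seq, pos + len(spacer)):
                hits.append((strand, pos))
            pos = seq.find(spacer, pos + 1)
    return hits


if __name__ == "__main__":
    spacer = "GATTACAGATTACAGATTAC"                  # made-up 20-base guide
    viral_dna = "TTT" + spacer + "AGG" + "CCC"       # match followed by a PAM
    crispr_array = "GTTTTAGA" + spacer + "GTTTTAGA"  # match with no PAM
    print(find_cut_sites(viral_dna, spacer))
    print(find_cut_sites(crispr_array, spacer))
```

In this toy input the viral sequence yields one site and the array yields none, mirroring how the PAM check separates foreign DNA from the cell's own CRISPR array.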

Inhibition of Cas9 by AcrIIA4
Phages have in turn evolved a counter-defense against the CRISPR system. In a study published in Science Advances (DOI: 10.1126/sciadv.1701620), anti-CRISPR proteins were found to be highly acidic DNA mimics. In-depth studies by Yang & Patel and [http://cbm.msoe.edu/images/contentImages/scienceOlympiad/module/2.%20Dong.2017.nature22377.pdf Dong et al.] provided insight into the structure and function of the AcrIIA4 protein. Cas9 must be in complex with an sgRNA in order for AcrIIA4 to bind to it. When AcrIIA4 binds to Cas9, the viral DNA cannot bind to the complex, which allows the virus to survive. The main ways that AcrIIA4 inhibits the function of Cas9 are:
 * Blocking the CTD and Topo domains to prevent PAM recognition
 * Blocking the RuvC domain to prevent cleavage of the non-complementary strand

Cas9
The Cas9 protein has a bi-lobed structure, consisting of a REC (recognition) lobe and a NUC (nuclease) lobe. The lobes are further divided into domains. The REC lobe consists of 3 Helical domains and a Bridge Helix. The NUC lobe consists of a RuvC domain split into 3 parts, an HNH domain, a Topo domain, and a CTD domain. The RuvC domain includes an active site which cleaves the non-complementary strand, and the HNH domain cleaves the complementary strand. The Topo and CTD domains serve to identify the PAM sequence, as well as to bind to the non-complementary strand. The Helical domains and Bridge Helix bind to the complementary DNA strand.
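For study purposes, the domain organization above can be written down as a small lookup table. The sketch below is only one way to arrange it; the dictionary layout and the function name `domains_with_role` are illustrative choices, and the role descriptions simply restate the paragraph above.

```python
# Cas9 lobes and domains as described on this page. The structure of this
# dictionary is an illustrative choice, not a standard data format.
CAS9_DOMAINS = {
    "REC": {
        "Helical domains (3)": "bind the complementary DNA strand",
        "Bridge Helix": "binds the complementary DNA strand",
    },
    "NUC": {
        "RuvC": "cleaves the non-complementary strand",
        "HNH": "cleaves the complementary strand",
        "Topo": "identifies the PAM; binds the non-complementary strand",
        "CTD": "identifies the PAM; binds the non-complementary strand",
    },
}


def domains_with_role(keyword):
    """List (lobe, domain) pairs whose described role mentions `keyword`."""
    keyword = keyword.lower()
    return [
        (lobe, name)
        for lobe, domains in CAS9_DOMAINS.items()
        for name, role in domains.items()
        if keyword in role.lower()
    ]


if __name__ == "__main__":
    print(domains_with_role("PAM"))
    print(domains_with_role("cleaves"))
```

Searching for "PAM" returns the Topo and CTD domains, and searching for "cleaves" returns RuvC and HNH, the two nuclease active sites.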

AcrIIA4
AcrIIA4 is a relatively small protein, with 87 amino acids. Listed from the N-terminus to the C-terminus, it consists of an alpha helix, a 3-stranded beta sheet, and 2 more alpha helices. The beta hairpins play a crucial role in the inhibition of Cas9 by occupying various active sites of the Cas9 protein.

Resources

 * Cas9 on Wikipedia
 * The official Center for BioMolecular Modeling page on the Protein Modeling event
 * Protein Modeling SciOly.org Wiki