Difference between revisions of "Cybersecurity"

From Wiki - Scioly.org
Jump to navigation Jump to search
m (→‎Block Ciphers: Updated synopsis and fixed general grammar)
 
(57 intermediate revisions by 10 users not shown)
Line 1: Line 1:
 
{{Incomplete}}
 
{{Incomplete}}
{{EventLinksBox
+
{{Infobox event
|active=
+
|active=y
 
|type=Inquiry
 
|type=Inquiry
|cat=Lab
+
|cat=Hybrid
}}
+
|description =Competitors will be assessed on their knowledge of cybersecurity through hands-on tasks as well as theoretical questions focused in the areas of cryptography and web architecture.
 +
|2022thread  = [https://scioly.org/forums/viewtopic.php?f=385&t=23474 2022 (Trial)]
 +
|2023thread = [https://scioly.org/forums/viewtopic.php?t=26047 2023 (Trial)]
 +
|2023questions = [https://scioly.org/forums/viewtopic.php?t=26066 2023 (Trial)]
 +
|eventtime=50 minutes|participants=2|resources=* Two notes sheets
 +
* tools
 +
* supplies
 +
* writing utensils
 +
* online IDE
 +
* official documentation for programming language of choice
 +
* mouse|1stCName=Ward Melville High School|2ndCName=William G. Enloe High School|3rdCName=Centennial High School}}
  
'''Cybersecurity''' is a [[Division B]] and [[Division C]] event that was first run as a trial event at the 2021 [[BEARSO]] Invitational to replace [[Ping Pong Parachute]]. The event consists of two parts: a written test on Cryptography and Web Architecture, and a hands on task on Cryptography and Programming.
+
'''Cybersecurity''' is a [[Division B]] and [[Division C]] event that was first run as a trial event at the 2021 [[BEARSO]] Invitational to replace [[Ping Pong Parachute]]. The event consists of two parts: a written test on Cryptography, Web Architecture, and Principles of Cybersecurity and a hands-on task on Cryptography and Programming. The event was also run at the [https://scilympiad.com/sopractice November Scilympiad Practice], [[Yosemite Invitational]], and [[Science Olympiad at Penn State Invitational]].
  
=Cryptography=
+
The event was also run as a Division C trial event at the [[California Institute of Technology 2022|2022 National Tournament]].
  
==Hash algorithms==
+
= Cryptography =
A hash algorithm is a one-way function that maps data, such as a string or a file, to a hash, or a "digest" - a string of data that is much shorter in length. Hash functions are always deterministic. If two equal inputs are hashed two separate times, the digest will always be the same. A hash can be used as a checksum to validate that a file has not been altered, since if a single bit of information was changed, the checksum would change. Hash functions are also designed to decrease the risk of hash collisions. Since the hashed digest of an input reduces its size significantly, hash collisions can occur when two inputs map to the same output. Hash functions are used in digital signatures, signing and authentication algorithms, and passwords.  
+
 
 +
== Hash algorithms ==
 +
A hash algorithm is a one-way function that maps data, such as a string or a file, to a hash, or a "digest" - a string of data that is much shorter in length. Hash functions are always deterministic. If two equal inputs are hashed two separate times, the digest will always be the same. A hash can be used as a checksum to validate that a file has not been altered, since if a single bit of information was changed, the checksum would change. Hash functions are also designed to decrease the risk of hash collisions. Since the hashed digest of an input reduces its size significantly, hash collisions can occur when two inputs map to the same output. Hash functions are used in digital signatures, signing and authentication algorithms, and passwords.
  
 
A good hash algorithm has the following characteristics:
 
A good hash algorithm has the following characteristics:
*It is hard to find collisions.
+
* It is hard to find collisions.
*It is irreversible.
+
* It is irreversible.
*It has to be deterministic.
+
* It has to be deterministic.
  
 
Passwords are one of the most important applications of hashing algorithms. When a password is inputted, a hash of the password is calculated, and compared to the hashed value of your original password. Thus, no plaintext passwords should be saved server-side, which would reduce the damage in the event of a data breach.
 
Passwords are one of the most important applications of hashing algorithms. When a password is inputted, a hash of the password is calculated, and compared to the hashed value of your original password. Thus, no plaintext passwords should be saved server-side, which would reduce the damage in the event of a data breach.
  
===Merkle-Damgård Hashes===
+
=== MD5 ===
 +
The MD5 hashing algorithm produces 16 byte digests (128 bits) with block sizes of 64 bytes (512 bits). It was first created in 1991 by Ronald Rivest. Multiple vulnerabilities have been exposed with the MD5 hashing algorithm, and collisions can be calculated in less than a second on a typical computer. Thus, MD5 is considered extremely cryptographically insecure; however, many programs and applications continue to use it.
 +
 
 +
=== SHA1 ===
 +
SHA1, which stands for Secure Hash Algorithm 1, was created in 1995. The algorithm produces 20 byte digests (160 bits) of data using block sizes of 512 bits. 80 rounds of hashing is done under this hashing algorithm. Hash collisions can be calculated with 2^60.3 to 2^65.3 operations, and collisions have been calculated. In addition, SHA1 is built using the Merkle-Damgård construction, so it is also prone to length extension attacks.
 +
 
 +
=== SHA2 ===
 +
SHA2 is a set of six hash functions, namely SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. Depending on the name of the function, each hash algorithm produces a digest size with a different amount of bits. Either 64 or 80 rounds of hashing are performed on the plaintext. Similarly to SHA1, SHA2 hashes are vulnerable to hash length extension attacks. Currently, there are no known full collisions of SHA2 hashes.
 +
 
 +
=== Hash Length Extension Attack ===
 +
Algorithms that are based on the Merkle-Damgård construction are vulnerable to hash length extension attacks, including MD5, SHA1, and all SHA2 hashes. Because the hash digest is essentially a snapshot of the internal state of the hash, the internal state can be recreated with the hash. In addition, because the hashing algorithm is performed on blocks of data, one can compute HASH(UNKNOWN + CUSTOM MESSAGE) only knowing the length and hash digest of HASH(UNKNOWN) by continuing the hashing algorithm from the known internal state. This attack is significant because it allows attackers to forge requests. One patch of this attack is to hash the unknown secret twice: for instance, HASH(HASH(UNKNOWN)) could be used.
 +
 
 +
=== Hash Collisions ===
 +
A hash collision has occurred when two plaintexts, hashed with the same algorithm, has produced the same digest. All hashing algorithms have an infinite amount of collisions because the plaintext can be infinitely long, yet the length of the digest is finite. However, because of the length of the hash, it can be extremely hard to produce a collision.
 +
 
 +
== The XOR Operation ==
 +
The XOR operator is a binary operator that takes two bits of data and outputs one bit of data. XOR sounds for exclusive or; it returns true if the two inputs are different and false if the inputs are the same. In Java and Python, the XOR operator is denoted with the <code>^</code> symbol (not to be confused with exponentiation, which is typically represented with the <code>**</code> operator). In contrast, SageMath uses the <code>^</code> symbol to represent exponentiation, while <code>^^</code> is the XOR operator.
 +
 
 +
This is an XOR table which represents the inputs as well as the outputs:
 +
{| class="wikitable"
 +
|-
 +
! Input 1
 +
! Input 2
 +
! Output
 +
|-
 +
| True
 +
| True
 +
| False
 +
|-
 +
| True
 +
| False
 +
| True
 +
|-
 +
| False
 +
| True
 +
| True
 +
|-
 +
| False
 +
| False
 +
| False
 +
|}
 +
 
 +
In addition, the XOR operator is commutative (<code>A^B==B^A</code>), associative (<code>(A^B)^C==A^(B^C)</code>), and is its own inverse (<code>A^A==0</code>, <code>B^A^A=B</code>), meaning that it is symmetric and reversible. The latter is especially important, as this is a very simple way to reveal a plaintext or calculate a ciphertext using the same encryption/decryption function.
 +
 
 +
In cryptography, the XOR cipher is used extremely frequently in stream ciphers and block ciphers, as well as many others. Not only is the XOR operation computationally inexpensive, but it is also theoretically impossible to break as long as the key is secure.
 +
 
 +
Ciphers that rely purely on the XOR operation, such as stream ciphers such as OTP (one time pad), are susceptible to a known plaintext attack. If a plaintext <code>P</code> as well as its corresponding ciphertext <code>C</code> is known, then the keystream <code>K</code> can be trivially calculated using the operation <code>K=C^P</code>. Thus, it is important for a keystream to be unique and only be used once.
 +
 
 +
Crib dragging is a method of attacking XOR ciphers with a repeating key. One can use frequency analysis, as well as a brute-force, to calculate the plaintext and "drag" the key along the ciphertext in order to reveal the plaintext message.
 +
 
 +
== Bases ==
 +
Different number bases, most often powers of 2, are used extensively in computer science because of their compatibility with computers.
 +
 
 +
To express a string or a file as a long integer, each character in the string is looked up in an ASCII table and strung together as a string of bytes. For instance, the word "SciOly.org" would be expressed as "0x53 0x63 0x69 0x4f 0x6c 0x79 0x2e 0x6f 0x72 0x67" in hex (base 16), "01010011 01100011 01101001 01001111 01101100 01111001 00101110 01101111 01110010 01100111" in binary (base 2), and "U2NpT2x5Lm9yZw==" in base 64.
 +
 
 +
=== Binary ===
 +
A binary digit is simply a 0 or 1. Representing an integer, string, or file as a string of many binary digits is useful because it is one of the easiest ways to store information (only two different "modes" are necessary, such as high voltage and low voltage) and the logic gates for binary values are simple as well.
 +
 
 +
=== Hexadecimal ===
 +
The hexadecimal number system is often used in place of binary, not only because it is easier for humans to read but also because computers can also easily compute the binary representation of hexadecimal. In this number system, the numbers 0-15 would be expressed as 0123456789ABCDEF. Hexadecimal numbers are typically prefixed with a "0x" to be easily distinguishable. Most programming languages that accept hexadecimal numbers will expect this prefix.
 +
 
 +
=== Base 64 ===
 +
Base 64 is one of the most compact ways of expressing data as a string, since it stores more data per character. The 64 characters used are as follows:
 +
* 0-25: A-Z
 +
* 26-51: a-z
 +
* 52-61: 0-9
 +
* 62: +
 +
* 63: /
 +
 
 +
In base 64, every 4 bytes of encoded data can be decrypted to 3 bytes of unencoded data. Thus, when the unencoded data is not a multiple of 3, padding must be added in order to make the encoded data a multiple of 4. The "=" is used as a padding.
 +
 
 +
== Classical Cryptography ==
 +
Classical cryptography includes cryptosystems that are no longer in use, as extensive attacks have been developed and can be trivially implemented to reveal the message. Most classical cryptosystems encrypt plaintext, such as messages in English, rather than files. For example, all of the cipher types in [[Codebusters]] (excluding the [[Codebusters#RSA_Cipher|RSA cipher]]) are classical cryptosystems.
 +
 
 +
=== Substitution Ciphers ===
 +
In a substitution cipher, each plaintext character is replaced by a ciphertext character. In many methods of substitution, such as those involving the Morse or Baconian alphabets, the plaintext or ciphertext are actually groups of multiple characters. Either way, the plaintext is enciphered by replacing each plaintext unit with a ciphertext unit, and the ciphertext is deciphered by replacing each ciphertext unit with a plaintext unit.
 +
 
 +
The method of determining how to map plaintext and ciphertext units varies based on the cryptosystem. Some cryptosystems involve algorithms or mathematical formulas that can be used to determine the corresponding ciphertext given a plaintext unit (and vice versa), such as the Hill cipher. Other cryptosystems are simply random mappings from one alphabet to another, such as monoalphabetic substitution ciphers (e.g. Aristocrats).
 +
 
 +
=== Transposition Ciphers ===
 +
Whereas substitution ciphers replace plaintext/ciphertext units with ciphertext/plaintext units, transposition ciphers retain the same units but rearrange them using some pattern. An example of a transposition cipher would be the Railfence cipher.
 +
 
 +
== Attacks on Cryptosystems ==
 +
There are four main formal attacks that can be performed on cryptosystems: ciphertext-only attacks, known-plaintext attack (KPA), chosen-plaintext attack (CPA), and chosen-ciphertext attack (CCA).
 +
 
 +
=== Ciphertext-only attack ===
 +
A ciphertext-only attack is an attack model where an adversary only has access to a set of ciphertexts. The attack is considered to be successful if the adversary can deduce the plaintexts with the corresponding ciphertext and/or the key used to encrypt said ciphertext.
 +
 
 +
=== Known-plaintext attack ===
 +
A known-plaintext attack (KPA) is an attack model where an adversary has access to a plaintext (typically called a crib) and ciphertext pairs. The goal of the attack is to find the key used to encrypt the plaintexts.
  
===MD5===
+
=== Chosen-plaintext attack ===
 +
A chosen-ciphertext attack (CPA) is an attack model where an adversary can choose any plaintext and get its corresponding ciphertext. The goal of the attack is reveal all or part of the encryption key.
  
===SHA1===
+
Modern ciphers typically aim to provide semantic security, better known as indistinguishability under chosen-plaintext attack (IND-CPA), which reduces to being safe from CPA.
  
===SHA2===
+
==== IND-CPA ====
 +
Given the following:
 +
* An adversary, Ava, can generate <math>n</math> plaintexts.
 +
* The challenger, the encryption oracle, will then encrypt the plaintexts and return the plaintext/ciphertext pairs back to Ava.
 +
Consider the following game:
 +
* Ava sends two plaintexts <math>m_0</math> and <math>m_1</math> to the challenger.
 +
* The challenger picks a random bit <math>b \leftarrow \{0,1\}</math> and encrypts and returns back <math>E_k(m_b)</math>.
 +
* Ava guesses which plaintext was encrypted and outputs her guess <math>b'</math>.
  
===Whirlpool===
+
A cipher is considered to hold under IND-CPA if Ava can't guess <math>b</math> with a probability non-negligibility better than 1/2.
  
===Hash Length Extension Attack===
+
Formally speaking, the cipher holds under IND-CPA if:
  
===Hash Collisions===
+
<blockquote>
 +
<math>\text{P}[b'=b] \leq \frac{1}{2} + \epsilon(\lambda)</math>
 +
</blockquote>
  
==The XOR Operation==
+
Where <math>\epsilon</math> is an asymptotically negligible function and <math>\lambda</math> is the security parameter of the cipher.
  
==Bases==
+
=== Chosen-ciphertext attack ===
 +
A chosen-ciphertext attack (CCA) is an attack model where the adversary can choose any arbitrary ciphertext and receive the associated plaintext. The goal is to find the encryption key.
  
===Hexadecimal===
+
This attack, formally speaking, is the strongest form of attack as it gives the adversary the most control and information compared to the three other attack models.
  
===Base 64===
+
==== IND-CCA ====
 +
Indistinguishability under chosen-ciphertext attack (IND-CCA) is similar to the game presented for IND-CPA but gives the adversary more information:
 +
* The challenger picks a random bit <math>b \leftarrow \{0,1\}</math> and encrypts and returns back <math>E_k(m_b)</math>.
 +
* The challenger is both an encryption and decryption oracle. An adversary, Ava, has the ability to send messages to be either encrypted or decrypted.
 +
** For encryption queries, Ava sends over a pair of messages <math>(m_{i0}, m_{i1})</math>, and the challenger encrypts and sends back <math>E_k(m_{ib})</math>.
 +
** For decryption queries, Ava sends over a ciphertext <math>c_j</math> to be decrypted where <math>c_j</math> is not among any of the ciphertexts that the challenger encrypted in previous encryption queries.
 +
Now, consider the following game (same as mentioned for IND-CPA):
 +
* Ava sends two plaintexts <math>m_0</math> and <math>m_1</math> to the challenger.
 +
* Ava guesses which plaintext was encrypted and outputs her guess <math>b'</math>.
  
==Classical Cryptography==
+
Observe that Ava cannot use a decryption query on the challenge ciphertext to find the underlying plaintext, as that would make the attack trivial.
  
===Substitution Ciphers===
+
A cipher is considered to hold under IND-CCA if Ava can't guess <math>b</math> with a probability non-negligibility better than 1/2.
  
===Transposition Ciphers===
+
Formally speaking, the cipher holds under IND-CCA if:
  
===Frequency Analysis and Kaisiski Attack===
+
<blockquote>
 +
<math>\text{P}[b'=b] \leq \frac{1}{2} + \epsilon(\lambda)</math>
 +
</blockquote>
  
===Attacks on Classical Cryptosystems===
+
Where <math>\epsilon</math> is an asymptotically negligible function and <math>\lambda</math> is the security parameter of the cipher.
  
*Chosen Plaintext Attacks
+
== RSA ==
*Chosen Ciphertext Attacks
+
The RSA cryptosystem, like many other modern cryptosystems, is reliant on the computational difficulty of factoring prime numbers, though it isn't impossible.
*Known Plaintext Attacks
 
  
==RSA==
+
== Diffie-Hellman Key Exchange ==
  
===Encoding Plaintext and Decoding Ciphertext===
+
The Diffie-Hellman Key Exchange is a protocol to securely generate a secret key for both end-to-end parties to use.
  
===RSA Signatures===
+
=== Protocol ===
  
===Certificates===
+
Suppose Alice wants to send a message Bob. They would first have to agree on a symmetrical key to use first.
  
===Padding Schemes===
+
# Alice and Bob first agree on a prime <math>p</math> and a value <math>g \in (\Z_p)^*</math> where <math>(\Z_p)^* = \{1, 2, 3, ..., p - 1\}</math>.
 +
# Alice then generates a private value <math>a \in \{2, 3, ..., p - 2\}</math>, and Bob generates a private value <math>b \in \{2, 3, ..., p - 2\}</math>.*
 +
# Alice then sends her shared key <math>g^a\text{ mod }p</math> over to Bob, and Bob sends his shared key <math>g^b\text{ mod }p</math> over to Alice.
 +
# Alice calculates <math>(g^b)^a \text{ mod } p \equiv g^{ab}</math>, and Bob calculates <math>(g^a)^b \text{ mod } p \equiv g^{ab}</math>. <math>g^{ab} \text{ mod } p</math> is then used as their private key.
  
===Integer Factorization Problem===
+
<small>*In theory, <math>a, b \in \Z_p</math> will work fine, but in practice <math>\Z_p / \{1, p-1\} = \{2, 3, ..., p - 2\}</math> is used instead since <math>g^1 = g</math> and <math>g^{p-1}\text{ mod }p\equiv 1</math> for finite cyclic groups.</small>
  
===Small e Attack===
+
=== Theory ===
 +
==== The Discrete Log Problem ====
  
===Wiener's Attack===
+
The discrete logarithm problem (DLP) states:
  
===Coppersmith Attack===
+
<blockquote>
 +
Given generator <math>g \in (\Z_p)^*</math> such that the group generated by <math>g</math> (denoted by <math>G</math>) is a finite cyclic group where <math>p</math> is a large prime and <math>g^x</math> where <math>x \in (\Z_p)^*</math>, it is hard to find <math>x</math>.
 +
</blockquote>
  
===Hastad's Broadcast Attack===
+
In other words, given <math>g</math> and <math>y \in G</math>, it is hard to find <math>x</math> such that <math>g^x = y</math>.
  
===Partial Key Exposure Attack===
+
==== The Diffie-Hellman Problem ====
  
==Diffie-Hellman Key Exchange==
+
In order for the Diffie-Hellman protocol to be provably secure, we need to prove that an eavesdropper, Eve, cannot figure out the final key Alice and Bob are using (e.g. <math>g^{ab}</math>).
  
==Block Ciphers==
+
The Diffie-Hellman Problem (DHP) states that if given <math>g</math>, <math>p</math>, <math>g^a</math>, and <math>g^b</math>, it is hard to compute <math>g^{ab}</math>.
  
==Stream Ciphers==
+
If we suppose that Eve could see the messages between Alice and Bob, she would observe <math>g</math>, <math>p</math>, <math>g^a</math>, and <math>g^b</math>.
  
==Elliptic Curve Cryptography==
+
She could theoretically calculate <math>a</math> by performing discrete log on Alice's shared key (<math>g^a</math>) then find <math>g^{ab}</math> using Bob's shared key (<math>g^b</math>):
  
==Post Quantum Cryptography==
+
<blockquote>
 +
<math>a \equiv \text{log}_g(g^a)\text{ mod }p</math>
 +
 
 +
<math>(g^b)^a \equiv g^{ab}\text{ mod }p</math>
 +
</blockquote>
 +
 
 +
However, the discrete log step is considered to be computationally hard and non-trivial to solve. There currently is no better way for finding <math>g^{ab}</math> without the use of the DLP. Therefore, the DHP is considered today to be "hard." This is also known as the Computational Diffie-Hellman (CDH) assumption.
 +
 
 +
There is also the Decisional Diffie-Hellman (DDH) assumption, which states:
 +
 
 +
<blockquote>
 +
<math>(g^a, g^b, g^{ab}) \approx_c (g^a, g^b, g^c)</math> where <math>a, b, c \in (\Z_p)^*</math>
 +
</blockquote>
 +
 
 +
In other words, <math>g^{ab}</math> appears to be random and is computationally indistinguishable (<math>\approx_c</math>) from <math>g^c</math>. DDH is a stronger assumption than CDH. If CDH does not hold, it is trivial that DDH also does not hold (<math>\text{DDH holds} \rightarrow \text{CDH holds}</math>), but if DDH does not hold, it doesn't imply anything whether CDH holds or not (<math>\text{CDH holds} \nrightarrow \text{DDH holds}</math>).
 +
 
 +
=== Theoretical Attacks ===
 +
 
 +
The protocol is secure against passive adversaries, as shown as proved above. However, if an active adversary, Ava, were to perform a man-in-the-middle attack, she could forge both Alice's and Bob's shared key with her own <math>g^c</math>. Alice then calculates <math>(g^c)^a = g^{ac}</math>, Bob calculates <math>(g^c)^b = g^{bc}</math>, and Ava calculates both <math>(g^a)^c = g^{ac}</math> and <math>(g^b)^c = g^{bc}</math> allowing her to see all future messages on the channel.
 +
 
 +
There are ways to prevent an active adversary from tampering with messages during the protocol, an example of which occurs during the TLS handshake.
 +
 
 +
== Block Ciphers ==
 +
 
 +
Block ciphers are cryptosystems that operate on fixed-width blocks. It is now one of the key fundamentals for how encryption works on large pieces of data today.
 +
 
 +
There are multiple modes of block ciphers. The four most notable are electronic codebook (ECB), counter (CTR), Galois/Counter (GCM), and cipher block chaining (CBC) mode.
 +
 
 +
=== Electronic Codebook (ECB) ===
 +
 
 +
ECB mode is the most basic form of block cipher, simply taking each block and encrypting it independently of each other. While this does have parallelization benefits when encrypting and decrypting a large plaintext, ECB is not recommended for use in cryptosystems due to the lack of diffusion in the combined ciphertext.
 +
 
 +
<blockquote>
 +
<math>C_0 = E_K(P_0)</math>
 +
 
 +
<math>C_1 = E_K(P_1)</math>
 +
 
 +
<math>C_2 = E_K(P_2)</math>
 +
 
 +
<math>...</math>
 +
</blockquote>
 +
 
 +
<!-- TODO: Change to use `Template:multiple image` if that ever gets fixed and stabilized -->
 +
<div style="display: flex; justify-content: center; gap: 0.25rem">
 +
[[File:Tux.svg|thumb|196px|none|Original image]]
 +
[[File:Tux ECB.png|thumb|196px|none|Using ECB allows patterns to be easily discerned]]
 +
[[File:Tux secure.png|thumb|196px|none|Modes other than ECB result in pseudo-randomness]]
 +
</div>
 +
 
 +
=== Counter (CTR) ===
 +
 
 +
Counter mode is a step up from ECB mode, introducing an additional random input called an initialization vector (IV). Instead of directly encrypting the plaintext, nonce is encrypted, which is initially seeded with the IV and then combined with the plaintext, typically via XOR. The nonce is then incremented and used for the next block. In other words, block 0 uses IV, block 1 uses IV + 1, block 2 uses IV + 2, etc. The IV is then also sent alongside the ciphertext.
 +
 
 +
<blockquote>
 +
<math>C_0 = E_K(IV) \oplus P_0</math>
 +
 
 +
<math>C_1 = E_K(IV + 1) \oplus P_1</math>
 +
 
 +
<math>C_2 = E_K(IV + 2) \oplus P_2</math>
 +
 
 +
<math>...</math>
 +
</blockquote>
 +
 
 +
CTR is parallelizable in encryption and decryption, and unlike in ECB, the ciphertext will not leak any information about the underlying plaintext. However, an adversary can modify the underlying plaintext by trivially modifying the ciphertext.
 +
 
 +
==== Galois/Counter Mode (GCM) ====
 +
However, the underlying ciphertext modifications can be patched by adding an integrity check, like a signature. This is what GCM implements: an authenticated encryption scheme and a derivation of CTR mode. It achieves a level of authenticity by calculating a tag that can only be calculated by the individuals who have access to the public key and ciphertext (the IV is assumed to be public).
 +
 
 +
The tag is typically calculated by doing multiplicative operations using Galoid fields on value <math>H = E_K(0)</math> and each ciphertext block (<math>C_i</math>) and the length of the ciphertext (<math>|C|</math>). All those values are then summed together (XORed) and added (XORed) to <math>E_K(IV)</math>. The resulting calculation for the tag is:
 +
 
 +
<blockquote>
 +
<math>T=C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H+E_K(IV)</math> where <math>n</math> is the number of blocks
 +
</blockquote>
 +
 
 +
This tag has proved to be feasibly untamperable by an adversary, given they only know <math>C</math>, <math>IV</math>, and the tag. However, if the adversary can find what <math>H</math> is, then they can change the plaintext and modify the tag trivially. This can happen if the same <math>IV</math> were to be used:
 +
 
 +
<blockquote>
 +
<math>T_1 = C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H+E_K(IV)</math>
 +
 
 +
<math>T_2 = C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H+E_K(IV)</math>
 +
 
 +
<math>T_1 \oplus T_2 = (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H+E_K(IV))+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H+E_K(IV))</math>
 +
 
 +
<math>= (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H \oplus E_K(IV))+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H \oplus E_K(IV))</math> since addition in Galoid fields is XOR
 +
 
 +
<math>= (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H)+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H) \oplus E_K(IV)\oplus E_K(IV)</math> since XOR is associative
 +
 
 +
<math>= (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H)+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H)</math>
 +
 
 +
<math>= (C_0+C'_0)H^{N+1}+(C_1+C'_1)H^n+(C_2+C'_2)H^{n-1}+...+(C_n+C'_n)H^2+(|C|+|C'|)H</math>
 +
 
 +
<math>H</math> can now be solved by trivially calculating the root of the above polynomial.
 +
</blockquote>
 +
 
 +
Therefore, in order to prevent such an attack, <math>IV</math> must not be reused.
 +
 
 +
=== Cipher Block Chaining (CBC) ===
 +
 
 +
CBC mode uses previously encrypted blocks to "chain" or entangle future blocks. This can be done simply by XORing the ciphertext of the previous block with the current block's plaintext. A random IV is typically used instead of the first block in plaintext.
 +
 
 +
<blockquote>
 +
<math>C_0 = E_K(P_0 \oplus IV)</math>
 +
 
 +
<math>C_1 = E_K(P_1 \oplus C_0)</math>
 +
 
 +
<math>C_2 = E_K(P_2 \oplus C_1)</math>
 +
 
 +
<math>...</math>
 +
</blockquote>
 +
 
 +
Due to the dependencies of needing the previous ciphertext block, encryption is not parallelizable, but decryption is.
 +
 
 +
In early versions of TLS (<= TLS 1.0/SSL 3.0), the IV used was implicit (as future IVs were the previous ciphertext block). This makes the basic version of CBC vulnerable to CPA, as demonstrated in the BEAST attack. In TLS 1.1, this was fixed by first encrypting the initial IV and using the encrypted mask in the first block instead of the raw IV.
 +
 
 +
== Stream Ciphers ==
 +
Stream ciphers are symmetric key cryptosystems that use a generator to generate pseudorandom values to encrypt a message. They are heavily inspired by the One-Time Pad cipher. While they do not provide perfect secrecy because the key size is typically much smaller than the plaintext message, secure steam ciphers are all semantically secure (i.e., two messages encrypted with the same key are indistinguishable from each other).
 +
 
 +
An example of a stream cipher using a pseudorandom number generator (PRNG) <math>G</math>:
 +
<blockquote>
 +
<math>E_k(m) = m \oplus G(k)</math>
 +
 
 +
<math>D_k(c) = c \oplus G(k)</math>
 +
</blockquote>
 +
 
 +
In the context of cryptography, a PRNG <math>G</math> is considered secure if and only if:
 +
<blockquote>
 +
<math>\{k \leftarrow \{0,1\}^\lambda:G(k)\} \approx_c \text{uniform}\{\{0,1\}^{\lambda n}\}</math>
 +
where <math>n</math> is the stretch factor of <math>G</math> (i.e. <math>n = |G(k)|/|k|</math>)
 +
</blockquote>
 +
In other words, <math>G</math> must be computationally indistinguishable from a true uniformly random string; the difference in probabilities between an adversary being able to guess between a PRNG and random bit sequence is a negligible inverse non-polynomial probability.
 +
 
 +
Given that <math>G</math> is a secure cipher, it can be proved that the above steam cipher example is semantically secure.
 +
 
 +
== Elliptic Curve Cryptography ==
 +
 
 +
Elliptic Curve Cryptography proposes a new way of generating finite cyclical groups. It serves as an alternative to using the <math>(\Z_p)^*</math> domain for public key encryptions, key exchanges, and signatures, as RSA and Diffie-Hellman tend to require substantially large security parameters compared to, say, AES.
 +
{| class="wikitable"
 +
|+ Bit lengths of public-key algorithms with different security levels
 +
|-
 +
! rowspan="2" | Algorithm family
 +
! rowspan="2" | Cryptosystem
 +
! colspan="4" | Security level (bit)
 +
|-
 +
! 80
 +
! 128
 +
! 192
 +
! 256
 +
|-
 +
| Integer factorization || RSA || 1024 || 3072 || 7680 || 15360
 +
|-
 +
| Discrete logarithm || DH, DSA, El Gamal || 1024 || 3072 || 7680 || 15360
 +
|-
 +
| Elliptic curves || ECDH, ECDSA || 160 || 256 || 384 || 512
 +
|-
 +
| Symmetric-key || AES, 3DES || 80 || 128 || 192 || 256
 +
|}
 +
=== Defining an Elliptical Curve ===
 +
 
 +
The elliptical curve over <math>\Z_p</math>, where prime <math>p > 3</math>, is the set of all coordinate pairs from the equation:
 +
 
 +
<blockquote>
 +
<math>y^2 \equiv x^3 + ax + b \text{ mod } p</math>
 +
</blockquote>
 +
 
 +
including an imaginary point at infinity <math>\mathcal{O}</math> where:
 +
<blockquote>
 +
<math>a, b \in \Z_p</math> and <math>4 a^3 + 27b^2 \not\equiv 0 \text{ mod } p</math>
 +
</blockquote>
 +
 
 +
If the elliptic curve does not satisfy the above definition, it is not suitable for use in cryptography.
 +
 
 +
=== Finding a cyclic group ===
 +
A cyclic group <math>G</math> must satisfy the following criteria:
 +
 
 +
# It is composed of a set of elements
 +
# The group operation used must satisfy the group laws
 +
## Closedness (<math>\forall x,y,z \in G\text{, }x \circ y \equiv z</math>)
 +
## Associativity (<math>\forall x,y,z \in G\text{, }x \circ (y \circ z) \equiv (x \circ y) \circ z</math>)
 +
## Identity (<math>\forall x \in G\text{, }x \circ N \equiv x</math> where <math>N</math> is the neutral element)
 +
## Inverseness (<math>\forall x \in G\text{, }x \circ x^{-1} \equiv N</math> where <math>N</math> is the neutral element and <math>x^{-1}</math> represents the inverse of <math>x</math>)
 +
 
 +
If it just so happens that the group operation is also communicative (<math>\forall x, y \in G \text{, } x + y \equiv y + x</math>), then the group is an Abliean group.
 +
 
 +
In order to generate a cyclical group from an elliptic curve, we use two operations:
 +
 
 +
# Point addition (<math>P + Q</math>)
 +
# Point doubling (<math>2P</math> or <math>P + P</math>)
 +
 
 +
Do take note that point addition is NOT the same as arithmetic addition, as will be explained later.
 +
 
 +
Formally, the operations are as follows:
 +
<blockquote>
 +
Given points <math>P = (x_1, y_1)</math>, <math>Q = (x_2, y_2)</math> on an elliptic curve <math>E = y^2 \equiv x^3 + ax + b \text{ mod }p</math>:
 +
 
 +
<math>
 +
x_3 = s^2 - x_1 - x_2 \text{ mod } p \\
 +
y_3 = s(x_1 - x_3) - y_1 \text{ mod } p
 +
</math>
 +
 
 +
where <math>s = \begin{cases}
 +
          \frac{y_2 - y_1}{x_2 - x_1} & P \neq Q \text{ (point addition)} \\
 +
          \frac{3x_1^2 + a}{y_1} & P = Q \text{ (point doubling)} \\
 +
      \end{cases}</math>
 +
</blockquote>
 +
 
 +
The resulting point <math>(x_3, y_3)</math> lands on the same elliptic curve (closeness), and the operation fulfills associativity.
 +
 
 +
The neutral element of the group will be an imaginary point at infinity symbolled as <math>\mathcal{O}</math>:
 +
 
 +
<blockquote>
 +
<math>\forall P \in E \text{, } P + \mathcal{O} = P</math>
 +
</blockquote>
 +
 
 +
The inverse of each point <math>P = (x_1, y_1)</math> is equal to <math>(x_1, -y_1)</math>, which is denoted as <math>-P</math>, which is not to be confused by the algebraic negation:
 +
 
 +
<blockquote>
 +
<math>\forall P \in E \text{, } P + (-P) = \mathcal{O}</math>
 +
</blockquote>
 +
 
 +
With the following operation on elliptic curves, the following theorem holds:
 +
 
 +
<blockquote>
 +
The points on an elliptic curve, including <math>\mathcal{O}</math>, have cyclic subgroups. Under certain conditions, all points on an elliptic curve form a cyclic group.
 +
</blockquote>
 +
 
 +
=== A geometrical representation ===
 +
Please note that graphing points will not work when working in a modulo space (e.g., <math>\Z_p</math>), which is what computers implement in practice. This is just to provide a visual understanding of what each operation does. If you want to programmatically implement the elliptic curve operations, follow the formal definition above.
 +
 
 +
==== Point addition ====
 +
Point addition visually draws an intersection line that contains the two given points, <math>P</math> and <math>Q</math>. Due to the nature of the elliptical curve, the line will intersect as three locations on the curve, two of which were the given points. The remaining intersection point is flipped over the x-axis, and the result is the operation <math>P+Q</math>.
 +
[[File:Elliptic-curve-point-addition-abstract-drawing.jpg|thumb|center|Abstract drawing demonstrating point addition on an elliptical curve. Drawing not to scale.]]
 +
 
 +
==== Point doubling ====
 +
While point doubling doesn't seem intuitively similar to point addition, it performs an almost identical operation. Instead, if we want to calculate <math>2P</math> or <math>P+P</math>, we take the tangent line at point <math>P</math>. Due to the nature of the elliptical curve, the tangent will have only two locations on the curve, one of which is <math>P</math>. The remaining intersection point is flipped over the x-axis, and the result is the operation <math>2P</math>.
 +
[[File:Elliptic-curve-point-doubling-abstract-drawing.jpg|thumb|center|Abstract drawing demonstrating point doubling on an elliptical curve. Drawing not to scale.]]
 +
 
 +
=== Elliptic Curve Discrete Log Problem ===
 +
 
 +
=== Elliptic Curve Diffie-Hellman ===
 +
 
 +
== Post Quantum Cryptography ==
 
Quantum computers can break some cryptographic functions (e.g. prime factorization through Shor's algorithm). Post-quantum cryptography uses properties of quantum mechanics, such as entanglement, to protect against quantum computer attacks.
 
Quantum computers can break some cryptographic functions (e.g. prime factorization through Shor's algorithm). Post-quantum cryptography uses properties of quantum mechanics, such as entanglement, to protect against quantum computer attacks.
  
=Web architecture=
+
=== Shor's Algorithm ===
 +
Shor's algorithm utilizes classical algorithms which would be computationally slow when run with classical computers. However, when this algorithm is used with a quantum computer, it can factor general prime numbers in a computationally fast manner. It is only able to do so through a reduction, which is the redefinition of a problem in terms of an "easier" problem (that is, one that has a more optimal computational complexity). In particular, Shor's algorithm reduces the factoring problem into a problem relying primarily on greatest common divisor (GCD) computation (usually can be performed quickly using the Euclidean algorithm), modular exponentiation (which has many properties that allow for it to be done relatively quickly using quantum circuit implementations), and period-finding (the focus of Shor's algorithm as it is where the biggest speedup occurs). The itself algorithm is briefly described below.
 +
 
 +
First, a prime number <math>N</math> is chosen, along with an arbitrary <math>a \in \left(1, N\right)</math>. If <math>\text{gcd}\left(a, N\right) = p \neq 1</math> (that is, <math>a</math> is not coprime with <math>N</math> and shares a factor <math>p</math> which isn't <math>1</math>), then <math>N</math> has been factored. If it is not, the next step is to find the period <math>r</math> of <math>a^x\ \text{mod}\ N</math> using period-finding, where <math>r</math> must be even. If it is not, a different <math>a</math> must be chosen and the process must be restarted. Since, by definition, <math>a^r = 1\ \text{mod}\ N</math>, <math>a^r - 1</math> must be a multiple of <math>N</math>. Although this all may not seem very useful, the magic of Shor's algorithm lies in the following finding: we may factor <math>a^r - 1</math> into <math>\left(a^{r/2} - 1\right) \cdot \left(a^{r/2} + 1\right)</math>. The first term cannot be equivalent to <math>a^r - 1</math>, so it cannot be a multiple of <math>N</math>. To continue, it is assumed that the second term is also not a multiple of <math>N</math> (if it were, a different <math>a</math> would have to be chosen and the process would be to be restarted). Since neither terms of this factoring are multiples of N, but their product <math>a^r - 1</math> is a multiple of N, they must each consist of a distinct prime factor that is also a factor of <math>N</math> since, for a number to be a multiple of another number, it must have the same factors and more. If either <math>a^{r/2} - 1</math> or <math>a^{r/2} + 1</math> are prime numbers, then that must be one of the prime factors, which can easily be verified.
 +
 
 +
Although one execution of this algorithm does not immediately result in all (or even any necessarily) of the prime factors of <math>N</math>, knowing that <math>a^{r/2} \pm 1</math> is/are factor(s) of <math>N</math> provides a "better guess" for the prime factors of <math>N</math>. Thus, the entire algorithm can be repeated, instead starting from <math>\text{gcd}\left(a^{r/2} \pm 1, N\right)</math>. Although this may seem like a computationally complex step, it is relatively fast. Instead, the slowdown occurs in finding the period <math>r</math>. In order to do so, <math>a</math> must be raised to every number between <math>1</math> and <math>N</math>. Shor's Algorithm is centered around the quantum circuit implementation for period-finding, which involves the use of the quantum Fourier transform (simply the classical Fourier transform applied to superpositions) in order to find the frequency of <math>a^x\ \text{mod}\ N</math>. Since the period is related to the frequency, this allows for the computer to calculate the period of the expression very quickly by scanning through all possible powers and finding the frequency at which these powers they occur all at once using quantum superposition states instead of calculating each separately and finding which one is equal to <math>r</math>.
 +
 
 +
= Web architecture =
 +
 
 +
== HTML/CSS/JS ==
 +
The World Wide Web Consortium (W3C) is an international council that standardizes web architecture through their specifications for [https://dev.w3.org/html5/html-author/ HTML], [https://www.w3.org/TR/css-2020/ CSS], and Javascript. The Web Hypertext Application Technology Working Group (WHATWG), of which Google, Mozilla, Apple, and Microsoft are leading members, also standardizes [https://spec.whatwg.org/ many aspects of web architecture]. WHATWG specifications are the widely-accepted standard for HTML/CSS/JS development as they are frequently-updated. [https://developer.mozilla.org/en-US/docs/Learn Mozilla's web docs] tend to be the most up-to-date repository for web technologies information. W3Schools and Stack Overflow often don't follow industry best practices. However, it is worth noting that standards are not followed by all browsers, even major ones such as Safari. As a result, cross-browser compatibility is a major issue that must be considered when developing web applications, given that browser developers are free to implement whichever standards they choose however they want.
 +
 
 +
=== APIs ===
 +
Application Programming Interfaces (APIs) are connections (interfaces) between two or more endpoints, such as programs or systems, that allow for the endpoints to communicate and share functionality. In the context of web browsers, many APIs exist, such as the File API (which allows websites to interact with a device or browser's file system), the Sound API (which allows web developers to interact with a device's audio input and output hardware), graphics APIs (which allow websites to interact with a device's graphics hardware in order to display complex graphics), and the Notifications API (which allows websites to interact with the device's notification manager to send custom notifications triggered by the website or its server). APIs are not inherently supported on a website, browser, or even device. Rather, the website must require access to or include the API library in its code, the browser must implement this functionality, and the device must have the functionality needed (if the API in question requires device access).
 +
 
 +
One of the major issues concerning API security revolves around improper communications and insecure requests made through APIs, especially those that give the website external access, such as the ability to communicate with other websites or the user's device. In fact, unintended bugs in any piece of code could have adverse effects on other associated processes. In order to protect against these vulnerabilities, modern web browsers, such as those built on top of Chromium, use a method known as "sandboxing", where any insecure processes, including websites and API implementations, run in isolated environments known as sandboxes, which cause no harm to other processes if they are damaged or compromised. To interact with anything outside of its own sandbox, a process must use the functionality implemented and granted by the browser's inter-process communication (IPC) system. Although this is perfect in theory, it is not sufficient in developing a secure browser, especially as many browsers knowingly put multiple processes into the same sandboxes in order to meet system performance requirements.
  
==HTML/CSS/JS==
+
=== Protocols ===
[https://developer.mozilla.org/en-US/docs/Learn Mozilla's web docs] tend to be the most up-to-date repository for web technologies information. W3C Schools and Stack Overflow often don't follow industry best practices.
+
The Hypertext Transport Protocol (HTTP) is a protocol used by websites and other programs to communicate through the internet. Modern websites use HTTPS, which is simply an extension of HTTP that incorporates encryption through Transport Layer Security (TLS) in order to make HTTP requests more secure.
  
===Protocols===
+
==== Requests ====
Modern websites use HTTPS, a descendant of the Hypertext Transport Protocol (HTTP) that incorporates encryption through Transport Layer Security (TLS).
+
Requests are messages sent (usually) by user-end applications to some external endpoint, such as a server. There are multiple types of requests, known as methods or verbs.
  
==Programming==
+
* GET - Usually used to retrieve information from the endpoint, such as data in a database
Primality algorithm:
+
* POST - Usually used to send information to the endpoint (more specifically, POST is usually used to create new entries of data)
 +
* PUT - Usually used to send updated information to the endpoint (more specifically, PUT is usually used to replace an existing entry of data)
 +
* PATCH - Usually used to send modifications to the endpoint (more specifically, PATCH is usually used to modify portions of an existing entry of data)
 +
* DELETE - Usually used to request that a specific piece of data be deleted from the endpoint
  
==Resources==
+
== Resources ==
[https://scilympiad.com/data/org/bearso/public/Cybersecurity.pdf Cybersecurity Rules - BEARSO 2020]
+
: [https://www.soinc.org/sites/default/files/uploaded_files/CybersecurityC.pdf Cybersecurity Rules - Southern California Trial Event (9.30.2020)]
 +
: [http://www.usaco.org/ USACO]
 +
: [https://picoctf.org/ PicoCTF]
 +
: [https://www.uscyberpatriot.org/ CyberPatriot]
 +
: [https://toc.cryptobook.us/book.pdf A Graduate Course in Applied Cryptography]
 +
: [https://www.crypto-textbook.com/ Understanding Cryptography]
  
[[Category:Trial Event Pages]]
+
[[Category:Trial events]]
[[Category:Lab Event Pages]]
+
[[Category:Lab events]]
[[Category:Event Pages]]
+
[[Category:Events]]

Latest revision as of 00:03, 3 April 2024


Cybersecurity is a Division B and Division C event that was first run as a trial event at the 2021 BEARSO Invitational to replace Ping Pong Parachute. The event consists of two parts: a written test on Cryptography, Web Architecture, and Principles of Cybersecurity and a hands-on task on Cryptography and Programming. The event was also run at the November Scilympiad Practice, Yosemite Invitational, and Science Olympiad at Penn State Invitational.

The event was also run as a Division C trial event at the 2022 National Tournament.

Cryptography

Hash algorithms

A hash algorithm is a one-way function that maps data, such as a string or a file, to a hash, or a "digest" - a string of data that is much shorter in length. Hash functions are always deterministic. If two equal inputs are hashed two separate times, the digest will always be the same. A hash can be used as a checksum to validate that a file has not been altered, since if a single bit of information was changed, the checksum would change. Hash functions are also designed to decrease the risk of hash collisions. Since the hashed digest of an input reduces its size significantly, hash collisions can occur when two inputs map to the same output. Hash functions are used in digital signatures, signing and authentication algorithms, and passwords.

A good hash algorithm has the following characteristics:

  • It is hard to find collisions.
  • It is irreversible.
  • It has to be deterministic.

Passwords are one of the most important applications of hashing algorithms. When a password is inputted, a hash of the password is calculated, and compared to the hashed value of your original password. Thus, no plaintext passwords should be saved server-side, which would reduce the damage in the event of a data breach.

MD5

The MD5 hashing algorithm produces 16 byte digests (128 bits) with block sizes of 64 bytes (512 bits). It was first created in 1991 by Ronald Rivest. Multiple vulnerabilities have been exposed with the MD5 hashing algorithm, and collisions can be calculated in less than a second on a typical computer. Thus, MD5 is considered extremely cryptographically insecure; however, many programs and applications continue to use it.

SHA1

SHA1, which stands for Secure Hash Algorithm 1, was created in 1995. The algorithm produces 20 byte digests (160 bits) of data using block sizes of 512 bits. 80 rounds of hashing is done under this hashing algorithm. Hash collisions can be calculated with 2^60.3 to 2^65.3 operations, and collisions have been calculated. In addition, SHA1 is built using the Merkle-Damgård construction, so it is also prone to length extension attacks.

SHA2

SHA2 is a set of six hash functions, namely SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. Depending on the name of the function, each hash algorithm produces a digest size with a different amount of bits. Either 64 or 80 rounds of hashing are performed on the plaintext. Similarly to SHA1, SHA2 hashes are vulnerable to hash length extension attacks. Currently, there are no known full collisions of SHA2 hashes.

Hash Length Extension Attack

Algorithms that are based on the Merkle-Damgård construction are vulnerable to hash length extension attacks, including MD5, SHA1, and all SHA2 hashes. Because the hash digest is essentially a snapshot of the internal state of the hash, the internal state can be recreated with the hash. In addition, because the hashing algorithm is performed on blocks of data, one can compute HASH(UNKNOWN + CUSTOM MESSAGE) only knowing the length and hash digest of HASH(UNKNOWN) by continuing the hashing algorithm from the known internal state. This attack is significant because it allows attackers to forge requests. One patch of this attack is to hash the unknown secret twice: for instance, HASH(HASH(UNKNOWN)) could be used.

Hash Collisions

A hash collision has occurred when two plaintexts, hashed with the same algorithm, has produced the same digest. All hashing algorithms have an infinite amount of collisions because the plaintext can be infinitely long, yet the length of the digest is finite. However, because of the length of the hash, it can be extremely hard to produce a collision.

The XOR Operation

The XOR operator is a binary operator that takes two bits of data and outputs one bit of data. XOR sounds for exclusive or; it returns true if the two inputs are different and false if the inputs are the same. In Java and Python, the XOR operator is denoted with the ^ symbol (not to be confused with exponentiation, which is typically represented with the ** operator). In contrast, SageMath uses the ^ symbol to represent exponentiation, while ^^ is the XOR operator.

This is an XOR table which represents the inputs as well as the outputs:

Input 1 Input 2 Output
True True False
True False True
False True True
False False False

In addition, the XOR operator is commutative (A^B==B^A), associative ((A^B)^C==A^(B^C)), and is its own inverse (A^A==0, B^A^A=B), meaning that it is symmetric and reversible. The latter is especially important, as this is a very simple way to reveal a plaintext or calculate a ciphertext using the same encryption/decryption function.

In cryptography, the XOR cipher is used extremely frequently in stream ciphers and block ciphers, as well as many others. Not only is the XOR operation computationally inexpensive, but it is also theoretically impossible to break as long as the key is secure.

Ciphers that rely purely on the XOR operation, such as stream ciphers such as OTP (one time pad), are susceptible to a known plaintext attack. If a plaintext P as well as its corresponding ciphertext C is known, then the keystream K can be trivially calculated using the operation K=C^P. Thus, it is important for a keystream to be unique and only be used once.

Crib dragging is a method of attacking XOR ciphers with a repeating key. One can use frequency analysis, as well as a brute-force, to calculate the plaintext and "drag" the key along the ciphertext in order to reveal the plaintext message.

Bases

Different number bases, most often powers of 2, are used extensively in computer science because of their compatibility with computers.

To express a string or a file as a long integer, each character in the string is looked up in an ASCII table and strung together as a string of bytes. For instance, the word "SciOly.org" would be expressed as "0x53 0x63 0x69 0x4f 0x6c 0x79 0x2e 0x6f 0x72 0x67" in hex (base 16), "01010011 01100011 01101001 01001111 01101100 01111001 00101110 01101111 01110010 01100111" in binary (base 2), and "U2NpT2x5Lm9yZw==" in base 64.

Binary

A binary digit is simply a 0 or 1. Representing an integer, string, or file as a string of many binary digits is useful because it is one of the easiest ways to store information (only two different "modes" are necessary, such as high voltage and low voltage) and the logic gates for binary values are simple as well.

Hexadecimal

The hexadecimal number system is often used in place of binary, not only because it is easier for humans to read but also because computers can also easily compute the binary representation of hexadecimal. In this number system, the numbers 0-15 would be expressed as 0123456789ABCDEF. Hexadecimal numbers are typically prefixed with a "0x" to be easily distinguishable. Most programming languages that accept hexadecimal numbers will expect this prefix.

Base 64

Base 64 is one of the most compact ways of expressing data as a string, since it stores more data per character. The 64 characters used are as follows:

  • 0-25: A-Z
  • 26-51: a-z
  • 52-61: 0-9
  • 62: +
  • 63: /

In base 64, every 4 bytes of encoded data can be decrypted to 3 bytes of unencoded data. Thus, when the unencoded data is not a multiple of 3, padding must be added in order to make the encoded data a multiple of 4. The "=" is used as a padding.

Classical Cryptography

Classical cryptography includes cryptosystems that are no longer in use, as extensive attacks have been developed and can be trivially implemented to reveal the message. Most classical cryptosystems encrypt plaintext, such as messages in English, rather than files. For example, all of the cipher types in Codebusters (excluding the RSA cipher) are classical cryptosystems.

Substitution Ciphers

In a substitution cipher, each plaintext character is replaced by a ciphertext character. In many methods of substitution, such as those involving the Morse or Baconian alphabets, the plaintext or ciphertext are actually groups of multiple characters. Either way, the plaintext is enciphered by replacing each plaintext unit with a ciphertext unit, and the ciphertext is deciphered by replacing each ciphertext unit with a plaintext unit.

The method of determining how to map plaintext and ciphertext units varies based on the cryptosystem. Some cryptosystems involve algorithms or mathematical formulas that can be used to determine the corresponding ciphertext given a plaintext unit (and vice versa), such as the Hill cipher. Other cryptosystems are simply random mappings from one alphabet to another, such as monoalphabetic substitution ciphers (e.g. Aristocrats).

Transposition Ciphers

Whereas substitution ciphers replace plaintext/ciphertext units with ciphertext/plaintext units, transposition ciphers retain the same units but rearrange them using some pattern. An example of a transposition cipher would be the Railfence cipher.

Attacks on Cryptosystems

There are four main formal attacks that can be performed on cryptosystems: ciphertext-only attacks, known-plaintext attack (KPA), chosen-plaintext attack (CPA), and chosen-ciphertext attack (CCA).

Ciphertext-only attack

A ciphertext-only attack is an attack model where an adversary only has access to a set of ciphertexts. The attack is considered to be successful if the adversary can deduce the plaintexts with the corresponding ciphertext and/or the key used to encrypt said ciphertext.

Known-plaintext attack

A known-plaintext attack (KPA) is an attack model where an adversary has access to a plaintext (typically called a crib) and ciphertext pairs. The goal of the attack is to find the key used to encrypt the plaintexts.

Chosen-plaintext attack

A chosen-ciphertext attack (CPA) is an attack model where an adversary can choose any plaintext and get its corresponding ciphertext. The goal of the attack is reveal all or part of the encryption key.

Modern ciphers typically aim to provide semantic security, better known as indistinguishability under chosen-plaintext attack (IND-CPA), which reduces to being safe from CPA.

IND-CPA

Given the following:

  • An adversary, Ava, can generate [math]\displaystyle{ n }[/math] plaintexts.
  • The challenger, the encryption oracle, will then encrypt the plaintexts and return the plaintext/ciphertext pairs back to Ava.

Consider the following game:

  • Ava sends two plaintexts [math]\displaystyle{ m_0 }[/math] and [math]\displaystyle{ m_1 }[/math] to the challenger.
  • The challenger picks a random bit [math]\displaystyle{ b \leftarrow \{0,1\} }[/math] and encrypts and returns back [math]\displaystyle{ E_k(m_b) }[/math].
  • Ava guesses which plaintext was encrypted and outputs her guess [math]\displaystyle{ b' }[/math].

A cipher is considered to hold under IND-CPA if Ava can't guess [math]\displaystyle{ b }[/math] with a probability non-negligibility better than 1/2.

Formally speaking, the cipher holds under IND-CPA if:

[math]\displaystyle{ \text{P}[b'=b] \leq \frac{1}{2} + \epsilon(\lambda) }[/math]

Where [math]\displaystyle{ \epsilon }[/math] is an asymptotically negligible function and [math]\displaystyle{ \lambda }[/math] is the security parameter of the cipher.

Chosen-ciphertext attack

A chosen-ciphertext attack (CCA) is an attack model where the adversary can choose any arbitrary ciphertext and receive the associated plaintext. The goal is to find the encryption key.

This attack, formally speaking, is the strongest form of attack as it gives the adversary the most control and information compared to the three other attack models.

IND-CCA

Indistinguishability under chosen-ciphertext attack (IND-CCA) is similar to the game presented for IND-CPA but gives the adversary more information:

  • The challenger picks a random bit [math]\displaystyle{ b \leftarrow \{0,1\} }[/math] and encrypts and returns back [math]\displaystyle{ E_k(m_b) }[/math].
  • The challenger is both an encryption and decryption oracle. An adversary, Ava, has the ability to send messages to be either encrypted or decrypted.
    • For encryption queries, Ava sends over a pair of messages [math]\displaystyle{ (m_{i0}, m_{i1}) }[/math], and the challenger encrypts and sends back [math]\displaystyle{ E_k(m_{ib}) }[/math].
    • For decryption queries, Ava sends over a ciphertext [math]\displaystyle{ c_j }[/math] to be decrypted where [math]\displaystyle{ c_j }[/math] is not among any of the ciphertexts that the challenger encrypted in previous encryption queries.

Now, consider the following game (same as mentioned for IND-CPA):

  • Ava sends two plaintexts [math]\displaystyle{ m_0 }[/math] and [math]\displaystyle{ m_1 }[/math] to the challenger.
  • Ava guesses which plaintext was encrypted and outputs her guess [math]\displaystyle{ b' }[/math].

Observe that Ava cannot use a decryption query on the challenge ciphertext to find the underlying plaintext, as that would make the attack trivial.

A cipher is considered to hold under IND-CCA if Ava can't guess [math]\displaystyle{ b }[/math] with a probability non-negligibility better than 1/2.

Formally speaking, the cipher holds under IND-CCA if:

[math]\displaystyle{ \text{P}[b'=b] \leq \frac{1}{2} + \epsilon(\lambda) }[/math]

Where [math]\displaystyle{ \epsilon }[/math] is an asymptotically negligible function and [math]\displaystyle{ \lambda }[/math] is the security parameter of the cipher.

RSA

The RSA cryptosystem, like many other modern cryptosystems, is reliant on the computational difficulty of factoring prime numbers, though it isn't impossible.

Diffie-Hellman Key Exchange

The Diffie-Hellman Key Exchange is a protocol to securely generate a secret key for both end-to-end parties to use.

Protocol

Suppose Alice wants to send a message Bob. They would first have to agree on a symmetrical key to use first.

  1. Alice and Bob first agree on a prime [math]\displaystyle{ p }[/math] and a value [math]\displaystyle{ g \in (\Z_p)^* }[/math] where [math]\displaystyle{ (\Z_p)^* = \{1, 2, 3, ..., p - 1\} }[/math].
  2. Alice then generates a private value [math]\displaystyle{ a \in \{2, 3, ..., p - 2\} }[/math], and Bob generates a private value [math]\displaystyle{ b \in \{2, 3, ..., p - 2\} }[/math].*
  3. Alice then sends her shared key [math]\displaystyle{ g^a\text{ mod }p }[/math] over to Bob, and Bob sends his shared key [math]\displaystyle{ g^b\text{ mod }p }[/math] over to Alice.
  4. Alice calculates [math]\displaystyle{ (g^b)^a \text{ mod } p \equiv g^{ab} }[/math], and Bob calculates [math]\displaystyle{ (g^a)^b \text{ mod } p \equiv g^{ab} }[/math]. [math]\displaystyle{ g^{ab} \text{ mod } p }[/math] is then used as their private key.

*In theory, [math]\displaystyle{ a, b \in \Z_p }[/math] will work fine, but in practice [math]\displaystyle{ \Z_p / \{1, p-1\} = \{2, 3, ..., p - 2\} }[/math] is used instead since [math]\displaystyle{ g^1 = g }[/math] and [math]\displaystyle{ g^{p-1}\text{ mod }p\equiv 1 }[/math] for finite cyclic groups.

Theory

The Discrete Log Problem

The discrete logarithm problem (DLP) states:

Given generator [math]\displaystyle{ g \in (\Z_p)^* }[/math] such that the group generated by [math]\displaystyle{ g }[/math] (denoted by [math]\displaystyle{ G }[/math]) is a finite cyclic group where [math]\displaystyle{ p }[/math] is a large prime and [math]\displaystyle{ g^x }[/math] where [math]\displaystyle{ x \in (\Z_p)^* }[/math], it is hard to find [math]\displaystyle{ x }[/math].

In other words, given [math]\displaystyle{ g }[/math] and [math]\displaystyle{ y \in G }[/math], it is hard to find [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ g^x = y }[/math].

The Diffie-Hellman Problem

In order for the Diffie-Hellman protocol to be provably secure, we need to prove that an eavesdropper, Eve, cannot figure out the final key Alice and Bob are using (e.g. [math]\displaystyle{ g^{ab} }[/math]).

The Diffie-Hellman Problem (DHP) states that if given [math]\displaystyle{ g }[/math], [math]\displaystyle{ p }[/math], [math]\displaystyle{ g^a }[/math], and [math]\displaystyle{ g^b }[/math], it is hard to compute [math]\displaystyle{ g^{ab} }[/math].

If we suppose that Eve could see the messages between Alice and Bob, she would observe [math]\displaystyle{ g }[/math], [math]\displaystyle{ p }[/math], [math]\displaystyle{ g^a }[/math], and [math]\displaystyle{ g^b }[/math].

She could theoretically calculate [math]\displaystyle{ a }[/math] by performing discrete log on Alice's shared key ([math]\displaystyle{ g^a }[/math]) then find [math]\displaystyle{ g^{ab} }[/math] using Bob's shared key ([math]\displaystyle{ g^b }[/math]):

[math]\displaystyle{ a \equiv \text{log}_g(g^a)\text{ mod }p }[/math]

[math]\displaystyle{ (g^b)^a \equiv g^{ab}\text{ mod }p }[/math]

However, the discrete log step is considered to be computationally hard and non-trivial to solve. There currently is no better way for finding [math]\displaystyle{ g^{ab} }[/math] without the use of the DLP. Therefore, the DHP is considered today to be "hard." This is also known as the Computational Diffie-Hellman (CDH) assumption.

There is also the Decisional Diffie-Hellman (DDH) assumption, which states:

[math]\displaystyle{ (g^a, g^b, g^{ab}) \approx_c (g^a, g^b, g^c) }[/math] where [math]\displaystyle{ a, b, c \in (\Z_p)^* }[/math]

In other words, [math]\displaystyle{ g^{ab} }[/math] appears to be random and is computationally indistinguishable ([math]\displaystyle{ \approx_c }[/math]) from [math]\displaystyle{ g^c }[/math]. DDH is a stronger assumption than CDH. If CDH does not hold, it is trivial that DDH also does not hold ([math]\displaystyle{ \text{DDH holds} \rightarrow \text{CDH holds} }[/math]), but if DDH does not hold, it doesn't imply anything whether CDH holds or not ([math]\displaystyle{ \text{CDH holds} \nrightarrow \text{DDH holds} }[/math]).

Theoretical Attacks

The protocol is secure against passive adversaries, as shown as proved above. However, if an active adversary, Ava, were to perform a man-in-the-middle attack, she could forge both Alice's and Bob's shared key with her own [math]\displaystyle{ g^c }[/math]. Alice then calculates [math]\displaystyle{ (g^c)^a = g^{ac} }[/math], Bob calculates [math]\displaystyle{ (g^c)^b = g^{bc} }[/math], and Ava calculates both [math]\displaystyle{ (g^a)^c = g^{ac} }[/math] and [math]\displaystyle{ (g^b)^c = g^{bc} }[/math] allowing her to see all future messages on the channel.

There are ways to prevent an active adversary from tampering with messages during the protocol, an example of which occurs during the TLS handshake.

Block Ciphers

Block ciphers are cryptosystems that operate on fixed-width blocks. It is now one of the key fundamentals for how encryption works on large pieces of data today.

There are multiple modes of block ciphers. The four most notable are electronic codebook (ECB), counter (CTR), Galois/Counter (GCM), and cipher block chaining (CBC) mode.

Electronic Codebook (ECB)

ECB mode is the most basic form of block cipher, simply taking each block and encrypting it independently of each other. While this does have parallelization benefits when encrypting and decrypting a large plaintext, ECB is not recommended for use in cryptosystems due to the lack of diffusion in the combined ciphertext.

[math]\displaystyle{ C_0 = E_K(P_0) }[/math]

[math]\displaystyle{ C_1 = E_K(P_1) }[/math]

[math]\displaystyle{ C_2 = E_K(P_2) }[/math]

[math]\displaystyle{ ... }[/math]

Original image
Using ECB allows patterns to be easily discerned
Modes other than ECB result in pseudo-randomness

Counter (CTR)

Counter mode is a step up from ECB mode, introducing an additional random input called an initialization vector (IV). Instead of directly encrypting the plaintext, nonce is encrypted, which is initially seeded with the IV and then combined with the plaintext, typically via XOR. The nonce is then incremented and used for the next block. In other words, block 0 uses IV, block 1 uses IV + 1, block 2 uses IV + 2, etc. The IV is then also sent alongside the ciphertext.

[math]\displaystyle{ C_0 = E_K(IV) \oplus P_0 }[/math]

[math]\displaystyle{ C_1 = E_K(IV + 1) \oplus P_1 }[/math]

[math]\displaystyle{ C_2 = E_K(IV + 2) \oplus P_2 }[/math]

[math]\displaystyle{ ... }[/math]

CTR is parallelizable in encryption and decryption, and unlike in ECB, the ciphertext will not leak any information about the underlying plaintext. However, an adversary can modify the underlying plaintext by trivially modifying the ciphertext.

Galois/Counter Mode (GCM)

However, the underlying ciphertext modifications can be patched by adding an integrity check, like a signature. This is what GCM implements: an authenticated encryption scheme and a derivation of CTR mode. It achieves a level of authenticity by calculating a tag that can only be calculated by the individuals who have access to the public key and ciphertext (the IV is assumed to be public).

The tag is typically calculated by doing multiplicative operations using Galoid fields on value [math]\displaystyle{ H = E_K(0) }[/math] and each ciphertext block ([math]\displaystyle{ C_i }[/math]) and the length of the ciphertext ([math]\displaystyle{ |C| }[/math]). All those values are then summed together (XORed) and added (XORed) to [math]\displaystyle{ E_K(IV) }[/math]. The resulting calculation for the tag is:

[math]\displaystyle{ T=C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H+E_K(IV) }[/math] where [math]\displaystyle{ n }[/math] is the number of blocks

This tag has proved to be feasibly untamperable by an adversary, given they only know [math]\displaystyle{ C }[/math], [math]\displaystyle{ IV }[/math], and the tag. However, if the adversary can find what [math]\displaystyle{ H }[/math] is, then they can change the plaintext and modify the tag trivially. This can happen if the same [math]\displaystyle{ IV }[/math] were to be used:

[math]\displaystyle{ T_1 = C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H+E_K(IV) }[/math]

[math]\displaystyle{ T_2 = C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H+E_K(IV) }[/math]

[math]\displaystyle{ T_1 \oplus T_2 = (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H+E_K(IV))+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H+E_K(IV)) }[/math]

[math]\displaystyle{ = (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H \oplus E_K(IV))+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H \oplus E_K(IV)) }[/math] since addition in Galoid fields is XOR

[math]\displaystyle{ = (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H)+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H) \oplus E_K(IV)\oplus E_K(IV) }[/math] since XOR is associative

[math]\displaystyle{ = (C_0H^{N+1}+C_1H^n+C_2H^{n-1}+...+C_nH^2+|C|H)+(C'_0H^{N+1}+C'_1H^n+C'_2H^{n-1}+...+C'_nH^2+|C'|H) }[/math]

[math]\displaystyle{ = (C_0+C'_0)H^{N+1}+(C_1+C'_1)H^n+(C_2+C'_2)H^{n-1}+...+(C_n+C'_n)H^2+(|C|+|C'|)H }[/math]

[math]\displaystyle{ H }[/math] can now be solved by trivially calculating the root of the above polynomial.

Therefore, in order to prevent such an attack, [math]\displaystyle{ IV }[/math] must not be reused.

Cipher Block Chaining (CBC)

CBC mode uses previously encrypted blocks to "chain" or entangle future blocks. This can be done simply by XORing the ciphertext of the previous block with the current block's plaintext. A random IV is typically used instead of the first block in plaintext.

[math]\displaystyle{ C_0 = E_K(P_0 \oplus IV) }[/math]

[math]\displaystyle{ C_1 = E_K(P_1 \oplus C_0) }[/math]

[math]\displaystyle{ C_2 = E_K(P_2 \oplus C_1) }[/math]

[math]\displaystyle{ ... }[/math]

Due to the dependencies of needing the previous ciphertext block, encryption is not parallelizable, but decryption is.

In early versions of TLS (<= TLS 1.0/SSL 3.0), the IV used was implicit (as future IVs were the previous ciphertext block). This makes the basic version of CBC vulnerable to CPA, as demonstrated in the BEAST attack. In TLS 1.1, this was fixed by first encrypting the initial IV and using the encrypted mask in the first block instead of the raw IV.

Stream Ciphers

Stream ciphers are symmetric key cryptosystems that use a generator to generate pseudorandom values to encrypt a message. They are heavily inspired by the One-Time Pad cipher. While they do not provide perfect secrecy because the key size is typically much smaller than the plaintext message, secure steam ciphers are all semantically secure (i.e., two messages encrypted with the same key are indistinguishable from each other).

An example of a stream cipher using a pseudorandom number generator (PRNG) [math]\displaystyle{ G }[/math]:

[math]\displaystyle{ E_k(m) = m \oplus G(k) }[/math]

[math]\displaystyle{ D_k(c) = c \oplus G(k) }[/math]

In the context of cryptography, a PRNG [math]\displaystyle{ G }[/math] is considered secure if and only if:

[math]\displaystyle{ \{k \leftarrow \{0,1\}^\lambda:G(k)\} \approx_c \text{uniform}\{\{0,1\}^{\lambda n}\} }[/math] where [math]\displaystyle{ n }[/math] is the stretch factor of [math]\displaystyle{ G }[/math] (i.e. [math]\displaystyle{ n = |G(k)|/|k| }[/math])

In other words, [math]\displaystyle{ G }[/math] must be computationally indistinguishable from a true uniformly random string; the difference in probabilities between an adversary being able to guess between a PRNG and random bit sequence is a negligible inverse non-polynomial probability.

Given that [math]\displaystyle{ G }[/math] is a secure cipher, it can be proved that the above steam cipher example is semantically secure.

Elliptic Curve Cryptography

Elliptic Curve Cryptography proposes a new way of generating finite cyclical groups. It serves as an alternative to using the [math]\displaystyle{ (\Z_p)^* }[/math] domain for public key encryptions, key exchanges, and signatures, as RSA and Diffie-Hellman tend to require substantially large security parameters compared to, say, AES.

Bit lengths of public-key algorithms with different security levels
Algorithm family Cryptosystem Security level (bit)
80 128 192 256
Integer factorization RSA 1024 3072 7680 15360
Discrete logarithm DH, DSA, El Gamal 1024 3072 7680 15360
Elliptic curves ECDH, ECDSA 160 256 384 512
Symmetric-key AES, 3DES 80 128 192 256

Defining an Elliptical Curve

The elliptical curve over [math]\displaystyle{ \Z_p }[/math], where prime [math]\displaystyle{ p \gt 3 }[/math], is the set of all coordinate pairs from the equation:

[math]\displaystyle{ y^2 \equiv x^3 + ax + b \text{ mod } p }[/math]

including an imaginary point at infinity [math]\displaystyle{ \mathcal{O} }[/math] where:

[math]\displaystyle{ a, b \in \Z_p }[/math] and [math]\displaystyle{ 4 a^3 + 27b^2 \not\equiv 0 \text{ mod } p }[/math]

If the elliptic curve does not satisfy the above definition, it is not suitable for use in cryptography.

Finding a cyclic group

A cyclic group [math]\displaystyle{ G }[/math] must satisfy the following criteria:

  1. It is composed of a set of elements
  2. The group operation used must satisfy the group laws
    1. Closedness ([math]\displaystyle{ \forall x,y,z \in G\text{, }x \circ y \equiv z }[/math])
    2. Associativity ([math]\displaystyle{ \forall x,y,z \in G\text{, }x \circ (y \circ z) \equiv (x \circ y) \circ z }[/math])
    3. Identity ([math]\displaystyle{ \forall x \in G\text{, }x \circ N \equiv x }[/math] where [math]\displaystyle{ N }[/math] is the neutral element)
    4. Inverseness ([math]\displaystyle{ \forall x \in G\text{, }x \circ x^{-1} \equiv N }[/math] where [math]\displaystyle{ N }[/math] is the neutral element and [math]\displaystyle{ x^{-1} }[/math] represents the inverse of [math]\displaystyle{ x }[/math])

If it just so happens that the group operation is also communicative ([math]\displaystyle{ \forall x, y \in G \text{, } x + y \equiv y + x }[/math]), then the group is an Abliean group.

In order to generate a cyclical group from an elliptic curve, we use two operations:

  1. Point addition ([math]\displaystyle{ P + Q }[/math])
  2. Point doubling ([math]\displaystyle{ 2P }[/math] or [math]\displaystyle{ P + P }[/math])

Do take note that point addition is NOT the same as arithmetic addition, as will be explained later.

Formally, the operations are as follows:

Given points [math]\displaystyle{ P = (x_1, y_1) }[/math], [math]\displaystyle{ Q = (x_2, y_2) }[/math] on an elliptic curve [math]\displaystyle{ E = y^2 \equiv x^3 + ax + b \text{ mod }p }[/math]:

[math]\displaystyle{ x_3 = s^2 - x_1 - x_2 \text{ mod } p \\ y_3 = s(x_1 - x_3) - y_1 \text{ mod } p }[/math]

where [math]\displaystyle{ s = \begin{cases} \frac{y_2 - y_1}{x_2 - x_1} & P \neq Q \text{ (point addition)} \\ \frac{3x_1^2 + a}{y_1} & P = Q \text{ (point doubling)} \\ \end{cases} }[/math]

The resulting point [math]\displaystyle{ (x_3, y_3) }[/math] lands on the same elliptic curve (closeness), and the operation fulfills associativity.

The neutral element of the group will be an imaginary point at infinity symbolled as [math]\displaystyle{ \mathcal{O} }[/math]:

[math]\displaystyle{ \forall P \in E \text{, } P + \mathcal{O} = P }[/math]

The inverse of each point [math]\displaystyle{ P = (x_1, y_1) }[/math] is equal to [math]\displaystyle{ (x_1, -y_1) }[/math], which is denoted as [math]\displaystyle{ -P }[/math], which is not to be confused by the algebraic negation:

[math]\displaystyle{ \forall P \in E \text{, } P + (-P) = \mathcal{O} }[/math]

With the following operation on elliptic curves, the following theorem holds:

The points on an elliptic curve, including [math]\displaystyle{ \mathcal{O} }[/math], have cyclic subgroups. Under certain conditions, all points on an elliptic curve form a cyclic group.

A geometrical representation

Please note that graphing points will not work when working in a modulo space (e.g., [math]\displaystyle{ \Z_p }[/math]), which is what computers implement in practice. This is just to provide a visual understanding of what each operation does. If you want to programmatically implement the elliptic curve operations, follow the formal definition above.

Point addition

Point addition visually draws an intersection line that contains the two given points, [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math]. Due to the nature of the elliptical curve, the line will intersect as three locations on the curve, two of which were the given points. The remaining intersection point is flipped over the x-axis, and the result is the operation [math]\displaystyle{ P+Q }[/math].

Abstract drawing demonstrating point addition on an elliptical curve. Drawing not to scale.

Point doubling

While point doubling doesn't seem intuitively similar to point addition, it performs an almost identical operation. Instead, if we want to calculate [math]\displaystyle{ 2P }[/math] or [math]\displaystyle{ P+P }[/math], we take the tangent line at point [math]\displaystyle{ P }[/math]. Due to the nature of the elliptical curve, the tangent will have only two locations on the curve, one of which is [math]\displaystyle{ P }[/math]. The remaining intersection point is flipped over the x-axis, and the result is the operation [math]\displaystyle{ 2P }[/math].

Abstract drawing demonstrating point doubling on an elliptical curve. Drawing not to scale.

Elliptic Curve Discrete Log Problem

Elliptic Curve Diffie-Hellman

Post Quantum Cryptography

Quantum computers can break some cryptographic functions (e.g. prime factorization through Shor's algorithm). Post-quantum cryptography uses properties of quantum mechanics, such as entanglement, to protect against quantum computer attacks.

Shor's Algorithm

Shor's algorithm utilizes classical algorithms which would be computationally slow when run with classical computers. However, when this algorithm is used with a quantum computer, it can factor general prime numbers in a computationally fast manner. It is only able to do so through a reduction, which is the redefinition of a problem in terms of an "easier" problem (that is, one that has a more optimal computational complexity). In particular, Shor's algorithm reduces the factoring problem into a problem relying primarily on greatest common divisor (GCD) computation (usually can be performed quickly using the Euclidean algorithm), modular exponentiation (which has many properties that allow for it to be done relatively quickly using quantum circuit implementations), and period-finding (the focus of Shor's algorithm as it is where the biggest speedup occurs). The itself algorithm is briefly described below.

First, a prime number [math]\displaystyle{ N }[/math] is chosen, along with an arbitrary [math]\displaystyle{ a \in \left(1, N\right) }[/math]. If [math]\displaystyle{ \text{gcd}\left(a, N\right) = p \neq 1 }[/math] (that is, [math]\displaystyle{ a }[/math] is not coprime with [math]\displaystyle{ N }[/math] and shares a factor [math]\displaystyle{ p }[/math] which isn't [math]\displaystyle{ 1 }[/math]), then [math]\displaystyle{ N }[/math] has been factored. If it is not, the next step is to find the period [math]\displaystyle{ r }[/math] of [math]\displaystyle{ a^x\ \text{mod}\ N }[/math] using period-finding, where [math]\displaystyle{ r }[/math] must be even. If it is not, a different [math]\displaystyle{ a }[/math] must be chosen and the process must be restarted. Since, by definition, [math]\displaystyle{ a^r = 1\ \text{mod}\ N }[/math], [math]\displaystyle{ a^r - 1 }[/math] must be a multiple of [math]\displaystyle{ N }[/math]. Although this all may not seem very useful, the magic of Shor's algorithm lies in the following finding: we may factor [math]\displaystyle{ a^r - 1 }[/math] into [math]\displaystyle{ \left(a^{r/2} - 1\right) \cdot \left(a^{r/2} + 1\right) }[/math]. The first term cannot be equivalent to [math]\displaystyle{ a^r - 1 }[/math], so it cannot be a multiple of [math]\displaystyle{ N }[/math]. To continue, it is assumed that the second term is also not a multiple of [math]\displaystyle{ N }[/math] (if it were, a different [math]\displaystyle{ a }[/math] would have to be chosen and the process would be to be restarted). Since neither terms of this factoring are multiples of N, but their product [math]\displaystyle{ a^r - 1 }[/math] is a multiple of N, they must each consist of a distinct prime factor that is also a factor of [math]\displaystyle{ N }[/math] since, for a number to be a multiple of another number, it must have the same factors and more. If either [math]\displaystyle{ a^{r/2} - 1 }[/math] or [math]\displaystyle{ a^{r/2} + 1 }[/math] are prime numbers, then that must be one of the prime factors, which can easily be verified.

Although one execution of this algorithm does not immediately result in all (or even any necessarily) of the prime factors of [math]\displaystyle{ N }[/math], knowing that [math]\displaystyle{ a^{r/2} \pm 1 }[/math] is/are factor(s) of [math]\displaystyle{ N }[/math] provides a "better guess" for the prime factors of [math]\displaystyle{ N }[/math]. Thus, the entire algorithm can be repeated, instead starting from [math]\displaystyle{ \text{gcd}\left(a^{r/2} \pm 1, N\right) }[/math]. Although this may seem like a computationally complex step, it is relatively fast. Instead, the slowdown occurs in finding the period [math]\displaystyle{ r }[/math]. In order to do so, [math]\displaystyle{ a }[/math] must be raised to every number between [math]\displaystyle{ 1 }[/math] and [math]\displaystyle{ N }[/math]. Shor's Algorithm is centered around the quantum circuit implementation for period-finding, which involves the use of the quantum Fourier transform (simply the classical Fourier transform applied to superpositions) in order to find the frequency of [math]\displaystyle{ a^x\ \text{mod}\ N }[/math]. Since the period is related to the frequency, this allows for the computer to calculate the period of the expression very quickly by scanning through all possible powers and finding the frequency at which these powers they occur all at once using quantum superposition states instead of calculating each separately and finding which one is equal to [math]\displaystyle{ r }[/math].

Web architecture

HTML/CSS/JS

The World Wide Web Consortium (W3C) is an international council that standardizes web architecture through their specifications for HTML, CSS, and Javascript. The Web Hypertext Application Technology Working Group (WHATWG), of which Google, Mozilla, Apple, and Microsoft are leading members, also standardizes many aspects of web architecture. WHATWG specifications are the widely-accepted standard for HTML/CSS/JS development as they are frequently-updated. Mozilla's web docs tend to be the most up-to-date repository for web technologies information. W3Schools and Stack Overflow often don't follow industry best practices. However, it is worth noting that standards are not followed by all browsers, even major ones such as Safari. As a result, cross-browser compatibility is a major issue that must be considered when developing web applications, given that browser developers are free to implement whichever standards they choose however they want.

APIs

Application Programming Interfaces (APIs) are connections (interfaces) between two or more endpoints, such as programs or systems, that allow for the endpoints to communicate and share functionality. In the context of web browsers, many APIs exist, such as the File API (which allows websites to interact with a device or browser's file system), the Sound API (which allows web developers to interact with a device's audio input and output hardware), graphics APIs (which allow websites to interact with a device's graphics hardware in order to display complex graphics), and the Notifications API (which allows websites to interact with the device's notification manager to send custom notifications triggered by the website or its server). APIs are not inherently supported on a website, browser, or even device. Rather, the website must require access to or include the API library in its code, the browser must implement this functionality, and the device must have the functionality needed (if the API in question requires device access).

One of the major issues concerning API security revolves around improper communications and insecure requests made through APIs, especially those that give the website external access, such as the ability to communicate with other websites or the user's device. In fact, unintended bugs in any piece of code could have adverse effects on other associated processes. In order to protect against these vulnerabilities, modern web browsers, such as those built on top of Chromium, use a method known as "sandboxing", where any insecure processes, including websites and API implementations, run in isolated environments known as sandboxes, which cause no harm to other processes if they are damaged or compromised. To interact with anything outside of its own sandbox, a process must use the functionality implemented and granted by the browser's inter-process communication (IPC) system. Although this is perfect in theory, it is not sufficient in developing a secure browser, especially as many browsers knowingly put multiple processes into the same sandboxes in order to meet system performance requirements.

Protocols

The Hypertext Transport Protocol (HTTP) is a protocol used by websites and other programs to communicate through the internet. Modern websites use HTTPS, which is simply an extension of HTTP that incorporates encryption through Transport Layer Security (TLS) in order to make HTTP requests more secure.

Requests

Requests are messages sent (usually) by user-end applications to some external endpoint, such as a server. There are multiple types of requests, known as methods or verbs.

  • GET - Usually used to retrieve information from the endpoint, such as data in a database
  • POST - Usually used to send information to the endpoint (more specifically, POST is usually used to create new entries of data)
  • PUT - Usually used to send updated information to the endpoint (more specifically, PUT is usually used to replace an existing entry of data)
  • PATCH - Usually used to send modifications to the endpoint (more specifically, PATCH is usually used to modify portions of an existing entry of data)
  • DELETE - Usually used to request that a specific piece of data be deleted from the endpoint

Resources

Cybersecurity Rules - Southern California Trial Event (9.30.2020)
USACO
PicoCTF
CyberPatriot
A Graduate Course in Applied Cryptography
Understanding Cryptography