Disease Detectives

From Wiki - Scioly.org

Disease Detectives is an event in Division B and Division C that focuses on epidemiology, the study of the distribution and determinants of health conditions in specified populations and the application of that study to control those health problems.

Focus Topics

Disease Detectives used to rotate between topics every two years. The topic typically had a minor impact on the content of the event, usually only affecting the specific types of diseases used in problems. Now, however, Disease Detectives includes diseases from all topics. Therefore, there is currently no specific topic focus.

Year Topic
2018 Food Borne Illness
2017
2016 Population Growth
2015
2014 Environmental Quality
2013
2012 Food Borne Illness
2011
2010 Population Growth
2009
2008 Environmental Quality
2007

Epidemiology

This event is all about epidemiology, but what is epidemiology all about? You can find wordier definitions elsewhere, but at its core, it’s the study of what causes disease, how disease is distributed, and how we can control the spread of disease.


It’s often useful to classify epidemiology according to several systems:

Classical vs clinical

Classical epidemiology is population oriented and studies the community origins of health problems related to nutrition, environment, and human behavior.
Clinical epidemiology studies patients in health care settings in order to improve diagnosis, treatment, and prognosis.
Infectious Disease Epidemiology - heavily dependent on laboratory support
Chronic Disease Epidemiology - dependent on complex sampling and statistical methods

Descriptive vs analytic

Descriptive epidemiology considers time/place/person data on a disease to determine trends. Answers questions like who/what/where/when.
Analytic epidemiology involves hypothesis testing to determine the cause of disease (causal relations). Answers questions like why/how.


There are many important epidemiology terms you should know for this event:

Basic Epidemiology Terms

Term Definition
Disease Infection that results in signs (objective) and symptoms (subjective).
Opportunistic disease A disease that causes sickness when given the opportunity of a damaged or weakened immune system.

Nosocomial disease An infection that is acquired in a hospital.
Iatrogenic disease An illness that is caused by medication or a physician.
Chronic infection An infection where the agent is continuously present and detectable.
Latent infection An infection where the agent is continuously present, but can remain dormant before reactivation.

Incubation period Time in between when a person comes into contact with an agent of disease and when they first show symptoms or signs of disease.

Latent period Time in between when a person is infected with a pathogen and when they become infectious (able to transmit it to others).

Asymptomatic Displays no signs or symptoms, but is infected and can carry the disease
Susceptibility To what extent a member of a population is able to resist infection
Susceptible individual A member of a population at risk of becoming infected by a disease.
Pathogenicity The property of causing disease following infection.
Virulence The property of causing *severe* disease.
Infectivity The property of establishing infection following exposure.
Morbidity The rate of disease in a population.
Mortality The rate of death in a population.
Case fatality rate The rate of death due to a disease in the diseased population.
Prevalence The number of existing cases of disease in a given population.
Point prevalence The number of existing cases of disease in a given population at a given point in time.

Period prevalence The total number of cases of disease in a given population over a period of time.

Incidence The rate of new cases of disease in a given population over a period of time.

Attack rate The number of people infected, divided by the total sample. There should be a high attack rate in those exposed and a low attack rate in those unexposed.

Person-time The sum of the time during which each individual in a population was at risk for a disease.

Index case Also known as “patient zero”; the first case of a disease in a specific setting.
Etiology The cause of a disease.
Pathology The science of the study and diagnosis of disease and injury.
Determinant Any factor that brings about change in a health condition.
Herd immunity A critical proportion of a population is immune to a disease such that the entire population is protected.

Fulminant A sudden and severe onset.
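Quarantine Separating and restricting the movement of people who may have been exposed to a disease but are not yet ill, to see whether they become sick.
Isolation Separating people who are known to be ill or infected from those who are not.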
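
Several of the terms above are simple calculations. The following is a minimal sketch in Python, not an official formula sheet: the function names and all of the numbers are made up for illustration, and prevalence is expressed as a proportion of the population (an assumption; the table above defines it as a count of existing cases).

```python
def prevalence(existing_cases, population):
    """Existing cases as a proportion of the population at a point in time."""
    return existing_cases / population


def person_time(times_at_risk):
    """Sum of the time each individual was at risk (e.g. in person-years)."""
    return sum(times_at_risk)


def incidence_rate(new_cases, total_person_time):
    """New cases per unit of person-time at risk."""
    return new_cases / total_person_time


def attack_rate(ill, total_in_group):
    """Number of people who became ill divided by the total in that group."""
    return ill / total_in_group


def case_fatality_rate(deaths, cases):
    """Deaths due to the disease divided by the number of people with it."""
    return deaths / cases


# Hypothetical numbers, chosen only to show the arithmetic.
print(f"Point prevalence: {prevalence(50, 2000):.1%}")
years_at_risk = [1.0, 0.5, 2.0, 1.5]  # four people followed for different times
print(f"Person-time: {person_time(years_at_risk)} person-years")
print(f"Incidence rate: {incidence_rate(2, person_time(years_at_risk)):.2f} cases per person-year")
print(f"Attack rate: {attack_rate(30, 50):.0%}")
print(f"Case fatality rate: {case_fatality_rate(3, 50):.0%}")
```

Note that prevalence counts existing cases, while incidence counts only new cases and divides by the time at risk.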

Outbreak investigation

The Steps of Outbreak Investigation

Depending on the source, you will find 10 or 13 steps of outbreak investigation. The following are the 13 steps:

  1. Prepare for field work
    • Research the disease or situation and gather needed supplies and equipment to conduct the investigation.
    • Make official administrative and personal travel arrangements.
    • Follow protocol and contact all parties to determine roles and local contacts.
  2. Establish the existence of an outbreak
    • Consider severity, potential for spread, public concern, and availability of resources.
  3. Verify the diagnosis
    • Verify the procedures used to diagnose the problem and check methods used for identifying infectious and toxic chemical agents.
    • Be sure that the increased number of cases is not due to experimental error.
    • Interview several persons who became ill to gain insight concerning possible cause, source, and spread of disease or problem.
    • Need to screen ill persons, collect clinical and environmental samples and get them tested in order to determine agent.
  4. Construct a working case definition
    • Establish standard criteria for determining who has the disease or condition, using these 4 components:
      • Clinical information about the disease or condition
      • Characteristics of the affected people
      • Location or place
      • Time sequence
    • Identification of specific cases
      • Confirmed cases: lab confirmation combined with signs and symptoms.
      • Probable cases: signs and symptoms but no lab confirmation.
      • Possible cases: some signs and symptoms, but unclear.
    • Line listing
      • A chart of cases which includes: identifying information, clinical information, time, person, place, and risk factors.
  5. Find cases systematically and record information
    • For each case, the following information should be collected: identifying information, demographic information, clinical information, risk factor information, and reporter information.
  6. Perform descriptive epidemiology
    • Consider the time, place, and person of an outbreak. This can involve epi curves (time), spot maps (place), and case information (person).
  7. Develop hypotheses
    • Use the agent/host/environment triad to create a hypothesis.
  8. Evaluate hypotheses epidemiologically
    • Perform studies (case-control or cohort) to validate a hypothesis.
  9. Reconsider, refine, and re-evaluate hypotheses
    • Use experimental data to narrow the hypothesis.
  10. Compare and reconcile with laboratory and/or environmental studies
    • Laboratory evidence is necessary to confirm a hypothesis.
  11. Implement control and prevention measures
    • Should be performed as soon as possible (when the source is known).
    • The goal of control and prevention measures is to disrupt the chain of transmission.
  12. Initiate or maintain surveillance
    • Evaluate the success of control and prevention measures.
  13. Communicate findings
    • Can include an oral briefing, written report, PSA, etc.
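
Steps 4 and 5 describe a line listing and three levels of case classification. As a rough illustration only, a line listing can be thought of as a list of records, one per case. In the sketch below the field names, the "full"/"some"/"none" symptom coding, and every record are hypothetical; a real case definition would spell out the specific clinical, person, place, and time criteria.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LineListEntry:
    case_id: str         # identifying information
    age: int             # person
    place: str           # place
    onset: date          # time
    symptoms: str        # clinical information: "full", "some", or "none" (assumed coding)
    lab_confirmed: bool  # clinical information
    risk_factor: str     # suspected exposure


def classify(entry):
    """Apply the confirmed / probable / possible categories from step 4."""
    if entry.symptoms == "full" and entry.lab_confirmed:
        return "confirmed"
    if entry.symptoms == "full":
        return "probable"
    if entry.symptoms == "some":
        return "possible"
    return "not a case"


# Hypothetical line listing.
line_listing = [
    LineListEntry("A01", 34, "Cafeteria", date(2024, 5, 2), "full", True, "chicken salad"),
    LineListEntry("A02", 19, "Cafeteria", date(2024, 5, 2), "full", False, "chicken salad"),
    LineListEntry("A03", 52, "Office", date(2024, 5, 3), "some", False, "unknown"),
]

for entry in line_listing:
    print(entry.case_id, entry.onset.isoformat(), entry.place, classify(entry))
```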

PulseNet and PFGE

PulseNet is a network of labs across the US that helps epidemiologists identify new foodborne disease outbreaks, automatically reporting data in real time to the CDC. PulseNet relies on two key premises: early detection/reporting, and hyper-specific DNA fingerprinting. Because of its rapidness and precision, PulseNet is able to facilitate implementation of specific control and prevention measures, stopping the spread of outbreaks.


The specific steps of PulseNet surveillance are key to its rapidness. You should know the sequence of these steps and what they are, though no test should be asking you about the “5th step of PulseNet”. The following is directly from the CDC website and lightly edited:

  1. A person falls ill and visits a doctor.
  2. The doctor suspects a foodborne illness and asks for a sample (usually a stool sample), which they send to a laboratory at a clinic or hospital.
  3. The clinical laboratory processes the sample to isolate the bacteria that is making the patient ill—for example, Salmonella.
  4. The clinical laboratory notifies the doctor, who will tell the patient and discuss treatment options. The clinical laboratory also sends the sample to a local or state public health laboratory.
  5. The public health laboratory determines what kind of Salmonella it is.
  6. The public health laboratory produces a DNA fingerprint of the bacteria, traditionally using a process called pulsed-field gel electrophoresis (PFGE), though now using whole-genome sequencing (WGS) to get the unique pattern.
  7. The public health laboratory uploads the pattern into an electronic database in its laboratory and also to the national databases at CDC.
  8. Microbiologists and epidemiologists review the laboratory reports to decide whether something unusual or unexpected is occurring that warrants further investigation.
  9. Epidemiologists interview the patients. These interviews may gather simple facts of their illness, or they may ask about everything they ate before they got sick, where they have been, and other questions that might provide a clue as to what caused their illness.
  10. The microbiologists continuously search the databases for identical patterns. These groups of matching patterns, called clusters, spur investigations by local, state, and national agencies to identify the source of the outbreak.


The original DNA fingerprinting technique used by PulseNet was PFGE (pulsed-field gel electrophoresis). All DNA gel electrophoresis techniques rely on the interaction of negatively charged DNA molecules with an electric field. In PFGE, the electric field alternates direction and the DNA segments it is applied to are large, on the order of 0-10 Mb long (traditional gel electrophoresis separates DNA segments 0-20 kb in length). PFGE gels look like the figure below:

An example PFGE gel. Image credit: CDC


The ladder—a sample of DNA segments of known length—is marked as λ in the figure above. Notice how lanes 1-22 and 24-25 have identical patterns, but lanes 23 and R6 are different. You can conclude that individuals 1-22 and 24 and 25 are infected with the same serotype of the infectious agent—the bacteria have identical genotypes.
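
The idea of matching patterns into clusters can be sketched in a few lines of Python. This is a simplified illustration, not how PulseNet software works: each pattern is represented as a list of band sizes (an assumed representation with invented values), and only exactly identical patterns are grouped, whereas real gel analysis has to allow for small measurement differences.

```python
from collections import defaultdict


def find_clusters(patterns, min_size=2):
    """Group isolate IDs that share an identical band pattern."""
    groups = defaultdict(list)
    for isolate_id, bands in patterns.items():
        groups[tuple(bands)].append(isolate_id)
    # Only groups with at least min_size matching isolates count as clusters.
    return [ids for ids in groups.values() if len(ids) >= min_size]


# Hypothetical band sizes (in kb) for four lanes of a gel.
patterns = {
    "lane_1": [668, 452, 310, 105],
    "lane_2": [668, 452, 310, 105],
    "lane_3": [700, 452, 280, 105],
    "lane_4": [668, 452, 310, 105],
}

print(find_clusters(patterns))
```

Here lanes 1, 2, and 4 form one cluster, while lane 3 has a different pattern and is left out.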


In a PulseNet investigation, a computer program would be used to analyze these gels quickly and accurately. While PFGE is generally highly effective at distinguishing between different serotypes of an infectious agent, Whole Genome Sequencing (WGS) is even more precise and is now the PulseNet standard.


WGS relies on Next Generation Sequencing (NGS) techniques to generate a DNA fingerprint that is literally the *whole* genome of the agent, enabling almost base pair level precision in distinguishing between different serotypes. Although there have been massive improvements to the accuracy of NGS techniques in recent years, they still aren’t perfect, and lag behind the accuracy of more traditional sequencing techniques (i.e. Sanger sequencing). Overall, however, WGS can distinguish between different serotypes both more accurately and more precisely than PFGE.

Surveillance

Purpose of Surveillance

The purpose of surveillance is to gain knowledge of patterns of disease, injury, or other health problems in a community for prevention and control purposes. Surveillance is necessary to influence public health decisions and evaluate control measures.

Five Step Process for Surveillance

  1. Identify, define, and measure the health problem of interest
  2. Collect and compile data about the problem (and if possible, factors that influence it)
  3. Analyze and interpret these data
  4. Provide these data and their interpretation to those responsible for controlling the health problem
  5. Monitor and periodically evaluate the usefulness and quality of surveillance to improve it for future use (surveillance of a problem often does not include actions to control the problem).

https://www.cdc.gov/csels/dsepd/ss1978/lesson5/index.html

An alternate version of the five steps is often used as well. Either may appear in events; if unsure, go with the one above, as many tests do not include action as part of surveillance.

  1. Data Collection - reports, electronic and vital records, registries, and surveys.
  2. Data Analysis - ideally analyzed by location to find illness’s location so resources are sent there.
  3. Data Interpretation - identifying person, place, time to find how and why health event happened.
  4. Data Dissemination (Distribution) - announcements, reports, articles and media → to important people and public.
  5. Link to Action - without action, no real purpose (this version includes taking action).

https://www.soinc.org/sites/default/files/uploaded_files/20_DD_HANDOUT_PART_1_0.pdf

Types of Surveillance

  • Passive - Diseases are reported by healthcare providers. This type of surveillance, though simple and inexpensive, is often limited by incomplete reporting and quality variation in reporting.
  • Active - Health agencies contact healthcare providers seeking reports. This ensures more complete reporting of conditions. Active surveillance is often used with a specific epidemiological investigation or during an outbreak.
  • Syndromic - Signs of the disease (such as school absences or prescription drug sales) are monitored as a proxy for the disease itself. The symptom must be infrequent and severe enough to warrant investigation of each identified case, and must be unique. This form of surveillance is often used when timeliness is key, diagnosis is difficult or time-consuming, or when detecting and defining the scope of an outbreak.
  • Sentinel - Professionals selected to represent a specific geographic area or group report health events to health agencies. This is used when high-quality data can't be obtained through passive surveillance. It involves monitoring trends or key health indicators through a limited network of reporting sites. Advantages include being able to implement intervention earlier and not being as reliant on doctors to diagnose disease. One downside to sentinel surveillance is that it's not as effective for detecting rare diseases or diseases that occur outside the catchment areas of the sentinel sites.

How to prove x caused y, or Causation

Hill's Criteria for Causation

Nine criteria must be met to establish a cause-and-effect relationship. This is commonly known as Hill's Criteria for Causation:

  1. Strength of Association - relationship is clear and risk estimate is high
  2. Consistency - observation of association must be repeatable in different populations at different times
  3. Specificity - a single cause produces a specific effect
  4. Alternative Explanations - consideration of multiple hypotheses before making conclusions about whether an association is causal or not
  5. Temporality - cause/exposure must precede the effect/outcome
  6. Dose-Response Relationship - an increasing amount of exposure increases the risk
  7. Biological Plausibility - the association agrees with currently accepted understanding of biological and pathological processes
  8. Experimental Evidence - the condition can be altered, either prevented or accelerated, by an appropriate experimental process
  9. Coherence - the association should be compatible with existing theory and knowledge, including knowledge of past cases and epidemiological studies

Hill's Criteria for Causation Explanations and History

Koch's Postulates

These are criteria designed to establish a causal relationship between a causative microbe and a disease.

  1. The microbe must be present in abundance in all cases of the disease, but not in healthy organisms.
  2. The microbe must be isolated from the diseased organism and grown in pure culture.
  3. The cultured microorganism should cause disease when introduced into a healthy organism.
  4. The microbe must be reisolated from the inoculated, diseased experimental host and identified as identical to the original specific causative agent.

Koch's Postulates have limitations because many pathogens do not fulfill all of the criteria. For example, viruses can't be grown in pure culture, a pathogen can cause multiple diseases, a disease can be caused by multiple pathogens, asymptomatic carriers can exist, and exposure does not always result in infection.

Evans' Postulates

  1. The prevalence of the disease should be significantly higher in those exposed to the risk factor than those not.
  2. Exposure to the risk factor should be more frequent among those with the disease.
  3. In prospective studies, the incidence of the disease should be higher in those exposed to the risk factor.
  4. The disease should follow exposure to the risk factor with a normal or log-normal distribution of incubation periods.
  5. A spectrum of host responses along a logical biological gradient from mild to severe should follow exposure to the risk factor.
  6. A measurable host response should follow exposure to the risk factor in those lacking a response before the exposure or increase the response in those with a response before exposure. A host response should be infrequent in those not exposed to the risk factor.
  7. In experiments, the disease should occur more frequently in those exposed to the risk factor than in the control group.
  8. Reduction or elimination of the risk factor should reduce the risk of disease.
  9. Modifying or preventing host response should eliminate or decrease disease.
  10. All findings should make biological and epidemiological sense.

Types of Carriers/Vectors

Convalescent - Humans are also capable of spreading disease following a period of illness, typically thinking themselves cured of the disease
Incubatory - When an individual transmits pathogens immediately following infection but prior to developing symptoms
Chronic - Someone who can transmit a disease for a long period of time
Genetic - has inherited a disease trait but shows no symptoms
Transient/Temporary - Someone who can transmit an infectious disease for a short amount of time

Epidemiological Triads

Epidemiologists use two triads. The first is the foundation for descriptive epidemiology - person, place and time. The second is described in the next section.

Chain of Transmission Triad

This is another common triad, which is an altered form of the Chain of Infection described below. It is a companion to the Epidemiological Triad. It also has three components:

  1. An external agent
  2. A susceptible host for the disease
  3. The environment where the host comes into contact with the agent

This is used to define the major points of a disease case.

Epidemiological Studies

Basic Studies

Ecological - looks for differences between groups of people with a shared characteristic rather than individuals
Cross Sectional - a survey, health questionnaire, "snapshot in time"
Case-Control - compare people with and without disease to find common exposures
Cohort - compare people with and without exposures to see what happens to each. Can be prospective or retrospective.
Randomized Controlled Trial - human experiment that randomly assigns participants to an experimental or control group
Quasi Experiments - share similarities with traditional experimental design or RCT, but lack the element of random assignment to treatment/control; participants are assigned to a group based on non-random criteria

Advantages and Disadvantages to Study Designs

Trial
  • Advantages: Most Scientifically Sound; Best Measure of Exposure
  • Disadvantages: Time Consuming; Unethical for Harmful Exposures; Most Expensive

Cohort Study
  • Advantages: Most Accurate Observational Study; Good Measure of Exposure; Correct Time Sequence; Good for Rare Exposures; Easy Risk Calculation
  • Disadvantages: Time Consuming; Expensive; Bad for Rare Diseases; Possible Loss of Follow-up

Case-Control Study
  • Advantages: Good for Rare Diseases; Relatively Less Expensive and Relatively Fast; Good for Long Latency Periods
  • Disadvantages: Possible Time-Order Confusion; Error in Recalling Exposure; Only 1 Outcome; Hard to Find Good Controls

Cross-Sectional Study
  • Advantages: Fastest; Least Expensive; Good for More Than 1 Outcome
  • Disadvantages: Possible Time-Order Confusion; Least Confidence in Findings

2x2 Table

A table which has two columns and rows for people with or without exposure and with or without disease; shows the number of people with each characteristic.

Disease No Disease
Exposure a b
No Exposure c d

Using the 2x2 table, we can calculate the odds ratio and relative risk. These calculations allow comparisons between cases (people with the disease) and controls (people with very similar characteristics to the cases but without the disease), or between exposed and unexposed groups. One is the neutral value and means that there is no difference between the groups compared. A value greater than one means the exposure is associated with more disease, and a value less than one means it is associated with less disease; whether the difference was caused by bias, chance, or an actual relationship between the exposure and outcome is yet to be seen. The P-value helps assess the role of chance: it is the probability of seeing a difference at least this large if there were truly no association. By convention, a finding is called statistically significant if the P-value is less than 0.05.

Odds Ratio - used in case-control study, [math]\displaystyle{ a \cdot d \over b \cdot c }[/math]

Relative Risk - used in cohort study, [math]\displaystyle{ a/(a+b) \over c/(c+d) }[/math]

Attack Rate - the rate that a group experienced an outcome or illness equal to the number sick divided by the total in that group. (There should be a high attack rate in those exposed and a low attack rate in those unexposed.)

For the exposed: [math]\displaystyle{ \frac{a}{a+b} }[/math]
For the unexposed: [math]\displaystyle{ \frac{c}{c+d} }[/math]
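
The formulas above translate directly into code. The following is a small sketch using the same a, b, c, d cell labels as the 2x2 table; the function names and the counts are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (case-control studies): (a * d) / (b * c)."""
    return (a * d) / (b * c)


def attack_rates(a, b, c, d):
    """Attack rate in the exposed and in the unexposed."""
    return a / (a + b), c / (c + d)


def relative_risk(a, b, c, d):
    """Relative risk (cohort studies): exposed attack rate over unexposed attack rate."""
    exposed, unexposed = attack_rates(a, b, c, d)
    return exposed / unexposed


# Hypothetical outbreak: 50 people exposed (30 ill), 50 unexposed (10 ill).
a, b, c, d = 30, 20, 10, 40

exposed, unexposed = attack_rates(a, b, c, d)
print(f"Attack rate, exposed:   {exposed:.0%}")
print(f"Attack rate, unexposed: {unexposed:.0%}")
print(f"Relative risk: {relative_risk(a, b, c, d):.2f}")
print(f"Odds ratio:    {odds_ratio(a, b, c, d):.2f}")
```

With these made-up counts, both measures are well above the neutral value of one, and the odds ratio is larger than the relative risk, which is typical when the disease is common in the study group.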

Chi-Square - used to determine the statistical significance of the difference indicated by the relative risk or odds ratio. Chi-Square compares your observed values (a, b, c, and d) with the expected values for those same groups.


The expected value for a group is calculated by multiplying its row total by its column total, and then dividing by the overall total. Take group a (disease and exposed) for example: [math]\displaystyle{ \frac{(a+b)(a+c)}{a+b+c+d} }[/math]. After obtaining the expected values, you can use the equation [math]\displaystyle{ \sum{\frac{(observed-expected)^2}{expected}} }[/math] to find the Chi-Square value. To determine the significance of this number, you must find the P-value using a table (a 2x2 table has one degree of freedom). The P-value is the probability of getting results at least this extreme by chance alone if there were no real association; for example, a P-value of 0.01 means that random fluctuations alone would produce a difference this large only 1% of the time. The alpha number is a predetermined cutoff for the P-value, usually 0.05 (5%). If the P-value is less than alpha, the data is significant.
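
As a worked sketch of the Chi-Square calculation, the code below computes each expected value as (row total × column total) / overall total and then sums (observed − expected)² / expected over the four cells. Instead of a printed table, it finds the P-value with the closed form for one degree of freedom, erfc(√(χ²/2)), which applies only to a 2x2 table. No continuity correction is applied, and the counts are hypothetical.

```python
import math


def expected_counts(a, b, c, d):
    """Expected value for each cell: row total * column total / overall total."""
    n = a + b + c + d
    return (
        (a + b) * (a + c) / n,  # exposed, disease
        (a + b) * (b + d) / n,  # exposed, no disease
        (c + d) * (a + c) / n,  # unexposed, disease
        (c + d) * (b + d) / n,  # unexposed, no disease
    )


def chi_square(a, b, c, d):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    observed = (a, b, c, d)
    expected = expected_counts(a, b, c, d)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical 2x2 table: a=30, b=20, c=10, d=40.
chi2 = chi_square(30, 20, 10, 40)
p = p_value_1df(chi2)
alpha = 0.05
print(f"Chi-Square = {chi2:.2f}, P-value = {p:.5f}")
print("significant" if p < alpha else "not significant")
```

A Chi-Square value of about 3.84 corresponds to a P-value of 0.05 at one degree of freedom, so any larger value is significant at the usual cutoff.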

Using Epi-Curves

An epi-curve is a histogram that shows the course of an outbreak by plotting the number of cases of a condition according to the time of onset. Epi-Curves fall into three classifications:

Point source epidemics occur when people are exposed to the same exposure over a limited, well defined period of time. The shape of the curve commonly rises rapidly and contains a definite peak, followed by a gradual decline.



Continuous common source epidemics occur when the exposure to the source is prolonged over an extended period of time and may occur over more than one incubation period. The down slope of the curve may be very sharp if the common source is removed or gradual if the outbreak is allowed to exhaust itself.



Propagated (progressive source) epidemics occur when a case of disease serves later as a source of infection for subsequent cases and those subsequent cases, in turn, serve as sources for later cases. The shape of this curve usually contains a series of successively larger peaks, reflective of the increasing number of cases caused by person-to-person contact, until the pool of those susceptible is exhausted or control measures are implemented. The distance between these peaks may be a rough indication of the incubation period of the disease. As the outbreak progresses, the peaks flatten out (think of the variance around a mean over multiple generations).



One more type of epi curve that may come up is that of an intermittent source epidemic, where people are intermittently exposed to a source. There are generally multiple peaks in this type of curve. An example of this may be spoiled food at a store giving people an infection, where there are intermittent cases based on when people are exposed to this source.
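
Since an epi curve is just a histogram of cases by time of onset, one can be built from a list of onset dates. The sketch below uses made-up dates and a simple text rendering; the function names are hypothetical, and days with no cases are kept so that gaps stay visible.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onset_dates):
    """Return (day, case count) pairs for every day from first to last onset."""
    counts = Counter(onset_dates)
    day, last = min(counts), max(counts)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))  # 0 for days without cases
        day += timedelta(days=1)
    return curve


def render(curve):
    """Draw one row per day with one '#' per case."""
    return "\n".join(f"{day.isoformat()} {'#' * n}" for day, n in curve)


# Hypothetical onset dates for nine cases.
onsets = [
    date(2024, 5, 1),
    date(2024, 5, 2), date(2024, 5, 2), date(2024, 5, 2), date(2024, 5, 2),
    date(2024, 5, 3), date(2024, 5, 3),
    date(2024, 5, 5), date(2024, 5, 5),
]

print(render(epi_curve(onsets)))
```

Reading the shape of the resulting histogram (a single sharp peak, a plateau, or successive peaks) is what distinguishes the curve types described above.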

Validity of Study Results: Error and Bias

Statistical bias, in the mathematical field of statistics, is a systematic tendency in which the methods used to gather data and generate statistics present an inaccurate, skewed or biased depiction of reality.

There are a total of 9 different categories of biases, and they can be found below. There are multiple sub biases in each category.

  1. Selection Biases: Pertaining to how the participants are chosen or retained in a study. This affects the accuracy and applicability of the data.
  2. Performance Biases: Researchers or participants alter their behavior, changing the accuracy of the data.
  3. Measurement Biases: Biases stemming from error in data collection.
  4. Information Biases: Biases stemming from errors in accuracy or lack of completeness in data.
  5. Cognitive Biases: Biases stemming from subjective judgment and irrational decision-making.
  6. Reporting Biases: Biases related to how study findings are presented or disseminated.
  7. Systematic Biases: General errors affecting the entire study framework.
  8. Temporal Biases: Biases related to time in study design.
  9. Miscellaneous Biases: Noncategorized, other biases. Include the Accumulation Effect, Panel Effect, Golem Effect, etc.

There are other factors that impact the validity, accuracy, and applicability of a study. They include, but are not limited to, the following:

  • Confounding: A third variable that is related to both the independent and dependent variables, distorting the true relationship between them.
  • Equipment Malfunction: This can be tied to instrument biases.

Error is defined as the difference between a value obtained from a data collection process and the 'true' value for the population.

Validity

External Validity: The extent to which the results of a study can be generalized to, or applied in, settings outside the study, such as other populations, environments, or times.

Internal Validity: The degree to which the results of a study are due to the intervention or treatment itself, rather than other confounding factors or biases.

Disease and Disease Transmission

Natural History and Spectrum of Disease


The process begins with the appropriate exposure to or accumulation of factors sufficient for the disease process to begin in a susceptible host. For an infectious disease, the exposure is a microorganism. For cancer, the exposure may be a factor that initiates the process, such as asbestos fibers or components in tobacco smoke (for lung cancer), or one that promotes the process, such as estrogen (for endometrial cancer).

After the disease process has been triggered, pathological changes then occur without the individual being aware of them. This stage of subclinical disease, extending from the time of exposure to onset of disease symptoms, is usually called the incubation period for infectious diseases, and the latency period for chronic diseases. During this stage, disease is said to be asymptomatic (no symptoms) or inapparent. This period may be as brief as seconds for hypersensitivity and toxic reactions to as long as decades for certain chronic diseases. Even for a single disease, the characteristic incubation period has a range. For example, the typical incubation period for hepatitis A is as long as 7 weeks. The latency period for leukemia to become evident among survivors of the atomic bomb blast in Hiroshima ranged from 2 to 12 years, peaking at 6–7 years.

The onset of symptoms marks the transition from subclinical to clinical disease. Most diagnoses are made during the stage of clinical disease. In some people, however, the disease process may never progress to clinically apparent illness. In others, the disease process may result in illness that ranges from mild to severe or fatal. This range is called the spectrum of disease. Ultimately, the disease process ends either in recovery, disability or death.

Chain of Infection

Agent leaves reservoir through portal of exit, and is conveyed by some mode of transmission, and enters the appropriate portal of entry to infect a susceptible host.


Agent - A microbial organism with the ability to cause disease.

Reservoir - A place where agents can thrive and reproduce.

Portal of Exit - A place of exit providing a way for an agent to leave the reservoir; the route a pathogen takes out of an infected host. Portals of exit tend to be fairly well defined, and are often not surprising once something is known of how and where a pathogen replicates and enters new hosts. Respiratory infections tend to utilize the mouth and nose as portals of exit. Gastrointestinal diseases tend to exit in feces or saliva, depending on the site of replication. Sexually transmitted diseases tend to have portals of exit at the urethra or genital region. Blood-borne diseases tend to exit via arthropods, needles, bleeding, or hypodermic syringes. A more general portal of exit occurs when an infected animal is butchered or an infected person undergoes surgery. The three most common portals of exit are the skin, gastrointestinal tract, and respiratory tract.

Mode of Transmission - Method of transfer by which the organism moves or is carried from one place to another; the transfer of disease-causing microorganisms from one environment to another, particularly from an external environment to a susceptible individual. There are three general categories of transmission: contact, vehicle, and vector.

Portal of Entry - An opening allowing the microorganism to enter the host; the route a pathogen takes to enter a host. Just as with the portals of exit, many pathogens have preferred portals of entry. Many pathogens are not able to cause disease if their usual portal of entry is artificially bypassed. The most common portal of entry is the mucous membrane of the respiratory tract.

Susceptible Host - A person who cannot resist a microorganism invading the body, multiplying, and resulting in infection.

Chain of Infection: Diagram and Explanation

Characteristics of Agents

  1. Infectivity - capacity to cause infection in a susceptible host
  2. Pathogenicity - capacity to cause disease in a host
  3. Virulence - severity of disease that the agent causes to host

Modes of Disease Transmission

Contact Transmission - sub-categories include direct (person-to-person), indirect (fomite), or droplet.

  • Direct Contact - occurs through touching, kissing, dancing, etc. To prevent direct contact transmission, wear gloves, masks, etc.
  • Indirect Contact - occurs from a reservoir via inanimate objects called fomites. Fomites are basically almost anything an infected individual or reservoir can touch, upon which can be left a residue of contagious pathogen. Exceptions include the various inanimates referred to as vehicles: food, air, and liquids. Typically, it is more difficult to avoid indirect contact transmission than it is to avoid direct contact transmission. A certain degree of organismal durability may be necessary to survive passage on a fomite. The best way to prevent indirect contact transmission is by avoiding contact with fomites, avoiding contact of hands with mucous membranes, especially when handling or potentially handling fomites, the use of barriers when handling fomites, and disinfecting fomites before handling.
  • Droplet Transmission - consequence of being coughed, sneezed, or spit on. To be considered droplet transmission, mucous droplets must still be traveling with the velocity imparted to them on leaving the mouth. As a rule of thumb, this is up to one meter after exiting the mouth. Any further and this is considered airborne transmission. Because ordinary interaction brings people within one meter of each other, droplet transmission is more difficult to avoid than either direct or indirect contact transmission. Not surprisingly, respiratory diseases in particular are transmitted by droplets.

Vehicle Transmission - transmission via a medium such as food, air, and liquid, which are all routinely taken into the body, and thus serve as vehicles into the body.

  • Airborne Transmission - occurs via droplets (typically mucous droplets) that remain airborne, whether as aerosols (very small droplets) or attached to dust particles. An example is within airliners, where economizing measures reduce the turnover of cabin air and consequently increase air recycling. Organisms that can find their way into the air and remain viable thus have repeated opportunities to infect passengers. Airborne transmission requires greater organismal durability than droplet transmission, simply because the microorganism is exposed to the air for longer before infecting a new host. The organism must withstand desiccation, exposure to sunlight, etc. This is why breathing does not typically result in the acquisition of disease.
  • Food-borne Transmission - any pathogen that is found in food and not killed during processing may be transmitted via the food product. Salmonella in particular tends to be part of the normal flora of chickens and is consequently associated with chicken products.
  • Water-borne Transmission - fecal contaminated water. Generally, this is via sewage contaminated water supplies. Gastrointestinal pathogens in particular are present in feces and therefore rely on this type of transmission.

Vector Transmission - no entry.

  • Portals of Entry to the Nervous System - the brain is typically fairly resistant to bacterial infection. There are four common portals of entry to the nervous system. For an organism to take advantage of these routes, they must display increasingly specialized adaptations as read from first to last: parenteral, via the blood, via the lymphatic systems, and up the peripheral nerve axons. Ordering of blood and lymphatic system was arbitrary and not intended to imply that one serves as a significantly more difficult portal to take advantage of than the other.

The scheme used by the American Public Health Association and CDC (Principles of Epi, 3rd edition) is a bit different. The two main categories are Direct and Indirect Transmission (not Contact). Direct Transmission includes Direct Contact and Droplet Spread. Examples of direct contact include kissing, biting, and contact with soil containing infectious agents that penetrate the skin or enter wounds. Droplet spread is essentially an "in your face sneeze or cough". The idea is up close and immediate.

Indirect Transmission includes Airborne, Vehicles and Vectors. Airborne transmission involves dust or droplet nuclei (the latter are essentially little (<5 micron) particles that remain suspended in the air). Time and distance are both greater than for droplet spread (distance >6-8 ft). Vehicles include things like food, water, or fomites. Vehicles may passively carry pathogens or may promote growth or toxin production. Vectors are arthropods (e.g. mosquitoes, flies, lice, ticks) that spread infectious agents. If the agent multiplies or undergoes a change in life stage (as with malaria) within the vector, the vector is said to be a Biologic Vector. If the agent is simply carried from one place to another (think of a fly landing on feces and then a bowl of potato salad), it is a Mechanical Vector. Generally, vector-borne diseases are thought of only in the context of biologic vectors. Rabies from a dog bite would be direct contact, not a vector. Note that while terms like food-borne, waterborne and zoonotic are not really included in this system, they are still valid.

Disease Prevention

For prevention strategies relating to the yearly topics, please see Disease Detectives#Yearly Topics.

Primordial prevention - intervention at the very beginning to avoid the development of risk factors the population may be exposed to. Often deals with changing physical and social environments.
Primary prevention - early intervention to avoid initial exposure to agent of disease preventing the process from starting.
Secondary prevention - during the latent stage (when the disease has just begun), process of screening and instituting treatment may prevent progression to symptomatic disease.
Tertiary prevention - during the symptomatic stage (when the patient shows symptoms), intervention may arrest, slow, or reverse the progression of disease.
Quaternary prevention - set of health activities to mitigate or avoid consequences of unnecessary/excessive intervention of the health system. Social credit that legitimizes medical intervention may be damaged if doctors don't prevent unnecessary medical activity and its consequences.

In Disease Detectives scenarios, event supervisors will often ask you to brainstorm disease prevention methods. Even if you know very little about the disease, you can brainstorm ideas from the chain of infection for the disease. For example, if the chain of infection shows that a disease comes in contact with humans through sand at the beach and enters the body through any openings (mouth, nose, etc.), a prevention method could be putting up signs at beaches reminding the public to wash their hands before consuming any food.

Immunity

Active Immunity: Occurs when the person's own immune system responds to an antigen and produces lasting protection. There are two types of active immunity: infection (the person is exposed to a live pathogen, develops the disease, and becomes immune as a result of the primary immune response) or vaccination (the body is introduced to a killed, weakened, or partial form of the pathogen to gain immunity without developing the disease).
Passive Immunity: Short-term immunization by the injection of antibodies, such as gamma globulin, that are not produced by the recipient's cells. Naturally acquired passive immunity occurs during pregnancy, in which certain antibodies are passed from the maternal into the fetal bloodstream. There are two types of passive immunity: maternal (antibodies passed through the placenta during/before childbirth) or artificial (injection of monoclonal antibodies into the bloodstream through an IV tube).
Innate Immunity: The body's natural, nonspecific defense system that is present from birth. Examples include skin and stomach acid.
Different types of immunity.
Herd Immunity: Protecting a whole community from disease by immunizing a critical mass of its populace. Vaccination protects more than just the vaccinated person. By breaking the chain of an infection’s transmission, vaccination can also protect people who haven’t been immunized. But to work, this protection requires that a certain percentage of people in a community be vaccinated, and this number is called the herd immunity threshold. To calculate the herd immunity threshold, use the equation [math]\displaystyle{ 1-(1/R_0) }[/math]. [math]\displaystyle{ R_0 }[/math] is the basic reproductive number of the disease, or how many people get infected by a single infected individual on average in a fully susceptible population. However, this formula does not account for vaccine efficacy, as it assumes a 100% effective vaccine. A formula that accounts for vaccine efficacy is [math]\displaystyle{ (1-(1/R_0))/E }[/math], where E is the vaccine efficacy expressed as a proportion. If the test only gives the [math]\displaystyle{ R_0 }[/math], it is safe to use the first formula.
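
The two herd immunity formulas can be checked with a short Python sketch. The function name and the example values (R0 of 4, efficacy of 0.9) are hypothetical and chosen only for illustration; efficacy is assumed to be given as a proportion between 0 and 1.

```python
def herd_immunity_threshold(r0, efficacy=1.0):
    """Return (1 - 1/R0) / E, the fraction of people who must be vaccinated.

    efficacy=1.0 reproduces the simpler formula 1 - 1/R0.
    """
    if not 0 < efficacy <= 1:
        raise ValueError("efficacy must be a proportion in (0, 1]")
    if r0 <= 1:
        # Assumption of this sketch: with R0 <= 1 each case causes at most
        # one new case on average, so no vaccination threshold is needed.
        return 0.0
    return (1 - 1 / r0) / efficacy


# Hypothetical disease with R0 = 4
print(herd_immunity_threshold(4))       # 0.75, i.e. 75% with a perfect vaccine
print(herd_immunity_threshold(4, 0.9))  # about 0.833 with a 90% effective vaccine
```

A result above 1 would mean that vaccination alone cannot reach the threshold with that vaccine.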

Yearly Topics

Bear in mind that Disease Detectives no longer focuses on one topic; rather all topics are used in tests!

Foodborne Illness

Here are lists of foodborne illnesses, and other important information regarding foodborne illnesses: List. You should know that this year, specific diseases are not included in tests and competitions!

FAT TOM

FAT TOM is a mnemonic device used to describe the six favorable conditions required for the growth of foodborne pathogens. It is an acronym for food, acidity, time, temperature, oxygen, and moisture.

FATTOM
F - Food - There are sufficient nutrients available that promote the growth of microorganisms. Protein-rich foods, such as meat, milk, eggs, and fish, are the most susceptible.
A - Acidity - Foodborne pathogens require a slightly acidic pH level of 4.6-7.5, while they thrive in conditions with a pH of 6.6-7.5. The United States Food and Drug Administration's (FDA) regulations for acid/acidified foods require that the food be brought to pH 4.5 or below.
T - Time - Food should be removed from "the danger zone" (see Temperature) within two to four hours, either by cooling or heating. While most guidelines state two hours, a few indicate four hours is still safe.
T - Temperature - Food-borne pathogens grow best in temperatures between 41 and 135 °F (5 and 57 °C), a range referred to as the temperature danger zone (TDZ). They thrive in temperatures between 70 and 104 °F (21 and 40 °C).
O - Oxygen - Almost all foodborne pathogens are aerobic, that is, they require oxygen to grow. Some pathogens, such as Clostridium botulinum, the source of botulism, are anaerobic.
M - Moisture - Water is essential for the growth of foodborne pathogens. Water activity ([math]\displaystyle{ a_w }[/math]) is a measure of the water available for use and is measured on a scale of 0 to 1.0. Foodborne pathogens grow best in foods that have [math]\displaystyle{ a_w }[/math] between 0.95 and 1.0. FDA regulations for canned foods require [math]\displaystyle{ a_w }[/math] of 0.85 or below.

Prevention

For Food Borne Illnesses, prevention tactics include:

  • Cook meat, poultry, and eggs thoroughly.
  • Don't cross-contaminate one food with another.
  • Chill and refrigerate leftovers promptly.
  • Clean and wash all produce.
  • Report suspected foodborne illnesses to the local health department.

Environmental Quality

Environmental quality typically deals with biological, chemical, and physical causes of illness. This can include air, water or noise pollution; smoking; natural disasters (flooding, drought, etc.); effects of toxins and pesticides, and more. According to WHO, around 24% of all diseases are caused by detrimental environmental factors.

Air Pollution:

  • In air pollution, ground-level ozone plays a major role in causing many health problems, such as asthma, worsened lung function, and premature death. Ground-level ozone forms from methane emissions and other precursor chemicals, and wildfire emissions and air stagnation episodes can make the pollution even worse.
  • Particle pollution is the harmful spread of particles made of dust, dirt, or soot. It comes from primary and secondary sources. Primary sources are items/places that cause the pollution on their own, such as a wood stove. Secondary sources are items/places that let off gases that then form particles. Examples of this can be cars, factories, etc. Larger particles are called PM10, while finer particles are called PM2.5.

Population Growth

The population growth topic covers the health consequences of the growing world population and what larger organizations have done to help. It also covers relevant epidemics (like the flu) that have been linked to population growth.

1. Global Population Growth Trends
  • Slowing Growth Rates: While the global population continues to grow, the rate of growth has declined in many regions due to lower fertility rates and improved access to family planning.
  • Regional Variations: Population growth is concentrated in developing regions, particularly in sub-Saharan Africa, where access to healthcare and disease management can be challenging.
2. Effects of Population Growth on Disease Dynamics

Rapid urbanization has produced crowded areas, increasing the transmission rate of infectious diseases. Additionally, expansion into previously uninhabited areas has increased the risk of zoonotic diseases like Ebola and COVID-19.

Statistics

For Division C, statistics is a crucial part of the event (even though the rules specify that it should be less than 10% of the test material). Understanding statistics can be the difference between a good disease detective and an excellent disease detective. However, many disease detectives only make an effort to know the formulas that compute certain statistical measures without delving into the deeper (highly interesting!) meaning of statistics.

Basics

This is a crash course on the fundamentals of statistics. This is not a replacement for reading (and understanding) the SOINC guide on statistics in this event or better yet, taking a class or reading a textbook on statistics.

Populations and Samples

The population is the entire set under study, for example, all dung beetles when studying dung beetle length. Because it is impossible to measure the length of every single dung beetle on planet Earth, statisticians use sampling. They take a subset of the dung beetles, called a sample, and use measurements from the sample to make inferences about the population as a whole. A population parameter is a characteristic of a population; for example, suppose 84% of Philadelphians preferred chocolate ice cream over vanilla ice cream. A sample statistic is an attribute of a sample; for example, suppose we randomly sampled 10 Philadelphians and found that 70% preferred chocolate ice cream over vanilla ice cream.

Distribution Characteristics

Distributions are characterized by center, shape, and spread.

Central Tendency

A central tendency is a "typical" or "middle" value for a distribution.

Mean - Average of all of the values. [math]\displaystyle{ A=\dfrac{a_{1}+a_{2}+a_{3}+...+a_{n}}{n} }[/math] Means should not be used if the population is very skewed, as means are easily affected by extreme values.

Median - The middle value that separates the data into two halves. Medians are not as affected by extreme values, e.g. the mean number of arms per person in the world is less than 2, but the median is exactly 2.

Mode - The most frequently occurring value in the data set. Modes are useful for describing "peaks" in a distribution.

Shape

Skewness - Distributions that have a few extreme values on the higher side are skewed to the right. Distributions that have a few extreme values on the lower side are skewed to the left.

Peaks - If a distribution has no peaks, it is uniform. If it has one peak, it is unimodal. If it has two peaks, it is bimodal.

Normal distributions - A set of data that is unimodal, symmetrical, and continues off to infinity on both tails. Also known as a Gaussian distribution. In the normal distribution, the mean, median, and mode are all the same. Technically, the normal distribution is continuous and infinite but can be approximated with discrete values.

Variability

Variability, scatter, and spread all have the same meaning: the extent to which a set of data is dispersed.

Range - The difference between the largest and smallest values in a set. It is not very useful except to get a sense of the possible spread of a distribution.

Interquartile Range (IQR) - The difference between the 75th (third quartile, or [math]\displaystyle{ Q_{3} }[/math]) and 25th (first quartile or [math]\displaystyle{ Q_{1} }[/math]) percentiles of a data set. To find [math]\displaystyle{ Q_{1} }[/math] and [math]\displaystyle{ Q_{3} }[/math], find the median of the data set, then divide the data set into two new sets, one with the data from the median up to the maximum and the other with the data from the median down to the minimum. The median values of the two new sets are [math]\displaystyle{ Q_{1} }[/math] and [math]\displaystyle{ Q_{3} }[/math]. The IQR is used with the median and is the most robust measure of variability, i.e. outliers do not affect the IQR as much. [math]\displaystyle{ IQR=Q_{3}-Q_{1} }[/math]

Variance - Average of the squared differences from the mean. The variance gives a very vague sense of how far apart the values in a data set are compared to the mean. [math]\displaystyle{ Var(x)=\dfrac{\sum(x-\bar{x})^2}{n-1} }[/math]

Standard Deviation (SD) - The square root of the variance. Quantifies the spread in a data set in the same units as the original data. Standard deviation is, in a sense, the average distance away from the mean. A low SD indicates that the data tends to be close to the mean and a high SD indicates the data is far away from the mean. SD and variance are used with the mean. Unlike IQR, SD is not resistant to outliers.

[math]\displaystyle{ SD(x)=s=\sqrt{\dfrac{\sum(x-\bar{x})^2}{n-1}} }[/math]

68-95-99.7 Rule - This rule states that 68% of the values in a normally distributed data set fall within 1 SD of the mean, 95% fall within 2 SD of the mean, and 99.7% fall within 3 SD of the mean.

Example: Let a data set consist of integers 1 through 10, which sum to 55. The median and mean are 5.5.

To find the IQR, we can divide the data into two sets, one from 1 through 5 and the other from 6 through 10 inclusive. We find the median for each of these sets (3 and 8) and then subtract them. Thus, the IQR is 5.

To find the SD, we need to calculate the difference of each data value from the mean. Then we square the differences, add them, divide the sum by the sample size - 1 [math]\displaystyle{ (n=10, n-1=9) }[/math] and take the square root of the result.

[math]\displaystyle{ SD(x)=\sqrt{\dfrac{(1-5.5)^2+(2-5.5)^2+\cdots+(10-5.5)^2}{9}}=\sqrt{\dfrac{82.5}{9}}\approx 3.03 }[/math]

Note that this data set is uniform (each value occurs with the same frequency), so the 68-95-99.7 rule for normal distributions does not apply. If it did apply, 68% of the data would fall in the interval [math]\displaystyle{ (5.5-3.03, 5.5+3.03)=(2.47, 8.53) }[/math].
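
The worked example for the integers 1 through 10 can be reproduced with Python's standard statistics module. This is a minimal sketch: the helper quartiles_by_halves is a hypothetical name, and it assumes that for an odd number of values the median itself is left out of both halves (conventions differ between textbooks and calculators).

```python
import statistics


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Assumption: with an odd count, the middle value is excluded from both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return statistics.median(lower), statistics.median(upper)


data = list(range(1, 11))            # the integers 1 through 10
mean = statistics.mean(data)         # 5.5
median = statistics.median(data)     # 5.5
q1, q3 = quartiles_by_halves(data)   # 3 and 8
iqr = q3 - q1                        # 5
sd = statistics.stdev(data)          # divides by n - 1 = 9, about 3.03

print(mean, median, q1, q3, iqr, round(sd, 2))
```

Dividing the sum of squared differences (82.5) by n - 1 = 9 gives an SD of about 3.03; dividing by n = 10 instead (statistics.pstdev) would give about 2.87.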

Standard Error of the Mean (SEM) - The SEM measures the variability of the mean of different samples around the population mean.

[math]\displaystyle{ SE_{\bar{x}}=\dfrac{s}{\sqrt{n}} }[/math]

Therefore, as a general rule, the SEM decreases as sample size increases.
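
A small Python sketch of the SEM formula, using the integers 1 through 10 as the sample. The function names are hypothetical, and the second part uses a made-up sample SD of 10 only to show how the SEM shrinks as n grows.

```python
import math
import statistics


def standard_error(sample):
    """SEM = s / sqrt(n), with s the sample standard deviation."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sem_from_sd(s, n):
    """Same formula when only the sample SD and sample size are known."""
    return s / math.sqrt(n)


print(round(standard_error(list(range(1, 11))), 3))  # about 0.957

# Hypothetical sample SD of 10 at two different sample sizes
print(sem_from_sd(10, 25))    # 2.0
print(sem_from_sd(10, 100))   # 1.0
```

Quadrupling the sample size halves the SEM, because n sits under a square root.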

Correlation

When two variables are revealed to have a relationship using statistical measures, the variables have a correlation. This correlation can be positive, negative, or zero. Without doing an experiment or trial, it is impossible to conclude that one variable causes another variable to act in some way. There is always the possibility of a third lurking or confounding variable that the original data does not account for. In this case, wording is extremely important. Correlation [math]\displaystyle{ \neq }[/math] causation.

The correlation coefficient [math]\displaystyle{ r }[/math] is a measure of the scatter around a linear relationship. It does NOT apply when a relationship is non-linear. Because the correlation coefficient is difficult to calculate by hand, exam writers will typically give the value and ask for the interpretation of the [math]\displaystyle{ r }[/math] value. The correlation coefficient always satisfies [math]\displaystyle{ -1\le r\le 1 }[/math]. A value of 1 indicates a perfect positive linear relationship, and a value of -1 indicates a perfect negative linear relationship. Conversely, a value of 0 indicates no linear relationship. Typically, [math]\displaystyle{ 0.9\lt |r|\le 1 }[/math] is termed strong.
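
Although exams usually supply r, computing it once can make the interpretation clearer. The sketch below implements the standard Pearson formula by hand; the function name and the three tiny data sets are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfect positive line: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfect negative line: -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x squared: 0.0
```

The last case shows why r should only be used for linear relationships: y is completely determined by x, yet r is 0.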

Standardization

The standard score or z score rescales the standard deviation of a normally distributed data set to 1 and mean to 0. Thus, we can model all normally distributed data using a single normal distribution with mean 0 and SD 1.

[math]\displaystyle{ z=\dfrac{x-\mu}{\sigma}=\dfrac{x-\bar{x}}{s} }[/math]

The first formula is for a population while the second is for a sample. [math]\displaystyle{ \sigma }[/math] represents the population standard deviation while [math]\displaystyle{ \mu }[/math] represents the population mean.
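
As an illustration of the z score formula, the sketch below standardizes a single value and looks up the fraction of a normal distribution that lies below it. The numbers (a value of 130 from a population with mean 100 and SD 15) are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


z = z_score(130, 100, 15)             # 2.0
fraction_below = NormalDist().cdf(z)  # standard normal: mean 0, SD 1

print(z, round(fraction_below, 3))    # 2.0 and about 0.977
```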

Infant Mortality Rate

The infant mortality rate is the ratio of infant deaths (deaths before one year of age) to live births in the same period.

Rates in epidemiology are often expressed as a per-1000 or per-1 million, so if the infant mortality rate were 0.05, we could write that as 50 deaths per 1000 births.
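
The conversion to a per-1000 or per-1 million rate is a single multiplication. A minimal sketch, with a hypothetical function name and counts chosen to match the 0.05 example:

```python
def rate_per(events, population, per=1000):
    """Express events / population as a rate per `per` individuals."""
    return events * per / population


print(rate_per(50, 1000))                 # 50.0 deaths per 1000 births
print(rate_per(50, 1000, per=1_000_000))  # 50000.0 deaths per 1 million births
```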

Inference

Statistical inference is the process of inferring something about a population given a sample.

Confidence Intervals

Confidence intervals are used to estimate population attributes given statistics from a sample. However, confidence intervals do not take into account confounding or biases. The confidence level determines how wide the interval is. A common confidence level is 95%: "I am 95% certain that the interval captures the true population proportion/mean. This means that if the process used to obtain the interval were repeated many, many times, the interval generated would capture the true population proportion/mean 95% of the time."

Confidence Intervals for Proportions - Used to define a range of values within which a proportion may lie.

[math]\displaystyle{ \hat{p}\pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} }[/math]

A z table contains common values for z-star.
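
The proportion interval can be sketched in Python as follows. Instead of a z table, this sketch gets z-star from statistics.NormalDist; the function name and the example (70 of 100 people sampled prefer chocolate) are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    # z* leaves (1 - confidence) / 2 in each tail; about 1.96 for 95%
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(70, 100)
print(round(low, 3), round(high, 3))  # about 0.61 to 0.79
```

This interval relies on a normal approximation, so it is usually reserved for samples with a reasonable number of both successes and failures.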

Confidence Intervals of Means - Used to define a range of values within which a mean may lie.

[math]\displaystyle{ \bar{x}\pm t^{*}\frac{s}{\sqrt{n}} }[/math]

A t table contains common values for t-star. Note that you need the number of degrees of freedom (df) to find the t-star. Generally, [math]\displaystyle{ df=n-1 }[/math].
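
A similar sketch for the mean. Python's standard library has no t distribution, so this version assumes t-star has already been read from a t table and is passed in; the function name is hypothetical, and the sample is the integers 1 through 10 (df = 9, for which the 95% t-star is 2.262).

```python
import math
import statistics


def mean_ci(sample, t_star):
    """Confidence interval: x-bar +/- t* * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return x_bar - t_star * sem, x_bar + t_star * sem


# t* = 2.262 is the t-table value for 95% confidence with df = 10 - 1 = 9
low, high = mean_ci(list(range(1, 11)), t_star=2.262)
print(round(low, 2), round(high, 2))  # about 3.33 to 7.67
```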

Inference Tests

In an inference test, we use statistical inference to determine if a statement is likely or unlikely. We first create a null hypothesis ("the default"). For example, suppose that you were investigating whether drinking the punch at the party is associated with developing salmonellosis symptoms. The null hypothesis would be that drinking the punch is not associated with developing salmonellosis symptoms. The alternative hypothesis would be that drinking the punch is associated with developing salmonellosis symptoms. You would then look at your sample (people who were at the party and did/did not drink the punch and did/did not develop salmonellosis symptoms) and ask, "How likely is it that a result at least this extreme occurred by chance, i.e. if the null hypothesis were true?" This probability is called the p-value. Statisticians generally use a threshold of 0.05. If the p-value is below 0.05, the result is significant, and you reject the null hypothesis. Otherwise, you fail to reject the null hypothesis.

Error

A Type I error occurs if you reject [math]\displaystyle{ H_o }[/math] (the null hypothesis) when [math]\displaystyle{ H_o }[/math] is true. The probability of a Type I error is [math]\displaystyle{ \alpha }[/math], the significance level.

A Type II error occurs if you fail to reject [math]\displaystyle{ H_o }[/math] when [math]\displaystyle{ H_o }[/math] is false ([math]\displaystyle{ H_a }[/math] is true). The probability of a Type II error is represented by the letter [math]\displaystyle{ \beta }[/math].

The power of the test is the probability that the null hypothesis is rejected if [math]\displaystyle{ H_o }[/math] is false. The power of the test is equal to [math]\displaystyle{ 1-\beta }[/math].

Advanced

Sensitivity and Specificity

Sensitivity and specificity describe how well a diagnostic test correctly identifies people who do and do not have a specific disease.

                            Has disease    Has no disease
People who test positive    a              b
People who test negative    c              d

Sensitivity is the chance of testing positive if you do have the disease. The equation to use for sensitivity is: [math]\displaystyle{ a \over a+c }[/math]

Specificity is the chance of testing negative if you do not have the disease. The equation to use for specificity is: [math]\displaystyle{ d \over d+b }[/math]
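
Using the cell labels a, b, c, and d from the table, both measures are one-line calculations. The counts below (90, 30, 10, 870) are hypothetical.

```python
def sensitivity(a, c):
    """True positives / everyone who has the disease: a / (a + c)."""
    return a / (a + c)


def specificity(b, d):
    """True negatives / everyone who does not have the disease: d / (d + b)."""
    return d / (d + b)


# Hypothetical screening results: a = 90, b = 30, c = 10, d = 870
print(sensitivity(a=90, c=10))            # 0.9
print(round(specificity(b=30, d=870), 3))  # about 0.967
```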

Chi Square

A chi-square is a statistical measure used to determine the difference between an expected value and an observed value. In epidemiology, it can be used to compare information from different groups (e.g. age groups) to a local or national average. A chi-squared test checks if there's a significant relationship between two categorical variables or if observed data matches expected data. Create a table of observed values, calculate expected values, and compute [math]\displaystyle{ (O-E)^2/E }[/math] for each cell. Add these up for the chi-squared statistic. Compare it to a critical value based on degrees of freedom to decide if there's a significant difference. [math]\displaystyle{ \chi^2=\sum\frac{(O-E)^2}{E} }[/math]
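
The steps above can be sketched for a table of observed counts. In this sketch the expected count for each cell is taken as (row total × column total) / grand total, which is the usual choice when testing for association in a contingency table; the function name and the 2x2 counts are hypothetical.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical outbreak data: rows = drank punch / did not, columns = ill / not ill
observed = [[30, 10],
            [20, 40]]
chi2, df = chi_square_statistic(observed)
print(round(chi2, 2), df)  # about 16.67 with 1 degree of freedom
```

With 1 degree of freedom, the critical value at the 0.05 level is 3.841, so a statistic of about 16.67 would be considered significant.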

T-Test

A t-test compares the means of two groups to see if they're significantly different. For a single sample, it checks if the sample mean differs from a known value. Calculate the t-statistic by dividing the difference between means by the standard error. Compare this to a critical value or use a p-value to determine significance.
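
For the single-sample case, the t-statistic is the difference between the sample mean and the known value, divided by the standard error. A minimal sketch with a hypothetical function name, using the integers 1 through 10 as the sample and 4 as the made-up comparison value:

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for testing mean == mu0."""
    n = len(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / sem
    return t, n - 1


t, df = one_sample_t(list(range(1, 11)), mu0=4)
print(round(t, 2), df)  # about 1.57 with 9 degrees of freedom
```

The two-sided critical value from a t table for df = 9 at the 0.05 level is 2.262, so this difference would not be considered significant.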

Z-Test

A z-test is used when comparing sample and population means or proportions with a known population variance. Calculate the z-score by dividing the difference between the sample and population mean by the standard error. Compare the z-score to the standard normal distribution to check for significance.
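
A sketch of a one-sample z-test for a mean, assuming the population SD is known. The function name and all numbers (sample mean 103, population mean 100, population SD 15, n = 100) are hypothetical, and the p-value is two-sided.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) using the standard normal distribution."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = one_sample_z_test(sample_mean=103, pop_mean=100, pop_sd=15, n=100)
print(z, round(p, 4))  # 2.0 and about 0.0455
```

Because this p-value is below 0.05, the null hypothesis of no difference would be rejected at that threshold.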

Paired T-Test

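Used to compare the means of two related sets of measurements, such as measurements taken on the same subjects before and after an intervention.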

Fisher's Exact Test

Fisher's exact test searches for non-random associations between two categorical variables. It is typically used in place of a chi-square test when sample sizes are small.

McNemar's Test

The McNemar Test is similar to a Chi-Square, except that it uses matched paired data.

Mantel-Haenszel Test

The Cochran-Mantel-Haenszel Test aims to find the association between variables while controlling for confounding.

ANOVA

The analysis of variance test, or ANOVA, is a statistical test used to compare the means of two or more samples (most often three or more) by analyzing the variance within and between the groups.

Resources

Epidemiology Unmasked Textbook
Cancer Epidemiology Instruction Manual
Principles of Epidemiology in Public Health Practice, 3rd Edition (This is the textbook referred to in official 2023-2024 Science Olympiad Notes)
Disease Detectives Resources
Kpalm1111's 2014 SSSS Notes
elg4's 2015 SSSS Notes
Brs' 2017 SSSS Notes