Difference between revisions of "Disease Detectives"

From Wiki - Scioly.org
m (I added the formula to Infant Mortality)
(28 intermediate revisions by 9 users not shown)
Line 1: Line 1:
 
{{EventLinksBox
 
{{EventLinksBox
|active=Yes
+
|active=yes
 
|type=Life Science
 
|type=Life Science
 
|cat=Study
 
|cat=Study
 
|2009thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=18&t=483 2009]
 
|2009thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=18&t=483 2009]
|2009tests=[http://scioly.org/wiki/2009_Test_Exchange#Disease_Detectives 2009]
 
 
|2010thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=65&t=1270 2010]
 
|2010thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=65&t=1270 2010]
|2010tests=[http://scioly.org/wiki/2010_Test_Exchange#Disease_Detectives 2010]
 
 
|2011thread=[http://scioly.org/phpBB3/viewtopic.php?f=92&t=2212 2011]
 
|2011thread=[http://scioly.org/phpBB3/viewtopic.php?f=92&t=2212 2011]
|2011tests=[http://scioly.org/wiki/2011_Test_Exchange#Disease_Detectives 2011]
 
 
|2012thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=121&t=2950 2012]
 
|2012thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=121&t=2950 2012]
|2012tests=[http://scioly.org/wiki/2012_Test_Exchange#Disease_Detectives 2012]
 
 
|2013thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=144&t=3699 2013]
 
|2013thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=144&t=3699 2013]
|2013tests=2013
 
 
|2014thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=167&t=4963 2014]
 
|2014thread=[http://www.scioly.org/phpBB3/viewtopic.php?f=167&t=4963 2014]
|2014tests=2014
 
|2014questions=[http://www.scioly.org/phpBB3/viewtopic.php?f=173&t=5025 2014]
 
 
|2015thread=[http://scioly.org/phpBB3/viewtopic.php?f=187&t=5890 2015]
 
|2015thread=[http://scioly.org/phpBB3/viewtopic.php?f=187&t=5890 2015]
|2015tests=2015
 
|2015questions=[http://scioly.org/phpBB3/viewtopic.php?f=193&t=5979 2015]
 
 
|2016thread=[http://scioly.org/phpBB3/viewtopic.php?f=208&t=7683 2016]
 
|2016thread=[http://scioly.org/phpBB3/viewtopic.php?f=208&t=7683 2016]
|2016tests=2016
 
|2016questions=[http://scioly.org/phpBB3/viewtopic.php?f=217&t=7769 2016]
 
 
|2017thread=[http://scioly.org/phpBB3/viewtopic.php?f=227&t=9293 2017]
 
|2017thread=[http://scioly.org/phpBB3/viewtopic.php?f=227&t=9293 2017]
 +
|2018thread=[https://scioly.org/forums/viewtopic.php?f=265&t=10854 2018]
 
|2017tests=2017
 
|2017tests=2017
 +
|2018tests=2018
 
|testsArchive=true
 
|testsArchive=true
|B Champion=[[Bala Cynwyd Middle School]]
+
|2014questions=[http://www.scioly.org/phpBB3/viewtopic.php?f=173&t=5025 2014]
|C Champion=[[Columbia High School (New York)|Columbia High School]]
+
|2015questions=[http://scioly.org/phpBB3/viewtopic.php?f=193&t=5979 2015]
 +
|2016questions=[http://scioly.org/phpBB3/viewtopic.php?f=217&t=7769 2016]
 +
|2017questions=[http://scioly.org/phpBB3/viewtopic.php?f=228&t=9637 2017]
 +
|2018questions=[https://scioly.org/forums/viewtopic.php?f=266&t=10931 2018]
 +
|B Champion=[[Chippewa Middle School (Minnesota)|Chippewa Middle School]]
 +
|C Champion=[[Mira Loma High School]]
 
}}
 
}}
'''Disease Detectives''' is a Division B/C event that focuses on epidemiology, that is, the study of diseases and how they spread.
+
'''Disease Detectives''' is an event in [[Division B]] and [[Division C]] that focuses on epidemiology, the study of diseases and how they spread.
  
==Event Format==
+
==Focus Topics==
 +
Disease Detectives rotates between topics every two years. Typically the topic has a minor impact on the content of the event, usually only affecting the specific types of diseases used in problems.
  
The event focus ''Population Growth'' covers many areas, including:
 
*Water Quality, Water Pollution, Water Demands
 
*Sanitation Needs
 
*Growth of Slums and Household Environment
 
*Environmental Degradation
 
*Air Pollution
 
*Infectious Disease Outbreaks
 
*Rapid Spread of Disease via Public Transportation and Air Travel
 
*Food Quality and Food Contamination
 
*Lack of food in poor nations vs. unhealthy fast food and drinks in technological societies
 
*Availability of health care for the poor and the aged
 
*People moving into uninhabited areas
 
*New pathogens, such as Lyme disease and Ebola
 
 
==Focus Topics==
 
 
{|class="wikitable"
 
{|class="wikitable"
 
! '''Year'''
 
! '''Year'''
 
! '''Topic'''
 
! '''Topic'''
 +
|-
 +
! [[2018]]
 +
| rowspan="2"|Food Borne Illness
 
|-
 
|-
 
! [[2017]]
 
! [[2017]]
| Food Borne Illness
 
 
|-
 
|-
 
! [[2016]]
 
! [[2016]]
| Population Growth
+
| rowspan="2"|Population Growth
 
|-
 
|-
 
! [[2015]]
 
! [[2015]]
| Population Growth
 
 
|-
 
|-
 
! [[2014]]
 
! [[2014]]
| Environmental Quality
+
| rowspan="2"|Environmental Quality
 
|-
 
|-
 
! [[2013]]
 
! [[2013]]
| Environmental Quality
 
 
|-
 
|-
 
! [[2012]]
 
! [[2012]]
| Food Borne Illness
+
| rowspan="2"|Food Borne Illness
 
|-
 
|-
 
! [[2011]]
 
! [[2011]]
| Food Borne Illness
 
 
|-
 
|-
 
! [[2010]]
 
! [[2010]]
| Population Growth
+
| rowspan="2"|Population Growth
 
|-
 
|-
 
! [[2009]]
 
! [[2009]]
| Population Growth
 
 
|-
 
|-
 
|}
 
|}
  
 +
==Event Format==
 +
 +
The ''Foodborne Illness'' focus covers areas that include:
 +
*Historical foodborne illness outbreaks
 +
*Characteristics of different foodborne illnesses (salmonellosis, botulism, etc.)
 +
*Safe cooking temperature for different foods
 +
*Prevention techniques for stopping foodborne illness transmission
 +
*Food preparation steps and safety
 +
*PulseNet and other national surveillance techniques for foodborne illnesses
 +
 +
The ''Population Growth'' focus covers areas that include:
 +
*Water Quality, Water Pollution, Water Demands
 +
*Sanitation Needs
 +
*Growth of Slums and Household Environment
 +
*Environmental Degradation
 +
*Air Pollution
 +
*Infectious Disease Outbreaks
 +
*Rapid Spread of Disease via Public Transportation and Air Travel
 +
*Food Quality and Food Contamination
 +
*Lack of food in poor nations vs. unhealthy fast food and drinks in technological societies
 +
*Availability of health care for the poor and the aged
 +
*People moving into uninhabited areas
 +
*New pathogens, such as Lyme disease and Ebola
  
 
==The Basics==
 
==The Basics==
Line 100: Line 103:
 
'''Cluster''' - An aggregation of cases over a particular period closely grouped in time and space, ''regardless of whether the number is more than the expected number''
 
'''Cluster''' - An aggregation of cases over a particular period closely grouped in time and space, ''regardless of whether the number is more than the expected number''
  
'''Outbreak''' - More cases of a particular disease than expected in a given area or among a specialized group of people over a particular period of time.  
+
'''Endemic Disease''' - Present at a continuous level throughout a population/geographic area; constant presence of an agent/health condition within a given geographic area/population; refers to the usual prevalence of an agent/condition.
 +
 
 +
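'''Epidemic''' - The occurrence of more cases of a disease than expected in a given area or population over a particular period, usually affecting large numbers of people over a wide geographical area.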
 +
 
 +
'''Etiology''' - Study of the cause of a disease.
 +
 
 +
'''Fomite''' - A physical object that serves to transmit an infectious agent from person to person. An example of this is lice on a comb: the comb is the fomite, and the lice are the agent that can make your hair itch.
 +
 
 +
'''Incubation Period''' - Time between when a person comes into contact with a pathogen and when they first show symptoms or signs of disease.
 +
 
 +
'''Index Case''' - First patient in an epidemiological study (also known as '''patient zero''' or '''primary case''').
  
'''Epidemic''' - Large numbers of people over a wide geographical area are affected
+
'''Latent Period''' - Time between when a person is infected by a pathogen and when they become infectious (able to transmit it to others).
  
'''Pandemic''' - An ''epidemic'' occurring over several countries or continents and affecting a large ''proportion'' of the population.  
+
'''Morbidity''' - Rate of disease in a population.
  
'''Surveillance''' - The systematic and ongoing collection, analysis, interpretation, and dissemination of health data. The purpose of public health surveillance is to gain knowledge of the patterns of disease, injury, and other health problems in a community so that we can work towards their prevention and control.
+
'''Mortality''' - Rate of death in a population.
  
'''Plague''' - A serious, potentially life-threatening infectious disease that is usually transmitted to humans by the bites of rodent fleas. It was one of the scourges of our early history. There are three major forms of the disease: bubonic, septicemic, and pneumonic.
+
'''Outbreak''' - More cases of a particular disease than expected in a given area or among a specialized group of people over a particular period of time.  
  
'''Vector''' - An animal that transmits disease. For example a mosquito is a vector for malaria.
+
'''Pandemic''' - An ''epidemic'' occurring over several countries or continents and affecting a large ''proportion'' of the population.  
  
'''Fomite''' - A physical object that serves to transmit an infectious agent from person to person. An example of this is lice on a comb: the comb is the fomite, and the lice are the agent that can make your hair itch.
+
'''Plague''' - A serious, potentially life-threatening infectious disease that is usually transmitted to humans by the bites of rodent fleas. It was one of the scourges of our early history. There are three major forms of the disease: bubonic, septicemic, and pneumonic.
  
 
'''Risk''' - The probability that an individual will be affected by, or die from, an illness or injury within a stated time or age span.  Risk of illness is generally considered to be the same as the incidence (see below), and the terms are used interchangeably.  Age span is not usually a consideration in this usage.  Risk of death from a particular illness is expressed as the Case Fatality Rate (number of deaths due to a disease / number of people with the disease) or the Cause-Specific Mortality Rate (number of deaths due to a disease / number of people in the population).  Age span is a more common consideration in this last usage.
  
'''Zoonosis''' - An infectious disease that is transmissible from animals to humans.
+
'''Surveillance''' - The systematic and ongoing collection, analysis, interpretation, and dissemination of health data. The purpose of public health surveillance is to gain knowledge of the patterns of disease, injury, and other health problems in a community so that we can work towards their prevention and control.
  
'''Incubation Period''' - Time between when a person comes into contact with a pathogen and when they first show symptoms or signs of disease.
+
'''Vector''' - An animal that transmits disease. For example, a mosquito is a vector for malaria.
  
'''Endemic Disease''' - Present at a continuous level throughout a population/geographic area; constant presence of an agent/health condition within a given geographic area/population; refers to the usual prevalence of an agent/condition.
+
'''Zoonosis''' - An infectious disease that is transmissible from animals to humans.
  
 
===Incidence, Prevalence, and Duration===
 
===Incidence, Prevalence, and Duration===
  
The INCIDENCE of an illness is the number of new instances of disease in a population over a given time period. It is expressed as "X cases/Y population/ Z time".  The PREVALENCE of an illness is number of affected persons in the population at any given point in time.  It is expressed as "X cases/Y population".  Note the only difference is that INCIDENCE (I) includes time while PREVALENCE (P) does not.  Time reflects the DURATION of the illness or condition.  If two conditions have the same incidence in a population, the one with the longer duration will have the greater prevalence. P = I x D
+
The '''incidence''' of an illness is the number of ''new'' instances of disease in a population over a given time period. It is expressed as "X cases/Y population/Z time".  The '''prevalence''' of an illness is the number of affected persons in the population at any given point in time.  It is expressed as "X cases/Y population".  There are two major ways in which prevalence is measured: '''period prevalence''' and '''point prevalence'''. Think of point prevalence as a snapshot of the population and its rate of a certain disease at a point in time, while period prevalence tracks the prevalence over a certain duration. Note that the only difference is that incidence (I) includes time while prevalence (P) does not.  Time (D) reflects the duration of the illness or condition.  If two conditions have the same incidence in a population, the one with the longer duration will have the greater prevalence. Importantly, [math]P = I \times D[/math], so with two of the variables, it is possible to solve for the third.
  
 
==How to prove x caused y, or Causation==
 
==How to prove x caused y, or Causation==
Line 172: Line 185:
 
===Advantages and Disadvantages to Study Designs===
 
===Advantages and Disadvantages to Study Designs===
  
{| class="wikitable" style="width:75%; height:50px" border="1"
+
{| class="wikitable" style="width:100%; height:50px" border="1"
 
|-
 
|-
 
! Study Designs
 
! Study Designs
Line 183: Line 196:
 
| Time Consuming
 
| Time Consuming
 
Unethical for Harmful Exposures
 
Unethical for Harmful Exposures
 +
 
Most Expensive
 
Most Expensive
 
|-
 
|-
Line 188: Line 202:
 
| Most Accurate Observational Study
 
| Most Accurate Observational Study
 
Good Measure of Exposure
 
Good Measure of Exposure
 +
 +
Correct Time Sequence
 +
 +
Good for Rare Exposures
 +
 +
Easy Risk Calculation
 
| Time Consuming
 
| Time Consuming
 
Expensive
 
Expensive
 +
 +
Bad for Rare Diseases
 +
 +
Possible Loss of Follow-up
 +
 
|-
 
|-
 
| Case-Control Study
 
| Case-Control Study
 
| Can Study Rare Diseases
 
| Can Study Rare Diseases
 
Relatively Less Expensive and Relatively Fast
 
Relatively Less Expensive and Relatively Fast
 +
 +
Good for Rare Diseases
 +
 +
Good for Long Latency Periods
 +
 
| Possible Time-Order Confusion
 
| Possible Time-Order Confusion
Possible Error in Recalling Past Exposures
+
Error in Recalling Exposures
 +
 
 +
Only 1 outcome
 
|-
 
|-
 
| Cross-Sectional Study
 
| Cross-Sectional Study
 
| Fastest
 
| Fastest
 
Least Expensive
 
Least Expensive
 +
 +
Good for more than 1 Outcome
 
| Possible Time-Order Confusion
 
| Possible Time-Order Confusion
 
Least Confidence in Findings
 
Least Confidence in Findings
Line 224: Line 258:
 
Using the 2*2 Table, we can calculate the odds ratio and relative risk. These calculations allow comparisons between the control group and the group afflicted with the condition. One is the neutral value and means that there is no difference between the groups compared. A value greater than one means the outcome is more common with the exposure, and a value less than one means it is less common; whether that difference was caused by bias, chance, or an actual relationship between the exposure and outcome is yet to be seen. The P-value helps judge the role of chance: it is the probability of seeing a difference at least as large as the one observed if there were truly no association. By convention, findings are called statistically significant when the P-value is less than .05.
  
'''Odds Ratio''' - used in case-control study, <math>a \cdot d \over b \cdot c</math>
+
'''Odds Ratio''' - used in case-control studies, [math]\dfrac{a \cdot d}{b \cdot c}[/math]
  
'''Relative Risk''' - used in cohort study, <math>(a/(a+b)) \over (c/(c+d))</math>
+
'''Relative Risk''' - used in cohort studies, [math]\dfrac{a/(a+b)}{c/(c+d)}[/math]
  
 
'''Attack Rate''' - the rate that a group experienced an outcome or illness equal to the number sick divided by the total in that group. (There should be a high attack rate in those exposed and a low attack rate in those unexposed.) For the exposed: a/(a+b)        For the unexposed: c/(c+d)
 
'''Attack Rate''' - the rate that a group experienced an outcome or illness equal to the number sick divided by the total in that group. (There should be a high attack rate in those exposed and a low attack rate in those unexposed.) For the exposed: a/(a+b)        For the unexposed: c/(c+d)
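The three measures above can be checked with a short script. This is only a sketch with made-up counts; it assumes the cell layout implied by the attack rate formulas (a = exposed and ill, b = exposed and not ill, c = unexposed and ill, d = unexposed and not ill), and the function name <code>two_by_two</code> is hypothetical.

```python
def two_by_two(a, b, c, d):
    """Summarize a 2x2 table.

    Assumed layout (inferred from the attack rate formulas):
    a = exposed and ill,   b = exposed and not ill,
    c = unexposed and ill, d = unexposed and not ill.
    """
    attack_exposed = a / (a + b)      # attack rate among the exposed
    attack_unexposed = c / (c + d)    # attack rate among the unexposed
    return {
        "attack_rate_exposed": attack_exposed,
        "attack_rate_unexposed": attack_unexposed,
        "odds_ratio": (a * d) / (b * c),                     # case-control studies
        "relative_risk": attack_exposed / attack_unexposed,  # cohort studies
    }


# Made-up outbreak counts, for illustration only.
summary = two_by_two(a=30, b=20, c=10, d=40)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts the exposed group has the higher attack rate, and both ratios come out above the neutral value of one.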
Line 345: Line 379:
 
'''Confounding bias''' is bias resulting from mixing effects of several factors. Unlike selection and information bias, confounding bias deals with causation and not variations in study results.
 
'''Confounding bias''' is bias resulting from mixing effects of several factors. Unlike selection and information bias, confounding bias deals with causation and not variations in study results.
  
==Resources==
+
==Statistics==
 +
{{Incomplete|section}}
 +
For Division C, statistics is a crucial part of the event (even though the rules specify that it should be less than 10% of the test material). Understanding statistics can be the difference between a good disease detective and an ''excellent'' disease detective. However, many disease detectives only make an effort to know the formulas that compute certain statistical measures without delving into the deeper (highly interesting!) meaning of statistics.
 +
 
 +
===Basics===
 +
This is a crash course on the fundamentals of statistics. This is not a replacement for reading (and understanding) the SOINC guide on statistics in this event or better yet, taking a class or reading a textbook on statistics.
 +
 
 +
====Population====
 +
 
 +
The '''population''' is the entire set under study; for example, all dung beetles in a study of dung beetle length. Because it is impossible to measure the length of every single dung beetle on planet Earth, statisticians use sampling: they take a subset of the dung beetles called a '''sample''' and use measurements from the sample to make inferences about the population as a whole. A '''population parameter''' is a characteristic of a population (for example, 80% of Science Olympiad teams are not gender-equal), while a '''sample statistic''' is an attribute of a sample (for example, 91% of the 176 teams in Indiana Science Olympiad have never attended nationals).
 +
 
 +
====Central Tendency====
 +
 
 +
'''Mean''' - Average of all of the values. [math]A=\dfrac{a_{1}+a_{2}+\cdots+a_{n}}{n}[/math]
 +
 
 +
'''Median''' - The middle value that separates the data into two halves.
 +
 
 +
'''Mode''' - The most frequently occurring value in the data set.
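
As a quick illustration of the three measures, here is a minimal sketch using Python's standard <code>statistics</code> module. The data values are invented for the example.

```python
import statistics

# Invented data set, for illustration only.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value; averages the two middle values when n is even
mode = statistics.mode(data)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Note how the single large value (10) pulls the mean above the median, which is one reason the median is preferred for skewed data.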
 +
 
 +
====Variability====
 +
 
 +
Variability, scatter, and spread all have the same meaning: the extent to which a set of data is dispersed.
 +
 
 +
'''Range''' - The difference between the largest and smallest values in a set.
 +
 
 +
'''Interquartile Range (IQR)''' - The difference between the 75th (third quartile, or [math]Q_{3}[/math]) and 25th (first quartile or [math]Q_{1}[/math]) percentiles of a data set. To find [math]Q_{1}[/math] and [math]Q_{3}[/math], find the median of the data set, then divide the data set into two new sets, one with the data from the median up to the maximum and the other with the data from the median down to the minimum. The median values of the two new sets are [math]Q_{1}[/math] and [math]Q_{3}[/math]. The IQR is used with the median and is the most robust measure of variability, i.e. outliers do not affect the IQR as much.
 +
[math]IQR=Q_{3}-Q_{1}[/math]
 +
 
 +
'''Variance''' - Average of the squared differences from the mean. The variance gives a rough sense of how far the values in a data set are spread out from the mean. For a sample, the sum of squared differences is divided by [math]n-1[/math] rather than [math]n[/math].
 +
[math]Var(x)=\dfrac{\sum(x-\bar{x})^2}{n-1}[/math]
 +
 
 +
'''Standard Deviation (SD)''' - The square root of the variance. Quantifies the spread in a data set in the same units as the original data. A low SD indicates that the data tends to be close to the mean and a high SD indicates the data is far away from the mean. SD and variance are used with the mean. Unlike IQR, SD is not resistant to outliers.
 +
 
 +
[math]SD(x)=s=\sqrt{\dfrac{\sum(x-\bar{x})^2}{n-1}}[/math]
 +
 
 +
'''Normal Distribution''' - A bell-shaped distribution that is unimodal and symmetrical about its mean. Also known as a Gaussian distribution. Technically, the normal distribution is continuous but can be approximated with discrete values.
 +
 
 +
'''68-95-99.7 Rule''' - This rule states that about 68% of the values in normally distributed data fall within 1 SD of the mean, about 95% fall within 2 SD of the mean, and about 99.7% fall within 3 SD of the mean.
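
The percentages in the rule can be reproduced from the standard normal distribution. The sketch below uses <code>statistics.NormalDist</code> from the Python standard library (Python 3.8 or later); the helper name <code>share_within</code> is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The rounded values 68%, 95%, and 99.7% are approximations of these fractions.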
 +
 
 +
Example: Let a data set consist of integers 1 through 10, which sum to 55. The median and mean are 5.5.
 +
 
 +
To find the IQR, we can divide the data into two sets, one from 1 through 5 and the other from 6 through 10 inclusive. We find the median for each of these sets (3 and 8) and then subtract them. Thus, the IQR is 5.
 +
 
 +
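To find the SD, we need to calculate the difference of each data value from the mean. Then we square the differences, add them, divide the sum by the sample size minus 1 [math](n=10, n-1=9)[/math], and take the square root of the result.
 +
 
 +
[math]SD(x)=\sqrt{\dfrac{(1-5.5)^2+(2-5.5)^2+...+(10-5.5)^2}{9}}=\sqrt{\dfrac{82.5}{9}}\approx 3.03[/math]
 +
 
 +
If the data were normally distributed, about 68% of the values would fall in the interval [math](5.5-3.03, 5.5+3.03)=(2.47, 8.53)[/math] by the 68-95-99.7 rule. These ten integers are evenly spread rather than normally distributed, so the interval actually contains 6 of the 10 values (3 through 8).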
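
The worked example can be double-checked in code. This is a sketch using the standard <code>statistics</code> module; the quartiles follow the split-at-the-median method described above and the simple slicing assumes an even number of values. It prints both the sample SD (dividing by n-1) and the population SD (dividing by n), since the two are easy to mix up.

```python
import statistics

data = list(range(1, 11))  # the integers 1 through 10

mean = statistics.mean(data)
median = statistics.median(data)

# Split at the median into a lower and an upper half (works as-is for even n).
half = len(data) // 2
q1 = statistics.median(data[:half])  # median of 1..5
q3 = statistics.median(data[half:])  # median of 6..10
iqr = q3 - q1

sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print("mean:", mean, "median:", median)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print(f"sample SD (n-1): {sample_sd:.2f}")
print(f"population SD (n): {population_sd:.2f}")
```

Dividing by n-1 gives roughly 3.03, while dividing by n gives roughly 2.87, so it matters which formula a question intends.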
 +
 
 +
'''Standard Error of the Mean (SEM)''' - The SEM measures the variability of the mean of different samples around the population mean.
 +
 
 +
[math]SE_{\bar{x}}=\dfrac{s}{\sqrt{n}}[/math]
 +
 
 +
Therefore, as a general rule, the SEM decreases as sample size increases.
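
To see why the SEM shrinks, hold the standard deviation fixed and vary the sample size. The following is a minimal sketch; the function names and the SD of 10 are assumptions chosen for illustration.

```python
import math
import statistics


def sem_from_sd(s, n):
    """Standard error of the mean from a sample SD s and sample size n."""
    return s / math.sqrt(n)


def sem(sample):
    """Standard error of the mean computed directly from raw sample values."""
    return sem_from_sd(statistics.stdev(sample), len(sample))


# Same (hypothetical) SD of 10, increasing sample sizes.
for n in (25, 100, 400):
    print(n, sem_from_sd(10, n))
```

Because of the square root, quadrupling the sample size only halves the SEM.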
 +
 
 +
====Correlation====
 +
 
 +
When two variables are revealed to have a relationship using statistical measures, the variables have a correlation. This correlation can be positive, negative, or zero. Without doing an experiment or trial, it is impossible to conclude that one variable ''causes'' another variable to act in some way. There is always the possibility of a third lurking or confounding variable that the original data does not account for. In this case, wording is extremely important. Correlation [math]\neq[/math] causation.
 +
 
 +
The correlation coefficient [math]r[/math] is a measure of the scatter around a linear relationship. It does NOT apply when a relationship is non-linear. Because the correlation coefficient is difficult to calculate by hand, exam writers will typically give the value and ask for the interpretation of the [math]r[/math] value. The correlation coefficient always satisfies [math]-1 \leq r \leq 1[/math]: a value of 1 indicates a perfectly positive linear relationship, a value of -1 indicates a perfectly negative linear relationship, and a value of 0 indicates no linear relationship. Typically, [math]0.9<|r|<1[/math] is termed strong.
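
For reference, here is a sketch of how [math]r[/math] (the Pearson correlation coefficient) is computed from paired data. The function name <code>pearson_r</code> and the data points are made up for illustration. The last example shows a perfect curved relationship for which [math]r[/math] is still 0, because [math]r[/math] only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly decreasing line

# y = x squared: a perfect relationship, but not a linear one.
xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [x ** 2 for x in xs]))
```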
 +
 
 +
====Standardization====
 +
 
 +
The '''standard score''' or '''z score''' rescales the standard deviation of a normally distributed data set to 1 and mean to 0. Thus, we can model all normally distributed data using a single normal distribution with mean 0 and SD 1.
 +
 
 +
[math]z=\dfrac{x-\mu}{\sigma}=\dfrac{x-\bar{x}}{s}[/math]
 +
 
 +
The first formula is for a population while the second is for a sample. [math]\mu[/math] represents the population mean and [math]\sigma[/math] the population standard deviation, while [math]\bar{x}[/math] and [math]s[/math] are the sample mean and sample standard deviation.
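
A short sketch of the calculation follows. The function name and the example numbers (a mean of 100 and an SD of 15) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical measurement scale with mean 100 and SD 15.
print(z_score(130, mean=100, sd=15))  # two SDs above the mean
print(z_score(85, mean=100, sd=15))   # one SD below the mean
```

A positive z score means the value is above the mean, and a negative z score means it is below.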
 +
 
 +
====Infant Mortality Rate====
 +
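To calculate the infant mortality rate, use this formula, where [math]d[/math] is the number of deaths of infants under one year of age during the year and [math]b[/math] is the number of live births during the same year. The result is expressed per 1,000 live births.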
 +
 
 +
[math]\frac{d}{b}\times1000[/math]
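
The same formula as a small function, with invented counts purely for illustration:

```python
def infant_mortality_rate(deaths, births):
    """Infant deaths per 1,000 births in the same year: (d / b) * 1000."""
    return deaths / births * 1000


# Hypothetical region: 30 infant deaths and 5,000 births in one year.
rate = infant_mortality_rate(deaths=30, births=5000)
print(f"{rate:.1f} deaths per 1,000 births")
```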
 +
 
 +
====Confidence Intervals====
 +
 
 +
====Inference Tests====
 +
 
 +
====Significance====
  
[http://www.epibiostat.ucsf.edu/epidem/epidem.html Links to tons of great epidemiology resources]
+
====Error====
  
[https://sites.google.com/a/mail.fcboe.org/dd/ Disease Detectives Site]
+
===Advanced===
  
[http://seer.cancer.gov/training/manuals/ Cancer Epidemiology Instruction Manual]
+
===Sensitivity and Specificity===
  
[[User:Kpalm1111|Kpalm1111]]'s 2014 [[SSSS]] [[Media:Kpalm1111_disease_notes.pdf|Notes]]
+
==Resources==
 +
:User [[User:Brs|Brs]]' [[Media:DiseaseDet Notes BrS SSSS16.pdf|Notes]], from [[SSSS]] 2017.
 +
:[http://www.epibiostat.ucsf.edu/epidem/epidem.html Links to tons of great epidemiology resources]
 +
:[https://sites.google.com/a/mail.fcboe.org/dd/ Disease Detectives Site]
 +
:[http://seer.cancer.gov/training/manuals/ Cancer Epidemiology Instruction Manual]
 +
:[[User:Kpalm1111|Kpalm1111]]'s 2014 [[SSSS]] [[Media:Kpalm1111_disease_notes.pdf|Notes]]
 +
:[[User:elg4|elg4]]'s 2015 [[SSSS]] [[Media:elg4's Disease Detectives Notes.pdf|Notes]]
 +
:[https://en.m.wikipedia.org/wiki/Epidemiology Wikipedia - Epidemiology]
  
[[User:elg4|elg4]]'s 2015 [[SSSS]] [[Media:elg4's Disease Detectives Notes.pdf|Notes]]
 
  
 
[[Category:Event Pages]]
 
[[Category:Event Pages]]
 
[[Category:Study Event Pages]]
 
[[Category:Study Event Pages]]
 
https://en.m.wikipedia.org/wiki/Epidemiology
 

Revision as of 20:08, 28 September 2017

Disease Detectives is an event in Division B and Division C that focuses on epidemiology, the study of diseases and how they spread.

Focus Topics

Disease Detectives rotates between topics every two years. Typically the topic has a minor impact on the content of the event, usually only affecting the specific types of diseases used in problems.

Year Topic
2018 Food Borne Illness
2017 Food Borne Illness
2016 Population Growth
2015 Population Growth
2014 Environmental Quality
2013 Environmental Quality
2012 Food Borne Illness
2011 Food Borne Illness
2010 Population Growth
2009 Population Growth

Event Format

The Foodborne Illness focus covers areas that include:

  • Historical foodborne illness outbreaks
  • Characteristics of different foodborne illnesses (salmonellosis, botulism, etc.)
  • Safe cooking temperature for different foods
  • Prevention techniques for stopping foodborne illness transmission
  • Food preparation steps and safety
  • PulseNet and other national surveillance techniques for foodborne illnesses

The Population Growth focus covers areas that include:

  • Water Quality, Water Pollution, Water Demands
  • Sanitation Needs
  • Growth of Slums and Household Environment
  • Environmental Degradation
  • Air Pollution
  • Infectious Disease Outbreaks
  • Rapid Spread of Disease via Public Transportation and Air Travel
  • Food Quality and Food Contamination
  • Lack of food in poor nations vs. unhealthy fast food and drinks in technological societies
  • Availability of health care for the poor and the aged
  • People moving into uninhabited areas
  • New pathogens, such as Lyme disease and Ebola

The Basics

Epidemiology

Epidemiology is the study of the distribution and determinants of health-related states in specified populations, and the application of this study to control health problems. There are four basic reasons why disease detectives study and research outbreaks and epidemics: Control and Prevention, Research Opportunities, Training, and Legal Concerns.

Two Basic Types of Epidemiology

Classical Epidemiology - population oriented, studies community origins of health problems related to nutrition, environment, human behavior, and the psychological, social, and spiritual state of a population. The event is more aimed towards this type of epidemiology.

Clinical Epidemiology - studies patients in health care settings in order to improve the diagnosis and treatment of various diseases and the prognosis for patients already affected by a disease. These can be further divided into:

  • Infectious Disease Epidemiology - heavily dependent on laboratory support
  • Chronic Disease Epidemiology - dependent on complex sampling and statistical methods

There are all sorts of classification systems for epi, and the above are certainly examples. One could add research epi vs. applied epi to the above list. However, probably the most fundamental and common system is descriptive epi (e.g. person, place, and time) vs. analytic epi (hypothesis testing - study design).

Basic Epidemiology Terms

Cluster - An aggregation of cases over a particular period closely grouped in time and space, regardless of whether the number is more than the expected number

Endemic Disease - Present at a continuous level throughout a population/geographic area; constant presence of an agent/health condition within a given geographic area/population; refers to the usual prevalence of an agent/condition.

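Epidemic - The occurrence of more cases of a disease than expected in a given area or population over a particular period, usually affecting large numbers of people over a wide geographical area.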

Etiology - Study of the cause of a disease.

Fomite - A physical object that serves to transmit an infectious agent from person to person. An example of this is lice on a comb: the comb is the fomite, and the lice are the agent that can make your hair itch.

Incubation Period - Time between when a person comes into contact with a pathogen and when they first show symptoms or signs of disease.

Index Case - First patient in an epidemiological study (also known as patient zero or primary case).

Latent Period - Time between when a person is infected by a pathogen and when they become infectious (able to transmit it to others).

Morbidity - Rate of disease in a population.

Mortality - Rate of death in a population.

Outbreak - More cases of a particular disease than expected in a given area or among a specialized group of people over a particular period of time.

Pandemic - An epidemic occurring over several countries or continents and affecting a large proportion of the population.

Plague - A serious, potentially life-threatening infectious disease that is usually transmitted to humans by the bites of rodent fleas. It was one of the scourges of our early history. There are three major forms of the disease: bubonic, septicemic, and pneumonic.

Risk - The probability that an individual will be affected by, or die from, an illness or injury within a stated time or age span. Risk of illness is generally considered to be the same as the incidence (see below), and the terms are used interchangeably. Age span is not usually a consideration in this usage. Risk of death from a particular illness is expressed as the Case Fatality Rate (number of deaths due to a disease / number of people with the disease) or the Cause-Specific Mortality Rate (number of deaths due to a disease / number of people in the population). Age span is a more common consideration in this last usage.

Surveillance - The systematic and ongoing collection, analysis, interpretation, and dissemination of health data. The purpose of public health surveillance is to gain knowledge of the patterns of disease, injury, and other health problems in a community so that we can work towards their prevention and control.

Vector - An animal that transmits disease. For example, a mosquito is a vector for malaria.

Zoonosis - An infectious disease that is transmissible from animals to humans.

Incidence, Prevalence, and Duration

The incidence of an illness is the number of new instances of disease in a population over a given time period. It is expressed as "X cases/Y population/Z time". The prevalence of an illness is the number of affected persons in the population at any given point in time. It is expressed as "X cases/Y population". There are two major ways in which prevalence is measured: period prevalence and point prevalence. Think of point prevalence as a snapshot of the population and its rate of a certain disease at a point in time, while period prevalence tracks the prevalence over a certain duration. Note that the only difference is that incidence (I) includes time while prevalence (P) does not. Time (D) reflects the duration of the illness or condition. If two conditions have the same incidence in a population, the one with the longer duration will have the greater prevalence. Importantly, [math]P = I \times D[/math], so with two of the variables, it is possible to solve for the third.
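
The relationship between prevalence, incidence, and duration can be sketched in a few lines. The function names and numbers below are made up for illustration, and the relationship itself is an approximation that works best when the disease is stable over time and not very common.

```python
def prevalence_from(incidence, duration):
    """P = I * D: prevalence from incidence and average duration."""
    return incidence * duration


def incidence_from(prevalence, duration):
    """I = P / D: incidence from prevalence and average duration."""
    return prevalence / duration


def duration_from(prevalence, incidence):
    """D = P / I: average duration from prevalence and incidence."""
    return prevalence / incidence


# Hypothetical illness: 20 new cases per 1,000 people per year (0.02),
# lasting half a year on average.
p = prevalence_from(incidence=0.02, duration=0.5)
print(f"prevalence: {p * 1000:.0f} cases per 1,000 people")
print(f"duration recovered from P and I: {duration_from(p, 0.02):.2f} years")
```

Time units must match: an incidence per year needs a duration in years.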

How to prove x caused y, or Causation

Hill's Criteria for Causation

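Nine criteria are used to judge whether an association reflects a cause-and-effect relationship. They are commonly known as Hill's Criteria for Causation. Not every criterion has to be satisfied for an association to be causal, but temporality is essential: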

  1. Strength of Association - relationship is clear and risk estimate is high
  2. Consistency - observation of association must be repeatable in different populations at different times
  3. Specificity - a single cause produces a specific effect
  4. Alternative Explanations - consideration of multiple hypotheses before making conclusions about whether an association is causal or not
  5. Temporality - cause/exposure must precede the effect/outcome
  6. Dose-Response Relationship - an increasing amount of exposure increases the risk
  7. Biological Plausibility - the association agrees with currently accepted understanding of biological and pathological processes
  8. Experimental Evidence - the condition can be altered, either prevented or accelerated, by an appropriate experimental process
  9. Coherence - the association should be compatible with existing theory and knowledge, including knowledge of past cases and epidemiological studies

Hill's Criteria for Causation Explanations and History

Epidemiological Triad

Epidemiologists use two triads. The first is the foundation for descriptive epidemiology - person, place and time. The second is described in the next section.

Chain of Transmission Triad

This is another common triad, which is an altered form of the Chain of Infection described below. It is a companion to the Epidemiological Triad. It also has three components:

  1. An external agent
  2. A vector or fomite that transmits the disease
  3. A susceptible host for the disease

This is used to define the major points of a disease case.

Epidemiological Study Designs

Basic Studies

Ecological - comparisons of geographical locations

Cross Sectional - a survey, health questionnaire, "snapshot in time"

Case-Control - compare people with and without disease to find common exposures

Cohort - compare people with and without exposures to see what happens to each

Randomized Controlled Trial - human experiment

Quasi Experiments - share similarities with traditional experimental design or RCT, but lack the element of random assignment to treatment/control

Advantages and Disadvantages to Study Designs

Trial

  • Advantages: Most Scientifically Sound; Best Measure of Exposure
  • Disadvantages: Time Consuming; Unethical for Harmful Exposures; Most Expensive

Cohort Study

  • Advantages: Most Accurate Observational Study; Good Measure of Exposure; Correct Time Sequence; Good for Rare Exposures; Easy Risk Calculation
  • Disadvantages: Time Consuming; Expensive; Bad for Rare Diseases; Possible Loss of Follow-up

Case-Control Study

  • Advantages: Relatively Less Expensive and Relatively Fast; Good for Rare Diseases; Good for Long Latency Periods
  • Disadvantages: Possible Time-Order Confusion; Error in Recalling Exposures; Only 1 Outcome

Cross-Sectional Study

  • Advantages: Fastest; Least Expensive; Good for more than 1 Outcome
  • Disadvantages: Possible Time-Order Confusion; Least Confidence in Findings

2*2 Table

A table with two rows and two columns for people with or without exposure and with or without disease; it shows the number of people with each combination of characteristics.

Disease No Disease
Exposure a b
No Exposure c d

Using the 2*2 Table, we can calculate odds ratio and relative risk. These calculations allow comparisons between the groups in a study (exposed versus unexposed, or cases versus controls). One is the neutral value and means that there is no difference between the groups compared. A value greater than one means the exposure is associated with more disease, and a value less than one means it is associated with less disease; whether the difference was caused by bias, chance, or an actual relationship between the exposure and outcome is yet to be seen. The P-value helps judge the role of chance. It is the probability of obtaining a result at least as extreme as the one observed if there were truly no association between exposure and outcome. By convention, a result is called statistically significant when the P-value is less than .05.

Odds Ratio - used in case-control study, [math]a \cdot d \over b \cdot c[/math]

Relative Risk - used in cohort study, [math](a/(a+b)) \over (c/(c+d))[/math]

Attack Rate - the rate that a group experienced an outcome or illness equal to the number sick divided by the total in that group. (There should be a high attack rate in those exposed and a low attack rate in those unexposed.) For the exposed: a/(a+b) For the unexposed: c/(c+d)
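The three formulas above can be tried out on a single 2*2 table. This is a minimal sketch in Python, assuming the cell labels a, b, c, and d from the table above; the function name and the outbreak counts are hypothetical.

```python
def two_by_two_measures(a, b, c, d):
    """Measures of association from a 2*2 table.

    a: exposed and diseased      b: exposed, no disease
    c: unexposed and diseased    d: unexposed, no disease
    """
    attack_rate_exposed = a / (a + b)
    attack_rate_unexposed = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": attack_rate_exposed / attack_rate_unexposed,
        "attack_rate_exposed": attack_rate_exposed,
        "attack_rate_unexposed": attack_rate_unexposed,
    }


# Hypothetical outbreak: 40 of 50 exposed people became ill,
# compared with 10 of 50 unexposed people.
results = two_by_two_measures(a=40, b=10, c=10, d=40)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

In this made-up example the attack rate is 80% in the exposed and 20% in the unexposed, so the relative risk is about 4 while the odds ratio is 16. The two measures are close only when the disease is rare.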

Using Epi-Curves

An epi-curve is a histogram that shows the course of an outbreak by plotting the number of cases of a condition according to the time of onset. Epi-Curves fall into three classifications:

Point source epidemics occur when people are exposed to the same source over a limited, well-defined period of time. The curve commonly rises rapidly to a definite peak, followed by a gradual decline.


Nh mosquito epicurve week53.jpg

Continuous common source epidemics occur when the exposure to the source is prolonged over an extended period of time and may occur over more than one incubation period. The down slope of the curve may be very sharp if the common source is removed or gradual if the outbreak is allowed to exhaust itself.

Co mosquito epicurve week53.jpg


Propagated (progressive source) epidemics occur when a case of disease serves later as a source of infection for subsequent cases and those subsequent cases, in turn, serve as sources for later cases. The shape of this curve usually contains a series of successively larger peaks, reflective of the increasing number of cases caused by person-to-person contact, until the pool of those susceptible is exhausted or control measures are implemented. The distance between these peaks may be a rough indication of the incubation period of the disease. As the outbreak progresses, the peaks flatten out (think of the variance around a mean over multiple generations).

Ga mosquito epicurve week53.jpg
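An epi-curve can be built directly from the onset dates in a line listing. The sketch below, in Python, counts cases per day and prints a simple text histogram; the onset dates and the function name are hypothetical, and real outbreaks often use wider time bins (such as weeks) depending on the incubation period.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onsets):
    """Return (day, case count) pairs for every day from first to last onset."""
    counts = Counter(onsets)
    day, last = min(onsets), max(onsets)
    curve = []
    while day <= last:
        curve.append((day, counts[day]))  # a Counter gives 0 for days with no cases
        day += timedelta(days=1)
    return curve


# Hypothetical onset dates taken from a line listing
onset_dates = [
    date(2024, 6, 1),
    date(2024, 6, 2), date(2024, 6, 2),
    date(2024, 6, 3), date(2024, 6, 3), date(2024, 6, 3),
    date(2024, 6, 4), date(2024, 6, 4),
    date(2024, 6, 6),
]

curve = epi_curve(onset_dates)
for day, cases in curve:
    print(day.isoformat(), "#" * cases)
```

Days with no cases are kept in the output so that gaps in the outbreak stay visible on the time axis.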

Disease and Disease Transmission

Chain of Infection

Agent leaves reservoir through portal of exit, and is conveyed by some mode of transmission, and enters the appropriate portal of entry to infect a susceptible host.

Chainofinfection.jpg

Agent - A microbial organism with the ability to cause disease.

Reservoir - A place where agents can thrive and reproduce.

Portal of Exit - A place of exit providing a way for an agent to leave the reservoir; the route a pathogen takes out of an infected host. Portals of exit tend to be fairly well defined, and they are usually unsurprising once something is known of how and where a pathogen replicates and enters new hosts. Respiratory infections tend to utilize the mouth and nose as portals of exit. Gastrointestinal diseases tend to exit in feces or saliva, depending on the site of replication. Sexually transmitted diseases tend to have portals of exit at the urethra or genital region. Blood-borne diseases tend to exit via arthropods, needles, bleeding, or hypodermic syringes. A more general portal of exit occurs when an infected animal is butchered or an infected person undergoes surgery. The three most common portals of exit are the skin, gastrointestinal tract, and respiratory tract.

Mode of Transmission - Method of transfer by which the organism moves or is carried from one place to another; the transfer of disease-causing microorganisms from one environment to another, particularly from an external environment to a susceptible individual. There are three general categories of transmission: contact, vehicle, and vector.

Portal of Entry - An opening allowing the microorganism to enter the host; the route a pathogen takes to enter a host. Just as with the portals of exit, many pathogens have preferred portals of entry. Many pathogens are not able to cause disease if their usual portal of entry is artificially bypassed. The most common portal of entry is the mucous membrane of the respiratory tract.

Susceptible Host - A person who cannot resist a microorganism invading the body, multiplying, and resulting in infection.

Chain of Infection: Diagram and Explanation

Characteristics of Agents

  1. Infectivity - capacity to cause infection in a susceptible host
  2. Pathogenicity - capacity to cause disease in a host
  3. Virulence - severity of disease that the agent causes to host

Modes of Disease Transmission

Contact Transmission - sub-categories include direct (person-to-person), indirect (fomite), or droplet.

  • Direct Contact - occurs through touching, kissing, dancing, etc. To prevent direct contact transmission, wear gloves, masks, etc.
  • Indirect Contact - occurs from a reservoir via inanimate objects called fomites. A fomite can be almost anything that an infected individual or reservoir touches and leaves a residue of contagious pathogen on. Exceptions are the inanimate media referred to as vehicles: food, air, and liquids. Typically, it is more difficult to avoid indirect contact transmission than it is to avoid direct contact transmission. A certain degree of organismal durability may be necessary to survive passage on a fomite. The best ways to prevent indirect contact transmission are avoiding contact with fomites, keeping hands away from mucous membranes (especially when handling or potentially handling fomites), using barriers when handling fomites, and disinfecting fomites before handling.
  • Droplet Transmission - consequence of being coughed, sneezed, or spit on. To be considered droplet transmission, mucous droplets must still be traveling with the velocity imparted to them on leaving the mouth. As a rule of thumb, this is up to one meter after exiting the mouth. Any further and this is considered airborne transmission. Because interaction within one meter of other people is hard to avoid, droplet transmission is more difficult to avoid than either direct or indirect contact transmission. Not surprisingly, respiratory diseases in particular are transmitted by droplets.

Vehicle Transmission - transmission via a medium such as food, air, and liquid, which are all routinely taken into the body, and thus serve as vehicles into the body.

  • Airborne Transmission - occurs via droplets (typically mucous droplets) that remain airborne, whether as aerosols (very small droplets) or associated with dust particles. An example is within airliners, where economizing measures reduce the turnover of cabin air and consequently increase air recycling. Organisms which can find their way into the air and remain viable thus have repeated opportunities to infect passengers. Airborne transmission requires greater organismal durability than droplet transmission simply because the microorganism is exposed to the air for longer before infecting a new host. The organism must withstand the effects of desiccation, exposure to sunlight, etc. This is why breathing does not typically result in the acquisition of disease.
  • Food-borne Transmission - any pathogen that is found in food and not killed during processing may be transmitted via the food product. Salmonella especially tends to be part of the normal flora of chickens and is consequently associated with chicken products.
  • Water-borne Transmission - occurs via fecally contaminated water. Generally, this is via sewage-contaminated water supplies. Gastrointestinal pathogens in particular are present in feces and therefore rely on this type of transmission.

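Vector Transmission - transmission by an animal, typically an arthropod such as a mosquito, fly, louse, or tick, that carries the agent to a susceptible host.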

  • Portals of Entry to the Nervous System - the brain is typically fairly resistant to bacterial infection. There are four common portals of entry to the nervous system. To take advantage of these routes, an organism must display increasingly specialized adaptations, in order from first to last: parenteral, via the blood, via the lymphatic system, and up the peripheral nerve axons. The ordering of the blood and the lymphatic system is arbitrary and is not intended to imply that one is a significantly more difficult portal to take advantage of than the other.

The scheme used by the American Public Health Association and CDC (Principles of Epi, 3rd edition) is a bit different. The 2 main categories are DIRECT and INDIRECT TRANSMISSION (not Contact). DIRECT TRANSMISSION includes DIRECT CONTACT and DROPLET SPREAD. Examples of DIRECT CONTACT include things like kissing, biting, and contact with soil containing infectious agents that penetrate the skin or enter wounds. DROPLET SPREAD is essentially an "in your face sneeze or cough". The idea is up close and immediate. INDIRECT TRANSMISSION includes AIRBORNE, VEHICLES and VECTORS. AIRBORNE transmission involves dust or droplet nuclei (the latter are essentially little (<5 micron) particles that remain suspended in the air). Time and distance are both greater than for droplet spread (distance >6-8 ft). VEHICLES include things like food, water, or fomites. Vehicles may passively carry pathogens or may promote growth or toxin production. VECTORS are arthropods (e.g. mosquitoes, flies, lice, ticks) that spread infectious agents. If the agent multiplies or undergoes a change in life stage (as with malaria) within the vector, the vector is said to be a BIOLOGIC VECTOR. If the agent is simply carried from one place to another (think of a fly landing on feces and then a bowl of potato salad) it is a MECHANICAL VECTOR. Generally vector-borne diseases are thought of only in the context of biologic vectors. Rabies from a dog bite would be direct contact. Note that while terms like food-borne, waterborne and zoonotic are not really included in this system, they are still valid.

Disease Prevention

Primary prevention - early intervention to avoid initial exposure to agent of disease preventing the process from starting

Secondary prevention - during the latent stage (when the disease has just begun), process of screening and instituting treatment may prevent progression to symptomatic disease

Tertiary prevention - during the symptomatic stage (when the patient shows symptoms), intervention may arrest, slow, or reverse the progression of disease

Quaternary prevention - set of health activities to mitigate or avoid consequences of unnecessary/excessive intervention of the health system. Social credit that legitimizes medical intervention may be damaged if doctors don't prevent unnecessary medical activity and its consequences.

For Food Borne Illnesses, the 2012 topic, prevention tactics include:

  • Cook meat, poultry, and eggs thoroughly.
  • Don't cross-contaminate one food with another.
  • Chill and refrigerate leftovers promptly.
  • Clean and wash all produce.
  • Report suspected food-borne illnesses to the local health department.

In Disease Detectives scenarios, event supervisors will often ask you to brainstorm disease prevention methods. Even if you know very little about the disease, you can brainstorm ideas from the chain of infection for the disease. For example, if the chain of infection shows that the agent comes in contact with humans through sand at the beach and enters the body through any openings (mouth, nose, etc.), a prevention method could be putting up signs at beaches reminding the public to wash their hands before consuming any food.

Immunity

Active Immunity - occurs when the person is exposed to a live pathogen, develops the disease, and becomes immune as a result of the primary immune response

Passive Immunity - short-term immunization by the injection of antibodies, such as gamma globulin, that are not produced by the recipient's cells. Naturally acquired passive immunity occurs during pregnancy, in which certain antibodies are passed from the maternal into the fetal bloodstream.

Herd Immunity - protecting a whole community from disease by immunizing a critical mass of its populace. Vaccination protects more than just the vaccinated person. By breaking the chain of an infection’s transmission, vaccination can also protect people who haven’t been immunized. But to work, this protection requires that a certain percentage of people in a community be vaccinated.

Ten Steps to Investigating an Outbreak

Remember that this is a conceptual order, so in practice several steps may be done simultaneously or in a different order!

  1. Prepare for Field Work
  2. Establish the Existence of an Outbreak - Consider Severity, Potential for Spread, Public Concern, and Availability of Resources
  3. Verify the Diagnosis
  4. Define and Identify Cases - Case Definition and Line Listing
  5. Describe and Orient the Data in Terms of Person, Place, and Time - Descriptive Epidemiology
  6. Develop Hypotheses (Agent/Host/Environment Triad) = Chain of Transmission
  7. Evaluate Hypotheses - Analytical Studies (MUST Have a Control Group)
  8. Refine Hypotheses and Carry Out Additional Studies
  9. Implement Control and Prevention Measures (ASAP!)
  10. Communicate Findings

Ten Steps to Outbreak Investigation - Explanation of Steps

Validity of Study Results: Error and Bias

Random error is the result of fluctuations around a true value because of the sample population. Random error can result from poorly worded questions or misunderstanding of questions. As the term implies, it is random, so it is impossible to correct. However, random error can be reduced; some ways include increasing the sample size and making measurements more precise, either by using a more precise measurement device or by taking more trials. While these techniques would reduce random error, they can also be expensive. Better measurement devices will cost more, and more trials and a larger sample size will mean more work. Precision is inversely related to random error, so increasing random error decreases precision.

Systematic error is any error other than random error. For example, systematic error can occur if the markings on your ruler are spaced too far apart. This would make the numeric measurements less than what they actually are, making all data collected inaccurate. However, trends observed may still be preserved (shifting a line vertically preserves a line, as it is a rigid motion).

Selection bias occurs when selection of participants for a study is affected by an unknown variable that is associated with the exposure and outcome being measured.

An example of information bias is recall bias. When studied, some subjects may more easily recall specific habits related to a disease or condition than subjects not affected with the disease or condition.

Confounding bias is bias resulting from mixing effects of several factors. Unlike selection and information bias, confounding bias deals with causation and not variations in study results.

Statistics

For Division C, statistics is a crucial part of the event (even though the rules specify that it should be less than 10% of the test material). Understanding statistics can be the difference between a good disease detective and an excellent disease detective. However, many disease detectives only make an effort to know the formulas that compute certain statistical measures without delving into the deeper (highly interesting!) meaning of statistics.

Basics

This is a crash course on the fundamentals of statistics. This is not a replacement for reading (and understanding) the SOINC guide on statistics in this event or better yet, taking a class or reading a textbook on statistics.

Population

The population is the entire set under study, for example all dung beetles when the length of dung beetles is being studied. Because it is impossible to measure the length of every single dung beetle on planet earth, statisticians use sampling. They take a subset of the dung beetles called a sample and use measurements from the sample to make inferences about the population as a whole. A population parameter is a characteristic of a population (for example, 80% of Science Olympiad teams are not gender-equal), while a sample statistic is an attribute of a sample (for example, 91% of the 176 teams in Indiana Science Olympiad have never attended nationals).

Central Tendency

Mean - Average of all of the values. [math]\bar{x}=\dfrac{x_{1}+x_{2}+x_{3}+\dots+x_{n}}{n}[/math]

Median - The middle value that separates the data into two halves.

Mode - The most frequently occurring value in the data set.

Variability

Variability, scatter, and spread all have the same meaning: the extent to which a set of data is dispersed.

Range - The difference between the largest and smallest values in a set.

Interquartile Range (IQR) - The difference between the 75th (third quartile, or [math]Q_{3}[/math]) and 25th (first quartile or [math]Q_{1}[/math]) percentiles of a data set. To find [math]Q_{1}[/math] and [math]Q_{3}[/math], find the median of the data set, then divide the data set into two new sets, one with the data from the median up to the maximum and the other with the data from the median down to the minimum. The median values of the two new sets are [math]Q_{1}[/math] and [math]Q_{3}[/math]. The IQR is used with the median and is the most robust measure of variability, i.e. outliers do not affect the IQR as much. [math]IQR=Q_{3}-Q_{1}[/math]

Variance - Average of the squared differences from the mean. The variance gives a very vague sense of how far apart the values in a data set are compared to the mean. [math]Var(x)=\dfrac{\sum(x-\bar{x})^2}{n-1}[/math]

Standard Deviation (SD) - The square root of the variance. Quantifies the spread in a data set in the same units as the original data. A low SD indicates that the data tends to be close to the mean and a high SD indicates the data is far away from the mean. SD and variance are used with the mean. Unlike IQR, SD is not resistant to outliers.

[math]SD(x)=s=\sqrt{\dfrac{\sum(x-\bar{x})^2}{n-1}}[/math]

Normal Distribution - A set of data that is unimodal and symmetrical. Also known as a Gaussian distribution. Technically, the normal distribution is continuous but can be approximated with discrete values.

68-95-99.7 Rule - This rule states that about 68% of the values in a normally distributed data set fall within 1 SD of the mean, about 95% fall within 2 SD of the mean, and about 99.7% fall within 3 SD of the mean.
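The percentages in this rule come from the area under the normal curve. As a quick check, the sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The rule's figures of 68, 95, and 99.7 are rounded versions of these values.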

Example: Let a data set consist of integers 1 through 10, which sum to 55. The median and mean are 5.5.

To find the IQR, we can divide the data into two sets, one from 1 through 5 and the other from 6 through 10 inclusive. We find the median for each of these sets (3 and 8) and then subtract them. Thus, the IQR is 5.

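To find the SD, we need to calculate the difference of each data value from the mean. Then we square the differences, add them, divide the sum by the sample size - 1 [math](n=10, n-1=9)[/math], and take the square root of the result.

[math]SD(x)=\sqrt{\dfrac{(1-5.5)^2+(2-5.5)^2+...+(10-5.5)^2}{9}}=\sqrt{\dfrac{82.5}{9}}=3.03[/math]

(Dividing by [math]n=10[/math] instead of [math]n-1=9[/math] would give 2.87.) If the data were normally distributed, about 68% of it would fall in the interval [math](5.5-3.03, 5.5+3.03)=(2.47, 8.53)[/math] by the 68-95-99.7 rule. These ten integers are evenly spread rather than normal, so the rule is only a rough guide here: 6 of the 10 values (3 through 8) lie in the interval.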
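The worked example can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen for this example only. Note that `statistics.stdev` divides by n - 1, as in the SD formula above, and gives about 3.03, while `statistics.pstdev` divides by n and gives about 2.87.

```python
import statistics

data = list(range(1, 11))  # the integers 1 through 10, already sorted

mean = statistics.mean(data)
median = statistics.median(data)

# IQR by splitting the sorted data at the median, as described above.
# For an odd number of values this sketch leaves the median out of both halves,
# which is one common convention.
lower_half = data[: len(data) // 2]         # 1 through 5
upper_half = data[(len(data) + 1) // 2 :]   # 6 through 10
q1 = statistics.median(lower_half)
q3 = statistics.median(upper_half)
iqr = q3 - q1

sample_sd = statistics.stdev(data)        # denominator n - 1
population_sd = statistics.pstdev(data)   # denominator n

print(mean, median, q1, q3, iqr)
print(round(sample_sd, 2), round(population_sd, 2))
```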

Standard Error of the Mean (SEM) - The SEM measures the variability of the mean of different samples around the population mean.

[math]SE_{\bar{x}}=\dfrac{s}{\sqrt{n}}[/math]

Therefore, as a general rule, the SEM decreases as sample size increases.
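To see how the SEM shrinks with sample size, the sketch below applies the formula with the same SD and two different sample sizes. It is a minimal Python illustration; the function names and numbers are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """SEM = s / sqrt(n), where s is the sample SD (n - 1 in the denominator)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def standard_error_from(s, n):
    """SEM from a known sample SD s and sample size n."""
    return s / math.sqrt(n)


# Same SD, larger sample: quadrupling n halves the SEM.
sem_small = standard_error_from(3.0, 9)    # 3.0 / 3
sem_large = standard_error_from(3.0, 36)   # 3.0 / 6

# SEM of the integers 1 through 10
sem_example = standard_error(list(range(1, 11)))

print(sem_small, sem_large, round(sem_example, 3))
```

Because n sits under a square root, halving the SEM takes four times as many observations.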

Correlation

When two variables are revealed to have a relationship using statistical measures, the variables have a correlation. This correlation can be positive, negative, or zero. Without doing an experiment or trial, it is impossible to conclude that one variable causes another variable to act in some way. There is always the possibility of a third lurking or confounding variable that the original data does not account for. In this case, wording is extremely important. Correlation [math]\neq[/math] causation.

The correlation coefficient [math]r[/math] is a measure of the scatter around a linear relationship. It does NOT apply when a relationship is non-linear. Because the correlation coefficient is difficult to calculate by hand, exam writers will typically give the value and ask for the interpretation of the [math]r[/math] value. The correlation coefficient always satisfies [math]-1 \le r \le 1[/math]. A value of 1 indicates a perfectly positive linear relationship and a value of -1 indicates a perfectly negative linear relationship. Conversely, a value of 0 indicates no linear relationship. Typically, [math]0.9<|r|\le 1[/math] is termed strong.
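Although exams usually supply the value of r, computing it once helps with interpretation. The sketch below uses the standard Pearson formula, which is not given in the text above, so treat it as an assumption about which coefficient is meant; the function name and the data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # perfect positive line
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # perfect negative line
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # y = x squared

print(r_positive, r_negative, r_curved)
```

The last case is worth noting: y depends entirely on x, yet r comes out as zero because the relationship is not linear.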

Standardization

The standard score or z score rescales the standard deviation of a normally distributed data set to 1 and mean to 0. Thus, we can model all normally distributed data using a single normal distribution with mean 0 and SD 1.

[math]z=\dfrac{x-\mu}{\sigma}=\dfrac{x-\bar{x}}{s}[/math]

The first formula is for a population while the second is for a sample. [math]\sigma[/math] represents the population standard deviation while [math]\mu[/math] represents the population mean.
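A z score is simple to compute once the mean and SD are known. The following Python sketch uses hypothetical values (a mean of 100 and an SD of 15) purely for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical measurement scale with mean 100 and SD 15
z_high = z_score(130, mean=100, sd=15)   # two SDs above the mean
z_low = z_score(85, mean=100, sd=15)     # one SD below the mean

print(z_high, z_low)
```

Because z scores are unitless, values from data sets with different means and SDs can be compared on the same scale.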

Infant Mortality Rate

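To calculate the infant mortality rate, use this formula. [math]d[/math] is the number of deaths of infants under one year of age during the year, and [math]b[/math] is the number of live births in the same year. The result is expressed as deaths per 1,000 live births.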

[math]\frac{d}{b}\times1000[/math]
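As a quick numerical illustration of the formula, the Python sketch below uses made-up counts of deaths and births; the function name is hypothetical.

```python
def infant_mortality_rate(deaths, births):
    """Infant deaths per 1,000 births in the same year."""
    return 1000 * deaths / births


# Hypothetical year: 30 infant deaths and 5,000 births
imr = infant_mortality_rate(deaths=30, births=5000)
print(f"{imr:.1f} deaths per 1,000 births")
```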

Confidence Intervals

Inference Tests

Significance

Error

Advanced

Sensitivity and Specificity

Resources

User Brs' Notes, from SSSS 2017.
Links to tons of great epidemiology resources
Disease Detectives Site
Cancer Epidemiology Instruction Manual
Kpalm1111's 2014 SSSS Notes
elg4's 2015 SSSS Notes
Wikipedia - Epidemiology