Correlation is not causation

Correlation

“Correlation doesn’t imply causation.” It’s true. But it’s also lazy.

When we encounter a correlation, our first instinct shouldn’t be to dismiss it. We should ask, “What’s really going on here? What is the real causal relationship?” Besides a direct causal link in the assumed direction, there are three alternative explanations for a correlation:

  1. Reverse causation
  2. Confounding
  3. Selection

Reverse Causation

Reverse causation is a logical error where someone mistakenly switches the cause and effect in a situation. They incorrectly assume that B causes A, when in reality, A causes B.

Definition

Reverse causation occurs when:

  1. Two things (A and B) are observed to happen together or be related.
  2. Someone concludes that B causes A.
  3. In reality, the causation runs the other way: A causes B.
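
One reason the direction is so easy to get wrong is that a correlation coefficient is symmetric: it looks the same whichever variable is the cause. The following is a minimal sketch with simulated data, not real measurements. The variable names, noise levels, and sample size are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # hypothetical sample size

# World 1: A causes B (B is A plus independent noise).
a1 = [rng.gauss(0, 1) for _ in range(n)]
b1 = [a + rng.gauss(0, 1) for a in a1]

# World 2: B causes A (A is B plus independent noise).
b2 = [rng.gauss(0, 1) for _ in range(n)]
a2 = [b + rng.gauss(0, 1) for b in b2]

r_world1 = pearson(a1, b1)
r_world2 = pearson(a2, b2)

print(f"A causes B: r = {r_world1:.2f}")
print(f"B causes A: r = {r_world2:.2f}")
```

Both worlds produce a correlation of similar size (about 0.7 in expectation under these assumptions), so the coefficient alone cannot say which way the arrow points. Settling the direction takes outside knowledge, such as timing or an experiment.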

Examples

1. Exercise and Health

  • Observation: People who exercise regularly are less likely to develop heart diseases.
  • Incorrect conclusion: Exercise alone explains the lower rate of heart disease.
  • Reality: Part of the association may run the other way, since individuals with healthier hearts are more likely to engage in regular exercise.

2. Ice Cream Sales and Drowning Incidents

  • Observation: Higher ice cream sales are associated with more drowning incidents.
  • Incorrect conclusion: Eating ice cream leads to an increased risk of drowning.
  • Reality: Both ice cream sales and drownings rise during the warmer months of the year; hot weather drives both. Strictly speaking, this is a case of confounding (a third cause) rather than reverse causation, because neither variable causes the other in either direction.

3. Job Experience and Salary

  • Observation: Employees with higher salaries tend to have more years of experience.
  • Incorrect conclusion: Higher salaries cause employees to gain more experience.
  • Reality: More experienced employees typically earn higher salaries because of the skills and qualifications they have accrued over time.

Third-Cause Fallacy (confounding)

The third-cause fallacy is a logical error where someone incorrectly assumes a causal relationship between two correlated variables, when in reality both are caused by a third factor.

Definition

The third-cause fallacy occurs when:

  1. Two things (A and B) are observed to be correlated.
  2. Someone concludes that A causes B (or B causes A).
  3. In reality, both A and B are caused by a third factor C.
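
The three steps above can be sketched with simulated data. In this minimal example, a hypothetical factor C drives both A and B, while A and B have no effect on each other. The coefficients, noise levels, and sample size are assumptions for illustration. The sketch computes the raw correlation between A and B, then the correlation after the linear effect of C is removed from both (a simple partial correlation).

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, cs):
    """Remove the linear effect of cs from ys using a least-squares slope."""
    mc, my = mean(cs), mean(ys)
    slope = (sum((c - mc) * (y - my) for c, y in zip(cs, ys))
             / sum((c - mc) ** 2 for c in cs))
    return [(y - my) - slope * (c - mc) for c, y in zip(cs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000                 # hypothetical sample size

c = [rng.gauss(0, 1) for _ in range(n)]   # the third factor
a = [ci + rng.gauss(0, 1) for ci in c]    # A depends only on C
b = [ci + rng.gauss(0, 1) for ci in c]    # B depends only on C

r_raw = pearson(a, b)
r_adjusted = pearson(residuals(a, c), residuals(b, c))

print(f"raw correlation of A and B:      {r_raw:.2f}")
print(f"after adjusting for C:           {r_adjusted:.2f}")
```

Under these assumptions the raw correlation is clearly positive (about 0.5 in expectation), yet it falls to roughly zero once C is accounted for. In real data the third factor is often unmeasured, so this adjustment is only possible when C has been identified and recorded.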

Examples

1. Coffee Shops and Property Values

  • Observation: Neighborhoods with more coffee shops tend to have higher property values.
  • Incorrect conclusion: Opening coffee shops increases property values.
  • Reality: Both are likely influenced by the neighborhood’s affluence and urbanization.

2. Sushi Consumption and Life Expectancy

  • Observation: Countries with higher sushi consumption tend to have longer life expectancies.
  • Incorrect conclusion: Eating sushi leads to longer life.
  • Reality: Both are influenced by overall wealth and access to healthcare in developed nations.

3. Social Media Usage and Depression

  • Observation: Increased social media usage correlates with higher rates of depression.
  • Incorrect conclusion: Social media directly causes depression.
  • Reality: Both might be influenced by factors like social isolation or life stressors.

4. Organic Food Sales and Autism Rates

  • Observation: As organic food sales have increased, so have autism diagnosis rates.
  • Incorrect conclusion: Organic food consumption causes autism.
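  • Reality: The two trends likely rose over the same period for separate reasons: growing health awareness plausibly lifted organic food sales, while broader diagnostic criteria and greater awareness raised the number of autism diagnoses.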

Selection

Selection refers to a bias that occurs when the sample chosen for a study is not representative of the population intended to be analyzed. This results in conclusions that may not accurately reflect the true nature of the relationship between variables across the broader population.

Definition

Selection occurs when:

  1. A specific group is chosen or self-selects into a study.
  2. Conclusions drawn from this group are generalized to a larger population.
  3. The chosen group has distinct characteristics that bias the results.
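
A small simulation can make this concrete. The sketch below assumes a hypothetical treatment whose benefit is larger for people under 40, and a trial that recruits mostly younger people. Every number in it (age range, effect sizes, recruitment probabilities) is invented for illustration and does not come from any real study.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

def true_effect(age):
    """Hypothetical benefit of the treatment: larger for younger people."""
    return 10.0 if age < 40 else 2.0

# Population: ages spread evenly between 20 and 80 (an assumption).
population = [rng.uniform(20, 80) for _ in range(10000)]

# Trial recruitment: younger people are far more likely to be enrolled.
trial = [age for age in population
         if rng.random() < (0.9 if age < 40 else 0.1)]

population_effect = sum(true_effect(a) for a in population) / len(population)
trial_effect = sum(true_effect(a) for a in trial) / len(trial)

print(f"average effect in the whole population: {population_effect:.1f}")
print(f"average effect among trial participants: {trial_effect:.1f}")
```

The trial average comes out well above the population average because the sample over-represents the group that benefits most. Nothing is wrong with the measurement inside the trial; the error appears only when the result is generalized to everyone.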

Examples

1. Clinical Trials and Medication

  • Observation: A medication shows high effectiveness in a clinical trial.
  • Potential Selection Issue: The trial participants are predominantly young, healthy adults.
  • Reality: The medication might not be as effective for older adults or those with pre-existing conditions, who were underrepresented in the study.

2. School Performance and Technology

  • Observation: Students at schools with advanced technology perform better academically.
  • Potential Selection Issue: Schools with advanced technology are often in wealthier areas with more resources, so their students are not a representative sample of students overall.
  • Reality: The higher performance could be influenced by socioeconomic factors, not just technology availability.

3. Employee Productivity and Remote Work

  • Observation: Employees who work remotely appear to be more productive.
  • Potential Selection Issue: Employees who choose to work remotely might already have environments conducive to productivity or self-select based on their ability to work independently.
  • Reality: The apparent productivity gain may reflect the characteristics of those who choose, or are able, to work remotely rather than the remote work setup itself.

Selection bias vs third-cause bias

The main difference between the two lies in where they arise and which kind of validity they threaten:

Selection bias arises from how the sample is chosen. It mainly affects the external validity of a study, that is, how well the results can be generalized to the broader population.

Confounding (third-cause) bias arises from a third variable that influences both studied variables. It affects the internal validity of a study by distorting the apparent relationship between them.