A Graphical Catalogue of Threats to Validity

29/12/2024


Introduction

DAGs are directed acyclic graphs. They are widely used within the social sciences to show the logic underneath inferences and to express the assumptions behind research. They make fallacies easier to spot, and they come with an associated formalism that makes for useful analysis. Nifty!

Once the correct DAG has been drawn and the data correctly encoded, the analysis lets you work out which causal effects can be identified and which confounders need to be controlled for.
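
To make that concrete, here's a minimal simulation of my own (the variable names and numbers are mine, not from the paper): Z confounds X and Y, so a naive regression of Y on X is biased, while controlling for Z (as the DAG tells us to) recovers the true causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# The DAG: Z -> X, Z -> Y, and X -> Y with a true causal effect of 2.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2 * x + 3 * z + rng.normal(size=n)

def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(y, x)[1]        # ignores the confounder: biased (~3.5 here)
adjusted = ols(y, x, z)[1]  # controls for Z, as the DAG requires: ~2.0

print(round(naive, 2), round(adjusted, 2))
```

The naive estimate absorbs the backdoor path through Z; adding Z as a regressor blocks it.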

In the 60s and 70s, Campbell and Stanley described a number of ways sociological studies fail to deliver causal results; their framework is very influential within sociology, and the paper re-expresses these threats in terms of DAGs.

There are 4 types of validity within this model:

  - internal: extent to which the estimated association in the study sample corresponds to a causal effect between exposure and outcome
  - statistical conclusion: appropriate use of statistics to assess relationships between study variables
  - construct: extent to which measured variables capture what the investigator intends to assess with those measures
  - external: extent to which study results can be generalised to other units/treatments/settings of study and so on.

Threats to Internal Validity

  1. Ambiguous temporal precedence: not knowing what order exposures/effects happen in results in confusion about which exposure caused which outcome
  2. Selection bias: differences in respondent characteristics may create an effect that is confused with a causal effect
  3. History: events occurring concurrently with exposure may have effects that look causal, like that exposure
  4. Maturation: naturally occurring changes over time may be confused with a causal effect
  5. Regression to the mean: “units” (people, usually) selected for extreme scores are less likely to be extreme on other scores, and likely to have less extreme subsequent scores
  6. Testing: being tested can itself affect later scores
  7. Instrumentation: the way things are measured may change over time
  8. Attrition: loss of respondents can produce artificial effects if it is correlated with other study variables
  9. Additive and interactive effects: the threats above can combine, and the impact of one can depend on the level of another
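
Regression to the mean (item 5) is easy to demonstrate with a quick simulation (again a sketch of my own, not from the paper): select units for extreme scores on one noisy test, and their scores on a second, equally noisy test drift back toward the population mean with no intervention at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each unit has a stable true score; each test adds independent measurement noise.
true_score = rng.normal(size=n)
test1 = true_score + rng.normal(size=n)
test2 = true_score + rng.normal(size=n)

# Select the units that scored extremely high on the first test...
extreme = test1 > 2.0

# ...and their second-test mean sits much closer to the population mean (0),
# purely because part of their first score was noise.
print(round(test1[extreme].mean(), 2), round(test2[extreme].mean(), 2))
```

Any "improvement programme" applied to the extreme group would look effective here even though nothing changed.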

When I think about this, almost all of it shows up in examples from clinical trial research. Thanks Ben Goldacre!

Y’know, this is probably somewhat applicable to physics experiments too. It’s a really cool way to do reasoning about experiments and causal inference.

The examples given are all really good.

Threats to Statistical Conclusion Validity

  1. Low statistical power: study may incorrectly conclude there is no relationship between exposure and outcome
  2. Violated assumptions of statistical tests: if a test’s assumptions don’t hold, the maths comes out wrong
  3. Fishing and error rate problems: multiple tests can give rise to spurious significances unless the error rate is controlled for the number of tests performed
  4. Inaccurate effect size estimation: the size of the effect, not just its existence, may be estimated wrongly
  5. Extraneous variance in experimental setting: big errors may mask effects
  6. Heterogeneity of units: if outcomes vary widely anyway, that will mask small effects
  7. Unreliability of measures: measurement errors weaken relationships between variables
  8. Restriction of range: a narrow range on a variable weakens its observed relationship with other variables
  9. Treatment/exposure may be variably applied, or measurements may only be partially completed
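
The fishing problem (item 3) is easy to see in a simulation of my own devising: run many tests of hypotheses that are all truly null at α = 0.05 and roughly 5% come back “significant” anyway, while a Bonferroni correction keeps the spurious hits down.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests, n = 1000, 50

# 1000 "studies", each testing a true null: the population mean really is 0.
samples = rng.normal(size=(n_tests, n))
z = samples.mean(axis=1) * np.sqrt(n)  # z-statistic (known sigma = 1)

# Naive testing at alpha = 0.05: expect roughly 5% spurious "significant" hits.
naive_hits = int((np.abs(z) > 1.96).sum())

# Bonferroni: test each at alpha / n_tests (two-sided critical z ~ 4.06).
bonferroni_hits = int((np.abs(z) > 4.06).sum())

print(naive_hits, bonferroni_hits)
```

Bonferroni is the bluntest correction; the point is just that *some* adjustment for the number of tests is needed.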

Some of these refer to null-hypothesis significance testing, which is increasingly widely regarded as a bad idea nowadays :)

Threats to Construct Validity

What you’re measuring does not relate to the theory/concept/idea you are trying to test. These can often also be thought of as confounders or statistical errors (if you had included the construct in your model, it wouldn’t be construct invalidity anymore; it would be internal invalidity).

  1. Inadequate explication of constructs: the construct is poorly defined or explained.
  2. Construct confounding: exposures usually involve more than one construct, and failing to think about all of them misattributes effects
  3. Confounding constructs with levels of constructs: e.g., effect estimates are extrapolated beyond the range of observed data
  4. Mono-operation bias: any single measurement/intervention will underrepresent the construct and measure irrelevant related constructs (related to confounders)
  5. Mono-method bias: when all operations use the same method, e.g., self-report, that method becomes part of the thing being studied.
  6. Treatment-sensitive factorial structure: the structure of measures may change as a result of treatment. The change may be hidden if the same scoring is always used.
  7. Reactive self-report changes: participants may self-report differently depending on how they’ve been assigned
  8. Compensatory equalization: if treatment provides something good, compensation may be provided to those not receiving it. This must be included as part of the treatment construct description
  9. Compensatory rivalry: participants not being treated may feel motivated to do better than those being treated (IE, “watch me, i don’t need coaching to do better than them on the exam!”)
  10. Resentful demoralization: participants not receiving a desirable treatment may feel bad and respond more negatively.
  11. Reactivity to the experimental situation: these reactions become part of the construct being tested
  12. Experimenter expectancies: ditto the above. This is the point of, e.g., blinding.
  13. Novelty and disruption effects: the effect of modifying a routine must be taken into account
  14. Treatment diffusion: similar treatments may end up elsewhere, and participants are unknowingly exposed to them.

Threats to External Validity

  1. Interaction of causal relationship with units
  2. Interaction of causal relationship with setting (this and the previous: the relationship might not hold if the study were conducted in a different setting/with different units)
  3. Context-dependent mediation: a mediator of the causal relationship in one context may not mediate it in another
  4. Interaction of causal relationship with outcomes: effects might not hold if a different outcome measure were used
  5. Interaction of the causal relationship with treatment variation: effects may not remain if the treatment is changed, combined with another treatment, or only partially applied.

My takeaways: