Causal inference in observational rail freight data

Last updated 30 Apr 2026

How Gauge Intelligence separates association from causation in published corridor reports, and the three-layer framework — DAG prerequisite, Bradford Hill gate, and quasi-experimental design — that governs every causal claim.

Rail freight performance data is purely observational. There are no experiments, no randomisation, no controlled interventions. Every figure in the public archive is drawn from a system that ran the way it ran, and the analyst’s job is to recover what that system was doing, not to have changed it.

Every “because”, “caused by”, “drove”, and “reflecting” sentence in a published report is therefore a causal claim built on top of a correlation. The leap from one to the other is the hardest move in observational analysis, and the easiest to get wrong.

Gauge Intelligence applies three layers of protection to that leap. First, a directed acyclic graph (DAG) prerequisite fixes what is being claimed. Second, a Bradford Hill scoring gate tests how strong the evidence is. Third, for intervention claims that warrant it, a quasi-experimental design approximates the controlled comparison the world denied us.

The internal working document that operationalises these checks is docs/editorial/causation_checklist.md. This page sets out the public-facing rationale.

The causal graph prerequisite

Before any criterion is scored, the analyst sketches the causal graph, naming the exposure, the outcome, and every variable that is a plausible confounder, mediator, or collider on a path between them.

A confounder is a common cause of both the exposure and the outcome. It creates a spurious association between them and must be conditioned on. Corridor mix is a confounder for operator performance: the corridors involved shape both the exposure (which operator carries the traffic) and the punctuality outcome.
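A minimal sketch of why corridor mix has to be conditioned on. The operator names, corridor labels, counts, and reference mix below are all invented for illustration and are not archive data. In this made-up example, Operator A is more punctual within each corridor but looks worse in the crude comparison, because it runs most of its trains on the harder corridor.

```python
# Hypothetical counts, invented for illustration only:
# (operator, corridor) -> (on_time_trains, total_trains)
COUNTS = {
    ("Operator A", "congested"): (60, 100),
    ("Operator A", "quiet"): (19, 20),
    ("Operator B", "congested"): (10, 20),
    ("Operator B", "quiet"): (90, 100),
}


def crude_rate(operator):
    """Punctuality ignoring corridor mix (the confounded comparison)."""
    rows = [v for (op, _), v in COUNTS.items() if op == operator]
    return sum(on for on, _ in rows) / sum(total for _, total in rows)


def corridor_rates(operator):
    """Punctuality within each corridor (conditioning on the confounder)."""
    return {c: on / total for (op, c), (on, total) in COUNTS.items() if op == operator}


def standardised_rate(operator, weights):
    """Corridor rates reweighted to a common corridor mix."""
    rates = corridor_rates(operator)
    return sum(weights[c] * rates[c] for c in weights)


common_mix = {"congested": 0.5, "quiet": 0.5}  # assumed reference mix
for operator in ("Operator A", "Operator B"):
    print(
        operator,
        round(crude_rate(operator), 3),
        corridor_rates(operator),
        round(standardised_rate(operator, common_mix), 3),
    )
```

Standardising to a common mix is only one way of conditioning on corridor; stratified reporting or regression adjustment would serve the same purpose.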

A mediator sits on the causal path from exposure to outcome. Conditioning on a mediator hides part of the effect being measured.

A collider is a common effect of both the exposure and the outcome. Conditioning on a collider creates a spurious association where none existed. Restricting the sample to trains with “delay greater than three minutes” when asking whether possessions cause delay is a case in point: the filter is defined by the outcome itself, so it selects on the outcome and contaminates the comparison.
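The selection problem can be shown with an exact enumeration rather than a simulation. This sketch assumes a deliberately simplified world that is not drawn from the archive: a possession (P) and an unrelated cause such as weather (W) are independent, each present half the time, and a train crosses the delay threshold whenever either is present. Among all trains P tells you nothing about W; among delayed trains only, it does.

```python
from fractions import Fraction
from itertools import product


def delayed(p, w):
    """Assumption: the delay threshold is crossed if either cause is present."""
    return bool(p or w)


# Four equally likely worlds: P and W are independent by construction.
WORLDS = [(p, w, Fraction(1, 4)) for p, w in product([0, 1], repeat=2)]


def prob_w_given(p_value, only_delayed):
    """P(W = 1 | P = p_value), optionally restricted to delayed trains."""
    kept = [
        (w, pr)
        for p, w, pr in WORLDS
        if p == p_value and (delayed(p, w) or not only_delayed)
    ]
    total = sum(pr for _, pr in kept)
    return sum(pr for w, pr in kept if w == 1) / total


for only_delayed in (False, True):
    print(
        "delayed trains only" if only_delayed else "all trains",
        prob_w_given(1, only_delayed),
        prob_w_given(0, only_delayed),
    )
```

In the restricted sample every delayed train without a possession must have been hit by the other cause, so the two groups being compared no longer differ only in possession status.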

The DAG sketch forces these distinctions to be made explicit before any adjustment is performed. The treatment is from Hernán & Robins, Causal Inference: What If (2020), Ch 1–6.
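As a rough illustration of what making the distinctions explicit can look like, the sketch below stores a DAG as a list of arrows and labels each remaining variable by its position relative to the exposure and outcome. The graph, the variable names (including the hypothetical "included_in_sample" selection node), and the simplified classification rules are assumptions made for this example; they are not the contents of the internal checklist, and a real adjustment set needs the full back-door reasoning in Hernán & Robins.

```python
def descendants(edges, start, blocked=frozenset()):
    """Nodes reachable from `start` along arrows, never entering `blocked`."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def classify(edges, exposure, outcome):
    """Label each other node as confounder, mediator, collider, or other."""
    nodes = {n for edge in edges for n in edge} - {exposure, outcome}
    roles = {}
    for node in sorted(nodes):
        causes_exposure = exposure in descendants(edges, node)
        # Must reach the outcome by a route that does not pass through the exposure.
        causes_outcome_directly = outcome in descendants(edges, node, {exposure})
        causes_outcome = outcome in descendants(edges, node)
        from_exposure = node in descendants(edges, exposure, {outcome})
        from_outcome = node in descendants(edges, outcome)
        if causes_exposure and causes_outcome_directly:
            roles[node] = "confounder"   # common cause: condition on it
        elif from_exposure and causes_outcome:
            roles[node] = "mediator"     # on the causal path: do not condition
        elif from_exposure and from_outcome:
            roles[node] = "collider"     # common effect: do not condition
        else:
            roles[node] = "other"
    return roles


# Hypothetical sketch for an operator -> punctuality question.
EDGES = [
    ("corridor_mix", "operator"),
    ("corridor_mix", "punctuality"),
    ("operator", "rostering"),
    ("rostering", "punctuality"),
    ("operator", "included_in_sample"),
    ("punctuality", "included_in_sample"),
]

print(classify(EDGES, "operator", "punctuality"))
```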

Bradford Hill criteria

The Bradford Hill criteria are a set of viewpoints, not a checklist of necessary conditions, for evaluating whether an observed association is likely to be causal. Spiegelhalter sets them out clearly in The Art of Statistics (2019), Ch 4.

Seven viewpoints apply at Gauge Intelligence: effect size relative to plausible confounding; temporal and spatial proximity of cause and effect; dose-response (does more exposure produce more outcome?); plausible mechanism in known rail operations; consistency with prior knowledge from operators and Network Rail; replication across periods or corridors; and analogy with comparable known causal patterns.

The working bar is five of seven satisfied for a sentence to publish as an explicit causal claim. Below five, the language is downgraded to descriptive (“coincided with”, “alongside”, “during the same period as”) and the implied mechanism is not asserted. The full checklist with rail-freight translations of each criterion lives at docs/editorial/causation_checklist.md.
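The gate itself is simple arithmetic. The sketch below is one hypothetical way to encode it: the short viewpoint keys, the function name, and the example scorecard are assumptions for illustration, not the format used in the internal checklist. Only the seven viewpoints and the five-of-seven bar come from the text above.

```python
# Short keys for the seven viewpoints (names are assumptions for this sketch).
VIEWPOINTS = (
    "effect_size",
    "proximity",
    "dose_response",
    "mechanism",
    "consistency",
    "replication",
    "analogy",
)
THRESHOLD = 5  # five of seven, per the working bar


def language_register(scorecard):
    """Return ("causal" or "descriptive", number of viewpoints satisfied)."""
    missing = [v for v in VIEWPOINTS if v not in scorecard]
    if missing:
        # Every viewpoint must be scored explicitly, even if the answer is no.
        raise ValueError(f"unscored viewpoints: {missing}")
    satisfied = sum(bool(scorecard[v]) for v in VIEWPOINTS)
    return ("causal" if satisfied >= THRESHOLD else "descriptive"), satisfied


# Hypothetical scorecard: four viewpoints satisfied, so the claim is downgraded.
example = {
    "effect_size": True,
    "proximity": True,
    "dose_response": False,
    "mechanism": True,
    "consistency": False,
    "replication": False,
    "analogy": True,
}
print(language_register(example))
```

Refusing to score an incomplete card is a design choice in this sketch: it keeps an unexamined viewpoint from silently counting as a failure or a pass.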

Quasi-experimental designs

Clearing the Bradford Hill gate is required for the strongest causal claims but is not sufficient on its own. When an intervention is claimed (a timetable change, a possession regime, an operating practice shift), formal design provides stronger evidence than association alone.

Difference-in-differences compares a treated corridor to a control corridor over the same time window. The change in the treated corridor is differenced against the change in the control corridor; shared confounds (national weather, fleet-wide TRUST changes, sector-wide demand shifts) cancel out. The design rests on the parallel trends assumption: in the absence of treatment, the two corridors would have moved together.
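The differencing can be written out in a few lines. The punctuality figures below are invented for illustration (no such result exists in the archive yet), and the pre-period comparison is only a rough plausibility check: similar pre-treatment movement is compatible with parallel trends but does not prove it.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated corridor minus change in the control corridor."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


def mean_change(series):
    """Average period-on-period change in a series of period values."""
    steps = [later - earlier for earlier, later in zip(series, series[1:])]
    return sum(steps) / len(steps)


# Hypothetical punctuality (percent) in the periods before the intervention.
treated_pre = [83.0, 82.5, 82.0]
control_pre = [85.0, 84.5, 84.0]

# Gap between pre-period trends; a value near zero is what parallel trends
# would lead one to expect, though it cannot confirm the assumption.
pre_trend_gap = mean_change(treated_pre) - mean_change(control_pre)

# Hypothetical before/after values around the intervention.
effect = diff_in_diff(
    treated_before=82.0, treated_after=78.0,
    control_before=84.0, control_after=83.0,
)
print(pre_trend_gap, effect)
```

Here the treated corridor falls by four points and the control by one, so the estimate attributes a three-point fall to the intervention, under the parallel trends assumption.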

Regression discontinuity exploits a sharp threshold: a timetable period boundary, a regulatory cutoff, a possession start date. Observations just above and just below the threshold form natural comparison groups that differ in treatment status but are otherwise similar. The design identifies the treatment effect locally without randomisation.
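A minimal sketch of a sharp regression discontinuity estimate, assuming a straight-line fit on each side of the threshold within a fixed bandwidth. The data are synthetic, built with a known jump of 2.0 so the estimator can be checked; the running variable (days from a hypothetical possession start date), the bandwidth, and the linear form are all illustrative choices, not a Gauge Intelligence specification. When time is the running variable, anything else that changes at the same date is picked up too.

```python
def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(points, cutoff, bandwidth):
    """Difference between the two fitted lines evaluated at the cutoff."""
    # Centre x on the cutoff so each intercept is the fitted value at the cutoff.
    below = [(x - cutoff, y) for x, y in points if cutoff - bandwidth <= x < cutoff]
    above = [(x - cutoff, y) for x, y in points if cutoff <= x <= cutoff + bandwidth]
    intercept_below, _ = fit_line(below)
    intercept_above, _ = fit_line(above)
    return intercept_above - intercept_below


# Synthetic average delay (minutes) by day relative to the threshold at day 0,
# constructed with a jump of 2.0 minutes at the threshold.
data = [(d, 5.0 + 0.1 * d) for d in range(-10, 0)]
data += [(d, 7.0 + 0.1 * d) for d in range(0, 10)]

print(round(rd_jump(data, cutoff=0, bandwidth=10), 6))
```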

Both designs require multiple periods of accumulated data. The Gauge Intelligence archive is in its accumulation phase; difference-in-differences and regression discontinuity results will be published as the evidence base matures. The reference treatment is Angrist & Pischke, Mostly Harmless Econometrics (2009), Ch 5.

The causal audit trail for individual corridor claims (the DAG sketch, the Bradford Hill scorecard, the adjustment set chosen, and the design employed) is available in licensed analytical content. Licensed access provides reproducible documentation of each causal assertion, not just the published conclusion.

What this means for published language

The three layers map directly onto the language permitted in a published report.

“Caused by”, “drove”, and “produced” require Bradford Hill 5/7 plus a traceable operational mechanism. These are the strongest claims in the archive and are reserved for cases where the evidence supports them.

“Consistent with”, “coincided with”, and “alongside” require the DAG sketch and a plausible mechanism, but not the full Bradford Hill threshold. They describe association without asserting causation.

“Estimate based on quasi-experimental design” carries the full difference-in-differences or regression discontinuity apparatus, including disclosure of the parallel trends or local-continuity assumption being relied on.
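The first two tiers can be expressed as a simple phrase check on a draft sentence. This is a hypothetical sketch, not the editorial tooling: the Evidence fields, the function name, and the example sentences are invented, and plain substring matching is deliberately naive (it would also flag "reproduced", for instance). The quasi-experimental tier is left out because it depends on the design documentation rather than on wording.

```python
from dataclasses import dataclass

CAUSAL_PHRASES = ("caused by", "drove", "produced")
DESCRIPTIVE_PHRASES = ("consistent with", "coincided with", "alongside")


@dataclass
class Evidence:
    hill_score: int            # viewpoints satisfied, out of seven
    traceable_mechanism: bool  # operational mechanism traced for this claim
    dag_sketched: bool         # causal graph drawn before adjustment
    plausible_mechanism: bool  # a mechanism is at least plausible


def phrase_problems(sentence, evidence):
    """List the phrases in `sentence` that the evidence does not yet support."""
    text = sentence.lower()
    causal_ok = evidence.hill_score >= 5 and evidence.traceable_mechanism
    descriptive_ok = evidence.dag_sketched and evidence.plausible_mechanism
    problems = []
    for phrase in CAUSAL_PHRASES:
        if phrase in text and not causal_ok:
            problems.append(f"'{phrase}' needs Bradford Hill 5/7 and a traceable mechanism")
    for phrase in DESCRIPTIVE_PHRASES:
        if phrase in text and not descriptive_ok:
            problems.append(f"'{phrase}' needs a DAG sketch and a plausible mechanism")
    return problems


# Hypothetical evidence: four of seven viewpoints, so causal verbs are not allowed.
evidence = Evidence(hill_score=4, traceable_mechanism=True,
                    dag_sketched=True, plausible_mechanism=True)
print(phrase_problems("The possession drove the fall in punctuality.", evidence))
print(phrase_problems("The fall in punctuality coincided with the possession.", evidence))
```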

No causal sentence is published without a Stage 3 Bradford Hill check in the period-end editorial workflow (docs/editorial/period_end_workflow.md). The check is recorded; the audit trail is preserved; the language in the published report reflects the evidence available at the time of publication.

The Great Eastern Main Line report for Period 13 2025-26 is the standing live example of the descriptive register at work. Period 13 2025-26 (1–28 March 2026) is the inaugural published period: TRUST coverage began on 23 February 2026, so no prior comparator yet exists.

The six-percentage-point GBRf–Freightliner gap on near-equal corridor volume is described. The question of whether it reflects rostering, terminal dwell, or path allocation is named. The report explicitly defers to “future period-level analysis” rather than asserting a cause, and no sentence ascribes the gap to a specific mechanism.

That restraint is the visible product of the Bradford Hill gate: the evidence does not yet clear five of seven viewpoints, so the language stays at “consistent with” and the implied mechanism is not asserted.

Version history

Version 1.0 — April 2026. Initial publication. Three-layer causal inference framework: DAG prerequisite, Bradford Hill gate at 5/7, and quasi-experimental designs for intervention claims. Applies to every published causal sentence from April 2026 onward.

Sources

Angrist, J. & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press. Ch 5.

Hernán, M. & Robins, J. (2020). Causal Inference: What If. Chapman & Hall/CRC. Ch 1–6.

Spiegelhalter, D. (2019). The Art of Statistics. Pelican. Ch 4.