Recovery time: how long does a corridor take to stabilise after disruption?

Last updated 30 Apr 2026

How Gauge Intelligence measures corridor recovery after incidents using survival analysis — including how censoring is handled when recovery is still in progress at the end of the observation window.

An incident’s blast radius (services affected, delay-minutes, S8 exposure) is one dimension of disruption. The second is recovery time: how many days until the corridor’s punctuality returns to its pre-incident baseline.

Recovery time is a time-to-event question, and time-to-event data requires survival analysis. The key complication is censoring: some corridors are still recovering at the end of the observation window. Dropping these censored observations removes the slowest recoveries and understates recovery times; treating them as recovered at the cut-off also shortens them and overstates performance.

Gauge Intelligence uses the Kaplan-Meier non-parametric estimator, the standard method for recovery time distributions under right-censoring.

Defining recovery

The recovery event is dated to the third of three consecutive days on which DailyResidual (the STL residual for the corridor) sits within 1σ of the pre-incident corridor mean. The three-day requirement prevents single-day reversions from being recorded as recovery.

The pre-incident mean is the mean of DailyResidual over the 28 days before incident onset; σ is the standard deviation of DailyResidual over the same window.
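A minimal Python sketch of the baseline step, assuming the caller passes the corridor’s 28 daily DailyResidual values that precede onset, in percentage points. The function name `recovery_band` is hypothetical, and using the sample standard deviation rather than the population one is an assumption; the methodology does not say which.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def recovery_band(pre_onset_residuals: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper): the pre-incident mean of DailyResidual plus or minus 1 sigma.

    pre_onset_residuals: DailyResidual for the 28 days before onset (hypothetical
    input shape). Sigma is the sample standard deviation, which is an assumption.
    """
    if len(pre_onset_residuals) < 2:
        raise ValueError("need at least two pre-onset days to estimate sigma")
    mu = mean(pre_onset_residuals)
    sigma = stdev(pre_onset_residuals)
    return mu - sigma, mu + sigma
```

With the worked example’s figures (mean +0.4 pp, σ 1.6 pp) the band is −1.2 to +2.0 pp.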

Where recovery has not occurred by the end of the current observation window, the observation is right-censored at that time. The corridor contributes survival information up to the censoring date but is not recorded as recovered.
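The censoring rule reduces each incident to a (duration, recovered) pair, which is the input a Kaplan-Meier routine needs. A Python sketch, assuming days are integer day numbers on a common calendar; the name `to_observation` is hypothetical.

```python
from typing import Optional, Tuple


def to_observation(
    onset_day: int, window_end_day: int, recovery_day: Optional[int] = None
) -> Tuple[int, bool]:
    """Return (duration_days, recovered) for one incident.

    If the recovery rule was met on or before the end of the observation window,
    the pair records a recovery; otherwise it is right-censored at the window end.
    """
    if recovery_day is not None and recovery_day <= window_end_day:
        return recovery_day - onset_day, True
    return window_end_day - onset_day, False
```

A censored pair still counts in the risk set for every day up to its duration; it simply never contributes a recovery.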

Incidents are partitioned into three magnitude classes: under 500 delay-minutes, 500–2,000 delay-minutes, and over 2,000 delay-minutes. Each class has its own survival curve (Klein & Moeschberger, 2003, Ch 1–3).
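Stratification is a simple lookup. The sketch below assumes that values of exactly 500 and exactly 2,000 delay-minutes fall in the middle class, which the text does not specify; the function name and class labels are hypothetical.

```python
def magnitude_class(delay_minutes: float) -> str:
    """Map an incident's accumulated delay-minutes to a magnitude class label.

    Assumption: exactly 500 or exactly 2,000 falls in the middle class.
    """
    if delay_minutes < 0:
        raise ValueError("delay-minutes cannot be negative")
    if delay_minutes < 500:
        return "under 500"
    if delay_minutes <= 2000:
        return "500-2,000"
    return "over 2,000"
```

The worked example’s 1,400 delay-minutes maps to the middle class.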

Kaplan-Meier estimation

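The Kaplan-Meier curve estimates S(t) = P(recovery time > t) as the product of conditional survival probabilities at each observed recovery time up to t. Censored observations stay in the risk set until their censoring time and then drop out, so they contribute information without being counted as recoveries, and no distributional form is assumed.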
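Written out in the standard product-limit form, with t_i the distinct observed recovery times in increasing order, d_i the number of recoveries at t_i, and n_i the number of incidents still at risk just before t_i (neither recovered nor censored earlier). By the usual convention, an observation censored at exactly t_i is still counted in n_i.

```latex
\hat{S}(t) =
\begin{cases}
1, & t < t_1, \\[4pt]
\displaystyle\prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right), & t \ge t_1.
\end{cases}
```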

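Greenwood’s formula estimates the variance of the estimated S(t), from which the 90% confidence interval is built. The published statistic is the median recovery time, the smallest t at which the estimated S(t) falls to 0.5 or below (the curve is a step function and may never equal 0.5 exactly), with a 90% interval, reported separately for each magnitude class (Klein & Moeschberger, 2003, Ch 4).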
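The sketch below computes the product-limit estimate, Greenwood’s variance Var[Ŝ(t)] = Ŝ(t)² Σ d_i / (n_i (n_i − d_i)) summed over t_i ≤ t, and a median with a 90% interval. It illustrates the technique and is not the code in Recovery::SurvivalCurve. Two choices are assumptions: the pointwise interval is the plain (linear) Greenwood band Ŝ ± 1.645·SE clipped to [0, 1], and the median’s interval is read off as the first recovery times at which the lower and upper bands reach 0.5. Log and log-log transformed bands are common alternatives. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Z_90 = 1.6448536269514722  # standard-normal quantile for a two-sided 90% interval


def kaplan_meier(observations: List[Tuple[int, bool]]):
    """Product-limit estimate with Greenwood variance.

    observations: (duration_days, recovered) pairs; recovered=False means right-censored.
    Returns one row per observed recovery time: (t, n_at_risk, d, S(t), Var[S(t)]).
    """
    recoveries = Counter(t for t, recovered in observations if recovered)
    leaving = Counter(t for t, _ in observations)  # recovered or censored at t
    at_risk = len(observations)
    s, greenwood_sum, rows = 1.0, 0.0, []
    for t in sorted(leaving):
        d = recoveries[t]  # Counter returns 0 for days with no recovery
        if d:
            s *= 1.0 - d / at_risk
            if at_risk > d:
                # Greenwood term d_i / (n_i * (n_i - d_i)); skipped when S drops to 0
                greenwood_sum += d / (at_risk * (at_risk - d))
            rows.append((t, at_risk, d, s, s * s * greenwood_sum))  # S = 0 gives Var = 0
        # Censorings tied at t count as at risk at t, then leave with the recoveries.
        at_risk -= leaving[t]
    return rows


def median_with_interval(
    observations: List[Tuple[int, bool]], z: float = Z_90
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (median, lower, upper) recovery time in days; None means not yet reached."""
    rows = kaplan_meier(observations)
    times = [t for t, *_ in rows]
    surv = [s for _, _, _, s, _ in rows]
    half = [z * math.sqrt(var) for *_, var in rows]
    lower_band = [max(0.0, s - h) for s, h in zip(surv, half)]
    upper_band = [min(1.0, s + h) for s, h in zip(surv, half)]

    def first_at_or_below(band: List[float]) -> Optional[int]:
        # First recovery time at which the step curve is at or below 0.5.
        return next((t for t, v in zip(times, band) if v <= 0.5), None)

    return (
        first_at_or_below(surv),
        first_at_or_below(lower_band),  # lower band reaches 0.5 first: lower bound
        first_at_or_below(upper_band),  # upper band reaches 0.5 last: upper bound
    )


# Six made-up incidents for illustration, not Gauge Intelligence data.
obs = [(2, True), (3, True), (3, False), (5, True), (6, True), (8, False)]
print(median_with_interval(obs))  # (5, 3, None)
```

In this toy cohort the upper band never reaches 0.5, so the interval’s upper end is not estimable. Small cohorts produce this routinely, which is the practical face of the 20-event threshold below.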

A median is preferred over a mean. With right-censoring, the Kaplan-Meier mean is undefined unless the longest observation ends in recovery rather than censoring, because otherwise the curve never reaches zero; the median is estimable as soon as the curve falls to 0.5.

Why a non-parametric estimator

A parametric survival model (Weibull, log-normal, exponential) yields narrower intervals in small samples, but at the cost of assuming a recovery-time distribution Gauge Intelligence has no basis to assert.

Freight corridor recoveries combine slow asset-fault clearances, fast Control-led re-paths, and intermediate timetable rebuilds; the hazard function is unlikely to match any single parametric family. Under misspecification a Weibull median can sit several days from the empirical median while reporting a deceptively tight interval.

Kaplan-Meier’s main assumption is non-informative censoring: the reason an observation stops is unrelated to how quickly that corridor would have recovered, so corridors censored at the cut-off have the same recovery prospects as comparable corridors still under observation. This holds when the cut-off is the calendar end of the observation window. Where a known intervention has truncated observation, the corridor is excluded and the exclusion is disclosed in licensed analytical content.

Worked example

Consider a Felixstowe-bound corridor incident: a signalling failure accumulates 1,400 delay-minutes and falls into the 500–2,000 magnitude class. The pre-incident 28-day rolling mean of DailyResidual is +0.4 pp with σ of 1.6 pp; the recovery band is therefore −1.2 to +2.0 pp.

In the eight operational days following onset (days 1 to 8), the residual prints −5.1, −3.8, −2.1, −1.4, −0.7, +0.2, +0.6 and −0.3 pp. The first day inside the band is day 5 (−0.7 pp); days 6, 7 and 8 also sit inside.

The three-consecutive-day requirement is satisfied on day 7, which is recorded as the recovery date; recovery time is 7 days. Had day 6 fallen back outside the band, day 5 would have been a single-day reversion and the count would have restarted.
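The worked example can be reproduced with a short Python sketch of the three-consecutive-day rule. The function name `recovery_day` is hypothetical; it assumes day 1 is the first operational day after onset and that a value exactly on a band edge counts as inside the band.

```python
from typing import Optional, Sequence


def recovery_day(
    post_onset_residuals: Sequence[float],
    lower: float,
    upper: float,
    run_length: int = 3,
) -> Optional[int]:
    """Return the day (1 = first day after onset) on which the run_length-th
    consecutive in-band residual falls, or None if the rule is never met."""
    run = 0
    for day, residual in enumerate(post_onset_residuals, start=1):
        if lower <= residual <= upper:
            run += 1
            if run == run_length:
                return day
        else:
            run = 0  # a day outside the band restarts the count
    return None


# Worked example: band is +0.4 pp plus or minus 1.6 pp.
residuals = [-5.1, -3.8, -2.1, -1.4, -0.7, +0.2, +0.6, -0.3]
print(recovery_day(residuals, 0.4 - 1.6, 0.4 + 1.6))  # 7
```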

Edge cases

Cascading incidents. A second qualifying incident before recovery completes is treated as a new onset; the first is right-censored at the second onset date. Extending the first through the second disturbance would conflate two recovery processes.
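A Python sketch of the cascading rule for one corridor’s incidents in onset order. It assumes integer day numbers, that `recoveries[i]` is the day the recovery rule was met (or None), and that a recovery must fall strictly before the next onset to count; the tie case is not specified in the methodology. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def cascade_observations(
    onsets: Sequence[int],
    recoveries: Sequence[Optional[int]],
    window_end: int,
) -> List[Tuple[int, bool]]:
    """Turn one corridor's incidents, in onset order, into (duration_days, recovered) pairs.

    A later onset before recovery right-censors the earlier incident at that onset.
    """
    if len(onsets) != len(recoveries):
        raise ValueError("onsets and recoveries must be the same length")
    observations = []
    for i, onset in enumerate(onsets):
        next_onset = onsets[i + 1] if i + 1 < len(onsets) else None
        # Observation stops at the next onset or the window end, whichever is first.
        stop = window_end if next_onset is None else min(next_onset, window_end)
        recovery = recoveries[i]
        # Assumption: recovery must fall strictly before the next onset to count.
        if recovery is not None and recovery <= window_end and (
            next_onset is None or recovery < next_onset
        ):
            observations.append((recovery - onset, True))
        else:
            observations.append((stop - onset, False))
    return observations


print(cascade_observations([10, 14, 40], [17, 20, None], 50))
# [(4, False), (6, True), (10, False)]
```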

Cancellations. Cancelled services are excluded from DailyResidual on the day they would have run. A post-onset day on which the corridor’s cancellation share exceeds the 95th percentile of its trailing 28 days is flagged low-volume; if more than three post-onset days are low-volume, the observation is right-censored at the first clean-volume day.
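The flagging half of this rule can be sketched in Python. Assumptions, all unconfirmed by the text: the reference window is the 28 days before onset (the methodology says "trailing 28-day" without fixing what it trails), the percentile uses the default exclusive method of `statistics.quantiles`, and the choice of censoring day ("first clean-volume day") is left to the methodology and not modelled here. Function names are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def low_volume_flags(
    reference_shares: Sequence[float], post_onset_shares: Sequence[float]
) -> List[bool]:
    """Flag post-onset days whose cancellation share exceeds the reference 95th percentile."""
    p95 = quantiles(reference_shares, n=20)[-1]  # 19 cut points; the last is the 95th percentile
    return [share > p95 for share in post_onset_shares]


def needs_low_volume_censoring(flags: Sequence[bool], max_low_days: int = 3) -> bool:
    """True when more than three post-onset days are flagged low-volume."""
    return sum(flags) > max_low_days
```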

Terminated journeys. Journeys terminating short of their booked destination contribute to the residual on the segments they ran, weighted by segment length. The TRUST train_id recycle window (~30 days) is irrelevant at corridor level: DailyResidual uses the journey’s schedule association, not train_id.

Recovery time versus adjacent measures

Three nearby measures answer different questions:

On-time arrival is point-in-time: did this service arrive within tolerance? It does not describe how long a corridor took to return to baseline. A corridor can record several days of poor on-time figures and still recover quickly if the residual reverts.

Average lateness aggregates minutes per service across a window, mixing recovered and unrecovered days. Two corridors with identical average lateness can have different median recovery times.

Delay propagation is within-day: how a primary delay spreads to subsequent services. Recovery time runs across days and asks how many days pass before the corridor’s daily residual returns to the pre-onset baseline.

Data accumulation requirement

Reliable survival curves require at least 20 recovery events per magnitude class; below this threshold the Greenwood interval is too wide for editorial framing. Until each magnitude class clears 20 events, the public archive shows the count of observed events and the date a curve is expected to be publishable.
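The threshold counts recoveries, not incidents: censored observations stay in the risk set but are not events. A Python sketch of a per-class check, assuming a hypothetical input of (magnitude class, duration, recovered) triples; the function name is also hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MIN_EVENTS = 20  # threshold stated in the methodology


def publishable_classes(
    observations: Iterable[Tuple[str, int, bool]]
) -> Dict[str, Tuple[int, bool]]:
    """Map magnitude class -> (observed recovery events, publishable?)."""
    events: Counter = Counter()
    for magnitude_class, _duration, recovered in observations:
        # Adding 0 still registers classes that so far have only censored observations.
        events[magnitude_class] += 1 if recovered else 0
    return {cls: (n, n >= MIN_EVENTS) for cls, n in events.items()}
```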

Provisional curves below the threshold are available in licensed analytical content with a disclosure note: preliminary, wide interval, not for operational decisions.

No live Kaplan-Meier curve is yet rendered in the public archive: no magnitude class has cleared 20 events. Until then, this page is the standing methodology, and the live application is held back rather than surfaced on too little evidence. The same discipline applies to the linked data-window methodology, which sets out which event classes TRUST cannot yet observe.

Version history

Version 1.0 — April 2026. Initial publication. Kaplan-Meier estimator with Greenwood 90% interval; median recovery time per magnitude class. Applies to all corridor recovery statistics from the date of the first published curve.

Where this is implemented

Recovery::SurvivalCurve (at app/models/recovery/survival_curve.rb) carries the Kaplan-Meier estimator described above. Given a delay-event observation set it builds the at-risk and event tables (#build_observations), partitions by magnitude stratum (#compute_strata), and handles right-censoring for incidents whose observation ended before recovery was achieved (#censored). Every published recovery curve and median-recovery-time figure resolves through this class.

The Greenwood variance for the 90% interval is computed inline within the survival curve object; no separate interval-helper class is invoked. Magnitude-class boundaries follow the strata documented above and are configurable per corridor via the constructor.

Sources

Klein, J. P. & Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. Springer. Ch 1–4.