Forecasting methodology — seasonal-naive baseline

Last updated 29 Apr 2026

How Gauge Intelligence produces period-ahead punctuality forecasts: seasonal-naive baseline, empirical prediction intervals, and the conditions under which a forecast is published or withheld.

Every forward-looking figure published by Gauge Intelligence carries a 90% prediction interval and a disclosed baseline method. No point forecast is published alone.

Baseline method: seasonal-naive

The baseline forecast for any rail period is the seasonal-naive estimate (SNAIVE): the observed value from the same period in the previous financial year. For example, a Period 2 2026–27 forecast uses Period 2 2025–26 as the point estimate.
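
The lookup described above can be sketched in a few lines. This is a minimal illustration, not the production pipeline: the `history` mapping, its keys, and the function name `snaive_forecast` are assumptions made for the example, and the punctuality values are invented.

```python
from typing import Dict, Optional, Tuple

# Hypothetical history keyed by (financial-year start, rail period).
# Values are illustrative punctuality percentages, not published figures.
history: Dict[Tuple[int, int], float] = {
    (2025, 1): 88.4,
    (2025, 2): 90.1,
}


def snaive_forecast(
    history: Dict[Tuple[int, int], float], fy_start: int, period: int
) -> Optional[float]:
    """Return the same period's observed value from the previous financial year.

    Returns None when no prior-year observation exists, so the caller can
    withhold the forecast rather than substitute a value.
    """
    return history.get((fy_start - 1, period))


# Period 2 2026-27 takes Period 2 2025-26 as its point estimate.
print(snaive_forecast(history, 2026, 2))
# Period 3 has no prior-year value in this toy history, so nothing is returned.
print(snaive_forecast(history, 2026, 3))
```

Returning None instead of a fallback value keeps the missing-baseline case visible to whatever code decides whether a forecast is published.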

SNAIVE is deliberately simple. Its value is not that it is optimal but that it is a well-understood floor. A more complex model that cannot beat SNAIVE on a rolling held-out test has not earned its complexity (Hyndman & Athanasopoulos, Forecasting: Principles and Practice, 3rd ed., Ch 5 §5.2). Published forecasts will move to stronger methods when those methods pass that test.

Prediction intervals

The 90% prediction interval is computed from the within-period daily variance of the training period. The margin is:

margin = t(n−1, 0.95) × σ × √(1 + 1/n)

where n is the number of trading days in the training period and σ is the sample standard deviation of daily punctuality.

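The multiplier is applied as a stepped approximation to the t-quantile rather than as the exact value: 2.0 for n < 10, 1.75 for n < 20, and 1.645 (the z-value) for n ≥ 20. The steps widen the interval when the training window is short, but they are coarse: the exact quantile t(n−1, 0.95) is larger than the stepped value for very small n (about 2.13 at n = 5) and slightly larger at n = 20 (about 1.73).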

Intervals are clamped to [0%, 100%]. Punctuality is a bounded proportion.
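
The margin formula, the stepped multiplier, and the clamping can be put together as follows. This is a sketch under stated assumptions: the function names and the sample daily values are hypothetical, and punctuality is taken to be expressed in percent.

```python
import math
import statistics
from typing import Sequence, Tuple


def stepped_multiplier(n: int) -> float:
    """Stepped multiplier from the text: 2.0, 1.75, or 1.645 depending on n."""
    if n < 10:
        return 2.0
    if n < 20:
        return 1.75
    return 1.645


def prediction_interval(point: float, daily: Sequence[float]) -> Tuple[float, float]:
    """90% interval around `point`, using daily punctuality from the training period."""
    n = len(daily)
    if n < 2:
        raise ValueError("at least two daily observations are needed for a sample sd")
    sigma = statistics.stdev(daily)  # sample standard deviation (n - 1 denominator)
    margin = stepped_multiplier(n) * sigma * math.sqrt(1 + 1 / n)
    # Clamp to [0, 100] because punctuality is a bounded proportion.
    return max(0.0, point - margin), min(100.0, point + margin)


# Illustrative daily values only; n = 5, so the multiplier is 2.0.
daily = [90.0, 92.0, 88.0, 91.0, 89.0]
lower, upper = prediction_interval(90.0, daily)
print(round(lower, 2), round(upper, 2))
```

With these five values the sample variance is 2.5, so the margin works out to 2 × √(2.5 × 1.2) = 2√3, about 3.46 percentage points either side of the point estimate.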

When a forecast is published

A forecast is published only when a complete prior-year same-period training window exists. In the first year of operation for any corridor or operator, no forecast is published. The absence of a forecast is itself disclosed in the report.

This constraint is deliberate. Publishing a forecast with no prior-year data would require assumptions that cannot be validated. Stating “no comparable baseline exists yet” is more honest than manufacturing one.

Inaugural year disclosure

Where a prior-year training window exists but covers only a single period (inaugural year), the data window note discloses this explicitly. The interval will be wide; readers should treat it accordingly.

Extreme-value tail framing

Period-ahead punctuality forecasts report the expected outcome. For Schedule 8 liability planning, the tail of the delay distribution matters more: a single catastrophic incident can exceed an entire period’s expected S8 exposure. Expected-value forecasts do not characterise this risk.

Gauge Intelligence supplements period-ahead forecasts with extreme-value tail framing for incident delay-minutes. A Generalised Pareto Distribution (GPD) is fitted to the upper tail of the delay-minute distribution above a threshold chosen using the mean excess plot.
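
The mean excess plot used for threshold choice is built from a simple quantity: for each candidate threshold u, the average amount by which observations above u exceed it. The sketch below computes those points; the function name and the sample delay-minute values are assumptions for illustration. In the standard diagnostic (Coles 2001, Ch 4), a threshold is judged plausible where the plot becomes approximately linear in u, and that judgement is made by eye rather than by this code.

```python
from typing import Iterable, List, Sequence, Tuple


def mean_excess(
    delay_minutes: Sequence[float], thresholds: Iterable[float]
) -> List[Tuple[float, float, int]]:
    """Return (threshold, mean excess, number of exceedances) for each threshold.

    Thresholds with no exceedances are skipped because the mean is undefined.
    """
    rows = []
    for u in thresholds:
        excesses = [x - u for x in delay_minutes if x > u]
        if excesses:
            rows.append((u, sum(excesses) / len(excesses), len(excesses)))
    return rows


# Illustrative incident delay-minutes, not real data.
sample = [10.0, 20.0, 30.0, 40.0, 100.0]
for u, me, k in mean_excess(sample, [15.0, 35.0, 150.0]):
    print(u, me, k)
```

The exceedance count is returned alongside the mean because points based on very few exceedances are noisy and should carry little weight when reading the plot.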

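The GPD is theoretically justified by the Pickands–Balkema–de Haan theorem: for a wide class of underlying distributions (those in the domain of attraction of an extreme-value distribution), the distribution of exceedances above a threshold is increasingly well approximated by a GPD as the threshold rises. The result is asymptotic, so the quality of the approximation depends on the threshold being high enough.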

Published statistics from the tail model include the 1-in-20-period return level (the incident delay-minute level exceeded on average once in 20 periods, roughly 18 months at 13 rail periods per year) and the 1-in-50-period return level (just under four years), each with a 90% profile-likelihood interval. These figures are published alongside the per-period S8 summary as complementary risk characterisations: the period summary describes typical exposure; the return levels describe the plausible catastrophic case that dominates long-run liability.
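
Given fitted GPD parameters, a return level follows from the standard formula in Coles (2001, Ch 4): the level is the threshold plus the GPD quantile at which one exceedance is expected over the horizon. The sketch below assumes the threshold u, scale sigma, shape xi, and the average number of threshold exceedances per period are already estimated; the function name and all numeric inputs are hypothetical, and the profile-likelihood interval is not computed here.

```python
import math


def return_level(
    u: float, sigma: float, xi: float, exceedances_per_period: float, n_periods: int
) -> float:
    """Level exceeded on average once in `n_periods` under a fitted GPD tail.

    m is the expected number of threshold exceedances over the horizon.
    """
    m = n_periods * exceedances_per_period
    if m < 1:
        raise ValueError("horizon too short: fewer than one exceedance expected")
    if abs(xi) < 1e-9:
        # Limiting case xi -> 0 (exponential tail).
        return u + sigma * math.log(m)
    return u + (sigma / xi) * (m ** xi - 1)


# Hypothetical parameters for illustration only.
print(return_level(u=100.0, sigma=50.0, xi=0.5, exceedances_per_period=0.2, n_periods=20))
print(return_level(u=100.0, sigma=50.0, xi=0.0, exceedances_per_period=1.0, n_periods=20))
```

A positive shape parameter makes the return level grow as a power of the horizon rather than logarithmically, which is why the 1-in-50 figure can sit far above the 1-in-20 figure for heavy-tailed incident data.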

Source: Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer. Ch 1–4.

Forecast interval calibration

Publishing a 90% prediction interval is a probabilistic commitment: the actual outcome should fall within the stated interval approximately 90% of the time. The derivation delivers this only if its assumptions hold; empirical verification is required to confirm it in practice.

After each period closes, Gauge Intelligence checks whether the prior-period forecast interval contained the actual outcome. Empirical coverage (the proportion of closed periods where the actual fell within the stated interval) is accumulated and published alongside the current forecast. When empirical coverage diverges more than 10 percentage points from nominal coverage over a rolling 8-period window, the interval method is reviewed and recalibrated before the next publication.
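
The coverage check described above reduces to counting hits over the most recent closed periods and comparing the result with the nominal level. The following is a minimal sketch, not the code of the scheduled job: the function names, the record layout (lower, upper, actual), and the sample figures are assumptions for the example.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float, float]  # (interval lower, interval upper, actual outcome)


def empirical_coverage_pp(records: Sequence[Record], window: int = 8) -> float:
    """Percentage of the last `window` closed periods whose actual fell inside the interval."""
    recent = list(records)[-window:]
    if not recent:
        raise ValueError("no closed periods to evaluate")
    hits = sum(1 for lower, upper, actual in recent if lower <= actual <= upper)
    return 100.0 * hits / len(recent)


def needs_review(coverage_pp: float, nominal_pp: float = 90.0, tolerance_pp: float = 10.0) -> bool:
    """True when coverage diverges from nominal by more than the tolerance."""
    return abs(coverage_pp - nominal_pp) > tolerance_pp


# Illustrative records: six hits and two misses over eight periods.
records: List[Record] = [(85.0, 95.0, 90.0)] * 6 + [(85.0, 95.0, 80.0)] * 2
coverage = empirical_coverage_pp(records)
print(coverage, needs_review(coverage))
```

With an 8-period window coverage moves in steps of 12.5 percentage points, so under this rule seven or eight hits pass and six or fewer trigger a review.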

Gneiting & Raftery (2007) decompose probabilistic forecast quality into calibration (coverage) and sharpness (interval width). Calibration is the primary criterion, but it is not sufficient on its own: an interval wide enough to always contain the actual never misses, yet it over-covers relative to the 90% nominal level and is uninformative. The target is calibrated intervals that are as narrow as the data permit, verified empirically.

The Great Eastern Main Line Period 13 2025-26 period report is the standing worked example, in published form, of pairing a headline figure with its interval. The headline 81% A2F figure ships with its 90% Wilson interval 80%–83% across 2,434 traversals.

The report discloses that no period-ahead forecast is published for the corridor because Period 13 2025-26 (1–28 March 2026) is the inaugural published period and no prior-year baseline yet exists. The first-year rule under "When a forecast is published" is applied as written, not paraphrased.

Source: Gneiting, T. & Raftery, A. E. (2007). “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association, 102(477), 359–378.

Version history

Version 1.1 — April 2026. Added extreme-value tail framing (EVT/GPD) and forecast interval calibration sections.

Version 1.0 — April 2026. Initial publication. SNAIVE baseline with empirical t-interval. Applies to all period-ahead punctuality forecasts from Period 2 2026–27 onward.

Where this is implemented

The seasonal-naive forecast itself is generated by an internal pipeline (commercial-licence tier; not exposed in the public archive). What the public archive does publish is the calibration verification: empirical coverage of the 90% prediction interval against the realised next-period outcome, computed retrospectively when each period closes.

ForecastCalibrationJob (at app/jobs/forecast_calibration_job.rb) is the scheduled job that builds the calibration report. Its #build_report method computes the realised coverage and #notify_appsignal raises an alert when empirical coverage diverges from nominal by more than 10pp (see feedback_calibration_verification.md for the operating discipline). Every “forecast calibration: X% empirical vs 90% nominal” disclosure in a published report resolves through this job.

The forecast-generation entry point itself is not named publicly until the commercial-licence tier ships its own methodology page. The calibration-verification path is, by design, the only forecasting code the public archive cites — the discipline is “publish what we measure, withhold what we sell”.

Source

Hyndman & Athanasopoulos, Forecasting: Principles and Practice (3rd ed), Ch 5 §5.2 (seasonal-naive method) and §5.5 (prediction intervals).