Grey Swan / Archimedes  ·  Work in Progress  ·  April 2026

What this model does and does not claim

The honest question about any foresight model is whether it is making the world clearer or just making uncertainty look more manageable than it is. This note addresses that question directly. The methodology deserves scrutiny, and the people reading the intelligence brief deserve to know what the numbers do and do not mean.

The short answer is: not delusional, genuinely limited, worth continuing.

What the model genuinely does well

The governance architecture is the strongest element. Most scenario planning is built in workshops, refreshed irregularly, and justified by expert narrative with no audit trail. Grey Swan ties every probability move to an explicit mechanism, requires persistence across multiple readings before any change is registered, reverses prior moves when signals fade, and logs every change with its source and rationale. That is a real methodological improvement over standard practice in the field.
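That update discipline can be sketched in miniature. The class below is purely illustrative: the names (`LeverTracker`, `observe`), the two-run persistence threshold, and the single-step reversion rule are assumptions for demonstration, not Grey Swan's actual (proprietary) threshold and movement-limit parameters.

```python
from dataclasses import dataclass, field

@dataclass
class LeverTracker:
    """Illustrative sketch of a persistence / reversion / audit discipline.

    All names and thresholds here are assumptions for demonstration; the
    model's real operating parameters are not public.
    """
    probability: float
    persistence_required: int = 2            # consecutive runs a signal must hold
    log: list = field(default_factory=list)  # audit trail of applied moves
    _streak: int = 0
    _pending_delta: float = 0.0

    def observe(self, delta: float, source: str, rationale: str) -> None:
        """Register one run's signal: delta > 0 or < 0 is directional
        evidence; delta == 0 means the previously seen signal has faded."""
        same_direction = (delta != 0.0 and
                          (self._pending_delta == 0.0 or
                           (delta > 0.0) == (self._pending_delta > 0.0)))
        if same_direction:
            self._pending_delta = delta
            self._streak += 1
        else:
            # Streak broken. If the signal faded after a move was applied,
            # reverse that move (the reversion rule).
            self._streak, self._pending_delta = 0, 0.0
            if delta == 0.0 and self.log:
                self.probability -= self.log.pop()["delta"]
        if self._streak >= self.persistence_required:
            # Signal persisted across enough runs: apply and audit the move.
            self.probability += self._pending_delta
            self.log.append({"delta": self._pending_delta,
                             "source": source, "rationale": rationale})
            self._streak, self._pending_delta = 0, 0.0
```

A single reading moves nothing; a repeated reading moves the probability and writes a log entry; a faded signal undoes the move. The real model's rules are richer, but this is the shape of the discipline.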

The public-data-only constraint is a genuine discipline. The evidence tiers separate slow structural context from operational signals in a way that prevents the model from reacting to noise while remaining sensitive to genuine shifts. The six-monthly cadence enforced by the persistence requirement is correct: long enough to test whether a signal is real, short enough to remain useful for strategic decision-making.

The cross-cutting driver architecture is sound. Treating climate, geopolitical shocks, and economic conditions as forces that operate across multiple levers — rather than assigning each a standalone lever — avoids double-counting while keeping their effects visible in the data. The Economic Stress Flag, added in v11.9, was the right correction to a model originally designed in a more stable environment.

The flag system is an honest acknowledgment that the data environment itself can deteriorate. When surveillance coverage degrades or geopolitical stress suppresses institutional capacity, positive signals in exposed levers require stronger corroboration before being credited. This asymmetric logic — harder to register progress under stress, not harder to register deterioration — reflects how reform actually works in practice.

Known limitations

What the numbers actually mean

The probabilities in Grey Swan encode three things: direction (this outcome is currently more likely than that one), trend (things have moved in this direction since the last run), and distance from the boundaries of the outcome space (some outcomes are credibly near-zero under current conditions).

They do not encode frequentist probability in the sense that "if we ran this scenario 100 times, this outcome would occur 22 times." Treating them that way is a misreading of the model.

The persistence and corroboration requirements mean the model is deliberately slow to move. This is a feature rather than a bug for a six-monthly strategic instrument, but it means the numbers should not be compared directly with probabilistic forecasting systems that update continuously on high-frequency data.

The most useful way to read the outputs is comparatively and directionally: across the two scenarios, across the four outcomes, and across the three time horizons. The 2030 figures are more constrained than the 2050 figures because there is less time for compounding effects to operate in either direction. The gap between the DTR and LIR probabilities at any horizon is a reasonable indicator of how much the choice of leadership behaviour matters at that time point. The movement between runs is the most reliable signal the model produces.
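Read that way, the interesting quantity is the delta table between runs rather than either run's absolute levels. A sketch, using hypothetical figures that are not actual Grey Swan outputs (DTR and LIR as in the text):

```python
def run_movement(prev: dict, curr: dict) -> dict:
    """Per-cell movement between two runs. The sign and relative size of
    these deltas matter more than either run's absolute levels."""
    return {key: round(curr[key] - prev[key], 3) for key in prev}

# Hypothetical (scenario, horizon) probabilities for illustration only.
autumn_2025 = {("DTR", 2030): 0.22, ("LIR", 2030): 0.14}
spring_2026 = {("DTR", 2030): 0.24, ("LIR", 2030): 0.13}
movement = run_movement(autumn_2025, spring_2026)
# In this invented example the DTR-LIR gap at 2030 has widened, which on
# the reading above would mean leadership behaviour matters slightly more
# at that horizon than it did last run.
```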

Development trajectory

The model is at v11.9. The Spring 2026 run is only the second run in the model's history. Two data points do not make a trend. The value of the methodology will become clearer over multiple runs as the reversion logic, persistence requirements, and flag system are tested against actual evidence trajectories. The most important test of the model is not whether the current probability estimates are correct — they cannot be verified against future events in real time — but whether the process of updating them is disciplined, transparent, and resistant to motivated reasoning. That test is ongoing.

Item · Status · Notes
Governance architecture, persistence rules, reversion logic · Complete · Documented in WP11 and the v11.9 transfer prompt.
Three cross-cutting drivers (climate, geopolitical, economic) · Complete · Introduced in v11.9 with associated flag system.
Public-data-only constraint and Tier-1/Tier-2 evidence architecture · Complete · Operational from v11.6 onward.
Full open-source publication of threshold and movement-limit parameters · Open · Currently proprietary. Target: WP12.
Calibration against historical base rates · Open · Requires more runs and a formal calibration methodology.
Adversarial scenario testing · Open · Not yet attempted. On the roadmap for v12.
Non-OECD evidence base expansion · Open · Partial progress via global proxies; systematic coverage work outstanding.
WP12: full technical companion paper · Open · Will document operating parameters in sufficient detail for independent replication.
Sector and country-level applications · Open · Architecture supports this; no sector or country runs completed yet.

The bottom line

Grey Swan is a significant improvement over standard qualitative scenario planning. It is a genuine methodological contribution in a field where audit trails are thin, update cadences are irregular, and the link between evidence and revised judgments is opaque. The governance architecture is serious and the public-data constraint is real.

It is not yet a fully calibrated probabilistic forecasting system. The probability numbers should be read as disciplined ordinal judgments, not as frequentist likelihoods. The global scope claim outruns the evidence base in the current version. The Wealth-Diffusion Gate has a measurement lag that cannot be resolved with currently available data. The model has not been adversarially tested.

These are honest limitations, not disqualifying ones. The model is improving with each version and each run. The gap between what it claims and what it can deliver is closing. This note exists to make that gap visible rather than to pretend it is not there.
