How WetherVane Works

WetherVane discovers communities of voters who move together politically — then uses that structure to propagate new information across geography. A poll in one state updates predictions in every state that shares those communities.

The Key Insight

Most forecasting models treat counties as independent: average the polls, adjust for house effects, output a number. WetherVane starts from a different question: what is the underlying structure that makes places move together?

Thousands of places share hidden behavioral patterns. A rural evangelical county in Georgia moves with rural evangelical counties in Iowa — not because anyone coordinates, but because the same forces act on similar communities. Discovering that structure — not just reading the surface results — is what makes prediction defensible.

The model is built for readers who want to understand electoral dynamics, not just consume a top-line forecast. If you read FiveThirtyEight for the methodology write-ups, this is for you.

How It Works

Step 1: Measure Shifts

The model begins by computing how every county shifted politically across each pair of elections from 2008 to 2024. These shift vectors capture direction and magnitude — did this county swing toward Democrats or Republicans, and by how much?
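As a sketch, the shift computation is consecutive differencing of each county's two-party share (the numbers below are invented; the real model uses 2008–2024 returns):

```python
import numpy as np

# Toy two-party Dem share for 3 counties across the 2016, 2020,
# and 2024 presidential elections (invented numbers).
dem_share = np.array([
    [0.62, 0.64, 0.60],   # an urban county
    [0.41, 0.39, 0.35],   # a rural county
    [0.51, 0.53, 0.50],   # a suburban county
])

# Shift vectors: signed change between consecutive elections.
# Positive = swing toward Democrats, negative = toward Republicans.
shifts = np.diff(dem_share, axis=1)
```

Each county ends up with one shift value per election pair, and those rows are the vectors everything downstream clusters and correlates.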

Step 2: Discover Types (KMeans J=100)

KMeans clustering (J=100) groups counties with similar shift patterns into electoral types. Presidential shifts are weighted more heavily because they carry cross-state signal. Governor and Senate shifts are state-centered first (subtracting the statewide swing) so clustering captures within-state variation, not just red-state/blue-state geography.

The result is 100 fine-grained electoral types and 5 super-types (broad behavioral families), which form the colors of the stained glass map.
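A minimal scikit-learn sketch of this step on synthetic shifts, with a small J; the state-centering loop mirrors the description above, while the presidential weighting factor is an illustrative assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic shift features: 200 counties, 2 presidential and
# 2 Senate shift columns, plus a state label for centering.
pres = rng.normal(0.0, 0.05, size=(200, 2))
senate = rng.normal(0.0, 0.05, size=(200, 2))
state = rng.integers(0, 10, size=200)

# State-center the Senate shifts: subtract each state's mean swing
# so clustering sees within-state variation, not red/blue geography.
for s in np.unique(state):
    mask = state == s
    senate[mask] -= senate[mask].mean(axis=0)

# Up-weight presidential shifts (the factor of 2 is illustrative).
X = np.hstack([2.0 * pres, senate])

# J = 100 in the real model; a small J for this toy dataset.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
labels = km.labels_
```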

Step 3: Map Soft Membership

No county belongs to just one type. Each county has partial membership in multiple types, computed via temperature-scaled inverse distance in shift space (temperature T=10). A suburban Atlanta county might be 40% "College-Educated Suburban" and 30% "Black Belt & Diverse."

Soft membership reduces calibration error by ~37% compared to hard assignment. The map color reflects dominant type; predictions use the full membership vector.
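One way to compute such memberships (a sketch only; the model's exact kernel may differ) is a temperature-scaled softmax over distances to the type centers:

```python
import numpy as np

def soft_membership(x, centers, T=10.0):
    """Soft type membership for one county's shift vector x.

    Sketch: distances to each type center become weights via a
    temperature-scaled softmax. Higher T flattens the memberships;
    the real model's kernel may differ from this exact form.
    """
    d = np.linalg.norm(centers - x, axis=1)   # distance to each type
    w = np.exp(-d / T)                        # temperature scaling
    return w / w.sum()                        # normalize to sum to 1

# Three invented type centers in a 2-D shift space.
centers = np.array([[0.02, -0.01], [-0.03, 0.04], [0.00, 0.00]])
m = soft_membership(np.array([0.01, 0.00]), centers)
```

The output is a probability vector over types: the map shows its argmax, the predictions use the whole vector.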

Step 4: Estimate Covariance (Ledoit-Wolf)

Types that share electoral behavior tend to co-move. The model estimates a 100×100 covariance matrix capturing how much each pair of types correlates, using observed electoral correlation with Ledoit-Wolf regularization (validation r = 0.936).

This covariance structure encodes which types are behaviorally coupled — and therefore how information should flow between them when a new poll arrives.
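scikit-learn's LedoitWolf estimator illustrates the idea on toy data (the real matrix is 100×100, estimated from type-level shift series):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(1)

# Toy type-level shift history: 12 election observations x 6 types
# (the real model estimates a 100x100 matrix from type shift series).
obs = rng.normal(0.0, 0.03, size=(12, 6))

lw = LedoitWolf().fit(obs)
cov = lw.covariance_

# Shrinkage pulls the sample covariance toward a well-conditioned
# target, which matters when observations are scarce relative to
# the number of types.
```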

Step 5: Propagate Polls

When a new poll arrives, the model uses a Bayesian Gaussian (Kalman filter) update — exact and closed-form, no simulation needed. Multiple polls stack as independent observations. Because types cross state lines, a Florida Senate poll shifts Georgia predictions too.
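The update itself is the standard conjugate Gaussian formula. A sketch with three toy types, where a poll observes a weighted mix of type means (all weights and variances below are invented for illustration):

```python
import numpy as np

# Prior belief over 3 type-level Dem-share deviations (invented).
mu = np.zeros(3)
Sigma = np.array([[0.010, 0.006, 0.001],   # types 0 and 1 are
                  [0.006, 0.012, 0.002],   # strongly coupled
                  [0.001, 0.002, 0.009]])

# A poll observes a weighted mix of types (e.g. a state's type
# composition) with sampling variance R.
H = np.array([[0.7, 0.3, 0.0]])
y = np.array([0.04])                        # poll reads +4pp vs prior
R = np.array([[0.004]])

# Closed-form conjugate (Kalman) update: no simulation needed.
S = H @ Sigma @ H.T + R                     # innovation covariance
K = Sigma @ H.T @ np.linalg.inv(S)          # Kalman gain
mu_post = mu + (K @ (y - H @ mu)).ravel()
Sigma_post = Sigma - K @ H @ Sigma
```

Note that type 2 has zero weight in the poll, yet its posterior mean still moves: the prior covariance couples it to the polled types. That coupling is exactly how a Florida poll reaches Georgia.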

The final ensemble uses Ridge + Histogram Gradient Boosting with 43 pruned features from 8 independent data sources, achieving a leave-one-out r of 0.731.

Model Performance

All metrics are on the 2024 presidential election. LOO (leave-one-out) cross-validation excludes each county from its own type mean before predicting it — this is the honest generalization metric that cannot be inflated by self-prediction.
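The LOO type mean has a simple closed form: drop each county from its own type's sum before averaging. A toy sketch:

```python
import numpy as np

# Toy Dem shares for the member counties of one type.
shares = np.array([0.52, 0.48, 0.55, 0.50])
n, total = shares.size, shares.sum()

# Leave-one-out type mean: each county is dropped from the mean
# used to predict it, so it cannot inflate its own score.
loo_mean = (total - shares) / (n - 1)

# Naive self-including mean, for comparison.
naive_mean = np.full(n, total / n)
```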

| Metric | Value | Notes |
|---|---|---|
| LOO r (Ensemble) | 0.731 | Ridge + HGB, 43 pruned features |
| LOO r (Ridge) | 0.533 | Type scores + county mean |
| Holdout r | 0.698 | Standard hold-out validation |
| Coherence | 0.783 | Within-type political agreement |
| RMSE | 0.073 | Root mean squared error |
| Covariance val. r | 0.936 | Ledoit-Wolf regularized |
| Counties | 3,154 | All 50 states + DC |
| Types | 100 | KMeans discovered |

The standard holdout r (0.698) is inflated by ~0.22 because counties help predict their own type means. LOO r (0.731) is the correct metric for evaluating generalization. Both are reported for transparency.

Historical Accuracy

Cross-election validation tests whether the type structure discovered from historical shifts generalizes to new cycles. Across four presidential election pairs, mean LOO r = 0.476 ± 0.10.

| Election cycle | LOO r |
|---|---|
| 2008 → 2012 | 0.43 |
| 2012 → 2016 | 0.52 |
| 2016 → 2020 | 0.55 |
| 2020 → 2024 | 0.38 |

Not all cycles are equally predictable. 2020→2024 (r=0.38) was the hardest — the Harris-Trump dynamic produced unusual cross-type movement, particularly among Hispanic communities. The 2012→2016 transition (r=0.52) was most predictable: Trump's initial surge followed existing type fault lines closely.

2022 Backtest Validation

The cross-election LOO metrics above measure whether the type structure generalizes to new presidential cycles. But the harder test is this: does the model generalize to a completely unseen election type — a midterm it has never trained on?

To test this, we retrained the model from scratch after removing all 2022 data — no 2022 governor shifts, no 2022 Senate shifts, none of it. We then used the 2020 presidential Dem share as a county-level prior, and predicted 2022 outcomes purely from the structure the model had learned through 2020.

| Race | Counties | County r (presidential prior) | Type-mean baseline r | RMSE |
|---|---|---|---|---|
| Senate | 1,880 | 0.9705 | 0.6789 | 0.0401 |
| Governor | 1,900 | 0.8856 | 0.6359 | 0.0825 |

The Senate result — county r = 0.97 across 1,880 counties — is the strongest evidence that the partisan geography structure the model learns is real and durable. Correlations this high mean the model gets the relative ordering of counties nearly perfect: which ones lean Democratic, which lean Republican, and by roughly how much. It does not mean point estimates are perfect — the RMSE of ±4pp on Senate races reflects real uncertainty about national environment and candidate effects.
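A two-line demonstration of why these metrics can diverge: predictions that get every county's ordering exactly right but run uniformly 4pp high still have r = 1.0, while the RMSE is exactly the offset.

```python
import numpy as np

actual = np.array([0.35, 0.42, 0.50, 0.58, 0.66])

# Predictions with the county ordering exactly right but running
# a uniform 4pp high (say, a missed national environment shift).
pred = actual + 0.04

r = np.corrcoef(pred, actual)[0, 1]            # correlation is 1.0
rmse = np.sqrt(np.mean((pred - actual) ** 2))  # but RMSE is 0.04
```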

Governor races are harder (r = 0.89) because governor outcomes depend more heavily on candidate-specific factors and state-level dynamics that a national type model cannot fully capture. The type structure still explains 79% of the variance in county-level governor outcomes — but the remaining 21% is genuine candidate effect.

Seven competitive Senate races, never seen during training

The model predicted these outcomes using only the type structure learned from 2008–2020. Errors are predicted minus actual Democratic share (state-level, vote-weighted average across counties).

| State | Race | Predicted | Actual | Error |
|---|---|---|---|---|
| AZ | Kelly vs Masters | 50.0% | 52.5% | -2.5pp |
| GA | Warnock vs Walker | 49.9% | 50.5% | -0.6pp |
| NC | Beasley vs Budd | 49.3% | 48.4% | +0.9pp |
| NV | Cortez Masto vs Laxalt | 50.9% | 50.4% | +0.5pp |
| OH | Ryan vs Vance | 45.6% | 46.9% | -1.3pp |
| PA | Fetterman vs Oz | 50.1% | 52.5% | -2.4pp |
| WI | Barnes vs Johnson | 50.3% | 49.5% | +0.8pp |

All seven races land within ±2.5pp, and five of the seven within ±1.5pp. The type-mean baseline (r = 0.68) predicts these same races with errors up to ±17pp, showing that the presidential-prior enrichment is doing substantial work beyond just knowing which communities lean which way.
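For reference, the state-level numbers here aggregate counties by vote weight; a toy sketch of that aggregation (invented county values):

```python
import numpy as np

# Invented counties in one state: predicted Dem share, total votes.
pred_share = np.array([0.62, 0.45, 0.51])
votes = np.array([400_000, 150_000, 250_000])

# State-level prediction: vote-weighted average across counties,
# so large counties move the statewide number more.
state_pred = np.average(pred_share, weights=votes)
```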

Extended multi-election backtest

Beyond the 2022 holdout, the model has been validated across 11 elections spanning presidential (2008–2020), Senate (2014–2022), and governor (2018, 2022) cycles using year-adaptive Ridge priors. The combined backtest achieves r = 0.939 with direction accuracy of 88–100% across all elections. The model shows an expected temporal gradient — stronger on recent elections where the political geography more closely matches the training era — but maintains predictive power even for elections 16 years in the past.

What Makes This Different

  • Structure from behavior, not demographics. Types are discovered from how places shift electorally. Demographics describe the types after discovery — they do not define them. This avoids baking in assumptions about which demographic groups drive politics.
  • Cross-state information sharing. Because types cross state lines, a poll in one state informs predictions in another. Most models treat states as independent. WetherVane treats the country as one connected landscape.
  • Full uncertainty quantification. Every prediction comes with 90% credible intervals. Intervals widen where the model has less data and tighten where type signals are strong.
  • Transparent and interpretable. Every prediction traces back to specific types, their shift patterns, and the polls that influenced them. Not a black box — inspect it on the map.
  • Free data only. No proprietary datasets, no paid subscriptions. Every source listed below is publicly available, making the model fully reproducible.

Data Sources

WetherVane uses exclusively free, public data. No proprietary datasets or paid subscriptions.

| Dataset | Source |
|---|---|
| Election returns | MIT Election Data & Science Lab (MEDSL) |
| Demographics | US Census Bureau — Decennial 2000/2010/2020 + ACS 5-year |
| Religious congregations | ARDA — Religious Congregations & Membership Study (RCMS 2020) |
| Industry composition | BLS Quarterly Census of Employment and Wages (QCEW) |
| Health behaviors | County Health Rankings (Robert Wood Johnson Foundation) |
| Migration flows | IRS Statistics of Income (SOI) — county-to-county migration |
| Social connectivity | Facebook Social Connectedness Index (county-pair network) |
| Broadband access | FCC / ACS — internet subscription at county level |
| Polling data | FiveThirtyEight archives + Silver Bulletin pollster ratings |
| Governor returns | Algara & Amlani (Harvard Dataverse) — 2002-2022 governor |

Current Status

WetherVane is in active development, targeting the 2026 midterm elections. The model currently covers all 50 states and DC, tracking 33 Senate races.

The poll scraper runs weekly, ingesting new polls and updating race forecasts automatically. Individual race forecasts are available on the forecast page.

  • 3,154 counties
  • 100 electoral types
  • 5 super-types
  • 33 races tracked

Planned improvements: BEA regional economic data, FEC donor density features, richer poll ingestion with crosstab disaggregation — crosstabs tell us which types were sampled, so a poll oversampling college-educated voters should pull harder on types with high college-educated membership.

Credits

Built by Hayden Haines.

Methodology inspired by The Economist's 2020 presidential model (Heidemanns, Gelman & Morris). Type-covariance architecture adapted to shift-based community discovery.

Election return data from MIT MEDSL and Algara & Amlani (Harvard Dataverse). All other data sources are listed above.