Regular Season Recap: What the Model Got Right, Wrong, and Why
760 predictions. One regular season. Here is an honest breakdown of every tier, every surprise, and what the numbers actually mean going into the playoffs.
The 2025/26 NBA regular season ended yesterday. For DataProven, that means it is time to do what we committed to doing from day one: put all the results on the table, explain what worked, and be direct about what did not.
This is not a highlight reel. It is a full audit.
The headline numbers
The platform launched on December 22 with the original XGBoost_TOP14 model, which ran for 108 games before we transitioned to the current XGBoost_TOP17_v1.4_Platt in mid-January. The Platt-scaled model — the one running today — covered 652 of the 760 total predictions.
The combined figure of 67.63% is slightly lower than the Platt model's standalone 68.56% because the TOP14 baseline period pulled it down. The current model is the better performer, and it is the one running for the playoffs.
You can track all of this live on the calibration dashboard, which shows Brier score, ECE, ROC curves, and the full confidence breakdown updated after every game.
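If you want to reproduce those two headline dashboard metrics on your own pick log, here is a minimal sketch in Python. The ten-bin ECE definition and the variable names are illustrative; the dashboard's exact binning may differ.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return np.mean((probs - outcomes) ** 2)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average of |actual win rate - stated probability| over equal-width bins."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            ece += mask.mean() * abs(outcomes[mask].mean() - probs[mask].mean())
    return ece
```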
The tier breakdown is where the real story lives
Raw accuracy is a vanity metric. We wrote about this at length in the 3-months update. What matters is whether the model's stated confidence matches what actually happens on the court. That is calibration.
Here is the full picture for the Platt model across the regular season:
| Confidence tier | Picks | Correct | Actual win rate | Model said | Gap (model − actual) |
|---|---|---|---|---|---|
| High (74%+) | 213 | 183 | 85.92% | 80.16% | −5.76 pp |
| Medium (66–74%) | 105 | 73 | 69.52% | 70.20% | +0.68 pp |
| Low (55–66%) | 219 | 138 | 63.01% | 60.25% | −2.76 pp |
| Very Low (<55%) | 115 | 53 | 46.09% | 52.47% | +6.38 pp |
The model is underconfident in two of the four tiers (High and Low), overconfident in the Very Low tier, and near-perfectly calibrated in the Medium bracket. A negative gap means the picks landed more often than the model said they would.
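If you keep your own record of the picks, the tier table above reduces to a small group-by. A sketch, assuming a pandas DataFrame with a `prob` column (the model's stated probability for its pick) and a `hit` column (1 if the pick landed); the toy data below is purely illustrative:

```python
import pandas as pd

# Illustrative pick log: 'prob' is the stated probability, 'hit' is 1 if the pick landed.
picks = pd.DataFrame({
    "prob": [0.78, 0.81, 0.69, 0.58, 0.52, 0.75, 0.63, 0.71],
    "hit":  [1,    1,    0,    1,    0,    1,    1,    0   ],
})

# Tier boundaries as published: Very Low < 55%, Low 55-66%, Medium 66-74%, High 74%+.
tiers = pd.cut(picks["prob"], bins=[0.0, 0.55, 0.66, 0.74, 1.0],
               labels=["Very Low", "Low", "Medium", "High"])

summary = picks.groupby(tiers, observed=False).agg(
    n=("hit", "size"),
    actual=("hit", "mean"),   # realised win rate in the tier
    stated=("prob", "mean"),  # average stated probability
)
summary["gap_pp"] = (summary["stated"] - summary["actual"]) * 100  # positive = overconfident
print(summary.round(3))
```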
High certainty: the platform's strongest signal
At 85.92% on 213 picks, the High certainty tier outperformed its own probability estimate by nearly 6 percentage points. When the model is confident, it delivers.
This tier is composed of games where the model outputs 74% or higher after Platt scaling. As we explained in the 3-months update, that scaling step is what transforms raw XGBoost scores into genuine probabilities. In the High tier, those probabilities turned out to be conservative — the actual win rate was higher than even the model expected.
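Platt scaling itself is just a one-dimensional logistic regression fitted on held-out games: the raw model score goes in, a calibrated probability comes out. A minimal sketch with scikit-learn follows; the toy scores and the unregularised setting are illustrative, not our exact training pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# raw_scores: uncalibrated XGBoost outputs on a held-out calibration set (illustrative values)
# outcomes:   1 if the picked side actually won
raw_scores = np.array([0.91, 0.64, 0.55, 0.83, 0.47, 0.72, 0.38, 0.69])
outcomes   = np.array([1,    1,    0,    1,    0,    1,    0,    1   ])

# Fit sigmoid(a * score + b) on the held-out set -- the Platt step.
platt = LogisticRegression(C=1e6)  # large C = effectively unregularised
platt.fit(raw_scores.reshape(-1, 1), outcomes)

# Calibrated probability for a new game's raw score.
new_score = np.array([[0.77]])
print(platt.predict_proba(new_score)[0, 1])
```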
For context: elite forecasters in tournament prediction markets typically land between 75% and 82% on their highest-confidence brackets. 85.92% over 213 live games, with real odds, is a meaningful result.
Medium certainty: textbook calibration
69.52% actual vs 70.20% model estimate — a gap of 0.68 percentage points across 105 games. This is as close to perfect calibration as you will find in sports prediction.
The Medium tier is the clearest proof that Platt scaling did its job. Before we applied it, the 66–74% range lumped strong and weak signals together. Now they are separated. When the model says 70%, it means 70%, and the season bore that out.
Low certainty: a genuine edge, modest but real
219 picks at 63.01% accuracy against a model estimate of 60.25% is a positive surprise. The model is underconfident here, too — but the gap is narrower than in the High tier, which makes sense. At the margins, probability estimation is harder.
This tier covers games where the model sees something but is appropriately cautious. The data says that caution was slightly overdone.
Very Low certainty: below coin-flip, and we said so
46.09% accuracy on 115 picks. This is the one tier that underperformed — not just the model estimate, but random chance.
This is the honest part. When the model's confidence falls below 55%, something is structurally off. The model says ~52% but outcomes resolve at 46%. Over 115 games that is not noise — it is a pattern.
We flagged this issue in the 3-months report and the diagnosis has not changed. This tier identifies games where the model's signals are genuinely weak. The right response to a Very Low pick is not to ignore it — it is to weight it accordingly, or to use it as a flag that the game is too close to call.
The fix for this tier is on the roadmap. It is not something we will paper over with a broader recalibration that masks the issue elsewhere.
Team-level patterns
The 2025–26 season confirms that predictability peaks at the league’s poles. While the "messy middle" remains a chaos engine of fluctuating effort, the model excels at tracking teams with fixed identities — whether they are elite contenders or fundamentally flawed rosters.
Easiest to predict (final 2025–26 regular season)
| Team | Accuracy | Why |
|---|---|---|
| BKN | 82.2% | Offensive & Glass Floor: Finished last in PPG (105.9), FG% (44.3%), EFF (115.4), ORtg (108.7), REB (40.4) and DREB (29.8). This "one-and-done" offensive cycle made scoring droughts extremely predictable. |
| WAS | 79.5% | Defensive Transparency: Finished 17–65 with a league-worst DRtg (122.7). High predictability in "Opponent Over" totals due to consistent lack of interior rim protection. |
| SAS | 77.8% | Systemic Stability: 62-win season with a top-3 defense. Maintained a remarkably stable +8.30 total season margin. Their predictability was anchored by a consistent +8.63 home margin, making them a "set-and-forget" favorite. |
| DET | 77.3% | Elite Rolling Form: #1 seed in the East (60-22). While elite, they were most predictable at home (+10.49 home margin) due to a disciplined defensive system. Overall, finished first in Opponent Turnover Percentage (14.8). |
| UTA | 77.3% | Defensive Volatility: Allowed a league-high 126.0 PPG. Outcomes shifted from competitive to blowouts based on roster "management" signals. |
| BOS | 75.6% | Four Factors Discipline: 56-win season driven by the league's #1 defense (107.2 PPG allowed) and #2 Off Rtg (120.8). High eFG% (55.3) and low turnover percentage (11.2) made their output highly stable. |
Conclusion: The Predictability of Extremes
San Antonio, Detroit and Utah highlight the model's versatility: DET and SAS are now elite contenders whose systemic discipline removes "luck" from the equation, while UTA is a tanking engine whose developmental rotations create readable defensive collapses.
The model handles both ends of the spectrum with high precision. Boston’s 75.6% accuracy further proves that elite efficiency is as mathematically sound as fundamental failure. Ultimately, it is the teams in the "messy middle" — lacking either the discipline of a contender or the transparent direction of a rebuilder — that cause the most trouble for predictive modeling.
Hardest to predict
| Team | Accuracy | Why |
|---|---|---|
| CHA | 53.5% | Youth Volatility: 7th youngest team (average age 24.60). Scored more on the road (116.98 PPG) than at home, creating high-variance outcomes that defied their age-based developmental baseline. |
| CHI | 55.8% | Road Defensive Collapse: Tied for 7th youngest. High-variance points-in-the-paint allowed and erratic defensive rotations made their spreads impossible to pin down. While their overall margins stayed relatively stable, they conceded a massive 124.44 PPG on the road compared with 118.59 at home. |
| MIA | 55.8% | Tactical Form Swings: Extreme scoring variance; multiple 140+ point offensive outbursts followed by sub-100 point games without warning. A high home ceiling (123.46 PPG) paired with unpredictable offensive stagnation on the road. |
| HOU | 58.7% | Selective Veteran Effort: 3rd oldest team (average age 27.39); predictability suffered from veteran rest-day disruptions and "clutch-time" performance swings. Maintained an elite home defense (107.29 PPG conceded) but bled 112.73 on the road, suggesting effort-based "coasting" that the model couldn't pin down. |
| MIN | 59.5% | Identity Shift: Near-identical margins (+3.68 vs +3.02) masked a total system flip. They transformed from a low-scoring home team (114.78 PPG) to a high-octane road engine (121.22 PPG). |
| ATL | 60.5% | Undisciplined Pace: 3rd youngest team (23.79). Like MIN, they saw a massive defensive spike on the road (118.10 conceded), making their spread stability highly unreliable. |
Charlotte, Chicago, and Miami were the three teams the model struggled with most. All three occupied the volatile 9th–12th seed range in the East, characterized by significant within-season variance. Our analysis of 5,000 NBA games shows that rolling form differentials are the dominant signal—and these teams possessed the least consistent form curves in the league. Charlotte and Chicago, both among the league's youngest rosters, were prone to youthful energy swings, while Miami exhibited a "Heat Culture" variance where tactical adjustments led to dramatic road/home performance flips.
Houston’s difficulty was unique. Despite being the 3rd oldest team in the league, they oscillated between looking like a top-tier contender and a lottery-adjacent team. The model reads form, but Houston’s unpredictable "Veteran Variance" — elite home defense (107.29) vs. road defensive bleeding (112.73) — created a noise profile typically seen in much younger teams.
How the season compares to backtest
When the current model went live in mid-January, our backtest showed 67.25% accuracy. After 652 live predictions on that model, we are sitting at 68.56%, 1.31 points above backtest. That direction is unusual. Models almost always give ground moving from historical data to live conditions.
The original 16-day report (published here) showed accuracy at 63.8%, well below backtest. We attributed that to small-sample noise and the normal friction of live deployment. That diagnosis was correct. As the sample grew and Platt scaling was applied, the model converged — and then slightly exceeded backtest performance.
We are not declaring victory. 1.31 points over backtest is within statistical noise for a dataset this size. But the direction matters: the model is not degrading, and calibration has improved significantly since January.
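To make "within statistical noise" concrete, here is a rough sanity check: a normal-approximation 95% interval around 68.56% accuracy on 652 games, using only the Python standard library. A Wilson interval would give a very similar answer.

```python
import math

n, acc, backtest = 652, 0.6856, 0.6725

se = math.sqrt(acc * (1 - acc) / n)   # standard error of the live accuracy
half_width = 1.96 * se                # ~95% normal-approximation half-width
print(f"95% CI: {acc - half_width:.4f} to {acc + half_width:.4f}")   # roughly 0.650 to 0.721
print(f"Backtest {backtest} inside interval: {backtest > acc - half_width}")
```

The interval spans roughly 65.0% to 72.1%, so the backtest figure sits comfortably inside it.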
What changes for playoffs
Playoffs are structurally different from the regular season, and we want to be upfront about what that means for the model.
The feature set is trained on regular season dynamics: back-to-backs, rest days, rolling five- and ten-game form windows, season-aggregate Four Factors. Playoff basketball compresses rotations to seven or eight players, eliminates garbage time, and introduces series-level momentum that no per-game model fully captures.
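For readers curious what "rolling form windows" look like in practice, here is a minimal sketch of that style of feature engineering on a game log. The column names, window length, and toy data are illustrative, not the production feature pipeline.

```python
import pandas as pd

# Illustrative game log: one row per team per game, sorted by date.
log = pd.DataFrame({
    "team":   ["BOS"] * 6,
    "date":   pd.to_datetime(["2026-01-02", "2026-01-03", "2026-01-05",
                              "2026-01-08", "2026-01-10", "2026-01-11"]),
    "margin": [7, -3, 12, 5, -1, 9],   # final point margin from the team's perspective
})

def add_form_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values("date").copy()
    df["rest_days"] = df["date"].diff().dt.days            # days since the previous game
    df["back_to_back"] = (df["rest_days"] == 1).astype(int)
    # Rolling form: average margin over the last 5 games, excluding the current one.
    df["form_5"] = df["margin"].shift(1).rolling(5, min_periods=1).mean()
    return df

features = log.groupby("team", group_keys=False).apply(add_form_features)
print(features)
```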
Our playoff predictions will use the same model. The tier structure and calibration will hold — but treat each pick's confidence as a starting point, not a final answer. Games 3–7 of a series carry information that a per-game model cannot see in advance.
A few things to watch:
Home court advantage stays in a familiar range at the game level. In the regular season, home teams won 56.44% of games in our prediction set. In playoff series, the higher seed (usually the team with home court) wins the series roughly 65–70% of the time, but game-level home advantage is closer to 55–58%. The model's existing signals should hold.
Short rotations help the model. Fewer players means fewer variables. Teams with stable seven-man rotations are more predictable than teams that rotate twelve guys and rest stars on back-to-backs.
Series context is everything. After a Game 1 blowout, the Game 2 dynamics shift entirely. The model will not know this unless you tell it. Use the picks as probabilistic baselines and layer in what you know about the series.
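One hedged way to "layer in what you know" without throwing the model away is to nudge the pick in log-odds space: a small, explicit adjustment you own, applied on top of the calibrated probability. This is purely an illustration of the idea, not something the model does.

```python
import math

def adjust_probability(model_prob: float, logit_shift: float) -> float:
    """Shift a calibrated probability by a subjective amount in log-odds space.

    logit_shift > 0 nudges the pick toward the model's chosen side,
    logit_shift < 0 nudges it away; +/-0.2 is a gentle nudge, +/-0.5 a strong one.
    """
    logit = math.log(model_prob / (1 - model_prob)) + logit_shift
    return 1 / (1 + math.exp(-logit))

# Example: the model says 64%, but the favourite just lost Game 1 at home
# and you want to discount the pick slightly.
print(round(adjust_probability(0.64, -0.25), 3))  # ~0.581
```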
The honest summary
68.56% overall. 85.92% on High certainty. Near-perfect calibration in the Medium tier. One underperforming tier that we have been transparent about since March. That is the season in four lines.
The goal of DataProven was never to claim 80% accuracy and hide the methodology. It was to build something that is as honest about what it does not know as about what it does. The Very Low tier delivering below-chance results is not a failure of transparency; it is what transparency looks like. The model told you it was uncertain. It was right to be uncertain.
For the playoffs, High and Medium certainty picks remain the core signal. Low certainty picks offer a modest edge. Very Low picks should be treated as a flag that the game is genuinely too close to model.
The full season ledger, per-tier breakdown, and calibration dashboard are all public at https://dataproven.bet/en/calibration. Nothing is hidden. That was the commitment from day one, and it holds going into the postseason.
Join the Journey

| Tier | Price | What you get |
|---|---|---|
| Preview | Free | See 1–2 predictions daily |
| Core | €9.90/month | Access all predictions with confidence levels |
| Insight | €24.90/month | Full methodology, advanced analytics, calibration data |
Every prediction is timestamped. Every result is tracked. Every performance metric is public.
Because in a world of 80% claims, 67.5% honesty is the competitive advantage.
Questions about methodology or the model's features? The methodology page covers the full feature set, training pipeline, and calibration approach. The calibration dashboard shows live Brier scores, ECE, and reliability diagrams updated after each game.