Regular Season Recap: What the Model Got Right, Wrong, and Why
760 predictions. One regular season. Here is an honest breakdown of every tier, every surprise, and what the numbers actually mean going into the playoffs.
The 2025/26 NBA regular season ended yesterday. For DataProven, that means it is time to do what we committed to doing from day one: put all the results on the table, explain what worked, and be direct about what did not.
This is not a highlight reel. It is a full audit.
The headline numbers
The platform launched on December 22 with the original XGBoost_TOP14 model, which ran for 108 games before we transitioned to the current XGBoost_TOP17_v1.4_Platt in mid-January. The Platt-scaled model — the one running today — covered 652 of the 760 total predictions.
The combined figure of 67.63% is slightly lower than the Platt model's standalone 68.56% because the TOP14 baseline period pulled it down. The current model is the better performer, and it is the one running for the playoffs.
You can track all of this live on the calibration dashboard, which shows Brier score, ECE, ROC curves, and the full confidence breakdown updated after every game.
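If you want to reproduce those two headline dashboard metrics on your own pick log, here is a minimal sketch in Python. The ten-bin ECE definition and the variable names are illustrative; the dashboard's exact binning may differ.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return np.mean((probs - outcomes) ** 2)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average of |actual win rate - stated probability| over equal-width bins."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            ece += mask.mean() * abs(outcomes[mask].mean() - probs[mask].mean())
    return ece
```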
The tier breakdown is where the real story lives
Raw accuracy is a vanity metric. We wrote about this at length in the 3-months update. What matters is whether the model's stated confidence matches what actually happens on the court. That is calibration.
Here is the full picture for the Platt model across the regular season:
| Confidence tier | Picks | Correct | Actual win rate | Model said | Gap (model − actual) |
|---|---|---|---|---|---|
| High (74%+) | 213 | 183 | 85.92% | 80.16% | −5.76 pp |
| Medium (66–74%) | 105 | 73 | 69.52% | 70.20% | +0.68 pp |
| Low (55–66%) | 219 | 138 | 63.01% | 60.25% | −2.76 pp |
| Very Low (<55%) | 115 | 53 | 46.09% | 52.47% | +6.38 pp |
The model is underconfident in two of the four tiers (High and Low), overconfident in the Very Low tier, and near-perfectly calibrated in the Medium bracket. A negative gap means the picks landed more often than the model said they would.
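If you keep your own record of the picks, the tier table above reduces to a small group-by. A sketch, assuming a pandas DataFrame with a `prob` column (the model's stated probability for its pick) and a `hit` column (1 if the pick landed); the toy data below is purely illustrative:

```python
import pandas as pd

# Illustrative pick log: 'prob' is the stated probability, 'hit' is 1 if the pick landed.
picks = pd.DataFrame({
    "prob": [0.78, 0.81, 0.69, 0.58, 0.52, 0.75, 0.63, 0.71],
    "hit":  [1,    1,    0,    1,    0,    1,    1,    0   ],
})

# Tier boundaries as published: Very Low < 55%, Low 55-66%, Medium 66-74%, High 74%+.
tiers = pd.cut(picks["prob"], bins=[0.0, 0.55, 0.66, 0.74, 1.0],
               labels=["Very Low", "Low", "Medium", "High"])

summary = picks.groupby(tiers, observed=False).agg(
    n=("hit", "size"),
    actual=("hit", "mean"),   # realised win rate in the tier
    stated=("prob", "mean"),  # average stated probability
)
summary["gap_pp"] = (summary["stated"] - summary["actual"]) * 100  # positive = overconfident
print(summary.round(3))
```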
High certainty: the platform's strongest signal
At 85.92% on 213 picks, the High certainty tier outperformed its own probability estimate by nearly 6 percentage points. When the model is confident, it delivers.
This tier is composed of games where the model outputs 74% or higher after Platt scaling. As we explained in the 3-months update, that scaling step is what transforms raw XGBoost scores into genuine probabilities. In the High tier, those probabilities turned out to be conservative — the actual win rate was higher than even the model expected.
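Platt scaling itself is just a one-dimensional logistic regression fitted on held-out games: the raw model score goes in, a calibrated probability comes out. A minimal sketch with scikit-learn follows; the toy scores and the unregularised setting are illustrative, not our exact training pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# raw_scores: uncalibrated XGBoost outputs on a held-out calibration set (illustrative values)
# outcomes:   1 if the picked side actually won
raw_scores = np.array([0.91, 0.64, 0.55, 0.83, 0.47, 0.72, 0.38, 0.69])
outcomes   = np.array([1,    1,    0,    1,    0,    1,    0,    1   ])

# Fit sigmoid(a * score + b) on the held-out set -- the Platt step.
platt = LogisticRegression(C=1e6)  # large C = effectively unregularised
platt.fit(raw_scores.reshape(-1, 1), outcomes)

# Calibrated probability for a new game's raw score.
new_score = np.array([[0.77]])
print(platt.predict_proba(new_score)[0, 1])
```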
For context: elite forecasters in tournament prediction markets typically land between 75% and 82% on their highest-confidence brackets. 85.92% over 213 live games, with real odds, is a meaningful result.
Medium certainty: textbook calibration
69.52% actual vs 70.20% model estimate — a gap of 0.68 percentage points across 105 games. This is as close to perfect calibration as you will find in sports prediction.
The Medium tier is the clearest proof that Platt scaling did its job. Before we applied it, the 66–74% range lumped strong and weak signals together. Now they are separated. When the model says 70%, it means 70%, and the season bore that out.
Low certainty: a genuine edge, modest but real
219 picks at 63.01% accuracy against a model estimate of 60.25% is a positive surprise. The model is underconfident here, too — but the gap is narrower than in the High tier, which makes sense. At the margins, probability estimation is harder.
This tier covers games where the model sees something but is appropriately cautious. The data says that caution was slightly overdone.
Very Low certainty: below coin-flip, and we said so
46.09% accuracy on 115 picks. This is the one tier that underperformed — not just the model estimate, but random chance.
This is the honest part. When the model's confidence falls below 55%, something is structurally off. The model says ~52% but outcomes resolve at 46%. Over 115 games that is not noise — it is a pattern.
We flagged this issue in the 3-months report and the diagnosis has not changed. This tier identifies games where the model's signals are genuinely weak. The right response to a Very Low pick is not to ignore it — it is to weight it accordingly, or to use it as a flag that the game is too close to call.
The fix for this tier is on the roadmap. It is not something we will paper over with a broader recalibration that masks the issue elsewhere.
Team-level patterns
The 2025–26 season confirms that predictability peaks at the league’s poles. While the "messy middle" remains a chaos engine of fluctuating effort, the model excels at tracking teams with fixed identities — whether they are elite contenders or fundamentally flawed rosters.
Easiest to predict (final 2025–26 regular season)
| Team | Accuracy | Why |
|---|---|---|
| BKN | 82.2% | Offensive & Glass Floor: Finished last in PPG (105.9), FG% (44.3%), EFF (115.4), ORtg (108.7), REB (40.4) and DREB (29.8). This "one-and-done" offensive cycle made scoring droughts extremely predictable. |
| WAS | 79.5% | Defensive Transparency: Finished 17–65 with a league-worst DRtg (122.7). High predictability in "Opponent Over" totals due to consistent lack of interior rim protection. |
| SAS | 77.8% | Systemic Stability: 62-win season with a top-3 defense. Maintained a remarkably stable +8.30 total season margin. Their predictability was anchored by a consistent +8.63 home margin, making them a "set-and-forget" favorite. |
| DET | 77.3% | Elite Rolling Form: #1 seed in the East (60-22). While elite, they were most predictable at home (+10.49 home margin) due to a disciplined defensive system. Overall, finished first in Opponent Turnover Percentage (14.8). |
| UTA | 77.3% | Defensive Volatility: Allowed a league-high 126.0 PPG. Outcomes shifted from competitive to blowouts based on roster "management" signals. |
| BOS | 75.6% | Four Factors Discipline: 56-win season driven by the league's #1 defense (107.2 PPG allowed) and #2 Off Rtg (120.8). High eFG% (55.3) and low turnover percentage (11.2) made their output highly stable. |
Conclusion: The Predictability of Extremes
San Antonio, Detroit and Utah highlight the model's versatility: DET and SAS are now elite contenders whose systemic discipline removes "luck" from the equation, while UTA is a tanking engine whose developmental rotations create readable defensive collapses.
The model handles both ends of the spectrum with high precision. Boston’s 75.6% accuracy further proves that elite efficiency is as mathematically sound as fundamental failure. Ultimately, it is the teams in the "messy middle" — lacking either the discipline of a contender or the transparent direction of a rebuilder — that cause the most trouble for predictive modeling.
Hardest to predict
| Team | Accuracy | Why |
|---|---|---|
| CHA | 53.5% | Youth Volatility: 7th youngest team (average age 24.60). Scored more on the road (116.98 PPG) than at home, creating high-variance outcomes that defied their age-based developmental baseline. |
| CHI | 55.8% | Road Defensive Collapse: Tied for 7th youngest. High-variance points-in-the-paint allowed and erratic defensive rotations made their spreads impossible to pin down. While their overall margins stayed relatively stable, they conceded a massive 124.44 PPG on the road compared with 118.59 at home. |
| MIA | 55.8% | Tactical Form Swings: Extreme scoring variance; multiple 140+ point offensive outbursts followed by sub-100 point games without warning. A high home ceiling (123.46 PPG) paired with unpredictable offensive stagnation on the road. |
| HOU | 58.7% | Selective Veteran Effort: 3rd oldest team (average age 27.39); predictability suffered from veteran rest-day disruptions and "clutch-time" performance swings. Maintained an elite home defense (107.29 PPG conceded) but bled 112.73 on the road, suggesting effort-based "coasting" that the model couldn't pin down. |
| MIN | 59.5% | Identity Shift: Near-identical margins (+3.68 vs +3.02) masked a total system flip. They transformed from a low-scoring home team (114.78 PPG) to a high-octane road engine (121.22 PPG). |
| ATL | 60.5% | Undisciplined Pace: 3rd youngest team (23.79). Like MIN, they saw a massive defensive spike on the road (118.10 conceded), making their spread stability highly unreliable. |
Charlotte, Chicago, and Miami were the three teams the model struggled with most. All three occupied the volatile 9th–12th seed range in the East, characterized by significant within-season variance. Our analysis of 5,000 NBA games shows that rolling form differentials are the dominant signal—and these teams possessed the least consistent form curves in the league. Charlotte and Chicago, both among the league's youngest rosters, were prone to youthful energy swings, while Miami exhibited a "Heat Culture" variance where tactical adjustments led to dramatic road/home performance flips.
Houston’s difficulty was unique. Despite being the 3rd oldest team in the league, they oscillated between looking like a top-tier contender and a lottery-adjacent team. The model reads form, but Houston’s unpredictable "Veteran Variance" — elite home defense (107.29) vs. road defensive bleeding (112.73) — created a noise profile typically seen in much younger teams.
How the season compares to backtest
When the current model went live in mid-January, our backtest showed 67.25% accuracy. After 652 live predictions on that model, we are sitting at 68.56%, 1.31 points above backtest. That direction is unusual. Models almost always give ground moving from historical data to live conditions.
The original 16-day report (published here) showed accuracy at 63.8%, well below backtest. We attributed that to small-sample noise and the normal friction of live deployment. That diagnosis was correct. As the sample grew and Platt scaling was applied, the model converged — and then slightly exceeded backtest performance.
We are not declaring victory. 1.31 points over backtest is within statistical noise for a dataset this size. But the direction matters: the model is not degrading, and calibration has improved significantly since January.
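To make "within statistical noise" concrete, here is a rough sanity check: a normal-approximation 95% interval around 68.56% accuracy on 652 games, using only the Python standard library. A Wilson interval would give a very similar answer.

```python
import math

n, acc, backtest = 652, 0.6856, 0.6725

se = math.sqrt(acc * (1 - acc) / n)   # standard error of the live accuracy
half_width = 1.96 * se                # ~95% normal-approximation half-width
print(f"95% CI: {acc - half_width:.4f} to {acc + half_width:.4f}")   # roughly 0.650 to 0.721
print(f"Backtest {backtest} inside interval: {backtest > acc - half_width}")
```

The interval spans roughly 65.0% to 72.1%, so the backtest figure sits comfortably inside it.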
What changes for playoffs
Playoffs are structurally different from the regular season, and we want to be upfront about what that means for the model.
The feature set is trained on regular season dynamics: back-to-backs, rest days, rolling five- and ten-game form windows, season-aggregate Four Factors. Playoff basketball compresses rotations to seven or eight players, eliminates garbage time, and introduces series-level momentum that no per-game model fully captures.
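For readers curious what "rolling form windows" look like in practice, here is a minimal sketch of that style of feature engineering on a game log. The column names, window length, and toy data are illustrative, not the production feature pipeline.

```python
import pandas as pd

# Illustrative game log: one row per team per game, sorted by date.
log = pd.DataFrame({
    "team":   ["BOS"] * 6,
    "date":   pd.to_datetime(["2026-01-02", "2026-01-03", "2026-01-05",
                              "2026-01-08", "2026-01-10", "2026-01-11"]),
    "margin": [7, -3, 12, 5, -1, 9],   # final point margin from the team's perspective
})

def add_form_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values("date").copy()
    df["rest_days"] = df["date"].diff().dt.days            # days since the previous game
    df["back_to_back"] = (df["rest_days"] == 1).astype(int)
    # Rolling form: average margin over the last 5 games, excluding the current one.
    df["form_5"] = df["margin"].shift(1).rolling(5, min_periods=1).mean()
    return df

features = log.groupby("team", group_keys=False).apply(add_form_features)
print(features)
```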
Our playoff predictions will use the same model. The tier structure and calibration will hold — but treat each pick's confidence as a starting point, not a final answer. Games 3–7 of a series carry information that a per-game model cannot see in advance.
A few things to watch:
Home court advantage stays in a familiar range at the game level. In the regular season, home teams won 56.44% of games in our prediction set. In playoff series, the higher seed (usually the team with home court) wins the series roughly 65–70% of the time, but game-level home advantage is closer to 55–58%. The model's existing signals should hold.
Short rotations help the model. Fewer players means fewer variables. Teams with stable seven-man rotations are more predictable than teams that rotate twelve guys and rest stars on back-to-backs.
Series context is everything. After a Game 1 blowout, the Game 2 dynamics shift entirely. The model will not know this unless you tell it. Use the picks as probabilistic baselines and layer in what you know about the series.
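One hedged way to "layer in what you know" without throwing the model away is to nudge the pick in log-odds space: a small, explicit adjustment you own, applied on top of the calibrated probability. This is purely an illustration of the idea, not something the model does.

```python
import math

def adjust_probability(model_prob: float, logit_shift: float) -> float:
    """Shift a calibrated probability by a subjective amount in log-odds space.

    logit_shift > 0 nudges the pick toward the model's chosen side,
    logit_shift < 0 nudges it away; +/-0.2 is a gentle nudge, +/-0.5 a strong one.
    """
    logit = math.log(model_prob / (1 - model_prob)) + logit_shift
    return 1 / (1 + math.exp(-logit))

# Example: the model says 64%, but the favourite just lost Game 1 at home
# and you want to discount the pick slightly.
print(round(adjust_probability(0.64, -0.25), 3))  # ~0.581
```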
The honest summary
68.56% overall. 85.92% on High certainty. Near-perfect calibration in the Medium tier. One underperforming tier that we have been transparent about since March. That is the season in four lines.
The goal of DataProven was never to claim 80% accuracy and hide the methodology. It was to build something that is as honest about what it does not know as about what it does. The Very Low tier delivering below-chance results is not a failure of transparency; it is what transparency looks like. The model told you it was uncertain. It was right to be uncertain.
For the playoffs, High and Medium certainty picks remain the core signal. Low certainty picks offer a modest edge. Very Low picks should be treated as a flag that the game is genuinely too close to model.
The full season ledger, per-tier breakdown, and calibration dashboard are all public at https://dataproven.bet/en/calibration. Nothing is hidden. That was the commitment from day one, and it holds going into the postseason.
Join the Journey

| Tier | Price | What you get |
|---|---|---|
| Preview | Free | See 1–2 predictions daily |
| Core | €9.90/month | Access all predictions with confidence levels |
| Insight | €24.90/month | Full methodology, advanced analytics, calibration data |
Every prediction is timestamped. Every result is tracked. Every performance metric is public.
Because in a world of 80% claims, 67.5% honesty is the competitive advantage.
Questions about methodology or the model's features? The methodology page covers the full feature set, training pipeline, and calibration approach. The calibration dashboard shows live Brier scores, ECE, and reliability diagrams updated after each game.