Brier score is the standard accuracy and calibration metric used in prediction markets and forecasting. PropsBot’s MLB model posts a Brier of 0.1903 against the Vegas closing-line baseline of 0.1947 across 101,881 logged props. That gap means our model is better calibrated than the consensus market. The NHL model is also under the closing-line baseline. NBA and NFL aren’t there yet, and we say so.
What is a Brier score?
A Brier score is the mean squared error of your predicted probabilities versus actual outcomes. You take every prediction you made, square the difference between the probability you assigned and what happened (1 for hit, 0 for miss), and average them.
The formula is straightforward:
Brier = (1/N) × Σ (predicted_probability − actual_outcome)²
The range is 0 to 1. Lower is better. A Brier of 0 means perfect predictions. A Brier of 0.25 is what you get from always guessing 50% on a 50/50 market. On that kind of market, anything above 0.25 means your model is doing worse than a coin flip.
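The definition above fits in a few lines of Python. A minimal sketch:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Perfect predictions score 0.
print(brier_score([1.0, 0.0, 1.0], [1, 0, 1]))            # 0.0

# Always guessing 50% on a 50/50 market scores exactly 0.25.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))    # 0.25
```

Anything your model scores between those two bounds is a question of how much better than a blind coin flip it is.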
For sports betting, the benchmark isn’t random guessing. The benchmark is the Vegas closing line, the price the market settles on right before the game starts. That’s the most efficient sports betting price, set by the consensus of every sharp bettor and every bookmaker adjustment up to that point. If your model produces a lower Brier than the closing line on a large sample, you have real edge. If it doesn’t, you don’t.
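Grading against the closing line means turning closing prices into probabilities first. Here is a sketch using the common multiplicative de-vig on a two-way market; the odds, function names, and approach are illustrative, not a description of PropsBot's actual pipeline:

```python
def implied_prob(american_odds):
    """Convert an American price to its raw implied probability (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def devig_two_way(over_odds, under_odds):
    """Normalize a two-way market so the two probabilities sum to 1."""
    p_over = implied_prob(over_odds)
    p_under = implied_prob(under_odds)
    total = p_over + p_under          # > 1 because of the book's vig
    return p_over / total, p_under / total

# Example: a prop closes -120 over / -110 under.
p_over, p_under = devig_two_way(-120, -110)
```

Feed those de-vigged closing probabilities through the same Brier formula as the model's probabilities and you get the closing-line baseline to beat.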
Why Brier matters more than win rate
Win rate alone hides too much. A model can hit 60% on its picks and still be a money-loser if it's overconfident at the prices it takes, or a money-maker that's leaving profit behind because it's wildly underconfident at the margin. Win rate doesn't tell you which.
Brier penalizes both overconfidence and underconfidence. If you say a player has a 90% chance to go over and they hit 60% of the time, your Brier explodes. If you say 55% and they hit 80%, you also get punished. You left money on the table by being too cautious.
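Both failure modes can be checked directly with the formula. The numbers below are illustrative toy cohorts, not PropsBot data:

```python
def brier(probs, outcomes):
    """Mean squared error of predicted probabilities vs 0/1 results."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Case 1: claim 90% on props that only hit 60% of the time.
hits_60 = [1] * 6 + [0] * 4
overconfident = brier([0.90] * 10, hits_60)   # 0.33 -- worse than a coin flip
calibrated_60 = brier([0.60] * 10, hits_60)   # 0.24 -- the honest score

# Case 2: claim 55% on props that actually hit 80% of the time.
hits_80 = [1] * 8 + [0] * 2
cautious = brier([0.55] * 10, hits_80)        # 0.2225 -- still penalized
sharp    = brier([0.80] * 10, hits_80)        # 0.16   -- rewarded
```

The overconfident model scores worse than random guessing even though it "wins" 60% of the time, and the cautious model pays for every percentage point of edge it refused to claim.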
This is why FiveThirtyEight grades election forecasts with Brier. Why weather services use it for precipitation calls. Why any serious quant team running a probability model reports it. A win rate is a marketing number. A Brier score is a math number.
For player props, Brier is the right metric because the market is already a probability problem. Every prop line implies a probability, every model spits out a probability, and the only honest way to grade the model is to compare those probabilities to the result. More on how the closing line factors into model grading in our closing-line value glossary entry.
PropsBot’s Brier numbers, sport by sport
We publish the numbers we have. We don’t publish numbers we don’t have.
| Sport | PropsBot Brier | Vegas Closing-Line Brier | Brier Gap (PropsBot − Vegas) |
|---|---|---|---|
| MLB | 0.1903 | 0.1947 | −0.0044 (better) |
| NHL | Below baseline | Closing-line reference | Beating Vegas |
| NBA | Under analysis | Closing-line reference | Not yet beating Vegas |
| NFL | Under analysis | Closing-line reference | Not yet beating Vegas |
The MLB number is the cleanest, longest-running, and largest sample. NHL beats the closing-line baseline on the current sample size. NBA and NFL aren’t there yet, and pretending otherwise would defeat the whole point of publishing this metric. The exact NHL Brier and the rolling sport-by-sport figures live on /performance-methodology/, updated as the season runs.
How we measure Brier on player props
Every pick our model produces is logged with a timestamp at publication, the line at the time of pick, the closing line, the predicted probability, and the result. Brier is calculated over that ledger.
Sample size for the MLB number is 101,881 props. That’s the High ROI Signal cohort we publish on /mlb-player-props/ and across the public ledger. Every one of those picks has a stake, a timestamp, an opening line, a closing line, and a graded outcome. The Brier is the rolling calibration score across that whole sample.
We don’t cherry-pick windows. The 101,881 number is everything. If we had a bad week, it’s in there. If we had a hot streak, that’s in there too. That’s what makes the Brier credible.
The other thing that matters: the predicted probability is locked at the time the pick is generated, before the game starts. We don’t grade against a number we tweaked after the fact. The pick goes out, the game plays out, the result is graded, and the squared error gets folded into the rolling Brier. Every step is logged with a timestamp you can audit on the dashboard.
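The publish-then-grade flow described above can be sketched as a small append-only ledger. The class, field names, and pick IDs here are hypothetical illustrations, not PropsBot's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PickLedger:
    """Append-only log: probability locked at publication, graded after the game."""
    picks: list = field(default_factory=list)

    def publish(self, pick_id, prob):
        self.picks.append({
            "id": pick_id,
            "prob": prob,                                  # locked now, never edited
            "published_at": datetime.now(timezone.utc).isoformat(),
            "outcome": None,                               # graded later
        })

    def grade(self, pick_id, outcome):
        for pick in self.picks:
            if pick["id"] == pick_id:
                pick["outcome"] = outcome                  # 1 for hit, 0 for miss

    def rolling_brier(self):
        graded = [p for p in self.picks if p["outcome"] is not None]
        return sum((p["prob"] - p["outcome"]) ** 2 for p in graded) / len(graded)

ledger = PickLedger()
ledger.publish("mlb-0001", 0.70)
ledger.publish("mlb-0002", 0.40)
ledger.grade("mlb-0001", 1)            # hit
ledger.grade("mlb-0002", 0)            # miss
print(ledger.rolling_brier())          # (0.30² + 0.40²) / 2 ≈ 0.125
```

The key property is that `prob` is written once at publication and never touched again; grading only ever fills in `outcome`.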
What a Brier of 0.1903 vs 0.1947 means in dollar terms
A Brier gap of 0.0044 against Vegas sounds tiny. Sustained over 100,000+ picks, it's enormous.
That same cohort that produces the 0.1903 Brier produces the 31.7% ROI we publish on the High ROI Signal. The math lines up because it has to. A model better calibrated than the closing line will, over enough volume, find prices the market mispriced often enough to grind out edge. Volume turns 0.0044 of calibration edge into +60,000 units of tracked profit and a 32% High ROI Hit Rate on the picks the model flagged hardest.
Put differently: the Brier gap is the math. The 31.7% ROI is the receipt for the math. Either one without the other is half a story.
The 82.6% win rate on 136,953 high-confidence picks is the same story from a different angle. That cohort filters for picks where the model’s edge translates into a high hit rate rather than a fat ROI per pick. Both cohorts share the underlying calibration. You can’t post a 0.1903 Brier and a 32% High ROI Hit Rate on the High ROI Signal by accident.
What Vegas’s Brier of 0.1947 represents
The 0.1947 number isn’t arbitrary. It’s the calibration of the consensus closing market itself, the line every sharp bettor and every bookmaker adjustment converges to right before lock. Most quant work treats that price as the most efficient available signal in sports betting.
When a model beats it on a large enough sample, you’re seeing structural edge. Not a hot week. Not a small-sample fluke. A real disagreement with the consensus market that the consensus market loses, repeatedly, in a way you can audit. That’s the whole reason the Brier comparison matters.
Why most “AI prop tools” don’t publish a Brier score
Brier is the metric that exposes overconfident models. It’s also the metric that’s hardest to fake.
A tool can claim a 90% win rate and still post a Brier of 0.22, which would mean the model is worse than Vegas despite the marketing copy. Most prop tools don’t publish their Brier because once you do, the math is right there. You can’t massage it. You can’t rotate cohorts. You can’t pick a hot week.
That’s why we publish ours. It’s the strongest claim a probability model can make about itself, and it forces honesty on every other claim. The 31.7% ROI gets challenged constantly online. Then people see the Brier and realize the calibration math forces the ROI to exist.
How to verify our Brier numbers
Three places. All public.
The full performance breakdown lives at /performance-methodology/. That’s the public ledger: every cohort, every sport, the methodology, the sample sizes, the rolling Brier by sport.
Our /about/ page covers the founder and team, the modeling background, and the auditability principle. If you care about who’s behind the numbers, start there.
The live data dashboard sits at dashboard.propsbot.ai, where today’s picks are visible with the same Confidence and Edge values paid users see. Run them against the ledger for a week before you decide whether the math holds up. Related concepts like positive-EV props explain how the calibration edge converts into bettable plays slate by slate.
FAQ
What is a Brier score in sports betting? A Brier score is the mean squared error of predicted probabilities versus actual outcomes. It grades how well-calibrated a probability model is. Range is 0 to 1, lower is better, and the Vegas closing-line Brier is the gold-standard baseline.
How is a Brier score calculated? Take every prediction your model made, square the difference between the probability you assigned and the actual outcome (1 for hit, 0 for miss), and average across all predictions. Formula: Brier = (1/N) × Σ (predicted − actual)².
What’s a good Brier score for sports betting? Anything below the Vegas closing-line Brier on the same market and sample. For MLB player props, that closing-line baseline is roughly 0.1947 in our data. PropsBot posts 0.1903 across 101,881 props, which is real edge.
Do other AI prop tools publish a Brier score? Most don’t. It’s the metric that exposes overconfident models, so tools that claim a high win rate without publishing Brier are usually hiding overconfidence. PropsBot publishes it as a transparency check.
Can I see PropsBot’s raw Brier data? Yes. The public ledger at /performance-methodology/ has the cohort breakdown and methodology. The live dashboard at dashboard.propsbot.ai shows current picks. Every pick that produced the 0.1903 number is timestamped and visible.
Is a Brier of 0.1903 vs 0.1947 a meaningful edge? On 101,881 props, yes. The gap is small per pick and enormous in aggregate. That same sample produces the 31.7% ROI we publish on the High ROI Signal. Calibration edge that consistent over six-figure volume is what generates +60,000 units of tracked profit.
Does Brier score apply to all sports? Yes, anywhere you can express predictions as probabilities. PropsBot’s MLB model beats the Vegas closing-line Brier. NHL also beats the baseline. NBA and NFL aren’t there yet on current samples, and we publish that honestly rather than spinning it.
Where can I track PropsBot’s Brier over time? The rolling Brier by sport lives on /performance-methodology/, updated as each season runs. Specific sport pages like /mlb-player-props/ and /nhl-player-props/ carry the relevant cohort numbers.
Bottom line
Brier score is how forecasters separate models that work from models that just sound confident. PropsBot publishes a Brier of 0.1903 against the Vegas closing-line baseline of 0.1947 in MLB across 101,881 props. NHL also beats the baseline. The math is auditable, the cohort is huge, and every pick is timestamped.
If a prop tool claims accuracy and won’t publish a Brier, you already know the answer.
Try it free at propsbot.ai. Run it against the public ledger before you pay anything.