G’s Exploration or Is the 2020 Formula Actually Better?

Greetings and happy Saturday. There are no football games today, but college football is still a hot topic. Today, I’d like to analyze the new formula for 2020. Late in the season, really during the beginning of the bowl games, I recognized an anomaly. I ran some what-if scenarios and the results, quite frankly, made no sense. Specifically, while Clemson was a good team, the formula had them remaining in the top spot even if they lost the semifinal by 50 and Ohio State lost the championship by 50. At the same time, 4-loss Auburn, a good team with a seriously tough schedule (see the next blog for the top 25 lists in different categories), showed up in the top 4. Something was clearly off.

I wish I could say the GCR formula is easy to decipher. Let me be clear, I REALLY wish the formula were easier to decipher and debug. I finally traced the issue to a formula created when I added the ability for teams to play twice (a complete rewrite). The issue, to be succinct, was that wins counted more than once. That’s not exactly it, but there was double counting going on. When I corrected the formula, things made more sense.

The other change I made for next year is on a piece that doesn’t have a great impact, but could make a cumulative difference. Within the performance score is an adjustment, up or down, based on the score differential of the game. Its premise is that the first point is the most important (in a 20-19 game, the team with 20 won – ergo most important). Each point after that is progressively less important. Once a team wins by 3 touchdowns, each additional point scored has almost no impact at all on the rating. But I realized, as I was trying to decipher the formula, that the GCR didn’t recognize that beating an FCS team or a 2-10 team by 30 was not the same thing as beating an Oregon or Michigan, or even a Louisville or Texas Tech, by 30.
The whole concept of point differential must have a diminishing-returns factor in a 12-game schedule, or a blowout has undue weight (if I did this for Major League Baseball, the 162 games would wash out the occasional 12-2 win). That logic dictates an additional damper on running up the score against a bad team – effectively ending the reward for beating up a weaker opponent. I made the formulaic change and the performance scores moved by around 1%, but I feel better because I believe them to be more accurate. By the way, the opposite applies to a weak team getting pummeled by a stronger one: the point-differential penalty is lower in that situation than it was before.
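To make the idea concrete, here’s a small sketch of a diminishing-returns margin adjustment. To be clear, this is not the actual GCR math (which I’m not publishing here); the logarithmic shape, the `scale` constant, and the 0-to-1 `opponent_strength` rating are all illustrative assumptions:

```python
import math

def margin_adjustment(point_diff, opponent_strength, scale=7.0):
    """Illustrative diminishing-returns adjustment for margin of victory.

    point_diff: winner's margin in points (positive).
    opponent_strength: 0.0 (weakest) to 1.0 (strongest); a hypothetical rating.
    scale: assumed constant controlling how fast extra points lose value.
    """
    # log1p makes the first point worth the most; by roughly 3 touchdowns,
    # each additional point adds almost nothing to the adjustment.
    base = math.log1p(point_diff / scale)
    # Damp the reward for running up the score on a weak opponent:
    # a 30-point win over a bottom-tier team credits less than the same
    # margin over a strong one.
    return base * (0.5 + 0.5 * opponent_strength)
```

With this shape, going from a 1-point to a 2-point win moves the adjustment far more than going from a 21-point to a 22-point win, and the same 30-point margin is worth less against a weak opponent than a strong one.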

…..

These changes impact both of the major outputs from the GCR: overall rankings and predictions. I talked about the overall rankings already – they make more empirical sense now. A top 6 of LSU, Ohio State, Penn State, Florida, Clemson, and Georgia, with Auburn at 12, is reasonably logical. It certainly makes more sense than the old top 6 of Clemson, Georgia, LSU, Alabama, Florida, and Auburn. I think we can say, based on the 2019 season, that the new formula passes the ranking test. For the prediction test, I looked at all 1552 games between Division I teams (I did not count FCS vs Division II or NAIA games because I don’t predict those). I separated them into deciles (50.0-59.9, 60.0-69.9, 70.0-79.9, 80.0-89.9, 90.0+) and analyzed them a couple of different ways. Let’s look at the 2019 formula first. By decile, I projected a win/loss record. This is pretty straightforward: for the 50s decile, I assume an average win rate of 55%; for the 60s, 65%; and so on. Therefore, the overall expected win rate is heavily influenced by the percentage of games in each decile. The more games in the lower deciles, the lower the overall expectation. Let’s look at last year using the 2019 formula.

Decile   Projected Record   Actual Record   Pct Win   Difference   Count   % of Total
50s      314-256            335-235         0.588         21        570       37%
60s      303-163            362-104         0.777         59        466       30%
70s      224-75             267-32          0.893         43        299       19%
80s      118-21             129-10          0.928         11        139        9%
90s      74-4               75-3            0.962          1         78        5%
Total    1033-519           1168-384        0.753        135       1552      100%
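As a sanity check on the table above, here’s a quick recomputation of the expected record from the decile counts, using the midpoint win rates described earlier (55% for the 50s, 65% for the 60s, and so on):

```python
# Game counts per decile for the 2019 formula, keyed by the assumed
# midpoint win rate (55% for the 50s decile, 65% for the 60s, etc.).
decile_counts = {55: 570, 65: 466, 75: 299, 85: 139, 95: 78}

# Expected wins: each decile wins at its midpoint rate.
expected_wins = sum(round(pct * n / 100) for pct, n in decile_counts.items())
total_games = sum(decile_counts.values())

print(expected_wins, total_games)              # 1033 of 1552
print(round(expected_wins / total_games, 3))   # 0.666, the 66.6% bogey
```

The midpoint assumption reproduces the table’s 1033 expected wins and the 66.6% overall expectation exactly.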

…..

Let’s start on the right-hand side. Fully 37% of all games were predicted in the might-as-well-be-a-coin-toss 50s decile, and 67% were less than 70% to win. I’ll be the first to say it: that’s really wimpy. I’m predicting that, with all of the volatility of college football, 2/3 of the time it’s almost too close to call. Seriously?!? That’s over a thousand games, and they have an overall expected correctness rate of less than 60%. Geez, that’s just pathetic. More on that in a minute. Read across and there’s the expected record and the actual record (granted, in hindsight). In each decile, the winning percentage is above expectation. The difference in games won, at 135, is certainly acceptable, all things considered. Being right 75.3% of the time is…ok, I guess (remember, this is in hindsight, so I would expect the number to be higher than the 66.8% live). That live number is pretty interesting because it is so close to the projected percentage of 66.6%. Now, let’s take a look at the same games, with the same prediction formula, but using the 2020 formulaic changes mentioned above.

Decile   Projected Record   Actual Record   Pct Win   Difference   Count   % of Total
50s      239-186            271-164         0.623         32        435       28%
60s      220-119            269-70          0.794         49        339       22%
70s      213-71             253-31          0.891         40        284       18%
80s      181-32             199-14          0.934         18        213       14%
90s      267-14             275-6           0.979          8        281       18%
Total    1121-431           1267-285        0.816        135       1552      100%

…..

Again, starting at the right, we see a much more even distribution among the deciles. In this view, 50% of the games are close (less than 70% to win) compared to 67% before, and 32% are in major-upset territory (80%+ to win) compared to just 14% before. Because of this, overall expected wins increase from 1033 to 1121 (+88), which raises the bogey from 66.6% to 72.2%. Now the 75% from above doesn’t look as favorable. We see, as before, that each decile shows a better-than-expected result. And we see the overall actual win total jumping from 1168 to 1267 (+99), not only bringing the winning percentage to just under 82%, but also a bigger increase than the expected result. Here’s a chart that compares the two sets of results.

Decile      2019 Record   Pct     2020 Record   Pct     Basis Points
50s         335-235       0.588   271-164       0.623        350
60s         362-104       0.777   269-70        0.794        170
70s         267-32        0.893   253-31        0.891        (20)
80s         129-10        0.928   199-14        0.934         60
90s         75-3          0.962   275-6         0.979        170
Total       1168-384      0.753   1267-285      0.816        630
Exp % Win   66.6%                 72.2%                      560

…..

Because the number of games in each decile differs before and after the change, we must compare using basis points (a basis point is .01%, so 100 basis points is a full percentage point). Here we see that, with the exception of the 70s, each decile showed pretty significant improvement (I’m not sure the 80s, at just 60 basis points, counts as significant). Overall, we see a 630 basis point jump compared to the expected 560-point improvement just from the better distribution of games.
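For anyone who wants to verify the chart, here’s the basis-point arithmetic on the per-decile winning percentages (the chart shows the negative 70s entry accounting-style, as (20)):

```python
# Per-decile winning percentages from the comparison chart above.
pct_2019 = {"50s": 0.588, "60s": 0.777, "70s": 0.893, "80s": 0.928, "90s": 0.962}
pct_2020 = {"50s": 0.623, "60s": 0.794, "70s": 0.891, "80s": 0.934, "90s": 0.979}

# 1 basis point = 0.01 percentage point, so multiply the difference by 10000.
bp_change = {d: round((pct_2020[d] - pct_2019[d]) * 10000) for d in pct_2019}

print(bp_change)
# {'50s': 350, '60s': 170, '70s': -20, '80s': 60, '90s': 170}
```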

…..

I think this shows that the new formula passes the prediction test. Next year, I’m going to test it live, of course, but also against Vegas (it would be much too difficult to complete that analysis at this point). If this type of improvement carries through, it will be a fun season, indeed.

…..

That’s it for this week. I’ll see you next weekend for top 25’s. Please comment/question/challenge, and most importantly, share with others. Thanks, G

2 Replies to “G’s Exploration or Is the 2020 Formula Actually Better?”

  1. You have plainly upped your game with the new formula. Just one question, Robert: Was the prediction test run using data from the whole year? I’m thinking yes, since it would be a pain to save each week’s data and then run the test for 18(?) groups of data. How do you think the prediction test would differ if it were run in real time throughout the season? The early weeks should probably be discounted.

    1. Great question, Carl. This analysis is a post-season review of all the games, as if they all happened now, but with the full-season data. I looked at the data week by week (publishing the summary), but that’s not your question. I can go back and look week by week, “real-time,” starting in week 4 (prior to that, the ratings are not valid). Using the 2019 data, I was able to correctly predict around 2/3 of the time. I’ll use that as a baseline for comparison. Good suggestion, as always. Thanks!!!

Comments are closed.