The Inherent Flaw Of Computer Rankings

I’ve talked about computer rankings on this blog many times. But a lot of people are still unclear about the methodology behind them. Let me explain how they work and why trying to use a computer formula to rank teams is inherently flawed.

With any formula you have variables. Variables are the numbers you input into the formula so that it can analyze them and spit out a number. For example, suppose I wanted to make a formula that measured the amount of money I made from my lemonade stand in a day. It might look something like this:

A x B = C

Where A is the number of glasses of lemonade I sold that day, B is the price per glass and C is the money I made. I have a formula here, but it’s not much use until I fill in the variables. So say I sold 100 glasses of lemonade at $1 each. When I input these numbers, my formula tells me I made $100 that day. Pretty simple.

Computer formulas that rank college football teams are more complicated, but they work on the same principle. You have some variables, you input values for those variables, and you get a result. So let’s look at some of the variables:

1. Wins – pretty self-explanatory. Get more of them and you rank better.
2. Losses – pretty self-explanatory. Lose fewer and you rank better.
3. Schedule strength – a very convoluted calculation. It compares the rankings of the teams you have played with the rankings of the teams others have played.
4. Quality wins – wins against top-tier teams.
5. Quality losses – losses to top-tier teams.
6. Bad losses – losses to poorly ranked teams.

Am I missing anything? Oh yes: margin of victory, points scored, points allowed. But these are not allowed in BCS computer rankings, for fear that coaches would run up the score on everyone to get better rankings. So for better or worse, they are not used.
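To make the list above concrete, here is a toy `resume_score` that combines only those six variables. This is not any actual BCS computer’s formula: the weights, the "top tier" cutoff (top 25), the "poorly ranked" cutoff (below 75) and the 120-team field are all hypothetical. What it does show is the point made here: because margin of victory is never read, a 35-point blowout and a 1-point squeaker against the same opponent score exactly the same.

```python
from dataclasses import dataclass

# Hypothetical cutoffs and field size; no real ranking system is implied.
TOP_TIER = 25        # opponents ranked 1-25 count as "top tier"
POORLY_RANKED = 75   # opponents ranked below 75 count as "poorly ranked"
NUM_TEAMS = 120


@dataclass
class Game:
    opponent_rank: int  # opponent's current ranking (1 = best)
    won: bool
    margin: int         # recorded, but never used by resume_score


def resume_score(games: list[Game]) -> float:
    """Toy score built only from wins, losses, schedule strength,
    quality wins, quality losses and bad losses (made-up weights)."""
    wins = sum(g.won for g in games)
    losses = len(games) - wins
    # Schedule strength: average opponent quality (higher = tougher slate).
    schedule = sum(NUM_TEAMS + 1 - g.opponent_rank for g in games) / len(games)
    quality_wins = sum(g.won and g.opponent_rank <= TOP_TIER for g in games)
    quality_losses = sum(not g.won and g.opponent_rank <= TOP_TIER for g in games)
    bad_losses = sum(not g.won and g.opponent_rank > POORLY_RANKED for g in games)
    # A quality loss partly offsets the loss penalty; a bad loss adds to it.
    return (2 * wins - 2 * losses + 0.05 * schedule
            + 3 * quality_wins + 1 * quality_losses - 3 * bad_losses)


blowout = [Game(opponent_rank=10, won=True, margin=35)]
squeaker = [Game(opponent_rank=10, won=True, margin=1)]
print(resume_score(blowout) == resume_score(squeaker))  # True
```

Change the weights however you like; as long as `margin` never enters the formula, how a team won cannot move its score, only whom it beat.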

So how can this be all that bad? It seems like a computer formula could work just fine here. Except it can’t, because no team is being ranked for what it does. Every team is ranked by one measure alone: who it played. That’s it. Nothing else to it. And that is why a computer model is inherently flawed.

For example, say both Louisville and Oregon had ended the season undefeated. Why would Oregon be ranked higher? Because the teams Oregon beat were better than the teams Louisville beat. See? Performance means literally nothing. Zip. Zilch. Nada. This is why 9-1 Florida State is ranked so poorly by the computers: only one quality win, against Clemson, and their loss is considered a very bad loss. Compare that to Florida or Georgia. Quality wins. Better yet, their only losses came to quality opponents.

So it doesn’t matter how you play. It matters WHO you play, plain and simple. This is why Kansas (1-9) is ranked #77 by Jeff Sagarin’s computer while 8-2 Ohio is ranked #96. Kansas has played the hardest schedule in the country. Ohio’s schedule is rated only the 164th most difficult (FCS teams included). For one thing, Ohio beat Penn State 24-14 in the season opener. Penn State is ranked #31 by Jeff Sagarin and is 6-4. If Ohio could beat them, why is Jeff Sagarin’s computer so sure 1-9 Kansas is a better team than Ohio? For that matter, why does it think Penn State is a better team than Ohio? Plain and simple: Ohio has played a bunch of losers.

Here’s the ironic part. Switch the schedules of Ohio and Alabama. Yeah, Ohio probably only wins 3-4 games. I get that. But what does Alabama do? They crush everyone and go undefeated. So what if Ohio had crushed everyone and gone undefeated? Would we think they are the best in the land?

You see, we know Alabama is a great team. We would expect them to steamroll Ohio’s schedule. But your ranking should not be determined by who you play. Period. It should be about how you play the schedule you have. And if you are going to use opponents, then ask yourself: how have Penn State’s other opponents fared against them? Well, 3 others beat them and 6 others lost to them. So where does that put Ohio among the teams Penn State has played? Better than those 6, and worse than or equal to the other 3?

Nope, not the case. Among the teams Penn State has played, only Temple and Illinois are ranked below Ohio. Virginia, which beat Penn State by only one point, and at home? Ahead of Ohio. Navy, Iowa, Purdue? All teams beaten by Penn State, and all ranked ahead of Ohio. It’s ridiculous; anyone could see that Ohio is better than Iowa. But the blind computers just go with the numbers. So doing badly in a good conference apparently makes you better than doing well in a bad one. Unbelievable.

But there is a simple solution. There are other variables available you can use. This problem can indeed be solved.

Take the case of Texas A&M, LSU and Alabama. They all played each other, and each beat one of the others. The computers would all say Alabama is the best team of the three, because Alabama has only one loss while the other two have two losses each. But wait…Texas A&M and LSU both got their second loss from the same team: Florida. Did Alabama play Florida? Nope. If they had, would they also be a two-loss team? There’s no telling. But if Florida beat LSU, which was ahead of Alabama until the final minute, and Florida beat Texas A&M, which just took down Alabama at Alabama, wouldn’t logic dictate that Florida would probably beat Alabama if they played each other?

Now look at South Carolina, Georgia and Florida. They all played each other, and each beat one of the others. South Carolina, though, has two losses and so is ranked behind the other two. And since Georgia beat Florida, logic says Georgia is the best team of the three. But Florida also beat LSU. So what if Georgia had played LSU? Would they be a two-loss team like South Carolina? And if so, would they be a worse team than Florida in spite of winning the head-to-head matchup?
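Part of what makes these questions hard is that the head-to-head results loop back on themselves. Taking only the results described in the two paragraphs above (Texas A&M over Alabama, Alabama over LSU, Florida over LSU and Texas A&M, Georgia over Florida, plus "each beat one of the others," which fills in LSU over Texas A&M, Florida over South Carolina and South Carolina over Georgia), a short sketch can find every three-team loop. Any ordering of a loop has to rank at least one team above a team that beat it, so head-to-head alone can never settle these. The `beat` dictionary and `three_team_cycles` function are illustrative names, not part of any ranking system.

```python
from itertools import permutations

# Head-to-head results from the two paragraphs above:
# each key beat every team in its set.
beat = {
    "Texas A&M": {"Alabama"},
    "Alabama": {"LSU"},
    "LSU": {"Texas A&M"},
    "Florida": {"LSU", "Texas A&M", "South Carolina"},
    "Georgia": {"Florida"},
    "South Carolina": {"Georgia"},
}


def three_team_cycles(beat):
    """Return every trio where A beat B, B beat C and C beat A."""
    teams = set(beat) | {t for losers in beat.values() for t in losers}
    cycles = set()
    for a, b, c in permutations(teams, 3):
        if b in beat.get(a, ()) and c in beat.get(b, ()) and a in beat.get(c, ()):
            cycles.add(frozenset((a, b, c)))  # same loop found 3 ways; store once
    return cycles


for trio in three_team_cycles(beat):
    print(sorted(trio))
```

It finds exactly the two trios discussed above: Alabama, LSU and Texas A&M, and Florida, Georgia and South Carolina.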

These are difficult questions to answer. But there is a way. The key variable is time in the lead. Every game starts tied, so at least the first few seconds of every game are played with the score level. But after a few plays teams start to score, and leads are built or change hands. Dominant teams have one thing in common: they always control their games. Look at a team like Oregon. They score so much and so fast that their defense spends a whole lot of time on the field, which gives opponents lots of opportunities to score. But Oregon has never trailed in a game. They are always in the lead. They don’t mind letting you score 50% of the time, because they can score more often than that and beat you. Winning is all that matters, not the number of points.

So the variable to look at is time in the lead. Each game is 60 minutes long, so add up all the minutes and seconds played while tied, while trailing and while leading.
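As the post notes, time in the lead isn’t a readily available stat, so this sketch assumes you can pull the score after each scoring play from a play-by-play log. The input format (elapsed seconds, our score, their score) and the function names are hypothetical. It splits the 60 minutes of regulation into leading, tied and trailing, ignoring overtime.

```python
GAME_SECONDS = 60 * 60  # regulation only; overtime is ignored in this sketch


def _state(ours, theirs):
    if ours > theirs:
        return "leading"
    if ours < theirs:
        return "trailing"
    return "tied"


def time_in_state(scoring_plays, game_seconds=GAME_SECONDS):
    """scoring_plays: chronological (elapsed_seconds, our_score, their_score)
    tuples giving the score right after each scoring play."""
    totals = {"leading": 0, "tied": 0, "trailing": 0}
    clock, ours, theirs = 0, 0, 0  # every game starts 0-0 at kickoff
    for elapsed, new_ours, new_theirs in scoring_plays:
        totals[_state(ours, theirs)] += elapsed - clock  # time under old score
        clock, ours, theirs = elapsed, new_ours, new_theirs
    totals[_state(ours, theirs)] += game_seconds - clock  # rest of the game
    return totals


# Hypothetical game: TD at 5:00 elapsed, opponent FG at 20:00,
# opponent TD at 40:00, game-winning TD at 55:00.
plays = [(300, 7, 0), (1200, 7, 3), (2400, 7, 10), (3300, 14, 10)]
print(time_in_state(plays))
# {'leading': 2400, 'tied': 300, 'trailing': 900}
```

In this made-up game the winner led for 40 of the 60 minutes, trailed for 15 and was tied for 5.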

Now compare it to your opponents’ opponents. Let’s say Oregon was ahead of USC for 59 minutes. Excluding that game, USC is in the lead for 40 minutes on average. In other words, USC’s other opponents managed to hold leads against them for about 20 minutes (ignoring time spent tied). Oregon managed it for 59. Do that for all of a team’s opponents, and you begin to see how dominant that team is.
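Here is one way to turn that comparison into a number, as a sketch under stated assumptions: for each opponent, take the minutes a team led that opponent and subtract the average minutes the opponent’s other opponents led it, then average over the schedule. The `dominance` function and the data layout are hypothetical. The Oregon and USC figures follow the example above; "Opponent A" and "Opponent B" and their minutes are made up so that USC leads its other games for 40 minutes on average and trails for 20.

```python
from statistics import mean

# lead_minutes[(a, b)] = minutes team a led in its game against team b.
# Oregon/USC follow the example above; Opponent A/B are hypothetical.
lead_minutes = {
    ("Oregon", "USC"): 59, ("USC", "Oregon"): 0,
    ("Opponent A", "USC"): 25, ("USC", "Opponent A"): 35,
    ("Opponent B", "USC"): 15, ("USC", "Opponent B"): 45,
}


def dominance(team, lead_minutes):
    """Average over a team's opponents of: minutes the team led that opponent
    minus the average minutes the opponent's *other* opponents led it."""
    edges = []
    for (a, opp), minutes in lead_minutes.items():
        if a != team:
            continue
        others = [m for (x, o), m in lead_minutes.items() if o == opp and x != team]
        if others:  # skip opponents with no other games to compare against
            edges.append(minutes - mean(others))
    return mean(edges) if edges else 0.0


print(dominance("Oregon", lead_minutes))
```

For Oregon this comes out to 59 − 20 = 39 minutes more in the lead than USC’s other opponents managed. With a full season of data, each team would get one such edge per opponent.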

Once you can determine this, you can start simulating games: what if Alabama played Florida? What if Georgia played LSU? Then you can use the data to make your best guess as to who would win each of those games. Use what you do know to fill in what you don’t. From that you can simulate a season in which every team plays every other team. Would someone go 120-0? Who knows. But from that you can rank the teams more accurately, and the head-to-head matchup no longer makes a difference. In reality, Penn State would probably win more games than Ohio if they each played everyone else. Alabama would probably win more games than Texas A&M. The data from each game only gives you something to work with to fill in the gaps and try to see what a 120-game schedule would look like for everyone.
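The post doesn’t say how to turn a dominance number into a chance of winning, so this sketch swaps in an assumption: a logistic curve on the rating difference, the same shape Elo-style ratings use. The `scale` parameter, the ratings and the function names are all hypothetical. Each simulated season plays every pair of teams once, and averaging many seasons smooths out the luck.

```python
import math
import random
from itertools import combinations


def win_probability(rating_a, rating_b, scale=10.0):
    """Assumed logistic link: chance team A beats team B (scale is hypothetical)."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def simulate_round_robin(ratings, rng):
    """Play every pair of teams once; return each team's win total."""
    wins = {team: 0 for team in ratings}
    for a, b in combinations(ratings, 2):
        if rng.random() < win_probability(ratings[a], ratings[b]):
            wins[a] += 1
        else:
            wins[b] += 1
    return wins


def average_wins(ratings, seasons=1000, seed=0):
    """Average win totals over many simulated round-robin seasons."""
    rng = random.Random(seed)  # seeded so results are reproducible
    totals = {team: 0 for team in ratings}
    for _ in range(seasons):
        for team, w in simulate_round_robin(ratings, rng).items():
            totals[team] += w
    return {team: total / seasons for team, total in totals.items()}


# Hypothetical dominance ratings (e.g. an opponent-adjusted
# time-in-the-lead measure); the numbers are made up.
ratings = {"Alabama": 30, "Texas A&M": 25, "Penn State": 5, "Ohio": 0}
print(average_wins(ratings))
```

With 120 teams the same code would play 7,140 games per simulated season, which is still quick in plain Python.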

Then you simply rank the teams by wins. But wait, what if a 108-12 Florida team loses in simulation to a 105-15 Georgia team? Shouldn’t Georgia be ranked higher? No, it shouldn’t. This wouldn’t be a ranking of who is better than one particular team. It’s a ranking of who is better than the most teams. Yes, the data could say Georgia would beat Florida. But that same data might suggest that Florida would beat more teams than Georgia. That makes Florida the higher-ranked team.
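A small sketch shows how that happens. Instead of counting simulated wins, it uses expected wins (the sum of each team’s win probabilities against everyone else) as a deterministic stand-in. The probability table, "Team X" and "Team Y" are hypothetical: Georgia is favored over Florida head-to-head, but Florida is far more likely to beat everyone else.

```python
from itertools import combinations

# Hypothetical head-to-head win probabilities: P[(a, b)] = chance a beats b.
# Georgia is favored over Florida, but Florida handles everyone else better.
P = {
    ("Georgia", "Florida"): 0.55,
    ("Florida", "Team X"): 0.90,
    ("Florida", "Team Y"): 0.90,
    ("Georgia", "Team X"): 0.60,
    ("Georgia", "Team Y"): 0.60,
    ("Team X", "Team Y"): 0.50,
}


def prob(a, b):
    """Chance a beats b, using the complement when only (b, a) is stored."""
    return P[(a, b)] if (a, b) in P else 1.0 - P[(b, a)]


def rank_by_expected_wins(teams):
    """Sort teams by expected round-robin wins (sum of win probabilities)."""
    wins = {t: 0.0 for t in teams}
    for a, b in combinations(teams, 2):
        wins[a] += prob(a, b)
        wins[b] += prob(b, a)
    return sorted(teams, key=wins.get, reverse=True), wins


order, wins = rank_by_expected_wins(["Florida", "Georgia", "Team X", "Team Y"])
print(order[:2])  # ['Florida', 'Georgia']
```

Florida expects about 2.25 wins out of 3 and Georgia about 1.75, so Florida ranks first even though Georgia is the favorite when the two meet.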

I know it’s a lot to take in, but if time in the lead were an available stat that you could calculate, you could get much more accurate rankings. In the meantime, we are stuck with flawed systems that rank you only on who you have played. They don’t actually compare you to any specific team (otherwise the head-to-head matchup would make a difference). The stated goal of the BCS is to pit the two best teams against each other, right? That would suggest they are saying #2 is a better team than #3, right? But the computer rankings don’t and can’t say that. All the computers tell us is who played the two hardest schedules the best. And that’s no way to rank anyone.
