This fall season did not go as planned, as far as this blog is concerned. The blog is lower on my priority list than family life, my job, graduate school, and charity work, which together sucked up almost all my time. On the other hand, the football season went much better than I expected, with Notre Dame going undefeated and Texas A&M ranked in the top 10 after defeating Alabama, which was #1 in the polls at the time. Both teams have viable Heisman Trophy candidates as well.
The FBS college football rankings fared about the same as in years past. Some teams ranked near the top at the beginning of the season stayed near the top with only one or two losses. Using the AP poll as an example, Alabama was ranked #2 in the preseason and is #2 right now with an 11-1 record, prior to today's SEC Championship matchup with Georgia. Other teams that fall into this category (listed as preseason rank, current rank, record) are Georgia (#6, #3, 11-1), Oregon (#5, #6, 11-1), and LSU (#3, #9, 10-2). Other teams low or unranked in the top-25 at the beginning of the season moved into the top ten with better-than-expected seasons [Notre Dame (#26, #1, 12-0), Florida (#23, #5, 11-1), and Texas A&M (#36, #10, 10-2)].
Unfortunately for some teams, they were ranked near the top at the start of the season, but underperformed. Examples include USC (#1, #34, 7-5), West Virginia (#11, unranked, 6-5), and Arkansas (#10, unranked, 4-8). It's hard to know which team to feel more sorry for - the one that was rated #1 in the preseason but lost 5 close games? The one that won its first 5 games with awesome offensive power, but then lost the next 5 (three blowouts, and two by 1 point)? Or the one that just never got anything going, and became the whipping boy of the SEC?
I close with three points to ponder. First, an illustration of the weakness of the current human poll system when choosing teams to participate in a national championship (whether 2 teams or 4 teams). In week 10, the top 4 teams in the AP poll were (in order): Alabama (8-0), Oregon (8-0), Kansas St (8-0), and Notre Dame (8-0). In the preseason, these same teams were ranked, respectively: #2, #5, #22, #26. In the BCS computer rankings for that week (which factor in strength of schedule), they were: Notre Dame and Kansas St (tied at #1), Alabama (#3), and Oregon (#5). This illustrates only too well my thesis at the beginning of the season, that where you are ranked at the beginning of the season is a huge factor in where you end up at the end of the season. If all four teams had continued to win out, there is no doubt that Notre Dame would have been left at #4 in the BCS rankings, despite being #1 in the computer rankings. In a 4-team playoff, no big deal; but in the 2-team playoff system that will be used this season, well, that's just controversy waiting to happen.
Still not convinced? What about the next four teams in week 10: LSU (7-1), Ohio St (9-0), Georgia (7-1), and Florida (7-1)? Setting Ohio St aside because there is obvious bias against them due to being on probation for the season, the other three teams in the preseason were ranked #3, #6, and #23. Okay, you might argue that Georgia beat Florida and should be ranked ahead of them. But what about LSU - they were beaten by Florida? And note that the computer rankings for those three teams were: Florida (#4), LSU (#6), and Georgia (#7).
Second point: In the 2014 4-team playoff, what are they going to do about teams in the same conference? Prior to conference championship weekend, there are 6 teams from the SEC in the top 10 of the BCS rankings, two from the PAC-12, one from the BIG-12, and one independent. Seems to me that it is unfair in principle to let a team that hasn't won its own conference into a 4-team playoff, but consider Florida, which is #2 in the computer rankings and will probably move up to #3 after Alabama and Georgia play today for the SEC championship. Fate chose the wrong team for them to lose to this season.
Final point: It's always interesting to me how "vicious circles" work in football (i.e., Team A beats Team B, which beats Team C, which beats Team A). This season, the SEC participated in a doozy. Alabama beat LSU, which beat South Carolina, which beat Georgia, which beat Florida, which beat Texas A&M, which beat Alabama. Considering that these are the 6 SEC teams currently in the top 10 of the BCS, and that their only losses all season are to one or two of the other five (LSU and South Carolina also lost to Florida, and Texas A&M also lost to LSU), maybe the SEC needs to reconsider how it sets up its schedule to prevent one team after another from knocking their "friends" off the playoff perch.
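For anyone who wants to check the circle, here is a minimal Python sketch. The results table contains only the games named in the paragraph above; the function names are my own illustrative choices.

```python
# Results among the six SEC teams, taken only from the games named above:
# each key beat every team in its list.
BEAT = {
    "Alabama": ["LSU"],
    "LSU": ["South Carolina", "Texas A&M"],
    "South Carolina": ["Georgia"],
    "Georgia": ["Florida"],
    "Florida": ["Texas A&M", "LSU", "South Carolina"],
    "Texas A&M": ["Alabama"],
}

def is_vicious_circle(chain, beat):
    """True if each team in the chain beat the next one, wrapping back to the first."""
    return all(
        chain[(i + 1) % len(chain)] in beat.get(team, [])
        for i, team in enumerate(chain)
    )

def losses_within_group(beat):
    """Count how many times each team lost to another team in the group."""
    losses = {team: 0 for team in beat}
    for losers in beat.values():
        for loser in losers:
            losses[loser] += 1
    return losses

CHAIN = ["Alabama", "LSU", "South Carolina", "Georgia", "Florida", "Texas A&M"]
print(is_vicious_circle(CHAIN, BEAT))   # the six-team circle described above
print(losses_within_group(BEAT))        # one or two losses per team
```

The loss counts line up with the records quoted earlier: one loss each for Alabama, Georgia, and Florida, and two each for LSU, South Carolina, and Texas A&M.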
FBS Playoff System - The Mader Way
Saturday, December 1, 2012
Sunday, September 30, 2012
Principles for a Computer-based Ranking System
In
the last post, I implied that a well-designed, simple, and transparent computer
program would be superior to the existing human-expert system (polls), when
selecting teams to compete in a playoff.
Let’s consider what I mean by each of the three characteristics. By “well-designed”, I mean a system that
evaluates and ranks teams based on available data, and which mimics how human
evaluators would rate the teams if the evaluators were truly unbiased (i.e.,
blind to the hype and history surrounding a team, and how others have recently
rated it). By “simple”, I refer to a
system that does not require complicated calculations (like derivatives or
logarithms), or a large set of data points for each team, or ambiguous or
potentially misleading figures (like attendance, turnover ratio, or total yards
gained). Instead, a simple system would use
a small set of essential data points from each game, and stick to basic math
operations, such as adding, multiplying, and dividing. It is important to note that a simple system would
still require a large number of calculations because of the number of teams and
games involved (but to a computer which can do repetitive calculations with
100% accuracy, that is simple).
Finally, a “transparent” computer program would have the source code
available to the public for scrutiny to be sure it met the principles of the
proposed ranking system, and it would be documented in such a way that other
programmers could write their own programs that meet the same principles, and
get exactly the same results.
Since the last post, there have been five weeks of games. Some teams ranked high at the beginning of the season are undefeated and continue to rank high, but others have dropped out after losing once or twice. Still others have risen dramatically, and then have dropped with an unexpected loss. I'll discuss some specifics in the next post.
What
principles, then, make sense for rating teams for the purpose of selecting a few to
participate in a national championship?
There are several strategies to choose from, such as calculating which
teams are the strongest at the end of the season (sometimes referred to as a
power rating), or determining which teams have the most potential to win their
next game. Another possibility would be
teams that are most likely to have strong audience support. It could be argued that all three of those
strategies are used to select at-large college basketball teams for March
Madness, and that in general those strategies have been effective. However, it’s my opinion that in college
football, the teams that most deserve to participate in a playoff are the ones
who have achieved the most over the course of the season. Why?
As mentioned before, the season is short relative to the number of teams
involved, so it’s difficult to compare teams on a head-to-head basis. More importantly, football is a game in which
a team’s performance, compared to its potential, can vary greatly from one game
to the next. Also, opponent strengths can
vary significantly from week to week. For
example, one week a contender may face a team with a strong passing game, and
the next it may face a team that specializes in the run, followed by a third
team that finds success in a strong defense and a well-disciplined kicking
game. With one week to prepare, it can
be difficult to make adjustments. Therefore,
it seems the fairest way to assess a team is to measure whether it has been balanced
and consistent over the length of the season.
By extension, then, an unfair way to measure a team is to expect it to
be perfect for the entire season.
How
should one measure achievement? The most
straightforward answer is: by winning games.
Winning a game is always an achievement, no matter how poor the opponent,
or how close the score. On the other
hand, there is no achievement in losing a game, even if an underdog team comes
close to knocking off a strong opponent.
But to distinguish great teams from good ones, one needs to measure more
than the number of games won; one needs to assign a value to each
win/achievement. What criteria can be
used to assign a value?  First, a win
against a strong team (one that few have defeated) should be worth more than a
win against a weak team (one that many have defeated).  Second, against opponents of essentially equal strength, a win by a large margin is better
than a win by a small margin.  Finally, wins in the later part of the season,
when all teams have matured, are more important than wins in the early part of
the season. In summary, then, the ideal playoff-worthy
team will win a lot of games against strong teams by a large margin, with the
best wins coming late in the season.
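To make the three criteria concrete, here is a rough Python sketch of how they could combine into a single value per win. It is only an illustration: the formula, the constants, and every name in it (`Game`, `win_value`, `MARGIN_CAP`, and so on) are hypothetical placeholders, not a finished proposal. It sticks to adding, multiplying, and dividing, and it gives a loss no value at all.

```python
from dataclasses import dataclass

@dataclass
class Game:
    week: int            # week of the season the game was played (1-based)
    points_for: int
    points_against: int
    opp_wins: int        # opponent's wins in its other games
    opp_losses: int      # opponent's losses in its other games

# Hypothetical constants, chosen only for illustration.
SEASON_WEEKS = 14
MARGIN_CAP = 21       # margins beyond this earn no extra credit
MARGIN_BONUS = 0.5    # largest extra credit for margin of victory
LATE_BONUS = 0.5      # largest extra credit for a late-season win

def win_value(g: Game) -> float:
    """Value of one game: zero for a loss, more for strong, lopsided, late wins."""
    margin = g.points_for - g.points_against
    if margin <= 0:
        return 0.0  # there is no achievement in losing a game
    opp_games = g.opp_wins + g.opp_losses
    opp_strength = g.opp_wins / opp_games if opp_games else 0.0
    base = 1.0 + opp_strength  # every win counts; strong opponents count more
    margin_factor = 1.0 + MARGIN_BONUS * min(margin, MARGIN_CAP) / MARGIN_CAP
    timing_factor = 1.0 + LATE_BONUS * g.week / SEASON_WEEKS
    return base * margin_factor * timing_factor

def season_achievement(games) -> float:
    """Total achievement for a season: the sum of the values of all wins."""
    return sum(win_value(g) for g in games)
```

Measuring opponent strength by win fraction is just one stand-in for "a team that few have defeated"; any real system would need to settle that definition, along with the caps and bonuses, in the open.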
The
three listed criteria are described in general terms so far; specific details are
necessary to assign values. For now, it
is worth exploring what criteria should not be used to assign value.  Attendance figures, television
appearances, home-field advantage, and the competency of the officiating crew do not
seem to be valid tools for measuring how good an achievement is.  Likewise, injuries or suspensions of key
players, weather conditions, and “lucky breaks” should not be considered in
calculating the value of a win. While it
is possible every one of those criteria will affect the outcome of a game, it
would be nearly impossible to quantify the effects. In the end, a truly great team, nearly all
the time, will find a way to win despite any disadvantages or mishaps that
occur along the way. It is also worth
noting here that in-game statistical criteria such as number of take-aways, 3rd-down
conversion rate, time of possession, or even yardage gained should not be used
to help assign value to a win, since it is certainly possible to win in all or
nearly all of the statistical categories, and still lose the game. Coaches design their game plans around how to
score more points than the opponent, not how to beat the opponent in
statistical categories. While it may be
interesting after the game to discuss how amazing it was that Team A outgained
the opponent 500 yards to 195 yards, or that Team B had a +3 take-away margin,
the bottom line is, who won? And, how
close was the final score?
Thursday, August 30, 2012
What about computer-based evaluation systems?
Instead
of a human-expert evaluation system, let’s consider if a computer-based
evaluation system could be employed in a fair way to pick the top 4 teams to
participate in a post-season playoff. The
BCS committee has been using computer-based systems as part of the selection
process since the inception of the BCS process, but they are not held in high
esteem by the committee. Out of six
computer rankings, the BCS system averages the results from the middle four for
each team, then averages that result with two human-expert polls (the USA Today
Coaches Poll and the Harris Interactive Poll) to determine the overall
ranking. Thus, each computer-based
system has about 6% influence in the final results, while each human-expert
system has about 33% influence. The full
BCS selection procedures can be found at this website: BCS Selection
Procedures
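The influence figures above can be checked with a few lines of Python. This sketch assumes, as described above, that the computer average and the two polls each count for one third of the final result; the variable names are mine.

```python
from fractions import Fraction

COMPONENTS = 3         # two human polls plus the computer average
COMPUTER_SYSTEMS = 6   # computer rankings submitted for each team
COUNTED = 4            # the middle four that are actually averaged

poll_weight = Fraction(1, COMPONENTS)
computer_block = Fraction(1, COMPONENTS)

# Share of the final result for a computer ranking that is counted for a team
per_counted_system = computer_block / COUNTED
# Share averaged over all six systems, since two are dropped for each team
per_system_average = computer_block / COMPUTER_SYSTEMS

print(f"each human poll:               {float(poll_weight):.1%}")
print(f"each counted computer ranking: {float(per_counted_system):.1%}")
print(f"each computer ranking (avg):   {float(per_system_average):.1%}")
```

This prints 33.3%, 8.3%, and 5.6%, so "about 6%" is the average share across all six systems; a system whose ranking survives the trimming for a given team carries a little over 8% for that team.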
Currently,
there are 6 different computer-based methodologies in use (Anderson &
Hester, Richard Billingsley, Colley Matrix, Kenneth Massey, Jeff Sagarin and
Peter Wolfe). All of them employ some
method of calculating schedule strength, but by decree of the BCS committee,
none of them employ margin of victory directly or indirectly in their
calculations. This website: BCS
Know How - Computer Rankings summarizes each one of them in turn.  Many other computer ranking systems
have been developed, and their results are available online.
Each ranking system emphasizes the available statistics differently,
based on the philosophy of the developer.
Some are interested in a power ranking (how good is the team right
now?); others are interested in potential (how will the team do in its next
game?). Still others are interested in
results (how much has the team achieved so far this year?) Some think that the location of games (home
vs. away) is critical; some even think that the number of people in the stands
is important.
One
could argue that we should not rely much on computer-based ranking systems
because computers are not emotional and therefore can’t fairly evaluate the
“intangibles”. But that is exactly the
advantage that a computer-based system has, if designed adequately – it can
provide an unbiased assessment of the
results on the field, because it can repeatedly and accurately evaluate the
data without being distracted by emotion. One could also argue that we should not rely
on computer-based ranking systems because computer programs have the programmer’s
bias built in, and inevitably have bugs in them, which will lead to bad results.  While this is a possibility, a well-designed,
simple, and transparent computer program would certainly be less likely to produce
undesirable or controversial results than a human-expert system
that makes no attempt to correct or prevent a known source of bias.
Next:
how to design a computer-based ranking system that mimics “blind” evaluators.
Tuesday, August 14, 2012
Bias in the Ranking System
The scarcity of head-to-head non-conference games between top FBS teams magnifies the real obstacle to creating a fair way to determine the national champion: bias built into the existing ranking system. Currently, the rank of a team is primarily determined by the votes of FBS football experts (65 sportswriters/broadcasters in the AP poll, 57 FBS coaches in the USA Today poll, and 115 school representatives and members of the media in the Harris Interactive poll). Theoretically, the voting system is designed to spread out personal bias so that its effect is negligible. For instance, the voters are chosen from all parts of the country, different age groups, and a wide variety of college backgrounds, which should dilute all that emotional stuff, like regional bias, historical favorites, and conference or alma mater loyalty.
However, the selection systems are not designed to root out “expert” and “consensus-building” biases. What do I mean by this? With their first vote, members independently rank the top 25 teams by whatever system they wish to devise, and send their choices to a central group which compiles all the votes and publishes a top 25 list that everyone (including the experts) can see. Early in the season, the “expert” bias is in play because the voters have little head-to-head information available to them, so they choose teams based on their potential, which has proven to be deceiving. The “expert” bias can be clearly seen nearly every year, when early favorites lose a game but do not drop out of the top 25, and sometimes don’t even drop below a team that beat them on the field. The “expert” bias can also be recognized at the end of the season by checking the final records of the top 25 teams selected in the first poll of the season. While some pre-season favorites complete the season with two or fewer losses, there are always several “surprises” that finish mediocre at best, with five or more losses.
After the first poll or two, the “consensus-building” bias is in play – once members know what the consensus results are, they start to make adjustments to their votes to fall more in line with the other voters. The “consensus-building” bias can be seen by comparing the AP and USA Today polls after a few weeks in the season, relative to their results in the pre-season poll. At the fourth week or so of the season there are many undefeated teams, yet the two polls always look remarkably alike with regard to who is ranked where. Another visible aspect of the “consensus-building” bias is that a team tends to keep its position in the poll when it wins, moving up only if a team a few notches above it loses, and usually keeping the same position relative to the other winners (for example, if the #5 and #7 teams lose, then #6, #8 and #9 will move to positions 5 through 7 respectively, regardless of their opponents or the margins of victory). It is also evident when looking at teams with the same number of losses – the one with the earlier loss in the season is more likely to be ranked higher, because it has had more time to recover.
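The "slide up" pattern in that example is mechanical enough to write down. Here is a small Python sketch of it; the team labels and the `slide_up` function are hypothetical, and the sketch only tracks the winners (where the losers land is left out).

```python
def slide_up(ranking, losers):
    """Winners keep their relative order and simply fill the vacated slots.

    Nothing about opponents or margins of victory enters the calculation.
    """
    winners = [team for team in ranking if team not in losers]
    return {team: position for position, team in enumerate(winners, start=1)}

# "Team 1" is ranked #1, "Team 2" is #2, and so on through #9.
ranking = [f"Team {n}" for n in range(1, 10)]
losers = {"Team 5", "Team 7"}

new_positions = slide_up(ranking, losers)
for team in ("Team 6", "Team 8", "Team 9"):
    print(team, "->", new_positions[team])
```

With #5 and #7 losing, the old #6, #8, and #9 come out at positions 5, 6, and 7, exactly as in the example above.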
One might argue that the “expert” bias doesn’t really matter because the voters have time to correct their early mistakes and choose the best teams by the end of the season. That argument would be reasonable, except for the “consensus-building” bias which tends to keep teams at the top if they continue to win, aggravated by the fact that only 2 teams (or 4 starting in 2014) are selected for the national championship game(s). In a 2-team or 4-team playoff, inevitably there are going to be several teams with nearly identical records vying for that last playoff spot, and the “consensus-building” bias is going to have a lot of influence. At the end of the season, a team that was considered to be one of the best at the start of the season is more likely to be ranked higher in the polls, compared to another team with an identical win-loss record which was ranked lower at the start of the season. Ultimately, end of season rankings are biased by the perception of who is best at the beginning of the season, which is clearly an unreliable predictor of talent.
If an expert evaluation system is to be employed to rank football teams, it can only be unbiased if the evaluators are “blind”; that is, the voters may be provided any desired statistics or information about the FBS teams, except for their identity, the conference they belong to, and the previous week’s rank. That may be an unusual proposal, but it’s what’s necessary to make the system truly fair.
Saturday, August 11, 2012
Why Polls are Not an Effective Way to Choose the top 2 FBS teams
Given the current regular season scheduling method, it’s impossible for pollsters to compare top FBS
football teams with the precision needed to select the best two (or four). Why? There are 124 schools
with teams competing in the FBS during the 2012 season. Those teams are divided up into 11
conferences, along with a few teams that are independent. Nearly all the teams play just 11 or 12 games against
other FBS teams, and the vast majority of those games (8 or more) are played against teams
within their own conference. Therefore,
while the system is set up nicely to judge the best teams within a conference, there
is very little head-to-head evidence established to compare the top teams in each conference
to each other. The math is fairly
straightforward. Teams in each
conference play just 2 or 3 FBS games outside of their conference. Since there are about 110 non-conference
schools available for each team, it is likely that only one of the
non-conference games will be against a top-35 (or better) team. It’s inevitable that any system designed to
select just 2 teams out of 124, or even 4 teams out of 11 conferences, is going to
appear unfair to fans or coaches most of the time, if there are few
head-to-head battles as a basis for comparison.
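The "only one top-35 opponent" estimate can be checked with a short Python sketch. It rests on loud simplifying assumptions that are mine, not facts about real scheduling: non-conference opponents are drawn at random from the roughly 110 schools outside a team's conference, and all 35 top teams are counted as part of that pool.

```python
from math import comb

NON_CONFERENCE_POOL = 110   # "about 110" schools, per the paragraph above
TOP_TEAMS = 35              # the top-35 cutoff used above

def expected_top_opponents(games, pool=NON_CONFERENCE_POOL, top=TOP_TEAMS):
    """Average number of top-35 opponents in a randomly drawn non-conference slate."""
    return games * top / pool

def chance_of_at_least_one(games, pool=NON_CONFERENCE_POOL, top=TOP_TEAMS):
    """Probability that a randomly drawn slate includes at least one top-35 team."""
    return 1 - comb(pool - top, games) / comb(pool, games)

for games in (2, 3):
    print(
        games,
        round(expected_top_opponents(games), 2),
        round(chance_of_at_least_one(games), 2),
    )
```

Under those assumptions, a team averages about 0.64 top-35 opponents with two non-conference games and about 0.95 with three, which is where "only one" comes from.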
Compare
this to NCAA Division I men’s basketball.
There are currently 345 teams, also organized into conferences (32 this year). A typical NCAA basketball team plays
approximately 30 regular season games, with about half of them against
non-conference foes. In fact, many top
teams schedule at least a half-dozen non-conference games against other
top-50 teams. With such a large number of
non-conference games available as evidence, and a post-season tournament that includes 68 teams, it is not difficult at all for a national consensus polling system to precisely select all of the top 25 teams for the tournament. In other words, even if two experts disagree on nearly every pick for the top 25, it’s virtually
certain that all of their picks will have a chance to earn the national
championship on the court. I am not saying the NCAA Division I basketball tournament selection process is flawless, nor am I saying that the FBS needs to have a 32- or 64-team playoff - I am merely pointing out that the expert polling system is a viable option for identifying top basketball teams, but not a good system for football because of the significant differences in how the regular seasons are set up.
Saturday, August 4, 2012
A short summary of the controversy
The
argument about how to choose a college football national champion has been
waged for decades. At the highest level
of college football (known as the Football Bowl Subdivision (FBS) since 2006,
and Division I-A prior to that), a national champion has been named by some
form of consensus since 1883.
Controversy has existed since at least 1925, when the Helms Foundation
selected Alabama as the champion and the Dickinson System selected Dartmouth.  There have been split championships at least 17 more
times since 1925, most recently in 2003 when Louisiana State and Southern Cal
shared the honors.
Examples
of other ambiguous championships include 1947, when the AP first selected 9-0
Notre Dame prior to the bowl season, and then reversed itself after bowl season
to select 10-0 Michigan after it won the Rose Bowl (Notre Dame
did not participate in bowl games at the time).  In 1975, Arizona St finished 12-0, defeating
Nebraska in the Fiesta Bowl, but finished second in the national championship
to 11-1 Oklahoma. In 1977, five teams
finished the bowl season with 11-1 records.
Alabama fans cried foul when their previously ranked #3 team, and easy winner
of the Sugar Bowl, was leap-frogged by the regular season #5 team Notre Dame,
which had handily beaten #1 Texas in the Cotton Bowl. A similar situation occurred in 1983 when #5 Miami
beat #1 Nebraska in the Orange Bowl, jumping over #3 Auburn, which won the
Sugar Bowl. The Division I-A national
championship is littered with multiple other controversies during the 20th
century.
Why
does such a long history of controversy exist?
There are many contributing factors, but the primary reason is that the FBS
is the only NCAA-sponsored sport without an organized tournament to determine
its champion. In 1998, after 9
consecutive years of disputes in which one conference after another felt they
were cheated out of a national championship because of the existing bowl agreements,
the Bowl Championship Series (BCS) was born.
The BCS was supposed to end all controversy by finally giving the top
two teams an opportunity to play each other in a final game of the year. Unfortunately for the architects of the BCS
system, they did not anticipate the difficulty involved in determining which
teams were the top two, so the controversy has continued. Examples include the 2006 season, when Boise
St completed the regular season 12-0 and was not invited to play the only other
undefeated team Ohio St, and the 2008 season, when 12-1 Florida defeated 12-1
Oklahoma, but undefeated Utah was not deemed worthy to play either team.
Sunday, July 29, 2012
A 4-team playoff. Really?
Since the 1980s, I've been frustrated by the irrational way that a Division I-A (FBS) football champion is selected. Neither the NCAA nor the sports media have been able to come up with something reasonable that survives the test of time, or even two or three seasons, without significant controversy.
During the last decade, the BCS committee has tried tinkering with the system during the off-season and generally made the situation worse.
Now, in the summer of 2012, the BCS committee has finally agreed in principle to a 4-team playoff system, starting with the 2014 season. They (and many college football fans) are hoping this, finally, will end the controversy. Me, I am still skeptical. While a 4-team playoff may look like a step forward, only a well thought-out system for selecting those playoff teams will enable the controversy to significantly subside. I think the BCS committee should ask fans to help them design a selection method, but it appears they want to do this on their own, in a secret back room.
This blog is dedicated to developing a rational, unbiased method for recognizing an FBS national champion. Since neither the NCAA nor the BCS committee has asked for my opinion, I hope to build consensus for a transparent method that makes sense, while there's still time . . .