Saturday, December 1, 2012

Some reflections near the end of the season

This fall season did not go as planned as far as this blog is concerned.  The blog ranks below family life, my job, graduate school, and charity work on my priority list, and together those sucked up almost all my time.  The football season, on the other hand, went much better than I expected: Notre Dame went undefeated, and Texas A&M is ranked in the top 10 after defeating then-#1 (in the polls) Alabama.  Both teams have viable Heisman Trophy candidates as well.

The FBS college football rankings fared about the same as in years past.  Some teams ranked near the top at the beginning of the season stayed near the top, with only one or two losses.  Using the AP poll as an example, Alabama was ranked #2 in the preseason and is #2 right now with an 11-1 record, prior to today's SEC Championship matchup with Georgia.  Other teams in this category (preseason rank, current rank, record) are Georgia (#6, #3, 11-1), Oregon (#5, #6, 11-1), and LSU (#3, #9, 10-2).  Other teams that started low in or outside the top 25 moved into the top ten with better-than-expected seasons [Notre Dame (#26, #1, 12-0), Florida (#23, #5, 11-1), and Texas A&M (#36, #10, 10-2)].

Some teams, unfortunately, were ranked near the top at the start of the season but underperformed.  Examples include USC (#1, #34, 7-5), West Virginia (#11, unranked, 6-5), and Arkansas (#10, unranked, 4-8).  It's hard to know which team to feel more sorry for: the one that was rated #1 in the preseason but lost 5 close games?  The one that won its first 5 games with awesome offensive power, but then lost the next 5 (three blowouts, and two by 1 point)?  Or the one that never got anything going and became the whipping boy of the SEC?

I close with three points to ponder.  First, an illustration of the weakness of the current human poll system for choosing teams to participate in a national championship (whether 2 teams or 4 teams).  In week 10, the top 4 teams in the AP poll were (in order): Alabama (8-0), Oregon (8-0), Kansas St (8-0), and Notre Dame (8-0).  In the preseason, these same teams were ranked #2, #5, #22, and #26, respectively.  In the BCS computer rankings for that week (which factor in strength of schedule), they were: Notre Dame and Kansas St (tied at #1), Alabama (#3), and Oregon (#5).  This illustrates only too well my thesis from the beginning of the season: where a team is ranked at the beginning of the season is a huge factor in where it ends up at the end of the season.  If all four teams had continued to win out, there is no doubt that Notre Dame would have been left at #4 in the BCS rankings, despite being #1 in the computer rankings.  In a 4-team playoff, that's no big deal; but in the 2-team playoff system that will be used this season, well, that's just controversy waiting to happen.

Still not convinced?  What about the next four teams in week 10: LSU (7-1), Ohio St (9-0), Georgia (7-1), and Florida (7-1)?  Setting aside Ohio St, against which there is obvious bias because it is on probation this season, the other three teams were ranked #3, #6, and #23 in the preseason.  Okay, you might argue that Georgia beat Florida and should be ranked ahead of them.  But what about LSU, which was beaten by Florida?  And note that the computer rankings for those three teams were: Florida (#4), LSU (#6), and Georgia (#7).

Second point: when the 4-team playoff starts in 2014, what are they going to do about multiple teams from the same conference?  Prior to conference championship weekend, there are 6 teams from the SEC in the top 10 of the BCS rankings, two from the Pac-12, one from the Big 12, and one independent.  It seems to me unfair in principle to let a team that hasn't won its own conference into a 4-team playoff, but consider Florida, which is #2 in the computer rankings and will probably move up to #3 in the BCS rankings after Alabama and Georgia play today for the SEC championship.  Fate chose the wrong team for them to lose to this season.

Final point: it's always interesting to me how "vicious circles" work in football (i.e., Team A beats Team B, which beats Team C, which beats Team A).  This season, the SEC produced a doozy: Alabama beat LSU, which beat South Carolina, which beat Georgia, which beat Florida, which beat Texas A&M, which beat Alabama.  Considering that these are the 6 SEC teams currently in the top 10 of the BCS, and that their only losses all season are to one or two of the other five (LSU and South Carolina also lost to Florida, and Texas A&M also lost to LSU), maybe the SEC needs to reconsider how it sets up its schedule to prevent one team after another from knocking their "friends" off the playoff perch.
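A circle like this can be found mechanically by treating each result as a directed edge from winner to loser and searching for a cycle. Here is a minimal Python sketch using only the head-to-head results mentioned in this post (all other games omitted). The name find_cycle is hypothetical and is not part of any existing ranking system.

```python
# Head-to-head results from this post, as (winner, loser) pairs.
RESULTS = [
    ("Alabama", "LSU"),
    ("LSU", "South Carolina"),
    ("South Carolina", "Georgia"),
    ("Georgia", "Florida"),
    ("Florida", "Texas A&M"),
    ("Texas A&M", "Alabama"),
    # The extra losses mentioned in parentheses.
    ("Florida", "LSU"),
    ("Florida", "South Carolina"),
    ("LSU", "Texas A&M"),
]

def find_cycle(results):
    """Return one 'vicious circle' (A beat B ... beat A) as a list, or None."""
    beat = {}
    for winner, loser in results:
        beat.setdefault(winner, []).append(loser)

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def dfs(team):
        color[team] = GRAY
        path.append(team)
        for victim in beat.get(team, []):
            state = color.get(victim, WHITE)
            if state == GRAY:
                # victim is already on the path, so the circle closes here
                return path[path.index(victim):] + [victim]
            if state == WHITE:
                cycle = dfs(victim)
                if cycle:
                    return cycle
        path.pop()
        color[team] = BLACK
        return None

    for team in list(beat):
        if color.get(team, WHITE) == WHITE:
            cycle = dfs(team)
            if cycle:
                return cycle
    return None

print(" beat ".join(find_cycle(RESULTS)))
# Alabama beat LSU beat South Carolina beat Georgia beat Florida beat Texas A&M beat Alabama
```

A depth-first search is enough here because it only needs to report one circle. A full season's results would likely contain many more, which is one reason simple "A beat B" arguments rarely settle who is best.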

Sunday, September 30, 2012

Principles for a Computer-based Ranking System

In the last post, I implied that a well-designed, simple, and transparent computer program would be superior to the existing human-expert system (polls) for selecting teams to compete in a playoff.  Let’s consider what I mean by each of the three characteristics.  By “well-designed”, I mean a system that evaluates and ranks teams based on available data, and which mimics how human evaluators would rate the teams if the evaluators were truly unbiased (i.e., blind to the hype and history surrounding a team, and to how others have recently rated it).  By “simple”, I refer to a system that does not require complicated calculations (like derivatives or logarithms), a large set of data points for each team, or ambiguous or potentially misleading figures (like attendance, turnover ratio, or total yards gained).  Instead, a simple system would use a small set of essential data points from each game, and stick to basic math operations, such as adding, multiplying, and dividing.  It is important to note that a simple system would still require a large number of calculations because of the number of teams and games involved (but to a computer, which can do repetitive calculations with 100% accuracy, that is still simple).  Finally, a “transparent” computer program would have its source code available to the public for scrutiny, to be sure it meets the principles of the proposed ranking system, and it would be documented in such a way that other programmers could write their own programs that meet the same principles and get exactly the same results.

What principles, then, make sense for rating teams in order to select a few to participate in a national championship?  There are several strategies to choose from, such as calculating which teams are the strongest at the end of the season (sometimes referred to as a power rating), or determining which teams have the most potential to win their next game.  Another possibility would be to choose the teams most likely to have strong audience support.  It could be argued that all three of those strategies are used to select at-large college basketball teams for March Madness, and that in general those strategies have been effective.  However, it’s my opinion that in college football, the teams that most deserve to participate in a playoff are the ones that have achieved the most over the course of the season.  Why?  As mentioned before, the season is short relative to the number of teams involved, so it’s difficult to compare teams on a head-to-head basis.  More importantly, football is a game in which a team’s performance, compared to its potential, can vary greatly from one game to the next.  Also, opponent strengths can vary significantly from week to week.  For example, one week a contender may face a team with a strong passing game, and the next it may face a team that specializes in the run, followed by a third team that finds success with a strong defense and a well-disciplined kicking game.  With one week to prepare, it can be difficult to make adjustments.  Therefore, it seems the fairest way to assess a team is to measure whether it has been balanced and consistent over the length of the season.  By extension, then, an unfair way to measure a team is to expect it to be perfect for the entire season.

How should one measure achievement?  The most straightforward answer is: by winning games.  Winning a game is always an achievement, no matter how poor the opponent or how close the score.  On the other hand, there is no achievement in losing a game, even if an underdog team comes close to knocking off a strong opponent.  But to distinguish great teams from good ones, one needs to measure more than the number of games won; one needs to assign a value to each win (each achievement).  What criteria can be used to assign a value?  First, a win against a strong team (one that few have defeated) should be worth more than a win against a weak team (such as one that has been defeated by many).  Second, against opponents of essentially equal strength, a win by a large margin is better than a win by a small one.  Finally, wins in the later part of the season, when all teams have matured, are more important than wins in the early part of the season.  In summary, then, the ideal playoff-worthy team will win a lot of games against strong teams by a large margin, with the best wins coming late in the season.
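The three criteria can be combined into a single number per game. The Python sketch below is purely illustrative: its specific choices (opponent winning percentage as the measure of strength, a 21-point cap on margin, a 14-week season, and multiplying the three factors together) are assumptions, not settled details of the proposed system. It does show that the idea needs nothing beyond adding, multiplying, and dividing. Game, win_value, and season_achievement are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Game:
    week: int            # 1 = first week of the season
    points_for: int
    points_against: int
    opp_wins: int        # opponent's record, used as a stand-in for strength
    opp_losses: int

SEASON_WEEKS = 14        # hypothetical season length
MARGIN_CAP = 21          # hypothetical: margins beyond three scores add nothing more

def win_value(g: Game) -> float:
    """Value of one result; a loss is worth nothing, however close."""
    margin = g.points_for - g.points_against
    if margin <= 0:
        return 0.0
    # Criterion 1: opponent strength, as the fraction of its games it has won.
    played = g.opp_wins + g.opp_losses
    strength = g.opp_wins / played if played else 0.0
    # Criterion 2: margin of victory, scaled into 0.5..1.0 and capped.
    margin_factor = 0.5 + 0.5 * min(margin, MARGIN_CAP) / MARGIN_CAP
    # Criterion 3: timing, scaled into 0.5..1.0 so late wins count more.
    timing_factor = 0.5 + 0.5 * min(g.week, SEASON_WEEKS) / SEASON_WEEKS
    return strength * margin_factor * timing_factor

def season_achievement(games) -> float:
    """Total achievement: the sum of the values of all wins."""
    return sum(win_value(g) for g in games)

close_late_win = Game(week=13, points_for=21, points_against=20, opp_wins=10, opp_losses=2)
early_blowout = Game(week=2, points_for=49, points_against=14, opp_wins=2, opp_losses=10)
print(round(win_value(close_late_win), 3), round(win_value(early_blowout), 3))  # 0.421 0.095
```

With these made-up weights, a one-point win over a 10-2 team late in the season is worth several times more than an early 35-point win over a 2-10 team, which matches the ordering the criteria above call for.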

The three listed criteria are described in general terms so far; specific details are necessary to assign values.  For now, it is worth exploring what criteria should not be used to assign value.  Attendance figures, television appearances, home-field advantage, and the competency of the officiating crew do not seem to be valid tools for measuring how good the achievement is.  Likewise, injuries or suspensions of key players, weather conditions, and “lucky breaks” should not be considered in calculating the value of a win.  While every one of those factors can affect the outcome of a game, it would be nearly impossible to quantify the effects.  In the end, a truly great team, nearly all the time, will find a way to win despite any disadvantages or mishaps that occur along the way.  It is also worth noting here that in-game statistical criteria such as number of take-aways, 3rd-down conversion rate, time of possession, or even yardage gained should not be used to help assign value to a win, since it is certainly possible to win in all or nearly all of the statistical categories and still lose the game.  Coaches design their game plans around how to score more points than the opponent, not how to beat the opponent in statistical categories.  While it may be interesting after the game to discuss how amazing it was that Team A outgained the opponent 500 yards to 195 yards, or that Team B had a +3 take-away margin, the bottom line is: who won?  And how close was the final score?

Since the last post, there have been five weeks of games.  Some teams ranked high at the beginning of the season are undefeated and continue to rank high, but others have dropped out after losing once or twice.  Still others have risen dramatically, and then have dropped with an unexpected loss.  I'll discuss some specifics in the next post.

Thursday, August 30, 2012

What about computer-based evaluation systems?


Instead of a human-expert evaluation system, let’s consider whether a computer-based evaluation system could be employed in a fair way to pick the top 4 teams to participate in a post-season playoff.  The BCS committee has used computer-based systems as part of the selection process since the BCS began, but they are not held in high esteem by the committee.  Out of six computer rankings, the BCS system discards each team's best and worst computer ranking and averages the middle four, then averages that result with two human-expert polls (the USA Today Coaches Poll and the Harris Interactive Poll) to determine the overall ranking.  Thus, each computer-based system has only about 6% influence on the final results (one-sixth of the computers' one-third share), while each human-expert poll has about 33% influence.  The full BCS selection procedures can be found at this website: BCS Selection Procedures
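The averaging just described can be sketched in a few lines of Python. This is a simplification: it averages rank numbers, whereas the actual BCS formula converts each poll and computer ranking into points and percentages. So the function below (bcs_style_average, a hypothetical name) shows only the structure of the calculation and the resulting weights, not official standings.

```python
def bcs_style_average(computer_ranks, harris_rank, coaches_rank):
    """Simplified BCS-style combination of six computer ranks and two polls."""
    if len(computer_ranks) != 6:
        raise ValueError("expected exactly six computer rankings")
    # Discard the best and worst computer rank; average the middle four.
    middle_four = sorted(computer_ranks)[1:-1]
    computer_avg = sum(middle_four) / 4
    # The computer average and the two human polls each carry one-third.
    return (computer_avg + harris_rank + coaches_rank) / 3

# Influence of each component on the final number.
POLL_WEIGHT = 1 / 3                  # each human poll
COUNTED_COMPUTER_WEIGHT = 1 / 3 / 4  # a computer whose rank survives the trim
AVERAGE_COMPUTER_WEIGHT = 1 / 3 / 6  # the computer third spread over all six

# Hypothetical team: near the top in the computers, #4 in both polls.
print(bcs_style_average([1, 1, 2, 1, 1, 1], harris_rank=4, coaches_rank=4))  # 3.0
print(f"{POLL_WEIGHT:.1%} {COUNTED_COMPUTER_WEIGHT:.1%} {AVERAGE_COMPUTER_WEIGHT:.1%}")  # 33.3% 8.3% 5.6%
```

The last line shows where the "about 6%" figure comes from: the computers' one-third share divided among six systems. For a given team, a system whose ranking survives the trim actually carries about 8%, and a trimmed one carries nothing, so either way the two human polls dominate.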

Currently, there are 6 different computer-based methodologies in use (Anderson & Hester, Richard Billingsley, Colley Matrix, Kenneth Massey, Jeff Sagarin, and Peter Wolfe).  All of them employ some method of calculating schedule strength, but by decree of the BCS committee, none of them use margin of victory, directly or indirectly, in their calculations.  The website BCS Know How - Computer Rankings summarizes each one of them in turn.  Many other computer ranking systems have been developed, and their results are available online.  Each ranking system emphasizes the available statistics differently, based on the philosophy of the developer.  Some are interested in a power ranking (how good is the team right now?); others are interested in potential (how will the team do in its next game?).  Still others are interested in results (how much has the team achieved so far this year?).  Some think that the location of games (home vs. away) is critical; some even think that the number of people in the stands is important.
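To show that schedule strength can be captured without margin of victory, here is a simplified Python sketch of the Colley Matrix approach, one of the six systems named above. It follows the published idea (solve a linear system C r = b in which each team's rating depends on its win-loss record and its opponents' ratings), but it omits details of the official rankings, such as how games against non-FBS opponents are handled. colley_ratings is a hypothetical function name.

```python
def colley_ratings(teams, results):
    """Colley Matrix ratings from (winner, loser) pairs; the score is ignored.

    Builds C r = b with C[i][i] = 2 + games played by i, C[i][j] = -(games
    between i and j), and b[i] = 1 + (wins - losses) / 2, then solves for r.
    """
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in results:
        w, l = index[winner], index[loser]
        C[w][w] += 1
        C[l][l] += 1
        C[w][l] -= 1
        C[l][w] -= 1
        b[w] += 0.5
        b[l] -= 0.5

    # Solve C r = b by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(C[row][col]))
        C[col], C[pivot] = C[pivot], C[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = C[row][col] / C[col][col]
            for k in range(col, n):
                C[row][k] -= factor * C[col][k]
            b[row] -= factor * b[col]
    r = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(C[i][k] * r[k] for k in range(i + 1, n))
        r[i] = (b[i] - tail) / C[i][i]
    return dict(zip(teams, r))

# Usage: three teams, A beat B, B beat C, A beat C.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
print({t: round(v, 3) for t, v in ratings.items()})  # {'A': 0.7, 'B': 0.5, 'C': 0.3}
```

The ratings always average 0.5, and a win over a highly rated team raises a rating more than a win over a weak one, which is how schedule strength enters without any use of the score. Fed a pure three-team "vicious circle" (A beats B, B beats C, C beats A), the method rates all three teams at 0.5.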

One could argue that we should not rely much on computer-based ranking systems because computers are not emotional and therefore can’t fairly evaluate the “intangibles”.  But that is exactly the advantage a computer-based system has, if designed adequately: it can provide an unbiased assessment of the results on the field, because it can repeatedly and accurately evaluate the data without being distracted by emotion.  One could also argue that we should not rely on computer-based ranking systems because computer programs have the programmer’s bias built in, and inevitably have bugs in them, which will lead to bad results.  While this is a possibility, a well-designed, simple, and transparent computer program would certainly be less likely to produce undesirable or controversial results than a human-expert system that makes no attempt to correct or prevent a known source of bias.

Next: how to design a computer-based ranking system that mimics “blind” evaluators.

Tuesday, August 14, 2012

Bias in the Ranking System

The scarcity of head-to-head non-conference games between top FBS teams magnifies the real obstacle to a fair way of determining the national champion: bias built into the existing ranking system. Currently, the rank of a team is primarily determined by votes of persons who are FBS football experts (65 sportswriters/broadcasters in the AP poll, 57 FBS coaches in the USA Today poll, and 115 school representatives and members of the media in the Harris Interactive poll). Theoretically, the voting system is designed to spread out personal bias so that its effect is negligible. For instance, the voters are chosen from all parts of the country, different age groups, and a wide variety of college backgrounds, which should dilute all that emotional stuff, like regional bias, historical favorites, and conference or alma mater loyalty.

However, the selection systems are not designed to root out “expert” and “consensus-building” biases. What do I mean by this? With their first vote, members independently rank the top 25 teams by whatever system they wish to devise, and send their choices to a central group, which compiles all the votes and publishes a top 25 list that everyone (including the experts) can see. Early in the season, the “expert” bias is in play because the voters have little head-to-head information available, so they choose teams based on their potential, which has proven to be a deceptive guide. The “expert” bias can be clearly seen nearly every year, when early favorites lose a game but do not drop out of the top 25, and sometimes don’t even drop below a team that beat them on the field. The “expert” bias can also be recognized at the end of the season by checking the final records of the top 25 teams selected in the first poll of the season. While some preseason favorites complete the season with two or fewer losses, there are always several “surprises” that finish mediocre at best, with five or more losses.

After the first poll or two, the “consensus-building” bias is in play: once members know what the consensus results are, they start to adjust their votes to fall more in line with the other voters. The “consensus-building” bias can be seen by comparing the AP and USA Today polls after a few weeks of the season, relative to their results in the preseason poll. By the fourth week or so of the season there are many undefeated teams, yet the two polls always look remarkably alike with regard to who is ranked where. Another visible aspect of the “consensus-building” bias is that a team tends to keep its position in the poll when it wins, moving up only if a team a few notches above it loses, and usually keeping the same position relative to the other winners (for example, if the #5 and #7 teams lose, then #6, #8, and #9 will move to positions 5 through 7, respectively, regardless of their opponents or the margins of victory). It is also evident when looking at teams with the same number of losses: the one with the earlier loss in the season is more likely to be ranked higher, because it has more time to recover.
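The week-to-week pattern described above is mechanical enough to write down. The hypothetical Python model below reproduces the example in which #5 and #7 lose. consensus_update and the fixed drop of four spots for a loser are assumptions for illustration, not how any poll officially works.

```python
def consensus_update(ranking, losers, drop=4):
    """Hypothetical model of the week-to-week poll pattern described above.

    ranking: list of teams, best first. losers: teams that lost this week.
    Winners keep their order and close ranks; each loser falls about `drop`
    spots (an assumed constant). Opponents and margins are never consulted.
    """
    new = [team for team in ranking if team not in losers]
    for old_pos, team in enumerate(ranking):
        if team in losers:
            new.insert(min(old_pos + drop, len(new)), team)
    return new

week = [f"#{i}" for i in range(1, 11)]
print(consensus_update(week, losers={"#5", "#7"}))
# ['#1', '#2', '#3', '#4', '#6', '#8', '#9', '#10', '#5', '#7']
```

Notice that the function never looks at whom anyone played or by how much they won. That is exactly the information the consensus pattern described above ignores.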

One might argue that the “expert” bias doesn’t really matter because the voters have time to correct their early mistakes and choose the best teams by the end of the season. That argument would be reasonable, except for the “consensus-building” bias, which tends to keep teams at the top if they continue to win, aggravated by the fact that only 2 teams (or 4 starting in 2014) are selected for the national championship game(s). In a 2-team or 4-team playoff, there are inevitably going to be several teams with nearly identical records vying for that last playoff spot, and the “consensus-building” bias is going to have a lot of influence. At the end of the season, a team that was considered one of the best at the start of the season is more likely to be ranked higher in the polls than another team with an identical win-loss record that was ranked lower at the start of the season. Ultimately, end-of-season rankings are biased by the perception of who is best at the beginning of the season, which is clearly an unreliable predictor of talent.

If an expert evaluation system is to be employed to rank football teams, it can only be unbiased if the evaluators are “blind”; that is, the voters may be provided any desired statistics or information about the FBS teams, except for their identity, the conference they belong to, and the previous week’s rank. That may be an unusual proposal, but it’s what’s necessary to make the system truly fair.
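As a rough illustration of what "blind" voting could mean in practice, the Python sketch below replaces team names with shuffled labels before the results are shown to voters. It keeps the key separately, so identities can be restored after the votes are in. blind_records and its tuple layout are hypothetical. Conferences and previous ranks are blinded simply by never being included.

```python
import random

def blind_records(games, seed=None):
    """Replace team identities with random labels before voters see the data.

    games: list of (winner, loser, winner_points, loser_points) tuples.
    Returns the blinded games and a key mapping each label back to its team.
    Conference and previous rank are never part of the input, so they stay hidden.
    """
    rng = random.Random(seed)
    teams = sorted({team for game in games for team in game[:2]})
    labels = [f"Team {i + 1}" for i in range(len(teams))]
    rng.shuffle(labels)                      # labels reveal nothing about identity
    to_label = dict(zip(teams, labels))
    blinded = [(to_label[w], to_label[l], wp, lp) for w, l, wp, lp in games]
    key = {label: team for team, label in to_label.items()}
    return blinded, key
```

Voters would rank "Team 1" through "Team N" from the blinded results alone, and only after the ballots are collected would the key be used to publish the real names.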

Saturday, August 11, 2012

Why Polls Are Not an Effective Way to Choose the Top 2 FBS Teams


Given the current regular-season scheduling method, it’s impossible for pollsters to compare top FBS football teams with the precision needed to select the best two (or four).  Why?  There are 124 schools with teams competing in the FBS during the 2012 season.  Those teams are divided into 11 conferences, along with a few teams that are independent.  Nearly all the teams play just 11 or 12 games against other FBS teams, and the vast majority of those games (8 or more) are played against teams within their own conference.  Therefore, while the system is set up nicely to judge the best teams within a conference, there is very little head-to-head evidence for comparing the top teams of different conferences to each other.  The math is fairly straightforward.  Teams in each conference play just 2 or 3 FBS games outside their conference.  Since there are about 110 non-conference schools available for each team, it is likely that at most one of those non-conference games will be against a top-35 (or better) team.  It’s inevitable that any system designed to select just 2 teams out of 124, or even 4 teams out of 11 conferences, is going to appear unfair to fans or coaches most of the time if there are few head-to-head battles as a basis for comparison.
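The odds in that argument can be checked with a hypergeometric calculation. Here is a minimal Python sketch, assuming (unrealistically) that non-conference opponents are drawn at random from about 110 schools, 35 of which are top-35 teams. In reality some top-35 teams sit in the team's own conference, which would only lower the odds further. top_opponent_odds is a hypothetical helper name.

```python
from math import comb

def top_opponent_odds(n_games=3, pool=110, top=35):
    """P(exactly k non-conference opponents are top-35 teams), for k = 0..n_games.

    Assumes opponents are drawn at random, without replacement, from `pool`
    non-conference schools, of which `top` are top-35 teams.
    """
    total = comb(pool, n_games)
    return [comb(top, k) * comb(pool - top, n_games - k) / total
            for k in range(n_games + 1)]

for games in (2, 3):
    probs = top_opponent_odds(games)
    print(games, "games:", [round(p, 2) for p in probs],
          "P(at most one) =", round(probs[0] + probs[1], 2))
# 2 games: [0.46, 0.44, 0.1] P(at most one) = 0.9
# 3 games: [0.31, 0.45, 0.21, 0.03] P(at most one) = 0.76
```

Under these assumptions, exactly one top-35 opponent is the single most likely outcome with three non-conference games, and the chance of facing at most one is about 76% (about 90% with two games).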

Compare this to NCAA Division I men’s basketball.  There are currently 345 teams, also organized into conferences (32 this year).  A typical NCAA basketball team plays approximately 30 regular season games, with about half of them against non-conference foes.  In fact, many top teams schedule at least a half-dozen non-conference games against other top-50 teams.  With such a large number of non-conference games available as evidence, and a post-season tournament that includes 68 teams, it is not difficult at all for a national consensus polling system to precisely select all of the top 25 teams for the tournament.  In other words, even if two experts disagree on nearly every pick for the top 25, it’s virtually certain that all of their picks will have a chance to earn the national championship on the court.  I am not saying the NCAA Division I basketball tournament selection process is flawless, nor am I saying that the FBS needs to have a 32- or 64-team playoff - I am merely pointing out that the expert polling system is a viable option for identifying top basketball teams, but not a good system for football because of the significant differences in how the regular seasons are set up.


Saturday, August 4, 2012

A short summary of the controversy

The argument about how to choose a college football national champion has been waged for decades.  At the highest level of college football (known as the Football Bowl Subdivision (FBS) since 2006, and Division I-A prior to that), a national champion has been named by some form of consensus since 1883.  Controversy has existed since at least 1925, when the Helms Foundation selected Alabama as the champion and the Dickinson System selected Dartmouth.  There have been split-champions at least 17 more times since 1925, most recently in 2003 when Louisiana State and Southern Cal shared the honors. 

Examples of other ambiguous championships include 1947, when the AP first selected 9-0 Notre Dame prior to the bowl season, and then reversed itself after the bowl season to select 10-0 Michigan after they won the Rose Bowl (Notre Dame did not participate in bowl games at the time).  In 1975, Arizona St finished 12-0, defeating Nebraska in the Fiesta Bowl, but finished second to 11-1 Oklahoma in the national championship voting.  In 1977, five teams finished the bowl season with 11-1 records.  Alabama fans cried foul when their previously #3-ranked team, an easy winner of the Sugar Bowl, was leap-frogged by regular-season #5 Notre Dame, which had handily beaten #1 Texas in the Cotton Bowl.  A similar situation occurred in 1983, when #5 Miami beat #1 Nebraska in the Orange Bowl, jumping over #3 Auburn, which won the Sugar Bowl.  The history of the Division I-A national championship is littered with other controversies throughout the 20th century.

Why does such a long history of controversy exist?  There are many contributing factors, but the primary reason is that the FBS is the only NCAA-sponsored sport without an organized tournament to determine its champion.  In 1998, after 9 consecutive years of disputes in which one conference after another felt cheated out of a national championship because of the existing bowl agreements, the Bowl Championship Series (BCS) was born.  The BCS was supposed to end all controversy by finally giving the top two teams an opportunity to play each other in a final game of the year.  Unfortunately, the architects of the BCS did not anticipate how difficult it would be to determine which teams were the top two, so the controversy has continued.  Examples include the 2006 season, when Boise St completed the regular season 12-0 and was not invited to play the only other undefeated team, Ohio St, and the 2008 season, when 12-1 Florida defeated 12-1 Oklahoma, but undefeated Utah was not deemed worthy to play either team.

Sunday, July 29, 2012

A 4-team playoff. Really?

Since the 1980s, I've been frustrated by the irrational way that a Division I-A (FBS) football champion is selected.  Neither the NCAA nor the sports media have been able to come up with something reasonable that survives the test of time, or even two or three seasons, without significant controversy.  During the last decade, the BCS committee has tinkered with the system during the off-season and generally made the situation worse.

Now, in the summer of 2012, the BCS committee has finally agreed in principle to a 4-team playoff system, starting with the 2014 season.  They (and many college football fans) are hoping this, finally, will end the controversy.  Me, I am still skeptical.  While a 4-team playoff may look like a step forward, only a well thought-out system for selecting those playoff teams will enable the controversy to significantly subside.  I think the BCS committee should ask fans to help them design a selection method, but it appears they want to do this on their own, in a secret back room. 

This blog is dedicated to developing a rational, unbiased method for recognizing an FBS national champion.  Since neither the NCAA nor the BCS committee has asked for my opinion, I hope to build consensus for a transparent method that makes sense, while there's still time . . .