Thursday, August 30, 2012

What about computer-based evaluation systems?


Instead of a human-expert evaluation system, let’s consider whether a computer-based evaluation system could be employed in a fair way to pick the top 4 teams to participate in a post-season playoff.  The BCS committee has used computer-based systems as part of the selection process since the inception of the BCS, but the committee does not hold them in high esteem.  Out of six computer rankings, the BCS system drops the highest and lowest for each team and averages the middle four, then averages that result with two human-expert polls (the USA Today Coaches Poll and the Harris Interactive Poll) to determine the overall ranking.  Thus, the six computer-based systems together share one-third of the final result, or about 6% each on average, while each human-expert poll has about 33% influence.  The full BCS selection procedures can be found at this website: BCS Selection Procedures
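The weighting described above can be checked with a little arithmetic.  The sketch below is a simplified illustration, not the official BCS formula: it assumes every poll and computer result has already been converted to a score between 0 and 1, and the function name bcs_style_score is made up for this example.

```python
def bcs_style_score(computer_scores, coaches_score, harris_score):
    """Combine six computer scores and two poll scores (all assumed 0-1).

    Simplified sketch: drop the lowest and highest computer score,
    average the middle four, then average that with the two polls.
    """
    if len(computer_scores) != 6:
        raise ValueError("expected exactly six computer scores")
    middle_four = sorted(computer_scores)[1:-1]  # drop lowest and highest
    computer_average = sum(middle_four) / 4
    return (computer_average + coaches_score + harris_score) / 3


# Share of the final score carried by each component.
poll_weight = 1 / 3                   # each human poll: about 33%
counted_computer_weight = 1 / 3 / 4   # each of the four counted systems: about 8%
average_computer_weight = 1 / 3 / 6   # averaged over all six systems: about 6%

# Hypothetical scores for one team, for illustration only.
example = bcs_style_score([0.90, 0.80, 1.00, 0.70, 0.85, 0.95], 0.90, 0.96)
```

In any given week a single computer ranking either counts for about 8% (if it is one of the middle four for that team) or for nothing (if it is dropped), which averages out to roughly 6% per system.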

Currently, there are 6 different computer-based methodologies in use (Anderson & Hester, Richard Billingsley, Colley Matrix, Kenneth Massey, Jeff Sagarin and Peter Wolfe).  All of them employ some method of calculating schedule strength, but by decree of the BCS committee, none of them employ margin of victory directly or indirectly in their calculations.  This website:  BCS Know How - Computer Rankings  summarizes each one of them in turn.  Many other computer ranking systems have been developed, and their results are available online.  Each ranking system emphasizes the available statistics differently, based on the philosophy of the developer.  Some are interested in a power ranking (how good is the team right now?); others are interested in potential (how will the team do in its next game?).  Still others are interested in results (how much has the team achieved so far this year?).  Some think that the location of games (home vs. away) is critical; some even think that the number of people in the stands is important.

One could argue that we should not rely much on computer-based ranking systems because computers are not emotional and therefore can’t fairly evaluate the “intangibles”.  But that is exactly the advantage that a computer-based system has, if designed adequately – it can provide an unbiased assessment of the results on the field, because it can repeatedly and accurately evaluate the data without being distracted by emotion.  One could also argue that we should not rely on computer-based ranking systems because computer programs have the programmer’s bias built in, and inevitably have bugs in them, which will lead to bad results.  While this is a possibility, a well-designed, simple, and transparent computer program would be less likely to produce undesirable or controversial results than a human-expert system which makes no attempt to correct or prevent a known source of bias.

Next: how to design a computer-based ranking system that mimics “blind” evaluators.

Tuesday, August 14, 2012

Bias in the Ranking System

The scarcity of head-to-head non-conference games between top FBS teams magnifies the real obstacle to creating a fair way to determine the national champion: bias built into the existing ranking system. Currently, the rank of a team is primarily determined by votes of persons who are FBS football experts (65 sportswriters/broadcasters in the AP poll, 57 FBS coaches in the USA Today poll, and 115 school representatives and members of the media for the Harris Interactive poll). Theoretically, the voting system is designed to spread out personal bias so that its effect is negligible. For instance, the voters are chosen from all parts of the country, different age groups, and a wide variety of college backgrounds, which should dilute all that emotional stuff, like regional bias, historical favorites, and conference or alma mater loyalty.

However, the selection systems are not designed to root out “expert” and “consensus-building” biases. What do I mean by this? With their first vote, members independently rank the top 25 teams by whatever system they wish to devise, and send their choices to a central group which compiles all the votes and publishes a top 25 list that everyone (including the experts) can see. Early in the season, the “expert” bias is in play because the voters have little head-to-head information available to them, so they choose teams based on their potential, which has proven to be deceiving. The “expert” bias can be clearly seen nearly every year, when early favorites lose a game but do not drop out of the top 25, and sometimes they don’t even drop below a team that beat them on the field. The “expert” bias can also be recognized at the end of the season by checking the final records of the top 25 teams selected in the first poll of the season. While some pre-season favorites complete the season with two or fewer losses, there are always several “surprises” that finish mediocre at best, with five or more losses.

After the first poll or two, the “consensus-building” bias is in play – once members know what the consensus results are, they start to make adjustments to their votes to fall more in line with the other voters. The “consensus-building” bias can be seen by comparing the AP and USA Today polls after a few weeks in the season, relative to their results in the pre-season poll. At the fourth week or so of the season there are many undefeated teams, yet the two polls always look remarkably alike with regard to who is ranked where. Another visible aspect of the “consensus-building” bias is that a team tends to keep its position in the poll when it wins, moving up only if a team a few notches above it loses, and usually keeping the same position relative to the other winners (for example, if the #5 and #7 teams lose, then #6, #8 and #9 will move to positions 5 through 7 respectively, regardless of their opponents or the margins of victory). It is also evident when looking at teams with the same number of losses – the one with the earlier loss in the season is more likely to be ranked higher, because it has more time to recover.
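The slide-up pattern in the #5 and #7 example can be written out as a tiny sketch.  This is only an illustration of the pattern, not a model of how any real voter fills out a ballot: the function name slide_up and the team labels T1 through T9 are made up, and dropping the losers behind every winner is a simplifying assumption, since the example above only says where the winners end up.

```python
def slide_up(previous_ranking, losers):
    """Re-rank teams the way a 'consensus-building' voter might.

    Winners keep their order relative to one another and simply fill
    the gaps left by losing teams. Opponent quality and margin of
    victory are never consulted. Losers are placed behind all winners,
    which is an assumption made only to keep the sketch short.
    """
    winners = [team for team in previous_ranking if team not in losers]
    dropped = [team for team in previous_ranking if team in losers]
    return winners + dropped


# Hypothetical nine-team poll; T5 and T7 lose this week.
last_week = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9"]
this_week = slide_up(last_week, losers={"T5", "T7"})

# New positions (1-based) of the teams previously ranked #6, #8 and #9.
new_positions = {team: this_week.index(team) + 1 for team in ("T6", "T8", "T9")}
```

Note that nothing about the games themselves enters the function except who lost, which is exactly the problem: the new ranking is almost entirely determined by the old one.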

One might argue that the “expert” bias doesn’t really matter because the voters have time to correct their early mistakes and choose the best teams by the end of the season. That argument would be reasonable, except for the “consensus-building” bias which tends to keep teams at the top if they continue to win, aggravated by the fact that only 2 teams (or 4 starting in 2014) are selected for the national championship game(s). In a 2-team or 4-team playoff, inevitably there are going to be several teams with nearly identical records vying for that last playoff spot, and the “consensus-building” bias is going to have a lot of influence. At the end of the season, a team that was considered to be one of the best at the start of the season is more likely to be ranked higher in the polls than another team with an identical win-loss record which was ranked lower at the start of the season. Ultimately, end-of-season rankings are biased by the perception of who is best at the beginning of the season, which is clearly an unreliable predictor of talent.

If an expert evaluation system is to be employed to rank football teams, it can only be unbiased if the evaluators are “blind”; that is, the voters may be provided any desired statistics or information about the FBS teams, except for their identity, the conference they belong to, and the previous week’s rank. That may be an unusual proposal, but it’s what’s necessary to make the system truly fair.

Saturday, August 11, 2012

Why Polls are Not an Effective Way to Choose the top 2 FBS teams


Given the current regular season scheduling method, it’s impossible for pollsters to compare top FBS football teams with the precision needed to select the best two (or four).  Why?  There are 124 schools with teams competing in the FBS during the 2012 season.  Those teams are divided into 11 conferences, along with a few teams that are independent.  Nearly all the teams play just 11 or 12 games against other FBS teams, and the vast majority of those games (8 or more) are played against teams within their own conference.  Therefore, while the system is set up nicely to judge the best teams within a conference, there is very little head-to-head evidence established to compare the top teams in each conference to each other.  The math is fairly straightforward.  Teams in each conference play just 2 or 3 FBS games outside of their conference.  Since there are about 110 non-conference schools available for each team, it is likely that only one of the non-conference games will be against a top-35 (or better) team.  It’s inevitable that any system designed to select just 2 teams out of 124, or even 4 teams out of 11 conferences, is going to appear unfair to fans or coaches most of the time, if there are few head-to-head battles as a basis for comparison.
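The "fairly straightforward" math can be sketched as follows.  The sketch assumes that non-conference opponents are drawn at random, and independently, from the roughly 110 available schools, and that all 35 top teams are in that pool.  Neither assumption is really true (schedules are negotiated, and some top-35 teams are conference mates), so treat the numbers as a rough guide only.  The function names are made up for this example.

```python
def expected_top_opponents(non_conf_games, top_n=35, pool=110):
    """Expected number of non-conference games against a top-N team,
    assuming each opponent is a random pick from the pool."""
    return non_conf_games * top_n / pool


def prob_at_least_one(non_conf_games, top_n=35, pool=110):
    """Chance of facing at least one top-N team outside the conference,
    treating each game as an independent random pick (an approximation)."""
    p_single = top_n / pool
    return 1 - (1 - p_single) ** non_conf_games


for games in (2, 3):
    print(games,
          round(expected_top_opponents(games), 2),
          round(prob_at_least_one(games), 2))
```

Under these assumptions a team with 2 or 3 non-conference FBS games can expect somewhere between about 0.6 and 1.0 games against a top-35 opponent, which is consistent with the claim of roughly one such game per season.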

Compare this to NCAA Division I men’s basketball.  There are currently 345 teams, also organized into conferences (32 this year).  A typical NCAA basketball team plays approximately 30 regular season games, with about half of them against non-conference foes.  In fact, many top teams schedule at least a half-dozen non-conference games against other top-50 teams.  With such a large number of non-conference games available as evidence, and a post-season tournament that includes 68 teams, it is not difficult at all for a national consensus polling system to select all of the top 25 teams for the tournament.  In other words, even if two experts disagree on nearly every pick for the top 25, it’s virtually certain that all of their picks will have a chance to earn the national championship on the court.  I am not saying the NCAA Division I basketball tournament selection process is flawless, nor am I saying that the FBS needs to have a 32- or 64-team playoff.  I am merely pointing out that the expert polling system is a viable option for identifying top basketball teams, but not a good system for football because of the significant differences in how the regular seasons are set up.


Saturday, August 4, 2012

A short summary of the controversy

The argument about how to choose a college football national champion has raged for decades.  At the highest level of college football (known as the Football Bowl Subdivision (FBS) since 2006, and Division I-A prior to that), a national champion has been named by some form of consensus since 1883.  Controversy has existed since at least 1925, when the Helms Foundation selected Alabama as the champion and the Dickinson System selected Dartmouth.  There have been split champions at least 17 more times since 1925, most recently in 2003 when Louisiana State and Southern Cal shared the honors.

Examples of other ambiguous championships include 1947, when the AP first selected 9-0 Notre Dame prior to the bowl season, and then reversed itself after bowl season to select 10-0 Michigan after they won the Rose Bowl (compared to Notre Dame, which did not participate in bowl games at the time).  In 1975, Arizona St finished 12-0, defeating Nebraska in the Fiesta Bowl, but finished second in the national championship to 11-1 Oklahoma.  In 1977, five teams finished the bowl season with 11-1 records.  Alabama fans cried foul when their previously ranked #3 team, an easy winner of the Sugar Bowl, was leap-frogged by the regular season #5 team Notre Dame, which had handily beaten #1 Texas in the Cotton Bowl.  A similar situation occurred in 1983 when #5 Miami beat #1 Nebraska in the Orange Bowl, jumping over #3 Auburn, which won the Sugar Bowl.  The history of the Division I-A national championship is littered with other controversies during the 20th century.

Why does such a long history of controversy exist?  There are many contributing factors, but the primary reason is that the FBS is the only NCAA-sponsored sport without an organized tournament to determine its champion.  In 1998, after 9 consecutive years of disputes in which one conference after another felt they were cheated out of a national championship because of the existing bowl agreements, the Bowl Championship Series (BCS) was born.  The BCS was supposed to end all controversy by finally giving the top two teams an opportunity to play each other in a final game of the year.  Unfortunately, the architects of the BCS system did not anticipate the difficulty involved in determining which teams were the top two, so the controversy has continued.  Examples include the 2006 season, when Boise St completed the regular season 12-0 and was not invited to play the only other undefeated team, Ohio St, and the 2008 season, when 12-1 Florida defeated 12-1 Oklahoma, but undefeated Utah was not deemed worthy to play either team.