Update, Dec 16, 2002.
I wrote this about 5 years ago and had pretty much not
reread it until now. This is just a general description.
The method of summing z-scores of statistics has not changed,
but which variables I include in the sum may have. For
example, I no longer use passing yards; I now use team
quarterback rating. The NFL model currently uses more
inputs than the NCAA model.
I find it interesting to test different rating methods. Hence
the creation of the Prediction Tracker. It is my intention to write
up a brief description of each model: PerformanZ, Elo, least squares
regression, least absolute value regression, logistic regression,
scoring efficiency, Pythagorean, etc., just enough to let
you know what each is doing mathematically. When I will get that
done remains to be seen.
PerformanZ got its start a little differently than many of the
other computer systems around today. What interested
me was the non-transitivity of football: Team A beats Team B,
Team B beats Team C, and Team C beats Team A. Which is the
best team? The other thing I was interested in was comparing
the results (predictions) of different types of systems.
The obvious first choice for a model would be least squares
regression. This would work best if teams played each other
both home and away (like the NBA, perhaps?). I think least squares
is a little more appropriate for modeling the NFL than college
football, where there are over 100 teams split among about 10
leagues that have little interaction, if any, with one another.
Least squares (or least absolute value) regression makes more sense
in the setting of the NFL, which has (had) 30 teams with
more interdivisional linkage than college football.
So I took least squares to be my standard for comparisons.
I then looked to see if I could come up with something totally
different that would have an accuracy at least as good as
least squares. Another thing I would like from a system is
the ability to predict games more accurately than the
Vegas line. Now, coming up with a system that can
beat both least squares and the Vegas line is not an easy task.
My first thought went back to the non-transitivity question.
Different people will give you different answers when asked
why Team B is rated above Team A even though Team A has
beaten Team B. To me the answer is so simple and obvious
that people tend to forget about it. Simply put, the better
team does not always win. Whether it is from injuries,
weather, overconfidence, or turnovers, it certainly happens
in football and all other sports that there are times
when the better team loses to the lesser team. What
that implies is that the scoreboard may not always be the
best place to turn when you want to compare two teams. This
flies in the face of all the other systems that believe
winning is the only thing that matters. If that were the case,
why not just rank teams according to winning percentage
and break ties based on difficulty of schedule?
Where else can we look other than the scoreboard? Well, I
am one of those who believe in 'winning on the field'. I
believe that in the majority of games the team that plays
better on the field wins the game, so most of the time the
scoreboard is a good place to look. But where I differ from
the 'just win baby' crowd is that, in my mind, a team
can still 'win' even if they lose. They may have
been the better team during the game, just not on the scoreboard
at the end. So I decided to take a look at game
performances. These are the factors in my college PerformanZ Ratings.
The factors are similar for the NFL, and in basketball important game
statistics such as points and rebounds can be used.
1. Team winning percentage.
2. How well a team can score points.
3. How well a team can stop the other team from scoring.
4. How well a team runs the ball.
5. How well a team stops the run.
6. How well a team throws the ball.
7. How well a team stops the pass.
8. Turnovers.
I use team yardage data, such as offensive rushing yards per
carry, to measure a team's offensive running ability. These numbers
get blown out of proportion if a good team plays a bad team,
so they are weighted by a difficulty-of-schedule factor.
Thus, Nebraska is not rewarded for piling up 500+ rushing yards
against a weak team like Pacific.
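The exact schedule-difficulty formula isn't given here, but the idea of deflating stats earned against weak opponents can be sketched like this. The opponent-strength weights and yardage numbers below are invented for illustration; they are not the actual PerformanZ adjustment.

```python
# Hypothetical schedule adjustment: weight each game's rushing numbers
# by an opponent-strength factor (1.0 = average opponent) so that yards
# piled up against weak teams count for less.
games = [
    # (rushing yards, carries, opponent strength)
    (520, 55, 0.60),   # big day against a weak opponent
    (180, 40, 1.15),   # modest day against a strong opponent
]

weighted_yards = sum(yds * s for yds, carries, s in games)
weighted_carries = sum(carries * s for _, carries, s in games)
adjusted_ypc = weighted_yards / weighted_carries
raw_ypc = sum(y for y, _, _ in games) / sum(c for _, c, _ in games)

# The adjusted average discounts the stats earned against the weak team.
print(round(raw_ypc, 2), round(adjusted_ypc, 2))  # 7.37 6.57
```

The point is simply that the adjusted yards-per-carry lands below the raw average whenever the big numbers came against below-average opponents.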
Each of the eight factors is then transformed to a standard
normal distribution (they are already approximately normal
to begin with), so all the factors are on the same scale.
Summing the eight factors gives a raw total, and teams can
be ranked on these totals. Generally the higher
ranked teams are teams that are good across the board.
A team with a great offense won't be at the top unless
they also have a good defense. The totals are meaningless
numbers by themselves, but by mapping them to a points-per-game
scale I am able to make comparisons to the point spread and
make game predictions.
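The standardize-and-sum step can be sketched in a few lines. The team names and statistics below are made up, and only three factors are shown instead of eight, but the mechanics (z-score each factor across teams, flip the sign of "lower is better" factors, sum) are the same.

```python
import math

# Hypothetical per-team season statistics, for illustration only.
teams = {
    "Nebraska": {"win_pct": 0.91, "pts_for": 38.0, "pts_against": 14.0},
    "Pacific":  {"win_pct": 0.25, "pts_for": 17.0, "pts_against": 31.0},
    "UCLA":     {"win_pct": 0.60, "pts_for": 34.0, "pts_against": 21.0},
}
factors = ["win_pct", "pts_for", "pts_against"]
# Points allowed is better when lower, so its z-score enters with a minus.
signs = {"win_pct": 1.0, "pts_for": 1.0, "pts_against": -1.0}

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

names = list(teams)
totals = {name: 0.0 for name in names}
for f in factors:
    for name, z in zip(names, z_scores([teams[n][f] for n in names])):
        totals[name] += signs[f] * z

# Rank teams by their summed z-scores, best first.
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)
```

Because every factor is on the same scale, a team has to be above average in most of them to land near the top; one standout factor can't carry a team by itself.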
So basically I feel that PerformanZ is a measure of a team's
true ability through their games to date. For example, in
1997 I had a 1-2 UCLA team in the top 5, which seemed very odd.
But UCLA then proceeded to win all of their remaining games
and finished in the top 5 in the national rankings. So it
doesn't necessarily take long for teams to reach their true
ability. And even if a team is losing, it is still possible
that they are a very good team with a legitimate chance
to win any form of playoff tournament. I'll get a handful of teams
like this every year that deviate from where the consensus would
place them. I think these are the teams to look at closely. They
are often teams that are over- or underrated by the public at large.
Or they could just be teams with great players where some other
factor, such as coaching, keeps them from reaching their winning
potential.
PerformanZ is designed to measure past ability, but
obviously past ability is the best predictor of future
ability. So how well does this system predict?
Last season in the rec.sports.football.college college football
pool (cfpoool.com) I was the highest rated system of all,
going 73.7% on their preselected matchups, compared to
the BCS systems Sagarin at 70.6% and Dunkel at 68.2%.
(I encourage everyone with a system to enter their picks
as a system this year.)
In 1998 my straight-up picking percentage was 80.7% for all
Div IA games and 70.4% for NFL games. The normal approximations
work better with the 112+ teams of the NCAA than the 30+ of the NFL.
That, along with the parity in the NFL, explains why there is such
a large difference.
Notice from my 1998 NFL bias/variance plot that my PerformanZ ratings
make unbiased predictions. That is, when I make predictions
for next week's games, on average I am off by 0 points. Ideally
the best system would be one that is unbiased (the Vegas point
spread was shown to be unbiased in Stern, 1991) and also has
the smallest variance. Also, notice from the bias/variance plot
how large the standard deviations are: about 14 points
for all of the NFL systems measured. That is huge. So if Vegas says
Dallas is a 3-point favorite over Chicago, a 95% confidence interval
on that game ranges anywhere from Dallas winning by about 30 to Chicago
winning by about 24. That gives you an indication of why a team can
be ranked higher than a team it has lost to. The ratings/lines
may be correct on average, but the variances are extremely large. For
college football these variances are even larger than the 14 points seen
in the NFL.
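The Dallas-Chicago interval above falls straight out of the normal approximation. Assuming prediction errors are roughly normal with the 14-point standard deviation cited above, the 95% interval is the spread plus or minus 1.96 standard deviations:

```python
# 95% interval on a point spread, assuming roughly normal prediction
# errors with the ~14-point standard deviation cited in the text.
spread = 3.0          # Dallas favored by 3 over Chicago
sd = 14.0             # standard deviation of prediction errors
half_width = 1.96 * sd

low, high = spread - half_width, spread + half_width
print(f"Dallas by {high:.0f} down to Chicago by {-low:.0f}")
```

With a half-width near 27 points, even a 10-point favorite's interval comfortably includes an upset, which is exactly why a single loss says little about which team is better.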