Sunday, May 26, 2013

The Stat Score: A Rat's Nest of Numbers

Since I refer to the Stat Score rather frequently when it comes to wide receivers, I thought I should lay out how it all works.  Unfortunately, this is something that starts off very simple, and proceeds to get weirder and more questionable as I try to untangle the mess of numbers that I use.  To keep things as clear as possible, I'll try to explain it all in steps, before I get to some of the more deranged issues.

On its most basic level, it revolves around one core idea.  How much of a team's offense was a receiver responsible for?  Well, that is simple to figure out.  If a college team gained 5000 yards, and a receiver in that offense gained 1000 yards, then he was involved in/responsible for 20% of the total yards.  So, why should we care about this percentage?  The reason for this is that raw yardage doesn't tell us very much about how valuable a player was to their team.  For instance, look at this example:

Team's Total Offense    Receiver's Yardage    % of Offense
6000                    1200                  20%
5000                    1000                  20%
4000                     900                  22.5%

As you can see, as the team's total output goes up, the receiver has to produce even more to be as proportionally valuable.  While the receiver who generated only 900 yards initially appears less impressive, he was possibly more valuable to his team's offense than the player with 1200 yards.  It's easy to get excited about a draft prospect who has gaudy stats, but sometimes we lose track of the context in which the stats are generated.  A player who is generating a larger percentage of his team's offense is probably going to be the focus of the opposing defense.  So, generating significant stats while being the main focus of your team's offense is more challenging.
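For anyone who wants to check the arithmetic, here is a minimal sketch of the percentage calculation, run against the three rows in the table.  The function name share_of_offense is just something I made up for illustration.

```python
def share_of_offense(receiver_yards, team_total_yards):
    """Return the receiver's yardage as a percentage of the team's total offense."""
    if team_total_yards <= 0:
        raise ValueError("team_total_yards must be positive")
    return 100.0 * receiver_yards / team_total_yards


# The three rows from the table: (team's total offense, receiver's yardage)
examples = [(6000, 1200), (5000, 1000), (4000, 900)]
shares = [share_of_offense(rec, team) for team, rec in examples]

for (team, rec), share in zip(examples, shares):
    print(f"{team:>5} {rec:>5} {share:.1f}%")
```

The 900 yard receiver comes out at 22.5%, ahead of the two 20% players, which is the whole point of looking at the percentage instead of the raw total.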

As I've mentioned before, this is very similar to Shawn Siegele's Dominator Rating system.  The only real difference is that I base the percentage off of the total offense, and Shawn does it based off of the passing offense.  In the end, it is sort of like the difference between Dr. Pepper and Mr. Pibb.  It's debatable which approach is better, and maybe I am wrong to do it this way, but I am too lazy to change my database now.  I think that a team's ability to run the ball should be a part of the score, as it alleviates some of the pressure on the receivers, so I take that into account.

Beyond this difference of opinion, relating to the running game, there is one other issue.  Instead of giving a score for the player's last year in college, I use their last 2 years.  If one year of production is good, two is better.  This is just my way of weeding out one-year wonders.  The average percentage of the offense that a draft prospect is responsible for in his last college year is 17.75%.  In their next to last year, the average is 15.34%.  The actual averages would really be lower if I considered all receivers, and not just the ones who have historically been drafted.

After I have these scores, I convert them to see how many standard deviations the result is away from the average, because being able to make bell curves is neat, if perhaps somewhat pointless.  You could really just stick with the basic percentage and still be perfectly fine, but I like to complicate things for myself.  One standard deviation for their final year would be 6.581%, and for their next to last year 6.674%.  So, the math works like this for their final year (next to last year is the same, but obviously putting in a different set of numbers, and the correct corresponding standard deviation):

(Player's % of offense in final year - 17.75) / 6.581

The resulting score lets you calculate what percentile the player would be in, or create a bell curve to see how common their results are.  Generally, though distributions can vary from position to position, or subject to subject, if the results are roughly normally distributed, one positive standard deviation will place a player near the 84th percentile, and one negative deviation around the 16th percentile.
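Here is a rough sketch of that conversion, using the averages and standard deviations quoted above.  The percentile step assumes the scores follow a normal bell curve, which is my simplification and may not hold for the real data.  The function names, and the 22.5% sample player, are just for illustration.

```python
from statistics import NormalDist

# (average, standard deviation) of % of team offense, as quoted in the post
FINAL_YEAR = (17.75, 6.581)
NEXT_TO_LAST_YEAR = (15.34, 6.674)


def z_score(pct_of_offense, mean, std_dev):
    """How many standard deviations a result sits above or below the average."""
    return (pct_of_offense - mean) / std_dev


def percentile(z):
    """Percentile implied by a z-score, ASSUMING a normal distribution."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical player responsible for 22.5% of his team's offense in his final year
z_final = z_score(22.5, *FINAL_YEAR)
print(round(z_final, 3), round(percentile(z_final), 1))
```

The next to last year works the same way, just with NEXT_TO_LAST_YEAR passed in instead.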

Now things start to get weird.  Just for the sake of being able to easily peruse a player's numbers, I will combine some of their scores to create a sort of overall score.  It's generally a bad idea to rely too much on these overall scores, especially when they start to combine large numbers of smaller scores, as they can hide weird imbalances or irregularities.  It's kind of like judging someone based on their college GPA, instead of looking at their grades in individual classes.  The grades in individual classes will give you a better picture of what is going on.  Maybe someone was average in six out of eight classes, excelled in one, and bombed another.  If that one class he bombed was something important (Nuclear Physics for Dummies), and the one he excelled in was something stupid (Basket Weaving 101), that is worth making a note of.  Combining scores into one larger score just lets me get a quick glimpse of a player, though it can be less accurate sometimes.  I'll still refer to them in my posts for the sake of simplicity, but I'll also try to point out areas of concern when they arise.

Here's one issue/observation that I should mention.  It seems to me that smaller players tend to perform at a higher level in college, at a younger age, while the bigger guys tend to really hit their stride around their junior or senior year.  If you see a player with multiple 1000 yard seasons, he is more often going to be built to a more average size.  This is just mad speculation on my part, but I have to wonder if this has something to do with the bigger guys needing more time to grow into their frame, or reach their physical peak, compared to a 5'10" and 190# player, who might have finished growing by his senior year of high school.  Like I said, mad speculation.  Either way, I combine their 2 years of production in different ways because of this.  For smaller guys (under 210#), I value their final 2 years evenly, in a 50/50 split.  For the bigger guys (over 200#) I value their 2 years with an emphasis on their final year, with a 70/30 split.  This is all a highly debatable and possibly stupid thing to do, but for now, that's what I'm doing.
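The weighting could be sketched like this.  Since the paragraph above mentions both 210# and 200# as the dividing line, the cutoff is left as a required parameter rather than picking one.  Treating a player who sits exactly at the cutoff as a "bigger guy" is an assumption on my part, and the function name is made up.

```python
def production_score(z_final, z_next_to_last, weight_lbs, *, cutoff_lbs):
    """Blend two seasons of standard-deviation scores based on player size.

    Smaller players (below cutoff_lbs): 50/50 split between the two years.
    Bigger players (at or above cutoff_lbs, an assumption): 70/30 split
    favoring the final year.
    """
    w_final = 0.5 if weight_lbs < cutoff_lbs else 0.7
    return w_final * z_final + (1.0 - w_final) * z_next_to_last


# Same two seasons, different sizes (hypothetical players, cutoff chosen for the demo)
small = production_score(1.0, 0.0, 190, cutoff_lbs=210)
big = production_score(1.0, 0.0, 225, cutoff_lbs=210)
print(small, big)
```

A big receiver with a strong final year and a mediocre next to last year gets more credit than a small receiver with the same two seasons, which is the intent.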

The last thing I do is to look at their average yards per catch over both years (though I just average their two years here, and don't worry about the two separate averages), and their number of receptions in both years.  This is also highly questionable, but I like having the data.  I'll list here the average result for the 2 years and the standard deviation.  Calculating how many standard deviations away from the average a result is can be done the same as I mentioned earlier.

Yards Per Catch          Average    Standard Deviation
Combined 2-Year Avg.     14.966     2.645

Receptions               Average    Standard Deviation
Final Year               62.06      23.6
Next to Last Year        51.75      22.68

I just like to have these scores to get an idea of how big an impact a player has on a per play basis, and what their total volume of plays looks like.  To some extent, a player who does well in YPC will tend to have a lower volume of receptions, and vice versa.  So, a player is unlikely to score well in both.  Extremely explosive deep threats generally have fewer receptions, and reliable high volume underneath guys tend to have a lower YPC.  It's just something I like to look for, but it isn't terribly important, so I usually only let it count for about 10-20% of a player's total Stat Score, depending on their size.  Since the two scores tend to knock each other out, they usually have very little overall impact.  This is just something I sprinkle in there as a garnish.
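Here is one way the garnish could be sketched, using the averages and standard deviations listed above.  The exact way the YPC and reception scores get folded into the total isn't spelled out in this post, so the simple averaging in garnish_score and the linear blend in stat_score are assumptions for illustration, not the actual formula.  The garnish_weight would be somewhere in the 0.10 to 0.20 range.

```python
# (average, standard deviation) pairs quoted in the post
YPC_2YR = (14.966, 2.645)
REC_FINAL = (62.06, 23.6)
REC_NEXT_TO_LAST = (51.75, 22.68)


def z_score(value, mean, std_dev):
    """How many standard deviations a result sits above or below the average."""
    return (value - mean) / std_dev


def garnish_score(ypc_2yr, rec_final, rec_next_to_last):
    """ASSUMED combination: average the YPC score with the mean reception score."""
    z_ypc = z_score(ypc_2yr, *YPC_2YR)
    z_rec = (z_score(rec_final, *REC_FINAL)
             + z_score(rec_next_to_last, *REC_NEXT_TO_LAST)) / 2.0
    return (z_ypc + z_rec) / 2.0


def stat_score(production, garnish, garnish_weight):
    """ASSUMED linear blend of the production score and the garnish."""
    if not 0.0 <= garnish_weight <= 1.0:
        raise ValueError("garnish_weight must be between 0 and 1")
    return (1.0 - garnish_weight) * production + garnish_weight * garnish


# Hypothetical deep threat: high YPC, low reception totals
deep_threat = garnish_score(18.0, 45, 40)
print(round(deep_threat, 3), round(stat_score(1.0, deep_threat, 0.15), 3))
```

For the made-up deep threat, the positive YPC score and the negative reception scores mostly offset each other, so the garnish barely moves the total.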

In the end, I wouldn't get too carried away with trying to divine mystical meanings from these scores.  A lot of stat geeks are trying to find the 'One Stat To Rule Them All', and I'm not convinced that such a thing exists.  I'll use stats that sometimes might appear to be trying to reach this goal, but I don't take them too seriously.  I'm just trying to find an acceptable standard for making comparisons between prospects.  I'm not trying to make any claims of having found the one true answer.  The stats are useful for drawing your eye towards the exceptional versus the mundane, but beyond that you are still going to be better informed when you look at the broader set of data that goes into these stats.  This is an ESPN world though, and people seem to want overly simple answers to relatively complex questions.  With that said, even with a method such as this, which is at the very least questionable, one can get a better sense of a player than from raw college stats alone.

As time moves on, I will undoubtedly make alterations to my views on certain stats, and will update past numbers, and explain when/why I'm making such changes.  Or, at some point, my eyes will fall out from staring at spreadsheets for too long.
