Wednesday, June 29, 2011

Why Shooting Stats Are Better Than Goals

Let's say you are asked to rank the NHL teams halfway through the season. Which stats should you use to do this?

Before getting to that, we need to think carefully about what a ranking means. The best team in the NHL is the one that is the best at winning games, and so on down the line. This comes down to two things - scoring goals and preventing the opposition from doing the same. If someone says X "is the best team in the league", what they mean is that X is the best at outscoring their opponents. Similarly, if Y is the best player in the league that means that he is the best at the combination of generating goals for his team and preventing them for the other.

Success at scoring and preventing goals in hockey, like every activity, is a combination of skill and luck. For some things, e.g. roulette, luck is the dominant factor. In others, like sprinting 100m, skill overwhelmingly wins the day. Hockey falls somewhere in the middle, perhaps closer to roulette than anyone would care to admit. Getting back to ranking the teams, that means figuring out which are the strongest at the skill part. Note that I'm using skill loosely here to refer to any skills that help a team score goals and prevent them, including those like grit and mental toughness that pundits love to talk about.

There are a few ways to tease out this skill component, all of which I will use in various articles in the future. Here I will compare each team's stats across two different groups of games: the first half of the season vs. the second, even-numbered games vs. odd-numbered, etc. The idea is that luck in the first half of the season and luck in the second half should be completely unrelated. Sometimes your team will get lucky in the first half and unlucky in the second, but the opposite is just as likely. Think of it like two coin tosses. If you win the first coin toss then you are no more likely to win the second than if you'd lost it. In contrast with the luck factor, your team should usually be about as skilled in the second half of the season as the first. If there is no relationship, known as correlation, between luck in the first half of the season and luck in the second, any link between a team's results in the two halves will be due to skill.
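To make the coin-toss logic concrete, here is a minimal simulation sketch. It is not the analysis from this article: the teams are made up, and I am assuming that each half-season result is simply a fixed skill value plus a fresh, independent dose of luck, both drawn from normal distributions of equal size. The function names and parameters are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_teams=3000, skill_sd=1.0, luck_sd=1.0, seed=0):
    """Return (luck-vs-luck correlation, result-vs-result correlation)."""
    rng = random.Random(seed)
    # Skill is drawn once per team and carries over between halves.
    skill = [rng.gauss(0, skill_sd) for _ in range(n_teams)]
    # Luck is drawn separately for each half, so the halves are independent.
    luck_first = [rng.gauss(0, luck_sd) for _ in range(n_teams)]
    luck_second = [rng.gauss(0, luck_sd) for _ in range(n_teams)]
    result_first = [s + l for s, l in zip(skill, luck_first)]
    result_second = [s + l for s, l in zip(skill, luck_second)]
    return (pearson(luck_first, luck_second),
            pearson(result_first, result_second))

luck_corr, result_corr = simulate()
print(f"luck, first half vs second half:    {luck_corr:.2f}")
print(f"results, first half vs second half: {result_corr:.2f}")
```

With skill and luck the same size, the luck correlation should land near zero and the results correlation near one half. Shrinking skill_sd relative to luck_sd pushes the results correlation toward zero, which is the roulette end of the scale.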

Mostly due to the availability of data, I restrict attention to 5-on-5 situations where both goalies are on the ice. For each of the past four years, I split the season in half and look at how goals and shooting stats in the first half relate to goals in the second half. Because we care about both scoring and allowing goals, I express this as a percentage: goals for divided by the sum of goals for and against (GF/(GF+GA)). The same goes for shooting stats.
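As a rough sketch of that bookkeeping, the snippet below splits a game log in half and computes the for/(for+against) share for each half. The game log is invented for illustration and the function names are my own; the same function works for goals or for any shot count.

```python
def share(for_count, against_count):
    """Share of events that went the team's way: for / (for + against)."""
    return for_count / (for_count + against_count)

def split_share(games):
    """games: list of (for, against) tuples in schedule order.

    Returns (first-half share, second-half share).
    """
    mid = len(games) // 2
    halves = (games[:mid], games[mid:])
    return tuple(
        share(sum(f for f, _ in half), sum(a for _, a in half))
        for half in halves
    )

# Hypothetical 5-on-5 goals (for, against) over an eight-game stretch.
games = [(2, 1), (0, 3), (1, 1), (3, 2), (1, 2), (2, 0), (0, 1), (2, 2)]
first, second = split_share(games)
print(f"first half: {first:.1%}, second half: {second:.1%}")
```

Swapping the slices for games[0::2] and games[1::2] would give the even vs. odd split instead of first half vs. second half.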

Here is a graph of the relationship between goal percentage in the first half of the season and the second. All data are from timeonice. See links on the right.


It looks rather weak. The numbers back that up - the correlation is just 0.13. This is not statistically significant. Even ignoring that, it's pretty clear that putting up good scoring numbers 5-on-5 with the goalies in net in the first half of the season doesn't mean much in the way of predicting performance in the second half.

The relationship between Corsi percentage in the first half of the season and goal percentage in the second half is far stronger. Corsi percentage is like goal percentage, but for all types of shots, including missed shots and blocked shots. Here is the scatterplot:


You can see a distinguishable up-and-right pattern, which indicates a stronger relationship between the two. The correlation is 0.36, which is statistically significant. Keep in mind that we're looking at how shooting ratios in the first half relate to goals in the second half.
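For readers who want to check the significance claims, here is a hedged sketch of the usual t-test for a correlation. The sample size is my assumption: four seasons of 30 teams pooled together gives 120 team-seasons, but the exact number of data points is not stated here. The cutoff of 1.98 is the approximate two-sided 5% critical value for that many degrees of freedom.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_TEAM_SEASONS = 120  # assumption: 30 teams x 4 seasons, pooled
T_CRITICAL = 1.98     # approximate two-sided 5% cutoff for 118 degrees of freedom

for label, r in [("first-half goal% vs second-half goal%", 0.13),
                 ("first-half Corsi% vs second-half goal%", 0.36)]:
    t = correlation_t_stat(r, N_TEAM_SEASONS)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{label}: r = {r:.2f}, t = {t:.2f}, {verdict}")
```

Under that assumed sample size, the goal% correlation falls short of the cutoff and the Corsi% correlation clears it comfortably, in line with the text.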

Let's look at the best and worst teams in the first half of this last season. The New Jersey Devils were an impressively bad 10-29-2 on January 8th, with an overall goal differential of -58 (72 - 130). At 5-on-5 with goalies in, their goal differential was -48 (45 - 93) and their goal percentage 32.6%. That is the worst goal percentage in either half for any team in any of the four seasons of data available at timeonice. In contrast, the Flyers looked like world beaters halfway through. Their record was 26-10-5, their goal differential +30 (137-107) and their 5-on-5 goal% a cool 60%. What happened in the second half? The Devils put up one of the best turnarounds in NHL history, nearly making the playoffs, and the Flyers' record was mediocre. The Devils went 28-10-3, the Flyers 21-13-7. The Devils had an overall goal differential of +23 (102-79), the Flyers +6 (122-116). At 5-on-5 with goalies in, New Jersey had a goal differential of +23 (76-53), 58.9%, and Philly 0 (81-81), 50%.
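The 5-on-5 percentages above follow directly from the goal counts. A quick check, using only the numbers quoted in this paragraph (the Flyers' first-half 5-on-5 goal counts are not given, so that half is left out):

```python
def goal_pct(goals_for, goals_against):
    """Goal percentage: GF / (GF + GA), expressed out of 100."""
    return 100 * goals_for / (goals_for + goals_against)

# 5-on-5, goalies in: (goals for, goals against) as quoted in the text
halves = {
    "Devils, first half": (45, 93),
    "Devils, second half": (76, 53),
    "Flyers, second half": (81, 81),
}

for label, (gf, ga) in halves.items():
    print(f"{label}: differential {gf - ga:+d}, goal% {goal_pct(gf, ga):.1f}")
```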

How could the worst team in the league in the first half have a better second half than the best team by such a large margin? The answer comes down to the luck factor I discussed above. In the first half, New Jersey took 52.6% of all the 5-on-5 Corsi shots in their games. Philadelphia was actually worse, just better than even at 50.6%. Despite that, the Devils got hugely outscored and the Flyers got far more goals than their opponents. While skill may be a factor in shots going in and being saved by your own goalie, the topic of my next article, luck plays a massive role in scoring over just a half season. The Devils were clearly not getting the bounces and the Flyers were. In the second half of the season, Philadelphia's luck was about average and New Jersey actually caught the breaks.

You can see how much better Corsi stats handle luck by looking at the two teams in the graphs above. New Jersey is the red point and Philadelphia orange. You can see that the Devils are a huge outlier when you look at goals in the first and second halves, but not so when looking at Corsi in the first half and goals in the second, though you can see that they were fortunate. The goals graph is so scattered that the Flyers don't stand out much, but you can see that they dropped off a lot by how far they are from the top of the graph. On the Corsi graph they are right in the middle, so from that perspective their second-half performance should have been expected instead of surprising.

Other articles might stop there, but things get more interesting if you run a regression. Regression analysis is a tool I will use pretty frequently. It allows you to separate out different effects. In our case, we want to know how important goals in the first half are once you take Corsi into account, and vice-versa. The regression makes it very clear that Corsi% is a far, far better predictor of goal% in the second half than first-half goal%. Not only that, it appears that virtually all of the tiny amount of explanatory power you get from goal% comes from the fact that goals are a type of shot.

When the regression spits out a formula, the size of each coefficient tells you how big that variable's effect is. When both first-half goal% and Corsi% are included, the goal% coefficient is a minuscule 0.007. For the stats nerds, the standard error is 0.087 so the p-value is an astonishing 0.936. This is about as statistically insignificant as it gets. For comparison, the coefficient for Corsi% is 0.550 (SE of 0.142, p < 0.001) which is very strongly significant. If you have a team that breaks even on goals in the first half of the season but outshoots its opponents 60-40 by Corsi then they will average about 83.3 goals scored and 66.7 allowed in the second half of the season (assuming 150 total 5-on-5 goals, which is close to the league average). If instead you have a team that was even on shots but won the goal battle by that much then they will average 75.2 goals in the second half and concede 74.8.
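Here is a sketch of how those predictions fall out of the coefficients. The regression's intercept is not reported, so I am assuming that a team at exactly 50% on both measures is predicted to stay at 50%. The p-values use a normal approximation rather than the exact t distribution. Function and constant names are hypothetical.

```python
from statistics import NormalDist

GOAL_COEF, GOAL_SE = 0.007, 0.087    # first-half goal% coefficient and SE
CORSI_COEF, CORSI_SE = 0.550, 0.142  # first-half Corsi% coefficient and SE
TOTAL_GOALS = 150                    # 5-on-5 goals per half season, as in the text

def predicted_goal_share(first_half_goal_share, first_half_corsi_share):
    """Predicted second-half goal share, with shares given as fractions."""
    # Assumption: an exactly average team (50% on both) is predicted at 50%.
    return (0.5
            + GOAL_COEF * (first_half_goal_share - 0.5)
            + CORSI_COEF * (first_half_corsi_share - 0.5))

def two_sided_p(coef, se):
    """Normal approximation to the two-sided p-value for coef / se."""
    z = abs(coef / se)
    return 2 * (1 - NormalDist().cdf(z))

for label, goal_share, corsi_share in [
    ("even on goals, 60-40 on Corsi", 0.5, 0.6),
    ("60-40 on goals, even on Corsi", 0.6, 0.5),
]:
    goals_for = predicted_goal_share(goal_share, corsi_share) * TOTAL_GOALS
    print(f"{label}: {goals_for:.2f} for, {TOTAL_GOALS - goals_for:.2f} against")

print(f"goal% p-value:  {two_sided_p(GOAL_COEF, GOAL_SE):.3f}")
print(f"Corsi% p-value: {two_sided_p(CORSI_COEF, CORSI_SE):.5f}")
```

Under these assumptions the Corsi case works out to about 83.25 goals for and 66.75 against, and the goals case to roughly 75.1 for and 74.9 against. Small differences from the figures in the text may come from rounding in the reported coefficients.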

Once Corsi is taken into account, goals do not at all predict future success.


Topics left for future articles:
- What about score effects?
- What about Fenwick?
- What about special teams?
- Is shooting all luck, then?

Putting Skaters in a Context: The World of Advanced Hockey Metrics

With the world of advanced hockey metrics continually improving, we are now beginning to see hockey players evaluated in more diverse ways than ever before. Since the beginning of many a hockey fandom, a quick glance at a skater’s goals, assists and total points has been the measure that grades offensive prowess across the league’s scorers. Now, however, the emergence of a few newer (and quite frankly, better) statistics allows us to take these age-old points totals and put them in a context, showing just how valuable a player may or may not be to his team’s success. Here at Driving Play, while attempting to evaluate different players across the league we will be commonly referring to many of these newer statistics within our analysis. Below is a quick list that will attempt to make clear just what we may be referring to if an unfamiliar term happens to appear within one or more of our posts.

A Corsi Number – Similar to a +/- statistic, Corsi gives a player a (+) upon the event of his team generating either a shot on goal, a missed shot, or a blocked shot directed at the opponent’s net while he is on the ice. Similarly, a player earns a (-) if the opponent generates a shot on goal, missed shot, or a blocked shot directed at his own net. Sometimes this can be expressed as a percentage, i.e. the percentage of the total shots that are directed at the opponent’s net while a player is on the ice. Corsi can also be expressed in a “Relative Corsi” number which is the difference between a player’s on-ice Corsi score and the shot differential while he is on the bench. Relative Corsi is generally used to look at which players are having the most positive effects on shot totals relative to their teammates.

A Fenwick Number – Since many consider shot-blocking a measurable skill in the hockey world, a Fenwick number is the same as a Corsi number, except blocked shots are taken out of the equation. So, a player will earn a (+) if his team generates a shot on goal or a missed shot whilst he is on the ice, and a (-) if either event occurs for the opponent.
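The Corsi and Fenwick definitions above can be sketched in a few lines. All of the shot totals below are invented for illustration, and the function names are hypothetical. Relative Corsi is computed here from raw differentials for simplicity; published versions are typically scaled by ice time.

```python
def corsi(shots):
    """All shot attempts: on goal, missed, and blocked."""
    return shots["on_goal"] + shots["missed"] + shots["blocked"]

def fenwick(shots):
    """Corsi with blocked shots left out."""
    return shots["on_goal"] + shots["missed"]

def pct(events_for, events_against):
    """Share of events directed at the opponent's net, out of 100."""
    return 100 * events_for / (events_for + events_against)

# Hypothetical shot attempts while one player is on the ice
on_ice_for = {"on_goal": 300, "missed": 120, "blocked": 130}
on_ice_against = {"on_goal": 280, "missed": 100, "blocked": 70}
# Hypothetical shot attempts for the same team while he is on the bench
off_ice_for = {"on_goal": 500, "missed": 200, "blocked": 200}
off_ice_against = {"on_goal": 550, "missed": 220, "blocked": 180}

corsi_diff = corsi(on_ice_for) - corsi(on_ice_against)
corsi_pct = pct(corsi(on_ice_for), corsi(on_ice_against))
fenwick_diff = fenwick(on_ice_for) - fenwick(on_ice_against)
fenwick_pct = pct(fenwick(on_ice_for), fenwick(on_ice_against))
# Relative Corsi: on-ice differential minus the differential with him on the bench
relative_corsi = corsi_diff - (corsi(off_ice_for) - corsi(off_ice_against))

print(f"Corsi {corsi_diff:+d} ({corsi_pct:.1f}%), "
      f"Fenwick {fenwick_diff:+d} ({fenwick_pct:.1f}%), "
      f"Relative Corsi {relative_corsi:+d}")
```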

Quality of Competition (QUALCOMP) – The fact is, not all ice time in the NHL is created equal. Having to line up toe-to-toe with Sidney Crosby is a much different task than lining up against Jesse Winchester, the hockey player or the musician. QUALCOMP more or less weighs the on-ice +/- (the familiar statistic measured in goals) of a player’s opponents relative to the rest of his teammates, and averages this rating across every player faced during the season. The higher the resulting rating, the better the competition a player is facing and vice versa. There is also a CorsiRelQUALCOMP number which does the same thing, except that it uses Relative Corsi instead of +/-.

Quality of Teammates (QUALTEAM) – Similar to QUALCOMP, QUALTEAM weighs a player’s teammates using the exact same formula as QUALCOMP. Just like QUALCOMP, a player’s QUALTEAM rating will be higher if he is playing with first-line teammates and vice versa if he is playing with fourth-line enforcers. Also similarly, CorsiRelQUALTEAM will measure a player’s teammates using Relative Corsi.

Zone Start Percentage – A zone start percentage measures the percentage of a player’s shifts that start in the offensive zone. As you might expect, players with strong defensive skills are often called upon to start in the defensive zone, while players who are weaker in their own end tend to start in the offensive zone. This statistic is important because it can directly affect a player’s Corsi or Fenwick percentage: players who start in the offensive zone more frequently will have an easier time generating shots towards the opponent’s net. What’s more, players who are deployed in defensive roles will have a harder time finding shot opportunities than their counterparts who are already starting in prime offensive positions.

Score Effects – Within the ebbs and flows of a hockey game, it has long been believed that teams go into more of a “defensive mode” while ahead and try to get just about every shot possible on net while behind. Using Corsi and Fenwick percentages, it has been shown that teams holding a lead are commonly outshot at increasing rates as the game progresses, and vice versa for teams that trail. With the score tied, the disparity in shot totals is closest to even, which is why many advanced hockey statisticians choose to look at Corsi/Fenwick with the score tied at even strength to put players’ ice time on a level playing field.

Coming back to the original point regarding putting different skaters in a context, we are now able to more closely examine the situations that different players are playing in. For this reason, it is now much easier to come to a conclusion about their value to their respective teams. Before these statistics came into play, we could look at two players, Patrice Bergeron and Ville Leino for example, who had similar point totals during the regular season (57 and 53 respectively). In a vacuum, it may seem as if they contributed comparably to Boston’s and Philadelphia’s success. However, a little scratching beneath the surface reveals that Bergeron played against much tougher competition than Leino, and Leino enjoyed the luxury of skating with better teammates. Leino started in the offensive zone a whopping 62.3% of the time compared to Bergeron’s 42.7%, showing us that Leino was given far more prime scoring opportunities to begin his shifts, which undoubtedly had a positive effect. Finally, Bergeron’s Corsi and Fenwick percentages with the score tied at even strength were 52.7 and 52.8% respectively, compared to Leino’s 54.9 and 53.1%. While a higher percentage of the on-ice shots were directed at the opponent’s net while Leino was on the ice, we have of course already noted that Bergeron faced tougher opponents and played with worse teammates than Leino, which gave Leino an advantage in putting up better numbers in these categories. Had Leino, a notoriously subpar defensive forward (see: 2 seconds of average shorthanded time-on-ice/game in ’10-11), been given minutes similar to Bergeron’s, the point totals most certainly would not have looked anything alike. Considering the minutes they were given, Bergeron most certainly had an excellent season while Leino performed at a level around what we might expect from a forward given “softer” minutes during each game.