Monday, October 3, 2011

On The Problem With Corsi Rel

Advanced hockey statistics are in their infancy. There's a lot about the game that is being ignored or glossed over in the hockey numbers community, and we're still years behind other sports. I've seen the baseball community make tremendous strides in the five or so years that I've been aware of sabermetrics. The sabermetrics overview book Baseball Between The Numbers, published around 2006, contains many notions that are now outdated - we just know more about the game today. I also used to be a fan of the site Football Outsiders, until I saw that too many of their articles took logical leaps based on less-than-solid evidence. I certainly didn't have the answers to the questions they were posing, but I sure as hell didn't believe in their answers. We can get only so far with statistics that remove certain aspects of play and focus on others. All this preamble leads up to the point of this article: Corsi Rel is a flawed statistic. That doesn't mean we should reject it outright, but it does mean that we have to be careful when using it. (All numbers in this post are courtesy of behindthenet.ca, except where indicated.)

Examples of where Corsi Rel can lead us astray:

A. A poor territorial team

Let's take the Islanders, who, according to Vic Ferrari's Time On Ice script, were 46.5% Fenwick in all even strength situations this year. We know they were driven back a lot. This creates a problem if we just look at one player's Corsi Rel alongside his Zone Start. On a team like this, a player can post a positive Corsi Rel with a Zone Start below 50% simply because his teammates were outshot even more badly than he was, and those two things together are assumed to indicate skill.

The other issue with the Islanders is their atrocious 4th line. Trevor Gillies had a woeful 26.4% Fenwick. Expressed in Corsi/60, he was -52.07. These horrendous results are going to skew all the Islanders' Corsi Rel numbers, because they pull down the team's total Corsi by a not-insignificant amount.
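To see how a single dreadful line can inflate everyone else's Corsi Rel, here is a minimal sketch. It assumes Corsi Rel is a player's on-ice Corsi/60 minus the team's Corsi/60 while he is off the ice, and it approximates that off-ice figure as a minutes-weighted average of the other forward lines. Only the -52.07 comes from the numbers above; the other lines' rates, all the ice times, and the function names are made up for illustration.

```python
# Hypothetical forward lines: (Corsi per 60, even-strength minutes per game).
# Only the fourth line's -52.07 is a real number; everything else is invented.
lines = {
    "first": (-2.0, 15.0),
    "second": (-5.0, 13.0),
    "third": (-8.0, 11.0),
    "fourth": (-52.07, 6.0),
}

def corsi_off(lines, player_line, exclude=()):
    """Minutes-weighted Corsi/60 of every line except player_line and any excluded lines."""
    others = [(rate, mins) for name, (rate, mins) in lines.items()
              if name != player_line and name not in exclude]
    total_minutes = sum(mins for _, mins in others)
    return sum(rate * mins for rate, mins in others) / total_minutes

def corsi_rel(lines, player_line, exclude=()):
    """Assumed definition: on-ice Corsi/60 minus off-ice Corsi/60."""
    on_rate = lines[player_line][0]
    return on_rate - corsi_off(lines, player_line, exclude)

with_fourth = corsi_rel(lines, "first")
without_fourth = corsi_rel(lines, "first", exclude=("fourth",))
print(f"First line Corsi Rel with the fourth line counted: {with_fourth:+.1f}")
print(f"First line Corsi Rel ignoring the fourth line:     {without_fourth:+.1f}")
```

In this toy setup the first line's Corsi Rel falls from about +13.5 to about +4.4 once the fourth line is left out, even though the first line's own on-ice number never changes.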

B. A team with lots of injuries

Take the Islanders again, where no defenseman played more than 64 games for the team. They also had lots of players going in and out of the lineup at forward - 10 forwards played 42 or fewer games last season for the Isles. Perhaps I'm just unclear on the meaning of Corsi Rel, but as I understand it, it compares a player's results to those of the teammates who are in the lineup for the games he plays. Depending on what lineup the team is icing in those games, it may give a skewed picture, especially if that lineup is notably worse or better than the one the team 'typically' puts out.

C. An excellent team

This is basically A reversed. A team like the 2009-10 Chicago Blackhawks, which featured 11 players with a zone start above 50% and most players with out-of-this-world Corsi production, could make things look ridiculous. For instance, Dustin Byfuglien had a -5.4 Corsi Rel that year, but he had a 10.1 Corsi On. He was still probably an excellent bottom-6 forward, despite the negative Corsi Rel on a great team.
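The arithmetic behind the Byfuglien example can be checked directly, assuming Corsi Rel is simply Corsi On minus Corsi Off. The function names are hypothetical helpers for this sketch, not anything published by behindthenet.ca.

```python
def corsi_rel(corsi_on, corsi_off):
    """Assumed definition: on-ice Corsi/60 minus off-ice Corsi/60."""
    return corsi_on - corsi_off

def implied_corsi_off(corsi_on, rel):
    """Rearranged: the off-ice rate implied by a known Corsi On and Corsi Rel."""
    return corsi_on - rel

# Byfuglien's 2009-10 numbers quoted above.
byfuglien_off = implied_corsi_off(corsi_on=10.1, rel=-5.4)
print(f"Implied Corsi Off: {byfuglien_off:+.1f}")  # +15.5
```

Under that assumption, the Blackhawks were about +15.5 per 60 with Byfuglien on the bench, so his negative Corsi Rel says more about his teammates than about him.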

D. Quality of Competition

My colleague JaredL helpfully pointed this out, and I think he wrote it so well that I won't even change it: "Corsi Rel tends to exacerbate usage issues. If a player plays against tough competition then that means his team's competition when he's off the ice is going to be easier. That makes it a double whammy - his Corsi On takes a hit because of the tough competition and his Corsi Off gets a boost because of the weaker competition. The same thing applies to O-Zone Starts."
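A toy model makes the double whammy concrete. This is only a sketch with invented numbers: a team with two equally skilled units, where facing tough or soft opposition moves results by a fixed 4 Corsi per 60, and Corsi Rel is assumed to be Corsi On minus Corsi Off.

```python
def corsi_rel(corsi_on, corsi_off):
    """Assumed definition: on-ice Corsi/60 minus off-ice Corsi/60."""
    return corsi_on - corsi_off

# Hypothetical numbers: both units are equally good (true rate of 0 per 60),
# and facing tough or soft opposition shifts results by 4 per 60.
true_rate = 0.0
competition_effect = 4.0

tough_unit_on = true_rate - competition_effect  # sent out against the other team's best
soft_unit_on = true_rate + competition_effect   # gets the weaker matchups

# On a two-unit team, each unit's Corsi Off is just the other unit's Corsi On.
tough_unit_rel = corsi_rel(tough_unit_on, soft_unit_on)
soft_unit_rel = corsi_rel(soft_unit_on, tough_unit_on)

raw_gap = soft_unit_on - tough_unit_on    # gap in raw Corsi On
rel_gap = soft_unit_rel - tough_unit_rel  # gap in Corsi Rel
print(f"Gap in Corsi On:  {raw_gap:.1f}")
print(f"Gap in Corsi Rel: {rel_gap:.1f}")
```

Two units of identical skill end up 8 apart in raw Corsi On but 16 apart in Corsi Rel, because the usage effect is counted once in the on-ice number and again, with the opposite sign, in the off-ice number.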

E. What are we really measuring, anyway?

We've mentioned before the thought experiment where we consider two hockey teams playing against one another with static lines - line A goes against line A of the other team, line B vs. line B and so forth. We know that hockey is a more fluid game than that - teams match up different lines against different lines, whether by chance or by choice, and players move around on lines due to injury and performance. Regardless, what exactly does the first line have to do with the fourth line? How much influence does the third line's play have on the first line? Corsi Rel assumes that that relationship is particularly meaningful.

I'm certainly not smart enough to synthesize Zone Start, Quality of Teammates and Competition, as well as Corsi, into one statistic that would comprehensively define territorial play. I'm hoping for that day soon. Until that day comes, we will be stuck with Corsi Rel - it's far from perfect, but in some ways it's still the best we've got.

5 comments:

  1. You should have Adjusted Relative Corsi relatively soon. Maybe the end of the week. I've been working on it for a while.

  2. I've had many of these same concerns about Relative Corsi, and have shied away from using it recently. Instead I've been adjusting Corsi for Zone Starts, and that's it for now. QualComp & QualTeam aren't really suitable to use for regression purposes, at least not yet. If you regress for those measures against Corsi, you basically come up with no relationship at all.

  3. To touch on the last point, I know it's on/off ice but think a lot of people use or think about Relative Corsi as a way to control for teammates. It does a pretty poor job of that because it mainly takes into account guys you don't play with. I think it does a decent job of dealing with teammates in the other unit, how good your defensemen are for forwards and vice versa, but is problematic for the same color line.

  4. I think of it as adjusting for quality of team (not teammates, team)...I mean, RPG was under water by Corsi, worse than some teams' fourth lines, surely raw Corsi won't be right there.

  5. RAL,

    That's a great way to think of it. Problem is, I'm not sure how useful that is outside of something like a Hart debate. It's not clear to me why we should take into account guys someone never sees the ice with for a value assessment, at least in a way not more sophisticated than subtraction.

    I do think the top Anaheim line is underrated by Corsi. The main reason is that they are far better than average at turning possession into scoring. Getzlaf's on-ice 5-on-5 shooting percentage the last 4 years has been: 11.67, 9.23, 10.89 and 11.97 last year. I'm sure particularly those at nearly 12% were helped by luck, but it's pretty clear that they're good at getting better shots and putting them away.
