Critiquing Wages: a Comprehensive Index

A few months back, when this blog first launched, I wrote a piece that was meant to be less a discussion and more a diatribe starring the work done by the Wages of Wins blog. It was the opening piece of our Juwan a Blog? series, and it currently stands as by far the most negative portrayal of any blog I've reviewed. Aaron and I both have strong opinions about what Wages does, and both its strengths and weaknesses. One strength, which we really could've done a better job of highlighting, is the sheer volume of intelligent people the WoW network has accumulated -- while we can disagree with their orthodoxy and strict adherence to their way of thought (which, obviously, we do), you can't knock the hustle, nor can you knock their intelligence. There are a lot of smart, smart people at Wages, and in my takedown of their methods, I didn't necessarily articulate that. So, please consider this articulated.

Here at the Gothic, what we hope to do more than anything is start conversations. When Aaron wrote his piece this weekend examining Kobe via Stavrogin, he didn't intend it to be the be-all or end-all of his writing on Kobe, or the discussion of his comparison -- he intended it to start the conversation. How valid is the comparison? How well does it fit Kobe, and can one stretch it further? Those are the questions we like to ask. When we created the STEVE NASH model, we didn't do so intending it to be a be-all and end-all of our statistical meandering -- we merely want to add another model to the discussion atop the various standard prediction models, and see if we can't get a few more ideas on the table. Like my pantheon, it's only the stepping stone to -- hopefully -- a some-day valuable index of the absolute best sportswriting the NBA has produced. More than anything, that's what we like doing here.

I say this all because Wages of Wins recently addressed diminishing returns on defensive rebounds in an update to their main metric, Wins Produced. Such a tweak might sound fairly standard, but they've previously exhibited stubbornness and an almost impossibly high standard for making even minor changes. By their standards, it's a huge deal. They also published a link to this very post, summarizing better-written critiques (and my own) with links to the pieces. Again, it might not seem like a big deal that they posted it, but they didn't use to post comments like that. And the fact that they seem at least marginally earnest about starting a dialogue is fantastic. It would be intellectually dishonest to ignore progress to suit my existing narrative. So good on them. I thought it would be fairer to them as a self-contained blog if I stopped cluttering their comment pages and reposted this as a well-linked, oft-updated summary of the primary critiques of their work here on Gothic Ginobili -- ripe for their own responses, when they get a chance, and for us to isolate the things we feel should be addressed. This is that ostensible entry, if you haven't gathered already. Let's get to it.

 • • •

Before I get into the critiques, let me note that apparent critiques of "style" are inextricable from those of "substance". This isn't because everyone that disagrees is trying to invent a hole in substance by calling Dr. Berri curmudgeonly. It's because just as Berri values the peer review process in and of itself, much of the Internet (especially among the stats/blogger niche) finds the closed-access, privileged nature of this process to be anathema to the open spirit of inquiry of the Internet. They feel that Berri's measured, stubbornly-academic approach has negative effects on the substance, context, fluidity, and ultimate ceiling of his research, and that this has allowed others to leapfrog WP48. That said, while this is meant to be a compendium of critiques against WP, it's hardly for attention (we're kind of burying this between two pieces more important to us, to be honest) and it's not meant to be purely negative. It's just meant to be a conversation-starter, and a positive force for the understanding of basketball through statistics and discussion. If we can raise the tenor of debate, we've done our job and that's the bottom line.

  • In the spirit of this disclaimer, shall we start with my own criticism? (In their collective defenses, this was written before I knew about the team dreb% adjustment, which does begin to challenge my narrative to a real extent.) I argue that (and this will be a theme in all of these) the Wins Produced metric is perfectly reasonable but is not so reasonable that we have to throw out all our other reasonable concepts and metrics, and given Dr. Berri's stubborn and ideological approach to the metric, it's unlikely that WP48 will ever get to the point that justifies the arrogant, often lazy attitude of this blog towards its metric. It's not absolutely substantive, I admit, but I don't think it's fallacious, either. I read the books and I read this blog far too often, and I feel I've diagnosed the key "problem" that many individuals smarter than I have with their stat and their approach.
  • Here are a couple of links, the first of which fantastically details the state of basketball statistics from a well-reasoned, overarching point of view. In the second link, the blogger EvanZ (a friend of the blog and of STEVE NASH; hell, dude even helped us find the last two links) posits and computes a strong, substantive +/- analogue to WP that (as far as I understand it) uses play-by-play data to award credit for what the box score captures in a way similar to WP. Evan does change how rebounds, assists, shot attempts, and defense are weighted and it is a completely different metric, but his ezPM starts with (and is most apt to be appreciated by) people that get WP and agree with it to a large extent but find it has troublesome components.
  • I don't know anything about Phil Birnbaum, but this response to the rebounding section of WoW's FAQ is very well-argued. Accounting for diminishing returns on rebounds as Dr. Berri et al. did recently is a step in the right direction, but as far as I can tell, these critiques of WP and rebounding are still absolutely valid as conversation-starters, at least, and best of all they actually start with the words of Dr. Berri, decreasing the amount of abstraction into which fallacies and sentiment can enter the conversation.
  • Here, two disgruntled Amazon.com reviewers (at least one of whom runs a stats blog that I know of) sketch out their frustrations with the WP model.
    • Nathan Walker argues (and, to be fair, partially rants) that there is existing, solid empirical evidence against the empirical value of offensive rebounds and the assignment of team statistics to player statistics is extremely shortsighted, looking instead for more sophisticated +/- models. This captures perhaps the most visceral statistical critique of WP: "We've already thought about this along the same lines. Who are you to call us more irrational?"
    • D. Blum "eclectic reader" puts together a constellation of objections whose home planet is the loaded terms "Wins Produced" and "productivity" as used in the WoW books and blog. A lot of us can accept (because of its reliability) that WP measures something, and that something is often pretty close to basketball productivity. But using the term (with an associated absolutist approach) forces the metric into an uncomfortable Platonic standard which it doesn't seem to live up to for critics. Probably the strongest critique here is #3, which persuasively argues that WP's fixation on retrodicted correlation with Win% doesn't (by itself) indicate a robust statistic by creating an absurd - but illuminating - parody.
  • The Problem with Wins Produced by dhackett1565 of Raptors HQ is a wonderful (and - as far as I can tell - mathematically sound) examination of the rebounding question by showing that WP is a degenerate (in the good, mathematical way) special case of a slightly more complicated formula. It illustrates that WP does make choices in its allocation of individual statistics not necessarily based on strict logic and correlation, and even purports to show a contradiction in how WP is awarded over the course of a single possession. (Personally, I don't see the contradiction yet. I'll update if I do.) I think this is an important step which shows both the elegance and the problematic simplifications of WoW. For me as a math major? Well, this one gave me a lot to think about. The rules and structure of basketball (esp. the demand for transition defense and the 3-second rule) make problematic the idea (seen in WP) of offensive rebounding as a regained possession without further context. Also, this link is a bit more respectful than the Amazon links, and really does try to get at both sides of the argument. Hat tip to EvanZ (@thecity).
  • Dave Berri's Dismal Science - An alternative conclusion to the WoW network's typical conclusion of the bounded rationality of NBA decision-makers, SilverBird5000 of Freedarko.com deconstructs the economics of the scorer's market in a world run by Berri's metric, reversing the causal chain between pay and quality to an extent, to argue (somewhat convincingly) that scorers are "overpaid" partially for the risk-taking activity of scoring as opposed to crashing the boards. I don't know that I agree totally, but it is at least reasonable and a solid interpretation.

All of us care (way more than we should) about getting it right, WoW and above critics included. That might sound trivial, but I think it's a good first step. Thanks for reading.

18 comments on “Critiquing Wages: a Comprehensive Index”

    • You could just start with Dre's latest post, in which he "explains" that "the impact of team defense is not that large." He tells us that criticisms of WP for not incorporating defense are off the mark because, among other reasons, team defense doesn't vary much. If you follow his link, you'll see that WP's team defense component has a standard deviation of only about 5 wins.

      Now, this is of course wrong. Teams differ a lot in opponent efficiency, and it's a hugely important part of winning in the NBA -- about as important as one's own shooting efficiency. Evan has shown that a 1 SD change in opponent eFG% means about 8 wins, and since the WP defensive component also includes things like non-steal TOV and team rebounds, it's clear that WP is measuring at most half of the real difference in opponent shooting.

      What's cool about this is that Dre is so oblivious to the additional problem he is inadvertently exposing in WP. After all, WP almost perfectly predicts wins, despite its failure to account for opponent efficiency. How can this be? By hiding that value somewhere else, of course. In WP, it goes to rebounders, who are credited as though they not only grabbed the rebound, but also forced the missed shot (yes, this is partially ameliorated by the recent adjustment, but not entirely). But Dre doesn't have a clue that he is confirming this fundamental critique of WP.

      What's extra cool about this is that it obviously never even occurs to Dre that we actually KNOW how much teams vary in their ability to suppress opponent shooting efficiency. To him, looking at WP and its components is the same thing as looking at real basketball outcomes -- maybe better! If WP says team defense isn't a big deal, then it isn't. This is exactly the kind of epistemic closure that really distinguishes the WOW crowd (and which Alex wrote about in his earlier post).

      • And that is only part of the problem. While the defensive component saves them in order to get the high correlation to winning (going by last season the correlation coefficient without the team adjustments is 0.87 in comparison to 0.97 with), they also distribute the offense wrongly.

        The value of the scorer basically goes to the offensive rebounder. That comes from the fact that each offensive rebound reduces the amount of used possessions for scoring, and in the WoW world each FGA is ending a possession. There is nothing in there to compensate for the missed turnover a FGA gives the team. And while a turnover is a lost possession in all cases, a missed FGA is not.

        If they incorporated that, they would also realise that the differences between the positions go away. Right now they have big differences between guards, forwards and centers. Instead of asking whether that might be due to a bad theory in the beginning, they just concluded that they need a positional adjustment. So, at the end of their algorithm they decided to throw the whole hypothesis out of the window in order to get at least a somewhat useful ranking. If the underlying hypothesis is so good (regression by using ORtg, DRtg, equations for ORtg and DRtg), why do they not believe in their results from that hypothesis?

        Anyway, at the end of the day they needed about half a decade to react to the criticism on the rebounding part. Given how they reacted to that criticism in the first place, it is very hard to believe that they will react any differently to any new critique. It is a lack of understanding on their part; they have little to no clue what they are talking about. Sometimes they are right, but unfortunately for the wrong reasons.

      • As far as I understand it (as a math major, not a stats guy), linear regression is a powerful tool for establishing correlation because it both ignores and encompasses all at once all the variables not being directly regressed upon. If a team's point differential is seven points per game better than a given model predicts, well, that model is missing seven points somewhere (in "intangibles," better weights, model-atypical offensive or defensive situation, a widespread correlation that a given team completely subverts, etc.). A regression's ability to filter out and systematically adjust for/ignore such exceptions is precisely why it's so powerful. If even one team's point differential is seven points better than a given model predicts, a regression is likely going to massage the input weights to ameliorate the error to something much nicer, to minimize the model's error to something much less than seven points (at least in a league of thirty teams and 66/82 games a season). This is an amazingly cool thing and can lead to amazingly robust results.

        But it's also a potential hazard, isn't it? Because without an understanding of what is being measured by the data, you could end up massaging the weights of a variable without properly assessing what it means to massage that weight. This is where domain-specific knowledge (from coaches, players, analysts and other luminaries in the field) comes in.

        And that's what's going on with WP. As far as I understand from these criticisms I posted and from what you're saying, what dberri is doing with rebounding* is to deliberately ignore rebounding as a function (percentage-wise) of # of shots missed. Instead, the WP calculation uses the box-score volume of rebounds to act as a proxy variable for BOTH # of shots missed and rebounding rate (specifically their product). So crediting a player for a defensive rebound is also crediting them for being on a good defensive team that causes misses. Crediting a player for an offensive rebound is also crediting them for being on a bad offensive team (that is rather speciously "made up for" in the regression by assuming that a bad offensive team will also hurt their personal box score statistics roughly an equal amount). I could be completely wrong on this, because I'm not a stats guy.

        *aside from of course the long-standing, empirically unsound equivalency of dreb and oreb in a league with 3-sec rules, legal box-outs and the possibility of getting back on d (which oddly he's acknowledged as a factor in oreb rates, but he still treats an oreb like a "steal" in the semi-platonic weighting. Which, considering that gambling for steals is similar to gambling for orebs (but more harmful because big men are more crucial to defense than guards), might honestly make some screwed up, "magic of regression" kind of sense).
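To make the proxy-variable point concrete, here is a minimal sketch with made-up numbers. The miss counts and the rebounding rate are hypothetical, not taken from WP or any real box score, and `expected_rebounds` is just an illustrative helper. It shows that two players with an identical defensive rebounding rate end up with different raw rebound totals purely because their teams force different numbers of misses.

```python
# Hypothetical illustration: raw rebound counts mix two things together,
# the number of available misses and the rate at which a player grabs them.

def expected_rebounds(opponent_misses: float, rebound_rate: float) -> float:
    """Raw rebounds = available missed shots * share of those misses grabbed."""
    return opponent_misses * rebound_rate

# Same individual rebounding rate on both teams (assumed value).
rate = 0.25

# Assumed opponent misses per game while the player is on the floor.
good_defense_misses = 44.0
bad_defense_misses = 36.0

on_good_defense = expected_rebounds(good_defense_misses, rate)
on_bad_defense = expected_rebounds(bad_defense_misses, rate)

print(on_good_defense, on_bad_defense)
# The gap comes entirely from team defense, not from the player's own skill.
print(on_good_defense - on_bad_defense)
```

A metric that weights the raw count would rate the first player higher even though, by construction, the two are equally good rebounders.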

        • Alex, correct. Linear regression gives you exactly the statistical relationship. But only because something correlates with something else, doesn't mean that there is causality. And as far as I can tell Berri never showed the causality for individual players.

  1. Folks,

    EvanZ initially had a 501-character rebuttal of many of the sources here, despite his obvious disdain for WoW. Unfortunately, all the sources were youtube links of Kobe from 2002. Frankly, that's not why we deigned to have a comment PRIVILEGE on such an intellectual, clever blog in the first place, and so I deleted it. Further, comments are no longer to exceed 500 characters (even ours). It's just a matter of reputation.

    Besides, Kobe hasn't produced at a superstar level since 2008 and

  2. Well, now WoW doesn't even post my comments that don't have links. I tried to post a comment about their latest "article" criticizing +/- (yet again). The comment was basically to Berri's assertion that +/- is not reliable. If it's not "reliable", then why is RAPM a better predictor than WP? If being a better predictor is not the definition of "reliable", what exactly is the definition that Berri is using? And why should I care if it does not give better predictions? Oh, well. I'm sure Berri or one of his minions would have just told me to get the FAQ out of there.

    • I would really like to know what motivates Berri&Co. to constantly criticize +/- based approaches. They have shown often enough that they lack the proper skills to evaluate such things; they assume stuff and then go on trying to disprove their own beliefs. That is hardly an academic approach at all.

      And the more I think about the positional adjustment, the more I am baffled by the fact that WP48 actually, honestly thinks that a point scored by a guard is more valuable than one scored by a forward. Did Berri&Co. ever explain why that should be the case?

    • Evan: that +/- post also appeared at NBAGeek -- you should post your comment there.

      Mystic: There is a point to the continual, redundant (and ignorant) attacks on +/-. It's a device for discrediting critics, so that the followers don't start to question the Leader. The "online community" is assumed to equal "believers in plus/minus," so beating up on the crudest version of +/- establishes that no one should listen to WOW critics. This is Cult Maintenance 101. It's also important to make sure there is no way to validate the Leader's claims other than by his Approved Methods, as explained in the Sacred FAQ (which I believe was found on a mountain top). Thus, Dre's post explaining why predicting future team wins is NOT a valid method for evaluating metrics. "Pay no attention to those failed predictions behind the curtain -- the only valid test is "predicting" past wins, WP has a .95 R^2, yada yada." The circle is completed, and no one can escape.

        • Evan, what you mean is validity, btw. WP48 may be reliable, like every other boxscore-based metric (I explained why in the thread on RealGM), but it is hardly valid. WP48 is worse at predictions than other metrics. My lineup check showed that: in the test from 2007 to 2011 of 200 different lineups (Top50 in minutes per basketballvalue.com) with an average of 393 minutes played, WP48 had a 0.57 correlation coefficient to the performance level of those lineups, which is as good as the correlation coefficient of PER. WS/48 and my PRA (basis of the SPM metric) both have 0.67.
          We can very well expect that RAPM does better at such a test (given the nature of that metric). WP48 fails the validity test, though it is reliable. RAPM may be less reliable, but it is more valid. Well, I'm not quite sure whether J.E. did a year-to-year analysis of the results. Maybe he can do it. Boxscore-based metrics usually correlate from year to year at 0.8 to 0.85 (which is driven by the minutes and role of said players at least as much as by the boxscore metric).
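The reliability/validity distinction above can be sketched with two correlation checks. Everything below is hypothetical: the ratings and lineup margins are invented for illustration and are not the 2007 to 2011 lineup data, and `pearson` is a plain textbook implementation rather than anyone's actual test code. Reliability asks whether a metric agrees with itself from year to year; validity asks whether what it predicts matches what happened on the floor.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-player ratings from some box-score metric in two seasons.
rating_year1 = [0.10, 0.15, 0.20, 0.05, 0.12]
rating_year2 = [0.11, 0.14, 0.21, 0.06, 0.12]

# Hypothetical lineup margins: what the metric predicted vs. what was observed.
predicted_margin = [2.0, -1.0, 3.0, 1.5, -2.0]
observed_margin = [0.5, 1.0, 1.0, -2.0, -1.5]

reliability = pearson(rating_year1, rating_year2)      # year-to-year stability
validity = pearson(predicted_margin, observed_margin)  # agreement with outcomes

print(round(reliability, 2), round(validity, 2))
```

With these invented numbers the metric is very stable from one season to the next yet only loosely related to lineup results, which is the pattern being described: high reliability does not imply high validity.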

  3. Welp, I just did some reading about RAPM last night, and so I decided to leave a comment. Just going to dump it here. (Trying to take their perspective at face value). Again (and I'm not just saying this to be self-deprecating) I just know math. I don't know stats. I was trying to spell "Freebase Cocaine" and accidentally spelled "Bayesian Prior". Grain of salt? No, pinch of salt.

    That said, I feel I'm going to (rope Aaron into) write a longer post about RAPM or add this fantastic explanation to the Pantheon. I still have some gaps in understanding that I'd like to work through.

    http://godismyjudgeok.com/DStats/2011/nba-stats/a-review-of-adjusted-plusminus-and-stabilization/

    Check it out. Damn, that's good explainin'.

    Anyway, here's my WP comments:

    “But he has played more than 500 career minutes against other NBA players (perhaps not the best NBA players, but these weren’t high school players he was facing) and he has been very productive. ”

    Okay (and I think this is an entirely valid response given your wording): HOW MUCH WORSE WERE THEY? I get the importance of having something well-correlated to wins. I don’t understand why you guys – when we have PBP data for the last 6 seasons at basketballvalue.com – don’t quantify this type of information directly. Why not just say, “Oh, the average big man playing against Evans had such-and-such a WP48 value”? If you believe in the weightings for WP48 and that they are directly meaningful to wins, then you could use them as a Bayesian prior and blend your approach with RAPM, right? With sensible handling of marginal players, at worst it would be as good as weights you believe in, and at best it would be even better in ways you could get behind! Not shilling, just trying to help.

    Just an addition: I get the sample size issues with +/- and RAPM; I really do, and it’s not trivial, but when you’re talking about bench players that are playing against other bench players more than starters, I just think such statements demand some sort of meaningful quantification for real empirical validity, or at least some strong countermanding arguments that establish why you don’t need to quantify it but people that believe the box score is flawed need to quantify how much it’s flawed. Got me?
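One way to read the "use WP as a Bayesian prior" suggestion is ridge regression that shrinks each player's plus/minus coefficient toward a box-score estimate instead of toward zero. The sketch below is only that reading, under stated assumptions: the stint matrix, the margins, the prior values and the penalty `lam` are all made up, the function names are hypothetical, and this is not how WoW or any published RAPM variant is actually computed. It solves (XᵀX + λI)β = Xᵀy + λβ₀ in plain Python.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ridge_with_prior(X, y, prior, lam):
    """Minimise ||y - X b||^2 + lam * ||b - prior||^2 over b.

    lam = 0 gives ordinary least squares (X must then have full column rank);
    a very large lam returns essentially the prior.
    """
    p = len(prior)
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) + lam * prior[i]
           for i in range(p)]
    return solve(xtx, xty)

# Hypothetical stints: +1 if the player is on the floor for the home side,
# -1 if for the away side. Two players, three stints.
X = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]
y = [4.0, 2.0, -3.0]   # hypothetical point margin per stint
prior = [1.0, 0.0]     # hypothetical box-score-based ratings

pure_plus_minus = ridge_with_prior(X, y, prior, lam=0.0)
blended = ridge_with_prior(X, y, prior, lam=2.0)
print(pure_plus_minus, blended)
```

Each blended coefficient lands between the pure plus/minus estimate and the prior, and `lam` controls how much the box score is trusted when the on/off sample is thin, which is exactly the marginal-player situation raised above.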

  4. Interesting read. I notice Wages of Wins and a lot of other sports metrics people have the feel of amateur scientists: laymen who put way too much stock in the absolute "factualness" of their conclusions once they learn a little about statistics and correlation. It seems like people without a strong background in science think that just because they use these methods, and they think they're being careful, they are definitely going to get empirical facts back for all their efforts. And it's something you just learn with experience, seeing very intelligent researchers outsmart themselves, publish results they really believe are meaningful, and then just watch as the rubber doesn't meet the road once their ideas meet the real world. Just from what I've seen, we really seem like we're a long way from accurately recording the value of basketball players beyond what any intelligent coach could put together by just looking at the regular boxscore stats.

