Unlearning Basketball: Thought Experiments Run Amok

Posted on Fri 11 January 2013 in Uncategorized by Alex Dewey

Who is this? What sport does he play? Why is he on Sesame Street?

"I never forget a face, but in your case, I'll make an exception."

Let's try something. Let's try forgetting everything we ever knew about basketball. Let's... look, you saw The Matrix, right? But "the Matrix" is replaced with "your present conception of basketball". Free your mind. Ain't no such things as halfway crooks. Go all the way. Forget everything. Okay, take a deep breath. If you've done it right, it's all gone. Everything's forgotten about the game. "Basketball" now looks like a curious misspelling of "baseball".

That's how fully I've bought into this hypothetical.

The first thing we find in our quest to discover the meaning of "Basketball" is a bunch of random box scores sprawling across the Internet, and an unaccountable sense that these things form a self-consistent system to try to understand. We all individually forgot, but the Internet remembered! ... But not enough to tell us the rules of basketball. (This is already getting really convoluted. Bear with me, okay?) We're all alone in the universe, and nothing can change that. Except for this hypothetical we've decided to undertake, together. Come a little closer and we can discuss the implications of this hypothetical further. It's cold outside, after all. Put on the samovar, Natascha.

• • •

Anyway. We run some mad regressions on the box scores we find. Some happy ones too, but mostly the seething ones. We notice a lot of stuff going on, a lot of equations that are always satisfied. Ignorant of the structure of the game, we form a hypothesis from these box scores: basketball is a self-consistent system perfectly derivable from the box scores, and this self-consistent system explains why every team wins or loses and to what extent. Categorically. Whatever the case, Implication #1 of the "Perfectly Derivable From The Box Score Theory" is the idea that if you take away any one number from a box score, you ought to be able to derive it precisely from every other number on the sheet. If one player's point totals are missing from every game, filling those numbers in is simply a matter of doubling the first number in the FG column and adding the first numbers in the 3P and FT columns (the three dashed columns). If a player's field goals are missing? We take their point total, subtract their 3P and FT, and halve what's left. These are equations that are exactly accounted for in nearly every box score. Exceptions in these equations (we correctly reason) are mistakes in recording, not mistakes in the system itself. And we're cool with that. Every point is accounted for.
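For the arithmetic-minded, here is a minimal sketch of that bookkeeping. It assumes the usual box score convention that the FG column already includes made threes, so a made three counts as two points via FG plus one more via 3P. The function names are made up for illustration.

```python
def points(fg_made: int, three_made: int, ft_made: int) -> int:
    """Recover a missing point total from the made-shot columns.

    Assumes fg_made already includes made threes (the usual box score
    convention), so each three adds one extra point on top of its two.
    """
    return 2 * fg_made + three_made + ft_made


def field_goals(pts: int, three_made: int, ft_made: int) -> int:
    """Recover a missing made-field-goal count from the point total."""
    remainder = pts - three_made - ft_made
    if remainder < 0 or remainder % 2 != 0:
        # In the spirit of the essay: a mismatch is a recording mistake,
        # not a mistake in the system itself.
        raise ValueError("box score line is not self-consistent")
    return remainder // 2


# A hypothetical line: 8 made field goals (2 of them threes), 4 free throws.
print(points(8, 2, 4))        # 22
print(field_goals(22, 2, 4))  # 8
```

The error branch is the code version of "exceptions are mistakes in recording": if the leftover points can't be split into two-point chunks, somebody wrote a number down wrong.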

But points are pretty easy to account for given only the box score. What about the rest? Well, we form more sophisticated theories to explain the relation between the other 11 pieces of information associated with every player, and eventually, after a few years, we put it all together in something akin to Newton's Laws of Motion or the inverse square law of gravity. We design the Possession Model, humanity's best attempt to make sense of the alien laws of "basketball." Every shot is a resource expended, made or missed. Every turnover is a resource expended. Every rebound is a resource gained. Every steal is a resource gained. Every set of free throws not as a part of an and-one is a resource expended. In all cases, the resource is the same. It's called a possession, and -- other than blocks, assists, and fouls -- our box scores are almost completely devoted to encoding this type of resource: how many of these a team gets, how efficiently it uses them to score points, how it loses them.

Our only disagreements here aren't all that fundamental. Just as we basically agree that something thrown upwards must come downward in a parabolic arc, we agree on the Possession Model of Basketball. Our disagreements boil down to a few lingering questions. First, how much is each of the resources "worth"? Second, how do we credit the resources to the individual players that accrue the statistics? After all, certain things like rebounds and missed shots seem pretty commonplace, while steals and turnovers are comparatively rare. Maybe we can figure something out with that, you know? Those probably don't work out to the same value in an actual game, right? Or maybe they do. I don't know. Neither do you. None of us has ever seen a game of basketball, because we're forgetting! That's the whole point of this exercise, darn it!

Different researchers come to different conclusions with each of these questions, but overall, the consensus we share is more powerful than the minor points of disagreement. We go to the moon with this model by the end of the first decade of forgetting basketball, to keep up with the Newton analogy. We explain so much of what goes on with this model that we feel that we understand the box scores on a deep level, even if only as a self-consistent system. So we're all pretty satisfied we would be all major league at basketball if we knew what the heck it was. But for all our hilariously misplaced hubris, we still have doubts. We're only human, after all. We still ask ourselves about the little questions, the unknowns about this system we've found. Why are the two final scores never exactly equal? Haven't these people heard of ties?

Also, there's the giant elephant in the room. What on Earth is an assist?

• • •

"Curiouser and curiouser!" cried Alex.

Assists appear to be the only facet of the box score that "doesn't fit the plan" of a perfectly self-consistent game. Blocks and fouls are important but somewhat rare; they mostly seem like a less meaningful version of a steal and a negative statistic associated with opponent free throws, respectively. Yet, all we have are correlations: sometimes assists seem more like the concentrated pulp of a dying tree's fruits than the sweet pulp of a flourishing system, not entirely negative but not a panacea either. So we're left with a question and it's not clear where to begin answering it. How do we explain assists?

Spoiler alert! We can't.

With mystical explanations we can scrape together a narrative, but when you're dealing with science you can't take that much on faith alone. So we gather what we know about the fabled statistic. They're pretty consistent between players, teams, and what assists do for a win. Even if we don't know quite what they denote. Assuming basketball is a perfectly self-consistent system described by the box score, we try all sorts of mathematical models on the box score, but we come up short. In want of more data, we come to a lot of intriguing alternative hypotheses to explain assists.

  • Hypothesis #1: Assists are essentially random events in the economy of a basketball game that can favorably occur to each team that Make Everyone Better in a mystical sense. Rather like the weather and climate, we can project and explain how assists happen in a large-scale sense even if chance prohibits us from going much further. Some teams are great at having fortune smile upon them or, equivalently, at having fortune frown upon their opponents. We assume that point guards are rather akin to clerics or scientists or strategists, tipping the balance of fortune in their team's favor, because it's the only reason they seem to be in the game on their own merits. The best leaders and priests and scientists are inordinately valuable to their teams. Is this the role of the point guard? And this explanation certainly jibes with our intuition, even if it doesn't have any empirical strength. It's a cool idea and it catches on. Everyone is happy someone thought of Hypothesis #1, even if it does feel like sort of a holdover/straw man while someone figures out something better.

  • Hypothesis #2: We go the pure mathematical route: There are approximately 20-24 players in each game and each of them can have a number of assists. This gives us a point in 24-dimensional phase space, called "assisted space". It is thought that basketball is very simple on the lonely whole and very complicated on the margins of assisted space. We believe that this point in assisted space explains the remaining difference between the computed score (from all other stats) and actual score. Unfortunately, inferring a function with 24-dimensional domain is essentially impossible and this line of inquiry leads nowhere except to some neat graphics. Of course, neat graphics form 80% of the economy of the Internet and Hypothesis #2 is named Time's Person of the Year in a gimmick. In a tidbit sidebar the "net worth" of this Hypothesis is estimated at $600 billion.

  • Hypothesis #3: We presume basketball is a system of inordinate complexity. As the focus moves from the possession to the assist, we note in our box scores that assists aren't simply correlative magnets for efficiency, but almost certainly have something to do with field goals and turnovers. If the possession is a resource, then the assist appears to be a secondary resource. It's further noted that just as defensive rebounds have to do with the other team's shots, assists have to do with one's teammates' shots. As a self-consistent system, assists are seen as the cooperative counterpart to turnovers, which sacrifice resources to the other team. Assists sacrifice one's own shot attempts for those of the other players, likely because the synergy of those players is the most efficient use of possessions.

Hypothesis #3 eventually takes over the academic community. This is not because it fits the evidence best, no. Too easy. This is because, after a long protracted debate over the meaning of the assist, hockey goes into a lockout and some random dude that watches hockey is sitting at home and is asked by a scholar of the mythical game of "basketball" what an "assist" means, as though it's some cryptic, unfathomable divination. After twenty-five minutes of a discussion held with that particular ironic relish of the French-Canadian accent, our hockey friend tells us everything about the nature of an assist. We understand, now.

Yet, even though humanity long abandoned Hypotheses #1 and #2, we can't let go of the mystical assist-givers as the grandmasters or prophets of basketball, accounting for all the complexity. Everything eventually falls into place for our conception of basketball. But in the back of our minds, point guards are elevated to the paragons of the sport, and the possession model is seen not as a fundamental construct but as a holdover to the glorious exaltation of the prodigious assist-givers. And the other stats, in their self-consistency, now don't seem important so much as relatively preordained and fixed, as a perfunctory stage in which the assist-givers can thrive in their personal versions of chaotic fluidity and complexity. We run fascinated regressions on each assist-giver attempting to valuate the good and the bad. We understand, now.

• • •

Come to think of it, the word means "assist." That seems pretty obvious in retrospect. But then again, we're the people that spent an entire afternoon pretending to forget a sport we love because some idiot on a blog told us to. But let's be real: it was all a fever-dream and you forgot nothing about anything. No one did. Not even me. We all decided to pretend to forget basketball just to be polite to my absurd fascinations, and I thank you for it, but the experiment was an utter failure and during everything I described you still envisioned it all in terms of the basketball you've known and will continue to know. I asked the impossible of you and I'm frankly disappointed that you couldn't deliver. My faith in all of us is shook.

Look, I'm sorry.

But it kind of made you think, didn't it? Those assists, oh man, we went crazy with those assists. And you could probably buy all of that, right? If we didn't know what assists were, we'd have to account for them somehow, and they're the only things that aren't almost perfectly accounted for in the box score, at least in stats that actually "seem" to matter. And they do matter: we know assists matter a lot, and anyone that watches basketball or does any kind of regression knows that assists matter a great deal. Still, we shouldn't fetishize them like the people in that example. There's nothing mystical about a drive-and-kick. There's nothing mystical about good basketball, on the whole (even if there is something awesome about a great point guard that evokes a mystic). No, nothing fancy: work-a-day assists are just one part of the Big Picture Fundamentals, and that's all there is to it. We understand, now.

On the other hand, for all our sophistication in box score stats, most of them boil down to: "possession model" + "assists are generally positive" = "good players care about possessions, using them correctly, and racking up assists". And that's something worth noting. Because I'd argue that not many people actually approach the game that way, and while it's a neat "self-consistency + context" construct, it's also not intuitive and arguably not rational, either. I calculate the number of possessions and the players' individual points per possession when I look at the box score, to be honest, just because I'm a math guy and it's very easy for me to add field goal attempts and turnovers and half of free throws attempted and subtract offensive rebounds. Then I take the secondary/tertiary stats as "context". But I never really buy that as a good description. It just doesn't work for me. I know on some level that this approach is flawed, that it's priming me the wrong way, and that it gives a false linear narrative to something holistic and multifaceted. "Oh, he was inefficient at shooting, but he got a lot of steals and blocks and assists" is often a valid thing to say, but when the second part of your sentence can completely deconstruct the first part of your sentence? That's fundamentally flawed.
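If you want to see that mental arithmetic written down, here is a rough sketch. The `BoxLine` type and the function names are invented for illustration, and the 0.5 weight on free throw attempts is just my "half of free throws" shorthand; published possession estimates often use a weight closer to 0.44, so treat the weight as a tunable assumption rather than a law of nature.

```python
from dataclasses import dataclass


@dataclass
class BoxLine:
    """The handful of box score columns the estimate needs (hypothetical type)."""
    pts: int   # points scored
    fga: int   # field goal attempts
    fta: int   # free throw attempts
    tov: int   # turnovers
    orb: int   # offensive rebounds


def possessions(line: BoxLine, ft_weight: float = 0.5) -> float:
    """Estimate possessions used: shots + turnovers + a share of free throws,
    minus offensive rebounds (which extend a possession instead of ending it)."""
    return line.fga + line.tov + ft_weight * line.fta - line.orb


def points_per_possession(line: BoxLine, ft_weight: float = 0.5) -> float:
    """Points divided by estimated possessions; NaN if the estimate is not positive."""
    poss = possessions(line, ft_weight)
    return line.pts / poss if poss > 0 else float("nan")


# A made-up team line, not taken from any real game.
team = BoxLine(pts=100, fga=85, fta=20, tov=14, orb=10)
print(possessions(team))                      # 99.0
print(round(points_per_possession(team), 3))  # 1.01
```

Notice what isn't in there: assists never show up anywhere in the calculation, which is more or less the whole complaint.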

"Oh, Chris Paul got 20 points on 20 shots, that's kinda trash. Also he got 18 assists and 7 rebounds and 6 steals. I guess he's not such a trashy offensive player after all." I shouldn't prime myself that way, and I shouldn't have to engage in complicated mental arithmetic or random, non-intuitive single-number stats just to have a basic idea of what statistically a player is contributing. There's no baseline, there are a few stats columns jutting out arbitrarily, and overall, there's not a solid picture of what's going on. Just because assists don't fit the Possession Model doesn't mean they have anything mystical or tacked-on to them, or anything particular that separates them out from any other statistic. They just fit a self-consistent model less easily. There's nothing wrong with that. It just makes it hard for human beings to contextualize them correctly. There's nothing special about at-rim shots or threes, other than that teams generating these shots more often than their opponents will tend to win more games. Basketball is not unfathomably complex as a game, but sometimes it seems that way between the hodgepodge of different sites with analytics on the one hand and the fluidity and gospel truth of what one sees on the other. So let's get back to basics somehow.

So maybe that's the first alternative reality that is actually worth noting here. It's worth trying to blend analytics and what we see in a much more fluid way. The visual candor of the game can fool, but it's a powerful tool when you apply the proper rigor. We're people that are convinced by the eyes and the numbers, but the truth lies not in disjoint fragments but in a simultaneous picture containing both.

And instead of going all-out in the other direction from the Possession Model (which is very powerful, intuitive, and explanatory on a team level), perhaps we could build a set of assist-and-shot-location-heavy stats into the default box score that would accommodate the Quarterback Model for understanding basketball as an alternative. The Quarterback Model, akin to Hypothesis #3, could sit as a sort of brother theory to the Possession Model, with equal standing but with slightly different perspective. To explain what I mean: If the key axiom underlying the Possession Model is to maximize the value of possessions while minimizing the value of opponent possessions, the key axiom of the Quarterback Model is that the key skill for players is to generate efficient shot attempts for themselves and their teammates (and stop the generation of those efficient shot attempts for their opponents).

There are some fundamental problems with recording assists. For example, there aren't any non-examples: you can't rack up negative assists by feeding teammates in the wrong spots and making them miss with bad passes. To have weak assist stats right now doesn't mean to fail an efficiency test, it means quite literally to have fewer assists, pace/teammate/schedule-adjustments notwithstanding. But getting location-specific data for the assists (and for players' own shots) could help us go beyond the nebulous cliches. It's worth noting that in fact, to a lot of folks, that is the default version of basketball, not because they're ignorant of the Possession Model but because they find it less satisfactory or aesthetic or explanatory.

• • •

This is a conclusion, because pieces are supposed to have conclusions. We went on a journey into an alternate reality, came back, talked about what we learned, and box scores should be better and reflect assists less as a stat-padding extra that kinda doesn't fit but we all know is good and more as a part of a holistic model of how teams generate shot attempts efficiently. That's it. That's my thesis, in short, and, in the grand tradition of conclusions, we're now supposed to go more general and end with a sort of open literary ending. You know, much like the end of a conversation with friends is filled with well-wishing and deliberately constructed loose ends to signal, above all else, that that was a good conversation, that it was meaningful, and that (some time in the future) we should continue some of these loose ends in future conversations and keep one another in our minds in the meantime. Because... people, man. That's what's important in life, not getting the statistics exactly right for a simple game.

A game whose name escapes me right now. A curious misspelling of baseball, right?