Sloan Conference, Conclusions: When Statisticians Paint a Picture

Posted on Fri 08 March 2013 in 2013 Sloan Conference by Aaron McGuire


One of Sloan's standouts this year was Kirk Goldsberry, the intrepid visualization expert you know from Grantland, the New York Times, and Court Vision. Goldsberry's value starts with his data, which is simply better than the data most of us have to play with. He partners with SportVU, allowing him to delve into an increasingly rich set of real-time court location data. It's pretty amazing stuff. That said? For the second year in a row, Kirk Goldsberry didn't win Sloan's yearly paper competition. In it, eight finalist papers and presentations are assessed by the conference leads in competition for a ten thousand dollar prize. Every year, the vast majority of the people who attend the presentations swear that Goldsberry has a lock on it. "His work is the best," they say. "There's no way any of the other presentations can stand up." For the second year in a row, they were completely wrong.

The reason's straightforward. Much like last year, Goldsberry had the best presentation. It doesn't take a genius to understand Goldsberry's goals and methods, and his visual metrics, humor, and graphs made his presentation infinitely more fun and engaging than any of the other competitors. But the ten thousand dollar prize isn't awarded to the best presentation in a vacuum. It's awarded for the paper's analytic heft. Goldsberry's metrics were simple, straightforward, and extraordinarily well presented. I can't imagine that more than one or two individuals who attended all the paper sessions thought otherwise. When I think back on the 2013 conference in a year or two, I can almost guarantee that Goldsberry's paper will be the one I remember. But Goldsberry's success at capturing our minds with visuals and intuitive presentation obscures the fact that his work simply wasn't the most technically advanced or statistically interesting of the presented papers.

He calculated field goal percentage based on a defender's distance from the offensive player, and delved into some particularly excellent examples. It was a nice sleight of hand and an excellent filtering of data. It wasn't rocket science, but it was effective and intuitive rather than gaudy and statistically brilliant. Contrary to popular belief, Goldsberry's brand of intuitive statistical argument doesn't have to be the sort of analysis that wins a paper competition. Goldsberry's work has its own value, and outside the carefully combed and regulated world of the paper competition, it's far more valuable for real-world analytics than anything that won the competition over him. Immediately after his presentation concluded last Friday, R.C. Buford made a beeline for the podium and gave him a business card. R.C. Buford! Goldsberry has a good shot at consulting with half the teams in the NBA by this time next year.
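For the curious, the core computation is simple enough to sketch in a few lines of Python. This is my own toy reconstruction with invented shot data -- the field layout, the 3-foot bins, and the straight-line distance are my assumptions, not Goldsberry's actual pipeline:

```python
import math
from collections import defaultdict

# Hypothetical shot log: (shooter (x, y), nearest defender (x, y), made?).
# Coordinates and bin width are illustrative assumptions.
shots = [
    ((25.0, 6.0), (25.5, 7.0), True),    # tightly contested make
    ((25.0, 6.0), (28.0, 10.0), True),   # open make
    ((10.0, 20.0), (10.5, 20.5), False), # tightly contested miss
    ((10.0, 20.0), (16.0, 24.0), True),  # wide-open make
    ((40.0, 15.0), (40.2, 15.3), False), # tightly contested miss
]

def fg_pct_by_defender_distance(shots, bin_width=3.0):
    """Bucket shots by distance to the nearest defender; return FG% per bucket."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for (sx, sy), (dx, dy), result in shots:
        dist = math.hypot(sx - dx, sy - dy)
        bucket = int(dist // bin_width)  # bucket 0 = 0-3 ft, bucket 1 = 3-6 ft, ...
        attempts[bucket] += 1
        made[bucket] += int(result)
    return {b: made[b] / attempts[b] for b in sorted(attempts)}

print(fg_pct_by_defender_distance(shots))
```

That's the whole trick, conceptually: filter well, bin sensibly, and let the splits tell the story.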

His value becomes even more obvious when the conference attendees step back and take a look at themselves -- I talked to dozens of smart analysts who swore up and down that Goldsberry's presentation was not only the best-presented paper, it was also the greatest step in analytics since Babbage invented the difference engine. It wasn't, and I'd venture that anyone who'd read the papers would agree with me -- it was smart and poppy but significantly less interesting as a statistical concept than any number of the competing papers. But it never had to be that advanced. It simply had to be well-communicated. It was a real-time example of the conference's general theme. Having the best ideas can win you a paper competition among your statistician peers, and it can earn you the respect of many.

But if you really want to resonate, it's simply not enough to present good numbers.

You have to be able to tell a story.

• • •

Advocates of statistically minded basketball analysis have an odd tendency to conflate the strength of a communicator with how advanced and groundbreaking their statistics are. It betrays what I find to be a fundamentally hilarious undertone to most of the statistics-backing analysts in the basketball sphere. The sphere of basketball analytics has been blessed with a large group of brilliant and clever individuals with some incredible ideas. But for the most part, they aren't statisticians. There's absolutely nothing wrong with that, mind you! But the fundamental truth that most members of the "statistical movement" aren't statisticians becomes a bit ridiculous when you note just how vehemently the very same people assert themselves to be excellent judges of statistical technique and complexity. When people insisted that Goldsberry's work was statistically brilliant, I was bemused -- it's phenomenal work, but calling it brilliant because of its statistical heft is just wrong. It was brilliant because Goldsberry is a borderline savant at distilling numbers into a story. He's a phenomenal communicator who relates to his audience in a way that should be the envy of all who profess to analyze anything.

Now, granted, the point here in the most general sense is that there's nothing wrong with being unable to properly assess the statistical complexity of a person's argument. But it's terribly confusing to me that people insist on doing it anyway. There's this strange desire to attribute Goldsberry's brilliance to everything but his status as a communicator, as though that somehow lessens his accomplishments or his ideas. It doesn't. At all. Let me put it this way -- I'm a statistician in the corporate world, and I'm not going to reveal to anyone who isn't close to me exactly what I do. But communication isn't just some tertiary part of my job. It's essentially my entire job. I make models, big and small, and I work with statistical analysis daily. But if I couldn't communicate the results of my work to an exceedingly wide range of people, I'd be homeless. Once you get to a certain level in the professional frame, the discipline of statistics is less about coming up with amazing statistical innovations than it is about finding ways to share the ones you already have. It's about couching your numbers in the proper confidence intervals, and figuring out the best ways to adapt your story as your audience changes.

In fact, one could say that it's about painting.

See, the numbers don't tell a story on their own. They simply don't. An advanced database of basketball statistics isn't a storybook, it's an exceedingly large collection of paints and primers. The analysts -- the ones sifting through the database and poking at insight -- they're the ones the story comes from. They're the painters. A palette tells you nothing without a painter around to unveil the contents of their mind. The numbers are not self-explanatory. You can't simply hand a smart person a spreadsheet and tell them to read it -- you need to guide them to the point and the core idea, and you need to persuade them that the ideas are worth their time. You need to paint different styles for different audiences, too. A young painter may look at your 10-minute modern art splatters and understand your point immediately -- a crotchety old traditionalist may need a classically-minded several-hour portrait if you want to express the same point to him. It's a process. A beautiful painted process.

• • •

I've read a few response pieces to the conference openly questioning the focus on communication. "Are there really GMs and team owners who don't understand that you need to hire communicators?" First: yes, there are a few. Second: even if there weren't, I'd argue that the message isn't just for them. The message is to the statistically minded analyst who wants to be listened to. The message is to the fans who lean on statistical arguments. The message is to everyone who thinks that any numbers -- no matter how overwhelming -- can speak for themselves. There's a reason numbers never lie, you know -- they can't talk. They're illuminating under the right conditions and support good arguments under many other conditions. But it's the responsibility of the communicator to ensure that they're presenting an argument their audience can get on board with. It's not the responsibility of the audience to ensure that they're attuned to the presenter's whims and fancies.

Kirk Goldsberry's communication isn't some tertiary part of what makes him an excellent analyst. On the contrary -- it's the core. As someone who works professionally in the field, I respect that quite a bit -- Goldsberry is an academic, but he's got exactly the skillset that would make him an amazing industry statistician. The moral of the Sloan conference, to me, was a reflection of Goldsberry's success. The ability to function as a good communicator is the most important skill you can have if you're a statistician working in industry. News flash: statistical analysis in sports is no longer an academic exercise. Sports statistics have become their own sprawling industry, and along with that, the skills that make statisticians valuable in industry have become the focus in sports statistics.

Statistical analysis in sports is no longer a novelty -- it's a necessity.

It's time for the communicators to match that reality.


Sloan Conference, Day #2: "The Lockout is Dead, Long Live the Lockout"

Posted on Sat 02 March 2013 in 2013 Sloan Conference by Aaron McGuire

Hey, folks! This year, I'm covering the Sloan Sports Conference straight from Boston's Convention and Exhibition Center. If you're there, be on the lookout for the tall guy in a suit who hasn't slept in a decade. Over the duration of the conference, I'm going to try to post some quick reflections on the panels I attended. Fun stuff, right? Here are the panels covered in the post, thus far:

  • 9:00-10:00 -- "THE CHANGING NATURE OF OWNERSHIP" In this panel, Peter Keating -- from ESPN the Magazine -- asked questions about how ownership is changing over time and the challenges of owning teams. I responded by spitting like an alpaca and neighing like @horse_ebooks. It was a weird moment, I admit.

  • 10:20-11:20 -- "BIG DATA: LESSONS FOR SPORTS" A bunch of experts on Big Data -- people from HP, MIT, m6d, and other specialists -- got together to talk a bit about big data. I spend a bit of time recalling prior experiences with Big Data conferences and generally express appreciation for the overall tact taken by the presenters. ... Even if the overall panel was a bit forgettable.

  • 11:40-12:40 -- "ESPN'S USE OF ANALYTICS IN STORYTELLING" Michael Smith headed a panel including Tom Haberstroh, Dean Oliver, Alok Pattani, and Mike Sando in a discussion of how ESPN uses analytics and statistical data in storytelling. Although Haberstroh was the Bledsoe of the panel (thanks for the joke, @kpelton), it was a really fun look at how ESPN has attempted to incorporate more statistical thoughts and methods into their work. I reflect on the general themes.

• • •


A panel with Adam Silver, Jonathan Kraft, Stan Kasten, and John Skipper discussing how ownership is changing in the NBA.

Ever been in a class where you're trying really hard to pay close attention, but you can't seem to shake the nagging feeling that your professor isn't being entirely forthcoming? Your eyes start to gloss over, the machinations start turning in the back of your head, and you find yourself lagging behind and intractably stuck in befuddlement at statements you simply can't figure out. You'll have to excuse me. I didn't cover the lockout from the ground floor, and I've never had a chance to listen to Silver or Stern speak before. They're smart, smart guys. But some of the quotes from this panel were simply incredible. From the first 20 minutes alone:

  • "As you know, we were completely upfront with all of our true numbers about team losses during the lockout."

  • "We all agree that profits are the focus of owning a sports team, rather than resale value."

  • "Here's the problem with Forbes -- which neither Adam or I believe. I have no idea how they come up with their valuations."

My short response to each of these statements: ... w-what?

A bit longer: what, you mean the team loss numbers that made absolutely no sense and were quickly picked apart by several prominent business writers? If profits are ACTUALLY the sole focus of sports owners, they're being obscenely stupid businessmen. Owning a team is valuable for the cachet and ego inherent in general ownership, not the X's and O's of making a marginal profit on a year-to-year basis. And the resale value IS the big thing, even if the people on the panel all rejected the entire concept that it was a worthy discussion topic. On a much larger level, it's like owning a house in a city you don't really like. You may not love the city, and you may know with absolute certainty you're never going to pay that 30-year mortgage in full. But you DO know that when you decide to sell, even if it's at a relatively bad market period, you'll make back a decent amount of what you invested in the house (plus or minus a bit) and make up for a decent number of your year-to-year losses.

If you improved the house, maybe you'll make a bit more. If it fell into disrepair, maybe you lose a bit more. But you take out the mortgage knowing that you're building equity instead of simply lighting money on fire with massive rent payments. Teams are similar. Resale isn't just an important topic to discuss, it's arguably the only topic to discuss when you're assessing franchise valuations. Finally, as for the Forbes data... they've been relatively upfront about stating that their franchise valuations come from the rough market value of everything that team owns -- the land, the team quality (measured through revenues), the stadiums, the television contracts, et cetera. It's really disingenuous to imply that they're a completely mysterious black box with absolutely no logical backing. It's not a completely public formula, but "no idea" is a gross oversimplification of their methods.

Regardless. I don't mean to bloviate, but this was a hard panel to sit through. And to be completely honest with you, the panel completely lost me about forty minutes in. I almost left the room. The owners turned to the subject of lockouts, and "who won". The money quotes, from (as Bomani Jones noted) non-billionaires John Skipper and Adam Silver:

"... Nobody won the lockouts." This was followed by a nod of assent and a strong agreement from Adam Silver. Silver then followed with, almost unbelievably... "Look, the players had ENORMOUS leverage during the NBA lockout. Enormous leverage. We never even considered using replacement players. That's leverage. ... We felt 50% was a huge win for the players. That was nowhere near what we went into the negotiations looking for."

OK, Adam. Good talk. As for the rest of you? If you're looking for the big story of the day, I've got four words for you:

There's another lockout coming.

And the pillaging victors -- incredibly -- don't think they won the last round.

• • •


A panel with several big data analysts discussing how big data impacts sports.

I went to a large trade conference late last year. Big Data was a huge subject -- I'd say 50-60% of the seminars and sessions were related to big data and the various insights and uses of big data. Spoiler alert: it was the worst conference I've ever attended. Under the bright lights of the conference stage, the concept became this comically broad abstraction. "Solve your problems with big data. Analyze better with big data. Be careful with big data ... but only if you're using this product that you need to approach big data because if you don't use big data you'll die a painful everlasting death." It was excruciating, especially as an analyst who's dealt with several real-life manifestations of "big data" in research I've done and seen. There was virtually nothing of value amidst all the hubbub.

Ever since, I've tried to seek out big data panels, mostly in an effort to find something better. ANYTHING better. Perhaps better isn't the right word -- the best word for what I want isn't better, it's realistic. There are a lot of inherent issues in using massive gobs of data to try and approach problems. Having that kind of data can be extremely helpful to the clever analyst, if you have the right mindset and the right understanding of the inherent hazards you're exposing yourself to when you try to use data that massive. The problem isn't that it can't be helpful, the problem's that it won't always be. There's an incredibly large amount of noise in data that's that humongous, and while you can get a lot of value if you increase the burden of proof and keep a strong eye to sample size and logical reasoning, you can just as easily take a problem and make it an intractable mess of noise-puffing mush. You can make your analysis completely useless. With that in mind, early in the presentation, Chris Selland shared this particular gem:

"If you torture data enough, you can get any answer you want. We have to be careful"

My response? THANK YOU! If the entire panel was simply the panelists getting up, reciting that quote, dropping the mic and fleeing the room with abandon... it still would be more useful than the aforementioned trade conference! You can show specific methodologies and share new technology all you want. If you don't apply the proper context and warn the audience of the method's drawbacks, you're doing a fundamental disservice to the point and use of overwhelming amounts of data. You have to transform, structure, and regulate your data to effectively use large data analytics. You have to clean your data, and you have to scale the data to answer your questions. Big data is a method to solve problems -- it's not the be-all and end-all of problem solving. As a whole, the panel wasn't anything particularly special. It was a very high-level discussion, and one significantly more useful to people with experience in the field. But -- and perhaps this is just my bad experiences talking -- I much appreciated the way the presenters generally embraced uncertainty while supporting hypothesis-driven analytics and smart use of big data methodologies. It's a common theme in academic circles and a rare theme in trade circles -- Sloan exists in the hazy boundary between the two, and it's nice that they took a bit of the best of both worlds to put together the Big Data panel.
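Selland's warning is easy to demonstrate. Here's a quick stdlib-Python sketch of my own (the data is simulated, nothing here is from the panel): scan an ever-bigger pile of purely random "features" against a purely random outcome, and the best correlation you find keeps climbing even though nothing real is there.

```python
import random

random.seed(7)

def corr(xs, ys):
    """Pearson correlation coefficient, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

n_games = 30
outcome = [random.gauss(0, 1) for _ in range(n_games)]  # pure noise

# Torture the data: scan more and more meaningless features and keep
# the strongest correlation found against the (also meaningless) outcome.
best = {}
for n_features in (10, 100, 1000):
    best[n_features] = max(
        abs(corr([random.gauss(0, 1) for _ in range(n_games)], outcome))
        for _ in range(n_features)
    )
    print(n_features, "features scanned -> best |r| =", round(best[n_features], 2))
```

That's why the burden of proof has to scale with the number of things you test -- the "answer you want" is always in there somewhere if you dig long enough.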

• • •


A panel discussing how ESPN uses statistics and analytics to tell stories. Starring Tom Haberstroh, Dean Oliver, Michael Smith, Mike Sando, and Alok Pattani.

To start the presentation, ESPN provided a neat little video showing about five minutes of clips from various ESPN shows. The theme was simple: demonstrate places that ESPN uses statistics in storytelling. There were a lot of different examples, from Skip Bayless to Jay Bilas to Numbers Never Lie. But the general goal was to express ESPN's growing comfort level with using statistical thought to support on-air arguments. Their on-air talent has bought in -- to some extent -- to the use of statistical facts and figures to tell stories and share findings. The money quote from Pattani: "It's like value-added to your IQ." The overall attitude towards statistical thought has changed in the organization -- the world has changed, and if you use statistics to dispel myths, you can get a strong response from the viewers.

Which, in general, was what the ESPN panel was about -- how can you get a stronger educational product on national TV? ESPN isn't always on top of the game, but the panel was extremely illuminating. They put together a group with a bunch of the network's best analytic journalists and let them discuss the implications of what they do. They try to dispel myths, give proper context, and give credibility and cachet to the numbers they use. It's about trying to educate fans without being snippy and self-superior. It's about making the numbers part of the popular conception, and using statistics that shed light on the games in fun and interesting ways. "Criticize on a higher level."

Adding statistical analysis to popular sporting discourse isn't about simply showing other people you're smarter than they are. It's about changing the universe fans get exposed to. You're building the starfield, the galaxy, the rocketship. You're inspiring curiosity and enticing the reader to use statistics and numbers to illuminate their fandom.

• • •

I'll try to make these a bit smoother and a bit quicker to the uptake -- if you missed it, I actually finished editing all of yesterday's panel discussion early this morning. Go check that out, if you get the chance. I'm on to the next session. More later.


Sloan Conference, Day #1: Observations from the Almond Gallery

Posted on Fri 01 March 2013 in 2013 Sloan Conference by Aaron McGuire

Hey, folks! This year, I'm covering the Sloan Sports Conference straight from Boston's Convention and Exhibition Center. If you're there, be on the lookout for the tall guy in a suit who hasn't slept in a decade. Over the duration of the conference, I'm going to try to post some quick reflections on the panels I attended. Fun stuff, right? Here are the panels covered in the post, thus far:

  • 9:00-10:00 -- "REVENGE OF THE NERDS!" On this particular panel (featuring a loaded roster of Morey, Cuban, Silver, Lewis, and Marathe), the big theme was the rise of statistics in sports and the challenges the high-rolling panelists faced during their rise. I discuss the overall oeuvre of the panel and the one place I wish they'd gone.

  • 10:20-11:20 -- "DATA VISUALIZATION" This panel involved -- surprise! -- a discussion on data visualization. Moderated by the Rockets' Sam Hinkie and starring a variety of visualization experts, the panel took a broad view of the philosophies behind their data visualization strategies through minutiae and examples. Strange format, but a fun panel to follow.

  • 11:40-2:50 -- "THE DIRGE OF THE SHAMROCK SHAKES" ... OK, no, this wasn't really a panel. It had panels in it, but that's beside the point. Due to the nature of the panels that were located in this timeframe, I won't actually be covering them in this post, but if you all are really good this year, Santa will tell you the story sometime later because it was a hell of a lot of fun.

  • 3:30-4:00 -- "THE DWIGHT EFFECT" This was a paper presentation with Kirk Goldsberry, discussing new ways to assess interior defense in the NBA. It was a great presentation. I've shared some of the biggest findings and the best practices Goldsberry used to strengthen his case in this recap.

  • 4:00-4:30 -- "THE VALUE OF FLEXIBILITY IN BASEBALL ROSTERS" Although this is a basketball blog, this was actually a REALLY neat paper with a tantalizing core idea. Here, I describe the rough summary of what their work implied as well as the NBA-related continuation of the idea I'd love to see in the near future.

  • 5:00-6:00 -- "XY PANEL: THE REVOLUTION IN VISUAL TRACKING ANALYTICS" Kirk Goldsberry joined a who's who of visual analytics gurus to discuss the technology in a broad sense and some of the challenges and triumphs it's faced thus far. This ended up being one of my favorite panels of the day -- in my reflections, I describe the limitations of communication and the fun discussion on why teams don't use these analytics.

• • •

9:00-10:00 -- REVENGE OF THE NERDS

A panel with Daryl Morey, Mark Cuban, Nate Silver, Michael Lewis, and Paraag Marathe discussing their sports upbringing and various values and tradeoffs they've faced in their rise and their jobs.

"How did you get here?"

There was a short sigh from Cuban and a knowing nod from Morey. His one sentence answer? "It's not easy."

A wave of laughter spread through the ballroom. Amusing as it was, it wasn't really a joke -- getting there wasn't easy for the men on the stage. Sports is a strange world for a statistician, in a few ways. It's not that it's not a good laboratory -- it's actually a PHENOMENAL one, with unbiased criteria for success and rich multivariate datasets. It's a wonderful world to explore. Sports is an excellent place to learn statistics, and with the sole exception of Mark Cuban, everybody up there cut their teeth on applied statistics at a young age through baseball's fruitful data. In a perfect vacuum, sports is the perfect field for statistical analysis -- it's simply a beautiful place to analyze.

But that's all in a vacuum -- it ignores the very discipline-centric problems that analytics aficionados face in the sporting frame. Silver, Morey, and Cuban emphasized many of them, in different ways for each. The biggest issue? It's SPORTS! It's a world of loud testosterone-rippled men who don't love changes to the status quo. When analytics began to rise to prominence, there was a large pushback from coaches and the traditional analysts. There's a knife-edge balancing act between the long-term and the short-term. There are intractably huge datasets and false leads that can lead a franchise astray. There's the randomness, the injuries, the coaching. The exogenous pressures from the traditional analysts have begun to wane over the years, and when asked how much of a challenge he experiences today when trying new analytic techniques, Cuban's answer summarized the sea-change in perspectives: "none whatsoever. Advances are always welcome." Things are different, hence the name. Revenge of the Nerds implies that the so-called nerds have won.

All things considered, it was a fun panel -- I highly recommend watching it online yourself, as you can do here later today. But I felt that in their effort to simply describe how things happened and how things are, the presenters missed the most interesting angle -- that is, the why behind the analytical sea-change. Why have perspectives shifted? Why have sports analytics become so ubiquitous? Statistical analysis doesn't only involve success stories, even if we'd like to think it does; some teams fail miserably at it, and the first thing any statistician would tell you is that a lot of statistical analysis can be useless and ineffective. The reason things have shifted isn't just some rah-rah success story about a perfect way of thought, it's a story of statistically minded analysts learning to communicate. It's a story about how Nate Silver's writing made statistical thought engaging. It's a story about how Daryl Morey learned to navigate the give-and-take with his coaching staff and his players. It's a story about how deeply Mark Cuban understands the business structures at the core of a sports team. That's the story I was hoping to hear, and while the panel was fun regardless, I felt they missed the boat a bit by covering the what instead of the why.

• • •


A panel with Sam Hinkie, Joe Ward, Ben Fry, and Martin Wattenberg discussing the visualization of data in sports analytics. Broad theme, right? Well...

There was a lot of audible grumbling in the audience during much of this panel -- most of us skipped out on the Stan Van Gundy panel for this one. Hopes were high that we'd get a lot of new and interesting data visualization methods. Final verdict? Not so much on the "new and interesting" front. As it turned out, the panel turned into something akin to a nerdy fever-dream spinoff of American Idol -- they placed a bunch of (primarily) public-use data visualization tools on a large screen and picked them apart.

There were a few panelists who were generally focused on descriptive problems, and issues of presentation. Who was the visualization meant for? How did it do its job? What were the positives, drawbacks, et cetera? Some of them liked most of the visualizations; most of them had a few comments for improvement. They even had their own Simon Cowell in Joe Ward, the graphics editor for the New York Times. Ward was a tireless critic, pointing out drawbacks and missed opportunities for almost every single visualization put on the screen. One that I found particularly funny -- a relatively old Kirk Goldsberry shot visualization came on-screen, and Ward almost immediately pointed out a small problem endemic to most of Goldsberry's oldest work -- the drop shadow on each basket area actually muddles the coloring in many locations, which can badly obfuscate the point of the chart.

Overall, a bit of a weird result for the attendees. A panel with a theme as broad as "Data Visualization" ended up being (essentially) a treatise on the minutiae that made up the panelists' personal philosophies on data visualization. Take out that drop shadow! Realize your minimalist dreams! Consider your audience! Et cetera, et cetera. While enjoyable, it was somewhat of a surprise for most of the audience, which led to all the grumbling -- especially when Van Gundy was talking about Dwight Howard just a few rooms over. I'm glad I went, if only for professional reasons -- a lot of the work I do in my Clark Kent job involves tireless data visualization, and although most of the things discussed were things I've thought of before, it was actually quite helpful to hear that kind of critique and analysis on a broader scale. It was a pretty strange format for a data visualization panel, and I expect I could have gotten a bit more out of the Van Gundy panel. But I don't think many attendees who stayed til the end were disappointed in the overall result. Unless you aren't a ridiculous visualization nerd. If that's the case, this probably was excruciating for you.

... That said, why would you be here if you weren't a ridiculous nerd? The world may never know.

• • •


The Dwight Effect is a research paper presented by visual analytics guru Kirk Goldsberry, whose work you may be familiar with from Court Vision. He discussed new metrics to measure interior defensive efficiency and shared some of his most interesting findings.

This one was different than the above two, as well as the panel on randomness that I'm waiting a bit to write about because I want to mull over some of their statements a tad more. This was simply Kirk Goldsberry (the visualization guru behind the much-lauded Court Vision location analytics) presenting a few explications of new defensive analytics. He'd brought a few of them out before, but never quite this starkly. I think it's safe to say that this was one of the best individual presentations you can put together at a statistical conference -- in classic Goldsberry fashion, his presentation was light on the tables and heavy on the visuals, with the highlight being a 2-minute blooper reel of David Lee's defensive mishaps in last week's Golden State/Minnesota game. The panel focused specifically on a few of the most notable findings from Goldsberry's work. It started with "LARRY SANDERS!", the statistically-minded blogosphere's new mancrush.

Sanders has developed into one of the best defenders in the NBA, completely shutting down the restricted area when he gets within an arm's reach of another player. He has such long arms that "arm's reach" is akin to Tyler Zeller's "thirty miles away" (TBJ joke!), which helps him destroy players in the post and on anything close when he's on the court. Conversely, while Anderson Varejao is an excellent rebounder and a player who does a good job getting into defensive position, Goldsberry's metrics showed that he was surprisingly permissive when a player actually got a shot off against him. He got to a lot of shots, though, and that turned out to be an important key -- Goldsberry's metric was only half of the story, as it measured what opponents shoot when the player actually gets to his spot and makes an attempt to guard it. What wasn't covered in depth during his presentation (but WAS a part of his paper as a whole) was the flip side of that defensive equation, essentially summarizing how often the player actually gets to his spot.

Those were two extremely intuitive splits for measuring defensive efficacy, with Andrea Bargnani presented as the number one example. As I mentioned in the Player Capsule for Bargnani, his problem isn't that he allows a crazy percentage when he actually defends the shot -- he's a reasonably effective defender overall, and Goldsberry had him in the top five of "FG% against" in the restricted area. His problem is simply that he never gets to the shots! He lays back, misses rotations, and posts very few challenges, in effect creating a defensive vacuum that's easy to observe when you watch him play. He may defend the shots he defends reasonably well, but if he's refusing to defend, who really cares?
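To picture how the two splits interact, here's a toy sketch. This is NOT Goldsberry's actual model (which is built on SportVU coordinate data); every number below is invented purely to show why a defender who rarely contests can hurt you even with a good "FG% against":

```python
# Toy illustration of the two-split idea: a rim defender's impact depends
# on BOTH how often he contests shots and what shooters make when he does.
# All rates below are hypothetical, not real SportVU figures.

def expected_fg_against(contest_rate, fg_when_contested, fg_uncontested=0.62):
    """Blend contested and uncontested outcomes, weighted by how often
    the defender actually gets to the shot."""
    return contest_rate * fg_when_contested + (1 - contest_rate) * fg_uncontested

# A Sanders-type defender: contests often AND suppresses makes.
sanders = expected_fg_against(contest_rate=0.75, fg_when_contested=0.40)

# A Bargnani-type defender: decent when contesting, but rarely there.
bargnani = expected_fg_against(contest_rate=0.30, fg_when_contested=0.45)

print(round(sanders, 3))   # 0.455
print(round(bargnani, 3))  # 0.569
```

Even with a respectable contested FG% against, the Bargnani-type profile bleeds points because the uncontested bucket dominates his minutes.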

One last thing I liked a lot: Goldsberry gave an excellent rundown of his own limitations. All too often at conferences like this, you see presenters and panelists get too wrapped up in their own work and refuse to acknowledge their biggest limitations. Goldsberry pointed out that he ran into major time/space constraints due to the heavily visual nature of his work, that he had limited timespans of data to work with, and -- as Andres Alvarez of Wages of Wins aptly pointed out -- that his defensive metrics remain untuned to the impact of fouls and free throws drawn. All that gave the presentation the aura of a work in progress, and the general air of a presenter who didn't think he was the greatest thing since sliced bread. As someone who's seen far too many presenters take (essentially) the opposite tack, I appreciated it.

• • •


This paper analyzed the value of positional flexibility in baseball.

I won't spend too long on this one, for obvious reasons -- namely, that this is a basketball blog rather than a baseball blog. But I have to mention the overall theme and the response beating so vividly through my head once the presentation concluded. The main idea was that a researcher could build a two-stage model to assess the true value of flexibility for baseball players: a first stage where you measure the probability of injury for each player, and a second stage where you measure a player's potential efficacy when his role on the field changes. For instance, if your center fielder were struck by lightning, who would be the best replacement on the team? There's a really interesting subtext to these questions and this mode of analysis -- you're essentially trying to predict how a player will function in a role he's theoretically never played before, and to assess his value once he's been consigned to that role. It's interesting stuff.
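The skeleton of the two-stage idea is simple enough to sketch in a few lines. To be clear, these probabilities and projections are entirely made up for illustration -- the actual paper's estimation is far more involved:

```python
# Minimal sketch of the two-stage flexibility model described above.
# Stage 1: probability each starter misses time (hypothetical numbers).
# Stage 2: projected value of the best in-house replacement at that spot.

injury_prob = {"CF": 0.15, "SS": 0.10}        # stage 1: chance of injury
starter_value = {"CF": 4.0, "SS": 3.5}        # starter's projected value
best_backup_value = {"CF": 2.8, "SS": 1.2}    # stage 2: best replacement's value

def flexibility_cost(position):
    """Expected value lost if the starter at `position` goes down,
    given the best available internal replacement."""
    return injury_prob[position] * (starter_value[position] - best_backup_value[position])

for pos in injury_prob:
    print(pos, round(flexibility_cost(pos), 3))
```

Note the punchline: the shortstop here costs you more in expectation despite a lower injury risk, because the drop-off to his replacement is steeper. That's exactly the kind of non-obvious result the two-stage framing surfaces.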

My first thought, though? Isn't this possible in basketball? Don't get me wrong -- we could try to use the canonical five positions, but I don't think that's quite the best way to approach it. I was thinking about the question in terms of skillsets rather than positions. Take, for instance, Tim Duncan. You can argue all day about whether he's a center or a power forward, but NOBODY who's ever watched the Spurs needs to argue about his rebounding. When Duncan is on the court, he's San Antonio's primary rebounder. Period. When Steve Nash is on the floor, he's the primary ballhandler. When Kyrie Irving is on the floor, he's the primary shot-consumer. Et cetera, et cetera. Instead of measuring how basketball players fit a somewhat outdated positional archetype, a neat way to approach the question would be to pose -- statistically -- the idea of giving each player a score for how well his underlying metrics imply he'd function (relative to the league average, with adjustments baked in for role) if he had to switch roles.

What if we lived in some horrible Don Nelson alternate universe where Popovich asked Tony Parker to be San Antonio's primary rebounder? What if we examined what a team would look like if it asked Darko Milicic to be its primary ballhandler? Et cetera, et cetera. It opens up a lot of interesting questions, and if you make the overall output of the statistic a ranking, you could potentially uncover rarely used lineups and player-roles that could be more advantageous than the fan or the coach might initially assume. In any event, it'll be worth seeing -- if I happen to get the time, or if one of our readers takes this idea and runs with it -- what this kind of analysis uncovers.
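A crude first pass at the role-flexibility score floated above might standardize a player's relevant stat against the league average for whoever currently fills that role. All the numbers here are invented for illustration -- this is just the shape the idea could take:

```python
# Toy role-flexibility score: how many standard deviations above or below
# the league-average holder of a role a player's relevant stat sits.
# League figures below are hypothetical placeholders, not real data.

league_avg = {"primary_rebounder_trb_pct": 18.0}  # avg TRB% for primary rebounders
league_sd = {"primary_rebounder_trb_pct": 3.0}    # spread across those players

def role_fit_score(player_stat, role_key):
    """Standardized fit score for slotting a player into a role he
    doesn't normally fill. Zero means league-average for the role."""
    return (player_stat - league_avg[role_key]) / league_sd[role_key]

# Tony Parker as "primary rebounder" -- the Don Nelson nightmare scenario.
print(round(role_fit_score(5.5, "primary_rebounder_trb_pct"), 2))  # -4.17
```

The real work, of course, would be in the role adjustments -- a guard's rebounding numbers come against a guard's matchups -- but even a raw score like this would flag the truly absurd role swaps instantly.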

• • •


For the last panel of the day, Kirk Goldsberry joined a who's who of visual tracking experts to discuss the promise and pitfalls of optical player-tracking data.

No typo: Tony Parker's max speed in an NBA game last year was 20.9 MPH.

That sort of mind-blowing stat is the sort of thing we'd see more of if we had public access to the visual player-tracking data that SportVU measures on a day-to-day basis. They're cooking some ridiculous numbers back there. What's the goal of data like that? Put simply, it's a matter of learning how space affects the game. In a vacuum, the idea that Tony Parker can go 20.9 MPH with the basketball in an NBA game has no value -- in context, it can be more important than any individual line on the box score. It's a representation of the marginal tidbits of skill that make Tony Parker such a brilliant basketball player. It represents Parker's ability to change the entire shape of the defense, and in a broader sense, measuring in-game speed like that allows more granular analysis of draft combine data and other such physical attributes.

After shocking the audience -- or at least Tim Varner and me -- with a jaw-dropping opening stat like that, the panel changed course into the more interesting question: if location analytics are so game-changing, why do only 15 of the 30 NBA teams buy them? The panelists provided a lot of great answers, and they touched on what I feel is the main problem: the difficulty of interpreting multivariate coordinate data. What do you actually DO with that data, especially without the right personnel? Kirk Goldsberry put it best when he noted that the data -- while burgeoning with potential and beautiful to analyze from a purely academic perspective -- is borderline gibberish unless you put it through the proper treatment with the proper rigor. Not every team has the sort of academics you'd need to apply that treatment. If you don't have those people, you aren't really going to get much out of data like this -- you may get a few facts and figures that make scouting a bit easier, but you'll barely scratch the surface at best and waste a whole lot of money at worst. You need a certain set of people to really sift through this sort of data. Simply knowing a team has acquired the data doesn't tell you much about how the team is using it, or even how CAPABLE it is of using it.

That whole problem points to a deeper issue with the general sports analytics community, one that Kirk Goldsberry -- more than almost anyone -- understands and dares to approach. It's similar to the inherent issue I discussed in the "Revenge of the Nerds" blurb above. Even if analytic thinking were a golden bullet that solved every single basketball problem on the planet, you can't simply try to impress people with great information -- you have to be good at sharing it. There's a big lumbering abstraction that statistically minded basketball analysts like to lean on: the image of the grizzled old Clint Eastwood-esque coach constantly belittling and whining at the poor wide-eyed stat guys trying to bring him their good work. It's a tale in which those who refuse statistical analysis are ignorant, incompetent, and intransigent to change. That's a fun little story to repeat ad infinitum, but it's also completely wrong. The fault isn't necessarily on the coach in this situation -- it lies just as strongly with the statistical analyst who refused to communicate effectively with his audience! Communication isn't simply a war of attrition; it's an earnest effort to fashion and present your work in a way your audience can actually approach. It's one thing to impress a bunch of people with a logorrhea of numbers and figures. It's quite another to make a presentation your audience can actually use.

One way to do that -- as Goldsberry knows full well -- is visuals. Another is to hire the right people to analyze the data. And yet another is to -- as many teams have -- ignore the data entirely and wait until another team figures it out and the information leaks. It's the classic "Lobachevsky-Lehrer strategy," where an organization remains willfully resistant to pouring money into innovation, with the goal of piggybacking off the first successful model once the details leak. (The Lobachevsky-Lehrer strategy is a rarely used metaphor, so I'll explain in short: I'm a big Tom Lehrer fan, and Lehrer has a song about the great mathematician Nikolai Lobachevsky. The song outlines the merits of plagiarizing in the academic world. Lobachevsky himself wasn't ACTUALLY an especially notable plagiarist, and the song isn't a slur on his character. It's simply an example of a far-more-common-than-you-think tactic that companies, academics, and organizations use when they don't want to invest in research. Simply put, they ignore the problem entirely and hope that when someone actually figures it out, they'll be able to find a leaked version of the work and piggyback off of that. It's a bit annoying, but it happens extremely often. It's actually more surprising to me when everyone tries to innovate than when half the organizations try to piggyback. It's tried and true. It works, you know?)

Anyway. All in all, it was an excellent panel. As with Revenge of the Nerds, when MIT puts the video online, I highly recommend taking it in.

• • •

Day #2 coverage comes on Saturday. GET EXCITED!
