By Matt Broadhead
Editor’s Note: This article was written by frequent commenter Matt Broadhead (aka EmeraldTwinkie). We appreciate his contribution.
I’m a bit of a math nerd: I was a math major in college, and I was captain of my high school’s math team. The fact I’m willing to admit that shows you how much of one I am. One of the ways this manifests: I enjoy playing around with sports stats.
When I say “playing around with stats,” I mean “pulling data from various sources, collating them in spreadsheets, and analyzing them to answer questions.”
About a week ago, I found myself once again wading through stats, looking for ways to tease apart individual players’ value and contributions, and I wondered how individual players’ game-by-game performance in various stat categories correlates with the team’s wins and losses. So, I pulled the individual advanced game logs from basketball-reference.com for each of the top 9 Wizards players by total minutes and ran the numbers.
Here’s how to interpret this table:
- Each 3-digit decimal number is the Pearson (linear) correlation coefficient between the player’s performance in the metric (at the top of the column) in each game and the outcome of that game.
- Definitions of the metrics can be found by going here and clicking on “Glossary.”
- The correlation coefficient is a measure of how closely two data series move together: how consistently does a larger number in one series correspond to a larger number in the other? In our case, how consistently does a larger number in a player’s particular game metric correspond to a win?
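For readers who want to reproduce this kind of number, here is a minimal sketch of the calculation in Python. It assumes wins are coded as 1 and losses as 0 (with a two-valued outcome like this, Pearson's coefficient is also known as the point-biserial correlation). The `pearson` helper and the five-game log are illustrative and made up; they are not the spreadsheet or the data behind the tables.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical five-game log (made-up numbers, not real Wizards data).
efg_pct = [0.55, 0.40, 0.62, 0.48, 0.35]  # player's eFG% in each game
won = [1, 0, 1, 1, 0]                     # 1 = win, 0 = loss

r = pearson(efg_pct, won)
print(f"{r:.3f}")
```

The helper raises an error when either series never changes, because the coefficient is undefined in that case (there is no variation to correlate).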
Under ideal circumstances (which these are not), coefficients can be roughly interpreted as follows:
- +.70 or higher: Very strong positive relationship
- +.40 to +.69: Strong positive relationship
- +.30 to +.39: Moderate positive relationship
- +.20 to +.29: Weak positive relationship
- +.01 to +.19: No or negligible relationship
- 0: No relationship [zero correlation]
- -.01 to -.19: No or negligible relationship
- -.20 to -.29: Weak negative relationship
- -.30 to -.39: Moderate negative relationship
- -.40 to -.69: Strong negative relationship
- -.70 or lower: Very strong negative relationship
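The rough scale above can be written as a small lookup, which is handy when scanning a whole table of coefficients. This is only a sketch: the function name `describe` is hypothetical, values that fall between the listed bands (such as .195) are assigned by the lower threshold, and magnitudes of .70 and beyond are treated as very strong in either direction.

```python
def describe(r):
    """Map a correlation coefficient to the rough labels in the list above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "No relationship"
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        return f"Very strong {direction} relationship"
    if size >= 0.40:
        return f"Strong {direction} relationship"
    if size >= 0.30:
        return f"Moderate {direction} relationship"
    if size >= 0.20:
        return f"Weak {direction} relationship"
    # Assumption: anything below .20 in magnitude is lumped together.
    return "No or negligible relationship"

for value in (0.518, 0.374, -0.138, -0.545):
    print(value, describe(value))
```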
Of course, as is usual with sports metrics, there are a lot of caveats (as I write this, today’s xkcd is perfect). There are the usual ones about small sample size, unaccounted-for factors, etc., but also the big one for this particular methodology: correlation is not causation. And there’s a more subtle caveat to using correlation coefficients: they do not account for comparative scale.
For example, if player A and player B have exactly the same eFG% in every game, but player A plays three times as many minutes, they will nevertheless have the same correlation between eFG% and winning. That is (presumably) a big part of why Aaron Holiday’s BLK% has a .518 correlation with Wizards wins.
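One way to see the scale caveat numerically: Pearson's coefficient does not change when a series is multiplied by a positive constant. In the sketch below, hypothetical player A produces exactly three times what hypothetical player B does in every game, yet both get the same coefficient. All names and numbers are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient (compact version, no input checks)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical six-game sample (not real data).
won = [1, 0, 1, 1, 0, 0]                        # 1 = win, 0 = loss
player_b_pts = [8, 5, 6, 9, 4, 7]               # smaller role
player_a_pts = [3 * p for p in player_b_pts]    # three times as much, every game

r_a = pearson(player_a_pts, won)
r_b = pearson(player_b_pts, won)
print(f"A: {r_a:.3f}  B: {r_b:.3f}")  # the two coefficients match
```

The coefficient only captures how a player's ups and downs line up with wins and losses, not how large that player's contribution is.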
What do the correlations say?
- The Wizards may need defense and defensive rebounds from Bradley Beal.
- More minutes for Kentavious Caldwell-Pope and Raul Neto have a strong positive correlation with wins. Neto’s shooting also correlates with winning.
- Strong overall performances from Spencer Dinwiddie and Kyle Kuzma have a strong correlation with winning.
Acknowledging that last caveat, I decided to see what running the same analysis with raw stats instead of “rate” stats would look like. Raw stats are less subject to the scale issue: while they still don’t account for consistent scale differences between players, they do better account for game-to-game differences for a particular player. A player might have the same AST% in two games while playing five minutes in one and 20 minutes in the other, but if that were true, their raw number of assists would be very different between the games.
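A tiny worked example of that last point, using assists per minute as a simplified stand-in for a rate stat. This is an assumption for illustration only; it is not the AST% formula from basketball-reference.com, and the two games are made up.

```python
def per_minute(total, minutes):
    """Simplified rate stat: production per minute played.

    Stand-in for illustration only; NOT basketball-reference's AST% formula.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes

# Hypothetical two-game sample: (minutes played, assists).
games = [(5, 1), (20, 4)]

rates = [per_minute(assists, minutes) for minutes, assists in games]
raw = [assists for _, assists in games]

print(rates)  # identical rate in both games
print(raw)    # very different raw totals
```

A rate series like this one shows no game-to-game difference at all, while the raw series does, which is why the raw table can pick up things the rate table misses.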
Here’s the raw stats table (note: this table reflects 33 games, while the one above reflects 31):
Some observations on this table:
- There are two obvious anomalies among category leaders: Harrell at .483 for 3P% and Holiday at .500 (!) for BLK (again). Harrell’s is because he’s only attempted threes in 7 games, and they won all three games where he hit one. Holiday’s is because he has 8 blocks across 7 games, and they won all of those games.
There are some interesting negative correlations for positive stats:
- Avdija and Gafford for FGA
- Beal for 3P
- Avdija, Neto, and Beal for 3PA
- Neto for FT and FTA
- Gafford and (sorta) Harrell for ORB
- Kuzma for AST
- Holiday for STL
Most of the negative correlations are pretty weak, and some of them make sense, but one stands out to me: Kuzma’s assists. It appears that when he’s been called upon to facilitate the offense, it hasn’t gone well. Kevin Broom theorized it might be because his assists are correlated with turnovers, and that certainly seems to be the case: the correlation between his AST and the team’s TOV is .374.
Kevin also had a theory on the Gafford and Harrell negative ORBs: that they correlated with poor team shooting. The evidence is a little more mixed on this one: Gafford has a mild negative correlation (-.138) between his ORB and the team’s FG%, but Harrell’s is massive (-.545).
There are some interesting contrasts within categories:
- When Dinwiddie and Kuzma hit lots of threes, the Wizards were likely to win. When Beal did, they were more likely to lose.
- Gafford getting blocks is strongly positive, but Harrell getting blocks is very weakly negative.
- When Gafford or Kentavious Caldwell-Pope gets in foul trouble, it’s bad, but for Neto, Kuzma, Harrell, Beal, and Avdija, committing fouls is mildly good.
- Harrell leads in FGM, FGA, and PTS, which suggests that running more of the offense through him has led to success — this is corroborated by the first table, where he’s the leader (by far) in USG%.
- Gafford having exactly 0 correlation for DRB is just bizarre.
In my correspondence with Kevin, he made a couple of suggestions: that I run correlations with scoring margin, and that I run them at the team level. Here’s the table with correlations to score margin instead of win/loss:
The trends are pretty similar, which is not surprising, though some things are accentuated a bit more or less. The biggest changes in each direction are KCP’s ORB going up from .026 to .266, and Kuzma’s FTM dropping from .329 to .099. I think that I prefer the win/loss correlation, because it seems like the margin correlation might be skewed by the fact that the Wizards’ average margin is -2.8 points, since they’ve had 11 double-digit losses versus 4 double-digit wins.
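To make the two targets concrete, here is a sketch that correlates one stat series against both win/loss and scoring margin for the same games. The eight-game sample is entirely made up, and the win/loss series is simply derived from the sign of the margin.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient (compact version, no input checks)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical eight-game sample (not real Wizards data).
margins = [12, -3, 5, -15, 2, -8, -21, 7]   # final score margin per game
threes = [3, 1, 2, 0, 4, 1, 2, 3]           # a player's made threes per game

# A win is any game with a positive margin.
won = [1 if m > 0 else 0 for m in margins]

r_win = pearson(threes, won)
r_margin = pearson(threes, margins)
print(f"vs win/loss: {r_win:.3f}  vs margin: {r_margin:.3f}")
```

One detail worth keeping in mind when comparing the two: Pearson's coefficient subtracts each series' average before comparing them, so a negative average margin does not move the number by itself. What does matter is that lopsided games carry more weight against margin than they do against win/loss, where a 1-point win and a 30-point win count the same.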
The team table introduces an interesting new element — opponent stats:
The most important of these factors (by a large margin) is opponent assists, with a whopping -.580 correlation. This perhaps makes sense, given that the Wizards’ two biggest defensive weaknesses relative to the league are spot up shooting (where they’re in the 10th percentile) and handoffs (7th percentile).
Incidentally, this also supports my current theory for the Gafford defense enigma: metrics disagree on his defensive value, largely because his box score defensive stats are good, but his on/off stats are not. Perhaps the answer is that he actually plays defense well, which forces opponents to kick to shooters, where they’re relatively more successful. When Harrell is in, opponents try to drive more, though they’re paradoxically less successful: the Wizards are in the 89th percentile defending the pick-and-roll (PNR) ball handler.
One might ask, “How does this compare to other teams?” Here’s the table for the currently division-leading Heat:
Their biggest factor is Team FG% (followed by Opp TRB, a closely related stat). So, their story is pretty straightforward: when they’ve shot well, they’ve generally won.
The data is similar for the Wizards, which makes sense because the team that shoots better wins about 78% of the time in the NBA. For Washington, the strongest indicators of winning and losing are shooting from the floor, three-point percentage and blocking shots. More simply: shooting well and playing good defense.
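A figure like "the team that shoots better wins about 78% of the time" is a simple share of games, and it can be computed along these lines. The function name, the tuple layout, and the four sample games are all hypothetical; this is not the source of the 78% number.

```python
def better_shooter_win_rate(games):
    """Share of games won by the team with the higher FG%.

    `games` is a list of (team_fg_pct, opp_fg_pct, team_won) tuples.
    Games where both teams shot exactly the same FG% are skipped.
    """
    decided = [(t, o, w) for t, o, w in games if t != o]
    if not decided:
        raise ValueError("no games with a FG% difference")
    # A "hit" is a game where the better-shooting side (either team) won.
    hits = sum(1 for t, o, w in decided if (t > o) == bool(w))
    return hits / len(decided)

# Hypothetical four-game sample (not real data).
sample = [
    (0.48, 0.44, True),   # shot better, won
    (0.41, 0.47, False),  # shot worse, lost
    (0.45, 0.46, True),   # shot worse, won anyway
    (0.50, 0.43, True),   # shot better, won
]
print(better_shooter_win_rate(sample))
```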