The NBA Through Advanced Statistics and Regression

In the sports MBA program, we’re often encouraged to build projects around real-life sports data. Back in our Statistics class, Dom Lucq and I tried to discover how NBA advanced stats lead to wins, playoff appearances and championships, using 13 seasons of data from the NBA’s media stats site.

The first insight comes from the correlation matrix. Correlation coefficients always fall between -1 and 1, with 1 indicating a perfect positive linear relationship, -1 a perfect negative linear relationship and 0 no linear relationship.
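For anyone who wants to see where those bounds come from, here is a minimal Python sketch of the Pearson correlation coefficient. The ratings and win totals below are made-up illustrative numbers, not values from our dataset, and `pearson_r` is just a name chosen for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical offensive ratings (NOT real data).
off_rating = [100.0, 102.0, 104.0, 106.0, 108.0]

# Wins that rise in lockstep with the rating: r is close to 1.
r_up = pearson_r(off_rating, [30, 35, 40, 45, 50])

# Wins that fall in lockstep with the rating: r is close to -1.
r_down = pearson_r(off_rating, [50, 45, 40, 35, 30])

# A symmetric curve around the middle rating: a clear pattern,
# but not a linear one, so r comes out at about 0.
r_curve = pearson_r(off_rating, [(x - 104.0) ** 2 for x in off_rating])

print(r_up, r_down, r_curve)
```

The last case is worth noticing: a coefficient near 0 rules out a straight-line relationship, not every relationship.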

Offensive rating is more closely correlated to wins than defensive rating (0.675 to -0.611; the defensive figure is negative because a lower defensive rating is better). The order flips in the postseason: there is a 0.470 correlation between offensive rating and playoffs against -0.597 for defensive rating, and a 0.155 correlation between offensive rating and championships against -0.214 for defensive rating. That lends credence to the oft-repeated saying that offense wins games but defense wins championships. Keep in mind that the championship sample is much smaller, since there were only 13 champions in this span but 208 playoff teams.

It’s interesting to note the stats that are negatively correlated to wins. They include team turnover percentage (-0.376), pace (-0.108) and offensive rebound percentage (-0.021). These carried over for playoffs (-0.295 turnovers, -0.097 pace, -0.043 offensive rebounds) and championships (-0.028 turnovers, -0.037 pace, -0.007 offensive rebounds).

This is not to say that teams should turn the ball over, play slow and fail to get offensive rebounds. Perhaps it means more dynamic offenses turn the ball over as a function of having playmakers who can do more things. Also, maybe some of the slower teams are more efficient and perhaps the teams who grab a smaller percentage of offensive rebounds do a better job getting back on defense. That’s particularly true since overall rebound percentage is very much positively correlated with wins, as is assist-to-turnover ratio. True shooting percentage is key as well, with a 0.593 correlation to wins, a 0.410 correlation to playoff appearances and a 0.163 correlation to championships.

In our first go at the project, Dom and I included as many variables as we thought might prove useful. However, we have since learned about multicollinearity, a situation whereby two or more independent variables are closely correlated and thus explain the same relationship. For example, we included offensive rating, defensive rating and net rating, although net rating is simply offensive rating minus defensive rating. We also included rebound percentage, offensive rebound percentage and defensive rebound percentage, when the latter two explain the former. This can be seen on the correlation matrix, where these variables show high degrees of correlation with one another.

After eliminating duplicate variables from the model and taking out the lockout-shortened 2011-12 season, I used XLMiner’s multiple linear regression function to create the following model with “Wins” as the dependent variable: Wins = 51.935 + 2.319(NetRating) + 0.289(AssistRatio) + 25.439(Rebound%) – 34.813(TeamTurnover%) + 0.924(TrueShooting%) – 0.255(Pace). With an adjusted r-squared value of 0.952, this model explains 95.2 percent of the variation in team wins. However, almost all of that comes from net rating, as a model with only net rating and the constant has an adjusted r-squared value of 0.948. Take net rating out of the model, and the new model explains only 67.7 percent of the variation in wins.
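To make the equation easier to play with, here is a short Python sketch that plugs a team’s numbers into it. The coefficients are the ones quoted above; everything else is an assumption for illustration. In particular, the example team is invented, and its inputs (percentages entered as fractions) are a guess at the scale, so any real use should match whatever scale the original spreadsheet used.

```python
# Coefficients as quoted in the regression equation above.
INTERCEPT = 51.935
COEFFICIENTS = {
    "net_rating": 2.319,
    "assist_ratio": 0.289,
    "rebound_pct": 25.439,
    "turnover_pct": -34.813,
    "true_shooting_pct": 0.924,
    "pace": -0.255,
}

def predicted_wins(team):
    """Intercept plus each coefficient times the team's value for that stat."""
    return INTERCEPT + sum(
        coef * team[name] for name, coef in COEFFICIENTS.items()
    )

# Hypothetical team (NOT from the dataset); the input scale is assumed.
example_team = {
    "net_rating": 3.0,
    "assist_ratio": 17.0,
    "rebound_pct": 0.51,
    "turnover_pct": 0.15,
    "true_shooting_pct": 0.55,
    "pace": 93.0,
}

# Holding everything else fixed, one more point of net rating is worth
# exactly the net rating coefficient in predicted wins.
improved_team = dict(example_team, net_rating=example_team["net_rating"] + 1.0)
gain = predicted_wins(improved_team) - predicted_wins(example_team)

print(round(predicted_wins(example_team), 1), round(gain, 3))
```

Reading the coefficients this way only works one variable at a time; it says nothing about what happens when several stats move together, which they usually do.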

I chose two teams close to my heart to check the model by hand, the 2006-07 Suns and the 2009-10 Suns. The 2006-07 edition was projected to win 58.895 games and won 61, so it beat its projection by about 2.1 wins, and the 2009-10 team won 54 after being predicted to win 53.657.

Looking at the entire dataset, the model’s largest positive error came with the 2005-06 Portland Trail Blazers, who won 21 games after being predicted to win 13. Third on that list are the 2006-07 Dallas Mavericks, who won 67 games despite being predicted to win 60 before flaming out in the first round of the playoffs against the Golden State Warriors. The model could see the Mavericks weren’t quite as good as a typical 67-win behemoth.

On the other end of the spectrum, no team fell short of expectations quite like the 2007-08 Toronto Raptors, who won 41 after being predicted for 50.5. Their residual of -9.5 was about 2.5 wins worse than the next worst projection. The 2005-06 Raptors ranked third on this list at 6.7 wins worse than expectations.

The 2008-09 Boston Celtics were just 0.0376 wins better than their prediction by winning 62 games and the 2003-04 Cleveland Cavs were 0.0238 under during their 35-win season better known as LeBron James’ rookie year, making them the teams the model predicted best. Overall 94 teams (26.4 percent) came within one win of their projection and 181 (50.8 percent) were within two wins.
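The over- and under-performance figures in this section are residuals, meaning actual wins minus predicted wins. Here is a minimal Python sketch of that bookkeeping using only teams whose actual and predicted wins are both quoted in the post. The total of 356 team-seasons is taken from the sample size mentioned later in the post and is assumed to be the denominator behind the percentages.

```python
def residual(actual_wins, predicted_wins):
    """Positive means the team beat its projection, negative means it fell short."""
    return actual_wins - predicted_wins

# (actual wins, predicted wins) as quoted in the post.
cases = {
    "2006-07 Suns": (61, 58.895),
    "2009-10 Suns": (54, 53.657),
    "2007-08 Raptors": (41, 50.5),
}

residuals = {
    team: residual(actual, predicted)
    for team, (actual, predicted) in cases.items()
}

# Assumed denominator: the 356 team-seasons cited later in the post.
TOTAL_TEAM_SEASONS = 356
share_within_one = 94 / TOTAL_TEAM_SEASONS
share_within_two = 181 / TOTAL_TEAM_SEASONS

for team, value in residuals.items():
    print(team, round(value, 3))
print(round(100 * share_within_one, 1), round(100 * share_within_two, 1))
```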

For the original project, we performed a number of two-proportion and two-group mean comparisons and discovered some noteworthy items. In the original dataset, which didn’t include 2012-13 data but did include the lockout-shortened campaign, we found that playoff teams scored 7.81 points per 100 possessions more than non-playoff teams. We are 95 percent confident playoff teams are 7.2-8.5 points per 100 better than non-playoff squads. We are also 95 percent confident playoff teams grab between 0.94-1.71 percent more defensive rebounds than non-playoff teams. In addition, a championship defense is 2.4-6.4 percent more efficient than a defense that doesn’t win a title (there goes that “defense wins championships” thing again).
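For readers curious how a range like "7.2-8.5 points" comes about, here is a rough Python sketch of a 95 percent confidence interval for the difference between two group means. It uses a normal approximation with unequal variances, which is an assumption on my part and not necessarily the exact procedure from our class project. The two lists are invented placeholder ratings, far too small for the approximation to be trustworthy, and are there only so the code runs.

```python
import math
import statistics

def mean_difference_ci(group_a, group_b, z=1.96):
    """Approximate CI for mean(group_a) - mean(group_b).

    z=1.96 is the usual normal multiplier for 95 percent confidence.
    """
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference, allowing unequal variances.
    std_err = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return diff - z * std_err, diff + z * std_err

# Hypothetical ratings per 100 possessions (NOT real data).
playoff_teams = [108.1, 106.4, 109.0, 107.2, 105.9, 107.8]
non_playoff_teams = [101.3, 99.8, 102.5, 100.1, 98.7, 101.0]

low, high = mean_difference_ci(playoff_teams, non_playoff_teams)
print(round(low, 2), round(high, 2))
```

If the whole interval sits above zero, as it does in the findings above, the gap between the two groups is unlikely to be a fluke of the sample.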

We also found some notable insights in the 12-year data set through descriptive statistics. The difference between the best and worst teams in net rating was 27 points, so if the 2007-08 Boston Celtics got into a time machine to play a series against the 2011-12 Charlotte Bobcats, they would be expected to win each game by 27 points on average. The standard deviation of net rating is 4.945, so roughly 95 percent of teams should fall within two standard deviations of the average, between -9.88 and 9.90. That is about what happened, with only two teams above that range and nine below it in a sample of 356 data points. Those historically bad Bobcats were three standard deviations worse than average.
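The -9.88 to 9.90 range is the average plus or minus two standard deviations. Here is a small Python sketch of that arithmetic. The standard deviation and the counts come from the paragraph above; the mean of 0.01 is not stated anywhere and is inferred here as the midpoint of the quoted range.

```python
def two_sd_band(mean, sd):
    """Range covering two standard deviations on either side of the mean."""
    return mean - 2 * sd, mean + 2 * sd

NET_RATING_SD = 4.945
NET_RATING_MEAN = 0.01  # assumption: midpoint of the quoted -9.88 to 9.90 range

low, high = two_sd_band(NET_RATING_MEAN, NET_RATING_SD)

# Teams outside the band, as counted in the post: two above, nine below.
teams_outside = 2 + 9
share_outside = teams_outside / 356

print(round(low, 2), round(high, 2), round(100 * share_outside, 1))
```

For a bell-shaped distribution, about 5 percent of values land outside a two-standard-deviation band, so the share computed here is in the right neighborhood.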

The lowest rebounding percentage came from the 2009-10 Golden State Warriors, as they grabbed a mere 44.4 percent of the available boards, 3.77 standard deviations below average. The 2011-12 Chicago Bulls were best at 53.9 percent, 2.61 standard deviations above average. To put the discrepancy in context, if there were 100 rebound opportunities per game (typically there are somewhere between 80 and 90), and each one creates a scoring opportunity, the Warriors averaged approximately 11 fewer scoring opportunities per game than their opponents, while the Bulls averaged about eight more per game.
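The scoring-opportunity comparison is simple arithmetic: whatever share of rebounds a team does not grab goes to its opponent. A minimal Python sketch, using the post’s simplifying assumption of 100 rebound opportunities per game (`rebound_margin` is a name made up for this example):

```python
def rebound_margin(rebound_pct, opportunities=100):
    """Rebounds a team grabs minus rebounds its opponents grab."""
    own = opportunities * rebound_pct / 100
    opponent = opportunities - own
    return own - opponent

warriors_margin = rebound_margin(44.4)  # 2009-10 Warriors
bulls_margin = rebound_margin(53.9)     # 2011-12 Bulls

print(round(warriors_margin, 1), round(bulls_margin, 1))
```

With the more typical 80 to 90 opportunities per game, passing `opportunities=85` shrinks both margins proportionally.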

The lowest true shooting percentage came from the 2002-03 Denver Nuggets at 46.9 percent, and the highest was from the 2006-07 Phoenix Suns at 59 percent. Based on two standard deviations, about 95 percent of the results should fall between 49.25 percent and 57.05 percent (our low and high are not within the range, as they are 3.2 and 3.0 standard deviations from the average, respectively). Additionally, the Suns’ next three seasons were the next three highest true shooting percentages at 59 percent, 58.4 percent and 58.5 percent, respectively. Finally, we are 95 percent confident that the league-wide average true shooting percentage in a future year would fall between 52.95 percent and 53.35 percent, an incredibly narrow range.
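The numbers in this paragraph can be reproduced with a few lines of Python. The mean (53.15) and standard deviation (1.95) are not stated directly; they are inferred here from the quoted two-standard-deviation range, and the sample size of 356 is assumed to be the same one cited earlier. The interval is computed with the usual normal multiplier of 1.96, which is also an assumption about how it was originally derived.

```python
import math

# Inferred from the quoted range of 49.25 to 57.05 percent (assumptions).
TS_MEAN = (49.25 + 57.05) / 2      # midpoint of the range
TS_SD = (57.05 - 49.25) / 4        # range spans four standard deviations
SAMPLE_SIZE = 356                  # assumed: team-seasons in the data set

def z_score(value, mean, sd):
    """How many standard deviations a value sits from the mean."""
    return (value - mean) / sd

nuggets_z = z_score(46.9, TS_MEAN, TS_SD)  # 2002-03 Nuggets
suns_z = z_score(59.0, TS_MEAN, TS_SD)     # 2006-07 Suns

# 95 percent confidence interval for the average itself: the standard
# error shrinks with the square root of the sample size.
std_err = TS_SD / math.sqrt(SAMPLE_SIZE)
ci_low = TS_MEAN - 1.96 * std_err
ci_high = TS_MEAN + 1.96 * std_err

print(round(nuggets_z, 1), round(suns_z, 1))
print(round(ci_low, 2), round(ci_high, 2))
```

The interval is so narrow because it describes the average across hundreds of team-seasons, not the spread of individual teams, which is what the two-standard-deviation range captures.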

As you watch the 2013-14 NBA season that tips off Tuesday night, perhaps some of these statistical truths can help inform your viewing.
