Tools Used: R, Excel

Skills: Data Modeling · Regression Analysis · Statistical Diagnostics · Storytelling with Data

A personal data case study focused on using regression analysis to uncover the key in-game statistics that impact NBA game outcomes by predicting point differential — how much a team wins or loses by.

NBA Point Differential Model: Understanding What Drives Team Wins

GOAL

What actually drives a team to win, and by how much?

I've always been curious about what separates a good team from a great one.

This project let me combine my love for sports with data analysis to explore that question. Instead of focusing on wins and losses, I wanted to understand why: which performance metrics actually impact a team's success.

My goal was to build a model that not only predicts outcomes but also offers insights for coaches, analysts, or even curious fans.

APPROACH

Modeling What Moves the Scoreboard

I analyzed a dataset of 2,460 NBA games, each with detailed in-game statistics.

Using R, I developed a multiple linear regression model to predict point differential based on variables like rebounds, turnovers, three-pointers made, and shooting efficiency.

To ensure the model’s reliability, I evaluated its assumptions using diagnostic plots and tested whether a Box-Cox transformation of the response would improve the fit. I also used stepwise regression based on AIC to refine the model for interpretability and performance.
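The modeling steps described above can be sketched in R roughly as follows. This is a minimal illustration, not the project's actual code: the data are simulated, and the column names (`def_reb`, `turnovers`, and so on), sample size, and effect sizes are all assumptions made so the example runs on its own. It fits a full multiple linear regression with `lm()` and then lets base R's `step()` add or drop terms by AIC.

```r
# Simulated stand-in for the game data. Column names and effect sizes are
# hypothetical; they are not taken from the project's dataset.
set.seed(42)
n <- 500
games <- data.frame(
  def_reb   = rnorm(n, 33, 5),
  off_reb   = rnorm(n, 10, 3),
  steals    = rnorm(n, 7.5, 2.5),
  turnovers = rnorm(n, 14, 3.5),
  fg3_made  = rnorm(n, 12, 3.5),
  noise_var = rnorm(n)  # unrelated predictor, a candidate for step() to drop
)

# Build a response from the predictors, centre it, and add random noise.
signal <- with(games,
  1.2 * def_reb + 1.0 * off_reb + 1.1 * steals - 1.3 * turnovers + 0.9 * fg3_made
)
games$point_diff <- signal - mean(signal) + rnorm(n, sd = 7)

# Full multiple linear regression on every available predictor.
full_model <- lm(point_diff ~ ., data = games)

# Stepwise selection in both directions, scored by AIC (the default k = 2).
final_model <- step(full_model, direction = "both", trace = 0)

print(formula(final_model))
print(summary(final_model)$adj.r.squared)
```

Calling `summary(final_model)` prints the coefficient table, which is where the sign and size of each predictor's effect would be read from.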

KEY INSIGHTS

Possession Control & Defense Drive Victory

The final model explained approximately 76% of the variance in point differential (adjusted R² of 0.76).

Defensive rebounds, steals, and offensive rebounds were the strongest positive predictors of a team’s margin of victory, while turnovers and high field goal attempts (without corresponding efficiency) had the most negative impact.

Turnovers in particular stood out as the most detrimental to a team’s success.

These results emphasize the importance of both possession control and defensive strength in winning games.

Impact of In-Game Stats on Point Differential
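One way to produce a comparison like the chart named above is to standardize the predictors before fitting, so that each coefficient reads as the change in point differential per one-standard-deviation change in that statistic. The sketch below assumes that approach (the write-up does not say how the chart was built) and uses simulated data with hypothetical column names.

```r
# Simulated data; column names and effect sizes are placeholders.
set.seed(1)
n <- 500
games <- data.frame(
  def_reb   = rnorm(n, 33, 5),
  steals    = rnorm(n, 7.5, 2.5),
  turnovers = rnorm(n, 14, 3.5)
)
games$point_diff <- with(games,
  1.2 * (def_reb - 33) + 1.1 * (steals - 7.5) - 1.3 * (turnovers - 14)
) + rnorm(n, sd = 7)

# Put every predictor on a mean-0, sd-1 scale so coefficients are comparable.
predictors <- c("def_reb", "steals", "turnovers")
scaled <- games
scaled[predictors] <- scale(games[predictors])

std_model <- lm(point_diff ~ def_reb + steals + turnovers, data = scaled)

# Sort from most negative to most positive effect per standard deviation.
std_coefs <- sort(coef(std_model)[predictors])
print(round(std_coefs, 2))
```

A horizontal bar chart of `std_coefs` (for example with `barplot(std_coefs, horiz = TRUE)`) would give a visual ranking of the predictors.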

Q-Q Plot of Standardized Regression Residuals

MODEL DIAGNOSTICS

Validating Regression Assumptions

To ensure the model’s reliability, I conducted diagnostic checks to validate the assumptions of linear regression.

The residuals vs. fitted values plot shows a fairly even scatter around zero, indicating that the relationship between the predictors and the response variable is approximately linear and that the residuals have constant variance.

The Q–Q plot further confirms that the residuals are roughly normally distributed, which supports the validity of statistical inferences made from the model.

These diagnostics gave me confidence that the model is well-behaved and that its insights are trustworthy — a critical step when building models that will drive real decisions.
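The two checks described above correspond to standard plots in base R. The following is a hedged sketch on simulated data (the variable names are placeholders, not the project's columns). It draws a residuals vs. fitted plot and a normal Q-Q plot of the standardized residuals.

```r
# Simulated data; column names are hypothetical.
set.seed(7)
n <- 500
games <- data.frame(
  turnovers = rnorm(n, 14, 3.5),
  def_reb   = rnorm(n, 33, 5)
)
games$point_diff <- 1.2 * (games$def_reb - 33) -
  1.3 * (games$turnovers - 14) + rnorm(n, sd = 7)

model <- lm(point_diff ~ turnovers + def_reb, data = games)

fitted_vals <- fitted(model)
std_resid   <- rstandard(model)  # residuals scaled to unit variance

# Write both plots to a temporary PDF so the script also runs
# non-interactively; omit pdf()/dev.off() to draw on screen instead.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Residuals vs. fitted: look for an even band around zero with no funnel shape.
plot(fitted_vals, resid(model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted Values")
abline(h = 0, lty = 2)

# Q-Q plot: points close to the reference line suggest roughly normal residuals.
qqnorm(std_resid, main = "Q-Q Plot of Standardized Regression Residuals")
qqline(std_resid)

dev.off()
```

`plot(model, which = 1:2)` produces the same two diagnostics in a single call, using R's built-in styling.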

Residuals vs. Fitted Values

MODEL COMPARISON

Balancing Accuracy and Interpretability

To test whether a transformation would improve model performance, I applied a Box-Cox transformation to the response variable and refit the model.

While the transformed model showed a marginally higher adjusted R² (0.7606 vs. 0.7599), it also came with higher AIC and BIC scores, and its residual standard error is measured on the transformed scale, so it is not directly comparable to the original model's. Since the transformation added complexity without meaningful gains in interpretability or predictive power, I chose to move forward with the original model.

This comparison reflects a common product decision-making tradeoff: balancing precision with clarity. Sometimes the simpler, more interpretable solution is more valuable, especially when sharing insights across cross-functional teams.
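A Box-Cox refit like the one described above could be set up as follows. Two things here are assumptions rather than details from the project: the data are simulated with placeholder column names, and the response is shifted by a constant so that it is strictly positive. Box-Cox requires a positive response, and point differential can be negative; the write-up does not say how this was handled. `boxcox()` comes from the MASS package, which ships with standard R installations.

```r
library(MASS)

# Simulated data; column names are hypothetical.
set.seed(11)
n <- 500
games <- data.frame(
  turnovers = rnorm(n, 14, 3.5),
  def_reb   = rnorm(n, 33, 5)
)
games$point_diff <- 1.2 * (games$def_reb - 33) -
  1.3 * (games$turnovers - 14) + rnorm(n, sd = 7)

# Assumption: shift the response so every value is positive (minimum becomes 1).
shift <- 1 - min(games$point_diff)
games$pd_shifted <- games$point_diff + shift

original <- lm(point_diff ~ turnovers + def_reb, data = games)

# Profile the log-likelihood over a grid of lambda values and take the maximizer.
bc <- boxcox(pd_shifted ~ turnovers + def_reb, data = games,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the Box-Cox transform (the log is the limiting case at lambda = 0).
games$pd_bc <- if (abs(lambda_hat) < 1e-8) {
  log(games$pd_shifted)
} else {
  (games$pd_shifted^lambda_hat - 1) / lambda_hat
}

transformed <- lm(pd_bc ~ turnovers + def_reb, data = games)

print(lambda_hat)
print(c(original    = summary(original)$adj.r.squared,
        transformed = summary(transformed)$adj.r.squared))
```

An estimated lambda close to 1 means the transform is nearly linear, which would itself be a reason to keep the untransformed response.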

Comparison of Original vs. Box-Cox Transformation Model

Metric             Original     Box-Cox
AIC                1.705e+04    1.906e+04
BIC                1.715e+04    1.916e+04
Res. Std. Error    7.734e+00    1.164e+01
Adjusted R²        7.599e-01    7.606e-01
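The four metrics listed above can all be read off fitted `lm` objects with base R. Below is a small sketch using a hypothetical helper, `model_metrics()`, applied to two stand-in models on simulated data; neither the helper nor the column names come from the project, and a log of the shifted response stands in for the Box-Cox model. AIC, BIC, and residual standard error depend on the scale of the response, so they are only directly comparable between models fitted to the same response.

```r
# Hypothetical helper: collect the four comparison metrics from one model.
model_metrics <- function(model) {
  s <- summary(model)
  c(AIC           = AIC(model),
    BIC           = BIC(model),
    res_std_error = s$sigma,
    adj_r_squared = s$adj.r.squared)
}

# Simulated data; column names are placeholders.
set.seed(3)
n <- 500
games <- data.frame(
  turnovers = rnorm(n, 14, 3.5),
  def_reb   = rnorm(n, 33, 5)
)
games$point_diff <- 1.2 * (games$def_reb - 33) -
  1.3 * (games$turnovers - 14) + rnorm(n, sd = 7)

# Stand-in for a transformed response: log of the shifted point differential.
games$pd_log <- log(games$point_diff - min(games$point_diff) + 1)

models <- list(
  Original    = lm(point_diff ~ turnovers + def_reb, data = games),
  Transformed = lm(pd_log ~ turnovers + def_reb, data = games)
)

# One column per model, one row per metric.
comparison <- sapply(models, model_metrics)
print(signif(comparison, 4))
```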

REFLECTION

This project pushed me to think critically about how data modeling connects to real-world outcomes, and how much thought goes into ensuring a model is both statistically sound and easy to communicate. It taught me the importance of balancing accuracy with interpretability, especially in fast-paced environments like product management where insights must be trusted and acted on quickly.

If I were to iterate on this project, I’d love to explore contextual factors like home vs. away games, back-to-back schedules, or player-specific performance, and eventually turn this into an interactive dashboard for sports analysts and fans.


NBA Game Breakdown

What Really Wins Games?

Solo Project
Data Analytics & Modeling
Dec 2024 - Jan 2025