
NBA Point Differential Model: Understanding What Drives Team Wins
A personal data case study that uses regression analysis to identify the in-game statistics that most affect NBA game outcomes by predicting point differential: how much a team wins or loses by.
Tools Used: R, Excel
Skills: Data Modeling · Regression Analysis · Statistical Diagnostics · Storytelling with Data
GOAL
What actually drives a team to win, and by how much?
I've always been curious about what separates a good team from a great one.
This project let me combine my love of sports with data analysis to explore that question. Instead of focusing only on wins and losses, I wanted to understand why teams win: which performance metrics actually affect a team's success.
My goal was to build a model that not only predicts outcomes but also offers insights for coaches, analysts, or even curious fans.
APPROACH
Modeling What Moves the Scoreboard
I analyzed a dataset of 2,460 NBA games, each with detailed in-game statistics.
Using R, I developed a multiple linear regression model to predict point differential from variables such as rebounds, turnovers, three-pointers made, and shooting efficiency.
To check the model's reliability, I evaluated its assumptions using diagnostic plots and tested whether a Box-Cox transformation of the response would improve the fit. I also used stepwise regression based on AIC to refine the model for interpretability and performance.
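As a rough illustration of the fitting step, here is a minimal sketch of a multiple linear regression and its adjusted R². It is written in Python on synthetic data and is not the project's R code: the two predictors (`dreb`, `tov`), their coefficients, and the noise level are made-up stand-ins, and the helper names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def adjusted_r2(rows, y, beta):
    """Adjusted R^2 = 1 - (RSS / (n - p - 1)) / (TSS / (n - 1))."""
    n, p = len(y), len(beta) - 1
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))


# Synthetic stand-in data (assumption): point differential driven by
# defensive rebounds and turnovers plus random noise.
random.seed(0)
rows, y = [], []
for _ in range(500):
    dreb = random.gauss(33, 5)
    tov = random.gauss(14, 4)
    rows.append((dreb, tov))
    y.append(-20 + 1.2 * dreb - 1.4 * tov + random.gauss(0, 7))

beta = fit_ols(rows, y)
adj_r2 = adjusted_r2(rows, y, beta)
print("intercept, dreb, tov:", [round(b, 2) for b in beta])
print("adjusted R^2:", round(adj_r2, 3))
```

The normal equations are used here only to keep the sketch dependency-free. In R the same fit is a single `lm()` call, and `step()` handles AIC-based stepwise selection.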


KEY INSIGHTS
Possession Control & Defense Drive Victory
The final model explained approximately 76% of the variance in point differential (adjusted R² of 0.76).
Defensive rebounds, steals, and offensive rebounds were the strongest positive predictors of a team's margin of victory, while turnovers and high field goal attempts (without corresponding efficiency) had the most negative impact.
Turnovers in particular stood out as the most detrimental to a team's success.
These results emphasize the importance of both possession control and defensive strength in winning games.

Figure: Impact of In-Game Stats on Point Differential


MODEL DIAGNOSTICS
Validating Regression Assumptions
To check the model's reliability, I ran diagnostic checks on the assumptions of linear regression.
The residuals vs. fitted values plot shows a fairly even scatter around zero, which suggests that the relationship between the predictors and the response variable is approximately linear and that the residuals have roughly constant variance.
The Q-Q plot shows that the standardized residuals are roughly normally distributed, which supports the statistical inferences drawn from the model.
These diagnostics gave me confidence that the model is well-behaved and that its insights are trustworthy, a critical step when building models that will drive real decisions.
Figure: Residuals vs. Fitted Values
Figure: Q-Q Plot of Standardized Regression Residuals
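The two diagnostic plots can also be summarized numerically. The sketch below (Python, synthetic residuals, hypothetical helper names) builds the points of a normal Q-Q plot and computes two rough checks: the correlation between sample and theoretical quantiles, and the ratio of residual spread between the upper and lower halves of the fitted values. It assumes a simple z-score standardization, not the leverage-adjusted standardized residuals that R reports.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    n = len(residuals)
    mu, sd = mean(residuals), pstdev(residuals)
    # Simple z-scores (assumption); not leverage-adjusted.
    sample = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sample


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spread_ratio(fitted, residuals):
    """Residual SD for the upper half of fitted values over the lower half."""
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return pstdev(ordered[half:]) / pstdev(ordered[:half])


# Synthetic stand-ins (assumption): well-behaved normal residuals.
random.seed(1)
fitted = [random.gauss(0, 12) for _ in range(1000)]
residuals = [random.gauss(0, 7.7) for _ in range(1000)]

theoretical, sample = qq_points(residuals)
qq_corr = pearson(theoretical, sample)
ratio = spread_ratio(fitted, residuals)
print("Q-Q correlation:", round(qq_corr, 4))
print("spread ratio (high/low fitted):", round(ratio, 3))
```

A Q-Q correlation close to 1 corresponds to points hugging the reference line, and a spread ratio near 1 corresponds to the even scatter described above. Neither number replaces looking at the plots.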
MODEL COMPARISON
Balancing Accuracy and Interpretability
To test whether a transformation would improve model performance, I applied a Box-Cox transformation to the response variable and refit the model.
The transformed model showed a marginally higher adjusted R² (0.7606 vs. 0.7599), but it also came with higher AIC and BIC scores and a larger residual standard error. Those last three metrics are computed on the transformed response's scale, so they are not strictly comparable with the original model's values. Since the transformation added complexity without meaningful gains in interpretability or predictive power, I chose to move forward with the original model.
This comparison reflects a common product decision-making tradeoff: balancing precision with clarity. Sometimes the simpler, more interpretable solution is more valuable, especially when sharing insights across cross-functional teams.
Comparison of Original vs. Box-Cox Transformation Model
Metrics            Original     Box-Cox
AIC                1.705e+04    1.906e+04
BIC                1.715e+04    1.916e+04
Res. Std. Error    7.734e+00    1.164e+01
Adjusted R²        7.599e-01    7.606e-01
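For readers who want to see how these metrics are built, here is a minimal Python sketch of the Box-Cox transform and of AIC/BIC for a Gaussian linear model. Several things are assumptions and not taken from the project: the shift that makes point differential positive (Box-Cox is only defined for positive values, and the shift actually used is not stated), the coefficient count `n_coef`, and all helper names. The last helper shows the standard Jacobian adjustment that puts a transformed model's AIC back on the original response scale.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1) / lam, or log(y) when lam is 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def gaussian_loglik(rss, n):
    """Maximized log-likelihood of a linear model with normal errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic_bic(rss, n, n_coef):
    """AIC and BIC; parameters are the coefficients plus the error variance."""
    k = n_coef + 1
    loglik = gaussian_loglik(rss, n)
    return -2 * loglik + 2 * k, -2 * loglik + math.log(n) * k


def aic_on_original_scale(aic_transformed, y_positive, lam):
    """Add the Box-Cox Jacobian term so AICs are comparable across scales."""
    log_jacobian = (lam - 1) * sum(math.log(v) for v in y_positive)
    return aic_transformed - 2 * log_jacobian


# Point differential can be negative, so shift it before transforming.
differentials = [-12, 5, 20, -3]        # toy values (assumption)
shift = 1 - min(differentials)          # hypothetical shift choice
shifted = [d + shift for d in differentials]
transformed = box_cox(shifted, 0.5)

# Rebuild AIC/BIC from the table's residual standard error for the original
# model, with a hypothetical coefficient count (intercept included).
n_games = 2460
n_coef = 10                             # hypothetical, not from the source
rse = 7.734
rss = rse ** 2 * (n_games - n_coef)
aic, bic = aic_bic(rss, n_games, n_coef)
print("AIC:", round(aic), "BIC:", round(bic))
```

With the reported residual standard error and sample size, the recomputed AIC lands on the same 1.7e+04 scale as the table; the exact value depends on the true number of coefficients. In R, `MASS::boxcox()` estimates the transformation parameter and `AIC()`/`BIC()` report these metrics directly.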
REFLECTION
This project pushed me to think critically about how data modeling connects to real-world outcomes, and about how much thought goes into ensuring a model is both statistically sound and easy to communicate. It taught me the importance of balancing accuracy with interpretability, especially in fast-paced environments like product management where insights must be trusted and acted on quickly.
If I were to iterate on this project, I'd love to explore contextual factors like home vs. away games, back-to-back schedules, or player-specific performance, and eventually turn this into an interactive dashboard for sports analysts and fans.