Understanding Basketball Through Data
From the first box score to a scouting report a coaching staff can run a game plan on — a practical, end-to-end guide to how modern basketball is measured, modelled and explained.
Why Basketball Analytics Matters
Basketball is a high-frequency, high-possession game where every trip down the floor is a decision with a measurable expected value. That combination makes it the team sport where good measurement most directly, and most quickly, changes how the game is played.
Unlike low-scoring invasion games, basketball gives you roughly 100 possessions per team per game — a large sample, every night. Points are frequent, the floor is small and well-bounded, and the same actions (a pick-and-roll, a closeout, a corner three) repeat thousands of times a season. That density is why basketball analytics moved from the fringe to the front office faster than almost any other sport: the data is rich, the patterns are stable, and the financial stakes of a single roster or shot-selection decision are enormous.
The defining insight of the analytics era is brutally simple and it reshaped the sport: not all shots are worth the same. A long two-pointer and a corner three are taken from almost the same spot, but one is worth 50% more when it goes in. Once teams started valuing shots by their expected points rather than counting them equally, the mid-range jumper collapsed and the three-point revolution began. Analytics didn't just describe basketball — it changed what the game looks like.
From efficiency to the three-point revolution
The intellectual foundation was laid by Dean Oliver's Basketball on Paper (2004), which reframed the game around possessions and efficiency rather than raw per-game totals. Scoring 110 points means little until you know how many possessions it took. From that single shift — measuring points per possession — flow nearly all modern metrics: offensive and defensive ratings, true shooting, the Four Factors, and ultimately the shot-value logic that pushed teams to the arc and the rim while abandoning the mid-range.
Who this handbook is for
Aspiring analysts
You will get the conceptual scaffolding and the practical Python/visualization workflow used in pro and college programs — enough to build a portfolio that gets you hired.
Scouts & front offices
You will learn how metrics become player profiles, why role and context matter, and how to fuse the eye test with impact data and on/off numbers.
Coaches
You will see how raw feeds become a scouting report you can act on — personnel tendencies, coverage plans, where to attack and where you're exposed.
The curious fan
You'll never watch a game the same way. TS%, usage, RAPM and shot quality will stop being broadcast jargon and start being lenses.
The promise & the limit
Data does not replace the coach's eye or the scout's instinct. It disciplines them — it tells you when intuition is fighting the evidence, sizes the sample you're reasoning from, and turns "he's a good shooter" into "he's in the 88th percentile on catch-and-shoot threes off the right wing." The best practitioners are bilingual: fluent in basketball and in data, and humble about both.
How to read this handbook
The four parts build on each other. Part 1 establishes what data is and how it's collected. Part 2 covers the advanced metrics and models that turn data into meaning. Part 3 is the hands-on toolkit — workflow, Python, visualization and video. Part 4 ties it together with realistic, worked case studies, ending in a full scouting report. Read it linearly or jump via the sidebar; cross-references are linked throughout.
1.1 The Role of a Basketball Data Analyst
"Basketball data analyst" is not one job. It's a family of roles that share a skill set but point at very different decisions — from the draft room to the film session to the trade deadline.
At its core the role is translation. On one side sits high-volume tracking and play-by-play data; on the other sit basketball people — coaches, GMs, scouts — who need a clear answer to a concrete question. The analyst turns "we have a season of tracking data" into "their drop coverage gives up the pull-up three; we should hunt their center in space." The value is never the chart. It is the decision the chart changes.
| Role | Primary question | Typical employer |
|---|---|---|
| Advance / opponent scout | How does the next opponent play, and how do we beat them? | Coaching staff |
| Player personnel / draft analyst | Who should we draft, sign or trade for, and at what value? | Front office |
| Coaching / performance analyst | Is our offense/defense getting good possessions? | Coaching staff |
| Player development analyst | What should this player work on, and is it improving? | Development staff |
| Data scientist / engineer | How do we build the models and pipelines everyone uses? | Teams, vendors, media |
| Betting / quant analyst | Is the market price wrong, and by how much? | Sportsbooks, syndicates |
What the job actually demands
Three competencies sit underneath all of these: basketball understanding (you must know why a number means something tactically), technical skill (querying, cleaning and modelling — typically Python, SQL and a BI tool), and communication (a brilliant model nobody acts on is worthless). Beginners over-index on the technical axis. In a team, a clear one-slide answer beats an elegant notebook every time.
Mindset
Think of yourself as decision-support, not a stats provider. Before any task, ask: "Whose decision am I trying to improve, and what would change their mind?" If you can't answer that, you're not ready to open the data.
1.2 Data Literacy and a Scientific Approach to Data
Data literacy isn't memorising metrics. It's the habit of asking where a number came from, what it can and cannot say, and how confident you should be — the scientific method applied to a basketball court.
Question
Start with a basketball question, not a dataset. "Are we generating efficient shots in the half-court?" is researchable. "Let's look at the data" is not.
Hypothesis
State what you expect and why. "I think our poor offense is a shot-selection problem, not a talent problem." A hypothesis you can be wrong about is the engine of good analysis.
Evidence
Gather the right data at the right grain. Match the metric to the question and be honest about sample size.
Test
Compare against a baseline — league average, the opponent, your own season. A number without a reference point is meaningless.
Communicate
Deliver the answer in the language of the decision-maker, with the uncertainty attached.
Three habits that separate professionals from dashboards
Per-possession, not per-game. Basketball's foundational adjustment is pace. A team scoring 115 points isn't necessarily good — if it plays fast, that may be below average per possession. Always think in points per 100 possessions, not per game. This one habit prevents more bad conclusions than any other.
Sample size and stabilization. Different stats stabilize at different rates. Usage and shot volume settle quickly; three-point percentage is famously noisy and needs a large sample before you trust it. A hot week of shooting is usually variance, not a new skill. Professionals distrust extreme rates from small samples and favour multi-game windows.
On/off and context. A lineup that outscores opponents by 15 per 100 might be carried by a teammate, a weak bench it faces, or pure noise over 80 minutes. Raw on/off is a starting point, not a conclusion — it must be adjusted for who else is on the floor (the whole motivation for RAPM in 2.5).
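To put a number on the sample-size habit, here is a minimal sketch that treats every three-point attempt as an independent coin flip (a simplifying assumption) and uses the normal approximation to the binomial. The function name and the attempt counts are illustrative, not taken from any real player.

```python
import math

def three_pt_interval(makes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Rough 95% interval for a shooting percentage (normal approximation)."""
    p = makes / attempts
    se = math.sqrt(p * (1 - p) / attempts)  # binomial standard error
    return (p - z * se, p + z * se)

# Hypothetical samples: the same 40% observed rate at three sample sizes
for makes, attempts in [(8, 20), (80, 200), (320, 800)]:
    low, high = three_pt_interval(makes, attempts)
    print(f"{makes}/{attempts}: {low:.3f} to {high:.3f}")
```

On 20 attempts the interval runs from roughly 19% to 61%, which is compatible with almost any shooter; on 800 attempts it narrows to roughly 37% to 43%.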
Common analytical traps
Per-game illusion — ignoring pace. Small-sample shooting — over-reading a few good/bad nights from three. Outcome bias — calling a good process bad because the shots didn't fall. Counting-stat worship — rewarding volume (points, rebounds) without efficiency or context.
1.3 Data Types and Data Collection Systems
Everything downstream — every metric, model and report — is constrained by the data underneath it. Knowing the layers of basketball data, and their blind spots, is the single most useful piece of literacy in the field.
The four layers of basketball data
1 · Box-score data
The traditional line: points, rebounds, assists, steals, FG%. Universal and free, but context-poor — it can't tell a contested shot from an open one, or a hockey assist from a kick-out.
2 · Play-by-play (PBP) data
Every game event in sequence with score, clock and players on court. The basis for lineup analysis, on/off, pace and possession-level ratings. The workhorse of public analytics.
3 · Tracking data
The position of all 10 players and the ball, sampled ~25–60 times per second. Captures the off-ball game — spacing, screens, closeouts, defensive matchups, shot contests.
4 · Biometric / load data
From wearables and force plates: workload, jump load, fatigue, sleep. The fuel for sports-science and availability decisions.
Play-by-play and the possession
PBP data is where most public analysis lives. Each row is an event — a shot, foul, rebound, substitution — with a timestamp, the score, and (in good feeds) the five players on the floor for each team. From this you reconstruct possessions, the atomic unit of the modern game, and compute offensive/defensive ratings, pace, lineup splits and on/off. It is rich enough for most metrics but blind to anything off the ball: it records that a shot was missed, not that a perfect closeout forced it.
Possessions ≈ FGA − ORB + TOV + 0.44 × FTA
0.44 approximates the share of free-throw attempts that end a possession (it accounts for and-ones, technicals, etc.)
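A common box-score estimate of possessions is FGA − ORB + TOV + 0.44 × FTA, and ratings are simply points scaled to 100 of those possessions. The sketch below applies that estimate to hypothetical single-game totals; the function names are illustrative, and play-by-play feeds can count possessions directly instead of estimating them.

```python
def estimate_possessions(fga: float, orb: float, tov: float, fta: float) -> float:
    """Box-score possession estimate: FGA - ORB + TOV + 0.44 * FTA."""
    return fga - orb + tov + 0.44 * fta

def per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions (an offensive or defensive rating)."""
    return 100 * points / possessions

# Hypothetical team totals for one game
poss = estimate_possessions(fga=88, orb=10, tov=13, fta=25)
print(round(poss, 1))                # 102.0
print(round(per_100(115, poss), 1))  # 112.7
```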
Tracking data and the off-ball game
Tracking data turns basketball into geometry. With every player's coordinates dozens of times a second, you can measure how open a shooter was, how far the nearest defender stood, how a screen created separation, who actually guarded whom, and how much rim deterrence a center provides. It powers the most advanced models in the sport — shot quality, defensive matchup data, off-ball gravity and spacing analysis.
The NBA's collection history is the reference timeline for the field:
- SportVU (STATS) introduced optical tracking in 2009 and went league-wide by the 2013–14 season, making the NBA the first major US league to track every game.
- Second Spectrum was the official provider from 2017 to 2023, using a "center of mass" single-point model and adding the AI/visualization layer.
- Since the 2023–24 season, Sony's Hawk-Eye system provides raw 3D optical tracking at 60 fps with 29 pose points per player; Sportradar distributes the data and Second Spectrum still supplies AI analysis and broadcast augmentation. The WNBA uses Second Spectrum/Genius Sports tracking.
The data & tools landscape
For learners and pros alike, a layered ecosystem has grown up around this data. Basketball-Reference and the official stats.nba.com are the canonical public sources; the nba_api Python package wraps the NBA stats endpoints (including tracking-derived stats and shot locations). The pbpstats Python package and the PBP Stats website serve possession-level and lineup data; Cleaning the Glass offers garbage-time-filtered team and player data prized by analysts. On the scouting side, Synergy Sports (play-type breakdowns synced to video) and Hudl/InStat are the professional standards.
| Source | Strength | Data type |
|---|---|---|
| stats.nba.com / nba_api | Official stats incl. tracking & shot locations | Box, PBP, tracking |
| Basketball-Reference | Best free historical box & advanced stats | Box, advanced |
| pbpstats / Cleaning the Glass | Possession, lineup & on/off data | PBP, lineups |
| Synergy Sports | Play-type efficiency synced to video | Tagged events + video |
| Hawk-Eye / Second Spectrum | Raw optical tracking (team-side) | Tracking |
Key takeaway
Match the data type to the question. "How efficient was our offense?" → PBP and ratings. "Was that a good shot?" → tracking-based shot quality. "Who actually guarded the star?" → tracking matchup data. "Is the player overworked?" → biometrics. Most mistakes come from forcing one data layer to answer a question it cannot see.
1.4 A Day in the Life of a Professional Basketball Data Analyst
The job runs on a relentless schedule — an 82-game season means a game every other night and an opponent always looming. Most of the work is preparation for a film session and a game plan that have to land in hours, not weeks.
Below is a representative day for an advance scout / coaching analyst the day before a game.
Last night's games are processed. Verify the opponent's recent games imported cleanly, refresh lineup and on/off databases, scan for injury/rotation news that changes the matchup.
Pull the opponent's last 10–15 games. Build the picture: pace, offensive and defensive rating, primary actions (pick-and-roll, ball-screen coverages), shot diet, personnel tendencies and matchups to exploit.
Tie numbers to film. "They guard the PnR in drop" is abstract; three Synergy clips of their center sitting back makes it coachable. Cut 8–12 key clips for the staff.
Present the game plan to the head and assistant coaches: coverages, what to take away, where to attack. Lead with the answer; hold detail in reserve.
Translate the plan into scouting cards and personnel clips players actually use — tendencies for the man they're guarding, our keys on offense. Less is more.
Between deadlines: a GM query on a trade target, maintaining the draft model, building tooling for next week's road trip.
On game night the rhythm changes: live tracking of lineups and fouls, in-game numbers for the coaching staff, and a fast post-game summary. The day after is for the post-game deep-dive (see 4.3). The throughline is that data work on a team is never an end in itself; it is always scaffolding for a coaching or personnel decision, delivered against the clock of the next tip-off.
"The analyst's job is to make the coach's next decision a little less of a guess, and to do it before the shootaround."
A working principle of team analytics staffs
2.1 Introduction to Advanced Metrics
Traditional stats count what happened. Advanced metrics measure efficiency and impact — points per possession, value per shot, and how a player changes the scoreboard. That shift, from counting to valuing, is the whole revolution.
"Points per game" is a count that rewards volume and pace. True Shooting % replaces it with efficiency: how many points a player produces per scoring attempt, properly crediting threes and free throws. Do that systematically — value possessions, value shots by location, value each player's net effect — and you get a family of models built on one foundation: the possession, and the points expected from it. Three properties make a good advanced metric:
- Predictive — it should describe future performance better than the raw outcome (efficiency predicts future scoring better than past points).
- Stable — it should settle on a reasonable sample, reflecting skill not variance.
- Interpretable — a coach should understand what it rewards. A black-box number nobody trusts gets ignored.
The Four Factors — the foundation
Before any fancy model, learn Dean Oliver's Four Factors. From Basketball on Paper (2004), they are the four things that decide who wins, and together they explain roughly 96% of the variance in wins. Every team-level analysis starts here.
| Factor | What it measures | Oliver's weight |
|---|---|---|
| Shooting — eFG% | Shooting efficiency, crediting the 3 | ~40% |
| Turnovers — TOV% | Turnovers per possession | ~25% |
| Rebounding — ORB% | Share of available offensive rebounds | ~20% |
| Free throws — FTR | Getting to the line and converting | ~15% |
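The four factors in the table can be computed from ordinary box-score totals. The sketch below uses the commonly published definitions; the free-throw factor is taken as made free throws per field-goal attempt (some sources use attempts instead), and all input totals are hypothetical.

```python
def four_factors(fgm, fg3m, fga, ftm, fta, tov, orb, opp_drb):
    """Oliver's Four Factors for one team, from box-score totals."""
    return {
        "eFG%": (fgm + 0.5 * fg3m) / fga,        # shooting, crediting the 3
        "TOV%": tov / (fga + 0.44 * fta + tov),  # turnovers per play
        "ORB%": orb / (orb + opp_drb),           # share of own misses rebounded
        "FTR": ftm / fga,                        # made free throws per FGA
    }

# Hypothetical single-game totals
ff = four_factors(fgm=42, fg3m=14, fga=88, ftm=19, fta=25,
                  tov=13, orb=10, opp_drb=34)
for name, value in ff.items():
    print(f"{name}: {value:.3f}")
```

The defensive version is the same calculation run on the opponent's totals.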
The mental model
Picture an "expected points" value attached to every possession. Each action — a shot taken from one spot rather than another, a turnover, an offensive rebound, a trip to the line — moves that expected value up or down. Almost every metric in Part 2 is a different way of estimating that movement, at the level of a shot, a player, or a lineup.
2.2 Scoring & Shooting Metrics
Shooting metrics answer two separate questions beginners constantly conflate: how efficiently did they score? and were those good shots to take? The first is about conversion; the second is about shot selection.
True Shooting % and Effective FG%
Raw field-goal percentage is broken: it treats a three like a two and ignores free throws. Two fixes:
Effective Field Goal % (eFG%) credits the extra value of the three-pointer. True Shooting % (TS%) goes further, folding in free throws to capture total scoring efficiency per possession used. TS% is the single best one-number scoring-efficiency stat, and league-average TS% (~0.57–0.58 in the modern NBA) is the reference every player should be measured against.
eFG% = (FGM + 0.5 × 3PM) ÷ FGA
TS% = PTS ÷ ( 2 × (FGA + 0.44 × FTA) )
eFG% fixes the 3-pointer · TS% adds free throws · 0.44 = the free-throw-trip adjustment from 1.3
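A small worked comparison shows why raw FG% misleads. It uses the standard definitions eFG% = (FGM + 0.5 × 3PM) ÷ FGA and TS% = PTS ÷ (2 × (FGA + 0.44 × FTA)); both stat lines are invented for illustration.

```python
def shooting_line(fgm, fg3m, fga, ftm, fta):
    """Return FG%, eFG% and TS% for one stat line."""
    pts = 2 * (fgm - fg3m) + 3 * fg3m + ftm
    return {
        "FG%": fgm / fga,
        "eFG%": (fgm + 0.5 * fg3m) / fga,
        "TS%": pts / (2 * (fga + 0.44 * fta)),
    }

# Hypothetical players: A makes 8 of 16 twos; B makes 7 of 16 with 4 threes and 6 of 7 free throws
a = shooting_line(fgm=8, fg3m=0, fga=16, ftm=0, fta=0)
b = shooting_line(fgm=7, fg3m=4, fga=16, ftm=6, fta=7)
print({k: round(v, 3) for k, v in a.items()})
print({k: round(v, 3) for k, v in b.items()})
```

Player B has the lower FG% (0.438 against 0.500) but the higher eFG% and TS%, because threes and free throws are finally counted.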
Shot value: the logic that reshaped the NBA
The deepest idea in basketball analytics is expected points per shot (xPPS) — the points a shot is worth on average, given where (and how) it's taken: the probability it goes in, times its point value. This single concept explains the modern game. A corner three and a long two come from similar distances, but the three's higher payoff makes its expected value far greater even at a lower make rate. Rank every shot location by xPPS and you get the league's collective answer: shoot at the rim, shoot from three, and avoid the long mid-range.
e.g. a 38% three → 0.38 × 3 = 1.14 xPPS
a 40% long two → 0.40 × 2 = 0.80 xPPS
the three wins despite the lower make rate
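The same arithmetic can be turned around to ask what two-point make rate a shooter would need to match a given three-point rate. This is a minimal sketch of that break-even calculation; the function names are illustrative, and it ignores fouls drawn and offensive rebounds.

```python
def xpps(make_rate: float, point_value: int) -> float:
    """Expected points per shot: make probability times point value."""
    return make_rate * point_value

def break_even_two_pt_rate(three_pt_rate: float) -> float:
    """Two-point make rate that yields the same xPPS as the given three-point rate."""
    return three_pt_rate * 3 / 2

print(round(xpps(0.38, 3), 2))                 # 1.14
print(round(xpps(0.40, 2), 2))                 # 0.8
print(round(break_even_two_pt_rate(0.38), 2))  # 0.57
```

A 38% three-point shooter would have to hit 57% of his long twos to break even, which is why the long mid-range rarely survives the comparison.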
Reading a shot chart
The shot chart is basketball's signature visual. Each attempt is plotted where it was taken, coloured by make/miss (and, in advanced versions, by efficiency relative to league average from that spot). At a glance you read a player's or team's shot diet: a healthy modern profile clusters at the rim and behind the arc; a heavy mid-range concentration is a flag (unless it's an elite shot-maker).
Shot quality vs shot making
Tracking data split shooting into two skills. Shot quality (how good a look was — distance, defender proximity, catch-and-shoot vs pull-up, time of possession) measures shot selection and creation. Shot making — actual results versus the quality-expected baseline — measures the shooter's skill at converting. A player can post a great FG% on easy looks (good system, weak self-creation) or a modest FG% on brutal contested shots (elite shot-maker). Separating the two is one of the most valuable things tracking data does, and it's the basketball analogue of the expected-vs-actual logic used across sports.
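The split between quality and making can be sketched in a few lines, assuming a shot-quality model has already assigned each attempt a make probability. The probabilities and outcomes below are invented, and the function name is illustrative.

```python
def quality_and_making(shots):
    """shots: iterable of (model make probability, point value, made flag)."""
    n = len(shots)
    quality = sum(p * value for p, value, _ in shots) / n      # expected points per shot
    actual = sum(value * made for _, value, made in shots) / n  # points actually scored per shot
    return quality, actual - quality                            # shot quality, shot making

# Hypothetical attempts: (make probability from a quality model, value, made?)
shots = [
    (0.62, 2, 1),
    (0.35, 3, 1),
    (0.38, 3, 0),
    (0.41, 2, 1),
    (0.33, 3, 1),
    (0.58, 2, 0),
]
quality, making = quality_and_making(shots)
print(round(quality, 2), round(making, 2))
```

Here the looks were worth about 1.07 points each and the shooter added about 0.60 on top, although six shots is far too few to call that a skill.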
The cardinal sins of shooting stats
Don't judge a shooter on 3P% over a small sample — it's the noisiest common stat. Don't reward raw points without efficiency or usage context. Don't treat a hot stretch as a permanent skill upgrade. And don't confuse a player who takes good shots (selection) with one who makes hard ones (skill) — tracking shot quality is how you tell them apart.
2.3 Playmaking & Creation Metrics
Scoring is the visible output; creation is the engine. These metrics value the passing, driving and gravity that manufacture good shots for others — and the burden a player carries doing it.
Usage, assists and the cost of creation
Usage Rate (USG%) estimates the share of his team's possessions a player "uses" while on the floor (via shots, free-throw trips and turnovers). It's the volume axis: a 30%-usage star is the offense's hub; a 12%-usage role player is a finisher. Crucially, efficiency must always be read against usage — keeping a high TS% at 30% usage is far harder, and more valuable, than at 15%.
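For reference, usage rate is usually computed with the form popularised by Basketball-Reference: the player's shots, free-throw trips and turnovers, scaled to his minutes, as a share of the team's. The sketch below follows that form; the stat line is hypothetical.

```python
def usage_rate(fga, fta, tov, mp, tm_fga, tm_fta, tm_tov, tm_mp):
    """Usage rate in percent: share of team plays a player uses while on the floor."""
    player_used = fga + 0.44 * fta + tov
    team_used = tm_fga + 0.44 * tm_fta + tm_tov
    return 100 * (player_used * (tm_mp / 5)) / (mp * team_used)

# Hypothetical game: a star plays 36 of 48 minutes
usg = usage_rate(fga=20, fta=8, tov=4, mp=36,
                 tm_fga=88, tm_fta=25, tm_tov=13, tm_mp=240)
print(round(usg, 1))  # 32.8
```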
Assist % (AST%) estimates the share of teammate field goals a player assisted while on court. But the traditional assist is a crude, scorer-dependent stat: it ignores the "hockey assist" (the pass before the pass), the drive-and-kick that bent the defense, and the gravity that freed a shooter without a pass at all. That's why creation analysis increasingly leans on tracking.
Tracking-era creation: passing value and gravity
Tracking data finally lets us value creation properly. Potential assists and assist points created credit passes that should have led to baskets regardless of the finish — the playmaking analogue of expected assists in other sports. Drives, paint touches and passes out of the paint quantify rim pressure. And gravity — measured from how defenders collapse toward a player — captures the value of a shooter or scorer who creates space for teammates without touching the ball. Together these reveal the creation that box-score assists miss entirely.
How to read it
Never read efficiency without usage, or assists without role. A "low" TS% on enormous usage can be elite offense; gaudy assist totals next to a star scorer can be inflated. Pair volume (USG%), efficiency (TS%) and creation (AST%, potential assists) to see a player's true offensive job — and how well they do it.
2.4 Defense & Rebounding Metrics
Defense is the hardest part of basketball to measure, because the best defensive play is often the shot that's never attempted, or the drive that never happens. Box-score defense is a weak proxy; tracking changed the game.
Why box-score defense misleads
Steals and blocks are real skills but capture a tiny, gamble-prone slice of defense. They reward risk and ignore the core job: positioning, closeouts, rim deterrence, and the discipline of not fouling or gambling. A great defender who quietly forces opponents into bad shots may post unremarkable steals and blocks. This is why, historically, the best single-number defensive signal came not from the box score but from impact metrics (on/off and RAPM — see 2.5) and, more recently, from tracking.
Team defense & the possession lens
Defensive Rating (DRtg) — points allowed per 100 possessions — is the team foundation, and the defensive Four Factors (opponent eFG%, forced TOV%, defensive rebound %, opponent FTR) decompose it into actionable pieces: are we giving up good shots, fouling too much, or losing the defensive glass?
Rebounding the right way
Rebounding must be measured as a rate, not a total. Rebound % (the share of available rebounds a player or team grabs) controls for pace and opportunity — a center on a fast, miss-heavy team will rack up boards that a rate stat correctly discounts. Offensive and defensive rebound rates are two of the Four Factors and a core part of any team profile.
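A short sketch shows how a rate discounts raw totals. It uses the commonly published player form, in which a player's rebounds are scaled to his minutes and divided by all rebounds available in the game; both scenarios are hypothetical.

```python
def rebound_pct(trb, mp, tm_trb, opp_trb, tm_mp=240):
    """Share (%) of available rebounds a player grabbed while on the floor."""
    return 100 * (trb * (tm_mp / 5)) / (mp * (tm_trb + opp_trb))

# Same 12 rebounds in 32 minutes, in two hypothetical games
slow = rebound_pct(trb=12, mp=32, tm_trb=44, opp_trb=46)  # 90 rebounds available
fast = rebound_pct(trb=12, mp=32, tm_trb=54, opp_trb=56)  # 110 rebounds available
print(round(slow, 1), round(fast, 1))  # 20.0 16.4
```

The box score shows 12 boards both times; the rate shows the second performance captured a noticeably smaller share of what was there.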
Tracking-era defense
Tracking finally put numbers on the invisible work: matchup data (who actually guarded whom, and how that assignment fared), contests and rim protection (opponent FG% at the rim with a defender present versus the league baseline — true rim deterrence), closeout speed and distance contested. These let an analyst credit the defender who held the opposing star below his average, or the center whose presence alone suppressed shots at the rim — the "play that never had to happen."
The defensive humility rule
No single public number captures individual defense well. The honest approach triangulates: impact metrics (RAPM/EPM), tracking (matchup & rim data), opponent shot quality, and film. Anyone selling you one clean "defensive rating" for a player is overselling — defense is where data and the eye must work together most closely.
2.5 Player Impact, Lineups & Clustering
The holy grail of basketball analytics is a single, fair number for a player's total value. The journey to it — through plus/minus, adjusted plus/minus and RAPM — is the most important modelling story in the sport, and clustering complements it by defining roles.
From plus/minus to RAPM
The chain of reasoning is worth understanding because every modern impact metric sits on it:
Plus/Minus
The net score change while a player is on court. Intuitive, but hopelessly confounded — a role player next to stars looks great; a star on a bad team looks poor.
Adjusted Plus/Minus (APM)
Use regression to estimate each player's effect controlling for the other nine players on the floor. Conceptually right, but noisy and unstable.
Regularized APM (RAPM)
Apply ridge regression (a Bayesian shrinkage toward zero) to tame the noise, producing far more reliable estimates from limited data. RAPM is the modern foundation.
Box + RAPM hybrids
Pure RAPM needs many seasons to stabilize, so modern metrics blend box-score and play-by-play info to predict RAPM with a Bayesian prior — faster and more stable.
This is why the alphabet soup of impact metrics exists. BPM (Box Plus-Minus) is box-only — quick but blind to defense it can't see. EPM (Estimated Plus-Minus) combines box and play-by-play per 100 possessions and is widely regarded among the best public metrics. DARKO emphasises day-by-day, predictive player tracking; LEBRON and the now-retired RAPTOR are other public blends; PER (Hollinger) is the original box-score all-in-one, useful but offense-biased. Most of them, under the hood, are predicting RAPM.
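The step from APM to RAPM can be sketched on synthetic data. The example below is a toy under stated assumptions: player impacts, stints and noise are randomly generated, every stint is weighted equally (real implementations weight by possessions), and the penalty value is picked by hand rather than by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_stints = 12, 400
true_impact = rng.normal(0, 2, n_players)  # hypothetical per-100 impacts

# Design matrix: +1 for the five "home" players on court, -1 for the five "away"
X = np.zeros((n_stints, n_players))
for row in X:
    on_court = rng.choice(n_players, size=10, replace=False)
    row[on_court[:5]] = 1
    row[on_court[5:]] = -1

# Observed net rating of each stint = true effect + heavy noise
y = X @ true_impact + rng.normal(0, 12, n_stints)

# APM: ordinary least squares
apm, *_ = np.linalg.lstsq(X, y, rcond=None)

# RAPM: ridge regression, beta = (X'X + lambda * I)^-1 X'y
lam = 50.0  # assumed penalty; in practice chosen by cross-validation
rapm = np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)

print(np.round(apm, 2))
print(np.round(rapm, 2))
```

The ridge estimates are pulled toward zero relative to the unregularised ones, which is the shrinkage that makes RAPM usable on limited data.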
How to use impact metrics responsibly
Treat them as estimates with error bars, not gospel. They disagree with each other and with the eye — that disagreement is information, not noise to ignore. Use them to rank, flag and stress-test opinions, never to settle an argument alone. And know each metric's bias: box-only metrics miss defense; pure on/off needs huge samples.
Lineups and on/off
Because basketball is five-on-five, lineup data is uniquely powerful: you can measure how any five-man unit performs per 100 possessions, and how a team does with a player on versus off the floor. It surfaces combinations that work for non-obvious reasons (spacing, defensive cover) and ones that don't despite the talent. The caveat from 1.2 always applies: small samples and shared-court confounding mean raw lineup splits are a starting hypothesis, not proof.
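Raw on/off is simple to compute once play-by-play has been cut into stints (stretches with the same players on the floor). The sketch below assumes that stint table already exists; the players and numbers are made up.

```python
# Hypothetical stints: (our five on court, possessions, points for, points against)
stints = [
    ({"A", "B", "C", "D", "E"}, 40, 46, 40),
    ({"A", "B", "C", "D", "F"}, 30, 33, 30),
    ({"B", "C", "D", "F", "G"}, 30, 27, 33),
]

def net_rating(rows):
    """Point margin per 100 possessions, or None if there are no possessions."""
    poss = sum(p for _, p, _, _ in rows)
    if poss == 0:
        return None
    margin = sum(pf - pa for _, _, pf, pa in rows)
    return 100 * margin / poss

def on_off(player, stints):
    """Net rating with the player on the floor and with him off it."""
    on = [s for s in stints if player in s[0]]
    off = [s for s in stints if player not in s[0]]
    return net_rating(on), net_rating(off)

print(on_off("A", stints))
print(on_off("B", stints))  # B never sits, so the "off" side is None
```

Player A comes out at about +12.9 on and −20.0 off, but the "off" number rests on 30 possessions, exactly the kind of split that needs adjustment before anyone acts on it.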
Clustering: roles beyond position
The traditional five positions (PG, SG, SF, PF, C) are nearly obsolete as descriptors. A modern "center" might be a rim-running lob threat or a stretch-five who lives at the arc — opposite profiles sharing a label. Clustering represents each player as a vector of style metrics (shot location mix, usage, assist rate, rim pressure, three-point rate, defensive activity) and groups players whose statistical fingerprints match, regardless of nominal position. The output is data-driven roles — "stretch big," "low-usage 3&D wing," "high-usage shot creator" — that power player-similarity search and roster construction.
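The clustering idea can be sketched with a hand-rolled k-means on a toy table. Everything here is an assumption for illustration: the six players and their style vectors are invented, the four features are one possible choice, and a production workflow would more likely use a library implementation on many more players and features.

```python
import numpy as np

players = ["Big1", "Big2", "Wing1", "Wing2", "Guard1", "Guard2"]
# Hypothetical style vectors: [3PA rate, rim-attempt rate, USG%, AST%]
X = np.array([
    [0.05, 0.70, 16, 6],
    [0.08, 0.65, 15, 8],
    [0.60, 0.15, 14, 7],
    [0.65, 0.12, 13, 9],
    [0.40, 0.25, 30, 35],
    [0.38, 0.28, 31, 38],
], dtype=float)

# Standardise so no single feature dominates the distance
Z = (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(Z, k, n_iter=10):
    # Farthest-point initialisation: deterministic, spreads the starting centres
    centers = [Z[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Assumes no cluster goes empty, which holds for this toy data
        centers = np.array([Z[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(Z, k=3)
for name, label in zip(players, labels):
    print(name, label)
```

The two rim-bound bigs, the two low-usage shooters and the two high-usage creators land in separate groups, with no position label used anywhere.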
Why front offices love clustering
It answers "find me a cheaper version of this player's role," ensures new signings fit the system and the stars' play styles, and surfaces undervalued players whose contribution isn't visible in points and rebounds. A caution: clustering groups by style, not quality or level — always pair the role label with an impact and level-of-competition check before drawing a conclusion. See 4.4.
2.6 How Data Shapes Roster & Scouting Decisions
Metrics are only worth the decisions they improve. Here's how the models above feed the decisions a franchise spends most money and emotion on: who to acquire, how to play, and whether the process is working.
Roster construction & the draft
Modern team-building is a portfolio problem, and data widens the funnel. Instead of the handful of prospects a scout happened to see, a front office can screen every draft-eligible or available player against a statistical and physical profile, then send scouts to watch the shortlist. Impact metrics rank value; clustering ensures fit (you can't play five high-usage scorers); age, contract and injury data flag value and risk. The model never makes the pick — it decides who gets watched and valued, which is where the leverage is. It also guards against bias: the eye over-weights a great tournament game; the data remembers the whole season and adjusts for competition.
Tactics & game planning
Data shapes both how a team plays and how it prepares for opponents. A staff can see, with evidence, that the offense generates too many long twos and redesign actions toward the rim and arc; that the defense leaks corner threes and adjust its rotations; that an opponent's star is far less efficient going left and load the strong side. The opponent report in 4.2 is this in action.
Development & performance review
Post-game, data separates process from result (4.3): did we generate efficient possessions and force bad ones, regardless of the final margin? For individuals, tracking and shot-quality data set development targets — a young guard whose shot selection lags his shot-making, a big whose rim protection is elite but whose closeouts leak points.
The governing principle
Data earns its place when it changes a decision the team would otherwise make worse. If a beautiful model wouldn't alter who you draft, how you set up, or what you tell a player, it's decoration. Start from the decision and work backward to the metric — never the other way around.
3.1 The Basketball Data Analyst Workflow
Behind every clean shot chart is a pipeline. Professional analysis is repeatable: the same steps, run reliably, before every game. Learn the workflow and the specific tools become interchangeable.
Acquire
Get the data: the NBA stats API via nba_api, Basketball-Reference exports, pbpstats, or a team's tracking feed. Know the source and schema before anything else.
Clean & standardise
Reconstruct possessions, standardise player IDs and shot coordinates, filter garbage time. Unglamorous and essential.
Transform & model
Aggregate to the right grain (per 100 possessions, per shot), compute metrics (TS%, USG%, ratings) and apply models (shot quality, RAPM).
Visualise & explore
Shot charts, lineup tables, impact maps, radars — both to find the story and to tell it.
Communicate
Package the answer for the decision-maker: a scouting card, a slide, a film cut-up with the numbers overlaid.
Automate
Turn the one-off into a template. Tomorrow's opponent report should be a parameter change, not a rebuild.
| Layer | Tools | Why |
|---|---|---|
| Wrangling & modelling | Python (pandas, numpy, scikit-learn) | The industry default for analysis & ML |
| Basketball-specific | nba_api, pbpstats, py-ball | Stats endpoints, possessions, shot data |
| Storage / querying | SQL, PostgreSQL, DuckDB | Season-scale data lives in databases |
| Reporting / BI | Tableau, Power BI, Excel | Stakeholder-facing dashboards |
| Video | Synergy, Hudl, InStat | Play-type tagging & clip delivery |
Where to start, free
Install Python (via Anaconda), then pip install nba_api pandas matplotlib. The nba_api package wraps the official NBA stats endpoints — box scores, play-by-play, tracking-derived stats and shot locations — and Basketball-Reference covers historical and advanced stats. You can reproduce most of this handbook on real games without spending a penny.
3.2 Python Basics for Basketball
You don't need to be a software engineer. You need to pull a table, filter it, group it and compute a metric — 80% of basketball analysis is exactly that, done well.
The workhorse is pandas, whose core object is the DataFrame — a spreadsheet you control with code. Below we pull a team's season stats from the NBA stats API with nba_api and compute True Shooting % — the most common starting task in basketball analysis.
from nba_api.stats.endpoints import leaguedashteamstats
import pandas as pd

# 1. Pull base team stats for a season
df = leaguedashteamstats.LeagueDashTeamStats(
    season="2023-24", per_mode_detailed="Totals"
).get_data_frames()[0]

# 2. True Shooting %: PTS / (2 * (FGA + 0.44 * FTA))
df["TS%"] = df["PTS"] / (2 * (df["FGA"] + 0.44 * df["FTA"]))

# 3. Rank the league by scoring efficiency
top = (df[["TEAM_NAME", "PTS", "FGA", "FTA", "TS%"]]
       .sort_values("TS%", ascending=False)
       .head(5)
       .round({"TS%": 3}))
print(top.to_string(index=False))
# Illustrative output layout (figures are placeholders, not verified season totals):
#      TEAM_NAME   PTS   FGA   FTA    TS%
# Boston Celtics  9405  7203  1487  0.598
# Denver Nuggets  9268  7340  1626  0.575
# ...
Three steps — pull, derive, rank — turn raw totals into the league's scoring-efficiency table.
The same pull → derive → group pattern scales to almost everything. Below, we pull a player's shot data and compute expected points per shot by zone: the shot-value logic from 2.2, in code.
from nba_api.stats.endpoints import shotchartdetail
import numpy as np

shots = shotchartdetail.ShotChartDetail(
    team_id=0, player_id=201939,  # Stephen Curry
    season_nullable="2023-24", context_measure_simple="FGA"
).get_data_frames()[0]

# Point value of each attempt (3 if behind the arc, else 2)
shots["pts_value"] = np.where(shots["SHOT_TYPE"].str.contains("3PT"), 3, 2)
# Points actually scored on each attempt (0 on a miss)
shots["pts_scored"] = shots["SHOT_MADE_FLAG"] * shots["pts_value"]

# Expected points per shot (make rate * point value) = mean points scored per attempt, by zone
xpps = (shots
        .groupby("SHOT_ZONE_BASIC")["pts_scored"]
        .mean()
        .sort_values(ascending=False)
        .round(2))
print(xpps)
# Illustrative output (selected zones, sorted high to low):
# Restricted Area      1.28   # at the rim: highest value
# Left Corner 3        1.20
# Above the Break 3    1.16
# Mid-Range            0.78   # the dead zone
Feature engineering: deriving expected points per shot by zone. The result rediscovers why teams shoot at the rim and the arc, not the mid-range.
Beginner pitfalls
Always think per-possession — totals and per-game numbers hide pace; divide by possessions (or per 100). The NBA API rate-limits — add small delays between calls and cache results locally. Filter garbage time for true-context stats. Mind small samples — groupby will happily compute a 3P% from nine attempts and report nonsense; carry an attempts column and filter on it.
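The small-sample pitfall is easiest to avoid by computing attempts in the same groupby as the percentage. A minimal sketch, assuming a toy shot log with made-up players and a hypothetical MIN_ATTEMPTS cutoff of 50 (the right threshold depends on the question you are asking):

```python
import pandas as pd

# Toy shot log: one row per three-point attempt (made-up players and results)
shots_3pt = pd.DataFrame({
    "player": ["A"] * 9 + ["B"] * 100,
    "made":   [1] * 6 + [0] * 3 + [1] * 38 + [0] * 62,
})

MIN_ATTEMPTS = 50  # hypothetical cutoff, not a standard

# Compute attempts and make rate together so the sample size travels with the stat
summary = (shots_3pt.groupby("player")["made"]
           .agg(attempts="count", fg3_pct="mean")
           .reset_index())

# Filter on attempts before ranking or reporting
reliable = summary[summary["attempts"] >= MIN_ATTEMPTS]

print(summary)   # player A shows 66.7% on nine attempts
print(reliable)  # only player B survives the filter
```

Player A's 66.7% would top any leaderboard, yet it rests on nine shots; the filter keeps it out of the ranking without hiding it from the summary table.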
3.3 Data Visualization Basics for Basketball
In basketball, the half-court is the canvas. A good basketball visual places data in spatial context so a coach reads it in seconds. The shot chart and the player radar are the two you'll build most.
Plotting a shot chart
Using the shots data above, you draw a court and scatter each attempt at its (LOC_X, LOC_Y), coloured by outcome. The court lines are drawn with matplotlib patches; draw_court in the snippet is a community-style helper you define yourself, not a function shipped with matplotlib or nba_api.
import matplotlib.pyplot as plt

# `shots` is the ShotChartDetail DataFrame pulled above
made = shots[shots["SHOT_MADE_FLAG"] == 1]
miss = shots[shots["SHOT_MADE_FLAG"] == 0]

fig, ax = plt.subplots(figsize=(7, 6.6))
draw_court(ax)  # court-drawing helper you define yourself: paints lines/arc
ax.scatter(miss["LOC_X"], miss["LOC_Y"], c="#9aa0a6", s=18, alpha=.5, label="Miss")
ax.scatter(made["LOC_X"], made["LOC_Y"], c="#1e7d4f", s=22, alpha=.8, label="Make")
ax.set_xlim(-250, 250)
ax.set_ylim(-50, 420)
ax.axis("off")
ax.legend(loc="upper right")
ax.set_title("Curry shot chart, 2023-24")
plt.show()
The same half-court shot chart shown schematically in 2.2 — here generated from real shot coordinates. Hexbin versions colour each hex by efficiency vs league average.
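The draw_court call in the snippet needs a definition. The following is one possible sketch, not a library function. It assumes the shot coordinates are in tenths of a foot with the hoop centre at (0, 0), and it uses simplified, approximate court dimensions in the style of widely shared community helpers.

```python
from matplotlib.patches import Arc, Circle, Rectangle


def draw_court(ax, color="black", lw=1.5):
    """Draw a simplified NBA half-court on a matplotlib Axes.

    Assumption: coordinates are tenths of a foot, hoop centre at (0, 0),
    baseline at y = -47.5. Dimensions are approximate.
    """
    patches = [
        Circle((0, 0), radius=7.5, fill=False),           # hoop
        Rectangle((-80, -47.5), 160, 190, fill=False),    # paint (16 ft x 19 ft)
        Arc((0, 142.5), 120, 120, theta1=0, theta2=180),  # top of free-throw circle
        Arc((0, 0), 80, 80, theta1=0, theta2=180),        # restricted area
        Arc((0, 0), 475, 475, theta1=22, theta2=158),     # three-point arc (23.75 ft)
        Rectangle((-250, -47.5), 500, 470, fill=False),   # half-court boundary
    ]
    for patch in patches:
        patch.set_edgecolor(color)
        patch.set_linewidth(lw)
        ax.add_patch(patch)

    # Straight segments: backboard and the two corner-three lines (22 ft from the hoop)
    ax.plot([-30, 30], [-7.5, -7.5], color=color, lw=lw)
    ax.plot([-220, -220], [-47.5, 89.5], color=color, lw=lw)
    ax.plot([220, 220], [-47.5, 89.5], color=color, lw=lw)

    ax.set_aspect("equal")  # keep the court from stretching
    return ax
```

Define it before the plotting code runs; the shot-chart snippet can then call draw_court(ax) unchanged.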
The player radar
Radars (spider charts) show a player's statistical fingerprint against positional peers. Each spoke is a metric scaled to a percentile, so the shape — not the raw value — carries the role. The interactive chart below renders the idea for the web.
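To make the percentile scaling concrete, here is a small sketch of how each spoke could be computed. The peer values and the player are made up, and percentile_rank and LOWER_IS_BETTER are hypothetical names introduced for illustration; ties are counted as half, which is one common convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, peers):
    """Percentile (0-100) of `value` within `peers`, counting ties as half."""
    ordered = sorted(peers)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Made-up positional peer groups (the player is included in each list)
peers = {
    "TS%":  [0.52, 0.55, 0.56, 0.58, 0.61],
    "USG%": [14.0, 17.5, 19.0, 22.0, 28.0],
    "TOV%": [8.0, 10.0, 11.5, 13.0, 15.0],
}
player = {"TS%": 0.61, "USG%": 17.5, "TOV%": 8.0}

# For these metrics a low raw value is good, so the spoke is flipped
LOWER_IS_BETTER = {"TOV%"}

spokes = {}
for metric, value in player.items():
    pct = percentile_rank(value, peers[metric])
    spokes[metric] = 100.0 - pct if metric in LOWER_IS_BETTER else pct

print(spokes)  # {'TS%': 90.0, 'USG%': 30.0, 'TOV%': 90.0}
```

Flipping the lower-is-better metrics keeps "further out is better" true for every spoke, so the shape can be read without checking each axis.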
Principles for basketball charts
- Percentiles beat raw numbers for comparison — "90th percentile TS% at his position" lands instantly; "0.61" doesn't until you know the distribution.
- Per-100, not per-game — always neutralise pace before comparing players or teams.
- Always show the reference — versus league average, position peers, or the opponent. A lone number isn't a story.
- Colour shots by value, not just make/miss — the advanced shot chart colours each zone by efficiency relative to league average, revealing where a player beats the baseline.
- Less ink, more signal — a coach has thirty seconds. Strip chart-junk; highlight the one thing that matters, and keep palettes colour-blind- and projector-safe.
3.4 Combining Video and Data Using Professional Tools
Numbers convince analysts; video convinces players. The decisive skill on a staff is welding the two so a statistic becomes a coachable clip in a film session.
"They score 1.05 points per possession in pick-and-roll" means nothing to a player. Three clips of their guard turning the corner against drop coverage means everything. The tool that connects them is Synergy Sports, which tags every possession by play type (pick-and-roll ball-handler, spot-up, transition, post-up, isolation, hand-off…) and syncs each to video — so a data filter instantly returns the matching clips. Hudl and InStat play similar roles.
Play-type tagging
Synergy classifies and grades every possession by type and outcome, giving you efficiency numbers (points per possession) for each action — and the film behind each number in one click.
Synced data + film
Because events carry timestamps and player tags, you can query the data ("their center in pick-and-roll defense") and jump straight to those clips — analysis and evidence in one motion.
The data-to-video loop in practice
Find the pattern in the data
e.g. "they give up 1.12 PPP to pull-up threes against drop coverage."
Filter to the possessions
Pull every Synergy possession matching the play type and outcome.
Pull the clips
The platform returns those exact video windows, tagged and trimmed.
Build the package
A short, sharp cut-up — ideally with the coverage and the number on screen — for the film session and the scouting card.
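The loop above can be sketched in a few lines. This is not the Synergy API, which is proprietary; the Possession fields, the find_clips function and the two-second padding are all hypothetical, standing in for whatever tagged export your video platform provides.

```python
from dataclasses import dataclass


@dataclass
class Possession:
    # Hypothetical export schema: one tagged possession with its video window
    game_id: str
    play_type: str   # e.g. "PnR Ball Handler"
    coverage: str    # e.g. "drop", "switch"
    shot_type: str
    points: int
    start_s: float   # video timestamp where the possession starts (seconds)
    end_s: float     # video timestamp where it ends


def find_clips(possessions, play_type, coverage, pad_s=2.0):
    """Return (points per possession, clip windows) for matching possessions."""
    matched = [p for p in possessions
               if p.play_type == play_type and p.coverage == coverage]
    if not matched:
        return None, []
    ppp = sum(p.points for p in matched) / len(matched)
    # Pad each window slightly so the clip shows the set-up and the result
    clips = [(p.game_id, max(0.0, p.start_s - pad_s), p.end_s + pad_s)
             for p in matched]
    return ppp, clips


# Made-up possessions for illustration
log = [
    Possession("G1", "PnR Ball Handler", "drop", "pull-up 3", 3, 120.0, 132.0),
    Possession("G1", "PnR Ball Handler", "switch", "drive", 2, 300.0, 311.0),
    Possession("G2", "PnR Ball Handler", "drop", "pull-up 3", 0, 45.0, 58.0),
    Possession("G2", "Spot Up", "drop", "catch-and-shoot 3", 3, 200.0, 207.0),
    Possession("G3", "PnR Ball Handler", "drop", "floater", 2, 610.0, 624.0),
]

ppp, clips = find_clips(log, "PnR Ball Handler", "drop")
print(round(ppp, 2), clips)
```

The same filter produces both the number and the evidence, which is the point of the loop: the statistic and the cut-up never drift apart.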
Why this is the job
The analyst who can say "here's the tendency, here's the proof on film, here's our coverage" is worth far more than one who only makes charts. Video is how data clears the final, hardest hurdle: getting a player to believe it and change behaviour.
3.5 A Day in the Life of a Professional Basketball Data Scientist
If the analyst (1.4) lives on the game schedule, the data scientist lives on the model-and-pipeline calendar — less about the next opponent, more about the systems everyone else depends on.
Check overnight jobs. Did every game's play-by-play and tracking data ingest? Did the nightly metric run finish? Data engineering is most of data science, and broken pipelines block the whole department.
Improve an in-house model — re-fit the RAPM/impact model on new games, refine a shot-quality model with tracking features, debug a draft projection that's drifting.
A model is only trusted if it's tested. Back-test projections against what actually happened; quantify uncertainty; resist shipping something that fit last season but won't generalise.
Build the internal app, query layer or notebook template that lets non-coding analysts and scouts self-serve. Force-multiplying the department is the highest-leverage work here.
Read a new paper (shot quality, defensive value, lineup priors), prototype an idea, present findings to the front office. The public field moves fast; staying current is part of the job.
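The back-testing step in this routine can be illustrated with a deliberately small example. It assumes a projection of next-season three-point percentage and compares it with a naive baseline (last season's value) on held-out results; all numbers are made up, and mean absolute error is just one reasonable scoring choice.

```python
def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and outcomes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up three-point percentages for five players
last_season = [0.36, 0.41, 0.33, 0.38, 0.35]  # naive baseline: "same as last year"
model_proj  = [0.37, 0.39, 0.35, 0.37, 0.36]  # what the model projected
actual      = [0.38, 0.38, 0.35, 0.36, 0.37]  # what actually happened

baseline_mae = mean_abs_error(last_season, actual)
model_mae = mean_abs_error(model_proj, actual)

# Skill score: share of the baseline's error the model removes (0 = no better)
skill = 1 - model_mae / baseline_mae

print(f"baseline MAE {baseline_mae:.3f}, model MAE {model_mae:.3f}, skill {skill:.2f}")
```

A model that cannot beat the naive baseline on data it never saw is not ready to ship, however well it fit last season.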
| | Data analyst | Data scientist |
|---|---|---|
| Cadence | The game schedule | The model/release cycle |
| Output | Scouting reports, clips, answers | Models, pipelines, tools |
| Audience | Coaches, scouts | Analysts, front office |
| Core skills | Basketball + comms + analysis | ML + engineering + stats |
In smaller programs one person wears both hats; in NBA front offices they're distinct teams. Both need basketball understanding — a data scientist who can't tell a meaningful feature from a noisy one builds elegant, useless models.
4.1 Introduction to Use Cases
Everything so far (data types, metrics, tools) exists to serve a decision. Part 4 walks through four realistic jobs an analyst is actually handed, each ending in something a coach or front office can use.
The four cases below are deliberately the everyday core of the role, not exotic projects. They use representative (not real) numbers so you can follow the reasoning, which is the transferable part. Each follows the analytical loop from 1.2: a question, a hypothesis, evidence weighed against a baseline, and a clearly communicated answer with its uncertainty.
4.2 Reading the opponent
Pre-game. How do they play, and where can we attack them?
4.3 Post-game review
Did our process work, beyond the final score?
4.4 Player analysis
Is this player a fit — for our system and our cap sheet?
4.5 The scouting report
Pulling it together into a document the staff acts on.
4.2 Reading Your Opponent: A Practical Case Study
The brief: "We play the Vipers tomorrow. Tell the staff how they play and how we beat them." You have their last fifteen games of play-by-play, tracking and Synergy data.
Step 1 — Establish identity (the baseline)
Start broad. Over fifteen games the Vipers play at a fast pace (101 possessions), post a top-five offensive rating (118) but a middling defensive rating (114), and take a heavy share of their shots from three. Already a picture forms: a high-octane, three-happy offense that can be outscored if you control the glass and the paint. The strategic question becomes where their good defense ends and their bad defense begins.
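The identity numbers come straight from box-score totals. A sketch of the arithmetic, using the possession estimate from the glossary and made-up fifteen-game totals chosen to land near the figures quoted above; it assumes regulation-length games, so possessions per game stands in for pace.

```python
def estimate_possessions(fga, oreb, tov, fta):
    """Box-score possession estimate: FGA - OREB + TOV + 0.44 * FTA."""
    return fga - oreb + tov + 0.44 * fta


GAMES = 15

# Made-up fifteen-game totals for the Vipers and for their opponents
vipers = {"pts": 1788, "fga": 1350, "oreb": 150, "tov": 200, "fta": 261}
opponents = {"pts": 1727, "fga": 1380, "oreb": 165, "tov": 190, "fta": 250}

vipers_poss = estimate_possessions(
    vipers["fga"], vipers["oreb"], vipers["tov"], vipers["fta"])
opp_poss = estimate_possessions(
    opponents["fga"], opponents["oreb"], opponents["tov"], opponents["fta"])

off_rtg = 100 * vipers["pts"] / vipers_poss   # points scored per 100 possessions
def_rtg = 100 * opponents["pts"] / opp_poss   # points allowed per 100 possessions
pace = (vipers_poss + opp_poss) / 2 / GAMES   # possessions per game

print(f"ORtg {off_rtg:.1f}  DRtg {def_rtg:.1f}  Pace {pace:.1f}")
# ORtg 118.0  DRtg 114.0  Pace 101.0
```

Averaging the two possession estimates for pace is a common convention, since both teams get almost the same number of possessions in a game.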
Step 2 — Find the weakness (test the hypothesis)
Hypothesis: their drop-coverage defense leaks efficient offense in the pick-and-roll. The Synergy data backs it — the Vipers allow 1.09 points per possession to pick-and-roll ball-handlers (bottom-third of the league), most of it as pull-up threes their center concedes by sitting deep in drop. The tracking data confirms their center's average contest distance is among the league's longest. The plan writes itself: hunt their center in ball-screens and take the pull-up three he gives you.
Step 3 — Personnel & matchups
Granularity wins games. The tracking matchup data shows their star wing is excellent on the ball but their backup guard is targeted relentlessly — a matchup to hunt by setting screens to switch him onto our best scorer. Individually: their leading scorer shoots far worse going left and over-helps off our corner shooter; both go on the scouting card. We also flag their best offensive rebounder to box out by committee, since second-chance points are how fast teams snowball.
Step 4 — Tendencies & situations
Finally, the situational edges: after timeouts they run the same two sideline-out-of-bounds sets; late in the shot clock they funnel everything to their star's isolation (a possession to load up on); and in transition — their most efficient offense — they leak out early, so getting back beats crashing the offensive glass against them.
The answer, in one line
"Slow the game down, attack their center in pick-and-roll, hunt their backup guard, and get back in transition rather than crashing the glass." Everything else in the report is evidence for that sentence. A staff can build a shootaround around it — which is the test of a good opponent scout.
4.3 Post-game Analysis: A Practical Case Study
The brief: "We lost by 6 to a team we should beat. The coach is frustrated. Was the performance actually bad?" This is where analytics most directly protects a team from over-reacting to a result.
Separate process from outcome
The scoreline says "bad night." The underlying numbers may say something different. Suppose we generated good shots and forced bad ones, but lost the game on the Four Factors margins — specifically a brutal three-point shooting night and a few extra turnovers. If our shot quality was strong (we got to the rim and the arc) and our defense forced contested mid-range and tough threes, the loss is largely a cold shooting night against a good process — variance that will regress, not a structural failure to overhaul.
The post-game checklist
- Offensive & defensive rating — points per 100 each way. Did we win the efficiency battle even in defeat?
- The Four Factors, both ways — which factor decided it? A loss driven by eFG% variance is very different from one driven by turnovers or the glass (both more controllable).
- Shot quality vs shot making — did we generate good looks and miss (variance), or settle for bad shots (a fixable process problem)?
- Lineup splits — did one unit get caved in? On/off for the game flags rotation issues (with the small-sample caveat from 1.2).
- Turnovers & transition defense — the most controllable, snowball-prone areas. Systemic, or a few loose-handle plays?
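The Four Factors item on this checklist is quick to compute from a single box score. The sketch below uses common box-score definitions and a made-up game; note that FTR is taken here as FTA/FGA, while some sources use made free throws per field-goal attempt instead.

```python
def four_factors(team, opp):
    """Four Factors for `team`, using `opp` only for the rebounding denominator."""
    return {
        "eFG%": (team["fgm"] + 0.5 * team["fg3m"]) / team["fga"],
        "TOV%": team["tov"] / (team["fga"] + 0.44 * team["fta"] + team["tov"]),
        "ORB%": team["oreb"] / (team["oreb"] + opp["dreb"]),
        "FTR":  team["fta"] / team["fga"],
    }


# Made-up single-game box scores
us = {"fgm": 38, "fg3m": 8, "fga": 88, "fta": 20, "tov": 15, "oreb": 10, "dreb": 32}
them = {"fgm": 40, "fg3m": 13, "fga": 85, "fta": 22, "tov": 11, "oreb": 8, "dreb": 35}

ours = four_factors(us, them)
theirs = four_factors(them, us)

# Side-by-side view; remember that a lower TOV% is better
for factor in ours:
    gap = ours[factor] - theirs[factor]
    print(f"{factor:5s} us {ours[factor]:.3f}  them {theirs[factor]:.3f}  gap {gap:+.3f}")
```

In this invented game the largest gap is in eFG%, which points the review toward shot making and shot quality before anything else.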
The discipline of not over-reacting
Outcome bias is the post-game analyst's chief enemy. Even with roughly 100 possessions a side, single-game shooting variance is large, especially from three. A win can hide a poor process; a loss can mask a strong one. Report the process honestly, and know when the result genuinely did reflect a fixable problem (turnovers, transition defense, shot selection) rather than a cold night. One game is a tiny sample; trends across several are where truth lives.
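One way to put a number on "a cold night" is a simple binomial check: given a team's usual three-point percentage, how often would it shoot this badly or worse by chance alone? The sketch below uses made-up figures (8 makes on 38 attempts for a 36% team) and assumes every attempt is independent and equally difficult, which real shots are not, so treat the result as a rough guide only.

```python
from math import comb


def prob_at_most(makes, attempts, p):
    """P(X <= makes) for X ~ Binomial(attempts, p)."""
    return sum(comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
               for k in range(makes + 1))


# Made-up game: 8-of-38 from three for a team that usually shoots 36%
cold_night = prob_at_most(makes=8, attempts=38, p=0.36)

print(f"Chance of a night this cold or colder: {cold_night:.1%}")
```

A probability of a few percent still means a couple of such nights per season are expected, which is why a single result rarely justifies an overhaul.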
4.4 Player Analysis: A Practical Case Study
The brief: "We have a roster need at backup wing and cap space for one. Here are three targets in budget. Who fits, and what are the risks?"
Step 1 — Define the role, not the position
"Backup wing" is too vague. From 2.5, define the role by function: we need a low-usage 3&D wing who spaces the floor for our high-usage stars, defends multiple positions, and doesn't need the ball. That becomes a target profile — high catch-and-shoot three-point volume and efficiency, strong defensive matchup data, low usage and turnover rate.
Step 2 — Compare like-for-like, in context
Now the discipline of 1.2 bites. Candidate B has eye-catching scoring numbers — but he posted them as a high-usage first option on a bad team, taking tough shots he won't get next to our stars, and his efficiency on catch-and-shoot threes (the shots he'd actually take here) is mediocre. Adjust for role, usage and team context before comparing. Percentile radars against positional peers make the styles legible at a glance.
Step 3 — Weigh fit, quality, risk and value
| Target | Role fit | Risk flags | Verdict |
|---|---|---|---|
| A | Excellent — true 3&D wing | Modest upside, age 29 | Best fit; strong value signing |
| B | Scorer, not a fit off-ball | Context-inflated stats, low-value shots | Tempting numbers, poor fit |
| C | Elite defense, no spacing | Negative shooting gravity | Only if we need pure defense |
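The target profile from Step 1 can be written down as explicit thresholds, which makes the comparison in the table reproducible. Everything in this sketch is hypothetical: the metric names, the percentile cut-offs and the three candidates' numbers are invented to mirror the table, and a check like this frames the discussion rather than making the decision.

```python
# Hypothetical 3&D profile: ("min", x) means at least the x-th percentile,
# ("max", x) means at most the x-th percentile among positional peers
TARGET_PROFILE = {
    "cs3_volume_pctile":  ("min", 60),  # catch-and-shoot three-point volume
    "cs3_pct_pctile":     ("min", 60),  # catch-and-shoot three-point accuracy
    "def_matchup_pctile": ("min", 60),  # defensive matchup data
    "usage_pctile":       ("max", 40),  # low usage
    "tov_rate_pctile":    ("max", 50),  # low turnover rate
}


def profile_gaps(player, profile):
    """Return the metrics on which the player misses the target profile."""
    gaps = []
    for metric, (kind, bound) in profile.items():
        value = player[metric]
        meets = value >= bound if kind == "min" else value <= bound
        if not meets:
            gaps.append(metric)
    return gaps


# Made-up percentile profiles for the three targets
candidates = {
    "A": {"cs3_volume_pctile": 78, "cs3_pct_pctile": 72, "def_matchup_pctile": 70,
          "usage_pctile": 25, "tov_rate_pctile": 30},
    "B": {"cs3_volume_pctile": 65, "cs3_pct_pctile": 45, "def_matchup_pctile": 62,
          "usage_pctile": 85, "tov_rate_pctile": 70},
    "C": {"cs3_volume_pctile": 20, "cs3_pct_pctile": 25, "def_matchup_pctile": 95,
          "usage_pctile": 15, "tov_rate_pctile": 35},
}

for name, profile in candidates.items():
    print(name, profile_gaps(profile, TARGET_PROFILE) or "meets profile")
```

Listing the gaps, instead of collapsing them into one score, keeps the trade-offs visible for the film and live-scouting conversation that follows.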
The honest recommendation
"A is the cleanest fit and best value; B's scoring is a context-inflated mirage for our role; C only works if we're punting offense for defense in that slot." Notice the data didn't make the decision — it framed the trade-offs and stopped the front office over-rating B's empty-calories scoring. Pair every model output with film and live scouting before committing cap space.
4.5 From Data to Tactics: Building a Scouting Report
The scouting report is where the whole handbook converges: data types, metrics, tools and interpretation, compressed into a document that changes how a team plays the next game.
A report nobody reads is a failure regardless of its analytical quality. The craft is ruthless prioritisation — surfacing the two or three things that decide the game and burying the rest in an appendix. Structure beats volume; players remember three keys, not thirty.
Anatomy of a scouting report
1 · Keys to the game — the one-card answer
Three bullets a coach can hammer in the pre-game talk. "Attack their center in the pick-and-roll; get back in transition; box out their crasher by committee." If players remember nothing else, this is the report.
2 · Opponent offense
Pace, primary actions (pick-and-roll, spot-up, transition), shot diet, and the personnel who drive it — each claim backed by a Synergy clip or a tracking number, never an assertion alone.
3 · Opponent defense & coverages
Their ball-screen coverage (drop, switch, blitz), how they guard our actions, and where they leak efficiency — translated into where we attack.
4 · Personnel cards
One card per rotation player: tendencies, strong/weak hand, where they shoot it, how to guard them. The part players actually study.
5 · Situations & special teams
After-timeout sets, late-clock tendencies, inbounds plays, and transition behaviour — increasingly a decisive, coachable edge.
6 · Our game plan
It loops back to us: our coverages, our shot-selection keys, our matchups to hunt and to protect, our transition rules.
From insight to instruction
The final translation matters most. A data insight ("they allow 1.09 PPP to the pick-and-roll") is useless to a player until it becomes an instruction ("set the screen, turn the corner, and shoot the pull-up three their big gives you"). The best analysts live in this last mile: turning a probability into a behaviour the player will actually execute under pressure.
"The report is not the deliverable. The changed decision on the floor is the deliverable. Everything else is plumbing."
The closing principle of basketball data analysis
Putting it all together
You now have the full arc: data (Part 1) is collected and made trustworthy; models (Part 2) turn it into meaning; tools (Part 3) make the workflow repeatable; and interpretation (Part 4) converts meaning into decisions. The fundamentals end here — but the craft is a lifetime. The fastest way to learn it is to pull one real game from nba_api and walk it through all four parts yourself.
Glossary & Sources
A quick-reference glossary of the metrics and terms used in this handbook, followed by the sources consulted.
Core metrics & terms
| Term | Meaning |
|---|---|
| Possession | The atomic unit of analysis; ≈ FGA − OREB + TOV + 0.44×FTA. |
| ORtg / DRtg | Points scored / allowed per 100 possessions. |
| Pace | Possessions per 48 minutes; neutralise it before comparing. |
| eFG% | (FGM + 0.5×3PM) / FGA — field-goal % crediting the three. |
| TS% | PTS / (2×(FGA + 0.44×FTA)) — total scoring efficiency. |
| xPPS | Expected points per shot = P(make) × point value. |
| Four Factors | eFG%, TOV%, ORB%, FTR (Oliver, 2004); together they are commonly reported to explain ~96% of the variance in team wins. |
| USG% | Share of team possessions a player uses while on court. |
| AST% / Reb% | Share of teammate FGs assisted / available rebounds grabbed. |
| RAPM | Regularized Adjusted Plus-Minus — ridge-regression impact estimate. |
| BPM / EPM / DARKO / PER | All-in-one impact metrics (box-only, box+PBP, predictive, original). |
| Shot quality | Tracking-based expected make rate of a look (selection, not skill). |
| Tracking data | All 10 players + ball, sampled ~25–60×/sec; the off-ball game. |
Sources & further reading
Basketball-Reference — stats glossary
Squared Statistics — Oliver's Four Factors
NBAstuffer — Plus-Minus & impact metrics
Dunks & Threes — metric comparison
NBA — Sony Hawk-Eye tracking partnership
ESPN — NBA adopts Hawk-Eye tracking
ShotQuality — shot probability models
nba_api (Python) — unofficial client for the NBA.com stats endpoints
pbpstats — possession & lineup data