A Fundamentals Handbook

Understanding Basketball Through Data

From the first box score to a scouting report a coaching staff can run a game plan on — a practical, end-to-end guide to how modern basketball is measured, modelled and explained.

For analysts · scouts · coaches | ~90-minute read | Python & visualization included | 4 parts · 20 chapters

Part 0 · Introduction

Why Basketball Analytics Matters

Basketball is a high-frequency, high-possession game where every trip down the floor is a decision with a measurable expected value. That combination makes it the team sport where good measurement most directly, and most quickly, changes how the game is played.

Unlike low-scoring invasion games, basketball gives you roughly 100 possessions per team per game — a large sample, every night. Points are frequent, the floor is small and well-bounded, and the same actions (a pick-and-roll, a closeout, a corner three) repeat thousands of times a season. That density is why basketball analytics moved from the fringe to the front office faster than almost any other sport: the data is rich, the patterns are stable, and the financial stakes of a single roster or shot-selection decision are enormous.

The defining insight of the analytics era is brutally simple and it reshaped the sport: not all shots are worth the same. A long two-pointer and a corner three are taken from almost the same spot, but one is worth 50% more when it goes in. Once teams started valuing shots by their expected points rather than counting them equally, the mid-range jumper collapsed and the three-point revolution began. Analytics didn't just describe basketball — it changed what the game looks like.

~100

possessions per team per game — a large, nightly sample

60 fps

optical tracking frame rate in the modern NBA feed

1.5×

a made three is worth 50% more than a made two
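
The shot-value argument comes down to one line of arithmetic: expected points per attempt equal the make probability times the points a make is worth. The sketch below illustrates that idea; the shooting percentages are invented for the example and are not league data.

```python
def expected_points(make_prob: float, shot_value: int) -> float:
    """Expected points per attempt = P(make) x points awarded on a make."""
    return make_prob * shot_value

# Hypothetical make rates, chosen only to illustrate the comparison.
long_two = expected_points(0.40, 2)      # 0.80 points per attempt
corner_three = expected_points(0.38, 3)  # about 1.14 points per attempt

# Make rate a two-point shot would need to match that three-point value.
break_even_two = corner_three / 2        # about 0.57
```

Under these assumed numbers, a shooter would have to hit about 57% of long twos to match a 38% corner three, which is why the expected-points view pushes shot selection toward the arc and the rim.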

From efficiency to the three-point revolution

The intellectual foundation was laid by Dean Oliver's Basketball on Paper (2004), which reframed the game around possessions and efficiency rather than raw per-game totals. Scoring 110 points means little until you know how many possessions it took. From that single shift — measuring points per possession — flow nearly all modern metrics: offensive and defensive ratings, true shooting, the Four Factors, and ultimately the shot-value logic that pushed teams to the arc and the rim while abandoning the mid-range.

Who this handbook is for

Aspiring analysts

You will get the conceptual scaffolding and the practical Python/visualization workflow used in pro and college programs — enough to build a portfolio that gets you hired.

Scouts & front offices

You will learn how metrics become player profiles, why role and context matter, and how to fuse the eye test with impact data and on/off numbers.

Coaches

You will see how raw feeds become a scouting report you can act on — personnel tendencies, coverage plans, where to attack and where you're exposed.

The curious fan

You'll never watch a game the same way. TS%, usage, RAPM and shot quality will stop being broadcast jargon and start being lenses.

The promise & the limit

Data does not replace the coach's eye or the scout's instinct. It disciplines them — it tells you when intuition is fighting the evidence, sizes the sample you're reasoning from, and turns "he's a good shooter" into "he's in the 88th percentile on catch-and-shoot threes off the right wing." The best practitioners are bilingual: fluent in basketball and in data, and humble about both.

How to read this handbook

The four parts build on each other. Part 1 establishes what data is and how it's collected. Part 2 covers the advanced metrics and models that turn data into meaning. Part 3 is the hands-on toolkit — workflow, Python, visualization and video. Part 4 ties it together with realistic, worked case studies, ending in a full scouting report. Read it linearly or jump via the sidebar; cross-references are linked throughout.

Part 1 · Foundations & Data Types

1.1 The Role of a Basketball Data Analyst

"Basketball data analyst" is not one job. It's a family of roles that share a skill set but point at very different decisions — from the draft room to the film session to the trade deadline.

At its core the role is translation. On one side sits high-volume tracking and play-by-play data; on the other sit basketball people — coaches, GMs, scouts — who need a clear answer to a concrete question. The analyst turns "we have a season of tracking data" into "their drop coverage gives up the pull-up three; we should hunt their center in space." The value is never the chart. It is the decision the chart changes.

Role	Primary question	Typical employer
Advance / opponent scout	How does the next opponent play, and how do we beat them?	Coaching staff
Player personnel / draft analyst	Who should we draft, sign or trade for, and at what value?	Front office
Coaching / performance analyst	Is our offense/defense getting good possessions?	Coaching staff
Player development analyst	What should this player work on, and is it improving?	Development staff
Data scientist / engineer	How do we build the models and pipelines everyone uses?	Teams, vendors, media
Betting / quant analyst	Is the market price wrong, and by how much?	Sportsbooks, syndicates

What the job actually demands

Three competencies sit underneath all of these: basketball understanding (you must know why a number means something tactically), technical skill (querying, cleaning and modelling — typically Python, SQL and a BI tool), and communication (a brilliant model nobody acts on is worthless). Beginners over-index on the technical axis. In a team, a clear one-slide answer beats an elegant notebook every time.

Mindset

Think of yourself as decision-support, not a stats provider. Before any task, ask: "Whose decision am I trying to improve, and what would change their mind?" If you can't answer that, you're not ready to open the data.

1.2 Data Literacy and a Scientific Approach to Data

Data literacy isn't memorising metrics. It's the habit of asking where a number came from, what it can and cannot say, and how confident you should be — the scientific method applied to a basketball court.

Question
Start with a basketball question, not a dataset. "Are we generating efficient shots in the half-court?" is researchable. "Let's look at the data" is not.
Hypothesis
State what you expect and why. "I think our poor offense is a shot-selection problem, not a talent problem." A hypothesis you can be wrong about is the engine of good analysis.
Evidence
Gather the right data at the right grain. Match the metric to the question and be honest about sample size.
Test
Compare against a baseline — league average, the opponent, your own season. A number without a reference point is meaningless.
Communicate
Deliver the answer in the language of the decision-maker, with the uncertainty attached.

Three habits that separate professionals from dashboards

Per-possession, not per-game. Basketball's foundational adjustment is pace. A team scoring 115 points isn't necessarily good — if it plays fast, that may be below average per possession. Always think in points per 100 possessions, not per game. This one habit prevents more bad conclusions than any other.
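
A minimal sketch of the pace adjustment, assuming you already have a possession count for each team. The two game lines are hypothetical and exist only to show how a higher raw score can be the weaker offense.

```python
def points_per_100(points: float, possessions: float) -> float:
    """Offensive rating: points scored per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions

# Hypothetical single-game lines, for illustration only.
fast_team = points_per_100(points=115, possessions=108)  # about 106.5
slow_team = points_per_100(points=108, possessions=96)   # 112.5
```

The fast team scored seven more points but was about six points per 100 possessions less efficient.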

Sample size and stabilization. Different stats stabilize at different rates. Usage and shot volume settle quickly; three-point percentage is famously noisy and needs a large sample before you trust it. A hot week of shooting is usually variance, not a new skill. Professionals distrust extreme rates from small samples and favour multi-game windows.
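
One way to see why a hot week proves little is to put an uncertainty band around the percentage. The sketch below uses the textbook normal approximation for a proportion, which is crude at small sample sizes but good enough to show the scale of the noise. The shooting lines are hypothetical.

```python
import math

def shooting_interval(makes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for a make rate (normal approximation)."""
    p = makes / attempts
    se = math.sqrt(p * (1 - p) / attempts)  # standard error of a proportion
    return (max(0.0, p - z * se), min(1.0, p + z * se))

# Same 40% make rate, very different amounts of evidence (hypothetical lines).
hot_week = shooting_interval(10, 25)       # roughly 21% to 59%
full_season = shooting_interval(160, 400)  # roughly 35% to 45%
```

With 25 attempts the interval covers almost every shooter in a league; with 400 it starts to say something about the player.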

On/off and context. A lineup that outscores opponents by 15 per 100 might be carried by a teammate, a weak bench it faces, or pure noise over 80 minutes. Raw on/off is a starting point, not a conclusion — it must be adjusted for who else is on the floor (the whole motivation for RAPM in 2.5).
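
For reference, raw on/off is just the difference between two net ratings. The sketch below assumes the play-by-play has already been split into stints; the `Stint` structure and the numbers in it are hypothetical, not a real feed format.

```python
from dataclasses import dataclass

@dataclass
class Stint:
    player_on: bool   # was the player of interest on the floor?
    team_pts: int
    opp_pts: int
    possessions: int  # assumed equal for both teams within the stint

def net_rating(stints: list) -> float:
    """Point differential per 100 possessions over a set of stints."""
    margin = sum(s.team_pts - s.opp_pts for s in stints)
    poss = sum(s.possessions for s in stints)
    return 100.0 * margin / poss if poss else float("nan")

def raw_on_off(stints: list) -> float:
    """Net rating with the player on the floor minus net rating with him off."""
    on = [s for s in stints if s.player_on]
    off = [s for s in stints if not s.player_on]
    return net_rating(on) - net_rating(off)

# Hypothetical stints, for illustration only.
stints = [
    Stint(True, 30, 24, 25),
    Stint(False, 18, 22, 20),
    Stint(True, 26, 25, 25),
    Stint(False, 21, 19, 20),
]
on_off = raw_on_off(stints)  # +14.0 on, -5.0 off, so +19.0
```

Nothing in this calculation knows who the other nine players were, which is exactly the gap an adjusted method has to fill.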

Common analytical traps

Per-game illusion: ignoring pace.
Small-sample shooting: over-reading a few good or bad nights from three-point range.
Outcome bias: calling a good process bad because the shots didn't fall.
Counting-stat worship: rewarding volume (points, rebounds) without efficiency or context.

1.3 Data Types and Data Collection Systems

Everything downstream — every metric, model and report — is constrained by the data underneath it. Knowing the layers of basketball data, and their blind spots, is the single most useful piece of literacy in the field.

The four layers of basketball data

1 · Box-score data

The traditional line: points, rebounds, assists, steals, FG%. Universal and free, but context-poor — it can't tell a contested shot from an open one, or a hockey assist from a kick-out.

2 · Play-by-play (PBP) data

Every game event in sequence with score, clock and players on court. The basis for lineup analysis, on/off, pace and possession-level ratings. The workhorse of public analytics.

3 · Tracking data

The position of all 10 players and the ball, sampled ~25–60 times per second. Captures the off-ball game — spacing, screens, closeouts, defensive matchups, shot contests.

4 · Biometric / load data

From wearables and force plates: workload, jump load, fatigue, sleep. The fuel for sports-science and availability decisions.

Play-by-play and the possession

PBP data is where most public analysis lives. Each row is an event — a shot, foul, rebound, substitution — with a timestamp, the score, and (in good feeds) the five players on the floor for each team. From this you reconstruct possessions, the atomic unit of the modern game, and compute offensive/defensive ratings, pace, lineup splits and on/off. It is rich enough for most metrics but blind to anything off the ball: it records that a shot was missed, not that a perfect closeout forced it.

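Estimating possessions from the box score

POSS ≈ FGA − OREB + TOV + 0.44 × FTA

FGA is field-goal attempts, OREB offensive rebounds, TOV turnovers and FTA free-throw attempts. The 0.44 weight approximates the share of free-throw attempts that end a possession: most trips are two shots, while and-ones and technical free throws use up no extra possession.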
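
The box-score estimate translates directly into a small function. This is a sketch of the formula as written above; the function name and the sample team line are made up for illustration.

```python
def estimate_possessions(fga: float, oreb: float, tov: float, fta: float,
                         ft_weight: float = 0.44) -> float:
    """Box-score possession estimate: FGA - OREB + TOV + 0.44 * FTA."""
    return fga - oreb + tov + ft_weight * fta

# Hypothetical team line from a single game.
poss = estimate_possessions(fga=88, oreb=10, tov=13, fta=22)  # about 100.7
```

Keeping `ft_weight` as a parameter makes it easy to test how sensitive a rating is to the free-throw assumption.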

Tracking data and the off-ball game

Tracking data turns basketball into geometry. With every player's coordinates dozens of times a second, you can measure how open a shooter was, how far the nearest defender stood, how a screen created separation, who actually guarded whom, and how much rim deterrence a center provides. It powers the most advanced models in the sport — shot quality, defensive matchup data, off-ball gravity and spacing analysis.
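
As a small illustration of the geometry, the distance to the nearest defender at the moment of a shot takes only a few lines once you have coordinates. The frame below is hypothetical: real tracking feeds have their own schemas, coordinate origins and units, so treat the `(x, y)` tuples in feet as an assumption.

```python
import math

def nearest_defender_distance(shooter, defenders):
    """Distance from the shooter to the closest defender, in the input units."""
    return min(math.dist(shooter, d) for d in defenders)

# Hypothetical single frame: (x, y) court coordinates in feet.
shooter = (22.0, 3.0)
defenders = [(18.0, 6.0), (10.0, 10.0), (5.0, 20.0), (15.0, 25.0), (4.0, 4.0)]

gap = nearest_defender_distance(shooter, defenders)  # 5.0 feet
```

Shot-quality models build on features like this one, alongside shot location, shot type and defender movement.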

The NBA's collection history is the reference timeline for the field:

SportVU (STATS) introduced optical tracking in 2009, going league-wide by the 2013–14 season — the first major US league to track every game.
Second Spectrum became the official provider from 2017 to 2023, using a "center of mass" single-point model and adding the AI/visualization layer.
Since the 2023–24 season, Sony's Hawk-Eye system provides raw 3D optical tracking at 60 fps with 29 pose points per player; Sportradar distributes the data and Second Spectrum still supplies AI analysis and broadcast augmentation. The WNBA uses Second Spectrum/Genius Sports tracking.

The data & tools landscape

For learners and pros alike, a layered ecosystem has grown up around this data. Basketball-Reference and the official stats.nba.com are the canonical public sources; the nba_api Python package wraps the NBA stats endpoints (including tracking-derived stats and shot locations). The pbpstats Python package and the PBP Stats site serve possession-level and lineup data; Cleaning the Glass offers garbage-time-filtered team and player data prized by analysts. On the scouting side, Synergy Sports (play-type breakdowns synced to video) and Hudl/InStat are the professional standards.

Source	Strength	Data type
stats.nba.com / nba_api	Official stats incl. tracking & shot locations	Box, PBP, tracking
Basketball-Reference	Best free historical box & advanced stats	Box, advanced
pbpstats / Cleaning the Glass	Possession, lineup & on/off data	PBP, lineups
Synergy Sports	Play-type efficiency synced to video	Tagged events + video
Hawk-Eye / Second Spectrum	Raw optical tracking (team-side)	Tracking

Key takeaway

Match the data type to the question. "How efficient was our offense?" → PBP and ratings. "Was that a good shot?" → tracking-based shot quality. "Who actually guarded the star?" → tracking matchup data. "Is the player overworked?" → biometrics. Most mistakes come from forcing one data layer to answer a question it cannot see.

1.4 A Day in the Life of a Professional Basketball Data Analyst

The job runs on a relentless schedule — an 82-game season means a game every other night and an opponent always looming. Most of the work is preparation for a film session and a game plan that have to land in hours, not weeks.

Below is a representative day for an advance scout / coaching analyst the day before a game.