A Fundamentals Handbook

Understanding Football Through Data

From the first event log to a tactical report a head coach can act on — a practical, end-to-end guide to how modern football is measured, modelled and explained.

For analysts · scouts · coaches · ~90-minute read · Python & visualization included · 4 parts · 20 chapters

Part 0 · Introduction

Why Football Analytics Matters

Football is a low-scoring, fluid, invasion game played by twenty-two people making thousands of decisions over ninety chaotic minutes. That combination makes it both the hardest team sport to quantify and the one where good measurement pays the largest dividend.

For most of the sport's history, the scoreboard was the only number that mattered, and it lied constantly. A team could dominate, hit the post three times, concede a deflected goal and "lose 1–0" — a result the raw scoreline records as a deserved defeat. Analytics exists to close the gap between what happened and what was likely to happen, so that clubs make decisions on the underlying process rather than on noisy, small-sample outcomes.

The reason this matters so much in football specifically is the goal. Goals are rare — roughly 2.7 per match across Europe's top leagues — which means a single fortunate or unfortunate event can swing a result, and a season is short enough (typically 34–38 league games) that luck does not fully wash out. A striker who scores 15 from 12 expected goals looks elite; the data tells you to expect regression. A pressing system that concedes few shots but loses on the day is, over time, very likely sound. Analytics is the discipline of separating signal (repeatable skill and tactics) from noise (variance you cannot bank on).
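The striker example can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that goals scored from a given xG total follow a Poisson distribution with that total as its mean. Real shots are individual events with different probabilities, so this is an approximation and not a model taken from this handbook.

```python
import math

def poisson_tail(k_min: int, lam: float) -> float:
    """P(X >= k_min) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_min))
    return 1.0 - cdf

# Assumption: a striker whose chances are worth 12 xG scores a
# Poisson-distributed number of goals with mean 12.
p_15_plus = poisson_tail(15, 12.0)
print(f"P(15+ goals from 12 xG through variance alone) = {p_15_plus:.2f}")
```

Under that assumption an entirely average finisher reaches 15 or more goals a little over one season in five, which is why a single season of overperformance is weak evidence of elite finishing.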

~2.7 goals per match in elite leagues, which is why variance dominates

3,400+ tagged events in a single modern match feed

25 positional samples per second (25 Hz) in optical tracking

From Moneyball to the modern recruitment department

The cultural reference point is baseball's Moneyball, but football's analytics revolution arrived later and looks different. Baseball is a sequence of discrete one-on-one events that decompose cleanly into individual statistics. Football is continuous and deeply interdependent: a full-back's overlap only "works" because of where the winger and the opposition's defensive line are standing. The field had to invent new tools — expected goals, possession-value models, tracking-data physics — precisely because borrowed baseball thinking did not transfer. Today, every elite club runs a data department, recruitment is routinely informed by models, and opposition analysis is a blend of video and numbers.

Who this handbook is for

Aspiring analysts

You will get the conceptual scaffolding and the practical Python/visualization workflow used in professional environments — enough to build a portfolio that gets you hired.

Scouts & recruiters

You will learn how metrics translate into player profiles, why context (league, role, minutes) is everything, and how to combine the eye test with the spreadsheet.

Coaches

You will see how raw feeds become a tactical report you can act on — opponent pressing triggers, set-piece tendencies, where to attack and where you are exposed.

The curious fan

You will never watch a match the same way again. xG, PPDA and pitch control will stop being broadcast jargon and start being lenses.

The promise & the limit

Data does not replace the coach's eye, the scout's instinct or the manager's feel for a dressing room. It disciplines them — it tells you when your intuition is fighting the evidence, sizes the sample you are reasoning from, and turns "I think" into "here is how strongly, and how confident we should be." The best practitioners are bilingual: fluent in football and in data, and humble about both.

How to read this handbook

The four parts build on each other. Part 1 establishes what data is and how it is collected. Part 2 covers the advanced metrics and models that turn data into meaning. Part 3 is the hands-on toolkit — workflow, Python, visualization and video. Part 4 ties it together with realistic, worked case studies, ending in a full tactical report. You can read linearly or jump via the sidebar; cross-references are linked throughout.

Part 1 · Foundations & Data Types

1.1 The Role of a Football Data Analyst

"Football data analyst" is not one job. It is a family of roles that share a skill set but point at very different decisions — from the recruitment meeting to the training-ground whiteboard to the betting trading desk.

At its core the role is translation. On one side sits messy, high-volume data; on the other sit football people — coaches, sporting directors, scouts — who need a clear answer to a concrete question. The analyst turns "we have 38 matches of event data" into "their right-back steps out to press and leaves space in behind; we should target it with early diagonals." The value is never the chart. It is the decision the chart changes.

The main flavours of the role

Role	Primary question	Typical employer
Recruitment / scouting analyst	Who should we sign, and what is fair value?	Clubs, agencies, data firms
Opposition analyst	How does the next opponent play, and how do we beat them?	First-team staff
Performance / post-match analyst	Did our process work, regardless of the result?	First-team staff
Set-piece analyst	Where are the marginal goals in dead-ball situations?	Clubs (a fast-growing specialism)
Data scientist / engineer	How do we build the models and pipelines everyone else uses?	Clubs, data vendors, betting
Trading / quant analyst	Is the market price wrong, and by how much?	Betting syndicates, bookmakers

What the job actually demands

Three competencies sit underneath all of these, and the best analysts are strong in each: football understanding (you must know why a stat means something tactically), technical skill (querying, cleaning and modelling data — typically in Python, SQL and a BI/visualization tool), and communication (a brilliant model that nobody acts on is worthless). A common beginner mistake is to over-index on the technical axis. In a club, a clear one-slide answer beats an elegant notebook every time.

Mindset

Think of yourself as a decision-support function, not a stats provider. Before any piece of work, ask: "Whose decision am I trying to improve, and what would change their mind?" If you cannot answer that, you are not ready to open the data.

1.2 Data Literacy and a Scientific Approach to Data

Data literacy is not about memorising metrics. It is the habit of asking where a number came from, what it can and cannot say, and how confident you should be — the scientific method applied to a football pitch.

The analytical loop

Question
Start with a football question, not a dataset. "Are we creating enough high-quality chances from open play?" is researchable. "Let's look at the data" is not.
Hypothesis
State what you expect and why. "I think our low chance volume is a build-up problem, not a finishing problem." A hypothesis you can be wrong about is the engine of good analysis.
Evidence
Gather the right data at the right grain. Match the metric to the question and be honest about sample size.
Test
Compare against a baseline — league average, the opponent, your own past form. A number without a reference point is meaningless.
Communicate
Deliver the answer in the language of the decision-maker, with the uncertainty attached.

Three habits that separate professionals from dashboards

Always ask about sample size. Football's low event rate means most single-match numbers are noise. One game of xG tells you almost nothing; ten games start to. Per-90 rates from 200 minutes of play are dangerous, because a single goal or shot moves the rate sharply. The professional instinct is to distrust extreme values from small samples and to favour rolling windows.
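A minimal sketch of both instincts, a minutes threshold on per-90 rates and a rolling window over match values, might look like this. The 900-minute cut-off, the function names and the xG figures are illustrative assumptions, not standards from any provider.

```python
def per_90(total, minutes, min_minutes=900.0):
    """Return a per-90 rate, or None when the sample is too small to trust.

    min_minutes is a hypothetical threshold; pick one that suits the metric.
    """
    if minutes < min_minutes:
        return None
    return total * 90.0 / minutes

def rolling_mean(values, window):
    """Mean of each trailing window of matches; incomplete windows are skipped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# 3 goals in 200 minutes would be 1.35 per 90 if taken at face value.
small_sample = per_90(3, 200)      # None: refuse to report it
solid_sample = per_90(12, 1800)    # 0.6 goals per 90

# Hypothetical match-by-match xG, smoothed over a 3-match window.
match_xg = [0.4, 2.1, 1.0, 0.7, 1.8, 1.2]
smoothed = rolling_mean(match_xg, window=3)
print(small_sample, solid_sample, [round(v, 2) for v in smoothed])
```

Returning None for thin samples forces whoever consumes the number to notice that it is missing, which is usually safer than showing an extreme rate with a footnote.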

Correlation, causation and confounders. Teams that press high tend to win more — but elite teams both press high and have better players, so the press is partly a marker of quality, not only a cause of it. A scientific analyst names the confounder before drawing a conclusion.

Distributions over averages. "Average shot distance: 18m" hides whether a team takes many close-range chances and a few hopeful long-rangers, or a steady stream of 18m efforts. The shape of the data usually carries the tactical story; the mean often erases it.
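The shot-distance example is easy to reproduce. Both lists below are invented so that they share an 18 m mean while describing very different shooting profiles; the 12 m "close range" cut-off is also an arbitrary choice for illustration.

```python
import statistics

# Hypothetical shot distances in metres for two teams.
mixed = [6, 7, 8, 8, 9, 10, 30, 32, 34, 36]         # close chances plus long-rangers
steady = [16, 17, 17, 18, 18, 18, 18, 19, 19, 20]   # a stream of roughly 18 m efforts

def profile(distances, close_range=12):
    """Summarise a shot-distance sample beyond its mean."""
    return {
        "mean": statistics.mean(distances),
        "median": statistics.median(distances),
        "close_share": sum(d <= close_range for d in distances) / len(distances),
    }

mixed_profile = profile(mixed)
steady_profile = profile(steady)
print(mixed_profile)
print(steady_profile)
```

The means match, but the median and the share of close-range shots separate the two teams immediately. A histogram or shot map would tell the same story visually.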

Common analytical traps

Survivorship bias — judging a recruitment model only on the players you signed. Outcome bias — calling a good process bad because it lost on the day. Cherry-picking the window — choosing the date range that flatters your point. Metric overfitting — inventing a stat that "explains" last season but predicts nothing about the next.

1.3 Data Types and Data Collection Systems

Everything downstream — every metric, model and report — is constrained by the data underneath it. Knowing the four major data types, and their blind spots, is the single most useful piece of literacy in the field.

The four layers of football data

1 · Box-score data

The traditional match summary: goals, shots, possession %, passes, cards. Cheap, universal, and almost free of context. Useful for a first glance, misleading on its own.

2 · Event data

Every on-the-ball action, time-stamped and located on the pitch: passes, shots, tackles, carries, with x, y coordinates and rich attributes. 3,400+ events per match. The workhorse of modern analysis.

3 · Tracking data

The position of all 22 players (and the ball) sampled ~25 times per second. Captures the 98% of the game that happens off the ball — shape, space, runs, pressing distances.

4 · Physical / GPS data

Distance covered, sprints, accelerations, high-speed running — from wearables in training and, via computer vision, from match broadcast. The fuel for load management and fitness.

Event data, in detail

Event data is what most analysts spend their day in. A provider's operators (or, increasingly, machine-vision systems with human verification) tag every on-the-ball action. A single shot event might carry: the player, the team, the minute and second, the x/y location, the body part, the play pattern (open play, corner, fast break), whether it was a "big chance," and — crucially for modelling — the position of every other player at the moment of the shot (a freeze-frame) in the richest feeds.

A single event record (simplified):

{ id: 5821, minute: 63, second: 14, team: "Home",
  player: "A. Rossi", type: "Shot", x: 88.5, y: 41.2,
  body_part: "Right", play_pattern: "From Corner",
  outcome: "Saved", xg: 0.137, freeze_frame: [...] }
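In Python, a record like this maps naturally onto a dictionary, and a match feed onto a list of them. The sketch below reuses the simplified record, leaves out the freeze-frame because its contents are elided, and adds two invented events so there is something to aggregate. The field names follow the simplified record and not any particular provider's schema.

```python
from collections import defaultdict

events = [
    # The simplified record from the text (freeze_frame omitted).
    {"id": 5821, "minute": 63, "second": 14, "team": "Home",
     "player": "A. Rossi", "type": "Shot", "x": 88.5, "y": 41.2,
     "body_part": "Right", "play_pattern": "From Corner",
     "outcome": "Saved", "xg": 0.137},
    # Two hypothetical events, invented for this example.
    {"id": 5822, "minute": 63, "second": 40, "team": "Away",
     "player": "Player B", "type": "Pass", "x": 12.0, "y": 35.0},
    {"id": 5830, "minute": 65, "second": 2, "team": "Home",
     "player": "Player C", "type": "Shot", "x": 80.0, "y": 30.0,
     "outcome": "Off Target", "xg": 0.08},
]

def shots_xg_by_team(events):
    """Sum xG over shot events, grouped by team."""
    totals = defaultdict(float)
    for event in events:
        if event["type"] == "Shot":
            totals[event["team"]] += event.get("xg", 0.0)
    return dict(totals)

xg_totals = shots_xg_by_team(events)
print(xg_totals)
```

Most event-data work is a variation on this pattern: filter by event type, group by team or player, then aggregate. In practice the same steps are usually written with pandas once the feed is loaded into a DataFrame.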

Its limitation is right there in the name: event data only records moments when the ball is touched. It is blind to the run a striker made that dragged a centre-back away, or the cover-shadow a midfielder cast to block a passing lane. For that, you need tracking.

Tracking data and the off-ball game

Tracking data turns football into geometry. With 22 player coordinates ten to twenty-five times a second you can compute distances between lines, the compactness of a block, how quickly a press collapses space, who is free, and which zones a team controls. It powers the most advanced models in the sport — pitch control, off-ball value, defensive line-height analysis. Two collection methods dominate:

Optical / in-stadium — fixed multi-camera rigs track every player. Highest accuracy, but requires installation and is club-specific.
Broadcast computer vision — companies such as SkillCorner extract positional and physical data straight from the TV feed, with no sensors required. This democratised tracking: you can now get running and speed numbers for opponents you will never install hardware for. The trade-off is occasional gaps (players off-camera) and slightly lower precision.
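To make "football as geometry" concrete, here is a minimal sketch computing line heights, the gap between lines and the dimensions of a block from a single frame. The frame is invented, the 105 x 68 m pitch and the coordinate orientation are assumptions, and real feeds carry player identifiers and thousands of frames per match.

```python
from statistics import mean

# One hypothetical tracking frame: outfield (x, y) positions in metres on an
# assumed 105 x 68 pitch, with this team defending the goal at x = 0.
frame = {
    "defenders": [(30.0, 10.0), (28.0, 26.0), (28.5, 42.0), (31.0, 58.0)],
    "midfielders": [(42.0, 20.0), (40.0, 34.0), (43.0, 48.0)],
    "forwards": [(55.0, 18.0), (58.0, 34.0), (54.0, 50.0)],
}

def line_height(players):
    """Average distance of a line from its own goal line."""
    return mean(x for x, _ in players)

def block_dimensions(frame):
    """Length (front to back) and width (side to side) of the outfield block."""
    xs = [x for line in frame.values() for x, _ in line]
    ys = [y for line in frame.values() for _, y in line]
    return max(xs) - min(xs), max(ys) - min(ys)

def_height = line_height(frame["defenders"])
mid_height = line_height(frame["midfielders"])
length, width = block_dimensions(frame)
print(f"defensive line at {def_height:.1f} m, "
      f"gap to midfield {mid_height - def_height:.1f} m, "
      f"block {length:.0f} m long by {width:.0f} m wide")
```

Running the same functions over every frame turns these snapshots into time series, which is how measures such as compactness under pressure or line height by game state are typically built.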

The provider landscape (2025–26)

The market has consolidated sharply. Hudl acquired Wyscout, then InStat, and most recently StatsBomb, bundling video scouting, event data and player-location data. Opta (Stats Perform) remains the data source behind much of the industry and broadcast coverage. SkillCorner leads broadcast-derived tracking and physical data. IMPECT, known for its packing data, was acquired by Catapult in late 2025. For learning and portfolios, FBref (free, Opta/Stats Perform-powered) and StatsBomb's free Open Data are the standard entry points.

Provider	Strength	Data type
Hudl StatsBomb	Deep event data + freeze-frames, analytics platform	Event + location
Opta / Stats Perform	Coverage, broadcast, established xG	Event
Hudl Wyscout	Largest video library, day-to-day scouting	Video + event
SkillCorner	Broadcast tracking & physical data	Tracking + physical
FBref (free)	Best free starting point for learners	Aggregated event

Key takeaway

Match the data type to the question. Asking "how good was that chance?" → event data with xG. Asking "why was nobody marking him?" → tracking data. Asking "are our players cooked by minute 70?" → physical/GPS. Most mistakes in football analytics come from forcing one data type to answer a question it cannot see.

1.4 A Day in the Life of a Professional Football Data Analyst

The job is far less glamorous and far more rhythmic than outsiders imagine. It runs on the fixture calendar, and most of it is preparation for a forty-five-minute conversation that has to land.

Below is a representative match-week day for a club opposition analyst two days before a fixture (often called "MD-2," matchday minus two).