Categories
data sport

Premier League Head-to-Head Records

I recently came across an article that explored the idea of the “nemesis” in football, i.e. the team against which a club performs the worst. It’s a concept that crosses the minds of many fans – in the context of existential angst, when you lose yet another game to your closest rival; in the context of hubristic triumph, when you consider a game won before it’s played because of past history; and in the context of casual fandom, when television producers flash up some obscure statistic of past head-to-head encounters.

At the same time I found that article, I was building a new head-to-head exploration feature for my football site (it’s under the Tables section). The functionality displays a matrix showing head-to-head results using points, goals, and results metrics. I’d already built a rivalries page with historical data on the biggest derbies and the ability to explore past results for any match-up (also under the Tables section). So the idea of head-to-head match-ups was something I’d looked at casually before and the article helped me to think about other angles to explore in the data.

In this article, I’ll show each Premier League side’s favorite opponent and most truculent adversary using average points per game, as it’s the measure I’ve personally used to think about performance on a per-game basis. I’ve limited the analysis to match-ups where teams have played each other at least ten times, which amounts to at least five seasons of Premier League football together. That’s useful as it removes the perfect records that top teams have against minnows who have barely survived in the top flight, and it allows for more interesting results. The tool on my site does not have any such limit, so fans of Bournemouth, for example, can mosey on over there for more details.
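To make the calculation concrete, here is a minimal sketch of how average points per game could be computed for each match-up, with the ten-meeting cut-off applied. It is not the code behind my site: the tuple layout for a match and the names `head_to_head_ppg` and `best_and_worst` are assumptions made for illustration.

```python
from collections import defaultdict

MIN_MEETINGS = 10  # cut-off described in the text: at least ten meetings


def points(goals_for, goals_against):
    """League points earned from one match (3 for a win, 1 for a draw)."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def head_to_head_ppg(matches, min_meetings=MIN_MEETINGS):
    """Average points per game for every (team, opponent) pair.

    `matches` is an iterable of (home, away, home_goals, away_goals)
    tuples; this layout is a hypothetical one chosen for the example.
    Pairs with fewer than `min_meetings` games are dropped.
    """
    totals = defaultdict(lambda: [0, 0])  # (team, opponent) -> [points, games]
    for home, away, home_goals, away_goals in matches:
        # Record the match once from each side's point of view.
        for team, opp, gf, ga in (
            (home, away, home_goals, away_goals),
            (away, home, away_goals, home_goals),
        ):
            record = totals[(team, opp)]
            record[0] += points(gf, ga)
            record[1] += 1
    return {
        pair: pts / games
        for pair, (pts, games) in totals.items()
        if games >= min_meetings
    }


def best_and_worst(ppg, team):
    """Return (favorite opponent, nemesis) for `team`, or None if no data."""
    rows = {opp: value for (t, opp), value in ppg.items() if t == team}
    if not rows:
        return None
    return max(rows, key=rows.get), min(rows, key=rows.get)
```

Passing a lower `min_meetings` reproduces the unrestricted view, where a handful of games can produce a perfect record.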

Ranking the most exciting and dramatic Premier League seasons using data

Football seasons can live long in the memory for many reasons. Club supporters will rely on their own team’s successes or failures to measure each season. The broader football-watching public and the sport’s historians will craft narratives that can be revisited time and again, narratives that, true or not, become the measure of how a season is remembered as time softens the real memories of a football season long ago.

Over the past couple of years I’ve been collecting and collating a wide variety of football data. And I began to wonder: so much of data analysis in sports looks at individual and team performance, but can we also measure seasons? Can we use data, instead of rose-tinted memories and Wikipedia entries, to objectively identify which seasons were the most exciting and which ones were the most dramatic?

Of course, first we need to try to measure excitement. I decided to look at three general dimensions: the title race, the relegation battle, and everything else. The following sections spend a decent amount of time exploring different metrics which, perhaps, lend themselves to measuring the inherent drama of a season. Unless otherwise noted, each metric covers the second half of the season instead of the full season. I’ve made a simple assumption that seasons are dramatic and memorable more for what happens in the second half than for what happens over their full course.

It’s worth noting that, for me, the purpose of writing this article was as much about exploring the metrics and using them to uncover interesting trends from the past as it was about developing an actual ranking. I am by no means a mathematician or a statistician, and there will perhaps be a couple of areas where a data scientist may shake their head in bewilderment at certain methodologies. I certainly enjoyed diving into the metrics and learning more about Premier League history, especially the earlier years, and it is my fervent hope that some casual fan stumbles upon this article and finds some entertainment in it.

The Home Field Advantage in Soccer

After COVID-19 disrupted football, just as it disrupted all of society and civilization across the world, and enforced a pause of several months, the games have recently restarted, with the German Bundesliga the first of the major leagues to resume play. Games are played without fans, leading to a stark leveling of the home cooking used by teams to gain an advantage on their own territory. Playing at home confers a distinct advantage – everyone knows that. But the restart of football got me thinking about how much of an advantage there really is.

I draw upon a few data sets. I have match results for the Turkish Super Lig back to the 1959/60 season, the German Bundesliga from the 1963/64 season, the full set of Premier League fixtures starting in 1992/93, and Spanish La Liga and Champions League results dating back to 2010/11. I have omitted details on La Liga and the Champions League (UCL) because the data set is small – and some fields have odd values that make me doubt the data integrity – but I have included them in the full file linked at the end. I also have events data (goals, bookings, penalties, etc.) for those seasons. Some of the earlier seasons of the Turkish and German leagues have missing data for certain event types, so those seasons are omitted from the affected analyses.

First, I looked at the points advantage that home teams gain. After all, the goal of every club is to accumulate points over the course of a season. I use a simple metric: net points gained per home game, calculated by subtracting the average points per game for away teams from the average points per game for home teams. At a very high level, across all of the data sets I looked at, recent trends show that home field confers an advantage of between 0.4 and 0.6 points per game. What is evident – more so in the German and Turkish data because those data sets are larger – is that the home advantage has decreased over time. But it still exists and is significant.
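As a rough illustration of the metric, the sketch below computes net points per home game from a list of scorelines. The function name `net_home_points_per_game` and the `(home_goals, away_goals)` input layout are assumptions made for this example. So is the `win_points` parameter: it defaults to three points for a win, and how seasons played under older two-points-for-a-win rules should be treated is left open here.

```python
def net_home_points_per_game(results, win_points=3):
    """Average home points per game minus average away points per game.

    `results` is an iterable of (home_goals, away_goals) pairs, a
    hypothetical layout chosen for this example. `win_points` is an
    assumption: 3 by default, adjustable for older scoring rules.
    """
    home_points = 0
    away_points = 0
    games = 0
    for home_goals, away_goals in results:
        games += 1
        if home_goals > away_goals:
            home_points += win_points
        elif home_goals < away_goals:
            away_points += win_points
        else:
            # A draw is worth one point to each side.
            home_points += 1
            away_points += 1
    if games == 0:
        raise ValueError("no matches supplied")
    return home_points / games - away_points / games
```

A value of zero would mean home and away sides collect points at the same rate; running the function season by season gives the kind of trend described above.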