As of December 18, 2019, the dorukakan.com football site, available at dorukakan.com/football/football.php, is live.
Over the past several years, I have had several aborted attempts at creating a website. I’ve dabbled in creating personal portals, sketched out numerous designs on paper, and built a few simple utilities for activities like tracking book collections and rating whisky. It was never really my ambition to create a finished product. Instead, the process alone was sufficient: a pleasant diversion to maintain basic programming skills and explore a particular topic of interest.
Over those years, I have also had several aborted attempts at creating this website. It was first conceived when I pondered what a table of the Istanbul clubs would look like if it included only the matches among the three. And so began many years of searching for data, then scraping and parsing, parsing and scraping, with no real end in sight. But, eventually, toward the beginning of the 2018/19 season, I was able to build enough momentum to move from basic prototyping to the goal of a production-ready site. In many ways, I got lucky in finding the data in the particular format I found it in – and that fortuity was just the impetus I needed to complete this initiative.
What then is this website? It is a collection of various statistics and information, metrics that are out of the ordinary, an exploration into the interesting trends of football data. It is, hopefully, a beginning as I plan on exploring more avenues in the data. What other ways can we evaluate goal-scorers? Is there anything predictive about these metrics or do they operate only as gems of curiosity (nothing wrong with that)? The original purpose and, now that the site is live, the enduring one is all about pondering a question and then diving into the data to see if answers are within.
The basic counting statistics are here, but so is out-of-the-ordinary information like top goalscorers against the best six teams of the league or the number of games with multiple dismissals. The data are found in three broad categories: tables, team statistics, and individual statistics. (Note: all tables currently assume three points per win, so there may be some historical discrepancy with eras that awarded two points for a victory.) I have also developed a straightforward metric called the separation score, which measures the dominance of a champion, or the ease with which it prevailed, over the course of the season.
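To make the idea of a restricted table concrete, here is a minimal sketch in Python. It is not the site's actual PHP code. The function name `build_table`, the tuple layout for a match, the tie-break order (points, then goal difference, then goals scored), and the sample results for teams "A" to "D" are all assumptions invented for illustration. It shows how a table can be limited to matches among a chosen set of clubs, and how the points awarded for a win can be switched between three and two.

```python
from collections import defaultdict


def build_table(matches, teams=None, points_per_win=3):
    """Build a table from (home, away, home_goals, away_goals) tuples.

    If `teams` is given, only matches in which BOTH sides belong to it count.
    Returns a list of (team, row) pairs, best team first.
    """
    rows = defaultdict(lambda: {"played": 0, "won": 0, "drawn": 0,
                                "lost": 0, "gf": 0, "ga": 0, "points": 0})
    for home, away, home_goals, away_goals in matches:
        if teams is not None and not (home in teams and away in teams):
            continue  # skip matches involving a club outside the chosen set
        # Update each side once, from its own point of view.
        for team, gf, ga in ((home, home_goals, away_goals),
                             (away, away_goals, home_goals)):
            row = rows[team]
            row["played"] += 1
            row["gf"] += gf
            row["ga"] += ga
            if gf > ga:
                row["won"] += 1
                row["points"] += points_per_win
            elif gf == ga:
                row["drawn"] += 1
                row["points"] += 1
            else:
                row["lost"] += 1
    # Assumed tie-break: points, then goal difference, then goals scored.
    return sorted(
        rows.items(),
        key=lambda item: (item[1]["points"],
                          item[1]["gf"] - item[1]["ga"],
                          item[1]["gf"]),
        reverse=True,
    )


# Invented sample results, purely for illustration.
matches = [
    ("A", "B", 2, 1),
    ("B", "C", 0, 0),
    ("C", "A", 1, 3),
    ("A", "D", 0, 1),
    ("D", "B", 2, 2),
]

mini_league = build_table(matches, teams={"A", "B", "C"})
two_point_era = build_table(matches, teams={"A", "B", "C"}, points_per_win=2)

for team, row in mini_league:
    print(team, row["played"], row["points"])
```

Passing `points_per_win=2` is one way the historical discrepancy noted above could be handled for older seasons. Real competitions use different tie-break rules, so the sort key would need adjusting per league.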
As things stand, the website is navigated by league and season, and the data is segregated by league. I may explore more cross-league metrics in the future, though this requires an additional layer of data alignment. Metrics and information can be viewed for a particular season, across all seasons for a league, and for the top seasons for a league. In many metrics, the user can view further details on matches or events.
The data is sourced from various websites, which I have done my best to collate and transform before combining. The usual challenges with data formats and consistent values apply here so, at times, the data integrity may be suspect. Currently, the data covers the following:
- Premier League – 1992/93 to 2018/19
- Spanish La Liga – 2010/11 to 2018/19
- Turkish Super Lig – 1959 to 2018/19
- German Bundesliga – 1963/64 to 2018/19
- UEFA Champions League – 2010/11 to 2018/19
- UEFA Europa League – 2012/13 to 2018/19
- FA Cup – 2004/05 to 2018/19
- Spanish Copa Del Rey – 2010/11 to 2018/19
Here are some nuances in the data:
- Currently, all data sets end at 2018/19. I am not collecting 2019/20 data though I plan to do so at the end of the season.
- Some Turkish and German data may be in Turkish, such as country names (eventually I will convert them to English).
- The Turkish and German data do not include stoppage time or captain information.
- Premier League data includes captain information from 2009/10 on. It has stoppage time information from 2003/04 on.
- German data has assist information from 2013/14 on. Turkish data has it from 2011/12 on.
Here are some items that are on my enhancements list (beyond developing new metrics). I anticipate a fairly slow development cycle as updates will be infrequent and ad-hoc.
- Add location/city data for all Turkish and German venues.
- I have collected data for the remaining seasons of the Champions League and Europa League. At some point I need to transform it and load it.
- I have identified a source for Italian Serie A data. However, scraping and cleaning it will be time-consuming.
- I have referee data and hope to create some metrics or tables using it.
- I have missed penalty data for the Turkish and German leagues but unfortunately not for the other leagues. I may try to manually collect the information for the Premier League. I’m keen to explore different trends in penalties, including whether there are certain biases in how they are awarded.
- I also have manager data for the Turkish and German leagues but unfortunately not for the other leagues. There are possibilities for exploring, say, head-to-head manager rivalries.
- Behind-the-scenes refactoring. It’s pretty obvious that the programming for the site is fairly basic, using PHP. I don’t plan on doing anything so dramatic as migrating to another platform, but I do have a few items of code cleanup, performance improvements, and configuration management that hopefully will make the experience better.
—- —-
UPDATES: February 2021
I’ve made some major updates to the site in the past couple of months.
- I removed all leagues except for the Premier League, Bundesliga, and Turkish Super Lig. All of the other leagues had incomplete data sets.
- Changed the tab structure
- Added a new tab for Charts and created three new charts
- Created a new category – Miscellaneous Statistics
- Creation of a couple dozen new metrics across categories
- Fixed data issues in Bundesliga and Super Lig data sets
- Fixed a dozen or so defects
UPDATES: August 2020
Over the past year or so, I’ve made a few enhancements to the site. Some recent updates include:
- Addition of English Premier League 2019/20 data
- Creation of 10-15 new metrics for teams, individuals, and duos
- Fixes of several issues affecting the All Seasons and Best Season view of metrics
- Addition of volatility metrics and a position progress chart
- Addition of a team-by-team results table
—- —-
If you have stumbled on this website, I hope you enjoy it and find it interesting. If you have questions, comments, or feedback, whether it’s ideas on interesting metrics, discovery of incorrect data, or anything else, please feel free to reach out to me at doruk@dorukakan.com. I am also happy to share my data sets with serious inquirers.