New tool: ScatterPlotBot
tl;dr: I made a brand new tool for producing scatter plots.
Last week, I tried to dig into team-level experience earned per match to look for differences between winning and losing teams across different leagues. I produced a large number of histograms like the following one; at first they looked attractive and interesting, but they turned out to be baffling, inexplicable, and ultimately wrong:
I put a nice, subtle 'wrong' overlay here, so that no one gets misled.
To me, there appeared to be some sort of underlying multimodal distribution. However, I could not think of a compelling explanation for this in terms of actual gameplay. It turns out that doing so would be extremely difficult, since precisely no gameplay-related factors contributed to the hard-to-explain parts of that histogram.
What was it then?
It became clear when I plotted the data in a different way (as it often does).
Here is a scatter plot of xp/min vs. game length for similar data:
Well, that certainly looks a bit wacky. The apparent discontinuities immediately suggest that something strange is afoot. Here, I would expect to see some sort of amorphous blob of games; the hard horizontal lines are ugly and indicative of trouble. There is no gameplay reason for huge jumps in xp/min as matches cross certain thresholds of game length.
Given the specific way I am using Riot's data API, the experience data turns out to be a bit tough to get. Without downloading the full timeline for each match, experience earned is only reported in 5- and 10-minute increments. So, for certain game lengths, the last 5 or 10 minutes of experience earned is silently dropped, and the data I collected is therefore not useful for this type of analysis.
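To see why increment-based reporting produces jumps instead of a smooth blob, here is a toy model in Python. It is a sketch under loud assumptions, not a description of Riot's data format: the true xp/min is a constant (400, an arbitrary number), only complete increments ("buckets") of experience are reported, and the reported total is divided by the full game length. The function names are hypothetical.

```python
import math


def reported_xp(true_rate: float, game_minutes: float, bucket: float = 10.0) -> float:
    """XP that survives when only complete buckets are reported (hypothetical model)."""
    complete_buckets = math.floor(game_minutes / bucket)
    # Any trailing partial bucket of the game is dropped entirely.
    return true_rate * complete_buckets * bucket


def apparent_xp_per_min(true_rate: float, game_minutes: float, bucket: float = 10.0) -> float:
    """Reported XP divided by the full game length, so the dropped tail drags the rate down."""
    return reported_xp(true_rate, game_minutes, bucket) / game_minutes


if __name__ == "__main__":
    # Assumed constant true rate; any positive value shows the same sawtooth shape.
    for minutes in (19.5, 20.0, 29.5, 30.0, 35.0):
        print(f"{minutes:5.1f} min -> {apparent_xp_per_min(400.0, minutes):6.1f} xp/min")
```

Running it, the apparent rate sags to roughly half the true rate just before the 20-minute mark, snaps back to 400 at exactly 20 minutes, and sags again until 30. Passing `bucket=5.0` models the 5-minute case. Under this model, nothing about how the game was played changes; only the crossing of a bucket boundary does.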
So, I had to redact the second half of last week's post. No big deal.
More interesting than these specific technical details is how I produced that scatter plot, so I want to discuss that now.
The scatter plot above was produced with a brand new tool I built this week called ScatterPlotBot. I put the source code up on GitHub for you to check out, if that is your kind of thing. It is merely a quick start with a couple of good ideas, but I look forward to developing it further and using it for all kinds of fun scatter plots in future analyses.
The plotting tools I have traditionally used can fail when given very large amounts of data. The experience and gold data from last week's post was drawn from nearly two million matches, and with several data points per match, the number of points per plot goes well into the millions. Many general-purpose tools are not geared for multi-million-point datasets, so I saw an opportunity to develop my own.
The main ideas behind ScatterPlotBot are:
- Handle arbitrarily large datasets. Through clever projections, the program can handle huge datasets without doing too much work.
- Easy to configure and use in repeatable ways. ScatterPlotBot is a Windows console app that consumes data in easy-to-produce XML/line-oriented file formats. Automating ScatterPlotBot runs is a breeze.
- Do not plot points that are not there. Many scatter plot tools draw a small dot at each data point. These dots can overlap and blend in an attractive way, but I have shaded towards plotting only single pixels. That way, I can choose to plot only points that were actually found in the dataset.
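The projection and single-pixel ideas above can be illustrated with a minimal Python sketch. I am not claiming this is how ScatterPlotBot implements them; it assumes a hypothetical input of one `x,y` pair per line and axis ranges that are known before reading. Each point is projected straight onto a fixed grid of pixel hit counts as it streams in, so memory depends on the image size rather than the number of points, and only pixels that received at least one point are drawn. The output is a plain-text PBM image; `project` and `write_pbm` are hypothetical names.

```python
import io
from typing import Iterable, List, TextIO, Tuple


def project(lines: Iterable[str], width: int, height: int,
            x_range: Tuple[float, float], y_range: Tuple[float, float]) -> List[List[int]]:
    """Stream hypothetical 'x,y' lines into a height x width grid of hit counts."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        x_str, y_str = line.split(",")
        x, y = float(x_str), float(y_str)
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the plotted window: skip rather than clamp
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[height - 1 - row][col] += 1  # flip rows so larger y is drawn higher
    return grid


def write_pbm(grid: List[List[int]], out: TextIO) -> None:
    """Write a plain PBM (P1) image: 1 (black) only where at least one point landed."""
    height, width = len(grid), len(grid[0])
    out.write(f"P1\n{width} {height}\n")
    for row in grid:
        out.write(" ".join("1" if count > 0 else "0" for count in row) + "\n")


if __name__ == "__main__":
    data = io.StringIO("0,0\n10,10\n5,5\n5,5\n")
    grid = project(data, width=4, height=4, x_range=(0, 10), y_range=(0, 10))
    buf = io.StringIO()
    write_pbm(grid, buf)
    print(buf.getvalue())
```

Keeping counts rather than plain on/off flags costs little and leaves room for density shading later. Either way, a 1000 × 1000 grid is a million cells whether the input has ten thousand points or ten million.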
If you are a programmer, analyst, or other interested party, head over to GitHub and check out the code. Otherwise, just chill, and look forward to many insightful and enlightening scatter plots in future LeagueMath.com posts.
So, yes. I spent the time I would otherwise have spent on League of Legends math analysis building a new tool for making attractive scatter plots from millions of data points. Doing so helped me uncover a funny quirk in Riot's match data reporting that hosed my analysis from last week. I redacted a huge swath of last week's post and put ScatterPlotBot on GitHub.
So, it is time to go home, scroll down a bit, and dig into some earlier posts to actually learn something about League of Legends math. As ever, get at me on Twitter to discuss any ideas you have about this or any post. Last but not least, do not forget the RSS feed, which is the most reliable way to be notified about upcoming articles.
Cayley's theorem says that every group is isomorphic to a subgroup of a symmetric group, so in a sense, a complete understanding of symmetric groups would give one a complete understanding of all groups. When I go to sleep, I dream of the symmetries in a League of Legends match and see the hazy outlines of a mirage of such a complete understanding of League and math. Until we get there, we will have to sate ourselves with mere LeagueMath.com.
No F# expressions of higher-dimensional data projection were harmed in the making of this article.