The Art of Data Science: Chapter Seventeen
Featuring diophantine equations, coxcombs and Jorge Valdano!
Hello and welcome to yet another edition of the Art of Data Science newsletter. The aim is to bring it back to the beginning-of-month schedule, following last month's inordinate delay. And I plan to do it in a phased manner.
While I was in Bangalore in May, I offered my first beta session of my "data diagnostic" program. The aim of the exercise is to help a company that is not yet data-driven to understand how they can use data to improve their business, and "incorporate AI" into their everyday business. This one-day workshop is ideally suited for businesses that collect lots of data but don't yet use it in their day-to-day decision-making. If this interests you, do get in touch!
At the moment, I'm offering one "beta session" of this workshop for a UK-based business. So if you, or someone you know, want to benefit from it, do let me know!
On to the usual programming now.
Diophantine equations!
Those of you who did non-standard math in high school might remember these equations, which are polynomial equations in multiple variables (the linear ones are the usual classroom examples) where we are only concerned with integer solutions. These equations are interesting because there exist many elegant ways to solve them, all of them using one or another interesting idea from number theory (primality, congruences, divisibility, etc.). And now it turns out Diophantine equations are useful in fraud detection as well!
The Economist has the full story (possible paywall), about a group of researchers at the University of Illinois who have used Diophantine equations to list all possible (integer) data sets given the mean, standard deviation and number of data points. This can be incredibly useful in testing for research fraud, since the algo can be used to check whether a reported dataset could exist at all: if no set of integers matches the published summary statistics, something is wrong with the numbers. Given that replicability is a huge problem in empirical research, this method to verify that a dataset is not made up can have immense value!
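To make the idea concrete, here is a minimal brute-force sketch. It is not the researchers' algorithm, only an illustration of the underlying constraint. For n integer responses, the mean fixes the sum, and the sample variance then fixes the sum of squares, so we are looking for integer solutions to two equations at once. The function name, the 1-to-7 response scale and the reported figures below are all hypothetical, and the sketch assumes the statistics are exact rather than rounded.

```python
from itertools import combinations_with_replacement


def candidate_datasets(n, total, sum_sq, lo, hi):
    """Return every sorted n-tuple of integers in [lo, hi] whose sum is
    `total` and whose sum of squares is `sum_sq` (brute-force search)."""
    return [
        xs
        for xs in combinations_with_replacement(range(lo, hi + 1), n)
        if sum(xs) == total and sum(x * x for x in xs) == sum_sq
    ]


# Hypothetical report: 5 responses on a 1-7 scale, mean 4, sample variance 3.
n, mean, variance = 5, 4, 3
total = n * mean                                  # sum of the responses
# Sample variance s^2 = (sum_sq - total^2 / n) / (n - 1), solved for sum_sq.
# Integer division is exact here because total^2 is a multiple of n.
sum_sq = (n - 1) * variance + total * total // n

solutions = candidate_datasets(n, total, sum_sq, 1, 7)
for xs in solutions:
    print(xs)
```

For these made-up figures the search finds exactly two datasets, (1, 4, 5, 5, 5) and (3, 3, 3, 4, 7). An empty list would mean that no integer dataset on that scale can produce the reported statistics. Brute force only works for tiny samples, which is presumably where the number theory earns its keep.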
Who said elegant maths has no practical applications?
Anti stats
One of the delights of the just-concluded football World Cup is that former Argentina great Jorge Valdano (he of the "shit hanging from a stick" fame) started writing a column for the Guardian. It is unclear if he will continue writing after the World Cup, but his articles during the tournament were rather thought-provoking.
In one piece that is relevant to this newsletter, Valdano talks about the problems with statistics in football. "Of course the data helps but in the world of play, like in art, we have to put a limit on it because these are realms of freedom," he writes.
Valdano might come across as a Luddite in this article, but there is significant insight there. For starters, football is a low-scoring game, which can make it rather random. Decisions made with small margins can have a huge impact. We saw in Sunday's final, for example, how a marginal penalty decision completely turned the game. All conventional metrics, such as possession, passes and shots, were in Croatia's favour, but they lost by a comprehensive-looking 4-2.
Valdano also raises the point that "unqualified" metrics might lead to the wrong kind of insight - an observation that isn't limited to football alone. He writes:
Javier Mascherano was the player who touched the ball more times in a single game than anyone else in the group phase – more than 140 touches against Iceland. Did Mascherano play well? That’s another story. Because the players with the second- and third-most touches in that game were Argentina’s two centre-backs, a sign that the passes were routine, inoffensive, without purpose. They didn’t threaten the opponent, they didn’t break through lines.
But analysing football through numbers rather than letters seems to comfort specialists, as if it offered incontestable evidence and thus certainty. There are players who are unmarked but keep running, presumably so that the kilometre count doesn’t make them look bad.
We are moving to a world where we are finding metrics to measure everything, and putting a number to things gives an air of precision. However, sometimes these metrics lead us away from the real story, and thus only serve to mislead.
The solution is to take everything with the proverbial pinch of salt. When presented with numbers, always put them in context. Always normalise. Look for the patterns. Metrics devised for one context may not travel well to another.
AI Conferences
Kaiser Fung, whom we have encountered multiple times in the past on this newsletter, attended an "AI day" conference, and has a few interesting insights from there. None of them is particularly surprising to me: a lot of people equate "AI" with deep learning, everyone has their own definition of AI, and bias in AI is a common topic at such conferences.
Read the whole post. It is full of nice insights, and links.
On a related note, I wrote a short post about why AI will always be biased.
Cap Gemini on AI
You might be forgiven for thinking I love reading reports by consultants on Artificial Intelligence and Machine Learning, for I quote them so often in this newsletter. This time, it is by Cap Gemini, which has a report out on "The secret to winning customers' hearts with artificial intelligence". To their immense credit, they follow what some of us call the "reverse Bollywood format", by putting their main conclusion up front. As the sub-heading of the Cap Gemini report says, "add human intelligence".
The report specifically deals with how companies can improve their customer service using artificial intelligence. As is common with such reports, it is full of interesting data, and horrible graphics (which sometimes make it hard to digest the said data).
In the middle of all this, one graph (Figure 10, on page 13 of the report) caught my eye. It is about how companies decide on where to deploy AI in customer interaction. At the top of the list are "RoI", "cost effectiveness" and "availability of data". Response to customer inputs, and impact on customer experience are towards the bottom of the list.
This graph caught my eye because, based on my general observations, it is similar to how companies in general try to approach the whole analytics / machine learning / data science thing. Rather than starting with the business problem and pain points, they focus on what can be done, and what they have the data for. In the latter, they're akin to the drunk searching for lost keys under the lamppost.
Instead, what can add value is a hypothesis-led approach, starting with a business problem (I have an atrocious-looking presentation I made on the topic several years ago). Sometimes the data may not be available, but you can rest assured that the solution will add value.
Coming back to Cap Gemini, their main recommendation has value outside customer service as well. A combination of human and artificial intelligence can add significantly greater value than either human or artificial intelligence alone!
Chart of the edition - from Florence Nightingale
While Florence Nightingale is rather well known for her contributions to nursing during the Crimean War (aside: I can't think about the Crimean War without Iron Maiden's The Trooper playing in my head), she also made significant contributions to statistics during the war.
As this article details, she was responsible for painstaking data collection during the war, which helped diagnose the main cause of deaths. And apart from meticulously logging data in notebooks, she also made graphics - by hand. Her "coxcomb" (basically a bar chart wrapped around a circle, with the size of each wedge showing the value) may not pass modern tests of visualisation effectiveness, but we still see versions of the chart in contemporary reports. Still, for her day and age, when few people collected data and statistics wasn't yet a "thing", her work is truly commendable.
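For anyone tempted to draw a coxcomb themselves, the one piece of geometry that is easy to get wrong is the wedge radius. In a polar area chart of this kind every wedge spans the same angle and the value is encoded by the wedge's area, so the radius has to grow with the square root of the value, not with the value itself. Below is a minimal sketch of that calculation. The function name and the monthly counts are hypothetical, not Nightingale's figures.

```python
import math


def wedge_radii(values, scale=1.0):
    """Radius of each equal-angle wedge such that wedge area == scale * value.

    A circular sector with angle `theta` and radius `r` has area
    0.5 * theta * r**2, so r = sqrt(2 * scale * value / theta).
    """
    theta = 2 * math.pi / len(values)
    return [math.sqrt(2 * scale * v / theta) for v in values]


# Hypothetical monthly counts, purely for illustration.
monthly_counts = [100, 400, 900, 400]
radii = wedge_radii(monthly_counts)

# Quadrupling the value only doubles the radius.
print(radii[1] / radii[0])
```

Using the raw values as radii would make a month with four times the deaths look sixteen times as large, which is one reason charts like this are easy to misread.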
You can read more about Nightingale's graphics work here.
So that's about it for this edition. Do keep the feedback coming. And if you think someone you know might benefit from this newsletter, please share it with them!
Cheers
Karthik Shashidhar
Director
Bespoke Data Insights Limited
http://bespokedatainsights.com