One of the big challenges that any brand faces is how to leverage insights from the huge amounts of data they now typically have at their disposal. The promise of big data is, of course, compelling; McKinsey, for example, has estimated that a retailer using big data can potentially increase its margin by more than 60%.
A great deal is therefore invested in the ability of data scientists to seek out this value, but how well equipped are they to deliver it? Research suggests that generating a return on the big data investment is easier said than done. A survey by Capgemini among senior business decision makers found that only 27% considered their data initiatives to be successful. In another study, Gartner predicted that 60% of data projects will fail to go beyond piloting and experimentation and will be abandoned. The reality is that we have our subjective fingerprints all over data analytics.
Before we discuss why, it is worth saying something about why we cannot simply rely on statistics to deliver the insight. Those working with data have long recognised that, given a large enough sample size, most variables will show statistically significant correlations with one another. This has famously been labelled the ‘Crud Factor’ – thinking there are real relationships in the data where, in fact, the linkage is trivial.
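To make the point concrete, here is a minimal sketch of how a fixed, trivially small correlation becomes ‘significant’ purely through sample size. It assumes the standard Fisher z approximation for testing a correlation coefficient; the function name and the example value of r = 0.02 are illustrative choices, not figures from any of the studies mentioned.

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a sample correlation r from n
    observations, using the Fisher z transformation (an assumption of
    this sketch; other tests would give similar answers for large n)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# The same negligible correlation, tested at ever larger sample sizes
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  r = 0.02  p = {correlation_p_value(0.02, n):.4g}")
```

The correlation explains a negligible share of the variance at every sample size; only the p-value changes.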
Nate Silver made the same point when he warned that the number of ‘meaningful relationships’ is not increasing in step with the meteoric increase in the amount of data available. We simply generate a larger number of false positives, an issue endemic in data analytics, which led John Ioannidis to suggest that two-thirds of the findings in medical journals were in fact not robust. Ioannidis has recently suggested that this is “more likely to happen in fields that chase subtle, complex phenomena” – which perhaps encapsulates much of the modern marketing agenda.
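A rough back-of-the-envelope sketch shows why false positives multiply as data grows. It assumes, purely for illustration, that every pair of variables is genuinely unrelated and that each pairwise correlation is tested at the conventional 5% level; the function names are hypothetical.

```python
def pairwise_tests(n_variables: int) -> int:
    """Number of distinct variable pairs that could be correlated."""
    return n_variables * (n_variables - 1) // 2

def expected_false_positives(n_variables: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' pairs when no real relationship
    exists anywhere in the data (the assumption of this sketch)."""
    return alpha * pairwise_tests(n_variables)

for n_variables in (10, 100, 1_000):
    print(n_variables, pairwise_tests(n_variables),
          expected_false_positives(n_variables))
```

The number of candidate relationships grows roughly with the square of the number of variables, so the pool of spurious ‘findings’ grows far faster than the data set itself.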
So if we cannot necessarily rely on statistical techniques to cut through swathes of data to find meaningful patterns, then where do we turn? Naturally, we look to ourselves. But as soon as we recognise the human involvement in the analysis of data, we need to start exploring some of the frailties of our judgement. For if there is one thing that behavioural economics has taught us, it is that none of us is immune from misinterpreting data.
One cognitive function that is surely critical for any data scientist is the ability to find order and spot patterns in data. As humans, we are excellent at doing this, for which there are good evolutionary reasons – our ability to do so is what drives new findings and thus our advancement. But as with all cognitive functions, our strength is also a weakness, as this ability is so integral to ourselves that it can tip over into detecting patterns when in fact none exist.
One of the problems we encounter when looking at data is that it’s very hard for us to tell when it is random and when there are in fact patterns. When we see a truly random sequence, we tend to think there are patterns in the data because it somehow looks too ordered or ‘lumpy’. So, for example, when we flip a coin twenty times, there is roughly a 50% chance of getting four heads in a row, a 25% chance of five in a row, and about a 12% chance of a run of six. But if you give such a sequence to most individuals, they will conclude that there are patterns in the data and that it is not at all random. This explains the ‘hot hand’ fallacy where we think we are on a winning streak – in whatever that may be – from cards to basketball to football. In each of these areas the data is random, but happens to include a sequence whose importance we massively over-interpret.
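The coin-flip figures can be checked exactly rather than taken on trust. The sketch below assumes a fair coin and counts runs of heads only; it tracks the probability of each current streak length while no qualifying run has yet occurred. The function name is an illustrative choice.

```python
def prob_run_at_least(n_flips: int, run_length: int) -> float:
    """Exact probability of at least one run of `run_length` consecutive
    heads in `n_flips` tosses of a fair coin."""
    # state[j] = probability that the current streak of heads is j long
    # and no run of the target length has happened so far
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        for streak, p in enumerate(state):
            new_state[0] += p * 0.5              # tails resets the streak
            if streak + 1 < run_length:
                new_state[streak + 1] += p * 0.5  # heads extends the streak
            # otherwise the run is completed and leaves the 'no run' states
        state = new_state
    return 1.0 - sum(state)

for k in (4, 5, 6):
    print(f"run of {k}+ heads in 20 flips: {prob_run_at_least(20, k):.3f}")
```

For twenty flips this gives approximately 0.478, 0.250 and 0.122 for runs of four, five and six heads respectively, so streaks that feel remarkable are in fact routine.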
This effect applies not only to numeric data but also to the analysis of visuals. This is important, as visualisation is rapidly becoming a key element of big data analytics. A good example of the pitfalls comes from the latter part of World War II, when Germany mounted a particularly intense bombing campaign against London. It was a commonly held view at the time that the bombs were landing in clusters that made some parts of London more dangerous than others. However, after the war, analysis of the data showed that the bombs had, in fact, landed at random and no part of London was more dangerous than another.
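One way to see how randomness produces apparent clusters is to scatter points uniformly over a grid and compare the tally of hits per cell with what a Poisson distribution predicts. This is only a sketch of the general approach: the grid size, the number of bombs and the function names below are illustrative assumptions, not data from the article.

```python
import math
import random
from collections import Counter

def simulate_hits(n_cells: int, n_bombs: int, seed: int = 0) -> dict:
    """Drop n_bombs uniformly at random onto n_cells and return a tally
    {hits: number of cells that received exactly that many hits}."""
    rng = random.Random(seed)
    hits_per_cell = Counter(rng.randrange(n_cells) for _ in range(n_bombs))
    tally = Counter(hits_per_cell.values())
    tally[0] = n_cells - len(hits_per_cell)  # cells never hit
    return dict(tally)

def poisson_expected(n_cells: int, n_bombs: int, k: int) -> float:
    """Expected number of cells with exactly k hits under pure chance."""
    lam = n_bombs / n_cells
    return n_cells * math.exp(-lam) * lam ** k / math.factorial(k)

# Illustrative parameters (assumed for this sketch)
N_CELLS, N_BOMBS = 576, 537
observed = simulate_hits(N_CELLS, N_BOMBS)
for k in sorted(observed):
    print(k, observed[k], round(poisson_expected(N_CELLS, N_BOMBS, k), 1))
```

Even with purely random placement, some cells are hit several times while many are untouched, which is exactly the kind of ‘cluster’ the eye reads as a pattern.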
It is easy to see why Londoners had, at the time, concluded there was a pattern in the bombing, as eyeballing the data retrospectively easily allows one to start seeing patterns. A more rigorous approach, of course, requires us to generate hypotheses that are then tested on other sets of data.
And of course once we start seeing patterns we quickly begin to create stories that would explain the data. As Duncan Watts might say, everything is obvious once you know the answer. We hate uncertainty and strive to reduce it by quickly adopting explanations. The challenge we then face is that we tend to seek out only information that is consistent with the story we have developed. This ‘confirmation bias’ was first identified in a series of experiments in the 1960s which showed that we seek out data that confirms our theory rather than data that tests it. And when we get new information we tend to interpret it in a way that is self-serving. So we cement our misinterpretation of data alarmingly quickly.
So where does this leave the challenge of finding value from big data? First, we must get over the idea that ‘data science’ is free from subjectivity; frankly no discipline is. Second, once we have recognised this, we can teach analysts to spot when they are making subjective judgements and show them ways to mitigate the potential pitfalls. Third, we can train analysts to examine the reasoning behind the way they view the world, so that they have a greater awareness of how their mindsets shape their approach to their discipline. Once we are aware of the way we see the world (which is not always explicit), we can then choose whether it is one we want to use in the way we look at data.
It is easy to ignore these challenges, as we all want to assume that the way we approach data is highly objective. But unless we address these issues head on, we run a very real risk of failing to capitalise on the much-promised opportunity that big data still needs to deliver.