Survivorship Bias and the Hidden Lies in Your Data
What does data indicate?
Data is now often described as the most valuable commodity in the world, having taken the crown from oil. As a result, companies are investing heavily in technology and data science.
How to handle that data then becomes a challenge, not only for data scientists, who must analyse it objectively, but also for customers, who worry about their privacy. In other words, the rise of data is a double-edged sword. It can help businesses optimise their services and target problems more accurately, but it can also become a tool of manipulation that misleads both customers and the company itself.
One of the biggest pitfalls for a data scientist is survivorship bias: drawing conclusions only from the cases that made it through some selection process while overlooking those that did not. When I was working in the Fast-Moving Consumer Goods (FMCG) industry, I found that brand managers and category managers were selective about the data they used to present brand performance. It was wrong, of course, but at the time they thought they were doing the right thing. This is how bias works: it takes hold of people without them realising it.
The best-known example is a study of aircraft conducted during WWII. The United States armed forces faced a dilemma: bombers were returning from missions riddled with bullet holes, and the military needed better ways to protect them.
The military knew the planes needed armour, but the question was where to put it. When they plotted the damage these planes had taken, it was spread out but largely concentrated around the tail, body and wings.
But Abraham Wald, a statistician at the Statistical Research Group (SRG), made a crucial observation: the military would be making a terrible mistake by adding armour to these sections of the plane. Why? Because the military was only looking at the damage on planes that had returned. It hadn't factored in the damage on planes that didn't return.
The planes that didn't return were the ones hit where the returning planes showed little damage: the engines. Unlike the body, tail and wings, the engine was extremely vulnerable. Once hit there, a plane went down and never made it home to have its damage charted.
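Wald's reasoning can be reproduced with a toy simulation. The sketch below is only an illustration under assumptions of my own: the four sections, the three hits per plane and the loss probabilities are hypothetical numbers, not wartime data. Every section is hit equally often, yet the engine appears under-represented once only the returning planes are counted.

```python
import random
from collections import Counter

SECTIONS = ["engine", "fuselage", "wings", "tail"]

# Hypothetical chance that a single hit to a section brings the plane down.
# These values are assumptions chosen for illustration only.
LOSS_PROB = {"engine": 0.6, "fuselage": 0.05, "wings": 0.05, "tail": 0.05}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return hit counts for all planes, for returned planes, and the number returned."""
    rng = random.Random(seed)
    all_hits = Counter()
    returned_hits = Counter()
    n_returned = 0
    for _ in range(n_planes):
        # Each hit lands on a uniformly random section.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > LOSS_PROB[s] for s in hits)
        all_hits.update(hits)
        if survived:
            returned_hits.update(hits)
            n_returned += 1
    return all_hits, returned_hits, n_returned


def share(counter, section):
    """Fraction of the recorded hits that landed on the given section."""
    total = sum(counter.values())
    return counter[section] / total if total else 0.0


all_hits, returned_hits, n_returned = simulate()
print(f"planes returned: {n_returned}")
for s in SECTIONS:
    print(f"{s:9s} all planes: {share(all_hits, s):.1%}   "
          f"returned only: {share(returned_hits, s):.1%}")
```

The "returned only" column is all the military could see. The "all planes" column is what Wald asked them to imagine.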
So, how relevant is survivorship bias to business? The most straightforward example is entrepreneurship. Many young people with good ideas look up to Amazon, Facebook and Tesla. They believe they can achieve the same success if they turn their ideas into a real business. However, most new businesses do not survive their first decade. The success stories we are told are the rare cases. Warren Buffett illustrated this trap with a coin-flipping parable. His version, in short:
Imagine 225 million Americans each staking a dollar and calling a coin flip every morning, with the losers dropping out and their stakes passing to the winners. After ten days, roughly 220,000 have called ten flips in a row. After twenty days, 215 of them have turned a dollar into a million. By then they are writing books about their coin-flipping technique and being invited to speak at seminars. Now imagine the same exercise with 225 million orangutans. You would still end up with 215 smug winners. The 215 are real. The skill is not. That is survivorship bias.
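The figures in the parable are just repeated halving, which a few lines of Python can confirm. This is a minimal sketch; the function names are my own, and the stake calculation assumes each survivor's dollar doubles with every winning flip.

```python
def expected_survivors(population: int, flips: int) -> float:
    """Expected number of people who call `flips` fair coin tosses in a row."""
    return population / 2 ** flips


def stake_after(flips: int, initial_stake: int = 1) -> int:
    """Value of a stake that doubles on every winning flip (an assumption)."""
    return initial_stake * 2 ** flips


population = 225_000_000
after_10 = expected_survivors(population, 10)
after_20 = expected_survivors(population, 20)

print(f"after 10 flips: {after_10:,.0f} survivors")
print(f"after 20 flips: {after_20:,.0f} survivors")
print(f"stake after 20 flips: ${stake_after(20):,}")
```

No skill enters the calculation anywhere, yet the number of winners matches the parable.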
Survivorship bias frequently shows up in data analysis as well. For example, companies value customer feedback, and rightly so, since success is tied to customer relationships. When businesses get negative feedback, they are eager to dig in and figure out what went wrong. But feedback comes only from the most vocal customers. Everyone else will either give the company another chance or just leave.
Consider this: most unhappy customers don't say anything. They just leave quietly. The small fraction who do complain are far more likely to stay when their problems are heard. Therefore, rather than focusing only on the customers who complain, look at the behaviour of your happiest customers as well.
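One practical way to see the silent group is to compare the feedback data against the full customer list before drawing conclusions from it. The sketch below is illustrative only: `coverage_report` is a hypothetical helper, and the customer IDs are made-up toy values rather than real data.

```python
def coverage_report(cohort_ids, observed_ids):
    """Compare the full cohort with the subset that actually appears in the data."""
    cohort = set(cohort_ids)
    observed = set(observed_ids) & cohort  # ignore IDs outside the cohort
    missing = cohort - observed
    return {
        "cohort_size": len(cohort),
        "observed": len(observed),
        "missing": len(missing),
        "coverage": len(observed) / len(cohort) if cohort else 0.0,
        "missing_ids": sorted(missing),
    }


# Toy example (hypothetical IDs): ten customers bought the product,
# but only six of them ever left feedback.
all_customers = range(1, 11)
gave_feedback = [1, 2, 4, 7, 8, 10]

report = coverage_report(all_customers, gave_feedback)
print(f"coverage: {report['coverage']:.0%}")
print(f"never heard from: {report['missing_ids']}")
```

Any conclusion drawn from the feedback alone describes only the observed group. The customers in `missing_ids` are the ones whose behaviour still needs to be checked some other way, for example through purchase or churn records.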
That being said, how do we avoid being duped by survivorship bias? Firstly, before making any judgment, my rule of thumb is to consider what you don't see. What are the limitations of what the data represents? What has been left out of the data and the analysis? You probably won't find the answers right away, but being aware that there is something you don't know is always a good start.
Secondly, we have to embrace openness and transparency, and accept the fact that both good and bad things can happen in the business. So ask your team members for their opinions and perspectives, including those who disagree with you most of the time.
Thirdly, a team with diverse backgrounds can spark more valuable insights than a more traditional one. However, it is worth noting that without openness and transparency, companies can't benefit from diversity. They will only get conflict.
Last but not least, data scientists have to communicate with relevant stakeholders continuously. Data is more than numbers; it is numbers with meaning, produced by different activities and events. Thus, a good business analyst won't sit in front of the computer all day but will keep asking the right questions of the right people to build the right picture of the data.