Don’t be a Facebook! Data strategy versus the single point of failure

Even if you don’t use Facebook, WhatsApp, or Instagram, you’ll most likely know that all three services were utterly unavailable worldwide recently, raising questions about the organization’s data strategy. You would think that a global corporation the size of Facebook would not allow a single point of failure to slip through its data strategy. Facebook’s explanation for the outage was intriguing: they said it was due to a “misconfigured server” that impacted all three services. Furthermore, according to reports, the “misconfigured server” could not be managed remotely from Facebook HQ, something else that a data strategy or a health check could have identified. So technicians had to travel to the site, adding to the delay in restoring all three services. As if that was not enough, the company’s services went down again in a separate incident due to a “configuration change”. In other words, a single point of failure.

If this root cause analysis is accurate, it suggests that Facebook’s technical architecture is more fragile than anyone would have guessed. How could this happen? It happens more often than we like to think, unfortunately. Can data ever be free from our values?

At the time of writing, we have had almost two years of disruption because governments neglected to plan properly for pandemics, on the assumption that they were unlikely. The outcome is a single point of failure, which has led to deaths and economic destruction. It may seem implausible: who would introduce a single point of failure into a system that an estimated two billion people access at least once every day? Sure, there would likely be a range of technical failovers and protections in place to deal with points of failure. However, this outage revealed that Facebook has single points of failure in its network and that it does not have a reliable plan to recover from these issues quickly or without in-person intervention.
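To make the idea concrete, here is a minimal sketch of how you might look for single points of failure in a dependency map. Everything in it is an assumption for illustration: the component names (`users`, `edge_a`, `edge_b`, `config_server`, `app`) are hypothetical and do not describe Facebook’s real architecture. The approach is simply to remove each component in turn and check whether users can still reach the service.

```python
from collections import deque


def reachable(graph, start, goal, removed):
    """Breadth-first search from start to goal, pretending `removed` is down."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, start, goal):
    """Return every intermediate component whose loss cuts start off from goal."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    candidates = nodes - {start, goal}
    return sorted(n for n in candidates if not reachable(graph, start, goal, n))


# Hypothetical topology: two redundant edge servers, one shared config server.
example = {
    "users": ["edge_a", "edge_b"],
    "edge_a": ["config_server"],
    "edge_b": ["config_server"],
    "config_server": ["app"],
}

print(single_points_of_failure(example, "users", "app"))
```

In this made-up topology the two edge servers back each other up, so losing either one is survivable, but every path runs through the one shared configuration server. Redundancy in one layer does not help if the next layer has none.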

To put it simply, if that server goes down, Facebook stops. Why is this the case?

Our brains do not like considering negatives.

People often skim over the likelihood of events, and they do not always consider the long-term reputational, financial, legal, or even physical risks. As part of the data world, we at Data Relish look for single points of failure all the time. We also strive to encourage our customers to have an excellent plan in place. Also, being a cloud-first company, we do not have to visit sites to resolve issues.

Points of failure or points of complexity?

In our experience, many businesses operate in silos, with data separated by division or function. Points of failure are introduced when humans try to bunch the silos together in spreadsheets. Every day, many companies spend hours sifting through data and information to find something meaningful amongst silos of data. If robust data architectures managed the data and knowledge appropriately in the first place, it would save time and reduce points of failure.

Do you own your data, or does your data own you?

According to Domo, each of us is now producing 1.7MB of data every second. So each hour you are working, that’s 6.12GB. Of course, it’s not all going to land on your computer, but you can see how the data accumulates: a day’s worth of your emails will be several gigabytes.
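As a quick check on those numbers, the short sketch below shows what a rate of 1.7 MB per second adds up to per minute, per hour, and per day. It assumes decimal units (1 GB = 1,000 MB), and the function and constant names are ours, chosen for illustration.

```python
MB_PER_SECOND = 1.7  # per-person figure attributed to Domo in the text
MB_PER_GB = 1000     # assumption: decimal units, 1 GB = 1,000 MB


def volume_mb(seconds):
    """Data produced in the given number of seconds, in megabytes."""
    return MB_PER_SECOND * seconds


per_minute_mb = volume_mb(60)
per_hour_gb = volume_mb(60 * 60) / MB_PER_GB
per_day_gb = volume_mb(24 * 60 * 60) / MB_PER_GB

print(f"Per minute: {per_minute_mb:.2f} MB")  # 102.00 MB
print(f"Per hour:   {per_hour_gb:.2f} GB")    # 6.12 GB
print(f"Per day:    {per_day_gb:.2f} GB")     # 146.88 GB
```

At this rate, 6.12 GB corresponds to one hour of data, and a full day comes to roughly 147 GB per person.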

How does your business manage that amount of data, and how do you reduce single points of failure while connecting the dots correctly? Do you rely on searching for the correlated material when the need arises? Do you simply pour data into a data lake, but refuse to look at your data’s reflection in the data lake because data puddles are easier to manage?

Human brains do not like thinking about scenarios that might go wrong. Whenever we are in business meetings and someone says, “what if”, the response is often that it is “never gonna happen”. It’s the brain’s way of attempting to refocus our thoughts away from the negatives. Our brains are designed to protect us, but this bias diverts businesses away from recognizing their single points of failure.

What’s your business’s single point of failure?

Every business has a single point of failure. What’s yours? The recent Facebook example shows that it can often be something so simple that you just missed seeing it. It is comforting, if perhaps misguided, to think that “it will never happen to us”. As with Facebook, it could come down to the arrogance of thinking that we are experts in our area, so nothing can have escaped us.

Do you need a data strategy? Get in touch!

We care about the ‘what ifs’, regardless of how unlikely people think they might be. This thinking is different from what we saw at Facebook, where the assumption seems to have been that servers of that size don’t break down, and that a company full of technical wizards can look after the unlikely outages as well as the likely ones. While that may be true, sometimes you need a diverse lens on your activities, and our data strategy will help you consider different angles.

Realistically, a single point of failure in your data estate is unlikely to be anything complicated. Instead, it is going to be something straightforward that you will have missed, unless you stop to think about it. At Data Relish, our data strategy and our Healthcheck will cover these topics. So why not get in touch, and we can help you uncover your single points of failure before they catch you out?
