April 11, 2024

Revisiting Cambridge Analytica in 2024

The Cambridge Analytica Data Scandal was big news when it first came to light. It led to the collapse of the company, court cases and massive fines for Meta. It also highlighted the massive impact, for better or worse, that technology was having on society, politics and the democratic process. Now, almost a decade later, we take a look at how a poorly configured API was at the center of the scandal.

The Cambridge Analytica Case

In 2010, developers created a Facebook app called “This is Your Digital Life,” which around 270,000 users took in order to generate personalized psychological profiles. In 2013, British consulting firm Cambridge Analytica used the app to harvest millions of users’ data for use in political campaigns.

Cambridge Analytica was able to do this thanks to Facebook’s Open Graph Platform. Unbeknownst to Facebook, the platform had an overly permissive API with a crucial flaw which allowed Cambridge Analytica to collect data on not only the people who had downloaded the app, but also their friends.
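The flaw described above can be illustrated with a toy model. The sketch below is not Facebook's actual API: the `User` class, both `fetch_profiles_*` functions, and the sample data are hypothetical names invented for illustration. It only shows how one consenting user can expose friends who never consented when an API treats a user's grant as covering their friend list.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user record; not modeled on any real platform schema."""
    user_id: str
    friends: list[str] = field(default_factory=list)
    consented_apps: set[str] = field(default_factory=set)


def fetch_profiles_permissive(app: str, users: dict[str, User]) -> set[str]:
    """Profiles visible when a user's consent also covers their friends."""
    visible: set[str] = set()
    for user in users.values():
        if app in user.consented_apps:
            visible.add(user.user_id)
            visible.update(user.friends)  # friends never agreed to this
    return visible


def fetch_profiles_scoped(app: str, users: dict[str, User]) -> set[str]:
    """Profiles visible when only the consenting users are exposed."""
    return {u.user_id for u in users.values() if app in u.consented_apps}


# Invented example data: one quiz taker with three friends.
users = {
    "alice": User("alice", friends=["bob", "carol", "dave"], consented_apps={"quiz"}),
    "bob": User("bob", friends=["alice"]),
    "carol": User("carol", friends=["alice"]),
    "dave": User("dave", friends=["alice"]),
}

permissive = fetch_profiles_permissive("quiz", users)
scoped = fetch_profiles_scoped("quiz", users)
print(f"permissive: {len(permissive)} profiles, scoped: {len(scoped)} profile(s)")
```

In this toy example one consenting user exposes four profiles. In the real incident, around 270,000 quiz takers were linked to data on tens of millions of accounts, so the amplification was far larger.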

When the breach finally came to light in 2016, the firm claimed to have only accessed around 30 million users’ information.

“Facebook believes the data of up to 87 million people was improperly shared with the political consultancy Cambridge Analytica - many more than previously disclosed.” -BBC News

Cambridge Analytica shared this data with the campaigns of Ted Cruz and Donald Trump, as well as (allegedly, but in all likelihood) Brexit misinformation campaigns.

The breach was discovered thanks to whistle-blower Christopher Wylie, who left Cambridge Analytica in 2014 and disclosed the breach in 2016. Professor David Carroll then took it a step further and challenged Cambridge Analytica in court, resulting in a criminal conviction and publicizing the issue. Without these sources, the public might never have learned about the abuse of their data.

The Cambridge Analytica case was one of the earliest known API data breaches, before much was known about APIs and the data access that they enable, much less about API security, and it went down in history for raising public awareness about data privacy. As usual though, not much attention was paid to the actual technology behind the breach.

A full technical and criminal accounting never took place. Cambridge Analytica filed for bankruptcy in 2018 without ever disclosing the full extent of the data extracted, a full list of its clients, or how those clients used the data.

Meta agreed to pay $725 million to settle a class-action lawsuit over its laissez-faire attitude toward data management, but no real changes have been made, and we may still be seeing the company abuse data today in 2024. We’ll get to that later in the post.

What did we learn from Cambridge Analytica?

Technical response

The Cambridge Analytica breach taught us that GraphQL makes it easy for developers and applications to access data. That same ease carries inherent risks around data availability: readily available schema explorers help anyone with access discover the data schema, then analyze and extract the data behind it. APIs also allow software to easily correlate data (in this case, the user profiles and quiz responses) at a massive scale.
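To make the schema-explorer point concrete, here is a minimal sketch of what such a tool does. The query uses GraphQL's standard introspection fields (`__schema`, `types`, `fields`). Everything else is an assumption for illustration: the `summarize_schema` helper and the `sample_response` data are hypothetical and do not describe Facebook's schema, and no network request is made.

```python
# Standard GraphQL introspection query: asks a server to describe its own types.
INTROSPECTION_QUERY = """
{
  __schema {
    types {
      name
      kind
      fields { name }
    }
  }
}
"""


def summarize_schema(result: dict) -> dict[str, list[str]]:
    """Map each application-defined object type to its field names.

    Skips scalars and GraphQL's built-in introspection types, whose
    names start with a double underscore.
    """
    summary: dict[str, list[str]] = {}
    for type_info in result["data"]["__schema"]["types"]:
        if type_info["kind"] != "OBJECT" or type_info["name"].startswith("__"):
            continue
        fields = type_info.get("fields") or []
        summary[type_info["name"]] = [f["name"] for f in fields]
    return summary


# Hypothetical response, shaped like what a server returns for the query above.
sample_response = {
    "data": {
        "__schema": {
            "types": [
                {"name": "User", "kind": "OBJECT",
                 "fields": [{"name": "id"}, {"name": "name"},
                            {"name": "friends"}, {"name": "likes"}]},
                {"name": "String", "kind": "SCALAR", "fields": None},
                {"name": "__Schema", "kind": "OBJECT",
                 "fields": [{"name": "types"}]},
            ]
        }
    }
}

schema_summary = summarize_schema(sample_response)
print(schema_summary)
```

A client that can run this query learns which fields exist before requesting any of them, which is why many teams restrict or disable introspection on production endpoints.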

GraphQL, an emerging API technology model that Facebook had been using and promoting since 2012, is still used today. In fact, its use has grown outside Facebook as well, and it became a top-three API architecture in 2023. Facebook is also far from the only company guilty of careless data handling in an era where foreign adversaries are more interested in data than ever.

Societal response

Now more than ever, there is increased emphasis on the value of data. It’s seen as the new “oil.” The Cambridge Analytica breach was one of the first to bring attention to the value of data privacy. However, the responses to the incident focused primarily on the companies, and not on their technologies.

The use of big data in political campaigns is more prevalent than ever, especially with a record number of elections happening in 2024. A large percentage of the world’s population will be heading to the polls this year, from the United States to India, Indonesia, the European Union, the United Kingdom, Bangladesh, Mexico, Pakistan, South Africa and many more.

Following the Cambridge Analytica breach, there was increased awareness of data privacy around the world. In 2016, the European Union adopted the General Data Protection Regulation (GDPR), which took effect in 2018 and carries huge fines for using data for any purpose other than the reason it was originally collected.

However, the GDPR only affects a small percentage of the world, and people in most areas are still vulnerable to having their data abused with little to no legal repercussions. And GraphQL is only continuing to grow, leading to additional abuse of data from Facebook.

Deja vu? Facebook and Netflix’s Data Sharing

In 2018, Facebook launched Facebook Watch, an original streaming service set to rival the likes of video giants Netflix and Hulu, with a multi-billion dollar budget to match. But almost immediately after its conception, the budget was slashed, and just a few years later in 2023, the site announced it would no longer produce content. Now, it is virtually obsolete. So what happened?

The paper trail of letters and complaints from Meta customers points to one simple conclusion: Meta crushed its own streaming service in order to make room for Netflix, one of its top-paying ad customers. Worse, it had allowed Netflix to gain access to private information such as user messages via secret API agreements.

According to the letters, before Facebook Watch was even conceptualized, Facebook and Netflix entered a series of “Extended API Agreements” in 2013. These agreements allegedly included an “Inbox API” agreement, through which Netflix could view user messages in exchange for data reports it would send back to Facebook (source).

The litigation included claims that Facebook shared user message data with Spotify, as well. Of course, Facebook denies all of this. In 2018, they announced that they would be encrypting messages. But as long as GraphQL grows, it’s not far-fetched to say your data could easily be compromised via API abuse.

Takeaways

Social media platforms…

Since its birth in the early 2000s, social media has had a profound impact on modern society, connecting people across the world and creating digital communities. Social media facilitates the spread of information and fosters innovation, with application platforms that reach hundreds of millions of people. From promoting social change to technical development, social media is a powerful tool in the right hands.

However, the benefits that social media brings come with real risks regarding data handling and safety. The Cambridge Analytica case is one of the most well-known, but it is far from the only or even the biggest breach in recent history. Other notable incidents, such as Myspace in 2016 and LinkedIn in 2021, emphasize the necessity for better cybersecurity across social media platforms.

Data Security

In order to maintain secure APIs, close attention must be paid to all data, especially the data that is being returned to the external users of an API, whether legitimately or illegitimately. However, to this day, there remains a big blind spot around APIs.

This blind spot is growing with the rise of cloud platforms and new API technologies that don’t enforce strong contracts around data exchange. Tracking the data that goes in and out of an API is particularly challenging because API logging is not enabled by default.

APIs can run on many compute platforms, all of which create different types of logs, with different levels of data visibility and different default log destinations. Tracking the data returned to external users requires logs that capture the response payload, which most systems do not record.
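One way to close part of this gap is to record what each response contains at the point where it leaves the API. The following is a minimal sketch using only the Python standard library. The `audit_response` decorator, the `SENSITIVE_FIELDS` set, and the `get_profile` handler are all hypothetical names, and a real deployment would hook into its own framework's middleware instead.

```python
import functools
import json
import logging

logger = logging.getLogger("api.audit")

# Assumption: field names this hypothetical API treats as sensitive.
SENSITIVE_FIELDS = {"email", "friends", "messages"}


def collect_field_names(payload, prefix=""):
    """Recursively collect dotted field names from a JSON-like payload."""
    names = set()
    if isinstance(payload, dict):
        for key, value in payload.items():
            path = f"{prefix}.{key}" if prefix else key
            names.add(path)
            names |= collect_field_names(value, path)
    elif isinstance(payload, list):
        for item in payload:
            names |= collect_field_names(item, prefix)
    return names


def audit_response(handler):
    """Log which fields a handler returned, and to which caller."""
    @functools.wraps(handler)
    def wrapper(caller_id, *args, **kwargs):
        payload = handler(caller_id, *args, **kwargs)
        fields = sorted(collect_field_names(payload))
        flagged = [f for f in fields if f.split(".")[-1] in SENSITIVE_FIELDS]
        logger.info(json.dumps({
            "caller": caller_id,
            "handler": handler.__name__,
            "fields": fields,
            "sensitive": flagged,
        }))
        return payload
    return wrapper


@audit_response
def get_profile(caller_id, user_id):
    """Hypothetical handler returning a profile with nested friend data."""
    return {
        "id": user_id,
        "name": "Example User",
        "friends": [{"id": "u2", "email": "friend@example.com"}],
    }


logging.basicConfig(level=logging.INFO)
profile = get_profile("app-123", "u1")
```

The sketch logs field names, not values, so the audit trail shows that a caller received `friends.email` without copying the address itself into yet another data store.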

Society and regulation

Unfortunately, there remain minimal regulatory barriers preventing this type of data abuse and collection. GDPR is one, but it applies to less than 10% of the world’s population, leaving most others vulnerable to data breaches and exploitation.

All these issues about data safety bring up the following questions: are large platforms disclosing all the information that the public needs to assess their data handling processes? And what do we, as humans, think about the role of data and data manipulation in the upcoming elections?

A decade later, Cambridge Analytica is still throwing up as many questions as answers.