TOWARDS A PRIVACY-PRESERVING AND DECENTRALISED METHOD OF DATA COLLABORATION
By Edison Lim, Head of Engineering, Aqilliz
Published on April 07, 2020
Digital transformation has brought about many advancements since the dawn of the internet. As governments and businesses moved from paper to pixels, they also took on the operational risks of going fully digital. Historically, some of the most notable data breaches were recorded from 2005 onwards. That year, the very first data breach of a major shoe retailer in the United States was reported, exposing over 1 million consumer records. The problem has only grown since then: in the first six months of 2019 alone, over 3,800 publicly disclosed breaches exposed 4.1 billion records.
With the many hundreds of thousands (or perhaps even millions) of data points shared with companies on a given day, data has certainly proven itself as the lifeblood of every industry as it powers the digital economy as we know it today. However, amid its abundant collection, use, and implementation, it’s clearly not without its risks.
In recent years, we’ve seen promising advancements in cryptography and distributed systems and we think that two core technologies—differential privacy and federated learning—could guide the industry in the right direction for the collaborative enrichment of customer profiles. We believe that privacy is a fundamental human right, and this right is reflected as a design philosophy in all our products.
In this blog post, we will unpack the mechanics of the Aqilliz protocol, which is designed to let organisations work together to build better customer profiles without sharing their data with other companies or third parties.
Our Approach
Imagine a hypothetical scenario where three organisations want to use the Aqilliz protocol to collaborate and enrich their customer profiles. Here’s how this situation would play out and which players would be involved:
Data Nodes and Data Sources: Organisations that want to collaborate on data sharing can join a network within Aqilliz’s protocol and host a data node, a database with a privacy-preserving layer that uses differential privacy to ensure the confidentiality of information. Each data node then procures data from the organisation’s original (“native”) data sources, which are linked to individuals.
Pre-Processing and Pseudonymisation: As part of a pre-processing phase, data will first be processed into an export-ready version. This involves pseudonymising data that identifies a specific individual.
For example, if data is stored natively as a birthday but the network only requires a generic age group, the birthday will be parsed into an approximate age band and prepared for processing. This pre-processing step ensures that personal data can no longer be attributed to a specific data subject without the use of additional information.
This pseudonymisation process may involve replacing names and other identifiers that are easily associated with individuals using encryption or cryptographic hashing, and tokenising other attributes, such as salary, into a more generic form that cannot expose an individual. This “exported” data is then synchronised to a data node, which remains in the control of the data owner.
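The pre-processing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Aqilliz implementation: the field names (`email`, `birthday`, `salary`), the band widths, the `SECRET_KEY` value, and all function names are assumptions made for the example. It uses a keyed hash (HMAC-SHA256) for identifiers, which is one common way of doing the cryptographic hashing mentioned above.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret held only by the data owner (an assumption for this sketch).
SECRET_KEY = b"data-owner-secret"


def pseudonymise_id(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def age_band(birthday: date, today: date, width: int = 10) -> str:
    """Generalise a birthday into an age band such as '30-39'."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    age = today.year - birthday.year - (0 if had_birthday else 1)
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def salary_band(salary: float, width: int = 25_000) -> str:
    """Generalise an exact salary into a coarse band."""
    lower = int(salary // width) * width
    return f"{lower}-{lower + width - 1}"


def export_record(record: dict, today: date) -> dict:
    """Build the export-ready version of one native record."""
    return {
        "subject": pseudonymise_id(record["email"]),
        "age_band": age_band(record["birthday"], today),
        "salary_band": salary_band(record["salary"]),
    }


native = {"email": "Alice@Example.com", "birthday": date(1988, 6, 15), "salary": 62_000}
exported = export_record(native, today=date(2020, 4, 7))
```

Here `exported` carries a 64-character hex token in place of the email address, plus the bands `30-39` and `50000-74999`. The exact birthday and salary never leave the native data source.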
However, there are still potential risks of re-identification despite this pre-processing phase. Personal data that has undergone pseudonymisation may still be attributed to a person when additional information (from a different data source, for example) is layered on top of the original data set. In 2008, researchers from the University of Texas at Austin were able to conduct statistical deanonymisation attacks against high-dimensional micro-data from the Netflix Prize dataset, which consisted of anonymous movie ratings contributed by over 500,000 Netflix subscribers. By linking the data with a separate IMDb dataset, the researchers were able to identify the Netflix records of individual users, revealing even more sensitive personal information.
Differential Privacy: To mitigate this risk, we added a privacy-preserving layer to our solution. This layer makes aggregate queries differentially private, so that query results cannot be traced back to an individual. It does so by adding noise to results and applying redaction thresholds, which together ensure user-level privacy in a group-level query. Data sent to the data node remains in the control of the data owners, and sensitive data is never shared with other parties, not even Aqilliz.
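To make the combination of noise and redaction thresholds concrete, here is a minimal sketch of a differentially private counting query. It is an illustration under stated assumptions rather than the protocol's actual mechanism: it uses the standard Laplace mechanism, assumes each individual contributes at most one record (so a count has sensitivity 1), and the `epsilon` and `threshold` values and the function names are hypothetical.

```python
import random
from typing import Callable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    records: list,
    predicate: Callable[[dict], bool],
    epsilon: float,
    threshold: int,
    rng: random.Random,
) -> Optional[int]:
    """Return a noisy count, or None if the group is too small to release."""
    true_count = sum(1 for r in records if predicate(r))
    # Sensitivity of a count is 1 (assumption: one record per individual),
    # so the Laplace scale is 1 / epsilon.
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    # Redaction threshold: applied to the noisy value, so it is pure
    # post-processing and does not weaken the privacy guarantee.
    if noisy < threshold:
        return None
    return max(0, round(noisy))


rng = random.Random(0)
records = [{"age_band": "30-39"}] * 1000 + [{"age_band": "80-89"}] * 2

large_group = private_count(records, lambda r: r["age_band"] == "30-39", 1.0, 20, rng)
small_group = private_count(records, lambda r: r["age_band"] == "80-89", 1.0, 20, rng)
```

In this example `large_group` comes back as a count close to 1,000, while `small_group` is redacted (`None`) because a group of two falls far below the threshold. A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy.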
Federation Nodes: Each data source contributor hosts a data node, and the federation node connects these nodes. The federation node ensures that data remains decentralised within the individual data nodes while still allowing analysis across them. It can execute queries across the data nodes and return results that can either be activated directly for engagement or used to generate insights.
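The relationship between data nodes and the federation node might look roughly like the sketch below. The class names, the `count` method, and the way results are combined are all assumptions for illustration; the post does not specify the protocol's actual interfaces. The point it shows is that raw records stay inside each `DataNode`, and the `FederationNode` only ever sees noisy, thresholded aggregates.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataNode:
    """Holds one organisation's exported records; only aggregates leave it."""
    name: str
    records: list
    epsilon: float = 1.0   # illustrative privacy parameter
    threshold: int = 20    # illustrative redaction threshold
    rng: random.Random = field(default_factory=random.Random)

    def count(self, predicate: Callable[[dict], bool]) -> Optional[int]:
        true_count = sum(1 for r in self.records if predicate(r))
        rate = self.epsilon  # Laplace scale is 1 / epsilon for a count
        noise = self.rng.expovariate(rate) - self.rng.expovariate(rate)
        noisy = true_count + noise
        return None if noisy < self.threshold else max(0, round(noisy))


@dataclass
class FederationNode:
    """Runs the same query on every data node and combines the answers."""
    nodes: list

    def count(self, predicate: Callable[[dict], bool]) -> dict:
        per_node = {node.name: node.count(predicate) for node in self.nodes}
        # Redacted answers (None) are left out of the combined total.
        total = sum(c for c in per_node.values() if c is not None)
        return {"per_node": per_node, "total": total}


def make_records(n: int) -> list:
    return [{"age_band": "30-39"} for _ in range(n)]


federation = FederationNode(nodes=[
    DataNode("org_a", make_records(500), rng=random.Random(1)),
    DataNode("org_b", make_records(300), rng=random.Random(2)),
    DataNode("org_c", make_records(3), rng=random.Random(3)),
])
result = federation.count(lambda r: r["age_band"] == "30-39")
```

Here `result["total"]` is close to 800, and the entry for `org_c` is redacted because its group of three is below the threshold. A real deployment would also need to handle overlapping customers across organisations and a shared privacy budget, which this sketch leaves out.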
Maximising Outcomes
As consumers increasingly engage different services across multiple platforms, customer data points become inevitably scattered across different companies. As a result, brands will find it increasingly difficult to maintain a consistent thread of engagement. The key here is to look for solutions that can map consistent customer journeys across multiple platforms.
Gradually, organisations will want to work with complementary platforms to enrich their customer profiles. For example, a ride-sharing company might find it useful to work with an airline so that it can better understand its customers’ travelling habits and build more relevant personalised services.
However, as consumers and regulators place more emphasis on privacy, organisations will have to lean towards privacy-preserving technology solutions to ensure that they are rising to that demand.
Privacy-preserving technologies can help to accelerate the development of relevant insights without infringing on the privacy of individuals. For example, Google has leveraged differential privacy techniques to build its COVID-19 Community Mobility Reports, which help consumers and public health officials make informed decisions about individual movements while abiding by privacy requirements.
We believe that the future is a collaborative one, and organisations will progressively want to work together to strive for better outcomes. To achieve that, today’s data infrastructure must be equipped with technology that will meet rising security and privacy challenges on the horizon, while ensuring that consumers benefit from personalisation of content.