A FUTURE-FIT MODEL FOR IDENTITY CREATION: UNDERSTANDING FEDERATED LEARNING
Published on June 25, 2020
Cross-site tracking and device fingerprinting are concepts marketers know well, having shaped their arsenal of targeting tools for years. So much so that when Apple first revealed its Intelligent Tracking Prevention (ITP) feature in its macOS update in late 2017, advertisers met the news with furor.
As the first privacy-centric update to one of the world’s most popular internet browsers, ITP was notable as it offered an unprecedented “compromise between functionality and privacy”. Safari would keep cookies for sites regularly visited by users, but would delete the cookies left by advertisers and other forms of tracking services. While users could disable and enable ITP as they wished, a recent update to the browser dated March 2020 now sees third-party cookies blocked by default altogether.
Underscored by the ethos that “what happens on your iPhone, stays on your iPhone”, a movement to build a new privacy-centric internet infrastructure is well underway. With Mozilla Firefox and, most recently, Google Chrome (which accounts for a little over 60 percent of global browser market share) following in Safari’s footsteps, it’s clear that marketers need to make a long-overdue change. Rather than seeing this as a threat, marketers should willingly take up the challenge. After all, regulatory compliance is no joke and neither is the matter of customer trust.
While the introduction of new technologies may seem jarring (after all, there are so many already within the martech ecosystem), we’d say it’s warranted this time around. To meet the need for post-cookie personalisation methods, one approach that we wholeheartedly support is a machine learning technique known as federated learning.
What does machine learning have to do with it?
Before we jump into machine learning, here’s what you should keep in mind: federated learning relies on a privacy-centric approach to data aggregation. Rather than looking at what individuals are doing, federated learning helps marketers glean insights from a group of people.
Federated learning is a machine learning technique that builds models without users needing to share personally identifiable information. This, of course, is a plus, given the renewed attention surrounding the over-collection and under-securing of consumer data. An algorithm is trained across a decentralised network of devices: each device trains on its own data and shares only the resulting model updates, so the raw data remains in local storage and never leaves a user’s device.
Compare this to a centralised network, where data points are traditionally collected and uploaded to a central server to be aggregated and analysed. With swaths of data being collected every second, a centralised approach demands more data storage and exposes users and data controllers to greater risks. We’ve explained before that one of the core flaws of a centralised system is its central point of failure, which leaves such networks and databases far more vulnerable to security breaches while data is being transferred or stored.
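To make the decentralised training described above more concrete, here is a minimal sketch of federated averaging, one common way of combining on-device training results. It is an illustration under our own assumptions rather than any vendor’s implementation: the toy linear model, the simulated devices, and every function name are hypothetical.

```python
import random

def local_update(weights, data, lr=0.05, epochs=20):
    """Run a few gradient-descent steps on one device's private data.

    Fits y ~ w*x + b. Only the updated (w, b) pair leaves the device.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine device models, weighting each by its local sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_device_data(rng, n):
    """Hypothetical private data following y = 3x + 1 plus small noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

# Three simulated devices, each holding data the server never sees.
rng = random.Random(0)
devices = [make_device_data(rng, n) for n in (20, 35, 50)]

global_model = (0.0, 0.0)
for _ in range(30):  # communication rounds
    updates = [local_update(global_model, data) for data in devices]
    global_model = federated_average(updates, [len(d) for d in devices])
```

In this sketch the server only ever handles model parameters and sample counts. The raw (x, y) pairs stay inside each device’s list, which is the property that sets the approach apart from a centralised one.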
You might recall that we briefly touched on federated learning in a previous blog post that looked to provide a beginner’s overview of what it was. If you’ve yet to give that a read, we recommend that you do.
How is the industry responding?
To address growing concerns across the advertising ecosystem, Google has been working on its Privacy Sandbox initiative since 2019, exploring how ad targeting could feasibly take place within a new browser infrastructure that prioritises privacy-by-design. One method proposed under the Privacy Sandbox that leverages federated learning is known as FLoC, or federated learning of cohorts. What distinguishes this model is that targeting and audience profile creation can take place without advertisers collecting a user’s browsing history.
Each browser would leverage machine learning to develop a cluster of users (a flock) based on the sites that they visit. The metrics used to determine the grouping of flocks could be based on the URLs of these sites or other factors such as the content on their pages. Rather than seeing the web history itself, advertisers would only be able to see the FLoC key, representative of an audience profile. As a machine learning model, it would continue to train itself, updating the user’s FLoC key as they browse sites online.
The model, while promising, isn’t perfect. Federated learning works best when these groups are “sufficiently sized” so that they don’t reveal information that’s too personal. Google cites that FLoC groups should be in the thousands, thereby providing brands and advertisers with an optimal view of an audience segment to use for targeting efforts. At the same time, some privacy concerns remain: flocks could be used as user identifiers, resulting in a new form of tracking, and they could reveal potentially “sensitive” information, the definition of which can differ from one individual to the next.
Outside of the big tech industry, several marketing technology players are fast recognising the potential of federated learning, and we expect to see more firms adopting it over time.
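As a rough illustration of how a browser might turn a browsing history into a short cohort key on the device itself, the sketch below uses SimHash, a locality-sensitive hashing technique. This is our own assumption for the sake of example and not a description of how FLoC is specified; the function name, the 8-bit key length, and the sample domains are all hypothetical.

```python
import hashlib

def simhash(domains, bits=8):
    """Map a set of visited domains to a short cohort key.

    Each distinct domain votes +1 or -1 on every bit position, and the
    sign of the total decides that bit. Histories that share most of
    their domains share most of their votes, so they tend to land on
    nearby keys.
    """
    totals = [0] * bits
    for domain in set(domains):
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, t in enumerate(totals) if t > 0)

# Computed locally; only the resulting integer would be exposed.
history = ["news.example", "recipes.example", "travel.example"]
cohort_key = simhash(history)
```

The key is a single small integer, so many different histories collapse onto the same value. That loss of detail is deliberate: it is what would stop the key from standing in for the history itself.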
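The “sufficiently sized” requirement can be pictured as a simple threshold check: a cohort is only exposed once enough users share it. The sketch below is illustrative only. The threshold of 1,000 is an assumption standing in for “in the thousands”, and the function and variable names are hypothetical.

```python
from collections import Counter

# Assumed minimum; the source only says groups should be "in the thousands".
MIN_COHORT_SIZE = 1000

def exposable_cohorts(assignments, min_size=MIN_COHORT_SIZE):
    """Return the cohort keys shared by at least `min_size` users.

    `assignments` maps a user identifier to that user's cohort key.
    """
    counts = Counter(assignments.values())
    return {cohort for cohort, n in counts.items() if n >= min_size}
```

Cohorts that fall below the threshold would need to be merged with others or withheld, since a key shared by only a handful of people starts to behave like an individual identifier.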
Why does federated learning matter?
Ultimately, what federated learning offers is a clear proposition that proves to the advertising industry that it is in fact feasible to rely on first-party data, allowing advertisers to decrease their dependency on cross-site tracking and other tactics that uphold the usefulness of third-party data. With data that’s provided directly by users and with their full consent, marketers can ensure that their data-driven insights are being arrived at in a compliant manner. To ensure its usefulness, however, federated learning needs to be deployed at scale.
At Aqilliz, our approach is to allow brands to participate in a decentralised form of data collaboration, enabling them to host data nodes on our protocol. These nodes are essentially individual databases that mask data points with differential privacy, a statistical technique that adds carefully calibrated noise to a data set before it’s uploaded to a database, providing greater anonymisation while making it far harder to reverse-engineer any individual’s data.
Each data node provided by a data source contributor is then joined by a federated node which helps to ensure that data remains decentralised within each of these databases, while allowing analysis to be done across each node. The result? Audience profiles, enriched by first-party data points consensually provided by users to each data source contributor. These profiles can then be activated to inform consumer engagement strategies through targeting and personalisation.
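For readers unfamiliar with differential privacy, the sketch below shows one textbook form of it, the Laplace mechanism applied to a simple count. It is a generic illustration and not a description of our protocol: the function name, the sample records, and the chosen epsilon value are all hypothetical.

```python
import random

def private_count(records, predicate, epsilon, rng):
    """Return a count with Laplace noise added.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

# Hypothetical audience records held inside a single data node.
rng = random.Random(1)
records = [{"segment": "sports"}] * 40 + [{"segment": "travel"}] * 60
noisy = private_count(records, lambda r: r["segment"] == "sports", 1.0, rng)
```

Any single noisy answer is slightly off, but the error averages out across large audiences. Group-level insights stay usable while the presence of any one individual is obscured.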
Scaling the walled garden
While Google’s proposition is certainly a step in the right direction, it would be remiss not to acknowledge that the end of third-party cookies is no loss for Google or for any other tech giant responsible for today’s walled gardens. With the majority of the world’s population reliant upon their services, be it a social network, a search engine, or an e-commerce site, their unparalleled access to first-party data will ensure that they remain essential to data-driven marketing strategy in the future.
As the regulatory landscape continues to evolve, the industry needs to increasingly examine solutions that can help to satisfy the needs of regulators, privacy advocates, and conscious consumers alike. From our perspective, it’s clear that a sustainable way forward is an approach underpinned by collaboration—between brands and publishers who are in the position to enrich one another’s ecosystems.