The Role of Data Clean Rooms in the World of Advertising

Ad tech never stagnates; there are always new technologies, paradigms, and strategies to explore. While the concept of a data clean room isn’t new, it’s gaining traction as the world grows increasingly concerned about privacy and user-level tracking. In this article, we’ll explore what a data clean room is, how it works, and, most importantly, how it is used in advertising.

First, what exactly is a data clean room?

In short, a data clean room is a database with enhanced security controls that govern data access. The idea is that you can merge two (or more) datasets without revealing personally identifiable information (PII).

A Venn diagram is labeled "Sunglass" on the left and "Cablevision" on the right. Arrows drawn to the CableVision circle say "client list" and "customer emails in the clear." Above "customer emails in the clear" is a bubble showing a list of legible email addresses. An arrow also identifies the overlap between the Sunglass and Cablevision circles.

For instance, let’s say an eyeglasses brand, Sunglass Co., wants to advertise on a major streaming provider, CableVision, to reach new customers. If privacy were not a concern, Sunglass Co. could simply compile a complete list of its customers’ email addresses and compare it against CableVision’s subscriber list to understand how many shared customers it could reach.

A Venn diagram is labeled "Sunglass" on the left and "Cablevision" on the right. Arrows drawn to the CableVision circle say "client list" and "one-way hashing." Above "one-way hashing" is a bubble showing a list of scrambled characters. An image of a key points to another bubble labeled "customer emails in the clear," which shows a legible list of email addresses. An arrow also identifies the overlap between the Sunglass and Cablevision circles.

But CableVision doesn’t want to reveal its subscriber information, and Sunglass Co. doesn’t want to share its customer data with CableVision; each is a third party to the other and hasn’t been authorized to hold that information. Instead, Sunglass Co. and CableVision can each upload hashed email addresses (one-way coded strings that are not designed to be reversed into the original email) into a data clean room. They can configure the clean room to report only on the hashed emails that appear on both CableVision’s and Sunglass Co.’s client lists. This way, Sunglass Co. sees only the shared customers, and neither company exposes the rest of its customer data.
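
The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular clean room vendor implements it: the normalization rule (trim and lowercase), the choice of SHA-256, the function names, and the sample addresses are all assumptions made for the example.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email (trim, lowercase) and return its SHA-256 hex digest.

    The normalization rule and hash function are assumptions for this sketch.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hashed_overlap(hashes_a: set[str], hashes_b: set[str]) -> set[str]:
    """Return only the hashes present in both lists."""
    return hashes_a & hashes_b


# Hypothetical sample data; each party would hash its own list before upload.
sunglass_customers = ["ana@example.com", " Bo@Example.com ", "cy@example.com"]
cablevision_subscribers = ["bo@example.com", "cy@example.com", "di@example.com"]

sunglass_hashes = {hash_email(e) for e in sunglass_customers}
cablevision_hashes = {hash_email(e) for e in cablevision_subscribers}

# The clean room reports on the intersection only.
shared = hashed_overlap(sunglass_hashes, cablevision_hashes)
print(len(shared))  # 2
```

Note that both parties have to apply the same normalization and the same hash function; otherwise the same address produces different digests and the match silently fails.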

As with a database, the potential applications for a clean room are nearly endless. The primary difference between the two is that a clean room tightly controls data access. That control lets brands understand how their customer bases overlap without exposing sensitive data. The main uses of clean rooms typically involve merging customer datasets to check for overlap, either for ad targeting or for reporting.

Understandably, in a period of increased scrutiny of data sharing, data clean rooms are gaining renewed interest as a mechanism for sharing data in a privacy-centric manner. Vendors in this space, many of which specialize primarily in data storage, include AWS, Snowflake, Google Cloud, and LiveRamp.

Branch works with nearly all clean rooms

Because most data clean rooms operate similarly to data warehouses, Branch can readily connect to and populate most existing clean rooms today. For instance, if the clean room is hosted by your cloud provider, we can use Scheduled Log Exports to populate the designated clean room bucket within your cloud provider with Branch data. The mechanisms for matching and shielding the data exist within the clean room solution, so its functionality isn’t dependent on Branch or the data.

The role of data clean rooms

The market largely agrees that rules around user privacy and data sharing will become stricter, so don’t expect interest in clean rooms to fade anytime soon. I see three clear use cases for clean rooms emerging:

  1. Targeting and audience insights: Traditional ad targeting relied on publishers measuring, and in some cases estimating, the demographics of their audience and pitching advertisers on this demographic. As digital performance advertising emerged, advertisers could upload their audiences and the publisher would actively find and serve ads to these users. See the highly effective Facebook lookalike audiences for an example of how this works in practice. Data clean rooms allow advertisers to continue to retarget users, but instead of uploading a list of names, they’ll upload a more secure list of hashed, one-way identifiers that respect user privacy.
  2. Measurement and campaign attribution: In the heyday of user tracking, advertisers could attribute the source of every single user who interacted with their campaign. Now, with user opt-outs, this isn’t possible. Clean rooms offer the advertiser and publisher the option to compare newly acquired users with a list of users who’ve seen the ad, in a privacy-centric manner. Note that depending on your platform or region, matching user data, whether online or offline, may still be prohibited.
  3. Third-party ad bidding and serving: In a more complex example, we’re seeing a variant of clean rooms that is used to create partnerships that allow advertisers to interact with publishers without sharing data. The best example is Amazon, which has created relationships with Meta, Snap, and Pinterest to allow advertisers to serve targeted ads and users to purchase products without leaving these publisher properties. Eric Seufert does an excellent job outlining how this may work in a recent post (paywall).
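
To make the measurement use case (item 2 above) more concrete, here is a rough Python sketch of comparing an advertiser's newly acquired users against a publisher's list of users who saw the ad. Everything in it is an assumption for illustration: the identifiers are hashed emails, the function names are hypothetical, and the choice to return only aggregate counts (never the matched identifiers themselves) is one possible way a clean room might be configured, not a description of any specific product.

```python
import hashlib


def hash_id(identifier: str) -> str:
    """Normalize and hash an identifier (assumed: trimmed, lowercased, SHA-256)."""
    return hashlib.sha256(identifier.strip().lower().encode("utf-8")).hexdigest()


def attribution_summary(new_users: set[str], exposed_users: set[str]) -> dict:
    """Report aggregate overlap only; the matched hashes are never returned."""
    matched = new_users & exposed_users
    total = len(new_users)
    return {
        "new_users": total,
        "matched_to_ad": len(matched),
        "match_rate": len(matched) / total if total else 0.0,
    }


# Hypothetical sample data uploaded by each party.
advertiser_new_users = {
    hash_id(e)
    for e in ["ana@example.com", "bo@example.com", "cy@example.com", "di@example.com"]
}
publisher_exposed = {
    hash_id(e) for e in ["bo@example.com", "di@example.com", "ed@example.com"]
}

summary = attribution_summary(advertiser_new_users, publisher_exposed)
print(summary)
```

With this sample data, two of the four new users also appear on the publisher's exposure list, so the advertiser learns a match rate without either side seeing the other's raw list.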

Data clean rooms are powerful, but they have their own shortcomings: they aren’t designed to solve every issue that emerges from identifier deprecation. A clean room alone won’t make up for the lack of deterministic identifiers. However, with increasingly creative, complex, and combined efforts, advertisers and publishers are making headway in using the first-party data they have in safe, protected ways to increase the efficacy of their paid media strategies.