The future of data modeling

From 2022 onward, Google's ban on third-party cookies will render up to 85% of current digital marketing useless, which means the clock is ticking for our industry to find new ways to enable effective targeting and measurement. Marketing organizations will have to adapt to a new reality.

Third-party cookies today

With 65%, 18% and 4% market share, respectively, Google Chrome, Apple Safari and Mozilla Firefox are the three most consequential web browsers, with Chrome dwarfing the other two. Safari and Firefox have already stopped supporting third-party cookies as identifiers for ad targeting, and Chrome will do the same next year.

Targeting in a world without identifiers

To reclaim 1:1 targeting opportunities that will be lost as a result of cookie deprecation (since tracking browsing activity of individual users will no longer be possible), an increasing number of publishers are following in the footsteps of the likes of Facebook and Amazon by creating their own walled gardens—requiring that users be logged in. (This enables those publishers to collect valuable data during the onboarding process and see exactly what users do while inside their “walls.”) Meanwhile, more DSPs are developing authenticated traffic solutions, leveraging publishers’ opt-in user bases to compile a more complete picture of audiences on the web. All of this means that the truly open web is shrinking somewhat, though it will still comprise nearly 90% of the internet.

With the vast majority of the internet soon to be anonymous, the long-term prospects for cookie-based targeting are looking grim, which means new perspectives will be required for audience modeling—or the process of defining audience segments for targeting and measurement.

Moving away from deterministic

When it comes to the open web where users aren't signed in, it's clear that our reliance on probabilistic modeling is going to increase, and this will be a steep learning curve for some marketing organizations. Then there are other developments, such as Apple requiring users' explicit permission for tracking as of iOS 14.5, which has led to anywhere from 63.5% to 83.2% of users opting out, and the introduction of alternative ID solutions. Taken together, these changes presage a new era of ad targeting.

With the number of available deterministic targeting attributes dropping significantly and matching being limited to login information, IP address, variants of synthetic device ID and various combinations of the three, the industry needs fundamentally different ways of reaching consumers with high precision and scale.

The future of identifiers

One of the biggest challenges going forward for the open web (where users aren’t signed in) is bringing the fragmented ID space under a single, standardized methodology and convincing the ad-tech market to adopt it.

There are some potential solutions, such as The Trade Desk’s Unified ID 2.0—now completely open source with hashed and encrypted PIIs that are impossible to convert back to email addresses—and LiveRamp’s ATS, which enables matching of consented data by letting the Unified ID 2.0 transact with its IdentityLink ecosystem. But every solution poses its own challenges, ranging from limited ID coverage to issues with scalability and adoption.
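To make the "hashed and encrypted PIIs" idea concrete, here is a minimal sketch of how email-based identifiers are typically normalized and run through a one-way hash. This is illustrative only, not The Trade Desk's actual Unified ID 2.0 pipeline, which layers encryption, salt rotation and opt-out handling on top; the salt value and function name are invented.

```python
import hashlib

def hashed_email_id(email: str, salt: str = "illustrative-salt") -> str:
    """Normalize an email address, then produce a salted one-way hash.

    A sketch of the general technique behind hashed email identifiers;
    the raw address cannot be recovered from the resulting digest.
    """
    normalized = email.strip().lower()  # canonical form so variants match
    digest = hashlib.sha256((normalized + salt).encode("utf-8")).hexdigest()
    return digest

# The same underlying address always yields the same ID, regardless of
# capitalization or stray whitespace at sign-up time.
id_a = hashed_email_id("  Jane.Doe@example.com ")
id_b = hashed_email_id("jane.doe@example.com")
assert id_a == id_b
```

The one-way property is what makes such IDs shareable across the ecosystem: partners can match on the digest without ever exchanging the email itself.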

AI as the hero

The good news is that AI can now enable data modeling for ad targeting that is both privacy-safe and more effective than cookie-based models. While AI has long excelled at driving content personalization, chatbots and attribution, its power hasn't been fully realized in other areas of marketing, primarily due to the industry's reliance on cookie-based tracking. Propelled by the proliferation of e-commerce data, advances in cloud computing and lower costs of data storage and processing, AI is now being applied to nascent areas such as probabilistic data modeling.

There are two areas of AI that I expect to have a significant impact on modeling in the coming years: pattern mining (PM) and natural language processing (NLP). Their combined power will have a profound impact on ad targeting, allowing messaging and creative to be tailored for specific audiences with far greater accuracy than before.

What is pattern mining?

Pattern mining (PM) is the practice of identifying traits or behaviors that allow for highly accurate segmentation, which in turn enables more powerful targeting. For example, based on someone's visitation patterns and purchasing habits gleaned from web browsing data and e-commerce data respectively, we might be able to infer they're the parents of young children. Meanwhile, their average basket size, along with the items purchased, can suggest the size and wealth of their household.
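The inference above can be illustrated with a toy frequent-pattern miner over anonymous purchase baskets. The baskets, items and support threshold here are all hypothetical; real PM systems operate on far larger data and richer algorithms, but the principle is the same.

```python
from collections import Counter
from itertools import combinations

# Hypothetical e-commerce baskets, one set of items per anonymous shopper.
baskets = [
    {"diapers", "baby food", "laundry pods"},
    {"baby food", "stroller", "diapers"},
    {"coffee", "laundry pods"},
    {"diapers", "baby food", "wipes"},
]

def frequent_pairs(baskets, min_support=0.5):
    """Return item pairs that co-occur in at least min_support of baskets."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

# A pair like ("baby food", "diapers") surfacing with high support is the
# kind of signal that lets us infer a "parents of young children" segment
# without any individual identifier.
print(frequent_pairs(baskets))
```

Note that nothing in this computation touches a cookie, a device ID or any PII; the segment falls out of the co-occurrence structure of the transactions themselves.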

The great news is that effective PM isn’t contingent on tracking cookies or mobile ad IDs.

We can enrich our view of customers by parsing "bidstream data" from ad auctions to infer IP addresses, for example, or by using Google's geocoding to match individuals with store visitations. In addition to improving the precision of inferences, PM methods also reduce the probability of unwanted privacy breaches since they don't rely on PII.

Companies like Hearts & Science will build their own proprietary identity graphs to join disparate datasets for a more holistic view of consumers. Instead of relying on cookies, they'll employ probabilistic methods, allowing us to reason that the same person who buys Tide Pods and baby food is picking their children up from school every day and running outdoors several times a week. We won't have a clue who these people actually are, but our anonymized profile of them will be rich and textured.

With purchase data now heavily weighted toward e-commerce, the pandemic has provided an even richer trove of signals for pattern mining, enabling data scientists to discern unique patterns, such as health consciousness. These same insights could theoretically have been uncovered when more transactions were happening in-store, but matching people’s offline and online identities through analysis of store-based transactions is an arduous process. (A portion of IDs also typically get lost because of sampling techniques, infrastructure limitations or other reasons.)

When deployed for identity resolution, PM algorithms can more seamlessly match online and offline shoppers’ identities and improve the accuracy of identity graphs, which, in turn, helps marketers to fine-tune their recommendation systems and avoid wasted ad impressions.
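A minimal sketch of that kind of probabilistic matching: combine several weak, non-PII signals into a single link score and join records that clear a threshold. The signals, weights and threshold below are invented for illustration; a production system would learn the weights from labeled match/non-match pairs.

```python
def match_score(online: dict, offline: dict) -> float:
    """Toy probabilistic matcher over weak, non-PII signals.

    Hypothetical weighting: shared region, overlapping purchase
    categories and similar basket sizes each nudge the score up.
    """
    score = 0.0
    if online.get("geo_region") == offline.get("store_region"):
        score += 0.4  # same coarse location
    overlap = len(set(online.get("categories", ())) &
                  set(offline.get("categories", ())))
    score += min(0.4, 0.1 * overlap)  # shared purchase categories
    if abs(online.get("avg_basket", 0) - offline.get("avg_basket", 0)) < 10:
        score += 0.2  # similar average basket size
    return score

online = {"geo_region": "94107", "categories": ["baby", "fitness"], "avg_basket": 62}
offline = {"store_region": "94107", "categories": ["baby", "grocery"], "avg_basket": 58}
# Record pairs scoring above some chosen threshold get linked in the graph.
print(match_score(online, offline))
```

The point of the sketch is that no single signal identifies anyone; it's the accumulation of weak evidence that lets the graph link an online profile to an offline shopper with useful confidence.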

What is natural language processing?

Natural language processing (NLP) algorithms let machines analyze large volumes of text data and extract meaning from them. In the context of advertising, this means accurately gauging people’s sentiment by interpreting their reviews, comments on ads and other writings. From there, brands can measure how well an ad was received and deliver different messaging to people who responded negatively—or neutrally—in the future.
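As a stripped-down illustration of that routing idea, here is a lexicon-based sentiment scorer. Production NLP relies on trained models rather than hand-built word lists; the lexicons and comments below are invented, but the downstream logic (score the text, then branch the messaging) is the same.

```python
# Tiny hand-built lexicons; real systems would use a trained model instead.
POSITIVE = {"love", "great", "helpful", "funny"}
NEGATIVE = {"annoying", "hate", "irrelevant", "boring"}

def sentiment(comment: str) -> str:
    """Classify a comment as positive, negative or neutral by word counts."""
    words = {w.strip(".,!?") for w in comment.lower().split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

comments = ["Love this ad, so funny!", "Annoying and irrelevant.", "Saw it twice."]
for c in comments:
    # A brand could route "negative" responders to different creative next time.
    print(c, "->", sentiment(c))
```

Even this crude version captures the workflow the paragraph describes: measure reception per comment, aggregate it per ad, and adapt the next message accordingly.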

When coupled with computer vision (a field of AI that trains computers to interpret and understand the visual world), NLP can produce even better results. Computer vision (CV) algorithms can "look" at a piece of creative and assess it across thousands of dimensions, such as the types of objects in the picture, the number of people, whether they're smiling and the percentage shown of people's faces. By leveraging CV, brands can have a library of enriched data on creative assets and derive additional insights for targeting based on what individuals respond to.

By pairing insights from NLP-led sentiment analysis and CV, brands are in a position to target consumers with tremendous accuracy and relevance.

Preparing for the future

So, what to expect in the cookie-less world of ad targeting? On the one hand, PII in the clean rooms of walled gardens like Google and Facebook, where brands can match their first-party data against aggregate publisher or platform data in a privacy-safe way in lieu of access to customer-level insights, will be strongholds of precision ad targeting and better audiences. On the other, AI will lead to the rise of improved probabilistic methods, uncovering new opportunities to supplement traditional marketing methods.

With AI taking on a larger scope across the marketing org and taking the main stage for data processing and modeling, more and more companies will embrace it.

But leveraging AI effectively and unlocking its full potential requires an investment in data science and engineering talent, not to mention computing resources to process terabytes of data.

What can we do now?

Here are four things to start focusing on now while we still have cookie data to fall back on:

  1. Make sure your data processing infrastructure is ready to work with extremely large volumes of data—or your agency partner’s is.
  2. Ensure your data warehouse (or your partner’s data warehouse) supports making insights from AI models available in near real time so they’re actionable for future targeting methods, including cohort-based methods.
  3. Ensure you have talent with the necessary skill set—or your agency partner does. Bringing AI and machine learning solutions to market requires a blend of analytical and engineering skills, as well as industry experience with diverse use cases.
  4. Equip your data models to optimize against an array of signals, not just group IDs (e.g., FLoC) but any future signals coming out of sandbox environments.

AI has been a shiny object in our industry for years, but the imperative for meaningful adoption is urgent now. Marketers that follow through can avoid missing a beat after cookie deprecation and, ultimately, target consumers more effectively and honorably.