Data Enrichment Waterfalls

The Data Quality Challenge

Data quality is one of the top challenges at any company, especially B2B organizations with complex Account Hierarchies and Buying Teams. Poor data quality is consistently cited as a major cost and a drag on sales, marketing, and insights processes.

Bad data costs companies and teams time and money. Forbes estimated it drags the economy down by $3.1 trillion, while Gartner estimates it costs the average organization about $15M per year in lost time and insights. To calculate your organization’s losses due to data quality, add up the following:

  • Database costs from dupes and unneeded data

  • Risks in privacy and brand reputation

  • Thousands of hours of manual fixes in databases and insights

  • Less optimal decisions in sales and marketing spend

  • Sales drag when phone and email are out of date

  • And many more

RingLead once estimated the impact of bad data at $100 per record, and data quality decays about 25% per year. Given the potential drag on profits, MOPS and Data teams can easily make a business case to automate Data Quality.

The Data Waterfall is a concept employed by data quality experts in sales operations and marketing operations, as well as Data Science Teams. The concept is simple, yet challenging to get right.

Good data quality helps you scale:

  • Executive Insights

  • Funnel modeling and predictive reports

  • Sales capacity planning

  • Lead Routing

  • Account Based Marketing

  • Automated campaigns

  • Personalization

So what can you do to achieve data quality that drives growth and keeps costs down?

Data waterfalls are the solution

The data waterfall is a time-tested technique for ensuring high data quality. Waterfalls work well for incoming Leads as well as for reprocessing existing data, cutting off bad data at the source and over time.

Here is what a data waterfall should look like when you are done planning it out.

Data Waterfall Example

Depending on your specific requirements, the order of operations may be slightly different.

Requirements

Before embarking on a data waterfall to cleanse incoming (or existing) data, be sure to plan out your needs. Data and data processing can become expensive so the focus should be on what matters to your business – from insights to Sales.

Enrichment

Data, Lead, or Account enrichment is the process of adding and updating field values for People and Accounts from other sources. These sources can be Third Parties like Dun & Bradstreet or other internal databases such as User tables.

The purpose is to add relevant details that support further Cleansing, Routing, and Matching, creating a Minimum Viable sales record that a salesperson would actually call.

Depending on your needs, generally keep this to the top values required for Insights or to contact someone. Be sure to determine where Enrichment should add, update, or ignore fields. For example, you might assume Sales-entered or Prospect-entered data is correct, allow other sources to fill in only null fields, and then normalize certain fields to conform to specific needs.

Sales entered data > Other Systems > Third Party Sources

Here’s an example record with possible actions based on whether we trust User input over the Third Party. Note that some values should be normalized as part of the update.
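
Precedence logic like this can be sketched in a few lines. This is a minimal sketch, assuming hypothetical field names and a simple fill-nulls-only policy:

```python
# Source-precedence enrichment sketch: trust Sales-entered data first,
# then other internal systems, then third-party sources. Fill only
# empty fields; never overwrite a value from a higher-priority source.
SOURCE_PRIORITY = ["sales", "internal", "third_party"]

def enrich(record, candidates):
    """Fill empty fields on `record` from candidate values by source priority."""
    enriched = dict(record)
    for field, by_source in candidates.items():
        if enriched.get(field):          # trusted value already present
            continue
        for source in SOURCE_PRIORITY:
            value = by_source.get(source)
            if value:
                enriched[field] = value
                break
    return enriched

lead = {"email": "pat@acme.com", "phone": None, "industry": None}
candidates = {
    "phone": {"third_party": "+1-555-0100"},
    "industry": {"internal": "Publishing", "third_party": "Media"},
}
print(enrich(lead, candidates))
# phone is filled from the third party; industry comes from the
# internal system because it outranks the third party
```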

Data Normalization

Data Normalization is the process of mapping existing values to Standard Values for certain fields. The reason to do this is to enable uniform reporting and the ability to act on these data fields. The fields to normalize depend on reference data such as ISO or NAICS values, or your own set of standard picklist values.

In the example above, Industry values are mapped up to a more general “Media” from “Publishing” because that’s how we think about our customers and insights.

It is recommended to select only a few truly critical fields to normalize. Do attempt to future-proof your standard values: companies that don’t plan for future expansion often go through challenging re-mapping exercises that impact multiple data systems.
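
A normalization step can be as simple as a lookup table from raw variants to your standard picklist values. A minimal sketch with illustrative mappings:

```python
# Picklist normalization sketch: map raw values (including common
# variants) to a small set of standard values. Mappings are illustrative,
# not a recommended taxonomy.
INDUSTRY_MAP = {
    "publishing": "Media",
    "broadcasting": "Media",
    "software": "Technology",
    "saas": "Technology",
}

def normalize_industry(raw):
    if not raw:
        return None
    cleaned = raw.strip()
    # Unmapped values pass through for later review rather than being lost.
    return INDUSTRY_MAP.get(cleaned.lower(), cleaned)

print(normalize_industry("  Publishing "))   # Media
print(normalize_industry("Logistics"))       # Logistics (unmapped, passes through)
```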

Cleaning up junk and typos

Ensure your Data Normalization process looks at common typos as well as junk values before allowing a record to proceed.

Junk values are usually from spammers attempting to flood your systems or people who do not want to provide their personal information. If the names are like “AAA,” unsavory words, or email addresses like none@none.com then consider auto deleting them, or routing to a junk bucket.

Typos can cause havoc because they can easily get flagged as junk or create hard bounces. For example, these common email domain typos will kill a Lead fast. Use regular expressions to resolve the ones you want, and you recover good records.

Gmail.con
Gmil.com
Gmail.cmo
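
A sketch of a regex-based cleanup that flags junk and repairs the typos above. The junk patterns and typo list are illustrative and should be extended from your own bounce data:

```python
import re

# Repair common email-domain typos before a record hard-bounces, and
# flag obvious junk ("AAA", none@none.com) for a junk bucket rather
# than deleting it outright.
TYPO_PATTERNS = [
    (re.compile(r"@gmail\.con$"), "@gmail.com"),
    (re.compile(r"@gmil\.com$"), "@gmail.com"),
    (re.compile(r"@gmail\.cmo$"), "@gmail.com"),
]

JUNK_RE = re.compile(r"^(none|test|a{3,})@", re.IGNORECASE)

def clean_email(email):
    """Return (email, status): typos are fixed, junk is flagged for review."""
    email = email.strip().lower()
    if JUNK_RE.search(email):
        return email, "junk"
    for pattern, replacement in TYPO_PATTERNS:
        if pattern.search(email):
            return pattern.sub(replacement, email), "fixed"
    return email, "ok"

print(clean_email("pat@Gmail.con"))   # ('pat@gmail.com', 'fixed')
print(clean_email("none@none.com"))   # ('none@none.com', 'junk')
```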

Record Matching, Deduping, and More

Once your record is cleaned and enriched appropriately, then you can attempt to dedupe and match the record. Deduping is an entire process with algorithms ranging from simple to very complex fuzzy matching. 

Record matches in marketing automation platforms are based on email address and domain. The system will find any records with the same values and then decide to Merge, Delete, or Leave Alone.

Once the survivor record is identified, then that record can be matched to an Account and Buying Team based on your business rules. Many organizations use Domain to match a person to an Account. If the Account doesn’t exist, then the system may create the Account and assign it to a salesperson.

Survivor record considerations as you build out the process:

  • Retain oldest record?

  • Most complete record?

  • Record already attached to Account?

  • Merge only Leads but not Contacts?

  • What do you do with other object relationships?
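
The survivor-selection considerations above can be encoded as an ordered tie-break. A minimal sketch with hypothetical field names, preferring records attached to an Account, then the most complete, then the oldest:

```python
from collections import defaultdict

def completeness(record):
    """Count non-empty field values as a simple completeness measure."""
    return sum(1 for v in record.values() if v not in (None, ""))

def pick_survivor(dupes):
    """Prefer records attached to an Account, then most complete, then oldest."""
    return max(
        dupes,
        key=lambda r: (
            r.get("account_id") is not None,
            completeness(r),
            -r["created_at"],        # smaller timestamp = older record wins ties
        ),
    )

records = [
    {"id": 1, "email": "pat@acme.com", "account_id": None, "phone": None, "created_at": 100},
    {"id": 2, "email": "pat@acme.com", "account_id": "A1", "phone": "+1-555-0100", "created_at": 200},
]

# Group duplicates by email, the typical marketing-automation match key.
groups = defaultdict(list)
for r in records:
    groups[r["email"]].append(r)
for email, dupes in groups.items():
    print(email, "->", pick_survivor(dupes)["id"])   # record 2 survives
```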

Account Matching and Assignment

Account Matching can be its own waterfall using various values and techniques. Since full automation may be challenging, it is ideal to combine several techniques, leveraging both enrichment and user-entered data. You can match based on criteria such as:

  • Domain Name

  • Account Name or Company Name

  • DNB Number

  • Fuzzy match on Company Name
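
These criteria order naturally into their own waterfall: try the most reliable key first, then fall through. A sketch assuming hypothetical record shapes; the 0.85 fuzzy-match cutoff is an assumption to tune against your data:

```python
from difflib import SequenceMatcher

def match_account(lead, accounts):
    """Return (account, method) or (None, 'unmatched') for human review."""
    # 1. D&B number: the most reliable key when both sides have it.
    for acct in accounts:
        if lead.get("duns") and lead["duns"] == acct.get("duns"):
            return acct, "duns"
    # 2. Exact email-domain match.
    domain = (lead.get("email") or "").split("@")[-1].lower()
    for acct in accounts:
        if domain and domain == acct.get("domain"):
            return acct, "domain"
    # 3. Exact company-name match (case-insensitive).
    name = (lead.get("company") or "").strip().lower()
    for acct in accounts:
        if name and name == acct["name"].lower():
            return acct, "exact_name"
    # 4. Fuzzy name match, last because it is the least reliable.
    for acct in accounts:
        if name and SequenceMatcher(None, name, acct["name"].lower()).ratio() > 0.85:
            return acct, "fuzzy_name"
    return None, "unmatched"   # drop to the human review bucket

accounts = [{"name": "Acme Corp", "domain": "acme.com", "duns": "123456789"}]
lead = {"email": "pat@acme.com", "company": "Acme Corporation"}
print(match_account(lead, accounts))   # matched via domain
```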

In The Trailblazer’s Guide to RevOps by Openprise, there is an even deeper list of considerations to make when assigning Accounts to Sales and matching Leads to Accounts. Many of these will depend on your Rules of Engagement.

  • Assign the entire Account Hierarchy to one Sales Rep or by Territory?

  • Assigning all Contacts to the owner of the Account?

  • How do you handle Named Accounts and Carve Outs?

  • Do you assign End Users to Accounts or keep them out, and if so, who are they assigned to?

Prioritization: Scoring and Sorting

Scoring people and Accounts is an important step to make efficient use of Sales and Marketing resources.

Account Scoring and Ranking

Sort Accounts into your Target Accounts based on ICP and propensity to buy. Companies with good historical data can use a regression model to filter out companies that do not match the ICP and Target Account list, then prioritize the Accounts most similar to ones that bought in the past.

When you couple this with In Market signals, you can surface Buying Teams and Accounts that Sales will love to call.

Person and Lead Scoring

There is a lot written on this topic, from behavioral to demographic to predictive scoring. The best is a machine learning approach that surfaces People who match not just the static ICP but also the likelihood to advance to the next stage based on their Role and signals, scoring dimensions such as:

  • The ICP

  • The Buying Team Roles

  • Engaged (based on signals)

  • Readiness to talk to sales
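
A simple weighted score over these dimensions illustrates the idea; a full predictive model would learn the weights instead. All weights and field names here are illustrative:

```python
# Weighted lead-scoring sketch over ICP fit, Buying Team role,
# engagement signals, and sales-readiness. Weights are illustrative.
ROLE_WEIGHTS = {"decision_maker": 30, "influencer": 15, "end_user": 5}

def lead_score(person):
    score = 0
    if person.get("icp_fit"):
        score += 30
    score += ROLE_WEIGHTS.get(person.get("role"), 0)
    score += min(person.get("signals", 0) * 5, 25)   # cap engagement points
    if person.get("requested_demo"):
        score += 15                                  # explicit hand-raise
    return score

director = {"icp_fit": True, "role": "decision_maker",
            "signals": 4, "requested_demo": True}
print(lead_score(director))   # 30 + 30 + 20 + 15 = 95
```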

Customer Lifecycle Stage

Stamping each Lead and each Account with a clear Customer Lifecycle Stage helps both people and reporting insights. A deterministic approach can take into account considerations such as:

Buying Team

The Buying Team association for a Person and Account record is important: salespeople need the relevant people in an Account to advance the sale. The more automatically we can associate a Director of Sales Operations with the Buying Team and Account, surfacing their level of interest, the faster Sales can make a deal. Similarly, an Intern or other less relevant role can receive automated communications rather than needing a BDR.

Routing to Sales

The last step to take is Routing. Routing requires very clean, very clear data. Routing rules are typically Geographic, Company Size, and Industry. Get these right earlier in the waterfall and Sales should love your routing system.
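
Deterministic routing on those fields can be expressed as an ordered rule list where the first match wins and non-matches fall out to a human queue. Rules and queue names here are hypothetical:

```python
# Routing sketch on Geography and Company Size, the fields normalized
# earlier in the waterfall. First matching rule wins; anything that
# matches no rule falls out to a human review queue.
ROUTING_RULES = [
    ({"country": "US", "min_employees": 1000}, "enterprise_us_team"),
    ({"country": "US"}, "smb_us_team"),
    ({"region": "EMEA"}, "emea_team"),
]

def route(lead):
    for conditions, queue in ROUTING_RULES:
        if lead.get("employees", 0) < conditions.get("min_employees", 0):
            continue
        if "country" in conditions and lead.get("country") != conditions["country"]:
            continue
        if "region" in conditions and lead.get("region") != conditions["region"]:
            continue
        return queue
    return "human_review"   # the fall-out bucket

print(route({"country": "US", "employees": 5000}))   # enterprise_us_team
print(route({"country": "BR"}))                      # human_review
```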

To maintain this system make sure to stay close to Sales to understand when changes occur, such as

  • Territory re-alignments

  • Rules of Engagement

  • Leavers and Joiners

What to do with the leftovers?

Invariably, automation with deterministic rules (or even AI) may be uncertain what to do with a record. A record may have values so unclear, or so few matches, that it falls out of the waterfall.

At each step in the waterfall, the final step should be to drop non-compliant records to a human bucket for further evaluation. In the diagram above, the example is to drop out records early. Each stage can have its own exit to a human team.

For example, Leads that make it to the Routing Stage but cannot be routed can drop to a SDR/BDR team to evaluate further. Eventually, AI may solve parts of this challenge.

Order of Operations Matters

The examples provided are just that: examples. Your needs may differ. Just be aware that the order of operations in the waterfall affects the outcome.

For example, if you run Data Normalization before Data Enrichment, you will likely miss the opportunity to normalize third party data that isn’t in the format you want.

Implementation

A well-run data waterfall will cause data quality to skyrocket, bringing your executive insights into sharper focus at higher resolution.

Nailing the implementation is easy if you have done the hard work of requirements gathering and carefully designed the waterfall.

The next step is to automate this process using a revops data platform like Openprise.

Managing Risks

Risks are to be managed and mitigated. Risk mitigation for a data waterfall should be built into the process from the start, while allowing human tweaking as everyone learns about edge cases. Watch for:

  • Junk filters are too aggressive (false positives)

  • Data enrichment overwrites useful data that was more up to date (and how to remediate this)

  • Merging records removes or overwrites history in fields that people used improperly, such as Notes instead of Tasks. (Make sure you know the human side of your CRM!)

  • Loose match expressions: automating a match on “intern” will also pick up “International VP” and delete a record you actually wanted.

  • Automated deletion of data is risky! Find a way to store it more cheaply for a specified time to handle potential mitigation.

  • Not monitoring the human-review fall-out bucket when a record can’t be cleaned automatically.
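
The “intern” pitfall above is worth seeing concretely: a word boundary in the regular expression is the difference between a safe filter and deleting good records.

```python
import re

# A loose substring match flags titles you wanted to keep;
# word boundaries (\b) keep the filter precise.
loose = re.compile(r"intern", re.IGNORECASE)
strict = re.compile(r"\bintern\b", re.IGNORECASE)

print(bool(loose.search("International VP")))    # True  -- false positive
print(bool(strict.search("International VP")))   # False -- safe
print(bool(strict.search("Marketing Intern")))   # True  -- intended match
```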

To manage these risks as you build out your data waterfall, ensure that you test the automation in a sandbox before running it on all records. Constantly monitor the process, counts, and fall out.

And always show the value of your data waterfall to the business!
