The Role Of Data Matching In Big Data Business Strategy

“It is both staggering and exciting to imagine how data and analytic capabilities will transform entire industries.”
– Ariel Dora Stern

As promising as big data analytics sounds, there is still a huge gap between what companies expect from their data and what they actually get. In the article “Companies love big data but lack the strategy to use it effectively,” Harvard Business School shared some of the insights it teaches to executives. The article said:

“The problem is that, in many cases, big data is not used well. Companies are better at collecting data – about their customers, about their products, about competitors – than analyzing that data and designing strategy around it.”

This highlights not only the need for big data, but also the need to learn how to devise business strategies that incorporate it.

Big data – leveraging advanced analytics

Big data is data that takes up a lot of space (volume), arrives at unprecedented speeds (velocity), and exists in different formats (variety). On its own, big data adds no value to your business processes or strategies. You have to “use it well” to extract insights and benefits from it.

If your big data is used well, then it can help you to:

Optimize operational and business processes by leveraging insights gathered about products, customers, and markets,
Comply with governmental standards and reduce risks,
Design a better, personalized customer experience, and
Discover new revenue opportunities.

Let’s talk about how organizations can use big data to achieve business goals.

Devising effective business strategies that incorporate big data

Bill Schmarzo (known as the Dean of Big Data) explains it best when he reverse-engineers the process of achieving business goals with big data. He lays out a five-step approach. I’ll give a brief overview of those steps here, and you can read about them in detail at this link.

1. Identify desired business outcomes

First, identify the outcomes you want for your business. Think of initiatives that will transform the business or take it one step closer to success. For example: increase online store sales by 10% in the next 12 months.

2. Identify supporting use cases

This step is about identifying which use cases will help you achieve the business outcomes listed in the first step. For example, if the desired business outcome is “increase online sales by 10%,” its supporting use cases could be: advertise promotions on high-traffic sites, run email marketing campaigns, increase online lead generation, etc.

Once you have identified the supporting use cases for each business outcome, assess each use case’s financial impact, potential value, and implementation risks.

3. Prioritize use cases

In this step, you prioritize the use cases so that your organization can focus on one at a time. This can be done by plotting each use case’s implementation feasibility against its business value.
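The prioritization can be sketched in a few lines of Python. This is only an illustration: the `UseCase` class, the 1 to 10 scores, and the idea of ranking by the product of value and feasibility are all assumptions made for this example (a simple stand-in for reading the plot by eye), not part of Schmarzo's method.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float  # hypothetical score from 1 (low) to 10 (high)
    feasibility: float     # hypothetical score from 1 (hard) to 10 (easy)


def prioritize(use_cases):
    """Return use cases sorted by value x feasibility, highest first."""
    return sorted(
        use_cases,
        key=lambda u: u.business_value * u.feasibility,
        reverse=True,
    )


# Made-up scores for the use cases named in step 2.
use_cases = [
    UseCase("Advertise promotions on high-traffic sites", 6, 8),
    UseCase("Run email marketing campaigns", 7, 9),
    UseCase("Increase online lead generation", 9, 4),
]

ranked = prioritize(use_cases)
for rank, u in enumerate(ranked, start=1):
    print(rank, u.name, u.business_value * u.feasibility)
```

With these made-up scores, a high-value but hard-to-implement use case such as lead generation falls to the bottom of the list, which is the trade-off the plot is meant to expose.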

4. Identify data sources for each use case

Every use case needs data to implement. For example, to improve customer cross-selling, you need data from social media, market baskets, site traffic, etc. In this step, you map every use case to one or more data sources so that it is clear which sources each implementation depends on.

5. Compute economic value for each use case

Once you have identified the data sources you need to execute each use case, you are ready to compute the financial value each data source holds. This is done by adding up the financial impacts of all the use case implementations that the data source supports.
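As a rough sketch of that aggregation, the Python below sums the impact of every use case that relies on a given source. The use case names, source names, and dollar figures are invented for illustration, and `source_value` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical financial impact per use case (made-up figures).
use_case_impact = {
    "email campaigns": 120_000,
    "site promotions": 80_000,
    "cross-selling": 200_000,
}

# Hypothetical mapping from step 4: which sources each use case needs.
use_case_sources = {
    "email campaigns": ["crm", "email tool"],
    "site promotions": ["site traffic"],
    "cross-selling": ["crm", "market baskets", "site traffic"],
}


def source_value(impact, sources):
    """Sum the impact of every use case that relies on each data source."""
    totals = defaultdict(int)
    for use_case, needed in sources.items():
        for source in needed:
            totals[source] += impact[use_case]
    return dict(totals)


value_by_source = source_value(use_case_impact, use_case_sources)
for source, value in sorted(value_by_source.items(), key=lambda kv: -kv[1]):
    print(f"{source}: {value}")
```

Note that this simple version credits each source with the full impact of every use case it supports, so the per-source totals add up to more than the combined impact of the use cases. The totals are better read as a ranking of which sources matter most than as a budget.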

Is it that simple?

We just saw how each data source holds financial value, and how it is used to execute the use cases that help you achieve your desired business outcomes. Every organization has access to its own data. So it must be pretty simple, and everyone should be doing it, right? What’s the catch? It’s data quality.

Your data sources hold this economic value only if they measure up to six critical dimensions of data quality: accuracy, validity, consistency, uniqueness, completeness, and timeliness.

One challenge is more complex than the others: keeping data records unique across all data sources.

Often, a single use case needs data from multiple sources. For this reason, the data is first merged and integrated so that it sits in one place and can be analyzed together.
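Two of these dimensions, completeness and uniqueness, are easy to measure directly. The sketch below is a minimal illustration: the sample records, the field names, and the two helper functions are assumptions made for this example, and real data quality tools use richer definitions.

```python
# Hypothetical contact records; None and "" both count as missing.
records = [
    {"email": "ana@example.com", "name": "Ana Ruiz", "phone": "555-0101"},
    {"email": "ana@example.com", "name": "Ana Ruiz", "phone": None},
    {"email": "li@example.com", "name": "Li Wei", "phone": ""},
    {"email": None, "name": "Sam Ode", "phone": "555-0199"},
]


def completeness(rows, field):
    """Share of rows in which the field has a non-empty value."""
    filled = sum(1 for r in rows if r.get(field))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of non-empty values of the field that are distinct."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


print("email completeness:", completeness(records, "email"))
print("phone completeness:", completeness(records, "phone"))
print("email uniqueness:", round(uniqueness(records, "email"), 2))
```

A uniqueness score below 1.0 on a field that should identify a person, such as email, is a sign that the same individual appears more than once.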

Let’s look at an example

Companies often hold several records in their databases for the same individual or entity. This happens for a few reasons: work and personal email addresses of the same person are stored as separate contacts; incomplete information leads you to create new contacts rather than update existing ones; or the information is spread across disparate systems, such as a website tracking application, an email campaign tool, etc.

Whatever the reason, this is one of the most common obstacles to accurate big data analysis. For instance, if your data contains duplicate records for the same person, you may end up sending the same email campaign to that person twice. This not only damages your brand’s customer experience, but also makes the use case results inaccurate. You could count clicks from the same individual multiple times and overestimate the effectiveness of your email campaign.
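The effect on campaign metrics is easy to reproduce. In the sketch below, the rows and the `person_id` column are made up; `person_id` stands in for whatever identifier you would have after linking duplicate contacts to one individual.

```python
# Hypothetical campaign results: one row per contact record emailed.
# P1 appears twice (work and personal address) and clicked both emails.
clicks = [
    {"email": "j.doe@work.example", "person_id": "P1", "clicked": True},
    {"email": "jdoe@home.example", "person_id": "P1", "clicked": True},
    {"email": "a.lee@work.example", "person_id": "P2", "clicked": False},
    {"email": "m.kim@work.example", "person_id": "P3", "clicked": False},
]


def click_rate_per_contact(rows):
    """Naive rate: treats every contact record as a separate person."""
    return sum(r["clicked"] for r in rows) / len(rows)


def click_rate_per_person(rows):
    """Rate after collapsing records that belong to the same person."""
    clicked_by_person = {}
    for r in rows:
        pid = r["person_id"]
        clicked_by_person[pid] = clicked_by_person.get(pid, False) or r["clicked"]
    return sum(clicked_by_person.values()) / len(clicked_by_person)


naive_rate = click_rate_per_contact(clicks)
deduped_rate = click_rate_per_person(clicks)
print(f"per contact: {naive_rate:.0%}, per person: {deduped_rate:.0%}")
```

In this toy data the naive rate is 50%, while only one person in three actually clicked.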

Introducing data matching

When disparate datasets are merged, data values often become duplicated and inconsistent. If you base your big data business strategy on inaccurate data records, it will yield biased results. If you apply data matching techniques first, on the other hand, you can use this data with confidence in any use case or business process.

How does data matching work?

Data matching is pretty simple when datasets contain unique identifiers, such as a social security number or national identity number. In such cases, you can simply compare the two records’ identifiers and classify the pair as a match or a non-match.
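A minimal sketch of identifier-based matching might look like this. The `national_id` field name and the normalization step (dropping punctuation and spaces, ignoring case) are assumptions for this example; the right cleanup depends on how your systems store the identifier.

```python
def normalize_id(value):
    """Strip separators and case so formatting differences do not matter."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def is_match(record_a, record_b, key="national_id"):
    """Classify two records as a match when their identifiers are equal."""
    return normalize_id(record_a[key]) == normalize_id(record_b[key])


# Hypothetical records from two systems that format the ID differently.
crm_record = {"name": "Jane Doe", "national_id": "123-45-6789"}
billing_record = {"name": "J. Doe", "national_id": "123 45 6789"}
other_record = {"name": "John Roe", "national_id": "987-65-4321"}

print(is_match(crm_record, billing_record))
print(is_match(crm_record, other_record))
```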

Things get complex when datasets have no unique identifiers, or when the identifiers cannot be used for confidentiality reasons. In such cases, multiple variables are assigned weights and then evaluated together to classify matches and non-matches.
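The weighted approach can be sketched as follows. The fields, the weights, and the 0.7 threshold are hand-picked assumptions for illustration. In practice they would be tuned, or estimated from data, for the datasets at hand.

```python
# Hypothetical weights: how much agreement on each field counts.
WEIGHTS = {"last_name": 0.4, "birth_date": 0.4, "postcode": 0.2}
THRESHOLD = 0.7  # hypothetical cut-off between match and non-match


def match_score(record_a, record_b, weights=WEIGHTS):
    """Add up the weights of all fields on which the two records agree."""
    score = 0.0
    for field, weight in weights.items():
        a, b = record_a.get(field), record_b.get(field)
        if a and b and a.strip().lower() == b.strip().lower():
            score += weight
    return score


def classify(record_a, record_b, threshold=THRESHOLD):
    """Label a pair of records as 'match' or 'non-match'."""
    if match_score(record_a, record_b) >= threshold:
        return "match"
    return "non-match"


rec_a = {"last_name": "Garcia", "birth_date": "1985-03-12", "postcode": "90210"}
rec_b = {"last_name": "garcia ", "birth_date": "1985-03-12", "postcode": "10001"}
rec_c = {"last_name": "Garcia", "birth_date": "1990-01-01", "postcode": "90210"}

print(classify(rec_a, rec_b))  # agrees on last name and birth date
print(classify(rec_a, rec_c))  # agrees on last name and postcode only
```

Giving birth date more weight than postcode reflects the idea that people move house far more often than their date of birth changes in a database, so agreement on it is stronger evidence.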

Organizations employ various data matching techniques, such as phonetic, numeric, and fuzzy matching, or other proprietary algorithms. Once records are matched, you can decide whether to merge or purge them so that each entity in your big data is represented by a single record.
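As one concrete illustration of fuzzy matching followed by a merge, the sketch below uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure. Comparing on the name alone, the 0.85 threshold, and the "keep the first record and fill its gaps" merge rule are all simplifying assumptions. Dedicated matching tools compare several fields and use more robust algorithms.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def merge(primary, duplicate):
    """Keep the primary record's values and fill its gaps from the duplicate."""
    merged = dict(primary)
    for field, value in duplicate.items():
        if not merged.get(field):
            merged[field] = value
    return merged


def dedupe(records, threshold=0.85):
    """Merge each record into the first kept record with a similar name."""
    unique = []
    for record in records:
        for i, kept in enumerate(unique):
            if similarity(record["name"], kept["name"]) >= threshold:
                unique[i] = merge(kept, record)
                break
        else:
            # No similar record found, so this is a new entity.
            unique.append(dict(record))
    return unique


# Hypothetical contacts; the first two are the same person, spelled differently.
contacts = [
    {"name": "Jonathan Smith", "email": "jon@work.example", "phone": ""},
    {"name": "Jonathon Smith", "email": "", "phone": "555-0142"},
    {"name": "Maria Lopez", "email": "maria@example.com", "phone": ""},
]

deduped = dedupe(contacts)
for contact in deduped:
    print(contact)
```

Here the two spellings of the same name collapse into one record that carries both the email address and the phone number, while the unrelated contact is left untouched.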

Conclusion – the role of data matching in big data business strategy

Data matching and data quality are essential when designing business strategies that incorporate big data. As we mapped out the process of devising these strategies, we saw that every data source holds financial value and has a direct impact on the business outcomes you’re looking to achieve through the supporting use cases. That value depends on each entity being represented by a single, accurate record, and data matching is how you get there.