Your Dataset Is A Giant Inkblot Test

Cassie Kozyrkov
November 9, 2020 AI & Machine Learning

The danger of apophenia in analytics and what you can do about it

There’s a fine line between telling stories with data and telling lies. Before I tell you how to spot a top-notch data analyst and boost your analytical excellence, let me scare you a little.

The psychological trap in data analytics

Human brains are pattern-finding powerhouses… but those patterns don’t always have much to do with reality. We are the sort of species that finds rabbits in clouds and Elvis’s face in a potato chip.

Do these look like a rabbit and a portrait of Elvis to you?

Take a moment to consider the Rorschach test — the one where people are shown random inkblots and asked what they see — and you’ll appreciate just how eagerly the mind injects spurious interpretations into randomness.

Bat? Butterfly? Or just an ink blot? This is the first of the ten cards in the Rorschach test, created in 1921.

Psychologists have a pretty name for this tendency to conjure false meaning out of nothing: apophenia. Give humans a vague stimulus and we’ll find faces, butterflies, and a reason to allocate budget to our favorite project or launch an AI system.

Uh-oh.

There’s plenty of random noise in most datasets, so what are the chances there’s no apophenia going on with your analytics? Can you really trust your interpretation of the data?

What the mind does with inkblots it also does with data.

To make matters worse, the more complex a dataset is and the more ways there are to slice and dice it, the vaguer it is as a stimulus. That means it’s practically begging you to see false patterns in it.

Complex datasets practically beg you to find false meaning in them.
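
To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes every slice of the data is tested independently on pure noise with the same false-positive rate; the function name and the 5% threshold are illustrative choices, not something taken from a particular tool. Real slices overlap, so treat the output as intuition rather than an exact figure.

```python
def chance_of_false_alarm(num_slices: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_slices` independent looks
    at pure noise clears a significance threshold of `alpha`.

    Assumption: the looks are independent, which real slices rarely are.
    """
    return 1.0 - (1.0 - alpha) ** num_slices


# The more ways you slice, the more likely it is that something "pops".
for k in (1, 10, 50, 100):
    print(f"{k:>3} slices -> {chance_of_false_alarm(k):.1%} chance of a spurious hit")
```

Under these assumptions, a single look is rarely fooled, but a hundred looks are almost guaranteed to turn up something that seems meaningful.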

Are you sure your latest data epiphany isn’t an apophany in disguise?

Another great word is pareidolia, which is a kind of apophenia (finding familiar things in vague sensory stimuli). In Japan, they even have a museum of rocks that look like faces. It’s a beautiful world.

Lies, damned lies, and analytics

If that sounds dismal, I’m not done yet. Taking data analysis courses can pour fuel on that psychological fire. Students are conditioned to expect that looking at data yields real meaning because every homework exploratory analysis exercise has buried treasure in it. Very few professors have the heart to send you on wild goose chases (for your own good!) and it’s hard to grade open-ended assignments, so you usually don’t get enough exposure to them as a student.

Students grow up believing that every dataset is ready to cough up a nugget of solid truth.

Data storytelling is just a hop, skip, and jump away from outright lying with data. Setting aside the issue of whether the patterns are real, let’s talk about multiple interpretations. Just because you see a bat shape in that inkblot doesn’t mean that there isn’t also a butterfly, a pelvis, or a pair of foxes in it. If I hadn’t mentioned the foxes, would you have seen them? Probably not. Psychological mechanisms related to motivation and attention have stacked the deck against you. It takes a special sort of skill to release the bat interpretation and force yourself to see a superposition of meanings.

Once people glom on to their favorite “insight”, they’ll struggle to unsee it.

The trouble is that once people glom on to their favorite “insight”, they’ll struggle to unsee it in favor of others. People tend to believe most strongly in whichever interpretation captured their attention first and each additional meaning reduces their motivation to keep searching. Juggling multiple potential stories without overweighting your favorite is a mental muscle that takes hard work to build. Alas, not every analyst has the discipline for it. In fact, many are incentivized to “prove” one side of a story through data exploration. Why grow skills that only get in the way of engorging your data science paycheck?

What color is your lightsaber?

There are ways to prove things with data honestly and rigorously (my data-splitting article will tell you more), but exploratory data analysis (EDA) is not one of them. Open-ended data exploration is always a fishing expedition. What determines the color of your lightsaber is what you’re fishing for.
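
The data-splitting article isn’t reproduced here, but the general idea can be sketched in a few lines of Python: set part of the data aside before you go exploring, fish for inspiration in one part, and save the other for a rigorous follow-up test. The function name, the 50/50 split, and the fixed seed below are assumptions made for illustration only.

```python
import random


def split_rows(rows, explore_fraction=0.5, seed=0):
    """Shuffle the rows once and cut them into two non-overlapping parts.

    The first part is for open-ended exploration; the second stays
    untouched until there is a specific hypothesis to test on it.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 row ids.
explore, confirm = split_rows(range(100))
print(len(explore), "rows to explore,", len(confirm), "rows held back for testing")
```

The point of the design is that nothing you notice while exploring can leak into the held-back rows, because you haven’t looked at them yet.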


If you join the dark side, you’re fishing for evidence to support a theory you already “know” to be true (so you can sell it to some naive victim). You might not even realize that your lightsaber is red if you genuinely believe in data objectivity and your own unbiasedness.

Open-ended data exploration is always a fishing expedition.

With a sufficiently complex (vague) dataset, you’ll find a pattern you can spin as support for your favorite story. That’s the beauty of the Rorschach test, after all. Unfortunately, it’s worse with data than with inkblots because the more mathemagical your method (p-hacking, anyone?), the more legitimate and convincing you’ll sound to those who don’t know any better.
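
If you’d like to see how easy it is to catch a fish in an empty pond, here is a small simulation in Python. Everything in it is made up for illustration: the target and all the candidate features are random numbers with no relationship to one another, and the row count, feature count, and function names are arbitrary assumptions. The script simply reports the strongest correlation it finds.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_noise_correlation(n_rows=30, n_features=200, seed=0):
    """Generate a random target and many random features, then return
    the correlation of whichever feature happens to look best."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(feature, target)
        if abs(r) > abs(best):
            best = r
    return best


print("Best correlation found in pure noise:", round(strongest_noise_correlation(), 2))
```

With a few hundred candidates and only a few dozen rows, the winning feature tends to look impressively correlated even though there is nothing there by construction. That is the kind of pattern that gets spun into a story.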

Satellite photo of the “Face on Mars” which many people took as evidence of extraterrestrial habitation.

Those who reject the dark side also go fishing, but they’re after something else: inspiration. They’re looking for patterns that might be interesting or compelling, but they know better than to take them as evidence. Instead, they practice a sort of open-minded analytics zen with the discipline to be mindful of as many interpretations as possible.

The best analysts challenge themselves to find as many interpretations as possible.

This takes a sharp eye and a humble, unsticky mind. Rather than tricking their stakeholders into seeing only one side of a story, they challenge themselves to do the creative thinking required to digest the same data into as many stories as possible. They present their findings in a way that inspires rigorous follow-up without causing their leadership team to run overconfidently off a cliff.

Open-mindedness gives data analysis a chance to be worthwhile.

As an added bonus, the discipline to look for multiple interpretations is an analyst’s secret weapon for not snoozing past the real treasures buried in the data. If you’re distracted by a falsehood you believe in, confirmation bias makes it hard to notice evidence that points in the opposite direction. Why bother analyzing anything if your conclusions are determined in advance? Open-mindedness gives the whole endeavor a chance to be worthwhile.

This grilled cheese sandwich fetched $28,000 at auction because it features the Virgin Mary. Alternative interpretations of what we’re seeing, anyone?

Hiring a great analyst

If you liked my other articles about analytics, here are the traits you’re already looking for in a great analyst:

  • They don’t make inferences that reach beyond the data they’re exploring. [1]
  • They’re handy with data science tools and have the skills to sift through vast datasets quickly. [2]
  • They have relevant domain knowledge so they’re less likely to waste stakeholders’ time with trivia. [3]
  • They understand that their work is about prospecting for inspiration. [3] [4]
  • They visualize data in a brain-friendly way so that time-to-inspiration is kept as short as possible. [3]
  • They know what it takes to follow up rigorously on any potential insights they found (and whom to call for help with that). [4] [5] [6] [7]

In addition to all that, this article suggests you look for analysts with three more traits:

  • They’re aware that the mind finds meaning where it doesn’t exist, so they stay humble and avoid jumping to conclusions.
  • They don’t try to sell you a story found by torturing data until it confesses. Instead, they use hedging/softening language when talking about data.
  • They have the discipline to come up with multiple interpretations for everything. The faster they produce multiple explanations and the more alternatives they generate, the more the Force is with them. Try interviewing for this skill next time you’re hiring an analytics Jedi.

Finally, if you’re a leader, turn a critical eye inward and make sure that you’re giving your people the right incentives. Are you looking for a data analyst or a data spin doctor? These take different mindsets (and skillsets!), so choose wisely and reward the right behaviors.

Forget potato chips! That Japanese museum of rocks that look like faces takes the cake.
