Unsupervised learning demystified

Cassie Kozyrkov Cassie Kozyrkov
September 20, 2018 Big Data, Cloud & DevOps

Ready to learn Machine Learning? Browse Machine Learning Training and Certification courses developed by industry thought leaders and Experfy in Harvard Innovation Lab.

Unsupervised learning may sound like a fancy way to say “let the kids learn on their own not to touch the hot oven” but it’s actually a pattern-finding technique for mining inspiration from your data. It has nothing to do with machines running around without adult supervision, forming their own opinions about things. Let’s demystify!

If this feels familiar, unsupervised machine learning might be your new best friend.

This post is beginner-friendly, but assumes you’re familiar with the story so far:

  • Machine learning is all about labeling things using examples.
  • If you train your system by feeding it the answers you’re looking for, you’re doing supervised learning. 
  • To get started with supervised learning you need to know what labels you want. (Not so with unsupervised.)
  • Standard jargon includes instance, feature, label, model, and algorithm. 

What’s unsupervised learning?

Your mission? Put these six images into two groups however you like.

Check out the six instances above. What’s missing? These photographs are not accompanied by labels. No worries, your brain is pretty good at unsupervised learning. Let’s try it.

Think about how you’d like to split these images into two groups. There are no wrong answers. Ready?

Clustering the data

In a live class, Googlers shout out answers like “sitting versus standing,” “can see a wooden floor versus can’t,” “cat selfie vs not cat selfie,” and so on. Let’s examine the first answer.

One way to split the images in to two clusters: sitting versus standing. Well, “sitting” versus standing.

Unsupervised learning‘s secret labels

If you chose to define your clusters based on whether the cats are standing, what are the labels your system outputs? Machine learning is about labeling things, after all.

If you’re thinking “sitting vs standing” are the labels, think again! That’s the recipe (model) you’re using for creating your clusters. The labels in unsupervised learning are far more boring: something like “Group 1 and Group 2” or “A or B” or “0 or 1”. They simply indicate group membership, and they have no additional human-interpretable (or poetic) meaning.

Unsupervised learning’s labels simply indicate cluster membership. They have no higher human-interpretable meaning, as disappointingly boring as that may feel.

All that is happening here is that the algorithm groups things by similarity. The similarity measure is specified by the choice of algorithm, but why not try as many as possible? After all, you don’t know what you’re looking for and that’s okay. Think of unsupervised learning as a sort of mathematical version of making “birds of a feather flock together.”

Like a Rorschach card, the results are there to help you dream. Don’t take whatever you see in them too seriously.

Look again!

As the proud mother of these two individual cats, I’m saddened that in the 50 or so times I’ve taught this lesson, only one audience noticed: “Cat 1 versus Cat 2.” Instead it’s answers like “sitting, standing” or “wooden floor absent/present” or sometimes even “ugly cats versus pretty cats.” (Awww.)

Turns out these were photos of my two individual cats! Maybe you spotted it, but most of my audiences don’t… unless I give them the labels (supervise their learning). If I’d presented the data with name labels in the first place and then asked you to classify the next photo, I bet you’d find the task easy.

Lessons learned

Imagine I’m a novice data scientist getting started with unsupervised learning and (naturally!) interested in my own two cats. I won’t be able to unsee my cats when I look at these data. Because my cuddlebugs are so meaningful to me, I expect my unsupervised machine learning system to be able to recover the only thing worth caring about here. Oops!

Before this decade, computers couldn’t even hope to compete with the best
pattern finder in the world for this kind of task: the human brain. This is easy for people! So why did the thousands of Googlers who saw these unlabeled photos miss the “Cat 1 versus Cat 2” answer?

Think of unsupervised learning as a sort of mathematical version of making “birds of a feather flock together.”

Just because something’s interesting to me doesn’t mean my pattern finder will find it. Even if the pattern finder is awesome, I didn’t tell it what I’m looking for, so why would I expect my learning algorithm to deliver it? This isn’t magic! If I don’t tell it what the right answers are… I get what I get and I don’t get upset. All I can do is look at the clusters the system returns for me and see if I find them inspiring. If I don’t like 'em, I just run a different unsupervised algorithm (“Someone else in the audience, split them for me a different way”) over and over until something feels interesting.

The results are a Rorschach card to help you dream.

There’s no guarantee that anything inspiring will come out of the process, but it doesn’t hurt to try. Exploring the unknown is supposed to be a bit of an adventure, after all. Have fun with it!

In future episodes, we’ll look at cautionary tales of what can go wrong if you forget that the labels are just an inspiration and shouldn’t be taken too seriously, let alone treated as human-interpretable. (Hint: there may be mention of finding Elvis in a slice of toast.) They’re just there to give you ideas about what you might like to dive into next.

Summary: Unsupervised learning helps you find inspiration in data by grouping similar things together for you. There are many different ways of defining similarity, so keep trying algorithms and settings until a cool pattern catches your eye.

  • Experfy Insights

    Top articles, research, podcasts, webinars and more delivered to you monthly.

  • Cassie Kozyrkov

    Tags
    Data Science
    Leave a Comment
    Next Post
    A First Step Towards Strong AI

    A First Step Towards Strong AI

    Leave a Reply Cancel reply

    Your email address will not be published. Required fields are marked *

    More in Big Data, Cloud & DevOps
    Big Data, Cloud & DevOps
    Cognitive Load Of Being On Call: 6 Tips To Address It

    If you’ve ever been on call, you’ve probably experienced the pain of being woken up at 4 a.m., unactionable alerts, alerts going to the wrong team, and other unfortunate events. But, there’s an aspect of being on call that is less talked about, but even more ubiquitous – the cognitive load. “Cognitive load” has perhaps

    5 MINUTES READ Continue Reading »
    Big Data, Cloud & DevOps
    How To Refine 360 Customer View With Next Generation Data Matching

    Knowing your customer in the digital age Want to know more about your customers? About their demographics, personal choices, and preferable buying journey? Who do you think is the best source for such insights? You’re right. The customer. But, in a fast-paced world, it is almost impossible to extract all relevant information about a customer

    4 MINUTES READ Continue Reading »
    Big Data, Cloud & DevOps
    3 Ways Businesses Can Use Cloud Computing To The Fullest

    Cloud computing is the anytime, anywhere delivery of IT services like compute, storage, networking, and application software over the internet to end-users. The underlying physical resources, as well as processes, are masked to the end-user, who accesses only the files and apps they want. Companies (usually) pay for only the cloud computing services they use,

    7 MINUTES READ Continue Reading »

    About Us

    Incubated in Harvard Innovation Lab, Experfy specializes in pipelining and deploying the world's best AI and engineering talent at breakneck speed, with exceptional focus on quality and compliance. Enterprises and governments also leverage our award-winning SaaS platform to build their own customized future of work solutions such as talent clouds.

    Join Us At

    Contact Us

    1700 West Park Drive, Suite 190
    Westborough, MA 01581

    Email: support@experfy.com

    Toll Free: (844) EXPERFY or
    (844) 397-3739

    © 2023, Experfy Inc. All rights reserved.