Why Data Annotation is the Secret to Hacking AI

Michael Goldberg
October 1, 2019 AI & Machine Learning

In case you’ve been living under a rock, artificial intelligence (AI) is everywhere. It has infiltrated almost every aspect of our private and professional lives. From healthcare to transportation, AI aims to redefine how information is collected, integrated, and analyzed, ultimately leading to more informed insights and better outcomes. But for all the hype, the full promise of AI rarely comes to fruition because of one four-letter word: “data.”

While the AI story is all the rage, the data narrative is not as prominently discussed. Sure, data may not be as sexy as automated systems that can learn and process information faster than a human, but it is equally important. And don’t get me wrong: we all know that AI requires vast amounts of data to continually learn and identify patterns that humans can’t. After all, it’s the ability to process this information and make instant decisions that has made AI such a game changer for industries that rely on massive volumes of data.

But the real story is not about the algorithms powering the AI revolution; it’s about the quality of the data powering these systems. What enterprises really need as they develop their AI strategy is to integrate, clean, link, and supplement their data so they have an accurate foundation on which to build and train their machine learning algorithms. For many organizations, that requirement alone makes AI difficult, if not impossible.

“Data-related challenges are a top reason [our] clients have halted or canceled artificial-intelligence projects,” said Arvind Krishna, IBM’s senior vice president of cloud and cognitive software, speaking at The Wall Street Journal’s Future of Everything Festival. He’s certainly not alone in his assessment. According to a report by MIT Technology Review, insufficient data quality was one of the biggest challenges to employing AI. What’s more, 85% of AI projects will “not deliver” for organizations, according to research and advisory company Gartner.

Companies need to think of AI and machine learning as the engines that will drive the amazing things they want to accomplish. But like any engine, they need the right fuel to run well.

Enter Data Annotation

Data annotation (also referred to as data labeling) is critical to ensuring your AI and machine learning projects can scale. It provides the initial setup for training a machine learning model: it tells the model what it needs to understand and how to discriminate between various inputs to produce accurate outputs.
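To make the point concrete, here is a deliberately tiny sketch in Python. The example sentences, the `positive`/`negative` labels, and the word-count "model" are all invented for illustration and are far simpler than any real machine learning algorithm; the point is only that the human-assigned labels are what let the model tell one kind of input from another.

```python
from collections import Counter, defaultdict

# Hypothetical human-annotated examples: each input is paired with a label.
LABELED = [
    ("great product, works perfectly", "positive"),
    ("love it, great value", "positive"),
    ("broke after one day", "negative"),
    ("terrible support, broke quickly", "negative"),
]


def tokenize(text):
    """Lowercase, drop commas, and split on whitespace."""
    return text.lower().replace(",", "").split()


def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(tokenize(text))
    return counts


def predict(counts, text):
    """Return the label whose training words overlap most with the input."""
    words = tokenize(text)
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))


model = train(LABELED)
print(predict(model, "Great value"))       # positive
print(predict(model, "broke on day one"))  # negative
```

Remove or mislabel a few rows in `LABELED` and the predictions can change, which is the same dependency on annotation quality described here, in miniature.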

There are many different data annotation modalities, depending on the form the data takes. They range from image and video annotation to text categorization, semantic annotation, and content categorization. Humans are needed to identify and annotate specific data so machines can learn to identify and classify information. Without these labels, a machine learning algorithm will have a difficult time computing the necessary attributes.

The unfortunate reality is that annotation is still a largely manual process. While annotation tools are getting better, the gap between an ill-designed tool and an intuitive one has a significant effect on annotation productivity. According to some estimates, 80% of AI project time is currently spent on data preparation. Yet even small errors in the data can prove disastrous. In this area, humans actually have a leg up on machines. We’re simply better than computers at managing subjectivity, understanding intent, and coping with ambiguity, all of which are important factors in data annotation.

Regardless of modality, the vast majority of problems that AI models are built to address fit into one (or more) of the annotation tasks below:

  • Sequencing: text or time series in which a span has a start (left boundary), an end (right boundary), and a label (e.g., recognize the name of a person in a text, identify a paragraph discussing penalties in a contract)
  • Categorization: binary classes, multiple classes, single-label or multi-label, flat or hierarchical, ontological (e.g., categorize a book according to the BISAC ontology, categorize an image as offensive or not offensive)
  • Segmentation: find paragraph splits, find an object in an image, find transitions between speakers, between topics, etc. (e.g., spot objects and people in a picture, find the transition between topics in a news broadcast)
  • Mapping: language-to-language, full text to summary, question to answer, raw data to normalized data (e.g., translate from French to English, normalize a date from free text to a standard format)
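The four task types above can be sketched as simple annotation records. The class and field names below are hypothetical, not taken from any particular annotation tool, and image segmentation is simplified to an axis-aligned bounding box (real projects often use polygons or pixel masks instead).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SpanLabel:
    """Sequencing: a labeled span with left and right boundaries."""
    start: int  # character offset of the left boundary (inclusive)
    end: int    # character offset of the right boundary (exclusive)
    label: str


@dataclass
class CategoryLabel:
    """Categorization: one or more labels for a whole item."""
    labels: list[str]


@dataclass
class BoxLabel:
    """Segmentation (simplified): a rectangular region of an image."""
    x: int
    y: int
    width: int
    height: int
    label: str


@dataclass
class MappingLabel:
    """Mapping: a source paired with its expected target."""
    source: str
    target: str


text = "The agreement was signed by Marie Curie in Paris."
start = text.index("Marie Curie")
person = SpanLabel(start=start, end=start + len("Marie Curie"), label="PERSON")
print(text[person.start:person.end])  # Marie Curie

image_tag = CategoryLabel(labels=["not offensive"])
face_box = BoxLabel(x=120, y=80, width=64, height=64, label="person")
date = MappingLabel(source="October 1, 2019", target="2019-10-01")
```

Storing character offsets rather than the matched string keeps a sequencing label unambiguous when the same name appears more than once in a text.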

Complex problems can usually be solved as a sequence or combination of these tasks. For example, when you unlock your phone with face identification, machine learning is used to spot your nose and eyes (segmentation) and to categorize the face as you or not you (categorization). Similarly, when you talk to Alexa or Siri, machine learning maps your voice to words (mapping), recognizes sequences such as an instruction or the name of a song (sequencing), and decides whether to play music, tell you the weather, and so on (categorization).
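The voice-assistant example can be sketched as three steps chained together. Each function here is a hypothetical, rule-based stand-in for what would really be a model trained on annotated data; in particular, `transcribe` simply decodes bytes as text rather than performing speech recognition.

```python
from typing import Optional, Tuple


def transcribe(audio: bytes) -> str:
    """Mapping (voice -> words). Stand-in: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8").lower()


def find_song(words: str) -> Optional[str]:
    """Sequencing: return the span after a leading 'play ' as the song name."""
    marker = "play "
    if not words.startswith(marker):
        return None
    return words[len(marker):]


def categorize(words: str) -> str:
    """Categorization: choose an action for the utterance."""
    if words.startswith("play "):
        return "play_music"
    if "weather" in words:
        return "tell_weather"
    return "unknown"


def handle(audio: bytes) -> Tuple[str, Optional[str]]:
    """Chain the tasks: mapping first, then categorization and sequencing."""
    words = transcribe(audio)
    return categorize(words), find_song(words)


print(handle(b"Play Bohemian Rhapsody"))     # ('play_music', 'bohemian rhapsody')
print(handle(b"What is the weather today"))  # ('tell_weather', None)
```

Because each step is a separate function, any one of them could be swapped for a trained model without changing `handle`, which mirrors how a complex problem decomposes into distinct annotation tasks.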

At the end of the day, even the most technically advanced algorithm cannot address or solve a problem without the right data. We know having access to data is quite valuable, but having access to data with a learnable ‘signal’ consistently added at a massive scale is the biggest competitive advantage nowadays. That’s the power of data annotation.
