Your First Step Towards AI — Labeled Data!

Andrea Suckro
October 22, 2020 AI & Machine Learning

Successful machine learning projects need good data, but in many projects getting it is the first major hurdle. For example, image data from a camera may already be available, but nobody has yet recorded in computer-readable form what should be recognized automatically in the future. Especially if you are just starting out with AI, the preparatory work on the data can be overwhelming. This is particularly discouraging when all you want is a first proof of concept for a narrowly defined area.

So you need a solution that lets you label your own raw data quickly and easily. We took a look at the tooling jungle and collected what the market has to offer and what to look out for. Let’s go through the important questions to ask yourself before you start labeling, and the tools that are available to you.

Who should label the data?

Labeling is a very tedious task, especially if you want to provide a sufficient amount of data for a deep learning algorithm. If you don’t believe this, imagine marking a defect on components in hundreds of images with pixel-perfect accuracy. Unfortunately, a certain amount of labeled data is absolutely necessary. So where do you get this data from? Basically, there are the following scenarios:

  1. Labeling of the data by the domain expert: When it comes to your own data, as an expert you naturally know best how it should be labeled. In the beginning, you should decide on a good format for storing the labels and set up a workflow with suitable tooling so that a data set can be labeled as quickly as possible.
  2. Labeling by the AI service provider: After suitable instruction and explanation, your AI partner is certainly also able to perform labeling. The advantage is that your partner directly gains a good understanding of the underlying data and its classes. At the same time, this approach will be too expensive in most cases, since labeling itself is a comparatively simple task.
  3. Third-party labeling: It is also possible to outsource the work completely. Cloud service providers such as Google also offer their own services for this purpose (https://cloud.google.com/ai-platform/data-labeling/docs). However, only certain formats (images, video, and text) are supported and very specific instructions must be formulated to achieve the right result. The documentation suggests doing several test runs until it works. The data itself must of course also be transferred to the cloud.
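To make the point about choosing a storage format in the first scenario more concrete, here is a minimal sketch of one possible format: one JSON object per line (JSON Lines). The field names `file`, `label` and `annotator` and both helper functions are assumptions made for illustration; they are not taken from any of the tools discussed in this article.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class LabelRecord:
    """One labeled item. All field names are hypothetical."""
    file: str       # path of the raw data file, relative to the data set root
    label: str      # class name chosen by the annotator
    annotator: str  # who assigned the label, useful for later review


def save_labels(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(asdict(record)) + "\n")


def load_labels(path):
    """Read the records back in the order they were written."""
    with Path(path).open(encoding="utf-8") as fh:
        return [LabelRecord(**json.loads(line)) for line in fh if line.strip()]
```

A line-based text format like this is easy to inspect by hand, can be appended to while labeling is still in progress, and works well with version control. Whatever format you pick, it helps to fix it before the first label is created.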

In the following, we will focus more on the first two scenarios, as in our experience they occur most frequently.

What kind of data can be labeled?

Object recognition in images is the first thing that comes to mind when thinking about labeled data and AI, but a wide variety of data types from different applications can be enriched with labels. The labels themselves can also be applied at different levels. For example, you can label an entire image (tagging), draw boxes in it and label them (bounding boxes), or even assign classes with pixel accuracy (semantic segmentation).

Different labeling levels (from left to right): tagging, bounding box, semantic segmentation. Picture adapted by Andrea Suckro from https://www.pexels.com/photo/grey-and-white-long-coated-cat-in-middle-of-book-son-shelf-156321/
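The three labeling levels differ mainly in how much information they carry. The following sketch, using plain Python lists, illustrates this: a pixel-wise mask (semantic segmentation) contains enough information to derive a bounding box and an image-level tag, but not the other way round. The function names, the tiny example mask and the class id mapping are all made up for illustration.

```python
def mask_to_bounding_box(mask, class_id):
    """Return (x_min, y_min, x_max, y_max) enclosing all pixels of class_id, or None."""
    xs = [x for row in mask for x, value in enumerate(row) if value == class_id]
    ys = [y for y, row in enumerate(mask) if class_id in row]
    if not xs:
        return None  # the class does not occur in this image
    return (min(xs), min(ys), max(xs), max(ys))


def mask_to_tags(mask, class_names):
    """Return the sorted names of all known classes that appear in the mask."""
    present = {value for row in mask for value in row}
    return sorted(class_names[v] for v in present if v in class_names)


# Semantic segmentation: one class id per pixel (0 = background, 1 = cat).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
class_names = {1: "cat"}  # hypothetical mapping from class id to name

box = mask_to_bounding_box(mask, 1)     # bounding box level
tags = mask_to_tags(mask, class_names)  # tagging level
```

This is also why the finer levels are more expensive to produce: a tag is a single decision per image, while a mask requires a decision for every pixel.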

Besides images, there are of course other data formats that are well suited for processing. The most relevant data types are:

  • Time series: This includes, for example, the recordings from a machine control system or the sensor values that are saved during a production process.
  • Audio: Audio data can be used to recognize speech or to analyze recordings picked up by microphones in any process.
  • Text: Interesting for all use cases from the area of natural language processing, such as chatbots or intelligent document management systems.
  • Images: Data from cameras or optical sensors in general, which can support e.g. quality control at the end of production processes.
  • Videos: Video recordings can stem from surveillance cameras and could, for example, be used to increase the safety of machine operation.
  • 3D data: It is also conceivable that e.g. parts of a manufacturing model need to be provided with labels.
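For time series, labels are often attached to time intervals rather than to single values. As a rough sketch of what that can look like, the function below expands interval labels into one label per sample. The interval convention (start inclusive, end exclusive), the label names and the function itself are assumptions for illustration only.

```python
def intervals_to_sample_labels(timestamps, intervals, default="unlabeled"):
    """Assign each timestamp the label of the first interval that contains it.

    intervals is a list of (start, end, label) tuples,
    with start inclusive and end exclusive.
    """
    labels = []
    for t in timestamps:
        label = default
        for start, end, name in intervals:
            if start <= t < end:
                label = name
                break
        labels.append(label)
    return labels


# Sensor readings taken every 0.5 seconds (hypothetical values).
timestamps = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
# Two intervals marked by a domain expert (hypothetical label names).
intervals = [(0.5, 1.5, "warm_up"), (1.5, 2.5, "fault")]

sample_labels = intervals_to_sample_labels(timestamps, intervals)
```

Storing the intervals and deriving per-sample labels on demand keeps the labeling effort low, because the expert only has to mark where a phase starts and ends.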

As we will see later, tooling supports the different data types to varying degrees. However, besides the functional requirements, there are other general conditions to consider.

Photo by Barn Images on Unsplash

What are the further requirements for a good labeling tool?

If you are working with an AI service provider and have sensitive company data, there are further considerations to take into account.

  • License compliance: The license of an external tool must allow us to pass the tool on to customers for a limited time if they want to do the labeling themselves. The reverse case can also occur: the customer may need to pass the tool on to the AI service provider if the provider is to support the labeling process.
  • Data security: If possible, we would like to avoid a cloud-based solution since we often handle sensitive data and do not want it to end up unnecessarily on the servers of the label suppliers.
  • Comfort: The tool should also be intuitive to operate for employees with little technical experience. This aspect also includes the time investment. It must always be faster to use the selected tool than to do the labeling ‘manually’. The tool must also be easy to set up technically and be usable in as many environments as possible.
  • Use case coverage: Ideally, one tool covers what would otherwise take five. It should be a program that supports image segmentation but can also handle time series classification.
  • Costs: The tool used should not exceed the available budget and time frame. In most cases, this consideration comes down to weighing free tools against the time saved by paid solutions.

Given those criteria, we are ready to have a look at the offered solutions!

What tools are there?

We looked around and tried to get an impression of the existing solutions on the market. We looked at both commercial and freely available software and tried to understand which use cases they cover. The commercial solutions are mostly cloud-based and offer additional functions besides labeling, such as the simultaneous training of AI algorithms or access to an external workforce. Most freely available alternatives require a command-line installation and are not available as out-of-the-box solutions (if you want to host them yourself).

Our comparison overview of different labeling tools, Image by Andrea Suckro (slashwhy)

This table is far from complete and is rather intended to give an overview of the current spectrum of solutions. We noticed that there is no real canonical solution for labeling 3D data (except for KNOSSOS, which is a specialized tool for tissue data). So for your technical 3D data, you would have to do the labeling yourself with the tool of your choice (e.g. AutoCAD, Blender, …) and export it to the corresponding files.

Final thoughts

All beginnings are difficult, and preparing raw data with labels for an initial AI project is no exception. However, you can rely on an ever-growing landscape of support and tools. For the first steps in this field, Label-Studio has convinced us the most, because it is quickly installed and easy to use. It also offers very broad support for different data types and advanced workflows if the need arises. We hope this article has given you a little insight into the world of labeling and enables you to take the next step on your personal AI journey. So don’t be shy and let’s go, from collecting data to labeling!

Here is the collection of links to the tools we covered:

  1. Hasty https://hasty.ai/solution.html
  2. DataGym https://www.datagym.ai/
  3. Labelbox https://labelbox.com/
  4. Google Cloud AI Labeling https://cloud.google.com/ai-platform/data-labeling/docs
  5. Cloudfactory https://www.cloudfactory.com/data-labeling
  6. UltimateLabeling https://github.com/alexandre01/UltimateLabeling
  7. labelme https://github.com/wkentaro/labelme
  8. labelImg https://github.com/tzutalin/labelImg
  9. Label-Studio https://github.com/heartexlabs/label-studio
  10. Curve https://github.com/baidu/Curve
  11. ELAN https://archive.mpi.nl/tla/elan
