AI fuels
modern life, from the way we commute to how we order online
and how we find a date or a job. Counting Facebook and Google
users alone, billions of people use AI-powered applications every day.
This is just the tip of the iceberg when it comes to AI’s
potential.
OpenAI, which recently made
headlines again for offering general availability to its
models, uses labeled data to “improve language model behavior,” or to make its
AI fairer and less biased. This is an important example, as OpenAI’s models
were long criticized for producing toxic and racist output.
Many of the AI applications we use day-to-day require a particular dataset to
function well. To create these datasets, we need to label data for AI.
Why does AI need data labeling?
The term artificial intelligence is somewhat of a misnomer. AI is not actually
intelligent. It takes in data and uses
algorithms to make predictions based on that data. This process
requires a large amount of labeled data.
This is particularly the case when it comes to challenging domains like
healthcare, content moderation, or autonomous
vehicles. In many instances, human judgment is still required to
ensure the models are accurate.
Consider the example of sarcasm in social media
content moderation. A Facebook post might read, “Gosh, you’re so
smart!” However, that could be sarcastic in a way that a robot would miss. More
perniciously, a language model trained on biased data can be sexist, racist, or
otherwise toxic. For instance, the GPT-3 model associated
Muslims and Islam with terrorism until labeled data was
used to improve the model’s behavior.
As long as the human bias is handled as well, “supervised models allow for more
control over bias in data selection,” a 2018 TechCrunch article stated.
OpenAI’s newer models are a perfect example of using labeled data to control
bias. Controlling bias with data labeling is of vital importance, as
low-quality AI models have even landed
companies in court. One firm attempted to
use AI as a screen reader, only to later agree to a settlement when the
model didn’t work as advertised.
The importance of high-quality AI models is making its way into regulatory
frameworks as well. For example, the European Commission’s regulatory
framework proposal on artificial intelligence would require
some AI systems to ensure “high quality of the datasets feeding the system to minimize
risks and discriminatory outcomes.”
Standardized language and tone analysis are also critical in content
moderation. It’s not uncommon for people to have different definitions of the
word “literally” or how literally they should take something such as “It was
like banging your head against a wall!” To decide which posts are violating
community standards, we need to analyze these types of subtleties.
Similarly, the AI startup Handl uses labeled data to more accurately convert
documents to structured text. We’ve all heard of OCR (Optical
Character Recognition), but with AI powered by labeled data, it’s being taken
to a whole new level.
To give another example, to train an algorithm to analyze medical images for
signs of cancer, you would need to have a large dataset of medical images
labeled with the presence or absence of cancer. When annotators also outline
the affected regions, the task is commonly referred to as image segmentation
and can require labeling tens of thousands of pixels in each image. In general,
the more high-quality labeled data you have, the better your model will be at
making accurate predictions.
Sure, it’s possible to use unlabeled data for AI training algorithms, but this
can lead to biased results, which could have serious implications in many
real-world cases.
Applications using data labeling
Data labeling is
vital for applications across search, computer vision, voice assistants,
content moderation, and more.
Search was one of the first major AI use-cases relying on human judgment to
determine relevance. With labeled data, a search can be extremely accurate. For
instance, Yandex turned to human “annotators”
from Toloka to help improve its search engine.
Some of the most popular uses of AI in health care include
helping to diagnose skin conditions and diabetic retinopathy, boosting recall
rates for medication compliance reviews, and analyzing radiologist reports to
detect eye conditions like glaucoma.
Content
moderation has also seen significant advances thanks to AI
applied to large quantities of labeled data. This is especially true for
sensitive topics like violence or threats of violence. For example, people may
post videos on YouTube threatening suicide, which need to be immediately
detected and differentiated from informational videos about suicide.
Another important use of data labeling for AI is understanding voices with any
accent or tone, for voice assistants like Alexa or Siri.
This requires training an algorithm to recognize male and female speech
patterns based on large volumes of labeled audio.
Human computing for labeling at scale
All this raises the question: How do you create labeled data at scale?
Manually labeling data
for AI is an extremely labor-intensive process. It can take
weeks or months to label a few hundred samples using this approach, and the
accuracy rate is not very good, particularly when facing niche labeling tasks.
Additionally, it will be necessary to update datasets and build bigger datasets
than competitors in order to remain competitive.
The best way to scale data labeling is with a combination of machine learning
and human expertise. Companies like Toloka, Appen, and others use AI to match
the right people with the right tasks, so the experts do the work that only
they can do. This allows firms to scale their labeling efforts. Further, AI can
weight the answers from different respondents according to the quality of their
past responses, so that each label has a high chance of being accurate.
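As a rough illustration of weighting answers by respondent quality, here is a minimal Python sketch of a weighted majority vote. It is not how Toloka, Appen, or any other vendor implements aggregation; the function name `aggregate_labels`, the annotator IDs, and the quality scores are all hypothetical, and the quality scores are assumed to come from something like each annotator's accuracy on known test items.

```python
from collections import defaultdict


def aggregate_labels(responses, annotator_quality, default_quality=0.5):
    """Pick one label per item with a quality-weighted majority vote.

    responses: list of (annotator_id, label) pairs for a single item.
    annotator_quality: dict mapping annotator_id to a weight in [0, 1]
        (hypothetical scores, e.g. past accuracy on control tasks).
    default_quality: weight used for annotators with no known score.
    Returns (winning_label, share_of_total_weight).
    """
    if not responses:
        raise ValueError("at least one response is required")

    # Sum the weights of the annotators who chose each label.
    scores = defaultdict(float)
    for annotator_id, label in responses:
        scores[label] += annotator_quality.get(annotator_id, default_quality)

    total = sum(scores.values())
    if total == 0:
        raise ValueError("all annotator weights are zero")

    winner = max(scores, key=scores.get)
    return winner, scores[winner] / total


# Hypothetical example: one reliable annotator disagrees with two weaker ones.
quality = {"ann_a": 0.95, "ann_b": 0.4, "ann_c": 0.4}
votes = [("ann_a", "toxic"), ("ann_b", "not_toxic"), ("ann_c", "not_toxic")]
label, confidence = aggregate_labels(votes, quality)
print(label, round(confidence, 2))
```

In this made-up case a simple majority would choose "not_toxic", while the weighted vote sides with the single high-quality annotator. The returned share of total weight can serve as a crude confidence signal for deciding which items to send for another round of labeling.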
With techniques like these, labeled data is fueling a new AI revolution. By
combining AI with human judgment, companies can create accurate models of their
data. These models can then be used to make better decisions that have a
measurable impact on businesses.
Author: Frederik Bussler
Source: Venturebeat.com