Fluid annotation: An exploratory machine learning–powered interface for faster image annotation

Image annotation is now quicker and easier.

Example of an image in the COCO dataset (left) and its pixel-wise semantic labeling (right). Image credit: Florida Memory.

A team of scientists at Google Research has introduced Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every
object and background region in an image.

Obtaining high-quality training data is rapidly becoming a major bottleneck in computer vision. This is especially true for pixel-wise prediction tasks such as semantic segmentation, used in applications such as autonomous driving, robotics, and image search.

Traditional manual labeling tools require an annotator to precisely click on the boundary of every object in the image to outline it, a process that is both tedious and slow.

Visualization of the Fluid Annotation interface in action on an image from the COCO dataset. Image credit: gamene.

Why is it so tedious? Labeling a single image in the COCO+Stuff dataset takes about 19 minutes, so labeling the whole dataset would take over 53k hours.
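The arithmetic behind that estimate can be checked in a few lines. The image count below is an assumption (COCO+Stuff covers roughly 164,000 images); the article's figure may be based on a slightly different count:

```python
# Back-of-the-envelope check of the annotation cost estimate.
# NUM_IMAGES is an assumption: COCO+Stuff spans roughly 164k images.
MINUTES_PER_IMAGE = 19
NUM_IMAGES = 164_000

total_hours = NUM_IMAGES * MINUTES_PER_IMAGE / 60
print(f"{total_hours:,.0f} hours")  # on the order of 52,000 hours
```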

The aim of this study is an efficient, natural interface that produces high-quality annotations with substantially less human effort than traditional manual interfaces.

Scientists say, “We explore a machine learning–powered interface for annotating the class label and outline of every object and background region in an image, accelerating the creation of labeled datasets by a factor of 3x.”

Fluid Annotation begins from the output of a strong semantic segmentation model, which a human annotator can enhance through machine-assisted edit operations utilizing a natural user interface.

They say, “Our interface empowers annotators to choose what to correct and in which order, allowing them to effectively focus their efforts on what the machine does not already know.”

Comparison of annotations using traditional manual labeling tools (middle column) and Fluid Annotation (right) on three COCO images. While object boundaries are often more accurate when using manual labeling tools, the biggest source of annotation differences is that human annotators often disagree on the exact object class. Image credits: sneaka, original image (top); Dan Hurt, original image (middle); Melodie Mesiano, original image.

This is a first exploratory step toward making image annotation faster and easier.

Scientists added, “More precisely, to annotate an image we first run it through a pre-trained semantic segmentation model (Mask-RCNN). This generates around 1000 image segments with their class labels and confidence scores. The segments with the highest confidences are used to initialize the labeling which is presented to the annotator.”
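A minimal sketch of that initialization step might look like the following. The `Segment` fields and function names are illustrative assumptions, not the actual Fluid Annotation data model, and the segmentation model itself is taken as given:

```python
# Sketch: initialize a labeling from model output by keeping the
# highest-confidence segments. The fields below are illustrative,
# not the real Fluid Annotation data structures.
from dataclasses import dataclass

@dataclass
class Segment:
    label: str         # predicted class, e.g. "person"
    confidence: float  # model score in [0, 1]
    mask_id: int       # reference to a pre-generated pixel mask

def initial_labeling(segments, max_segments=20):
    """Keep the top-scoring segments to present to the annotator."""
    ranked = sorted(segments, key=lambda s: s.confidence, reverse=True)
    return ranked[:max_segments]

candidates = [Segment("person", 0.97, 0), Segment("dog", 0.40, 1),
              Segment("tree", 0.88, 2)]
print([s.label for s in initial_labeling(candidates, max_segments=2)])
# ['person', 'tree']
```

In practice the model emits around 1000 candidate segments per image, so a confidence cutoff like this determines what the annotator sees first.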

“Afterwards, the annotator can: (1) Change the label of an existing segment choosing from a shortlist generated by the machine. (2) Add a segment to cover a missing object. The machine identifies the most likely pre-generated segments, through which the annotator can scroll and select the best one. (3) Remove an existing segment. (4) Change the depth-order of overlapping segments.”
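The four operations above can be sketched as methods on a simple annotation state. This is an illustrative mock, with hypothetical names, not Google's implementation:

```python
# Illustrative sketch of the four annotator operations on a labeling.
# Class and method names are hypothetical, not the real interface.
class Labeling:
    def __init__(self, segments):
        # segments: list of dicts, front-to-back depth order
        self.segments = list(segments)

    def change_label(self, idx, new_label):
        """(1) Re-label a segment, e.g. from a machine-made shortlist."""
        self.segments[idx]["label"] = new_label

    def add_segment(self, segment):
        """(2) Add a pre-generated segment covering a missed object."""
        self.segments.append(segment)

    def remove_segment(self, idx):
        """(3) Delete an incorrect segment."""
        del self.segments[idx]

    def reorder(self, idx, new_idx):
        """(4) Change the depth order of overlapping segments."""
        self.segments.insert(new_idx, self.segments.pop(idx))

lab = Labeling([{"label": "cat"}, {"label": "sofa"}])
lab.change_label(0, "dog")               # (1) cat -> dog
lab.add_segment({"label": "lamp"})       # (2) new segment at the back
lab.reorder(2, 0)                        # (4) bring lamp to the front
print([s["label"] for s in lab.segments])  # ['lamp', 'dog', 'sofa']
```

The key design point is that every operation works on machine-generated candidates, so the annotator selects and corrects rather than drawing outlines from scratch.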

In the paper, they say, “In future work, we aim to improve the annotation of object boundaries, make the interface faster by including more machine intelligence, and finally extend the interface to handle previously unseen classes for which efficient data collection is needed the most.”