A team of scientists at Google Research has introduced Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every
object and background region in an image.
Obtaining high-quality training data is rapidly becoming a major bottleneck in computer vision. This is especially true for pixel-wise prediction tasks such as semantic segmentation, used in applications such as autonomous driving, robotics, and image search.
However, traditional manual labeling tools require an annotator to click precisely along the boundary of each object in the image, which is tedious and slow.
How tedious? Labeling a single image in the COCO+Stuff dataset takes 19 minutes, and labeling the whole dataset would take over 53,000 hours.
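A quick back-of-envelope check of those figures, assuming the roughly 164k images in COCO+Stuff (the image count is an assumption, not stated in the article):

```python
# Sanity check of the annotation-cost figures quoted above.
# The ~164k image count for COCO+Stuff is an assumption.
minutes_per_image = 19
num_images = 164_000  # approximate COCO+Stuff size (assumed)

total_hours = num_images * minutes_per_image / 60
print(f"{total_hours:,.0f} hours")  # on the order of 52,000 hours
```

At 19 minutes per image, the total lands in the same ballpark as the 53k-hour figure the researchers quote.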
The aim of this work is an efficient and natural interface that produces high-quality annotations with substantially less human effort than traditional manual tools.
The researchers say, “We explore a machine learning–powered interface for annotating the class label and outline of every object and background region in an image, accelerating the creation of labeled datasets by a factor of 3x.”
Fluid Annotation begins from the output of a strong semantic segmentation model, which a human annotator can enhance through machine-assisted edit operations utilizing a natural user interface.
They say, “Our interface empowers annotators to choose what to correct and in which order, allowing them to effectively focus their efforts on what the machine does not already know.”
Fluid Annotation is a first exploratory step toward making image annotation faster and easier.
The researchers add, “More precisely, to annotate an image we first run it through a pre-trained semantic segmentation model (Mask-RCNN). This generates around 1000 image segments with their class labels and confidence scores. The segments with the highest confidences are used to initialize the labeling which is presented to the annotator.”
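The initialization step can be sketched as ranking the model's segment proposals by confidence and keeping the top few to seed the interface. This is a minimal illustration, not the actual Fluid Annotation code; the `Segment` class and `initialize_labeling` function are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    label: str         # predicted class label
    confidence: float  # model confidence score

def initialize_labeling(segments, max_segments=20):
    """Keep the highest-confidence proposals as the starting annotation."""
    ranked = sorted(segments, key=lambda s: s.confidence, reverse=True)
    return ranked[:max_segments]

# e.g. from ~1000 Mask-RCNN proposals, keep the most confident ones to seed the UI
proposals = [Segment("person", 0.97), Segment("sky", 0.91), Segment("dog", 0.12)]
print([s.label for s in initialize_labeling(proposals, max_segments=2)])
# ['person', 'sky']
```

In the real system each segment would also carry a pixel mask; only labels and scores are shown here for brevity.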
“Afterwards, the annotator can: (1) Change the label of an existing segment choosing from a shortlist generated by the machine. (2) Add a segment to cover a missing object. The machine identifies the most likely pre-generated segments, through which the annotator can scroll and select the best one. (3) Remove an existing segment. (4) Change the depth-order of overlapping segments.”
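The four edit operations above amount to simple mutations of an ordered list of segments. A minimal sketch, with hypothetical names and segments reduced to `(label, segment_id)` pairs kept in front-to-back depth order:

```python
class Annotation:
    """Hypothetical sketch of the annotator's four edit operations.

    Segments are (label, segment_id) pairs; list order encodes depth,
    front-most first.
    """

    def __init__(self, segments):
        self.segments = list(segments)

    def change_label(self, index, new_label):
        # (1) Relabel an existing segment, keeping its geometry.
        label, seg_id = self.segments[index]
        self.segments[index] = (new_label, seg_id)

    def add_segment(self, segment):
        # (2) Add a pre-generated segment to cover a missing object.
        self.segments.append(segment)

    def remove_segment(self, index):
        # (3) Remove an existing segment.
        del self.segments[index]

    def swap_depth(self, i, j):
        # (4) Change the depth order of two overlapping segments.
        self.segments[i], self.segments[j] = self.segments[j], self.segments[i]

ann = Annotation([("person", 0), ("sky", 1)])
ann.change_label(1, "clouds")   # relabel segment 1
ann.swap_depth(0, 1)            # bring "clouds" in front of "person"
print([label for label, _ in ann.segments])
# ['clouds', 'person']
```

In Fluid Annotation itself, the machine shortlists candidate labels and segments for steps (1) and (2); here the selection is left to the caller.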
In the paper, they write, “In future work, we aim to improve the annotation of object boundaries, make the interface faster by including more machine intelligence, and finally extend the interface to handle previously unseen classes for which efficient data collection is needed the most.”