Scientists at Carnegie Mellon University’s Robotics Institute have recently developed a method that enables a computer to read the body language and movements of people on video in real time, including the pose of each individual’s fingers.
CMU’s Panoptic Studio, a multicamera capture facility, supercharges the research. It is currently being used to improve body, face and hand detectors by training them jointly.
According to the researchers, these methods for tracking human body language and motion in 2D open up new ways for people and machines to interact with each other. They could also enable people to use machines to better understand the world around them.
Yaser Sheikh, an associate professor of robotics, said, “The ability to recognize hand poses, for instance, will make it possible for people to interact with computers in new and more natural ways, such as communicating with computers simply by pointing at things.”
Detecting the nuances of nonverbal communication between individuals would allow robots to serve in social spaces, where they could notice what people around them are doing, what moods they are in and whether they can be interrupted.
A self-driving car, for example, could get an early warning that a pedestrian is about to step into the street by monitoring body language. Enabling machines to understand human behavior could also open new approaches to behavioral diagnosis and rehabilitation for conditions such as autism, dyslexia, and depression.
The researchers have released their computer code for both multiperson and hand pose estimation. Tracking many people in real time presents a number of challenges, however, particularly in social situations where people may be in contact with each other.
To handle this, the researchers took a bottom-up approach, which first localizes all the body parts in a scene (arms, legs, faces and so on) and then associates those parts with particular individuals.
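The two-stage idea can be illustrated with a minimal Python sketch. It is not the CMU code: it assumes a detector has already produced a list of labeled body-part locations (stage one) and then groups them (stage two) by greedily attaching each part to the nearest "neck" detection within a hypothetical 150-pixel radius. Scoring candidate links by plain pixel distance is a deliberate simplification and is not how the CMU system decides which parts belong together. The `Part` class, the `group_parts` function, the part labels and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Part:
    """One detected body part (stage-one output). Labels are hypothetical."""
    kind: str
    x: float
    y: float


def group_parts(parts: List[Part], root_kind: str = "neck",
                max_dist: float = 150.0) -> List[Dict[str, Part]]:
    """Stage two: assign every detected part to at most one person.

    Each root detection (here, a neck) seeds one person. Candidate links are
    scored by pixel distance only, a simplification for illustration.
    """
    roots = [p for p in parts if p.kind == root_kind]
    people: List[Dict[str, Part]] = [{root_kind: r} for r in roots]

    # Collect every (distance, person index, part) link within range.
    candidates: List[Tuple[float, int, Part]] = []
    for i, root in enumerate(roots):
        for p in parts:
            if p.kind == root_kind:
                continue
            d = hypot(p.x - root.x, p.y - root.y)
            if d <= max_dist:
                candidates.append((d, i, p))

    # Greedy matching: commit the shortest links first. Each part is used
    # once, and each person receives at most one part of each kind.
    candidates.sort(key=lambda c: c[0])
    used = set()
    for _, i, p in candidates:
        if p in used or p.kind in people[i]:
            continue
        people[i][p.kind] = p
        used.add(p)
    return people


if __name__ == "__main__":
    # Made-up detections for two people standing side by side.
    detections = [
        Part("neck", 100, 100), Part("nose", 100, 60), Part("right_wrist", 130, 180),
        Part("neck", 400, 100), Part("nose", 405, 55), Part("right_wrist", 380, 190),
    ]
    for n, person in enumerate(group_parts(detections)):
        print(n, sorted((k, (p.x, p.y)) for k, p in person.items()))
```

Committing the shortest links first means a part lying between two people goes to the nearer one. That shortcut is exactly what becomes unreliable when people touch or overlap, which is why the association step is the hard part of the bottom-up approach.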
Hanbyul Joo, a Ph.D. student in robotics, said, “The challenges for hand detection are even greater. As people use their hands to hold objects and make gestures, a camera is unlikely to see all parts of the hand at the same time. Unlike the face and body, large datasets do not exist of hand images that have been laboriously annotated with labels of parts and positions.”
When a hand is filmed from several angles at once, however, an image that shows only part of it is often accompanied by another image, taken from a different angle, with a full or complementary view of the same hand. This is where the researchers made use of CMU’s multicamera Panoptic Studio.
Joo said, “A single shot gives you 500 views of a person’s hand, plus it automatically annotates the hand position. Hands are too small to be annotated by most of our cameras. Thus for this study, we used just 31 high-definition cameras, but still were able to build a massive data set.”
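The geometry behind this kind of automatic annotation can be sketched in Python. The sketch assumes calibrated cameras described by known 3x4 projection matrices and noise-free 2D detections: a fingertip detected in two views is triangulated into 3D by linear least squares, then projected into a third view where the finger was hidden, yielding a 2D label for that image. The function names, camera matrices and coordinates are hypothetical, and a real pipeline would also have to reject wrong detections before triangulating, which this omits. It illustrates the principle and is not the Panoptic Studio software.

```python
from typing import List, Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]  # 3x4 camera projection matrix
Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def project(P: Matrix34, X: Point3) -> Point2:
    """Pinhole projection of 3D point X into the image of camera P."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


def _solve3(M: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (A[r][3] - sum(A[r][c] * x[c] for c in range(r + 1, 3))) / A[r][r]
    return x


def triangulate(views: List[Tuple[Matrix34, Point2]]) -> Point3:
    """Linear least-squares triangulation from two or more calibrated views.

    Each observation (u, v) in camera P gives two equations in X = (x, y, z, 1):
    (u*P[2] - P[0]) . X = 0 and (v*P[2] - P[1]) . X = 0. The stacked system is
    solved through its normal equations.
    """
    rows: List[List[float]] = []
    rhs: List[float] = []
    for P, (u, v) in views:
        for coord, k in ((u, 0), (v, 1)):
            a = [coord * P[2][c] - P[k][c] for c in range(4)]
            rows.append(a[:3])
            rhs.append(-a[3])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(3)]
    x, y, z = _solve3(M, b)
    return (x, y, z)


if __name__ == "__main__":
    # Three toy cameras (focal length 500, principal point at the origin),
    # offset from each other along x and y. All values are made up.
    cam_a = [[500, 0, 0, 0], [0, 500, 0, 0], [0, 0, 1, 0]]
    cam_b = [[500, 0, 0, -500], [0, 500, 0, 0], [0, 0, 1, 0]]
    cam_c = [[500, 0, 0, 0], [0, 500, 0, -500], [0, 0, 1, 0]]
    fingertip = (0.2, -0.1, 4.0)

    # Suppose the fingertip is detected in cameras A and B but hidden in C.
    seen = [(cam_a, project(cam_a, fingertip)), (cam_b, project(cam_b, fingertip))]
    estimate = triangulate(seen)
    # Reprojecting the 3D estimate gives camera C a label it could not detect.
    print(estimate, project(cam_c, estimate))
```

With many cameras, each one that sees the hand clearly adds two equations to the same least-squares system, so views with a full picture can supply labels for views where the hand is partly hidden.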
Sheikh credited much of this progress to a National Science Foundation (NSF) grant awarded a decade earlier. “Now, we’re able to break through a number of technical barriers primarily as a result of that NSF grant 10 years ago,” he said. “We’re sharing the code, but we’re also sharing all the data captured in the Panoptic Studio.”
Sheikh and his colleagues will present papers on their multiperson and hand pose detection methods at CVPR 2017, the Computer Vision and Pattern Recognition Conference, held July 21-26 in Honolulu.