Researchers have developed methods that enable computers to understand body language, tracking the body pose of multiple individuals, including facial expressions and hand positions.
CREDIT: CARNEGIE MELLON UNIVERSITY

Modern devices like telephones and intelligent personal assistants respond to the touch of a finger and even to simple voice commands, but when will we take the next step? Humans often communicate more with the movement of their bodies, especially to express emotions, than they do with speech. Interpreting the sometimes subtle movements of nonverbal communication will allow computers and autonomous robots to serve in social spaces, where understanding what the people around them are doing could be vital to their duties.

Next Step

The critical next step will be for a computer to learn to read our nuanced movements and gestures, and even our facial expressions. Researchers at Carnegie Mellon University’s Robotics Institute, working in the university’s Panoptic Studio, have developed a computer system that understands the movements of multiple people from video. This is an impressive achievement, though it is more a crowd-reading advance than an interpretation of what an individual is saying through their movements.

How was it done?

The application was developed through experiments conducted in the Panoptic Studio, a two-story dome equipped with 500 video cameras. The team built large data sets using a bottom-up approach, which first localizes all the body parts in a frame, arms, legs, faces, and especially hands, and then links those parts to particular individuals. The resulting application makes it possible to detect the collective pose of a group of people using a single camera and a laptop computer. The team will present their findings on their multiperson and hand-pose detection methods at CVPR 2017, the Computer Vision and Pattern Recognition Conference, July 21-26 in Honolulu.
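The bottom-up order of operations, detect every body part first, then assign parts to people, can be sketched in miniature. The detections, part names, and nearest-anchor grouping below are hypothetical simplifications (the actual system learns the part-to-person associations from its camera data rather than using plain distance); the sketch only illustrates the two-stage structure.

```python
from math import dist

# Hypothetical detections: (part_name, (x, y)) keypoints found anywhere
# in the image, with no knowledge yet of which person each belongs to.
detections = [
    ("neck", (100, 50)), ("neck", (300, 60)),
    ("wrist", (110, 120)), ("wrist", (290, 130)),
    ("face", (98, 20)), ("face", (305, 25)),
]

def group_by_nearest_anchor(detections, anchor="neck"):
    """Bottom-up grouping: every non-anchor part joins the closest anchor.

    Stage 1 (done above): localize all parts independently.
    Stage 2 (here): link each part to a person, one per anchor detection.
    """
    anchors = [pt for name, pt in detections if name == anchor]
    people = [{anchor: pt} for pt in anchors]
    for name, pt in detections:
        if name == anchor:
            continue
        # Assign this part to the person whose anchor point is nearest.
        idx = min(range(len(anchors)), key=lambda i: dist(pt, anchors[i]))
        people[idx][name] = pt
    return people

people = group_by_nearest_anchor(detections)
```

With the sample detections, the grouping yields two people, each with a neck, wrist, and face, recovered without ever detecting a whole person at once.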

Next Steps

The Carnegie Mellon team is now sharing its source code for both multiperson and hand-pose estimation to encourage further research and applications. Many research groups are already using the code to apply and extend its functionality, and more than 20 commercial groups, including automotive companies, have expressed interest in licensing the technology. The team now intends to move from 2-D models of humans to 3-D models.