iRobot, the developer of military robots and the Roomba robot vacuum cleaner, has been building robots for more than 23 years. In 2012 the company reported having sold 8 million home robots and more than 5,000 unmanned ground vehicles worldwide. All of these devices have required sophisticated robot vision, yet robots still struggle to tell a couch from a chair, or a dog from an ottoman.
This month, however, a team of researchers at iRobot announced that it had developed a generic object recognition algorithm. The technology enhances artificial vision so that a robot can discriminate among the objects it views, giving it the ability to tell the difference between a dog and a chair. As you can imagine, this versatility can serve many more applications than vacuuming a room.
The mathematics needed to detect and understand what a robot sees is highly complex, involving about a hundred thousand calculations per pixel, or on the order of 10 billion for a single VGA image. For high-definition vision the number of calculations is much greater.
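To put those figures in perspective, here is a back-of-the-envelope calculation. It assumes the standard VGA resolution of 640 × 480 pixels and, as a stand-in for "high definition," a 1920 × 1080 frame; neither resolution is stated in the iRobot announcement. Multiplying the quoted per-pixel figure by the VGA pixel count gives roughly 30 billion operations per frame, the same order of magnitude as the number quoted above.

```python
# Rough operation count per frame, using the per-pixel figure quoted in the article.
OPS_PER_PIXEL = 100_000  # "a hundred thousand calculations per pixel"


def ops_per_frame(width, height, ops_per_pixel=OPS_PER_PIXEL):
    """Return the estimated number of calculations for one frame."""
    return width * height * ops_per_pixel


# Assumed resolutions (not given in the announcement).
vga_ops = ops_per_frame(640, 480)    # standard VGA
hd_ops = ops_per_frame(1920, 1080)   # 1080p, used here as "high definition"

print(f"VGA frame: {vga_ops:,} calculations")
print(f"HD frame:  {hd_ops:,} calculations")
print(f"HD / VGA ratio: {hd_ops / vga_ops}")
```

Under these assumptions a single high-definition frame needs several times the work of a VGA frame, and a robot has to repeat that for every frame it processes.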
iRobot based its new algorithm on the Deformable Parts Model (DPM). DPM breaks a single object down into multiple parts and shapes and discriminates among the objects in the field of view. The algorithm must calculate the distance, angle, and scale of the objects it is viewing, and it must do so in real time. It takes lighting into consideration as well.
To train the new robotic vision algorithm, the researchers at iRobot showed it 1,000 objects, which it then classified. The team used a separate high-speed graphics processing unit (GPU) for image detection, separating vision from all other functions.
Testing demonstrated that the robot vision can discriminate between a tube containing potato chips and a roll of toilet paper. It can even identify classes of objects and individual styles within those classes, as you can see in the images of the chairs below, one of the experiments conducted by the iRobot team. The algorithm can also identify a whole object by viewing only part of it, for example detecting a cat after seeing only its paw. And it can detect and identify compound objects, such as a person sitting in a chair, as opposed to just the chair or the person alone.
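As a rough illustration of the part-based idea behind DPM mentioned above, the sketch below scores one candidate object location by adding a whole-object ("root") match score to the best score each part can achieve near its expected position, minus a penalty for how far the part had to shift. This is a simplified toy, not iRobot's implementation: the function names, the quadratic penalty, and the tiny hand-made score grids are all assumptions made for the example.

```python
# Toy sketch of deformable-parts scoring. All names and data are hypothetical.


def best_part_placement(part_response, anchor, max_shift, deform_weight):
    """Find the best position for one part near its anchor.

    part_response is a 2D grid of match scores for the part.
    Returns (score, (row, col)), where score is the match score minus
    a quadratic penalty for displacement from the anchor.
    """
    rows, cols = len(part_response), len(part_response[0])
    anchor_row, anchor_col = anchor
    best_score, best_pos = float("-inf"), None
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r, c = anchor_row + dr, anchor_col + dc
            if 0 <= r < rows and 0 <= c < cols:
                score = part_response[r][c] - deform_weight * (dr * dr + dc * dc)
                if score > best_score:
                    best_score, best_pos = score, (r, c)
    return best_score, best_pos


def dpm_score(root_response, root_pos, parts, max_shift=1):
    """Score a root location: root match plus the best placement of every part."""
    root_row, root_col = root_pos
    total = root_response[root_row][root_col]
    placements = []
    for part in parts:
        # Each part has an expected offset relative to the root position.
        anchor = (root_row + part["offset"][0], root_col + part["offset"][1])
        score, pos = best_part_placement(
            part["response"], anchor, max_shift, part["deform_weight"]
        )
        total += score
        placements.append(pos)
    return total, placements


# Hand-made 3x3 score grids standing in for real filter responses.
root_response = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
parts = [
    {"response": [[0.0, 0.0, 0.0], [0.0, 0.5, 3.0], [0.0, 0.0, 0.0]],
     "offset": (0, 0), "deform_weight": 1.0},
    {"response": [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
     "offset": (-1, -1), "deform_weight": 0.5},
]

total, placements = dpm_score(root_response, (1, 1), parts)
print(total, placements)
```

In this toy run the first part scores higher one cell to the right of its expected position, even after paying the shift penalty, so the model "deforms" to fit. That tolerance for parts moving around is what lets a part-based detector cope with different styles of the same kind of object.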