Authors: Georgios Georgakis, Arsalan Mousavian, Alexander Berg, Jana Kosecka
Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training of such models typically requires large amounts of annotated training data which is time consuming and costly to obtain. In this work we explore the ability of using synthetically generated composite images for training state of the art object detectors. We superimpose 2D images of textured object models into images of real environments at variety of locations and scales. Our experiments evaluate different superimposition strategies ranging from purely image-based blending all the way to depth and semantics informed positioning of the object models to real scenes. We demonstrate the effectiveness of these object detector training strategies on publicly available datasets of GMU-Kitchens and Washington RGB-D Scenes v2, and show how object detectors can be trained with limited amounts of annotated real scenes with objects present. This charts new opportunities for training detectors of for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.