Research - Video Analysis and Content Recognition

Content-Based Image Retrieval

Over the past decade, efficient search in continuously growing multimedia databases, such as image databases, has become an increasing challenge.

One way to solve this problem is to annotate the images with textual descriptions and apply well-known text retrieval methods. This approach fails when no metadata is available. Content-Based Image Retrieval (CBIR) was introduced to avoid this disadvantage. Instead of relying on annotated text, it describes the images using features derived directly from their visual content (e.g. color, texture, shape).

A typical application of a CBIR system is similarity search, where a given image is compared with the images in a database in order to find all related images (query by example).
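Query by example can be illustrated with a very simple global feature. The following is a minimal sketch, assuming images are given as RGB NumPy arrays and that a joint color histogram compared by histogram intersection is an acceptable stand-in for a real CBIR feature. The function names and the bin count are illustrative choices, not part of any particular system.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Describe an RGB image (H x W x 3, values 0..255) by a normalized joint color histogram."""
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two normalized histograms are identical."""
    return float(np.minimum(h1, h2).sum())

def query_by_example(query, database, top_k=3):
    """Return (index, score) pairs of the top_k database images most similar to the query."""
    q = color_histogram(query)
    scores = [histogram_intersection(q, color_histogram(img)) for img in database]
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), scores[i]) for i in order]
```

In practice the database histograms would be computed once and stored, so that only the query image has to be described at search time.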

Compact representation of local feature descriptors

Local feature descriptors play an important role in image retrieval from image databases and in image classification. Their main disadvantage is that a single image is represented by a large number of high-dimensional feature descriptors. Applications based on local feature descriptors that have to handle large-scale databases or run on a mobile device therefore run into memory and computation time problems.

To overcome this problem, several methods pool the local features into a compact representation, so that a single global descriptor represents an image. The challenge is to preserve the essential information of the many local descriptors in a compact form with little or no loss of information. Examples of such methods are Bag of Keypoints, image signatures and Fisher vectors.
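The pooling idea can be sketched for the Bag of Keypoints case. This is a minimal illustration, assuming the local descriptors of one image are rows of a NumPy array and that a codebook of visual words is already available (such a codebook is commonly obtained by clustering training descriptors, e.g. with k-means, which is not shown here). The function name is hypothetical.

```python
import numpy as np

def bag_of_keypoints(descriptors, codebook):
    """Pool n local descriptors (n x d) into one histogram over k visual words (k x d)."""
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its nearest visual word.
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    # Normalize so images with different numbers of descriptors are comparable.
    return hist / hist.sum()
```

The resulting vector has a fixed length equal to the codebook size, regardless of how many local descriptors the image produced. This fixed length is what makes the representation compact.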

Multiview 3D Reconstruction

When multiple views of a scene are available, its 3D structure can be reconstructed. For this purpose, the parameters of the camera with which the images were taken must be known. In general, there are two ways to obtain the camera parameters: chart-based calibration and structure from motion.

To achieve a 3D reconstruction, dense matching, which estimates the disparity of each pixel, is necessary. Dense matching is more difficult in the multiview case because of the visibility constraints. To address this problem, many approaches with different forms of energy functions have been developed.
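To make the notion of per-pixel disparity concrete, the following is a minimal sketch of dense matching for the simplest setting: two rectified grayscale views, a sum-of-absolute-differences cost over a small window, and a winner-take-all decision. It deliberately ignores visibility constraints and any smoothness term, so it is only the data term of an energy function, not one of the multiview approaches mentioned above. The function name and parameters are assumptions made for illustration.

```python
import numpy as np

def disparity_sad(left, right, max_disp=16, radius=1):
    """Winner-take-all disparity map for a rectified pair of 2D grayscale images."""
    left = left.astype(float)
    right = right.astype(float)
    h, w = left.shape
    max_disp = min(max_disp, w - 1)  # a disparity cannot exceed the image width
    size = 2 * radius + 1
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Pixel (y, x) in the left view is compared with (y, x - d) in the right view.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Aggregate the per-pixel cost over a (size x size) window.
        padded = np.pad(diff, radius, mode="edge")
        agg = np.zeros_like(diff)
        for dy in range(size):
            for dx in range(size):
                agg += padded[dy:dy + diff.shape[0], dx:dx + diff.shape[1]]
        costs[d, :, d:] = agg
    # For every pixel, keep the disparity with the lowest aggregated cost.
    return costs.argmin(axis=0)
```

Energy-based methods typically add a regularization term that penalizes disparity changes between neighboring pixels and then minimize the total energy, instead of deciding for each pixel independently as done here.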

In recent years, there has been a trend to combine camera calibration and dense matching in an iterative process. In addition, how the number and position of the views influence the 3D reconstruction remains an open question.


Are you interested? There are many possible topics for Bachelor and Master theses in the areas of video object segmentation, multiview 3D reconstruction and content-based image retrieval: Bachelor theses / Master theses.

Iris Heisterklaus, M.Sc. and Abin Jose, M.Sc.