Image and Video Signal Content Analysis

Optimized Feature Space Learning using Convolutional Neural Networks (Abin Jose, M.Sc.)

With the advances in deep learning, Convolutional Neural Networks (CNNs) are widely used for image classification. A CNN trained for classification can also serve as a feature extractor; using a pretrained CNN is one such approach. A major problem with these feature vectors is their high dimensionality: most of the dimensions carry redundant information. Linear discriminant analysis (LDA) is a conventional method for classification in an optimized feature space, which also compresses the feature vectors. This master thesis explores how effectively neural networks can be trained to learn such an optimized feature space. Basic knowledge of machine learning and image processing is beneficial.
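As an illustration of the conventional LDA baseline mentioned above, the following NumPy sketch compresses stand-in "CNN features" with a Fisher LDA. The data, dimensions, and class counts are synthetic placeholders, not part of the thesis setup:

```python
import numpy as np

# Hypothetical setup: 50-dim "CNN features" for 3 classes, replaced here
# by synthetic Gaussian clusters for illustration.
rng = np.random.default_rng(0)
n_per_class, dim, n_classes = 100, 50, 3
X = np.vstack([rng.normal(loc=3.0 * c, scale=1.0, size=(n_per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# Fisher LDA: maximize between-class over within-class scatter.
mean_total = X.mean(axis=0)
Sw = np.zeros((dim, dim))  # within-class scatter
Sb = np.zeros((dim, dim))  # between-class scatter
for c in range(n_classes):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    diff = (mc - mean_total)[:, None]
    Sb += len(Xc) * (diff @ diff.T)

# Generalized eigenproblem Sw^-1 Sb; at most n_classes - 1 useful directions.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:n_classes - 1]].real  # projection matrix: 50 -> 2 dims

X_lda = X @ W  # compressed feature vectors
print(X_lda.shape)  # (300, 2)
```

The thesis asks whether a neural network can be trained to produce such a discriminative, compressed space directly, rather than applying LDA as a post-processing step.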

Virtual View Synthesis using Point Clouds (Hossein Bakhshi-Golestani, M.Sc.)

The goal of this research is to generate a 3D model (a solid 3D model or a dense point cloud) from a set of captured photos and then render virtual views from it. The rendered virtual views can then be used as additional references for motion compensation in video coding. Assume a moving camera captures images from different viewpoints. These images are used as inputs for 3D reconstruction, i.e., estimating the camera parameters and generating a sparse/dense point cloud of the captured scene. This way, the 2D visual information is converted to its equivalent 3D data (2D → 3D). This 3D information can be employed to predict missing/future frames (by projecting 3D to 2D when the camera poses are known), to synthesize novel views that have not been seen by the camera (Virtual/Augmented Reality and Free Viewpoint TV), and for localization/mapping in robotics and self-driving vehicles. The figures below show the concept of virtual view synthesis and a dense point cloud generated from a video sequence captured by a moving car. In this research, we focus on predicting missing/future frames.
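The 3D → 2D projection step mentioned above can be sketched with a standard pinhole camera model. The intrinsics K and the pose [R | t] below are made-up values standing in for what camera parameter estimation would deliver:

```python
import numpy as np

# Hypothetical pinhole camera: intrinsics K and pose [R | t] are assumed
# known, e.g. estimated by structure-from-motion.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                    # rotation (world -> camera coordinates)
t = np.array([0.0, 0.0, 2.0])    # translation

def project(points_world):
    """Project Nx3 world points to Nx2 pixel coordinates (3D -> 2D)."""
    p_cam = points_world @ R.T + t     # world -> camera coordinates
    uvw = p_cam @ K.T                  # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

pts = np.array([[0.0, 0.0, 1.0],       # point on the optical axis
                [0.5, 0.0, 1.0]])
print(project(pts))  # first point lands on the principal point (320, 240)
```

With known poses for a missing/future frame, projecting the reconstructed points through that frame's camera yields the predicted view.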

There are many challenging topics in this area. Some of them are listed below:

(1) Image-Based Rendering using Point Clouds. Given a dense point cloud, the camera poses, and the already captured images (real camera views), a novel view should be synthesized.
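A minimal z-buffer splatting sketch of such point-based rendering follows; the image size, focal length, points, and colors are all placeholders:

```python
import numpy as np

# Z-buffer splatting: project colored 3D points into a small virtual image
# and keep, per pixel, only the point closest to the camera.
# Camera at the origin looking along +z with hypothetical focal length f.
H, W, f, cx, cy = 48, 64, 50.0, 32.0, 24.0

points = np.array([[0.0, 0.0, 2.0],   # far point
                   [0.0, 0.0, 1.0]])  # near point occluding the far one
colors = np.array([[255, 0, 0],
                   [0, 255, 0]], dtype=np.uint8)

image = np.zeros((H, W, 3), dtype=np.uint8)
zbuf = np.full((H, W), np.inf)

for (x, y, z), col in zip(points, colors):
    u = int(round(f * x / z + cx))
    v = int(round(f * y / z + cy))
    if 0 <= u < W and 0 <= v < H and z < zbuf[v, u]:
        zbuf[v, u] = z          # nearer point wins the pixel
        image[v, u] = col

print(image[24, 32])  # green: the nearer point occludes the red one
```

Real image-based rendering additionally has to fill the holes between splatted points, e.g. by blending colors from the nearby real camera views.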

(2) Point Cloud and 3D Mesh Reconstruction for Video Coding. The aim is to generate a dense point cloud and then convert it into a 3D mesh. In this research, the limitations of the video coding pipeline (e.g., the hierarchical coding structure) have to be considered.
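One simple cloud-to-mesh step can be sketched by triangulating an organized point grid, e.g. points back-projected from a depth map; the grid size and depth values below are toy placeholders:

```python
import numpy as np

# Turn an organized point grid (e.g. from a back-projected depth map) into a
# triangle mesh by connecting each 2x2 pixel neighborhood with two triangles.
H, W = 4, 5
u, v = np.meshgrid(np.arange(W), np.arange(H))
depth = np.ones((H, W))                                   # placeholder depths
vertices = np.stack([u, v, depth], axis=-1).reshape(-1, 3)  # (H*W, 3)

faces = []
for r in range(H - 1):
    for c in range(W - 1):
        i = r * W + c
        faces.append([i, i + 1, i + W])          # upper-left triangle
        faces.append([i + 1, i + W + 1, i + W])  # lower-right triangle
faces = np.array(faces)

print(len(vertices), len(faces))  # 20 vertices, 24 triangles
```

For unorganized dense clouds, surface reconstruction methods such as Poisson reconstruction or ball pivoting are the usual tools; the thesis would have to pick one compatible with the coding pipeline's constraints.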

(3) Low-Complexity 3D Model-based Motion Compensation for Video Coding. Computational complexity is one of the major issues in 3D reconstruction. In this research, the computational complexity of point cloud-based virtual view synthesis will be studied and solutions to reduce it will be investigated (e.g., using a sparse point cloud or a coarser mesh).
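A standard way to sparsify a dense cloud before meshing or rendering is voxel-grid downsampling, sketched here on synthetic data (cloud and voxel size are made up):

```python
import numpy as np

# Voxel-grid downsampling: keep one representative point (the centroid)
# per occupied voxel, reducing the cloud size and hence the rendering cost.
def voxel_downsample(points, voxel_size):
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)          # one voxel index per point
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, 3))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)       # accumulate points per voxel
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]          # per-voxel centroids

rng = np.random.default_rng(1)
dense = rng.uniform(0, 1, size=(10000, 3))  # dense synthetic cloud
sparse = voxel_downsample(dense, voxel_size=0.25)
print(len(dense), "->", len(sparse))        # 10000 -> at most 64 points
```

The research question is then how much such sparsification degrades the quality of the synthesized reference views relative to the complexity saved.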

(4) A Statistical Analysis of 3D Model-based Video Coding. This topic is more related to video coding and focuses on analyzing the contribution of the synthesized 3D model-based prediction to motion compensation.
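One plausible ingredient of such an analysis, sketched on synthetic stand-in data, is comparing the prediction quality (PSNR) achieved with a model-based reference against a conventional reference; the noise levels below are arbitrary placeholders, not measured results:

```python
import numpy as np

# Compare two hypothetical predictors of the current frame via the PSNR of
# their prediction residuals (higher PSNR = better prediction).
def psnr(pred, target, peak=255.0):
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(2)
current = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
# Toy stand-ins: the model-based prediction is assumed closer to the frame.
pred_model = current + rng.normal(0, 2.0, current.shape)  # small residual
pred_conv  = current + rng.normal(0, 8.0, current.shape)  # larger residual

print(psnr(pred_model, current) > psnr(pred_conv, current))  # True
```

In an actual codec study, one would instead count how often the encoder selects the synthesized reference and measure the resulting rate-distortion gain.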

Prior knowledge of image processing/computer vision is helpful, and basic programming skills (C++, Matlab) are required.