Image and Video Signal Content Analysis

Binarization of feature vectors generated by Convolutional Neural Networks Abin Jose M.Sc.

The feature vectors generated by Convolutional Neural Networks (CNNs) generally consist of floating-point values. For fast image retrieval, it is important to generate binary codes that can be compared by computing the Hamming distance between codewords instead of Euclidean distances. A popular binarization method is the Iterative Quantization (ITQ) approach. A downside of this approach, however, is that the number of bits it generates is limited by the number of feature vector dimensions. This problem can be addressed by multi-level quantization approaches, in which each feature dimension is quantized into several levels and corresponding binary codes are learned. Prior knowledge of Python and MATLAB and a basic understanding of deep learning are required for the successful completion of this thesis.
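To make the ITQ idea concrete, here is a minimal numpy sketch (function names, the simplified training loop, and the 0/1 code convention are assumptions for illustration, not the reference implementation): PCA reduces the features to one dimension per bit, then the rotation and the binary codes are alternately updated; note that `n_bits` cannot exceed the feature dimensionality, which is exactly the limitation mentioned above.

```python
import numpy as np

def itq_codes(X, n_bits, n_iter=30, seed=0):
    """Minimal ITQ sketch. X: (n_samples, d) zero-centered features;
    requires n_bits <= d (the bit-count limitation discussed above)."""
    rng = np.random.default_rng(seed)
    # 1) PCA: project onto the n_bits leading principal directions
    _, V = np.linalg.eigh(X.T @ X)          # eigenvalues ascending
    Z = X @ V[:, -n_bits:]
    # 2) Start from a random orthogonal rotation
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        B = np.where(Z @ R >= 0, 1.0, -1.0)  # fix R, update codes
        U, _, Vt = np.linalg.svd(Z.T @ B)    # fix B, update rotation
        R = U @ Vt                           # orthogonal Procrustes solution
    return (Z @ R >= 0).astype(np.uint8)     # 0/1 codewords

def hamming(a, b):
    # binary codes are compared with the Hamming distance
    return int(np.count_nonzero(a != b))
```

Retrieval then reduces to ranking database codewords by `hamming` distance to the query code, which is far cheaper than Euclidean comparisons of floating-point vectors.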

Optimized feature space learning using Convolutional Neural Networks Abin Jose M.Sc.

With the advances in deep learning, Convolutional Neural Networks (CNNs) are widely used for classifying images. A CNN trained for classification can also serve as a feature extractor, for example by using a pretrained network. A major problem with such feature vectors is their high dimensionality; most of the dimensions carry redundant information. Linear discriminant analysis (LDA) is a conventional method for classification in an optimized feature space, which also compresses the feature vectors. This master thesis explores how effectively neural networks can be trained to learn such an optimized feature space. Basic knowledge of machine learning and image processing is beneficial.
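As a small illustration of the classical baseline (a minimal Fisher LDA sketch in numpy; the function name and interface are assumptions), the projection maximizes between-class scatter relative to within-class scatter and compresses the features to at most C-1 dimensions for C classes:

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA sketch: returns a (d, n_components) projection matrix.
    X: (n_samples, d) features, y: integer class labels."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                    # within-class scatter
    Sb = np.zeros((d, d))                    # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # eigenvectors of Sw^{-1} Sb, largest eigenvalues first
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_components]]
```

The thesis question is then how well a neural network can be trained to produce a feature space with these discriminative properties directly, rather than applying LDA as a post-processing step.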

Binary Hashing using Convolutional Neural Networks Abin Jose M.Sc.

Convolutional neural networks generate feature vectors that can easily be binarized using a sigmoid hashing function. The problem with this saturating function is that its gradient vanishes when the sigmoid saturates: in a deep neural network, the error signal from the final layer then vanishes almost completely when propagated back during the backpropagation step. One approach to avoid this problem is to use a sigmoid with an adaptive slope. In this thesis, the student will explore different approaches for solving this problem and test the effectiveness of a sigmoid with variable slope. Basic knowledge of deep learning is beneficial.
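The effect can be sketched numerically (a hedged illustration; the slope schedule used in practice is a design choice of the thesis, not fixed here): a sigmoid with slope parameter beta saturates more sharply as beta grows, so a common idea is to start training with a gentle slope and anneal it up so the outputs approach binary values while gradients remain usable.

```python
import numpy as np

def sigmoid(x, beta=1.0):
    """Sigmoid with adjustable slope beta; large beta approximates a step."""
    return 1.0 / (1.0 + np.exp(-beta * x))

def sigmoid_grad(x, beta=1.0):
    """Derivative w.r.t. x; vanishes in the saturated regions."""
    s = sigmoid(x, beta)
    return beta * s * (1.0 - s)
```

For an input of 6.0, the standard sigmoid (beta = 1) already yields a near-zero gradient, while a gentler slope (beta = 0.25) still passes a noticeably larger error signal, which is the motivation for adapting the slope during training.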

Virtual View Synthesis, Camera Calibration and Point Cloud Generation using SfM/SLAM Hossein Bakhshi-Golestani M.Sc.

The goal of this research is to generate a 3D model (a solid 3D model or a dense point cloud) and then render virtual views from a set of captured photos. The rendered virtual views can then be used as additional references for motion compensation in video coding. Assume a moving camera capturing images from different viewpoints. These images are used as inputs for 3D reconstruction, camera parameter estimation, and generation of a sparse/dense point cloud of the captured scene. In this way, the 2D visual information is converted to its equivalent 3D data (2D → 3D). This 3D information can be employed to predict missing/future frames (by projecting 3D to 2D when the camera poses are known), to synthesize novel views not seen by the camera (Virtual/Augmented Reality and Free Viewpoint TV), and for localization/mapping in robotics and self-driving vehicles. The figures below show the concept of virtual view synthesis and a dense point cloud generated from a video sequence captured by a moving car. In this research, we focus on predicting missing/future frames.
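The 3D → 2D projection step used for frame prediction can be sketched with the standard pinhole model (a minimal numpy illustration; the intrinsics and pose conventions are assumptions for this sketch):

```python
import numpy as np

def project_points(X_world, K, R, t):
    """Project (n, 3) world points into pixel coordinates.
    K: 3x3 intrinsic matrix; R, t: world-to-camera rotation/translation."""
    Xc = R @ X_world.T + t[:, None]   # world -> camera coordinates
    x = K @ Xc                        # apply intrinsics
    return (x[:2] / x[2]).T           # perspective divide -> (n, 2) pixels
```

Given an estimated 3D model and the pose of a missing/future frame, projecting every model point this way yields the predicted view that can serve as an additional motion-compensation reference.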

There are many challenging topics in this area. Some of them are listed below:

(1) Camera Calibration and Point Cloud Generation using SfM/SLAM. First, the intrinsic and extrinsic camera parameters have to be estimated from a video sequence captured by a moving monocular camera; these parameters are then used to estimate a semi-dense point cloud of the captured scene. The main approaches to this problem are Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM).
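A core building block of both SfM and SLAM is triangulating a 3D point from its observations in two calibrated views. A minimal linear (DLT) sketch in numpy, assuming normalized camera coordinates and known 3x4 projection matrices:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D observations."""
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2] @ X) - P[0..1] @ X = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]                 # null-space vector = homogeneous solution
    return Xh[:3] / Xh[3]       # dehomogenize
```

Repeating this over many tracked feature matches, combined with pose estimation and bundle adjustment, is what builds up the semi-dense point cloud described above.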

(2) Image-Based Rendering using Point Clouds. A dense point cloud, the camera poses, and a set of already known images (real cameras) are given, and a novel view should be synthesized.
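The simplest rendering strategy can be sketched as forward point splatting with a z-buffer (an illustrative numpy sketch under strong simplifications; real image-based rendering additionally needs splat kernels and hole filling):

```python
import numpy as np

def splat_points(points, colors, K, R, t, h, w):
    """Render an (h, w, 3) novel view by projecting a colored point
    cloud; the nearest point wins per pixel via a z-buffer."""
    img = np.zeros((h, w, 3), dtype=np.float32)
    zbuf = np.full((h, w), np.inf)
    Xc = (R @ points.T + t[:, None]).T       # world -> camera coordinates
    for p, c in zip(Xc, colors):
        if p[2] <= 0:
            continue                          # point behind the camera
        u, v, s = K @ p
        u, v = int(round(u / s)), int(round(v / s))
        if 0 <= v < h and 0 <= u < w and p[2] < zbuf[v, u]:
            zbuf[v, u] = p[2]                 # keep the closest surface
            img[v, u] = c
    return img
```

The resulting sparse renderings make the main research questions visible: how to fill the holes between projected points and how to blend colors from the real camera images.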

(3) Point Cloud and 3D Mesh Reconstruction for Video Coding. The aim is to generate a dense point cloud and then convert it to a 3D mesh. In this research, the limitations of the video coding pipeline (e.g., the hierarchical coding structure) have to be considered.

(4) Low-Complexity 3D Model-based Motion Compensation for Video Coding. Computational complexity is one of the major issues in 3D reconstruction. In this research, the computational complexity of point cloud-based virtual view synthesis will be studied, and solutions to reduce it (e.g., using a sparse point cloud or a coarser mesh) will be investigated.

(5) A Statistical Analysis of 3D Model-based Video Coding. This topic is more closely related to video coding and focuses on analyzing the contribution of the synthesized 3D-model-based prediction to motion compensation.

Prior knowledge of image processing/computer vision is helpful and basic programming skills (C++, Python) are required.