Bild- und Videosignal-Inhaltsanalyse (Image and Video Signal Content Analysis)

Hyperparameter tuning for Convolutional Neural Networks Abin Jose M.Sc.

In deep learning, tuning hyperparameters is one of the most challenging tasks. Common approaches are grid search and random search. However, both approaches require many training runs to find a good configuration. Bayesian optimization addresses this problem by building a probabilistic model of the objective and using it to decide which hyperparameter settings to evaluate next. This thesis explores how Bayesian optimization can be used to tune the hyperparameters of a CNN architecture. A basic understanding of deep learning and programming skills in Python are required for the successful completion of this thesis.
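The core loop can be sketched in a few lines of Python. This is a minimal illustration, not part of the thesis itself: the "validation loss" is a toy stand-in for a real CNN training run, and the single hyperparameter, kernel length scale, and iteration counts are all assumptions chosen for the example.

```python
import numpy as np
from math import erf, sqrt, pi

def val_loss(log_lr):
    # Hypothetical stand-in for the CNN validation loss as a function of
    # log10(learning rate); in the thesis each evaluation would be a training run.
    return (log_lr + 3.0) ** 2 + 0.1 * np.sin(5.0 * log_lr)

def rbf(a, b, ls=0.7):
    # Squared-exponential (RBF) kernel between two sets of 1-D points.
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    # Standard Gaussian process regression equations with a small jitter term.
    K_inv = np.linalg.inv(rbf(x_obs, x_obs) + noise * np.eye(len(x_obs)))
    K_s = rbf(x_obs, x_new)
    mu = K_s.T @ K_inv @ y_obs
    var = np.clip(1.0 - np.sum(K_s * (K_inv @ K_s), axis=0), 1e-12, None)
    return mu, np.sqrt(var)

Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))  # normal CDF
phi = lambda t: np.exp(-0.5 * t ** 2) / sqrt(2.0 * pi)          # normal PDF

rng = np.random.default_rng(0)
grid = np.linspace(-5.0, -1.0, 200)       # search space: log10(learning rate)
x_obs = rng.uniform(-5.0, -1.0, 3)        # a few random initial evaluations
y_obs = val_loss(x_obs)

for _ in range(12):
    m, s = y_obs.mean(), y_obs.std() + 1e-12   # standardize targets for the GP
    mu, sigma = gp_posterior(x_obs, (y_obs - m) / s, grid)
    best = (y_obs.min() - m) / s
    z = (best - mu) / sigma
    ei = (best - mu) * Phi(z) + sigma * phi(z)  # expected improvement (minimization)
    x_next = grid[np.argmax(ei)]                # evaluate where EI is largest
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, val_loss(x_next))

print("best log10(lr) found:", x_obs[np.argmin(y_obs)])
```

The key contrast with grid or random search is that each new evaluation is chosen by the surrogate model, so far fewer expensive training runs are needed.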

Optimized feature space learning using Convolutional Neural Networks Abin Jose M.Sc.

With the advances in deep learning, Convolutional Neural Networks (CNNs) are widely used for classifying images. CNNs trained for classification can also be used for feature extraction, for example by taking the activations of a pretrained CNN. A major problem with such feature vectors is their high dimensionality; many of the dimensions carry redundant information. Linear discriminant analysis is a classical method for classification in an optimized feature space, which also compresses the feature vectors. This master thesis explores how effectively neural networks can be trained to learn such an optimized feature space. Basic knowledge of machine learning and image processing is beneficial.
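The classical linear discriminant analysis baseline can be sketched as follows. The Gaussian blobs are hypothetical stand-ins for CNN feature vectors; all dimensions and class counts are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for CNN feature vectors: 3 classes in a
# 10-dimensional feature space.
n_per_class, dim, n_classes = 50, 10, 3
means = rng.normal(0.0, 4.0, size=(n_classes, dim))
X = np.vstack([m + rng.normal(0.0, 1.0, size=(n_per_class, dim)) for m in means])
y = np.repeat(np.arange(n_classes), n_per_class)

# Within-class (Sw) and between-class (Sb) scatter matrices.
overall_mean = X.mean(axis=0)
Sw = np.zeros((dim, dim))
Sb = np.zeros((dim, dim))
for c in range(n_classes):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    d = (mc - overall_mean).reshape(-1, 1)
    Sb += len(Xc) * (d @ d.T)

# Fisher criterion: maximize between-class vs. within-class scatter
# -> eigenvectors of Sw^{-1} Sb with the largest eigenvalues.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs.real[:, order[:n_classes - 1]]   # at most C-1 useful dimensions

Z = X @ W   # compressed features: 10-D -> 2-D
print(Z.shape)
```

Since the between-class scatter matrix has rank at most C-1, LDA compresses the features to at most C-1 dimensions; the thesis asks whether a neural network can be trained to produce such a discriminative low-dimensional space directly.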

Binary Hashing using Convolutional Neural Networks Abin Jose M.Sc.

Convolutional neural networks generate feature vectors which can be binarized using a sigmoid hashing function. The problem with this saturating function is that the gradient vanishes when the sigmoid saturates. In a deep neural network, the error signal from the final layer is then lost when propagated back during the backpropagation step. One approach to avoid this problem is a sigmoid with an adaptive slope. In this thesis, the student will explore different approaches for solving this problem and test the effectiveness of a sigmoid with variable slope. Basic knowledge of deep learning is beneficial.

Virtual View Synthesis, Camera Calibration and Point Cloud Generation using SfM/SLAM Hossein Bakhshi-Golestani M.Sc.

The goal of this research is to generate a 3D model (a solid 3D model or a dense point cloud) and then render a virtual view from a set of captured photos. The rendered virtual views can then be used as an additional reference for motion compensation in video coding. Assume we have a moving camera capturing images from different viewpoints. These images are used as inputs for 3D reconstruction, camera parameter estimation, and the generation of a sparse/dense point cloud of the captured scene. This way, the 2D visual information is converted to its equivalent 3D data (2D → 3D). This 3D information can be employed to predict missing/future frames (by projecting 3D to 2D if the camera poses are known), to synthesize novel views not seen by the camera (Virtual/Augmented Reality and Free Viewpoint TV), and for localization/mapping in robotics and self-driving vehicles. The figures below show the concept of virtual view synthesis and a dense point cloud generated from a video sequence captured by a moving car. In this research, we focus on predicting missing/future frames.
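The 3D → 2D projection step mentioned above can be sketched with the standard pinhole camera model. All numbers (intrinsics, pose, points) are made up for illustration; in practice K, R, and t would come from SfM/SLAM.

```python
import numpy as np

# Pinhole projection of 3D world points into a (virtual) camera view,
# x = K [R | t] X in homogeneous coordinates.
K = np.array([[800.0,   0.0, 320.0],    # intrinsics: focal lengths, principal point
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                           # extrinsics: camera rotation ...
t = np.array([[0.0], [0.0], [0.0]])     # ... and translation (toy values)

def project(points_3d, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    P = K @ np.hstack([R, t])                        # 3x4 projection matrix
    X_h = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    x_h = (P @ X_h.T).T
    return x_h[:, :2] / x_h[:, 2:3]                  # perspective division

points = np.array([[0.0,  0.0, 4.0],
                   [0.5, -0.25, 5.0]])
print(project(points, K, R, t))   # a point on the optical axis maps to (320, 240)
```

Given a reconstructed point cloud and an estimated pose for a missing/future frame, repeating this projection for every 3D point yields the model-based prediction used as an additional reference in motion compensation.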

There are many challenging topics in this area. Some of them are listed below:

(1) Camera Calibration and Point Cloud Generation using SfM/SLAM. First, the intrinsic and extrinsic camera parameters are estimated from a video sequence captured by a moving monocular camera; then these parameters are used to estimate a semi-dense point cloud of the captured scene. The main approaches to solving this problem are Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM).

(2) Image-Based Rendering using Point Clouds. A dense point cloud, the camera poses, and the already known images (real camera views) are given, and a novel view should be synthesized.

(3) Point Cloud and 3D Mesh Reconstruction for Video Coding. The aim is to generate a dense point cloud and then convert it to a 3D mesh. In this research, the limitations of the video coding pipeline (e.g., the hierarchical coding structure) should be considered.

(4) Low-Complexity 3D Model-based Motion Compensation for Video Coding. Computational complexity is one of the major issues in 3D reconstruction. In this research, the computational complexity of point cloud-based virtual view synthesis will be studied, and solutions to reduce it will be investigated (e.g., using a sparser point cloud or a coarser mesh).

(5) A Statistical Analysis of 3D Model-based Video Coding. This topic is more closely related to video coding and focuses on analyzing the contribution of the synthesized 3D model-based prediction to motion compensation.
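The reconstruction direction underlying topics (1)-(3), recovering a 3D point from its observations in two calibrated views, can be sketched with linear (DLT) triangulation. The stereo setup below is a toy assumption: identical intrinsics and a pure sideways baseline.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D pixel coordinates."""
    # Each observation x = PX contributes two linear constraints on X.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # null vector of A (homogeneous 3D point)
    return X[:3] / X[3]

# Hypothetical calibrated stereo pair: same intrinsics, the second camera
# shifted sideways (baseline 0.5, expressed in the translation vector).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# Synthesize noise-free observations of a known point, then recover it.
X_true = np.array([0.2, -0.1, 3.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))   # recovers [0.2, -0.1, 3.0]
```

An SfM/SLAM pipeline repeats this idea at scale: matched feature tracks across many frames are triangulated while the camera poses themselves are refined, yielding the sparse/semi-dense point clouds discussed in the topics above.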

Prior knowledge of image processing/computer vision is helpful, and basic programming skills (C++, Python) are required.