Three-dimensional shape modeling: segmentation, reconstruction and registration

Accounting for uncertainty in three-dimensional (3D) shapes is important in a large number of scientific and engineering areas, such as biometrics, biomedical imaging, and data mining. It is well known that 3D polar-shaped objects can be represented by Fourier descriptors such as spherical harmonics and double Fourier series. However, the statistics of these spectral shape models have not been widely explored. This thesis studies several areas involved in 3D shape modeling, including random field models for statistical shape modeling, optimal shape filtering, parametric active contours for object segmentation and surface reconstruction. It also investigates multi-modal image registration with respect to tumor activity quantification. Spherical harmonic expansions over the unit sphere not only provide a low-dimensional polarimetric parameterization of stochastic shape, but also correspond to the Karhunen-Loève (K-L) expansion of any isotropic random field on the unit sphere. Spherical ...

Li, Jia — University of Michigan
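The correspondence between spherical harmonics and the K-L expansion mentioned in the abstract above can be stated in standard notation. This sketch assumes a star-shaped (polar) surface described by a radius function $r(\theta,\varphi)$ with its mean shape removed; the symbols $a_{lm}$ and $C_l$ are introduced here for illustration and are not taken from the thesis.

$$
r(\theta,\varphi)=\sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}\,Y_l^m(\theta,\varphi),\qquad
a_{lm}=\int_{S^2} r(\theta,\varphi)\,\overline{Y_l^m(\theta,\varphi)}\,d\Omega
$$

$$
\mathbb{E}\big[a_{lm}\,\overline{a_{l'm'}}\big]=C_l\,\delta_{ll'}\,\delta_{mm'}
$$

Isotropy makes the covariance depend only on the angle between two points on the sphere, so the coefficients are uncorrelated and each $Y_l^m$ is an eigenfunction of the covariance operator with eigenvalue $C_l$; this is what makes the spherical harmonic expansion a K-L expansion. Truncating at degree $L$ leaves $\sum_{l=0}^{L}(2l+1)=(L+1)^2$ coefficients, which gives the kind of low-dimensional parameterization the abstract refers to.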


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of audio recordings derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems developed are prerequisites for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. In this way, the volume of audio recordings that need to be processed is significantly reduced, while the system is automated. Several systems are developed to detect the presence of a dialogue. This is the first effort in the international literature in which the audio channel is exploited exclusively. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Dynamic Scheme Selection in Image Coding

This thesis deals with the coding of images with multiple coding schemes and their dynamic selection. In our society of information highways, electronic communication takes a bigger place in our lives every day, and the number of transmitted images keeps increasing. Therefore, image compression remains an active research area. However, the current trend is to add functionalities to the compression scheme, such as progressiveness for more comfortable browsing of websites or databases. Classical image coding schemes have a rigid structure. They usually process an image as a whole and treat the pixels as a simple signal with no particular characteristics. Second-generation schemes use the concept of objects in an image and introduce a model of the human visual system into the design of the coding scheme. Dynamic coding schemes, as their name suggests, make ...

Fleury, Pascal — Swiss Federal Institute of Technology


Multi-user Receiver Structures for Direct Sequence Code Division Multiple Access

This thesis reports on an investigation of various system architectures and receiver structures for cellular communications systems which discriminate users by direct sequence code division multiple access (DS-CDMA). Attention is focussed on the downlink of such a spread spectrum system, and the influence of a number of design parameters is considered. The objective of the thesis is to investigate signal processing techniques which may be employed either at the receiver or throughout the system to improve the overall capacity. The principles of spread spectrum communication are first outlined, including a discussion of the relative merits of spreading sequence sets and a description of various signal processing techniques which are to be applied to the multi-user environment. The measure of system performance is introduced, and the conventional DS-CDMA system is analysed theoretically and through simulation to provide a reference performance level. ...

Band, Ian W. — University Of Edinburgh


New Higher-Order Active Contour Models, Shape Priors, and Multiscale Analysis - Their Application To Road Network Extraction From Very High Resolution Satellite Images

The objective of this thesis is to develop and validate robust approaches for the semi-automatic extraction of road networks in dense urban areas from very high resolution (VHR) optical satellite images. Our models are based on the recently developed higher-order active contour (HOAC) phase field framework. The problem is difficult for two main reasons: VHR images are intrinsically complex and network regions may have arbitrary topology. To tackle the complexity of the information contained in VHR images, we propose a multiresolution statistical data model and a multiresolution constrained prior model. They enable the integration of segmentation results from coarse resolution and fine resolution. Subsequently, for the particular case of road map updating, we present a specific shape prior model derived from an outdated GIS digital map. This specific prior term balances the effect of the generic prior knowledge carried by ...

Peng, Ting — Project-Team Ariana (INRIA-Sophia Antipolis, France); LIAMA (CASIA, China)


Fire Detection Algorithms Using Multimodal Signal and Image Analysis

Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, volatile organic compound (VOC) plumes in infra-red (IR) videos, trees in the wind, sea and ocean waves, etc. Researchers have extensively studied 2-D textures and related problems in the fields of image processing and computer vision. On the other hand, there is very little research on dynamic texture detection in video. In this dissertation, signal and image processing methods developed for the detection of a specific set of dynamic textures are presented. Signal and image processing methods are developed for the detection of flames and smoke in open and large spaces at distances of up to 30 m from the camera in visible-range (IR) video. Smoke is semi-transparent at the early stages of fire. Edges present in image frames with smoke start losing their sharpness ...

Toreyin, Behcet Ugur — Bilkent University


Good Features to Correlate for Visual Tracking

Estimating object motion is one of the key components of video processing and the first step in applications that require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have become popular due to their computational efficiency and notable performance on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., template) that produces high correlation outputs around the actual object location and low correlation outputs at locations far from the object. Nevertheless, CFB visual tracking methods face many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main reasons ...

Gundogdu, Erhan — Middle East Technical University
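The idea of a filter that yields a high correlation output at the object location and low output elsewhere, as described in the abstract above, can be illustrated in one dimension. This is a minimal sketch assuming the well-known MOSSE-style closed form (a regularized ratio of spectra, H* = G F* / (F F* + lam)); it is not necessarily the filter proposed in the thesis, and the function names and the regularizer `lam` are hypothetical. A naive O(N²) DFT keeps it dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gaussian_target(n, peak, sigma=1.0):
    """Desired correlation output: a circular Gaussian bump at the object location."""
    out = []
    for t in range(n):
        d = min(abs(t - peak), n - abs(t - peak))
        out.append(math.exp(-0.5 * (d / sigma) ** 2))
    return out


def train_filter(patch, peak, lam=1e-4, sigma=1.0):
    """Return the conjugate filter H* = G F* / (|F|^2 + lam), computed bin by bin."""
    f_hat = dft(patch)
    g_hat = dft(gaussian_target(len(patch), peak, sigma))
    return [g * f.conjugate() / (abs(f) ** 2 + lam) for g, f in zip(g_hat, f_hat)]


def respond(h_conj, patch):
    """Correlation output for a new patch; its argmax is the estimated location."""
    f_hat = dft(patch)
    return [v.real for v in idft([f * h for f, h in zip(f_hat, h_conj)])]


if __name__ == "__main__":
    template = [1.0, 0.5] + [0.0] * 14
    h = train_filter(template, peak=5)
    moved = template[-3:] + template[:-3]  # object shifted circularly by 3 samples
    response = respond(h, moved)
    print(max(range(len(response)), key=lambda i: response[i]))
```

Because the response is a circular correlation, shifting the patch by three samples moves the response peak from index 5 to index 8; following that peak from frame to frame is what lets a correlation filter track the object.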


Best Signal Selection with Automatic Delay Compensation in VoIP Environment

In recent decades, air traffic has spread across the world, connecting more and more places. At the same time, the need to manage all flights correctly and securely has increased. Air traffic authorities have imposed and updated several standards for the air traffic management (ATM) system, keeping pace with the growing traffic flow. To achieve this, special voice communication systems (VCS) were developed. They ensure communication between pilots and operators in ground control centers. When communication is initiated between the aircraft’s pilot and the ground air traffic control operator, various systems are used. The pilot speaks through the aircraft’s radio station, and the signal is received by several ground radio stations. The signal from each ground radio station then reaches the control center over a different path. Here one of the received ...

Marinescu, Radu-Sebastian — University Politehnica of Bucharest
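The abstract above stops before describing how the path delays are compensated. One common way to compensate a fixed delay between two copies of the same voice signal is to pick the lag that maximizes their cross-correlation; the sketch below does this in the time domain. It is an illustrative assumption rather than the method of the thesis, the function names are hypothetical, and the criterion for choosing the "best" signal is not modeled.

```python
import random


def estimate_delay(reference, signal, max_lag):
    """Integer lag L (|L| <= max_lag) maximizing sum_n signal[n] * reference[n - L]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(signal[n] * reference[n - lag]
                    for n in range(len(signal))
                    if 0 <= n - lag < len(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(signal, lag):
    """Advance `signal` by `lag` samples, zero-filling where no data is available."""
    n = len(signal)
    return [signal[t + lag] if 0 <= t + lag < n else 0.0 for t in range(n)]


if __name__ == "__main__":
    rng = random.Random(7)
    reference = [rng.choice([-1.0, 1.0]) for _ in range(64)]
    delayed = [0.0] * 4 + reference[:60]  # same signal arriving 4 samples later
    lag = estimate_delay(reference, delayed, max_lag=8)
    print(lag, align(delayed, lag)[:60] == reference[:60])
```

A pseudo-random ±1 test signal is used because its autocorrelation is sharply peaked, so the true lag stands out; real speech has broader correlation peaks, and a practical system would typically normalize the correlation and search over fractional delays.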


Cross-Lingual Voice Conversion

Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conversion by discussing open research questions, presenting new methods, and performing comparisons with state-of-the-art techniques. In the training stage, a Phonetic Hidden Markov Model based automatic segmentation and alignment method is developed for cross-lingual applications, supporting both text-independent and text-dependent modes. The vocal tract transformation function is estimated in more detail using weighted speech frame mapping. By adjusting the weights, similarity to the target voice and output quality can be balanced depending on the requirements of the cross-lingual voice conversion application. A context-matching algorithm is developed to reduce ...

Turk, Oytun — Bogazici University


Nonlinear Noise Cancellation

Noise or interference is often assumed to be a random process. Conventional linear filtering, control or prediction techniques are used to cancel or reduce the noise. However, some noise processes have been shown to be nonlinear and deterministic. These nonlinear deterministic noise processes appear to be random when analysed with second order statistics. As nonlinear processes are widespread in nature it may be beneficial to exploit the coherence of the nonlinear deterministic noise with nonlinear filtering techniques. The nonlinear deterministic noise processes used in this thesis are generated from nonlinear difference or differential equations which are derived from real world scenarios. Analysis tools from the theory of nonlinear dynamics are used to determine an appropriate sampling rate of the nonlinear deterministic noise processes and their embedding dimensions. Nonlinear models, such as the Volterra series filter and the radial basis function ...

Strauch, Paul E. — University Of Edinburgh
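A second-order (quadratic) Volterra filter, one of the nonlinear models named in the abstract above, computes y[n] = Σ h1[i] x[n−i] + Σ Σ h2[i][j] x[n−i] x[n−j]. The sketch below evaluates it and uses it in a simple reference-based canceller. The kernel values, the function names, and the canceller structure are illustrative assumptions rather than the configuration used in the thesis, and estimation of the kernels (for example by an adaptive algorithm) is omitted.

```python
def volterra2(x, h1, h2):
    """Second-order Volterra filter with memory len(h1); zero initial conditions."""
    memory = len(h1)
    y = []
    for n in range(len(x)):
        # Most recent `memory` input samples, newest first; zeros before the start.
        past = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
        linear = sum(h1[i] * past[i] for i in range(memory))
        quadratic = sum(h2[i][j] * past[i] * past[j]
                        for i in range(memory) for j in range(memory))
        y.append(linear + quadratic)
    return y


def cancel(primary, reference, h1, h2):
    """Subtract the Volterra estimate of the noise (driven by `reference`) from `primary`."""
    estimate = volterra2(reference, h1, h2)
    return [p - e for p, e in zip(primary, estimate)]


if __name__ == "__main__":
    h1 = [0.5, 0.25]
    h2 = [[0.1, 0.0], [0.0, 0.2]]
    print(volterra2([1.0, 2.0, 3.0], h1, h2))
```

With these kernels, the input [1, 2, 3] gives outputs 0.6, 1.85 and 3.7 (up to floating-point rounding). The quadratic term is what lets the filter model noise that a purely linear filter, tuned on second-order statistics, would treat as random.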


Oscillator-plus-Noise Modeling of Speech Signals

In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both of which significantly increase the number of 'successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task with regard to their suitability for adequate re-synthesis of speech signals, in particular with respect to stability. The comparison also covers the structure of the nonlinear functions, with a view to possible interpolation between models for different speech sounds. With regard to both the stability of the oscillator and the premise of a nonlinear function structure that can be pre-defined, RBF networks are found a ...

Rank, Erhard — Vienna University of Technology


An Attention Model and its Application in Man-Made Scene Interpretation

The ultimate aim of research into computer vision is designing a system that interprets its surrounding environment in a way similar to what humans do effortlessly. However, the state of technology is far from achieving such a goal. This thesis describes different components of a computer vision system designed for the task of interpreting man-made scenes, in particular images of buildings. The flow of information in the proposed system is bottom-up, i.e., the image is first segmented into its meaningful components and the regions are subsequently labelled using a contextual classifier. Starting from simple observations concerning the human visual system and the gestalt laws of human perception, such as the law of 'good (simple) shape' and 'perceptual grouping', a blob detector is developed that identifies components in a 2D image. These components are convex regions of interest, ...

Jahangiri, Mohammad — Imperial College London


Device-to-Device Wireless Communications

Device-to-Device (D2D) communication is one of the important proposed solutions to increase capacity, offload traffic, and improve energy efficiency in next-generation cellular networks. D2D communication is defined as direct communication between two users without using the cellular infrastructure. Despite the large expected capacity benefits of D2D, the coexistence of D2D and cellular networks in the same spectrum creates new challenges in interference management and network design. To limit interference, power control schemes are typically adopted in both cellular and D2D networks. Even though power control limits the interference level, it does not prevent cellular and D2D users from experiencing coverage limitations when sharing the same radio resources. Therefore, the design of such networks requires suitable methods able to properly model the effect of interference ...

Alhalabi, Ashraf S.A. — Università degli Studi di Bologna


Nonlinear rate control techniques for constant bit rate MPEG video coders

Digital visual communication has been increasingly adopted as an efficient new medium in a variety of fields: multimedia computers, digital televisions, telecommunications, etc. Exchange of visual information between remote sites requires that digital video be encoded by compressing the amount of data and transmitted through specified network connections. The compression and transmission of digital video is an amalgamation of statistical data coding processes, which aims at efficient exchange of visual information without technical barriers due to different standards, services, media, etc. It is associated with a series of different disciplines of digital signal processing, each of which can be applied independently. It involves a few different technical principles: distortion, rate theory, prediction techniques and control theory. The MPEG (Moving Picture Experts Group) video compression standard is based on this paradigm; thus, it contains a variety of different coding ...

Saw, Yoo-Sok — University Of Edinburgh


A Geometric Deep Learning Approach to Sound Source Localization and Tracking

The localization and tracking of sound sources using microphone arrays is a problem that, even though it has attracted attention from the signal processing research community for decades, remains open. In recent years, deep learning models have surpassed the state of the art established by classic signal processing techniques, but these models still struggle to handle rooms with strong reverberation or to track multiple sources that dynamically appear and disappear, especially when we cannot apply any criteria to classify or order them. In this thesis, we follow the ideas of the Geometric Deep Learning framework to propose new models and techniques that advance the state of the art in the aforementioned scenarios. As the input of our models, we use acoustic power maps computed using the SRP-PHAT algorithm, a classic signal processing technique that allows us to estimate the acoustic energy ...

Diaz-Guerra, David — University of Zaragoza
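SRP-PHAT, the classic technique named in the abstract above, sums PHAT-weighted cross-correlations over all microphone pairs at the lags implied by each candidate source position; evaluating it over a grid of candidates yields an acoustic power map. The sketch below is a simplified version under stated assumptions: each candidate is given directly as a list of integer per-microphone delays in samples (mapping positions to delays via the array geometry and the speed of sound is omitted), correlations are circular, and a naive DFT is used. All names are hypothetical; this is not code from the thesis.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat(x_i, x_j, eps=1e-12):
    """PHAT-weighted circular cross-correlation; peaks at the lag by which x_j trails x_i."""
    spec_i, spec_j = dft(x_i), dft(x_j)
    cross = [b * a.conjugate() for a, b in zip(spec_i, spec_j)]
    # PHAT: keep only the phase of each cross-spectrum bin.
    return [v.real for v in idft([c / (abs(c) + eps) for c in cross])]


def srp_phat(mics, candidates):
    """Steered response power of each candidate (name -> per-mic delays in samples)."""
    n = len(mics[0])
    pairs = {(i, j): gcc_phat(mics[i], mics[j])
             for i in range(len(mics)) for j in range(i + 1, len(mics))}
    return {name: sum(r[(delays[j] - delays[i]) % n] for (i, j), r in pairs.items())
            for name, delays in candidates.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 32
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_delays = [0, 2, 5]
    mics = [[source[(t - d) % n] for t in range(n)] for d in true_delays]
    power = srp_phat(mics, {"A": [0, 2, 5], "B": [0, 4, 1], "C": [3, 0, 2]})
    print(max(power, key=power.get))
```

The PHAT weighting whitens each cross-spectrum so that every pair contributes a sharp peak of height close to 1 at its true lag; the candidate whose implied lags match all three pairs therefore scores close to 3, while the others score close to 0. This sharpening is the main reason PHAT is favoured in reverberant rooms.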
