Radial Basis Function Network Robust Learning Algorithms in Computer Vision Applications

This thesis introduces new learning algorithms for Radial Basis Function (RBF) networks. An RBF network is a feed-forward, two-layer neural network used for function approximation or pattern classification. The proposed training algorithms are based on robust statistics. Their theoretical performance has been assessed and compared with that of classical algorithms for training RBF networks. The applications of RBF networks described in this thesis are the simultaneous modeling of moving object segmentation and optical flow estimation in image sequences, and 3-D image modeling and segmentation. A Bayesian classifier model is used to represent the image sequences and 3-D images. It employs an energy-based description of the probability functions involved. The energy functions are represented by RBF networks whose inputs are various features drawn from the images and whose outputs are objects. The hidden units embed kernel functions. Each kernel ...
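The two-layer structure described in the abstract above can be illustrated with a minimal sketch. It assumes Gaussian kernels in the hidden layer and a linear output layer with one score per class; the kernel choice, the function names, and the toy parameters are hypothetical, and the robust training algorithms proposed in the thesis are not shown.

```python
import math

def gaussian_kernel(x, center, width):
    """Gaussian radial basis function of the distance between x and center (assumed kernel)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_forward(x, centers, widths, weights):
    """Two-layer forward pass: hidden kernel activations, then a linear output layer."""
    hidden = [gaussian_kernel(x, c, w) for c, w in zip(centers, widths)]
    # weights[k][j] links hidden unit j to output k.
    return [sum(w_kj * h for w_kj, h in zip(row, hidden)) for row in weights]

def classify(x, centers, widths, weights):
    """Return the index of the output unit with the largest score."""
    scores = rbf_forward(x, centers, widths, weights)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy network: two hidden units, two output classes.
centers = [(0.0, 0.0), (3.0, 3.0)]
widths = [1.0, 1.0]
weights = [[1.0, 0.0], [0.0, 1.0]]

print(classify((0.2, -0.1), centers, widths, weights))
print(classify((2.8, 3.1), centers, widths, weights))
```

In this sketch the centers, widths, and output weights are fixed by hand; a training algorithm would estimate them from data.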

Bors, Adrian G. — Aristotle University of Thessaloniki


Causal Inference from Time Series: Methods for Discovering, Explaining, and Estimating Causal Relationships

Across various fields of engineering and science, there is great interest in studying causal relationships between time series. Distinguishing cause from effect is difficult in practice for many reasons, including limited access to data, unknown functional relationships, and unobserved confounding factors. Due to these challenges, modern causal inference requires methods that can perform robust detection and estimation, quantify uncertainty, and explain how a model’s inputs contribute to its predictions. These challenges are further compounded in time series settings, where autocorrelation and temporal patterns can skew inference. This thesis introduces several contributions to the field of causal inference that aim to address these concerns. The first part of the thesis examines approaches to causal discovery and the detection and estimation of causal relationships, with a focus on time-series data. The second part of the thesis considers the explanation of causal models and ...

Butler, Kurt — Stony Brook University


Representation and Metric Learning Advances for Deep Neural Network Face and Speaker Biometric Systems

The increasing use of technological devices and biometric recognition systems in people's daily lives has motivated a great deal of research interest in the development of effective and robust systems. However, there are still some challenges to be solved in these systems when Deep Neural Networks (DNNs) are employed. For this reason, this thesis proposes different approaches to address these issues. First of all, we have analyzed the effect of introducing the most widespread DNN architectures to develop systems for face and text-dependent speaker verification tasks. In this analysis, we observed that state-of-the-art DNNs established for many tasks, including face verification, did not perform efficiently for text-dependent speaker verification. Therefore, we have conducted a study to find the cause of this poor performance and we have noted that under certain circumstances this problem is due to the use of a ...

Mingote, Victoria — University of Zaragoza


Deep Learning for Distant Speech Recognition

Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. The latter disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. ...

Ravanelli, Mirco — Fondazione Bruno Kessler


Adaptive Edge-Enhanced Correlation Based Robust and Real-Time Visual Tracking Framework and Its Deployment in Machine Vision Systems

An adaptive edge-enhanced correlation based robust and real-time visual tracking framework, and two machine vision systems based on the framework, are proposed. The visual tracking algorithm can track any object of interest in a video acquired from a stationary or moving camera. It can handle real-world problems, such as noise, clutter, occlusion, uneven illumination, varying appearance, orientation, scale, and velocity of the maneuvering object, and object fading and obscuration in low contrast video at various zoom levels. The proposed machine vision systems are an active camera tracking system and a vision based system for a UGV (unmanned ground vehicle) to handle a road intersection. The core of the proposed visual tracking framework is an Edge Enhanced Back-propagation neural-network Controlled Fast Normalized Correlation (EE-BCFNC), which makes the object localization stage efficient and robust to noise, object fading, obscuration, and uneven ...

Ahmed, Javed — Electrical (Telecom.) Engineering Department, National University of Sciences and Technology, Rawalpindi, Pakistan.


Nonlinear rate control techniques for constant bit rate MPEG video coders

Digital visual communication has been increasingly adopted as an efficient new medium in a variety of different fields: multi-media computers, digital televisions, telecommunications, etc. Exchange of visual information between remote sites requires that digital video is encoded by compressing the amount of data and transmitting it through specified network connections. The compression and transmission of digital video is an amalgamation of statistical data coding processes, which aims at efficient exchange of visual information without technical barriers due to different standards, services, media, etc. It is associated with a series of different disciplines of digital signal processing, each of which can be applied independently. It includes a few different technical principles: distortion, rate theory, prediction techniques and control theory. The MPEG (Moving Picture Experts Group) video compression standard is based on this paradigm; thus, it contains a variety of different coding ...

Saw, Yoo-Sok — University of Edinburgh


Interpretable Machine Learning for Machine Listening

Recent years have witnessed a significant interest in interpretable machine learning (IML) research that develops techniques to analyse machine learning (ML) models. Understanding ML models is essential to gain trust in their predictions and to improve datasets, model architectures and training techniques. The majority of effort in IML research has been in analysing models that classify images or structured data and comparatively less work exists that analyses models for other domains. This research focuses on developing novel IML methods and on extending existing methods to understand machine listening models that analyse audio. In particular, this thesis reports the results of three studies that apply three different IML methods to analyse five singing voice detection (SVD) models that predict singing voice activity in musical audio excerpts. The first study introduces SoundLIME (SLIME), a method to generate temporal, spectral or time-frequency explanations ...

Mishra, Saumitra — Queen Mary University of London


Signal Processing and Learning over Topological Spaces

The aim of this thesis is to introduce a variety of signal processing methodologies specifically designed to model, interpret, and learn from data structured within topological spaces. These spaces are loosely characterized as a collection of points together with a neighborhood notion among points. The methodologies and tools discussed herein hold particular relevance and utility when applied to signals defined over combinatorial topological spaces, such as cell complexes, or within metric spaces that exhibit non-trivial properties, such as Riemannian manifolds with non-flat metrics. One of the primary motivations behind this research is to address and surmount the constraints encountered with traditional graph-based representations when they are employed to depict intricate systems. This thesis emphasizes the necessity to account for sophisticated, multiway, and geometry-sensitive interactions that are not adequately captured by conventional graph models. The contributions of this work include but ...

Battiloro, Claudio — Sapienza University of Rome


Wireless Localization via Learned Channel Features in Massive MIMO Systems

Future wireless networks will evolve to integrate communication, localization, and sensing capabilities. This evolution is driven by emerging application platforms such as digital twins, on the one hand, and advancements in wireless technologies, on the other, characterized by increased bandwidths, more antennas, and enhanced computational power. Crucial to this development is the application of artificial intelligence (AI), which is set to harness the vast amounts of available data in the sixth-generation (6G) of mobile networks and beyond. Integrating AI and machine learning (ML) algorithms, in particular, with wireless localization offers substantial opportunities to refine communication systems, improve the ability of wireless networks to locate the users precisely, enable context-aware transmission, and utilize processing and energy resources more efficiently. In this dissertation, advanced ML algorithms for enhanced wireless localization are proposed. Motivated by the capabilities of deep neural networks (DNNs) and ...

Artan Salihu — TU Wien


Good Features to Correlate for Visual Tracking

Estimating object motion is one of the key components of video processing and the first step in applications which require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have been popular due to their computational efficiency and notable performance on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., template) which can produce high correlation outputs around the actual object location and low correlation outputs around the locations that are far from the object. Nevertheless, CFB visual tracking methods suffer from many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main reasons ...
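The goal stated in the abstract above, a template whose correlation output peaks at the object location, can be illustrated with a minimal sketch. It assumes a fixed, hand-picked template and plain zero-mean cross-correlation on a toy frame; it does not learn a correlation filter, and the function names and data are hypothetical.

```python
def correlation_map(frame, template):
    """Slide the template over the frame and return the zero-mean correlation at each offset."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    flat_t = [template[i][j] for i in range(th) for j in range(tw)]
    t_mean = sum(flat_t) / len(flat_t)
    response = []
    for r in range(fh - th + 1):
        row = []
        for c in range(fw - tw + 1):
            patch = [frame[r + i][c + j] for i in range(th) for j in range(tw)]
            p_mean = sum(patch) / len(patch)
            row.append(sum((p - p_mean) * (t - t_mean) for p, t in zip(patch, flat_t)))
        response.append(row)
    return response

def peak_location(response):
    """Return the (row, column) offset with the highest correlation output."""
    best = max((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
    return best[1], best[2]

# Hypothetical toy frame: the 2x2 object sits at row 2, column 3.
template = [[1, 2],
            [3, 4]]
frame = [[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 1, 2, 0],
         [0, 0, 0, 3, 4, 0],
         [0, 0, 0, 0, 0, 0]]

print(peak_location(correlation_map(frame, template)))
```

A CFB tracker would instead learn the filter from training patches and typically evaluate the correlation in the frequency domain for efficiency; the sketch only shows the peak-at-object behaviour that such a filter is meant to produce.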

Gundogdu, Erhan — Middle East Technical University


Deep Learning Techniques for Visual Counting

The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...

Ciampi, Luca — University of Pisa


Disentanglement for improved data-driven modeling of dynamical systems

Modeling dynamical systems is a fundamental task in various scientific and engineering domains, requiring accurate predictions, robustness to varying conditions, and interpretability of the underlying mechanisms. Traditional data-driven approaches often struggle with long-term prediction accuracy, generalization to out-of-distribution (OOD) scenarios, and providing insights into the system's behavior. This thesis explores the integration of supervised disentanglement into deep learning models as a means to address these challenges. We begin by advancing the state-of-the-art in modeling wave propagation governed by the Saint-Venant equations. Utilizing U-Net architectures and purposefully designed training strategies, we develop deep learning models that significantly improve prediction accuracy. Through OOD analysis, we highlight the limitations of standard deep learning models in capturing complex spatiotemporal dynamics, demonstrating how integrating domain knowledge through architectural design and training practices can enhance model performance. We further extend our supervised disentanglement approach to high-dimensional ...

Stathi Fotiadis — Imperial College London


Bayesian data fusion for distributed learning

This dissertation explores the intersection of data fusion, federated learning, and Bayesian methods, with a focus on their applications in indoor localization, GNSS, and image processing. Data fusion involves integrating data and knowledge from multiple sources. It becomes essential when data is only available in a distributed fashion or when different sensors are used to infer a quantity of interest. Data fusion typically includes raw data fusion, feature fusion, and decision fusion. In this thesis, we will concentrate on feature fusion. Distributed data fusion involves merging sensor data from different sources to estimate an unknown process. A Bayesian framework is often used because it can provide an optimal and explainable feature by preserving the full distribution of the unknown given the data, called the posterior, over the estimated process at each agent. This allows for easy and recursive merging of sensor data ...

Peng Wu — Northeastern University


High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

This Ph.D. thesis focuses on developing a system for high-quality speech synthesis and voice conversion. Vocoder-based speech analysis, manipulation, and synthesis play a crucial role in various kinds of statistical parametric speech research. Although there are vocoding methods which yield close to natural synthesized speech, they are typically computationally expensive, and are thus not suitable for real-time implementation, especially in embedded environments. Therefore, there is a need for simple and computationally feasible digital signal processing algorithms for generating high-quality and natural-sounding synthesized speech. In this dissertation, I propose a solution to extract optimal acoustic features and a new waveform generator to achieve higher sound quality and conversion accuracy by applying advances in deep learning. The approach remains computationally efficient. This challenge resulted in five thesis groups, which are briefly summarized below. First, I introduce a new method to shape the ...

Al-Radhi, Mohammed Salah — Budapest University of Technology and Economics


Deep learning for semantic description of visual human traits

The recent progress in artificial neural networks (rebranded as “deep learning”) has significantly boosted the state-of-the-art in numerous domains of computer vision, offering an opportunity to approach problems which were hardly solvable with conventional machine learning. Thus, in the frame of this PhD study, we explore how deep learning techniques can help in the analysis of two of the most basic and essential semantic traits revealed by a human face, namely, gender and age. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. The Convolutional Neural Network (CNN) has become a standard model for image-based object recognition in general, and therefore, is a natural choice for addressing the first of these two problems. However, our preliminary studies have shown that the ...

Antipov, Grigory — Télécom ParisTech (Eurecom)
