Bayesian data fusion for distributed learning

This dissertation explores the intersection of data fusion, federated learning, and Bayesian methods, with a focus on their applications in indoor localization, GNSS, and image processing. Data fusion involves integrating data and knowledge from multiple sources. It becomes essential when data is only available in a distributed fashion or when different sensors are used to infer a quantity of interest. Data fusion is typically categorized into raw data fusion, feature fusion, and decision fusion; this thesis concentrates on feature fusion. Distributed data fusion merges sensor data from different sources to estimate an unknown process. A Bayesian framework is often adopted because it preserves, at each agent, the full distribution of the unknown given the data (the posterior), yielding an optimal and interpretable estimate. This allows sensor data to be merged with prior knowledge easily and recursively, without the need to store past observations.

However, data fusion faces challenges such as the multiple counting problem, which arises when the same information is used several times without the fusion node's knowledge. In a Bayesian setting, a priori information about the unknown quantities is available and may be shared among different distributed estimators. When the local estimates are fused, the prior knowledge used to construct the local posteriors can be over-counted unless the fusion node accounts for this and corrects it. In this thesis, we analyze the effects of shared priors in Bayesian data fusion contexts. Our analysis characterizes performance as a function of the number of collaborating agents and the type of prior, under several common fusion rules. The analysis is carried out using two divergences that are standard in Bayesian inference, and its generality allows us to treat very generic distributions.
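As a minimal sketch of the shared-prior effect described above (all numbers and variable names are hypothetical, using Gaussian likelihoods on a discrete grid), the example below shows that naively multiplying local posteriors counts the common prior once per agent, while dividing out the N-1 extra copies recovers the centralized posterior:

```python
import numpy as np

# Hypothetical scalar unknown, discretized on a grid.
theta = np.linspace(-5, 5, 201)

def normalize(p):
    return p / p.sum()

# Shared Gaussian prior, available at every agent.
prior = normalize(np.exp(-0.5 * theta**2))

# Each agent observes the unknown through its own Gaussian likelihood.
observations = [0.8, 1.2, 0.5]
likelihoods = [np.exp(-0.5 * (y - theta)**2) for y in observations]

# Local posteriors: shared prior times the local likelihood.
local_posteriors = [normalize(prior * lik) for lik in likelihoods]

# Naive product fusion counts the shared prior once per agent.
naive = normalize(np.prod(local_posteriors, axis=0))

# Corrected fusion divides out the N-1 extra copies of the prior.
n = len(local_posteriors)
corrected = normalize(np.prod(local_posteriors, axis=0) / prior**(n - 1))

# Centralized reference: one prior, all likelihoods.
central = normalize(prior * np.prod(likelihoods, axis=0))
```

Here the corrected fusion matches the centralized posterior exactly, while the naive product is overconfident (its effective prior precision is multiplied by the number of agents).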
These theoretical results are corroborated through experiments on a variety of estimation and classification problems, including linear and nonlinear models, and federated learning schemes viewed from a Bayesian perspective. Federated Learning (FL) is a form of distributed data fusion in which a model is trained on local data without exchanging the data itself. This technique is useful in scenarios such as image classification, indoor positioning, and jammer signal classification, where privacy concerns make data crowdsourcing difficult. In this thesis, we explore FL in two specific applications: jammer signal classification and indoor positioning. Jamming signals can disrupt the operation of GNSS receivers, making jamming mitigation and localization techniques essential, and jammer classification supports both. For indoor positioning, local data points are collected to build a local fingerprinting database, and data-driven machine learning methods predict the location from a model learned without data sharing. However, Federated Learning faces several challenges, notably the non-IID problem: the data distribution varies across clients, so the data are not independent and identically distributed, which can degrade performance. In this thesis, we discuss various FL frameworks and algorithms that help address this problem, and we propose an FL method from a Bayesian perspective to handle it in indoor positioning. Additionally, we delve into the concepts of personalized and clustered Federated Learning. The personalized approach tailors the learning process to individual clients, enhancing performance by leveraging the unique data and characteristics of each client. We discuss the application of this approach to indoor positioning and jammer signal classification, highlighting its potential improvements and challenges.
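The core FL mechanism of training without data exchange can be sketched as a simple federated averaging loop (this is a generic illustration, not the thesis's specific algorithm; the linear model, client data, and hyperparameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clients, each holding private data for y = w*x + b.
def make_client(w=2.0, b=-1.0, n=50):
    x = rng.normal(size=n)
    y = w * x + b + 0.1 * rng.normal(size=n)
    return x, y

clients = [make_client() for _ in range(4)]

def local_update(params, data, lr=0.1, epochs=5):
    # Gradient steps on the client's own data; the data never leaves.
    w, b = params
    x, y = data
    for _ in range(epochs):
        err = w * x + b - y
        w -= lr * (err * x).mean()
        b -= lr * err.mean()
    return np.array([w, b])

# Server loop: only model parameters are exchanged and averaged,
# weighted by local dataset size.
params = np.zeros(2)
for _round in range(30):
    updates = [local_update(params.copy(), d) for d in clients]
    sizes = np.array([len(d[0]) for d in clients], dtype=float)
    params = np.average(updates, axis=0, weights=sizes)
```

With identically distributed clients this converges to the shared model; the non-IID problem mentioned above arises precisely when the clients' local optima disagree, so the plain average can serve no client well.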
Expanding on the personalized approach, we introduce and discuss clustered Federated Learning. This learning method groups clients with similar characteristics or data patterns. Each client is typically associated with one data distribution and participates in training a model for that distribution along with its cluster peers. We propose a Bayesian framework for clustered FL that associates clients to clusters, along with several practical algorithms for efficient learning, considering the trade-offs between performance and computational complexity. In conclusion, this thesis provides a detailed investigation and evaluation of the integration of data fusion, federated learning, and Bayesian methods. The insights and discussions presented in this study are anticipated to advance the development and deployment of these technologies in various domains, particularly indoor positioning and GNSS.
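The client-to-cluster association idea can be illustrated with a toy alternating scheme (a hypothetical sketch, not the proposed Bayesian framework): each client joins the cluster whose model best explains its local data, and each cluster model is then refit on its members:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical clients drawn from two underlying data distributions,
# each summarized by a scalar mean; cluster models are candidate means.
true_means = [0.0, 4.0]
client_data = [rng.normal(true_means[i % 2], 1.0, size=40) for i in range(6)]

models = np.array([-1.0, 5.0])  # initial cluster models

for _round in range(10):
    # Association step: each client picks the cluster whose model
    # achieves the lowest loss (squared error) on its local data.
    losses = np.array([[np.mean((x - m) ** 2) for m in models]
                       for x in client_data])
    assign = losses.argmin(axis=1)
    # Update step: each cluster model is refit on its members' data.
    for k in range(len(models)):
        members = [client_data[i] for i in range(len(client_data))
                   if assign[i] == k]
        if members:
            models[k] = np.mean(np.concatenate(members))
```

The trade-off noted above appears even here: evaluating every cluster model at every client improves the association but multiplies the per-round computation by the number of clusters.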

File Type: pdf
File Size: 7 MB
Publication Year: 2024
Author: Peng Wu
Supervisor: Pau Closas
Institution: Northeastern University
Keywords: Data fusion, Federated Learning, Machine Learning, Bayesian learning, Positioning, Jamming