Learned Image SR: Advancing in Modeling and Generative Sample Selection

Super-resolution (SR) is an ill-posed inverse problem that aims to reconstruct high-resolution images from their low-resolution counterparts by recovering missing details. Despite recent advances, SR still faces persistent challenges in generalization, in balancing fidelity and perceptual quality, in mitigating artifacts, and in ensuring trustworthy results. This thesis tackles these issues through innovations in model architecture, loss design, and sample selection. Central to our contributions is the use of wavelet-domain losses, which improve the ability of SR models to distinguish genuine details from artifacts. By leveraging these losses in both GAN-based and transformer-based models, we achieve improved fidelity and perceptual quality. Furthermore, we augment transformer architectures with convolutional non-local sparse attention blocks and wavelet-based training, delivering state-of-the-art performance across diverse datasets. For generative models, we address the challenge of selecting a single trustworthy solution from the diverse outputs of flow-based and diffusion-based models. We propose image fusion strategies for flow-based models that optimize the perception-distortion trade-off, and we introduce human-in-the-loop and vision-language-model-guided approaches for selecting reliable diffusion model samples. These strategies provide scalable, automated solutions that match or surpass human assessment in producing trustworthy SR outputs. This thesis presents comprehensive advancements in SR methodology spanning both regression-based and generative paradigms. By introducing novel frameworks and scalable solutions, it sets new benchmarks for reliability, efficiency, and visual quality in SR, with promising implications for real-world applications such as medical imaging, satellite imagery, and digital content enhancement.
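As a rough illustration of what a wavelet-domain loss measures, the following Python/NumPy sketch compares an SR image and its HR reference subband by subband instead of pixel by pixel. It is a minimal sketch under stated assumptions, not the formulation used in the thesis. It swaps in a single-level orthonormal Haar DWT on even-sized grayscale arrays, and the names `haar_dwt2`, `wavelet_l1_loss`, and the per-subband `weights` are hypothetical. A loss used for actual training would be written in an autodiff framework and may use a different wavelet, decomposition depth, or weighting.

```python
import numpy as np


def haar_dwt2(img):
    """One level of an orthonormal 2-D Haar DWT.

    Assumes a 2-D (grayscale) array with even height and width.
    Returns four half-size subbands: approximation, then three details.
    """
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0    # local average (low-pass in both directions)
    detail_h = (a - b + c - d) / 2.0  # left-right differences
    detail_v = (a + b - c - d) / 2.0  # top-bottom differences
    detail_d = (a - b - c + d) / 2.0  # diagonal differences
    return approx, detail_h, detail_v, detail_d


def wavelet_l1_loss(sr, hr, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of per-subband mean absolute errors (hypothetical weighting)."""
    return sum(
        w * np.mean(np.abs(s - h))
        for w, s, h in zip(weights, haar_dwt2(sr), haar_dwt2(hr))
    )


# Demo: a +/-0.1 checkerboard "artifact" added to a random HR image.
rng = np.random.default_rng(0)
hr = rng.random((64, 64))
checker = 0.1 * ((np.indices((64, 64)).sum(axis=0) % 2) * 2 - 1)
sr = hr + checker

# Mean absolute difference in each subband: approx, detail_h, detail_v, detail_d.
bands = [float(np.mean(np.abs(s - h))) for s, h in zip(haar_dwt2(sr), haar_dwt2(hr))]
print([round(x, 6) for x in bands])  # → [0.0, 0.0, 0.0, 0.2]
print(round(float(wavelet_l1_loss(sr, hr)), 6))  # → 0.2
```

In the demo, the checkerboard leaves the approximation and the two directional detail subbands essentially unchanged and appears entirely in the diagonal subband. Weighting subbands separately is one way such a loss can penalize high-frequency artifacts differently from genuine detail; how the thesis chooses its transform and weights is not reproduced here.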

File Type: pdf
File Size: 31 MB
Publication Year: 2025
Author: Cansu Korkmaz
Supervisors: Ahmet Murat Tekalp, Zafer Dogan
Institution: Koc University
Keywords: Super-Resolution (SR), Image Restoration, Wavelet-Domain Loss, Artifact Suppression, Perception-Distortion Trade-off, Generative Adversarial Networks (GANs), Diffusion Models, Flow-based Models, Transformer-based SR, Attention Mechanisms, Frequency-Domain Learning, Difficulty-Aware Evaluation, Trustworthy Image Generation, Vision-Language Models (VLM), Multi-Model Fusion, High-Frequency Detail Reconstruction, Human-in-the-Loop SR, Perceptual Quality, Sample Selection, Evaluation Metrics