Prof. Yun's paper "A Novel VLM-Guided Diffusion Model for Remote Sensing Image Super-Resolution" has been accepted to IEEE Geoscience and Remote Sensing Letters (IF 4.4, JCR Q1, top 9.5%)!

Super-resolution (SR) of remote sensing imagery with generative AI models is vital for practical applications such as urban planning and disaster assessment. However, current approaches suffer from poor trade-offs among three pivotal yet competing objectives: perceptual quality, factual accuracy, and inference speed. To break through this limitation, we propose a novel, high-performing two-stage SR framework for remote sensing imagery based on a generative diffusion model. In Stage 1, factually grounded base images are generated by a guidance-free diffusion process that relies solely on the original low-resolution images, effectively mitigating the risk of semantic hallucination. In Stage 2, the generated images are refined to restore the high-frequency details needed for SR quality via our customized guidance mechanism, which combines a vision–language model (VLM) with a ControlNet, while a dynamic inference acceleration technique ensures efficiency. Extensive experiments confirm that our framework excels in perceptual quality, achieving top CLIP-IQA scores, and in structural integrity while maintaining robust performance. In particular, it surpasses the conventional fidelity–hallucination trade-off at practical inference speed, enabling reliable, high-fidelity SR for large-scale, real-world remote sensing pipelines. Source code is available at https://github.com/Bluear7878/Remote-Sensing-Vision-Language-Diffusion-Model.
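For readers curious how the two stages fit together, below is a minimal, self-contained Python sketch of the pipeline's control flow under stated assumptions. All names here (`diffusion_sample`, `vlm_caption`, `controlnet_guidance`) are hypothetical stand-ins, not the paper's implementation: the denoiser, VLM, and ControlNet are stubbed out so the example runs end to end and only the two-stage structure is illustrated.

```python
import torch

# --- Hypothetical stand-ins for the paper's components (names assumed) ---

def diffusion_sample(lr_image, steps=50, guidance=None):
    """Toy reverse-diffusion loop: iteratively refine a bicubically
    upsampled image. A real implementation would call a trained U-Net;
    here a zero "noise prediction" keeps the sketch runnable."""
    x = torch.nn.functional.interpolate(lr_image, scale_factor=4, mode="bicubic")
    for t in range(steps, 0, -1):
        noise_pred = torch.zeros_like(x)              # placeholder denoiser output
        if guidance is not None:
            noise_pred = noise_pred + guidance(x, t)  # conditional residual
        x = x - noise_pred / steps                    # simplified update rule
    return x

def vlm_caption(image):
    """Stand-in for a vision-language model describing the base image."""
    return "aerial view of an urban area with roads and buildings"

def controlnet_guidance(caption):
    """Returns a caption-conditioned guidance function. In the paper this
    role is played by a ControlNet; here it is a no-op stub."""
    def guide(x, t):
        return 0.0 * x
    return guide

# --- Two-stage pipeline mirroring the abstract's description ---

lr = torch.rand(1, 3, 64, 64)  # dummy low-resolution input

# Stage 1: guidance-free sampling from the LR image alone, so no text
# prompt can inject content, limiting semantic hallucination.
base = diffusion_sample(lr, steps=50, guidance=None)

# Stage 2: VLM + ControlNet guided refinement to restore high-frequency
# detail; the reduced step count stands in for dynamic inference acceleration.
caption = vlm_caption(base)
sr = diffusion_sample(lr, steps=10, guidance=controlnet_guidance(caption))

print(sr.shape)  # torch.Size([1, 3, 256, 256])
```

The key design point the sketch preserves is the ordering: factual grounding first without any semantic conditioning, and semantic guidance only afterward, applied on top of an already-grounded base image.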


