CVPR 2023

CDDFuse

Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

Zixiang Zhao¹,², Haowen Bai¹, Jiangshe Zhang¹, Yulun Zhang², Shuang Xu³, Zudi Lin⁴, Radu Timofte²,⁵, Luc Van Gool²

¹Xi'an Jiaotong University ²Computer Vision Lab, ETH Zürich ³Northwestern Polytechnical University ⁴Harvard University ⁵University of Würzburg


Core Idea

Fuse what is shared. Preserve what is specific.

Image fusion combines complementary sensor evidence into one informative representation: infrared highlights salient targets under poor illumination, visible images preserve texture and scene context, and medical modalities reveal different anatomical or functional cues.

Background

Infrared

Low-light target saliency

Thermal cues expose people, vehicles, and objects when visible images are degraded.

Visible

Texture and scene context

Visible images preserve edges, colors, local textures, and readable spatial structure.

Medical

Complementary anatomy

Different medical modalities reveal complementary structural and functional cues for the same anatomy.

Challenge

No ground-truth fusion target

Shared layout and modality-specific details are entangled and need explicit constraints.

CDDFuse Contributions

Dual Branches

Separate base and detail paths model shared global structure and modality-specific local evidence.

Interpretable Loss

Correlation constraints narrow the decomposition space instead of relying on a black-box fusion rule.

Lossless Details

INN blocks preserve high-frequency textures and thermal targets through invertible transformations.

Recognition Friendly

Fused images improve semantic segmentation and object detection in downstream recognition benchmarks.
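The correlation constraint behind the Interpretable Loss item can be illustrated with a small sketch. It assumes a ratio form in which the detail features of the two modalities should be weakly correlated and the base features strongly correlated. The function names, the flattened-list feature representation, and the stabilizing constant `eps=1.01` are assumptions made for illustration, not a copy of the released training code.

```python
import math


def pearson_cc(a, b):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # Treat a constant feature vector as uncorrelated with anything.
    return cov / denom if denom > 0 else 0.0


def decomposition_loss(base_ir, base_vis, detail_ir, detail_vis, eps=1.01):
    """Hypothetical ratio-form loss: small when detail features are
    decorrelated and base features are positively correlated.

    eps > 1 keeps the denominator positive, since the correlation
    coefficient is never below -1. Its value here is an assumption.
    """
    cc_base = pearson_cc(base_ir, base_vis)
    cc_detail = pearson_cc(detail_ir, detail_vis)
    return cc_detail ** 2 / (cc_base + eps)


# Desired decomposition: shared base structure, unrelated details.
good = decomposition_loss([1, 2, 3, 4], [2, 4, 6, 8],
                          [1, -1, 1, -1], [1, 1, -1, -1])

# Undesired decomposition: opposite base structure, identical details.
bad = decomposition_loss([1, 2, 3, 4], [8, 6, 4, 2],
                         [1, -1, 1, -1], [1, -1, 1, -1])

print(good, bad)
```

Minimizing such a term pushes the encoders toward the first case, which is how a correlation constraint can narrow the space of admissible decompositions.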

Overview

Feature decomposition for better cross-modality fusion

Figure: Comparison between existing multi-modality image fusion pipelines and CDDFuse. Existing pipelines mix modality-shared and modality-specific cues, whereas CDDFuse explicitly separates them into base and detail branches.

Figure: Radar plots of CDDFuse performance on MSRS and RoadScene. CDDFuse achieves leading performance across eight fusion metrics on these representative infrared-visible fusion (IVF) benchmarks.

Architecture

Dual-branch architecture with correlation-driven decomposition loss

CDDFuse first learns an autoencoder-style decomposition pipeline, then fuses decomposed base and detail features to decode the final fused image.

Shared feature encoder: Restormer blocks extract cross-modality shallow features.

Base transformer encoder: Lite Transformer blocks capture long-range, low-frequency shared information.

Detail CNN encoder: INN blocks keep local, high-frequency details with information-preserving transforms.
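Why an invertible block is information-preserving can be seen in a minimal sketch of a generic additive coupling layer. This is an illustration of the general invertible-network idea, not the exact block used in CDDFuse: the function names and the toy sub-networks `f` and `g` are hypothetical, and plain Python lists stand in for feature maps.

```python
def coupling_forward(x1, x2, f, g):
    """Additive coupling: each half is shifted by a function of the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def coupling_inverse(y1, y2, f, g):
    """Undo the forward pass by subtracting the same shifts in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


# Toy stand-ins for learned sub-networks; they do not need to be invertible.
def f(v):
    return [t * t for t in v]


def g(v):
    return [3 * t + 1 for t in v]


x1, x2 = [1, 2], [3, 4]
y1, y2 = coupling_forward(x1, x2, f, g)
r1, r2 = coupling_inverse(y1, y2, f, g)
print((y1, y2), (r1, r2))
```

Because the input can be recovered exactly from the output, no detail is discarded by the transformation itself, even though `f` and `g` may be arbitrary non-invertible functions.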

Qualitative Results

Infrared-visible and medical fusion examples

MSRS infrared-visible image fusion comparison

MSRS: Infrared-Visible Fusion

CDDFuse preserves salient infrared targets while recovering visible texture and scene structure.

Table: Quantitative comparison for infrared-visible image fusion on TNO, MSRS, and RoadScene.

Table: Quantitative comparison for medical image fusion on MRI-CT, MRI-PET, and MRI-SPECT.

Downstream Recognition

Fusion images that help perception models

Figure: Multi-modal object detection comparison after fusion. Object detection benefits from clearer thermal targets and visible-scene context.

Figure: Multi-modal semantic segmentation comparison after fusion. Semantic segmentation benefits from sharper boundaries and more informative fused images.

Infrared-visible object detection visualization on M3FD

Infrared-visible Object Detection on M³FD

Detection comparison on fused images, highlighting target localization under challenging visibility.

Release

Everything needed to reproduce CDDFuse

Pretrained Weights: IVF and MIF checkpoints are included in the repository.

Training Code: Two-stage decomposition and fusion training scripts are public.

Testing Scripts: Paper-level IVF and MIF metrics can be reproduced directly.


Citation

BibTeX

@InProceedings{Zhao_2023_CVPR,
  author    = {Zhao, Zixiang and Bai, Haowen and Zhang, Jiangshe and Zhang, Yulun and Xu, Shuang and Lin, Zudi and Timofte, Radu and Van Gool, Luc},
  title     = {CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
  pages     = {5906-5916}
}
