CVPR 2023

CDDFuse

Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

¹Xi'an Jiaotong University · ²Computer Vision Lab, ETH Zürich · ³Northwestern Polytechnical University · ⁴Harvard University · ⁵University of Würzburg

Core Idea

Fuse what is shared. Preserve what is specific.

Image fusion combines complementary sensor evidence into one informative representation: infrared highlights salient targets under poor illumination, visible images preserve texture and scene context, and medical modalities reveal different anatomical or functional cues.

Background
Infrared

Low-light target saliency

Thermal cues expose people, vehicles, and objects when visible images are degraded.

Visible

Texture and scene context

Visible images preserve edges, colors, local textures, and readable spatial structure.

Medical

Complementary anatomy

Different medical modalities reveal complementary structural and functional cues for the same anatomical region.

Challenge

No ground-truth fusion target

Shared layout and modality-specific details are entangled and need explicit constraints.

CDDFuse Contributions

Dual Branches

Separate base and detail paths model shared global structure and modality-specific local evidence.

Interpretable Loss

Correlation constraints narrow the decomposition space instead of relying on a black-box fusion rule.

Lossless Details

INN blocks preserve high-frequency textures and thermal targets through invertible transformations.

Recognition Friendly

Fused images improve semantic segmentation and object detection in downstream recognition benchmarks.
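
The correlation constraint behind the Dual Branches and Interpretable Loss items can be illustrated with a small sketch. It assumes a ratio-style loss: the squared correlation between the two modalities' detail features divided by the correlation between their base features plus a constant. The exact form, the function names `pearson_cc` and `decomposition_loss`, and the value `eps=1.01` are assumptions made for illustration, and features are flattened to plain lists to keep the example dependency-free.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The tiny constant guards against division by zero for constant features.
    return cov / (math.sqrt(var_x * var_y) + 1e-12)

def decomposition_loss(base_a, base_b, detail_a, detail_b, eps=1.01):
    """Hypothetical ratio loss: small when base features are correlated
    across modalities and detail features are uncorrelated.

    eps > 1 keeps the denominator positive because the correlation
    lies in [-1, 1]. Its value here is an assumption.
    """
    cc_base = pearson_cc(base_a, base_b)
    cc_detail = pearson_cc(detail_a, detail_b)
    return cc_detail ** 2 / (cc_base + eps)

# Toy features: base vectors follow the same trend, detail vectors are orthogonal.
base_ir = [1.0, 2.0, 3.0, 4.0]
base_vis = [2.0, 4.0, 6.0, 8.5]
detail_ir = [1.0, -1.0, 1.0, -1.0]
detail_vis = [1.0, 1.0, -1.0, -1.0]

good = decomposition_loss(base_ir, base_vis, detail_ir, detail_vis)
# Swapping the roles simulates a decomposition that got the branches wrong.
bad = decomposition_loss(detail_ir, detail_vis, base_ir, base_vis)
print(good < bad)
```

Minimizing a loss of this shape pushes the base branch toward modality-shared content and the detail branch toward modality-specific content, which is how a correlation constraint can narrow the space of admissible decompositions.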

Overview

Feature decomposition for better cross-modality fusion

From vanilla fusion to base/detail decomposition

Figure: Comparison between existing multi-modality image fusion pipelines and CDDFuse.
Existing pipelines mix modality-shared and modality-specific cues. CDDFuse explicitly separates base and detail branches.

Metric gains across MSRS and RoadScene

Figure: Radar plots of CDDFuse performance on MSRS and RoadScene metrics.
CDDFuse achieves leading performance across eight fusion metrics on the MSRS and RoadScene infrared-visible fusion (IVF) benchmarks.
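
The eight metrics are not named on this page. As an illustration of what a no-reference fusion metric looks like, the sketch below computes entropy and spatial frequency, two measures widely used in the image fusion literature. Whether they are among the eight reported here is an assumption, and the function names are hypothetical. Images are nested lists of 8-bit grayscale values.

```python
import math
from collections import Counter

def entropy(img):
    """Shannon entropy (bits) of the gray-level histogram of an image."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def spatial_frequency(img):
    """Spatial frequency: root of the mean squared horizontal and
    vertical first differences, a proxy for edge and texture richness."""
    h, w = len(img), len(img[0])
    row_diff = sum((img[i][j] - img[i][j - 1]) ** 2
                   for i in range(h) for j in range(1, w))
    col_diff = sum((img[i][j] - img[i - 1][j]) ** 2
                   for i in range(1, h) for j in range(w))
    return math.sqrt((row_diff + col_diff) / (h * w))

flat = [[5, 5], [5, 5]]            # no information, no edges
checker = [[0, 255], [255, 0]]     # two gray levels, strong edges

print(entropy(flat), spatial_frequency(flat))
print(entropy(checker), spatial_frequency(checker))
```

Both scores need only the fused image, which matters because fusion has no ground-truth target to compare against.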

Architecture

Dual-branch architecture with correlation-driven decomposition loss

Figure: CDDFuse network architecture and training workflow.

CDDFuse first learns an autoencoder-style decomposition pipeline, then fuses decomposed base and detail features to decode the final fused image.

Shared feature encoder: Restormer blocks extract cross-modality shallow features.
Base transformer encoder: Lite Transformer blocks capture long-range, low-frequency shared information.
Detail CNN encoder: INN blocks keep local, high-frequency details with information-preserving transforms.
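
To make "information-preserving transforms" concrete, the sketch below shows an affine coupling layer, a common building block of invertible neural networks. It is a minimal illustration, assuming a coupling design and not the exact block used in CDDFuse. The functions `scale_fn` and `shift_fn` are hypothetical stand-ins for learned sub-networks, and features are plain lists.

```python
import math

def coupling_forward(x1, x2, scale_fn, shift_fn):
    """Affine coupling: x1 passes through unchanged and conditions
    an element-wise affine transform of x2."""
    s = scale_fn(x1)
    t = shift_fn(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    return list(x1), y2

def coupling_inverse(y1, y2, scale_fn, shift_fn):
    """Exact inverse: y1 equals x1, so the same scale and shift can be
    recomputed and undone without inverting the sub-networks."""
    s = scale_fn(y1)
    t = shift_fn(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return list(y1), x2

# Stand-ins for learned sub-networks; any functions work, invertible or not.
scale_fn = lambda v: [0.5 * math.sin(a) for a in v]
shift_fn = lambda v: [math.tanh(3.0 * a) for a in v]

x1 = [0.2, -1.0, 3.5]
x2 = [1.0, 0.0, -2.0]
y1, y2 = coupling_forward(x1, x2, scale_fn, shift_fn)
r1, r2 = coupling_inverse(y1, y2, scale_fn, shift_fn)

# The round trip recovers the input up to floating-point error.
print(max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2)))
```

Because the input can be reconstructed exactly from the output, a layer of this kind cannot discard information, which is the property that suits a branch meant to keep fine textures and thermal targets.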

Qualitative Results

Infrared-visible and medical fusion examples

Infrared-visible image fusion: quantitative comparison on TNO, MSRS, and RoadScene.
Medical image fusion: quantitative comparison on MRI-CT, MRI-PET, and MRI-SPECT.

Downstream Recognition

Fused images that help perception models

Figure: Multi-modal object detection comparison after fusion.
Object detection benefits from clearer thermal targets and visible-scene context.
Figure: Multi-modal semantic segmentation comparison after fusion.
Semantic segmentation improves with sharper boundaries and more informative fused images.

Release

Everything needed to reproduce CDDFuse

Pretrained Weights: Checkpoints for infrared-visible fusion (IVF) and medical image fusion (MIF) are included in the repository.
Training Code: Two-stage decomposition and fusion training scripts are public.
Testing Scripts: The IVF and MIF metrics reported in the paper can be reproduced directly.
Open GitHub Repository

Citation

BibTeX

@InProceedings{Zhao_2023_CVPR,
  author    = {Zhao, Zixiang and Bai, Haowen and Zhang, Jiangshe and Zhang, Yulun and Xu, Shuang and Lin, Zudi and Timofte, Radu and Van Gool, Luc},
  title     = {CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
  pages     = {5906-5916}
}