RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation

1 IDIP Lab, SICE, UESTC, China
2 VCIP Lab, CCS, Nankai University, China
3 DA, Tsinghua University, China
Overview of the proposed RPCANet++ architecture. A. Model the given image within a relaxed RPCA scheme and transform it into an unconstrained optimization problem. B. Iteratively solve the model above with closed-form solutions, addressing two high-level issues with corresponding remedies. C. Unfold the solutions into a deep unfolding framework; notably, RPCANet++ is assisted by memory-augmented modules and deep target priors. D. Visual and numerical model verifications via post-hoc techniques provide overall interpretability.

Abstract

Robust principal component analysis (RPCA) decomposes an observation matrix into low-rank background and sparse object components. This capability has enabled its application in tasks ranging from image restoration to segmentation. However, traditional RPCA models suffer from the computational burden of matrix operations, reliance on finely tuned hyperparameters, and rigid priors that limit adaptability in dynamic scenarios. To address these limitations, we propose RPCANet++, a sparse object segmentation framework that fuses the interpretability of RPCA with efficient deep architectures. Our approach unfolds a relaxed RPCA model into a structured network comprising a Background Approximation Module (BAM), an Object Extraction Module (OEM), and an Image Restoration Module (IRM). To mitigate inter-stage transmission loss in the BAM, we introduce a Memory-Augmented Module (MAM) to enhance background feature preservation, while a Deep Contrast Prior Module (DCPM) leverages saliency cues to expedite object extraction. Extensive experiments on diverse datasets demonstrate that RPCANet++ achieves state-of-the-art performance under various imaging scenarios. We further improve interpretability via visual and numerical low-rankness and sparsity measurements. By combining the theoretical strengths of RPCA with the efficiency of deep networks, our approach sets a new baseline for reliable and interpretable sparse object segmentation.

How it works

Problem Modeling, Iterative Solving, and Unfolding to the Network

In the context of segmentation-oriented RPCA tasks, our objective is to estimate the low-rank background \(\mathbf{B}\in \mathbb{R}^{m \times n}\) and extract the sparse object matrix \(\mathbf{O}\in \mathbb{R}^{m \times n}\). For an image \(\mathbf{D}\in \mathbb{R}^{m \times n}\), we transform the segmentation model into the following optimization framework: \begin{equation} \min \limits_{\mathbf{B},\mathbf{O}} \mathrm{rank}(\mathbf{B}) + \lambda \left\| \mathbf{O} \right\|_0 \quad s.t.~\mathbf{D} = \mathbf{B} + \mathbf{O} \enspace, \label{eq_RPCA} \end{equation} where \(\lambda\) is a trade-off coefficient, and \({\left\| \cdot \right\|_0}\) denotes the \(l_0\)-norm, defined as the number of non-zero elements in a matrix. However, in complex scenarios the background can exhibit varying degrees of complexity, rendering a solitary nuclear norm or rank function insufficient for encapsulating the practical constraints. Similarly, the sparsity of object elements can vary, making the exclusive use of the \(l_0\)- or \(l_1\)-norm potentially inadequate. Consequently, we propose a more generalized formulation, employing \(\mathcal{R}(\mathbf{B})\) and \(\mathcal{S}(\mathbf{O})\) as constraints that incorporate prior knowledge of the background and object images, respectively: \begin{equation} \min \limits_{\mathbf{B},\mathbf{O}} \mathcal{R}(\mathbf{B}) + \lambda \mathcal{S}(\mathbf{O}) \quad s.t.~\mathbf{D} = \mathbf{B} + \mathbf{O} \enspace. \label{eq_relaxPCP} \end{equation}
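For reference, the classical convex relaxation of the first problem above (nuclear norm plus \(l_1\)-norm, i.e., principal component pursuit) admits an iterative solver that alternates singular value thresholding for \(\mathbf{B}\) and soft thresholding for \(\mathbf{O}\). The NumPy sketch below is this classical baseline, not our network; the defaults for \(\lambda\) and the penalty \(\mu\) are common heuristics, assumed here for illustration:

```python
import numpy as np

def soft_threshold(X, tau):
    # Elementwise soft thresholding: proximal operator of the l1-norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca_pcp(D, lam=None, mu=None, n_iter=300):
    """ALM iterations for min ||B||_* + lam * ||O||_1  s.t.  D = B + O."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / np.abs(D).sum()
    B, O, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(n_iter):
        B = svt(D - O + Y / mu, 1.0 / mu)              # low-rank update
        O = soft_threshold(D - B + Y / mu, lam / mu)   # sparse update
        Y = Y + mu * (D - B - O)                       # dual ascent
    return B, O
```

These two proximal updates are exactly the closed-form solutions that the unfolding framework replaces with learnable stages.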

This motivates us to solve the above optimization problem iteratively and unfold the resulting closed-form solutions into a deep network as follows:



The \(\textbf{RPCANet}^{++}\) framework unfolds iterative, model-driven closed-form solutions into a deep network design comprising \(K\) corresponding stages. Transmissive elements are presented in different colors: \(\mathbf{D}\) for the restored image, \(\mathbf{B}\) for the low-rank background, \(\mathbf{O}\) for the sparse object matrix, \(\rho\) for the learnable parameter, and \([\mathcal{B}_h,\mathcal{B}_c]\) for the latent background features.
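To make the stage-wise data flow concrete, the sketch below mimics the \(K\)-stage loop with plain NumPy placeholders. The real BAM, OEM, and IRM are learned convolutional modules; the functions here (simple residual update, soft thresholding with \(\rho\), recomposition) are illustrative stand-ins chosen by us, not the trained network:

```python
import numpy as np

def bam(D, O_prev, mem):
    # Background Approximation Module (placeholder): coarse background
    # estimate blended with a toy running memory, standing in for the
    # memory-augmented latent features [B_h, B_c].
    B_coarse = D - O_prev
    mem = 0.9 * mem + 0.1 * B_coarse
    return 0.5 * (B_coarse + mem), mem

def oem(D, B, rho):
    # Object Extraction Module (placeholder): soft-threshold the residual,
    # mimicking the sparsity prior; rho plays the learnable threshold.
    R = D - B
    return np.sign(R) * np.maximum(np.abs(R) - rho, 0.0)

def irm(B, O):
    # Image Restoration Module (placeholder): recompose the observation
    # passed to the next stage.
    return B + O

def rpcanet_like_forward(D, K=6, rho=0.1):
    """Schematic K-stage unfolding loop over (D, B, O) and the memory."""
    O, mem = np.zeros_like(D), np.zeros_like(D)
    for _ in range(K):
        B, mem = bam(D, O, mem)
        O = oem(D, B, rho)
        D = irm(B, O)
    return B, O
```

Even with these toy modules, an entry that stands out from the background accumulates into \(\mathbf{O}\) over the stages, which is the qualitative behavior the heatmap visualizations below illustrate.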


Model Verifications

In the context of deep unfolding, the network is designed to iteratively yield guided results congruent with the algorithm's unrolled stages. Demonstrating outcomes at each stage is therefore vital for model validation.
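Stage-wise validation amounts to checking that the \(\mathbf{B}^{k}\) features become progressively more low-rank and the \(\mathbf{O}^{k}\) outputs progressively sparser. A minimal sketch of such post-hoc measurements (the specific metric definitions here are our illustrative choices, not necessarily the ones used in the paper):

```python
import numpy as np

def effective_rank_ratio(B, energy=0.99):
    # Fraction of singular values needed to capture `energy` of the total
    # spectrum mass: smaller values indicate a more low-rank matrix.
    s = np.linalg.svd(B, compute_uv=False)
    cum = np.cumsum(s) / s.sum()
    k = int(np.searchsorted(cum, energy)) + 1
    return k / len(s)

def sparsity_ratio(O, eps=1e-3):
    # Fraction of near-zero entries: larger values indicate a sparser matrix.
    return float(np.mean(np.abs(O) < eps))
```

Applying these two measures to each stage's \(\mathbf{B}^{k}\) and \(\mathbf{O}^{k}\) yields the kind of numerical verification curves reported below.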

(a) Typical sparse object segmentation tasks solved by RPCA methods with an overall low-rank background. (b) Datasets utilized in this paper with objects' average area.



\(\textbf{Heatmaps}\) of \(\mathbf{B}^{k}\) and \(\mathbf{O}^{k}\) at different stages (\(K=6\)) of our RPCANet\(^{++}\) on various scenarios from six different datasets (\(\textbf{IRSTD}\), \(\textbf{VS}\), and \(\textbf{DD}\) tasks). The gradual shaping process via iterative unfolding can be observed.


The impact of the stage number \(K\) on detection efficacy.


\(\textbf{Low-rankness verification}\) of different stage features (1st to 6th), compared to the original images, in \(\textbf{(a)}\) our RPCANet\(^{++}\), its variants \(\textbf{(b)}\) without MAM and \(\textbf{(c)}\) without DCPM, and the baseline \(\textbf{(d)}\) RPCANet. Verification is conducted on the IRSTD-1K test set. Our RPCANet\(^{++}\) progressively estimates background features satisfying low-rankness, step by step, without overestimation. [Zoom in for a better view]


\(\textbf{Sparsity verification}\) of different stages of our RPCANet\(^{++}\) and its variants (without MAM or DCPM) vs. RPCANet on IRSTD-1K. \(\textbf{Left:}\) numerical verification. \(\textbf{Right:}\) heatmaps across different stages.

Experimental Comparisons

Performance metrics, including IoU (%), F1 (%), Pd (%), Fa (\(\times 10^{-5}\)), and runtime, are evaluated for various methods on the NUDT-SIRST, IRSTD-1K, SIRST, and SIRST-AUG datasets. Parameter counts for data-driven approaches are given in the second column (find more details in the main manuscript).
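For completeness, the pixel-level IoU and F1 scores in these tables follow the standard confusion-matrix definitions on binary masks (Pd and Fa are target-level rates and are omitted from this sketch):

```python
import numpy as np

def iou_f1(pred, gt):
    """Pixel-level IoU and F1 (in percent) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return 100.0 * iou, 100.0 * f1
```

Note that F1 (the Dice score for binary masks) is always at least as large as IoU, which is why the two columns track each other across methods.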


Infrared Small Target Detection (IRSTD) task


Vessel Segmentation (VS) task


Defect Detection (DD) task


Refer to the arXiv paper linked above for more details on qualitative, quantitative, and ablation studies.

Gallery

Heatmap results of different stages' \(\mathbf{O}^{k}\) of our RPCANet++ on various scenarios from the IRSTD task.

Related and Follow-up Works

  • RPCANet, WACV 2024, an early version of our work on the IRSTD task.

Our Talks

CHN Version

ENG Version

Citation

@misc{wu2025rpcanetpp,
        author    = {Wu, Fengyi and Dai, Yimian and Zhang, Tianfang and Ding, Yixuan and Yang, Jian and Peng, Zhenming},
        title     = {RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation},
        year      = {2025},
        primaryClass = {cs.CV}
    }
@InProceedings{Wu_2024_WACV,
        author    = {Wu, Fengyi and Zhang, Tianfang and Li, Lei and Huang, Yian and Chen, Mingming and Peng, Zhenming},
        title     = {RPCANet: Deep Unfolding RPCA Based Infrared Small Target Detection},
        booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
        month     = {January},
        year      = {2024},
        pages     = {4809-4818}
    }