*Result*: DiGS: diffusion-guided Gaussian Splatting for dynamic occlusion surgical scene reconstruction.

Title:
DiGS: diffusion-guided Gaussian Splatting for dynamic occlusion surgical scene reconstruction.
Authors:
Luo H; Shenzhen Institute of Information Technology, Shenzhen, China.; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China., Nan X; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China., Yang J; Sun Yat-Sen University, Shenzhen, China., Wang C; Shenzhen Research Institute of Big Data, Shenzhen, China., Zhang T; Guilin University of Electronic Technology, Guilin, China., Fan Y; The Third Affiliated Hospital of Southern Medical University, Guangzhou, China., Jia F; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. fc.jia@siat.ac.cn., Zhang Q; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. cszhangq@scut.edu.cn.
Source:
International journal of computer assisted radiology and surgery [Int J Comput Assist Radiol Surg] 2026 Jan 30. Date of Electronic Publication: 2026 Jan 30.
Publication Model:
Ahead of Print
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Springer Country of Publication: Germany NLM ID: 101499225 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1861-6429 (Electronic) Linking ISSN: 18616410 NLM ISO Abbreviation: Int J Comput Assist Radiol Surg Subsets: MEDLINE
Imprint Name(s):
Original Publication: Heidelberg : Springer
References:
Luo H, Yin D, Zhang S, Xiao D, He B, Meng F, Zhang Y, Cai W, He S, Zhang W (2020) Augmented reality navigation for liver resection with a stereoscopic laparoscope. Comput Methods Programs Biomed 187:105099. (DOI: 10.1016/j.cmpb.2019.105099; PMID: 31601442)
Ramalhinho J, Yoo S, Dowrick T, Koo B, Somasundaram M, Gurusamy K, Hawkes DJ, Davidson B, Blandford A, Clarkson MJ (2023) The value of augmented reality in surgery–a usability study on laparoscopic liver surgery. Med Image Anal 90:102943. (DOI: 10.1016/j.media.2023.102943; PMID: 37703675; PMCID: PMC10958137)
Saikia A, Di Vece C, Bonilla S, He C, Magbagbeola M, Mennillo L, Czempiel T, Bano S, Stoyanov D (2025) Robotic arm platform for multi-view image acquisition and 3d reconstruction in minimally invasive surgery. IEEE Robotics and Automation Letters.
Espinel Y, Rabbani N, Bui TB, Ribeiro M, Buc E, Bartoli A (2024) Keyhole-aware laparoscopic augmented reality. Med Image Anal 94:103161. (DOI: 10.1016/j.media.2024.103161; PMID: 38574543)
Long Y, Li Z, Yee CH, Ng CF, Taylor RH, Unberath M, Dou Q (2021) E-dssr: efficient dynamic surgical scene reconstruction with transformer-based stereoscopic depth perception. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 415–425. Springer.
Liu Y, Li C, Liu H, Yang C, Yuan Y (2025) Foundation model-guided gaussian splatting for 4d reconstruction of deformable tissues. IEEE Transactions on Medical Imaging.
Wang Y, Long Y, Fan SH, Dou Q (2022) Neural rendering for stereo 3d reconstruction of deformable tissues in robotic surgery. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 431–441. Springer.
Schonberger JL, Frahm J-M (2016) Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113.
Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2021) Nerf: representing scenes as neural radiance fields for view synthesis. Commun ACM 65(1):99–106. (DOI: 10.1145/3503250)
Kerbl B, Kopanas G, Leimkühler T, Drettakis G (2023) 3d gaussian splatting for real-time radiance field rendering. ACM Trans Graph 42(4):139. (DOI: 10.1145/3592433)
Yang S, Li Q, Shen D, Gong B, Dou Q, Jin Y (2024) Deform3dgs: flexible deformation for fast surgical scene reconstruction with gaussian splatting. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 132–142. Springer.
Huang Y, Cui B, Bai L, Guo Z, Xu M, Islam M, Ren H (2024) Endo-4dgs: endoscopic monocular scene reconstruction with 4d gaussian splatting. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 197–207. Springer.
Li C, Feng BY, Liu Y, Liu H, Wang C, Yu W, Yuan Y (2024) Endosparse: real-time sparse view synthesis of endoscopic scenes using gaussian splatting. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 252–262. Springer.
Hayoz M, Hahne C, Gallardo M, Candinas D, Kurmann T, Allan M, Sznitman R (2023) Learning how to robustly estimate camera pose in endoscopic videos. Int J Comput Assist Radiol Surg 18(7):1185–1192. (DOI: 10.1007/s11548-023-02919-w; PMID: 37184768; PMCID: PMC10329609)
Bobrow TL, Golhar M, Vijayan R, Akshintala VS, Garcia JR, Durr NJ (2023) Colonoscopy 3d video dataset with paired depth from 2d–3d registration. Med Image Anal 90:102956. (DOI: 10.1016/j.media.2023.102956; PMID: 37713764; PMCID: PMC10591895)
Stoyanov D, Scarzanella MV, Pratt P, Yang G-Z (2010) Real-time stereo reconstruction in robotically assisted minimally invasive surgery. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 275–282. Springer.
Garg R, Bg VK, Carneiro G, Reid I (2016) Unsupervised cnn for single view depth estimation: geometry to the rescue. In: European Conference on Computer Vision, pp. 740–756. Springer.
Godard C, Mac Aodha O, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 270–279.
Zhou T, Brown M, Snavely N, Lowe DG (2017) Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1851–1858.
Shao S, Pei Z, Chen W, Zhu W, Wu X, Sun D, Zhang B (2022) Self-supervised monocular depth and ego-motion estimation in endoscopy: appearance flow to the rescue. Med Image Anal 77:102338. (DOI: 10.1016/j.media.2021.102338; PMID: 35016079)
Li Z, Liu X, Drenkow N, Ding A, Creighton FX, Taylor RH, Unberath M (2021) Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6197–6206.
Godard C, Mac Aodha O, Firman M, Brostow GJ (2019) Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3828–3838.
Yang L, Kang B, Huang Z, Xu X, Feng J, Zhao H (2024) Depth anything: unleashing the power of large-scale unlabeled data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10371–10381.
Cui B, Islam M, Bai L, Wang A, Ren H (2024) Endodac: efficient adapting foundation model for self-supervised depth estimation from any endoscopic camera. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 208–218. Springer.
Cui B, Islam M, Bai L, Ren H (2024) Surgical-dino: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg 19(6):1013–1020. (DOI: 10.1007/s11548-024-03083-5; PMID: 38459402; PMCID: PMC11178563)
Sheikh Zeinoddin M, Hoque MI, Tandogdu Z, Shaw GL, Clarkson MJ, Mazomenos EB, Stoyanov D (2025) Endo-fast3r: endoscopic foundation model adaptation for structure from motion. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 117–126. Springer.
Lou A, Li Y, Zhang Y, Noble J (2025) Surgical depth anything: depth estimation for surgical scenes using foundation models. In: Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 13408, pp. 77–82. SPIE.
Oquab M, Darcet T, Moutakanni T, Vo H, Szafraniec M, Khalidov V, Fernandez P, Haziza D, Massa F, El-Nouby A et al (2023) Dinov2: learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.
Zha R, Cheng X, Li H, Harandi M, Ge Z (2023) Endosurf: neural surface reconstruction of deformable tissues with stereo endoscope videos. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 13–23. Springer.
Zhu L, Wang Z, Cui J, Jin Z, Lin G, Yu L (2024) Endogs: Deformable endoscopic tissues reconstruction with gaussian splatting. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 135–145. Springer.
Xie W, Yao J, Cao X, Lin Q, Tang Z, Dong X, Guo X (2024) Surgicalgaussian: deformable 3d gaussians for high-fidelity surgical scene reconstruction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 617–627. Springer.
Guo J, Wang J, Kang D, Dong W, Wang W, Liu Y-h (2024) Free-surgs: sfm-free 3d gaussian splatting for surgical scene reconstruction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 350–360. Springer.
Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. Adv Neural Inf Process Syst 33:6840–6851.
Haario H, Saksman E, Tamminen J (2001) An adaptive metropolis algorithm. Bernoulli 7(2):223–242. (DOI: 10.2307/3318737)
Lugmayr A, Danelljan M, Romero A, Yu F, Timofte R, Van Gool L (2022) Repaint: inpainting using denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11461–11471.
Cohen I, Huang Y, Chen J, Benesty J (2009) Pearson correlation coefficient. In: Noise Reduction in Speech Processing, pp. 1–4. Springer.
Yu T, Feng R, Feng R, Liu J, Jin X, Zeng W, Chen Z (2023) Inpaint anything: Segment anything meets image inpainting. arXiv preprint arXiv:2304.06790.
Wu J, Li X, Si C, Zhou S, Yang J, Zhang J, Li Y, Chen K, Tong Y, Liu Z (2024) Towards language-driven video inpainting via multimodal large language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12501–12511.
Grant Information:
2024A0505040020 Guangdong Science and Technology Program; D2402006 Shenzhen Medical Research Fund; JCYJ20220531100614032 Shenzhen Science and Technology Innovation Program; SZIIT2025KJ012, SZIIT2025KJ016 Shenzhen Municipal Key Laboratory of Neuropsychiatric Modulation, Chinese Academy of Sciences
Contributed Indexing:
Keywords: Diffusion model; Gaussian Splatting; Instrument occlusion; Surgical scene reconstruction
Entry Date(s):
Date Created: 20260130 Latest Revision: 20260130
Update Code:
20260130
DOI:
10.1007/s11548-026-03571-w
PMID:
41615560
Database:
MEDLINE

*Further Information*

*Purpose: Accurate 3D reconstruction from endoscopic videos is crucial for advancing computer-assisted minimally invasive surgery. However, existing approaches struggle with dynamic surgical scenes where instrument occlusions cause significant reconstruction artifacts. Although 3D Gaussian Splatting (3DGS) enables rapid reconstruction, it often suffers from incomplete surface recovery due to occlusion-induced missing regions and error propagation from suboptimal initial point clouds during radiance field optimization. This study aims to enhance reconstruction accuracy in dynamically occluded surgical environments.
Methods: We propose a diffusion-guided Gaussian Splatting (DiGS) framework comprising two key components: (1) a diffusion-guided surface completion network that incorporates surgical scene priors to restore high-fidelity textures in occluded regions, improving surface completeness; and (2) a lightweight annealed smoothing mechanism designed to mitigate endoscope motion estimation errors, ensuring temporal coherence during continuous frame interpolation and stabilizing radiance field optimization.
Results: Extensive experiments on the EndoNeRF and StereoMIS datasets demonstrate the superiority of DiGS over state-of-the-art baselines. On EndoNeRF, DiGS achieves a 61.75% improvement in LPIPS, indicating stronger perceptual alignment in dynamically occluded scenes. On StereoMIS, DiGS delivers a 7.03% PSNR gain and a 40.79% LPIPS improvement, along with consistently higher SSIM scores, confirming superior preservation of structural details.
Conclusion: The proposed DiGS framework effectively addresses the challenges of dynamic occlusions and motion-induced errors in surgical scene reconstruction, producing more accurate and temporally coherent 3D models. The code is publicly available at https://github.com/IGSResearch/DiGS .
(© 2026. CARS.)*
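The abstract describes an "annealed smoothing mechanism" for damping endoscope motion estimation errors but gives no implementation details. As a purely illustrative sketch (not the authors' code; the function names, the exponential decay schedule, and the neighbor-averaging of pose translations are all assumptions), one way such a mechanism could look is a smoothing weight that decays over optimization and blends each per-frame pose estimate toward its temporal neighbors:

```python
import numpy as np

def annealed_smoothing_weight(step, total_steps, w0=1.0, w_min=0.01):
    """Exponentially anneal a smoothing weight from w0 toward w_min.

    Early in optimization the weight is large (strong temporal smoothing);
    late in optimization it is small, letting per-frame estimates dominate.
    """
    t = step / max(total_steps - 1, 1)
    return w_min + (w0 - w_min) * np.exp(-5.0 * t)

def smooth_poses(poses, weight):
    """Blend each interior pose translation toward the mean of its neighbors.

    poses: (N, 3) array of per-frame camera translations.
    weight: blending factor in [0, 1]; 1 replaces the estimate entirely
    with the neighbor mean, 0 leaves it untouched.
    """
    poses = np.asarray(poses, dtype=float)
    smoothed = poses.copy()
    for i in range(1, len(poses) - 1):
        neighbor_mean = 0.5 * (poses[i - 1] + poses[i + 1])
        smoothed[i] = (1 - weight) * poses[i] + weight * neighbor_mean
    return smoothed

# During radiance field optimization, the weight would be recomputed each
# step and applied to the current pose estimates:
w = annealed_smoothing_weight(step=0, total_steps=1000)   # starts at w0
poses = smooth_poses([[0, 0, 0], [10, 0, 0], [2, 0, 0]], w)
```

With `weight=1.0` the outlier middle translation `[10, 0, 0]` is replaced by the neighbor mean `[1, 0, 0]`, which is the intuition behind suppressing spurious motion spikes early while the annealing preserves genuine motion detail later.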

*Declarations. Conflict of interest: None. Ethical approval and consent to participate: This article does not contain any studies with human participants or animals performed by any of the authors. Informed consent: This article does not contain patient data.*