NaturalL2S: End-to-end high-quality multispeaker lip-to-speech synthesis with differential digital signal processing.
Recent advancements in visual speech recognition (VSR) have promoted progress in lip-to-speech synthesis, where pre-trained VSR models enhance the intelligibility of synthesized speech by providing valuable semantic information. The success achieved by cascade frameworks, which combine pseudo-VSR with pseudo-text-to-speech (TTS) or implicitly utilize the transcribed text, highlights the benefits of leveraging VSR models. However, these methods typically rely on mel-spectrograms as an intermediate representation, which may introduce a key bottleneck: the domain gap between synthetic mel-spectrograms, generated from inherently error-prone lip-to-speech mappings, and real mel-spectrograms used to train vocoders. This mismatch inevitably degrades synthesis quality. To bridge this gap, we propose Natural Lip-to-Speech (NaturalL2S), an end-to-end framework that jointly trains the vocoder with the acoustic inductive priors. Specifically, our architecture introduces a fundamental frequency (F0) predictor to explicitly model prosodic variations, where the predicted F0 contour drives a differentiable digital signal processing (DDSP) synthesizer to provide acoustic priors for subsequent refinement. Notably, the proposed system achieves satisfactory performance on speaker similarity without requiring explicit speaker embeddings. Both objective metrics and subjective listening tests demonstrate that NaturalL2S significantly enhances synthesized speech quality compared to existing state-of-the-art methods. Audio samples are available on our demonstration page: https://yifan-liang.github.io/NaturalL2S/.
(Copyright © 2025 Elsevier Ltd. All rights reserved.)
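To illustrate the idea in the abstract of an F0 contour driving a signal-processing synthesizer, the following is a minimal sketch of a harmonic oscillator controlled by frame-level F0 values. It is not the authors' implementation: the function name `harmonic_synth`, the 1/k harmonic amplitudes, the hop length, and the sample rate are all assumptions chosen for illustration. A DDSP synthesizer as described in the abstract would be built from differentiable tensor operations so that it can be trained jointly with the rest of the model, which this pure-Python version does not attempt.

```python
import math


def harmonic_synth(f0_frames, hop_length=160, sample_rate=16000, n_harmonics=8):
    """Render a waveform from a frame-level F0 contour (Hz).

    Hypothetical illustration only. Frames with F0 <= 0 are treated as
    unvoiced and rendered as silence. Each frame value is held constant
    for hop_length samples (no interpolation between frames).
    """
    # Fixed normaliser so the output always stays within [-1, 1].
    norm = sum(1.0 / k for k in range(1, n_harmonics + 1))
    two_pi = 2.0 * math.pi

    phase = 0.0
    samples = []
    for f0 in f0_frames:
        for _ in range(hop_length):
            if f0 <= 0.0:
                samples.append(0.0)  # unvoiced frame: no harmonic excitation
                continue
            # Integrate instantaneous frequency to get the running phase.
            phase = (phase + two_pi * f0 / sample_rate) % two_pi
            # Sum harmonics with 1/k amplitudes, skipping any at or above
            # the Nyquist frequency to avoid aliasing.
            value = sum(
                math.sin(k * phase) / k
                for k in range(1, n_harmonics + 1)
                if k * f0 < sample_rate / 2
            )
            samples.append(value / norm)
    return samples


# One unvoiced frame followed by two voiced frames at 100 Hz.
wave = harmonic_synth([0.0, 100.0, 100.0])
print(len(wave))
```

In a learned system, the predicted F0 contour would replace the hand-written list, and the resulting harmonic signal would serve as a prior that a neural network refines into the final waveform.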
Declaration of competing interest: The authors declare that they have no conflicts of interest related to this work, and no commercial or associative interest that represents a conflict of interest in connection with the submitted work.