Voice Clone Guard: Few-Shot Deepfake Detection with XLS-R Embeddings and Gaussian Process Calibration

We present Voice Clone Guard, a few-shot voice deepfake detector that
combines XLS-R embeddings with Gaussian Process (GP) classification to
improve both detection accuracy and probability calibration. Our approach
uses self-supervised speech representations from Wav2Vec2-XLS-R and
applies GP inference to produce well-calibrated uncertainty estimates.
Evaluation across synthetic and real-world datasets shows 95.6% AUC
(vs. 78.2% for the MFCC baseline), a 4.2% Equal Error Rate (vs. 18.5%),
and an Expected Calibration Error of 0.032 (vs. 0.127). The method also
holds up in few-shot scenarios, achieving 85.2% accuracy with only
4 examples per class, which makes it practical for deployment with
limited training data.
Index Terms—Voice deepfake detection, few-shot learning,
XLS-R embeddings, Gaussian processes, probability calibration,
speech forensics
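
The abstract reports Equal Error Rate (EER) and Expected Calibration Error (ECE) without defining them. The sketch below shows one common way to compute both from a detector's outputs, using only the Python standard library. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the 10-bin equal-width ECE variant binned on the predicted probability of the "fake" class, and the threshold sweep used for EER are all choices made here, and the paper may use different definitions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: weighted mean gap between predicted P(fake) and the
    observed fraction of fakes, over equal-width probability bins.
    Assumed variant; the paper does not state its binning scheme."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)
        frac_fake = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_prob - frac_fake)
    return ece


def equal_error_rate(scores, labels):
    """EER: error rate at the threshold where the false-positive rate
    (real flagged as fake) is closest to the false-negative rate
    (fake passed as real). Labels: 1 = fake, 0 = real."""
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    # A clip is flagged as fake when its score is >= the threshold.
    for t in sorted(set(scores)) + [float("inf")]:
        fpr = sum(s >= t for s in real) / len(real)
        fnr = sum(s < t for s in fake) / len(fake)
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical detector outputs, for illustration only.
    demo_probs = [0.05, 0.20, 0.35, 0.70, 0.85, 0.95]
    demo_labels = [0, 0, 1, 0, 1, 1]
    print("ECE:", expected_calibration_error(demo_probs, demo_labels))
    print("EER:", equal_error_rate(demo_probs, demo_labels))
```

Lower is better for both metrics. EER depends only on how the scores rank fake against real clips, while ECE depends on whether the probabilities themselves can be taken at face value, which is why a detector can improve on one without improving on the other.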
