Towards Voice Protection: Adversarial Perturbations as a Defensive Mechanism

[Figure: framework]
(Voice cloning synthesizes speech from input text in the voice of a target speaker by learning that speaker's vocal characteristics, such as timbre and intonation. It is widely used in virtual assistants and personalized audio broadcasting. Voice conversion, in contrast, transforms the acoustic features of source speech while preserving its linguistic content, so that the result sounds as if spoken by a target speaker. It is often used for voice style transfer and dubbing adaptation. The primary distinction lies in the input: voice cloning takes text, whereas voice conversion takes source speech.)
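The input distinction described above can be made concrete with a minimal sketch. The type names (`CloningRequest`, `ConversionRequest`, `Waveform`) and the helper `content_source` are hypothetical and exist only for illustration; they are not taken from any particular voice cloning or voice conversion system.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical representation: mono audio as a list of float samples.
Waveform = List[float]


@dataclass
class CloningRequest:
    """Voice cloning: linguistic content is supplied as text."""
    text: str
    target_reference: Waveform  # audio that defines the target speaker's voice


@dataclass
class ConversionRequest:
    """Voice conversion: linguistic content is supplied as source speech."""
    source_speech: Waveform
    target_reference: Waveform  # audio that defines the target speaker's voice


def content_source(request: Union[CloningRequest, ConversionRequest]) -> str:
    """Return where the linguistic content of the output comes from."""
    if isinstance(request, CloningRequest):
        return "text"
    if isinstance(request, ConversionRequest):
        return "source speech"
    raise TypeError(f"unsupported request type: {type(request).__name__}")
```

In both cases the target speaker's audio is the part that determines the output voice; only the carrier of the linguistic content differs between the two request types.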

Voice Cloning

Original Input Audio 1

Original Output Audio 1

The Audio with Perturbation 1 (SneakyVoice 1)

Perturbed Result of SneakyVoice 1

Original Input Audio 2

Original Output Audio 2

The Audio with Perturbation 2 (SneakyVoice 2)

Perturbed Result of SneakyVoice 2

Original Input Audio 3

Original Output Audio 3

The Audio with Perturbation 3 (SneakyVoice 3)

Perturbed Result of SneakyVoice 3

Original Input Audio 4

Original Output Audio 4

The Audio with Perturbation 4 (SneakyVoice 4)

Perturbed Result of SneakyVoice 4

Original Input Audio 5

Original Output Audio 5

The Audio with Perturbation 5 (SneakyVoice 5)

Perturbed Result of SneakyVoice 5

Voice Conversion

Voice Conversion Reference Text Audio

Original Input Audio 1

Original Output Audio 1

The Audio with Perturbation 1 (SneakyVoice 1)

Perturbed Result of SneakyVoice 1

Original Input Audio 2

Original Output Audio 2

The Audio with Perturbation 2 (SneakyVoice 2)

Perturbed Result of SneakyVoice 2

Original Input Audio 3

Original Output Audio 3

The Audio with Perturbation 3 (SneakyVoice 3)

Perturbed Result of SneakyVoice 3

Original Input Audio 4

Original Output Audio 4

The Audio with Perturbation 4 (SneakyVoice 4)

Perturbed Result of SneakyVoice 4

Original Input Audio 5

Original Output Audio 5

The Audio with Perturbation 5 (SneakyVoice 5)

Perturbed Result of SneakyVoice 5