Samples
Performance comparison on zero-shot TTS
Female
| | Reference speech | Enhanced reference speech | Without fine-tuning | Without adapters | With bottleneck and CNN adapters | Original |
|---|---|---|---|---|---|---|
| Clean | | | | | | |
| 10 dB | | | | | | |
| -5 dB | | | | | | |
| | Reference speech | Enhanced reference speech | Without fine-tuning | Without adapters | With bottleneck and CNN adapters | Original |
|---|---|---|---|---|---|---|
| Clean | | | | | | |
| 10 dB | | | | | | |
| -5 dB | | | | | | |
| | Reference speech | Enhanced reference speech | Without fine-tuning | Without adapters | With bottleneck and CNN adapters | Original |
|---|---|---|---|---|---|---|
| Clean | | | | | | |
| 10 dB | | | | | | |
| -5 dB | | | | | | |
Male
| | Reference speech | Enhanced reference speech | Without fine-tuning | Without adapters | With bottleneck and CNN adapters | Original |
|---|---|---|---|---|---|---|
| Clean | | | | | | |
| 10 dB | | | | | | |
| -5 dB | | | | | | |
| | Reference speech | Enhanced reference speech | Without fine-tuning | Without adapters | With bottleneck and CNN adapters | Original |
|---|---|---|---|---|---|---|
| Clean | | | | | | |
| 10 dB | | | | | | |
| -5 dB | | | | | | |
| | Reference speech | Enhanced reference speech | Without fine-tuning | Without adapters | With bottleneck and CNN adapters | Original |
|---|---|---|---|---|---|---|
| Clean | | | | | | |
| 10 dB | | | | | | |
| -5 dB | | | | | | |