Multi-interaction TTS toward professional recording reproduction

Anonymous submission to ISCA SSW13 (2025)

Detail of compared methods

We call the process of refining the speaking style over a series of sessions the "direction-cycle". The following table summarizes the inputs and outputs of the two compared methods, Actor-Guided (oracle) and Iterative (ours).

Condition              Session  Input: speech prompt  Input: direction  Output: speech
Actor-Guided (oracle)  1        Recorded-1            Direction 1       Guided-1
Actor-Guided (oracle)  2        Recorded-2            Direction 2       Guided-2
Actor-Guided (oracle)  3        Recorded-3            Direction 3       Guided-3
Iterative (ours)       1        Recorded-0            Direction 1       Iterative-1
Iterative (ours)       2        Iterative-1           Direction 2       Iterative-2
Iterative (ours)       3        Iterative-2           Direction 3       Iterative-3
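
The input/output pattern of the two conditions can be sketched as follows. This is only an illustrative sketch of the naming scheme in the table, not the authors' implementation: the function names (`actor_guided_prompts`, `iterative_prompts`, `io_table`) are hypothetical, and no synthesis is performed. It shows that the Actor-Guided (oracle) condition takes a fresh actor recording as the speech prompt in every session, whereas the Iterative condition starts from Recorded-0 and then feeds its own previous output back in.

```python
from typing import List, Tuple


def actor_guided_prompts(num_sessions: int) -> List[str]:
    """Oracle condition: session k is prompted with the actor's recording Recorded-k."""
    return [f"Recorded-{k}" for k in range(1, num_sessions + 1)]


def iterative_prompts(num_sessions: int) -> List[str]:
    """Iterative condition: session 1 uses Recorded-0; session k > 1 reuses Iterative-(k-1)."""
    return ["Recorded-0"] + [f"Iterative-{k}" for k in range(1, num_sessions)]


def io_table(num_sessions: int = 3) -> List[Tuple[int, str, str, str, str, str]]:
    """Return one row per session:
    (session, direction, oracle prompt, oracle output, iterative prompt, iterative output).
    """
    rows = []
    prompts = zip(actor_guided_prompts(num_sessions), iterative_prompts(num_sessions))
    for k, (oracle_prompt, iterative_prompt) in enumerate(prompts, start=1):
        rows.append(
            (k, f"Direction {k}", oracle_prompt, f"Guided-{k}",
             iterative_prompt, f"Iterative-{k}")
        )
    return rows


if __name__ == "__main__":
    for row in io_table():
        print(row)
```

Under this reading, only the Iterative condition can run without new actor recordings after session 1, which is why the Actor-Guided condition is labeled an oracle.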

Speech samples (Comparison on iterative style refinement)

If you have trouble playing the audio, try refreshing the page.

Sample1 (Female, sadness style group)

Recorded-0 (Audio before being refined)
Transcription: Our tennis club advisor is pretty strict, you know. (Japanese sentence: テニス部の顧問が厳しいんだよね。)

Direction-cycle1: Please begin with a slight exhale. To convey a tired feeling, add a bit of breath to your voice at the very start of your line and speak with a softer volume. (Japanese sentence: 少し息を吐くように始めてください。疲れた感じが伝わるように、発話の冒頭で声に少し息を混ぜて、声量を抑えて話しましょう。)

Recorded-1 (refinement target), Guided-1, Iterative-1, Iterative-0 (speech prompt of Iterative-1)

Direction-cycle2: Let’s slow the tempo down a bit more. By inserting slight pauses between words and reducing your speaking pace, you’ll let the weariness come through in your voice. (Japanese sentence: テンポをもう少しゆっくりにしましょう。言葉の間に少し間を置いて、話すテンポを落とすことで、疲れが声に表れるようにしましょう。)

Recorded-2 (refinement target), Guided-2, Iterative-2, Iterative-1 (speech prompt of Iterative-2)

Direction-cycle3: Let’s lower your endings. Weaken the ends of your phrases and speak as though you’re feeling a bit drained. (Japanese sentence: 語尾を落としましょう。フレーズの最後を弱くし、少し力が抜けたような感じで話してみてください。)

Recorded-3 (refinement target), Guided-3, Iterative-3, Iterative-2 (speech prompt of Iterative-3)

Sample2 (Female, fear style group)

Recorded-0 (Audio before being refined)
Transcription: Would you mind spending time with me? (Japanese sentence: 私と一緒にいるの、嫌かな…?)

Direction-cycle1: Let’s lower your voice slightly and speak a little more slowly to evoke a calm, sultry quality. (Japanese sentence: 声を少し低めにし、少しゆっくり話すことで、落ち着いた色っぽさを出してみてください。)

Recorded-1 (refinement target), Guided-1, Iterative-1, Iterative-0 (speech prompt of Iterative-1)

Direction-cycle2: Please use a soft, breath-filled tone at a subdued volume. Pay special attention to gently lengthening the ends of your phrases. (Japanese sentence: 息を含んだ柔らかい口調で、抑えめな音量にしてみてください。特に語尾はやさしく伸ばすように意識してください。)

Recorded-2 (refinement target), Guided-2, Iterative-2, Iterative-1 (speech prompt of Iterative-2)

Direction-cycle3: Let’s leave a bit of lingering resonance in your voice, speak as if you’re directly addressing someone, and take slightly longer pauses throughout. (Japanese sentence: 声に少し余韻を残すように、語りかける感じで全体的にもう少し間をとってみましょう。)

Recorded-3 (refinement target), Guided-3, Iterative-3, Iterative-2 (speech prompt of Iterative-3)

Sample3 (Male, anger style group)

Recorded-0 (Audio before being refined)
Transcription: I never thought you wouldn’t even be able to handle basic reporting and communication. (Japanese sentence: 報連相すらできないとは思わなくてね。)

Direction-cycle1: Let’s lower your pitch just a bit and speak in a more composed tone. Overall, aim to convey a nuance of calmly stating your lines. (Japanese sentence: 声の高さをほんの少し低くして、少し落ち着いたトーンで話してください。全体として、冷静に言い放つようなニュアンスに近づけましょう。)

Recorded-1 (refinement target), Guided-1, Iterative-1, Iterative-0 (speech prompt of Iterative-1)

Direction-cycle2: Let’s make each phrase end cleanly to keep your delivery consistent. Be especially careful not to let your endings become vague. (Japanese sentence: 各語句の最後をはっきりと切るようにして、言葉に一貫性を持たせてください。特に、語尾が曖昧にならないように気をつけてください。)

Recorded-2 (refinement target), Guided-2, Iterative-2, Iterative-1 (speech prompt of Iterative-2)

Direction-cycle3: Let’s slow the pace down slightly and give each word a solid weight. Overall, balance your delivery so that the emphasis is distributed evenly. (Japanese sentence: 速度をややゆっくりにして、一言一言をしっかりと重みを持たせるように発話してください。全体として、強調する部分が均等になるように調整しましょう。)

Recorded-3 (refinement target), Guided-3, Iterative-3, Iterative-2 (speech prompt of Iterative-3)

Sample4 (Male, surprise style group)

Recorded-0 (Audio before being refined)
Transcription: I’ve stumbled across quite the bargain. (Japanese sentence: なかなかの掘り出し物を見つけてしまった。)

Direction-cycle1: From the start of the sentence to the end, gradually raise your voice. To convey a rising excitement, finish the last word at your highest pitch. (Japanese sentence: 文の始まりから終わりにかけて、声を少しずつ上げていってください。高揚感を出すために、最後の単語を一番高めの声で終えてください。)

Recorded-1 (refinement target), Guided-1, Iterative-1, Iterative-0 (speech prompt of Iterative-1)

Direction-cycle2: At the very start of your line, take a slight inhale to create a subtly breathless effect. This will intensify the feeling of excitement. (Japanese sentence: 発話の最初で少し息を吸い込むようにして、少し息苦しい感じを演出してください。これによって、興奮した感じが強調されます。)

Recorded-2 (refinement target), Guided-2, Iterative-2, Iterative-1 (speech prompt of Iterative-2)

Direction-cycle3: Let’s pick up the overall speaking speed. In particular, rush slightly in the second half to highlight the excitement of having found it. (Japanese sentence: 発話速度を全体的に速めてみてください。特に、後半の部分は少し急ぎながら話すことで、見つけた時の喜びを強調します。)

Recorded-3 (refinement target), Guided-3, Iterative-3, Iterative-2 (speech prompt of Iterative-3)