Experience Voice AI in Action
Try our end-to-end speech synthesis technology. Upload a voice sample, enter your text, and hear the AI generate natural speech that matches your voice.
Note
This demo runs on Hugging Face Spaces and may have queue times during peak hours. For production use, deploy your own instance.
Audio Demo Samples
Explore pre-recorded examples showcasing FunAudioChat's capabilities
Fun-Audio-Chat recognizes emotional context in voice interactions and responds with appropriate empathy, even when the speaker gives no explicit emotional markers. The model detects user emotions through paralinguistic cues such as tone, pace, and prosody.
The model understands emotional context from the content of speech and responds with appropriate empathetic expressions.
Anger (Chinese)
Anxiety/Fear (Chinese)
Joy/Excitement (Chinese)
Sadness/Disappointment (Chinese)
Anger (English)
Anxiety/Fear (English)
Joy/Excitement (English)
Sadness/Disappointment (English)
The model detects emotions from voice characteristics like tone, pace, and prosody, responding differently to the same text based on how it's spoken.
Case 1 - Positive Emotion (Input)
Case 1 - Positive Response
Case 1 - Negative Emotion (Input)
Case 1 - Negative Response
Case 2 - Positive Emotion (Input)
Case 2 - Positive Response
Case 2 - Negative Emotion (Input)
Case 2 - Negative Response
Case 3 - Positive Emotion (Input)
Case 3 - Positive Response
Case 3 - Negative Emotion (Input)
Case 3 - Negative Response
Case 4 - Positive Emotion (Input)
Case 4 - Positive Response
Case 4 - Negative Emotion (Input)
Case 4 - Negative Response
Case 5 - Positive Emotion (Input)
Case 5 - Positive Response
Case 5 - Negative Emotion (Input)
Case 5 - Negative Response
The model maintains emotional understanding across multiple conversation turns, adapting responses based on evolving emotional cues.
Happy Scenario - Turn 1 (Input)
Happy Scenario - Turn 1 (Response)
Happy Scenario - Turn 2 (Input)
Happy Scenario - Turn 2 (Response)
Sarcastic Scenario - Turn 1 (Input)
Sarcastic Scenario - Turn 1 (Response)
Sarcastic Scenario - Turn 2 (Input)
Sarcastic Scenario - Turn 2 (Response)
How to Use
Follow these simple steps to try the voice cloning demo
Upload Reference Audio
Record or upload a clear audio sample of the voice you want to clone. A clip of 3-10 seconds works best.
Enter Reference Text
Type the exact words spoken in your reference audio. This helps the AI understand the voice patterns.
Generate Speech
Enter new text you want to hear in the cloned voice, then click Submit to generate the audio.
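If you prepare inputs in a script before opening the demo, a quick local check can catch the most common problems with the three steps above. The sketch below is only an illustration and is not part of FunAudioChat: the helper names (`wav_duration_seconds`, `check_demo_inputs`, `make_silent_wav`) are hypothetical, and it assumes the reference sample is an uncompressed PCM WAV file, which Python's standard `wave` module can read.

```python
import io
import wave

MIN_SECONDS = 3.0   # lower bound suggested by the demo instructions
MAX_SECONDS = 10.0  # upper bound suggested by the demo instructions


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed PCM WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_demo_inputs(wav_bytes: bytes, reference_text: str, target_text: str) -> list[str]:
    """Collect human-readable problems; an empty list means the inputs look usable."""
    problems = []
    duration = wav_duration_seconds(wav_bytes)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"reference audio is {duration:.1f}s; 3-10 seconds works best")
    if not reference_text.strip():
        problems.append("reference text is empty")
    if not target_text.strip():
        problems.append("text to synthesize is empty")
    return problems


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV clip, used here only as a stand-in sample."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    # A 5-second stand-in clip with both text fields filled in passes the check.
    print(check_demo_inputs(make_silent_wav(5.0), "Hello there.", "Nice to meet you."))
```

The check only covers clip length and empty text fields. It cannot tell whether the reference text actually matches the words spoken in the recording, so that still needs a manual read-through.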
Demo Capabilities
Voice Cloning
Clone any voice with just a few seconds of audio
Natural Speech
Generate human-like speech with proper intonation
Fast Processing
Get results in seconds with GPU-accelerated inference
Ready to Integrate?
Learn how to use FunAudioChat in your own projects