Live ZONOS2 Space

ZONOS2 playground, install guide, and comparison hub

ZONOS2 TTS: Try Zyphra's Open-Weight Voice Cloning Model

Generate expressive multilingual speech, test English, Japanese, and Mandarin Chinese prompts, compare samples, and learn how to run ZONOS2 locally with CUDA, API examples, and real-world tradeoffs.

8B total parameters

900M active parameters

6M+ training audio hours

Tier 1: English, Mandarin, Japanese

Voice cloning workflow

ZONOS2 Voice Cloning Demo

Start with the embedded demo, then test a short script with a clean, consented reference voice. For production, route generation through Zyphra Cloud or your own local inference server.

Listen before you decide

Audio Sample Gallery

Real TTS decisions are made by listening. Use these lanes to compare ZONOS2 against managed and open-source alternatives.

Narrator

Clean explainer voice for product videos, docs, and YouTube scripts.

Japanese Dub

A test lane for anime-style dialogue, visual novels, and localization.

Podcast Host

Conversational pacing for intros, ad reads, and long-form narration.

Game NPC

Short expressive lines for quests, combat barks, and prototypes.

Quick facts

What Is ZONOS2?

ZONOS2 is Zyphra's real-time text-to-speech model focused on expressive multilingual speech and high-fidelity voice cloning. Public sources describe a sparse MoE model with 8B total parameters, 900M active parameters, and training on more than 6M hours of speech.
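The parameter figures above are easier to reason about as arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the 2 bytes per parameter used in the example is an assumption (16-bit weights), not a published ZONOS2 requirement.

```python
TOTAL_PARAMS = 8_000_000_000   # 8B total parameters (from the quick facts)
ACTIVE_PARAMS = 900_000_000    # 900M active parameters (from the quick facts)


def active_fraction(total: int, active: int) -> float:
    """Share of the parameters that are active in a sparse MoE model."""
    return active / total


def weight_gigabytes(total: int, bytes_per_param: int) -> float:
    """Rough size of the weights alone, in decimal gigabytes.

    bytes_per_param is an assumption: 2 corresponds to 16-bit weights.
    Activations, caches, and runtime overhead are not included.
    """
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.2%}")
    print(f"weights at 2 bytes/param: {weight_gigabytes(TOTAL_PARAMS, 2):.0f} GB")
```

In a typical MoE deployment every expert still has to be loaded, so the total count rather than the active count drives the weight footprint. Treat the result as a planning estimate and confirm real memory use on your own hardware.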

System requirements

Can My GPU Run ZONOS2?

Local inference is aimed at Linux x86_64 with NVIDIA CUDA. If your machine does not match, choose between WSL2 on Windows and a cloud GPU.
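The local, WSL2, or cloud decision can be written as a few lines of logic. This is a minimal sketch that encodes only the platform rule stated above; `suggest_route` is a hypothetical helper, it does not check VRAM, and you have to tell it whether an NVIDIA GPU is present.

```python
import platform


def suggest_route(system: str, machine: str, has_nvidia_gpu: bool) -> str:
    """Return 'local', 'wsl2', or 'cloud' for a given machine description."""
    is_x86_64 = machine.lower() in ("x86_64", "amd64")
    if not has_nvidia_gpu or not is_x86_64:
        return "cloud"   # no supported GPU or CPU architecture on this machine
    if system == "Linux":
        return "local"   # the path the install guide targets
    if system == "Windows":
        return "wsl2"    # NVIDIA GPU on Windows: try the WSL2 route
    return "cloud"       # anything else, for example macOS


if __name__ == "__main__":
    # has_nvidia_gpu is supplied by hand here; detect it however suits you.
    print(suggest_route(platform.system(), platform.machine(), has_nvidia_gpu=True))
```

Inside a WSL2 shell `platform.system()` already reports Linux, so the same function returns `local` there, which matches following the Linux commands from within WSL2.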

Copyable setup

How to Install ZONOS2 Locally

The shortest official path is Linux plus NVIDIA CUDA. Windows users should consider WSL2 only if they are comfortable debugging GPU passthrough.

Linux CUDA

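# Clone the repository and enter it.
git clone https://github.com/Zyphra/ZONOS2.git
cd ZONOS2
# Install dependencies into a uv-managed environment.
uv sync
# Start the local server, pointing it at the default voices directory.
uv run python -m minisgl --model-path Zyphra/ZONOS2 --tts-default-voices-dir ./default_voices/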

Generate Speech

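# Request streamed speech from the local server and save the raw PCM response.
curl -X POST http://localhost:1919/tts/generate \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from ZONOS2","stream":true}' \
  --output output.pcm
# Convert the raw PCM (32-bit float, 44.1 kHz, mono) to a WAV file.
ffmpeg -f f32le -ar 44100 -ac 1 -i output.pcm output.wav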
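If ffmpeg is not available, the same conversion can be sketched with Python's standard library. This assumes the response really is headerless 32-bit float little-endian PCM at 44100 Hz mono, exactly as the ffmpeg flags above declare; `f32le_pcm_to_wav` is a hypothetical helper, and it writes 16-bit samples.

```python
import array
import sys
import wave


def f32le_pcm_to_wav(pcm_bytes: bytes, wav_path: str,
                     sample_rate: int = 44100, channels: int = 1) -> int:
    """Write raw 32-bit float little-endian PCM as a 16-bit WAV file.

    Returns the number of samples written.
    """
    usable = len(pcm_bytes) - (len(pcm_bytes) % 4)  # drop a trailing partial sample
    samples = array.array("f")
    samples.frombytes(pcm_bytes[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; array uses native order

    # Clamp to [-1.0, 1.0] and scale to signed 16-bit integers.
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))

    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(ints.tobytes())
    return len(ints)


if __name__ == "__main__":
    with open("output.pcm", "rb") as pcm_file:
        count = f32le_pcm_to_wav(pcm_file.read(), "output.wav")
    print(f"wrote {count} samples")
```

If the result plays back as noise or at the wrong speed, the sample format or rate assumption is the first thing to recheck.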

WSL2 Route

wsl --install
# Install an NVIDIA driver with WSL CUDA support on Windows.
# Inside Ubuntu on WSL2:
nvidia-smi
uv --version
# Then follow the Linux CUDA commands.

Comparison

ZONOS2 vs ElevenLabs

Dimension	ZONOS2	ElevenLabs
Best fit	Open-weight TTS experiments, self-hosting, voice-clone research, API wrappers	Managed creator and business voice production
Control	High if you run the model or local server yourself	Lower, but easier for non-technical teams
Setup	Linux x86_64, NVIDIA CUDA, uv, local server on port 1919	Browser-first SaaS workflow
Cost model	GPU time, hosting, maintenance, and engineering effort	Subscription or usage-based billing
Voice cloning	Strong focus on high-fidelity and naturalistic voice cloning	Polished voice library, cloning flows, and creator UX
Commercial risk	Verify model weights, code license, third-party components, and usage rights	Review platform terms, voice rights, and usage policy

Tier 1 languages

ZONOS2 for Japanese, Chinese, and English

Japanese voice cloning

Use clean reference audio and short scripts first. Typical Japanese use cases include anime dubbing, game dialogue, and localization workflows.

Mandarin Chinese TTS

Mandarin Chinese is listed as Tier 1 in official language support, so it deserves first-class examples instead of being hidden in a generic language list.

English narration

English remains the main comparison lane against ElevenLabs, Cartesia, Fish Audio, Qwen, Kokoro, and Chatterbox.

Developer lane

ZONOS2 API Examples

After the local server starts, the default endpoint accepts generation requests on localhost port 1919. Keep early tests short, then add chunking for long scripts.

curl -X POST http://localhost:1919/tts/generate \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","stream":true}' \
  --output output.pcm
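The same request can be sent from Python without extra dependencies. This is a sketch that mirrors the curl call above using only the standard library; it is not an official client, and `build_tts_request` and `generate_to_file` are hypothetical helper names. Only the URL, headers, and JSON fields shown in the curl example are used.

```python
import json
import urllib.request

TTS_URL = "http://localhost:1919/tts/generate"  # endpoint from the curl example


def build_tts_request(text: str, stream: bool = True,
                      url: str = TTS_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"text": text, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(text: str, out_path: str = "output.pcm") -> int:
    """Send the request and save the raw response body; returns bytes written."""
    written = 0
    with urllib.request.urlopen(build_tts_request(text)) as response:
        with open(out_path, "wb") as out_file:
            # Read in blocks so a streamed response is written as it arrives.
            while chunk := response.read(65536):
                out_file.write(chunk)
                written += len(chunk)
    return written


if __name__ == "__main__":
    print(generate_to_file("Hello world"), "bytes written")
```

Separating request construction from sending makes it easy to inspect or log the payload before the local server is even running.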

Troubleshooting

ZONOS2 Troubleshooting

CUDA mismatch

Run nvidia-smi to confirm the driver version, then check that the installed CUDA toolkit matches the runtime your environment expects.
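For a scripted version of that check, nvidia-smi can print selected fields as CSV. The sketch below collects GPU name, driver version, and total memory; the helper names are hypothetical, and it only reports what the driver sees, so comparing against your toolkit version is still a manual step.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text: str) -> list[dict]:
    """Parse 'name, driver, memory' lines as printed by the query above."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name is harmless.
        name, driver, memory = (part.strip() for part in line.rsplit(",", 2))
        rows.append({
            "name": name,
            "driver": driver,
            "memory_mib": int(memory) if memory.isdigit() else None,
        })
    return rows


def query_gpus() -> list[dict]:
    """Return one dict per visible GPU, or an empty list without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus() or [{"name": "no NVIDIA GPU detected"}]:
        print(gpu)
```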

Server not opening

Confirm the server started on http://localhost:1919 and that another process is not using the same port.
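A quick way to tell whether anything is listening on the port is a plain TCP connection attempt. This is a minimal sketch with a hypothetical helper name; a `True` result only means some process accepts connections on that port, not that it is the ZONOS2 server.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 1919,
                 timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    state = "in use" if port_is_open() else "free or unreachable"
    print(f"localhost:1919 is {state}")
```

Run it before starting the server to spot a port conflict, and again afterwards to confirm the server came up.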

Long text fails

Split long scripts by sentence, generate chunks, apply fade-out only when needed, and stitch audio after checking pacing.
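The chunking steps above can be sketched in plain Python. Everything here is an assumption for illustration: the helper names, the 200-character default, the naive punctuation-based splitter (it will also split decimals such as "3.14"), and the linear fade applied to a list of float samples.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after ., !, ? and their CJK counterparts, keeping the punctuation."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    return [piece.strip() for piece in pieces if piece.strip()]


def group_chunks(sentences: list[str], max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars where possible."""
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def fade_out(samples: list[float], fade_len: int) -> list[float]:
    """Linearly ramp the last fade_len samples down to zero."""
    fade_len = min(fade_len, len(samples))
    faded = list(samples)
    start = len(samples) - fade_len
    for i in range(fade_len):
        faded[start + i] *= (fade_len - 1 - i) / fade_len
    return faded


if __name__ == "__main__":
    script = "First line. Second line! Third line?"
    for chunk in group_chunks(split_sentences(script), max_chars=25):
        print(repr(chunk))
```

Generate each chunk separately, listen to the joins, and apply `fade_out` only to chunks that end with an audible click or cutoff before stitching.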

Homepage content map

30 ZONOS2 Sections to Keep Users Moving

Try ZONOS2 Online

Use the embedded ZONOS2 Space first. Users should hear or test the model before reading a long article.

Audio Sample Gallery

Compare ZONOS2, ElevenLabs, Kokoro, Chatterbox, Fish Audio, Qwen, and Cartesia by use case and workflow.

ZONOS2 Voice Cloning Demo

Explain clean reference audio, consent, sample length, speaker similarity, and safe cloning boundaries.

ZONOS2 Quick Facts

Surface 8B total parameters, 900M active parameters, 6M+ hours of audio, and Tier 1 language support.

Can My GPU Run ZONOS2?

Give visitors a fast GPU, OS, and VRAM answer before they lose time on CUDA setup.

Install ZONOS2 Locally

Split commands for Linux, WSL2, and cloud GPU paths with copyable snippets.

ZONOS2 vs ElevenLabs

Capture comparison intent from buyers who need cost, quality, hosting, and license tradeoffs.

Japanese and Chinese Voice Cloning

Highlight Tier 1 Mandarin Chinese and Japanese support as the long-tail advantage.

ZONOS2 API Examples

Show REST and Python entry points so developers can bookmark the page.

ZONOS2 Troubleshooting

Answer CUDA, uv, port 1919, long text, and audio conversion issues.

What Is ZONOS2?

Define it as Zyphra's real-time open-weight TTS model with high-fidelity cloning.

Why ZONOS2 Matters

Explain why MoE, larger data scale, and voice cloning fidelity make the launch important.

Core Parameters

Turn architecture details into scannable cards instead of dense prose.

Supported Languages

List Tier 1, Tier 2, and Tier 3 language expectations with realistic quality notes.

Stable vs Expressive Mode

Explain when users should prefer clean output or faithful voice-clone output.

RunPod and Cloud GPU Setup

Give users without a local NVIDIA machine a practical route.

REST API Example

Show the localhost 1919 endpoint and output conversion workflow.

Python API Example

Show the offline inference path for developers who do not want a server.

Long Text Chunking

Explain sentence splitting, fade-out, pacing, and post-processing.

Voice Sample Recording Guide

Teach users to record a single speaker with low noise and clear consent.

Best Settings

Summarize temperature, top-k, speaking rate, and seed tradeoffs.

CUDA Error Fixes

Handle driver mismatch, missing toolkit, and Linux-only assumptions.

YouTube Voiceover Use Case

Map the model to creators making narration, Shorts, and localization.

Game Character Voice Use Case

Map the model to NPC dialogue, prototypes, and mod tools.

Podcast and Audiobook Use Case

Explain long-form narration expectations and editing workflow.

Commercial Use and License Risk

Tell users to verify model weights and inference code license before production.

Responsible Voice Cloning

Call out impersonation, private voices, public figures, scams, and platform policy risk.

FAQ

Answer free demo, API, Windows, WSL2, Japanese, Chinese, and ElevenLabs questions.

Newsletter and Changelog

Invite users to follow ZONOS2 demos, fixes, and benchmark updates.

Open-Source TTS Hub Expansion

Grow beyond one model into comparisons and troubleshooting for open voice tools.

Commercial use

ZONOS2 License and Commercial Use

Zyphra's launch post and Hugging Face model page list the model under Apache-2.0, while the GitHub repository currently indicates an MIT license for the code. Treat model weights, inference code, third-party notices, and generated voice rights as separate checks before commercial deployment.

Multilingual ZONOS2 workflow

ZONOS2 Multilingual TTS for English, Japanese, and Mandarin Chinese

This page covers ZONOS2 multilingual TTS and voice cloning, including Japanese voice cloning, Mandarin Chinese speech, and English narration. Keep tests short, compare language output side by side, and verify consent before using any cloned voice.

English ZONOS2 narration

Use ZONOS2 TTS for product explainers, YouTube voiceovers, podcasts, API demos, and developer documentation where clear English pacing matters.

Japanese ZONOS2 voice cloning

Use ZONOS2 Japanese TTS for anime-style dialogue tests, game character lines, VTuber scripts, localization drafts, and language-learning examples.

Mandarin Chinese ZONOS2 TTS

Use ZONOS2 Mandarin Chinese speech for bilingual demos, creator narration, app onboarding, education content, and Chinese voice cloning experiments.

API

Multilingual API planning

Store language, reference voice, prompt text, consent status, and output settings together so every ZONOS2 multilingual generation is traceable.
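One way to keep those fields together is a small record type that is saved next to each audio file. This is a sketch, not a ZONOS2 schema: the class, field names, and the consent gate are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class GenerationRecord:
    """Everything needed to trace one multilingual generation."""
    language: str
    reference_voice: str          # path or ID of the reference audio
    prompt_text: str
    consent_confirmed: bool
    output_settings: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # ensure_ascii=False keeps Japanese and Chinese prompts readable.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


def require_consent(record: GenerationRecord) -> GenerationRecord:
    """Refuse to proceed when consent for the reference voice is not recorded."""
    if not record.consent_confirmed:
        raise PermissionError(f"no consent on file for {record.reference_voice}")
    return record


if __name__ == "__main__":
    record = GenerationRecord(
        language="ja",
        reference_voice="voices/narrator.wav",
        prompt_text="こんにちは",
        consent_confirmed=True,
        output_settings={"sample_rate": 44100},
    )
    print(require_consent(record).to_json())
```

Calling `require_consent` before every generation request turns the consent check into a step that cannot be skipped by accident.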

FAQ

ZONOS2 FAQ

Is ZONOS2 free to try?

The model can be explored through Zyphra Cloud during its launch period and through community Spaces. Free access can change, so check the official provider before relying on it for production.

Can I run ZONOS2 locally?

Yes, but the official local path targets Linux x86_64 with an NVIDIA GPU and a CUDA toolkit matching your driver.

Does ZONOS2 support Japanese and Chinese?

Yes. Official model cards list English, Mandarin Chinese, and Japanese as Tier 1 languages.

Is ZONOS2 an ElevenLabs alternative?

It can be an alternative for developers who want open-weight control and self-hosting. ElevenLabs remains stronger for polished SaaS workflows and managed production UX.

Can I clone any voice?

No. Clone only voices you own or have permission to use. Do not impersonate private people, public figures, or copyrighted characters without the right to do so.