Text-to-Speech (TTS) API Guide

Overview

The audio API provides a speech endpoint, built on the TTS model, that can be used to:

📝 Read blog articles aloud

🌍 Generate audio in multiple languages

🎵 Stream audio output in real time

IMPORTANT: You must tell end users that the speech they hear is AI-generated, not a human voice.

Basic usage

Basic example
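
The example for this section did not survive in the text, so the following is only a minimal sketch. It assumes the endpoint is reachable through the official `openai` Python package (v1.x) and that an API key is available in the `OPENAI_API_KEY` environment variable. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical and exist only for illustration; the model and voice names are the ones listed in this guide.

```python
from pathlib import Path

# Model and voice names taken from this guide.
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


def build_speech_request(text, model="tts-1", voice="alloy", response_format="mp3"):
    """Validate the inputs and return keyword arguments for the speech endpoint."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }


def synthesize_to_file(text, out_path, **options):
    """Generate speech for `text` and write the audio to `out_path`.

    Assumes the `openai` package (v1.x) is installed; it is imported lazily
    so the request builder above can be used without it.
    """
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(**build_speech_request(text, **options))
    out = Path(out_path)
    response.write_to_file(out)
    return out
```

A call such as `synthesize_to_file("Hello, world!", "speech.mp3", voice="nova")` would then be expected to produce an MP3 file, provided the assumptions above hold.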

Features

Audio quality options

tts-1: low latency, suitable for real-time applications

tts-1-hd: higher quality, possibly with less static

Available voices

- alloy
- echo
- fable
- onyx
- nova
- shimmer

Supported output formats

Format	Features	Applicable scenarios
MP3	Default format	General use
Opus	Low latency	Web streaming and communications
AAC	Efficient compression	Mobile device playback
FLAC	Lossless compression	Audio archiving
WAV	No compression	Low-latency applications
PCM	Raw samples	24 kHz, 16-bit signed
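
Because the PCM format is raw samples with no container, most players cannot open it directly. The sketch below wraps such samples in a WAV header using Python's standard `wave` module. The 24 kHz sample rate and 16-bit signed width come from the table above; mono audio is an assumption, and `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the WAV bytes.

    Defaults follow the table above (24 kHz, 16-bit signed = 2 bytes per
    sample). Mono output is an assumption, not something the guide states.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

For example, passing the bytes returned by a request made with `response_format="pcm"` should yield data that can be saved as a `.wav` file and opened by ordinary audio players.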

Live audio streaming
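
The original example for this section is missing, so the following is a hedged sketch of chunked streaming rather than the guide's own code. It assumes the official `openai` Python package (v1.x), whose `with_streaming_response` interface yields audio bytes as they arrive. The function name `stream_speech` and the `on_chunk` callback are hypothetical; `client` can be passed in explicitly, otherwise a default client is created from the environment.

```python
def stream_speech(text, on_chunk, model="tts-1", voice="alloy",
                  response_format="mp3", chunk_size=4096, client=None):
    """Stream generated speech, passing each chunk of bytes to `on_chunk`.

    Returns the total number of bytes received.
    """
    if client is None:
        # Assumes the `openai` package (v1.x) and OPENAI_API_KEY are available.
        from openai import OpenAI
        client = OpenAI()

    total = 0
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
        response_format=response_format,
    ) as response:
        # Chunks are handed over as they arrive, before generation finishes.
        for chunk in response.iter_bytes(chunk_size=chunk_size):
            on_chunk(chunk)
            total += len(chunk)
    return total
```

A simple use is to open a file in binary mode and pass its `write` method as `on_chunk`; a playback application could instead feed each chunk to an audio decoder.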

Supported languages

Supports multiple languages, including:

Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other languages: Arabic, Hindi, etc.

Note: The current voices are mainly optimized for English.

FAQ

Q: How can I control the emotion of the generated audio?

A: There is currently no direct control mechanism. Capitalization or syntax may affect the output, but the effect is uncertain.

Q: Can I create custom voices?

A: No. Creating custom voices is not supported.

Q: Who owns the generated audio?

A: The audio belongs to whoever created it, but end users must be informed that it is AI-generated.
