Speech to Text API Guide

Overview

The Audio API provides two main endpoints:

📝 transcriptions: converts audio to text in the original language

🔄 translations: translates audio into English text

Supported formats

📁 File size: Max 25 MB

🎵 Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm

How to use

1. Transcription

Convert audio to text in its original language.
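A minimal sketch of a transcription call. It assumes the service is reached through an OpenAI-style client (for example the `openai` Python package) and that the model is named `whisper-1`; this guide states neither, so adjust both to your provider. The helper name `transcribe` is hypothetical.

```python
from pathlib import Path


def transcribe(client, audio_path, model="whisper-1"):
    """Upload one audio file and return the transcript as plain text.

    Assumptions: `client` exposes the OpenAI-style method
    `audio.transcriptions.create(...)`, and `whisper-1` is a valid model name.
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


# Usage (assumes `pip install openai` and an API key in OPENAI_API_KEY):
#   from openai import OpenAI
#   print(transcribe(OpenAI(), "speech.mp3"))
```

Passing the client in as an argument keeps the helper independent of how the client is configured (base URL, key, timeouts).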

2. Translation

Convert audio in any supported language to English text.
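Translation can be sketched the same way, swapping the transcriptions endpoint for the translations endpoint. As before, the OpenAI-style client, the `whisper-1` model name, and the helper name `translate_to_english` are assumptions for illustration.

```python
def translate_to_english(client, audio_path, model="whisper-1"):
    """Upload one audio file and return an English translation of its speech.

    Assumptions: `client` exposes the OpenAI-style method
    `audio.translations.create(...)`, and `whisper-1` is a valid model name.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text


# Usage (assumes `pip install openai` and an API key in OPENAI_API_KEY):
#   from openai import OpenAI
#   print(translate_to_english(OpenAI(), "interview_ja.mp3"))
```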

3. Timestamp function
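Timestamps are typically requested by asking for a more detailed response format. The sketch below assumes an OpenAI-style client that accepts `response_format="verbose_json"` and `timestamp_granularities`, and that returns segments with `start`, `end`, and `text` attributes, as recent versions of the `openai` package do. The helper names are hypothetical.

```python
def format_timestamp(seconds):
    """Render a duration in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcribe_with_timestamps(client, audio_path, model="whisper-1"):
    """Return one formatted line per transcript segment, with start and end times.

    Assumptions: OpenAI-style `audio.transcriptions.create(...)`; segments are
    objects with `.start`, `.end` (seconds) and `.text`.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format="verbose_json",
            timestamp_granularities=["segment"],
        )
    return [
        f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text.strip()}"
        for seg in result.segments
    ]
```

Segment-level timestamps are used here because they map directly onto subtitle lines; word-level granularity, where the provider supports it, would be requested with `["word"]` instead.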

4. Processing large files

Split files larger than 25 MB with PyDub before uploading.
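One way to do the split is to cut the audio into fixed-length pieces and export each piece as its own file. This is a sketch under assumptions: the 10-minute chunk length, the output file naming, and the helper names `chunk_ranges` and `split_audio` are all illustrative, and PyDub needs ffmpeg installed to read and write mp3. Check the size of the exported files, since the right chunk length depends on the bitrate.

```python
from pathlib import Path


def chunk_ranges(total_ms, chunk_ms):
    """Return (start, end) millisecond pairs that cover 0..total_ms in order."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def split_audio(audio_path, out_dir, chunk_minutes=10):
    """Split an audio file into mp3 chunks and return the chunk paths."""
    # Imported here so chunk_ranges stays usable without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_ms = chunk_minutes * 60 * 1000  # pydub slices by milliseconds
    paths = []
    for index, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        chunk_path = out_dir / f"chunk_{index:03d}.mp3"
        audio[start:end].export(str(chunk_path), format="mp3")
        paths.append(chunk_path)
    return paths
```

Fixed-length cuts can land in the middle of a word or sentence, which may cost some context at the chunk boundaries.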

Optimization suggestions

Prompt usage tips

🔍 Correct the recognition of specific words

📜 Maintain contextual coherence between segments of a split file

✍️ Control punctuation in the output

🗣️ Keep filler words

📝 Control the output text style (for example, Simplified or Traditional Chinese)
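Two of these tips can be combined in one sketch: an initial prompt carries a glossary of hard-to-recognize words, and each later chunk is prompted with the tail of the previous transcript to keep context. It assumes an OpenAI-style client whose transcriptions endpoint accepts a `prompt` parameter; the helper name `transcribe_chunks` and the 200-character tail are illustrative choices, not values from this guide.

```python
def transcribe_chunks(client, chunk_paths, initial_prompt="",
                      model="whisper-1", tail_chars=200):
    """Transcribe audio chunks in order, prompting each with prior context.

    Assumptions: `client` exposes the OpenAI-style method
    `audio.transcriptions.create(...)` and accepts a `prompt` argument.
    """
    texts = []
    prompt = initial_prompt
    for chunk_path in chunk_paths:
        with open(chunk_path, "rb") as audio_file:
            kwargs = {"model": model, "file": audio_file}
            if prompt:  # only send a prompt when there is one
                kwargs["prompt"] = prompt
            result = client.audio.transcriptions.create(**kwargs)
        texts.append(result.text)
        # Carry the end of this transcript forward as context for the next chunk.
        prompt = result.text[-tail_chars:]
    return " ".join(texts)


# Usage (assumes `pip install openai` and an API key in OPENAI_API_KEY):
#   from openai import OpenAI
#   text = transcribe_chunks(OpenAI(), ["chunk_000.mp3", "chunk_001.mp3"],
#                            initial_prompt="Glossary: PyDub, WER")
```

The same `prompt` argument is where style hints would go, for instance a sentence written with the punctuation, filler words, or script (Simplified or Traditional Chinese) you want the output to follow.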

Supported languages

Supports 98 languages, including:

Major Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other regional languages: Arabic, Hindi, etc.

Note: Only languages with a word error rate (WER) lower than 50% are listed. Other languages are supported but may have lower quality.

