Speech to text with Python
Speech to Text API Guide
Overview
The Audio API provides two main endpoints:
📝 transcriptions: convert audio into text
🔄 translations: translate audio into English text
🎵 Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
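Before uploading, it can help to reject files the endpoints will not accept. The sketch below checks the file extension and size. The extension set follows OpenAI's documented list for these endpoints, and the 25 MB ceiling comes from the large-file note in this guide, read conservatively as 25,000,000 bytes; treat both as assumptions to confirm for the service you call. `check_audio_file` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed extension list, mirroring OpenAI's documented formats for these endpoints.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

# Assumed upload ceiling: 25 MB, read conservatively as 25,000,000 bytes.
MAX_UPLOAD_BYTES = 25_000_000


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file has an unsupported extension or exceeds the size limit."""
    file_path = Path(path)
    extension = file_path.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {file_path.suffix or '(none)'}")
    size = file_path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{file_path.name} is {size} bytes; split it into smaller files before uploading"
        )
```

The check is case-insensitive, so `clip.MP3` passes; a file that fails the size check is a candidate for the PyDub splitting step in section 4.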
How to use
1. Transcription
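A minimal transcription sketch, assuming the official `openai` Python package (v1 or later) and the `whisper-1` model; the model name is an assumption, so substitute whichever model your provider exposes. The client is passed in as an argument rather than created inside the function, which keeps the helper easy to reuse with a custom `base_url` or a test double.

```python
def transcribe(client, path: str, model: str = "whisper-1") -> str:
    """Return the transcript of an audio file, in the language that was spoken.

    `client` is expected to be an `openai.OpenAI` instance (or a compatible object).
    """
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # With the default JSON response format, the transcript is in `.text`.
    return result.text
```

Typical use: `from openai import OpenAI`, then `transcribe(OpenAI(), "speech.mp3")`. `OpenAI()` reads the key from the `OPENAI_API_KEY` environment variable, and `speech.mp3` is a placeholder file name.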
Convert audio into text in the language that was spoken.
2. Translation
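Translation follows the same pattern as transcription but calls `client.audio.translations`. In OpenAI's API this endpoint takes no target-language parameter: the output is always English. The sketch assumes the `openai` package and the `whisper-1` model name; `translate_to_english` is a hypothetical helper name.

```python
def translate_to_english(client, path: str, model: str = "whisper-1") -> str:
    """Return an English translation of the speech in an audio file.

    `client` is expected to be an `openai.OpenAI` instance (or a compatible object).
    """
    with open(path, "rb") as audio_file:
        # No target-language argument: the translations endpoint always outputs English.
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text
```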
Convert audio in any supported language into English text.
3. Timestamps
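Word-level timestamps can be requested by setting `response_format="verbose_json"` together with `timestamp_granularities`; OpenAI documents this combination for the `whisper-1` model. The sketch assumes the `openai` package, and `word_timestamps` and `format_timestamp` are hypothetical helper names.

```python
def word_timestamps(client, path: str, model: str = "whisper-1") -> list[tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for each recognized word."""
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format="verbose_json",    # timestamps require the verbose JSON format
            timestamp_granularities=["word"],  # "segment" gives segment-level spans instead
        )
    return [(w.word, w.start, w.end) for w in result.words]


def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
```

For example, `for word, start, end in word_timestamps(OpenAI(), "speech.mp3"): print(format_timestamp(start), word)` prints one line per word, where `speech.mp3` is a placeholder file name.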
4. Processing large files
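A minimal sketch of splitting a long recording with PyDub. It cuts by duration rather than by byte size, so the 10-minute default chunk length is an assumption: MP3 at common bitrates stays under 25 MB for that length, but check the exported sizes for your material. `chunk_ranges` and `split_audio` are hypothetical helper names, and PyDub needs `ffmpeg` installed to read and write formats such as MP3 and M4A.

```python
from pathlib import Path

TEN_MINUTES_MS = 10 * 60 * 1000  # PyDub measures audio length in milliseconds


def chunk_ranges(duration_ms: int, chunk_ms: int) -> list[tuple[int, int]]:
    """Split [0, duration_ms) into consecutive (start, end) windows of at most chunk_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def split_audio(path: str, out_dir: str, chunk_ms: int = TEN_MINUTES_MS,
                fmt: str = "mp3") -> list[str]:
    """Export consecutive chunks of an audio file and return the chunk file paths."""
    # Imported here so chunk_ranges stays usable without PyDub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)
    output = Path(out_dir)
    output.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    for index, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        chunk_path = output / f"{Path(path).stem}_{index:03d}.{fmt}"
        audio[start:end].export(str(chunk_path), format=fmt)  # slicing is by milliseconds
        chunk_paths.append(str(chunk_path))
    return chunk_paths
```

Fixed-length cuts can land mid-word. Transcribing the chunks in order and passing the previous chunk's text as a prompt helps keep context across each boundary.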
Use PyDub to split files larger than 25 MB into smaller chunks, then upload each chunk separately.
Optimization suggestions
Prompt usage tips
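1. 🔍 Correct the recognition of specific words, such as names, acronyms, or technical terms
2. 📜 Maintain context across segments, for example by passing the transcript of the previous segment as the prompt
3. ✍️ Control punctuation in the output
5. 📝 Control the output text style (such as Simplified vs. Traditional Chinese)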
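A minimal prompting sketch, assuming the `prompt` parameter of the transcriptions endpoint in the `openai` package and the `whisper-1` model. It combines two of the tips: a fixed vocabulary string steers the spelling (and, through its own style, the punctuation) of specific terms, and the tail of the previous chunk's transcript carries context across chunk boundaries. OpenAI's guide notes that `whisper-1` only considers the final 224 tokens of a prompt, so the vocabulary goes last and the carried-over context is truncated. The 400-character cap is an assumed value, and `transcribe_in_sequence` is a hypothetical name.

```python
def transcribe_in_sequence(client, paths: list[str], vocabulary: str = "",
                           model: str = "whisper-1", context_chars: int = 400) -> list[str]:
    """Transcribe audio chunks in order, prompting each with prior context plus vocabulary."""
    transcripts: list[str] = []
    previous = ""
    for path in paths:
        # Context first, vocabulary last: if the prompt is too long, the model
        # drops the oldest context rather than the vocabulary.
        prompt = " ".join(part for part in (previous[-context_chars:], vocabulary) if part)
        request = {"model": model}
        if prompt:
            request["prompt"] = prompt
        with open(path, "rb") as audio_file:
            result = client.audio.transcriptions.create(file=audio_file, **request)
        transcripts.append(result.text)
        previous = result.text
    return transcripts
```

The function returns one transcript per chunk rather than a joined string, so the caller can choose a separator that suits the language (for example, no space for Chinese).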
Supported languages
The API supports 98 languages, including:
- Major Asian languages: Chinese, Japanese, Korean, etc.
- European languages: English, French, German, etc.
- Other regional languages: Arabic, Hindi, etc.
Note: Only languages with a word error rate (WER) below 50% are listed. Other languages are also supported, but transcription quality may be lower.
Modified at 2026-03-21 09:13:17