npx skills add https://github.com/ankitjh4/indic-ai-skills --skill indic-tts
High-quality Text-to-Speech for Indian languages using Sarvam AI.
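You must have a Sarvam API key to use this skill.
Get your free API key at: https://dashboard.sarvam.ai
SARVAM_API_KEY = your-api-key
Extract text and structure from PDF documents and images (JPEG/PNG) using Sarvam AI's Document Intelligence API. Supports the 23 languages listed below, including English.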
| Code | Language |
|---|---|
| hi-IN | Hindi (default) |
| en-IN | English |
| bn-IN | Bengali |
| gu-IN | Gujarati |
| kn-IN | Kannada |
| ml-IN | Malayalam |
| mr-IN | Marathi |
| or-IN | Odia |
| pa-IN | Punjabi |
| ta-IN | Tamil |
| te-IN | Telugu |
| ur-IN | Urdu |
| as-IN | Assamese |
| bodo-IN | Bodo |
| doi-IN | Dogri |
| ks-IN | Kashmiri |
| kok-IN | Konkani |
| mai-IN | Maithili |
| mni-IN | Manipuri |
| ne-IN | Nepali |
| sa-IN | Sanskrit |
| sat-IN | Santali |
| sd-IN | Sindhi |
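As a minimal sketch, the language table above can be turned into a lookup that validates a code before it is passed to `--language`. The helper name `resolve_document_language` is hypothetical and not part of the skill's scripts; the codes are copied from the table.

```python
# Codes copied from the Document Intelligence table above.
DOCUMENT_LANGUAGES = {
    "hi-IN": "Hindi", "en-IN": "English", "bn-IN": "Bengali",
    "gu-IN": "Gujarati", "kn-IN": "Kannada", "ml-IN": "Malayalam",
    "mr-IN": "Marathi", "or-IN": "Odia", "pa-IN": "Punjabi",
    "ta-IN": "Tamil", "te-IN": "Telugu", "ur-IN": "Urdu",
    "as-IN": "Assamese", "bodo-IN": "Bodo", "doi-IN": "Dogri",
    "ks-IN": "Kashmiri", "kok-IN": "Konkani", "mai-IN": "Maithili",
    "mni-IN": "Manipuri", "ne-IN": "Nepali", "sa-IN": "Sanskrit",
    "sat-IN": "Santali", "sd-IN": "Sindhi",
}
DEFAULT_DOCUMENT_LANGUAGE = "hi-IN"  # marked "(default)" in the table


def resolve_document_language(code=None):
    """Return a supported code, falling back to the documented default.

    Hypothetical helper: it only checks membership in the table above.
    """
    if code is None:
        return DEFAULT_DOCUMENT_LANGUAGE
    if code not in DOCUMENT_LANGUAGES:
        supported = ", ".join(sorted(DOCUMENT_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return code
```

Note that this table lists `or-IN` and `bodo-IN`, while the translation and speech sections list `od-IN` and `brx-IN`, so it is probably safer to keep a separate set per feature rather than share one.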
md - Markdown files (default, human-readable)
html - Structured HTML with layout preservation
json - Structured JSON for programmatic processing
# Process a PDF document
python3 scripts/document_intelligence.py document.pdf --language hi-IN --format md
# Process with custom output directory
python3 scripts/document_intelligence.py document.pdf -o ./extracted/
# Check status of existing job
python3 scripts/document_intelligence.py --job-id <job-id>
# Download results for completed job
python3 scripts/document_intelligence.py --job-id <job-id> --download -o ./output/
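| Parameter | Default | Description |
|---|---|---|
| file | - | PDF or ZIP file to process |
| --language | hi-IN | Document language code |
| --format | md | Output format: html, md, or json |
| --output-dir | . | Directory to save output files |
| --poll | 5 | Seconds between status checks |
| --timeout | 300 | Maximum seconds to wait for completion |
| --job-id | - | Check status/download existing job |
| --download | - | Download mode for existing job |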
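The `--poll` and `--timeout` parameters describe a poll-until-done loop. The sketch below shows one way such a loop could look; it is not the script's actual implementation. The `get_status` callable and the terminal state names `Completed` and `Failed` are assumptions (the names are borrowed from the batch speech-to-text states later in this document).

```python
import time

# Assumed terminal states; the Document Intelligence job states are not
# listed in this document.
TERMINAL_STATES = {"Completed", "Failed"}


def wait_for_job(get_status, poll=5, timeout=300,
                 sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every `poll` seconds until a terminal state.

    Raises TimeoutError once `timeout` seconds have elapsed without one.
    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll)
```

The defaults mirror the table (5 seconds between checks, 300 seconds overall). A caller would pass a function that fetches the current job status by whatever means it has available.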
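Text AI capabilities including chat completion, translation, transliteration, and language detection. Supports 23 Indian languages.
Sarvam's LLM APIs for chat and text completion. Two models available:
sarvam-105b (Flagship) - Most capable model for complex reasoning, coding, and instruction following
sarvam-m - Efficient model for general chat
sarvam-105b is the flagship 105B-parameter model, with state-of-the-art performance on Hindi and Indian language benchmarks.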
# Chat using sarvam-105b (default)
python3 scripts/text_processing.py chat "Explain quantum computing in simple terms"
# Chat with system context
python3 scripts/text_processing.py chat "Write a poem" --system "You are a creative poet"
# Adjust temperature (0-2, lower = more focused)
python3 scripts/text_processing.py chat "Creative story" --temperature 0.8
sarvam-m is the efficient model for general chat and text completion.
# Simple chat
python3 scripts/text_processing.py chat "What is the capital of India?" --model sarvam-m
# Chat with system context
python3 scripts/text_processing.py chat "Tell me about AI" --system "You are a helpful AI assistant" --model sarvam-m
# Adjust temperature
python3 scripts/text_processing.py chat "Creative story" --temperature 0.8 --model sarvam-m
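When the chat command is driven from another program, it can help to build the argument list in one place and reject out-of-range values early. The following is a sketch under the assumption that the flags behave as shown in the examples above; `build_chat_command` is a hypothetical helper, not part of the skill.

```python
CHAT_MODELS = ("sarvam-105b", "sarvam-m")  # the two models named above


def build_chat_command(prompt, model=None, system=None, temperature=None):
    """Build the argv list for the `chat` subcommand shown above.

    Only flags that appear in the examples are used. Omitting `model`
    leaves the script's default (sarvam-105b according to the examples).
    """
    if model is not None and model not in CHAT_MODELS:
        raise ValueError(f"unknown chat model: {model!r}")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    cmd = ["python3", "scripts/text_processing.py", "chat", prompt]
    if system is not None:
        cmd += ["--system", system]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if model is not None:
        cmd += ["--model", model]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; using a list rather than a shell string avoids quoting problems with prompts that contain quotes or non-ASCII text.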
Translate text between 23 Indian languages. Two models available:
mayura:v1 - 12 languages, supports modes and transliteration
sarvam-translate:v1 - All 23 languages, formal mode only
Languages: hi-IN, en-IN, bn-IN, gu-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, as-IN, brx-IN, doi-IN, kok-IN, ks-IN, mai-IN, mni-IN, ne-IN, sa-IN, sat-IN, sd-IN, ur-IN
# Auto-detect source and translate to Hindi
python3 scripts/text_processing.py translate "Hello, how are you?" --target hi-IN
# Specify source language
python3 scripts/text_processing.py translate "नमस्ते" --source hi-IN --target en-IN
# Use Mayura model with colloquial mode
python3 scripts/text_processing.py translate "What's up?" --target hi-IN --model mayura:v1 --mode modern-colloquial
# Translate with romanized output
python3 scripts/text_processing.py translate "I am going home" --target hi-IN --model mayura:v1 --output-script roman
Modes (mayura:v1 only): formal, modern-colloquial, classic-colloquial, code-mixed
Output scripts (mayura:v1 only): roman, fully-native, spoken-form-in-native
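Because the non-formal modes and the output scripts apply only to mayura:v1, a wrapper can check the combination before invoking the script. This is a sketch with a hypothetical helper, `build_translate_command`. Since the script's default translation model is not stated here, the sketch takes the strict route of requiring an explicit `mayura:v1` whenever one of those options is used.

```python
MODES = {"formal", "modern-colloquial", "classic-colloquial", "code-mixed"}
OUTPUT_SCRIPTS = {"roman", "fully-native", "spoken-form-in-native"}
MAYURA = "mayura:v1"


def build_translate_command(text, target, source=None, model=None,
                            mode=None, output_script=None):
    """Build the argv list for the `translate` subcommand shown above."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_script is not None and output_script not in OUTPUT_SCRIPTS:
        raise ValueError(f"unknown output script: {output_script!r}")
    # "formal" is the only mode listed for sarvam-translate:v1.
    needs_mayura = output_script is not None or mode not in (None, "formal")
    if needs_mayura and model != MAYURA:
        raise ValueError(f"this option combination requires --model {MAYURA}")

    cmd = ["python3", "scripts/text_processing.py", "translate", text]
    if source is not None:
        cmd += ["--source", source]
    cmd += ["--target", target]
    if model is not None:
        cmd += ["--model", model]
    if mode is not None:
        cmd += ["--mode", mode]
    if output_script is not None:
        cmd += ["--output-script", output_script]
    return cmd
```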
Convert text from one script to another while preserving pronunciation.
# Hindi to English (romanization)
python3 scripts/text_processing.py transliterate "नमस्ते" --source hi-IN --target en-IN
# English to Hindi
python3 scripts/text_processing.py transliterate "namaste" --source en-IN --target hi-IN
# With spoken form conversion
python3 scripts/text_processing.py transliterate "I have 2 meetings at 3pm" --source en-IN --target hi-IN --spoken-form
Automatically identify the language and script of text.
python3 scripts/text_processing.py detect "नमस्ते दुनिया"
# Output: Language: hi-IN, Script: Deva
python3 scripts/text_processing.py detect "Hello world"
# Output: Language: en-IN, Script: Latn
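If the detection result needs to be consumed by another program, the printed line can be parsed into its two fields. The sketch below assumes the script prints exactly the `Language: ..., Script: ...` format shown in the output comments above; `parse_detect_output` is a hypothetical helper.

```python
import re

# Matches lines such as "Language: hi-IN, Script: Deva".
_DETECT_LINE = re.compile(
    r"Language:\s*(?P<language>[\w-]+),\s*Script:\s*(?P<script>\w+)"
)


def parse_detect_output(line):
    """Return {'language': ..., 'script': ...} parsed from one output line."""
    match = _DETECT_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognised detect output: {line!r}")
    return match.groupdict()
```

For the first example above this yields `{'language': 'hi-IN', 'script': 'Deva'}`. The script values look like ISO 15924 codes (`Deva`, `Latn`), although this document does not say so explicitly.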
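Convert speech to text with automatic language detection and optional translation to English. Three modes are available: REST, WebSocket, and Batch.
Languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN, as-IN, ur-IN, ne-IN, kok-IN, ks-IN, sd-IN, sa-IN, sat-IN, mni-IN, brx-IN, mai-IN, doi-IN
REST is best for short audio files (<30 seconds) and returns results immediately.
# Basic transcription with auto-translation to English
python3 scripts/speech_to_text.py rest audio.mp3
# With context prompt
python3 scripts/speech_to_text.py rest audio.mp3 --prompt "This is a conversation about technology"
# Specify codec for PCM files
python3 scripts/speech_to_text.py rest audio.raw --codec pcm_s16le
Supported formats: WAV, MP3, AAC, AIFF, OGG, OPUS, FLAC, MP4/M4A, AMR, WMA, WebM, PCM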
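A small pre-flight check can pick between the `rest`, `websocket`, and `batch` subcommands of `scripts/speech_to_text.py` and reject files in unlisted formats. This is a sketch, not the script's own logic: the 30-second threshold comes from the REST description above, the extension list is an assumed mapping of the format names (including `.pcm` and `.raw` for PCM), and both helper names are hypothetical.

```python
from pathlib import Path

# Assumed file extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".aac", ".aiff", ".ogg", ".opus", ".flac",
    ".mp4", ".m4a", ".amr", ".wma", ".webm", ".pcm", ".raw",
}
REST_MAX_SECONDS = 30  # REST is described as best for audio under 30 seconds


def is_supported_audio(path):
    """True if the file extension maps to one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def suggest_stt_subcommand(duration_seconds, file_count=1, realtime=False):
    """Suggest a speech_to_text.py subcommand for a transcription job."""
    if realtime:
        return "websocket"
    if file_count > 1 or duration_seconds >= REST_MAX_SECONDS:
        return "batch"
    return "rest"
```

The caller is expected to know the audio duration already; measuring it is left out because it depends on the container format.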
WebSocket provides real-time speech-to-text with streaming audio.
# Streaming with translation (default)
python3 scripts/speech_to_text.py websocket audio.wav
# Transcription mode (no translation)
python3 scripts/speech_to_text.py websocket audio.wav --mode transcribe
# Different output modes (saaras:v3 only)
python3 scripts/speech_to_text.py websocket audio.wav --mode translit # Romanized output
python3 scripts/speech_to_text.py websocket audio.wav --mode verbatim # Exact word-for-word
python3 scripts/speech_to_text.py websocket audio.wav --mode codemix # Code-mixed output
Modes (v3): translate, transcribe, verbatim, translit, codemix
Batch is for longer audio or for processing multiple files. It supports speaker diarization.
# Full workflow - create, upload, start, poll, download
python3 scripts/speech_to_text.py batch audio1.mp3 audio2.mp3 audio3.mp3 --output-dir ./transcripts/
# With speaker diarization
python3 scripts/speech_to_text.py batch meeting.wav --diarization --num-speakers 3
# Step-by-step workflow
# 1. Create job
python3 scripts/speech_to_text.py batch-create --diarization
# Returns: Job ID: abc-123
# 2. Upload files
python3 scripts/speech_to_text.py batch-upload abc-123 audio.mp3
# 3. Start job
python3 scripts/speech_to_text.py batch-start abc-123
# 4. Check status
python3 scripts/speech_to_text.py batch-status abc-123
# 5. Download results
python3 scripts/speech_to_text.py batch-download abc-123 output1.txt output2.txt --output-dir ./results/
Batch workflow states: Accepted → Pending → Running → Completed/Failed
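The state sequence above can be written down as a small transition table, which is handy when deciding whether to keep polling or to download. The sketch assumes the transitions are strictly linear as drawn; the real service may behave differently (for example, failing before `Running`), so treat the table as illustrative.

```python
# Transition table read off the documented sequence:
# Accepted -> Pending -> Running -> Completed/Failed
ALLOWED_TRANSITIONS = {
    "Accepted": {"Pending"},
    "Pending": {"Running"},
    "Running": {"Completed", "Failed"},
    "Completed": set(),
    "Failed": set(),
}


def is_terminal(state):
    """True once a job can no longer change state."""
    return state in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[state]


def is_valid_history(states):
    """Check that an observed list of states follows the documented order."""
    if not states or states[0] != "Accepted":
        return False
    return all(
        nxt in ALLOWED_TRANSITIONS.get(cur, ())
        for cur, nxt in zip(states, states[1:])
    )
```

A polling loop would call `batch-status` until `is_terminal` returns true, then run `batch-download` only if the final state is `Completed`.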
High-quality Text-to-Speech for Indian languages using Sarvam AI's Bulbul v3 model.
python3 scripts/tts.py "नमस्ते, आप कैसे हैं?" --language hi-IN --speaker meera
| Code | Language |
|---|---|
| hi-IN | Hindi |
| bn-IN | Bengali |
| ta-IN | Tamil |
| te-IN | Telugu |
| gu-IN | Gujarati |
| kn-IN | Kannada |
| ml-IN | Malayalam |
| mr-IN | Marathi |
| pa-IN | Punjabi |
| od-IN | Odia |
| en-IN | English |

Female: Meera (default), Priya, Neha, Simran, Kavya, Ishita, Shreya, Roopa, Tanya, Shruti, Suhani, Kavitha, Rupali, Amelia, Sophia
Male: Shubh, Aditya, Rahul, Amit, Dev, Arjun, Ratan, Varun, Manan, Sumit, Kabir, Aayan, Ashutosh, Advait, Anand, Tarun, Sunny, Mani, Gokul, Vijay, Mohit, Rehan, Soham
| Parameter | Default | Description |
|---|---|---|
| text | - | Text to convert (max 2500 chars) |
| --language | hi-IN | Target language code |
| --speaker | meera | Voice speaker |
| --model | bulbul:v3 | TTS model |
| --output | output.wav | Output file path |
| --sample-rate | 24000 | Audio sample rate (Hz) |
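Text longer than the 2500-character limit has to be split before it is sent to `scripts/tts.py`. The following is a minimal sketch of one way to do that, preferring sentence boundaries (including the Devanagari danda `।`). `split_for_tts` is a hypothetical helper, and it counts Python characters (code points), which may not match exactly how the service counts.

```python
import re

MAX_TTS_CHARS = 2500  # limit from the parameter table above


def split_for_tts(text, limit=MAX_TTS_CHARS):
    """Split text into chunks of at most `limit` characters.

    Sentences are kept whole where possible; a single sentence longer
    than the limit is hard-wrapped at the limit.
    """
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately (for example with a different `--output` path per chunk) and the resulting audio files concatenated afterwards.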