npx skills add https://github.com/diskd-ai/groq-api --skill groq-api

Build applications with Groq's ultra-fast LLM inference (300-1000+ tokens/sec).
# Python
pip install groq
# TypeScript/JavaScript
npm install groq-sdk
export GROQ_API_KEY=<your-api-key>
Python:
from groq import Groq
client = Groq()  # Uses GROQ_API_KEY env var
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
TypeScript:
import Groq from "groq-sdk";
const client = new Groq();
const response = await client.chat.completions.create({
model: "llama-3.3-70b-versatile",
messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
| Use Case | Model | Notes |
|---|---|---|
| Fast + cheap | llama-3.1-8b-instant | Best for simple tasks |
| Balanced | llama-3.3-70b-versatile | Quality/cost balance |
| Highest quality | openai/gpt-oss-120b | Built-in tools + reasoning |
| Agentic | groq/compound | Web search + code exec |
| Reasoning | openai/gpt-oss-20b | Fast reasoning (low/med/high) |
| Vision/OCR | llama-4-scout-17b-16e-instruct | Image understanding |
| Audio STT | whisper-large-v3-turbo | Transcription |
| TTS | playai-tts | Text-to-speech |
See references/models.md for full model list and pricing.
stream = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello"}
]
)
import asyncio
from groq import AsyncGroq
async def main():
client = AsyncGroq()
response = await client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Hello"}]
)
return response.choices[0].message.content
print(asyncio.run(main()))
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
response_format={"type": "json_object"}
)
Force output to match a schema. Two modes available:
| Mode | Guarantee | Models |
|---|---|---|
| strict: true | 100% schema compliance | openai/gpt-oss-20b, openai/gpt-oss-120b |
| strict: false | Best-effort compliance | All supported models |
Strict Mode (guaranteed compliance):
response = client.chat.completions.create(
model="openai/gpt-oss-20b",
messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
response_format={
"type": "json_schema",
"json_schema": {
"name": "person",
"strict": True,
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"],
"additionalProperties": False
}
}
}
)
With Pydantic:
import json
from pydantic import BaseModel
class Person(BaseModel):
name: str
age: int
response = client.chat.completions.create(
model="openai/gpt-oss-20b",
messages=[{"role": "user", "content": "Extract: John is 30"}],
response_format={
"type": "json_schema",
"json_schema": {
"name": "person",
"strict": True,
"schema": Person.model_json_schema()
}
}
)
person = Person.model_validate(json.loads(response.choices[0].message.content))
See references/structured-outputs.md for schema requirements, validation libraries, and examples.
with open("audio.mp3", "rb") as f:
transcription = client.audio.transcriptions.create(
model="whisper-large-v3-turbo",
file=f,
language="en", # Optional: ISO-639-1 code
response_format="verbose_json", # json, text, verbose_json
timestamp_granularities=["word", "segment"]
)
print(transcription.text)
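The word and segment timestamps requested above come back on the verbose_json response. Assuming each segment is a mapping with start, end, and text fields (as in Whisper-style APIs), they can be turned into SRT subtitles:

```python
def segments_to_srt(segments) -> str:
    """Format Whisper-style segments (start/end in seconds, text) as SRT blocks."""
    def ts(seconds: float) -> str:
        # Convert seconds to the SRT timestamp format HH:MM:SS,mmm.
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}")
    return "\n\n".join(blocks)

# print(segments_to_srt(transcription.segments))
```

If the SDK returns segment objects rather than dicts, adapt the field access accordingly.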
with open("french_audio.mp3", "rb") as f:
translation = client.audio.translations.create(
model="whisper-large-v3",
file=f
)
print(translation.text) # English text
response = client.audio.speech.create(
model="playai-tts",
input="Hello, world!",
voice="Fritz-PlayAI",
response_format="wav", # flac, mp3, mulaw, ogg, wav
speed=1.0 # 0.5 to 5
)
response.write_to_file("output.wav")
Process images with Llama 4 multimodal models. Supports up to 5 images per request.
Models: meta-llama/llama-4-scout-17b-16e-instruct (faster), meta-llama/llama-4-maverick-17b-128e-instruct (higher quality)
response = client.chat.completions.create(
model="meta-llama/llama-4-scout-17b-16e-instruct",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}]
)
import base64
def encode_image(path: str) -> str:
with open(path, "rb") as f:
return base64.b64encode(f.read()).decode("utf-8")
response = client.chat.completions.create(
model="meta-llama/llama-4-scout-17b-16e-instruct",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image"},
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
]
}]
)
base64_image = encode_image("scan.png")  # reuse the helper above; filename is illustrative
response = client.chat.completions.create(
model="meta-llama/llama-4-scout-17b-16e-instruct",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Extract all text and data as JSON"},
{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
]
}],
response_format={"type": "json_object"}
)
See references/vision.md for multi-image, tool use with images, and multi-turn conversations.
For tool calling patterns and examples, see references/tool-use.md.
Quick example:
import json
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Weather in Paris?"}],
tools=tools
)
if response.choices[0].message.tool_calls:
for tc in response.choices[0].message.tool_calls:
args = json.loads(tc.function.arguments)
# Execute function and continue conversation
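The final step above can be sketched as a full round trip: execute the requested tool locally, append the result as a tool message, and call the model again for a natural-language answer. The get_weather implementation here is a stand-in, and `client` is the Groq client from earlier:

```python
import json

def get_weather(location: str) -> str:
    # Stand-in for a real weather lookup; returns a JSON string for the model.
    return json.dumps({"location": location, "temp_c": 18, "condition": "cloudy"})

def run_tool_round_trip(client, model, messages, tools):
    """Let the model request tools, execute them, and return the final answer."""
    response = client.chat.completions.create(model=model, messages=messages, tools=tools)
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content
    messages.append(message)  # echo the assistant's tool-call turn back
    for tc in message.tool_calls:
        args = json.loads(tc.function.arguments)
        result = get_weather(**args)  # dispatch on tc.function.name in real code
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
    final = client.chat.completions.create(model=model, messages=messages, tools=tools)
    return final.choices[0].message.content
```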
Use groq/compound or openai/gpt-oss-120b for built-in web search and code execution:
response = client.chat.completions.create(
model="groq/compound",
messages=[{"role": "user", "content": "Search for latest Python news"}]
)
# Model automatically uses web search
Connect to third-party MCP servers for tools such as Stripe, GitHub, and web scraping. Use the Responses API via the OpenAI SDK:
import os
import openai
client = openai.OpenAI(
api_key=os.environ.get("GROQ_API_KEY"),
base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
model="openai/gpt-oss-120b",
input="What models are trending on Huggingface?",
tools=[{
"type": "mcp",
"server_label": "Huggingface",
"server_url": "https://huggingface.co/mcp"
}]
)
See references/tool-use.md for MCP configuration and popular servers.
Control how models think through complex problems.
Models: openai/gpt-oss-20b, openai/gpt-oss-120b (low/medium/high), qwen/qwen3-32b (none/default)
response = client.chat.completions.create(
model="openai/gpt-oss-20b",
messages=[{"role": "user", "content": "How many r's in strawberry?"}],
reasoning_effort="high", # low, medium, high
temperature=0.6,
max_completion_tokens=1024
)
print(response.choices[0].message.content)
print("Reasoning:", response.choices[0].message.reasoning)
response = client.chat.completions.create(
model="qwen/qwen3-32b",
messages=[{"role": "user", "content": "Solve: x + 5 = 12"}],
reasoning_format="parsed" # raw, parsed, hidden
)
print("Answer:", response.choices[0].message.content)
print("Reasoning:", response.choices[0].message.reasoning)
response = client.chat.completions.create(
model="openai/gpt-oss-20b",
messages=[{"role": "user", "content": "What is 15% of 80?"}],
include_reasoning=False # Hide reasoning in response
)
See references/reasoning.md for streaming, tool use with reasoning, and best practices.
For high-volume asynchronous processing (completion windows from 24 hours to 7 days):
# 1. Create JSONL file with requests
# 2. Upload file
# 3. Create batch
batch = client.batches.create(
input_file_id=file_id,
endpoint="/v1/chat/completions",
completion_window="24h"
)
# 4. Check status
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
results = client.files.content(batch.output_file_id)
See references/api-reference.md for full batch API details.
Repeated prompt prefixes are cached automatically, reducing latency and cost (cached input tokens are billed at a 50% discount). No code changes required.
Supported models: moonshotai/kimi-k2-instruct-0905, openai/gpt-oss-20b, openai/gpt-oss-120b, openai/gpt-oss-safeguard-20b
How it works: the API caches repeated prompt prefixes automatically, so keep static content (system prompt, tool definitions, few-shot examples) at the front of the message list and per-request content last.
Track cache usage:
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[{"role": "system", "content": large_system_prompt}, ...]
)
cached = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached}") # 50% discount applied to these
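Because caching matches on exact prompt prefixes, static content should come first in the message list and per-request content last. A minimal sketch (helper name is illustrative):

```python
def build_messages(static_system: str, few_shot: list, user_input: str) -> list:
    """Order messages so the cacheable static prefix comes first."""
    messages = [{"role": "system", "content": static_system}]  # identical every call
    messages += few_shot  # example turns reused verbatim across requests
    messages.append({"role": "user", "content": user_input})  # varies per request
    return messages
```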
See references/prompt-caching.md for optimization strategies and examples.
Detect and filter harmful content using safeguard models.
General content safety classification. Returns "safe", or "unsafe" followed by a category code (e.g. S1) on the next line.
response = client.chat.completions.create(
model="meta-llama/Llama-Guard-4-12B",
messages=[{"role": "user", "content": user_input}]
)
if response.choices[0].message.content.startswith("unsafe"):
# Block or handle unsafe content
pass
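Since the verdict and category codes arrive as plain text, a small parser keeps the handling explicit (helper name is illustrative; multiple codes arrive comma-separated on the second line):

```python
def parse_guard_verdict(content: str) -> tuple:
    """Split a Llama Guard reply into (is_safe, category_codes)."""
    lines = content.strip().split("\n")
    if lines[0].strip() == "safe":
        return True, []
    # "unsafe" is followed by comma-separated category codes, e.g. "S1,S10"
    codes = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in codes]

# is_safe, codes = parse_guard_verdict(response.choices[0].message.content)
```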
Prompt injection detection with custom policies. Returns structured JSON.
response = client.chat.completions.create(
model="openai/gpt-oss-safeguard-20b",
messages=[
{"role": "system", "content": injection_detection_policy},
{"role": "user", "content": user_input}
]
)
# Returns: {"violation": 1, "category": "Direct Override", "rationale": "..."}
See references/moderation.md for complete policies, harm taxonomy, and integration patterns.
from groq import Groq, RateLimitError, APIConnectionError, APIStatusError
client = Groq()
try:
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Hello"}]
)
except RateLimitError:
# Wait and retry with exponential backoff
pass
except APIConnectionError:
# Network issue
pass
except APIStatusError as e:
# API error (check e.status_code)
pass
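The retry comment above can be fleshed out into a generic helper; pass the Groq exception types you want to retry on:

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0, retry_on: tuple = (Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage:
# response = with_backoff(
#     lambda: client.chat.completions.create(
#         model="llama-3.3-70b-versatile",
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retry_on=(RateLimitError, APIConnectionError),
# )
```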
See references/audio.md for complete audio API reference including file handling, metadata fields, and prompting guidelines.
Weekly Installs: 61
First Seen: Jan 23, 2026
Installed on: opencode (53), gemini-cli (52), codex (52), github-copilot (47), amp (46), kimi-cli (45)