llmfit-hardware-model-matcher by aradotso/trending-skills
npx skills add https://github.com/aradotso/trending-skills --skill llmfit-hardware-model-matcher
Skill by ara.so, from the Daily 2026 Skills collection.
llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLM models across quality, speed, fit, and context dimensions, telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).
brew install llmfit
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
# No sudo; installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
scoop install llmfit
docker run ghcr.io/alexsjones/llmfit
# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# Binary at target/release/llmfit
Fit levels: perfect (runs great), good (runs well), marginal (runs but tight), too_tight (won't run).
llmfit
llmfit --cli
llmfit system
llmfit --json system # JSON output
llmfit list
llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"
# All runnable models, ranked by fit
llmfit fit
# Only perfect fits, top 5
llmfit fit --perfect -n 5
# JSON output
llmfit --json fit -n 10
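The four fit levels form an ordering, which is handy when post-filtering JSON output in scripts. A minimal sketch — the `FIT_ORDER` ranking and the sample records are illustrative assumptions, not llmfit's internal representation:

```python
# Hypothetical ranking of llmfit's fit levels, best (0) to worst (3).
FIT_ORDER = {"perfect": 0, "good": 1, "marginal": 2, "too_tight": 3}

def meets_min_fit(fit: str, min_fit: str = "good") -> bool:
    """True if `fit` is at least as good as `min_fit`."""
    return FIT_ORDER[fit] <= FIT_ORDER[min_fit]

# Sample records shaped like `llmfit --json fit` output (illustrative only).
models = [
    {"name": "Mistral-7B", "fit": "perfect"},
    {"name": "Llama-3.1-70B", "fit": "too_tight"},
    {"name": "Qwen3-4B", "fit": "good"},
]
runnable = [m["name"] for m in models if meets_min_fit(m["fit"], "good")]
print(runnable)  # ['Mistral-7B', 'Qwen3-4B']
```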
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
# Top 5 recommendations (JSON by default)
llmfit recommend --json --limit 5
# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
llmfit serve
llmfit serve --host 0.0.0.0 --port 8787
When autodetection fails (VMs, a broken nvidia-smi, passthrough setups):
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json
# Megabytes
llmfit --memory=32000M
# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
Accepted suffixes: G/GB/GiB, M/MB/MiB, T/TB/TiB (case-insensitive).
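The accepted suffixes can be sketched as a small parser. This is only an illustration of the grammar above, not llmfit's actual parsing code; in particular, treating G/GB as decimal and GiB as binary is an assumption:

```python
import re

# Multipliers in bytes. Decimal for G/GB, binary for GiB is an assumption
# for illustration -- llmfit may normalize these differently.
SUFFIXES = {
    "M": 10**6, "MB": 10**6, "MIB": 2**20,
    "G": 10**9, "GB": 10**9, "GIB": 2**30,
    "T": 10**12, "TB": 10**12, "TIB": 2**40,
}

def parse_memory(value: str) -> int:
    """Parse strings like '32G', '32000M', '24GiB' into bytes (case-insensitive)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", value.strip())
    if not m:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, suffix = float(m.group(1)), m.group(2).upper()
    if suffix not in SUFFIXES:
        raise ValueError(f"unrecognized suffix: {suffix}")
    return int(number * SUFFIXES[suffix])

print(parse_memory("32G"))    # 32000000000
print(parse_memory("24GiB"))  # 25769803776
```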
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli
# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5
# Environment-variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
Start the server:
llmfit serve --host 0.0.0.0 --port 8787
# Health check
curl http://localhost:8787/health
# Node hardware info
curl http://localhost:8787/api/v1/system
# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"
# Top runnable models for this node (the key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
Query parameters for /models and /models/top:

| Param | Values | Description |
|---|---|---|
| limit / n | integer | Max rows returned |
| min_fit | perfect / good / marginal | Minimum fit level to include |
| perfect | true / false | Only perfect fits |
| runtime | any / mlx / llamacpp / … | Filter by runtime |
| use_case | general / coding / reasoning / chat / multimodal / embedding | Filter by use case |
| provider | string | Substring match on provider |
| search | string | Free-text search across name/provider/size/use-case |
| sort | score / tps / … | Sort order |
| include_too_tight | true / false | Include too_tight models |
| max_context | integer | Per-request context cap |
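In scripts, the query strings above can be assembled with the standard library instead of hand-concatenating. This is plain `urllib` against the endpoint paths shown in the curl examples, not an llmfit-specific client:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8787"

def models_top_url(**params) -> str:
    """Build a /api/v1/models/top URL from keyword filters, skipping None values."""
    qs = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/api/v1/models/top?{qs}"

url = models_top_url(limit=5, min_fit="good", use_case="coding")
print(url)
# http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding
```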
#!/bin/bash
# Get the top 3 recommended coding models
llmfit recommend --json --use-case coding --limit 3 | \
jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
echo "$MODEL will run well (fit: $FIT)"
else
echo "$MODEL may not run well (fit: $FIT)"
fi
#!/bin/bash
# Get the top-fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
import requests
BASE_URL = "http://localhost:8787"
def get_system_info():
resp = requests.get(f"{BASE_URL}/api/v1/system")
return resp.json()
def get_top_models(use_case="coding", limit=5, min_fit="good"):
params = {
"use_case": use_case,
"limit": limit,
"min_fit": min_fit,
"sort": "score"
}
resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
return resp.json()
def search_models(query, runtime="any"):
resp = requests.get(
f"{BASE_URL}/api/v1/models/{query}",
params={"runtime": runtime}
)
return resp.json()
# Example usage
system = get_system_info()
print(f"GPU: {system.get('gpu_name')} | VRAM: {system.get('vram_gb')}GB")
models = get_top_models(use_case="reasoning", limit=3)
for m in models.get("models", []):
print(f"{m['name']}: score={m['score']}, fit={m['fit']}, quant={m['quantization']}")
import subprocess
import json
def get_best_model_for_task(use_case: str, min_fit: str = "good") -> dict:
"""使用 llmfit 为给定任务选择最佳模型。"""
result = subprocess.run(
["llmfit", "recommend", "--json", "--use-case", use_case, "--limit", "1"],
capture_output=True,
text=True
)
data = json.loads(result.stdout)
models = data.get("models", [])
return models[0] if models else None
def plan_hardware_requirements(model_name: str, context: int = 4096) -> dict:
"""获取运行特定模型所需的硬件要求。"""
result = subprocess.run(
["llmfit", "plan", model_name, "--context", str(context), "--json"],
capture_output=True,
text=True
)
return json.loads(result.stdout)
# Select the best coding model
best = get_best_model_for_task("coding")
if best:
print(f"Best coding model: {best['name']}")
print(f" Quantization: {best['quantization']}")
print(f" Estimated tok/s: {best['tps']}")
print(f" Memory usage: {best['mem_pct']}%")
# Plan hardware for a specific model
plan = plan_hardware_requirements("Qwen/Qwen3-4B-MLX-4bit", context=8192)
print(f"Min VRAM needed: {plan['hardware']['min_vram_gb']}GB")
print(f"Recommended VRAM: {plan['hardware']['recommended_vram_gb']}GB")
version: "3.8"
services:
llmfit-api:
image: ghcr.io/alexsjones/llmfit
command: serve --host 0.0.0.0 --port 8787
ports:
- "8787:8787"
environment:
- OLLAMA_CONTEXT_LENGTH=8192
devices:
- /dev/nvidia0:/dev/nvidia0 # pass the GPU through
| Key | Action |
|---|---|
| ↑/↓ or j/k | Navigate models |
| / | Search (name, provider, params, use case) |
| Esc/Enter | Exit search |
| Ctrl-U | Clear search |
| f | Cycle fit filter: All → Runnable → Perfect → Good → Marginal |
| a | Cycle availability: All → GGUF available → Installed |
| s | Cycle sort: Score → Params → Mem% → Context → Date → Use case |
| t | Cycle color theme (auto-saved) |
| v | Visual mode (multi-select for comparison) |
| V | Select mode (column-based filtering) |
| p | Plan mode (what hardware does this model need?) |
| P | Provider filter popup |
| U | Use-case filter popup |
| C | Capability filter popup |
| m | Mark model for comparison |
| c | Compare view (marked vs selected) |
| d | Download model (via the detected runtime) |
| r | Refresh installed models from runtimes |
| Enter | Toggle detail view |
| g/G | Jump to top/bottom |
| q | Quit |
t cycles: Default → Dracula → Solarized → Nord → Monokai → Gruvbox
The theme is saved to ~/.config/llmfit/theme
| GPU Vendor | Detection Method |
|---|---|
| NVIDIA | nvidia-smi (multi-GPU, aggregates VRAM) |
| AMD | rocm-smi |
| Intel Arc | sysfs (discrete) / lspci (integrated) |
| Apple Silicon | system_profiler (unified memory = VRAM) |
| Ascend | npu-smi |
llmfit fit --perfect -n 10
# Or interactively:
llmfit
# press 'f' to filter to Perfect fit
llmfit recommend --json --use-case coding | jq '.models[]'
# Or override manually if detection fails
llmfit --memory=24G recommend --json --use-case coding
llmfit info "Llama-3.1-70B"
# Plan what hardware you'd need
llmfit plan "Llama-3.1-70B" --context 4096 --json
llmfit
# press 'a' to cycle to the Installed filter
# or
llmfit fit -n 20 # then press 'i' in the TUI for installed-first
MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
ollama serve &
ollama run "$MODEL"
# Check a node and get the top 3 good-or-better reasoning models
curl -s "http://node1:8787/api/v1/models/top?limit=3&min_fit=good&use_case=reasoning" | \
jq '.models[].name'
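For multi-node scheduling, the per-node `/models/top` responses can be merged into a single placement decision. A sketch assuming each response has the `{"models": [{"name": ..., "score": ...}]}` shape shown elsewhere on this page; the HTTP fetching is stubbed out, and in practice you would `requests.get` each node's endpoint:

```python
def best_placement(node_responses: dict) -> tuple:
    """Pick the (node, model, score) triple with the highest score across nodes.

    `node_responses` maps a node base URL to its parsed /models/top JSON.
    """
    best = None
    for node, payload in node_responses.items():
        for model in payload.get("models", []):
            if best is None or model["score"] > best[2]:
                best = (node, model["name"], model["score"])
    return best

# Illustrative responses from two nodes (made-up model scores).
responses = {
    "http://node1:8787": {"models": [{"name": "Mistral-7B", "score": 84}]},
    "http://node2:8787": {"models": [{"name": "Qwen3-4B", "score": 91}]},
}
print(best_placement(responses))  # ('http://node2:8787', 'Qwen3-4B', 91)
```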
GPU not detected / wrong VRAM reported:
# Verify detection
llmfit system
# Override manually
llmfit --memory=24G --cli
nvidia-smi not found but you have an NVIDIA GPU:
# Install the CUDA toolkit or nvidia-utils, then retry
# Or override manually:
llmfit --memory=8G fit --perfect
Models show as too_tight but you have enough RAM:
# llmfit may be using context-inflated estimates; cap the context length
llmfit --max-context 2048 fit --perfect -n 10
REST API: testing the endpoints
# Spawn a server and run the validation suite
python3 scripts/test_api.py --spawn
# Test an already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
Apple Silicon: VRAM shows as system RAM (expected):
# This is correct: Apple Silicon uses unified memory
# llmfit accounts for this automatically
llmfit system # should show backend: Metal
Context-length environment variable:
export OLLAMA_CONTEXT_LENGTH=4096
llmfit recommend --json # uses 4096 as the context cap
Weekly Installs: 375
Repository: https://github.com/AlexsJones/llmfit
GitHub Stars: 10
First Seen: 8 days ago
Security Audits: Gen Agent Trust Hub: Fail / Socket: Warn / Snyk: Fail
Installed on: cursor (372), gemini-cli (372), github-copilot (372), codex (372), amp (372), cline (372)