vLLM agents

This page lists every AI agent in the MeshKore directory tagged with the vLLM capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile, which details its capabilities, framework, language, freshness, and source attribution.

83 agents in this capability · ranked by popularity

Top 83 vLLM agents

open-agents-ai— ★

AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop

omnius— ★

AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop

ado-vllm-performance— ★

vLLM performance testing actuator for ado

agentic-swarm-bench— ★

Open-source benchmark for LLM inference on agentic scenarios

batch-agent— ★

High-throughput parallel LLM agent execution with tool deduplication, structured output, and self-hosted…

cryptotensors-koalavault-vllm— ★

KoalaVault Key Provider for CryptoTensors - Secure key management for encrypted model deployment with vLLM

freeagent-sdk— ★

Local-first AI agent framework. Built for models that aren't perfect.

guidellm— ★

Guidance platform for deploying and managing large language models.

happy-vllm— ★

happy_vllm is a REST API for vLLM, production ready

hexonit-llm— ★

Ultra-fast local LLM inference with zero-config hardware-optimized speculative decoding.

inferencebench-llm— ★

LLM inference plugin for InferenceBench Suite (vLLM in Phase 1; SGLang/TRT-LLM/llama.cpp/MLX in Phase 2+)

installm— ★

One-command deployment of OpenAI-compatible APIs for open-source LLMs

lagents— ★

L'Agent - Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum…

langextract-vllm— ★

LangExtract provider plugin for vLLM

light-vllm-ascend-security— ★

Model encryption and authorization extension for vLLM 0.17.0+ on Ascend NPU

light-vllm-security— ★

Model encryption and authorization extension for vLLM 0.18.0+

lightr-vllm-core— ★

Core encryption and license components for vLLM model security

llm-cal— ★

LLM inference hardware calculator — architecture-aware, engine-version-aware, honest-labeled.

llm-grill— ★

CLI for benchmarking LLM inference servers (vLLM, SGLang, llama.cpp)

llm-host— ★

OpenAI-compatible inference server: Llama 3.1 8B + Whisper + Kokoro TTS exposed via ngrok

llm-katan— ★

One tiny model, every LLM API. Drop-in test server for OpenAI, Anthropic, Bedrock, and Vertex.

llm-speed— ★

Benchmark any LLM on any hardware. CLI for the llm-speed.com flywheel.

llmflux— ★

CLI tool for running LLM batch processing jobs on HPC systems

llmq— ★

todo

local-agents— ★

Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum simplicity.

mini-vllm— ★

An educational implementation of an inference engine

mvllm— ★

A FastAPI-based load balancer for vLLM servers with OpenAI-compatible API

ollama-aid— ★

Ollama model management, trends viewing, testing & external runner assistant

openllm— ★

OpenLLM: Self-hosting LLMs Made Easy.

ovllm— ★

One-line vLLM wrapper with gorgeous DSPy integration

pdf4vllm-mcp— ★

Block-based PDF extraction MCP server optimized for LLM consumption

pegaflow-llm— ★

High-performance key-value storage engine with Python bindings

pegaflow-llm-cu13— ★

High-performance key-value storage engine with Python bindings

refract-llm— ★

REFRACT — Reference-anchored Robust Acid-test for Compressed Transformers. Multi-axis KV-cache fidelity…

sam-agent— ★

SAM — Smart Agentic Model: CLI coding agent for open-source LLMs

smol-vllm— ★

From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption

thaw-vllm— ★

The fork primitive for LLM inference. Snapshot a running session — weights + KV cache + scheduler state — and…

turboquant-plus-vllm— ★

TurboQuant+ compression for vLLM. 4.3x weight compression + 3.7x KV cache, zero calibration.

turboquant-vllm— ★

TurboQuant KV cache compression for vLLM — fused Triton kernels, 3.76x compression, 3.7x faster decode on RTX…

vllm-benchmark-suite— ★

The most comprehensive benchmarking suite for vLLM inference servers

vllm-canon— ★

vLLM plugin: out-of-tree registration of canon-layer architectures (e.g. LlamaCanonForCausalLM from…

vllm-cluster-manager— ★

Deploy, manage, and monitor vLLM instances across a GPU cluster from a single web dashboard.

vllm-doctor— ★

Diagnostic tool for vLLM inference servers

vllm-efficient-client— ★

A unified interface for efficient LLM inference with vLLM and OpenAI-compatible APIs

vllm-htop— ★

htop-style terminal monitor for vLLM inference servers

vllm-iter— ★

Iterable-based offline generation helpers for vLLM.

vllm-judge— ★

LLM-as-a-Judge evaluations for vLLM hosted models

vllm-manager— ★

Multi-instance vLLM cluster orchestration and log management

vllm-marconi-offload— ★

Two-tier (RAM + SSD) KV cache offload connector for vLLM with Marconi-style reuse-aware eviction.

vllm-mcp-server— ★

MCP server for vLLM - expose vLLM capabilities to AI assistants

vllm-metal— ★

vLLM hardware plugin for Apple Silicon - unifies MLX and PyTorch under a single lowering path

vllm-mlx— ★

vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac

vllm-mon— ★

Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations

vllm-musa— ★

vLLM platform plugin for Moore Threads MUSA GPUs

vllm-omni— ★

A framework for efficient model inference with omni-modality models

vllm-playground— ★

A web interface for managing and interacting with vLLM servers

vllm-rs— ★

A minimal, high-performance large language model (LLM) inference engine implementing vLLM in Rust.

vllm-semantic-router-bench— ★

Comprehensive benchmark suite for semantic router vs direct vLLM evaluation across multiple reasoning datasets

vllm-sr— ★

vLLM Semantic Router - Intelligent routing for Mixture-of-Models

vllm-sr-sim— ★

vLLM Semantic Router fleet simulator for capacity planning, SLO validation, and what-if analysis

vllm-swift— ★

vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon

vllm-top— ★

A monitoring tool for vLLM metrics.

vllm-tuner— ★

A Python package for tuning vLLM hyperparameters.

vllm-usf— ★

vLLM-USF: A high-throughput and memory-efficient inference engine for LLMs (USF Custom Build)

vllm-wizard— ★

CLI tool for vLLM configuration generation and GPU sizing

vllmocr— ★

OCR using LLMs

terradev-mcp— ★

Complete Agentic GPU Infrastructure for Claude Code — 192 MCP tools: Full training lifecycle, inference…

pydantic-ai-speech— ★

Add TTS and STT capability to a Pydantic AI agent

vllmpytop— ★

A btop-style terminal UI for monitoring a vLLM instance and its GPU in real time.

llm-eval-framework— ★

Lightweight evaluation framework that unifies inference through a single VLLM sampler and runs IF-EVAL…

openai-rq— ★

OpenAI-compatible client + worker that bridge inference requests over a Redis queue (Redis Streams), for…

apohara-vllm-plugin— ★

Apohara ContextForge plugin for vLLM V1 — multi-agent KV-cache coordination, JCR Safety Gate (INV-15)…

vllm-metrics-monitor— ★

Lightweight monitoring dashboard for vLLM inference servers

newt-agent-py— ★

Python bindings for newt-agent — agentic coder runtime (newt-core, newt-tools, newt-coder, newt-eval…

open-vllm-sdk— ★

Enterprise-grade resilient vLLM client network and preflight validation engine.

semblend-vllm-connector— ★

Out-of-tree vLLM KVConnector for SemBlend semantic KV donor discovery

vllm-grpc-client— ★

Lean async gRPC client for the vllm-grpc frontend

vllm-grpc-frontend— ★

gRPC frontend server exposing vLLM's V1 engine over protobuf/gRPC

vllm-grpc-gen— ★

Generated protobuf/gRPC stubs for the vllm-grpc frontend, proxy, and client

vllm-grpc-proxy— ★

REST-to-gRPC proxy server for the vllm-grpc frontend

agent-usage-manager— ★

htop for AI agents — liveness, CPU/mem/GPU usage, and a kill switch for headless agents (openclaw, hermes…

jina-v4-vllm-plugin— ★

vLLM out-of-tree model: Jina Embeddings v4 multi-vector (token_embed) on Qwen2.5-VL

llm-jp-vllm— ★

vLLM plugins for LLM-jp models
