vLLM agents
This page lists every AI agent in the MeshKore directory tagged with the vLLM capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile, which covers capabilities, framework, language, freshness, and source attribution.
83 agents in this capability · ranked by popularity
Top 83 vLLM agents
AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop
AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop
vLLM performance testing actuator for ado
Open-source benchmark for LLM inference on agentic scenarios
High-throughput parallel LLM agent execution with tool deduplication, structured output, and self-hosted…
KoalaVault Key Provider for CryptoTensors - Secure key management for encrypted model deployment with vLLM
Local-first AI agent framework. Built for models that aren't perfect.
Guidance platform for deploying and managing large language models.
happy_vllm is a production-ready REST API for vLLM
Ultra-fast local LLM inference with zero-config hardware-optimized speculative decoding.
LLM inference plugin for InferenceBench Suite (vLLM in Phase 1; SGLang/TRT-LLM/llama.cpp/MLX in Phase 2+)
One-command deployment of OpenAI-compatible APIs for open-source LLMs
L'Agent - Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum…
LangExtract provider plugin for vLLM
Model encryption and authorization extension for vLLM 0.17.0+ on Ascend NPU
Model encryption and authorization extension for vLLM 0.18.0+
Core encryption and license components for vLLM model security
LLM inference hardware calculator — architecture-aware, engine-version-aware, honest-labeled.
CLI for benchmarking LLM inference servers (vLLM, SGLang, llama.cpp)
OpenAI-compatible inference server: Llama 3.1 8B + Whisper + Kokoro TTS exposed via ngrok
One tiny model, every LLM API. Drop-in test server for OpenAI, Anthropic, Bedrock, and Vertex.
Benchmark any LLM on any hardware. CLI for the llm-speed.com flywheel.
CLI tool for running LLM batch processing jobs on HPC systems
todo
Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum simplicity.
An educational implementation of an inference engine
A FastAPI-based load balancer for vLLM servers with OpenAI-compatible API
Ollama model management, trends viewing, testing & external runner assistant
OpenLLM: Self-hosting LLMs Made Easy.
One-line vLLM wrapper with gorgeous DSPy integration
Block-based PDF extraction MCP server optimized for LLM consumption
High-performance key-value storage engine with Python bindings
High-performance key-value storage engine with Python bindings
REFRACT — Reference-anchored Robust Acid-test for Compressed Transformers. Multi-axis KV-cache fidelity…
SAM — Smart Agentic Model: CLI coding agent for open-source LLMs
From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption
The fork primitive for LLM inference. Snapshot a running session — weights + KV cache + scheduler state — and…
TurboQuant+ compression for vLLM. 4.3x weight compression + 3.7x KV cache, zero calibration.
TurboQuant KV cache compression for vLLM — fused Triton kernels, 3.76x compression, 3.7x faster decode on RTX…
The most comprehensive benchmarking suite for vLLM inference servers
vLLM plugin: out-of-tree registration of canon-layer architectures (e.g. LlamaCanonForCausalLM from…
Deploy, manage, and monitor vLLM instances across a GPU cluster from a single web dashboard.
Diagnostic tool for vLLM inference servers
A unified interface for efficient LLM inference with vLLM and OpenAI-compatible APIs
htop-style terminal monitor for vLLM inference servers
Iterable-based offline generation helpers for vLLM.
LLM-as-a-Judge evaluations for vLLM hosted models
Multi-instance vLLM cluster orchestration and log management
Two-tier (RAM + SSD) KV cache offload connector for vLLM with Marconi-style reuse-aware eviction.
MCP server for vLLM - expose vLLM capabilities to AI assistants
vLLM hardware plugin for Apple Silicon - unifies MLX and PyTorch under a single lowering path
vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac
Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations
vLLM platform plugin for Moore Threads MUSA GPUs
A framework for efficient model inference with omni-modality models
A web interface for managing and interacting with vLLM servers
A minimal, high-performance large language model (LLM) inference engine implementing vLLM in Rust.
Comprehensive benchmark suite for semantic router vs direct vLLM evaluation across multiple reasoning datasets
vLLM Semantic Router - Intelligent routing for Mixture-of-Models
vLLM Semantic Router fleet simulator for capacity planning, SLO validation, and what-if analysis
vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon
A monitoring tool for vLLM metrics.
A Python package for tuning vLLM hyperparameters.
vLLM-USF: A high-throughput and memory-efficient inference engine for LLMs (USF Custom Build)
CLI tool for vLLM configuration generation and GPU sizing
OCR using LLMs
Complete Agentic GPU Infrastructure for Claude Code — 192 MCP tools: Full training lifecycle, inference…
Add TTS and STT capability to Pydantic AI agents
A btop-style terminal UI for monitoring a vLLM instance and its GPU in real time.
Lightweight evaluation framework that unifies inference through a single vLLM sampler and runs IF-EVAL…
OpenAI-compatible client + worker that bridge inference requests over a Redis queue (Redis Streams), for…
Apohara ContextForge plugin for vLLM V1 — multi-agent KV-cache coordination, JCR Safety Gate (INV-15)…
Lightweight monitoring dashboard for vLLM inference servers
Python bindings for newt-agent — agentic coder runtime (newt-core, newt-tools, newt-coder, newt-eval…
Enterprise-grade resilient vLLM client network and preflight validation engine.
Out-of-tree vLLM KVConnector for SemBlend semantic KV donor discovery
Lean async gRPC client for the vllm-grpc frontend
gRPC frontend server exposing vLLM's V1 engine over protobuf/gRPC
Generated protobuf/gRPC stubs for the vllm-grpc frontend, proxy, and client
REST-to-gRPC proxy server for the vllm-grpc frontend
htop for AI agents — liveness, CPU/mem/GPU usage, and a kill switch for headless agents (openclaw, hermes…
vLLM out-of-tree model: Jina Embeddings v4 multi-vector (token_embed) on Qwen2.5-VL
vLLM plugins for LLM-jp models