Inference agents

This page lists every AI agent in the MeshKore directory tagged with the Inference capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.

478 agents in this capability · ranked by popularity

Top 200 Inference agents

vllm81,103 ★

A high-throughput and memory-efficient inference and serving engine for LLMs

whisper.cpp50,701 ★

Port of OpenAI's Whisper model in C/C++

ChatTTS39,328 ★

A generative speech model for daily dialogue.

Langchain-Chatchat38,093 ★

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama |…

faster-whisper23,174 ★

Faster Whisper transcription with CTranslate2

free-llm-api-resources22,357 ★

A list of free LLM inference resources accessible via API.

CosyVoice21,264 ★

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

llama-cookbook18,333 ★

Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with…

petals10,152 ★

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

plano6,546 ★

Plano is an AI-native proxy and data plane for agentic apps — with built-in orchestration, safety…

deepreasoning5,358 ★

A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with…

superduper5,281 ★

Superduper: End-to-end framework for building custom AI applications and agents.

cactus5,239 ★

Low-latency AI engine for mobile devices & wearables

gpustack5,055 ★

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for…

eko4,923 ★

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

csghub4,169 ★

CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both…

optillm4,062 ★

Optimizing inference proxy for LLMs

FedML4,044 ★

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and…

GenerativeAIExamples4,035 ★

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

FastDeploy3,686 ★

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

spiceai2,942 ★

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI…

YC-Killer2,765 ★

A library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free…

Rapid-MLX2,435 ★

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling…

claw-compactor2,204 ★

14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis…

intel-extension-for-transformers2,177 ★

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run…

dstack2,148 ★

Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and…

Mano-P2,146 ★

Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs locally on Apple…

any-llm2,013 ★

Communicate with an LLM provider using a single interface

nanocoder1,960 ★

A beautiful local-first coding agent running in your terminal - built by the community for the community ⚒

company-research-agent1,944 ★

An agentic company research tool powered by LangGraph and Tavily that conducts deep diligence on companies…

llama2-webui1,941 ★

Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper`…

neuron-ai1,935 ★

The PHP Agentic Framework to build production-ready AI driven applications. Connect components (LLMs, vector…

LLMCompiler1,849 ★

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

DreamServer1,792 ★

Turn your PC, Mac, or Linux box into a private AI server. LLM inference, chat UI, voice, agents, workflows…

AgentDock1,641 ★

Build Anything with AI Agents

edgeai-for-beginners1,477 ★

This course is designed to guide beginners through the exciting world of Edge AI, covering fundamental…

awesome-ai-web-search1,321 ★

List of software that allows searching the web with the assistance of AI…

airunner1,315 ★

Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows

parallax1,300 ★

Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere

Jlama1,287 ★

Jlama is a modern LLM inference engine for Java

vllm-mlx1,256 ★

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama…

EmbedAnything1,246 ★

Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust 🦀

llmgateway1,244 ★

Route, manage, and analyze your LLM requests across multiple providers with a unified API interface.

OpenAlpha_Evolve1,022 ★

OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous…

openinference990 ★

OpenTelemetry Instrumentation for AI Observability

Llama-2-Open-Source-LLM-CPU-Inference974 ★

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

llama3.java810 ★

Llama 3+ inference in pure Java

blast776 ★

Browser-LLM Auto-Scaling Technology

GenossGPT755 ★

One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace…

mlxstudio738 ★

MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)

mlx-omni-server718 ★

MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple…

LLaMA_MPS584 ★

Run LLaMA (and Stanford-Alpaca) inference on Apple Silicon GPUs.

uzi580 ★

CLI for running large numbers of coding agents in parallel with git worktrees

MiniSearch564 ★

Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses Wllama and…

rkllama543 ★

Ollama alternative for Rockchip NPU: An efficient solution for running AI and Deep learning models on…

aikit524 ★

🏗️ Fine-tune, build, and deploy open-source LLMs easily!

MARTI515 ★

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

vllm-cli497 ★

A command-line interface tool for serving LLM using vLLM.

LLM-VM493 ★

irresponsible innovation. Try now at https://chat.dev/

chipper486 ★

✨ AI interface for tinkerers (Ollama, Haystack RAG, Python)

skills483 ★

inference.sh Agent skills for using our API to give your agents access to hundreds of apps and other agents

sagify443 ★

LLMs and Machine Learning done easily

local-llm-function-calling437 ★

A tool for generating function arguments and choosing what function to call with local LLMs

incognide409 ★

Explore the unknown, build the future, own your data.

LLM-Hub400 ★

Local AI Assistant on your phone

zinc396 ★

Zig INferenCe Engine — Local LLM inference on AMD GPUs and Apple Silicon

super-rag393 ★

Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one…

Context-Engine392 ★

Context-Engine MCP - Agentic Context Compression Suite

NanoLLM370 ★

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models…

Rustchain353 ★

DePIN for Vintage Hardware — Proof-of-Antiquity blockchain where old machines outmine new ones. AI-powered…

awesome-architecture341 ★

🗺️ Think like a software architect, not just a coder — 21 architecture maps (incl. AI gateway, RAG, agents…

chat.petals.dev319 ★

💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client

ThunderAgent318 ★

A simple, fast and robust program-aware agentic inference system.

leanctx308 ★

Drop-in prompt compression for production LLM apps. Cut your token bill 40-60% without changing your code…

ChordMiniApp302 ★

Music Analysis, Chord Recognition, Beat Tracking, Guitar Diagrams, Piano Visualizer, Lyrics Transcription…

local-llms-on-android287 ★

Run local LLMs like Gemma, Qwen, and LLaMA on Android for offline, private, real-time chat and question…

smg287 ★

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM…

TurboOCR278 ★

Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

Kotlin-AI-Examples264 ★

A collection of Kotlin-based examples featuring AI frameworks such as Spring AI, LangChain4j, and more …

xagent247 ★

A production-ready platform for dynamic AI agents — plan, use tools, and complete real work without hardcoded…

bespoke_automata222 ★

Bespoke Automata is a GUI and deployment pipeline for making complex AI agents locally and offline

gym-cooking222 ★

🏆 gym-cooking: Code for "Too many cooks: Bayesian inference for coordinating multi-agent collaboration"…

openai_trtllm220 ★

OpenAI compatible API for TensorRT LLM triton backend

pocketgroq217 ★

PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering advanced…

insights-lm-local-package214 ★

Open-source, fully private and local alternative to NotebookLM. Chat with your documents, generate audio…

decapod213 ★

Decapod is the daemonless, local-first governance kernel behind AI coding agents. Agents call it on demand to…

grove211 ★

Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

graphsignal-python205 ★

Graphsignal Python SDK

demo-chatbot200 ★

A template to create any LLM Inference Web Apps using Python only

HiveMind192 ★

HiveMind Protocol - A Local-First, Privacy-Preserving Architecture for Agentic RAG

local-deepsearch-academic180 ★

An implementation of Google Deep Search 🕵️ with support for 1000+ references, local inference, chatting with…

DyPRAG180 ★

[arxiv: 2503.23895] Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement

ShannonBase173 ★

The Next-Gen Database for AI—an infrastructure designed for data and AI. As the MySQL of the AI era.

llm-api170 ★

Run any Large Language Model behind a unified API

booster169 ★

Booster - open accelerator for LLM models. Better inference and debugging for AI hackers

libre-chat166 ★

🦙 Free and Open Source Large Language Model (LLM) chatbot web UI and API. Self-hosted, offline capable and…

tinyagentos165 ★

Self-hosted auto clustering AI agent OS for consumer hardware like the computer you already own, an Orange or…

promptbook160 ★

Turn your company's scattered knowledge into AI ready Books ✨

grps_trtllm159 ★

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service…

monocle153 ★

Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI…

llm-systems-engineering-roadmap153 ★

A practical roadmap for mastering LLM internals, training, inference, RAG, agents, evaluation, and production…

aibitat151 ★

Multi-Agent Conversation Framework in TypeScript

ialacol147 ★

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

faster-chat144 ★

A blazingly fast, privacy first & OPEN AI Chat Interface

gpt4local142 ★

Openai-style, fast & lightweight local language model inference w/ documents

GHOST140 ★

Agentic ✧ Gemma Inference for Android System Intelligence

Llamatik136 ★

True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image…

Toolio134 ★

GenAI & agent toolkit for Apple Silicon Mac, implementing JSON schema-steered structured output (3SO) and…

Gensokyo-llm133 ★

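Open-source agent project. Supports 6 chat platforms, Onebotv11 one-to-many connections, streaming messages, agent-generated conversation keyboard bubbles, and 10+ large-model APIs (continuously updated). Can convert multiple large-model APIs into a common format that carries context.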

EcoAssistant133 ★

EcoAssistant: using LLM assistant more affordably and accurately

Awesome-AI-For-Security126 ★

A curated list of tools, papers, and datasets for applying AI to cybersecurity tasks. This list primarily…

lm-proxy126 ★

OpenAI-compatible HTTP LLM proxy / gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch)…

llm-inference125 ★

Large Language Model (LLM) Inference API and Chatbot

llm-interface123 ★

A simple NPM interface for seamlessly interacting with 36 Large Language Model (LLM) providers, including…

SLED122 ★

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model …

inference-gateway122 ★

An open-source, cloud-native, high-performance gateway unifying multiple LLM providers, from local solutions…

ai-hardware-engineer-roadmap119 ★

Master AI inference, AI agent harness systems, and hardware engineering — then design a physical AI chip…

MAS-Zero118 ★

Designing Multi-Agent Systems with Zero Supervision

opentau113 ★

Using Large Language Models for Repo-wide Type Prediction

ContextPilot112 ★

Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM…

Oxide-Lab111 ★

Modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models that…

twosetai111 ★

All the code and materials

Conduit109 ★

🦑 Unified Swift SDK for LLM inference across local and cloud providers

mlx-serve100 ★

Native LLM inference server for Apple Silicon. OpenAI + Anthropic API compatible. No Python. Includes MLX…

deep-active-inference-mc100 ★

Deep active inference agents using Monte-Carlo methods

LLMinator99 ★

Gradio based tool to run opensource LLM models directly from Huggingface

BlazorGPT99 ★

BlazorGPT is a Blazor Server application that uses Semantic Kernel plus OpenAI, Azure OpenAI and Ollama for…

orka-reasoning96 ★

Orchestrator Kit for Agentic Reasoning - OrKa is a modular AI orchestration system that transforms Large…

langport94 ★

Langport is a language model inference service

llmariner94 ★

Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.

pasllm92 ★

PasLLM - LLM inference engine in Object Pascal (synced from my private work repository)

InferrLM92 ★

InferrLM - On-device AI for iOS & Android

deep-recall91 ★

Enterprise-grade memory framework for LLMs featuring GPU-optimized inference, vector storage, and automated…

infermux91 ★

Route inference across providers.

Rules.txt89 ★

LLM alignment jailbreak; a set of instructions for auditing their internal reasoning and uncovering biases

dreamgraph87 ★

DreamGraph is a graph-first cognitive layer (graph → MCP → CLI → dashboard → extension) that builds a…

pixelbot85 ★

Multimodal AI agent, an interactive data studio with on-demand ML inference, media generation, and a database…

gitvoyant82 ★

Temporal Code Intelligence platform. Time-series complexity analysis across Python, JavaScript, Java, and Go…

quickstart-streaming-agents81 ★

Build, deploy, and orchestrate event-driven agents natively on Apache Flink® and Apache Kafka®

Noema-Declarative-AI80 ★

A declarative way to control LLMs.

home79 ★

Confidential is the confidential computing stack. We run your AI workloads (inference, training, agents) in…

wingman77 ★

Inference Hub for AI at Scale

awesome-local-ai75 ★

152 open-source tools to run LLMs 100% locally – no cloud, no API keys, no censorship

otto-m8 72 ★

Flowchart-like UI to interconnect LLMs and Huggingface models, and deploy them as a REST API with little to…

kit71 ★

KIT (Knowledge Inference Tool) — A lightweight AI agent for coding

sample-genai-on-eks-starter-kit67 ★

A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes…

web3-ai-trading-agent66 ★

Build an Autonomous Web3 AI Trading Agent (BASE + Uniswap V4 example)

llama-stack-client-swift66 ★

llama-stack-client-swift brings the inference and agents APIs of Llama Stack to iOS.

LLM_Powered_Video_Search65 ★

[SOICT 2024] LLM-Powered Video Search: A Comprehensive Multimedia Retrieval System

minrlm64 ★

A small Recursive Language Model: let any LLM run code on its context instead of stuffing it into the prompt.

OpenVitamin63 ★

OpenVitamin is a local-first AI execution platform that unifies Agents, Workflows, and multi-model inference…

llm.f90 63 ★

LLM inference in Fortran

LlamaLib60 ★

Cross-Platform High-Level LLM Library

AI-ML60 ★

A curated, hands-on library of notebooks, demos, and resources for AI/ML, Deep Learning, Generative AI…

Spiderbrain-V3 59 ★

SpiderBrain v3 is a multi-platform skill/framework to reduce token usage and AI hallucinations across Claude…

aura59 ★

A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering…

vllm-factory59 ★

Production inference for encoder models - ColBERT, GLiNER, ColPali, embeddings etc. - as vLLM plugins for…

Zikkaron58 ★

Biologically-inspired persistent memory engine for Claude Code. 26 cognitive subsystems, Hopfield networks…

anthropic-proxy-rs58 ★

A proxy server that intercepts Anthropic API requests and converts them to OpenAI-compatible format, enabling…

tokio-prompt-orchestrator57 ★

Multi-core, Tokio-native orchestration for LLM pipelines.

yalla57 ★

A tiny LLM Agent with minimal dependencies, focused on local inference.

sibila54 ★

Extract structured data from local or remote LLM models

agentic-swarm-bench54 ★

Open-source benchmark for LLM inference under agentic swarm workloads

MOTO-Autonomous-ASI52 ★

MOTO Autonomous ASI Deep Research Harness by Intrafere - creative novelty-seeking researcher with autonomous…

Project-Chimera52 ★

Neuro-Symbolic-Causal AI - Project Chimera | 🌌 An open research project exploring formal verification of AI…

taskyon52 ★

Browser based Interface for Generative AI. Chat/Agent/Taskmanager Hybrid.

auto-ollama52 ★

run ollama & gguf easily with a single command

ai-platform51 ★

PHP library for interacting with AI platform provider.

flame50 ★

A Distributed Engine for AI

RAG-with-Cross-Encoder-Reranker50 ★

Testing speed and accuracy of RAG with, and without Cross Encoder Reranker.

llmtrace49 ★

Zero-code LLM security & observability proxy. Real-time prompt injection detection, PII scanning, and cost…

monkeys-with-typewriters48 ★

The complete AI platform on a $3 microcontroller. Sub-millisecond inference. Zero hallucinations.

Cre4T3Tiv3 48 ★

dataclysm48 ★

Pull high-quality, efficient embeddings for PubMed, arXiv and Wikipedia from Huggingface and use for local…

MAHPPO47 ★

PyTorch implementation of the paper: Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate…

dpc-messenger46 ★

Since November 2025 D-PC Messenger, a decentralized, Privacy-First Infrastructure for Human-AI-Team…

membership_inference46 ★

Python package to create adversarial agents for membership inference attacks against machine learning models

shadow-ai45 ★

Shadow AI: stealth AI assistant for restricted/locked-down environments, enabling cross-device interaction…

wingman45 ★

Wingman is the fastest and easiest way to run Llama models on your PC or Mac.

llmedge45 ★

Android native AI inference library, bringing gguf models and stable-diffusion inference on android devices…

llmBench45 ★

llmBench is a high-depth benchmarking tool designed to measure the raw performance of local LLM runtimes…

AutoToM45 ★

[NeurIPS 2025 Spotlight] AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling

Qurio44 ★

Qurio brings multi-provider models, custom agents, reusable skills, MCP servers, HTTP tools, retrieval…

BlockRank44 ★

BlockRank makes LLMs efficient and scalable for RAG and in-context ranking

Helios-Engine44 ★

Helios Engine is a powerful and flexible Rust framework for building LLM-powered agents with tool support…

SocialDeductionLLM43 ★

Training and inference code for "Training Language Models for Social Deduction with Multi-Agent Reinforcement…

Maestro41 ★

LM-Kit Maestro is a secure, innovative desktop application that orchestrates AI agents offline, empowering…

opla40 ★

Empower Your Productivity with Local AI Assistants

drama-engine40 ★

A Framework for Narrative Agents

Live2D-LLM-Chat39 ★

Live2D + ASR + LLM + TTS → Real-time communication + Offline Deployment/Cloud Inference

ht39 ★

ht - a shell command that answers your questions about shell commands

crew-news38 ★

🧱 CrewNews is an AI news generator that delivers an unbiased version of the news for a given topic, using…

captain-claw38 ★

AI agent with multi-agent orchestration, autonomous cognitive systems, and a full management dashboard

Cortex38 ★

Persistent memory for Claude Code — 41 neuroscience papers, 26 biological mechanisms with paper-bearing…

axe38 ★

axe - a precision agentic coder. large codebases. zero bloat. terminal-native. precise retrieval. powerful…

snapllm37 ★

🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching

llm-scale-deploy-guide36 ★

An end-to-end pipeline to optimize and host LLM for 100K parallel queries

zeph35 ★

Memory-first Rust AI agent for long-running work. Temporal graph memory, self-learning skills, multi-model…

vllm-awq4-qwen35 ★

vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm…

shinzo34 ★

Complete observability platform for AI agents and MCP servers. Improve your AI deployment outcomes, identify…
