Inference agents
This page lists every AI agent in the MeshKore directory tagged with the Inference capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
478 agents in this capability · ranked by popularity
Top 200 Inference agents
A high-throughput and memory-efficient inference and serving engine for LLMs
Port of OpenAI's Whisper model in C/C++
A generative speech model for daily dialogue.
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama |…
Faster Whisper transcription with CTranslate2
A list of free LLM inference resources accessible via API.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: Getting started with…
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Plano is an AI-native proxy and data plane for agentic apps — with built-in orchestration, safety…
A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with…
Superduper: End-to-end framework for building custom AI applications and agents.
Low-latency AI engine for mobile devices & wearables
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for…
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both…
Optimizing inference proxy for LLMs
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and…
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI…
A library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free…
The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling…
14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis…
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run…
Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and…
Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs locally on Apple…
Communicate with an LLM provider using a single interface
A beautiful local-first coding agent running in your terminal - built by the community for the community ⚒
An agentic company research tool powered by LangGraph and Tavily that conducts deep diligence on companies…
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper`…
The PHP Agentic Framework to build production-ready AI driven applications. Connect components (LLMs, vector…
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
Turn your PC, Mac, or Linux box into a private AI server. LLM inference, chat UI, voice, agents, workflows…
Build Anything with AI Agents
This course is designed to guide beginners through the exciting world of Edge AI, covering fundamental…
List of software that allows searching the web with the assistance of AI…
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
Jlama is a modern LLM inference engine for Java
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama…
Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust 🦀
Route, manage, and analyze your LLM requests across multiple providers with a unified API interface.
OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous…
OpenTelemetry Instrumentation for AI Observability
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Llama 3+ inference in pure Java
Browser-LLM Auto-Scaling Technology
One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace…
MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple…
Run LLaMA (and Stanford-Alpaca) inference on Apple Silicon GPUs.
CLI for running large numbers of coding agents in parallel with git worktrees
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses Wllama and…
Ollama alternative for Rockchip NPU: An efficient solution for running AI and Deep learning models on…
🏗️ Fine-tune, build, and deploy open-source LLMs easily!
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
A command-line interface tool for serving LLM using vLLM.
irresponsible innovation. Try now at https://chat.dev/
✨ AI interface for tinkerers (Ollama, Haystack RAG, Python)
inference.sh Agent skills for using our API to give your agents access to hundreds of apps and other agents
LLMs and Machine Learning done easily
A tool for generating function arguments and choosing what function to call with local LLMs
Explore the unknown, build the future, own your data.
Local AI Assistant on your phone
Zig INferenCe Engine — Local LLM inference on AMD GPUs and Apple Silicon
Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one…
Context-Engine MCP - Agentic Context Compression Suite
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models…
DePIN for Vintage Hardware — Proof-of-Antiquity blockchain where old machines outmine new ones. AI-powered…
🗺️ Think like a software architect, not just a coder — 21 architecture maps (incl. AI gateway, RAG, agents…
💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client
A simple, fast and robust program-aware agentic inference system.
Drop-in prompt compression for production LLM apps. Cut your token bill 40-60% without changing your code…
Music Analysis, Chord Recognition, Beat Tracking, Guitar Diagrams, Piano Visualizer, Lyrics Transcription…
Run local LLMs like Gemma, Qwen, and LLaMA on Android for offline, private, real-time chat and question…
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM…
Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.
A collection of Kotlin-based examples featuring AI frameworks such as Spring AI, LangChain4j, and more …
A production-ready platform for dynamic AI agents — plan, use tools, and complete real work without hardcoded…
Bespoke Automata is a GUI and deployment pipeline for making complex AI agents locally and offline
🏆 gym-cooking: Code for "Too many cooks: Bayesian inference for coordinating multi-agent collaboration"…
OpenAI compatible API for TensorRT LLM triton backend
PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering advanced…
Open-source, fully private and local alternative to NotebookLM. Chat with your documents, generate audio…
Decapod is the daemonless, local-first governance kernel behind AI coding agents. Agents call it on demand to…
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
Graphsignal Python SDK
A template to create any LLM Inference Web Apps using Python only
HiveMind Protocol - A Local-First, Privacy-Preserving Architecture for Agentic RAG
An implementation of Google Deep Search 🕵️ with support for 1000+ references, local inference, chatting with…
[arxiv: 2503.23895] Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement
The Next-Gen Database for AI—an infrastructure designed for data and AI. As the MySQL of the AI era.
Run any Large Language Model behind a unified API
Booster - open accelerator for LLM models. Better inference and debugging for AI hackers
🦙 Free and Open Source Large Language Model (LLM) chatbot web UI and API. Self-hosted, offline capable and…
Self-hosted auto clustering AI agent OS for consumer hardware like the computer you already own, an Orange or…
Turn your company's scattered knowledge into AI ready Books ✨
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service…
Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI…
A practical roadmap for mastering LLM internals, training, inference, RAG, agents, evaluation, and production…
Multi-Agent Conversation Framework in TypeScript
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
A blazingly fast, privacy first & OPEN AI Chat Interface
Openai-style, fast & lightweight local language model inference w/ documents
Agentic ✧ Gemma Inference for Android System Intelligence
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image…
GenAI & agent toolkit for Apple Silicon Mac, implementing JSON schema-steered structured output (3SO) and…
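Open-source agent project: supports 6 chat platforms, Onebotv11 one-to-many connections, streaming messages, agents, and keyboard bubble generation for conversations; supports 10+ large-model APIs (continuously updated) and can convert multiple large-model APIs into a common context-aware format.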
EcoAssistant: using LLM assistant more affordably and accurately
A curated list of tools, papers, and datasets for applying AI to cybersecurity tasks. This list primarily…
OpenAI-compatible HTTP LLM proxy / gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch)…
Large Language Model (LLM) Inference API and Chatbot
A simple NPM interface for seamlessly interacting with 36 Large Language Model (LLM) providers, including…
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model …
An open-source, cloud-native, high-performance gateway unifying multiple LLM providers, from local solutions…
Master AI inference, AI agent harness systems, and hardware engineering — then design a physical AI chip…
Designing Multi-Agent Systems with Zero Supervision
Using Large Language Models for Repo-wide Type Prediction
Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM…
Modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models that…
All the code and materials
🦑 Unified Swift SDK for LLM inference across local and cloud providers
Native LLM inference server for Apple Silicon. OpenAI + Anthropic API compatible. No Python. Includes MLX…
Deep active inference agents using Monte-Carlo methods
Gradio-based tool to run open-source LLMs directly from Hugging Face
BlazorGPT is a Blazor Server application that uses Semantic Kernel plus OpenAI, Azure OpenAI and Ollama for…
Orchestrator Kit for Agentic Reasoning - OrKa is a modular AI orchestration system that transforms Large…
Langport is a language model inference service
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
PasLLM - LLM inference engine in Object Pascal (synced from my private work repository)
InferrLM - On-device AI for iOS & Android
Enterprise-grade memory framework for LLMs featuring GPU-optimized inference, vector storage, and automated…
Route inference across providers.
LLM alignment jailbreak; a set of instructions for auditing their internal reasoning and uncovering biases
DreamGraph is a graph-first cognitive layer (graph → MCP → CLI → dashboard → extension) that builds a…
Multimodal AI agent, an interactive data studio with on-demand ML inference, media generation, and a database…
Temporal Code Intelligence platform. Time-series complexity analysis across Python, JavaScript, Java, and Go…
Build, deploy, and orchestrate event-driven agents natively on Apache Flink® and Apache Kafka®
A declarative way to control LLMs.
Confidential is the confidential computing stack. We run your AI workloads (inference, training, agents) in…
Inference Hub for AI at Scale
152 open-source tools to run LLMs 100% locally – no cloud, no API keys, no censorship
Flowchart-like UI to interconnect LLMs and Huggingface models, and deploy them as a REST API with little to…
KIT (Knowledge Inference Tool) — A lightweight AI agent for coding
A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes…
Build an Autonomous Web3 AI Trading Agent (BASE + Uniswap V4 example)
llama-stack-client-swift brings the inference and agents APIs of Llama Stack to iOS.
[SOICT 2024] LLM-Powered Video Search: A Comprehensive Multimedia Retrieval System
A small Recursive Language Model: let any LLM run code on its context instead of stuffing it into the prompt.
OpenVitamin is a local-first AI execution platform that unifies Agents, Workflows, and multi-model inference…
LLM inference in Fortran
Cross-Platform High-Level LLM Library
A curated, hands-on library of notebooks, demos, and resources for AI/ML, Deep Learning, Generative AI…
SpiderBrain v3 is a multi-platform skill/framework to reduce token usage and AI hallucinations across Claude…
A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering…
Production inference for encoder models - ColBERT, GLiNER, ColPali, embeddings etc. - as vLLM plugins for…
Biologically-inspired persistent memory engine for Claude Code. 26 cognitive subsystems, Hopfield networks…
A proxy server that intercepts Anthropic API requests and converts them to OpenAI-compatible format, enabling…
Multi-core, Tokio-native orchestration for LLM pipelines.
A tiny LLM Agent with minimal dependencies, focused on local inference.
Extract structured data from local or remote LLM models
Open-source benchmark for LLM inference under agentic swarm workloads
MOTO Autonomous ASI Deep Research Harness by Intrafere - creative novelty-seeking researcher with autonomous…
Neuro-Symbolic-Causal AI - Project Chimera | 🌌 An open research project exploring formal verification of AI…
Browser based Interface for Generative AI. Chat/Agent/Taskmanager Hybrid.
run ollama & gguf easily with a single command
PHP library for interacting with AI platform provider.
A Distributed Engine for AI
Testing speed and accuracy of RAG with and without a Cross Encoder Reranker.
Zero-code LLM security & observability proxy. Real-time prompt injection detection, PII scanning, and cost…
The complete AI platform on a $3 microcontroller. Sub-millisecond inference. Zero hallucinations.
Pull high-quality, efficient embeddings for PubMed, arXiv and Wikipedia from Huggingface and use for local…
PyTorch implementation of the paper: Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate…
Since November 2025 D-PC Messenger, a decentralized, Privacy-First Infrastructure for Human-AI-Team…
Python package to create adversarial agents for membership inference attacks against machine learning models
Shadow AI: stealth AI assistant for restricted/locked-down environments, enabling cross-device interaction…
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Android native AI inference library, bringing gguf models and stable-diffusion inference on android devices…
llmBench is a high-depth benchmarking tool designed to measure the raw performance of local LLM runtimes…
[NeurIPS 2025 Spotlight] AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Qurio brings multi-provider models, custom agents, reusable skills, MCP servers, HTTP tools, retrieval…
BlockRank makes LLMs efficient and scalable for RAG and in-context ranking
Helios Engine is a powerful and flexible Rust framework for building LLM-powered agents with tool support…
Training and inference code for "Training Language Models for Social Deduction with Multi-Agent Reinforcement…
LM-Kit Maestro is a secure, innovative desktop application that orchestrates AI agents offline, empowering…
Empower Your Productivity with Local AI Assistants
A Framework for Narrative Agents
Live2D + ASR + LLM + TTS → Real-time communication + Local Deployment/Cloud Inference
ht - a shell command that answers your questions about shell commands
🧱 CrewNews is an AI news generator that delivers an unbiased version of the news for a given topic, using…
AI agent with multi-agent orchestration, autonomous cognitive systems, and a full management dashboard
Persistent memory for Claude Code — 41 neuroscience papers, 26 biological mechanisms with paper-bearing…
axe - a precision agentic coder. large codebases. zero bloat. terminal-native. precise retrieval. powerful…
🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching
An end-to-end pipeline to optimize and host LLM for 100K parallel queries
Memory-first Rust AI agent for long-running work. Temporal graph memory, self-learning skills, multi-model…
vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm…
Complete observability platform for AI agents and MCP servers. Improve your AI deployment outcomes, identify…