Image & Vision agents
This page lists every AI agent in the MeshKore directory tagged with the Image & Vision category. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
1,815 agents in this category · ranked by popularity
Top 200 Image & Vision agents
AI generates natively editable PPTX from any document — real PowerPoint shapes with native animations, not…
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Open-source components, blocks, and AI agents designed to speed up your workflow. Import them seamlessly into…
Replace port numbers with stable, named local URLs. For humans and agents.
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated…
Deep Learning and Reinforcement Learning Library for Scientists and Engineers
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate…
The agent-native LLM router for OpenClaw. 41+ models, <1ms routing, USDC payments on Base & Solana via x402.
🌻 One-click access to your own ChatGPT and many AI web services
AI Product Design Agent - Open Source
ConardLi's open-source Skills collection, featuring web design, knowledge retrieval, image generation, and…
🐬DeepChat - A smart assistant that connects powerful AI to your personal world
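The best Chinese edition of Google's new book Agentic Design Patterns, continuously improved. Includes online reading plus PDF and EPUB e-book downloads.
[🔞🔞🔞 Contains images unsuitable for minors] AI explorations and summaries built on my strengths in programming, drawing, and writing: StableDiffusion is a powerful image generation model that can generate new images by evolving an existing image. ChatGPT…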
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent…
end to end app store screenshot creation using AI
Kode CLI — Design for post-human workflows. One unit agent for every human & computer task.
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps…
A collection of agent skills for CAD, robotics and hardware design
A White-Box Guide to Building Large Models: a fully hand-built Tiny-Universe
The UI design language and React library for Conversational UI
Riona Ai Agent 🌸 is built using Node.js and TypeScript 🛠️, designed for seamless job execution 📸. It's…
[Three Years of Interviews, Five Years of Mock Exams] Interview handbook for AIGC/LLM/AI Agent algorithm engineers. Covers AIGC, LLMs, AI…
The visual feedback tool for agents.
ChatGPT + DALL-E + WhatsApp = AI Assistant 🚀 🤖
🦖 Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time…
Implementation of 17+ agentic architectures designed for practical use across different stages of AI system…
Generate, animate and schedule your AI characters 🤖
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as…
🌊 AChat - An open-source/self-hosted/local-first AI platform, designed for enterprises and teams, perfectly…
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it…
LSP-AI is an open-source language server that serves as a backend for AI-powered functionality, designed to…
🌊 A Human-in-the-Loop workflow for creating HD images from text
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization…
A playground to generate images from any text prompt using Stable Diffusion (past: using DALL-E Mini)
A secure persistent personal agent server in Rust. One binary, sandboxed execution, multi-provider LLMs…
Generate images with NovelAI | A drawing bot based on NovelAI
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
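Open-source video generation workbench driven by AI agents: novel → character/scene/prop design → script → storyboard → video, with characters and scenes kept consistent across shots | Open-source AI video workspace powered by AI…
🤖📐 An agent designed for mathematical modeling that completes the modeling automatically and generates a complete, ready-to-submit paper. | An Agent Designed for Mathematical Modeling, Automatically…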
PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
🍭 Lobe UI - an open-source UI component library for building AIGC web apps
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
DingTalk Workspace is an officially open-sourced cross-platform CLI tool from DingTalk. It unifies DingTalk’s…
Supercharged experience for multiple models such as ChatGPT, DALL-E and Stable Diffusion.
Free and open-source, easy-to-use eCommerce platform based on Laravel. It supports multiple…
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok…
Turn your PC, Mac, or Linux box into a private AI server. LLM inference, chat UI, voice, agents, workflows…
Agentic Design Patterns
🚀 LangGraph for Java. A library for developing AI agentic architectures in the Java ecosystem. Designed to work…
Generate images from texts. In Russian
Supports Telegram, Discord, Slack, Lark (Feishu), DingTalk, WeCom, QQ, and WeChat; compatible with various LLMs including OpenAI…
ROSA 🤖 is an AI Agent designed to interact with ROS1- and ROS2-based robotics systems using natural language…
A simple yet powerful agent framework for personal assistants, designed to enable intelligent interaction…
JS reverse engineering MCP server designed for AI agents, with built-in anti-detection, rebuilt on chrome-devtools-mcp | JS reverse engineering MCP server with…
This course is designed to guide beginners through the exciting world of Edge AI, covering fundamental…
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and…
[EMNLP 2025 Oral] MemoryOS is designed to provide a memory operating system for personalized AI agents.
Easily select and manage your preferred AI digital assistants on Android.
It's not AI that takes away your job, but the people who master the use of AI tools. The most deadly attack…
Application implementation with business use cases for safely utilizing generative AI in business operations
The TypeScript library for building AI applications.
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
An AI-powered interactive avatar engine using Live2D, LLM, ASR, TTS, and RVC. Ideal for VTubing, streaming…
Agent-MCP is a framework for creating multi-agent systems that enables coordinated, efficient AI…
A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization…
Build and Deploy a Full Stack MERN AI Image Generation App MidJourney & DALL E Clone
Build your own Cowork, AI Scientist and other SoTA Agents just by editing config files. Support anthropic…
A Python-based lightweight robot simulator designed for navigation, control, and learning
Super AI Brain: built on a SpringCloud microservice architecture and already integrated with GPT-3.5, GPT-4.0, Baidu ERNIE Bot, stable diffusion…
WebRover is an autonomous AI agent designed to interpret user input and execute actions by interacting with…
Autonomous self-evolving agents. Vision-grounded layered memory and self-written skills for LLM agents that…
open-source framework for creating and managing simulations populated with AI-powered agents. It provides an…
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports…
Awesome AI Memory | LLM Memory | A curated knowledge base on AI memory for LLMs and agents, covering…
🤖 Components Library for Quickly Building LLM Chat Interfaces.
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design
A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.
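[Agent mode added] All-scenario GPT assistant for Android that can be summoned with the volume keys for voice conversation; supports web access, photo capture, templates, attachment parsing, agent mode, and more | GPT assistant for Android, activated via volume…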
Open-source real-time digital human agent platform. Build voice-first AI agents with WebRTC, persona memory…
AI-powered tools to enhance Anki flashcards with explanations, mnemonics, illustrations, and adaptive…
A Cursor skill that gives AI agents real UI component knowledge — best practices, layout patterns, and…
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that…
AI-First Album: Chat with your gallery using plain language! LLM Vision + RAG + Album/Gallery.
🤖 Beautifully designed chatbot components based on shadcn/ui
Train Models Contrastively in Pytorch
🎨 Image collector with support for custom acquisition sources, compatible with Windows and macOS! |…
High-fidelity HTML design and prototype guidance skill for AI agents
Microsoft Foundry (demos, documentation, accelerators).
JTokkit is a Java tokenizer library designed for use with OpenAI models.
Multi-agent framework for design, simulation, and auditing.
AI Agnostic (Multi-user and Multi-bot) Chat with Fictional Characters. Designed with scale in mind.
End-to-end RAG system design, evaluation, and optimization. Geektime RAG bootcamp: a full breakdown of the 10 major RAG components, with 4 hands-on projects to master RAG…
A self-hostable personal AI agent with vector memory, Composio tools, and Telegram.
🦀An agentic AI assistant that lives in your chats, inspired by nanoclaw and incorporating some of its design…
🌌 Give a soul to your digital waifu. Soul of Waifu is an immersive desktop roleplay & AI companion engine…
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local…
A collection of DESIGN.md files for getting AI agents to build Japanese-language UIs correctly. Japanese DESIGN.md collection for AI agents, extending Google Stitch…
Self-healing infrastructure for AI agent payments. 90.3% auto-recovery.
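An AI painting model optimized from Stable Diffusion. Accepts Chinese and English text input and generates high-quality images in a range of modern art styles. | An optimized text-to-image model based on Stable…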
InnoShop is an AI-powered open source e-commerce system built on Laravel 12, designed for global commerce. It…
AI system design guide for engineers building production AI systems and evals.
Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
A WeChat bot based on ChatGPT: no risk, very stable! 🚀
Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]
MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents…
This repository hosts a suite of specialized agents designed to power your brainstorming sessions. Each agent…
VMAS is a vectorized differentiable simulator designed for efficient Multi-Agent Reinforcement Learning…
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models…
Building LLM-Enabled Multi Agent Applications from Scratch
An open SDK for agentic payments. Let AI agents make payments, hold funds, and move money across chains with…
Create a private ChatGPT website via Vercel
Image to text, fast.
Control Figma from the command line. Full read/write access for AI agents — create shapes, text, components…
A public repository of interview review materials for Java backend, AI agents, system design, and algorithms
Local AI desktop app — chat, agent mode, image gen, video gen. Supports Ollama, Gemma 4, Llama, Qwen, OpenAI…
Private on-device AI suite for Android. Fork of Google AI Edge Gallery with llama.cpp, whisper.cpp…
🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model. / On Android…
Open Source Project Management with Conversational AI Task Execution. Built for teams who want conversational…
Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine…
ChatGPT-Pro is an advanced application that combines the power of ChatGPT and DALL.E.
Universal AI agent using Amazon Bedrock that can be customized to create/edit files, execute commands, search…
[Deprecated & integrated in docker-agent] Docker image for a Jenkins agent which can connect to Jenkins using…
🤖️ A brand-new self-hosted AIGC platform for individuals, teams, and enterprises, built on Golang + Vue3 + NaiveUI
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Clipboard Conqueror is a novel copy and paste copilot alternative designed to bring your very own LLM AI…
AI-friendly semantic HTML architecture for better human-agent collaboration. Replacing long Markdown with…
Official repo of VLABench, a large scale benchmark designed for fairly evaluating VLA, Embodied Agent, and…
❤ Works out of the box ❤ An unofficial implementation of ChatGPT for QQ/WeChat (personal accounts, not Official Accounts). Turn your QQ or WeChat into ChatGPT.
AI Assistant that reduces the size of your application's Docker Image
Design Your AI Agents
Bring back Clippy on Windows 10/11!
Open-source agent skills for generating editorial-style information cards from natural-language input.
ACP is the Agent Control Plane - a distributed agent scheduler optimized for simplicity, clarity, and…
A curated archive of breakthroughs in Agents, Architecture, Training, RAG, and On-Device AI.
AgentScope Spark Design - UI Component Library for Alibaba Cloud Apsara Lab
Universal CPU profiler designed for humans and AI agents
Local AI Assistant on your phone
[CVPR' 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator…
A filesystem designed for agents, with SOTA retrieval, automatic memory profiles, sync engine. Drop any file…
A versatile tool designed to help prototype intelligent assistants, agents and multi-agentic systems
🧠 The world's most comprehensive collection of excellent Qwen prompts; contributions of your own prompts are welcome. 🧠 The most comprehensive collection of excellent Qwen prompts in the world…
Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting, image generation and more.
The open creative AI workspace
Open-WebUI-Functions is a collection of custom pipelines, filters, and integrations designed to enhance Open…
Generate a picture book from a single prompt using OpenAI function calling, replicate, and Deep Lake
Unofficial Linux packages for Claude Desktop AI assistant with automated updates.
Gen-Searcher: Reinforcing Agentic Search for Image Generation
This repository is a hub for AI Agent projects, including GitHub Sentinel, LanguageMentor, and ChatPPT…
Jenkins agent (base image) and inbound agent Docker images
A lightweight, modular Java application framework for web and CLI development, designed for AI…
Practical system design, tools, and hands-on resources for building Gen-AI agents & agentic AI systems.
Bridge between LLM-Agent and Cadence Virtuoso. A new infrastructure for Agentic Analog and Mixed-Signal…
Editable, part-aware 3D generation from text or reference images. Open-source client for nova3d.xyz.
The world's most powerful open-source bio AI assistant - Access academic literature, clinical trials, drug…
A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a…
[IJCV 2024] InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
A highly efficient, secure, and stable application development platform with excellent performance, easy…
Harnessclaw is a powerful, Electron-based desktop application designed to manage, chat with, and operate AI…
Human-taught Computer-use Agent Designed for Real Windows and MacOS Desktops.
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of…
An intelligent Wiki overlay tool designed specifically for gamers, featuring GEMINI-powered RAG…
[ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as…
OpenCluely is a free, open source Cluely (alternative), built for technical interviews like DSA, OAs, and CP…
Design notes, architecture, and trade-offs for MyInvestPilot — an AI-native investment OS built with…
An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring…
META‑AGENTIC α‑AGI 👁️✨ — Mission 🎯 End‑to‑end: Identify 🔍 → Out‑Learn 📚 → Out‑Think 🧠 → Out‑Design 🎨 →…
[ICLR-2025] POGEMA stands for Partially-Observable Grid Environment for Multiple Agents. This is a grid-based…
CLIP⚡NCNN⚡Natural-language image search (Image Search)⚡Search images by text⚡x86⚡Android
[NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution
Master AI BOT 🤖: Unleash the power of GPT-4 Turbo with our fast and limitless Telegram bot. Say goodbye to…
If you are an AI agent and want to discover services designed for you.
SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library…
CLI client for podwise.ai — turn any podcast episode into AI-powered insights, designed for use in AI agents…
Your fully proficient, AI-powered and local chatbot assistant🤖
Use agent to learn agent - A skeleton course on how to design, build, and operate production AI agents
AIFlow is an AI agentic framework designed to scale digital AI agents on BNB Chain.
(NeurIPS 2024) AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
a comprehensive and critical synthesis of the emerging role of GenAI across the full autonomous driving stack
AutoClaw is a hyper-lightweight AI agent designed to live inside Docker containers. Unlike heavy…
A template for building WhatsApp agents using LangGraph and Twilio. This project enables you to deploy AI…
🤖 A Matrix bot for using different capabilities (text-generation, text-to-speech, speech-to-text…
Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models
55 curated website design-system DESIGN.md files that AI agents can use to generate matching UIs
Generate images with Stable-Diffusion-webui, based on Python | A Python drawing bot built on SD-webui (supports Chinese, Novelai, and Naifu)
ALMA (Automated meta-Learning of Memory designs for Agentic systems) is a framework that meta-learns memory…
Native Swift SDK for building autonomous AI agents with Apple's FoundationModels design philosophy
Belullama is a comprehensive AI application that bundles Ollama, Open WebUI, and Automatic1111 (Stable…
Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing…
AI-Powered Game Development Team in Your Terminal
Guide for designing adaptive, scalable, and secure enterprise multi-agent systems
ThunderID is a high-performance, open-source identity stack designed for developers to secure and manage…
Beautifully designed components for building AI Agents 🌎
Easyreadme helps you simplify README creation and generate visually stunning ones with the help of AI and…
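A high-performance desktop application framework for digital humans that works out of the box, integrating AI chat and live wallpapers; digital humans run smoothly even on lower-end devices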
Fully automated token deployment on ETH, using ChatGPT and DALL-E.
NuGet package designed to make LLMs, RAG, and Agents first-class citizens in .NET
C++ Agent toolkit - Pre-built binaries, visit: https://github.com/mtconnect/cppagent/releases Docker images…