Image agents
This page lists every AI agent in the MeshKore directory tagged with the Image capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
961 agents in this capability · ranked by popularity
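The normalize-then-rank flow described above can be sketched roughly as follows. This is a minimal illustration only, not MeshKore's actual worker: the `AgentRecord` fields, the `normalize` helper, and the `top_agents` function are all hypothetical names, and the raw input shape is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical normalized record; the field names are assumptions."""
    name: str
    source: str          # e.g. "github", "huggingface", "npm", "pypi"
    stars: int           # GitHub star count, 0 when unknown
    capabilities: frozenset = field(default_factory=frozenset)


def normalize(raw):
    """Coerce one raw source entry (assumed to be a dict) into an AgentRecord."""
    return AgentRecord(
        name=str(raw.get("name", "")).strip(),
        source=str(raw.get("source", "unknown")).lower(),
        stars=int(raw.get("stars") or 0),
        capabilities=frozenset(
            c.strip().lower() for c in raw.get("capabilities", [])
        ),
    )


def top_agents(raw_records, capability, limit=200):
    """Keep records tagged with `capability`, ranked by stars (descending)."""
    records = [normalize(r) for r in raw_records]
    tagged = [r for r in records if capability.lower() in r.capabilities]
    return sorted(tagged, key=lambda r: r.stars, reverse=True)[:limit]
```

Calling `top_agents(raw_records, "Image")` would return at most 200 records, matching the "Top 200" cut used on this page; entries with a missing star count sort last because they are treated as 0.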
Top 200 Image agents
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that…
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of…
🎨 Local-first, open-source Claude Design alternative. ⚡ 19 Skills · ✨ 71 brand-grade Design Systems 🖼…
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule…
AI generates natively editable PPTX from any document — real PowerPoint shapes with native animations, not…
GPT-Image-2 API and Prompts
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming…
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
AI-agent Skill for generating polished HTML slide decks: editorial magazine and Swiss layouts, image prompts…
Force Remove Copilot, Recall and More in Windows 11
OpenAI ChatGPT, GPT-5, GPT-Image-1, Whisper API clients for Go
A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+…
Open-source intelligence for the global theater. Track everything from the corporate/private jets of the…
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured…
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated…
Deep Learning and Reinforcement Learning Library for Scientists and Engineers
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas…
An app that integrates mainstream large language models and image generation models, built with Flutter, with…
🚀 World's largest GPT Image 2 prompt library, updated daily — 2000+ curated prompts with preview images, 16…
ConardLi's open-source Skills collection, featuring web design, knowledge retrieval, image generation, and…
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative…
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn…
One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI…
Generate, animate and schedule your AI characters 🤖
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image…
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it…
✨ Reverse-engineered Python API for Google Gemini web app
🌊 A Human-in-the-Loop workflow for creating HD images from text
A cross-platform video structuring (video analysis) framework. If you find it helpful, please give it a star…
A playground to generate images from any text prompt using Stable Diffusion (past: using DALL-E Mini)
Generate images with NovelAI | A NovelAI-based drawing bot
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
GPT Image 2 prompt gallery, image prompt library, agentic skill, and CLI for OpenAI image generation/editing
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video…
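Open-source video generation workbench driven by AI agents: novel → character/scene/prop design → script → storyboard → video, with consistent characters and scenes across shots | Open-source AI video workspace powered by AI…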
A Node.js CLI that uses Ollama and LM Studio models (Llava, Gemma, Llama etc.) to intelligently rename files…
Curated GPT-Image-2 prompts for the OpenAI API — portraits, posters, UI mockups, game screenshots, character…
Image generation and editing tool based on the OpenAI gpt-image-2 API
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok…
Turn your PC, Mac, or Linux box into a private AI server. LLM inference, chat UI, voice, agents, workflows…
Open source implementation and extension of Google Research’s PaperBanana for automated academic figures…
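Open-source comic translation tool based on manga-image-translator. Supports automatic translation of Japanese, Korean, and American comics, with built-in OpenAI, Gemini, and other 5…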
Generate images from texts. In Russian
Trench — Open-Source Analytics Infrastructure. A single production-ready Docker image built on ClickHouse…
A ChatGPT web client that supports multiple users, multiple languages, and multiple database connections for…
AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart…
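Supports Telegram, Discord, Slack, Lark (Feishu), DingTalk, WeCom, QQ, and WeChat; compatible with various LLMs including OpenAI…
Skill pack for web fiction and novel writing, covering the full workflow for long and short web novels: chart scanning, text breakdown, writing, removing the AI-sounding tone, and cover images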
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and…
80+ free AI services for chat, image, video, voice & APIs (may sometimes include access to lead gen ai models…
This Discord chatbot is incredibly versatile. Powered by the incredibly fast Groq API.
Application implementation with business use cases for safely utilizing generative AI in business operations
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
OpenAI-compatible API for Gemini Business with multi-account load balancing and multimodal capabilities…
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama…
Simple shell script to use OpenAI's ChatGPT and DALL-E from the terminal. No Python or JS required. Formerly…
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜…
Build and Deploy a Full Stack MERN AI Image Generation App MidJourney & DALL E Clone
A @ClickHouse fork that supports high-performance vector search and full-text search.
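Create high-quality content with AI: the best image generation tool built on gpt-image-2, automatic AI image layout, an Openclaw for Xiaohongshu, an AI workbench for self-media creators. RedClaw, an AI creation tool for Xiaohongshu, supports downloading Xiaohongshu image-and-text posts, learning creative styles, and AI creation for Xiaohongshu |…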
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and…
NyaProxy acts like a smart, central manager for accessing various online services (APIs) – think AI tools…
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports…
⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for…
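Skill for generating full-page images for Chinese hand-drawn-style technical PPTs | 21:9 cover + 16:9 body illustrations | PNG output
Local-first AI workflow WebUI for Gemini, integrating multimodal chat, Canvas, file processing, real-time search, code execution, and advanced reasoning.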
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that…
AI-First Album: Chat with your gallery using plain language! LLM Vision + RAG + Album/Gallery.
Train Models Contrastively in Pytorch
🎨 Image collector with support for custom acquisition sources, compatible with Windows and macOS! |…
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF…
Microsoft Foundry (demos, documentation, accelerators).
MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)
Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into…
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local…
Self-healing infrastructure for AI agent payments. 90.3% auto-recovery.
AigoTools can help users quickly create and manage website directory, with built-in site auto-crawling…
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding…
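An AI painting model optimized from Stable Diffusion. Accepts Chinese and English text input and generates high-quality images in a variety of modern art styles. | An optimized text-to-image model based on Stable…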
An open-source AI content search engine designed specifically for content creators. Supports extraction of…
Unreal Engine plugin for LLM/GenAI models & MCP UE5 server. OpenAI GPT-5, Deepseek R1, Claude Opus/Sonnet…
Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models…
Image to text, fast.
Control Figma from the command line. Full read/write access for AI agents — create shapes, text, components…
One-stop data intelligence agent, providing insights from all mainstream data formats in a single dialogue…
Local AI desktop app — chat, agent mode, image gen, video gen. Supports Ollama, Gemma 4, Llama, Qwen, OpenAI…
RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLMs supported by…
Free OpenAI-compatible AI API with 50+ active models, image generation, tool calling, Anthropic-style…
Private on-device AI suite for Android. Fork of Google AI Edge Gallery with llama.cpp, whisper.cpp…
🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model…
Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine…
ChatGPT-Pro is an advanced application that combines the power of ChatGPT and DALL·E.
Huge AI models catalog. A curated list of AI tools, platforms, and resources across various domains.
Universal AI Agent using Amazon Bedrock, customizable to create/edit files, execute commands, search…
[Deprecated & integrated in docker-agent] Docker image for a Jenkins agent which can connect to Jenkins using…
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Smoothly Manage Multiple LLMs (OpenAI, Anthropic, Azure) and Image Models (Dall-E, SDXL), Speed Up Responses…
❤Works out of the box❤ An unofficial implementation of ChatGPT for QQ/WeChat (not for official accounts). Turn your QQ or WeChat into ChatGPT.
AI Assistant that reduces the size of your application's Docker Image
[CVPR' 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator…
A filesystem designed for agents, with SOTA retrieval, automatic memory profiles, sync engine. Drop any file…
🧠 The most comprehensive collection of excellent Qwen prompts in the world. Contributions of your own prompts are welcome…
A feature-rich portal to chat with GPT-4, Claude, Gemini, Mistral, & OpenAI Assistant APIs via a lightweight…
Free-Dall-E-Proxy, an open-source repository that serves as a proxy for API-based interactions with OpenAI's…
Java client library for OpenAI API. Full support for all OpenAI API models including Completions, Chat, Edits…
potato: the portable annotation tool
Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting, image generation and more.
Consumer AI app for chat, image generation, video generation, and music creation powered by Ace Data Cloud…
Open-source spreadsheets platform for deep research and document processing
The open creative AI workspace
Unofficial Linux packages for Claude Desktop AI assistant with automated updates.
💜 The best free Telegram bot for ChatGPT, Microsoft Copilot (aka Bing AI / Sidney / EdgeGPT), Microsoft…
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Jenkins agent (base image) and inbound agent Docker images
Editable, part-aware 3D generation from text or reference images. Open-source client for nova3d.xyz.
A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a…
🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image…
FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and…
Getting the latest versions of Disco Diffusion to work locally, instead of colab. Including how I run this on…
Hybrid RAG system combining vector search, knowledge graph (LightRAG), and cross-encoder reranking — with…
Agent skill for turning AI images and videos into playable game art assets
OpenCluely is a free, open source Cluely (alternative), built for technical interviews like DSA, OAs, and CP…
Agent-friendly ComfyUI workflow skills for OpenClaw, Hermes Agent, Codex, and Claude Code. Turn any ComfyUI…
Collection of agent skills for AI coding assistants
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Stop re-explaining yourself to your AI. MARM gives every session persistent memory, cross-agent context…
Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web…
AI short drama & micro-drama video generator — turns any idea into a complete short-form drama using…
📛 Autoxsh is an open-source tool that utilizes OpenAI's API to automate the generation and publishing of…
CLIP⚡NCNN⚡Natural-language image search⚡Search images by text⚡x86⚡Android
A fully autonomous AI Agent/Python pipeline that utilizes Large Language Models (LLMs) like Gemini to…
Access the latest AI models like ChatGPT, LLaMA, Deepseek, Diffusion, Hugging face, and beyond through a…
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Minimal CLI + web UI for OpenAI GPT Image 2 generation. Dual auth: API Key (paid) or OAuth via ChatGPT…
An open-source Vibe platform similar to Claude Cowork / Manus / Openclaw, with professional rich image…
Multi-model simultaneous chat and text-to-image generation, all done in a pure front end (API…
Your fully proficient, AI-powered and local chatbot assistant🤖
Multi-account pool proxy for Windsurf — 113+ models (Claude/GPT/Gemini/Grok/Kimi) via OpenAI & Anthropic…
🧠 Example Discord Bot written in JavaScript that uses OpenAI's models such as `GPT 4`, `GPT-3.5-Turbo`…
OmniFusion — a multimodal model to communicate using text and images
Open-source hybrid assistant using small models (2B-5B) and Gemini, with image and agentic tool…
A template for building WhatsApp agents using LangGraph and Twilio. This project enables you to deploy AI…
GPTerminator provides a convenient way to interact with OpenAI's chat completion and image generation API's…
A ChatGPT starter based on the official OpenAI APIs.
🤖 A Matrix bot for using different capabilities (text-generation, text-to-speech, speech-to-text…
An AI-powered storytelling video generator that takes user input as a story prompt, generates a story using…
Generate images with Stable-Diffusion-webui, based on Python | A Python drawing bot built on SD-webui (supports Chinese, NovelAI, and Naifu)
Claude Cowork Korean-language domain-expert AI marketplace: 21 plugins · 108 skills · Korean B2B (business · finance · legal · HR ·…
🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal…
📄🔍 Parse, extract, and analyze documents with ease 📄🔍
An AI powered SaaS platform which enables the user to chat, generate images, videos, music, etc. 🚀
Unofficial Claude API supporting direct HTTP chat creation/deletion/retrieval, messages with multiple file…
Sort a folder of images according to their similarity with provided text in your browser (uses a…
The creative suite for character-driven AI experiences.
Character Animation Creator Skill for Codex and GPT Web Agent — generates game-ready animations and sprites…
A chatbot app that uses OpenAI's GPT and DALL-E to reply to incoming messages from WhatsApp and generate…
The Definitive GPT Image 2 Prompt Vault — Master OpenAI's next-gen model with curated prompts for…
Claude Code/Agent Skill for making App Store screenshots with GPT Image 2 that you can upload. Give it your…
Agentic Framework for Java, written in 100% Java using Gemini, OpenAI, LocalAI, Anthropic. Build Autonomous…
Revornix is an open-source, local-first AI information/markdown workspace. It helps you collect fragmented…
Seth's AI Tools: A Unity based front end that uses ComfyUI and LLMs to create stories, images, movies…
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested…
C++ Agent toolkit - Pre-built binaries, visit: https://github.com/mtconnect/cppagent/releases Docker images…
Use DALL·E 2 in Python
c4 GenAI Suite
Browser script to share and export Anthropic Claude chat logs to Markdown, JSON, or as Image (PNG)
A JavaScript library that brings vector search and RAG to your browser!
Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image…
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and…
Unleash the power of Chatty: the intersection of ChatGPT’s intelligence, DALL·E's creativity, and Whisper's…
Skywork Agent Skills for AI office suites, including AI PPT, AI Document, AI Excel, AI Image, AI…
Ready-to-use AI Multimodal ChatGPT-based WhatsApp chatbot assistant for your business. Now supports GPT-4o…
Real-time on-device text-to-image and image-to-image Semantic Search with video stream camera capture using…
Open-source Discord AI companion with ChatGPT-style conversations and image generation, now evolved into Erin.
DarkGPT Chat Explorer is an interactive web application that allows users to engage in conversations with…
VividNode: Multi-purpose Text & Image Generation Desktop Chatbot (supporting various models including GPT).
A next-generation AI-powered infinite canvas workspace built for creators and developers. Experience the…
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Open-source document chat platform with semantic search, RAG (Retrieval Augmented Generation), and…
🧾✨ AI-Powered Receipt and Invoice Scanner for Laravel, with support for images, documents and text
A versatile multi-modal chat application that enables users to develop custom agents, create images, leverage…
🤖Free Agent Line Bot with Web Search, Google Image Search, Image Generator, Video Generator...
A production-ready Laravel package to integrate with the Google Gemini API. Supports text, image, video…
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image…
Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating…
🖼️ A simple ChatGPT AI tutorial on how to generate images/text/code and its limitations 🤖
AI-powered digital picture frame. Generate captivating and unique art from spoken conversations.
LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for…
DALL·E playground for the Mac
Agent orchestration & security template featuring MCP tool building, agent2agent workflows, mechanistic…
This is a collection of various Generative AI projects and AI Agents exploring the realms of Images, code…
Genspark AI open-source, self-hosted Super Agent. Free alternative to Genspark.ai with multi-agent…
AI-Native Video Editor — CLI-first, MCP-ready. Generate, edit, and ship videos from your terminal.
[ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Free, open-source Discord bot code with 200+ commands
AI Chatbot, Image Generator & Language Translator App | OpenAI ChatGPT | AI Assistant | Dart 3 & Flutter 3.13…
⭐️ The most comprehensive ChatGPT repo with Vue 3: Vue ChatGPT AI! ⭐️ Unlock the power of AI-driven…
Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract…