
Data agents

This page lists every AI agent in the MeshKore directory tagged with the Data capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.

3,366 agents in this capability · ranked by popularity

Top 200 Data agents

firecrawl 124,884 ★

🔥 Search, scrape, and clean the web for AI agents.

OpenBB 68,138 ★

Financial data platform for analysts, quants and AI agents.

MinerU 65,083 ★

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic…

llama_index 50,112 ★

LlamaIndex is the leading document agent and OCR platform

milvus 44,464 ★

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

qlib 43,542 ★

Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from…

tidb 40,105 ★

TiDB is built for agentic workloads that grow unpredictably, with ACID guarantees and native support for…

PageIndex 32,198 ★

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

FastGPT 28,159 ★

FastGPT is a knowledge-based platform built on LLMs, offering a comprehensive suite of out-of-the-box…

chroma 28,099 ★

Search infrastructure for AI

budibase 27,955 ★

AI agents, automations and apps that run your operations. Model agnostic.

RAG_Techniques 27,577 ★

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each…

scientific-agent-skills 26,137 ★

A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing.

FinceptTerminal 24,197 ★

FinceptTerminal is a modern finance application offering advanced market analytics, investment research, and…

dolt 22,835 ★

Dolt – Git for Data

opendataloader-pdf 21,629 ★

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

airbyte 21,341 ★

Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses…

telegraf 17,576 ★

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.

keploy 17,438 ★

Open-source platform for creating safe, isolated production sandboxes for API, integration, and E2E testing.

memvid 15,570 ★

Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give…

KeepChatGPT 14,894 ★

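A plugin that improves ChatGPT's data security and efficiency. It also shares many innovative features for free, such as auto-refresh, keep-alive, data security, audit cancellation, conversation cloning, untruncated replies, page cleanup, wide-screen display, tracking interception, continuous updates, and fine-grained insight. Makes our AI experience exceptionally secure…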

unstructured 14,788 ★

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming…

ccusage 14,702 ★

Analyze coding (agent) CLI token usage and costs from local data.

easy-dataset 14,347 ★

A powerful tool for creating datasets for LLM fine-tuning, RAG and Eval

SurfSense 14,309 ★

An open source, privacy focused alternative to NotebookLM for teams with no data limits. Join our Discord…

RD-Agent 13,224 ★

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the…

llm-universe 13,113 ★

This project is a large language model application development tutorial for beginner developers. Read online: https://datawhalechina.github.io/llm-universe/

LEANN 11,761 ★

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100%…

claude-context 11,590 ★

Code search MCP for Claude Code. Make entire codebase the context for any coding agent.

bisheng 11,386 ★

BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and…

llama-gpt 10,961 ★

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your…

dolly 10,789 ★

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

InsForge 10,678 ★

The all-in-one, open-source backend platform for agentic coding. InsForge gives your coding agent database…

electric 10,211 ★

The agent platform built on sync.

cocoindex 10,069 ★

Incremental engine for long horizon agents 🌟 Star if you like it!

Crucix 10,067 ★

Your personal intelligence agent. Watches the world from multiple data sources and pings you when something…

unopim 9,928 ★

Unopim is a free and open-source Laravel-based Product Information Management (PIM) system that helps…

phoenix 9,859 ★

AI Observability & Evaluation

pyod 9,859 ★

A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+…

zvec 9,709 ★

A lightweight, lightning-fast, in-process vector database

xonsh 9,475 ★

🐚 Python-powered shell. Full-featured, cross-platform and AI-friendly.

databend 9,300 ★

Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. — rebuilt from scratch. Unified…

deeplake 9,140 ★

Deeplake is an AI Data Runtime for Agents. It provides serverless Postgres with a multimodal datalake, enabling…

risingwave 9,047 ★

Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real…

ccf-deadlines 9,037 ★

⏰ Agentically track worldwide conference deadlines (Website, Python CLI, WeChat Applet)

Shadowbroker 8,845 ★

Open-source intelligence for the global theater. Track everything from the corporate/private jets of the…

visual-explainer 8,584 ★

Agent skill that generates rich HTML pages or slide decks for diagrams, diff reviews, plan audits, data…

reor 8,563 ★

Private & local AI personal knowledge management app for high entropy people.

kreuzberg 8,397 ★

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured…

datahaven 7,961 ★

An EVM compatible Substrate chain, powered by StorageHub and secured by EigenLayer

all-in-rag 7,960 ★

🔍 LLM Application Development in Practice, Part 1: A Full-Stack Guide to RAG. Read online: https://datawhalechina.github.io/all-in-rag/

deep-searcher 7,845 ★

Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.

claude-seo 7,158 ★

Universal SEO skill for Claude Code. 25 sub-skills + 18 sub-agents covering technical SEO, E-E-A-T, schema…

flyte 7,048 ★

Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows.

opencompass 7,033 ★

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral…

vespa 6,926 ★

AI + Data, online. https://vespa.ai

postgresml 6,792 ★

Postgres with GPUs for ML/AI apps.

llm-scraper 6,749 ★

Turn any webpage into structured data using LLMs

QuantDinger 6,622 ★

AI quantitative trading platform for crypto, stocks, and forex with backtesting, live trading, market data…

plano 6,546 ★

Plano is an AI-native proxy and data plane for agentic apps — with built-in orchestration, safety…

rags 6,535 ★

Build ChatGPT over your data, all with natural language

ChatLab 6,510 ★

Local-first chat history analyzer with AI.

firecrawl-mcp-server 6,391 ★

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM…

airweave 6,360 ★

Open-source context retrieval layer for AI agents

materialize 6,305 ★

The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL

open-deep-research 6,238 ★

An open source deep research clone. AI Agent that reasons over large amounts of web data extracted with Firecrawl

ai-notes 6,213 ★

Notes for software engineers getting up to speed on new AI developments. Serves as datastore for…

genkit 6,051 ★

Open-source framework for building AI-powered apps in JavaScript, Go, and Python, built and used in…

opensre 5,962 ★

Build your own AI SRE agents. The open source toolkit for the AI era.

lightdash 5,853 ★

Agentic BI. Analytics at the speed of code ⚡️

opal 5,454 ★

Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)

zenml 5,424 ★

ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.

MineContext 5,335 ★

MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)

superduper 5,281 ★

Superduper: End-to-end framework for building custom AI applications and agents.

ai-data-science-team 5,230 ★

An AI-powered data science team of agents to help you perform common data science tasks 10X faster.

sparrow 5,159 ★

Structured data extraction and instruction calling with ML, LLM and Vision LLM

html-anything 5,128 ★

✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces…

argilla 4,985 ★

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

baserow 4,905 ★

Build databases, automations, apps & agents with AI — no code. Open source platform available on cloud and…

whodb 4,840 ★

A lightweight next-gen data explorer - Postgres, MySQL, SQLite, MongoDB, Redis, MariaDB, Elastic Search, and…

llm-graph-builder 4,707 ★

Neo4j graph construction from unstructured data using LLMs

solace-agent-mesh 4,679 ★

An event-driven framework designed to build and orchestrate multi-agent AI systems. It enables seamless…

thunderbolt 4,659 ★

AI You Control: Choose your models. Own your data. Eliminate vendor lock-in.

helix-db 4,580 ★

HelixDB is an open-source graph-vector database built from scratch in Rust.

Olares 4,555 ★

Olares: An Open-Source Personal Cloud to Reclaim Your Data

infinity 4,528 ★

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector…

cognita 4,409 ★

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production…

sql-translator 4,313 ★

SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence…

m_flow 4,251 ★

A bio-inspired cognitive memory engine — a new paradigm for Graph RAG.

llama_cloud_services 4,251 ★

Knowledge Agents and Management in the Cloud

learn-agentic-ai 4,184 ★

Learn Agentic AI using Dapr Agentic Cloud Ascent (DACA) Design Pattern and Agent-Native Cloud Technologies…

csghub 4,169 ★

CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both…

OpenMemory 4,157 ★

Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex…

memgraph 4,072 ★

High-performance open-source in-memory graph database for GraphRAG, AI memory, agentic AI, and real-time…

nixtla 3,891 ★

TimeGPT-1: production ready pre-trained Time Series Foundation Model for forecasting and anomaly detection…

LazyLLM 3,833 ★

Easiest and laziest way to build multi-agent LLM applications.

docetl 3,752 ★

A system for agentic LLM-powered data processing and ETL

dataherald 3,634 ★

Interact with your SQL database, Natural Language to SQL using LLMs

datadog-agent 3,629 ★

Main repository for Datadog Agent

morphik-core 3,601 ★

The most accurate document search and store for building AI apps

MIRIX 3,554 ★

Mirix is a multi-agent personal assistant designed to track on-screen activities and answer user questions…

semantic-router 3,552 ★

Superfast AI decision making and intelligent processing of multi-modal data.

Acontext 3,474 ★

Agent Skills as a Memory Layer

dagu 3,429 ★

Lightweight and powerful workflow engine that comes with a Web UI. Define workflows in a declarative YAML…

rocketride-server 3,426 ★

High-performance AI pipeline engine with a C++ core and 50+ Python-extensible nodes. Build, debug, and scale…

surf 3,421 ★

Personal AI Notebooks. Organize files & webpages and generate notes from them. Open source, local & open…

OB1 3,413 ★

Open Brain — The infrastructure layer for your thinking. One database, one AI gateway, one chat channel — any…

vault-ai 3,394 ★

OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database)…

LLMDataHub 3,386 ★

A quick guide (especially) for trending instruction finetuning datasets

lida 3,251 ★

Automatic Generation of Visualizations and Infographics using Large Language Models

distilabel 3,231 ★

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and…

conftest 3,172 ★

Write tests against structured configuration data using the Open Policy Agent Rego query language

oracle-ai-developer-hub 3,113 ★

Technical resources for AI developers to build applications, agents, and systems using Oracle AI Database and…

LlamaIndexTS 3,079 ★

Data framework for your LLM applications, focused on server-side solutions

swirl-search 3,023 ★

AI Search & RAG Without Moving Your Data. Get instant answers from your company's knowledge across 100+ apps…

spiceai 2,942 ★

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI…

ai-crawler-py 2,941 ★

Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural…

deepnote 2,911 ★

Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data…

FlyEnv 2,881 ★

All-in-One Native Local Development Environment for Windows, macOS & Linux. Docker alternative for PHP…

UltraChat 2,849 ★

Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)

skills 2,844 ★

Opinionated skills for AI coding agents to create stunning diagrams and visualizations directly in Markdown…

dbhub 2,840 ★

Zero-dependency, token-efficient database MCP server for Postgres, MySQL, SQL Server, MariaDB, SQLite.

autoflow 2,783 ★

pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless…

second-brain-ai-assistant-course 2,754 ★

Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems…

datachain 2,748 ★

The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure

evadb 2,679 ★

Database system for AI-powered apps

rill 2,631 ★

The fastest business intelligence tool for humans and agents.

seekdb 2,597 ★

The AI-native state store for agents. MySQL-compatible, embedded or server, hybrid vector + full-text search…

SenseNova-Skills 2,505 ★

Modular SenseNova skills for building AI-powered office assistants and productivity workflows

spider 2,504 ★

Low latency web data collector

handy-ollama 2,428 ★

Hands-on Ollama: deploy large models on a CPU. Read online: https://datawhalechina.github.io/handy-ollama/

brightdata-mcp 2,411 ★

A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

tirith 2,381 ★

Terminal security for developers and AI agents. Intercepts homograph URLs, pipe-to-shell, ANSI injection…

code-interpreter 2,330 ★

Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app

autolabel 2,317 ★

Label, clean and enrich text datasets with LLMs.

yu-ai-agent 2,315 ★

A new hands-on AI development project for 2025 from 编程导航, building an AI relationship-coach app and YuManus, a ReAct-style autonomous planning agent, on Spring Boot 3 + Java 21 + Spring AI, covering AI…

vearch 2,307 ★

Distributed vector search for AI-native applications

MemoRAG 2,241 ★

Empowering RAG with a memory-based data interface for all-purpose applications!

vector-admin 2,227 ★

The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more…

tabularis 2,200 ★

A lightweight, cross-platform database client for developers. Supports MySQL, PostgreSQL and SQLite. Hackable…

beir 2,199 ★

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR…

Mano-P 2,146 ★

Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs locally on Apple…

Aix-DB 2,139 ★

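Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to turn natural language into data insights end to end.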

trustgraph 2,113 ★

The agent runtime platform powered by context graphs.

dash 2,076 ★

A self-learning data agent built with systems engineering principles. It grounds answers in 6 layers of…

dataclaw 2,071 ★

Agent harness to publish your history from Claude Code et al. as Hugging Face datasets.

Dialog_Corpus 2,050 ★

Datasets for training Chinese and English chatbot systems

docext 2,020 ★

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit…

burr 2,012 ★

Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and…

MyBrain 2,005 ★

All-in-one productivity app and AI assistant with Tasks, Notes, Calendar, Diary and Bookmarks.

DataAgent 1,975 ★

Spring AI Alibaba DataAgent

HealthGPT 1,951 ★

Query your Apple Health data with natural language 💬 🩺

neuron-ai 1,935 ★

The PHP Agentic Framework to build production-ready AI driven applications. Connect components (LLMs, vector…

DataDesigner 1,913 ★

🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.

servicecomb-pack 1,913 ★

Apache ServiceComb Pack is an eventual data consistency solution for micro-service applications…

JustHireMe 1,903 ★

Local-first AI job intelligence workbench for scraping roles, ranking fit, and generating tailored…

mcp-memory-service 1,889 ★

Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API +…

FinRL-Meta 1,882 ★

FinRL®-Meta: Dynamic datasets and market environments for FinRL.

matrixone 1,842 ★

AI-native HTAP database with Git-for-Data and built-in vector search, serving as the data and memory backbone…

vibekit 1,790 ★

Run Claude Code, Gemini, Codex — or any coding agent — in a clean, isolated sandbox with sensitive data…

airda 1,753 ★

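airda (Air Data Agent) is a multi-agent system for data analysis that understands data development and analysis requirements, understands the data, and generates SQL and Python code for tasks such as data querying, data visualization, and machine learning.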

extractous 1,749 ★

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

swarm 1,715 ★

Ruby gems for general-purpose AI agent systems: automation, research, data processing, customer support…

LLPhant 1,678 ★

LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT 4. Inspired by Langchain

raptor 1,674 ★

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

ava-whatsapp-agent-course 1,661 ★

Meet Ava, the WhatsApp Agent

trench 1,638 ★

Trench — Open-Source Analytics Infrastructure. A single production-ready Docker image built on ClickHouse…

chatgpt-ui 1,629 ★

A ChatGPT web client that supports multiple users, multiple languages, and multiple database connections for…

lumibot 1,609 ★

Backtestable AI trading agents and Python algorithmic trading strategies for stocks, options, crypto…

train-llm-from-scratch 1,598 ★

A straightforward method for training your LLM, from downloading data to generating text.

WikiChat 1,592 ★

WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a…

Agently 1,577 ★

[GenAI Application Development Framework] 🚀 Build GenAI applications quickly and easily 💬 Easy to interact with…

ai-dev-kit 1,574 ★

Databricks Toolkit for Coding Agents provided by Field Engineering

pixeltable 1,562 ★

Declarative and Incremental Backend for Multimodal AI Applications

awesome-generative-ai-data-scientist 1,555 ★

A curated list of 100+ resources for building and deploying generative AI specifically focusing on helping…

nlp_xiaojiang 1,537 ★

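Natural language processing (NLP): the Xiaojiang bot (a retrieval-based chit-chat chatbot), BERT sentence embeddings and similarity (Sentence Similarity), XLNet sentence embeddings and similarity (text xlnet embedding), text classification (Text…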

ragbuilder 1,533 ★

A toolkit to create an optimal production-ready Retrieval Augmented Generation (RAG) setup for your data

Rasa_NLU_Chi 1,532 ★

Turn Chinese natural language into structured data (Chinese natural language understanding)

thepipe 1,526 ★

Get clean data from tricky documents, powered by vision-language models ⚡

AVA 1,485 ★

🤖 AI-native Visual Analytics framework built for agents.

skills 1,475 ★

Browser automation CLI built for AI agents. Break through anti-bot walls, hand off to humans across platforms…

korvus 1,460 ★

Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of…

hyperDB 1,408 ★

A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $135M cap.

aso-skills 1,407 ★

AI agent skills for App Store Optimization (ASO) and app marketing. Built for indie developers, app…

agentql 1,372 ★

AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright…

chatgpt-comparison-detection 1,354 ★

Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

cite 1,338 ★

Ground truth layer for humans and AI agents working together. Version control for knowledge.

AntSK 1,319 ★

An AI knowledge base/agent built with .Net 9, AntBlazor, Semantic Kernel, and Kernel Memory, supporting local…

Jackrong-llm-finetuning-guide 1,294 ★

aideml 1,293 ★

AIDE: AI-Driven Exploration in the Space of Code. The Machine Learning engineering agent that automates AI…

apify-mcp-server 1,275 ★

The Apify MCP server enables your AI agents to extract data from social media, search engines, maps…

Datus-agent 1,251 ★

The Future of Data Engineering — A CLI SQL client for the modern data stack, enabling AI-native context…

EmbedAnything 1,246 ★

Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust 🦀

intellagent 1,230 ★

A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic…

webclaw 1,198 ★

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust…

langchain-extract 1,196 ★

🦜⛏️ Did you say you like data?

fire-enrich 1,181 ★

🔥 AI-powered data enrichment tool that transforms emails into rich datasets with company profiles, funding…

awesome-ai-sdks 1,178 ★

A database of SDKs, frameworks, libraries, and tools for creating, monitoring, debugging and deploying…

Thoth 1,174 ★

Thoth - Personal AI Sovereignty. A local-first AI assistant with integrated tools, a personal knowledge…

zylos-core 1,172 ★

🐙 Give your AI a life — open-source agent infrastructure for team collaboration.
