Audio agents

This page lists every AI agent in the MeshKore directory tagged with the Audio capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.

303 agents in this capability · ranked by popularity

Top 200 Audio agents

ChatTTS 39,328 ★

A generative speech model for daily dialogue.

CosyVoice 21,264 ★

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

AudioGPT 10,176 ★

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

podcastfy 6,320 ★

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into…

BibiGPT-v1 6,076 ★

BibiGPT v1 · one-Click AI Summary for Audio/Video & Chat with Learning Content: Bilibili | YouTube |…

OnlySwitch 5,678 ★

⚙️ All-in-One menu bar app, hide 💻MacBook Pro's notch, dark mode, AirPods, Shortcuts

SimpleMem 3,435 ★

SimpleMem: Efficient Lifelong Memory for LLM Agents — Text & Multimodal

Generative-Media-Skills 3,330 ★

Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image…

bootcamp 2,421 ★

Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video…

ui 2,237 ★

ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help you build…

baresip 2,110 ★

Baresip is a modular SIP User-Agent with audio and video support

epub_to_audiobook 1,984 ★

EPUB to audiobook converter, optimized for Audiobookshelf, WebUI included

no-cost-ai 1,483 ★

80+ free AI services for chat, image, video, voice & APIs (may sometimes include access to lead gen ai models…

AVA-AI-Voice-Agent-for-Asterisk 1,045 ★

An open-source AI Voice Agent that integrates with Asterisk/FreePBX using Audiosocket/RTP technology

Whisperboard 1,032 ★

The open-source iOS app that's making quality voice transcription more accessible on mobile devices.

chatgpt-cli 930 ★

ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports…

SwiftWhisper 779 ★

🎤 The easiest way to transcribe audio in Swift

VideoAgent 714 ★

"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"

Virtual-Human-for-Chatting 700 ★

Live2D Virtual Human for Chatting based on Unity

aisearch-openai-rag-audio 555 ★

A simple example implementation of the VoiceRAG pattern to power interactive voice generative AI experiences…

insights-lm-public 537 ★

Open-source, self-hosted alternative to NotebookLM. Chat with your documents, generate audio summaries, and…

audioflare 481 ★

An all-in-one AI audio playground using Cloudflare AI Workers to transcribe, analyze, summarize, and…

smol-podcaster 414 ★

smol-podcaster is your podcast production agent 🎙️

project-raven 403 ★

Open-source AI meeting copilot - real-time transcription, echo cancellation, and AI assistance. Captures…

gemini-2-live-api-demo 390 ★

Vanilla JS web interface for Gemini 2.0 flash-exp Multimodal API with text, audio, camera, screen inputs and…

potato 383 ★

potato: the portable annotation tool

Whisper-transcription_and_diarization-speaker-identification- 377 ★

How to use OpenAI's Whisper to transcribe and diarize audio files

VectorDB-Plugin 365 ★

Program that lets you ask questions about your documents, audio, and video files.

insanely-fast-whisper-api 356 ★

An API to transcribe audio with OpenAI's Whisper Large v3!

adk-rust 350 ★

Rust Agent Development Kit (ADK-Rust): Build AI agents in Rust with modular components for models, tools…

whisper-website 324 ★

Simple self-hosted web application that converts audio to subtitles using OpenAI's Whisper model

RuntimeSpeechRecognizer 306 ★

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI…

aixplora 274 ★

AIxplora is an open-source tool that lets you query all kinds of files, with no limit on length or format.

Skill-Anything 259 ★

Any source (PDF, video, web, audio, text) to interactive learning package with quizzes, flashcards and spaced…

comfyui-workflow-skill 259 ★

Natural language → ComfyUI workflow JSON. 34 built-in templates, 360+ node definitions, auto model download…

Stage-Whisper 258 ★

The main repo for Stage Whisper — a free, secure, and easy-to-use transcription app for journalists, powered…

llama_ros 252 ★

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

awesome-NLP-resources 239 ★

A collection of NLP projects and tools.

neuralnoise 225 ★

The AI Podcast Studio: generate podcast scripts and their audio version with a team of AI workers in a…

insights-lm-local-package 214 ★

Open-source, fully private and local alternative to NotebookLM. Chat with your documents, generate audio…

openai_tts 193 ★

Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible endpoint to…

DeLive 184 ★

System audio capture + multi-provider ASR + local-first AI review workspace. Floating live captions, 12 ASR…

Revornix 183 ★

Revornix is an open-source, local-first AI information/markdown workspace. It helps you collect fragmented…

openai-whisper 181 ★

A sample web app using OpenAI Whisper to transcribe audio built on Next.js. It records audio continuously for…

multimedia-gpt 179 ★

Empowering your ChatGPT with vision and audio inputs.

realtime-ai 179 ★

A real-time Agent framework for audio and video.

flutter_whisper.cpp 171 ★

Flutter App That Can Transcribe Audio Offline/On Device with Whisper C++ Bindings via Rust

web-whisper 165 ★

OpenAI's Whisper Audio to text transcription right into your web browser! An open source AI subtitling suite.

audio-to-text-transcription 162 ★

This repository contains a Python script that allows users to download the audio from a YouTube video…

Unitale 158 ★

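An AI audiobook production tool based on Indextts and Qwen3TTS. Uses an LLM to automatically break down scripts and detect emotion, integrating multi-character TTS…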

whatsapp-chatgpt-bot 157 ★

Ready-to-use AI Multimodal ChatGPT-based WhatsApp chatbot assistant for your business. Now supports GPT-4o…

podgenai 156 ★

OpenAI GPT based informational audiobook/podcast mp3 generator

llm-metahuman 141 ★

An open solution for AI-powered photorealistic digital humans.

laravel-gemini 139 ★

A production-ready Laravel package to integrate with the Google Gemini API. Supports text, image, video…

whisper-clip 137 ★

WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly…

magda-core 135 ★

A DAW built for automation, transformation, and fast musical iteration

simplechat 134 ★

Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context…

Awesome-Colorful-LLM 128 ★

Recent advancements propelled by large language models (LLMs), encompassing an array of domains including…

PodAgent 121 ★

PodAgent: A Comprehensive Framework for Podcast Generation

whisper-stream 119 ★

A bash script using OpenAI Whisper API for continuous audio transcription with automatic silence detection

vue3-chatgpt-ai 118 ★

⭐️ The most comprehensive ChatGPT repo with Vue 3: Vue ChatGPT AI! ⭐️ Unlock the power of AI-driven…

openai-realtime-python 103 ★

Real-time voice agent powered by Agora and OpenAI

open-audio 95 ★

Open-Audio TTS: A robust web app leveraging OpenAI's powerful Text-to-Speech (TTS) models to generate…

simulflow 94 ★

A Clojure library for building real-time voice-enabled AI Agents. Simulflow handles the orchestration of…

Indic-Subtitler 93 ★

Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.

realtime-interview-copilot 93 ★

Realtime Interview Copilot is a web application that assists users in crafting responses during interviews…

avr-infra 92 ★

The AVR Infrastructure project is designed to launch the Agent Voice Response application, which will start…

Audio-Sentiment-Analysis 90 ★

This repository consists of work done to analyse sentiment of a customer in a conversation with a call center…

UGTLive 86 ★

Live AI-powered screen translation via LLMs & GPU OCR. 26 languages, manga support, PDF/CBZ conversion, audio…

OpenAI-Text-To-Speech-for-Unity 85 ★

Implementation of OpenAI's Text-To-Speech in Unity. Synthesize any text and play it via any AudioSource.

ultrasonic 85 ★

A comprehensive steganography framework for embedding and extracting agentic commands in audio and video…

ableton-copilot-mcp 83 ★

An MCP server built on ableton-js enables AI assistants to control Ableton Live in real time, including…

notebooklm-mcp 82 ★

Google NotebookLM over MCP + a local HTTP REST API. Citation-backed Q&A, audio/video/content generation…

trx 82 ★

Agent-first CLI for audio/video transcription via Whisper

bibigpt-skill 81 ★

OpenClaw / Claude Code / Codex Agent skill for summarizing videos/audio via BibiGPT CLI (bibi)

fast-audio-video-transcribe-with-whisper-and-modal 80 ★

Fast audio/video transcription using OpenAI's Whisper and Modal; an hour-long audio/video file can be transcribed in…

reaper-mcp 76 ★

A comprehensive Model Context Protocol (MCP) server that enables AI agents to create fully mixed and mastered…

easy-model-deployer 75 ★

Deploy open-source LLMs on AWS in minutes — with OpenAI-compatible APIs and a powerful CLI/SDK toolkit.

omnigram 71 ★

Omnigram is a Flutter-based file reader and audiobook . It accommodates EPUB and PDF and offers audiobook…

Echo 71 ★

Production-ready audio and video transcription app that can run on your laptop or in the cloud.

voiceblender 69 ★

A programmable voice platform: SIP and WebRTC call control, multi-party mixing, recording, TTS/STT, and…

claude-config 69 ★

Comprehensive Claude Code framework: 6 specialized agents, 7 workflow commands, audio notifications - stack…

echook 65 ★

🔊 echook — AI-operated audio notifications for Claude Code, Cursor IDE & Codex CLI — 26 hooks, voice + chime…

audio-plugin-dev-skills 64 ★

Claude Code marketplace for audio plugin development skills.

VRCTextboxSTT 63 ★

A SpeechToText application that uses OpenAI's whisper via faster-whisper to transcribe audio and send that…

Awesome-AI 62 ★

Awesome AI Chat (ChatGPT4...), Code (GitHub Copilot...), Read (ChatPDF...), Paint (Midjourney...), Write…

speechdigest 62 ★

Audio to summary with OpenAI Whisper & GPT 3.5/4 using Streamlit

Whisper-Transcriber 59 ★

Modern Desktop Application offering a suite of tools for audio/video text recognition and a variety of other…

nexus-agents 58 ★

A distributed multi-modal agent orchestration framework implementing advanced natural language processing…

auto-subtitles 54 ★

Automatically generate subtitles from an input audio or video file using OpenAI Whisper

Image-to-Speech-GenAI-Tool-Using-LLM 53 ★

AI tool that generates an Audio short story based on the context of an uploaded image by prompting a GenAI…

Voice-Chat-Bot 52 ★

Real-time AI ChatBot and voice-enabled AI VoiceBot using Deepgram (STT ↔ TTS) and Groq LLM for natural…

realtime-webrtc 51 ★

OpenAI realtime audio with WebRTC

aionair 51 ★

A cutting-edge AI SaaS platform that enables users to create, discover, and enjoy podcasts with advanced…

nabu 50 ★

A multi engine TTS & LLM edge computing playground with audio book features and more!

streamlit_whisper_transcription 48 ★

Streamlit Audio Transcription with OpenAI's Whisper AI: An interactive Streamlit app demonstrating real-time…

notebooklm-ai-plugin 48 ★

AI Agent plugin for Google NotebookLM (Claude, OpenClaw, etc) — generate slide decks, audio overviews…

SoundSage---LLM-Audio-Processing 47 ★

Open source Python program for automating gain staging. Part 1 of a series for automating audio processing…

openvenice 46 ★

Open-source, customizable frontend for Venice AI. Chat, image gen, audio, video, embeddings + visual…

shadow-ai 45 ★

Shadow AI: stealth AI assistant for restricted/locked-down environments, enabling cross-device interaction…

songGPT 45 ★

songGPT is an experimental open-source project that explores the potential of Language Models, specifically…

transcribe-video-audio 45 ★

A full-stack project based on OpenAI's Whisper to transcribe audio and video files using React & Django.

whisper-server 44 ★

macOS menu bar app providing a local HTTP server compatible with the OpenAI Whisper API for fast and private…

kwami 43 ★

👻 kwami.io | A 3D Interactive AI Companion Library for creating engaging AI companions with visual (blob)…

offgrid-tools 43 ★

Self-contained offline environment providing local AI chat, offline Wikipedia/content archives, IRC…

agent-fm 42 ★

Agent FM - Tune in and stay in the loop with your agents 🎧

sift-video 42 ★

Semantic video search system that indexes audio and visual content to enable timestamp-accurate retrieval…

styletts2-ukrainian-openai-tts-api 41 ★

OpenAI TTS Compatible Ukrainian TTS StyleTTS2 Pipeline

livekit-voice-agent 41 ★

A production-ready voice agent implementation using LiveKit and Python, featuring advanced conversational AI…

audiolizr 39 ★

A bentoML-powered API to transcribe audio and make sense of it

Bulka 39 ★

Live-coding music platform with AI agent — browser-based Strudel fork with Telegram bot and Russian community.

pdf-to-audiobook 38 ★

Uses the OpenAI API to clean a PDF, then converts it to a professional-grade audiobook with text-to-speech.

whisper-video 37 ★

Generate subtitles for all the videos in a folder with OpenAI's Whisper, privately on your computer.

Free-Unoffical-OpenAI-API 37 ★

A powerful, unofficial OpenAI-compatible API service offering free access to GPT-4o, GPT-4-turbo, and audio…

JarvisV3 37 ★

JarvisV3 is a Streamlit-powered AI assistant inspired by Iron Man’s Jarvis. It offers both text and realtime…

antigravity-awesome-skills 36 ★

🌌 Explore 255+ essential skills for AI coding assistants like Claude Code and GitHub Copilot to enhance your…

soundstorm 36 ★

Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet…

Audio-transcriber 36 ★

Simple Python audio transcriber using OpenAI's Whisper speech recognition model

mom-ai 35 ★

MOM AI transcribes audio into a meeting summary and generates minutes of the meeting. Built using Langchain, OpenAI…

Waifu_AI_Vtuber 35 ★

Waifu_AI_Vtuber is an AI virtual YouTuber chatbot powered by OpenAI GPT-3.5, interacting in real-time with…

mic-speaker-streamer 35 ★

Cross-platform Electron app for simultaneously streaming & recording microphone and speaker audio

live-stream-chat-ai-agent 35 ★

An AI-powered agent designed to watch live streams, understand the content (audio, chat, video), and…

whisper-speech-to-text 34 ★

Whisper Speech-to-Text is a JavaScript library for recording and transcribing user audio into text via…

meeting-concluder 33 ★

Record audio from a meeting, then transcribe, conclude and send the conclusion and a piece of advice to Slack

YATSEE 33 ★

YATSEE - Yet Another Tool for Speech Extraction & Enrichment

Sophia-AI-Assistant 32 ★

Sophia AI Assistant is a Python-based desktop AI that performs a variety of tasks, including answering…

TOHID-AI 32 ★

The WhatsApp bot you were looking for ✅. It offers a wide range of features like Audio & Video editing, Image & Logo…

Summarizing-Youtube-Videos-with-OpenAI-Whisper-and-GPT-3 32 ★

YouTube video summarization using Whisper audio transcription and GPT-based summaries.

Multimodal-Node-Editor 31 ★

A node editor for experimenting with multimodal image, audio, text, and LLM/VLM pipelines (Node-based editor to compose and experiment with…

AIAudioTranscriber 31 ★

A minimalistic web app to generate transcriptions for audio, built using Python

WhisCall 31 ★

A framework for AI WhatsApp calls using Whisper, Coqui TTS, GPT-3.5 Turbo, Virtual Audio Cable, and the…

neural-file-sorter 31 ★

A neural network based file sorter. Trains an autoencoder to sort images or audio based on the similarity of…

coffee-chat-voice-assistant 31 ★

Coffee Chat Voice Assistant is a voice-driven ordering system powered by Azure OpenAI GPT-4o Realtime API…

openrouter_client 31 ★

A comprehensive OpenRouter API client library for ESP32 (ESP-IDF), enabling seamless integration with…

nbmultirag 30 ★

A framework in Italian and English that lets you chat with your own documents via RAG, also…

oreilly-multimodal-ai 30 ★

Learn how multimodal AI merges text, image, and audio for smarter models

whisperdesk 30 ★

A beautiful, native macOS desktop application for transcribing audio and video files using whisper.cpp

OBS-Mic-Swapper 29 ★

Swaps/Mutes active audio input device in OBS upon a specified channel point redemption in Twitch chat.

twitch-ai-viewers 29 ★

No viewers? No problem! Use AI Viewers

summarize 29 ★

Summarize audio/video files

JianYan 27 ★

A local speech-to-text tool for Windows based on SenseVoice; supports text polishing via OpenAI-format APIs, with low latency and high accuracy.

videodb-capture-quickstart 27 ★

Give your agents real time desktop perception. Stream screen, microphone, and system audio for live context…

sinapsis-openai 27 ★

Package with sinapsis templates to support OpenAI functionality

CORAG 27 ★

A highly contextualized retrieval system integrating Large Language Models (LLMs), embeddings, and a dynamic…

Machine-Learning 26 ★

A set of jupyter notebooks

transcribe-cli 26 ★

🎙️ Fast CLI tool to transcribe audio/video files to SRT format using OpenAI Whisper API

openai-dotnet-exercises 26 ★

Explore AI Capabilities for Your .NET Projects with OpenAI's API: Unlock the power of AI in your applications

SmartRAG 26 ★

Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language. Production-ready…

speakscribe 25 ★

Speakscribe is a web application that allows users to transcribe audio using OpenAI and also interact with a…

live-translator 25 ★

Real-time system audio translation for macOS — translate any audio (YouTube, podcasts, meetings) live on…

AudioInsightsGenerator 25 ★

Unlock AI power with AudioInsightsGenerator! From audio to summaries, emotion analysis, idea generation…

claude-tts 24 ★

Text-to-speech plugin for Claude Code — multi-provider support (ElevenLabs, OpenAI, Google, Amazon Polly…

whisper_cpp_macos_utils 24 ★

Shell scripts for automated transcription on macOS: Integrates whisper.cpp with QuickTime Player and…

speech-to-text-demo 24 ★

Flutter app with an implementation of OpenAI tools (ChatGPT & Whisper)

ToolBake 24 ★

A Toolbox Platform for Creating Your Own Tools. Bake Them with Code or AI.

AdobePremiereProMCP 23 ★

🎬 AI-powered MCP server for Adobe Premiere Pro — 1,027 tools for timeline editing, color grading, audio…

tts-studio 23 ★

Text to Speech Studio to convert text into natural-sounding speech using advanced AI models from leading…

Eolian 23 ★

Eolian is a Discord music bot that provides a very powerful API for queuing songs from a variety of sources…

screen-voice-agent 22 ★

Open-source macOS desktop AI agent (Tauri + React + OpenAI Realtime API) that watches your screen, listens to…

openai-realtime-fastapi 22 ★

A FastAPI application that relays client WebSocket connections to OpenAI's Realtime API, enabling seamless…

meeting-minutes 22 ★

Leveraging OpenAI's Whisper ASR and GPT-4 models to automate the process of generating meeting minutes from…

beatos 22 ★

Local-first beat library for music producers. MCP server (Claude Code / Claude Desktop) for AI-driven…

anyknowledge 20 ★

Your personal AI 「KEEP」, supports docx, pdf, audio, video...

YDS-YOUTUBE-DOWNLOADER-AND-SUMMERIZE 20 ★

This is a Telegram bot that can download audio from YouTube videos and summarize the content using OpenAI's…

voice-to-trade-binance-whisper 20 ★

Hands-free crypto trading — speak a command, execute on Binance. Powered by OpenAI Whisper for real-time…

seo-dungeon 19 ★

Codex-first 16-bit dungeon crawler that turns SEO audits into boss battles, with legacy Claude compatibility…

myChatBot 19 ★

A simple JavaScript chatbot

jabberwocky 19 ★

An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT3). The legacy…

scribe 19 ★

Scribe is a Python script that transcribes audio and video files using OpenAI Whisper and exports the…

resumico 18 ★

🤖 A WhatsApp bot to transcribe and summarize audio messages.

VoxBridge 17 ★

Desktop realtime speech translator for routing translated audio between virtual audio devices in calls.

talk-intelligent-applications-with-spring-ai 17 ★

Intelligent Applications with Spring AI. Practical integration of LLMs, chat interaction, image generation…

echoai_helper 16 ★

Real-time conversation assistant with dual audio transcription and GPT-powered responses, perfect for…

transcriber 16 ★

Audio transcription UI for OpenAI Whisper, GPT4o Transcribe and AssemblyAI APIs

gemini-live 16 ★

Google Gemini live voice to text realtime stream in the browser

discordant 15 ★

End-to-end full-stack, real-time Discord clone, with servers, channels, video calls, audio calls…

image-to-text-to-speech 15 ★

An app that uses Hugging Face AI models together with OpenAI & LangChain, to generate text from an image…

RECAP 15 ★

Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning

chatapp-v2 14 ★

Chat Application using LLMs and Whisper for Speech-To-Text. Integrating Sherpa-ONNX for Text-to-Speech…

whisper-matrix-plugin 14 ★

When an audio message is received, the bot downloads the audio file, converts it to a numpy array, loads the…

chat-any-file 13 ★

ChatAnyFile is a powerful full-stack application that allows you to interact with your PDF documents, images…

YouTube-AI-Assistant 13 ★

Develop a python application that allows you to extract valuable insights, engage in meaningful…

Reaper-MCP 13 ★

AI-powered music production in REAPER via the Model Context Protocol — 163 tools for composition, MIDI, FX…

eShopLite-RealtimeAudio 13 ★

eShopLite - Semantic Search is a reference .NET application implementing an eCommerce site with Search…

multi-modal-agent-ts 12 ★

TypeScript multimodal AI agent: GPT-4o / Claude / Gemini + Whisper + Ollama (LLaVA). REST API, streaming…

SpeakingAI 12 ★

SpeakingAI is a demo of privately deployable 'GPT-4o like AI + RAG', a fully functional web AI server with…

Image2AudioStoryConverter 11 ★

Convert images into captivating audio stories using image-to-text, language models, and text-to-speech…

ATRI 10 ★

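ATRI is a local-first AI agent architecture and music workstation, providing a multi-model LLM runtime, closed-loop tool calling, context compression, parallel sub-agent scheduling, and MCP/Skills extensions. It also integrates a Web DAW, MIDI…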

ai-voice-agents 10 ★

AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧

docsifer 10 ★

Docsifer is a powerful tool for converting various data formats into Markdown for applications such as…

CrewAI-Youtube-AI-Agents 10 ★

▶️ Video Fact Finder for YouTube, using CrewAI agents and Perplexity to verify facts.

genblaze 10 ★

Genblaze is an open source Python SDK for orchestrating generative AI media pipelines across video, audio…

whatsapp-chat-viewer 10 ★

Generate a WhatsApp-style HTML page from an exported chat, with support for images, videos, audio, PDFs, and…

Article2Audio 10 ★

Convert articles to audio using OpenAI's Text to Speech API via a python script or web app

OmniBot 10 ★

Set of abstraction libraries to easily build Text and Audio based bots

journalling 10 ★

An audio journaling app that provides AI analysis for your journal entries

AI-DJ-Mixing-System 9 ★

An AI-powered DJ mixing system that crafts custom mixes from your local MP3s using natural language prompts…

whatsapp-ai-agent-sample-for-aws-agentcore 9 ★

A multichannel multimodal AI agent deployed on Amazon Bedrock AgentCore Runtime with Amazon Bedrock AgentCore…

TheraBot 9 ★

TheraBot is a SaaS-based AI mental health companion that offers empathetic chatbot support, therapist…

speckit-preset-fiction-book-writing 9 ★

Fiction-book-writing preset for AI GitHub Spec Kit. Stackable, priority-ordered collections of template and…
