AI Safety agents
This page lists every AI agent in the MeshKore directory tagged with the AI Safety capability. Agents are sourced from public sources (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile, which details its capabilities, framework, language, freshness, and source attribution.
99 agents in this capability · ranked by popularity
Top 99 AI Safety agents
AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability…
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination…
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents…
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception, Cognition…
Production-ready Python framework for AI agents with built-in guardrails, audit logging, cost tracking, and…
Production-ready agentic AI framework. High-performance, lightweight, simple. Built-in safety, memory, and 4…
Your AI agent just burned $200. AgentGuard stops it at $5. Runtime cost guardrails for AI agents — budget…
Approve AI agent actions from your iPhone or Apple Watch
AI agent security plugin for OpenClaw: prompt injection detection, PII sanitization, and monitoring dashboard
Runtime security middleware for LLM agents — prompt injection, tool misuse, and memory poisoning defense
Atbash safety guard for LangChain DynamicStructuredTool
Atbash safety guard for LangChain DynamicStructuredTool
Atbash safety guard and audit nodes for LangGraph workflows
Atbash safety judge plugin for AutoGen-style multi-agent orchestration
Atbash safety judge plugin for AutoGen-style multi-agent orchestration
Agentic governance layer for Claude Code — policy enforcement, hook-based safety gates, audit logging, and…
HELM governance adapter for CrewAI JavaScript/TypeScript — task and tool governance
Verra AI governance SDK: detection pipeline (PII, jailbreak, prompt injection, policy violation) with…
Constitutional AI Governance Framework — Asimov's cLaws with HMAC-SHA256 integrity verification, memory…
Atbash safety guard and audit nodes for LangGraph workflows
Authensor guardrail adapter for LangChain/LangGraph
AI agent memory governance MCP server — preflight validation before every action. Works with Claude Desktop…
Agentic Control Plane governance for CrewAI agents. Wrap any tool with @governed; ACP decides…
Agentic Control Plane governance for LangChain / LangGraph agents. Wrap any tool with @governed; ACP decides…
Runtime classifier for screening AI agent actions as safe, harmful, or unethical.
Python SDK for Agent Control - protect your AI agents with controls
MCP server for AI agent safety — cost guards, injection scanning, decision tracing, agent identity (KYA), and…
One-line safety middleware for AI agent APIs. Prompt injection scanning, cost budgets, decision audit trails…
Authorization framework for AI agent tool calls. Your AI agent needs a login screen — AgentLock is that login…
MCP Server for Claude Desktop - Agent OS kernel primitives including code safety verification, CMVK…
Mathematical drift detection library for calculating drift/hallucination scores between outputs
Security assessment framework for AI agents — adversarial test runner + server-side audit + scoring
Prompt injection & tool call security middleware for agentic LLM systems
A dotfile-driven firewall that protects the OS from destructive LLM agent tool calls
AI Action Firewall — seven-stage Decision Intelligence Core for safe agentic AI
AIR Trust Layer for CrewAI — audit trails, data tokenization, consent gates, and injection detection
AIR Trust Layer for LangChain — audit trails, Gate policy enforcement, consent gates, and injection detection
AIR Trust Layer for OpenAI Python SDK — audit trails, PII detection, injection scanning, and HMAC-SHA256…
Production-grade LLM observability. G-ARVIS scoring for Groundedness, Accuracy, Reliability, Variance…
KYA (Know Your Agent) identity verification for Microsoft AutoGen agents
Hybrid security + TDD validation for Claude Code with automatic test result capture using Google Gemini
EYDII Verify tools and guardrails for CrewAI — verify every agent action before execution
Forge Verify + Execute tools and guardrails for CrewAI — verify agent actions and track executions with…
KYA (Know Your Agent) identity verification for DSPy modules
LangChain integration for Blindfold PII detection and protection
LangChain tools for RecourseOS - evaluate consequences before destructive actions
EYDII Verify tools for LlamaIndex — verify every agent action before execution
Forge Verify + Execute tools for LlamaIndex — verify agent actions and track executions with cryptographic…
LlamaIndex tools for RecourseOS - evaluate consequences before destructive actions
Garak red-teaming evaluation adapter for eval-hub
Security testing toolkit for LLM-based systems
Cognitive Security Middleware - The 'Electronic Stability Program' (ESP) for Large Language Models…
Runtime monitoring SDK for AI applications — detect prompt injections and adversarial attacks in production.
Lightweight taint tracking for LLM pipelines — label secrets at entry, block them at unsafe sinks
Protect OpenAI and Anthropic API calls from prompt injection, jailbreaks, and data-extraction attacks.
Open-source prompt injection firewall, hallucination blocker, and agent memory layer for any LLM app
Official Python client for Open AI Guardrails policy distribution, audit evidence, and OPA control-plane APIs.
ThoughtProof Protocol — CrewAI integration for multi-model adversarial verification
Production-ready LLM security firewall powered by Groq
EYDII Verify tools and middleware for Pydantic AI — verify every agent action before execution
Forge Verify tools and middleware for Pydantic AI — verify every agent action before execution
Production-ready guardrails for Pydantic AI with native integration patterns
Quilr Guardrails Integration for LiteLLM
Enterprise-grade data poisoning detection & alerting for RAG systems
Security middleware for RAG pipelines — detect adversarial hallucination attacks before they reach your LLM.
Shadow-Sandbox DB Layer: let AI agents modify your database safely with tenant isolation, Pydantic…
MCP server exposing the SaferAgenticAI safety framework (canonical criteria + Implementation Patterns layer)…
SCBE agent-bus: Python surface over the SCBE governed event runner. Routes AI/human/AI events through the…
Governance gate for LangChain agents. Powered by Sentinel AI — pauses risky actions for human approval, logs…
LLM sanitization SDK — DOMPurify, but for LLM context windows.
SWARM: System-Wide Assessment of Risk in Multi-agent systems - A Distributional AGI Safety framework
Enterprise-grade LLM security framework with 40+ scanners and programmable guardrails
Security scanning and monitoring for LlamaIndex applications - part of Weave Protocol
LLM Confidence Fragility Analyzer — Measure how fragile your AI's confidence really is
TypeScript reference implementation of the guardian-agent spec: a runtime supervisor for tool-using LLM…
Causal intent monitoring for LangGraph agents using bundled Structural Final Models.
Atbash safety judge exposed as a standalone MCP server
Atbash safety judge exposed as a standalone MCP server
Privacy-preserving audit framework for multi-agent AI systems. Detects cross-agent data leaks, inference…
PHANTASM: Invert LLM hallucination, confabulation, and uncertainty into productive features.
Arc Gate runtime governance for CrewAI agents
Constitutional Governance Kernel for AI Agents — trust scoring, approvals, audit trail
Zero-dependency LLM hallucination detection middleware with billing & dashboard — real-time fact-checking…
Policy-as-code guardrail enforcement for enterprise LLM applications
A2A TrustGate CLI — Safety, compliance, and governance for AI agents. Screen every action before it executes.
Action-time proof and delegation verification for MCP agents
Input and output guardrails middleware for Vercel AI SDK.
Harness for measuring LLM agent resistance to indirect prompt injection and comparing defense effectiveness.
Anthropic Claude SDK with OmegaEngine governance - AI safety and compliance
OmegaEngine governance integration for autogen-omega
OmegaEngine governance integration for cohere-omega
OmegaEngine governance integration for crewai-omega
OmegaEngine governance integration for dspy-omega
Google Gemini SDK with OmegaEngine governance
LangChain integration with OmegaEngine governance - callbacks, tools, and safety chains
OmegaEngine governance integration for langgraph-omega
LlamaIndex integration with OmegaEngine governance - RAG safety and compliance
OmegaEngine governance integration for mistral-omega