[object Object]

Open Source: Garak

NVIDIA-backed Garak (garak.ai) is the de facto starting point. Python framework for LLM red-teaming with 100+ built-in probes covering prompt injection, data leakage, jailbreaks, encoding attacks, package hallucination, and toxicity. Plug-in architecture for custom probes. Reports in HTML or JSONL.

python -m pip install garak
python -m garak --model_type openai --model_name gpt-5 \
  --probes promptinject,dan,leakreplay

Limitations: probes are public, so any model trained after late 2024 has likely seen them and trivially defeats them. Useful for regression testing, weak as a final safety check.

Microsoft PyRIT

PyRIT (Python Risk Identification Toolkit, github.com/Azure/PyRIT) — Microsoft’s open-source red-team framework. Strongest for Azure OpenAI scenarios but provider-agnostic. Particular value in multi-turn attack orchestration: PyRIT’s Orchestrator class chains probes across conversation turns, simulating a human attacker who pivots based on responses.

Hits hardest on RAG systems and tool-using agents — Microsoft Research published PyRIT-driven discoveries of tool-poisoning paths in production Copilot Studio agents through late 2025. Steeper learning curve than Garak, more capability ceiling.

NVIDIA NeMo Guardrails Eval

NeMo Guardrails (release 0.13, March 2026) ships an eval mode that pits the policy layer against an attacker LLM. Not a full red-team tool but useful for testing your guardrail config without standing up a separate framework. Pairs naturally with Garak for input-side probes.

Commercial Services

Specialized red-team firms — HiddenLayer, Robust Intelligence (now Cisco), Lakera, Promptfoo Enterprise, and the human-only services from HackerOne and Bugcrowd — offer engagements for high-stakes deployments. Typical pre-launch engagement runs $40K–$150K and 2–4 weeks. Output: a prioritized findings report, suggested mitigations, and (with the better firms) a reusable test harness handed back to your team.

Human creativity still beats automated tools on novel attack vectors. The 2025 OWASP LLM Top 10 lists three categories (excessive agency, sensitive info disclosure via tool output, and supply-chain attacks on prompts) where automated tools have under 30% coverage. Combine both.

What to Test Specifically for CRM Agents

CRM agents have unusual surface area:

  • Cross-tenant data leakage when an agent’s tool calls into a multi-tenant API.
  • SOQL/SQL injection through user prompts that reach a query-building tool.
  • IDOR via record IDs leaked in responses then re-used in follow-up turns.
  • Email/SMS exfiltration through “send a confirmation to…” prompts.
  • Privilege escalation through chained tools (read tool reveals an admin user ID, write tool then targets it).
  • Knowledge-base poisoning if any agent has write access to its own RAG corpus.

Cadence

Pre-production red team before launch — non-negotiable for any agent with write permissions or PII access. Quarterly for customer-facing agents. After every major model update (and “minor” updates from frontier vendors are not minor in behavior space), prompt changes, or tool additions. Post-incident: when something breaks in prod, replay it through the red-team harness and add a regression probe.

One-time red-team isn’t enough — it’s a discipline, not an event. Budget 5–10% of agent operational spend on continuous adversarial testing.

What to Do This Week

Run Garak’s promptinject and leakreplay probes against your current prod agent. If you find anything new, you have a process problem, not just a finding.

[object Object]
Share