Agent Guardrails

Keep your AI agents on-topic and safe. Set up restrictions to block sensitive topics, deflect off-topic questions, and escalate to humans when needed.

Guardrails are rules that control what your agent can and cannot discuss. They're your safety net: blocking harmful topics, redirecting off-topic conversations, and ensuring sensitive issues get human attention.

Block: Stop responses on forbidden topics
Deflect: Redirect to appropriate resources
Escalate: Hand off to human operators

Find guardrails in your agent's Rules tab. Click + Add Restriction to create a new rule.

Restriction Fields

Name: Internal label (e.g., "Block competitor mentions", "Escalate pricing questions")
Match Type: How the trigger is detected (Keyword, Regex, or Semantic)
Pattern / Keywords: The text or pattern to match against user messages
Action: What happens when the rule triggers (Block, Deflect, Warn, or Escalate)
Custom Response: The message shown to users when this restriction triggers
Priority: Order of evaluation (lower number = higher priority). The first matching rule wins.
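If you like to draft rules before entering them in the Rules tab, the six fields above map naturally onto a small record type. The sketch below is a hypothetical local representation: the Restriction, MatchType, and Action names are illustrative and not the product's API, and treating the keyword field as a comma-separated list is an assumption based on how the examples on this page are written.

```python
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    KEYWORD = "keyword"
    REGEX = "regex"
    SEMANTIC = "semantic"


class Action(Enum):
    BLOCK = "block"
    DEFLECT = "deflect"
    WARN = "warn"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Restriction:
    name: str              # internal label; not shown to users
    match_type: MatchType  # how the trigger is detected
    pattern: str           # comma-separated keywords, a regex, or a concept description
    action: Action         # what happens when the rule triggers
    custom_response: str   # message shown to the user when the rule triggers
    priority: int          # lower number = evaluated earlier

    def keywords(self) -> list[str]:
        """For Keyword rules (assumed comma-separated): split into trimmed terms."""
        return [term.strip() for term in self.pattern.split(",") if term.strip()]


# The competitor-mentions example further down this page, with a hypothetical priority.
rule = Restriction(
    name="Block competitor mentions",
    match_type=MatchType.KEYWORD,
    pattern="competitor1, competitor2, alternative to",
    action=Action.BLOCK,
    custom_response="I can only discuss our products. Visit our comparison page for details.",
    priority=1,
)
```

Using enums for Match Type and Action keeps a draft rule limited to the options the Rules tab offers.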

Match Types

Choose how the system detects when a restriction should trigger.

Keyword: Case-insensitive word matching. Triggers when any keyword appears in the user's message.
Best for: Simple word blocklists, common phrases
Regex: Regular expression pattern matching. Powerful, but requires regex knowledge.
Best for: Complex patterns, email/phone detection, multi-word variations
Semantic: Meaning-based matching using AI embeddings. Catches rephrasing and synonyms.
Best for: Broad topics, intent detection, sophisticated evasion attempts
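Keyword and Regex matching are plain text checks. The sketch below shows one plausible way they behave. Whole-word keyword matching and case-insensitive regexes are assumptions (the page only states that Keyword matching is case-insensitive), and keyword_match and regex_match are hypothetical helpers, not product functions.

```python
import re


def keyword_match(message: str, keywords: list[str]) -> bool:
    """Case-insensitive: True if any keyword or phrase appears as whole words."""
    for kw in keywords:
        # re.escape treats the keyword literally; \b anchors it at word boundaries.
        if re.search(r"\b" + re.escape(kw) + r"\b", message, re.IGNORECASE):
            return True
    return False


def regex_match(message: str, pattern: str, ignore_case: bool = True) -> bool:
    """True if the regular expression matches anywhere in the message."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, message, flags) is not None


print(keyword_match("Is this an ALTERNATIVE TO Acme?", ["competitor1", "alternative to"]))
print(keyword_match("Any jobs open?", ["job"]))
```

Under whole-word matching, "job" does not match "jobs", so either list the forms you care about or use a Regex such as \bjobs?\b. Check how your agent actually behaves in the Chat Test tab.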
Pro Tip: Semantic matching is slower but catches more variations. Use Keyword for exact terms and Semantic for broad topics.
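Semantic matching compares meaning rather than text. A common approach, assumed here because the page does not describe the product's embedding model or threshold, is to embed both the message and the concept description as vectors and trigger when their cosine similarity passes a threshold. Embedding every incoming message is an extra step, which is one reason this kind of matching is typically slower than a keyword scan. The 3-dimensional vectors and the 0.8 threshold below are hypothetical stand-ins for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_match(message_vec: list[float], concept_vec: list[float],
                   threshold: float = 0.8) -> bool:
    """Trigger when the message embedding is close enough to the concept embedding."""
    return cosine_similarity(message_vec, concept_vec) >= threshold


# Toy stand-ins for real embeddings (hypothetical values).
concept = [0.9, 0.1, 0.0]     # "User is requesting a refund or wants their money back"
money_back = [0.8, 0.2, 0.1]  # "Can I get my money back?"
hours = [0.0, 0.1, 0.9]       # "What are your opening hours?"
```

Here "Can I get my money back?" scores about 0.98 against the refund concept and triggers, while the opening-hours question scores about 0.01 and does not, even though neither message shares a keyword with the concept.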

Actions

The action determines what happens when a restriction matches. Choose it based on severity.

Block: Completely prevents the agent from responding and shows your custom message instead. The AI never sees the question.
Use for: Competitor names, explicit content, legal red lines
Deflect: Redirects the conversation with a helpful message pointing elsewhere. Softer than Block: it acknowledges the topic but redirects.
Use for: Off-topic questions, topics handled by other teams
Warn: Logs a warning but allows the response, so you can monitor without disruption. Good for gathering data on what topics users ask about.
Use for: Analytics, A/B testing restrictions, soft monitoring
Escalate: Immediately creates an escalation for human review. The user is informed that a human will follow up, and the conversation continues in the Operations portal.
Use for: Complaints, legal issues, high-value sales inquiries
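The four actions differ mainly in whether the agent's AI sees the message and what gets recorded. The hypothetical dispatcher below models that. Block and Warn follow the descriptions above; whether Deflect and Escalate also skip the AI is an assumption, marked in the comments. apply_action, Outcome, and fake_model are illustrative names, not the product's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    reply: str          # what the user sees
    model_called: bool  # whether the agent's AI saw the message
    warnings: list[str] = field(default_factory=list)
    escalations: list[str] = field(default_factory=list)


def apply_action(action: str, rule_name: str, custom_response: str,
                 message: str, call_model: Callable[[str], str]) -> Outcome:
    """Hypothetical dispatch for the four actions."""
    if action == "block":
        # Documented: the AI never sees the question; only the custom message is shown.
        return Outcome(reply=custom_response, model_called=False)
    if action == "deflect":
        # Assumed: the redirect message replaces the agent's answer.
        return Outcome(reply=custom_response, model_called=False)
    if action == "warn":
        # Documented: log a warning, then let the agent answer normally.
        return Outcome(reply=call_model(message), model_called=True,
                       warnings=[f"{rule_name}: {message}"])
    if action == "escalate":
        # Documented: create an escalation and tell the user a human will follow up.
        # Assumed: the agent does not also answer.
        return Outcome(reply=custom_response, model_called=False,
                       escalations=[f"{rule_name}: {message}"])
    raise ValueError(f"unknown action: {action}")


def fake_model(message: str) -> str:
    """Stand-in for the agent's normal AI response."""
    return "(agent's normal answer)"
```

Only Warn lets the normal answer through, which is why it suits monitoring: users see no change while the warnings list fills with data.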

Priority

When a message matches multiple restrictions, only the highest-priority one (lowest number) applies. Priority 1 is checked first, then 2, 3, and so on.

Priority 1: Block explicit content (Action: Block)
Priority 2: Escalate legal questions (Action: Escalate)
Priority 3: Deflect competitor comparisons (Action: Deflect)
Priority 10: Warn on pricing discussions (Action: Warn)
Important: As a rule, put Block rules at the lowest priority numbers (1-10), Escalate rules next (11-20), then Deflect (21-50), and Warn last (51+).
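First-match-wins evaluation can be sketched in a few lines: sort the rules by priority number and stop at the first one that matches. The triggers below are simplified, hypothetical substring checks standing in for real Keyword, Regex, or Semantic matching, and Rule and first_match are illustrative names. How the product breaks ties between equal priorities isn't documented; Python's stable sort simply keeps list order.

```python
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    priority: int
    name: str
    action: str
    matches: Callable[[str], bool]  # stands in for Keyword, Regex, or Semantic matching


def first_match(message: str, rules: list[Rule]) -> Optional[Rule]:
    """Check rules from lowest to highest priority number; the first hit wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(message):
            return rule  # later (lower-priority) rules are never evaluated
    return None


# The four rules from the priority example above, listed out of order on purpose:
# sorting by priority, not list position, decides the evaluation order.
rules = [
    Rule(10, "Warn on pricing discussions", "warn", lambda m: "pricing" in m.lower()),
    Rule(3, "Deflect competitor comparisons", "deflect", lambda m: "competitor" in m.lower()),
    Rule(2, "Escalate legal questions", "escalate", lambda m: "lawsuit" in m.lower()),
    Rule(1, "Block explicit content", "block", lambda m: "explicit" in m.lower()),
]
```

A message such as "How does your pricing compare to competitor pricing?" matches both the priority 3 and priority 10 rules, so only the Deflect rule applies and the pricing Warn is never logged.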
Example Restrictions

Block Competitor Mentions (Type: Keyword | Action: Block)
Keywords: competitor1, competitor2, alternative to
Response: "I can only discuss our products. Visit our comparison page for details."
Escalate Refund Requests (Type: Semantic | Action: Escalate)
Concept: "User is requesting a refund or wants their money back"
Response: "I'll connect you with our billing team who can help with refund requests. Please hold."
Deflect Job Applications (Type: Keyword | Action: Deflect)
Keywords: hiring, job, career, resume, apply
Response: "For career opportunities, please visit our careers page at company.com/careers"
Block Personal Information Requests (Type: Regex | Action: Block)
Pattern: \b(ssn|social security|credit card|password)\b
Response: "I cannot request or store personal information like passwords or financial details."
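Running the four examples above through a quick offline check can surface surprises before you test in Chat Test. In this sketch the priorities are hypothetical (the examples don't list them) and follow the suggested bands; whole-word, case-insensitive matching for keywords and regexes is an assumption; and the Semantic rule is skipped because it needs an embedding model.

```python
import re
from typing import Optional

# Hypothetical priorities following the suggested bands:
# Block 1-10, Escalate 11-20, Deflect 21-50.
EXAMPLES = [
    {"name": "Block Competitor Mentions", "type": "keyword", "priority": 1,
     "terms": ["competitor1", "competitor2", "alternative to"]},
    {"name": "Block Personal Information Requests", "type": "regex", "priority": 2,
     "pattern": r"\b(ssn|social security|credit card|password)\b"},
    {"name": "Escalate Refund Requests", "type": "semantic", "priority": 11,
     "concept": "User is requesting a refund or wants their money back"},
    {"name": "Deflect Job Applications", "type": "keyword", "priority": 21,
     "terms": ["hiring", "job", "career", "resume", "apply"]},
]


def triggers(rule: dict, message: str) -> bool:
    if rule["type"] == "keyword":
        # Assumed whole-word, case-insensitive matching.
        return any(re.search(r"\b" + re.escape(term) + r"\b", message, re.IGNORECASE)
                   for term in rule["terms"])
    if rule["type"] == "regex":
        # Assumed case-insensitive; verify in Chat Test.
        return re.search(rule["pattern"], message, re.IGNORECASE) is not None
    # Semantic rules need an embedding model, so this offline sketch skips them.
    return False


def evaluate(message: str) -> Optional[str]:
    """Name of the first triggered restriction in priority order, or None."""
    for rule in sorted(EXAMPLES, key=lambda r: r["priority"]):
        if triggers(rule, message):
            return rule["name"]
    return None
```

Two results stand out. "How do I apply a coupon?" is deflected to the careers page because "apply" is a bare keyword; a phrase such as "apply for" may be a safer choice. And "Reset my passwords" triggers nothing, because the \b after "password" rejects the trailing "s"; if plurals matter, a variant such as \b(ssn|social security|credit cards?|passwords?)\b would catch them.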

Use the Chat Test tab to verify your guardrails work correctly.

Testing Checklist

Try exact keywords: does Block trigger immediately?
Try synonyms and rephrasing: does Semantic matching catch them?
Check that custom responses are clear and helpful
Verify that Escalate creates an entry in the Operations portal
Test priority ordering with messages matching multiple rules
Pro Tip: Test adversarial inputs, since users may try to get around restrictions. Try misspellings, spacing tricks, and indirect phrasing.
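If you keep a copy of a keyword list outside the dashboard, the adversarial checks from the tip above can be scripted. This hypothetical offline check (it does not use the Chat Test tab) runs an exact keyword plus case, spacing, misspelling, and indirect-phrasing variants through an assumed whole-word, case-insensitive keyword matcher and lists the variants it misses. TERMS and CASES are illustrative.

```python
import re

TERMS = ["password"]  # hypothetical keyword list under test


def keyword_hit(message: str) -> bool:
    """Assumed whole-word, case-insensitive keyword matching."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", message, re.IGNORECASE) for t in TERMS)


# (message, should_trigger): every variant asks for the same thing.
CASES = [
    ("What is the admin password?", True),          # exact keyword
    ("What is the admin PASSWORD?", True),          # case change
    ("What is the admin p a s s w o r d?", True),   # spacing trick
    ("What is the admin pasword?", True),           # misspelling
    ("What are the login credentials?", True),      # indirect phrasing
]


def failures() -> list[str]:
    """Messages where the matcher's result differs from the expected result."""
    return [msg for msg, expected in CASES if keyword_hit(msg) != expected]


for msg in failures():
    print("missed:", msg)
```

The exact and case-changed inputs are caught; the spacing trick, misspelling, and indirect phrasing are not. Each miss is a candidate for a Regex or Semantic restriction, which you would then confirm in Chat Test.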
Best Practices

Start with essentials: Begin with 3-5 critical restrictions. Too many rules slow down responses and create false positives. Add more based on real usage patterns.
Use Semantic for broad topics: Keyword matching misses variations; "Want a refund" won't match "get my money back." Semantic matching understands meaning, so it catches more variations.
Write helpful responses: Don't just block; guide users. Instead of "I can't discuss that," say "For pricing details, visit our pricing page at..."
Review Operations data: Check Knowledge Gaps and flagged conversations in the Operations portal to see where you need new restrictions or adjustments.
Prefer Escalate over Block for gray areas: If you're unsure, escalate to a human. Blocking frustrates users who have legitimate questions; let your team decide.