Agent Guardrails

Guardrails are rules that control what your agent can and cannot discuss. They're your safety net: they block harmful topics, redirect off-topic conversations, and ensure sensitive issues get human attention.

Block: Stop responses on forbidden topics

Deflect: Redirect to appropriate resources

Escalate: Hand off to human operators

Find guardrails in your agent's Rules tab. Click + Add Restriction to create a new rule.

Restriction Fields

Name: Internal label (e.g., "Block competitor mentions", "Escalate pricing questions")

Match Type: How to detect the trigger: Keyword, Regex, or Semantic

Pattern / Keywords: The text or pattern to match against user messages

Action: What happens when triggered: Block, Deflect, Warn, or Escalate

Custom Response: The message shown to users when this restriction triggers

Priority: Order of evaluation (lower number = higher priority). The first matching rule wins.
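To make the fields above concrete, here is a minimal sketch of one restriction as a Python data structure. The class and attribute names (Restriction, MatchType, Action, match_type, and so on) are hypothetical and chosen only for illustration; the product itself is configured through the Rules tab, and its internal representation is not documented here.

```python
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    KEYWORD = "keyword"
    REGEX = "regex"
    SEMANTIC = "semantic"


class Action(Enum):
    BLOCK = "block"
    DEFLECT = "deflect"
    WARN = "warn"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Restriction:
    """Hypothetical in-memory model of one row in the Rules tab."""
    name: str              # internal label
    match_type: MatchType  # how the trigger is detected
    pattern: str           # keywords, regex, or concept text
    action: Action         # what happens when triggered
    custom_response: str   # message shown to the user
    priority: int          # lower number = evaluated first


competitor_rule = Restriction(
    name="Block competitor mentions",
    match_type=MatchType.KEYWORD,
    pattern="competitor1, competitor2, alternative to",
    action=Action.BLOCK,
    custom_response="I can only discuss our products. Visit our comparison page for details.",
    priority=1,
)
```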

Choose how the system detects when a restriction should trigger.

Keyword: Case-insensitive word matching. Triggers when any keyword appears in the user's message.
Best for: Simple word blocklists, common phrases

Regex: Regular expression pattern matching. Powerful but requires regex knowledge.
Best for: Complex patterns, email/phone detection, multi-word variations

Semantic: Meaning-based matching using AI embeddings. Catches rephrasing and synonyms.
Best for: Broad topics, intent detection, sophisticated evasion attempts

Pro Tip: Semantic matching is slower but catches more variations. Use Keyword for exact terms and Semantic for broad topics.
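The difference between Keyword and Regex matching can be sketched in a few lines of Python. This is an approximation, not the product's implementation: keyword_match and regex_match are hypothetical helpers, whole-word matching for keywords is an assumption based on the description above, and case-insensitive regex evaluation is also an assumption. Semantic matching is left out because it depends on an embedding model.

```python
import re


def keyword_match(message: str, keywords: list[str]) -> bool:
    """True if any keyword appears as a whole word or phrase, ignoring case.

    Assumption: keywords are matched on word boundaries.
    """
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # ignore empty entries such as a trailing comma
        if re.search(r"\b" + re.escape(kw) + r"\b", message, re.IGNORECASE):
            return True
    return False


def regex_match(message: str, pattern: str) -> bool:
    """True if the pattern is found anywhere in the message.

    Assumption: patterns are evaluated case-insensitively.
    """
    return re.search(pattern, message, re.IGNORECASE) is not None


keywords = ["competitor1", "competitor2", "alternative to"]
print(keyword_match("Is there an ALTERNATIVE TO your product?", keywords))
print(keyword_match("I want my money back", ["refund"]))
print(regex_match("What is my SSN?", r"\b(ssn|password)\b"))
```

The second call illustrates why keyword lists miss rephrasing: no listed word appears in the message, so only a meaning-based match could catch it.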

What happens when a restriction matches. Choose based on severity.

Block: Completely prevents the agent from responding. Shows your custom message instead. The AI never sees the question.
Use for: Competitor names, explicit content, legal red lines

Deflect: Redirects the conversation. Shows a helpful message pointing elsewhere. Softer than Block: it acknowledges the topic but redirects.
Use for: Off-topic questions, topics handled by other teams

Warn: Logs a warning but allows the response. Use for monitoring without disruption. Good for gathering data on what topics users ask about.
Use for: Analytics, A/B testing restrictions, soft monitoring

Escalate: Immediately creates an escalation for human review. The user is informed that a human will follow up. The conversation continues in the Operations portal.
Use for: Complaints, legal issues, high-value sales inquiries
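The four actions differ mainly in what the user sees and whether the model is called at all. The sketch below captures that in a hypothetical apply_action function. Outcome, apply_action, and generate_reply are illustrative names, not product APIs. The text above states only that Block bypasses the AI; treating Deflect and Escalate the same way is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    reply: str                    # text shown to the user
    model_called: bool            # whether the AI saw the message
    escalated: bool = False       # whether a human-review entry was created
    logged_warning: bool = False  # whether a warning was recorded


def apply_action(action: str, custom_response: str, message: str,
                 generate_reply: Callable[[str], str]) -> Outcome:
    """Hypothetical dispatch for a triggered restriction."""
    if action == "warn":
        # Warn only logs; the normal response still goes out.
        return Outcome(generate_reply(message), model_called=True, logged_warning=True)
    if action == "escalate":
        # Assumption: the custom response is shown while a human takes over.
        return Outcome(custom_response, model_called=False, escalated=True)
    if action in ("block", "deflect"):
        # The custom response replaces the model's answer.
        return Outcome(custom_response, model_called=False)
    raise ValueError(f"unknown action: {action}")


def fake_model(message: str) -> str:
    # Stand-in for the real agent, used only for this example.
    return "model reply"


print(apply_action("block", "I can only discuss our products.", "hi", fake_model))
print(apply_action("warn", "", "how much is it?", fake_model))
```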

When a message matches multiple restrictions, only the highest priority (lowest number) applies. Priority 1 is checked first, then 2, 3, etc.

Priority 1: Block explicit content | Action: Block

Priority 2: Escalate legal questions | Action: Escalate

Priority 3: Deflect competitor comparisons | Action: Deflect

Priority 10: Warn on pricing discussions | Action: Warn

Important: Give Block rules the lowest priority numbers (1-10), Escalate rules the next (11-20), then Deflect (21-50), and Warn last (51+).
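The "first matching rule wins" behavior can be sketched as a sort followed by a linear scan. Rule and first_matching_rule are hypothetical names, and the substring checks stand in for real matchers. How the product breaks ties between rules with equal priority is not stated; this sketch keeps list order in that case.

```python
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    priority: int                   # lower number = checked first
    name: str
    action: str
    matches: Callable[[str], bool]  # stand-in for keyword/regex/semantic matching


def first_matching_rule(message: str, rules: list[Rule]) -> Optional[Rule]:
    """Return the matching rule with the lowest priority number, or None."""
    # sorted() is stable, so rules with equal priority keep their list order.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(message):
            return rule
    return None


rules = [
    Rule(10, "Warn on pricing discussions", "warn", lambda m: "price" in m.lower()),
    Rule(2, "Escalate legal questions", "escalate", lambda m: "lawsuit" in m.lower()),
]

# This message matches both rules; only the priority 2 rule applies.
winner = first_matching_rule("Will a lawsuit affect the price?", rules)
print(winner.name if winner else "no restriction matched")
```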

Block Competitor Mentions
Type: Keyword | Action: Block
Keywords: competitor1, competitor2, alternative to
Response: "I can only discuss our products. Visit our comparison page for details."

Escalate Refund Requests
Type: Semantic | Action: Escalate
Concept: "User is requesting a refund or wants their money back"
Response: "I'll connect you with our billing team who can help with refund requests. Please hold."

Deflect Job Applications
Type: Keyword | Action: Deflect
Keywords: hiring, job, career, resume, apply
Response: "For career opportunities, please visit our careers page at company.com/careers"

Block Personal Information Requests
Type: Regex | Action: Block
Pattern: \b(ssn|social security|credit card|password)\b
Response: "I cannot request or store personal information like passwords or financial details."
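Before saving a regex rule, it can help to try the pattern locally against a few sample messages. The snippet below runs the personal-information pattern from the last example through Python's re module. The re.IGNORECASE flag is an assumption about how the product evaluates patterns, and regex dialects can differ in details.

```python
import re

# Pattern copied from the "Block Personal Information Requests" example.
PATTERN = re.compile(r"\b(ssn|social security|credit card|password)\b", re.IGNORECASE)

samples = [
    "I forgot my password",
    "Here is my Credit Card number",
    "Can you reset my passwords?",  # plural: \b after "password" fails before the "s"
    "My s s n is on file",          # spacing trick: no contiguous "ssn"
]

for text in samples:
    print(bool(PATTERN.search(text)), text)
```

The word boundaries keep the pattern from firing inside longer words, but they also mean plurals such as "passwords" slip through. A variant like passwords? would cover that case.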

Use the Chat Test tab to verify your guardrails work correctly.

Testing Checklist

Try exact keywords: does Block trigger immediately?

Try synonyms and rephrasing: does Semantic matching catch them?

Check that custom responses are clear and helpful

Verify that Escalate creates an entry in the Operations portal

Test priority ordering with messages that match multiple rules

Pro Tip: Test adversarial inputs, because users may try to get around restrictions. Try misspellings, spacing tricks, and indirect phrasing.
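One way to prepare adversarial test messages is to generate variants of a keyword and see which ones a plain keyword match would miss, then paste those into the Chat Test tab. The sketch below is a local approximation only: keyword_hit, spaced, and drop_char are hypothetical helpers, and the real matcher may behave differently.

```python
import re


def keyword_hit(message: str, keyword: str) -> bool:
    """Approximation of a case-insensitive whole-word keyword rule."""
    return re.search(r"\b" + re.escape(keyword) + r"\b", message, re.IGNORECASE) is not None


def spaced(word: str) -> str:
    """Spacing trick: 'refund' -> 'r e f u n d'."""
    return " ".join(word)


def drop_char(word: str, index: int) -> str:
    """Simple misspelling: remove one character."""
    return word[:index] + word[index + 1:]


keyword = "refund"
variants = {
    "exact": "I want a refund",
    "uppercase": "I want a REFUND",
    "spacing trick": f"I want a {spaced(keyword)}",
    "misspelling": f"I want a {drop_char(keyword, 2)}",
    "indirect phrasing": "I want my money back",
}

# Variants the keyword rule would not catch; try these in the Chat Test tab.
missed = [label for label, text in variants.items() if not keyword_hit(text, keyword)]
print(missed)
```

Variants that land in the missed list are candidates for a Regex or Semantic rule instead.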

Start with essentials: Begin with 3-5 critical restrictions. Too many rules slow down responses and create false positives. Add more based on real usage patterns.

Use Semantic for broad topics: Keyword matching misses variations. A keyword rule for "want a refund" won't match "get my money back." Semantic matching understands meaning, so it catches more variations.

Write helpful responses: Don't just block; guide users. Instead of "I can't discuss that," say "For pricing details, visit our pricing page at..."

Review Operations data: Check the Knowledge Gaps and flagged conversations in the Operations portal. This reveals where you need new restrictions or adjustments.

Prefer Escalate over Block for gray areas: If you're unsure, escalate to humans. Blocking frustrates users when they have legitimate questions. Let your team decide.
