How Stilla handles the "Lethal Trifecta"
AI agents that combine access to private data, exposure to untrusted content, and the ability to communicate externally are vulnerable to prompt injection attacks — where malicious instructions hidden in content trick the agent into leaking data or taking unauthorized actions. Simon Willison calls this combination the "lethal trifecta".
Stilla has all three capabilities, and that is precisely what makes it useful: it reads your emails, summarizes your documents, and acts on your behalf. It also means we need strong safeguards.
No AI system can fully prevent prompt injection or its effects today. Anyone who claims otherwise is overstating the state of the art. What we can do is constrain the blast radius so that even if an LLM follows a malicious instruction, the damage it can do is limited.
Here's how:
Mitigations
Change proposals and human-in-the-loop
In interactive chats, every write Stilla wants to make is surfaced as a change proposal that you review and accept before it takes effect. Agents (automations) can auto-accept change proposals within their configured restrictions.
Certain high-risk actions always require manual acceptance, even for agents:
- Emails to recipients outside your organization
- Messages to externally shared Slack channels (Slack Connect)
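The acceptance rules above can be sketched as a small decision function. This is an illustrative sketch, not Stilla's actual API: the names `ChangeProposal`, `requires_manual_acceptance`, and the action strings are assumptions.

```python
from dataclasses import dataclass

# Hypothetical action identifiers for the two always-manual cases described above.
ALWAYS_MANUAL = {"email.send_external", "slack.post_connect_channel"}

@dataclass
class ChangeProposal:
    action: str            # e.g. "email.send_external" (illustrative identifier)
    agent_initiated: bool  # True when proposed by an automation, not a chat
    agent_can_auto_accept: bool = False

def requires_manual_acceptance(p: ChangeProposal) -> bool:
    # High-risk actions are never auto-accepted, regardless of agent settings.
    if p.action in ALWAYS_MANUAL:
        return True
    # Interactive chats always go through human review.
    if not p.agent_initiated:
        return True
    # Agents may auto-accept only within their configured restrictions.
    return not p.agent_can_auto_accept
```

The key property is that the always-manual set is checked first, so no agent configuration can downgrade an external email or Slack Connect message to auto-accept.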
Agent restrictions
When you create an agent (an automation triggered by events or schedules), you control exactly what it can access and do:
- Data access: Limit which integrations and resources the agent can read
- Write actions: Restrict which actions the agent can execute, down to specific channels, recipients, or repositories
- Network access: Control whether the agent can make web requests, and to which domains
Restrictions are enforced at runtime — the agent cannot bypass them regardless of what instructions it receives.
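A minimal sketch of what "enforced at runtime" means in practice: restrictions live in a declarative config checked by the execution layer, outside the model, so no prompt can talk the agent past them. Field names and action strings here are assumptions, not Stilla's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRestrictions:
    readable_integrations: set[str] = field(default_factory=set)  # data access
    allowed_actions: set[str] = field(default_factory=set)        # write actions
    allowed_domains: set[str] = field(default_factory=set)        # empty = no web

def enforce_action(restrictions: AgentRestrictions, action: str) -> None:
    # The check runs in the execution layer, not the LLM, so a malicious
    # instruction cannot bypass it; disallowed actions simply fail.
    if action not in restrictions.allowed_actions:
        raise PermissionError(f"action not permitted for this agent: {action}")

# Example: a triage agent that reads email and posts to one Slack channel.
triage = AgentRestrictions(
    readable_integrations={"gmail"},
    allowed_actions={"slack.post:#support"},
)
```

Calling `enforce_action(triage, "slack.post:#support")` succeeds, while `enforce_action(triage, "github.merge_pr")` raises `PermissionError` no matter what the agent was instructed to do.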
Network policies
Organizations can control web access at the org level:
- No access: Block all external web requests
- Trusted sources only: Allow requests to a curated allowlist (customizable per org)
- Full access: Allow all web requests
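The three modes reduce to a simple per-request check. A sketch under assumed names (`NetworkPolicy`, `is_request_allowed`); the real enforcement point and identifiers may differ.

```python
from enum import Enum

class NetworkPolicy(Enum):
    NO_ACCESS = "no_access"        # block all external web requests
    TRUSTED_ONLY = "trusted_only"  # curated allowlist, customizable per org
    FULL_ACCESS = "full_access"    # allow all web requests

def is_request_allowed(policy: NetworkPolicy, domain: str,
                       allowlist: set[str]) -> bool:
    if policy is NetworkPolicy.NO_ACCESS:
        return False
    if policy is NetworkPolicy.TRUSTED_ONLY:
        return domain in allowlist
    return True  # FULL_ACCESS
```

Under "Trusted sources only", an injected instruction to exfiltrate data to an attacker's domain fails unless that domain is on the org's allowlist.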
Connection management
Organization admins can disable specific integrations entirely, preventing users from connecting tools that access data too sensitive for AI processing.
MCP server safeguards
Third-party MCP servers can span all three legs of the trifecta — reading private data, introducing untrusted content, and communicating externally. Stilla routes MCP write operations through change proposals, and MCP connections are subject to the same restriction and network policies as native integrations.
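One way to picture the routing: MCP tool calls are classified, and anything that writes is wrapped in a change proposal rather than executing directly. The prefix heuristic below is purely illustrative; it is an assumption about how write operations might be identified, not Stilla's implementation.

```python
# Hypothetical classifier: tool names with these prefixes are treated as writes.
WRITE_PREFIXES = ("create_", "update_", "delete_", "send_", "post_")

def route_mcp_call(tool_name: str) -> str:
    # Reads execute directly (still subject to data-access restrictions);
    # writes become change proposals awaiting acceptance.
    if tool_name.startswith(WRITE_PREFIXES):
        return "change_proposal"
    return "execute"
```

So a third-party server's `send_message` tool lands in the same review queue as a native integration's outbound email, while `list_issues` runs directly.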
Our approach
We follow the principle articulated in the Design Patterns for Securing LLM Agents against Prompt Injections paper: "once an LLM agent has ingested untrusted input, it must be constrained so that it is impossible for that input to trigger any consequential actions."
We can't prevent the LLM from reading a malicious instruction. But we can — and do — ensure that consequential actions require explicit human approval, operate within declared restrictions, and respect organizational policies.
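The layered model above can be condensed into one composite gate: a consequential action runs only if it clears human approval, the agent's declared restrictions, and org policy. A deliberately simplified sketch with assumed names; each check stands in for the fuller mechanisms described in the sections above.

```python
def may_execute(action: str,
                human_approved: bool,
                allowed_actions: set[str],
                org_permits: bool) -> bool:
    # Untrusted input can steer what the model *asks* for, but every layer
    # here is enforced outside the model, so asking is not enough.
    return human_approved and action in allowed_actions and org_permits
```

Defeating any single layer is insufficient: an injected instruction would have to survive human review, match a declared restriction, and fall within org policy simultaneously.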
We're continuously improving Stilla's security posture and have several additional guardrails on our roadmap to further reduce risk.
For questions about Stilla's security practices, see our Trust Center, visit our Security page, or contact security@stilla.ai.