How Stilla handles the "Lethal Trifecta"
AI agents that combine access to private data, exposure to untrusted content, and the ability to communicate externally are vulnerable to prompt injection attacks — where malicious instructions hidden in content trick the agent into leaking data or taking unauthorized actions. Simon Willison calls this combination the "lethal trifecta".
Stilla has all three capabilities, and that is precisely what makes it useful: it reads your emails, summarizes your documents, and acts on your behalf. It also means we need strong safeguards.
No AI system can fully prevent prompt injection or its effects today. Anyone who claims otherwise is overstating the state of the art. What we can do is constrain the blast radius so that even if an LLM follows a malicious instruction, the damage it can do is limited.
Here's how:
Mitigations
Change proposals and human-in-the-loop
In interactive chats, every write Stilla wants to make is surfaced as a change proposal that you review and accept before it takes effect. Agents (automations) can auto-accept change proposals within their configured restrictions.
Certain high-risk actions always require manual acceptance, even for agents:
- Emails to recipients outside your organization
- Messages to externally shared Slack channels (Slack Connect)
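The acceptance rules above can be sketched as a small decision function. This is an illustrative sketch, not Stilla's actual API: the names `ChangeProposal`, `requires_manual_acceptance`, and the action strings are assumptions.

```python
from dataclasses import dataclass

# Hypothetical action identifiers for the two always-manual cases described above.
ALWAYS_MANUAL = {"email.send_external", "slack.post_connect_channel"}

@dataclass
class ChangeProposal:
    action: str            # e.g. "email.send_external" (illustrative identifier)
    agent_initiated: bool  # True when proposed by an automation, not a chat
    agent_can_auto_accept: bool = False

def requires_manual_acceptance(p: ChangeProposal) -> bool:
    # High-risk actions are never auto-accepted, regardless of agent settings.
    if p.action in ALWAYS_MANUAL:
        return True
    # Interactive chats always go through human review.
    if not p.agent_initiated:
        return True
    # Agents may auto-accept only within their configured restrictions.
    return not p.agent_can_auto_accept
```

The key property is that the always-manual set is checked first, so no agent configuration can downgrade an external email or Slack Connect message to auto-accept.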
Agent restrictions
When you create an agent (an automation triggered by events or schedules), you control exactly what it can access and do:
- Data access: Limit which integrations and resources the agent can read
- Write actions: Restrict which actions the agent can execute, down to specific channels, recipients, or repositories
- Network access: Control whether the agent can make web requests, and to which domains
Restrictions are enforced at runtime — the agent cannot bypass them regardless of what instructions it receives.
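A minimal sketch of what "enforced at runtime" means in practice: restrictions live in a declarative config checked by the execution layer, outside the model, so no prompt can talk the agent past them. Field names and action strings here are assumptions, not Stilla's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRestrictions:
    readable_integrations: set[str] = field(default_factory=set)  # data access
    allowed_actions: set[str] = field(default_factory=set)        # write actions
    allowed_domains: set[str] = field(default_factory=set)        # empty = no web

def enforce_action(restrictions: AgentRestrictions, action: str) -> None:
    # The check runs in the execution layer, not the LLM, so a malicious
    # instruction cannot bypass it; disallowed actions simply fail.
    if action not in restrictions.allowed_actions:
        raise PermissionError(f"action not permitted for this agent: {action}")

# Example: a triage agent that reads email and posts to one Slack channel.
triage = AgentRestrictions(
    readable_integrations={"gmail"},
    allowed_actions={"slack.post:#support"},
)
```

Calling `enforce_action(triage, "slack.post:#support")` succeeds, while `enforce_action(triage, "github.merge_pr")` raises `PermissionError` no matter what the agent was instructed to do.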
Network policies
Organizations can control web access at the org level:
- No access: Block all external web requests
- Trusted sources only: Allow requests to a curated allowlist (customizable per org)
- Full access: Allow all web requests
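The three modes reduce to a simple per-request check. A sketch under assumed names (`NetworkPolicy`, `is_request_allowed`); the real enforcement point and identifiers may differ.

```python
from enum import Enum

class NetworkPolicy(Enum):
    NO_ACCESS = "no_access"        # block all external web requests
    TRUSTED_ONLY = "trusted_only"  # curated allowlist, customizable per org
    FULL_ACCESS = "full_access"    # allow all web requests

def is_request_allowed(policy: NetworkPolicy, domain: str,
                       allowlist: set[str]) -> bool:
    if policy is NetworkPolicy.NO_ACCESS:
        return False
    if policy is NetworkPolicy.TRUSTED_ONLY:
        return domain in allowlist
    return True  # FULL_ACCESS
```

Under "Trusted sources only", an injected instruction to exfiltrate data to an attacker's domain fails unless that domain is on the org's allowlist.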
Connection management
Organization admins can disable specific integrations entirely, preventing users from connecting tools that access data too sensitive for AI processing.
MCP server safeguards
Third-party MCP servers can span all three legs of the trifecta — reading private data, introducing untrusted content, and communicating externally. Stilla routes MCP write operations through change proposals, and MCP connections are subject to the same restriction and network policies as native integrations.
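One way to picture the routing: MCP tool calls are classified, and anything that writes is wrapped in a change proposal rather than executing directly. The prefix heuristic below is purely illustrative; it is an assumption about how write operations might be identified, not Stilla's implementation.

```python
# Hypothetical classifier: tool names with these prefixes are treated as writes.
WRITE_PREFIXES = ("create_", "update_", "delete_", "send_", "post_")

def route_mcp_call(tool_name: str) -> str:
    # Reads execute directly (still subject to data-access restrictions);
    # writes become change proposals awaiting acceptance.
    if tool_name.startswith(WRITE_PREFIXES):
        return "change_proposal"
    return "execute"
```

So a third-party server's `send_message` tool lands in the same review queue as a native integration's outbound email, while `list_issues` runs directly.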
Our approach
We follow the principle articulated in the Design Patterns for Securing LLM Agents against Prompt Injections paper: "once an LLM agent has ingested untrusted input, it must be constrained so that it is impossible for that input to trigger any consequential actions."
We can't prevent the LLM from reading a malicious instruction. But we can — and do — ensure that consequential actions require explicit human approval, operate within declared restrictions, and respect organizational policies.
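The layered model above can be condensed into one composite gate: a consequential action runs only if it clears human approval, the agent's declared restrictions, and org policy. A deliberately simplified sketch with assumed names; each check stands in for the fuller mechanisms described in the sections above.

```python
def may_execute(action: str,
                human_approved: bool,
                allowed_actions: set[str],
                org_permits: bool) -> bool:
    # Untrusted input can steer what the model *asks* for, but every layer
    # here is enforced outside the model, so asking is not enough.
    return human_approved and action in allowed_actions and org_permits
```

Defeating any single layer is insufficient: an injected instruction would have to survive human review, match a declared restriction, and fall within org policy simultaneously.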
We're continuously improving Stilla's security posture and have several additional guardrails on our roadmap to further reduce risk.
For questions about Stilla's security practices, see our Trust Center, visit our Security page, or contact security@stilla.ai.