If you're building with AI agents in 2025, you've probably connected them to something: GitHub repos, Slack workspaces, databases, internal tools. That's the whole point—agents are useful because they can access and act on your data.
Here's the problem: The protocols that make agents useful also make them dangerous. And the security model most founders assume exists doesn't actually exist yet.
The MCP Problem
Model Context Protocol (MCP) has become the standard way to connect AI agents to external tools and data sources. It's elegant, flexible, and almost entirely trusting. When your AI agent connects to an MCP server, it's essentially saying "I'll do whatever this server tells me to do with whatever access I have."
Security researchers at Invariant Labs demonstrated this vulnerability in a way that should concern every founder using AI tools: A malicious public GitHub issue—just text in a public repo—could hijack an AI assistant and make it exfiltrate data from private repositories back to a public one.
Read that again. Someone creates a GitHub issue with carefully crafted text. Your AI agent, configured to help you with GitHub tasks, reads the issue. The prompt injection in that issue makes the agent copy your private code to a public location. No malware. No exploits. Just instructions that your agent helpfully follows.
Why This Isn't Just a GitHub Problem
The GitHub example is vivid, but the vulnerability pattern is everywhere MCP servers exist:
Slack integrations. An agent with access to your Slack workspace can read and post messages. A malicious message in a public channel could instruct it to search for and forward sensitive information.
Database connectors. An agent connected to your database through MCP can run queries. A prompt injection could make it dump your customer data somewhere it shouldn't go.
File system access. Many agents can read and write files. A malicious document could instruct the agent to exfiltrate whatever it finds.
The common thread: AI agents trust their inputs in ways that traditional software doesn't. A SQL database won't run DROP TABLE because you wrote it in a comment. An AI agent might—because to the agent, instructions are instructions.
The Founder's Dilemma
This creates an uncomfortable choice for founders building AI-native products or using AI tools internally:
Option A: Restrict agent capabilities. Limit what your agents can access, require human approval for sensitive actions, sandbox their environments. This is safer but defeats the purpose of having autonomous agents. If every action needs approval, you just have a chatbot with extra steps.
Option B: Accept the risk. Give agents the access they need to be useful and hope you don't encounter malicious inputs. This is how most early adopters are operating. It's also how breaches happen.
Option C: Build defensive architecture. Implement the security layers that MCP and similar protocols don't provide natively. This requires engineering investment most startups don't want to make, but it's the only path that preserves both functionality and security.
What Defensive Architecture Looks Like
If you're serious about using AI agents without creating a backdoor into your systems, here's what you need to think about:
Input sanitization for AI contexts. Traditional input validation doesn't work—you can't strip dangerous characters from natural language without breaking functionality. Instead, you need systems that analyze inputs for potential prompt injections before they reach your agent. This is an emerging field, but tools exist.
Least-privilege access, enforced mechanically. Don't rely on the agent to limit itself. Configure your MCP servers and tool integrations so agents can only access what they absolutely need. If your coding assistant only needs to read code, don't give it write access. If it only needs certain repos, don't give it organization-wide access.
Output monitoring and anomaly detection. Even with good input controls, you need to watch what your agents actually do. An agent suddenly trying to access resources it's never touched before, or sending data to new destinations, should trigger alerts.
Human-in-the-loop for sensitive operations. Yes, this adds friction. But certain operations—deleting data, accessing credentials, communicating externally—should require human confirmation. The inconvenience is worth it.
Audit logging that assumes compromise. Log everything your agents do, assuming you'll need to reconstruct a breach after the fact. This won't prevent attacks, but it limits damage and enables recovery.
What Vendors Won't Tell You
Most AI tool vendors are in growth mode. Their incentive is to minimize friction and maximize adoption. Security concerns slow both down.
When you evaluate AI tools and agent frameworks, assume the security model is your responsibility unless proven otherwise. Ask specific questions:
- How do you prevent prompt injection in tool inputs?
- What access controls exist at the protocol level, not just in your application?
- How are credentials and secrets managed when agents need them?
- What audit capabilities exist for agent actions?
- If a malicious input reached my agent, what's the blast radius?
Vague answers or references to "responsible AI principles" aren't security. Concrete architectural decisions are.
The Regulatory Horizon
If you're thinking "this is mainly a technical problem," consider the regulatory trajectory. The EU AI Act creates liability for AI system providers and deployers. California's AI legislation is expanding. The FTC has signaled interest in AI-related consumer protection.
Today, an AI agent leaking data is an embarrassing security incident. Tomorrow, it may be a regulatory violation with statutory penalties. Building security into your AI architecture now is cheaper than retrofitting it under regulatory pressure later.
Practical Steps for Founders
Inventory your AI tool connections. What agents do you have? What can they access? Most founders don't actually know the full scope of their AI attack surface.
Test your own systems. Try some basic prompt injections against your agents. Create documents or messages with instructions embedded in them and see what happens. You might be surprised.
Segment sensitive data. Don't connect AI agents to systems containing your most sensitive data until you have appropriate controls. This might mean running two environments—one AI-enabled and one isolated.
Choose vendors with security depth. As you evaluate AI tools, prioritize those with concrete security architectures over those with the sleekest demos. The demo won't save you in a breach.
Build security into your AI roadmap. If you're building AI-powered features, treat security as a first-class concern, not a follow-up. The architecture decisions you make now determine how hard security will be to add later.
The Bottom Line
AI agents are genuinely useful. They're also genuinely dangerous in ways we're only beginning to understand. The protocols we've built to make them powerful also make them vulnerable—and the security tools to protect them are still catching up.
As a founder, you don't have the luxury of waiting for the ecosystem to mature. You need to ship products and run operations. But you also can't afford to ignore risks that could expose your customers' data or your company's secrets.
The path forward is informed caution: Use AI agents where they add value, understand the risks you're accepting, and build defensive depth where it matters most. The founders who get this balance right will build the AI-powered companies that actually survive contact with adversaries.
The ones who don't will be the cautionary tales we learn from.