Mapping the Future of AI Security

AI security is one of the most pressing challenges facing the world today. Artificial intelligence is extraordinarily powerful, and, especially considering the advent of Agentic AI, growing more so by the day. But it is for this reason that securing it is so important. AI handles massive amounts of data and plays an increasingly important role in operations; should cybercriminals abuse it, the consequences can be dire. 

In this blog, we’ll explore some of the most common and pressing threats to AI, frameworks designed to help secure it, and the intrinsic link between AI and API security. Think of it as your guide to AI security. So, let’s dive in. 

OWASP Top 10 for LLMs

The OWASP Top 10 for Large Language Models (LLMs) is a specialized framework developed by the Open Worldwide Application Security Project (OWASP) to address the unique security challenges posed by LLMs and GenAI tools. 

First released in November 2024, the framework aims to help organizations identify vulnerabilities, implement mitigations, and ensure secure LLM development. Here’s the full top 10.

Prompt Injection

Prompt injection is when malicious actors inject malicious inputs, like “Ignore previous instructions,” into LLMs to manipulate model behavior. When these injected prompts trick the LLM into using external tools, the consequences can be severe, including Server-Side Request Forgery (SSRF), allowing them to access internal systems, or allowing attackers to exfiltrate sensitive data. 

We’ve seen real-world examples of this type of threat. In December 2024, for example, The Guardian reported that OpenAI’s ChatGPT search tool was vulnerable to prompt injection attacks, allowing hidden webpage content to manipulate its responses. Testing showed that invisible text could override negative reviews with artificially positive assessments, potentially misleading users.

Insecure Output Handling

This vulnerability arises out of the assumption that AI-generated content is inherently safe. When an LLM returns raw HTML or JavaScript code directly into a web application, it opens the door to Cross-Site Scripting (XSS). 

Imagine a malicious script, disguised as harmless text, being injected into a webpage. When a user visits that page, their browser unknowingly executes this script, potentially leading to account takeover, data theft, or defacement of the website. It’s a stark reminder that without proper sanitization, seemingly innocent LLM outputs can be dangerous. 

Training Data Poisoning

Training data poisoning is a subtle but dangerous attack where attackers tamper with the data used to train an AI model. For example, an attacker might inject malicious data into GitHub commits that are then used to fine-tune a code recommendation model. The result would be that, instead of outputting secure and helpful code, the compromised model could start recommending wallet scams or backdoored libraries. 

Model Supply Chain Vulnerabilities

Sometimes, LLMs come with hidden threats, akin to software supply chain attacks, where malicious code is embedded within model files. For example, a model hosted on Hugging Face was found to contain a malicious payload that established a reverse shell to a remote server, effectively granting attackers complete control over the victim’s machine. 

Permission Misconfigurations

Overly broad permissions are a serious AI security threat. Imagine if an AI agent inherits administrator-level access to sensitive Human Resources or Finance APIs, even a rudimentary, seemingly harmless prompt like “show me employee salaries” could be all it takes for a malicious actor to exfiltrate highly confidential payroll data.

Overreliance on LLM Output

This is a simple one: when human operators rely too much on LLMs, they are at risk of taking potentially hallucinatory outputs as gospel, leading to compliance issues. 

Excessive Agency

Because of their autonomous capabilities, agents like AutoGPT need robust safeguards, especially when they are configured with the power to delete files or modify critical infrastructure. 

Plugin Abuse & Escalation

By exploiting an LLM’s plugin access, attackers can craft prompts that trick the model into extracting sensitive secrets or issuing arbitrary, unauthorized commands to backend systems. It’s essentially common command injection disguised as natural dialogue. 

Insecure Plugin Design

Insecurely designed plugins, riddled with vulnerabilities like absent or inadequate authentication and authorization controls, act as open doorways to backend infrastructure. Without proper safeguards, these flawed plugins can be exploited to gain unauthorized access to critical systems and the sensitive data they hold. 

Model DoS

As with so many systems, LLMs are susceptible to Denial of Service (DoS) attacks. Maliciously crafted inputs, such as recursive prompt loops that endlessly consume computational resources or the deliberate flooding of the model with massive token inputs, can effectively starve the system of compute power. 

Agentic AI: A (Very) High-Level Overview

At this point, we haven’t really discussed Agentic AI. When you plug LLMs into tools, give them memory, or let them use APIs, they become agents. This changes the security landscape dramatically. Imagine a receipts-processing agent. It accepts PDFs, queries policies from a vector DB, validates claims, and then routes approvals via API. What happens when the PDF prompt manipulates the agent to mark fraudulent expenses as urgent and valid? No human catches it. That’s agentic power misused. 

Here’s a visual walkthrough of how a typical agentic AI system flows — and where attackers strike: 

Common threats in agentic workflows often include: 

Tool Misuse

When agents are given access to tools – for example, shell access, APIs controlling critical systems, or automation platforms – weak safeguards or misaligned objectives can result in misuse. This misuse can include deleting essential files, modifying configurations, issuing unintended financial transactions, or launching network scans, and often results from inadequate validation, ambiguous instructions, or errors in reasoning and goal translation. 

Intent Manipulation

Attackers may craft prompts or input sequences that exploit weaknesses in an agent’s goal-tracking or alignment mechanisms. This manipulation can subtly or overtly shift the agent’s intent away from its original task. For example, an attacker might embed misleading or adversarial instructions in a prompt that cause an assistant to exfiltrate sensitive data, sabotage another task, or elevate its permissions without authorization.

Privilege Compromise

If an agent is provisioned with API tokens or access credentials that grant excessive privileges beyond what is necessary for its current task, a compromise in the agent’s logic or external manipulation could allow abuse. This could include accessing user data it shouldn’t see, modifying infrastructure, or impersonating other services. 

Agent-to-Agent Communication Poisoning

In systems where multiple agents interact, such as decentralized AI agents collaborating on a workflow, an attacker could inject false or manipulated data into the communication stream. If not validated, this misinformation can cascade, causing agents to make poor decisions, fail tasks, or propagate errors throughout the system. 

Securing Agentic AI: The MAESTRO Framework

So, now we understand some of the threats, we can look at how to secure Agentic AI. The MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome) framework is a threat modelling approach, designed by the Cloud Security Alliance, to address the security challenges posed by Agentic AI systems, so that would probably be a good start. 

MAESTRO came into existence when CSA realized that traditional threat modelling frameworks like STRIDE, PASTA, and LINDDUN were not capable of handling the dynamic and autonomous nature of Agentic AI. These frameworks fall short of addressing AI-specific vulnerabilities such as adversarial attacks, data poisoning, and the complexities arising from multi-agent systems. 

The Seven Layers of MAESTRO  

MAESTRO’s structure centers around a seven-layer reference architecture, each representing a critical component of Agentic AI systems. Here’s a table outlining them all, along with the risks and real threats they represent.   

Layer  Risk Real Threat
Foundation Model poisoning Malicious training data leads the model to recommend scam URLs or produce harmful outputs.
Data Operations Embedding drift Outdated or manipulated vector embeddings cause the system to approve irrelevant or harmful content. 
Agent Frameworks Plugin abuse An agent exploits a plugin to access unauthorized files, such as reading confidential secrets.
Deployment and Infrastructure Secrets in logs API keys or sensitive data are inadvertently logged and exposed through cloud monitoring tools.
Evaluation and Observability Log tampering An agent deletes or alters logs to hide evidence of fraudulent or malicious actions.
Security and Compliance Guardrail bypass The agent circumvents established approval policies, executing actions without proper authorization. 
Agent Ecosystem Rogue agent A compromised agent influences or infects other agents, leading to a cascade of malicious behaviors. 

MAESTRO has been effectively applied in various contexts to enhance the security of Agentic AI systems. For example, using MAESTRO, security analysts identified potential risks associated with API interactions, ensuring robust protection against misuse. 

 How Wallarm API Security Can Help 

As we’ve seen, Agentic AI introduces new attack surfaces – and many of them converge at the API layer. APIs are the nervous system of agentic workflows: They enable tool use, connect LLMs to databases and apps, and automate actions. But that power also makes them prime targets. The MAESTRO helps us think about this systematically. Each of its layers reveals how AI and API security are tightly interlinked. 

  • Foundation: Poisoned models may call APIs in unsafe ways or under false pretenses, producing fraudulent or harmful outputs.
  • Data Operations: Embedding drift might cause agents to approve or deny API requests based on stale or manipulated representations.
  • Agent Frameworks: Plugins often function as wrappers for APIs – if an agent abuses plugin access, it’s essentially performing unauthorized API calls.
  • Infrastructure: Logging secrets or API tokens exposes backend services to attackers — a direct API security issue.
  • Evaluation: If an agent tampers with logs, API abuse may go undetected, undermining audit trails and response.
  • Compliance: When agents bypass approval policies and trigger unauthorized APIs, compliance risks escalate.
  • Ecosystem: Rogue agents can hijack APIs across environments, spreading attacks via lateral movement.

So, how does Wallarm fit in? Wallarm offers a multi-faceted approach to securing AI-driven environments: 

  • Prevent Injection Attacks and Data Leakage: Wallarm detects and blocks prompt injection attempts, preventing unauthorized access and potential breaches. 
  • Safeguard Critical Enterprise Systems: Wallarm restricts AI agents to approved APIs and monitors their interactions to protect enterprise systems from misuse and unauthorized access.
  • Control Operational Costs: By monitoring API usage in real time, Wallarm helps detect and mitigate abusive behaviors that could lead to unexpected costs. 
  • Ensure Secure and Compliant Operations: Wallarm offers tools to enforce compliance policies, monitor sensitive data flows, and maintain the integrity of AI operations. 

Want to find out more about how Wallarm protects Agentic AI? Click here.

The post Mapping the Future of AI Security appeared first on Wallarm.