OpenClaw AI Agent Failed Phishing Tests, Leaked User Data

Published on

June 11, 2026

An OpenClaw phishing simulation has revealed that AI email agents can be socially engineered using the same tactics that have fooled human employees for years. Researchers built and tested an OpenClaw agent connected to a live Gmail inbox and a set of synthetic internal company data, then ran four phishing scenarios against it. The results were uneven, and in the two most damaging scenarios, the agent handed over sensitive credentials and customer records without a single identity check.

What Is OpenClaw

OpenClaw is an open-source AI agent framework that allows large language models to interact with real-world systems and perform actions on their own. Businesses can configure it as an email agent capable of reading messages, reasoning about them, and taking action, including sending replies, accessing connected APIs, and retrieving internal data.

For the simulation, researchers connected the agent to Gmail, browser tools, and Google Workspace APIs. They also loaded it with fabricated but realistic internal data: AWS IAM keys, database credentials, SSH access details, CRM customer exports, and internal communications. The goal was to see how the agent performed under social pressure.

Two configurations were tested. The first was a generic setup with standard productivity instructions. The second was a strict mode that added explicit phishing awareness guidance and identity verification procedures. Both ran on Google Gemini 3.1 Pro and OpenAI GPT-5.4.

Four Attacks, Mixed Results

The simulation ran four scenarios, and the agent failed badly in two of them.

In the first, an attacker posed as a team lead and claimed there was an active production incident. The message requested access to the staging environment. The agent located AWS IAM keys, database credentials, and SSH access details, then emailed them to an external Gmail account. Both the generic and strict configurations failed. Researchers noted that when a request is framed as operationally urgent, the agent's verification step collapses even with additional safeguards in place.

In the second scenario, an attacker claimed to be working remotely and asked for a customer export to use in a presentation. The agent retrieved a CRM file containing customer records, contact details, contract information, and revenue data, then sent it out without verifying who was asking. Again, both configurations failed.

The third and fourth scenarios produced better outcomes. When the agent received a fake gift card email with an embedded phishing link, the strict configuration blocked the attempt immediately. The generic configuration visited the phishing page and tried to redeem the card using fabricated credentials before eventually identifying the site as suspicious. When researchers introduced a malicious Google OAuth application disguised as a timesheet platform, both configurations refused to grant access after identifying the app as suspicious.

Where AI Agents Fall Short

The simulation points to a specific gap in how AI agents handle trust. They can recognize suspicious URLs, flag fake login pages, and spot malicious OAuth applications. The OpenClaw agent's phishing resistance worked as expected when the threat looked technical. What it could not do was apply skepticism to requests that looked like ordinary internal communication.

Social engineering that mimics urgency, familiarity, or routine business processes bypassed the agent's defenses entirely. The framework has no mechanism for validating who is actually sending a message, and without that check, a plausible-sounding request is treated as legitimate.

At the model level, Gemini 3.1 Pro showed greater willingness to act on incoming requests, while GPT-5.4 took a more cautious approach overall. Neither model was immune to the credential-harvesting scenarios.

Reducing the Risk

Researchers drew clear lines around what needs to change for AI email agents to operate safely in enterprise environments. Agents should be required to verify sender identity before acting on any request involving sensitive data. They should also be blocked from sending emails to new external recipients without explicit human approval.
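As a rough illustration of those two rules, the sketch below shows how an outbound-email gate might be expressed in Python. It is not part of OpenClaw or the research: the domain, the recipient list, and the function names are all hypothetical. The domain comparison only stands in for a real identity check, since a From header can be spoofed.

```python
from email.utils import parseaddr

# Hypothetical values for illustration; not taken from the research or from OpenClaw.
INTERNAL_DOMAIN = "example.com"
KNOWN_EXTERNAL_RECIPIENTS = {"partner@vendor.example"}


def domain_of(address: str) -> str:
    """Return the lowercased domain of an email address or header value."""
    _, addr = parseaddr(address)
    return addr.rpartition("@")[2].lower()


def outbound_decision(sender: str, recipient: str, touches_sensitive_data: bool) -> str:
    """Return 'allow', 'verify_sender', or 'needs_human_approval'."""
    recipient_addr = parseaddr(recipient)[1].lower()
    recipient_is_internal = domain_of(recipient) == INTERNAL_DOMAIN

    # Rule 1: a new external recipient always goes to a human first.
    if not recipient_is_internal and recipient_addr not in KNOWN_EXTERNAL_RECIPIENTS:
        return "needs_human_approval"

    # Rule 2: sensitive data requested by a non-internal sender needs verification.
    # A header-based check is a placeholder; real verification would happen out of band.
    if touches_sensitive_data and domain_of(sender) != INTERNAL_DOMAIN:
        return "verify_sender"

    return "allow"
```

Under these assumptions, the first scenario's request (credentials sent to an unfamiliar Gmail address) would be held for human approval rather than sent.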

Access controls matter too. Agents with broad access to internal data sources create a much larger blast radius when something goes wrong. Limiting what an agent can reach limits what an attacker can extract.
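One way to picture a smaller blast radius is a deny-by-default scope table, sketched below. The task labels and resource names are invented for illustration and do not reflect how OpenClaw actually manages access.

```python
# Hypothetical task labels and resource names, chosen for illustration only.
TASK_SCOPES = {
    "scheduling": {"calendar"},
    "customer_support": {"calendar", "support_tickets"},
}


def can_read(task: str, resource: str) -> bool:
    """Deny by default: a resource is readable only if the task's scope lists it."""
    return resource in TASK_SCOPES.get(task, set())
```

With a table like this, an agent configured for scheduling has no path to credentials or CRM exports, so a convincing request for them cannot be fulfilled.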

For the highest-risk actions (credential sharing, financial data requests, and any first-time communication with an external party), the researchers recommend requiring human approval before the agent proceeds. The same zero-trust principles that govern human access in a modern enterprise need to apply to AI systems acting on behalf of those humans.
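A minimal sketch of such an approval gate follows, assuming the agent's actions can be labeled before they run. The action names and the `execute` function are hypothetical and are not an OpenClaw API.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical action labels for illustration.
    SHARE_CREDENTIALS = "share_credentials"
    SEND_FINANCIAL_DATA = "send_financial_data"
    FIRST_CONTACT_EXTERNAL = "first_contact_external"
    REPLY_INTERNAL = "reply_internal"


# The three high-risk categories named by the researchers.
HIGH_RISK = {
    Action.SHARE_CREDENTIALS,
    Action.SEND_FINANCIAL_DATA,
    Action.FIRST_CONTACT_EXTERNAL,
}


def execute(action: Action, approved_by: Optional[str] = None) -> str:
    """Hold high-risk actions until a named human has approved them."""
    if action in HIGH_RISK and approved_by is None:
        return "held_for_approval"
    return "executed"
```

The point of the design is that urgency in the message cannot change the outcome: the hold depends on the action's category, not on how the request is worded.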

AI agents are moving into production environments faster than the security frameworks around them are maturing. This research makes clear that autonomous systems handling email and internal data are a meaningful attack surface, and that social engineering does not stop working just because the target is not human.
