New macOS malware fakes errors to trick AI security tools

Summary
– “Gaslight” malware hides prompt injection strings and fake debugging data to confuse AI-assisted malware analysis tools.
– The malware, a Rust binary with backdoor and info-stealing functions, is attributed with high confidence to a North Korean threat actor.
– A 3.5 KB payload contains 38 fake system messages, like crash reports and SQL alerts, designed to appear as legitimate analysis data.
– The fake errors aim to make LLM-assisted triage agents abort, truncate, or refuse analysis by attacking the agent’s perception, not the sandbox.
– The technique is experimental, with no demonstration of successful bypass, but shows threat actors testing anti-analysis methods for AI security platforms.

A newly identified macOS backdoor, which researchers have named Gaslight, is attempting to manipulate AI-assisted malware analysis tools by embedding deceptive strings and fabricated system errors directly into its code.
As cybersecurity professionals increasingly adopt AI-powered reverse engineering and triage platforms, threat actors are adapting their tactics. This specific malware targets the perception of those automated systems rather than the sandbox environment itself.
The binary, written in Rust, contains standard backdoor and information-stealing capabilities, and researchers attribute it with high confidence to a North Korean-linked threat actor. However, its standout feature is a compact 3.5 KB payload that houses 38 fake “system” messages.
These fabricated messages are designed to mimic developer logs, crash reports, debugging output, and program alerts. They use Markdown formatting and template-style placeholders to appear as legitimate analysis data. Examples include fabricated memory dumps, token-expiration warnings, Redis connection failures, build-pipeline errors, and SQL injection alerts, all unrelated to the malware’s actual behavior.
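For analysts who want to spot this kind of content before it reaches an AI tool, the two traits described above (Markdown formatting plus template-style placeholders) can be checked with a simple heuristic. The following is a minimal sketch, not SentinelOne's method. The placeholder syntaxes, the regular expressions, and the function names are all assumptions made for illustration, since the report does not say which placeholder style the sample uses.

```python
import re

# Assumed placeholder styles ({{name}}, {name}, %(name)s); the source does
# not specify which syntax the malware's strings actually use.
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}|\{\w+\}|%\(\w+\)s")

# Common Markdown cues: headings, bold spans, fenced code, blockquotes.
MARKDOWN = re.compile(
    r"^\s{0,3}#{1,6}\s|\*\*[^*]+\*\*|```|^\s*>\s", re.MULTILINE
)


def looks_like_templated_markdown(text: str) -> bool:
    """Return True when a string has both a Markdown cue and a placeholder."""
    return bool(MARKDOWN.search(text)) and bool(PLACEHOLDER.search(text))


def flag_strings(strings):
    """Keep only the extracted strings that match the heuristic."""
    return [s for s in strings if looks_like_templated_markdown(s)]
```

A compiled binary rarely has a reason to carry Markdown-formatted, templated log text, so strings returned by `flag_strings` would be candidates for manual review rather than proof of prompt injection.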
According to security firm SentinelOne, the goal is not to evade execution inside a sandbox, but to confuse the AI systems that read these strings during automated analysis. “Its most notable feature is an embedded cascade of fabricated system-failure messages, designed to make an LLM-assisted triage agent doubt its own session,” the researchers explained. “It attacks the agent’s perception, rather than the sandbox it runs in. Accordingly, we dub this family macOS.Gaslight.”
The researchers describe these strings as prompt injection content, crafted to push an LLM agent into aborting, truncating, or refusing to continue analyzing the sample. “The scaffold contains fake system messages about token expiry, out-of-memory kills, disk exhaustion, and repeated operation failures,” they added. “It also plants bogus warnings about injection vulnerabilities and static-analysis flags.”
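One way a triage pipeline might guard against this is to check whether a sample's strings cover many unrelated failure themes at once, since a single binary that mentions token expiry, memory kills, and disk exhaustion together is unusual. The sketch below assumes the strings have already been extracted from the sample. The keyword patterns, theme names, and the `min_themes` threshold are hypothetical choices for illustration and are not taken from the SentinelOne report.

```python
import re

# Hypothetical keyword patterns, one per theme named in the quoted description.
THEMES = {
    "token_expiry": re.compile(r"token\s+(has\s+)?expir", re.I),
    "out_of_memory": re.compile(r"out[\s-]of[\s-]memory|\bOOM\b", re.I),
    "disk_exhaustion": re.compile(r"no space left|disk (is )?full", re.I),
    "repeated_failure": re.compile(r"(operation|retry|attempt)\w*\s+failed", re.I),
    "injection_warning": re.compile(r"injection", re.I),
}


def themes_hit(strings):
    """Return the sorted theme names matched by at least one string."""
    hit = set()
    for s in strings:
        for name, pattern in THEMES.items():
            if pattern.search(s):
                hit.add(name)
    return sorted(hit)


def is_suspicious(strings, min_themes=3):
    """Flag a sample whose strings span several unrelated failure themes.

    The threshold of 3 is an arbitrary assumption, not a tuned value.
    """
    return len(themes_hit(strings)) >= min_themes
```

A flagged sample could then be routed to a human analyst, or its matching strings could be withheld from the LLM's context, so that the fabricated errors are treated as data about the sample rather than as messages about the analysis session.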
While SentinelOne did not demonstrate that this technique could successfully bypass AI malware analysis platforms in practice, the findings indicate that threat actors are actively experimenting with anti-analysis methods designed specifically to evade AI-assisted security tools.
(Source: BleepingComputer)




