Attention Tracker: Detecting Prompt Injection Attacks in LLMs

0 stars · 0 forks · 0 contributors

Overview

NAACL 2025 Findings paper that detects prompt injection by tracking shifts in the model's attention distribution. The method requires no modification to the underlying model, so it can be deployed as a wrapper around any LLM whose attention weights are accessible.
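The idea of tracking attention shifts can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of heads, the toy attention values, and the 0.5 threshold are all assumptions made for illustration. It assumes you already have, for each attention head, the attention weights from the final prompt token to every earlier token, and that you know which token positions hold the original instruction. The score is the average attention mass that the chosen heads place on the instruction; a low score suggests attention has drifted toward other text, such as an injected command.

```python
from typing import Sequence, Tuple


def focus_score(
    attn_from_last_token: Sequence[Sequence[float]],
    instruction_span: Tuple[int, int],
    heads: Sequence[int],
) -> float:
    """Average, over the chosen heads, of the attention mass that the
    last token places on the instruction tokens.

    attn_from_last_token[h][t] is the (hypothetical, precomputed) attention
    weight from the last token to token t in head h.
    instruction_span is a half-open [start, end) range of token positions.
    """
    if not heads:
        raise ValueError("at least one head is required")
    start, end = instruction_span
    per_head = [sum(attn_from_last_token[h][start:end]) for h in heads]
    return sum(per_head) / len(per_head)


def looks_injected(
    attn_from_last_token: Sequence[Sequence[float]],
    instruction_span: Tuple[int, int],
    heads: Sequence[int],
    threshold: float = 0.5,  # assumed value; would need tuning per model
) -> bool:
    """Flag the input when attention on the instruction falls below the threshold."""
    return focus_score(attn_from_last_token, instruction_span, heads) < threshold


# Toy data: 2 heads, 5 tokens; tokens 0..2 are the instruction, 3..4 are user data.
BENIGN = [
    [0.30, 0.30, 0.20, 0.10, 0.10],
    [0.40, 0.20, 0.20, 0.10, 0.10],
]
INJECTED = [
    [0.05, 0.05, 0.10, 0.40, 0.40],
    [0.10, 0.10, 0.00, 0.30, 0.50],
]

if __name__ == "__main__":
    span, heads = (0, 3), [0, 1]
    print("benign flagged:", looks_injected(BENIGN, span, heads))
    print("injected flagged:", looks_injected(INJECTED, span, heads))
```

In practice the attention rows would come from a forward pass of the wrapped model, and both the set of heads and the threshold would have to be chosen empirically for that model.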

Related Projects



Included in: Prompt Injection (453)

Safety in Embodied AI: Risks, Attacks, and Defenses

Survey of risks, attacks, and defenses in embodied AI, covering 500+ papers across perception, cognition, planning, interaction, and agentic systems.

Stars: 118 · Forks: 3 · Last commit: 1 day ago

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Paper on indirect prompt injection attacks against applications that integrate Large Language Models (LLMs). It identifies significant security risks, including remote data theft and ecosystem contamination, in both real-world and synthetic applications.

Stars: 0 · Forks: 0

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Oct 2025 paper that systematically breaks 12 published defenses using gradient descent, reinforcement learning, random search, and human-guided exploration. Most of the defenses originally claimed near-zero attack success rates; the adaptive attacks exceeded 90% against most of them.

Stars: 0 · Forks: 0

The Landscape of Prompt Injection Threats in LLM Agents (SoK)

Feb 2026 systematization-of-knowledge paper with a unified taxonomy covering attack payload strategies (heuristic vs. optimisation-based) and defense intervention stages (text, model, execution). Introduces the AgentPI benchmark for context-dependent agent tasks, a setting that prior benchmarks ignored.

Stars: 0 · Forks: 0