Discovers that text embedding models have severely biased output distributions, and exploits this to find universal adversarial suffixes ("magic words") that bypass embedding-based LLM safeguards. Attacks transfer across models and languages; a train-free debiasing defense is also proposed
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.