A command-line tool for red-teaming and vulnerability scanning of large language models (LLMs).
A curated list of practical resources for responsible machine learning, covering interpretability, governance, safety, and ethics.
Clean PyTorch implementations of imitation and reward learning algorithms for reinforcement learning.
A centralized repository summarizing practical and proposed defenses against prompt injection attacks on large language models.
A curated list of resources for understanding, detecting, and mitigating prompt injection attacks against machine learning models.
A curated collection of resources on adversarial examples in deep learning, covering attacks, defenses, and applications.
A comprehensive survey and unified safety framework for embodied AI, covering 400+ papers on risks, attacks, and defenses across perception, cognition, planning, interaction, and agentic systems.
An open-source prompt guard model that detects prompt injection attacks while mitigating over-defense against benign inputs.