A CLI and library for evaluating, red-teaming, and comparing LLM prompts, agents, and RAGs with simple declarative configs.
Promptfoo is a CLI and library for evaluating and red-teaming LLM applications. It helps developers test prompts, agents, and RAG (retrieval-augmented generation) pipelines, compare AI models side by side, and find security vulnerabilities through automated evaluations. It lets teams ship secure, reliable AI applications by replacing trial and error with systematic testing.
AI developers, ML engineers, and product teams building LLM-powered applications who need to ensure prompt reliability, model performance, and application security.
Developers choose Promptfoo because it provides comprehensive LLM testing while keeping evaluations local and private. Its production track record, flexible configuration, and CI/CD integration make it a practical choice for production AI applications.
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.
Evaluations run locally, and Promptfoo does not route prompts through its own servers, which helps preserve data security and privacy, as the README emphasizes. Prompts still reach whichever model providers you configure.
Works with any LLM API or programming language via declarative configs, making it adaptable to diverse tech stacks without vendor lock-in.
Includes live reload and caching for faster iteration, reflecting the project's stated focus on developer experience.
Battle-tested in production: powers LLM apps serving 10M+ users, which speaks to its reliability and scalability in real-world deployments.
Setting up comprehensive test suites requires learning YAML/JSON configs and custom metrics, which can be complex for quick prototyping.
Comparing hosted models requires an API key for each external provider, which adds setup overhead and a security risk if keys are mishandled.
Primarily CLI-driven with a basic web viewer; lacks a full graphical interface for teams preferring drag-and-drop or visual testing tools.