A framework and open-source registry for evaluating large language models (LLMs) and LLM systems.
OpenAI Evals is a framework and open-source registry for evaluating large language models (LLMs) and LLM-based systems. It provides tools to run existing benchmarks and create custom evaluations, helping developers assess model performance and understand how different versions affect their specific use cases. The framework supports both public benchmarks and private evaluations using proprietary data.
AI researchers, machine learning engineers, and developers building applications with LLMs who need to systematically evaluate model performance, compare versions, or create custom benchmarks for their workflows.
Developers choose OpenAI Evals because it offers a standardized, extensible framework from OpenAI itself, integrates directly with the OpenAI API, and provides a registry of vetted benchmarks alongside tools for creating private, data-secure evaluations without extensive coding.
Connects directly to the OpenAI API with built-in key management and cost tracking; authentication is configured through the OPENAI_API_KEY environment variable, as described in the setup instructions.
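As a minimal sketch of that configuration step, the check below verifies that the API key is present in the environment before an eval run is attempted (the `oaieval` command referenced in the comment is the framework's CLI entry point per its README; the key value shown is a placeholder):

```python
import os

def api_key_configured() -> bool:
    """Check whether the OPENAI_API_KEY environment variable is set and non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY"))

# Evals reads the key from the environment; set it in your shell first, e.g.
#   export OPENAI_API_KEY=<your key>   (placeholder, not a real key)
# and then run an eval with the oaieval CLI, e.g. `oaieval gpt-3.5-turbo test-match`.
if not api_key_configured():
    print("OPENAI_API_KEY is not set; eval runs will fail to authenticate.")
```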
Provides a curated registry of benchmarks whose data is fetched via Git-LFS, offering vetted evaluations across a range of model capabilities so teams don't have to start from scratch.
Enables building evaluations from YAML templates without writing custom code, making the framework accessible to prompt engineers, as described in the FAQ and eval-templates.md.
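As a sketch of what such a template-based eval looks like, a registry entry pairs an eval name with a templated eval class and a samples file. The eval name and samples path below are hypothetical; the `Match` class path follows the pattern shown in the repository's documentation for basic eval templates:

```yaml
# Hypothetical registry entry for a basic exact-match eval (names are illustrative).
arithmetic:
  id: arithmetic.dev.v0
  metrics: [accuracy]

arithmetic.dev.v0:
  # Match is one of the basic eval templates shipped with the framework.
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: arithmetic/samples.jsonl
```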
Supports building evals with proprietary datasets without exposing them publicly, crucial for enterprises handling sensitive information in their workflows.
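A private eval's dataset is just a local JSONL file of samples. The sketch below writes one in the chat format that the basic eval templates consume; the `input`/`ideal` field names follow the repository's build-eval documentation, while the sample content and file path are illustrative:

```python
import json
from pathlib import Path

# Each line of a samples file is one JSON object: an "input" chat transcript
# plus an "ideal" reference answer (field names per the build-eval docs).
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single number."},
            {"role": "user", "content": "What is 2 + 2?"},
        ],
        "ideal": "4",
    },
]

def write_samples(path: Path, rows: list) -> None:
    """Write one JSON object per line (JSONL), as the eval templates expect."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

write_samples(Path("samples.jsonl"), samples)
```

Because such files live in your own private registry directory rather than in the public repository, proprietary data never has to be published.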
Primarily designed for OpenAI models, which limits its utility for teams using other LLM providers or open-source alternatives, as the OpenAI API key dependency suggests.
Requires Git-LFS for data fetching, Python 3.9+, and multiple environment variables, adding complexity to initial configuration, as noted in the download and setup sections.
The project is currently not accepting contributed evals that contain custom code, which frustrates advanced users who want to share complex evaluation logic, as stated in the writing-evals documentation.
Running evals incurs OpenAI API costs, and there are known issues such as runs hanging at the end, impacting both efficiency and budget, as mentioned in the FAQ.