Question 1

How do I download and use FLIP datasets for model training?

Accepted Answer

The processed FASTA files are hosted in an external repository at http://data.bioembeddings.com/public/FLIP/fasta/. For the train/test sets, use the repository's splits folder and follow the README instructions.
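As a rough illustration of working with a downloaded FASTA file, the sketch below groups records into train and test sets. It assumes each header carries `KEY=VALUE` attributes such as `TARGET=` and `SET=`; that layout is an assumption, not something stated above, so check a real file before relying on it. The helper names and the toy records are hypothetical.

```python
import io
from collections import defaultdict


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header):
    """Split a header into an ID and KEY=VALUE attributes (assumed layout)."""
    seq_id, *fields = header.split()
    attrs = dict(field.split("=", 1) for field in fields if "=" in field)
    return seq_id, attrs


def group_by_set(handle):
    """Group (id, sequence, target) records by their SET attribute."""
    groups = defaultdict(list)
    for header, sequence in parse_fasta(handle):
        seq_id, attrs = parse_header(header)
        target = float(attrs["TARGET"])
        groups[attrs.get("SET", "unknown")].append((seq_id, sequence, target))
    return dict(groups)


# Toy records standing in for a downloaded file; this is not real FLIP data.
example = io.StringIO(
    ">Seq1 TARGET=0.5 SET=train VALIDATION=False\n"
    "MKTAYIAK\n"
    ">Seq2 TARGET=1.2 SET=test VALIDATION=False\n"
    "MKTAYI\n"
    "AKQR\n"
)
splits = group_by_set(example)
print({name: len(records) for name, records in splits.items()})
```

To use it on a real file, replace the `io.StringIO` object with `open(path)` for a file you downloaded, and adjust the attribute names if the headers differ.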

Question 2

What do the green, orange, and red semaphore labels mean in FLIP splits?

Accepted Answer

The semaphore indicates split suitability: green splits are active for performance evaluation, orange splits may overestimate performance and should be used with caution, and red splits are obsolete and not recommended.
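One way to apply these labels in an evaluation script is to keep a status per split and report results only on the allowed ones. The sketch below is only an illustration: the split names and their labels are hypothetical placeholders, and the real status of each split has to be read from the README.

```python
from enum import Enum


class Semaphore(Enum):
    GREEN = "green"    # active; suitable for performance evaluation
    ORANGE = "orange"  # may overestimate performance; use with caution
    RED = "red"        # obsolete; not recommended


# Hypothetical split names and labels, for illustration only.
split_status = {
    "split_a": Semaphore.GREEN,
    "split_b": Semaphore.ORANGE,
    "split_c": Semaphore.RED,
    "split_d": Semaphore.GREEN,
}


def splits_for_reporting(status_by_split, allow_orange=False):
    """Return the names of splits that are acceptable for reporting results."""
    allowed = {Semaphore.GREEN}
    if allow_orange:
        allowed.add(Semaphore.ORANGE)
    return sorted(
        name for name, status in status_by_split.items() if status in allowed
    )


print(splits_for_reporting(split_status))
print(splits_for_reporting(split_status, allow_orange=True))
```

Keeping orange splits behind an explicit flag makes it harder to report a potentially inflated number by accident.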

Question 3

FLIP vs ProteinNet for protein sequence benchmarks?

Accepted Answer

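FLIP focuses on protein design tasks, with biologically curated splits and a semaphore system that guides which splits to use. ProteinNet is a standardized dataset for protein structure prediction from sequence, so it addresses a different problem. For evaluating models on design tasks, FLIP is the more specialized choice.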

Question 4

How do I add my own datasets to FLIP for benchmarking?

Accepted Answer

The README does not describe a contribution process. You could adapt the collect_splits notebooks or fork the repository, but neither route is documented, so expect it to take significant effort.

Question 5

Is FLIP suitable for evaluating transformer models on protein sequences?

Accepted Answer

Yes. FLIP's standardized benchmarks are designed to assess a range of ML models, including transformers, on protein design tasks such as stability or function prediction.
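For regression-style tasks, a model's predictions on a test split are often scored with a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. Using this metric is an assumption here rather than something the answer above specifies, and the numbers are made-up toy values, not benchmark results.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(predictions, targets):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predictions) != len(targets) or len(predictions) < 2:
        raise ValueError("need two equal-length inputs with at least 2 items")
    rx, ry = average_ranks(predictions), average_ranks(targets)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one input is constant
    return cov / math.sqrt(var_x * var_y)


# Toy values standing in for model outputs and measured targets.
targets = [0.1, 0.4, 0.35, 0.8]
predictions = [0.2, 0.3, 0.5, 0.9]
rho = spearman(predictions, targets)
print(round(rho, 3))
```

A rank-based score depends only on how the model orders the sequences, not on the scale of its outputs, which suits comparing models with different output ranges.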

Question 6

Can I use FLIP for commercial protein engineering projects?

Accepted Answer

FLIP itself is open source, but it aggregates datasets from external sources, and those may carry their own usage restrictions. Check the license of each dataset in its source repository before using it commercially.

FLIP (Fitness Landscape Inference for Proteins)
