A deep learning framework for predicting chromatin profiles and sequence regulatory activities from DNA sequences and variants.
Sei is a deep learning framework that predicts chromatin profiles and sequence regulatory activities from DNA sequences. It transforms any DNA sequence into predictions for 21,907 chromatin profiles and integrates them into 40 interpretable sequence classes, helping researchers understand the regulatory impact of genetic variants and sequences.
Bioinformaticians, computational biologists, and genetics researchers who need to analyze regulatory activities of DNA sequences or predict the functional impact of genetic variants.
Sei offers a comprehensive, interpretable map from sequence to regulatory activity, combining high-resolution chromatin profile predictions with biologically meaningful sequence classes, all within an open-source framework that supports both prediction and model training.
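As an illustration of the "sequence in, predictions out" idea, deep learning sequence models commonly take DNA as a one-hot matrix. The sketch below assumes that representation and an A, C, G, T channel order. Neither detail is stated here, and `one_hot_encode` is a hypothetical helper, not part of Sei.

```python
# Hypothetical helper, not part of Sei: one-hot encode a DNA string.
BASES = "ACGT"  # assumed channel order; not confirmed by this listing


def one_hot_encode(sequence):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in sequence.upper():
        row = [0.0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A model of this kind would then map such a matrix to one score per chromatin profile. That step depends on the trained weights and is not sketched here.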
Code to run Sei and obtain Sei and sequence class predictions.
Predicts 21,907 chromatin profiles, including transcription factor binding and histone marks, across diverse cell types, giving fine-grained coverage for regulatory activity mapping.
Integrates complex predictions into 40 biologically meaningful classes like Promoter and Enhancer, bridging computational outputs with genetic insights as detailed in the manuscript.
Computes variant scores with nucleosome occupancy adjustment, addressing biases in histone mark predictions for more accurate functional impact assessment.
Accepts BED, FASTA, and VCF files, accommodating common genomics workflows and enabling seamless integration with existing pipelines.
Includes configuration files and scripts for training custom models on new datasets, though this requires significant GPU resources as noted in the README.
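To make the VCF input mentioned above concrete, here is a minimal sketch of how a variant record can be turned into a reference and an alternate sequence for comparison. It is not Sei's own parser: `parse_vcf_line`, `ref_alt_sequences`, the toy chromosome, and the `flank` size are all hypothetical. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT) and 1-based positions.

```python
# Hypothetical sketch, not Sei's own parser: read variants from VCF text
# and build reference/alternate sequences around each one.
# Multi-allelic ALT fields (comma-separated) are not handled here.


def parse_vcf_line(line):
    """Parse one VCF data line into (chrom, pos, ref, alt); pos is 1-based."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    return chrom, int(pos), ref, alt


def ref_alt_sequences(chrom_seq, pos, ref, alt, flank):
    """Return (ref_seq, alt_seq) with up to `flank` bases of context per side."""
    start = pos - 1  # convert the 1-based VCF position to a 0-based index
    end = start + len(ref)
    if chrom_seq[start:end].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    left = chrom_seq[max(0, start - flank):start]
    right = chrom_seq[end:end + flank]
    return left + ref + right, left + alt + right


toy_chrom = "ACGTACGTACGT"  # stand-in for a real reference genome
vcf_text = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t5\t.\tA\tG\n"

for line in vcf_text.splitlines():
    if line.startswith("#"):
        continue  # skip meta-information and header lines
    chrom, pos, ref, alt = parse_vcf_line(line)
    ref_seq, alt_seq = ref_alt_sequences(toy_chrom, pos, ref, alt, flank=3)
    print(chrom, pos, ref_seq, alt_seq)
```

In a variant-effect workflow, both sequences would be scored by the model and the predictions compared. The scoring itself, including the nucleosome occupancy adjustment, is left out of this sketch.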
Requires GPU access for efficient prediction and training. Setup involves large downloads (e.g., model files hosted on Zenodo) and depends on PyTorch and Selene, which can limit accessibility for smaller labs.
Pre-computed resources are available only for the hg19 and hg38 genomes; applying Sei to other species requires retraining, which is resource-intensive and not straightforward for non-experts.
Involves multiple steps like environment setup with Anaconda, cluster job submissions (SLURM scripts), and HDF5 file handling, with documentation assuming prior bioinformatics expertise.
The license is limited to academic and research use, with commercial applications requiring separate negotiations, potentially hindering adoption in industry settings.