Question 1

How to install SciSpaCy and its models correctly?

Accepted Answer

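First install the library with pip (pip install scispacy), then install a model from the URL published for that release, making sure the model version matches the installed SciSpaCy version. Work inside an isolated environment (for example, one managed with Mamba) running Python 3.6 or later to avoid dependency conflicts.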
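A minimal sketch of keeping the library and model versions in step. The release URL layout and the example version 0.5.4 are assumptions based on the project's published model links and may change; the helper names model_url and install_commands are hypothetical.

```python
import sys
from typing import List

# Assumption: layout of the public release bucket used in the project's model links.
BASE_URL = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases"


def model_url(model: str, version: str) -> str:
    """Return the tarball URL for a model built against the given release."""
    return f"{BASE_URL}/v{version}/{model}-{version}.tar.gz"


def install_commands(model: str, version: str) -> List[List[str]]:
    """Build pip commands that pin the library and the model to one version."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + [f"scispacy=={version}"],
        pip + [model_url(model, version)],
    ]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to run them.
    for cmd in install_commands("en_core_sci_sm", "0.5.4"):
        print(" ".join(cmd))
```

Using sys.executable rather than a bare pip ensures the packages land in the interpreter of the active environment.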

Question 2

SciSpaCy vs general spaCy for medical text?

Accepted Answer

SciSpaCy generally performs better than general-purpose spaCy models on biomedical text because its pipelines are trained on domain-specific data, including NER corpora such as JNLPBA. For mixed or non-scientific content, the general spaCy models may be more appropriate and are easier to set up.

Question 3

How to use SciSpaCy for entity linking to UMLS?

Accepted Answer

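After loading a model, add the EntityLinker pipe (registered as 'scispacy_linker') with linker_name set to 'umls'. Tune parameters such as threshold and resolve_abbreviations to trade recall for precision; resolve_abbreviations only has an effect when the AbbreviationDetector is also in the pipeline. The first run downloads a large knowledge-base index, so expect a long initial setup.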
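A minimal sketch of the setup described above, assuming scispacy and the en_core_sci_sm model are installed. The function names build_linking_pipeline and top_concepts are hypothetical, and the imports sit inside the function only so the module can be imported without triggering the heavy dependencies.

```python
def build_linking_pipeline(model_name="en_core_sci_sm", threshold=0.7):
    """Load a model and attach abbreviation detection plus UMLS linking."""
    import spacy
    # Importing these modules registers the pipe factories with spaCy.
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401
    from scispacy.linking import EntityLinker  # noqa: F401

    nlp = spacy.load(model_name)
    nlp.add_pipe("abbreviation_detector")
    nlp.add_pipe(
        "scispacy_linker",
        config={
            "linker_name": "umls",
            "resolve_abbreviations": True,
            "threshold": threshold,  # minimum similarity for a candidate to be kept
        },
    )
    return nlp


def top_concepts(nlp, text, max_per_entity=1):
    """Return (mention, CUI, canonical name, score) for each linked entity."""
    doc = nlp(text)
    linker = nlp.get_pipe("scispacy_linker")
    results = []
    for ent in doc.ents:
        # kb_ents holds (CUI, score) pairs, best match first.
        for cui, score in ent._.kb_ents[:max_per_entity]:
            concept = linker.kb.cui_to_entity[cui]
            results.append((ent.text, cui, concept.canonical_name, score))
    return results
```

Calling build_linking_pipeline() once and reusing the returned nlp object avoids reloading the index for every document.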

Question 4

Does SciSpaCy work with transformer models like BERT?

Accepted Answer

Yes. The en_core_sci_scibert model uses AllenAI's SciBERT as its transformer. It is generally more accurate than the vector-based pipelines but noticeably slower on CPU, and the model description suggests running it on a GPU.
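A minimal sketch of loading the transformer pipeline with a GPU when one is available, assuming en_core_sci_scibert and its transformer dependencies are installed. The function name load_scibert_pipeline is hypothetical.

```python
def load_scibert_pipeline(model_name="en_core_sci_scibert"):
    """Load the SciBERT-based pipeline, using a GPU if spaCy can activate one."""
    import spacy

    # prefer_gpu() must be called before loading the model. It returns True
    # if a GPU was activated and False if spaCy stays on the CPU.
    using_gpu = spacy.prefer_gpu()
    nlp = spacy.load(model_name)
    return nlp, using_gpu
```

The returned flag makes it easy to log whether a run fell back to CPU, which is the usual cause of unexpectedly slow inference.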

Question 5

What are the differences between SciSpaCy's sm, md, and lg models?

Accepted Answer

They differ in vocabulary size and word vectors: sm has a ~100k vocabulary, md has a ~360k vocabulary with 50k word vectors, and lg has a ~785k vocabulary with 600k word vectors. Larger models improve coverage but increase memory usage and inference time.

Question 6

How to detect abbreviations in clinical notes with SciSpaCy?

Accepted Answer

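Import AbbreviationDetector from scispacy.abbreviation, then add it to the pipeline with nlp.add_pipe('abbreviation_detector'). Detected abbreviations are exposed as doc._.abbreviations, and each one points to its expansion through ._.long_form. Detection is based on the Schwartz & Hearst algorithm, so it only finds abbreviations that are defined somewhere in the text.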
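A minimal sketch of the steps above, assuming scispacy and the en_core_sci_sm model are installed. The function name find_abbreviations is hypothetical, and the pipeline is loaded inside the function only to keep the example short; in practice it would be loaded once and reused.

```python
def find_abbreviations(text, model_name="en_core_sci_sm"):
    """Return (short form, long form, start token, end token) for each abbreviation."""
    import spacy
    # Importing the class registers the "abbreviation_detector" factory.
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401

    nlp = spacy.load(model_name)
    nlp.add_pipe("abbreviation_detector")
    doc = nlp(text)
    return [
        (abrv.text, abrv._.long_form.text, abrv.start, abrv.end)
        for abrv in doc._.abbreviations
    ]


if __name__ == "__main__":
    note = (
        "Spinal and bulbar muscular atrophy (SBMA) is an inherited "
        "motor neuron disease. SBMA can be caused by this mutation."
    )
    for short, long_form, start, end in find_abbreviations(note):
        print(f"{short} ({start}, {end}) -> {long_form}")
```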
