Releases: allenai/scispacy
v0.6.2
v0.6.0
Allow spaCy 3.8.x
The scispacy package itself is compatible with both spaCy 3.7.x and 3.8.x. If you install one of the scispacy pipelines, spaCy will be downgraded automatically to <3.8. If you then force-install spaCy 3.8.x, the behavior is slightly undefined, although it appears to work fine in my tests and according to user reports.
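To see which side of that boundary an environment currently sits on, the installed spaCy version can be compared against the two minor series named above. This is a minimal sketch using only the standard library; the helper names `spacy_minor_supported` and `installed_spacy_version` are hypothetical and not part of scispacy.

```python
from importlib import metadata

# Minor series that this release note says the scispacy package works with.
SUPPORTED_MINORS = {(3, 7), (3, 8)}


def spacy_minor_supported(version: str) -> bool:
    """Return True if `version` (e.g. "3.7.5") falls in a supported minor series."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Anything that does not look like "X.Y[.Z]" is treated as unsupported.
        return False
    return (major, minor) in SUPPORTED_MINORS


def installed_spacy_version():
    """Return the installed spaCy version string, or None if spaCy is absent."""
    try:
        return metadata.version("spacy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_spacy_version()
    if version is None:
        print("spaCy is not installed")
    elif spacy_minor_supported(version):
        print(f"spaCy {version} is within 3.7.x/3.8.x")
    else:
        print(f"spaCy {version} is outside 3.7.x/3.8.x")
```

Note that a result within 3.8.x only says the scispacy package itself should work; the pipelines still pin spaCy below 3.8 as described above.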
Linking extensibility
Thanks to @cthoyt for various improvements to make the linking functionality more extensible. See https://github.com/allenai/scispacy?tab=readme-ov-file#extending-scispacy-to-new-databases-and-ontologies for more info.
What's Changed
- update readme to include installation method for nmslib on Apple M2 chip using python3.9 by @Archertakesitez in #531
- update installation note with Mac M4 by @LinglongQian in #532
- Switch from setup.py to pyproject.toml by @cthoyt in #541
- Enable opt-out of saving TF-IDF annotation index by @cthoyt in #543
- Add broader range of test environments by @cthoyt in #546
- Enable creation of entity linker from a knowledge base by @cthoyt in #544
- Refactor KnowledgeBase constructor by @cthoyt in #547
- Implement TF-IDF cache loading by @cthoyt in #549
- Document extending scispaCy with additional ontologies with PyOBO by @cthoyt in #542
- build: relax spacy minor upper bound by @JohnGiorgi in #551
- Bump version to 0.6.0 by @dakinggg in #552
New Contributors
- @Archertakesitez made their first contribution in #531
- @LinglongQian made their first contribution in #532
- @cthoyt made their first contribution in #541
Full Changelog: v0.5.5...v0.6.0
v0.5.5
Support for Python 3.12
This release adds support for Python 3.12 by updating scipy and using nmslib-metabrainz rather than nmslib.
What's Changed
- Fix export_umls_json.py by @ethanhkim in #511
- Add support matrix for nmslib installation by @dakinggg in #524
- Update Dockerfile by @dakinggg in #525
- Support Python 3.12 via newer scipy and nmslib-metabrainz by @jason-nance in #523
- Add shorter version of pip installing nmslib from source by @svlandeg in #529
- Version bump by @dakinggg in #530
New Contributors
- @ethanhkim made their first contribution in #511
- @jason-nance made their first contribution in #523
- @svlandeg made their first contribution in #529
Full Changelog: v0.5.4...v0.5.5
v0.5.4
Update for spacy 3.7.x
What's Changed
- Fixes #485 Project Page URL in setup.py by @sajedjalil in #495
- add progress bar to http_get by @WeixiongLin in #499
- Update for spacy 3.7 compatibility by @dakinggg in #507
- Update publish workflow to trusted publisher by @dakinggg in #508
New Contributors
- @sajedjalil made their first contribution in #495
- @WeixiongLin made their first contribution in #499
Full Changelog: v0.5.3...v0.5.4
Version 0.5.3
Retrains the models with spacy 3.6.x to be compatible with the latest spacy version
What's Changed
- Update README.md by @dakinggg in #476
- Update EntityLinker docstring by @andyjessen in #472
- Support UMLS filtering by language (Solves #477) by @nachollorca in #478
- Add a note about make_serializable argument by @JohnGiorgi in #484
- Drop umls and umls_ents attributes in linker by @JohnGiorgi in #489
- Updating nmslib hyperparameters guide url by @kaushikacharya in #493
- Update to latest spacy version by @dakinggg in #494
New Contributors
- @nachollorca made their first contribution in #478
- @JohnGiorgi made their first contribution in #484
Full Changelog: v0.5.2...v0.5.3
v0.5.2
This release includes an update of the entity linkers to use the latest UMLS release (2022AB), which includes information about newer entities like COVID-19.
In [10]: doc = nlp("COVID-19 is a global pandemic.")
In [11]: linker = nlp.get_pipe('scispacy_linker')
In [12]: linker.kb.cui_to_entity[doc.ents[0]._.kb_ents[0][0]]
Out[12]:
CUI: C5203670, Name: COVID19 (disease)
Definition: A viral disorder generally characterized by high FEVER; COUGH; DYSPNEA; CHILLS; PERSISTENT TREMOR; MUSCLE PAIN; HEADACHE; SORE THROAT; a new loss of taste and/or smell (see AGEUSIA and ANOSMIA) and other symptoms of a VIRAL PNEUMONIA. In severe cases, a myriad of coagulopathy associated symptoms often correlating with COVID-19 severity is seen (e.g., BLOOD COAGULATION; THROMBOSIS; ACUTE RESPIRATORY DISTRESS SYNDROME; SEIZURES; HEART ATTACK; STROKE; multiple CEREBRAL INFARCTIONS; KIDNEY FAILURE; catastrophic ANTIPHOSPHOLIPID ANTIBODY SYNDROME and/or DISSEMINATED INTRAVASCULAR COAGULATION). In younger patients, rare inflammatory syndromes are sometimes associated with COVID-19 (e.g., atypical KAWASAKI SYNDROME; TOXIC SHOCK SYNDROME; pediatric multisystem inflammatory disease; and CYTOKINE STORM SYNDROME). A coronavirus, SARS-CoV-2, in the genus BETACORONAVIRUS is the causative agent.
TUI(s): T047
Aliases (abbreviated, total: 47):
2019 Novel Coronavirus Infection, SARS-CoV-2 Disease, Human Coronavirus 2019 Infection, SARS-CoV-2 Infection, Disease caused by severe acute respiratory syndrome coronavirus 2 (disorder), Disease caused by SARS-CoV-2, 2019 nCoV Disease, 2019 Novel Coronavirus Disease, COVID-19 Virus Disease, Virus Disease, COVID-19
It also includes a small bug fix to the abbreviation detector.
Note: The models (e.g. en_core_sci_sm) are still labeled as version v0.5.1, as this release did not involve retraining the base models, only the entity linkers.
What's Changed
- Fix typo by @andyjessen in #453
- Update README.md by @dakinggg in #456
- Update to the latest UMLS version by @dakinggg in #474
New Contributors
- @andyjessen made their first contribution in #453
Full Changelog: v0.5.1...v0.5.2
Version 0.5.1
Retrains the models with spacy 3.4.x to be compatible with the latest spacy version
Release v0.5.0
Updates scispacy to be compatible with the latest spacy version (3.2.3)
Scispacy 0.4.0 - Compatible with Spacy 3
This release of scispacy is compatible with Spacy 3. It also includes a new model 🥳 , en_core_sci_scibert, which uses scibert base uncased to do parsing and POS tagging (but not NER yet; this will come in a later release).
Version 0.3.0
New Features
Hearst Patterns
This component implements Automatic Acquisition of Hyponyms from Large Text Corpora using the spaCy Matcher component.
Passing extended=True to the HyponymDetector will use the extended set of Hearst patterns, which includes higher-recall but lower-precision hyponymy relations (e.g. X compared to Y, X similar to Y, etc.).
This component produces a doc-level attribute on the spaCy doc: doc._.hearst_patterns, which is a list containing tuples of extracted hyponym pairs. The tuples contain:
- The relation rule used to extract the hyponym (type: str)
- The more general concept (type: spacy.Span)
- The more specific concept (type: spacy.Span)
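Because each entry is a three-element tuple in that order, it can be unpacked directly in a loop. The following is a minimal sketch, with plain strings standing in for the spacy.Span objects so that it runs without a model installed; the helper name `describe_hyponyms` is hypothetical and not part of scispacy.

```python
def describe_hyponyms(patterns):
    """Render (rule, general, specific) triples as readable strings.

    `patterns` is expected to have the shape of doc._.hearst_patterns:
    a list of 3-tuples. Formatting relies only on str(), so both
    spacy.Span objects and plain strings are accepted.
    """
    lines = []
    for rule, general, specific in patterns:
        lines.append(f"{specific} IS-A {general} [{rule}]")
    return lines


# Stand-in data shaped like doc._.hearst_patterns (strings instead of Spans).
example = [("such_as", "Keystone plant species", "fig trees")]

for line in describe_hyponyms(example):
    print(line)
```

With a real pipeline, `doc._.hearst_patterns` would be passed in place of `example`.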
Usage:
import spacy
from scispacy.hyponym_detector import HyponymDetector
nlp = spacy.load("en_core_sci_sm")
hyponym_pipe = HyponymDetector(nlp, extended=True)
nlp.add_pipe(hyponym_pipe, last=True)
doc = nlp("Keystone plant species such as fig trees are good for the soil.")
print(doc._.hearst_patterns)
>>> [('such_as', Keystone plant species, fig trees)]
Ontonotes Mixin: Clear Format > UD
Thanks to Yoav Goldberg for this fix! Yoav noticed that the dependency labels for the Ontonotes data use a different format than the converted GENIA Trees. Yoav wrote some scripts to convert between them, including normalising some syntactic phenomena that were being treated inconsistently between the two corpora.
Bug Fixes
#252 - removed duplicated aliases in the entity linkers, reducing the size of the UMLS linker by ~10%
#249 - fix the path to the rxnorm linker