Releases: allenai/scispacy

v0.6.2

01 Oct 06:18
ac03b40

Re-release of v0.6.0 so that the release action reruns.

What's Changed

Full Changelog: v0.6.1...v0.6.2

v0.6.0

01 Oct 00:17
29f8f32

Allow spaCy 3.8.x

The scispacy package itself is compatible with both spaCy 3.7.x and 3.8.x. If you install one of the scispacy pipelines, spaCy will be downgraded automatically to <3.8. If you force-install spaCy 3.8.x, the behavior is slightly undefined, although it appears to work fine in my tests and according to user reports.
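
One way to see which of these cases an environment falls into is to check the installed spaCy version at runtime. The following is a minimal sketch, not part of scispacy: the helper names `spacy_minor_series` and `describe_spacy_install` are hypothetical, and the messages only restate the ranges described above.

```python
from importlib import metadata


def spacy_minor_series(version: str) -> str:
    """Return the 'major.minor' series of a version string, e.g. '3.8.2' -> '3.8'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def describe_spacy_install(version: str) -> str:
    """Classify a spaCy version against the ranges named in this release note."""
    series = spacy_minor_series(version)
    if series == "3.7":
        return "3.7.x: within the range the scispacy pipelines expect"
    if series == "3.8":
        return "3.8.x: supported by the scispacy package; pipelines pin spacy<3.8"
    return f"{series}.x: not covered by this release note"


if __name__ == "__main__":
    try:
        # Look up the version of the spaCy distribution installed here.
        print(describe_spacy_install(metadata.version("spacy")))
    except metadata.PackageNotFoundError:
        print("spaCy is not installed in this environment")
```

The check reads package metadata rather than importing spaCy, so it stays cheap and works even when the installed combination fails to import.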

Linking extensibility

Thanks to @cthoyt for various improvements to make the linking functionality more extensible. See https://github.com/allenai/scispacy?tab=readme-ov-file#extending-scispacy-to-new-databases-and-ontologies for more info.

What's Changed

  • update readme to include installation method for nmslib on Apple M2 chip using python3.9 by @Archertakesitez in #531
  • update installation note with Mac M4 by @LinglongQian in #532
  • Switch from setup.py to pyproject.toml by @cthoyt in #541
  • Enable opt-out of saving TF-IDF annotation index by @cthoyt in #543
  • Add broader range of test environments by @cthoyt in #546
  • Enable creation of entity linker from a knowledge base by @cthoyt in #544
  • Refactor KnowledgeBase constructor by @cthoyt in #547
  • Implement TF-IDF cache loading by @cthoyt in #549
  • Document extending scispaCy with additional ontologies with PyOBO by @cthoyt in #542
  • build: relax spacy minor upper bound by @JohnGiorgi in #551
  • Bump version to 0.6.0 by @dakinggg in #552

Full Changelog: v0.5.5...v0.6.0

v0.5.5

27 Oct 05:42
b5687f5

Support for python 3.12

This release adds support for python 3.12 by updating scipy and using nmslib-metabrainz rather than nmslib.

What's Changed

Full Changelog: v0.5.4...v0.5.5

v0.5.4

08 Mar 05:57
29b1e46

Update for spacy 3.7.x

What's Changed

Full Changelog: v0.5.3...v0.5.4

Version 0.5.3

30 Sep 19:50
7da5117

Retrains the models with spacy 3.6.x to be compatible with the latest spacy version

What's Changed

Full Changelog: v0.5.2...v0.5.3

v0.5.2

29 Apr 21:21
5368cc3

This release includes an update of the entity linkers to use the latest UMLS release (2022AB), which includes information about newer entities like COVID-19.

In [10]: doc = nlp("COVID-19 is a global pandemic.")

In [11]: linker = nlp.get_pipe('scispacy_linker')

In [12]: linker.kb.cui_to_entity[doc.ents[0]._.kb_ents[0][0]]
Out[12]:
CUI: C5203670, Name: COVID19 (disease)
Definition: A viral disorder generally characterized by high FEVER; COUGH; DYSPNEA; CHILLS; PERSISTENT TREMOR; MUSCLE PAIN; HEADACHE; SORE THROAT; a new loss of taste and/or smell (see AGEUSIA and ANOSMIA) and other symptoms of a VIRAL PNEUMONIA. In severe cases, a myriad of coagulopathy associated symptoms often correlating with COVID-19 severity is seen (e.g., BLOOD COAGULATION; THROMBOSIS; ACUTE RESPIRATORY DISTRESS SYNDROME; SEIZURES; HEART ATTACK; STROKE; multiple CEREBRAL INFARCTIONS; KIDNEY FAILURE; catastrophic ANTIPHOSPHOLIPID ANTIBODY SYNDROME and/or DISSEMINATED INTRAVASCULAR COAGULATION). In younger patients, rare inflammatory syndromes are sometimes associated with COVID-19 (e.g., atypical KAWASAKI SYNDROME; TOXIC SHOCK SYNDROME; pediatric multisystem inflammatory disease; and CYTOKINE STORM SYNDROME). A coronavirus, SARS-CoV-2, in the genus BETACORONAVIRUS is the causative agent.
TUI(s): T047
Aliases (abbreviated, total: 47):
         2019 Novel Coronavirus Infection, SARS-CoV-2 Disease, Human Coronavirus 2019 Infection, SARS-CoV-2 Infection, Disease caused by severe acute respiratory syndrome coronavirus 2 (disorder), Disease caused by SARS-CoV-2, 2019 nCoV Disease, 2019 Novel Coronavirus Disease, COVID-19 Virus Disease, Virus Disease, COVID-19
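
The session above resolves only the first candidate of the first entity. As a rough sketch, each entry of `_.kb_ents` is a `(CUI, score)` pair, so every candidate can be resolved the same way. The helper name `resolve_candidates`, the `min_score` threshold, the scores, and the placeholder CUI `C0000000` are illustrative assumptions; a plain dict of names stands in for `linker.kb.cui_to_entity` so the snippet runs without a model.

```python
from typing import Dict, List, Tuple


def resolve_candidates(
    kb_ents: List[Tuple[str, float]],
    cui_to_entity: Dict[str, object],
    min_score: float = 0.0,
) -> List[Tuple[str, float, object]]:
    """Pair each (CUI, score) candidate with its knowledge-base entry."""
    return [
        (cui, score, cui_to_entity[cui])
        for cui, score in kb_ents
        if score >= min_score and cui in cui_to_entity
    ]


# Stand-in data (made up for illustration). With a real pipeline these would be
# doc.ents[0]._.kb_ents and linker.kb.cui_to_entity.
example_kb_ents = [("C5203670", 0.98), ("C0000000", 0.41)]
example_kb = {"C5203670": "COVID19 (disease)", "C0000000": "placeholder entry"}

for cui, score, entry in resolve_candidates(example_kb_ents, example_kb, min_score=0.5):
    print(cui, score, entry)
```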

It also includes a small bug fix to the abbreviation detector.

Note: The models (e.g. en_core_sci_sm) are still labeled as version v0.5.1, as this release did not involve retraining the base models, only the entity linkers.

What's Changed

Full Changelog: v0.5.1...v0.5.2

Version 0.5.1

07 Sep 00:26
e30b8f4

Retrains the models with spacy 3.4.x to be compatible with the latest spacy version

Release v0.5.0

10 Mar 20:15
cc1a717

Updates scispacy to be compatible with the latest spacy version (3.2.3).

Scispacy 0.4.0 - Compatible with Spacy 3

12 Feb 22:55
aad640f

This release of scispacy is compatible with Spacy 3. It also includes a new model 🥳, en_core_sci_scibert, which uses scibert base uncased to do parsing and POS tagging (but not NER yet; this will come in a later release).

Version 0.3.0

16 Oct 17:13
1b456f5

New Features

Hearst Patterns

This component implements Automatic Acquisition of Hyponyms from Large Text Corpora using the spaCy Matcher component.

Passing extended=True to the HyponymDetector will use the extended set of Hearst patterns, which adds hyponymy relations with higher recall but lower precision (e.g. X compared to Y, X similar to Y, etc.).

This component produces a doc level attribute on the spacy doc: doc._.hearst_patterns, which is a list containing tuples of extracted hyponym pairs. The tuples contain:

  • The relation rule used to extract the hyponym (type: str)
  • The more general concept (type: spacy.Span)
  • The more specific concept (type: spacy.Span)

Usage:

import spacy
from scispacy.hyponym_detector import HyponymDetector

nlp = spacy.load("en_core_sci_sm")
hyponym_pipe = HyponymDetector(nlp, extended=True)
nlp.add_pipe(hyponym_pipe, last=True)

doc = nlp("Keystone plant species such as fig trees are good for the soil.")

print(doc._.hearst_patterns)
# Output: [('such_as', Keystone plant species, fig trees)]
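
Because each tuple has the form (rule, general concept, specific concept), the extracted pairs can be regrouped by general concept. The following is a minimal sketch under that assumption: the helper name `group_hyponyms` is hypothetical, and plain strings stand in for the spaCy spans so it runs without a model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_hyponyms(
    patterns: Iterable[Tuple[str, object, object]],
) -> Dict[str, List[str]]:
    """Map each general concept to the specific concepts extracted for it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for _rule, general, specific in patterns:
        # str() yields the text of a spaCy span and leaves plain strings unchanged.
        grouped[str(general)].append(str(specific))
    return dict(grouped)


# Stand-in for doc._.hearst_patterns from the usage example.
example_patterns = [("such_as", "Keystone plant species", "fig trees")]
print(group_hyponyms(example_patterns))
```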

Ontonotes Mixin: Clear Format > UD

Thanks to Yoav Goldberg for this fix! Yoav noticed that the dependency labels for the Ontonotes data use a different format than the converted GENIA Trees. Yoav wrote some scripts to convert between them, including normalising some syntactic phenomena that were being treated inconsistently between the two corpora.

Bug Fixes

#252 - removed duplicated aliases in the entity linkers, reducing the size of the UMLS linker by ~10%
#249 - fix the path to the rxnorm linker