Commit e17ab12

Merge pull request #3507 from flairNLP/release-0.14.0
Bump version numbers for Flair release 0.14.0
2 parents 832f56e + c81596c

16 files changed: +37 −80 lines

README.md

Lines changed: 2 additions & 1 deletion

@@ -23,7 +23,7 @@ document embeddings, including our proposed [Flair embeddings](https://www.aclwe
 * **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
   train your own models and experiment with new approaches using Flair embeddings and classes.
 
-Now at [version 0.13.1](https://github.com/flairNLP/flair/releases)!
+Now at [version 0.14.0](https://github.com/flairNLP/flair/releases)!
 
 
 ## State-of-the-Art Models
@@ -127,6 +127,7 @@ In particular:
 - [Tutorial 1: Basic tagging](https://flairnlp.github.io/docs/category/tutorial-1-basic-tagging) → how to tag your text
 - [Tutorial 2: Training models](https://flairnlp.github.io/docs/category/tutorial-2-training-models) → how to train your own state-of-the-art NLP models
 - [Tutorial 3: Embeddings](https://flairnlp.github.io/docs/category/tutorial-3-embeddings) → how to produce embeddings for words and documents
+- [Tutorial 4: Biomedical text](https://flairnlp.github.io/docs/category/tutorial-4-biomedical-text) → how to analyse biomedical text data
 
 There is also a dedicated landing page for our [biomedical NER and datasets](/resources/docs/HUNFLAIR.md) with
 installation instructions and tutorials.

docs/conf.py

Lines changed: 2 additions & 2 deletions

@@ -5,8 +5,8 @@
 # -- Project information -----------------------------------------------------
 from sphinx_github_style import get_linkcode_resolve
 
-version = "0.13.1"
-release = "0.13.1"
+version = "0.14.0"
+release = "0.14.0"
 project = "flair"
 author = importlib_metadata.metadata(project)["Author"]
 copyright = f"2023 {author}"

docs/tutorial/tutorial-basics/entity-mention-linking.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ sentence = Sentence(
 ner_tagger = Classifier.load("hunflair2")
 ner_tagger.predict(sentence)
 
-nen_tagger = EntityMentionLinker.load("disease-linker-no-ab3p")
+nen_tagger = EntityMentionLinker.load("disease-linker")
 nen_tagger.predict(sentence)
 
 for tag in sentence.get_labels():

docs/tutorial/tutorial-basics/other-models.md

Lines changed: 0 additions & 1 deletion

@@ -145,7 +145,6 @@ We end this section with a list of all other models we currently ship with Flair
 | '[frame](https://huggingface.co/flair/frame-english)' | Frame Detection | English | Propbank 3.0 | **97.54** (F1) |
 | '[frame-fast](https://huggingface.co/flair/frame-english-fast)' | Frame Detection | English | Propbank 3.0 | **97.31** (F1) | (fast model)
 | 'negation-speculation' | Negation / speculation | English | Bioscope | **80.2** (F1) |
-| 'communicative-functions' | detecting function of sentence in research paper (BETA) | English | scholarly papers | |
 | 'de-historic-indirect' | historical indirect speech | German | @redewiedergabe project | **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) |
 | 'de-historic-direct' | historical direct speech | German | @redewiedergabe project | **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) |
 | 'de-historic-reported' | historical reported speech | German | @redewiedergabe project | **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) |

docs/tutorial/tutorial-basics/part-of-speech-tagging.md

Lines changed: 1 addition & 1 deletion

@@ -105,7 +105,7 @@ tagger.predict(sentence)
 print(sentence)
 ```
 
-## Tagging universal parts-of-speech (uPoS)
+## Tagging parts-of-speech in any language
 
 Universal parts-of-speech are a set of minimal syntactic units that exist across languages. For instance, most languages
 will have VERBs or NOUNs.

docs/tutorial/tutorial-basics/tagging-entities.md

Lines changed: 2 additions & 2 deletions (whitespace-only changes to two headings)

@@ -2,7 +2,7 @@
 
 This tutorials shows you how to do named entity recognition, showcases various NER models, and provides a full list of all NER models in Flair.
 
-## Tagging entities with our standard model 
+## Tagging entities with our standard model
 
 Our standard model uses Flair embeddings and was trained over the English CoNLL-03 task and can recognize 4 different entity types. It offers a good tradeoff between accuracy and speed.
 
@@ -32,7 +32,7 @@ Sentence: "George Washington went to Washington ." → ["George Washington"/PER,
 
 The printout tells us that two entities are labeled in this sentence: "George Washington" as PER (person) and "Washington" as LOC (location).
 
-## Tagging entities with our best model 
+## Tagging entities with our best model
 
 Our best 4-class model is trained using a very large transformer. Use it if accuracy is the most important to you, and speed/memory not so much.

docs/tutorial/tutorial-basics/tagging-sentiment.md

Lines changed: 1 addition & 1 deletion (whitespace-only change to a heading)

@@ -2,7 +2,7 @@
 
 This tutorials shows you how to do sentiment analysis in Flair.
 
-## Tagging sentiment with our standard model 
+## Tagging sentiment with our standard model
 
 Our standard sentiment analysis model uses distilBERT embeddings and was trained over a mix of corpora, notably
 the Amazon review corpus, and can thus handle a variety of domains and language.

flair/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@
 device = torch.device("cpu")
 
 # global variable: version
-__version__ = "0.13.1"
+__version__ = "0.14.0"
 """The current version of the flair library installed."""
 
 # global variable: arrow symbol

flair/datasets/treebanks.py

Lines changed: 8 additions & 7 deletions

@@ -122,11 +122,12 @@ def __getitem__(self, index: int = 0) -> Sentence:
         else:
             with open(str(self.path_to_conll_file), encoding="utf-8") as file:
                 file.seek(self.indices[index])
-                sentence = self._read_next_sentence(file)
+                sentence_or_none = self._read_next_sentence(file)
+                sentence = sentence_or_none if isinstance(sentence_or_none, Sentence) else Sentence("")
 
         return sentence
 
-    def _read_next_sentence(self, file) -> Sentence:
+    def _read_next_sentence(self, file) -> Optional[Sentence]:
         line = file.readline()
         tokens: List[Token] = []
 
@@ -139,13 +140,15 @@ def _read_next_sentence(self, file) -> Sentence:
         current_multiword_first_token = 0
         current_multiword_last_token = 0
 
+        newline_reached = False
         while line:
            line = line.strip()
            fields: List[str] = re.split("\t+", line)
 
            # end of sentence
            if line == "":
                if len(tokens) > 0:
+                    newline_reached = True
                    break
 
            # comments or ellipsis
@@ -205,20 +208,18 @@ def _read_next_sentence(self, file) -> Sentence:
             if token_idx <= current_multiword_last_token:
                 current_multiword_sequence += token.text
 
-            # print(token)
-            # print(current_multiword_last_token)
-            # print(current_multiword_first_token)
             # if multi-word equals component tokens, there should be no whitespace
             if token_idx == current_multiword_last_token and current_multiword_sequence == current_multiword_text:
                 # go through all tokens in subword and set whitespace_after information
                 for i in range(current_multiword_last_token - current_multiword_first_token):
-                    # print(i)
                     tokens[-(i + 1)].whitespace_after = 0
             tokens.append(token)
 
             line = file.readline()
 
-        return Sentence(tokens)
+        if newline_reached or len(tokens) > 0:
+            return Sentence(tokens)
+        return None
 
 
 class UD_ENGLISH(UniversalDependenciesCorpus):
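The treebanks patch changes `_read_next_sentence` to return `Optional[Sentence]` (None once the file is exhausted) and makes the caller fall back to an empty `Sentence`. The shape of that contract can be sketched with a stand-in reader; the `Sentence` stub and `read_next_sentence` below are illustrations under that assumption, not flair's actual classes:

```python
from typing import List, Optional

class Sentence:
    """Minimal stand-in for flair's Sentence class."""
    def __init__(self, tokens):
        # flair's Sentence accepts a string or a token list; mimic that loosely
        self.tokens = [] if isinstance(tokens, str) else list(tokens)

def read_next_sentence(lines: List[str]) -> Optional[Sentence]:
    """Collect tokens up to a blank line; return None if the input is exhausted."""
    tokens: List[str] = []
    newline_reached = False
    for raw in lines:
        line = raw.strip()
        if line == "":
            if tokens:
                newline_reached = True  # a sentence boundary was actually seen
                break
            continue
        tokens.append(line)
    if newline_reached or tokens:
        return Sentence(tokens)
    return None  # end of input: nothing left to read

# Caller mirrors the patched __getitem__: substitute an empty Sentence for None.
maybe = read_next_sentence([])
sentence = maybe if isinstance(maybe, Sentence) else Sentence("")
print(len(sentence.tokens))  # 0
```

Returning None instead of an empty `Sentence` lets iteration code distinguish "end of file" from "empty sentence", which is what the new `newline_reached` flag tracks.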

flair/models/language_model.py

Lines changed: 2 additions & 2 deletions

@@ -189,7 +189,7 @@ def initialize(matrix):
 
     @classmethod
     def load_language_model(cls, model_file: Union[Path, str], has_decoder=True):
-        state = torch.load(str(model_file), map_location=flair.device)
+        state = torch.load(str(model_file), map_location=flair.device, weights_only=False)
 
         document_delimiter = state.get("document_delimiter", "\n")
         has_decoder = state.get("has_decoder", True) and has_decoder
@@ -213,7 +213,7 @@ def load_checkpoint(cls, model_file: Union[Path, str]):
 
     @classmethod
     def load_checkpoint(cls, model_file: Union[Path, str]):
-        state = torch.load(str(model_file), map_location=flair.device)
+        state = torch.load(str(model_file), map_location=flair.device, weights_only=False)
 
         epoch = state.get("epoch")
         split = state.get("split")
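The explicit `weights_only=False` keeps the old `torch.load` behaviour of fully unpickling the checkpoint dict, ahead of PyTorch's move toward a restricted `weights_only=True` default that rejects arbitrary Python objects. The way the surrounding code then reads optional fields with `.get` can be sketched with stdlib `pickle` standing in for `torch.load`; the `load_checkpoint_fields` helper below is illustrative only, not flair's API:

```python
import io
import pickle

def load_checkpoint_fields(fileobj):
    # pickle.load plays the role of torch.load(..., weights_only=False),
    # i.e. full unpickling of whatever was saved into the checkpoint.
    state = pickle.load(fileobj)
    return {
        "epoch": state.get("epoch"),  # absent -> None
        "split": state.get("split"),  # absent -> None
        "document_delimiter": state.get("document_delimiter", "\n"),  # explicit default
    }

# A checkpoint that only recorded the epoch; the other fields fall back.
buf = io.BytesIO()
pickle.dump({"epoch": 3}, buf)
buf.seek(0)
fields = load_checkpoint_fields(buf)
print(fields)  # {'epoch': 3, 'split': None, 'document_delimiter': '\n'}
```

Using `.get` with defaults is what lets newer flair code load older checkpoints that never stored fields such as `document_delimiter`.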
