Optimize character to index conversion for a 4% decrease on CoNLL 2003 #1145

pommedeterresautee · 2019-09-23T10:56:52Z

Optimize the way padding is done by reducing the number of dictionary calls.
Remove the conversion to UTF-8, while still handling old models saved in the old format.
In the future, the conversion may be removed safely.
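
For readers who want to see the idea in isolation, here is a minimal sketch of padding with fewer dictionary lookups. It is not the code from this PR: `encode_batch`, `char2idx`, `UNK` and the space padding character are hypothetical names and choices made for illustration. The padding index is looked up once per batch, and only the real characters of each string go through the dictionary.

```python
from typing import Dict, List

UNK = "<unk>"  # hypothetical unknown-character symbol, not taken from the PR


def encode_batch(
    texts: List[str], char2idx: Dict[str, int], pad_char: str = " "
) -> List[List[int]]:
    """Convert strings to equal-length index lists.

    Padding positions reuse one cached index instead of triggering
    one dictionary lookup per padding character.
    """
    unk_idx = char2idx[UNK]
    pad_idx = char2idx.get(pad_char, unk_idx)  # single lookup for the whole batch
    longest = max((len(text) for text in texts), default=0)

    batch = []
    for text in texts:
        # Look up only the characters that are actually in the string.
        ids = [char2idx.get(ch, unk_idx) for ch in text]
        # Fill the remainder with the cached padding index.
        ids.extend([pad_idx] * (longest - len(ids)))
        batch.append(ids)
    return batch


char2idx = {UNK: 0, " ": 1, "a": 2, "b": 3}
print(encode_batch(["ab", "a", "abz"], char2idx))
# prints [[2, 3, 1], [2, 1, 1], [2, 3, 0]]
```

The naive alternative pads every string with spaces first and then looks up each character, so short strings in a batch with one long string pay for many redundant lookups.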

close #1133
@alanakbik in the end I did both: a change in dictionary deserialization and a change inside the class itself.
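
As an illustration of the deserialization side, the sketch below shows one way a loader could accept both formats. It assumes that old dictionaries stored UTF-8 encoded `bytes` keys and new ones store `str` keys; that assumption, and the helper name `normalize_keys`, are mine and not taken from the PR.

```python
from typing import Dict, Union


def normalize_keys(item2idx: Dict[Union[bytes, str], int]) -> Dict[str, int]:
    """Return a copy whose keys are all str.

    Keys stored as UTF-8 bytes (the assumed old format) are decoded;
    str keys are kept unchanged.
    """
    return {
        (key.decode("utf-8") if isinstance(key, bytes) else key): idx
        for key, idx in item2idx.items()
    }


legacy = {b"<unk>": 0, "a".encode("utf-8"): 1, "é".encode("utf-8"): 2}
print(normalize_keys(legacy))
# prints {'<unk>': 0, 'a': 1, 'é': 2}
```

Doing this once at load time means the lookup path no longer has to encode each character before querying the dictionary.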

CoNLL 2003 from 24s to 23s (-4%)
French dataset from 12s to 11s (-8%)

pommedeterresautee · 2019-09-23T16:31:46Z

There were strange issues because of the dictionary conversion, so in the end I made a smaller version of the optimization and kept things as they are :-( Maybe something to investigate in the future.

alanakbik · 2019-09-23T16:36:55Z

Ah ok I was just measuring speed differences and finding they are roughly the same. Is this because of the last changes?

pommedeterresautee · 2019-09-23T16:43:04Z

Here I still get 23s and 11s for the CoNLL and French datasets. Maybe it's a change of less than 1 second that looks bigger because of rounding?

alanakbik · 2019-09-23T16:46:45Z

Ok great - looks good! Thanks for all your help!

alanakbik · 2019-09-23T16:46:50Z

👍

yosipk · 2019-09-23T16:47:53Z

👍

pommedeterresautee added 16 commits September 22, 2019 00:52

avoid naive padding

89e5d97

fix

aea8edd

fix ref

400cedb

add new de encoding

229f330

add new de encoding

62253bb

fix save

6f4fb4e

default dict

6dc3766

can retrieve list of IDs

9958172

Merge remote-tracking branch 'upstream/master' into dict

adc8bcc

fix serialization of defaultdict

a514956

fix get_each_embedding when storage embedding is set to CPU

e0f9383

fix for other method

deacb64

black

6490e76

Merge branch 'fix_get_each_emb' into dict

fcf60d5

Merge remote-tracking branch 'upstream/master' into dict

eef80b0

simplification

dd8de0e

alanakbik mentioned this pull request Sep 23, 2019

fix get_each_embedding when storage embedding is set to CPU #1148

Merged

alanakbik merged commit 799c8dc into flairNLP:master Sep 23, 2019

pommedeterresautee deleted the dict branch September 23, 2019 16:50

alanakbik pushed a commit that referenced this pull request Oct 22, 2019

GH-1145: correct generate_text()

05e1ca3
