Optimize character to index conversion for a 4% runtime decrease on CoNLL 2003 #1145
There were strange issues because of the dictionary conversion, so I finally made a smaller version of the optimization, keeping things as they are :-( Maybe something to investigate in the future.
Ah ok, I was just measuring speed differences and finding they are roughly the same. Is this because of the latest changes?
Here I still get 23s and 11s for the CoNLL and French datasets. Maybe it's a change of less than 1 second, but with rounding it appears bigger?
Ok great - looks good! Thanks for all your help!
👍
Optimize the way padding is done by reducing the number of dictionary calls.
Remove the conversion to UTF-8, while still handling old models saved in the old format.
In the future, that handling of old models may be safely removed.
Closes #1133
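For readers who want a concrete picture of the two changes described above, here is a minimal sketch. It is not the code from this PR: `CharDictionary`, `indices_for`, `pad_tokens` and the `<unk>` / `<pad>` entries are hypothetical names, and it assumes the old format stored dictionary keys as UTF-8 bytes. It decodes legacy keys once at load time and looks up the padding index once per batch instead of once per padded position.

```python
from typing import Dict, List, Union


class CharDictionary:
    """Hypothetical stand-in for a character dictionary (not the library's class)."""

    def __init__(self, item2idx: Dict[Union[str, bytes], int]) -> None:
        # Assumption: old models stored keys as UTF-8 bytes.
        # Decode them once here so later lookups use plain str keys.
        self.item2idx: Dict[str, int] = {
            (key.decode("utf-8") if isinstance(key, bytes) else key): idx
            for key, idx in item2idx.items()
        }
        self.unk_idx: int = self.item2idx.get("<unk>", 0)

    def indices_for(self, token: str) -> List[int]:
        # One dict lookup per character, with no per-call encode step.
        get = self.item2idx.get
        unk = self.unk_idx
        return [get(char, unk) for char in token]


def pad_tokens(
    dictionary: CharDictionary, tokens: List[str], pad_token: str = "<pad>"
) -> List[List[int]]:
    # The padding index is looked up a single time for the whole batch.
    pad_idx = dictionary.item2idx.get(pad_token, dictionary.unk_idx)
    rows = [dictionary.indices_for(token) for token in tokens]
    longest = max((len(row) for row in rows), default=0)
    # Pad every row to the longest length without further dictionary calls.
    return [row + [pad_idx] * (longest - len(row)) for row in rows]


if __name__ == "__main__":
    legacy = {b"<unk>": 0, b"<pad>": 1, b"a": 2, b"b": 3}  # old-format keys
    dictionary = CharDictionary(legacy)
    print(pad_tokens(dictionary, ["ab", "a"]))
```

Doing the decode in the constructor keeps the per-character path down to a single `dict.get`, which is where a small saving like the one reported here would plausibly come from.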
@alanakbik in the end I did both: the change in dictionary deserialization and the change inside the class itself.
CoNLL 2003 from 24s to 23s (-4%)
French dataset from 12s to 11s (-8%)
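As a quick check of the percentages above, and of the rounding concern raised in the conversation, the arithmetic can be sketched like this. The helper `percent_change` and the fractional timings in the second part are hypothetical; only the whole-second figures come from this PR.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Whole-second timings reported in this PR.
    print(f"CoNLL 2003: {percent_change(24, 23):.1f}%")
    print(f"French dataset: {percent_change(12, 11):.1f}%")

    # Hypothetical unrounded timings: a gap of well under one second
    # can still show up as a full second once both values are rounded.
    before, after = 23.6, 23.4
    print(round(before), round(after), f"{percent_change(before, after):.1f}%")
```

With timings this short, measuring in fractions of a second over several runs would give a more reliable picture than a single whole-second reading.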