Bf/combine transformer embeddings #2558
Conversation
Force-pushed from 6de0b1e to 3803b00

Force-pushed from 0a84cf3 to 12f78cb
alanakbik left a comment
Thanks for adding this! Still testing, but found an error that appears with the following code:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-base',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=False,
)

text = "."
sentence = Sentence(text)
embeddings.embed(sentence)
```

Suggestion to solve this (I think) added in-line.
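The failure mode here is the edge case where a token maps to an empty or out-of-range subtoken span. A minimal, self-contained sketch of the general pattern (not flair's actual code; the function name and zero-vector fallback are illustrative assumptions):

```python
# Hypothetical sketch of first-subtoken pooling over a matrix of subtoken
# hidden states, guarding the empty-span edge case that a degenerate input
# (e.g. a sentence consisting only of ".") can produce.
import numpy as np

def pool_first_subtoken(hidden_states, subtoken_lengths):
    """Pick the first subtoken vector for each token.

    hidden_states: (num_subtokens, dim) array of model outputs
    subtoken_lengths: subtokens per token (may be 0 in edge cases)
    """
    embeddings = []
    idx = 0
    for length in subtoken_lengths:
        if length == 0:
            # without this guard, the slice below would be empty and the
            # index assertion would fail; fall back to a zero vector
            embeddings.append(np.zeros(hidden_states.shape[1]))
            continue
        # same invariant the failing assertion checks: span stays in bounds
        assert idx < idx + length <= hidden_states.shape[0]
        embeddings.append(hidden_states[idx])
        idx += length
    return np.stack(embeddings)

states = np.arange(12, dtype=float).reshape(4, 3)  # 4 subtokens, dim 3
tokens = pool_first_subtoken(states, [2, 0, 2])
print(tokens.shape)  # (3, 3)
```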
alanakbik left a comment
Thanks a lot for refactoring this!
@helpmefindaname I found another error. It seems the fix for the previous error now broke sentences that are too long (over 512 subtokens). Reproducible with this script:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# example transformer embeddings
embeddings = TransformerWordEmbeddings(model='distilbert-base-uncased')

# create sentence with more than 512 subtokens
long_sentence = Sentence('a ' * 513)

# embed
embeddings.embed(long_sentence)
```

Throws the same assertion error as previously, i.e.:

```
File ".../flair/flair/embeddings/base.py", line 769, in _add_embeddings_internal
    self._add_embeddings_to_sentences(expanded_sentences)
File ".../flair/flair/embeddings/base.py", line 728, in _add_embeddings_to_sentences
    self._extract_token_embeddings(sentence_hidden_states, sentences, all_token_subtoken_lengths)
File ".../flair/flair/embeddings/base.py", line 656, in _extract_token_embeddings
    assert subword_start_idx < subword_end_idx <= sentence_hidden_state.size()[1]
AssertionError
```

Any ideas how to fix this?
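The underlying constraint is that the model has a fixed maximum input length (512 subtokens for distilbert-base-uncased), so longer sentences must be embedded in chunks and the chunk outputs stitched back together before token extraction. A minimal sketch of that shape (names and the fake "model" are illustrative assumptions, not flair's implementation):

```python
# Hypothetical sketch: embed at most MAX_LEN subtokens per forward pass and
# concatenate the chunk outputs along the sequence axis, so that every
# subtoken index stays within bounds when token embeddings are extracted.
import numpy as np

MAX_LEN = 512

def embed_in_chunks(subtoken_ids, dim=4):
    """Returns hidden states of shape (len(subtoken_ids), dim)."""
    outputs = []
    for start in range(0, len(subtoken_ids), MAX_LEN):
        chunk = subtoken_ids[start:start + MAX_LEN]
        # stand-in for a transformer forward pass on one chunk
        outputs.append(np.ones((len(chunk), dim)))
    return np.concatenate(outputs, axis=0)

ids = list(range(513))  # more than one chunk's worth of subtokens
hidden = embed_in_chunks(ids)
print(hidden.shape)  # (513, 4)
# the bound the failing assertion checks: states cover every subtoken
assert hidden.shape[0] == len(ids)
```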
Hi,

this PR creates a transformer embedding that combines both `TransformerWordEmbedding` and `TransformerDocumentEmbedding`. It should be able to:

- (`max` or `mean`)

current state:
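The combined behaviour described above can be sketched as one set of hidden states feeding both outputs: per-token word embeddings, plus a document embedding pooled with `mean` or `max`. This is an illustrative sketch under that assumption, not the PR's actual implementation:

```python
# Hypothetical sketch: derive word-level and document-level embeddings from
# the same (num_tokens, dim) hidden-state matrix, with configurable pooling.
import numpy as np

def word_and_document_embeddings(hidden_states, pooling="mean"):
    """Returns (word_embs, doc_emb) from one forward pass's hidden states."""
    word_embs = hidden_states  # one embedding per token, unchanged
    if pooling == "mean":
        doc_emb = hidden_states.mean(axis=0)  # average over tokens
    elif pooling == "max":
        doc_emb = hidden_states.max(axis=0)   # element-wise max over tokens
    else:
        raise ValueError(f"unknown pooling: {pooling}")
    return word_embs, doc_emb

states = np.array([[1.0, 2.0], [3.0, 4.0]])
_, mean_doc = word_and_document_embeddings(states, "mean")
_, max_doc = word_and_document_embeddings(states, "max")
print(mean_doc)  # [2. 3.]
print(max_doc)   # [3. 4.]
```

Sharing one forward pass between the word and document views is the main win over running two separate transformer embeddings.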