@mirkolenz
Contributor

@mirkolenz mirkolenz commented Apr 17, 2021

Description

Add the normalization option 'normalize="norm"' for YAKE and change the behavior of the option 'normalize=None' to return the attribute 'orth' of the token.

Motivation and Context

The documentation says that setting 'normalize=None' for YAKE returns the terms as they appear in the original document. Currently, however, the attribute 'norm' of the token is returned, which can differ from the original representation (e.g., the token 'centres' would be extracted as 'centers'). Thus, I use the attribute 'orth' when 'normalize=None' is set. The same attribute is also used in the TextRank algorithm. Additionally, I added the option 'normalize="norm"' so that the current behavior remains available.
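To make the difference between the options concrete, here is a minimal sketch of the intended mapping from the 'normalize' value to a token attribute. It is not the code from this PR: `FakeToken` and `term_text` are hypothetical names used only for illustration, and `FakeToken` merely imitates the string attributes (`orth_`, `norm_`, `lower_`, `lemma_`) that a spaCy token exposes, so that the example runs without a language model. The attribute values for 'centres' are filled in by hand to match the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (illustration only)."""
    orth_: str   # verbatim text as it appears in the document
    norm_: str   # normalized form, which may differ from the original spelling
    lower_: str  # lowercased text
    lemma_: str  # base form


def term_text(token: FakeToken, normalize: Optional[str]) -> str:
    """Return the string used for a term under the given 'normalize' option."""
    if normalize is None:
        # Behavior proposed in this PR: keep the original representation.
        return token.orth_
    if normalize in ("norm", "lower", "lemma"):
        # "norm" preserves the previous behavior of normalize=None.
        return getattr(token, normalize + "_")
    raise ValueError(f"unsupported normalize value: {normalize!r}")


# Hand-written attribute values for the example token from the description.
tok = FakeToken(orth_="centres", norm_="centers", lower_="centres", lemma_="centre")

print(term_text(tok, None))    # original spelling
print(term_text(tok, "norm"))  # normalized spelling
```

With this mapping, callers who relied on the old output of 'normalize=None' can switch to 'normalize="norm"' and get the same terms as before.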

How Has This Been Tested?

I added the corresponding tests in tests/extract/keyterms/test_yake.py.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation, and I have updated it accordingly.

@mirkolenz
Contributor Author

I just updated this PR to remove two assertions that would not universally hold (thus causing some tests to fail).

@bdewilde
Collaborator

bdewilde commented May 31, 2021

Hi @mirkolenz, thanks very much for catching and fixing this! Everything looks good. Since technically it's changing functionality, I'm going to point it at the develop branch (instead of master) and then merge it in. Thanks again.

Scratch that, GitHub gave me a strange warning about "losing commits" if I switch the base branch to develop, so we're just going to roll right into master. It'll probably be fine... 😅

@bdewilde bdewilde merged commit c950fe0 into chartbeat-labs:master May 31, 2021
@mirkolenz mirkolenz deleted the fix-yake-normalization branch May 31, 2021 18:40