Fix normalization for the keyword extractor YAKE #332

mirkolenz · 2021-04-17T12:17:32Z

Description

Add the normalization option 'normalize="norm"' for YAKE and change the behavior of the option 'normalize=None' so that it returns the token's 'orth' attribute.

Motivation and Context

The documentation says that setting 'normalize=None' for YAKE returns the terms as they appear in the original document. Currently, however, the token's 'norm' attribute is returned, which can differ from the original form (e.g., the token 'centres' is extracted as 'centers'). I therefore use the 'orth' attribute when 'normalize=None' is set; the TextRank algorithm uses the same attribute. Additionally, I added the option 'normalize="norm"' so that the current behavior remains available.
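To make the intended mapping concrete, here is a minimal sketch of how each 'normalize' value could select a token's string form. It is not textacy's actual source: the `Token` dataclass is a hypothetical stand-in for a spaCy token (spaCy tokens do expose `orth_`, `norm_`, `lemma_` and `lower_`), `term_text` is a hypothetical helper, and the 'lemma' and 'lower' branches are assumptions about the other options, included only for completeness. The token values below are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a spaCy Token, with only the string attributes used here."""
    orth_: str   # text exactly as it appears in the document
    norm_: str   # normalized form; spelling may differ (e.g. "centres" -> "centers")
    lemma_: str  # base form
    lower_: str  # lowercased form


def term_text(tok: Token, normalize: Optional[str]) -> str:
    """Hypothetical helper: pick a token's string form for a given 'normalize' value.

    Sketches the behavior described in this PR; not textacy's implementation.
    """
    if normalize is None:
        return tok.orth_   # fixed behavior: original surface form
    if normalize == "norm":
        return tok.norm_   # previous behavior of normalize=None, now opt-in
    if normalize == "lemma":
        return tok.lemma_  # assumed existing option
    if normalize == "lower":
        return tok.lower_  # assumed existing option
    raise ValueError(f"unsupported normalize value: {normalize!r}")


# Illustrative token mirroring the 'centres' example above.
tok = Token(orth_="centres", norm_="centers", lemma_="centre", lower_="centres")
print(term_text(tok, None))    # centres
print(term_text(tok, "norm"))  # centers
```

With this split, callers who relied on the old output of 'normalize=None' can switch to 'normalize="norm"' and get the same strings as before.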

How Has This Been Tested?

I added the corresponding tests in tests/extract/keyterms/test_yake.py.

Screenshots (if appropriate):

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation, and I have updated it accordingly.

mirkolenz · 2021-05-16T19:12:02Z

I just updated this PR to remove two assertions that do not hold universally (and were therefore causing some tests to fail).

bdewilde · 2021-05-31T16:41:29Z

Hi @mirkolenz, thanks very much for catching and fixing this! Everything looks good. Since it technically changes functionality, I'm going to point it at the develop branch (instead of master) and then merge it in. Thanks again.

Scratch that, GitHub gave me a strange warning about "losing commits" if I switch the base branch to develop, so we're just going to roll right into master. It'll probably be fine... 😅

mirkolenz added 2 commits April 17, 2021 14:08

Fix normalization for the keyword extractor YAKE

aedd79b

Remove wrong assertions

fde5e84

bdewilde merged commit c950fe0 into chartbeat-labs:master May 31, 2021

mirkolenz deleted the fix-yake-normalization branch May 31, 2021 18:40
