Add basic text data augmentation functionality #268

bdewilde · 2019-08-26T03:07:07Z

Description

Add a sub-package (textacy.augmentation) for basic text data augmentation. It implements several transformations suited to text classification tasks, along with a higher-level function, textacy.augmentation.apply(), that calls them:
- random synonym replacement
- random synonym insertion
- random item swapping
- random item deletion
- random sentence shuffling

Note: This code is provisional, and the API will almost certainly change.
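The transformations listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the textacy.augmentation code: every function name here (substitute_synonyms, insert_synonyms, swap_items, delete_items, shuffle_sentences, apply_transforms) is hypothetical, synonyms come from a tiny hard-coded table rather than a real lexical resource, and each transform takes an explicit random.Random so that runs can be reproduced.

```python
import random
from functools import partial
from typing import Callable, Dict, List, Sequence

# Hypothetical toy synonym table, for illustration only; a real implementation
# would draw synonyms from a lexical resource instead.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "film": ["movie", "picture"],
}


def _synonyms(word: str) -> List[str]:
    return TOY_SYNONYMS.get(word.lower(), [])


def substitute_synonyms(tokens: List[str], rng: random.Random, num: int = 1) -> List[str]:
    """Replace up to `num` randomly chosen tokens with one of their synonyms."""
    out = list(tokens)
    # Tokens with no synonyms are skipped instead of raising an error.
    candidates = [i for i, tok in enumerate(out) if _synonyms(tok)]
    for i in rng.sample(candidates, min(num, len(candidates))):
        out[i] = rng.choice(_synonyms(out[i]))
    return out


def insert_synonyms(tokens: List[str], rng: random.Random, num: int = 1) -> List[str]:
    """Insert a synonym of a random token at a random position, `num` times."""
    out = list(tokens)
    candidates = [tok for tok in out if _synonyms(tok)]
    if not candidates:
        return out
    for _ in range(num):
        synonym = rng.choice(_synonyms(rng.choice(candidates)))
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def swap_items(items: List[str], rng: random.Random, num: int = 1) -> List[str]:
    """Swap the positions of two randomly chosen items, `num` times."""
    out = list(items)
    if len(out) < 2:
        return out
    for _ in range(num):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def delete_items(items: List[str], rng: random.Random, num: int = 1) -> List[str]:
    """Delete up to `num` randomly chosen items, always keeping at least one."""
    num = min(num, len(items) - 1)
    if num <= 0:
        return list(items)
    drop = set(rng.sample(range(len(items)), num))
    return [item for i, item in enumerate(items) if i not in drop]


def shuffle_sentences(sentences: List[str], rng: random.Random) -> List[str]:
    """Return the sentences in a random order."""
    out = list(sentences)
    rng.shuffle(out)
    return out


Transform = Callable[[List[str], random.Random], List[str]]


def apply_transforms(tokens: List[str], transforms: Sequence[Transform],
                     rng: random.Random, num: int = 1) -> List[str]:
    """Apply `num` transforms, each drawn at random (with replacement), in sequence."""
    out = list(tokens)
    for transform in rng.choices(transforms, k=num):
        out = transform(out, rng)
    return out


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be reproduced
    tokens = "the quick dog watched a happy film".split()
    transforms = [
        partial(substitute_synonyms, num=1),
        partial(insert_synonyms, num=1),
        partial(swap_items, num=1),
        partial(delete_items, num=1),
    ]
    print(" ".join(apply_transforms(tokens, transforms, rng, num=2)))
```

Binding `num` with functools.partial lets the hypothetical apply_transforms treat every token-level transform uniformly; sentence shuffling works on a list of sentences rather than tokens, so it is kept separate here.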

Motivation and Context

I've been training spaCy TextCategorizers on datasets that are too small, and data augmentation is a great way to improve model performance in that situation.

How Has This Been Tested?

Lots of manual validation and trial and error. Wrote some tests, and they pass (mostly...).
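Tests of randomized transforms tend to pass only "mostly" because each run draws different random numbers. A minimal sketch of two ways around that, assuming a transform can accept an injected random.Random (delete_random_tokens below is hypothetical, not part of textacy): seed the generator when checking exact output, and otherwise assert only on properties that hold for any seed.

```python
import random


def delete_random_tokens(tokens, num, rng=None):
    """Hypothetical transform: drop `num` random tokens, keeping at least one."""
    rng = rng if rng is not None else random.Random()
    num = min(num, len(tokens) - 1)
    if num <= 0:
        return list(tokens)
    drop = set(rng.sample(range(len(tokens)), num))
    return [tok for i, tok in enumerate(tokens) if i not in drop]


def test_delete_is_reproducible_with_seed():
    tokens = "a small dataset of short texts".split()
    # Same seed -> same random draws -> identical output on every run.
    first = delete_random_tokens(tokens, 2, rng=random.Random(42))
    second = delete_random_tokens(tokens, 2, rng=random.Random(42))
    assert first == second


def test_delete_invariants_hold_for_any_seed():
    tokens = "a small dataset of short texts".split()
    for seed in range(100):
        out = delete_random_tokens(tokens, 2, rng=random.Random(seed))
        # Check properties that hold no matter which tokens were dropped.
        assert len(out) == len(tokens) - 2
        assert all(tok in tokens for tok in out)
```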

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
[TODO] My change requires a change to the documentation, and I have updated it accordingly.


bdewilde added 4 commits August 22, 2019 22:44

Add first pass on text augmentation functions

4a1a228

Improve transforms handling for ws and null syns

eaaf2ba

Add docs to data augmentation funcs

51b016e

Add tests for data augmentation

7f052ed

these are slightly non-deterministic, yikes

bdewilde mentioned this pull request Aug 26, 2019

other actions for Retokenizer? explosion/spaCy#4128

Closed

Clarify transformation.apply arg docs

78df2a0

bdewilde merged commit fd3c059 into develop Aug 26, 2019

bdewilde deleted the feature/implement-data-augmentation branch August 26, 2019 20:07

bdewilde mentioned this pull request Aug 30, 2019

Improve data augmentation functionality #269

Merged

4 tasks
