@bdewilde commented Aug 26, 2019

Description

  • Add a sub-package (textacy.augmentation) for basic text data augmentation. It implements several transformations suitable for use in text classification tasks, plus a higher-level function, textacy.augmentation.apply(), to call them:
    • random synonym replacement
    • random synonym insertion
    • random item swapping
    • random item deletion
    • random sentence shuffling
  • Note: This code is provisional, and the API will almost certainly change.
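
To make the listed transformations concrete, here is a minimal, standalone sketch of each one operating on plain lists of strings. It is not the code in this PR and does not reflect the textacy.augmentation API: every function name, the `synonyms` lookup table, and the `apply_transforms` helper are hypothetical, chosen only for illustration. A real implementation would presumably work on spaCy tokens and draw synonyms from a lexical resource.

```python
import random
from functools import partial


def replace_with_synonyms(items, rng, synonyms, num=1):
    """Replace up to `num` randomly chosen items that have a known synonym."""
    items = list(items)
    candidates = [i for i, item in enumerate(items) if item in synonyms]
    for i in rng.sample(candidates, min(num, len(candidates))):
        items[i] = rng.choice(synonyms[items[i]])
    return items


def insert_synonyms(items, rng, synonyms, num=1):
    """Insert synonyms of up to `num` randomly chosen items at random positions."""
    items = list(items)
    candidates = [item for item in items if item in synonyms]
    for item in rng.sample(candidates, min(num, len(candidates))):
        items.insert(rng.randrange(len(items) + 1), rng.choice(synonyms[item]))
    return items


def swap_items(items, rng):
    """Swap the items at two distinct, randomly chosen positions."""
    items = list(items)
    if len(items) < 2:
        return items
    i, j = rng.sample(range(len(items)), 2)
    items[i], items[j] = items[j], items[i]
    return items


def delete_items(items, rng, prob=0.1):
    """Drop each item independently with probability `prob`; never return
    an empty list for non-empty input."""
    kept = [item for item in items if rng.random() >= prob]
    return kept if kept else list(items[:1])


def shuffle_sentences(sentences, rng):
    """Return the sentences in a random order."""
    sentences = list(sentences)
    rng.shuffle(sentences)
    return sentences


def apply_transforms(items, transforms, rng):
    """Hypothetical stand-in for a higher-level 'apply' function:
    run each transform in sequence on the output of the previous one."""
    for transform in transforms:
        items = transform(items, rng)
    return items


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so runs are repeatable
    toy_synonyms = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}
    tokens = ["the", "quick", "fox", "is", "happy"]
    transforms = [
        partial(replace_with_synonyms, synonyms=toy_synonyms, num=1),
        swap_items,
        partial(delete_items, prob=0.2),
    ]
    print(apply_transforms(tokens, transforms, rng))
```

Each function copies its input instead of mutating it, so one original example can be augmented repeatedly, and passing an explicit `random.Random` instance keeps the augmented dataset reproducible.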

Motivation and Context

I've been training spaCy TextCategorizers on datasets that are too small, and data augmentation is a good way to improve model performance when labeled data is scarce.

How Has This Been Tested?

Lots of manual validation and trial and error. Wrote some tests, and they pass (mostly...).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • [TODO] My change requires a change to the documentation, and I have updated it accordingly.

@bdewilde merged commit fd3c059 into develop Aug 26, 2019
@bdewilde deleted the feature/implement-data-augmentation branch August 26, 2019 20:07