Closed
steps to reproduce
some_string = "'A chemical combination brought about by the action of light, as in the formation of carbohydrates in living plants from the carbon di-oxid and water of the air under the influence of sunlight."
Scenario 1
import textacy
textacy.preprocess.preprocess_text(
    some_string,
    fix_unicode=True,
    lowercase=False,
    no_urls=False,
    no_emails=False,
    no_phone_numbers=False,
    no_numbers=False,
    no_currency_symbols=False,
    no_punct=False,
    no_contractions=False,
    no_accents=False,
)
Result:
~/anaconda3/envs/py36-ml/lib/python3.6/site-packages/textacy/preprocess.py in preprocess_text(text, fix_unicode, lowercase, no_urls, no_emails, no_phone_numbers, no_numbers, no_currency_symbols, no_punct, no_contractions, no_accents)
246 text = text.lower()
247 # always normalize whitespace; treat linebreaks separately from spacing
--> 248 text = normalize_whitespace(text)
249
250 return text
~/anaconda3/envs/py36-ml/lib/python3.6/site-packages/textacy/preprocess.py in normalize_whitespace(text)
39 """
40 return constants.RE_NONBREAKING_SPACE.sub(
---> 41 " ", constants.RE_LINEBREAK.sub(r"\n", text)
42 ).strip()
43
TypeError: expected string or bytes-like object
Should fix_unicode be removed since it is no longer supported by textacy directly?
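For anyone reading the traceback: the final `TypeError` is the error Python's `re` module raises when a pattern's `sub` receives something that is not a string, for example `None`. The sketch below reproduces that failure with the standard library only. It assumes (this is not confirmed by the traceback) that the value reaching `normalize_whitespace` was `None` after the `fix_unicode=True` step; the two regexes are hypothetical stand-ins for textacy's `constants.RE_LINEBREAK` and `constants.RE_NONBREAKING_SPACE`, whose real patterns may differ.

```python
import re

# Hypothetical stand-ins for textacy's constants.RE_LINEBREAK and
# constants.RE_NONBREAKING_SPACE; the real patterns may differ.
RE_LINEBREAK = re.compile(r"(\r\n|[\n\v])+")
RE_NONBREAKING_SPACE = re.compile(r"[^\S\n\v]+")


def normalize_whitespace(text):
    """Collapse line breaks and runs of spacing, shaped like the call in the traceback."""
    return RE_NONBREAKING_SPACE.sub(" ", RE_LINEBREAK.sub(r"\n", text)).strip()


def describe_failure(value):
    """Return the name of the exception raised for `value`, or None if the call succeeds."""
    try:
        normalize_whitespace(value)
    except TypeError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(describe_failure("two  spaces"))  # a real string passes through
    print(describe_failure(None))           # a non-string triggers the TypeError
```

If that assumption holds, the problem is upstream of `normalize_whitespace`: whatever `fix_unicode=True` invokes would be returning a non-string instead of the fixed text.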
Scenario 2 (all false)
import textacy
textacy.preprocess.preprocess_text(
    some_string,
    fix_unicode=False,
    lowercase=False,
    no_urls=False,
    no_emails=False,
    no_phone_numbers=False,
    no_numbers=False,
    no_currency_symbols=False,
    no_punct=False,
    no_contractions=False,
    no_accents=False,
)
Result:
'A chemical combination brought about by the action of light, as in the formation of carbohydrates in living plants from the carbon di-oxid and water of the air under the influence of sunlight.'
expected vs. actual behavior
"'a chemical combination brought about by the action of light as in the formation of carbohydrates in living plants from the carbon di oxid and water of the air under the influence of sunlight"
I know preprocess worked in 0.6.x.
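To make the expected output concrete, here is a rough standard-library sketch of the transformation it implies: lowercasing, replacing punctuation with spaces, and collapsing whitespace. This is not textacy's implementation; `lowercase_and_strip_punct` is a hypothetical helper name. Unlike the expected string quoted above, it also drops the leading apostrophe, because it treats every character in `string.punctuation` the same way.

```python
import re
import string

# Map every ASCII punctuation character to a space so that "di-oxid"
# becomes "di oxid" rather than "dioxid".
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def lowercase_and_strip_punct(text):
    """Lowercase `text`, replace punctuation with spaces, collapse whitespace."""
    lowered = text.lower().translate(_PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", lowered).strip()


if __name__ == "__main__":
    print(lowercase_and_strip_punct("'A di-oxid, under sunlight."))
```

In terms of the flags used in this report, the expected string looks like what `lowercase=True` and `no_punct=True` would produce, rather than the all-`False` call in Scenario 2.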
environment
- operating system: aws linux
- python version: 3.7
- spacy version: 2.1.4
- installed spacy models: en_core_web_sm
- textacy version: 0.7.0