-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
Closed
Labels
Error ReportingIncorrect or improved errors from pandasIncorrect or improved errors from pandasIO Stataread_stata, to_stataread_stata, to_stata
Milestone
Description
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'a': ['a', None]})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, 'a']})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, None]})
df.to_stata('test.dta')
# ValueError: Writing general object arrays is not supportedProblem description
The df.to_stata() method writes columns containing None without error when there is at least one string value in the column, but fails if the column contains only None. It's unclear what data type to write a column of None as, so maybe that's why this isn't supported? I would propose that a column with values of only None be written as str1 with empty strings.
I came across this error because I read in a Parquet file with pd.read_parquet() and was unable to write the file to Stata format. In the Parquet schema, the column had type BYTE_ARRAY UTF8, but since the column had only missing values, it was read into Pandas as only None.
Expected Output
Stata file written to disk with missing values for the column with None.
Output of pd.show_versions()
Details
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.3.2
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
Metadata
Metadata
Assignees
Labels
Error ReportingIncorrect or improved errors from pandasIncorrect or improved errors from pandasIO Stataread_stata, to_stataread_stata, to_stata