Pydomo: Error with 'ds_get'
I am trying to read a Domo dataset in Python through pydomo, but I get an encoding error. My data contains names and addresses, which may include special characters. I manually downloaded the data to CSV and read it in with read_csv, and initially had the same issue. However, if I pass encoding='latin1' to read_csv, it works. Is there a way to fix this with pydomo?
Please see below for my query.
Domo_Input_ID='xxxxxx'
df=domo.ds_get(Domo_Input_ID)
'utf-8' codec can't decode byte 0xf0 in position 11128958: invalid continuation byte
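For anyone wondering where an error like this comes from, here is a minimal sketch. The sample bytes are made up for illustration and are not from the dataset above: a name stored as Latin-1 uses the single byte 0xF0 for "ð", which is not valid on its own in UTF-8.

```python
# Hypothetical sample: "Guðrún" encoded as Latin-1 (ð is the single byte 0xF0).
raw = b"Gu\xf0r\xfan"

# Decoding as UTF-8 fails, because 0xF0 announces a multi-byte sequence
# and the byte after it is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding the same bytes as Latin-1 succeeds: every byte maps to one character.
latin1_text = raw.decode("latin1")
```

This would explain why the same file reads fine once encoding='latin1' is passed to read_csv.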
Best Answers
-
How important is it to keep the special characters in the dataset? If ds_get() can't change its encoding (and it seems like it can't), you could do some ETL before going to pydomo to strip out the characters that are causing the decoding problem.
I think you could regex replace this with blanks to remove all non-ASCII characters: "[^\\u0000-\\u007f]+"
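As a rough illustration of what that pattern does, here is the same character class applied in Python. The helper name strip_non_ascii and the sample string are hypothetical; in Domo the replacement would be done in the ETL step instead.

```python
import re

# Same character class as above: anything outside U+0000..U+007F (plain ASCII).
NON_ASCII = re.compile(r"[^\u0000-\u007f]+")


def strip_non_ascii(text: str) -> str:
    """Remove every run of non-ASCII characters from text."""
    return NON_ASCII.sub("", text)


# Made-up address for illustration only.
cleaned = strip_non_ascii("12 Rue de l'Église, Muñoz")
```

Note that the characters are dropped rather than transliterated, so "Muñoz" becomes "Muoz". That is the trade-off behind the question of how much the special characters matter.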
-
The ds_get function doesn't allow any encoding other than the default UTF-8. Two options here:
- Override the Domo class and write your own ds_get function to handle the latin1 encoding you have.
- Bypass ds_get and call the datasets.data_export function directly to return the data, then call pandas read_csv on your CSV string and pass in encoding='latin1' as a parameter.
Here's a modified ds_get function:
from io import StringIO
from pandas import read_csv, to_datetime

def ds_get(self, dataset_id, encoding='utf-8'):
    """
    Export data to pandas Dataframe

    >>> df = domo.ds_get('80268aef-e6a1-44f6-a84c-f849d9db05fb')
    >>> print(df.head())

    :Parameters:
    - `dataset_id`: id of a dataset (str)
    - `encoding`: encoding passed through to read_csv (str)
    :Returns:
    pandas dataframe
    """
    csv_download = self.datasets.data_export(dataset_id, include_csv_header=True)
    content = StringIO(csv_download)
    df = read_csv(content, encoding=encoding)
    # Convert to dates or datetimes if possible
    for col in df.columns:
        if df[col].dtype == 'object':
            try:
                df[col] = to_datetime(df[col])
            except (ValueError, TypeError):
                # Leave the column as-is if it cannot be parsed as dates
                pass
    return df
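For the second option, one detail worth keeping in mind is that read_csv can only apply its encoding argument when it is given bytes or a file, not a string that has already been decoded. The sketch below assumes you can get hold of the raw CSV bytes of the export; how to obtain them from pydomo is not shown, and whether data_export returns bytes or an already-decoded string is an assumption to check against your pydomo version. The helper name read_csv_bytes and the sample data are hypothetical.

```python
from io import BytesIO

import pandas as pd


def read_csv_bytes(raw: bytes, encoding: str = "latin1") -> pd.DataFrame:
    """Parse raw CSV bytes, letting pandas decode them with the given encoding."""
    return pd.read_csv(BytesIO(raw), encoding=encoding)


# Made-up sample standing in for the exported bytes of a dataset.
sample = "name,city\nGuðrún,Reykjavík\n".encode("latin1")
df = read_csv_bytes(sample)
```

If the export only comes back as an already-decoded string, the decoding has happened before pandas sees it, so the fix would have to be applied where the HTTP response is decoded.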
Answers
-
@ellibot has some experience with Python. He may be able to help you.
-
Not sure on this one actually!
Help me @GrantSmith-wan Kenobi, you're my only hope!
-
Thank you all!
@DavidChurchman could you please confirm if the following syntax looks right?
REGEXP_REPLACE(`address`, "[^\\u0000-\\u007f]+", "")