APIs & Domo Developer

Pydomo: Error with 'ds_get'

I am trying to read a Domo dataset in Python through pydomo, but I get an encoding error. My data contains names and addresses, which may include special characters. I manually downloaded the data to CSV and hit the same error with read_csv at first; however, if I pass encoding='latin1' to read_csv, it works. Is there a way to do the same with pydomo?

Please see below for my query.

    Domo_Input_ID = 'xxxxxx'
    df = domo.ds_get(Domo_Input_ID)

Error:

    'utf-8' codec can't decode byte 0xf0 in position 11128958: invalid continuation byte
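For reference, the same kind of error can be reproduced without Domo. This is a minimal sketch that assumes the export really is Latin-1 encoded; the sample row is made up for illustration.

```python
# Made-up sample: one CSV row encoded as Latin-1, where 'ð' is the single byte 0xF0.
raw = "name,city\nGuðrún,Reykjavík\n".encode("latin1")

# Decoding as UTF-8 fails: 0xF0 announces a 4-byte sequence, but the next
# byte ('r') is not a continuation byte.
utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    utf8_error = err
    print(err)

# Decoding as Latin-1 always succeeds, because every byte maps to a character.
text = raw.decode("latin1")
print(text)
```

Because Latin-1 never fails to decode, it can also silently produce wrong characters if the data is actually in some other encoding, so it is worth spot-checking a few names after loading.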

Best Answers

  • Answer ✓

    How important is it to keep the special characters in the dataset? If ds_get() can't change its encoding (which it seems like maybe it can't?) you could do some ETL before going to pydomo to strip out all the characters that don't work with UTF-8 encoding.

    I think you could regex-replace this pattern with blanks to remove all non-ASCII characters: "[^\\u0000-\\u007f]+"
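To see what that pattern does before wiring it into an ETL step, here is a rough Python equivalent. It is only an illustration: the function name and sample address are made up, and in Domo the replacement would run in the ETL rather than in Python.

```python
import re

# Same character class as the pattern above: anything outside U+0000-U+007F.
NON_ASCII = re.compile(r"[^\u0000-\u007f]+")


def strip_non_ascii(text):
    """Remove every run of non-ASCII characters from text."""
    return NON_ASCII.sub("", text)


print(strip_non_ascii("Guðrún, 12 Rue de l'Église"))
```

Note that this drops the characters entirely rather than replacing them with a plain-letter equivalent, so "Guðrún" comes out as "Gurn". That may or may not be acceptable for names and addresses.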

  • Coach
    Answer ✓

    The ds_get function doesn't allow any encoding other than the default, UTF-8. Two options here:

    1. Subclass the Domo class and write your own ds_get function that handles the latin1 encoding you have.
    2. Bypass ds_get and call the datasets.data_export function directly to return the data, then call pandas read_csv on your CSV string and pass encoding='latin1' as a parameter.

    Here's a modified ds_get function:

        from io import StringIO

        from pandas import read_csv, to_datetime


        # Intended as a method on a subclass of pydomo's Domo class.
        def ds_get(self, dataset_id, encoding='utf-8'):
            """
            Export data to pandas Dataframe

            >>> df = domo.ds_get('80268aef-e6a1-44f6-a84c-f849d9db05fb')
            >>> print(df.head())

            :Parameters:
            - `dataset_id`: id of a dataset (str)
            - `encoding`: encoding passed to pandas read_csv (str)

            :Returns:
            pandas dataframe
            """
            csv_download = self.datasets.data_export(dataset_id, include_csv_header=True)

            # csv_download is already a str at this point, so `encoding` may have
            # no effect unless the export is read as bytes.
            content = StringIO(csv_download)
            df = read_csv(content, encoding=encoding)

            # Convert to dates or datetimes if possible
            for col in df.columns:
                if df[col].dtype == 'object':
                    try:
                        df[col] = to_datetime(df[col])
                    except (ValueError, TypeError):
                        pass

            return df
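Option 2 could look roughly like the sketch below. The helper names are hypothetical, and `domo` is assumed to be an already authenticated pydomo client. Since the ds_get code wraps the export in StringIO, data_export appears to return a decoded str, so the sketch handles both a str and raw bytes and only applies the encoding in the bytes case.

```python
from io import BytesIO, StringIO

import pandas as pd


def csv_to_dataframe(payload, encoding="utf-8"):
    """Parse a CSV payload that may be raw bytes or an already-decoded str."""
    if isinstance(payload, bytes):
        # The encoding only matters while the data is still bytes.
        return pd.read_csv(BytesIO(payload), encoding=encoding)
    # A str has already been decoded, so there is nothing left to decode.
    return pd.read_csv(StringIO(payload))


def export_to_dataframe(domo, dataset_id, encoding="utf-8"):
    """Option 2: skip ds_get and parse the export ourselves.

    `domo` is assumed to be an authenticated pydomo Domo client.
    """
    payload = domo.datasets.data_export(dataset_id, include_csv_header=True)
    return csv_to_dataframe(payload, encoding=encoding)
```

Usage would be something like `df = export_to_dataframe(domo, Domo_Input_ID, encoding='latin1')`. If the UnicodeDecodeError is raised inside data_export itself, this will not avoid it; in that case the raw response bytes would have to be obtained some other way, which is not shown here.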

Answers

  • @ellibot has some experience with Python. He may be able to help you.

  • Contributor

    Not sure on this one actually!

    Help me @GrantSmith-wan Kenobi, you're my only hope!

  • Thank you all!

    @DavidChurchman could you please confirm if the following syntax looks right?

    REGEXP_REPLACE(`address`, "[^\\u0000-\\u007f]+", "")
