Pydomo: Error with 'ds_get'

I am trying to read a Domo dataset in Python through pydomo, but I get an encoding error. My data contains names and addresses, which may include special characters. When I manually downloaded the data as CSV and read it with read_csv, I initially hit the same issue; adding encoding='latin1' to the read_csv parameters made it work. Is there a way to fix this with pydomo?

Here is my code and the resulting error:

Domo_Input_ID='xxxxxx'

df=domo.ds_get(Domo_Input_ID)

 'utf-8' codec can't decode byte 0xf0 in position 11128958: invalid continuation byte
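
For readers trying to reproduce the symptom, here is a minimal sketch in plain Python. It assumes the data is really latin1-encoded, which is what the successful read_csv workaround suggests; the sample bytes are made up for illustration. In latin1 the byte 0xf0 is the letter ð, while in UTF-8 it can only start a four-byte sequence, so a following ordinary letter triggers the same error message.

```python
# Hypothetical sample value: "Guðrún" stored as latin1 bytes (0xf0 is ð, 0xfa is ú).
raw = b"Gu\xf0r\xfan"

# Decoding as UTF-8 fails because 0xf0 must be followed by continuation bytes.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the encoding the data was actually written in succeeds.
text = raw.decode("latin1")

print(utf8_error)
print(text)
```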

Best Answers

  • DavidChurchman
    Answer ✓

    How important is it to keep the special characters in the dataset? If ds_get() can't change its encoding (which it seems like maybe it can't?) you could do some ETL before going to pydomo to strip out all the characters that don't work with UTF-8 encoding.

    I think you could regex replace this with blanks to remove all non-ASCII characters: "[^\\u0000-\\u007f]+"
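
To see what that pattern does before putting it into an ETL step, here is a rough Python equivalent. The helper name strip_non_ascii and the sample string are made up for illustration. Note that the characters are dropped, not transliterated, so "Café" becomes "Caf".

```python
import re

# Same character class as the pattern above: anything outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]+")


def strip_non_ascii(value: str) -> str:
    """Remove every run of non-ASCII characters from value."""
    return NON_ASCII.sub("", value)


cleaned = strip_non_ascii("Guðrún, Café 5")
print(cleaned)  # Gurn, Caf 5
```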

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    The ds_get function doesn't allow any encoding other than the default UTF-8. There are two options:

    1. Override the Domo class and write your own ds_get function that handles the latin1 encoding you have.
    2. Bypass ds_get and call the datasets.data_export function directly to return the data, then call pandas read_csv on the CSV string and pass encoding='latin1' as a parameter.
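
As a rough illustration of the read_csv half of option 2, the sketch below parses raw CSV bytes with an explicit encoding. It assumes you already have the undecoded bytes in hand (for example, from the manually downloaded file); whether data_export gives you bytes or an already-decoded string is not confirmed here. The helper name read_csv_bytes and the sample data are hypothetical.

```python
import io

import pandas as pd


def read_csv_bytes(raw: bytes, encoding: str = "latin1") -> pd.DataFrame:
    """Parse raw CSV bytes, letting pandas decode them with the given encoding."""
    return pd.read_csv(io.BytesIO(raw), encoding=encoding)


# Hypothetical sample: a header row and one record, written as latin1 bytes.
raw = "name,address\nGuðrún,Café 5\n".encode("latin1")
df = read_csv_bytes(raw)
print(df)
```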

    Here's a modified ds_get function:

    from io import StringIO
    from pandas import read_csv, to_datetime

    def ds_get(self, dataset_id, encoding='utf-8'):
        """
        Export data to pandas Dataframe

        >>> df = domo.ds_get('80268aef-e6a1-44f6-a84c-f849d9db05fb')
        >>> print(df.head())

        :Parameters:
        - `dataset_id`: id of a dataset (str)
        - `encoding`: encoding passed to read_csv (str), e.g. 'latin1'

        :Returns:
        pandas dataframe
        """
        csv_download = self.datasets.data_export(dataset_id, include_csv_header=True)

        content = StringIO(csv_download)
        df = read_csv(content, encoding=encoding)

        # Convert to dates or datetimes if possible
        for col in df.columns:
            if df[col].dtype == 'object':
                try:
                    df[col] = to_datetime(df[col])
                except (ValueError, TypeError):
                    # Leave the column unchanged if it cannot be parsed as dates
                    pass

        return df

Answers

  • @ellibot has some experience with Python. He may be able to help you.

  • ellibot
    ellibot Contributor

    Not sure on this one actually!

    Help me @GrantSmith-wan Kenobi, you're my only hope!

  • Thank you all!

    @DavidChurchman could you please confirm if the following syntax looks right?

    REGEXP_REPLACE(`address`, "[^\\u0000-\\u007f]+", "")
