Data format from Jupyter to Dataset

thinh_dao Member
edited March 2023 in Jupyter Workspaces

I'm having trouble writing a dataframe to a Domo dataset.

I cleaned, merged, and joined my data and got the dataframe looking just the way I want. The problem comes when I export it with domo.write_dataframe: the Domo dataset changes the format of the dataframe, and that is definitely not what I want.

What a waste if we have to transform the data again, right?

If anyone has faced this or has a solution, I'd appreciate the help.

Here is a screenshot of some of the fields.


Comments

  • It's unclear what your output from (presumably) Magic ETL scripting tiles looks like. Can you send intermediate screenshots?

    Jae Wilson
  • Hi jaeW_at_Onyx,

    I mentioned Jupyter Workspaces, which is Python. The result is shown in the picture I uploaded.

  • I was having a similar issue with a date column that changed to text when I wrote the dataframe to a dataset in Domo. I played with the different dtypes and found one that did stay as a date after moving to Domo (though the timezone changed, so it ended up a few hours off from what I had in Pandas). Domo doesn't have an int data type, so maybe try changing the column to a float to see if it gives you something numeric once it's in Domo?


    RobB Domo Employee
    edited October 2023

    I realize this is late to the game, but this is a good discussion that hasn't ever been answered.

    What I noticed in @thinh_dao's OP is the datatype for the examples that failed to convert. They're not all lower-case. Int64 and Float64 are used instead of the lower-case equivalents. The best practice for data types is to use all lower-case.

    I tested this for float64 and, sure enough, if I type my data as float64 in a Python dataframe, it is output as a float data type when using the domojupyter.write_dataframe() method. However, Float64 comes back as a string data type. I repeated the test for int64 vs. Int64, and in that case both worked. It has been 13 months since the OP, and I suspect a code change in that time made both variations of int64 compatible. For float64, only the lower-case variation remains compatible.

    I would recommend sticking with lower-case data types regardless of how Python handles them.
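
    For context, the capitalized Int64 and Float64 are pandas' nullable extension dtypes, while the lower-case int64 and float64 are the plain NumPy dtypes, so they are different types rather than alternate spellings. A minimal sketch of casting back to the lower-case dtypes before writing is below. The helper name to_numpy_numeric and the sample columns are made up for illustration, and the Domo write itself is left out:

```python
import pandas as pd

# Hypothetical sample data using the capitalized (nullable) dtypes.
df = pd.DataFrame({
    "qty": pd.array([1, 2, 3], dtype="Int64"),
    "price": pd.array([1.5, None, 3.25], dtype="Float64"),
})


def to_numpy_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Int64/Float64 columns cast to int64/float64."""
    out = frame.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if isinstance(dtype, pd.Float64Dtype):
            # Missing values (pd.NA) become NaN in a float64 column.
            out[col] = out[col].astype("float64")
        elif isinstance(dtype, pd.Int64Dtype):
            # int64 cannot hold missing values, so fall back to float64
            # when the column contains any.
            target = "float64" if out[col].isna().any() else "int64"
            out[col] = out[col].astype(target)
    return out


clean = to_numpy_numeric(df)
before = {col: str(dtype) for col, dtype in df.dtypes.items()}
after = {col: str(dtype) for col, dtype in clean.dtypes.items()}
print(before)
print(after)
```

    Checking df.dtypes like this right before the write call is a cheap way to spot capitalized dtypes that slipped in, for example from convert_dtypes().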

    Regarding Date Types

    @DavidChurchman is spot on about the date type not working. However, this is a Python/Pandas/NumPy limitation. If you have a column with a timestamp or a date and you apply a method to convert it to date only, it will convert to either a datetime type or an object type, depending on the method used. There is no date-only type for dataframes in either Pandas or NumPy.

    See https://numpy.org/doc/stable/reference/arrays.dtypes.html for details. Since Pandas gets its data types from NumPy data types this should apply to both.

    I tested this when I first learned of it. My personal incredulity got the best of me:

    The above example uses domojupyter to load a federated dataset. The column, REQUISITIONING_DATE, comes in as a timestamp. We attempt date conversion using three methods: the pandas method pd.to_datetime(), the date method from the datetime library, and the somewhat newer datetime.datetime.fromisoformat().

    The first output is the original column, which comes in as datetime64.

    The second output uses pd.to_datetime() with the format of the column's date and time detail specified. It returns a datetime type. The time defaults to midnight in the data.

    In the next example, the dt.date method is added to the end of the column to try to force a date only. It will function in Python as a date, but the dataframe data type is object, which is the equivalent of a string in Domo.

    Our last example uses datetime.date.fromisoformat. Older versions of Python use datetime.datetime.strptime() for the same purpose. The result is the same as above: the dataframe data type ends up as a string in Domo.

    While you can use date columns within Pandas and NumPy, they do not have an actual date-only type.
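
    Since the screenshot is not reproduced here, the following is a rough sketch of the same comparison in plain pandas. The sample values are invented, the column name is borrowed from the example above, and nothing is written to Domo, so it only shows the dataframe-side dtypes:

```python
import datetime

import pandas as pd

# Invented sample values standing in for the federated dataset column.
df = pd.DataFrame(
    {"REQUISITIONING_DATE": ["2023-03-01 14:30:00", "2023-03-02 09:15:00"]}
)

# 1) pd.to_datetime keeps a real datetime64 dtype.
ts = pd.to_datetime(df["REQUISITIONING_DATE"], format="%Y-%m-%d %H:%M:%S")

# 2) normalize() drops the time of day but stays datetime64 (time = midnight).
midnight = ts.dt.normalize()

# 3) .dt.date yields Python date objects, so the column dtype is object.
date_only = ts.dt.date

# 4) datetime.date.fromisoformat on the date part also yields an object column.
iso_dates = df["REQUISITIONING_DATE"].str[:10].map(datetime.date.fromisoformat)

for name, series in [
    ("to_datetime", ts),
    ("normalize", midnight),
    ("dt.date", date_only),
    ("fromisoformat", iso_dates),
]:
    print(name, series.dtype)
```

    If the goal is a column that stays date-like after the write, keeping it as datetime64 at midnight (option 2) looks like the safer choice based on the behavior described in this thread.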

    Add'l reference: https://stackoverflow.com/questions/29245848/what-are-all-the-dtypes-that-pandas-recognizes

  • Super helpful, @RobB !
