Daily Dataset Update via Python fails on first run each day

har_d_har1 Member
edited March 2023 in R & Python Tiles

I am using the Domo Dimensions Calendar as an input to a Python script that updates an dataset of the next 365 days as columns

|2023-01-27 | 2023-01-28 | 2023-01-29 | 2023-01-30 | ........ | 2024-01-27

So, when the script runs tomorrow the first column will be 2023-01-28, and so on.

It works, to a point. The first daily run of this script produces an error when running Domo's built-in write_dataframe code:

"Column referenced but not found: 2023-01-26" (The referenced column is yesterday's date).

I then open the ETL, run a preview, which fails.

Run a second preview, which succeeds.

My guess is that when updating the existing dataset the write_dataframe code notes the mismatch between the first columns in the output dataframe of the script and the output dataset of the ETL.

I am open to any ideas

Should/can I try to recursively update the dataset, using it as an input as well as an output?

# Import the domomagic package into the script 
from domomagic import *
import pandas as pd
import numpy as np
from datetime import *
# read data from inputs into a data frame
cal = read_dataframe('1 year')

# write your script here
# read data from inputs into a data frame
print('CAL first date value: ' + str(cal.iloc[0,0]))
print('CAL last date value: ' + str(cal.iloc[-1,0]))

# Convert the dt column from datetime to date by copying
# renaming and copying back again.
# have to do this as the column is called 'dt' and
# subsequent ETL flows expect this column name

# drop all columns except for dt
df = cal.loc[:, ['dt']]
# rename dt to DateTime
df.rename(columns = {'dt':'DateTime'}, inplace = True)
# Convert datetime to date
df['Date'] = pd.to_datetime(df['DateTime']).dt.date
# rename the new date column back to dt
df.rename(columns = {'Date':'dt'}, inplace = True)
# drop DateTime column, leaving dt column as date type
df.drop('DateTime', axis='columns', inplace=True)

# insert a blank value column, so that we have a values column for the coming pivot command
df['Value'] = 0

# pivot the new df so that dates appear as columns
temp_df = df.pivot(index=None, columns='dt', values='Value')

# drop all rows except the first
new_df = temp_df.drop(temp_df.index.to_list()[1:] ,axis = 0 )

# set all row values to 0
new_df[:] = 0

# to avoid array dict errors, set columns headers to string type
new_df.columns = new_df.columns.astype(str)

# write a data frame so it's available to the next action

Best Answer


  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Have you tried updating the dataset's schema before you upload the new dataset?


    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • har_d_har1

    @GrantSmith I have since realised that my code is changing the schema and that this is the cause of the error (duh). Rather than update the schema daily, which introduces more potential for errors (I am not a smart man) I have worked around this issue by sticking to a static start date of 1st January. Then the schema only needs to be updated once a year