"Column referenced but not found" Python error creating date-sensitive columns in ETL
Objective:
To take an existing dataset that shows event start and end dates, and a total value and parse that info into a daily value that appears in columns representing all dates between start and end.
Method:
Using Domo Dimensions Calendar, select only rows from current date to +2 years.
Use the result in a Python script that appends a new column to an existing table, one per calendar row. i.e. todays script will add columns '2022-12-21' to '2024-12-21' to the data table.
Problem:
However, the script regularly fails with "Column referenced but not found: 2022-12-09" (This is today's example) when writing the output dataframe.
I used to think that this was maybe a timezone problem (I am n GMT) and that the referenced column represented yesterdays date in my timezone, but today on the servers timezone.
However, today's error referenced a date 12 days ago!
This problem is not consistent though and tends not to occur in the afternoon GMT. Again, hinting at a timezone problem. I am now not so sure and need expert guidance as I am a Python noob
# Import the domomagic package into the script from domomagic import * import pandas as pd import numpy as np from datetime import * # read data from inputs into a data frame cal = read_dataframe('CAL.WHERE') eventdf = read_dataframe('EVENTS.APPEND') # for each value in the dt column of the calendar # create an array of length = number of calendar columns cols = [None] * cal.shape[0] # create an array of default values for each new column to be added to the dataframe vals = [0] * cal.shape[0] rng = range(cal.shape[0]) ind = [str(x) for x in rng] today = date.today() for i, row in cal.iterrows(): dt = row['dt'].date() cols[i]= str(dt) # now we need to turn our cols (columns) and vals (rows) into a dictionary # so that we can then in turn convert that into a new dataframe zip_iterator = zip(cols, vals) dictionary = dict(zip_iterator) # create a new empty dataframe from the dictionary new_df = pd.DataFrame(dictionary, index=ind) # create a new dataframe from our two input dataframes: df and new_df output_df = pd.concat([eventdf, new_df], axis=1, ignore_index=False) # iterate through each row in the dataframe to populate date columns between # start and end dates with the daily value rowcount = 0 skip = 0 for i, row in output_df.iterrows(): eventid = row["ID"] if type(eventid ) != "<class 'str'>": eventid = str(eventid) start_dt = row["Start Date"].date() end_dt = row["End Date"].date() # if either of the date values are NaT, then skip this row if pd.isnull(start_dt): print('Row ' + str(rowcount) + '. ' + eventid + " skipping as start date is null") skip = skip + 1 continue if pd.isnull(end_dt): print('Row ' + str(rowcount) + '. ' + eventid + " skipping as end date is null") skip = skip + 1 continue # we are only interested in future values, so ignore any dates prior to today if start_dt < today: print(eventid + " begins before today. Setting start date to today") start_dt = today # if the event ends before today, skip it if end_dt < today: print('Row ' + str(rowcount) + '. ' + eventid + " skipping as end date before today") skip = skip + 1 continue effort = row["Story Daily Effort"] dates = pd.date_range(start=start_dt, end=end_dt) #print(story_id + ': ' + str(start_dt) + ' - ' + str(end_dt)) col_nm = str(start_dt) j = output_df.columns.get_loc(col_nm) output_df.iloc[i, j:j+len(dates)] = effort rowcount = rowcount + 1 print("Skipped " + str(skip)) print("Successfully processed " + str(rowcount)) # write a data frame so it's available to the next action write_dataframe(output_df)
Best Answer
-
To workaround this issue and allow by ETL to complete, I have created a separate ETL dataflow just to produce a table of columns from the Domo Dimensions Calendar, which I then use as an Input DataSet to the above code, essentially removing rows 11-30 above.
0
Answers
-
To workaround this issue and allow by ETL to complete, I have created a separate ETL dataflow just to produce a table of columns from the Domo Dimensions Calendar, which I then use as an Input DataSet to the above code, essentially removing rows 11-30 above.
0
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 102 SQL DataFlows
- 626 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 753 Beast Mode
- 61 App Studio
- 41 Variables
- 692 Automate
- 177 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive