"Column referenced but not found" Python error creating date-sensitive columns in ETL
Objective:
To take an existing dataset that shows event start and end dates, and a total value and parse that info into a daily value that appears in columns representing all dates between start and end.
Method:
Using Domo Dimensions Calendar, select only rows from current date to +2 years.
Use the result in a Python script that appends a new column to an existing table, one per calendar row. i.e. todays script will add columns '2022-12-21' to '2024-12-21' to the data table.
Problem:
However, the script regularly fails with "Column referenced but not found: 2022-12-09" (This is today's example) when writing the output dataframe.
I used to think that this was maybe a timezone problem (I am n GMT) and that the referenced column represented yesterdays date in my timezone, but today on the servers timezone.
However, today's error referenced a date 12 days ago!
This problem is not consistent though and tends not to occur in the afternoon GMT. Again, hinting at a timezone problem. I am now not so sure and need expert guidance as I am a Python noob
# Import the domomagic package into the script from domomagic import * import pandas as pd import numpy as np from datetime import * # read data from inputs into a data frame cal = read_dataframe('CAL.WHERE') eventdf = read_dataframe('EVENTS.APPEND') # for each value in the dt column of the calendar # create an array of length = number of calendar columns cols = [None] * cal.shape[0] # create an array of default values for each new column to be added to the dataframe vals = [0] * cal.shape[0] rng = range(cal.shape[0]) ind = [str(x) for x in rng] today = date.today() for i, row in cal.iterrows(): dt = row['dt'].date() cols[i]= str(dt) # now we need to turn our cols (columns) and vals (rows) into a dictionary # so that we can then in turn convert that into a new dataframe zip_iterator = zip(cols, vals) dictionary = dict(zip_iterator) # create a new empty dataframe from the dictionary new_df = pd.DataFrame(dictionary, index=ind) # create a new dataframe from our two input dataframes: df and new_df output_df = pd.concat([eventdf, new_df], axis=1, ignore_index=False) # iterate through each row in the dataframe to populate date columns between # start and end dates with the daily value rowcount = 0 skip = 0 for i, row in output_df.iterrows(): eventid = row["ID"] if type(eventid ) != "<class 'str'>": eventid = str(eventid) start_dt = row["Start Date"].date() end_dt = row["End Date"].date() # if either of the date values are NaT, then skip this row if pd.isnull(start_dt): print('Row ' + str(rowcount) + '. ' + eventid + " skipping as start date is null") skip = skip + 1 continue if pd.isnull(end_dt): print('Row ' + str(rowcount) + '. ' + eventid + " skipping as end date is null") skip = skip + 1 continue # we are only interested in future values, so ignore any dates prior to today if start_dt < today: print(eventid + " begins before today. Setting start date to today") start_dt = today # if the event ends before today, skip it if end_dt < today: print('Row ' + str(rowcount) + '. ' + eventid + " skipping as end date before today") skip = skip + 1 continue effort = row["Story Daily Effort"] dates = pd.date_range(start=start_dt, end=end_dt) #print(story_id + ': ' + str(start_dt) + ' - ' + str(end_dt)) col_nm = str(start_dt) j = output_df.columns.get_loc(col_nm) output_df.iloc[i, j:j+len(dates)] = effort rowcount = rowcount + 1 print("Skipped " + str(skip)) print("Successfully processed " + str(rowcount)) # write a data frame so it's available to the next action write_dataframe(output_df)
Best Answer
-
To workaround this issue and allow by ETL to complete, I have created a separate ETL dataflow just to produce a table of columns from the Domo Dimensions Calendar, which I then use as an Input DataSet to the above code, essentially removing rows 11-30 above.
0
Answers
-
To workaround this issue and allow by ETL to complete, I have created a separate ETL dataflow just to produce a table of columns from the Domo Dimensions Calendar, which I then use as an Input DataSet to the above code, essentially removing rows 11-30 above.
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 295 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 97 SQL DataFlows
- 608 Datasets
- 2.1K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 712 Beast Mode
- 50 App Studio
- 39 Variables
- 668 Automate
- 170 Apps
- 446 APIs & Domo Developer
- 45 Workflows
- 7 DomoAI
- 34 Predict
- 14 Jupyter Workspaces
- 20 R & Python Tiles
- 391 Distribute
- 111 Domo Everywhere
- 274 Scheduled Reports
- 6 Software Integrations
- 116 Manage
- 113 Governance & Security
- Domo Community Gallery
- 31 Product Releases
- 9 Domo University
- 5.3K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 104 Community Announcements
- 4.8K Archive