
Solution Share: Using Python to Bulk Rename Columns and Convert Timestamps


Hey Domo Fam! I wanted to share a success story using Magic ETL + Python scripting to solve a problem. This was my first use of a Python tile!

Challenge

I'm building a Company Dimension table from Hubspot company data. The connector reports all use technical column names (over 450 columns), and the date fields (50+ columns) are a mix of UNIX timestamps, some with milliseconds and some without, some already dates, and others timestamps. There are too many columns for a MySQL flow input (it has a column limit), and managing hundreds of columns in a Formula or Select Columns tile is too manual to be practical.

The objective: rename all connector report columns to their proper UI labels and convert every date- or time-related field stored as a UNIX timestamp to a DATE type.

Here is the source Hubspot All Companies Dataset Preview from the connector for a few columns:

[Image: Hubspot All Companies dataset preview]

My first step was to find the Hubspot API endpoint that provides the mapping of technical name to UI label, which is available at https://api.hubapi.com/properties/v2/companies/properties.
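If you want to peek at what that endpoint returns before building anything in Domo, a minimal sketch like this works (the requests call and the HUBSPOT_TOKEN environment variable are just my illustration; your auth setup may differ):

    import os
    import requests

    # Hypothetical token for illustration only; use whatever auth your Hubspot account requires
    token = os.environ["HUBSPOT_TOKEN"]

    url = "https://api.hubapi.com/properties/v2/companies/properties"
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()

    # The endpoint returns a list of property definitions; each one carries
    # the technical name and the UI label we want to map between
    for prop in resp.json()[:5]:
        print(prop["name"], "->", prop["label"])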

I set up a basic JSON No Code Connector to parse that data, and it's looking good: it provides the mapping, which you can't always reproduce by simply converting to Title Case or removing underscores. For example, hs_is_enriched on the backend displays as the Hubspot property "Has been enriched".

[Image: Company Properties dataset showing the name-to-label mapping]


Python Scripting Tile

Magic ETL was a great solution, and Python let me apply the name mapping and the date conversions in bulk across hundreds of columns.

[Image: Magic ETL dataflow with the Python Script tiles]

Here is the first tile, which handles the column rename: a few lines of code with a mapping dictionary to rename 450 columns.

    from domomagic import *

    # Read data from inputs into data frames
    input1 = read_dataframe('Hubspot | All - Companies')
    input2 = read_dataframe('Hubspot API | Company Properties')

    # Create name-to-label mapping dictionary
    mapping_dict = dict(zip(input2['name'], input2['label']))

    # Rename columns where they exist in the mapping
    existing_cols = [col for col in input1.columns if col in mapping_dict]
    rename_dict = {col: mapping_dict[col] for col in existing_cols}

    # Apply the renaming
    result = input1.rename(columns=rename_dict)

    # Write the transformed dataframe to output
    write_dataframe(result)
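One optional addition I'd consider before the write: a quick count of how many columns actually matched the mapping, so silent misses don't slip by. This is just a sketch layered on top of the tile above, not part of the original script:

    # Optional sanity check (my own addition): count renamed vs. untouched columns
    unmapped = [col for col in input1.columns if col not in mapping_dict]
    print(f"Renamed {len(rename_dict)} columns; {len(unmapped)} kept their original names")
    print("First few unmapped columns:", unmapped[:10])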

And here is the second tile, handling the date conversions. It's a bit more complex given the variety of inputs and column names, but nothing a little AI couldn't help me figure out.

    from domomagic import *
    import pandas as pd

    # Read the input dataframe
    df = read_dataframe('Python Script - Bulk Column Rename')

    # Find date columns
    timestamp_cols = [col for col in df.columns if 'date' in col.lower() or 'time' in col.lower()]

    print("Processing timestamp columns:", timestamp_cols)

    for col in timestamp_cols:
        try:
            # Skip empty or all-zero columns
            if df[col].isna().all() or (df[col] == 0).all():
                continue

            # If already in datetime string format
            if pd.api.types.is_string_dtype(df[col]):
                df[col] = pd.to_datetime(df[col], errors='coerce')
                continue

            # For numeric timestamps
            if pd.api.types.is_numeric_dtype(df[col]):
                # Get non-zero max value to determine timestamp type
                max_val = df[col][df[col] != 0].max()

                # 10-digit timestamp (seconds)
                if max_val < 10000000000:
                    df[col] = pd.to_datetime(df[col], unit='s')
                # 13-digit timestamp (milliseconds)
                elif max_val < 10000000000000:
                    df[col] = pd.to_datetime(df[col], unit='ms')
                # Larger timestamps (microseconds)
                else:
                    df[col] = pd.to_datetime(df[col] / 1000, unit='ms')

            print(f"Converted {col}. Sample values:", df[col].head())
        except Exception as e:
            print(f"Error converting column {col}: {str(e)}")
            continue

    # Write the final result
    write_dataframe(df)
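The digit thresholds are the key trick: a recent UNIX timestamp has 10 digits in seconds, 13 in milliseconds, and 16 in microseconds, so comparing the column's maximum against 1e10 and 1e13 is enough to pick the right unit. Here's a tiny standalone check of that logic on made-up values (the sample numbers are mine, not from the dataset):

    import pandas as pd

    # The same instant (2024-01-01 UTC) expressed in three UNIX timestamp units
    seconds = pd.Series([1704067200])            # 10 digits -> unit='s'
    millis = pd.Series([1704067200000])          # 13 digits -> unit='ms'
    micros = pd.Series([1704067200000000])       # 16 digits -> divide by 1000, then unit='ms'

    print(pd.to_datetime(seconds, unit='s'))         # 2024-01-01
    print(pd.to_datetime(millis, unit='ms'))         # 2024-01-01
    print(pd.to_datetime(micros / 1000, unit='ms'))  # 2024-01-01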
