Solution Share: Using Python to Bulk Rename Columns and Convert Timestamps

jacob_folsom
jacob_folsom Member
edited December 18 in R & Python Tiles

Hey Domo Fam! Wanted to share a success story using Magic ETL + Python scripting to solve a problem. My first use of a python tile!

Challenge

I'm building a Company Dimension table using Hubspot company data...all connector reports have technical column names (over 450 columns) and all date fields (50+ columns) are a mix of unixtimestamps, some with milliseconds, some without, some already as dates and yet others as timestamps. There are too many columns for a MySQL flow input (it has a limit), and managing hundreds of columns in a formula tile or select columns is unmanageable and too manual.

The objective is to rename all connector report columns to the proper UI label and convert all date or time related fields that are UNIXTIMESTAMPs over to a DATE type.

Here is the source Hubspot All Companies Dataset Preview from the connector for a few columns:

My first step was to find the Hubspot API endpoint that provided the mapping of the technical name to the UI label, which exists at this endpoint - https://api.hubapi.com/properties/v2/companies/properties.

I setup a basic JSON No Code Connector parse that data and it's looking good and provides the mapping, which doesn't always match up exactly by simply converting to Title Case or removing underscores, for example: hs_is_enriched on backend displays in Hubspot property as "Has been enriched"


Python Scripting Tile

Magic ETL was a great solution and python allowed me to make this mapping and the date conversions in bulk for hundreds of columns

Here is the first tile handling the column rename - a few lines of code with a mapping dictionary to rename 450 columns

from domomagic import *

# Read data from inputs into data frames
input1 = read_dataframe('Hubspot | All - Companies')
input2 = read_dataframe('Hubspot API | Company Properties')

# Create name to label mapping dictionary
mapping_dict = dict(zip(input2['name'], input2['label']))

# Rename columns where they exist in the mapping
existing_cols = [col for col in input1.columns if col in mapping_dict]
rename_dict = {col: mapping_dict[col] for col in existing_cols}

# Apply the renaming
result = input1.rename(columns=rename_dict)

# Write the transformed dataframe to output
write_dataframe(result)

And here is the second handling the date conversions. A bit more complex given the variety of inputs and column names, but nothing a little AI couldn't help me figure out.

from domomagic import *
import pandas as pd

# Read the input dataframe
df = read_dataframe('Python Script - Bulk Column Rename')

# Find date columns
timestamp_cols = [col for col in df.columns if 'date' in col.lower() or 'time' in col.lower()]

print("Processing timestamp columns:", timestamp_cols)

for col in timestamp_cols:
    try:
        # Skip empty or all-zero columns
        if df[col].isna().all() or (df[col] == 0).all():
            continue
            
        # If already in datetime string format
        if pd.api.types.is_string_dtype(df[col]):
            df[col] = pd.to_datetime(df[col], errors='coerce')
            continue
            
        # For numeric timestamps
        if pd.api.types.is_numeric_dtype(df[col]):
            # Get non-zero max value to determine timestamp type
            max_val = df[col][df[col] != 0].max()
            
            # 10-digit timestamp (seconds)
            if max_val < 10000000000:
                df[col] = pd.to_datetime(df[col], unit='s')
            # 13-digit timestamp (milliseconds)
            elif max_val < 10000000000000:
                df[col] = pd.to_datetime(df[col], unit='ms')
            # Larger timestamps (microseconds)
            else:
                df[col] = pd.to_datetime(df[col] / 1000, unit='ms')
                
        print(f"Converted {col}. Sample values:", df[col].head())
        
    except Exception as e:
        print(f"Error converting column {col}: {str(e)}")
        continue

# Write the final result
write_dataframe(df)

Tagged: