Best Of
Re: Changing dataset update method from Replace to Partition doubles data
@Chinmay_Totekar I believe the ticket was closed, with the unsatisfying conclusion that partitioning is best for starting with a new dataset, not converting existing ones. I did find two ways around it, though:
- For the first one, make a clone of the dataset using the Schema Management tool (Admin » Governance Toolkit » Schema Management), making sure to have it copy the data. Once done, use the same tool to clear the data out of the original, then temporarily rewrite the ETL to set the output dataset to partitioning, bring in the cloned data, and write it back out. It's kind of a pain, but it's what I've been doing for the past couple of years.
- Last month I decided to take the Upsert and Partitioning in Magic ETL self-paced course in Domo University. At around the 25 minute mark in the video it talks about using the CLI tool to partition the dataset for you. I haven't tested it, but it does look like it would work. You'll probably still want to make a clone of the original dataset just in case something goes wrong, but this may be your best bet.
jimsteph
Changing dataset update method from Replace to Partition doubles data
We've noticed that if we change the update method of an existing dataset from Replace to Partition, we end up with two records in the dataset for every new one (and both of them look identical, down to the field we're using for partitioning): it was quite the shock to see that a dataset I expected to have 91 million rows suddenly had 182 million. The obvious takeaway is that we probably should start from scratch when using partitions, but the benefits of converting existing ETLs are too strong a siren song for me to resist.
Two questions:
- Has anyone else noticed this? Both of us here got bit by this, so I want to know if it's a general bug, if it just affects our instance, or if we're doing it wrong and it's Working As Designed™.
- What would be a way around this? Since I have other ETLs downstream of the dataset, I don't want to delete the existing one and start from scratch unless I absolutely have to. My quick-and-probably-inefficient idea is to store the data in a secondary dataset (deduping if necessary), set the original to partition, and figure out how to use the even more beta feature to tell it to keep no partitions. Let it run once to clear out the dataset, then reimport the data from the secondary dataset and send it back to the original.
Any help would be appreciated.
jimsteph
Re: Making Sense of Wildly Unstructured Data
You can create an ETL that starts by splitting each comma-separated value into columns (Split Column). That would give you each label/value in its own column, but not aligned.
You would have to iterate over all the columns to find the patterns: either "Label: Value" together in one cell, or "Label:" in one cell with the value in the next cell.
Then convert your irregular columns into key-value pairs and unpivot the columns.
If you give us a sample CSV file with a few records, we can try to give you a direct example. If you are familiar with running Python in Domo, you can do something like the script below. Again, I don't have a data sample to try it out.
import pandas as pd
import re
# -----------------------------
# 1. Load raw data
# -----------------------------
# If CSV has one column called 'raw':
df = pd.read_csv('raw_events.csv') # or pd.read_excel('raw_events.xlsx')
# If multiple columns from CSV, combine them into one string per row
df['raw_combined'] = df.apply(lambda row: ','.join(row.dropna().astype(str)), axis=1)
# -----------------------------
# 2. Split rows by comma
# -----------------------------
split_cols = df['raw_combined'].str.split(',', expand=True)
# -----------------------------
# 3. Extract label/value pairs
# -----------------------------
def extract_label_value(cell, next_cell=None):
    if pd.isna(cell):
        return None, None
    # Case 1: Label: Value in same cell
    match = re.match(r'\s*(.+?):\s*(.+)', cell)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # Case 2: Label: in one cell, value in next (guard against an empty/NaN next cell)
    if next_cell is not None and not pd.isna(next_cell):
        match2 = re.match(r'\s*(.+?):\s*$', cell)
        if match2:
            return match2.group(1).strip(), next_cell.strip()
    return None, None
# Iterate over all columns and extract key/value pairs
records = []
for idx, row in split_cols.iterrows():
    row_data = {}
    i = 0
    while i < len(row):
        key, value = extract_label_value(row[i], row[i + 1] if i + 1 < len(row) else None)
        if key:
            row_data[key] = value
            # If value was in next cell, skip it
            if re.match(r'.+?:\s*$', row[i]):
                i += 1
        i += 1
    records.append(row_data)
# -----------------------------
# 4. Create flattened DataFrame
# -----------------------------
flat_df = pd.DataFrame(records)
# -----------------------------
# 5. Optional: rename columns to match desired output
# -----------------------------
rename_map = {
'ID number': 'Session ID and Title',
'Site': 'Site',
'Room Layout Comments': 'Room Layout Comments',
'Start/End': 'Session Start date / Session End date',
'Schedule Date': 'Schedule date',
'Sponsor': 'Sponsor',
'Booked By': 'Booked by',
'Contact Person': 'Contact person',
'Registration Start-End Date': 'Registration start / Registration end',
'Fee': 'Fee',
'Contract Amount': 'Contract amount',
'Credits': 'Credits',
'Comments of Internal': 'Comments of internal',
'Confirmation Comments': 'Confirmation'
}
flat_df.rename(columns=rename_map, inplace=True)
# -----------------------------
# 6. Save to CSV
# -----------------------------
flat_df.to_csv('flattened_events.csv', index=False)
print("Flattening complete. Saved to flattened_events.csv")
November 2025 Domo Customer Feature Release-Preview
We have heard from many of you that it is helpful to have a little extra notice about what new features we are releasing. Our November release is coming up in two weeks, so we are trying out posting a list of what is included now, so that you can think about how you might use these features and be ready when more information is available.
Re: Domo dashboard - AI summarization
@nazgul - you would not feed the dashboard to the agent; you can remove that step. In your AI Agent, add the datasets as knowledge, and in the agent's instructions tell it what information you'd like it to extract from which datasets.
Re: Calculating Time Difference
Here's what I came up with, starting with some sample data:
| ID | Created Time | Responded Time |
|---|---|---|
| 1 | 2024-11-04 15:26:00 | 2024-11-05 08:11:00 |
| 2 | 2024-11-05 10:15:00 | 2024-11-05 16:45:00 |
| 3 | 2024-11-06 08:50:00 | 2024-11-06 17:10:00 |
| 4 | 2024-11-07 14:30:00 | 2024-11-08 11:20:00 |
| 5 | 2024-11-09 09:00:00 | 2024-11-11 12:00:00 |
| 6 | 2024-11-10 16:00:00 | 2024-11-12 10:30:00 |
| 7 | 2024-11-11 11:00:00 | 2024-11-11 15:00:00 |
| 8 | 2024-11-12 17:30:00 | 2024-11-13 09:15:00 |
| 9 | 2024-11-13 08:45:00 | 2024-11-13 09:05:00 |
| 10 | 2024-11-14 16:50:00 | 2024-11-15 10:10:00 |
I interpret working hours to be 8am to 5pm; everything outside that window is non-working time, and only working hours count toward the difference.
So using my sample values and the first record, ID 1:
From 3:26pm to 5:00pm is 94 minutes, so we should have 94 minutes on Nov 4, then another 11 minutes from 8:00am to 8:11am on Nov 5, for 105 minutes total. That is 1 hour with 45 minutes left over, so the answer is 1h:45m.
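For reference, here is a minimal Python sketch of the same idea (my own helper, not part of the Beast Mode): it clips each day to the 08:00-17:00 window, skips weekends, and, unlike the calculation below, also counts any full working days in between. It reproduces the expected 6,300 seconds (1h:45m) for ID 1.
from datetime import datetime, timedelta
def work_seconds(created, responded, start_hour=8, end_hour=17):
    """Seconds of overlap between [created, responded] and the weekday 08:00-17:00 window."""
    total = 0
    day = created.date()
    while day <= responded.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4; weekends contribute nothing
            window_start = datetime.combine(day, datetime.min.time()) + timedelta(hours=start_hour)
            window_end = datetime.combine(day, datetime.min.time()) + timedelta(hours=end_hour)
            start = max(created, window_start)
            end = min(responded, window_end)
            if end > start:
                total += int((end - start).total_seconds())
        day += timedelta(days=1)
    return total
# ID 1 from the sample: 94 minutes on Nov 4 + 11 minutes on Nov 5 = 6300 seconds
print(work_seconds(datetime(2024, 11, 4, 15, 26), datetime(2024, 11, 5, 8, 11)))  # 6300
The Beast Mode version of the working-seconds calculation: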
WorkSeconds :
CASE
-- if either timestamp is on weekend (Sunday=1, Saturday=7), return 0
WHEN DAYOFWEEK(`Created Time`) IN (1,7) OR DAYOFWEEK(`Responded Time`) IN (1,7) THEN 0
-- same calendar day
WHEN DATE(`Created Time`) = DATE(`Responded Time`) THEN
CASE
-- compute clipped start ts (8:00) or created time
WHEN
(CASE WHEN `Created Time` < CONCAT(DATE(`Created Time`),' 08:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 08:00:00'))
ELSE UNIX_TIMESTAMP(`Created Time`)
END)
>=
(CASE WHEN `Responded Time` > CONCAT(DATE(`Created Time`),' 17:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 17:00:00'))
ELSE UNIX_TIMESTAMP(`Responded Time`)
END)
THEN 0
ELSE
(CASE WHEN `Responded Time` > CONCAT(DATE(`Created Time`),' 17:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 17:00:00'))
ELSE UNIX_TIMESTAMP(`Responded Time`)
END)
-
(CASE WHEN `Created Time` < CONCAT(DATE(`Created Time`),' 08:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 08:00:00'))
ELSE UNIX_TIMESTAMP(`Created Time`)
END)
END
-- different calendar days (only handles first day remainder + last day morning)
ELSE
-- seconds remaining on the Created date from clipped created→17:00
(CASE
WHEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 17:00:00'))
<=
(CASE WHEN `Created Time` < CONCAT(DATE(`Created Time`),' 08:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 08:00:00'))
ELSE UNIX_TIMESTAMP(`Created Time`)
END)
THEN 0
ELSE UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 17:00:00'))
-
(CASE WHEN `Created Time` < CONCAT(DATE(`Created Time`),' 08:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Created Time`),' 08:00:00'))
ELSE UNIX_TIMESTAMP(`Created Time`)
END)
END)
+
-- seconds on the Responded date from 08:00→clipped responded
(CASE
WHEN (CASE WHEN `Responded Time` > CONCAT(DATE(`Responded Time`),' 17:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Responded Time`),' 17:00:00'))
ELSE UNIX_TIMESTAMP(`Responded Time`)
END)
<= UNIX_TIMESTAMP(CONCAT(DATE(`Responded Time`),' 08:00:00'))
THEN 0
ELSE (CASE WHEN `Responded Time` > CONCAT(DATE(`Responded Time`),' 17:00:00')
THEN UNIX_TIMESTAMP(CONCAT(DATE(`Responded Time`),' 17:00:00'))
ELSE UNIX_TIMESTAMP(`Responded Time`)
END)
- UNIX_TIMESTAMP(CONCAT(DATE(`Responded Time`),' 08:00:00'))
END)
END
WorkSeconds_HHmmss :
CONCAT(
    FLOOR(`WorkSeconds` / 3600), ':',
    -- zero-pad minutes and seconds so e.g. 6300 seconds renders as 1:45:00 rather than 1:45:0
    LPAD(FLOOR(MOD(`WorkSeconds`, 3600) / 60), 2, '0'), ':',
    LPAD(MOD(`WorkSeconds`, 60), 2, '0')
)
Re: Any way to view schedule for dataflows?
import os
import requests
from getpass import getpass
import json
import sys

DOMO_INSTANCE = os.getenv("DOMO_INSTANCE", "YOUR_INSTANCE")  # <-- set this or export DOMO_INSTANCE
DATAFLOW_ID = "PUT_YOUR_DATAFLOW_ID_HERE"  # <-- Replace with a real dataflow id (uuid-ish string)
# Token: either set DOMO_DEV_TOKEN env var or leave empty to be prompted securely at runtime
DOMO_DEV_TOKEN = os.getenv("DOMO_DEV_TOKEN", None)
if not DOMO_DEV_TOKEN:
    DOMO_DEV_TOKEN = getpass("Domo developer token: ")

def list_dataflow_ids(instance, dev_token):
    url = f"https://{instance}.domo.com/api/dataprocessing/v1/dataflows"
    headers = {"x-domo-developer-token": dev_token, "Accept": "application/json"}
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    batch = resp.json()
    ids = [df["id"] for df in batch if "id" in df]
    return ids

def fetch_dataflow_preview(instance, df_id, dev_token):
    url = f"https://{instance}.domo.com/api/dataprocessing/v2/dataflows/{df_id}"
    headers = {"x-domo-developer-token": dev_token, "Accept": "application/json"}
    params = {"validationType": "PREVIEW"}
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()

# -----------------------------
# Main
# -----------------------------
dataflow_ids = list_dataflow_ids(DOMO_INSTANCE, DOMO_DEV_TOKEN)
print(f"Found {len(dataflow_ids)} dataflow IDs\n")
sys.stdout.flush()
dataflows = []
for idx, df_id in enumerate(dataflow_ids, start=1):
    try:
        preview = fetch_dataflow_preview(DOMO_INSTANCE, df_id, DOMO_DEV_TOKEN)
        name = preview.get("displayName") or preview.get("name") or "<no name>"
        dataflows.append({"id": df_id, "name": name})
        print(f"[{idx}/{len(dataflow_ids)}] {df_id} -> {name}")
        sys.stdout.flush()  # force print in Jupyter as it runs
    except Exception as e:
        print(f"[{idx}/{len(dataflow_ids)}] Error fetching preview for {df_id}: {e}")
        sys.stdout.flush()
This Python code lists out the dataflow IDs and names.
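If you want to keep those results instead of just printing them, a small follow-up (a sketch; it assumes the dataflows list built above and that pandas is available in your environment) could write them to CSV:
import pandas as pd
# Dump the collected id/name pairs to a CSV for later reference.
pd.DataFrame(dataflows).to_csv("dataflow_list.csv", index=False)
print("Wrote dataflow_list.csv")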
Re: Execution List Impossible To Investigate Issues
This sounds like a feature request rather than a question. Is the following a correct summation? If so, maybe a forum admin could move it to the appropriate board.
Feature Request: Improve Filtering and Visibility in Execution List
Summary
Currently, the Execution List in Domo does not allow filtering or searching by recipient (e.g., email address or person). This makes it very difficult to investigate issues with automated processes that send emails or require responses.
Use Case
- We run an automated process that sends weekly emails requesting form responses.
- We have 300+ recipients at a time.
- When investigating user-reported issues (such as someone receiving a reminder despite having completed the form), we cannot:
- Search/filter executions by recipient or variable (e.g., email).
- See at a glance who received an email without drilling into each execution.
Problem
- A recipient received reminder emails despite evidence they had already completed the form.
- The form was no longer in the queue, but we could not determine which execution sent the email because the Execution List has no filtering options.
- Troubleshooting requires manually opening each execution, which is inefficient and impractical at scale.
Requested Enhancement
Please add the following capabilities to the Execution List:
- Filtering/searching by recipient (person or email address).
- Filtering by variable values (e.g., user ID, form name, workflow inputs).
- Quick visibility into recipients of each execution without needing to click into each one.
Is there a way to set variables on an embedded page, like passing pfilters?
The idea would be to render the iframe of the page with a variable value already set via the src URL, the same way you can pass a pfilter, so that when the page loads, the variable already has the passed value.
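For reference on the pfilters pattern being compared against, here is a minimal sketch of building an iframe src with a URL-encoded pfilters parameter. The page URL is hypothetical and the filter structure (column / operand / values) is how I recall pfilters being passed; whether variables accept an equivalent query parameter is exactly the open question here.
import json
from urllib.parse import quote
# Hypothetical embed page URL; substitute your instance and page.
base_url = "https://YOUR_INSTANCE.domo.com/embed/pages/12345"
# pfilters are passed as a URL-encoded JSON array of filter objects.
pfilters = [{"column": "Region", "operand": "IN", "values": ["West"]}]
src = f"{base_url}?pfilters={quote(json.dumps(pfilters))}"
print(src)  # use this as the iframe src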