Large dataset import in Domo Workbench

mroker
mroker Member
edited March 2023 in Workbench

Hi everyone, I have a very large dataset (18M rows) that takes too long to import. The query is already a grouped query, so there is no unique ID. I'm trying to figure out whether I can import a one-time historical chunk (the data covers 2 years) and then append new data after that: run the 2-year chunk once, then run a daily job that pulls only the last few days and appends it. How can this be done without a unique ID?

Answers

  • Yes, there is a way to split the data into time-based sections and append them. You can import via Workbench or connect to your data with an API. Create smaller datasets such as data_yr_2021, data_yr_2022, data_qtr1, data_qtr2, etc., then create a dataflow and use the Append tile to combine them. That way you can pull previous years once (or once per month) and pull only the more recent data daily; the dataflow produces a full dataset containing all records across all time periods. I do this at my company to pull several years' worth of data for analysis, but the daily job only pulls quarter-to-date.

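Since the grouped query has no unique ID, the date window itself is what defines each chunk. A minimal sketch of that scheduling logic in Python (the quarter-sized windows and 3-day lookback are illustrative assumptions, not Domo API calls; the actual extraction still happens in Workbench jobs keyed to these windows):

```python
from datetime import date, timedelta

def backfill_chunks(start, end, months=3):
    """Split a historical range into quarter-sized windows.

    Each window can drive one Workbench job / dataset
    (e.g. data_qtr1), so no unique row ID is needed: the
    date window decides which rows belong to which chunk.
    """
    chunks = []
    cur = start
    while cur <= end:
        # advance roughly `months` months to the next window start
        m = cur.month - 1 + months
        nxt = date(cur.year + m // 12, m % 12 + 1, 1)
        chunks.append((cur, min(nxt - timedelta(days=1), end)))
        cur = nxt
    return chunks

def daily_window(today, lookback_days=3):
    """Window for the daily append job: re-pull the last few days."""
    return (today - timedelta(days=lookback_days), today)
```

A 2-year backfill then becomes `backfill_chunks(date(2021, 1, 1), date(2022, 12, 31))` run once, while the daily job filters its query to `daily_window(date.today())` and feeds the Append tile.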

  • I'd recommend using the CLI tool for a large dataset import, as it uses the Streams API to split your data into chunks and load it faster.

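The speedup from a streams-based load comes from splitting the extract into parts that can be uploaded in parallel. A rough sketch of that splitting step in pure Python (no Domo API calls; the part size and `(part_number, csv_text)` shape are illustrative assumptions):

```python
import csv
import io

def split_into_parts(rows, rows_per_part=100_000):
    """Split a large extract into numbered CSV parts.

    `rows` is any iterable of row tuples. Yields
    (part_number, csv_text) pairs; a streams-style loader
    would upload each part as a separate execution part.
    """
    part_num, count = 1, 0
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
        count += 1
        if count == rows_per_part:
            yield part_num, buf.getvalue()
            part_num += 1
            buf = io.StringIO()
            writer = csv.writer(buf)
            count = 0
    if count:  # flush the final, possibly short, part
        yield part_num, buf.getvalue()
```

With 18M rows at 100k rows per part this yields 180 parts, which is why the chunked approach finishes much faster than a single monolithic upload.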