Appending several datasets in Workbench

iresanjogar · November 2016

Hello Dojo Community!

I have 15 different and large datasets that I need to append to create a unique output dataset that I pull into Domo using Workbench (version 4). I would like to know if I can create a single job in Workbench that allows me to append always the same 15 datasets to be able to upload a single dataset into my Domo instance related to a single job in Workbench. I know I could create 15 different jobs in Workbench and append them all in Domo using Magic ETL, but ideally I would like to have only 1 job and 1 dataset in my Domo instance.

Thanks!

n8isjack-ret · November 2016

@iresanjogar In workbench you may be able to load all 15 files in a single workbench job. The base requirements are:

All the files must have the same schema (column 1 on all 15 files must be the same data type, column 2 also, and so forth for all columns). Workbench will assume column positions match, not column names.
The files must be in the same folder (should also be alone in that folder without any other files).

When you add the file in the settings remove the filename so that it is pointing to the folder like this:

Screen Shot 2016-11-02 at 3.55.05 PM.png

Kurbz · March 2017

hi,

when i created a new csv job it worked!!

(note, did not work when trying to change an excel job to csv)..

many thanks!

sorted.

K.

cwalliser · August 2017

My bad. I found a structure difference in one of the files. When I removed that file from the folder, the update worked.

A lovely thing...

[Deleted User] · November 2016

@iresanjogar

Thanks for your question, we are taking a closer look at it.

In the meantime have a look at the FAQ document for Workbench 4.

Regards,

AustinH · November 2016

You are able to combine different datasets in a workbench job to load as one dataset into Domo. The basic way to do this would be by using a UNION or UNION ALL inbetween select statements for each dataset. With unions, you will need to make sure the columns are aligned in your datasets or in the select statement.

Example:

SELECT

`columnA`

,`columnB`

FROM dataset1

UNION ALL

SELECT

`columnA`

,`columnB`

FROM dataset2

iresanjogar · November 2016

Thanks for your reply!

However, could you specify where I should write that code within Workbench? Is this a customized transformation that I can add to my job? If so, could you explain a bit better the process? The type of transformations allowed seems pretty limited and I am not sure which option I should select to do this.

However, if I understood properly, I will still need to create 15 jobs in Workbench (associated to each of the 15 datasets I want to append), and then another one that would append all those datasets and load as one dataset into Domo, am I wrong?

Thanks once again!

AustinH · November 2016

Can you provide more context of what type of data sources you are pulling data from? Database, excel, csv?

In the job settings, you will see the "Query" tab as an option in the "Data Source Property" section. Here you can input custom SQL to pull the data you want. If you are connected to a database, you can simply create one job that references the different datasets available in the dataset:

If you are wanting to load 15 different files, excel files for example, then you will need to create different jobs from each, since you are only able to browse to one local file at a time. In this case you would then combine them once they are in Domo, which I know you are wanting to avoid.

Let me know if you need additional details and include more information about your datasources.

iresanjogar · November 2016

This was really useful! Thank you very much!

The type of files I need to append are CSV, which then means that I need to create 15 different jobs and perform the transformation in Domo. However, now I know that if I am connected to a database I will be able to do these type of operations! Thanks! It would have been great though if in the subsection "transport method" of the "Source" information you would be able to add several files.

n8isjack-ret · November 2016

@iresanjogar In workbench you may be able to load all 15 files in a single workbench job. The base requirements are:

All the files must have the same schema (column 1 on all 15 files must be the same data type, column 2 also, and so forth for all columns). Workbench will assume column positions match, not column names.
The files must be in the same folder (should also be alone in that folder without any other files).

When you add the file in the settings remove the filename so that it is pointing to the folder like this:

Screen Shot 2016-11-02 at 3.55.05 PM.png

Unknown · November 2016

@iresanjogar, did any of the above replies help you out?

iresanjogar · November 2016

Thank you @n8isjack-ret for your reply! This is exaclty what I needed (cc: @LizWR)! Since my files fulfill the requirements I can use this method without a problem. Thanks again!

[Deleted User] · November 2016

@iresanjogar, can you mark the appropriate reply "accept as solution" so others can benefit from this conversation?

Thanks!

Dani

Kurbz · March 2017

Hi Folks,

I am trying to do this method, however when I remove the filename and save it saves with a red dot. When I run the job it fails with:

Could not find a part of the path 'E:\MyDataFolder\'.

I have ran the job successfully with one selected file, but I cannot execute it successfully with no filename and just the folder path ?

Am I missing something?

Thanks Dojo's!

kshah008 · March 2017

@Kurbz, please feel free to open a new thread for better exposure to your question ?

n8isjack-ret · March 2017

Can you confirm that you are using a .csv file import?? This trick only is possible with .csv files.

Kurbz · March 2017

hi,

when i created a new csv job it worked!!

(note, did not work when trying to change an excel job to csv)..

many thanks!

sorted.

K.

cwalliser · August 2017

I tried this approach to load 3 csv files with different names in the same folder. Domo Upload Source Folder.PNG

The preview test failed with an error message below. I notice only Q1 & Q2 files show in the log.

Is there a file size limit? Some other factor I'm missing? I don't see any structure differences in any of the files.

[08.28.17 03:45:48 PM] Requesting CSV source file from the Local File Provider
[08.28.17 03:45:48 PM] Loading local file provider properties
[08.28.17 03:45:48 PM] Loading CSV file: C:\Users\cwalliser\OneDrive - Gigamon Inc\DOMO\SvcMetricsFiles\
[08.28.17 03:45:48 PM] Parsing file: 'C:\Users\cwalliser\OneDrive - Gigamon Inc\DOMO\SvcMetricsFiles\ServiceMetric_2017_Q1.csv'
[08.28.17 03:45:48 PM] Parsing file: 'C:\Users\cwalliser\OneDrive - Gigamon Inc\DOMO\SvcMetricsFiles\ServiceMetric_2017_Q2.csv'

[08.28.17 03:45:49 PM] The number of fields in the record is greater than the available space from index to the end of the destination array.
Parameter name: array

cwalliser · August 2017

My bad. I found a structure difference in one of the files. When I removed that file from the folder, the update worked.

A lovely thing...

user09630 · October 2017

Hi,

I am trying to import muitlipe CSV's in workbench by date modified order (oldest first) and it seems like the workbench imports the multilple CSV's ordered by filename.

Our developer has used guid while generating the CSV names so any help how to load by date modified will be appreciated.

Thanks.

n8isjack-ret · October 2017

Ok, help me understand the impact of the load order. My first thought is that the order in which they are loaded will not affect the end result in Domo.

Is it because you don't want to load the old files ever again once reloaded?

Are you loading all files every time?

Does the same record appear in multiple files?

user09630 · October 2017

Hi,

Let me explain further.

We are generating delta loads as a CSV files so currently the name of the files are loaded in the following order by default:

1) ACCOUNT_380f10d4-3cd9-46a2-8196-76dcd85ac4dc.csv   --date modified as of 03/10/2016 6:08 AM (DD/MM/YYYY)
2) ACCOUNT_3f771f53-9ca4-4251-ae94-96f9284ba306.csv   --date modified as of 03/10/2016 6:02 AM
3) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6.csv   --date modified as of 03/10/2016 6:00 AM
4) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6_1.csv --date modified as of 03/10/2016 6:00 AM
5) ACCOUNT_5e131c62-0c7d-4d02-9d9f-8e2d5f40ae53.csv   --date modified as of 03/10/2016 6:15 AM

Since its a delta load we would like the files to be loaded by file date modified order. so i am expecting this to be loaded in the following order:

3) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6.csv     --date modified as of 03/10/2016 6:00 AM
4) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6_1.csv --date modified as of 03/10/2016 6:00 AM
2) ACCOUNT_3f771f53-9ca4-4251-ae94-96f9284ba306.csv     --date modified as of 03/10/2016 6:02 AM
1) ACCOUNT_380f10d4-3cd9-46a2-8196-76dcd85ac4dc.csv    --date modified as of 03/10/2016 6:08 AM
5) ACCOUNT_5e131c62-0c7d-4d02-9d9f-8e2d5f40ae53.csv   --date modified as of 03/10/2016 6:15 AM

To answer your question: yes the order they are loaded does affects the data so the oldest csv needs to be loaded first. We are loading different files in every run. yes the records appears in multiple file which is named as above with the help of the guid in the file name. Due to some reasons we are not able to name the files as Account_1.csv, Account_2.csv ...

Thanks.

motard2dijon · December 2018

Hi Dojo Community,

I have a follow up question on this topic. When you upload different csv files with the method described in this post, is there an option to add an additional column where the specific name of each csv file can be written down and recorded?

I would like to be able to distinguish between the different csv files because it is important for my analysis to know from which csv file each row comes from.

The other option is to upload them separately and add that column within the Workbench transformation tool, but given that all csv files have the same structure I would like to use this trick which simplified a lot the process.

Thanks in advance!

Appending several datasets in Workbench

Best Answers

Answers

Hi,

Let me explain further.

We are generating delta loads as a CSV files so currently the name of the files are loaded in the following order by default:

Since its a delta load we would like the files to be loaded by file date modified order. so i am expecting this to be loaded in the following order:

Thanks.

Categories