Importing a huge amount of data from an Oracle DB through Workbench 5

user06848 Member
edited March 2023 in Datasets

Hi All,

 

I'm new to Domo administration. I have configured a dataset connecting to an Oracle DB from Workbench 5, and I have the following questions about data-loading issues.

 

1. I have two tables in my Oracle DB (Tables A and B), both with a huge amount of data. Table A has 500 million rows and 20 columns, and Table B has 400 million rows and 10 columns. I would like to know the best practice for importing this data into Domo.

 

2. I tried to import these two tables (A and B) into Domo by increasing the 'Query Execution timeout' limit in Workbench, but the job keeps failing, possibly due to the huge amount of data.

 

Please let me know the best practice for loading this data into Domo using Workbench; I also need to do a daily upsert of the past 30 days into the dataset.

OR

Should I load these two tables into a data warehouse and connect Domo to the warehouse?

 

Is loading this much data into Domo feasible?

 

Please let me know if you have any questions,

 

Thanks in advance!!

Comments

  • cwolman
    cwolman Contributor

    I work around this issue by selecting my data in chunks and running the job multiple times until all of the data has been uploaded.  You will need to chunk either by dates or by a unique record identifier, using replacement variables.  The number of columns and the size of the data in each row determine how many records you can upload at a time.  In my last scenario I was able to reliably upload 2M rows at a time, and each row had a unique numeric identifier named cluster.  Basically my SQL was:

     

    select *

    from table

    where cluster between '!{lastvalue:cluster}!' and '!{lastvalue:cluster}!'+2000000

    order by cluster
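
    Workbench increments the lastvalue variable between runs for you, but the loop logic behind this chunking pattern is roughly the following (a minimal Python sketch using an in-memory SQLite table as a stand-in for the Oracle table; the table name, column names, and chunk size are illustrative, and a half-open range is used so the boundary row is not uploaded twice):

```python
import sqlite3

# Stand-in for the Oracle table: a numeric "cluster" column that
# increments for each record (values here are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (cluster INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 5001)])

CHUNK = 2000    # rows per run (the post above used 2,000,000)
last_value = 0  # starting value entered in the Replacement Variables tab

chunks = []
while True:
    # Half-open range (>, <=) so the boundary row is not re-read;
    # note that BETWEEN in the query above is inclusive on both ends.
    rows = conn.execute(
        "SELECT * FROM sales "
        "WHERE cluster > ? AND cluster <= ? "
        "ORDER BY cluster",
        (last_value, last_value + CHUNK),
    ).fetchall()
    if not rows:
        break
    chunks.append(rows)
    # Workbench records the last value seen as the new lastvalue,
    # which is why the ORDER BY clause matters.
    last_value = rows[-1][0]

print(len(chunks))                  # number of runs needed
print(sum(len(c) for c in chunks))  # total rows uploaded
```

    Each pass of the loop corresponds to one execution of the Workbench job; in Workbench itself you re-run the job until a run returns no rows.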

     

     


    -----------------
    Chris
  •  

    Thanks for the reply, and apologies for the delay in getting back to you.

     

    I tried to apply the logic you mentioned (select * from table where cluster between '{lastvalue:cluster}!' and '!{lastvalue:cluster}!'+2000000 order by cluster)

    to do the initial data load to the dataset, but I see a "Replacement parameter does not contain a value for column" error when I execute the job in Workbench. Could you please let me know if I'm missing any steps? A step-by-step procedure would help.

     

    Thanks,

    Hari 

  • cwolman
    cwolman Contributor

    Hi Hari,

     

    Please refer to this link for information on using the lastvalue replacement variable.  Section 19 has the specifics.

     

    cluster is the name of a column in my table that contains a unique value for each record.  The Replacement Variables tab is where you put your field name and the starting value.  Since my column is a number that increments for each record, I set the cluster variable to 0 to select the first 2,000,000 rows.  After the job executes, the cluster variable is incremented to 2,000,000, so the next run retrieves the next 2,000,000 records by selecting all records where cluster is between 2,000,000 and 4,000,000.

     

    If you do not have a numeric column, you can chunk the data by date instead and execute the query selecting all records in a specified date range.  The other important part of using this variable is to include the ORDER BY clause so that the lastvalue variable gets set correctly.
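
    For the date-based approach the query shape is the same (a sketch only; my_table and order_date are illustrative names, the 30-day window is arbitrary, and the exact date format Workbench substitutes may differ in your environment):

```sql
SELECT *
FROM my_table
WHERE order_date >  TO_DATE('!{lastvalue:order_date}!', 'YYYY-MM-DD')
  AND order_date <= TO_DATE('!{lastvalue:order_date}!', 'YYYY-MM-DD') + 30  -- 30-day window
ORDER BY order_date
```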


    -----------------
    Chris