Asynchronously run custom workbench plugin

Hi,

 

I am currently working on a custom plugin for Workbench 4.5. My goal is to query multiple data sources within a single DataReader. This is necessary because my organization's data is sharded across multiple databases, but the tables in each database share the same schema. I have successfully created a DataReader and DataProvider to handle this, installed the plugin on our Workbench instance, and loaded the data from all of the databases into Domo.

 

The issue I am currently facing is the time it takes to run the new plugin. Importing 950k records takes 20 minutes with the custom plugin, versus about 2 minutes total when I run the standard ODBC DataReader against each of the sharded databases. I believe the ODBC reader is faster because it uses multiple threads, and I would like to do something similar in my custom plugin.
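To illustrate the kind of parallel fan-out I am after, here is a rough sketch in plain Python (the connection strings and query are placeholders; my actual implementation has to live inside a .NET Workbench plugin, so this is just the concept):

```python
# Illustration only: fan the same query out to every shard in parallel and
# combine the results. Connection strings and the query are placeholders.
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import pyodbc

SHARD_CONNECTION_STRINGS = [
    "DSN=shard1;UID=user;PWD=***",
    "DSN=shard2;UID=user;PWD=***",
    "DSN=shard3;UID=user;PWD=***",
]
QUERY = "SELECT * FROM sales"  # placeholder -- same schema on every shard

def read_shard(conn_str):
    """Run the query against one shard and return the rows as a DataFrame."""
    conn = pyodbc.connect(conn_str)
    try:
        return pd.read_sql(QUERY, conn)
    finally:
        conn.close()

# One worker per shard, so all of the shards are queried concurrently.
with ThreadPoolExecutor(max_workers=len(SHARD_CONNECTION_STRINGS)) as pool:
    frames = list(pool.map(read_shard, SHARD_CONNECTION_STRINGS))

combined = pd.concat(frames, ignore_index=True)
print(f"{len(combined)} rows from {len(frames)} shards")
```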

 

If anyone has any ideas on how to do this (for example, more information on how the "ExecutionCharacteristics" property on the DataReader is used, or which DLL contains the source for the ODBC DataReader), I would be extremely grateful.

 

Thanks!




Answers

  • KaLin
    KaLin Member

    Is anyone able to help with this topic?

  • Medinacus
    Medinacus Member
    Answer ✓

    You could avoid the Domo Workbench software altogether and use Python and dataframes to manipulate and join the data from the multiple database queries you mentioned. Dataframes are convenient because they allow SQL-like joining of data from within your script (rough sketch at the end of this post).

     

    Once you've joined the data using a pandas dataframe, you can programmatically push it directly to Domo from your script using Domo's DataSet API or Streams API.

     

    The Streams API is nice because it lets you script an asynchronous push of data to Domo and is much faster than Workbench.

     

    Hope this helps.
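Here is the join sketch mentioned above (the frame and column names are made up, purely to show the shape of it; substitute the results of your own database queries):

```python
# The "SQL-like join" idea in pandas, with made-up example data.
import pandas as pd

# Pretend these came back from two of your database queries.
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "amount": [120.0, 75.5, 210.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "region": ["West", "East", "South"],
})

# Equivalent to: SELECT ... FROM orders INNER JOIN customers USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="inner")
print(joined)
```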

  • I completely forgot about the Streams API. We have actually created an in-house .NET service for pulling from different data sources and pushing the data to other destinations such as Domo. I was worried about performance and about having to hold overly large chunks of data in memory before sending it to Domo, but it looks like the Streams API handles those concerns.


  • Yeah, you'll still have to split and gzip the query results, but once you have created the parts, the Streams API is very quick if you program it to push the parts asynchronously (rough sketch at the bottom of this post).

     

    For example, we've written a small Python command-line application for pushing data up to Domo using the Streams API.

     

    On one dataset, using the Streams API we sent data to Domo 57% faster than Domo Workbench could.

     

    What you are really optimizing with the Streams API is the "send" portion of the job execution on the workbench.

     

    For the send portion specifically, we sent data 95% faster than Domo Workbench did (an 8M-row, 130-column file in 3.5 minutes).
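Here is the sketch mentioned above: gzip the CSV parts, upload them concurrently, then commit the execution. The endpoint paths and JSON field names are from memory of the public Domo developer docs, so treat them as assumptions and double-check them before relying on this.

```python
# Rough sketch of an asynchronous Streams API push: gzip CSV parts, upload
# them concurrently, then commit the execution. Endpoints are assumptions
# based on the public Domo developer docs -- verify before using.
import gzip
from concurrent.futures import ThreadPoolExecutor

import requests

API = "https://api.domo.com"
CLIENT_ID = "..."      # your Domo client credentials (placeholders)
CLIENT_SECRET = "..."
STREAM_ID = 123        # the stream attached to your dataset (placeholder)

def get_token():
    """Exchange client credentials for an OAuth access token."""
    resp = requests.post(
        f"{API}/oauth/token",
        params={"grant_type": "client_credentials", "scope": "data"},
        auth=(CLIENT_ID, CLIENT_SECRET),
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def upload_part(token, execution_id, part_number, csv_text):
    """Gzip one CSV chunk and PUT it as a numbered part of the execution."""
    body = gzip.compress(csv_text.encode("utf-8"))
    resp = requests.put(
        f"{API}/v1/streams/{STREAM_ID}/executions/{execution_id}/part/{part_number}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "text/csv",
            "Content-Encoding": "gzip",
        },
        data=body,
    )
    resp.raise_for_status()

def push(csv_chunks):
    token = get_token()
    headers = {"Authorization": f"Bearer {token}"}

    # 1. Open a new execution on the stream.
    execution = requests.post(
        f"{API}/v1/streams/{STREAM_ID}/executions", headers=headers
    )
    execution.raise_for_status()
    execution_id = execution.json()["id"]

    # 2. Upload all parts concurrently -- this is where the speedup comes from.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            pool.submit(upload_part, token, execution_id, part_number, chunk)
            for part_number, chunk in enumerate(csv_chunks, start=1)
        ]
        for f in futures:
            f.result()  # re-raise any upload error

    # 3. Commit the execution so Domo assembles the parts into the dataset.
    requests.put(
        f"{API}/v1/streams/{STREAM_ID}/executions/{execution_id}/commit",
        headers=headers,
    ).raise_for_status()
```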

This discussion has been closed.