Run of Dataflow Times Out

I've got a dataflow that times out whenever I try to run it. The error message says "Cache file timed out", and it seems to happen somewhere around 10-12M rows. I'm filtering one of the tables to reduce the number of records to process, but the timeout error occurs before it gets to the filter step. Any suggestions?

Comments

  • quinnj (Contributor)

    Hey @SLamba,


    A few things to think about, assuming you are using a MySQL DataFlow:


    - Using indexes is critical to getting good performance with MySQL DataFlows. I make it a best practice to write out all the steps/transforms I'll need in my DataFlow, then go back and add an index for each column I'm joining on, using in a WHERE clause, or grouping by (see the sketch after this list). On datasets larger than about 5 million rows, this is essential.


    - It may be that the DataFlow is timing out while loading the data into MySQL, before any of your transforms (including the filter) ever run. In that case, I would reach out to Domo Support (or your Domo implementation consultant) to see whether the DataFlow load timeout can be increased.


    - Depending on the specific transforms you're applying to the data, you may also consider using a Magic ETL flow instead. It uses a different processing model that can sometimes handle larger data manipulations better than a SQL-based DataFlow.


    - If nothing else seems to be helping, reach out to Domo Support about getting access to our Redshift DataFlow engine. In certain cases, this can be turned on to provide additional processing power for complex ETL solutions.
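
    To illustrate the indexing point, here is a minimal sketch of what an early transform step might look like in a MySQL DataFlow. The table and column names are placeholders; index whichever columns your later transforms actually join, filter, or group on:

        -- Hypothetical example: add indexes as their own transform step,
        -- before the heavy transforms run. Replace "orders", "customer_id",
        -- and "order_date" with your own table and the columns used in
        -- joins, WHERE clauses, and GROUP BYs.
        ALTER TABLE orders
          ADD INDEX idx_orders_customer_id (customer_id),
          ADD INDEX idx_orders_order_date (order_date);

    With the indexes in place, the subsequent join/filter/group transforms can use them instead of scanning the full table, which is usually where the time goes on multi-million-row inputs.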