Redshift vs. MySQL vs. ETL

Question

Hi Everyone,

This Knowledge Base Article suggests guidelines for when to use certain In-Domo processing options, but admittedly I default to using Redshift for all of my transforms regardless of dataset input size. Am I missing out on faster data flow processing times by not following the guidelines in the article?

I would be particularly interested to know whether any of the options load input DataSets or write output DataSets faster or slower than the others.

Darius · Accepted Answer

DDalt,

Thank you for reaching out with your question. Redshift is great for many use cases, especially those that require SQL transformations on larger data. The downside is that the Redshift service and the resources allotted to processes are managed by Amazon, which removes some of the environment controls that Domo otherwise has for MySQL and Magic ETL.

MySQL will generally show less variance in DataFlow run times, but it does not automatically index data as Redshift does, so it is better suited to smaller input row counts. MySQL also lacks some Redshift functionality, such as window functions. To get the best performance out of MySQL DataFlows, you should add indexes on the columns used in joins and consider the other optimizations discussed here:

http://knowledge.domo.com?cid=optimizingdataflow
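As a rough illustration of the "index your join columns" advice, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; it is not Domo's MySQL environment, and the table and column names (customers, orders, customer_id) are made up for the example. In MySQL the analogous statement would typically be CREATE INDEX or ALTER TABLE ... ADD INDEX on the join column; check the article above for how Domo expects it to be written in a DataFlow.

```python
import sqlite3

# Hypothetical tables standing in for two DataFlow inputs.
def build_demo(conn):
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer_{i}") for i in range(100)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 100, float(i)) for i in range(1000)],
    )

# The join we want to speed up: orders matched to customers on customer_id.
JOIN_SQL = """
SELECT c.name, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
"""

def join_totals(conn):
    return conn.execute(JOIN_SQL).fetchall()

def index_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
build_demo(conn)

before = join_totals(conn)  # join without any explicit index

# Index the join column. An index changes how the engine finds matching
# rows, not which rows it returns.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

after = join_totals(conn)   # same join, now with an index available
```

The query results are identical before and after; only the lookup strategy changes. On small inputs the difference is negligible, which is why this tends to matter more as row counts grow.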

Magic ETL is well suited to larger input DataSets, and could be considered as an alternative to Redshift for many use cases. It will begin to process data through the transformations as the input data comes in, rather than waiting for all of the input data to load completely, as Redshift does.
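To make the difference in load behavior concrete, here is a toy sketch in Python. It is only a model of the two behaviors described above, not how Magic ETL or Redshift are actually implemented, and all function names are invented for the illustration. The "batch" version waits for every input row before transforming anything; the "streaming" version transforms each row as soon as it arrives.

```python
def load_rows(n, log):
    """Simulate input rows arriving one at a time."""
    for i in range(n):
        log.append(f"load {i}")
        yield {"id": i, "value": i * 10}

def transform(row, log):
    """A stand-in transformation step."""
    log.append(f"transform {row['id']}")
    return {**row, "value_doubled": row["value"] * 2}

def run_batch(n):
    """Load everything first, then transform (the wait-for-full-load model)."""
    log = []
    rows = list(load_rows(n, log))  # blocks until all rows are loaded
    out = [transform(r, log) for r in rows]
    return out, log

def run_streaming(n):
    """Transform each row as it arrives (the process-as-it-loads model)."""
    log = []
    out = [transform(r, log) for r in load_rows(n, log)]
    return out, log

batch_out, batch_log = run_batch(3)
stream_out, stream_log = run_streaming(3)
```

Both versions produce the same output rows. The logs differ: the batch log shows all loads followed by all transforms, while the streaming log interleaves them, so work on early rows overlaps with the loading of later ones.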

To summarize: Redshift is good for larger data, MySQL should be used for smaller data inputs, and Magic ETL is good for both small and large data inputs. Each use case will determine which tool is the best fit, but they all have their place in your toolbox for data manipulation and additional data preparation.

Regards,