Sort Column in ETL?

Options

I am trying to import some old Data and merge it with data from a connector but the data is in different orders.  The legacy data went decending

 

1

2

3

4

 

but when I merge the data with the connector its adding to the top so I get

 

5

1

2

3

4

 

I was just wondering, is it possible to merge the data onto the bottom or sort the legacy data so that its all in order, or is the only option to actually sort it within the card?

 

Thanks

Best Answer

Answers

  • MarkSnodgrass
    Answer ✓
    Options

    Ordering the data in the table should not be necessary unless you just find it easier to look at it when you are in the data center and looking at the data of a dataset. Having said that, you could use the Rank & Window tile to order one dataset before the other if you really wanted it stacked in a certain way. Here is a link to the KB article on Rank & Window if you aren't familiar with it.

    https://knowledge.domo.com/Prepare/Magic_Transforms/ETL_DataFlows/03ETL_Actions%3A_Rank_and_Window

     

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • apurkhardt
    Options

    Sorting is important if you need to remove duplicates in a certain order.

  • DomoDork
    DomoDork Contributor
    Options

    I wholeheartedly agree with the idea of a seperate sort tile. Rank and Window is fine, but not every situation calls for a window function or the additional configuration/grouping options needed for Rank and Window to function the way you want or expect. And as Apurkhardt mentions, there are actual use cases that rank and window aren't good for.

    Not only that but when I'm doing previews in Magic, many times I just want a quick and easy way to visually sort the preview data so I can validate chunks of organized data without a bunch of extra configuration. It's easier and quicker on the brain.

    From personal experience… my company is brand new to Domo and all of us come from a SQL background, but we have many developers who are struggling to grasp aspects of MagicETL because it doesn't have basic things like sort that have been in every single database technology since before I was born and is in every modern other modern BI product :)

    Sort in and of itself may not be 'necessary' in DOMO in the traditional sense, but I'd guess most end users come from a SQL background where sort is a main staple and part of everyday life where as window functions are use case specific. Giving more options, especially something as simple as a sort tile does nothing but help developers and make adoptibility for new-comers a bit easier.

    @Simon_King please post this in the ideas exchange area and I will upvote it.

  • ST_-Superman-_
    Options

    you can specify the sort order in a MySQL ETL transform


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • DomoDork
    DomoDork Contributor
    edited April 2023
    Options

    Isn't using MySQL about 3x the cost from a pricing perspective compared to a MagicETL execution? Thats not something everyone wants to swallow.

    But @Simon_King - I just remembered that in the last Domopolooza event, they showed off a SQL Tile in MagicETL that is coming very very soon so theoretically that would work!

  • timehat
    timehat Domo Employee
    edited May 2023
    Options

    When sort order is important for cards or viewing the "Data" tab, I recommend using either the sort option within Analyzer or creating a View to add the sort. As mentioned, some DataFlow engines provide sort functionality, but as the engine powering your DataFlow is not the same engine used to power cards/data queries, that sort isn't necessarily "sticky" after the DataFlow.

    (The data is stored into Vault after a DataFlow completes and then indexed into Adrenaline, which may do fancy things like parallel loading and such for indexing speed or sharding for query performance. As the internal database structures there may not persist the row order in the raw intermediary used in Vault, a query-time sort is going to always be your safest bet.)

  • timehat
    timehat Domo Employee
    Options

    Also, in the context of MagicETL, sort will be an option in the upcoming SQL tile. Of note, however: MagicETL does as much streaming of rows from one tile to another as possible. This will remain the case with the SQL tile within MagicETL. It essentially will "compile down" to the underlying engine actions that we support. So a simple SQL query like:

    SELECT column AS new_name FROM input_tile

    Will continue to stream rows through (selecting to the one, renamed column) from the SQL tile's input to the next tile in the graph.

    If the query is modified to this:

    SELECT column AS new_name FROM input_tile ORDER BY column

    Then no rows are fed to any downstream ETL tiles until all rows have been read (needing to buffer and then sort the entire row set in order to pass rows downstream in the appropriate order). Grouping and some other actions require the same "all rows must be ingested before any downstream rows can be sent" behavior. It's not necessarily a problem (sometimes that's obviously necessary to generate the needed transform/report/etc), but if it's just a matter of preference, then leaving the sort out of the ETL (or any other DataFlow type, for that matter) is going to lead to better performance for your end-to-end data pipeline.

    Where the sort is important for visuals/reports, adding the sort at the top level (cards, a view, etc) is going to allow the optimized query engines powering those functions to do something they may be better-suited for (the sort in that case may be after filtering and aggregating so the sort operation runs on far fewer rows and takes far less time to complete, for example).