Hi there! Huge fan of the forums and hoping you can help me with something that previous posts and many late night Google searches have eluded me on...
We have an email connector pulling in many CSV files from attachments and appending them one after the other in the order received. The order of the data within each CSV is very important as it's not "raw data" as such - the first few lines of each CSV contain important information about everything in the lines underneath them. We successfully solved for this using SQL but for that Dataflow to work (the next step after an initial MagicETL) it's very important that the order is retained so I am trying to make that stick by giving myself a RowNumber field to order on.
In testing with a few thousand rows, this was easily achieved by first adding a Constant column (with a fixed value of 1) and then using Rank & Window to calculate row numbers based on that - for whatever reason, even though they were all "1", MagicETL very happily ranked them 1 through the maximum row number, in the order that the data had arrived in. This made it super easy to always have a reference point to give both MagicETL and a subsequent Dataflow to order from and apply many different order-dependent transforms to in order to pull the "headers" in the top of the CSV down against every row, until it found the next one, and so on.
However I'm now trying to do the exact same thing on the complete dataset of about 1.3M rows and something about the increased size means it's now picking its own order to rank the constant on which is making nonsense of the rest of the steps in the MagicETL flow - it loads the data from the email connector in the right order, when I add the constant "1" column it remains in that order, but once I try and give that constant column a ranking it starts moving arbitrary rows around into a different order (is this something to do with the number of rows? As I mentioned when it was just a few thousand rows it understood how to rank them in the same order).
Unfortunately the raw data as it comes in the CSV does not have any columns I can base a logical order on (and comes from a third party so I'm not going to be able to be fix that), but since DOMO clearly knows somehow to retain that order for all the steps up to this point, I'm really confused as to why it suddenly forgets that at the Rank step! Any tips? Or any alternative ways using either type of flow or in configuring the original connector that I can add row numbers based on the original order the data was collected and appended in?
Thanks in advance - some screenshots below :)
Ollie
Raw data from email connector:
After adding the constant, the above looks exactly the same.
Then this Rank & Window function is applied:
And after that, although the RowNum column appears in the correct order (1 at the top, +1 for each row that follows), it has moved all the rows into an order I can't make sense of: