Row Numbers from original data

OllieSouthgateAKA · 2021-09-02T08:37:36+00:00

Hi there! Huge fan of the forums, and hoping you can help me with something that previous posts and many late-night Google searches haven't been able to answer...

We have an email connector pulling in many CSV files from attachments and appending them one after the other in the order received. The order of the data within each CSV is very important as it's not "raw data" as such - the first few lines of each CSV contain important information about everything in the lines underneath them. We successfully solved for this using SQL but for that Dataflow to work (the next step after an initial MagicETL) it's very important that the order is retained so I am trying to make that stick by giving myself a RowNumber field to order on.

In testing with a few thousand rows, this was easily achieved by first adding a Constant column (with a fixed value of 1) and then using Rank & Window to calculate row numbers based on that - for whatever reason, even though they were all "1", MagicETL very happily ranked them 1 through the maximum row number, in the order that the data had arrived in. This made it super easy to always have a reference point to give both MagicETL and a subsequent Dataflow to order from and apply many different order-dependent transforms to in order to pull the "headers" in the top of the CSV down against every row, until it found the next one, and so on.

However, I'm now trying to do the exact same thing on the complete dataset of about 1.3M rows, and something about the increased size means it's now picking its own order to rank the constant on, which is making nonsense of the rest of the steps in the MagicETL flow. It loads the data from the email connector in the right order, and when I add the constant "1" column it remains in that order, but once I try to give that constant column a ranking it starts moving arbitrary rows around into a different order. Is this something to do with the number of rows? As I mentioned, when it was just a few thousand rows it ranked them in the same order they arrived in.

Unfortunately the raw data as it comes in the CSV does not have any columns I can base a logical order on (and it comes from a third party, so I'm not going to be able to fix that), but since DOMO clearly knows somehow to retain that order for all the steps up to this point, I'm really confused as to why it suddenly forgets that at the Rank step! Any tips? Or any alternative ways, using either type of flow or in configuring the original connector, that I can add row numbers based on the original order the data was collected and appended in?

Thanks in advance - some screenshots below :)

Ollie

Raw data from email connector:

After adding the constant, the above looks exactly the same.

Then this Rank & Window function is applied:

And after that, although the RowNum column appears in the correct order (1 at the top, +1 for each row that follows), it has moved all the rows into an order I can't make sense of:

GrantSmith

Hi @OllieSouthgateAKA

My assumption here is that when Domo is processing your data the underlying architecture / platform they're using to process the data doesn't respect row ordering when performing a rank operation. This is typically the case with Big Data solutions because it's faster to read your large data set in chunks simultaneously instead of reading the dataset one row after another. If you're unable to get a sorting method implemented within the CSV file records you're likely out of luck using a Magic ETL dataflow.
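To illustrate the point about chunked reads, here is a toy Python model. It is not Domo's engine, and the function name `rank_constant_in_chunks` and the `completion_order` parameter are made up for this sketch. It only shows that when every row has the same rank key, the row numbers follow whatever order the chunks happen to be merged in.

```python
from typing import List, Sequence, Tuple


def rank_constant_in_chunks(
    rows: Sequence[str], chunk_size: int, completion_order: Sequence[int]
) -> List[Tuple[int, str]]:
    """Number rows in the order their chunks are merged, not their source order.

    Every row has the same rank key (the constant 1), so nothing in the data
    itself can break ties; the merge order decides the numbering.
    """
    chunks = [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]
    merged = [row for idx in completion_order for row in chunks[idx]]
    return list(enumerate(merged, start=1))


rows = ["header A", "a1", "a2", "header B", "b1", "b2"]

# Chunks merged in source order: numbering matches the file.
in_order = rank_constant_in_chunks(rows, chunk_size=3, completion_order=[0, 1])

# Second chunk finishes first: RowNum still runs 1..6, but the rows have moved.
out_of_order = rank_constant_in_chunks(rows, chunk_size=3, completion_order=[1, 0])
```

With a few thousand rows there may effectively be a single chunk, which would explain why the small test kept its order.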

Since you mentioned you're already using a SQL dataflow have you thought about attempting to calculate the row number for the CSV when you're processing it in that SQL dataflow?

jaeW_at_Onyx

if all you want to do is apply header row values (in col 1) across all rows of your dataset, then

1) split your data into header rows and transaction rows (FILTER looking for NULLs)

2) spread the header values (organized in rows) into columns using PIVOT

3) add the constant 1 to both header_set and transaction_set.

4) cross apply using inner join on 1 = 1

I have an example here

https://www.youtube.com/watch?v=cOiT3FjQ7K8

but it's obviously not 100% the same implementation you need.
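The four steps can be sketched in plain Python to show the shape of the data at each stage. This is only an illustration under assumptions: the column names (`col1`, `col2`, `amount`, `one`) and the sample values are hypothetical, header rows are assumed to carry a name in `col1` and a value in `col2`, and only one set of headers is handled.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical input: header rows have no amount, transaction rows do.
rows: List[Row] = [
    {"col1": "Account", "col2": "ACME", "amount": None},
    {"col1": "Period", "col2": "2021-08", "amount": None},
    {"col1": "widget", "col2": "sale", "amount": 10.0},
    {"col1": "gadget", "col2": "sale", "amount": 4.5},
]

# 1) Split into header rows and transaction rows (filter on NULL amount).
header_rows = [r for r in rows if r["amount"] is None]
transaction_rows = [r for r in rows if r["amount"] is not None]

# 2) Pivot the header rows into a single row with one column per header name.
pivoted: Row = {r["col1"]: r["col2"] for r in header_rows}

# 3) Add the constant 1 to both sets.
header_set = [{**pivoted, "one": 1}]
transaction_set = [{**t, "one": 1} for t in transaction_rows]

# 4) Cross apply: inner join on the constant, so every transaction row
#    picks up every header column.
joined = [
    {**t, **h}
    for t in transaction_set
    for h in header_set
    if t["one"] == h["one"]
]
```

Nothing here depends on the order in which `rows` is read, since header rows are identified by their NULL value column rather than by position.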

OllieSouthgateAKA

Hi @GrantSmith - the SQL dataflow I referred to actually currently happens AFTER the ETL. But I am not opposed to adding one in between the raw data and the ETL if that could fix it! Any tips on what I would do in the SQL Dataflow part in that case? Googling around only returned functions that are only available in later versions of MySQL...

Thanks @jaeW_at_Onyx . What's weird is this is exactly what I already have implemented (just on multiple layers of headers), and in a preview in MagicETL2 it's still in the right order right up to previewing the final output tile, even up to a 400K row preview, but then when I actually run it on the complete dataset the ordering all goes awry again :( but thanks for the tips!

jaeW_at_Onyx

sure. but keep in mind Magic 2.0 is a distributed ETL engine; if it can, it will chunk your job into smaller parts and distribute it. so b/c it's being distributed, row sort order can't necessarily be guaranteed.

the solution i've described doesn't require the rows to return in a specific order. for your header rows you're filtering on rows that you've identified as 'header rows' b/c the value columns are NULL.

Omaurya

Hi Ollie,

I am wondering whether you tried ranking the data immediately after the CSV connector, before appending it to the larger dataset? Since the data coming from the connector is small, it might be easier to get the ranking done there, and it is likely to work correctly.

Thanks! Om
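As a rough sketch of that idea outside Domo, the Python below tags each row with a file sequence and a row-within-file number while the files are still small and separate. The function name `number_rows_per_file` and the sample CSV text are hypothetical, and this assumes you can process each attachment before it is appended.

```python
import csv
import io
from typing import List, Tuple

TaggedRow = Tuple[int, int, List[str]]  # (file_seq, row_in_file, fields)


def number_rows_per_file(files: List[str]) -> List[TaggedRow]:
    """Tag every row with the order its file arrived in and its position in that file."""
    tagged: List[TaggedRow] = []
    for file_seq, text in enumerate(files, start=1):
        reader = csv.reader(io.StringIO(text))
        for row_in_file, fields in enumerate(reader, start=1):
            tagged.append((file_seq, row_in_file, fields))
    return tagged


# Two hypothetical attachments, in the order they were received.
files = ["Account,ACME\nwidget,10\n", "Account,Globex\ngadget,4\n"]
tagged = number_rows_per_file(files)

# Even if a later step shuffles the rows, sorting on the two keys restores the order.
shuffled = list(reversed(tagged))
restored = sorted(shuffled, key=lambda r: (r[0], r[1]))
```

Once the two keys travel with each row, later steps can sort on them explicitly instead of relying on the engine to preserve arrival order.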
