Is it possible to use while/do loops in MySQL in Domo Dataflows?

user013818
user013818 Member
edited March 2023 in SQL DataFlows

I have data that has varying rows of data for each identifier, from one row to 26. For each identifier, I have four different columns that change with the various rows. For each new identifier, I want to loop through the rows that follow that have the same identifier and build the row out with the unique values in these four columns. I cannot get a while/do loop to run either as part of a select statement for a table query or as an SQL transform query. For each iteration, I would like to have unique column names (dynamically) for the unique sets of values.

Answers

  • @user013818 ,

    DO/WHILE loops are EXTREMELY inefficient in SQL (and therefore at best we'll call it an Anti-Pattern).

    Window functions will approximate DO/WHILE behavior.

    Can you be more specific about what you're trying to accomplish?

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • I have an identification number for a proposal, Proposal ID, and for that proposal there are various possibilities. These appear in separate records. So, for ID number 1 there are four different columns that can have different values that are either text or numeric. Here is an example:

    ID Col1 Col2 Col3 Col4

    1 A B C D

    1 E F G H

    1 J K L M

    These would need to be combined into a single record with new columns names:

    ID Col1-1 Col2-1 Col3-1 Col4-1 Col1-2 Col2-2 Col3-2 Col4-2 Col1-3 Col2-3 Col3-3 Col4-3

    1 A B C D E F G H J K L M

    There may be only one set of four variables. There may be a dozen. You don't know, but you need to get all the possibilities on a single record for that particular identification number. Then you go to the next ID number.

    I wanted to, basically, upon reading an ID number, loop through the following records with the same ID number to build out the record for that ID number, then do the the same for the next new ID number, and so on through the file. I have seen the number of combinations vary from 1 to 37, so far.

    Hope that helps.

  • You might look into Dynamic Pivot and see if that would work for your needs.


    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • @user013818

    i'm still not sure i understand your use case...

    but it sounds like, you need to generate a dataset that for each id, you

    identify all the possible values for each of the four variables and then one-hot-encode them into a matrix, like the example below, only except just having one column (color) you can have many columns.


    Magic 2.0 can dynamic pivot, (collapse what you see on the right into what you see on the left, but it cannot go the other way around and dynamically create columns for you.


    on most days of the week, the schema on the left will be easier to build viz against, so I'm not entirely sure why you're trying to go to the one-hot-encoded model, unless you're actually trying to create a term document frequency matrix, in which case python and R are better suited (and capable tools for the job) you can use the PyDomo SDK to pull the data into python to perform the transform and then push it back into Domo or inquire about the Domo Jupyter Notebook Integration or Data Science ETL tiles.


    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • So, let's say that, rather than a color column, we two variable columns that are free form and can contain anything text. Let's also add two numeric columns that can contain any number. Our friend ID 1 has (today) four records that are comma delimited, in the form ID, var1, var2, num1, num2, as follows:

    1,house,Polly,300,0

    1,California,blue,575,100

    1,"Number 1","Race driver",20,000,5000

    1,turtle,speed,0,3000

    Once we read that we're dealing with ID 1 we need to read all the rows with that ID and build the data into a single row:

    1,house,Polly,300,0,California,blue,575,100,"Number 1","Race driver",20,000,5000,turtle,speed,0,3000

    and we need to name the columns as ID and col1 through however many there are, in this case col16.

    My idea was to

    while ID = 1 do

    read var1, var2, num1, num2

    end while

  • I think this does what you want with an R scripting tile:

    library('tidyr')
    library('dplyr')
    # Import the domomagic library into the script.
    library('domomagic')
    
    # read data from inputs into a data frame
    input1 <- read.dataframe('domo_do_while')
    
    # write your script here
    input1 %>%
        tidyr::gather(
            key = 'name',
            value = 'value',
            -'id'
        ) %>%
        filter(!is.na(value)) %>%
        distinct(id, value) %>%
        arrange(id, value) %>%
        group_by(id) %>%
        mutate(
            name = paste0('col', row_number())
        ) %>%
        tidyr::spread(
            key = name,
            value = value
        )
    
    # write a data frame so it's available to the next action
    write.dataframe(input1)
    


  • Here is something to try in Magic ETL. If you have Magic ETL 2.0 it will run really fast for you. Here is a screenshot of generally what it would look like and then I will explain it below.

    Look at your data and determine what is the most columns that a row has, then add that many filter tiles minus one. In my example, I am assuming my largest row has 10 columns, so I will add 9 filter tiles. In each filter tile, you filter to get rows with that many columns. For example, for "Row with 10 values", I am going to use a filter rule where column10 is not null. In the "Row with 9 values" filter tile, I am going to say where column10 is null and column9 is not null. I keep doing this all the way down to 2 columns.

    I then use a Select Columns and Remove Duplicates tile to get a distinct list of IDs. I then start left join the ID list to each of the filter tiles. This will give me one row for each ID and their columns continually added to the dataset.

    This should get you what you want. Hope this helps.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • Here is an idea for you to try using Magic ETL, and if you are using 2.0, it will run very fast. Here's a sample visual of what I am proposing for you and will explain it further below.

    Look at your data and determine what the maximum number of columns are. In my example, I am assuming the largest row has 10 columns, so I am going to add 9 filter tiles. Each tile will have a filter rule that gets the data for rows with that many columns. For instance, for "Filter Rows with 10 values" I am going to use a filter rule that says where column10 is not null. For "Filter Rows with 9 values", I am going to create a filter rule that says where column10 is null and column9 is not null. I keep on doing this all the way down.

    I use the select columns tile and the remove duplicates tile to get a distinct list of IDs that are in the dataset. I then start left joining the IDs list with each filter tile. This will result in 1 row per ID and all of the columns continually being added to it after each join.

    Hope this helps.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • bdavis
    bdavis Contributor

    Correct me if I'm wrong, but if your rows per ID can technically be an infinite amount, then the only way to do this would be using something like R or Python and the resulting set would need to be constantly overwritten because your schema would change all the time. For example, today the max rows per ID is 12, tomorrow 5, the next day 33, etc. It would also mean that your visualizations potentially don't show all the data all the time due to new columns not being a part of the viz.

    If this is a one-time thing that doesn't need to update on a regular basis without human interaction it would be ok, but to update automatically and feed a card - I'm not sure there's a good solution. If there's a maximum number of rows per ID you can work with that number and handle the missing data appropriately.