Add group by as an additional field

Hi everyone,

I am looking to add a column to a dataset where I join all my users and their orders (could be 1 or more) and get a count of how many orders an individual user has placed.  I pulled all the columns I want in my ETL dataflow, and then calculated this field using the "Group By" transformation.  I group by 'user_id' and aggregate my order_id's as a count. 

This gets me the right data, but then the only columns I am left with are the 2 from the Group By transformation, with all the others dropped (i.e., name, email, city, order info, etc.).  Is there a way to get the counts of each user's total orders and add this as an additional column where I have 1 row per distinct user_id?  I tried both an ETL dataflow and a SQL transform but nothing worked.  I can get the exact data I want going directly to the SQL database, but would like to do this directly in Domo and get a card created/updated automatically.

 

Thanks for the help!!

Best Answer

  • user09776
    user09776 Member
    Answer ✓

    It sort of worked, but still wasn't exactly what I was trying to accomplish. I solved this by using a dense_rank function.

Answers

  • This should be doable.  Would you mind sharing what your ETL looks like, and where you are running into issues?


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • I am joining the data based on the user_id's in both tables.  Then I group by the user_id from the Users table and count the order_id's from the Orders table.  This drops all my other columns and leaves me with just the two columns that get pulled into the Group By transform.  I want the Count column to be added as an additional field while maintaining all my other data.

     

    Screen Shot 2018-09-14 at 3.12.34 PM.png
    Screen Shot 2018-09-14 at 3.13.45 PM.pngScreen Shot 2018-09-14 at 3.15.31 PM.png

  • Just add the extra columns in step 1 of the group by tile.

     

    Keep in mind that if any of the columns you add to that step contain different values, you should move it down to the second step and create an aggregate calculation 


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • That defeats the purpose of grouping by the user_id.  That is still outputting multiple rows for an individual user_id.  If a user has 3 orders it displays 3 rows.  I want to consolidate this into 1 row.

  • Any column that you list in step 1 will cause a new row of data in your output for every unique combination.  For example, if you have several different product families that you sell to customers, you can use something like this to get total sales grouped by account and product family:1.png

     

     

     

    Or, if you remove "Product Family" from the first step, you can see total sales grouped only be customer:2.png

     

     

     


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • Is there a way to add just a single Count aggregate column while also maintaining all the other data for a single user?  Some of the data might be different for various fields (such as order_date which is different for each order), but I want just 1 column for a user with a given name, email, etc. and all the other info relating to that row.

     

    I know it can be done easily with SQL, but looking to move away from that and make this reusable with a Domo card.

  • Yes, just don't inlcude order date in the list of fields.  Otherwise you will want to add order date to the second section and create a "group by" field that would show you the Max of the order date as the most recent order.

     

     


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • 1.png


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • I tried that and it still outputs multiple rows for each order.  I grouped by order_id and the order_date (taking the max(order_date) as most recent), but it still outputs 2 rows for a user with 2 orders.

  • If you group by Order ID then you will get a separate row for each order ID.  You want to group by Customer.  You can't have Order ID in your output if you are wanting to combine data from multiple orders.


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • Basically, if you are wanting to group by customer, then you can list all Customer relevant fields (the fields that will always have the same value for the same customer) in step 1 of the group by tile.  All Order information will need to be aggregated and listed in section 2 of the group by tile.


    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • @user09766

     

    Were the latest replies from @ST_-Superman-_ helpful?

     

    Thanks!

  • user09776
    user09776 Member
    Answer ✓

    It sort of worked, but still wasn't exactly what I was trying to accomplish. I solved this by using a dense_rank function.

  • Can you share your solution so others may benefit from this thread?

    Thanks!

  • To clarify can you share the dense rank function in a reply?