Datasets join issue

I'm trying to create a card that can be used to present to budget owners how much they have currently spent for the month compared to the budget they've been given. In order to do so, I have to create a dataflow that involves joining 2 datasets; One is Netsuite, which stores all the journal entry transactions, and the other is Adaptive, our forecasting software that stores the monthly budget for each dept.

The error I'm running into is when I'm joining the depts. I'm joining by the department name from both Netsuite, and Adaptive. It runs, but when creating a card, (I'm using table), the sum of amounts for both the budget for the month, and what has been spent are in the billions for all GL accounts. This leads me to believe that when joining the two datasets, bunch of duplicate rows are being created. I've tried all 4 types of joins and it's the same issue each time. When keeping these datasets separate, the cards I create are fine (Budget info shows correct values, and same with what's been spent for the month)

Answers

  • It sounds like your Netsuite data is transactional level, and your Adaptive data is monthly. You need to aggregate your Netsuite data to monthly prior to joining with your Adaptive data. You can do this with a group by tile prior to your joining tile. You will need to join on department and month/year, not just department. Whatever your date field is in your Adaptive data, you will need to create the same in your Netsuite dataflow. There are several ways to do this with a formula tile, but before suggesting too much, let me know if this makes sense and is in line with how your data is structured.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • GrantSmith
    GrantSmith Coach
    edited March 2022

    Make sure you’re filtering your data after your join so the month of the journal entry transaction is the same as the month of the budget forecast.

    You could do something like this in a conditional filter:

    LAST_DAY(`Adaptive Date`) = LAST_DAY(`Netsuite Date`)
    

    LAST_DAY is a function that returns the last date in the month (2022-03-31 for 2022-03-15 for example)

    It's a shorter way of doing:

    YEAR(`Adaptive Date`) = YEAR (`Netsuite Date`) AND MONTH(`Adaptive Date`) = MONTH(`Netsuite Date`)
    
    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • mhouston
    mhouston Contributor

    It sounds like you have different data granularity in your datasets and that is causing the the duplicate rows. It sounds like your budget data granularity is month/department and your actuals are date/dept/account(?). If you're just joining on department, you're probably getting a cartesian join of every month X every transaction.

    You could fix the join by either aggregating your datasets to the same level (i.e. group by on your actuals to get a monthly spend and then join on dept and month for the two datasets).

    I would actually do an append instead of a join - this will provide more flexibility, and you can use beast modes in your cards to report as needed but it gives you the flexibility to still be able to drill down to transactional level information if you need to. @jaeW_at_Onyx has a great youtube video that outlines the general methodology https://www.youtube.com/watch?v=PVbOeLSae9o

  • utkarsh
    utkarsh Member
    edited August 2022

    Hi @mhouston/@MarkSnodgrass and others on this thread.

    I am looking for a similar solution. I have columns as below

    Input 1: Dept, Date, Numerical Column1

    Input 2: Dept, Numerical Column2

    When creating a join I am getting a cartesian product. How can I avoid this?

    I want to do a left join as it has more records and have numerical column2 beside column1 as you would expect. Input2 mainly has certain mapping values that need to be associated with the rest of the data. Sorry cannot share the actual data here. Please let me know if this doesn't make sense or you need some more context.

    TIA for your help!

  • mhouston
    mhouston Contributor

    @utkarsh what are you expecting your output to look like? Do you have multiple rows per department in input 2? are numerical column 1 and column 2 related in any way?

    I'm assuming you are doing input 1 left join input 2 on department - is that your setup?

  • thanks for the shoutout @mhouston

    Strong agree to the UNION approach instead of JOIN.

    @utkarsh it looks like your dataset 2 needs some sort of date column. if each month everyone has the same budget, then duplicate the rows of Input 2 once for each month and add the end_of_month_date column as @GrantSmith describes before UNIONing the data.

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • @mhouston I do not have multiple rows per dept in input 2 but I have multiple rows per dept in input 1. Numerical column 1 and 2 are indirectly related. Basically Column2 is a ratio used to calculate a new value further in the ETL based on column 1 and column 2.

    Yes, I am doing input 1 left join input 2 on department.

    @jaeW_at_Onyx I am unable to add column directly in data unless there is a way to do it in domo.

  • mhouston
    mhouston Contributor

    @utkarsh when you say you are getting a cartesian join what does your output look like vs what are you expecting?

    based on what you've said, I'd expect your output has 1 row for every row in input 1.

    so if you have two rows for department A in input 1, and 1 row for department A in input 2, your output will have two rows for department A.



  • utkarsh
    utkarsh Member
    edited August 2022

    That is correct, but when I do groupby dept and sum for column1 before and after join are different which shouldn't be the case ideally right? I am definitely missing something here.

    Really appreciate your help here @mhouston

  • mhouston
    mhouston Contributor

    @utkarsh if you group input 1 by department and sum column 1, i would expect that grouping your output by department and summing column 1 would have the same number of rows and the same values from the grouping. Unless your department column in your output is coming from input 2 - then you would have nulls in that column. Are you able to post a snippet of your ETL logic so we can better understand what you're doing?

  • @mhouston happy to report that I was infact able to find the root cause of the problem.

    The excel input is input 2 in my case and the other ETL is input 1. I removed an additional text field that was creating duplicates and inflating the numbers. Sorry, I should have mentioned about this in above comments. But the first join is to create a same text fields in both data sets and next join is actually where the numbers add up as expected.

    Thanks a lot for your help!