How to use Rank function to fill in the data

Options
jp2631
jp2631 Member
edited August 2023 in Magic ETL

There are two tables, the Existing Pay/Bill Table and the EditHistoryPlacement Table, that I wish to join together using the PlacementID column in order to generate the data in the EXPECT Existing Pay/Bill Table. I'm considering the use of the RANK function (with dateAdded… as a parameter) and the PARTITION clause (with PlacementID) on the EditHistoryPlacement Table. Once Rank and Partition are established, my plan is to use a CASE statement. This statement will compare the rank and dateAdded to the PeriodEndDate, allowing me to populate the PONumber… column accordingly.

Rules populate the PONumber column: (Let's use PlacementID 1234 as an example)

  1. If Rank equals 1 and based on the dateAdded.. column, look for dates in the PeriodEndDate column that are less than the date in dateAdded… Then, fill the PONumber_atPeriodEndDate column with the value from the oldValue column.
  2. If Rank equals 2, apply the same rule as in Rule 1 and fill the PONumber_atPeriodEndDate column with the value from the newValue column for the PeriodEndDate column that is greater than the date in dateAdded
  3. If dateAdded equal to PONumber_atPeriodEndDate then populate the PONumber_atPeriodEndDate column with the value from the newValue column.

I hope the data below and the above explanation are helpful. Honestly, I don't know how this is going to work. I've been playing around with the ETL, but ranking the data in the EditHistoryPlacement Table is the furthest I've gotten. I would appreciate it if anyone could help.

Best Answer

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓
    Options

    While it sounds simple, this question does have additional complexity to it, specially since you're trying to contemplate that if an EditHistory entry with date prior to the PeriodEndDate does not exists you'll take the old value of the first EditHistory entry after the PeriodEndDate, this might be achievable via ETL using the and a mix of FIRST_VALUE and LAST_VALUE, but since you still want to consider if the actual future old value is null, you will also need to do a count on how many entries you have before and the PeriodEndDate, this so that you'll know which to use, so your ETL might end up looking something like:


    This is still not perfect as if at some point something gets changed to NULL/BLANK using the NON_NULL values won't work, but in general if there are no PriorHistory Entries, you would use the ClosesOldValue (future looking) and when there is a historical entry use the LatestNewValue, again with the issues surrounding null values. Just an idea.

Answers

  • trafalger
    Options

    Maybe I'm not understanding your question, but not sure if you need to use rank here, in MySQL I think you would just join on PlacementID and something like "existing date < = DateAdded". To do this in magic you could use a common key (like placementID), join each row to all rows and then filter them out based on dates, but not 100% certain I'm following your use case.

    Something like this:

  • jp2631
    Options

    @trafalger Maybe my explanation will confuse a lot of people, but I expect the end result to look like the data in the table EXPECT Existing Pay/Bill Table

  • trafalger
    Options

    So yup, I'd still recommend the approach in my first comment!

  • jp2631
    Options

    @trafalger I tried your suggestion, which works fine for one PlacementID and one dateAdded.. in the EditHistoryPlacement Table. However, when there are multiple dateAdded.. entries for one PlacementID in the EditHistoryPlacement Table, I end up getting multiple rows returned. This is why I thought I should use the ROW RANK.

    Let's take PlacementID 1234 as an example. If I use the following CASE statement:

    CASE 
    WHEN `periodEndDate` < `dateAdded` THEN `oldValue`
    WHEN `periodEndDate` > `dateAdded` THEN `newValue`
    END

    I get the following data below. As you can see, the entry for 9-Jul has two rows for PONumber_atPeriodEndDate (123, 456). My goal is to achieve the final result resembling the data in the yellow table above.

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓
    Options

    While it sounds simple, this question does have additional complexity to it, specially since you're trying to contemplate that if an EditHistory entry with date prior to the PeriodEndDate does not exists you'll take the old value of the first EditHistory entry after the PeriodEndDate, this might be achievable via ETL using the and a mix of FIRST_VALUE and LAST_VALUE, but since you still want to consider if the actual future old value is null, you will also need to do a count on how many entries you have before and the PeriodEndDate, this so that you'll know which to use, so your ETL might end up looking something like:


    This is still not perfect as if at some point something gets changed to NULL/BLANK using the NON_NULL values won't work, but in general if there are no PriorHistory Entries, you would use the ClosesOldValue (future looking) and when there is a historical entry use the LatestNewValue, again with the issues surrounding null values. Just an idea.