How to solve this problem in Magic ETL - Link items to Header rows

Hi All,


Currently we have a dataset which have records in this format:



First row is the header record and next three rows are items of the this. but there is no link.


I have ranked the dataset and the used a Lag function to join the header and item. Then I have performed the self-join to get Header information to First Item.


How can I copy the same information (from header) to other items?


Is there any easier way to achieve this

Thanks in advance

Regards,

Sandeep

Best Answer

  • Billobi
    Billobi Contributor
    Answer ✓

    Hi Skhatri, that certainly is a limited ad platform dataset. You can join header data to the daily ad rows even though they're only associated by row placement, but it's heavy on the assumption that these sort of daily rows repeat in this same fashion throughout the input.

    Here is my start dataset and output dataset:



    Here are the transform steps:

    1 - set a constant of 1 (this is used in both the Row Number function and the join)

    2 - use Rank & Window tile to get RowNum based on the constant 1.

    3 - split the headers and daily rows into two opposing filters. Based on your sample I looked for the "-" character in the Campaign column.

    4 - in the top header path, do a LEAD function on RowNum. Now you have the header's original RowNum as well as the RowNum associated with the next header.

    5 - the HeaderLead of the last header will be null, so set it to something wildly high for your purpose (ex. 1,000,000). That way you have something to filter between later.

    6 - join header and daily rows back together on the constant 1. This is a lazy/greedy join that puts all daily rows on all headers.

    7 - in the last filter, include rows where the daily RowNum is greater than the Header Start Row number AND daily RowNum is less than HeaderLead value. That's saying, only keep daily rows that are between their header and the next header.

    8 - any column cleanup, rearrange, you want in the last select tile. Might need to change your date from a string into a date too depending on how that came in from your input.

    Couple key screenshots along with the whole transform layout -

    Here's step 5, shows how you get a range on the header that can be used to filter down the day rows:

    Step 6, showing the daily RowNum in between the header start (the header's original RowNum) and header lead values. You can see for the last ad/campaign header I set it's lead value to 1,000,000 to allow the filter to work.


Answers

  • Billobi
    Billobi Contributor
    Answer ✓

    Hi Skhatri, that certainly is a limited ad platform dataset. You can join header data to the daily ad rows even though they're only associated by row placement, but it's heavy on the assumption that these sort of daily rows repeat in this same fashion throughout the input.

    Here is my start dataset and output dataset:



    Here are the transform steps:

    1 - set a constant of 1 (this is used in both the Row Number function and the join)

    2 - use Rank & Window tile to get RowNum based on the constant 1.

    3 - split the headers and daily rows into two opposing filters. Based on your sample I looked for the "-" character in the Campaign column.

    4 - in the top header path, do a LEAD function on RowNum. Now you have the header's original RowNum as well as the RowNum associated with the next header.

    5 - the HeaderLead of the last header will be null, so set it to something wildly high for your purpose (ex. 1,000,000). That way you have something to filter between later.

    6 - join header and daily rows back together on the constant 1. This is a lazy/greedy join that puts all daily rows on all headers.

    7 - in the last filter, include rows where the daily RowNum is greater than the Header Start Row number AND daily RowNum is less than HeaderLead value. That's saying, only keep daily rows that are between their header and the next header.

    8 - any column cleanup, rearrange, you want in the last select tile. Might need to change your date from a string into a date too depending on how that came in from your input.

    Couple key screenshots along with the whole transform layout -

    Here's step 5, shows how you get a range on the header that can be used to filter down the day rows:

    Step 6, showing the daily RowNum in between the header start (the header's original RowNum) and header lead values. You can see for the last ad/campaign header I set it's lead value to 1,000,000 to allow the filter to work.


  • Billobi - Thanks a lot for your solution. You have challenged your mind to make it possible. Great work


    I had a time constraint due to project deadline , so I had implement the same with Python script tile.


    sample code

    input1["Ad run date"] = input1["ad name" ]

    for i in Range(Len(input1)

    if input1["Campaign"][i] = '-'

    j = i -1

    input1["Ad name"][i] = input1["ad name"][j]

    else

    input1["Ad run date"][i] = ''