Filter in magic ETL

vaco
vaco Member

Hi, I am trying to filter a dataset using magic ETL. The logic involves examining each id in the id column and then return the row with the highest rank. This will remove duplicate ids. I have attached images showing an example: before transformation and the desired output result (after). Can we achieve this using magic ETL? Thanks

Best Answers

  • david_cunningham
    edited May 9 Answer βœ“

    Yes - you can use the Rank and Window tile to create a rank based on your id. Then use the filter tile to return the rank you want. If you want the min id you can just sort ascending on your rank/window function and then filter to where rank=1. If you want the max id, you can sort descending and then filter to where rank=1. In your case, if you're wanting the min rank per group. You would need to partition by ID to start the rank over again for each id.

    For example. Input dataset

    The above is how you could set up your rank/window tile.

    Example output

    Filter to where rank = 1

    This output matches your example requested result.

    David Cunningham

    ** Was this post helpful? Click Agree πŸ˜€, Like πŸ‘οΈ, or Awesome ❀️ below **
    ** Did this solve your problem? Accept it as a solution! βœ”οΈ**

  • Sean_Tully
    Sean_Tully Contributor
    edited May 9 Answer βœ“

    I think David's reply is correct if you don't have a rank field yet. If the rank field already exists, you instead would have to group by the id to find the max(rank), and inner join that back to the data, with the join fields being id = id and rank = max(rank).

Answers

  • david_cunningham
    edited May 9 Answer βœ“

    Yes - you can use the Rank and Window tile to create a rank based on your id. Then use the filter tile to return the rank you want. If you want the min id you can just sort ascending on your rank/window function and then filter to where rank=1. If you want the max id, you can sort descending and then filter to where rank=1. In your case, if you're wanting the min rank per group. You would need to partition by ID to start the rank over again for each id.

    For example. Input dataset

    The above is how you could set up your rank/window tile.

    Example output

    Filter to where rank = 1

    This output matches your example requested result.

    David Cunningham

    ** Was this post helpful? Click Agree πŸ˜€, Like πŸ‘οΈ, or Awesome ❀️ below **
    ** Did this solve your problem? Accept it as a solution! βœ”οΈ**

  • Sean_Tully
    Sean_Tully Contributor
    edited May 9 Answer βœ“

    I think David's reply is correct if you don't have a rank field yet. If the rank field already exists, you instead would have to group by the id to find the max(rank), and inner join that back to the data, with the join fields being id = id and rank = max(rank).

  • vaco
    vaco Member

    Thanks @david_cunningham and @Sean_Tully . I actually have both cases one without the rank and another one with rank, so thank you both for the insightful answers. I was able to filter the data.πŸ˜ƒ