I have a file that has Date, Batch_Id and Active users. I need to keep the latest row in the file

Sbhatia · 2023-10-16T13:35:19+00:00


Sbhatia

I am trying to do this with Magic ETL, but I am not having much luck.

Tags: Magic ETL

Accepted answers

GrantSmith

I'm assuming by the latest row you mean the one with the most recent date and not the last row in your dataset.

Using Magic ETL:
1. Feed your dataset into an Add Constants tile and add a constant called "Join Column" with a value of 1.
2. Feed that into a Group By tile, group by this new field, and select the Maximum of the date.
3. Take this output and use a Join tile to do an inner join between your original dataset and the grouped dataset on Max date = date.
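Magic ETL is configured with tiles rather than code, so here is a minimal Python sketch of the same logic, for illustration only. The sample rows and the column names `Date`, `Batch_Id` and `Active_Users` are hypothetical stand-ins for the file described in the question. The constant column gives the Group By tile a single group, so the aggregate is the maximum date over the whole dataset.

```python
from datetime import date

# Hypothetical sample rows standing in for the uploaded file.
rows = [
    {"Date": date(2023, 9, 19), "Batch_Id": 1, "Active_Users": 120},
    {"Date": date(2023, 9, 20), "Batch_Id": 2, "Active_Users": 130},
    {"Date": date(2023, 9, 20), "Batch_Id": 3, "Active_Users": 135},
]

def keep_latest_date(rows):
    """Mimic Add Constants -> Group By (Max of Date) -> inner Join on Date = Max Date."""
    # Add Constants tile: every row gets the same value, so there is one group.
    with_const = [dict(r, Join_Column=1) for r in rows]
    # Group By tile: group on the constant and take the Maximum of Date.
    grouped = {}
    for r in with_const:
        key = r["Join_Column"]
        grouped[key] = max(grouped.get(key, r["Date"]), r["Date"])
    # Join tile: keeping rows whose Date matches a grouped Max Date is equivalent
    # to an inner join here, because the grouped side has one row per max date.
    max_dates = set(grouped.values())
    return [r for r in rows if r["Date"] in max_dates]

latest = keep_latest_date(rows)
```

Note that every row sharing the most recent date survives the join (both 2023-09-20 rows here), so this keeps the latest date rather than a single row.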

MarkSnodgrass

You can use the Group By tile: choose batch_id as the aggregate column with Max as the aggregation, and put date and user_id in your select list.
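As a rough illustration of what a Group By with a Max aggregate returns, here is a minimal Python sketch (not Domo code). The helper `group_by_max`, the sample rows and the column names are hypothetical. The output has one row per distinct combination of the grouping columns, containing only those columns plus the aggregate.

```python
from datetime import date

# Hypothetical sample rows: each date appears more than once.
rows = [
    {"Date": date(2023, 9, 19), "Batch_Id": 1, "Active_Users": 120},
    {"Date": date(2023, 9, 19), "Batch_Id": 2, "Active_Users": 118},
    {"Date": date(2023, 9, 20), "Batch_Id": 3, "Active_Users": 130},
    {"Date": date(2023, 9, 20), "Batch_Id": 4, "Active_Users": 135},
]

def group_by_max(rows, group_cols, agg_col):
    """Mimic a Group By tile: group on group_cols, aggregate the Maximum of agg_col."""
    result = {}
    for r in rows:
        key = tuple(r[c] for c in group_cols)
        if key not in result or r[agg_col] > result[key]:
            result[key] = r[agg_col]
    # One output row per group: the grouping columns plus the aggregate value.
    return [dict(zip(group_cols, key), **{"Max_" + agg_col: value})
            for key, value in result.items()]

per_date = group_by_max(rows, ["Date"], "Batch_Id")
```

Columns that are neither grouped nor aggregated (Active_Users here) are not in the output. Adding such a column to the grouping list instead splits each date into one group per distinct value of that column.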

All comments

Sbhatia

Thanks so much for your quick response. Maybe I should have been slightly clearer in my question.
My file has multiple duplicate rows with the same date, so for example 20/09/23 could appear 4 times.
I would like to keep the row for each date that has the maximum batch_id.

GrantSmith

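You'd do the same thing as above, except use the batch_id instead of the date: group by date to get the maximum batch_id for each date, then join back to the original dataset on date and batch_id to get the latest batch's record for each date.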
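A minimal Python sketch of the per-date version, assuming the "group by date, take Max of batch_id, then join back on date and batch_id" reading of the answer. It is not Domo code, and the sample rows and column names are hypothetical. Joining back to the original rows is what keeps the Active_Users value that belongs to the latest batch.

```python
from datetime import date

# Hypothetical sample rows: 2023-09-19 and 2023-09-20 each appear twice.
rows = [
    {"Date": date(2023, 9, 19), "Batch_Id": 1, "Active_Users": 120},
    {"Date": date(2023, 9, 19), "Batch_Id": 2, "Active_Users": 118},
    {"Date": date(2023, 9, 20), "Batch_Id": 3, "Active_Users": 130},
    {"Date": date(2023, 9, 20), "Batch_Id": 4, "Active_Users": 135},
]

def latest_batch_per_date(rows):
    """Keep, for each Date, the row(s) whose Batch_Id is the maximum for that Date."""
    # Group By tile: group on Date, aggregate the Maximum of Batch_Id.
    max_batch = {}
    for r in rows:
        d = r["Date"]
        if d not in max_batch or r["Batch_Id"] > max_batch[d]:
            max_batch[d] = r["Batch_Id"]
    # Join tile: inner join the original rows on Date and Batch_Id = Max Batch_Id,
    # so every original column (including Active_Users) is kept.
    return [r for r in rows if r["Batch_Id"] == max_batch[r["Date"]]]

result = latest_batch_per_date(rows)
```

If two rows share both the same date and the same maximum batch_id, both are kept, just as an inner join would keep them.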

Sbhatia

Thanks both for your responses, you have been a great help.