Get the latest data from an ingestion dataset in ETL, grouped by month

I have ingested Employee HC data, which is reported per month. I have uploaded the first 6 months of data to my ingestion dataset, and I use the append method to update it.

I have to prepare an ETL to power a new dataset that should hold data for all months for all clients. The ingestion file for the latest month is pushed 4 times a month, and each push can contain the same employee's information again, so the uploads produce duplicates. I want to create an ETL that keeps only each employee's data from the last updated date.

Below is the sample.

In the month 1/2/2024, we uploaded data for employee ID 200 twice, but in the final output dataset I want only row 7 for employee ID 200, because it is the latest upload for that employee.
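The rule being asked for can be stated as "for each (Employee ID, Month) pair, keep the row with the latest upload date." The sketch below expresses that rule in plain Python, outside any ETL tool, as an illustration of the logic only. The column names (`employee_id`, `month`, `upload_date`, `department`) and the rows are hypothetical stand-ins for the original sample.

```python
from datetime import date

# Hypothetical rows standing in for the appended ingestion dataset.
# Every push is appended, so the same employee can appear more than
# once for the same month.
rows = [
    {"employee_id": 100, "month": "1/2/2024", "upload_date": date(2024, 2, 7), "department": "Sales"},
    {"employee_id": 200, "month": "1/2/2024", "upload_date": date(2024, 2, 7), "department": "Sales"},
    {"employee_id": 200, "month": "1/2/2024", "upload_date": date(2024, 2, 21), "department": "Finance"},
    {"employee_id": 200, "month": "1/1/2024", "upload_date": date(2024, 1, 14), "department": "Sales"},
]


def latest_per_employee_month(rows):
    """Keep only the most recently uploaded row for each (employee_id, month)."""
    latest = {}
    for row in rows:
        key = (row["employee_id"], row["month"])
        current = latest.get(key)
        # Store this row if it is the first for the key or was uploaded later.
        if current is None or row["upload_date"] > current["upload_date"]:
            latest[key] = row
    return list(latest.values())


result = latest_per_employee_month(rows)
for row in result:
    print(row["employee_id"], row["month"], row["upload_date"], row["department"])
# 100 1/2/2024 2024-02-07 Sales
# 200 1/2/2024 2024-02-21 Finance
# 200 1/1/2024 2024-01-14 Sales
```

If two uploads for the same employee and month share the same date, this sketch keeps whichever one appears first.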

I tried to use the Group By tile to group the data and get the max date, but the results are not what I expected. I am new to ETL and still learning; any suggestions?
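One possible reason a Group By alone looks wrong: grouping by Employee ID and Month and aggregating the max date returns only those grouping columns plus the max date, not the other columns of the latest row. A common way to recover the full row is to join that aggregate back to the original data on Employee ID, Month and the date. The sketch below shows both steps in plain Python; the rows and column names are hypothetical, and it is not a description of how any particular ETL tile is configured.

```python
from datetime import date

# Hypothetical appended rows (same shape as one Employee HC upload).
rows = [
    {"employee_id": 100, "month": "1/2/2024", "upload_date": date(2024, 2, 7), "department": "Sales"},
    {"employee_id": 200, "month": "1/2/2024", "upload_date": date(2024, 2, 7), "department": "Sales"},
    {"employee_id": 200, "month": "1/2/2024", "upload_date": date(2024, 2, 21), "department": "Finance"},
    {"employee_id": 200, "month": "1/1/2024", "upload_date": date(2024, 1, 14), "department": "Sales"},
]

# Step 1: the Group By equivalent. Group on (employee_id, month) and
# aggregate MAX(upload_date). The result carries no other columns.
max_dates = {}
for row in rows:
    key = (row["employee_id"], row["month"])
    if key not in max_dates or row["upload_date"] > max_dates[key]:
        max_dates[key] = row["upload_date"]

# Step 2: join the aggregate back to the original rows on employee_id,
# month and upload_date (an inner join written as a filter), so each
# surviving row is the complete latest upload for its group.
latest_rows = [
    row for row in rows
    if row["upload_date"] == max_dates[(row["employee_id"], row["month"])]
]

for row in latest_rows:
    print(row)
```

Unlike the single-pass version, a join-back keeps every row that ties on the max date, so identical dates within a month would still leave duplicates.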


Answers

  • ColemenWilson
    edited July 22

    You have the right idea: use a Group By tile that groups by Employee ID and Month and aggregates the max date. Is that what you've done? Could you share a screenshot of your Group By configuration?

  • MayaU_01
    MayaU_01 Member

    So do I need two Group By tiles, one to group by month first and then another to group by Employee ID?

  • ColemenWilson
    Answer ✓

    No, you would use a single Group By tile with two grouping columns, Employee ID and Month, and aggregate the max date.
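To illustrate why two grouping columns in one Group By differ from two Group By steps in sequence, here is a plain-Python sketch with hypothetical rows and column names. Grouping on the composite key (employee_id, month) yields one max date per employee per month. Grouping on month alone collapses all employees into one row per month, so no employee column is left for a second grouping step to split on.

```python
from collections import defaultdict
from datetime import date

# Hypothetical appended rows.
rows = [
    {"employee_id": 100, "month": "1/2/2024", "upload_date": date(2024, 2, 7)},
    {"employee_id": 200, "month": "1/2/2024", "upload_date": date(2024, 2, 7)},
    {"employee_id": 200, "month": "1/2/2024", "upload_date": date(2024, 2, 21)},
    {"employee_id": 200, "month": "1/1/2024", "upload_date": date(2024, 1, 14)},
]

# One grouping step with a two-column key: one group per (employee, month).
by_employee_month = defaultdict(list)
for row in rows:
    by_employee_month[(row["employee_id"], row["month"])].append(row["upload_date"])
one_step = {key: max(dates) for key, dates in by_employee_month.items()}

# Grouping by month alone: every employee in a month lands in the same
# group, and the output has no employee_id column at all.
by_month = defaultdict(list)
for row in rows:
    by_month[row["month"]].append(row["upload_date"])
month_only = {month: max(dates) for month, dates in by_month.items()}

print(one_step)
print(month_only)
```

In the month-only result, employee 100's latest date is lost because the month's max comes from employee 200's later upload.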