I have ingested employee headcount (HC) data, which is reported per month. I have uploaded the first 6 months of data to my ingestion dataset, and I use the append method to update that dataset.
I need to build an ETL that powers a new dataset containing data for all months for all clients. The ingestion file for the latest month is pushed 4 times a month, and each push can contain the same employee information again, which creates duplicates in the upload. So I want to create an ETL that keeps only each employee's data from the last updated date.
Below is the sample.
In the month 1/2/2024, data for employee ID 200 was uploaded twice, but in the final output dataset I want to keep only row no. 7 for employee ID 200, because it is the latest upload for that employee.
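The rule described above ("one row per employee per month, taken from the latest upload") can be sketched in plain Python to make the logic concrete. This is only an illustration, not the ETL tool itself: the column names `employee_id`, `month`, `upload_date` and `dept`, and all the row values, are hypothetical stand-ins for the sample data.

```python
from datetime import date

# Hypothetical sample rows; column names and values are illustrative only.
rows = [
    {"employee_id": 100, "month": "2024-02", "upload_date": date(2024, 2, 5), "dept": "Sales"},
    {"employee_id": 200, "month": "2024-02", "upload_date": date(2024, 2, 5), "dept": "HR"},
    {"employee_id": 200, "month": "2024-02", "upload_date": date(2024, 2, 19), "dept": "Finance"},
    {"employee_id": 200, "month": "2024-01", "upload_date": date(2024, 1, 22), "dept": "HR"},
]


def latest_per_employee_month(rows):
    """Keep one row per (employee_id, month): the one with the latest upload_date."""
    latest = {}
    for row in rows:
        key = (row["employee_id"], row["month"])
        current = latest.get(key)
        # Replace the stored row only if this one was uploaded later
        # (on an exact tie, the first row seen is kept).
        if current is None or row["upload_date"] > current["upload_date"]:
            latest[key] = row
    return list(latest.values())


result = latest_per_employee_month(rows)
for r in sorted(result, key=lambda r: (r["month"], r["employee_id"])):
    print(r["employee_id"], r["month"], r["upload_date"], r["dept"])
```

Keying on both employee ID and month matters: keying on employee ID alone would collapse all six months of history into a single row per employee.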
I tried using the Group By tile to group the data and get the max date, but the results are not as expected. I am new to ETL and still learning; please suggest an approach.
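A likely reason the Group By result looks wrong: grouping by employee and month with a max of the upload date returns only those key columns plus the max date, so the other employee columns are dropped (and if you add those columns to the grouping, the duplicates come back). A common pattern is to join the max date back to the original rows. The sketch below mimics that two-step flow in plain Python; the column names and rows are hypothetical, and how the two steps map onto specific tiles depends on the ETL tool you are using.

```python
from datetime import date

# Hypothetical sample rows; column names and values are illustrative only.
rows = [
    {"employee_id": 100, "month": "2024-02", "upload_date": date(2024, 2, 5), "dept": "Sales"},
    {"employee_id": 200, "month": "2024-02", "upload_date": date(2024, 2, 5), "dept": "HR"},
    {"employee_id": 200, "month": "2024-02", "upload_date": date(2024, 2, 19), "dept": "Finance"},
    {"employee_id": 200, "month": "2024-01", "upload_date": date(2024, 1, 22), "dept": "HR"},
]

# Step 1 (the "group by" step): max upload_date per (employee_id, month).
# This output holds only the keys and the max date; columns like dept are gone.
max_dates = {}
for row in rows:
    key = (row["employee_id"], row["month"])
    if key not in max_dates or row["upload_date"] > max_dates[key]:
        max_dates[key] = row["upload_date"]

# Step 2 (the "join back" step): inner join the original rows to the max dates
# on employee_id, month and upload_date, which keeps all original columns.
deduped = [
    row
    for row in rows
    if row["upload_date"] == max_dates[(row["employee_id"], row["month"])]
]

for r in deduped:
    print(r["employee_id"], r["month"], r["upload_date"], r["dept"])
```

One caveat of this approach: if the same employee appears twice within a single upload date, both rows match the max date and survive the join, so a truly duplicated push would still need an extra de-duplication step.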