How to delete rows from an input dataset and replace the dataset

I have an input dataset, on which I want to perform an operation to highlight which records are obsolete, and then permanently delete those records from the input dataset. 

 

I was trying to use SQL transformations, but I cannot make the output dataset the same as the input dataset. Could someone help?

Comments

  • The solution is a partial recurive flow but in your case, you aren't maintaining history rather, delete/replacing the input with the output.

     

    I've created a flow to demonstrate how you can do this using simple SQL flows adding a constant that will distinguish which rows are obsolete vs. not.

     

    Let me know if this helps.

     

    Test DataTest Data

     

    Include a flag/constant that will help you distinguish which rows are obsoleteInclude a flag/constant that will help you distinguish which rows are obsoleteChoose the Custom SQLChoose the Custom SQLDelete rows which are ObsoleteDelete rows which are Obsoleteuse this output as your new inputuse this output as your new input

  • hamza_123
    hamza_123 Contributor

    Thank you for the detailed explanation. I did not know about the DELETE command, and that was very helpful. 

     

    While this is a significant improvement from what I thought I could do, there's still this issue that remains:

     

    The new data that comes in is never obsolete. Its only after a certain time that it becomes obsolete. That means that I cannot instantly do operations on incoming data, rather I have to store it in a dataset, and keep running delete statements on that to remove unwanted data. 

     

    So while my end output in purged, there is this huge dataset in the middle that I still have to maintain. 

     

    Any idea on how I can reduce the size of the middle dataset as well?