[BigQuery data] Delete data from certain days

Hi Guys,

 

I have a dataset that imports data from BigQuery every day. Every morning, a scheduled query gets the previous day's data and appends it to the dataset.

Unfortunately, the data for 1/26/2020 and 1/27/2020 was incomplete when the query ran, so the data I have in Domo for those two days is wrong.

Is there a way to delete the data for these two days from the dataset so I can rerun the query and append the correct data to the dataset? 

The dataset is very big, so ideally I'd like to avoid replacing the whole dataset.

Thank you for your help!

Julien

Comments

  • bdavis (Contributor)

    When this happens to us, I go into the dataflow that appends the data and either add a SQL transform that deletes the rows I need to remove or, in ETL, add a filter that excludes them. After running the dataflow once, I go back in and remove the transform or filter. If reimporting those dates would duplicate rows that are already there and you're using SQL, combining the data with UNION (rather than UNION ALL) will remove the duplicate rows. I'm unsure about ETL, but I believe there's a remove-duplicates function in there as well.
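
    As a rough sketch of the UNION approach, assuming two hypothetical inputs named existing_data and rerun_data with the same columns in the same order (your own input names will differ):

    ```sql
    -- existing_data and rerun_data are placeholder names, not real datasets.
    -- UNION keeps one copy of each distinct row across both inputs;
    -- UNION ALL would keep every row, duplicates included.
    SELECT * FROM existing_data
    UNION
    SELECT * FROM rerun_data;
    ```

    Keep in mind that UNION only drops rows that match on every column, so a row whose values changed between the first import and the rerun will show up twice.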

     

    EDIT: Here's the SQL you'd use for the delete:

    DELETE FROM <TABLE_NAME> WHERE <COLUMN_NAME> <CONDITION>

    where <CONDITION> is something like = 1, <= 5, or IN ('list', 'of', 'things').
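
    For the two days in the question, the template might be filled in roughly like this. The table name bigquery_data and the column name event_date are assumptions for illustration, so swap in your own input name and date column:

    ```sql
    -- bigquery_data and event_date are hypothetical names.
    -- The half-open range covers all of 2020-01-26 and 2020-01-27,
    -- whether event_date is stored as a DATE or as a DATETIME.
    DELETE FROM bigquery_data
    WHERE event_date >= '2020-01-26'
      AND event_date <  '2020-01-28';
    ```

    Running a SELECT COUNT(*) with the same WHERE clause first is a cheap way to check that only the rows for those two days will be removed.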