Replace with last non-null or non-zero in data flow

Anybody know how to use SQL to replace a null or zero value in a table with the last non-null or non-zero value? The table is layered by date--ex:

07-1-2014     5

10-1-2014     4 

01-1-2015     0

04-1-2015     0

07-1-2015     4

10-1-2015     4

 

My goal is to replace those 0s with whatever non-zero value preceeded it, so the result should look like this:

 

07-1-2014     5

10-1-2014     4 

01-1-2015     4

04-1-2015     4

07-1-2015     4

10-1-2015     4

Comments

  • Depending on the size of your dataset, you could use some nested queries to update the data. 

     

    Here's an example that should work for the sample data you provided, supposing the columns are called `Date_Column` and `Value_Column` and are on a table called table_name:

     

    SELECT
    `Date_Column`
    ,CASE
    WHEN `Value_Column` = 0
    THEN (
    SELECT
    `Value_Column`
    FROM
    table_name t2
    WHERE
    `Date_Column` = (
    SELECT
    MAX(`Date_Column`)
    FROM
    table_name t3
    WHERE
    `Date_Column` < t1.`Date_Column`
    AND `Value_Column` > 0
    )
    )
    ELSE `Value_Column`
    END AS `Value_Column`
    FROM
    table_name t1

     If your dataset is very large, this may take a while to run, but it should provide the functionality you're looking for.  

     

    For each record, it will check to see if the value is 0. If it is, it will search for the value associated with the max date that comes before the date for the record it's currently looking at. Also note that it will fail if you have multiple records with the same date. There are some ways to possibly fix that, though.

     

    Let me know if it helps!

  • kshah008
    kshah008 Contributor

    @steinsa, did zcameron's response help answer your question?

  • I haven't tried this solution yet. Instead I used a statement that casts a current and prior date against a current a prior value--when the current value is 0 or null, then it places the prior score in its stead.

     

    However, looking at zcameron's response, I think that is a better solution. I will probably switch it when I have enough time to validate success.

     

    Thanks!

  • I faced a similar problem and this code works, but takes really long for big datasets. Is there a more efficient code that can be applied to huge datasets. ?

  • Prajju,

     

    In MySQL, you could try using something like this, which uses replacement variables instead of self joins:

     

    SELECT
        `Date_Column`
        , (@LastValue := CASE WHEN IFNULL(`Value_Column`, 0) = 0 THEN @LastValue ELSE `Value_Column` END) AS `Value_Column`

    FROM
        table_name
    , (SELECT @LastValue := 0) lv

     

     

     

     

    I hope that helps!

  • This works.

    However I want to parition it on 2 columns. How to modify the code to achieve the same?

     

    Thanks,

    Prajwal

  • Prajwal,

     

    You'd want to first make sure the data is sorted by the two dimensions you're using as your partitions. Then you'll need to include two new replacement variables to keep track of the values in the dimension columns and mark when they've changed.

     

    The first transform would be to sort the original table:

     

    SELECT * FROM `propagate_test` ORDER BY `Category_Column`, `Date_Column`

     

    Call the output table sorted_table.

     

    Then the main transform could look like this:

     

    SELECT
        (@LastValue := CASE
                                      WHEN (`Category_Column` != @LastCategory OR `Date_Column` != `Date_Column`) THEN `Value_Column`
                                     WHEN IFNULL(`Value_Column`, 0) = 0 THEN @LastValue
                                     ELSE `Value_Column`
                                 END) AS `Value_Column`
        , @LastDate := `Date_Column` AS`Date_Column`
        , @LastCategory := `Category_Column` AS`Category_Column

    FROM
        table_name
        , (SELECT @LastValue := 0) lv
        , (SELECT @LastDate := NULL) ld
        , (SELECT @LastCategory := NULL) lc

     

     

     

    Test and tweak that to fit your needs. I hope it helps!