Replace with last non-null or non-zero in data flow
Anybody know how to use SQL to replace a null or zero value in a table with the last non-null or non-zero value? The table is layered by date--ex:
07-1-2014 5
10-1-2014 4
01-1-2015 0
04-1-2015 0
07-1-2015 4
10-1-2015 4
My goal is to replace those 0s with whatever non-zero value preceeded it, so the result should look like this:
07-1-2014 5
10-1-2014 4
01-1-2015 4
04-1-2015 4
07-1-2015 4
10-1-2015 4
Comments
-
Depending on the size of your dataset, you could use some nested queries to update the data.
Here's an example that should work for the sample data you provided, supposing the columns are called `Date_Column` and `Value_Column` and are on a table called table_name:
SELECT
`Date_Column`
,CASE
WHEN `Value_Column` = 0
THEN (
SELECT
`Value_Column`
FROM
table_name t2
WHERE
`Date_Column` = (
SELECT
MAX(`Date_Column`)
FROM
table_name t3
WHERE
`Date_Column` < t1.`Date_Column`
AND `Value_Column` > 0
)
)
ELSE `Value_Column`
END AS `Value_Column`
FROM
table_name t1If your dataset is very large, this may take a while to run, but it should provide the functionality you're looking for.
For each record, it will check to see if the value is 0. If it is, it will search for the value associated with the max date that comes before the date for the record it's currently looking at. Also note that it will fail if you have multiple records with the same date. There are some ways to possibly fix that, though.
Let me know if it helps!
0 -
I haven't tried this solution yet. Instead I used a statement that casts a current and prior date against a current a prior value--when the current value is 0 or null, then it places the prior score in its stead.
However, looking at zcameron's response, I think that is a better solution. I will probably switch it when I have enough time to validate success.
Thanks!
1 -
I faced a similar problem and this code works, but takes really long for big datasets. Is there a more efficient code that can be applied to huge datasets. ?
0 -
Prajju,
In MySQL, you could try using something like this, which uses replacement variables instead of self joins:
SELECT
`Date_Column`
, (@LastValue := CASE WHEN IFNULL(`Value_Column`, 0) = 0 THEN @LastValue ELSE `Value_Column` END) AS `Value_Column`FROM
table_name
, (SELECT @LastValue := 0) lvI hope that helps!
0 -
This works.
However I want to parition it on 2 columns. How to modify the code to achieve the same?
Thanks,
Prajwal
0 -
Prajwal,
You'd want to first make sure the data is sorted by the two dimensions you're using as your partitions. Then you'll need to include two new replacement variables to keep track of the values in the dimension columns and mark when they've changed.
The first transform would be to sort the original table:
SELECT * FROM `propagate_test` ORDER BY `Category_Column`, `Date_Column`
Call the output table sorted_table.
Then the main transform could look like this:
SELECT
(@LastValue := CASE
WHEN (`Category_Column` != @LastCategory OR `Date_Column` != `Date_Column`) THEN `Value_Column`
WHEN IFNULL(`Value_Column`, 0) = 0 THEN @LastValue
ELSE `Value_Column`
END) AS `Value_Column`
, @LastDate := `Date_Column` AS`Date_Column`
, @LastCategory := `Category_Column` AS`Category_ColumnFROM
table_name
, (SELECT @LastValue := 0) lv
, (SELECT @LastDate := NULL) ld
, (SELECT @LastCategory := NULL) lcTest and tweak that to fit your needs. I hope it helps!
1
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 616 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 738 Beast Mode
- 57 App Studio
- 40 Variables
- 685 Automate
- 176 Apps
- 452 APIs & Domo Developer
- 47 Workflows
- 10 DomoAI
- 36 Predict
- 15 Jupyter Workspaces
- 21 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 124 Manage
- 121 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 108 Community Announcements
- 4.8K Archive