Hi everyone,
I have two columns of data with similar values, e.g., Duck, Goose, Horse, etc. I am agnostic to which column it is in, I would like to know the net change over the previous year (preferably in a heat map table).
Here is an example of what the data may look like:
Column1 | Column2 | Column3 | Column4 |
Identifier-123 | 2018 | Duck | Goose |
Identifier-123 | 2018 | Duck | Goose |
Identifier-456 | 2018 | Horse | |
Identifier-789 | 2018 | Unrelated Information |
Identifier-789 | 2018 | Unrelated Information |
Identifier-012 | 2017 | Unrelated Information | Duck |
The resulting output I expect to see:
(Duck ends up being 0 because 1 in current year minus 1 in previous year, Goose is 1 because the rows of data duplicate)
What I have tried so far is creating a calculation similar to this:
(CASE WHEN `Column3`='Duck' THEN COUNT(DISTINCT `Column1`)+COUNT(DISTINCT (CASE WHEN `Column3`<>'Duck' AND `Column4`='Duck' THEN`Column1` end))
This works great initially however it breaks down when I start to throw dates to see a year over year comparison. I've tried this:
(CASE WHEN `Column3`='Duck' AND `Column2`='2018' THEN COUNT(DISTINCT `Column1`)+COUNT(DISTINCT (CASE WHEN `Column3`<>'Duck' AND `Column4`='Duck' AND `Column2`='2018' THEN `Column1` end))-(CASE WHEN `Column3`='Duck' AND `Column2`='2017' THEN COUNT(DISTINCT `Column1`)-COUNT(DISTINCT (CASE WHEN `Column3`<>'Duck' AND `Column4`='Duck' AND `Column2`='2017' THEN `Column1` end))
The calculation appears to be valid but the resulting visualization fails to load unless I list Column2 as a displayed column which breaks aggregation.
Any recommendations on how I can approach this? Expressions or operators I should look into?
In short: I need to know how many unique identifiers contained 'Ducks' this year and last year in either column and find the difference.
Thank you for any time anyone can spare, I appreciate any help or advice!