Prioritize one set of data over a duplicate set of data, when available.

I have similar data coming in from two different sources. I need to make sure that when dataset A is available, it is used, and when A is not available, B is used. I want to end up with one row of data per day per website, but I either end up with only one day's worth of data per website (Hostname in my data), or I end up with data from both Datasets A and B per day.

I'm not sure where I'm going wrong. Here's what I've got in my ETL

  1. Added a column called Priority with a Constant value of 1 to the data in Dataset A and a Constant value of 2 to Dataset B

2. After appending rows from each dataset, set up a Rank column, where the new field, Priority, is ranked

3. Then filter rows so that I only pull in those with a ranking of 1.

So the problem I do is:

If I also set up my rank by date also, it only pulls in the first date and skips all other days. If I ignore date, it pulls in all the rows of data.

What am I missing?

Tagged:

Best Answers

  • ArborRose
    ArborRose Coach
    Answer ✓

    Sounds like you need to do a left join on the matching tables and then a calculated field that says

    {columnname}:

    CASE WHEN {value in A} IS NOT NULL and {value in A}<>'' THEN {value in A} ELSE {value in B} END

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • KristinDavis
    KristinDavis Member
    Answer ✓

    Thank you. Unfortunately I really needed an append here. I was finally able to solve it by making the Priority column the ranked value, and the date column making up the partition.

Answers

  • ArborRose
    ArborRose Coach
    Answer ✓

    Sounds like you need to do a left join on the matching tables and then a calculated field that says

    {columnname}:

    CASE WHEN {value in A} IS NOT NULL and {value in A}<>'' THEN {value in A} ELSE {value in B} END

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • KristinDavis
    KristinDavis Member
    Answer ✓

    Thank you. Unfortunately I really needed an append here. I was finally able to solve it by making the Priority column the ranked value, and the date column making up the partition.