Magic ETL

Magic ETL

Prioritize one set of data over a duplicate set of data, when available.

I have similar data coming in from two different sources. I need to make sure that when dataset A is available, it is used, and when A is not available, B is used. I want to end up with one row of data per day per website, but I either end up with only one day's worth of data per website (Hostname in my data), or I end up with data from both Datasets A and B per day.

I'm not sure where I'm going wrong. Here's what I've got in my ETL

  1. Added a column called Priority with a Constant value of 1 to the data in Dataset A and a Constant value of 2 to Dataset B

2. After appending rows from each dataset, set up a Rank column, where the new field, Priority, is ranked

3. Then filter rows so that I only pull in those with a ranking of 1.

So the problem I do is:

If I also set up my rank by date also, it only pulls in the first date and skips all other days. If I ignore date, it pulls in all the rows of data.

What am I missing?

Tagged:

Best Answers

  • Coach
    Answer ✓

    Sounds like you need to do a left join on the matching tables and then a calculated field that says

    {columnname}:

    CASE WHEN {value in A} IS NOT NULL and {value in A}<>'' THEN {value in A} ELSE {value in B} END

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • Member
    Answer ✓

    Thank you. Unfortunately I really needed an append here. I was finally able to solve it by making the Priority column the ranked value, and the date column making up the partition.

Answers

  • Coach
    Answer ✓

    Sounds like you need to do a left join on the matching tables and then a calculated field that says

    {columnname}:

    CASE WHEN {value in A} IS NOT NULL and {value in A}<>'' THEN {value in A} ELSE {value in B} END

    ** Was this post helpful? Click Agree or Like below. **
    ** Did this solve your problem? Accept it as a solution! **

  • Member
    Answer ✓

    Thank you. Unfortunately I really needed an append here. I was finally able to solve it by making the Priority column the ranked value, and the date column making up the partition.

Welcome!

It looks like you're new here. Members get access to exclusive content, events, rewards, and more. Sign in or register to get started.
Sign In