Label ranked rows based on consecutive condition

Options

Hi Folks
I'm looking to label multiple rows but only when they consecutively reach 8 or more Rank criteria.

I'm ordering the rank on the date, (magic ETL) and partitioning by rule2_hours. When the Rank reaches 8 or more, I want each row leading up to the highest number to be labeled - see the colour field.

All feedback appreciated

Kind regards,

Best Answer

  • ColemenWilson
    edited June 2023 Answer ✓
    Options

    @NateBI I was able to accomplish what you are looking to do using a Redshift Dataflow. If you don't currently have Redshift dataflows enabled you can contact your CSM, or you could attempt this same logic in Magic.

    First, I created a new field that gives the order of the data, similar to rank but for every row in sequential order.

    Second, I used this case statement in the Dataflow:

    CASE
    WHEN "rank" = 1 AND lead("rank", 7) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 2 AND lead("rank", 6) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 3 AND lead("rank", 5) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 4 AND lead("rank", 4) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 5 AND lead("rank", 3) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 6 AND lead("rank", 2) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 7 AND lead("rank", 1) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" >= 8 THEN 'Colour' ELSE 'No Colour'
    END as "testing"

    And it worked! See results below:


    This was fun, thanks for the challenge. If this solved your problem please accept this as the answer, if not let me know and I'll attempt the same thing in Magic ETL.

    If I solved your problem, please select "yes" above

Answers

  • MichelleH
    Options

    @NateBI It looks like you're on the right track. You should be able to add a color rule to your card based on the Rank field to add a color when Rank is greater than or equal to 8.

  • ColemenWilson
    edited June 2023 Answer ✓
    Options

    @NateBI I was able to accomplish what you are looking to do using a Redshift Dataflow. If you don't currently have Redshift dataflows enabled you can contact your CSM, or you could attempt this same logic in Magic.

    First, I created a new field that gives the order of the data, similar to rank but for every row in sequential order.

    Second, I used this case statement in the Dataflow:

    CASE
    WHEN "rank" = 1 AND lead("rank", 7) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 2 AND lead("rank", 6) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 3 AND lead("rank", 5) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 4 AND lead("rank", 4) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 5 AND lead("rank", 3) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 6 AND lead("rank", 2) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" = 7 AND lead("rank", 1) over (partition by "rule2_hours" order by "order") = 8 THEN 'Colour'
    WHEN "rank" >= 8 THEN 'Colour' ELSE 'No Colour'
    END as "testing"

    And it worked! See results below:


    This was fun, thanks for the challenge. If this solved your problem please accept this as the answer, if not let me know and I'll attempt the same thing in Magic ETL.

    If I solved your problem, please select "yes" above

  • NateBI
    NateBI Member
    Options

    @MichelleH thank you, this would be plan B since it would only highlight 8 and above but I need them all to be labeled. @colemenwilson I figured that someone would enjoy it! 😆 This looks right, I'll keep it in Reedshift - in Magic ETL that's 7 R&W tiles, plus Join tiles, per scenario and I have another two to sort like this. Thanks both

  • NateBI
    NateBI Member
    edited June 2023
    Options

    @MichelleH @colemenwilson
    Issue Update: My rank field is not skipping nulls or restarting the count - this is impacting the logic you've sent. Here are the settings:




  • ColemenWilson
    edited June 2023
    Options

    You'll want both rank and order fields, so keep your rank field as it was before. For order, you can use any number of field combinations to determine the order. I am not sure what other fields you have available but I would think some combination of Date, Batch ID and/or Batch last run date should get the order right. You'll probably also want to remove the partition option. How are you ordering your data currently?

    If I solved your problem, please select "yes" above

  • NateBI
    NateBI Member
    edited June 2023
    Options

    @colemenwilson The order field is working fine, it's based on the date. It's the Rank field that wasn't set up correctly to match the dummy table I sent in the first question. So now, I'm going back a stage to replicate the rank field - if possible - to support the final stage - your solution - of running a lead over each number.

  • ColemenWilson
    Options

    Gotcha, I thought the rank was already solved for. Okay, so the results you are getting are consistent with what I would expect to see based on how your ETL is setup. To have the rank start over after a null there needs to be some other field the rank is considering, if it is just referencing date and rule2_hours_marker then the rank will continue after nulls. Is there some other field you can use to group a streak together? For example, a batch, session, team, etc…?

    If I solved your problem, please select "yes" above

  • NateBI
    NateBI Member
    edited June 2023
    Options

    @colemenwilson Sweet, spoken with @GrantSmith and he suggested:
    step one: cumulative sum to calculate the group ID 
    step_two: rank partitioned by step_one field

    Which landed this here:

    I then run a lead over step_two to get the previous value and where previous_value_step_two = step_two then then 0 else step_two.

    'step_three' now represents the original 'Rank' field concept.


    No onto your solution for the colour marker.