How to fill-down nulls from earliest instance of non-null value?

alzwang99
alzwang99 Member
edited December 16 in Magic ETL

I am currently struggling to find a cohesive solution to being able to fill nulls with the first instance of a non-null value. The layout is very simple, basically there is column called ID and it contains one non-value null and the rest is null until the next non-value null. I would like it where the nulls are replaced by the most current non-value null.

I have seen solutions utilizing the lag function, group by, and even a recursive data flow. However, I never seem to get those solutions working properly for me.

I would greatly appreciate if anyone can provide a good breakdown of how to implement this solution. Thank you.

Best Answer

  • rco
    rco Domo Employee
    Answer ✓

    The flow consists of the following:

    1. Add Formula. New column name: "Row Number", Formula: ROW_NUMBER()

    2. Rank & Window. Order by "Row Number". No partition columns. New column name: "ID Count", contains Count of "ID" from Unbounded Preceding to 0 Following.

    3. Filter. Condition: ID Is Not Null.

    4. Join #2 to #3. Type: Inner. Equality condition: "ID Count" to "ID Count". Resolve column name conflicts by dropping ID from the #2 side and dropping all other columns from the #3 side.

    5. Select or Alter Columns to get rid of "Row Number" and "ID Count" and to reorder columns, if desired.

    Explanations:

    #1: Rank & Window requires that we sort on something, but we want to keep the data in their original order. Since there is no sequential column in the example data, we must create one. Add Formula has a function ROW_NUMBER() which fulfills this purpose.

    #2: The Count function from Unbounded Preceding to 0 Following is a "Running Count". This will increase whenever we encounter an ID that isn't null, otherwise it will remain the same. This is useful to us because it gives us a value in common between every row and its last row with a non-null ID.

    #3: This becomes the "lookup table" of "ID Count" to "ID"

    #4: This joins all our rows to their corresponding row in the "ID Count" to "ID" lookup table. We just need to be careful to preserve the "ID" column from the lookup table (#3) and all our other columns from the main table when we do duplicate column resolution.

    #5: We produced some "utility columns" along the way, and the Join may have put our "ID" out of place (depending on where it is supposed to be). An Alter Columns could be used to simply drop the unwanted columns, or a Select could accomplish both the removal of unwanted columns and the reordering of desired columns.

    Randall Oveson <randall.oveson@domo.com>

Answers

  • rco
    rco Domo Employee
    Answer ✓

    The flow consists of the following:

    1. Add Formula. New column name: "Row Number", Formula: ROW_NUMBER()

    2. Rank & Window. Order by "Row Number". No partition columns. New column name: "ID Count", contains Count of "ID" from Unbounded Preceding to 0 Following.

    3. Filter. Condition: ID Is Not Null.

    4. Join #2 to #3. Type: Inner. Equality condition: "ID Count" to "ID Count". Resolve column name conflicts by dropping ID from the #2 side and dropping all other columns from the #3 side.

    5. Select or Alter Columns to get rid of "Row Number" and "ID Count" and to reorder columns, if desired.

    Explanations:

    #1: Rank & Window requires that we sort on something, but we want to keep the data in their original order. Since there is no sequential column in the example data, we must create one. Add Formula has a function ROW_NUMBER() which fulfills this purpose.

    #2: The Count function from Unbounded Preceding to 0 Following is a "Running Count". This will increase whenever we encounter an ID that isn't null, otherwise it will remain the same. This is useful to us because it gives us a value in common between every row and its last row with a non-null ID.

    #3: This becomes the "lookup table" of "ID Count" to "ID"

    #4: This joins all our rows to their corresponding row in the "ID Count" to "ID" lookup table. We just need to be careful to preserve the "ID" column from the lookup table (#3) and all our other columns from the main table when we do duplicate column resolution.

    #5: We produced some "utility columns" along the way, and the Join may have put our "ID" out of place (depending on where it is supposed to be). An Alter Columns could be used to simply drop the unwanted columns, or a Select could accomplish both the removal of unwanted columns and the reordering of desired columns.

    Randall Oveson <randall.oveson@domo.com>

  • Manasi_Panov
    Manasi_Panov Contributor
    edited December 17

    Hi @alzwang99,

    You can do it with a simple Python tile and the following single line code (bold):

    #Import the domomagic package into the script
    from domomagic import *

    #read data from inputs into a data frame
    input1 = read_dataframe('your_previous_tile_name')

    #write your script here
    input1['ID'] = input1['ID'].map(lambda x: None if not x else x).fillna(method='ffill')

    #write a data frame so it's available to the next action
    write_dataframe(input1)

    If you found this post helpful, please use 💡/💖/👍/😊 below! If it solved your problem, don't forget to accept the answer.

  • Randall, thank you again for assisting me. This is an amazing and well-thought out solution. I think this is the cleanest way to achieve this result. I also want to say thank you for providing a clear explanation. Helps me understand the inner-workings that happen in the background.