How to input identifiers using identifiers

I have a data flow that has multiple properties and competitors. I was looking for a way to label all of the records that are in between into one property. In the example sheet that i uploaded, I was wondering if a beast mode formula could be used to rows 3-16 are labeled as the property Brandon crossroads in this case. Using Magic ETL is also an option. I just don't see how I could pivot and unpivot properly.

I am attempting to change sheet 1 into sheet 2 in reality.

I have looked at a few ways to do this but I don't see any native functions or etl flow that would be easily applicable. If I was in Python I could use fill. My only other option is to lag it and fill it off the lagged value that has text and then repeat the process.

Tagged:

Best Answer

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓

    Just to be clear, your dataset currently has 3 columns:

    • row number (unique always filled)
    • C which currently has only a couple of values and most of them are null, and that you're trying to fill out
    • Property Data with the actual names of the companies

    While I would strongly advise to have this filled in the source, if that is not an option, this can be done via MagicETL in multiple ways. Myself, I'd do:

    Get only the rows that have a C value, append a JoinKey of 1 on both the filtered and the full set, do an Inner Join using the key, filter the results only to include those where the original row number is less than or equal to the joined row number, then do a Rank on the joined Row number in descending order, partitioning by the row number, filer the output to only include Rank = 1 and alter the columns, renaming the Joined C to C and dropping the Rank, JoinKey, Joined row number, Joined Property Data and C, finally store this to a new dataset. This will give you the output you're looking for, without having to use Lag an unknown number of times.

    This will work regardless of how many blanks you have and will always fill the blanks with the values of the last known one, the only dangerous thing is if the lowest row number on your dataset is lacking a value for C then all rows until the first appearance of a value for C will be removed. Here's a side by side from the source and the output:

Answers

  • marcel_luthi
    marcel_luthi Coach
    Answer ✓

    Just to be clear, your dataset currently has 3 columns:

    • row number (unique always filled)
    • C which currently has only a couple of values and most of them are null, and that you're trying to fill out
    • Property Data with the actual names of the companies

    While I would strongly advise to have this filled in the source, if that is not an option, this can be done via MagicETL in multiple ways. Myself, I'd do:

    Get only the rows that have a C value, append a JoinKey of 1 on both the filtered and the full set, do an Inner Join using the key, filter the results only to include those where the original row number is less than or equal to the joined row number, then do a Rank on the joined Row number in descending order, partitioning by the row number, filer the output to only include Rank = 1 and alter the columns, renaming the Joined C to C and dropping the Rank, JoinKey, Joined row number, Joined Property Data and C, finally store this to a new dataset. This will give you the output you're looking for, without having to use Lag an unknown number of times.

    This will work regardless of how many blanks you have and will always fill the blanks with the values of the last known one, the only dangerous thing is if the lowest row number on your dataset is lacking a value for C then all rows until the first appearance of a value for C will be removed. Here's a side by side from the source and the output:

  • mhouk
    mhouk Member

    Awesome thanks for the input! This should work. I wish i could have filled in data but its not a great source.