Median Date

Prathmesh24_Diacto
edited April 30 in Magic ETL

Hello,

I have 2 columns- One has different markets like A, B and C. Each market has some dates.

The scenario is if I pick A- there is a range of dates with a minimum date, maximum date. I want to create a new date column which will populate median of the above date range for A in it. Obviously, the median will be one date populated against all rows for A, similarly for B and C.
Also, dates can be repeating - For example: 6 December 2016 can appear 4 times, 25 December 2017 can appear twice. These are not duplicates but they are different by lets say a size of that product.


I am seeing a median date in the Data section, but I am not understanding how it can be created in the ETL as, it won't allow to get a Median function in Group By for dates.

Thanks in advance!!

Best Answers

  • Sean_Tully
    Sean_Tully Contributor
    Answer ✓

    This is a fun one! Here's how I might find the median date in ETL:

    I'd use a row number function, partitioning by market and sorting by date, to get the data in order and count the data row-by-row.

    I'd also have a separate group by, by market, where I'm getting a total count of records and dividing by two.

    Next, I'd join that total/2 number to the row number tile on the market field. Wherever the row number equals the total/2 number, that should be the median. I'd use a filter to isolate that date, and then join it back to the main data in the ETL, again on the market field, so that every record would have the median date populated.

    Hope there's an easier way to do it, but here's one way at least!

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Another potential alternative is to convert your dates into unix timestamp (TO_TIMESTAMP) which would be a number, this would then allow you to run a median on the number and then just convert the median back to a date (FROM_UNIXTIME)

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • Sean_Tully
    Sean_Tully Contributor
    Answer ✓

    This is a fun one! Here's how I might find the median date in ETL:

    I'd use a row number function, partitioning by market and sorting by date, to get the data in order and count the data row-by-row.

    I'd also have a separate group by, by market, where I'm getting a total count of records and dividing by two.

    Next, I'd join that total/2 number to the row number tile on the market field. Wherever the row number equals the total/2 number, that should be the median. I'd use a filter to isolate that date, and then join it back to the main data in the ETL, again on the market field, so that every record would have the median date populated.

    Hope there's an easier way to do it, but here's one way at least!

  • Hi Sean,

    Thanks much for replying. I will give this a try 🙂

  • Sean_Tully
    Sean_Tully Contributor

    No problem, and good luck!

    I haven't had to find a median date previously, so I think there are some details I didn't think of, like the total being an odd number, and therefore the total/2 not being whole. Not sure what the best way to handle that would be - my first thought was rounding, but I'm not sure if that would get the right result every time.

  • Noted! Thanks!

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Another potential alternative is to convert your dates into unix timestamp (TO_TIMESTAMP) which would be a number, this would then allow you to run a median on the number and then just convert the median back to a date (FROM_UNIXTIME)

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • @GrantSmith, this worked. I didn’t even have to deal with duplicate dates and the same results as we see in Data section.

    Supperb!! Thank you both @Sean_Tully and @GrantSmith for jumping in to help!!