# Median Date

Member
edited April 30

Hello,

I have 2 columns- One has different markets like A, B and C. Each market has some dates.

The scenario is if I pick A- there is a range of dates with a minimum date, maximum date. I want to create a new date column which will populate median of the above date range for A in it. Obviously, the median will be one date populated against all rows for A, similarly for B and C.
Also, dates can be repeating - For example: 6 December 2016 can appear 4 times, 25 December 2017 can appear twice. These are not duplicates but they are different by lets say a size of that product.

I am seeing a median date in the Data section, but I am not understanding how it can be created in the ETL as, it won't allow to get a Median function in Group By for dates.

• Contributor

This is a fun one! Here's how I might find the median date in ETL:

I'd use a row number function, partitioning by market and sorting by date, to get the data in order and count the data row-by-row.

I'd also have a separate group by, by market, where I'm getting a total count of records and dividing by two.

Next, I'd join that total/2 number to the row number tile on the market field. Wherever the row number equals the total/2 number, that should be the median. I'd use a filter to isolate that date, and then join it back to the main data in the ETL, again on the market field, so that every record would have the median date populated.

Hope there's an easier way to do it, but here's one way at least!

• Coach

Another potential alternative is to convert your dates into unix timestamp (TO_TIMESTAMP) which would be a number, this would then allow you to run a median on the number and then just convert the median back to a date (FROM_UNIXTIME)

**Did this solve your problem? Accept it as a solution!**

• Contributor

This is a fun one! Here's how I might find the median date in ETL:

I'd use a row number function, partitioning by market and sorting by date, to get the data in order and count the data row-by-row.

I'd also have a separate group by, by market, where I'm getting a total count of records and dividing by two.

Next, I'd join that total/2 number to the row number tile on the market field. Wherever the row number equals the total/2 number, that should be the median. I'd use a filter to isolate that date, and then join it back to the main data in the ETL, again on the market field, so that every record would have the median date populated.

Hope there's an easier way to do it, but here's one way at least!

• Member

Hi Sean,

Thanks much for replying. I will give this a try 🙂

• Contributor

No problem, and good luck!

I haven't had to find a median date previously, so I think there are some details I didn't think of, like the total being an odd number, and therefore the total/2 not being whole. Not sure what the best way to handle that would be - my first thought was rounding, but I'm not sure if that would get the right result every time.

• Member

Noted! Thanks!

• Coach