Datediff in Redshift vs Magic ETL

Robaba04
Robaba04 Member
edited May 7 in Magic ETL

Hi there!

Im trying to recreate a calculation from Redshift to Magic ETL. Im running into some trouble with a specific formula.

In Redshift: DATEDIFF('week',"start_date__c","end_date__c")

In Magic: v1 - DATEDIFF(`end_date__c`,`start_date__c`)/7

In Magic: v2 - WEEKOFYEAR(`end_date__c`) - WEEKOFYEAR(`start_date__c`)

Showing a picture of my results and the comparison that is happening with some sample data. It looks like the Redshift function is behaving similar to the weekofyear formula but is able to calculate across many years, which isnt compatible with my v2 formula.

We can see that the "weekofyear" formula on the right hand side is the most similar, but it doesnt match when calculating across multiple years.

Hoping someone has solved for this before!

Thank you in advance!

Best Answer

  • david_cunningham
    Answer ✓

    @Robaba04 - my recommended approach is to not try to replicate the output exactly. For example, in your original photo, row1 is 1.42 and your desired result is 2, but row2 is 13.42 and your desired result is 13. So it seems like Redshift is applying a varying rounding methodology. I'm not super familiar with the backend logic, so won't comment on that.

    However, the approach I typically take when doing date/time differences is to make use of either EPOCH or UNIX. In this case, if you use EPOCH you will get the number of milliseconds between your start and end dates. This removes the complexity of trying to account for partial years, multiple years, start/end weeks across multiple years, etc.

    Note, since your data is dates, without milliseconds in the time, you can use either UNIX or EPOCH and get the same result. For the sake of the example I'll use EPOCH.

    EPOCH_MILLIS(end)-EPOCH_MILLIS(start)
    

    You can then divide to get the number of weeks between those 2 datetimes with fairly accurate precision.

    Finally, you can round with whatever approach suits you best (FLOOR, CEIL, ROUND). In this case I'll use round to round up and down.

    David Cunningham

    ** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
    ** Did this solve your problem? Accept it as a solution! ✔️**

Answers

  • @Robaba04 Your "week" calculation is the closest to the logic used in Redshift, but you will need to put a FLOOR function around it to round down to the nearest integer:

    FLOOR(DATEDIFF(`end_date__c`,`start_date__c`)/7)
    

  • david_cunningham
    edited May 7

    @Robaba04 - the answer provided by @MichelleH is great. When you select an answer, I would ask that you please choose her post. If it was me, I would take a similar approach as what she suggested, probably using UNIX_TIMESTAMP() instead of datediff and then dividing to get back to my desired date grain.

    Just wanted to point out that I did notice that using FLOOR is going to cause you to have a different result for row1 and row5 since you will be rounding down. If you're wanting to match the previous redshift output exactly, an option would be… Note, this will only work if all of your dates are within 1 year of each other.

    David Cunningham

    ** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
    ** Did this solve your problem? Accept it as a solution! ✔️**

  • Robaba04
    Robaba04 Member

    Hi all!

    Thank you for the quick responses! I believe we are getting closer. The floor method did disagree with row1 and row5.

    The case statement hit the mark for a one year gap. But to throw another wrench in the pile… if the time difference is greater than a year, it doesnt look like the case statement worked as intended.

  • david_cunningham
    Answer ✓

    @Robaba04 - my recommended approach is to not try to replicate the output exactly. For example, in your original photo, row1 is 1.42 and your desired result is 2, but row2 is 13.42 and your desired result is 13. So it seems like Redshift is applying a varying rounding methodology. I'm not super familiar with the backend logic, so won't comment on that.

    However, the approach I typically take when doing date/time differences is to make use of either EPOCH or UNIX. In this case, if you use EPOCH you will get the number of milliseconds between your start and end dates. This removes the complexity of trying to account for partial years, multiple years, start/end weeks across multiple years, etc.

    Note, since your data is dates, without milliseconds in the time, you can use either UNIX or EPOCH and get the same result. For the sake of the example I'll use EPOCH.

    EPOCH_MILLIS(end)-EPOCH_MILLIS(start)
    

    You can then divide to get the number of weeks between those 2 datetimes with fairly accurate precision.

    Finally, you can round with whatever approach suits you best (FLOOR, CEIL, ROUND). In this case I'll use round to round up and down.

    David Cunningham

    ** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
    ** Did this solve your problem? Accept it as a solution! ✔️**