Adding data that matches a date to a date dimension table.

mberkeley
mberkeley Member
edited April 2022 in Magic ETL

I have HR data that has Start Date and End Date (or null) for each employee. I am looking to be able to see the number of active employees at any given date. I have added the Domo Date Dimension table to my data set, but I can't figure out how to join so that for any date I can get the count of employees who's start date is before the date and ended after or is null.

I've been able to get start/end date counts per day added to the dimension table, but can't figure out how to get that 'active' record count added.

Any ideas?

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    You can utilize the Calendar dataset from the Domo Dimensions calendar to do a conditional join on your HR dataset based on the start date and end date. With Magic ETL 2.0 you can't do a straight conditional join but would need to do a cartesian join then do the filtering after the fact. Depending on your HR dataset size this may take a while to process.

    With Magic 2.0:

    • Input Datasets: Calendar and HR
    • For each dataset add a constant "Join Column" with a value of 1
    • Use a Add Formula Tile with "COALESCE(`End Date`, CURRENT_DATE())" to default any NULLs to today for the end date
    • Left join your HR dataset onto the calendar dataset (have calendar on the left side) based on the Join Column
    • Use a filter tile to filter your dataset based on `Start Date` <= `Calendar Date` and `Calendar Date` <= `End Date`
    • Then feed the resulting dataset into a group by and group by the `Calendar Date` and count the number of employees you have employed at the given date.


    Alternatively you could utilize a MySQL dataflow and do something similar except have your join clause be your filter clause (`Calendar Date` BETWEEN `Start Date` AND `End Date`) in your join clause and not worry about doing your cartesian join. This may be faster due to less data being generated.

    SELECT c.`dt` as "Calendar Date", COUNT(*)
    FROM calendar as c
    LEFT JOIN `hr_dataset` as hr on c.`dt` BETWEEN hr.`Start Date` AND COALESCE(hr.`End Date`, CURRENT_DATE())
    


    Note: This is all untested pseudo code and you may need to tweak based on your table and column names.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    You can utilize the Calendar dataset from the Domo Dimensions calendar to do a conditional join on your HR dataset based on the start date and end date. With Magic ETL 2.0 you can't do a straight conditional join but would need to do a cartesian join then do the filtering after the fact. Depending on your HR dataset size this may take a while to process.

    With Magic 2.0:

    • Input Datasets: Calendar and HR
    • For each dataset add a constant "Join Column" with a value of 1
    • Use a Add Formula Tile with "COALESCE(`End Date`, CURRENT_DATE())" to default any NULLs to today for the end date
    • Left join your HR dataset onto the calendar dataset (have calendar on the left side) based on the Join Column
    • Use a filter tile to filter your dataset based on `Start Date` <= `Calendar Date` and `Calendar Date` <= `End Date`
    • Then feed the resulting dataset into a group by and group by the `Calendar Date` and count the number of employees you have employed at the given date.


    Alternatively you could utilize a MySQL dataflow and do something similar except have your join clause be your filter clause (`Calendar Date` BETWEEN `Start Date` AND `End Date`) in your join clause and not worry about doing your cartesian join. This may be faster due to less data being generated.

    SELECT c.`dt` as "Calendar Date", COUNT(*)
    FROM calendar as c
    LEFT JOIN `hr_dataset` as hr on c.`dt` BETWEEN hr.`Start Date` AND COALESCE(hr.`End Date`, CURRENT_DATE())
    


    Note: This is all untested pseudo code and you may need to tweak based on your table and column names.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • I like the MySQL version more, but have never written SQL ETL, so I worked with the Magic ETL version. It does exactly what I needed. Thank you!

  • @mberkeley If you could accept my answer to your question so others can find it easier in the future I'd appreciate it. Thanks.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • GrantSmith if you had to do this by month, then how would you do it?

  • I'd utilize the LAST_DAY function on the dates to return the last day of the month and then filter out the date dimension table to only have those specific records. Then you can use LAST_DAY for your dates as well when you're comparing against the dates.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**