Attendance Occurrence Count

We've been able to accomplish a good portion of the original request within dataflow w/transforms rather than a beast mode, and we'd like to complete this request but need assistance on the last piece of it as the information available online hasn't helped as we'd hoped.

 

We were told that this would fall under Gaps & Islands; looking to group a series of absences into a single occurrence when they happen on consecutive days, and count those multi-day occurrences (as a single occurrence) as well as all other single day absences for a total, per employee? 

EXAMPLE: Bill was absent on Monday, Tuesday, & Wednesday (single occurrence) and then the following week was absent on Monday (single occurrence), then again later in the week on Thursday & Friday (single occurrence).

 

In total, Bill was absent 6 days but we would like to consider this as three (3) occurrences where they're related and on consecutive days; sample of the data we'll be using is below. Any thoughts? Please let us know if any other information would be needed to create this transform. Thanks!

John

 

Screen Shot 2018-12-11 at 11.59.05 AM.png

 

 

Best Answer

  • Unknown
    Answer ✓

    Hi, @John-Peddle,

     

    You should be able to do this in a MySQL dataflow. Here's what I would recommend:

    • Get or create a calendar of dates - you can access the Domo default calendar using the Domo Dimensions connector
    • In MySQL dataflow, explode each person name across each date, so if you had 365 days and three people, you would have 3*365 records. Designate if the person was absent on the date. Let's call this the date_person table
    • Join the date_person table to itself, something like this:

     

    from date_person as d
    left join date_person as d_minus_1
    on d_minus_1.person = d.person
    and dateadd(d_minus_1.date,interval 1 day) = d.date
    • Let's call this the date_compare table
    • Now, you'll need to apply the logic: if the person was absent today but not yesterday, then increment the instance count + 1; if the person was absent today and yesterday, to do NOT increment the instance count. You can apply a similar method as is laid out here: http://knowledge.domo.com?cid=rank 

     

    I hope that gets you going in the right direction. 

     

    Best, 

    Dan

Answers

  • Unknown
    Answer ✓

    Hi, @John-Peddle,

     

    You should be able to do this in a MySQL dataflow. Here's what I would recommend:

    • Get or create a calendar of dates - you can access the Domo default calendar using the Domo Dimensions connector
    • In MySQL dataflow, explode each person name across each date, so if you had 365 days and three people, you would have 3*365 records. Designate if the person was absent on the date. Let's call this the date_person table
    • Join the date_person table to itself, something like this:

     

    from date_person as d
    left join date_person as d_minus_1
    on d_minus_1.person = d.person
    and dateadd(d_minus_1.date,interval 1 day) = d.date
    • Let's call this the date_compare table
    • Now, you'll need to apply the logic: if the person was absent today but not yesterday, then increment the instance count + 1; if the person was absent today and yesterday, to do NOT increment the instance count. You can apply a similar method as is laid out here: http://knowledge.domo.com?cid=rank 

     

    I hope that gets you going in the right direction. 

     

    Best, 

    Dan

  • Dan,

    Thanks for the response but unfortunately what you've suggested is way beyond my comprehension and skillset; as much as I've tried I'm just not able to follow along and achieve what you're advising can be done with the rank function.

     

    Additionally, can you explain why we need to use a separate calendar if both dates and absences already exist in the dataset we're using? Plus, I have no idea what you mean by explode employee names across each date? Sorry!

     

    John

  • Hi, @John-Peddle,


    This is a fairly complex bit of logic you're trying to implement. Let me take another swing at explaining better. You can always reach out to your CSM or AE for more in-depth direction than I'll be able to give here. Please DM me if you need help contacting your CSM or AE.

     

    Let me address your questions in reverse order:

    1) "Explode the names across dates": What I mean here is join the employee table with the dates table. Depending on the format of your absentee dataset, this may or may not be necessary. What you want is a table something like this, where there are no gaps in the dates (let's call this the date_employee_status table):

     

    Date       |EmployeeID |EmployeeStatus
    -----------|-----------|------------------
    2018-10-12 |1 |Present
    2018-12-11 |1 |Absent
    2018-12-10 |1 |Present
    2018-12-12 |2 |Absent
    2018-12-11 |2 |Absent
    2018-12-10 |2 |Present
    2018-12-12 |3 |Absent
    2018-12-11 |3 |Present
    2018-12-10 |3 |Absent


    2) Need for a separate calendar: Again, you may or may not need this depending on the format of your absentee dataset. What you're driving toward is a dataset like I'm showing above

     

     

    Here is where it becomes a bit more difficult. What you want to do is look at each employee on each day and determine if the employee:

    • Was present
    • Was absent
    • Was absent the day before

    So, you need to know the status of each employee on each day, but you also need to know the status of each employee on the day before.

     

    There are probably many ways of doing this. From a coding perspective, the MySQL method I cited yesterday is more complex, but from the logical perspective, it is more straightforward. Another option is to us a Magic ETL dataflow. That method requires no code but is more complex from a logical perspective. Whatever method you use, you need to be able to look at each employee on each day AND the day before. That's the key.

     

    You could do this using Magic ETL using a Rank & Window tile, using a Row Number function, ordering by Date, and partitioning by EmployeeID. The result will be a new column where each date for each employee is numbered. Something like this:

     

    Date       |EmployeeID |EmployeeStatus |EmployeeDateNumber
    -----------|-----------|---------------|--------------------
    2018-10-12 |1 |Present |3
    2018-12-11 |1 |Absent |2
    2018-12-10 |1 |Present |1
    2018-12-12 |2 |Absent |3
    2018-12-11 |2 |Absent |2
    2018-12-10 |2 |Present |1
    2018-12-12 |3 |Absent |3
    2018-12-11 |3 |Present |2
    2018-12-10 |3 |Absent |1

     

    Then, using a Calculator tile, create a new column as EmployeeDateNumber + 1 and call this new column NextEmployeeDateNumber.

     

    Finally, you can join that dataset to itself where EmployeeID = EmployeeID and EmployeeDateNumber = NextEmployeeDateNumber

     

    Having performed that join, you'll have a single dataset where you have Date, EmployeeID, EmployeeStatus on that date, and EmpoyeeStatus from the day before. Then, just apply logic to determine if an employee was absent on a date and if that absence in new (first day) or continuing.

     

    Hope that is a little more helpful. 

     

    Best,
    Dan

  • Hi, @John-Peddle,


    This is a fairly complex bit of logic you're trying to implement. Let me take another swing at explaining better. You can always reach out to your CSM or AE for more in-depth direction than I'll be able to give here. Please DM me if you need help contacting your CSM or AE.

     

    Let me address your questions in reverse order:

    1) "Explode the names across dates": What I mean here is join the employee table with the dates table. Depending on the format of your absentee dataset, this may or may not be necessary. What you want is a table something like this, where there are no gaps in the dates (let's call this the date_employee_status table):

     

    Date |EmployeeID |EmployeeStatus
    -----------|-----------|------------------
    2018-10-12 |1 |Present
    2018-12-11 |1 |Absent
    2018-12-10 |1 |Present
    2018-12-12 |2 |Absent
    2018-12-11 |2 |Absent
    2018-12-10 |2 |Present
    2018-12-12 |3 |Absent
    2018-12-11 |3 |Present
    2018-12-10 |3 |Absent


    2) Need for a separate calendar: Again, you may or may not need this depending on the format of your absentee dataset. What you're driving toward is a dataset like I'm showing above

     

     

    Here is where it becomes a bit more difficult. What you want to do is look at each employee on each day and determine if the employee:

    • Was present
    • Was absent
    • Was absent the day before

    So, you need to know the status of each employee on each day, but you also need to know the status of each employee on the day before.

     

    There are probably many ways of doing this. From a coding perspective, the MySQL method I cited yesterday is more complex, but from the logical perspective, it is more straightforward. Another option is to us a Magic ETL dataflow. That method requires no code but is more complex from a logical perspective. Whatever method you use, you need to be able to look at each employee on each day AND the day before. That's the key.

     

    You could do this using Magic ETL using a Rank & Window tile, using a Row Number function, ordering by Date, and partitioning by EmployeeID. The result will be a new column where each date for each employee is numbered. Something like this:

     

    Date       |EmployeeID |EmployeeStatus |EmployeeDateNumber
    -----------|-----------|---------------|--------------------
    2018-10-12 |1 |Present |3
    2018-12-11 |1 |Absent |2
    2018-12-10 |1 |Present |1
    2018-12-12 |2 |Absent |3
    2018-12-11 |2 |Absent |2
    2018-12-10 |2 |Present |1
    2018-12-12 |3 |Absent |3
    2018-12-11 |3 |Present |2
    2018-12-10 |3 |Absent |1

     

    Then, using a Calculator tile, create a new column as EmployeeDateNumber + 1 and call this new column NextEmployeeDateNumber.

     

    Finally, you can join that dataset to itself where EmployeeID = EmployeeID and EmployeeDateNumber = NextEmployeeDateNumber

     

    Having performed that join, you'll have a single dataset where you have Date, EmployeeID, EmployeeStatus on that date, and EmpoyeeStatus from the day before. Then, just apply logic to determine if an employee was absent on a date and if that absence in new (first day) or continuing.

     

    Hope that is a little more helpful. 

     

    Best,
    Dan

  • @DanB - thanks for the clarification, it really helped to better understand the logic behind your initial post. We're currently working on this and appreciate you taking the time to offer a suggestion and then provide the detailed insight for it. Have a great day!

     

    John