I'm working on an analysis of our historic timesheet data and trying to calculate our employee utilization (actual hours per day/8), so that we have a historic benchmark and can compare to projections for future utilization.
The challenge I'm running into is actually cleaning the data. It's generally well-structured - in theory, there should be a single row per person per project per day (some individuals may charge time to multiple projects). However, there are instances where adjustments are made - ex: someone charged 1 week of time to the wrong project (project A). So then accounting adds an entry for negative hours to project A, and then positive hours to Project B in the next pay period. Example:
However, the adjustments are not always consistent in how and when they are done (and there's no code to distinguish between an adjustment entry and a regular entry).
The projected utilization is calculated on a per day basis, but trying to do that with our historic data means there are some weird spikes because of the adjustments (one case was someone charged 38 hours on a day one week, and then there is an entry for -38 on a day in the next week).
Does anyone have any ideas on how to approach this? I know that I can manually clean this but I'm trying to avoid that so that when the timesheet data is processed weekly, there isn't manual maintenance to keep a card accurate.