How to deal with attribution windows on large Digital Media data sets

One of the challenges of workigng with digital media data is "attribution windows". This is the practice by publishers of restating their numbers within a given window of time, e.g., DCM is 14 days, Youtube 35 and Facebook  28.


If you use the connector to append the last day's data to the existing data set (history) your data will likely be correct.   The solution is to get the last N days of data (where N is the window length) and replace this in the existing data set.   This can be done in Magic but the code is complex, due to the large sizes of the data sets, may be  prohibitively long. For example, at one client after just 9 months the DCM data set was 30m rows and processing the attribution window with Magic took 10 hours.


Fortunately there is a simple SQL solution. In Redshift (or MySQL if you data sets are smaller), input both the history and the window data set then use this SQL

   SELECT * FROM history
   WHERE Date < (SELECT MIN(Date) FROM window LIMIT 1)
   SELECT * FROM window


then write this out to the history data set.


We found for a  30m row history data set this takes about 90 min in Redshift, with most of that time reading and writing data. 

This discussion has been closed.