How to deal with attribution windows on large Digital Media data sets
One of the challenges of workigng with digital media data is "attribution windows". This is the practice by publishers of restating their numbers within a given window of time, e.g., DCM is 14 days, Youtube 35 and Facebook 28.
If you use the connector to append the last day's data to the existing data set (history) your data will likely be correct. The solution is to get the last N days of data (where N is the window length) and replace this in the existing data set. This can be done in Magic but the code is complex, due to the large sizes of the data sets, may be prohibitively long. For example, at one client after just 9 months the DCM data set was 30m rows and processing the attribution window with Magic took 10 hours.
Fortunately there is a simple SQL solution. In Redshift (or MySQL if you data sets are smaller), input both the history and the window data set then use this SQL
SELECT * FROM history
WHERE Date < (SELECT MIN(Date) FROM window LIMIT 1)
UNION ALL
SELECT * FROM window
then write this out to the history data set.
We found for a 30m row history data set this takes about 90 min in Redshift, with most of that time reading and writing data.
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 600 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 685 Beast Mode
- 43 App Studio
- 38 Variables
- 655 Automate
- 170 Apps
- 438 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 384 Distribute
- 110 Domo Everywhere
- 268 Scheduled Reports
- 6 Software Integrations
- 111 Manage
- 108 Governance & Security
- 8 Domo University
- 25 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive