How to deal with attribution windows on large Digital Media data sets
One of the challenges of working with digital media data is "attribution windows": the practice by publishers of restating their numbers within a given window of time, e.g., 14 days for DCM, 35 for YouTube, and 28 for Facebook.
If you use the connector simply to append the last day's data to the existing data set (the history), your data will likely be incorrect. The solution is to pull the last N days of data (where N is the window length) and replace that range in the existing data set. This can be done in Magic, but the flow is complex and, given the large sizes of the data sets, run times may be prohibitively long. For example, at one client the DCM data set had grown to 30m rows after just 9 months, and processing the attribution window with Magic took 10 hours.
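The replace step boils down to: find the earliest date in the freshly pulled window, keep only history rows older than that, and append the window rows. A minimal sketch in Python, with made-up dates and a hypothetical (date, impressions) row shape:

```python
from datetime import date

# Hypothetical example rows: (date, impressions). The window set holds
# the restated figures pulled for the last N days from the publisher.
history = [
    (date(2024, 1, 1), 100),
    (date(2024, 1, 10), 120),
    (date(2024, 1, 14), 130),
]
window = [
    (date(2024, 1, 10), 125),  # restated value replaces the stale 120
    (date(2024, 1, 14), 140),
    (date(2024, 1, 15), 90),   # newly reported day
]

# Keep only history rows older than the earliest window date,
# then append the restated window rows.
cutoff = min(d for d, _ in window)
merged = [row for row in history if row[0] < cutoff] + window
```

The same logic is what the SQL below expresses, just pushed into the database so the 30m-row history never has to pass through a transform tool row by row.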
Fortunately there is a simple SQL solution. In Redshift (or MySQL, if your data sets are smaller), input both the history and the window data sets, then use this SQL:
SELECT * FROM history
WHERE Date < (SELECT MIN(Date) FROM window)
UNION ALL
SELECT * FROM window
then write this out to the history data set.
We found that for a 30m-row history data set this takes about 90 minutes in Redshift, with most of that time spent reading and writing the data.
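To make the query concrete, here is the same UNION ALL pattern run against SQLite stand-ins for the two Redshift tables. Table and column names are illustrative (the window table is named `window_data` here, since `WINDOW` is a reserved word in some engines):

```python
import sqlite3

# In-memory stand-ins for the Redshift "history" and "window" tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE history (dt TEXT, clicks INT);
    CREATE TABLE window_data (dt TEXT, clicks INT);
    INSERT INTO history VALUES
        ('2024-01-01', 10), ('2024-01-10', 12), ('2024-01-14', 13);
    INSERT INTO window_data VALUES
        ('2024-01-10', 15), ('2024-01-14', 16), ('2024-01-15', 9);
""")

# Drop the overlapping tail of history, then append the restated rows.
rows = con.execute("""
    SELECT * FROM history
    WHERE dt < (SELECT MIN(dt) FROM window_data)
    UNION ALL
    SELECT * FROM window_data
""").fetchall()
```

The result contains each date exactly once, with the overlapping dates carrying the restated window values rather than the stale history values.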