Get unique records from datasets
Hi,
I have two reports for last and current month which has size, folder name, date as fields. There are duplicates between last month's report and current month's report. When I combine both the reports, the output dataset should not double up - meaning if there are same folder name from previous month report, it should just show if there was a difference between last report and this report. For example, the folder called "Test1" did not change in size so no updates are needed for that folder. Can you please suggest me on this?
Thanks in advance
Best Answer
-
you're using confusing language sorry, it's not clear to me if you're asking for an UPSERT scheme which keeps only the most recent version of each row or if you're looking for a partition scheme where you only keep the block of data from the most recent month. OR if you're looking for a way to report if the value changed from one month to the next (slowly changing dimension).
I'm sprinkling in what may sound like jargon b/c they are 'terms-of-art' with 20 years of history each.
If you need UPSERT or Partitioning, you can look at a recursive dataflow pattern. https://www.youtube.com/watch?v=JNQFZCj8JcQ
The difference between UPSERT or Partitioning would just be how you define your RANK partition, do you make it at the granularity of one row, or month or folder_month.
if you need slowly changing dimensions, you still implement the RANK to get the row version number, but then you add a LAG function to get the value last period.
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"3
Answers
-
you're using confusing language sorry, it's not clear to me if you're asking for an UPSERT scheme which keeps only the most recent version of each row or if you're looking for a partition scheme where you only keep the block of data from the most recent month. OR if you're looking for a way to report if the value changed from one month to the next (slowly changing dimension).
I'm sprinkling in what may sound like jargon b/c they are 'terms-of-art' with 20 years of history each.
If you need UPSERT or Partitioning, you can look at a recursive dataflow pattern. https://www.youtube.com/watch?v=JNQFZCj8JcQ
The difference between UPSERT or Partitioning would just be how you define your RANK partition, do you make it at the granularity of one row, or month or folder_month.
if you need slowly changing dimensions, you still implement the RANK to get the row version number, but then you add a LAG function to get the value last period.
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"3
Categories
- All Categories
- 1.9K Product Ideas
- 1.9K Ideas Exchange
- 1.6K Connect
- 1.3K Connectors
- 302 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 104 SQL DataFlows
- 637 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 761 Beast Mode
- 65 App Studio
- 42 Variables
- 704 Automate
- 182 Apps
- 458 APIs & Domo Developer
- 53 Workflows
- 11 DomoAI
- 39 Predict
- 16 Jupyter Workspaces
- 23 R & Python Tiles
- 401 Distribute
- 116 Domo Everywhere
- 277 Scheduled Reports
- 8 Software Integrations
- 132 Manage
- 129 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 12 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 111 Community Announcements
- 4.8K Archive