Get unique records from datasets
Hi,
I have two reports for last and current month which has size, folder name, date as fields. There are duplicates between last month's report and current month's report. When I combine both the reports, the output dataset should not double up - meaning if there are same folder name from previous month report, it should just show if there was a difference between last report and this report. For example, the folder called "Test1" did not change in size so no updates are needed for that folder. Can you please suggest me on this?
Thanks in advance
Best Answer
-
you're using confusing language sorry, it's not clear to me if you're asking for an UPSERT scheme which keeps only the most recent version of each row or if you're looking for a partition scheme where you only keep the block of data from the most recent month. OR if you're looking for a way to report if the value changed from one month to the next (slowly changing dimension).
I'm sprinkling in what may sound like jargon b/c they are 'terms-of-art' with 20 years of history each.
If you need UPSERT or Partitioning, you can look at a recursive dataflow pattern. https://www.youtube.com/watch?v=JNQFZCj8JcQ
The difference between UPSERT or Partitioning would just be how you define your RANK partition, do you make it at the granularity of one row, or month or folder_month.
if you need slowly changing dimensions, you still implement the RANK to get the row version number, but then you add a LAG function to get the value last period.
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"3
Answers
-
you're using confusing language sorry, it's not clear to me if you're asking for an UPSERT scheme which keeps only the most recent version of each row or if you're looking for a partition scheme where you only keep the block of data from the most recent month. OR if you're looking for a way to report if the value changed from one month to the next (slowly changing dimension).
I'm sprinkling in what may sound like jargon b/c they are 'terms-of-art' with 20 years of history each.
If you need UPSERT or Partitioning, you can look at a recursive dataflow pattern. https://www.youtube.com/watch?v=JNQFZCj8JcQ
The difference between UPSERT or Partitioning would just be how you define your RANK partition, do you make it at the granularity of one row, or month or folder_month.
if you need slowly changing dimensions, you still implement the RANK to get the row version number, but then you add a LAG function to get the value last period.
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"3
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 8 Federated
- 2.9K Transform
- 100 SQL DataFlows
- 616 Datasets
- 2.2K Magic ETL
- 3.8K Visualize
- 2.5K Charting
- 738 Beast Mode
- 56 App Studio
- 40 Variables
- 684 Automate
- 176 Apps
- 452 APIs & Domo Developer
- 46 Workflows
- 10 DomoAI
- 35 Predict
- 14 Jupyter Workspaces
- 21 R & Python Tiles
- 394 Distribute
- 113 Domo Everywhere
- 275 Scheduled Reports
- 6 Software Integrations
- 123 Manage
- 120 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 10 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 108 Community Announcements
- 4.8K Archive