What's the best practice way to remove data from a partitioned dataset?
We have an ETL set up to write to a partitioned output. Now that we are far enough into 2024, I'm being told that we can reduce the size of this dataset by removing all 2022 data from it. What would be the best way to go about doing this?
Best Answer
-
@Julianna_Potter there is a configuration option in the output dataset that lets you specify which partitions you want to keep.
Exercise caution: If a partition filter expression is specified, all partitions are evaluated against it and any that do not pass are deleted.
If you're talking about the input side, you can set that up in the configuration as well.
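Since anything that fails the filter gets deleted, it can help to dry-run the logic first. Here is a rough sketch in plain Python of how a keep-filter splits partitions into kept and deleted. This is not Domo's filter expression syntax; the monthly partition keys, the 2023-01-01 cutoff, and the function names are all assumptions made up for illustration.

```python
from datetime import date


def partitions_to_delete(partition_keys, keep):
    """Return the keys that fail the keep-predicate (the ones a retention filter would drop)."""
    return [key for key in partition_keys if not keep(key)]


def keep_2023_onward(key):
    """Hypothetical filter: keep partitions dated 2023-01-01 or later."""
    return date.fromisoformat(key) >= date(2023, 1, 1)


# Hypothetical partition keys: one per month for 2022 through 2024.
partition_keys = [
    date(year, month, 1).isoformat()
    for year in (2022, 2023, 2024)
    for month in range(1, 13)
]

doomed = partitions_to_delete(partition_keys, keep_2023_onward)
kept = [key for key in partition_keys if keep_2023_onward(key)]

print(f"{len(doomed)} partitions would be deleted, {len(kept)} kept")
```

Listing the keys that would be deleted before enabling the real filter is a cheap way to catch an expression that is inverted or has the wrong cutoff.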
If this answered your question, please 'like' and 'accept' my answer 😁
David Cunningham
Answers
-
@Julianna_Potter - Do an export of the dataset before you take action if it's not too big to do so!
DataMaven
Breaking Down Silos - Building Bridges
-
@DataMaven thanks, but definitely too large for that. It's in the billions which is why I want to remove 2022 data from it.
-
@david_cunningham thanks for your response. I actually knew about that feature and totally spaced last week when I was trying to remember the best way to go about this. 🤦‍♀️
-
@Julianna_Potter - I figured that may be the case! Seeing who it was, I was pretty sure it had to be something like that, but I didn't want to assume.
DataMaven
Breaking Down Silos - Building Bridges
-
@DataMaven and @david_cunningham thank you both! I used the configuration setting to filter out the 2022 data (after testing it on a separate test partition), and it worked perfectly.