What's the best-practice way to remove data from a partitioned dataset?
We have an ETL that writes to a partitioned output dataset. Now that we are far enough into 2024, I'm being told that we can reduce the size of this dataset by removing all 2022 data from it. What would be the best way to go about doing this?
Best Answer
@Julianna_Potter there is a configuration option in the output dataset that lets you specify which partitions you want to keep.
Exercise caution: If a partition filter expression is specified, all partitions are evaluated against it and any that do not pass are deleted.
If you mean the input side, you can set that up in the configuration as well.
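To make the caution above concrete, here is a small plain-Python model of how a keep-filter behaves. This is a hypothetical sketch, not Domo's filter-expression syntax or API: it assumes partition keys are ISO date strings and treats the filter as a predicate on the key. Every partition is checked, and anything that fails the predicate lands in the "deleted" list, which is why the expression has to describe what you want to *keep*.

```python
from datetime import date

# Hypothetical partitions: key is an ISO date string, value is a row count.
partitions = {
    "2022-06-30": 1_000,
    "2022-12-31": 1_200,
    "2023-01-01": 900,
    "2024-03-15": 1_100,
}

def keep_partition(key: str) -> bool:
    """Hypothetical keep-filter: retain partitions dated 2023-01-01 or later."""
    return date.fromisoformat(key) >= date(2023, 1, 1)

def apply_partition_filter(parts, predicate):
    """Evaluate every partition against the predicate.

    Returns (kept, deleted) without modifying `parts`, so it can be used
    as a dry run to review what a filter would remove before applying it.
    """
    kept = {k: v for k, v in parts.items() if predicate(k)}
    deleted = sorted(k for k in parts if not predicate(k))
    return kept, deleted

kept, deleted = apply_partition_filter(partitions, keep_partition)
print("would delete:", deleted)
print("rows kept:", sum(kept.values()))
```

Because the function only reports what would be dropped, it mirrors the "test it somewhere safe first" approach: review the deleted list, and only then apply the real filter to the production dataset.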
If this answered your question, please 'like' and 'accept' my answer 😁
David Cunningham
** Was this post helpful? Click Agree 😀, Like 👍️, or Awesome ❤️ below **
** Did this solve your problem? Accept it as a solution! ✔️**
Answers
@Julianna_Potter - Do an export of the dataset before you take action if it's not too big to do so!
DataMaven
Breaking Down Silos - Building Bridges
**Say "Thanks" by clicking a reaction in the post that helped you.**
**Please mark the post that solves your problem by clicking on "Accept as Solution".**
@DataMaven thanks, but it's definitely too large for that. It's in the billions of rows, which is why I want to remove the 2022 data from it.
@david_cunningham thanks for your response. I actually knew about that feature and totally spaced last week when I was trying to remember the best way to go about this. 🤦♀️
@Julianna_Potter - I figured that may be the case! Seeing who it was, I was pretty sure it had to be something like that, but I didn't want to assume.
@DataMaven and @david_cunningham thank you both! I used the configuration setting to filter 2022 data out (after testing in another test partition) and it worked perfectly.