How do Dataflows impact Dataset Certification?
I'm exploring the option of implementing Card/Dataset Certification for our company. I need a better understanding of how a DataFlow (DF) can impact the certification of a Dataset (DS).
My basic understanding of DS Certification is that the certification on a DS will expire if there is a schema change (e.g., columns deleted or renamed), a configuration change (does this include datatype or column-width changes?), or an Account Change (I'm not clear on what this is).
How does a schema change impact DSs further down in the hierarchy? Say DF1 runs and creates a new column in its output, DS1. DS1 is used as input to DF2, which outputs to DS2. When DF1 runs and modifies the schema of DS1, its certification should expire. If DF2 then runs and now outputs the new column to DS2, will that expire the certification for DS2?
If something is changed in the DF that produces the output DS (e.g., adding additional filter logic) but there is no schema change to the DS, will that expire the certification on the DS? In other words, can the DF cause a DS certification to expire when there is no schema change but the logic of how the overall DS is built has changed?
Additionally, if a DS certification is expired, does that automatically expire any card certifications that are based on that DS?
Regards,
Jack
Comments
Certification only applies to cards and datasets, so changing the logic of a dataflow won't affect anything unless you change the schema of the certified output dataset. There is no downstream decertification when an upstream dataset is decertified. I'd recommend suggesting downstream certification in the Idea Exchange.
@Jack_Kaye certification (imho) was not a thoroughly baked feature, and @GrantSmith already pointed out some of the shortcomings.
You could augment Certification with additional workflow steps. For example, you could enable PDP on certified datasets. The upside there is that in order to create a dataflow on a dataset you must have access to all rows of the dataset, so if PDP is enabled, people would not be able to construct downstream dataflows.
Arguably you should not create ETLs on certified datasets anyway (that really defeats the point!), so there's a win there.
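For what it's worth, that PDP state is auditable: Domo's public DataSet API can list a dataset's PDP policies, so you could periodically flag certified datasets that have no policies restricting access. A minimal sketch in Python, assuming an OAuth client created at developer.domo.com; the client credentials and dataset ID are placeholders:

```python
import requests

API_HOST = "https://api.domo.com"
CLIENT_ID = "your-client-id"          # placeholder OAuth client ID
CLIENT_SECRET = "your-client-secret"  # placeholder OAuth client secret

# Exchange client credentials for a bearer token (Domo's documented OAuth flow).
token = requests.post(
    f"{API_HOST}/oauth/token",
    params={"grant_type": "client_credentials", "scope": "data"},
    auth=(CLIENT_ID, CLIENT_SECRET),
).json()["access_token"]

dataset_id = "00000000-0000-0000-0000-000000000000"  # placeholder: a certified dataset

# List the PDP policies on the dataset so you can see who is actually restricted.
policies = requests.get(
    f"{API_HOST}/v1/datasets/{dataset_id}/policies",
    headers={"Authorization": f"Bearer {token}"},
).json()
for p in policies:
    print(p.get("id"), p.get("type"), p.get("name"))
```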
If you're comfortable scripting in Python or Node.js, it might be more expedient to build reports based on the APIs.
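As a sketch of that route: the public DataSet API can export a dataset as CSV, which you could feed into whatever reporting you're building. Same placeholder credentials and dataset ID as above; the token exchange and the /data export endpoint are from Domo's developer docs:

```python
import csv
import io
import requests

API_HOST = "https://api.domo.com"
CLIENT_ID = "your-client-id"          # placeholder OAuth client ID
CLIENT_SECRET = "your-client-secret"  # placeholder OAuth client secret

# Same client-credentials token exchange as in the earlier sketch.
token = requests.post(
    f"{API_HOST}/oauth/token",
    params={"grant_type": "client_credentials", "scope": "data"},
    auth=(CLIENT_ID, CLIENT_SECRET),
).json()["access_token"]

dataset_id = "00000000-0000-0000-0000-000000000000"  # placeholder dataset to report on

# Export the dataset as CSV (Accept: text/csv against the /data endpoint).
resp = requests.get(
    f"{API_HOST}/v1/datasets/{dataset_id}/data",
    params={"includeHeader": "true"},
    headers={"Authorization": f"Bearer {token}", "Accept": "text/csv"},
)
resp.raise_for_status()

# Parse the rows for downstream report building.
rows = list(csv.DictReader(io.StringIO(resp.text)))
print(f"pulled {len(rows)} rows; columns: {list(rows[0].keys()) if rows else []}")
```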
Jae Wilson
Check out my 🎥 Domo Training YouTube Channel 👨💻
**Say "Thanks" by clicking the ❤️ in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1