How do Dataflows impact Dataset Certification?

Jack_Kaye Member
edited May 2022 in Magic ETL

I'm exploring the option of implementing Card/Dataset Certification for our company. I need a better understanding of how a Dataflow (DF) can impact the certification of a Dataset (DS).

My basic understanding of DS Certification is that the certification on a DS will expire if there is a schema change (e.g. columns deleted or renamed), a configuration change (does this include datatype or column width changes?), or an account change (I'm NOT clear on what this is).

How does a schema change impact DSs further down in the hierarchy? Say DF1 runs and creates a new column in its output, DS1. DS1 is used as input to DF2, which outputs to DS2. When DF1 runs and modifies the schema of DS1, its certification should expire. If DF2 then runs and now outputs the new column to DS2, will that expire the certification for DS2?

If something is changed in the DF that produces the output DS (e.g. adding additional filter logic) but there is no schema change to the DS, will that expire the certification on the DS? In other words, can the DF cause a DS certification to expire if there is NO schema change but the logic of how the overall DS is created has changed?

Additionally, if a DS certification is expired, does that automatically expire any card certifications that are based on that DS?

Regards,

Jack

Comments

  • Certification only applies to cards and datasets, so changing the logic of a dataflow won’t affect anything unless you change the schema of the certified output dataset. No downstream decertification occurs when an upstream dataset is decertified. I’d recommend suggesting downstream decertification in the Ideas Exchange.

  • @Jack_Kaye certification (imho) was not a thoroughly baked feature, and @GrantSmith already pointed out some of the shortcomings.

    You could augment Certification with additional workflow steps. For example, you could enable PDP on certified datasets. The upside is that in order to create a dataflow on a dataset, you must have access to all rows of the dataset. If PDP is enabled, people who cannot see every row would not be able to construct downstream dataflows.

    Arguably you should not create ETLs on certified datasets (that really defeats the point!), so there's a win there.

    If you're comfortable scripting in Python or Node.js, it might be more expedient to build reports based on APIs.

    https://www.youtube.com/watch?v=d2WAKIKpKlE&t=1s

    Jae Wilson

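
To make the rules discussed in this thread a bit more concrete, here is a minimal Python sketch. It does not call any Domo API. The dataset names, the column names and types, the `lineage` mapping, and both helper functions are hypothetical. It assumes that adding, deleting, or renaming a column counts as a schema change, and it leaves the datatype question from the original post open by making that check opt-in. Because no downstream decertification happens automatically, the second helper only builds a list of certified datasets that someone may want to review by hand.

```python
from collections import deque


def schema_changed(old, new, compare_types=False):
    """Return True if two schemas differ.

    Schemas are dicts of column name -> type string. Whether a type
    change alone expires certification is not settled in this thread,
    so that check is opt-in (assumption).
    """
    if set(old) != set(new):
        return True  # a column was added, deleted, or renamed
    if compare_types:
        return any(old[col] != new[col] for col in old)
    return False


def downstream_certified(lineage, certified, start):
    """List certified datasets downstream of `start` for manual review.

    `lineage` (hypothetical) maps a dataset to the datasets built from
    it through a dataflow. Nothing is decertified here; the function
    only walks the lineage breadth-first and collects candidates.
    """
    seen, queue, to_review = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
                if child in certified:
                    to_review.append(child)
    return to_review


# Hypothetical data mirroring the question: DF1 -> DS1 -> DF2 -> DS2
lineage = {"DS1": ["DS2"]}
certified = {"DS1", "DS2"}
old_ds1 = {"order_id": "LONG", "amount": "DOUBLE"}
new_ds1 = {"order_id": "LONG", "amount": "DOUBLE", "region": "STRING"}

if schema_changed(old_ds1, new_ds1):
    print("DS1 schema changed; review:",
          downstream_certified(lineage, certified, "DS1"))
```

A check like this could run after each dataflow execution against a saved copy of the previous schema, so that certified datasets further down the chain get a second look even though their own certification stays in place.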