I know this has been asked before, in different ways and in different contexts, but I am hoping to create a general (i.e., object-oriented) framework that standardizes common ETL tasks across multiple datasets.
I know I can create a Magic ETL dataflow that applies a common transform to multiple inputs and produces multiple outputs, but that tends to get quite messy at scale: the data lineage becomes very confusing, and I still have to set up each individual transform by hand. What I really want is the ability to call a general User Defined Function (UDF) from within a transform.
Using Redshift or MySQL, I can almost do that now: I can create a UDF as one step of my transform and call it in later steps, which is cool. What I really need, though, is a UDF stored centrally and independently, so that I can call it from any transform on any dataset. Any ideas on how to implement that?
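To make the current approach concrete, here is roughly what the per-dataflow version looks like today. This is a minimal MySQL sketch; the function name, columns, and input_dataset table are made up for illustration:

```sql
-- Step 1 of the dataflow: define the UDF. A single-statement body
-- keeps this to one statement per transform step (no DELIMITER juggling).
CREATE FUNCTION trim_upper(val VARCHAR(255))
RETURNS VARCHAR(255) DETERMINISTIC
RETURN UPPER(TRIM(val));

-- Step 2 (a later transform step): call the UDF like any built-in function.
SELECT
    customer_id,
    trim_upper(customer_name) AS customer_name_clean
FROM input_dataset;
```

The catch is that the CREATE FUNCTION step lives inside this one dataflow, so it has to be copy-pasted into every other dataflow that needs it, which is exactly the duplication I'm trying to eliminate.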