How do I use window function date ranges in DOMO ETLs?

Options

The short version, I need the following query to work somewhere in DOMO ETL:

AVG(`Score`) OVER (PARTITION BY Company ORDER BY 'Date` RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND INTERVAL '0' DAY FOLLOWING)

More detail:

I tried using group by but cannot use window functions, formulas but can't use aggregations, and rank and window but is only limited to preceding rows (which would be fine if I fill in the dates for each company using the connector, but I'd like to keep it simple).

Is there a beta function to fully allow ETL window functions, or am I missing something? I can probably accomplish this in a beast mode (haven't tried), but this is only a small piece of many other connected ETLs. This one is no different and feeds into another ETL to produce a total score or transposes for various graphs. I'd rather not have to make multiple beast modes and redo the structure of this particular project just for one piece to work.

I could use Python pandas SQL but I would like to avoid using the scripting tiles if possible

PS - I have no idea why Company happened to code inline correctly compared to Date and Score. Pointing it out as an interesting curiosity in case that piques the interest of any DOMO folks.

Tagged:

Best Answer

  • rco
    rco Contributor
    edited April 17 Answer ✓
    Options

    Regrettably, window functions aren't supported in Magic ETL formulas, and as you've already determined it's not possible to express that window frame using the Rank & Window tile. Including the frame boundaries in the original data, using Beast Mode, or using the Python Script tile are all options.

    If the whole input dataset is reasonably small (say, less than a million rows) you could probably do it with a cross product join of the table against itself:

    EDIT: There's a better way to do this than I had originally described. Rather than a full cross product, cross products within each Company result in less intermediate rows and obviate the need for the company equality check later.

    1. Make sure you have a row key of some kind; that is, one or more columns that can together uniquely identify a row.

    2. Join the data to itself using Company as the join column. Since you can't connect one tile to another "doubly", send one connection to a Select tile first and select just the Score, Company, and Date columns. Give them new names as well so they don't collide; prefix all of them with Other_

    4. Add Formula next, with a column called in_frame using this expression: CASE WHEN DATEDIFF(Date, Other_Date) < 91 THEN 1 ELSE 0 END

    5. Now a Group By, using the "row key" column(s) that we mentioned we'd need in step 1 as the grouping key. You'll need to add an aggregate for every non-grouping column that you want to preserve in the output as well; this can just be "First value". For the window calculation, add the aggregate expression: SUM(CASE WHEN in_frame THEN Score ELSE 0 END) / SUM(in_frame)

    Randall Oveson <randall.oveson@domo.com>

Answers

  • rco
    rco Contributor
    edited April 17 Answer ✓
    Options

    Regrettably, window functions aren't supported in Magic ETL formulas, and as you've already determined it's not possible to express that window frame using the Rank & Window tile. Including the frame boundaries in the original data, using Beast Mode, or using the Python Script tile are all options.

    If the whole input dataset is reasonably small (say, less than a million rows) you could probably do it with a cross product join of the table against itself:

    EDIT: There's a better way to do this than I had originally described. Rather than a full cross product, cross products within each Company result in less intermediate rows and obviate the need for the company equality check later.

    1. Make sure you have a row key of some kind; that is, one or more columns that can together uniquely identify a row.

    2. Join the data to itself using Company as the join column. Since you can't connect one tile to another "doubly", send one connection to a Select tile first and select just the Score, Company, and Date columns. Give them new names as well so they don't collide; prefix all of them with Other_

    4. Add Formula next, with a column called in_frame using this expression: CASE WHEN DATEDIFF(Date, Other_Date) < 91 THEN 1 ELSE 0 END

    5. Now a Group By, using the "row key" column(s) that we mentioned we'd need in step 1 as the grouping key. You'll need to add an aggregate for every non-grouping column that you want to preserve in the output as well; this can just be "First value". For the window calculation, add the aggregate expression: SUM(CASE WHEN in_frame THEN Score ELSE 0 END) / SUM(in_frame)

    Randall Oveson <randall.oveson@domo.com>

  • MattLynn
    Options

    Thanks @rco, I had a feeling it wasn't supported. I like your idea, I'll give it a try. Basically, need to perform all the "window functions" explicitly. It'll blow up the dimensions for calculating but I have a feeling this will still be faster than running a scripting tile.

  • rco
    rco Contributor
    Options

    @MattLynn Be sure to check out the edit to my answer; I was missing a pretty significant optimization/simplification. Let me know if it works out.

    Randall Oveson <randall.oveson@domo.com>