Only using data from latest append

The dataset I'm using is appeneded to once a day (but at somewhat irregular times), domo has created the columns _BATCH_LAST_RUN_ and _BATCH_ID_ to show this.

I want to create a card using this dataset, but I only want to use data from the latest batch.

 

What is the best way to do this?

Thanks.

Best Answer

  • n8isjack
    n8isjack Contributor
    Answer ✓

    There are a handful of ways to achieve this, but almost all of them require a dataflow or blend. My prefered way is via SQL, which also could have a lot of approaches. It can be done via magic etl as well, so I'll show that too.

     

    Here is the logical thought process:

    • Get the maximum _BATCH_ID_ value in the dataset (1 record)
    • Add a new column (max _BATCH_ID_) to the dataset with this maximum _BATCH_ID_ we just found
    • Filter based on _BATCH_ID_ = max_BATCH_ID_. This can be done at the card level or in the dataflow.

    SQL

    Here is my SQL dataflow approach. This is a very rudimentary approach and not all steps above are visible... but they are happening. I use a subquery which also means I am not using indexes, you may wish do that as a transform all on its own if you have a very large dataset so you can add indexes on _BATCH_ID_.

     

    SELECT a.*
    FROM
    mytable a
    INNER JOIN (SELECT MAX(_BATCH_ID_) AS '_BATCH_ID_' FROM mytable) b
    USING(`_BATCH_ID_`)

    Magic ETL

    If you want to go with Magic ETL, this is how I would approach it.

     

    First: [Add Constants] This is not a logical step above, but we need it because of some strangeness to in the Magic ETL tool. Use these settings in the action setup panel:

    1. I will name my field `JOINER`
    2. Make it a Whole Number
    3. Simple value, like 1 or 0

     

    Second: [Group By] Use these settings

    1. Select JOINER for the grouping
    2. We will use `max_BATCH_ID_` for the field name
    3. Use _BATCH_ID_ as the basis for the calculation
    4. "Maximum" is the aggregate function

     

    Third: [Join Data] Besides steps 1 and 2 in this action block which I'll cover in a second, you also need to pick a join condition with the icon in the middle between 1 and 2. Choose "Inner" if you only want the last batch to appear, Choose "Left Outer" if you want all the data in the output and have the ability for the card to filter to the last batch as needed. Otherwise, here are your settings:

    1. Pick the [Add Constants] input and `JOINER` for the field.
    2. Domo will have already added the [Group By] input, you need to pick the `JOINER` field.

     

    Last: [Output Dataset]

    If you chose "Inner" join then only the last batch will be output. If you chose 'Left Outer' then you can create a beastmode in the card something like this:

     

    CASE
    WHEN `max_BATCH_ID_` = `_BATCH_ID_` THEN 'Last Import'
    ELSE 'Old Import'
    END

    Then filter the card on that beastmode and select 'Last Import' as the filter option. Here is a snapshot of the layout when it is done...

     

    image.png

     

    Good luck! Hope this helps!

     


    **Say "Thanks" by clicking the "heart" in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"

Answers

  • n8isjack
    n8isjack Contributor
    Answer ✓

    There are a handful of ways to achieve this, but almost all of them require a dataflow or blend. My prefered way is via SQL, which also could have a lot of approaches. It can be done via magic etl as well, so I'll show that too.

     

    Here is the logical thought process:

    • Get the maximum _BATCH_ID_ value in the dataset (1 record)
    • Add a new column (max _BATCH_ID_) to the dataset with this maximum _BATCH_ID_ we just found
    • Filter based on _BATCH_ID_ = max_BATCH_ID_. This can be done at the card level or in the dataflow.

    SQL

    Here is my SQL dataflow approach. This is a very rudimentary approach and not all steps above are visible... but they are happening. I use a subquery which also means I am not using indexes, you may wish do that as a transform all on its own if you have a very large dataset so you can add indexes on _BATCH_ID_.

     

    SELECT a.*
    FROM
    mytable a
    INNER JOIN (SELECT MAX(_BATCH_ID_) AS '_BATCH_ID_' FROM mytable) b
    USING(`_BATCH_ID_`)

    Magic ETL

    If you want to go with Magic ETL, this is how I would approach it.

     

    First: [Add Constants] This is not a logical step above, but we need it because of some strangeness to in the Magic ETL tool. Use these settings in the action setup panel:

    1. I will name my field `JOINER`
    2. Make it a Whole Number
    3. Simple value, like 1 or 0

     

    Second: [Group By] Use these settings

    1. Select JOINER for the grouping
    2. We will use `max_BATCH_ID_` for the field name
    3. Use _BATCH_ID_ as the basis for the calculation
    4. "Maximum" is the aggregate function

     

    Third: [Join Data] Besides steps 1 and 2 in this action block which I'll cover in a second, you also need to pick a join condition with the icon in the middle between 1 and 2. Choose "Inner" if you only want the last batch to appear, Choose "Left Outer" if you want all the data in the output and have the ability for the card to filter to the last batch as needed. Otherwise, here are your settings:

    1. Pick the [Add Constants] input and `JOINER` for the field.
    2. Domo will have already added the [Group By] input, you need to pick the `JOINER` field.

     

    Last: [Output Dataset]

    If you chose "Inner" join then only the last batch will be output. If you chose 'Left Outer' then you can create a beastmode in the card something like this:

     

    CASE
    WHEN `max_BATCH_ID_` = `_BATCH_ID_` THEN 'Last Import'
    ELSE 'Old Import'
    END

    Then filter the card on that beastmode and select 'Last Import' as the filter option. Here is a snapshot of the layout when it is done...

     

    image.png

     

    Good luck! Hope this helps!

     


    **Say "Thanks" by clicking the "heart" in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
This discussion has been closed.