GA4 Weekly and Monthly data - should I schedule a better import or make an ETL?

I'm not sure which direction to take - has anyone dealt with GA4 weekly, monthly, and yearly datasets before?

I've realized that if I need to get a summary number of users over a certain time period in GA4, the only true way to make sure they are not double counted is to make separate datasets that pull in users by day, by week, by month, and by year.

If, though, I make a dataset that pulls in users per week and that appends and updates daily, it pulls in duplicate data - On Day 1 of the week, it pulls in Day 1 data. On Day 2 of the week, it pulls in users from Days 1 and 2. On Day 3 of the week, it pulls in users from Days 1, 2, and 3. So by the end of the week, my summary number for the week is massively inflated.

If I make it replace, the data will start deleting after 14 months, which is the longest time Google is storing that data.

So, for what I can figure, I need to either schedule the dataset to only update once a week, and for the previous week only, or I need to set up an ETL for the Weekly, Monthly, and Yearly datasets to only accept the last data update of any week, month, or year.

Am I missing something? Is there a recommended course of action for this?

Best Answers

  • MarkSnodgrass
    Answer ✓

    I don't have to work with GA4 data, but generally speaking, you can eliminate duplicates in Magic ETL. You can use the Remove Duplicates tile, or you can use the Group By tile and use aggregations such as Max and Count Distinct to remove duplicates.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • DataMaven
    DataMaven Coach
    Answer ✓

    Hi @KristinDavis - I just finished an extensive project getting GA4 data to come in even remotely accurately. There are SO MANY nuances. For the User Counts, you are correct that if you don't want to double count users for the period, you have to have pulls by each desired grain. This is because all of the data is aggregated, with no distinct user identifiers. The solution I use is recursive dataflows, which also solves the issue of the 14 month history that you mentioned.

    If you need help with this or want to know more about my findings on GA4, please feel free to message me.

    DataMaven
    Breaking Down Silos - Building Bridges
    **Say "Thanks" by clicking a reaction in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"

Answers

  • MarkSnodgrass
    Answer ✓

    I don't have to work with GA4 data, but generally speaking, you can eliminate duplicates in Magic ETL. You can use the Remove Duplicates tile, or you can use the Group By tile and use aggregations such as Max and Count Distinct to remove duplicates.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • DataMaven
    DataMaven Coach
    Answer ✓

    Hi @KristinDavis - I just finished an extensive project getting GA4 data to come in even remotely accurately. There are SO MANY nuances. For the User Counts, you are correct that if you don't want to double count users for the period, you have to have pulls by each desired grain. This is because all of the data is aggregated, with no distinct user identifiers. The solution I use is recursive dataflows, which also solves the issue of the 14 month history that you mentioned.

    If you need help with this or want to know more about my findings on GA4, please feel free to message me.

    DataMaven
    Breaking Down Silos - Building Bridges
    **Say "Thanks" by clicking a reaction in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • KristinDavis
    KristinDavis Member
    edited February 2024

    @DataMaven I'm so glad a I found another person deep in the weeds with me on GA4!

    I was looking at recursive dataflows in another thread. How does that benefit me more than just making an ETL with a new dataset? Does it eliminate the additional rows of data?