Is there an advantage to splitting a DataSet with a DataFlow over using a page/card filter?


We're using and learning Domo, and I'm checking out some alternatives as they come up. The Magic ETL DataFlow editor is slick. I'm not normally one for visual programming tools but, dang, that window is impressive.


Here's my test setup. I've got a DataSet with about 220K rows. It lumps together data from a variety of facilities into one DataSet. This is very convenient for the import and seems fine. Once inside Domo, most, if not all, cards and pages will only pull data from one facility out of the full DataSet. So, imagine that out of the 220K rows, any one card/page only needs 5-10K of the records, max.


I just tried out a simple DataFlow that produces a new DataSet for one facility by using a simple filter rule. Here's the question: is there any point? I don't know enough (anything) about Domo's internals. Will I see any speed advantage from having pre-created a smaller DataSet for each facility? For all I know, the DataFlow is functionally a mask and I'll see no benefit. Or it could be making a duplicate set with its own indexes, and I will. And, for all I know, page filters may already do that sort of work behind the scenes automatically.
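To make the two options concrete, here is a minimal Python sketch. It is not Domo code and says nothing about Domo's internals; the `facility` and `visits` column names, the sample rows, and both function names are made up for illustration. It only shows that filtering the combined data at read time and splitting it once up front yield the same rows for a given facility, so the real difference is where the work and the storage end up.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names are assumptions, not Domo's schema.
rows = [
    {"facility": "North", "visits": 12},
    {"facility": "South", "visits": 7},
    {"facility": "North", "visits": 5},
    {"facility": "East", "visits": 9},
]


def filter_at_read(all_rows, facility):
    """Option 1: keep one combined set and filter each time a card/page reads it."""
    return [r for r in all_rows if r["facility"] == facility]


def split_once(all_rows):
    """Option 2: build one smaller set per facility up front (like a filter DataFlow)."""
    by_facility = defaultdict(list)
    for r in all_rows:
        by_facility[r["facility"]].append(r)
    return dict(by_facility)


pre_split = split_once(rows)
north_filtered = filter_at_read(rows, "North")
north_presplit = pre_split["North"]
```

Both `north_filtered` and `north_presplit` hold the same two rows; the pre-split version trades extra stored copies (one per facility) for skipping the filter on each read.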


I know that my example row counts are small - but they'll grow. I figured I'd ask about alternatives now before we've sunk a lot of time into any particular approach.


Thanks very much! I've been getting great answers here, much appreciated.


  • Echelon

    Here's an example that might help you think it through: for our org, we have about 3 products. But 90% of our focus and business is on product #1.


    Having a data output that is already scoped to one product is helpful, because every time a new card is created from scratch, you don't have to add that first-layer filter of Product = 1. Especially if the products would be siloed from each other anyway, there would be no reason to have them intermingle for cross-comparison.


    And if you're saving the card builders from a few extra steps, it will keep human error out of the equation as much as possible, and save time on the card building process.

  • I feel that there are more advantages to keeping the dataset together as one big dataset.  We have some pretty large data sets and I have not seen any noticeable delay in performance when using a large data set.  The biggest advantage is that when you create a new card, even if it is for a specific facility, you could easily edit the card for the other facilities.  Any beastmodes that were created on the dataset could be applied to all of the facilities as well.  If you want to limit a user's access to data from other facilities, I would look into PDP solutions.  

    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
  • DataSquirrel

    Thanks for the answers!

    I built a pair of test charts drawn from the two different data sources (~200K with a filter and ~5K with no filter) and they both render pretty much instantly. Domo is pretty mind-blowingly fast.


    I like the idea of a single data set because it's less hassle in so many ways. However, we need to be sure not to leak data inappropriately, so the idea of a pre-filtered data set is also appealing. Building out that many data sets over time and adding new ones for new customers and locations would be a big pain. I figure we would need to look at the Publications Group features. That sounds like a very nice way to do it, but it seems that the Pub Group's security interactions with PDP are...a bit confusing. Also, Pub Groups don't support SumoCards and some other interesting-sounding features.


    Is there any sort of best practice or common pattern/strategy for dealing with data where the format is the same amongst customers but the content is private to each customer? (Conceptually, filters/views seem like a sensible approach.)


    I'm new here and just trying to come to grips with the alternatives and best strategies before we get too locked into an approach that we later find insecure or too restrictive. So thanks for any and all points of view and suggestions.

  • I would recommend learning how to implement PDP in your instance.  Contact your Domo CSM and see if they can put you in touch with someone to set this up.  We have a fairly large instance and we are pushing our metrics out to our global salesforce.  We don't want sales reps to have access to information for any account that is not theirs, and we only want our managers and area directors to have access to the customers in their area.  


    PDP is working beautifully to accomplish this.  It also allows us to share any PDP-enabled data set with our team and we don't have to worry about who sees what.  If they have access to the account in our Salesforce instance, they can see it in Domo.  

    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
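
The row-level approach recommended in this thread (one shared data set, with per-user policies deciding which rows each person can see) can be sketched conceptually as follows. This is plain Python, not Domo's PDP feature or API; the user names, the `facility` column, the `POLICIES` table, and the `visible_rows` function are all hypothetical and only illustrate the idea.

```python
# Hypothetical shared data set; column names are assumptions for illustration.
ROWS = [
    {"facility": "North", "revenue": 100},
    {"facility": "South", "revenue": 250},
    {"facility": "East", "revenue": 75},
]

# Hypothetical policy table: which facility values each user may see.
POLICIES = {
    "rep_alice": {"North"},
    "manager_bob": {"North", "South"},
}


def visible_rows(all_rows, user, policies=POLICIES):
    """Return only the rows the user's policy allows; unknown users see nothing."""
    allowed = policies.get(user, set())
    return [r for r in all_rows if r["facility"] in allowed]


alice_view = visible_rows(ROWS, "rep_alice")
bob_view = visible_rows(ROWS, "manager_bob")
guest_view = visible_rows(ROWS, "guest")
```

The design point is that cards are built once against the shared data set, and the policy lookup (rather than a separate copy of the data per customer) decides what each viewer sees. Defaulting unknown users to an empty set is a deliberate choice here so that a missing policy fails closed.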