Pushing from multiple sources and questions about using the DataSet API

DataSquirrel
DataSquirrel Contributor

Currently, we're storing a lot of custom data on a central server, summarizing it, generating spreadsheets, and feeding these into Domo via Workbench. It works really well, partly because we've been able to do full DataSet replacements rather than "appends." As we accumulate more data, the process is slowing down. Also, as we build out, we'll have multiple servers, not a single consolidated server. This is the background for what we're looking at doing next.

 

We've got our DataSets, page filters, and PDP rules set up to allow us to reuse pages and cards to show very different slices of data. What we're hoping to do is continue along these lines, but have multiple sites append data to a single DataSet. Is this possible, or does each site need a distinct DataSet that we then combine on the back-end?

 

I've had a look at several options for how to handle multiple sites pushing into Domo:

  • Workbench + append.

  • Centralizing everything in Postgres and pulling from there in Domo directly, without Workbench.

  • Using the DataSet API to perform custom appends.

  • Using the Stream API to perform custom appends.

Of these options, the DataSet API seems like the most straightforward. I've used a bunch of the "read" APIs in the past to pull down DataSet details and the like for auditing and review. The APIs were fine to work with. I just tried a little "hello world" code and was able to create a DataSet easily enough. I've looked through the docs and have some questions. I'd love to hear back from anyone who has used the DataSet and/or Stream API to define, configure, and update DataSets. Not just with answers to my questions, but info on gotchas, tips, anything you're inclined to share.

 

My questions so far are:

 

  • Can you use the DataSet API to allow different locations to push data to the same DataSet? I can test this out myself with a bit of work, but it seems like something others will already know and might be able to share.

  • I see that REPLACE and APPEND are supported. I'm guessing that UPSERT is not supported, based on comments I've seen on the forums. Is that right?

  • How do people manage APPEND to avoid duplicates and gaps? We've got a custom concurrency ID (sequence number) system in place for a custom sync tool, so we're able to track what we have and have not posted, assuming we can get accurate feedback from the API about success and failure. For example, if we push 200 rows and 1 fails, is there a ROLLBACK, or does Domo insert 199 rows?

  • Any words of wisdom on the CSV? I hate CSV with a hatred that is red hot, sharp, and very pointy. Then again, it's a lot more compact than sending up typical XML or JSON. So that's good for performance.

  • Do you know if Domo accepts compressed payloads? The examples don't list compression headers. I'm hoping that this is only because every effort was made to make the examples as straightforward as possible.

  • Is it okay to quote every element, even ones that don't need it? From the docs (https://developer.domo.com/docs/dataset/formatting-data-to-import), I'm guessing "yes, you can throw in more quoting than strictly required."

  • The docs include a note which reads, in part, "Note: Only DataSets created with the API can be updated via APIs." Is this correct? If so, it means we'll need to create new DataSets, populate them, and then migrate all of our pages/cards/rules to the newly defined DataSets. Ouch. Have I misunderstood?

  • Our likely cases are modestly sized data sets updated on a regular basis. These are not one-off data uploads, neither are they huge uploads. This leaves me wondering if we should be looking at the Stream API instead of the DataSet API. (Stream sounds better for continuous and reliable updates.)
  • Can we define a DataSet via the DataSet API and then later use it with the Stream API? Not clear on that...it looks like the answer is "yes", but I'm not sure. The examples show a Stream including a dataSet.id property, and then a full DataSet definition. I'm not clear how the two sets of functionality overlap. I'd like to be able to set everything up with the DataSet API and then use the Stream API for pushes, where justified.
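
To make the quoting and compression questions concrete, here is a minimal sketch of the sending side, assuming Python and only the standard library. It quotes every field and gzips the result. Whether Domo accepts a gzip-compressed body is exactly the open question above, so the compression step is an assumption, and the function names are made up for illustration. No header row is written here, which is also an assumption.

```python
import csv
import gzip
import io


def rows_to_quoted_csv(rows):
    """Serialize rows to CSV text with every field quoted (no header row)."""
    buffer = io.StringIO()
    # QUOTE_ALL quotes every field, including numbers; embedded quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def gzip_payload(text):
    """Compress CSV text to gzip bytes (only useful if the server accepts gzip)."""
    return gzip.compress(text.encode("utf-8"))


rows = [
    ["site-a", 101, 'He said "hi"'],
    ["site-b", 102, "plain, with comma"],
]
csv_text = rows_to_quoted_csv(rows)
payload = gzip_payload(csv_text)
print(csv_text)
print(len(csv_text.encode("utf-8")), "bytes raw,", len(payload), "bytes gzipped")
```

On tiny inputs like this one the gzipped payload can be larger than the raw text; the savings only show up on realistically sized batches.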

I can test all, or nearly all, of what I've asked on my own, but am hoping to save some time by learning from others who have been down this road in the past. And, please, any broad or specific suggestions or notes that I haven't thought to ask about are more than welcome.
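
For the duplicates-and-gaps question, this is roughly the bookkeeping a sequence-number scheme implies, sketched in Python. It assumes a batch is either accepted whole or rejected whole, which is the very thing the ROLLBACK question is asking about, so treat it as an illustration of the idea and not of how Domo behaves. The `send` callable and both function names are hypothetical stand-ins for whatever actually performs the upload.

```python
def plan_batches(rows, last_confirmed_seq, batch_size):
    """Return batches of unposted rows, ordered by sequence number.

    rows: iterable of (seq, payload) tuples, where seq is the concurrency ID.
    Rows at or below last_confirmed_seq are skipped so a retry cannot resend them.
    """
    pending = sorted(
        (row for row in rows if row[0] > last_confirmed_seq),
        key=lambda row: row[0],
    )
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]


def push_all(rows, last_confirmed_seq, batch_size, send):
    """Push batches in order; advance the watermark only after a confirmed success.

    send is a hypothetical callable returning True when the whole batch was
    accepted. Pushing stops at the first failure so no gap is created.
    """
    for batch in plan_batches(rows, last_confirmed_seq, batch_size):
        if not send(batch):
            break
        last_confirmed_seq = batch[-1][0]
    return last_confirmed_seq


# Usage: rows 1..7 exist locally, rows 1..2 were already confirmed.
rows = [(seq, f"row-{seq}") for seq in range(1, 8)]
sent = []


def fake_send(batch):
    """Stand-in for an upload; rejects any batch containing sequence 5."""
    if any(seq == 5 for seq, _ in batch):
        return False
    sent.extend(batch)
    return True


watermark = push_all(rows, last_confirmed_seq=2, batch_size=2, send=fake_send)
print(watermark, [seq for seq, _ in sent])
```

If the API can partially apply a batch, the watermark alone is not enough and the sender would need to learn which rows landed before retrying.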

 

Thanks!
