Is it possible to batch edit source data?

I have a large dataset that needs some cleaning. We have appended to this dataset daily over several years and some of the data values have drifted over time as users have renamed what are essentially labels for different entities. If this were SQL, I could write an update statement and quickly replace the values in label fields based on an id field. For example:

UPDATE dataset JOIN updated_labels ON dataset.id_column = updated_labels.id_column SET dataset.label_field = updated_labels.label_field;

Is there a way to run this kind of data editing operation directly in Domo? So far, the solutions I find fall into these categories:

  1. use data edit mode to select batches that were appended to the data, remove those batches, and then re-import the suspect data batches
  2. use an ETL workflow to transform the data and output to a new dataset

The first option won't work here because there are far too many historical appends to go back and re-import them all. It would take a considerable amount of effort to rebuild the data.

The second option means treating the original dataset as immutable and copying it over to an identical dataset with the labels fixed. The drawbacks are that keeping two copies of the dataset wastes storage, and that I would have to edit all of my cards to point to the new dataset.

Is there instead a way to simply clean an existing dataset? I just need a way to run a one-time, ad-hoc query to realign data values.
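For reference, the fix I want is the local equivalent of the UPDATE above. If I have to export the data, I could run essentially the same statement against the exported CSVs with Python's built-in sqlite3 module (a sketch; the column names and label values are the hypothetical ones from the example above):

```python
import sqlite3

# Tiny in-memory stand-in for the two tables from the example above
# (in practice these would be loaded from the exported CSVs).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dataset (id_column INTEGER, label_field TEXT);
    CREATE TABLE updated_labels (id_column INTEGER, label_field TEXT);
""")
con.executemany("INSERT INTO dataset VALUES (?, ?)",
                [(1, "Acme Co"), (2, "acme corp"), (1, "ACME")])
con.executemany("INSERT INTO updated_labels VALUES (?, ?)",
                [(1, "Acme Corp"), (2, "Acme Corp")])

# Correlated-subquery form of the UPDATE ... JOIN; this works on any
# SQLite version (the UPDATE ... FROM syntax needs SQLite >= 3.33).
con.execute("""
    UPDATE dataset
    SET label_field = (SELECT u.label_field
                       FROM updated_labels u
                       WHERE u.id_column = dataset.id_column)
    WHERE id_column IN (SELECT id_column FROM updated_labels)
""")
print(con.execute("SELECT id_column, label_field FROM dataset").fetchall())
# -> [(1, 'Acme Corp'), (2, 'Acme Corp'), (1, 'Acme Corp')]
```

That one-time pass over the data is all I need; the question is whether anything inside Domo can do it.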

Best Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    A very unsupported method would be to create a dataflow with your input, your output, and the transformations you want to clean up the data. This writes out to a new dataset, as you suggested, which is good for validation purposes. Once the output dataset meets your criteria, you can hack the JSON powering your dataflow and change the output's dataset ID to your input dataset's ID, so the dataflow writes back to your original input dataset. Again, this is not for the faint of heart and can permanently corrupt your dataset if you're not careful.

  • hanmari
    hanmari Member
    Answer ✓

    The answer I received from support is that Domo does not have any tools that allow altering a dataset directly. If there were ever options to perform an ETL that imports from a dataset, makes changes, and then writes the changes back to the imported dataset, they are no longer available. @GrantSmith suggests an unsupported hack that could write back to the initial dataset. Before I take that approach, I will see whether I can refactor my dataset to keep labels out of it entirely; then I won't have to worry about tweaking historical data to keep trivial values from drifting. My best option right now is to export the data out of Domo, alter it locally, then upload it back to Domo so it overwrites the original dataset.

Answers

  • ellibot
    ellibot Contributor

    @hanmari try appending '?_f=dataRepair' to the end of the dataset URL (so it would look something like this: https://{{instance}}.domo.com/datasources/987da1fb-0b5e-493d-85a0-fe8b10136cf4/details/overview?_f=dataRepair).

    You can then download the dataset as a csv, clean it in Excel, and then reupload it using the 'Replace with CSV' button.

  • hanmari
    hanmari Member

    @ellibot Does the data repair mode allow you to replace all of the uploads at once? I can export the entire 10M-row dataset to CSV (about 1 GB) and edit it locally, but I don't see how to upload the cleaned dataset using the 'Replace with CSV' button. Because my dataset has been appended to daily, I have a list hundreds of entries long that lets me export and replace each day's upload. Perhaps I could click the most recent 'Replace with CSV' button and upload the whole dataset as if it were yesterday's append. I'm wary of doing so, however, as I suspect Domo would treat the massive upload as yesterday's data append and I would double the size of my dataset with duplicate rows.

  • ArborRose

    I asked Domo advisers to workgroup with me to help me do what you suggest. After explaining what I wanted to do…I was told we couldn't. That doesn't mean it's not possible. It means that between me and the other knuckleheads in the work session, we didn't know how.

    I really like what @ellibot is selling, but I'm not familiar with the dataRepair parameter. I'm hoping you say it works so I can explore this myself. Please mark his answer correct if it works.


  • hanmari
    hanmari Member
    edited March 7

    @ArborRose Part of the problem may have to do with the underlying architecture of how Domo stores large amounts of data. I suspect that as Domo receives records it immediately puts them into a read-only state and assigns a batch number to the uploaded data. The data is therefore highly available for reading, but not for writing. I can excise a batch of uploaded records and replace it with a different upload, but I cannot edit the data that is stored. This type of architecture makes it easy to read stored records, use transformations to modify them, and then save the altered values into a new, read-only dataset. The drawback is that if the architecture renders stored data immutable, I will not be able to simply edit it in place.

    I have created a help ticket with Domo to see if they do have any tools that will enable editing stored data, and I hope their customer service can point me to a tool or API that might allow me to edit stored data. If there isn't such a tool, then I will most likely need to export my entire dataset, transform it locally, then delete the old dataset in Domo and write the updated dataset back to Domo in a single upload transaction. I'm hoping I can do so without needing to reconnect cards or rebuild the ETL transformations that stem from this dataset.
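    If I do end up exporting, the local transform itself is the easy part. A sketch of a streaming fix using only Python's standard csv module, so the 1 GB export never has to sit in memory (the column names and the id-to-label mapping here are made up for illustration):

```python
import csv
import io

# Hypothetical corrected mapping: id value -> canonical label.
CANONICAL = {"1": "Acme Corp", "2": "Acme Corp"}

def relabel(src, dst, id_col="id_column", label_col="label_field"):
    """Stream rows from src to dst, rewriting label_col via CANONICAL.
    src and dst are text file handles, so rows are processed one at a
    time and the full export never sits in memory."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Leave the label alone for ids we have no correction for.
        row[label_col] = CANONICAL.get(row[id_col], row[label_col])
        writer.writerow(row)

# Demo on an in-memory "export"; with the real file, pass handles from
# open("export.csv", newline="") and open("cleaned.csv", "w", newline="").
exported = io.StringIO("id_column,label_field\n1,acme corp\n2,ACME\n3,Other\n")
cleaned = io.StringIO()
relabel(exported, cleaned)
print(cleaned.getvalue())
```

    The cleaned file could then go back up as the single replacement upload.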

  • ArborRose

    I agree. It probably also has something to do with indexing and prepping that data.

    I think many of us have this issue. We're used to environments like SQL where we can deal directly with the data. In Domo, we find ways to bend it to our will. @ellibot - thanks for the info on the repair parameter.

