Changing dataset update method from Replace to Partition doubles data

jimsteph Contributor

We've noticed that if we change the update method of an existing dataset from Replace to Partition, we end up with two records in the dataset for every new one, and both look identical, right down to the field we're using for partitioning. It was quite a shock to see that a dataset I expected to have 91 million rows suddenly had 182 million. The obvious takeaway is that we should probably start from scratch when using partitions, but the benefit of converting existing ETLs is too strong a siren song for me to resist.
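
If you want to check whether your own dataset has the same problem, here's a quick sanity-check sketch in pandas. It assumes you can export the dataset to CSV first; the filename is a placeholder:

```python
import pandas as pd

# Placeholder filename: a CSV export of the suspect dataset.
df = pd.read_csv("dataset_export.csv")

total = len(df)
distinct = len(df.drop_duplicates())
print(f"total rows: {total:,}   distinct rows: {distinct:,}")

# If every row now appears exactly twice, distinct is half of total
# (91 million distinct vs. 182 million total, in our case).
```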

Two questions:

  • Has anyone else noticed this? Both of us here got bit by this, so I want to know if it's a general bug, if it just affects our instance, or if we're doing it wrong and it's Working As Designed™.
  • What would be a way around this? I have other ETLs downstream of this dataset, so I don't want to delete it and start from scratch unless I absolutely have to. My quick-and-probably-inefficient idea: copy the data to a secondary dataset (deduping if necessary), set the original to Partition, and use the even-more-beta retention feature to tell it to keep zero partitions. Let it run once to clear out the dataset, then reimport the data from the secondary dataset back into the original (rough sketch after this list).
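
For the record, here's roughly the dance I'm picturing, sketched with Domo's pydomo SDK. The dataset IDs are placeholders, I'm assuming the ds_get / ds_update convenience helpers, and the keep-zero-partitions step stays manual here since that's the beta part:

```python
from pydomo import Domo

# Placeholder credentials and dataset IDs -- substitute your own.
domo = Domo("client_id", "client_secret", api_host="api.domo.com")
ORIGINAL_ID = "original-dataset-guid"
BACKUP_ID = "backup-dataset-guid"

# 1. Copy the original to a secondary dataset, deduping along the way.
df = domo.ds_get(ORIGINAL_ID)   # pull the rows into a pandas DataFrame
df = df.drop_duplicates()       # collapse the doubled rows
domo.ds_update(BACKUP_ID, df)   # overwrite the backup dataset's contents

# 2. (Manual, beta) Switch the original to Partition with a retention
#    policy of zero partitions and let it run once to empty itself.
#    I'm not aware of a public API call for this step.

# 3. Send the deduped rows back into the now-empty original dataset.
domo.ds_update(ORIGINAL_ID, df)
```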

Any help would be appreciated.

Best Answers

  • Jones01 Contributor
    Answer ✓

    @jimsteph

    I believe this is the expected behaviour; I discussed it with someone at Domo. As I understand it, you end up with the original data under a null partition key, and they suggested filtering those rows out (see the sketch below).
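
    A minimal sketch of that filter, assuming the data can be pulled into pandas and that the partition key shows up as a column (I'm calling it `_partition_key`; that name is a placeholder, not a documented Domo column):

    ```python
    import pandas as pd

    df = pd.read_csv("dataset_export.csv")  # placeholder export of the dataset

    # Per the explanation above, the pre-conversion rows carry a null
    # partition key; keep only rows that belong to a real partition.
    df = df[df["_partition_key"].notna()]

    # If the key isn't exposed as a column (the doubled rows look identical
    # from the outside), collapsing exact duplicates is the blunt fallback.
    df = df.drop_duplicates()
    ```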

  • jimsteph Contributor
    Answer ✓

    Well that's annoying, @Jones01. I'll go ahead and put in a ticket and hope you're wrong, because doubling the number of instantiated rows could get expensive for everyone.

    Thanks to both of you for the responses.

Answers

  • This sounds like a bug. I'd recommend logging a ticket with Domo Support.
