Data transformation ideas around old/new user data

HI, we're migrating from username sign in, to okta that uses email as user id. we have an existing user info data with user id as username (xfsgsbss) but we would have a new data with user id as email(fdhsvbsjsbsjsh@email.com). I want to be able to join both old and new data and make sure the right attributes goes to them, but also create a dedup user (or unique user) so same user attributes with username will not be inflated by the same user attributes with email, basically not overcounting.

Any ideas or tips would help. Thank you

Best Answer

  • MarkSnodgrass
    Answer ✓

    It's very unclear as to how you can map the old user id to the new user id based on the way you are describing it. I would suggest you upload a spreadsheet that has two columns, old user id and new user id that serves as a lookup table. You can then create an ETL that has 3 input datasets: old id table, new id table, and the lookup table spreadsheet. You can then perform a join from your old id table to your lookup table and then join this to your new id table. Finally, you can use the group by tile to eliminate duplicates by doing a count on your new id names.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.

Answers

  • You can use a formula tile to populate a new user id column in your old dataset by just appending the @email.com with a CONCAT formula:

    CONCAT(`Old User Name`, '@email.com')
    

    Then you can use that new field to join to your new dataset and get similar records.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • @GrantSmith Thanks for the response

    One thing I forgot to add is, there are some users with multiple username to their email, but after migration if a user has more than one username to their email, their usage will be aggregated to their email . so the above might not work in that case.

  • Hi @Pellogat, in the case of multiple usernames for any email, does it work to just include the latest username to email?

  • @Pellogat

    Does you old system have an email address associated with the usernames or do you only have the usernames?

    How can you determine in the old system if a user has multiple usernames / accounts?

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • MarkSnodgrass
    Answer ✓

    It's very unclear as to how you can map the old user id to the new user id based on the way you are describing it. I would suggest you upload a spreadsheet that has two columns, old user id and new user id that serves as a lookup table. You can then create an ETL that has 3 input datasets: old id table, new id table, and the lookup table spreadsheet. You can then perform a join from your old id table to your lookup table and then join this to your new id table. Finally, you can use the group by tile to eliminate duplicates by doing a count on your new id names.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • I agree with Mark's solution, and would add that in your grouping, you need to be cognizant of any other attributes you are capturing. For example, if there are multiple user names associated with an email address, what is different between them? Are their position, department, manager, etc., included in with the user name data? Do you want to incorporate any of that or just take the new info from the email user name? If you are just needing to do simple mapping, Mark's solution will make it a piece of cake!

    DataMaven
    Breaking Down Silos - Building Bridges
    **Say "Thanks" by clicking a reaction in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • @Pellogat any chance you could show an example with your ideal inputs / outputs? Have any of the previous posters answered your question?