Common Practice Joining
Hi,
I am running into an issue while trying to join two datasets. Joining on a unique ID is not an option, as whoever set up this dataset decided not to include one. My next thought was to join on date and name, but here is the problem: dataset one has the full name, for example Fredrick, while dataset two just has Fred. This happens for multiple names in the datasets. I was thinking I could join on last name instead, but there are multiple last names that are similar. Is there a way to do index matching where I'd just take the first 3 characters and match on them? What other ways would y'all approach this?
Thank you in advance.
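For reference, the "first 3 characters" idea from the question can be sketched in plain Python, for example inside a scripting tile. This is only an illustration under assumptions: the column names `date`, `first_name`, and `last_name` and the sample rows are hypothetical, and the helper names are made up for this sketch.

```python
from collections import defaultdict

def prefix_key(row, n=3):
    """Build a join key from date, last name, and the first n letters of the first name."""
    return (
        row["date"],
        row["last_name"].strip().lower(),
        row["first_name"].strip().lower()[:n],
    )

def prefix_join(left_rows, right_rows, n=3):
    """Inner-join two lists of dicts on the prefix key; returns (left, right) pairs."""
    index = defaultdict(list)
    for r in right_rows:
        index[prefix_key(r, n)].append(r)
    return [(l, r) for l in left_rows for r in index.get(prefix_key(l, n), [])]

# Hypothetical sample rows, not taken from the real datasets.
dataset_one = [
    {"date": "2024-01-05", "first_name": "Fredrick", "last_name": "Johnson"},
    {"date": "2024-01-05", "first_name": "Freda", "last_name": "Johnson"},
]
dataset_two = [
    {"date": "2024-01-05", "first_name": "Fred", "last_name": "Johnson"},
]

pairs = prefix_join(dataset_one, dataset_two)
```

Note that in this sample both Fredrick and Freda pair with Fred, so a prefix match can silently create duplicate or wrong matches. It is probably worth checking the match counts before trusting the join.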
Best Answer
-
You said it's for multiple names in the dataset - I'm interpreting that as not very many and you know which they are? I would suggest just cleaning up the names on the dataset in the ETL using the formula tile:
CASE WHEN `FirstName` = 'Fred' AND `LastName` = 'Johnson' THEN 'Frederick' ELSE `FirstName` END
Because even full names are not always unique, I suggest bringing in something like email or creating your own ID for each individual that is in your historic data and having a new ID generated for every name added later.
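If the list of known mismatches grows beyond a couple of CASE branches, the same cleanup could be driven by a small lookup instead. Here is a minimal Python sketch of that idea; the `NAME_FIXES` table and the `canonical_first_name` helper are hypothetical names for illustration, not something built into Domo.

```python
# Hypothetical lookup: (short first name, last name) -> full first name.
# Keys are lowercase so the match ignores case and stray whitespace.
NAME_FIXES = {
    ("fred", "johnson"): "Frederick",
}

def canonical_first_name(first_name, last_name):
    """Return the full first name if the pair is in the lookup, else the input unchanged."""
    key = (first_name.strip().lower(), last_name.strip().lower())
    return NAME_FIXES.get(key, first_name)

fixed = canonical_first_name("Fred", "Johnson")
untouched = canonical_first_name("Fred", "Smith")
```

This mirrors the CASE expression: only the exact first/last name pairs you list are rewritten, and everything else passes through as-is. In an ETL the lookup could just as well live in a small mapping dataset that you join in.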
If I solved your problem, please select "yes" above
0
Answers
-
I believe it's only for a few; however, I was trying to future-proof it in case new names are added.
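One possible way to future-proof a hand-maintained cleanup is to list the names that still fail to match after the cleanup runs, so new mismatches surface instead of quietly dropping out of the join. A rough Python sketch, assuming hypothetical `first_name` and `last_name` columns:

```python
def unmatched_names(left_rows, right_rows):
    """Return sorted (first, last) pairs present in left_rows but absent from right_rows."""
    def key(row):
        return (row["first_name"].strip().lower(), row["last_name"].strip().lower())

    right_keys = {key(r) for r in right_rows}
    return sorted({key(l) for l in left_rows} - right_keys)

# Hypothetical sample rows for illustration only.
dataset_one = [
    {"first_name": "Frederick", "last_name": "Johnson"},
    {"first_name": "Robert", "last_name": "Lee"},
]
dataset_two = [
    {"first_name": "Frederick", "last_name": "Johnson"},
    {"first_name": "Rob", "last_name": "Lee"},
]

needs_review = unmatched_names(dataset_one, dataset_two)
```

Anything that shows up in `needs_review` is a candidate for a new cleanup rule. The same check could be done in an ETL with a left join followed by a filter on null values from the right side.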
0 -
I'd work to find a better unique identifier than name for the future. The best place to do this is the source systems. Otherwise maintenance could be a headache.
If I solved your problem, please select "yes" above
0 -
Agreed. I didn't set up this dataset, nor do I have access to change it at the source system.
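Without access to the source, the "create your own ID" suggestion from the accepted answer could in principle be maintained downstream. The sketch below is one way that might look in Python; `assign_ids` and the registry layout are assumptions made for illustration. The key is whatever fields you decide identify a person, and adding something like email to it would help when two people share a name.

```python
def assign_ids(people, registry=None):
    """Give each new person key the next integer ID; existing keys keep their IDs.

    people:   iterable of tuples such as (first_name, last_name) or
              (first_name, last_name, email).
    registry: optional dict of previously assigned {key: id}; it is not modified.
    """
    registry = dict(registry or {})
    next_id = max(registry.values(), default=0) + 1
    for person in people:
        key = tuple(part.strip().lower() for part in person)
        if key not in registry:
            registry[key] = next_id
            next_id += 1
    return registry

# First run over the historic data (hypothetical names).
historic = assign_ids([("Frederick", "Johnson"), ("Ann", "Lee")])

# A later run: Ann keeps her ID, the new name gets the next one.
updated = assign_ids([("Bob", "Ray"), ("Ann", "Lee")], historic)
```

The registry has to be stored somewhere between runs, for example as its own dataset, for the IDs to stay stable over time.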
0