MagicETL/SQL to join datasets using a comparison between values
I am trying to get a geolocation based on the IPNumber.
Dataset a:
ipaddress
ipnumber (translated from ipaddress)
Dataset b:
lowIPNumber
highIPNumber
Country
Region
City
I would like to add the Country, Region, and City values to an output dataset:
ipaddress
ipnumber
Country
Region
City
where a.ipnumber >= b.lowIPNumber and a.ipnumber <= b.highIPNumber
any ideas on the best approach?
Best Answer
-
Depending on the size of your dataset and the processing needs you can do this in either MySQL (simpler but slower) or MagicETL (more complex, possibly slower depending on dataset size).
MySQL dataflow allows for conditional joins so it's a simple query:
SELECT a.`ipaddress`, a.`ipnumber`, `Country`, `Region`, `City` FROM `dataset_a` as a join `dataset_b` as b on a.ipnumber >= b.lowIPNumber and a.ipnumber <= b.highIPNumber
MagicETL doesn't support conditional joins so you'd need to do a cartesian join and then filter after the fact. You'd have an Add Column tile to create a new column (call it 'Join Column') with a value of 1 for each, then do a join based on the join column then after that you'd do your filter in a filter tile (ipnumber >= lowIPNumber and ipnumber <= highIPNumber)
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**1
Answers
-
Depending on the size of your dataset and the processing needs you can do this in either MySQL (simpler but slower) or MagicETL (more complex, possibly slower depending on dataset size).
MySQL dataflow allows for conditional joins so it's a simple query:
SELECT a.`ipaddress`, a.`ipnumber`, `Country`, `Region`, `City` FROM `dataset_a` as a join `dataset_b` as b on a.ipnumber >= b.lowIPNumber and a.ipnumber <= b.highIPNumber
MagicETL doesn't support conditional joins so you'd need to do a cartesian join and then filter after the fact. You'd have an Add Column tile to create a new column (call it 'Join Column') with a value of 1 for each, then do a join based on the join column then after that you'd do your filter in a filter tile (ipnumber >= lowIPNumber and ipnumber <= highIPNumber)
**Was this post helpful? Click Agree or Like below**
**Did this solve your problem? Accept it as a solution!**1 -
Way too big for a cartesian join. (18M records x 2.9M records)!
0
Categories
- All Categories
- 2K Product Ideas
- 2K Ideas Exchange
- 1.6K Connect
- 1.3K Connectors
- 311 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 3.8K Transform
- 656 Datasets
- 115 SQL DataFlows
- 2.2K Magic ETL
- 812 Beast Mode
- 3.3K Visualize
- 2.5K Charting
- 80 App Studio
- 45 Variables
- 771 Automate
- 190 Apps
- 481 APIs & Domo Developer
- 77 Workflows
- 23 Code Engine
- 36 AI and Machine Learning
- 19 AI Chat
- AI Playground
- AI Projects and Models
- 17 Jupyter Workspaces
- 410 Distribute
- 120 Domo Everywhere
- 280 Scheduled Reports
- 10 Software Integrations
- 142 Manage
- 138 Governance & Security
- 8 Domo Community Gallery
- 48 Product Releases
- 12 Domo University
- 5.4K Community Forums
- 41 Getting Started
- 31 Community Member Introductions
- 114 Community Announcements
- 4.8K Archive