MagicETL/SQL to join datasets using a comparison between values

mberkeley
mberkeley Contributor

I am trying to get a geolocation based on the IPNumber.

Dataset a:

ipaddress

ipnumber (translated from ipaddress)


Dataset b:

lowIPNumber

highIPNumber

Country

Region

City


I would like to add the Country, Region, and City values to an output dataset:

ipaddress

ipnumber

Country

Region

City


where a.ipnumber >= b.lowIPNumber and a.ipnumber <= b.highIPNumber


any ideas on the best approach?

Tagged:

Best Answer

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Depending on the size of your dataset and the processing needs you can do this in either MySQL (simpler but slower) or MagicETL (more complex, possibly slower depending on dataset size).


    MySQL dataflow allows for conditional joins so it's a simple query:

    SELECT a.`ipaddress`, a.`ipnumber`, `Country`, `Region`, `City`
    FROM `dataset_a` as a
    join `dataset_b` as b on a.ipnumber >= b.lowIPNumber and a.ipnumber <= b.highIPNumber
    


    MagicETL doesn't support conditional joins so you'd need to do a cartesian join and then filter after the fact. You'd have an Add Column tile to create a new column (call it 'Join Column') with a value of 1 for each, then do a join based on the join column then after that you'd do your filter in a filter tile (ipnumber >= lowIPNumber and ipnumber <= highIPNumber)

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • GrantSmith
    GrantSmith Coach
    Answer ✓

    Depending on the size of your dataset and the processing needs you can do this in either MySQL (simpler but slower) or MagicETL (more complex, possibly slower depending on dataset size).


    MySQL dataflow allows for conditional joins so it's a simple query:

    SELECT a.`ipaddress`, a.`ipnumber`, `Country`, `Region`, `City`
    FROM `dataset_a` as a
    join `dataset_b` as b on a.ipnumber >= b.lowIPNumber and a.ipnumber <= b.highIPNumber
    


    MagicETL doesn't support conditional joins so you'd need to do a cartesian join and then filter after the fact. You'd have an Add Column tile to create a new column (call it 'Join Column') with a value of 1 for each, then do a join based on the join column then after that you'd do your filter in a filter tile (ipnumber >= lowIPNumber and ipnumber <= highIPNumber)

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • mberkeley
    mberkeley Contributor

    Way too big for a cartesian join. (18M records x 2.9M records)!