Join in ETL

Question:
I have two separate databases where there is only one common column, which is the "salesperson."

  • The first database contains sales values for each salesperson.
  • The second database contains the area of operation and products associated with each salesperson.

I plan to create a new column to calculate the value/area of operation ratio. The challenge is that when I filter out products, I want the sales value to remain constant, but the area of operation should decrease, thus increasing the value/area ratio.

How can I merge these two databases, ensuring that I can calculate the adjusted value/area ratio dynamically as I filter products, while keeping the value intact?

Any advice or step-by-step guidance would be greatly appreciated!

This version adds the specific calculation and filtering process you're aiming for.

Tagged:

Comments

  • Data_Devon
    Data_Devon Contributor
    edited January 31

    Hello caiorogano!

    Why would the sales remain constant even if products are removed?

    Also - what is your relationship between the datasets assuming that Salesperson is your Join Key? Is there one row per Salesperson, and values/area of operation/products are pivoted out to new rows?

    I want to make sure I understand before I offer my two pennies. Perhaps a GOAT can read through the lines to help, but I don't want to lead you astray with insufficient information.

    ✅Did this solve your problem? Accept it as a solution!

    ❤️Did you love this answer? Mark it as "Awesome"!

    👍Do you agree with this process? Click "Agree"!

  • My ultimate goal is to calculate USD per meter, using the Salesperson field as the key. In one dataset, I have all the revenue (USD) per salesperson, and in another dataset, I have Salesperson, Area of operation (meters), and products sold for each of them. I need to filter products while keeping the revenue value constant because some products don’t make sense for certain salespeople in specific regions.

    And to answer your question, I have one row per salesperson for each year in the revenue dataset.

  • Can you create a fake version of your two datasets with at least a few rows of what the data looks like? That will get you a fast, more reliable answer.

    Based on you've said so far, I think what you'll need to do is pivot your Salesperson data long and join on the operation and product with the other table. To do the filtering you want, instead of having a table that's like this:

    You want one that looks like this:

    Then when you filter by area/product, you'll still be able to get totals by Salesperson. (Or filter by Salesperson and see all the associated products.)

    Please 💡/💖/👍/😊 this post if you read it and found it helpful.

    Please accept the answer if it solved your problem.