Performance Issue MySQL DataFlow
Hi there!
I have a bit of an performance issue.
I am basically trying to create a table for our sales department, which includes all sales countries (as a filter), sales towns (column 1) and sales (Total) from this year (column 2), last year sales (column 3) and a variance column (column 4). Furthermore I want to insert a date picker and a possibility for YTD figures.
Here is my existing code:
1) Select created_at_trans,shop_town_id,
Sum(Total) AS Total
From buchungsliste_sh GROUP BY created_at_trans,shop_town_id
-Generate Output Table: proc_group
2)
SELECT DISTINCT shop_town_id from buchungsliste_sh
-Generate Output Table: townsliste
3)
Select a.Dates,shop_town_id
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)+ (1000*d.a)) DAY as Dates,townsliste.shop_town_id
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
CROSS JOIN townsliste
) a
where a.Dates between '2010-01-01' and '2025-01-01'
order by dates
-Generate Output Table: proc1
4)
SELECT dt.Dates as datum,dt.shop_town_id,
d.Total,
DATE_SUB(dt.Dates, interval WEEK(STR_TO_DATE(CONCAT('12-31-',YEAR(dt.Dates)-1),'%m-%d-%Y'),0) week) AS LY_Date
From proc1 dt LEFT JOIN proc_group d ON dt.dates = d.created_at_trans AND dt.shop_town_id = d.shop_town_id
order by dt.dates
-Generate Output Table: proc_join
5)
Select n.*,
n1.total AS Ly_Total
From proc_join n LEFT JOIN proc_group n1 on n1.created_at_trans = n.ly_date
AND n1.shop_town_id=n.shop_town_id
-Generate Output Table: proc2
6)
ALTER TABLE proc2
ADD Season text;
7)
UPDATE proc2
SET season = (CASE
WHEN datum BETWEEN '2005-05-01' and '2006-04-30'
THEN ("2005/2006")
WHEN datum BETWEEN '2006-05-01' and '2007-04-30'
THEN ("2006/2007")
WHEN datum BETWEEN '2007-05-01' and '2008-04-30'
THEN ("2007/2008")
WHEN datum BETWEEN '2008-05-01' and '2009-04-30'
THEN ("2008/2009")
WHEN datum BETWEEN '2009-05-01' and '2010-04-30'
THEN ("2009/2010")
WHEN datum BETWEEN '2010-05-01' and '2011-04-30'
THEN ("2010/2011")
WHEN datum BETWEEN '2011-05-01' and '2012-04-30'
THEN ("2011/2012")
WHEN datum BETWEEN '2012-05-01' and '2013-04-30'
THEN ("2012/2013")
WHEN datum BETWEEN '2013-05-01' and '2014-04-30'
THEN ("2013/2014")
WHEN datum BETWEEN '2014-05-01' and '2015-04-30'
THEN ("2014/2015")
WHEN datum BETWEEN '2015-05-01' and '2016-04-30'
THEN ("2015/2016")
WHEN datum BETWEEN '2016-05-01' and '2017-04-30'
THEN ("2016/2017")
WHEN datum BETWEEN '2017-05-01' and '2018-04-30'
THEN ("2017/2018")
WHEN datum BETWEEN '2018-05-01' and '2019-04-30'
THEN ("2018/2019")
WHEN datum BETWEEN '2019-05-01' and '2020-04-30'
THEN ("2019/2020")
WHEN datum BETWEEN '2020-05-01' and '2021-04-30'
THEN ("2020/2021")
WHEN datum BETWEEN '2021-05-01' and '2022-04-30'
THEN ("2021/2022")
WHEN datum BETWEEN '2022-05-01' and '2023-04-30'
THEN ("2022/2023")
WHEN datum BETWEEN '2023-05-01' and '2024-04-30'
THEN ("2023/2024")
WHEN datum BETWEEN '2024-05-01' and '2025-04-30'
THEN ("2024/2025")
WHEN datum BETWEEN '2025-05-01' and '2026-04-30'
THEN ("2025/2026")
END
)
As you can see it is a very large and complex statement. Unfortunately, it is necessary to include all dates per town, because otherwise I have incorrect last year figures.
e.g.
Town A had sales last year, but not on the same day this year. The figure won´t show up, if I change the code.
Step 4 took about 6 hours and step 5 about 10h.
Does someone could advise me how to have the same result, but in a more efficient way?
Many thanks for your help!
Best Answer
-
You can improve the performance by adding an index prior to the joins. You have to add a transform for each table you are indexing.
3b)
ALTER TABLE proc1 ADD INDEX(`dates`,`shop_town_id`);
3c)
ALTER TABLE proc_group ADD INDEX(`created_at_trans`,`shop_town_id`);
then run your join statement (#4)
4b)
ALTER TABLE proc_join ADD INDEX(`created_at_trans`,`shop_town_id`);
4c)
ALTER TABLE proc_group ADD INDEX(`ly_date`,`shop_town_id`);
then run your join statement (#5)
Let me know if that speeds up the process. My next thought would be to create a "join" field in those data tables by concating the two fields you are joining on CONCAT(`dates`,`shop_town_id`) as `join`
You would then want to index that new field prior to the join.
Good luck to you, let me know how it turns out.
2
Answers
-
You can improve the performance by adding an index prior to the joins. You have to add a transform for each table you are indexing.
3b)
ALTER TABLE proc1 ADD INDEX(`dates`,`shop_town_id`);
3c)
ALTER TABLE proc_group ADD INDEX(`created_at_trans`,`shop_town_id`);
then run your join statement (#4)
4b)
ALTER TABLE proc_join ADD INDEX(`created_at_trans`,`shop_town_id`);
4c)
ALTER TABLE proc_group ADD INDEX(`ly_date`,`shop_town_id`);
then run your join statement (#5)
Let me know if that speeds up the process. My next thought would be to create a "join" field in those data tables by concating the two fields you are joining on CONCAT(`dates`,`shop_town_id`) as `join`
You would then want to index that new field prior to the join.
Good luck to you, let me know how it turns out.
2 -
@ST_-Superman-_ gives very solid advice.
I offer one more suggestion to possibly speed things up for you. I have a similar transform process in one of my dataflows where I combine a list of all days with a list of all customers to get an array of all possible customer-day combinations. Instead of doing date math using your self-generated list of n days ago, I used Domo's Domo Dimensions connector (https://yourinstance.domo.com/appstore/connectors/com.domo.connector.domodimension) to bring to our data center a calendar file that has all days in it. Then I just join that calendar against a list of all customers to generate the customer-day combinations:
SELECT
a.dt
,b.customer_id
FROM
(select distinct customer_id from customers) a
,domo_calendar b
--notice no join condition since you want a cartesian product, but you could still use CROSS JOIN if you wanted
I think it's really user-friendly this way.
Aaron
MajorDomo @ Merit Medical
**Say "Thanks" by clicking the heart in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"1
Categories
- All Categories
- 1.8K Product Ideas
- 1.8K Ideas Exchange
- 1.6K Connect
- 1.2K Connectors
- 300 Workbench
- 6 Cloud Amplifier
- 9 Federated
- 2.9K Transform
- 102 SQL DataFlows
- 626 Datasets
- 2.2K Magic ETL
- 3.9K Visualize
- 2.5K Charting
- 753 Beast Mode
- 61 App Studio
- 41 Variables
- 692 Automate
- 177 Apps
- 456 APIs & Domo Developer
- 49 Workflows
- 10 DomoAI
- 38 Predict
- 16 Jupyter Workspaces
- 22 R & Python Tiles
- 398 Distribute
- 115 Domo Everywhere
- 276 Scheduled Reports
- 7 Software Integrations
- 130 Manage
- 127 Governance & Security
- 8 Domo Community Gallery
- 38 Product Releases
- 11 Domo University
- 5.4K Community Forums
- 40 Getting Started
- 30 Community Member Introductions
- 110 Community Announcements
- 4.8K Archive