brief introduction
In the past few days, we have to move the report query statements in the old project from oracle database. Along the way, we have summarized some optimization experience and written it down for your reference
Original query statement
This is not the initial query. The initial query is an oracle statement, which must be translated into mysql statement first. Here are the steps of Translation:
1. All connections with (+) become external connections when they are converted to mysql
2. All functions that differ in mysql should be converted
3. The start with tree query statement has no corresponding query method in mysql. Finally, manually perform this associated query first, and then place the query results in large sql statements.
This is the result of the conversion. This statement is to be run in the Groovy script, so ${orgId} and? It's parameter information. You can ignore it.
SELECT '' id, '' rpt_baseinfo_id, '' rpt_no, '' rpt_name, '' rpt_cyc, '' rpt_ym, '' org_no, cons_name, cons_no, tg_name, org_name, resource_req_name, install_name, started_time, last_state_time, mobile, mad_no, meter_type, latest_chk_date, rcvble_amt, electype, volte_code, metercap, app_run_cap, app_type_code, '' remark1, '' remark2, '' remark3, '' stand1, '' stand2, '' stand3 FROM ( SELECT cons.cons_name cons_name, a.cons_no cons_no, g.tg_name tg_name, o.name org_name, u.user_name resource_req_name, u.user_name install_name, date_format(ah.START_TIME_, '%Y-%m-%d') started_time, date_format(ah.END_TIME_, '%Y-%m-%d') last_state_time, t.mobile mobile, d.made_no mad_no, p.name meter_type, date_format(d.latest_chk_date, '%Y-%m-%d') latest_chk_date, 0 rcvble_amt, p1.name electype, p2.name volte_code, p3.name metercap, a.app_run_cap app_run_cap, p4.name app_type_code FROM (act_hi_taskinst ah, mp_info c, cons_info cons, tg_info g, org_info o, contact_info t, meter_mp_rela rela, meter_info d) INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE a.cons_id = c.cons_id AND a.cust_id = t.cust_id AND c.id = rela.mp_id AND rela.meter_id = d.id AND cons.org_no = o.id AND a.main_app_id IS NULL AND c.cons_id = cons.id AND c.tg_id = g.id AND c.mp_level = 1 AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId}) UNION ALL SELECT cons.cons_name cons_name, a.cons_no cons_no, g.tg_name tg_name, o.name org_name, u.user_name resource_req_name, u.user_name install_name, date_format(ah.START_TIME_, '%Y-%m-%d') started_time, date_format(ah.END_TIME_, '%Y-%m-%d') last_state_time, t.mobile mobile, d.made_no mad_no, p.name meter_type, date_format(d.latest_chk_date, '%Y-%m-%d') latest_chk_date, 0 rcvble_amt, p1.name electype, p2.name volte_code, p3.name metercap, a.app_run_cap app_run_cap, p4.name app_type_code FROM (act_hi_taskinst ah, batch_app_arc batch, mp_info c, cons_info cons, tg_info g, org_info o, contact_info t, meter_mp_rela rela, meter_info d) INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE a.id = batch.app_id AND a.cons_id = c.cons_id AND a.cust_id = t.cust_id AND c.id = rela.mp_id AND rela.meter_id = d.id AND cons.org_no = o.id AND a.main_app_id IS NULL AND c.cons_id = cons.id AND c.tg_id = g.id AND c.mp_level = 1 AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId}) ) t ORDER BY org_name, cons_no, last_state_time;
analysis
Before optimization, first analyze the structure, which is still relatively clear.
SELECT * FROM( SELECT * FROM xxx a UNION ALL SELECT * FROM xxx b )
First, simply understand the above structure. Our main optimization objects are the two query statements connected by UNION ALL. Since the processing methods of these two queries are similar, I will only show the optimization method of one of them.
Take it out first.
SELECT cons.cons_name cons_name, a.cons_no cons_no, g.tg_name tg_name, o.name org_name, u.user_name resource_req_name, u.user_name install_name, date_format(ah.START_TIME_, '%Y-%m-%d') started_time, date_format(ah.END_TIME_, '%Y-%m-%d') last_state_time, t.mobile mobile, d.made_no mad_no, p.name meter_type, date_format(d.latest_chk_date, '%Y-%m-%d') latest_chk_date, 0 rcvble_amt, p1.name electype, p2.name volte_code, p3.name metercap, a.app_run_cap app_run_cap, p4.name app_type_code FROM (act_hi_taskinst ah, mp_info c, cons_info cons, tg_info g, org_info o, contact_info t, meter_mp_rela rela, meter_info d) INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE a.cons_id = c.cons_id AND a.cust_id = t.cust_id AND c.id = rela.mp_id AND rela.meter_id = d.id AND cons.org_no = o.id AND a.main_app_id IS NULL AND c.cons_id = cons.id AND c.tg_id = g.id AND c.mp_level = 1 AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId})
Let me talk about the optimization idea:
- All statements appearing in where should be transformed into INNER JOIN with related tables, because direct INNER JOIN will directly perform Cartesian product multiplication. The execution sequence of sql statements is to execute JOIN first and then execute where condition. However, in the above sql, the data of several tables has reached 300000, so we must not put them together directly with such INNER JOIN, which has a great impact on the operation efficiency.
- All INNER JOIN statements should be placed before the LEFT JOIN, because when you use the LEFT JOIN, you usually need to query the additional information of a record. If the INNER JOIN is first, many records can be filtered out. But if the LEFT JOIN is first, it is equivalent to querying additional information for some records that need to be filtered out later. This is unnecessary.
- Conditions that can filter out data should be used as soon as possible.
Start optimization
At the beginning of the conversion, it was very big, because there were so many tables associated with each other, and there were A lot of filter conditions. But then I found some tricks: start with the directly connected table and the conditions behind WHERE. If the filter conditions associated with the direct INNER JOIN table A in the WHERE condition are all constants, or if it is associated with another table, table A can be converted to INNER JOIN immediately
For example, the following tg_info,org_info,contact_info,meter_info these tables only find one filter condition in the Where condition
SELECT * -- Don't consider the query fields first FROM (act_hi_taskinst ah, mp_info c, cons_info cons, tg_info g, -- A filter condition org_info o, -- A filter condition contact_info t, -- A filter condition meter_mp_rela rela, meter_info d) -- A filter condition INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE a.cons_id = c.cons_id AND a.cust_id = t.cust_id -- Filter conditions that need attention AND c.id = rela.mp_id AND rela.meter_id = d.id -- Filter conditions that need attention AND cons.org_no = o.id -- Filter conditions that need attention AND a.main_app_id IS NULL AND c.cons_id = cons.id AND c.tg_id = g.id -- Filter conditions that need attention AND c.mp_level = 1 AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId})
Before turning into an INNER JOIN statement, if the association filter between these tables appears in the filter criteria, you need to pay attention to the order of inner joins. However, these four tables are not related to each other, so they can be converted directly. Of course, a principle is also followed. Small tables come first and large tables come later. I first query the data volume of these tables, and then sort the INNER JOIN reasonably according to the data volume
SELECT * -- Don't consider the query fields first FROM (act_hi_taskinst ah, mp_info c, cons_info cons, -- A filter condition meter_mp_rela rela) -- A filter condition INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN tg_info g ON c.tg_id = g.id INNER JOIN org_info o ON cons.org_no = o.id INNER JOIN contact_info t ON a.cust_id = t.cust_id INNER JOIN meter_info d ON rela.meter_id = d.id LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE a.cons_id = c.cons_id AND c.id = rela.mp_id -- Filter conditions that need attention AND a.main_app_id IS NULL AND c.cons_id = cons.id -- Filter conditions that need attention AND c.mp_level = 1 AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId})
It's comfortable to kill four watches in one breath. Then analyze whether there are any remaining internal connections that can be adjusted. At this time, it is found that cons_info and meter_mp_rela can be converted again. However, we should pay attention to the order with other inner joins. For example, INNER JOIN org_info o ON cons. org_ The no = o.id statement uses cons_info table, so cons_ The INNER JOIN of info table should be in org_info table before
SELECT * -- Don't consider the query fields first FROM (act_hi_taskinst ah, mp_info c) -- sure INNER JOIN INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN tg_info g ON c.tg_id = g.id INNER JOIN cons_info cons ON c.cons_id = cons.id INNER JOIN org_info o ON cons.org_no = o.id INNER JOIN contact_info t ON a.cust_id = t.cust_id INNER JOIN meter_mp_rela rela ON c.id = rela.mp_id INNER JOIN meter_info d ON rela.meter_id = d.id LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE a.cons_id = c.cons_id -- Conditions requiring attention AND a.main_app_id IS NULL AND c.mp_level = 1 -- Conditions requiring attention AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId})
Convert mp_info table, note: it should be placed in app_ After the arc table, put it in tg_info,cons_info,meter_ mp_ In front of rela and other tables
SELECT * -- Don't consider the query fields first FROM act_hi_taskinst ah INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN mp_info c ON a.cons_id = c.cons_id AND c.mp_level = 1 INNER JOIN tg_info g ON c.tg_id = g.id INNER JOIN cons_info cons ON c.cons_id = cons.id INNER JOIN org_info o ON cons.org_no = o.id INNER JOIN contact_info t ON a.cust_id = t.cust_id INNER JOIN meter_mp_rela rela ON c.id = rela.mp_id INNER JOIN meter_info d ON rela.meter_id = d.id LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code WHERE AND a.main_app_id IS NULL AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) AND a.ps_org_no in (${orgId})
Finally, according to the principle that the conditions that can filter data should be used as soon as possible, the last table and other filter conditions will be added to the JOIN
SELECT * -- Don't consider the query fields first FROM act_hi_taskinst ah INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) AND ah.TASK_DEF_KEY_ = 'installInfoInput' AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d') AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH) INNER JOIN app_arc a ON ahp.BUSINESS_KEY_ = a.id AND a.main_app_id IS NULL AND a.ps_org_no in (${orgId}) INNER JOIN procinst ahp ON ahp.ID_ = ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL) INNER JOIN mp_info c ON a.cons_id = c.cons_id AND c.mp_level = 1 INNER JOIN tg_info g ON c.tg_id = g.id INNER JOIN cons_info cons ON c.cons_id = cons.id INNER JOIN org_info o ON cons.org_no = o.id INNER JOIN contact_info t ON a.cust_id = t.cust_id INNER JOIN meter_mp_rela rela ON c.id = rela.mp_id INNER JOIN meter_info d ON rela.meter_id = d.id LEFT JOIN user_info u ON ah.ASSIGNEE_=u.login_name LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code
So far, all the WHERE conditions have been eliminated and all of them have been converted to JOIN conditions.
Next optimization direction:
- Sort the INNER JOIN according to the amount of data in the table. The small table comes first and the large table comes last. But don't forget the order requirements when the INNER JOIN involves other tables.
- Add the appropriate index according to the filter criteria. It is worth noting that ah END_ TIME_ Is the time. If a common index is added to this field, it will not go through the index here. It must be forced to go through the index, that is, in the from act_ hi_ Add FORCE INDEX (ACT_IDX_HI_TASK_INST_PROCINST,act_hi_taskinst_END_TIME_index) after taskinst ah to let it go through its own index. I have formulated two here.
So far, the optimization has been completed, and the optimization effect is obvious. The original 5-minute run has not been completed, but now it is over in a few seconds