mysql complex multi table query optimization case

brief introduction

In the past few days, we have to move the report query statements in the old project from oracle database. Along the way, we have summarized some optimization experience and written it down for your reference

Original query statement

This is not the initial query. The initial query is an oracle statement, which must be translated into mysql statement first. Here are the steps of Translation:
1. All connections with (+) become external connections when they are converted to mysql
2. All functions that differ in mysql should be converted
3. The start with tree query statement has no corresponding query method in mysql. Finally, manually perform this associated query first, and then place the query results in large sql statements.
This is the result of the conversion. This statement is to be run in the Groovy script, so ${orgId} and? It's parameter information. You can ignore it.

SELECT
  '' id,
  '' rpt_baseinfo_id,
  '' rpt_no,
  '' rpt_name,
  '' rpt_cyc,
  '' rpt_ym,
  '' org_no,
  cons_name,
  cons_no,
  tg_name,
  org_name,
  resource_req_name,
  install_name,
  started_time,
  last_state_time,
  mobile,
  mad_no,
  meter_type,
  latest_chk_date,
  rcvble_amt,
  electype,
  volte_code,
  metercap,
  app_run_cap,
  app_type_code,
  '' remark1,
  '' remark2,
  '' remark3,
  '' stand1,
  '' stand2,
  '' stand3
FROM
  (
    SELECT
      cons.cons_name cons_name,
      a.cons_no cons_no,
      g.tg_name tg_name,
      o.name org_name,
      u.user_name resource_req_name,
      u.user_name install_name,
      date_format(ah.START_TIME_, '%Y-%m-%d') started_time,
      date_format(ah.END_TIME_, '%Y-%m-%d') last_state_time,
      t.mobile mobile,
      d.made_no mad_no,
      p.name meter_type,
      date_format(d.latest_chk_date, '%Y-%m-%d') latest_chk_date,
      0 rcvble_amt,
      p1.name electype,
      p2.name volte_code,
      p3.name metercap,
      a.app_run_cap app_run_cap,
      p4.name app_type_code
    FROM
      (act_hi_taskinst ah,
      mp_info c,
      cons_info cons,
      tg_info g,
      org_info o,
      contact_info t,
      meter_mp_rela rela,
      meter_info d)
    INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
    INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
    LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
    LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
    LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
    LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
    LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
    LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code 
    WHERE
      a.cons_id = c.cons_id
      AND a.cust_id = t.cust_id
      AND c.id = rela.mp_id
      AND rela.meter_id = d.id
      AND cons.org_no = o.id
      AND a.main_app_id IS NULL
      AND c.cons_id = cons.id
      AND c.tg_id = g.id
      AND c.mp_level = 1
      AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
      AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
      AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
      AND a.ps_org_no in (${orgId})
    UNION ALL
    SELECT
      cons.cons_name cons_name,
      a.cons_no cons_no,
      g.tg_name tg_name,
      o.name org_name,
      u.user_name resource_req_name,
      u.user_name install_name,
      date_format(ah.START_TIME_, '%Y-%m-%d') started_time,
      date_format(ah.END_TIME_, '%Y-%m-%d') last_state_time,
      t.mobile mobile,
      d.made_no mad_no,
      p.name meter_type,
      date_format(d.latest_chk_date, '%Y-%m-%d') latest_chk_date,
      0 rcvble_amt,
      p1.name electype,
      p2.name volte_code,
      p3.name metercap,
      a.app_run_cap app_run_cap,
      p4.name app_type_code
    FROM
      (act_hi_taskinst ah,
      batch_app_arc batch,
      mp_info c,
      cons_info cons,
      tg_info g,
      org_info o,
      contact_info t,
      meter_mp_rela rela,
      meter_info d)
    INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
    INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
    LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
    LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
    LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
    LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
    LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
    LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code  
    WHERE
      a.id = batch.app_id
      AND a.cons_id = c.cons_id
      AND a.cust_id = t.cust_id
      AND c.id = rela.mp_id
      AND rela.meter_id = d.id
      AND cons.org_no = o.id
      AND a.main_app_id IS NULL
      AND c.cons_id = cons.id
      AND c.tg_id = g.id
      AND c.mp_level = 1
      AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
      AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
      AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
      AND a.ps_org_no in (${orgId})
  ) t
ORDER BY
  org_name,
  cons_no,
  last_state_time;

analysis

Before optimization, first analyze the structure, which is still relatively clear.

SELECT * FROM(
	SELECT * FROM xxx a
	UNION ALL
	SELECT * FROM xxx b
)

First, simply understand the above structure. Our main optimization objects are the two query statements connected by UNION ALL. Since the processing methods of these two queries are similar, I will only show the optimization method of one of them.
Take it out first.

SELECT
 cons.cons_name cons_name,
  a.cons_no cons_no,
  g.tg_name tg_name,
  o.name org_name,
  u.user_name resource_req_name,
  u.user_name install_name,
  date_format(ah.START_TIME_, '%Y-%m-%d') started_time,
  date_format(ah.END_TIME_, '%Y-%m-%d') last_state_time,
  t.mobile mobile,
  d.made_no mad_no,
  p.name meter_type,
  date_format(d.latest_chk_date, '%Y-%m-%d') latest_chk_date,
  0 rcvble_amt,
  p1.name electype,
  p2.name volte_code,
  p3.name metercap,
  a.app_run_cap app_run_cap,
  p4.name app_type_code
FROM
  (act_hi_taskinst ah,
  mp_info c,
  cons_info cons,
  tg_info g,
  org_info o,
  contact_info t,
  meter_mp_rela rela,
  meter_info d)
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code 
WHERE
  a.cons_id = c.cons_id
  AND a.cust_id = t.cust_id
  AND c.id = rela.mp_id
  AND rela.meter_id = d.id
  AND cons.org_no = o.id
  AND a.main_app_id IS NULL
  AND c.cons_id = cons.id
  AND c.tg_id = g.id
  AND c.mp_level = 1
  AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
  AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
  AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
  AND a.ps_org_no in (${orgId})

Let me talk about the optimization idea:

  1. All statements appearing in where should be transformed into INNER JOIN with related tables, because direct INNER JOIN will directly perform Cartesian product multiplication. The execution sequence of sql statements is to execute JOIN first and then execute where condition. However, in the above sql, the data of several tables has reached 300000, so we must not put them together directly with such INNER JOIN, which has a great impact on the operation efficiency.
  2. All INNER JOIN statements should be placed before the LEFT JOIN, because when you use the LEFT JOIN, you usually need to query the additional information of a record. If the INNER JOIN is first, many records can be filtered out. But if the LEFT JOIN is first, it is equivalent to querying additional information for some records that need to be filtered out later. This is unnecessary.
  3. Conditions that can filter out data should be used as soon as possible.

Start optimization

At the beginning of the conversion, it was very big, because there were so many tables associated with each other, and there were A lot of filter conditions. But then I found some tricks: start with the directly connected table and the conditions behind WHERE. If the filter conditions associated with the direct INNER JOIN table A in the WHERE condition are all constants, or if it is associated with another table, table A can be converted to INNER JOIN immediately
For example, the following tg_info,org_info,contact_info,meter_info these tables only find one filter condition in the Where condition

SELECT * -- Don't consider the query fields first
FROM
  (act_hi_taskinst ah,
  mp_info c,
  cons_info cons,
  tg_info g, -- A filter condition
  org_info o, -- A filter condition
  contact_info t, -- A filter condition
  meter_mp_rela rela,
  meter_info d) -- A filter condition
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code 
WHERE
  a.cons_id = c.cons_id
  AND a.cust_id = t.cust_id -- Filter conditions that need attention
  AND c.id = rela.mp_id
  AND rela.meter_id = d.id -- Filter conditions that need attention
  AND cons.org_no = o.id -- Filter conditions that need attention
  AND a.main_app_id IS NULL
  AND c.cons_id = cons.id
  AND c.tg_id = g.id  -- Filter conditions that need attention
  AND c.mp_level = 1
  AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
  AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
  AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
  AND a.ps_org_no in (${orgId})

Before turning into an INNER JOIN statement, if the association filter between these tables appears in the filter criteria, you need to pay attention to the order of inner joins. However, these four tables are not related to each other, so they can be converted directly. Of course, a principle is also followed. Small tables come first and large tables come later. I first query the data volume of these tables, and then sort the INNER JOIN reasonably according to the data volume

SELECT * -- Don't consider the query fields first
FROM
  (act_hi_taskinst ah,
  mp_info c,
  cons_info cons, -- A filter condition 
  meter_mp_rela rela) -- A filter condition
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN tg_info g ON c.tg_id = g.id
INNER JOIN org_info o ON cons.org_no = o.id
INNER JOIN contact_info t ON a.cust_id = t.cust_id
INNER JOIN meter_info d ON rela.meter_id = d.id
LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code 
WHERE
  a.cons_id = c.cons_id
  AND c.id = rela.mp_id -- Filter conditions that need attention
  AND a.main_app_id IS NULL
  AND c.cons_id = cons.id -- Filter conditions that need attention
  AND c.mp_level = 1
  AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
  AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
  AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
  AND a.ps_org_no in (${orgId})

It's comfortable to kill four watches in one breath. Then analyze whether there are any remaining internal connections that can be adjusted. At this time, it is found that cons_info and meter_mp_rela can be converted again. However, we should pay attention to the order with other inner joins. For example, INNER JOIN org_info o ON cons. org_ The no = o.id statement uses cons_info table, so cons_ The INNER JOIN of info table should be in org_info table before

SELECT * -- Don't consider the query fields first
FROM
  (act_hi_taskinst ah,
  mp_info c) -- sure INNER JOIN
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN tg_info g ON c.tg_id = g.id
INNER JOIN cons_info cons ON c.cons_id = cons.id
INNER JOIN org_info o ON cons.org_no = o.id
INNER JOIN contact_info t ON a.cust_id = t.cust_id
INNER JOIN meter_mp_rela rela ON c.id = rela.mp_id
INNER JOIN meter_info d ON rela.meter_id = d.id
LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code 
WHERE
  a.cons_id = c.cons_id -- Conditions requiring attention
  AND a.main_app_id IS NULL
  AND c.mp_level = 1 -- Conditions requiring attention
  AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
  AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
  AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
  AND a.ps_org_no in (${orgId})

Convert mp_info table, note: it should be placed in app_ After the arc table, put it in tg_info,cons_info,meter_ mp_ In front of rela and other tables

SELECT * -- Don't consider the query fields first
FROM act_hi_taskinst ah
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN mp_info c ON a.cons_id = c.cons_id AND c.mp_level = 1
INNER JOIN tg_info g ON c.tg_id = g.id
INNER JOIN cons_info cons ON c.cons_id = cons.id
INNER JOIN org_info o ON cons.org_no = o.id
INNER JOIN contact_info t ON a.cust_id = t.cust_id
INNER JOIN meter_mp_rela rela ON c.id = rela.mp_id
INNER JOIN meter_info d ON rela.meter_id = d.id
LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code 
WHERE
  AND a.main_app_id IS NULL
  AND ah.TASK_DEF_KEY_ = 'installInfoInput' 
  AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
  AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
  AND a.ps_org_no in (${orgId})

Finally, according to the principle that the conditions that can filter data should be used as soon as possible, the last table and other filter conditions will be added to the JOIN

SELECT * -- Don't consider the query fields first
FROM act_hi_taskinst ah
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ 
	AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
	AND ah.TASK_DEF_KEY_ = 'installInfoInput'
	AND ah.END_TIME_ >= date_format(?, '%Y-%m-%d')
  	AND ah.END_TIME_ < DATE_ADD(?, INTERVAL 1 MONTH)
INNER JOIN app_arc a ON ahp.BUSINESS_KEY_  = a.id
	AND a.main_app_id IS NULL
	AND a.ps_org_no in (${orgId})
INNER JOIN procinst ahp ON ahp.ID_ =  ah.PROC_INST_ID_ AND (ahp.DELETE_REASON_ <> 'manager' OR ahp.DELETE_REASON_ IS NULL)
INNER JOIN mp_info c ON a.cons_id = c.cons_id AND c.mp_level = 1
INNER JOIN tg_info g ON c.tg_id = g.id
INNER JOIN cons_info cons ON c.cons_id = cons.id
INNER JOIN org_info o ON cons.org_no = o.id
INNER JOIN contact_info t ON a.cust_id = t.cust_id
INNER JOIN meter_mp_rela rela ON c.id = rela.mp_id
INNER JOIN meter_info d ON rela.meter_id = d.id
LEFT JOIN user_info u ON  ah.ASSIGNEE_=u.login_name
LEFT JOIN code_msg p ON p.code_type = 'meterModelNo' AND p.value = d.model_code
LEFT JOIN code_msg p1 ON p1.code_type = 'elecTypeCode' AND p1.value = a.elec_type_code
LEFT JOIN code_msg p2 ON p2.code_type = 'psVoltCode' AND p2.value = a.volt_code
LEFT JOIN code_msg p3 ON p3.code_type = 'meterRcSort' AND p3.value = d.rated_current 
LEFT JOIN code_msg p4 ON p4.code_type = 'busiTypeCode' AND p4.value = a.app_type_code

So far, all the WHERE conditions have been eliminated and all of them have been converted to JOIN conditions.
Next optimization direction:

  1. Sort the INNER JOIN according to the amount of data in the table. The small table comes first and the large table comes last. But don't forget the order requirements when the INNER JOIN involves other tables.
  2. Add the appropriate index according to the filter criteria. It is worth noting that ah END_ TIME_ Is the time. If a common index is added to this field, it will not go through the index here. It must be forced to go through the index, that is, in the from act_ hi_ Add FORCE INDEX (ACT_IDX_HI_TASK_INST_PROCINST,act_hi_taskinst_END_TIME_index) after taskinst ah to let it go through its own index. I have formulated two here.

So far, the optimization has been completed, and the optimization effect is obvious. The original 5-minute run has not been completed, but now it is over in a few seconds

Tags: MySQL SQL Oracle

Posted by someone2088 on Sat, 16 Apr 2022 08:13:12 +0930