MySQL index for advanced SQL optimization

An overview of the index

1. The advantages of indexing

Why create an index? Because an index can greatly improve query performance. Without an index, a query has to scan the table starting from the first row; with one, the desired rows can be found much more quickly.

  • First, a unique index guarantees the uniqueness of each row in the table.
  • Second, an index greatly speeds up data retrieval, which is the main reason for creating one.
  • Third, an index speeds up joins between tables, which is especially useful when enforcing referential integrity.
  • Fourth, when a query uses grouping or sorting clauses, an index can significantly reduce the time spent grouping and sorting.
  • Fifth, with indexes available, the query optimizer has more options for choosing an efficient execution plan, which improves overall system performance.
2. Disadvantages of Indexing

Some may ask: with so many advantages, why not create an index on every column of the table? The idea is understandable but one-sided. Indexing every column is unwise, because adding indexes also has a number of downsides:

  • First, creating and maintaining indexes takes time, and this time grows with the amount of data.
  • Second, indexes occupy physical storage space. In addition to the space for the table data, each index takes up a certain amount of physical space, and a clustered index requires even more.
  • Third, when rows in the table are inserted, deleted, or updated, the indexes must be maintained as well, which slows down data modification.
3. What kind of fields are suitable for creating indexes

Indexes are built on specific columns of a table. When creating an index, consider carefully which columns should be indexed and which should not. Indexes are worth creating:

  • First, on columns that are searched often, to speed up searches;
  • Second, on the primary key column, to enforce its uniqueness and organize the arrangement of data in the table;
  • Third, on columns that are often used in joins, mainly foreign keys, to speed up the joins;
  • Fourth, on columns that are often searched by range, because the index is already sorted and the specified range is contiguous;
  • Fifth, on columns that often need sorting, because the index is already sorted and queries can use that order to reduce sorting time;
  • Sixth, on columns that are often used in WHERE clauses, to speed up condition evaluation. Indexes are generally designed around the WHERE conditions of SELECT statements. For example, if a query filters on both f1 and f2, an index on f1 alone or on f2 alone helps far less than a single composite index on (f1, f2).
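The point about matching an index to the WHERE clause can be sketched as follows. This is only an illustration: the table `orders` and the index name `idx_f1_f2` are hypothetical, and on a very small table the optimizer may still choose a full scan.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT,
    f1 INT NOT NULL,
    f2 INT NOT NULL,
    PRIMARY KEY (id)
);

-- One composite index covering both columns of the WHERE clause.
CREATE INDEX idx_f1_f2 ON orders (f1, f2);

-- Both conditions can be resolved through idx_f1_f2.
EXPLAIN SELECT * FROM orders WHERE f1 = 1 AND f2 = 2;
```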
4. What kind of fields are not suitable for creating indexes:

Also, some columns should not be indexed. In general, these columns that should not be indexed have the following characteristics:

  • First, columns that are rarely used in queries should not be indexed. Whether or not an index exists makes no difference to query speed for them, while the extra index slows down maintenance and increases the space requirement.
  • Second, columns with very few distinct values should not be indexed. For such columns, for example the gender column of a personnel table, the rows matching a query make up a large proportion of all rows in the table, so an index does not noticeably speed up retrieval.
  • Third, columns defined with the text, image, and bit data types should not be indexed, because such columns either hold a very large amount of data or have very few distinct values.
  • Fourth, when write (modification) performance matters far more than read (retrieval) performance, indexes should not be created. The two goals conflict: adding indexes improves retrieval performance but slows modifications, and removing indexes does the reverse.
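One rough way to judge whether a column has "few data values" is to measure its selectivity, the ratio of distinct values to rows. The sketch below assumes a hypothetical `personnel` table with `gender` and `email` columns; neither name comes from the article.

```sql
-- Hypothetical table: personnel(id, gender, email).
-- A ratio close to 1 means almost every value is distinct (a good index candidate);
-- a ratio close to 0 means few distinct values (a poor candidate).
SELECT
    COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity,
    COUNT(DISTINCT email)  / COUNT(*) AS email_selectivity
FROM personnel;
```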

Second, the types of indexes in MySQL

1. Single column index

An index on a single column.

  • Ordinary index: no special restrictions; its main purpose is to make queries faster.
  • Unique index: values in the column cannot be repeated but can be NULL.
  • Primary key index: values in the column cannot be repeated and cannot be NULL.
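The three single-column index types can be declared side by side, as in this minimal sketch. The table `member` and the index names are hypothetical, chosen only for illustration.

```sql
-- Hypothetical table for illustration.
CREATE TABLE member (
    id INT NOT NULL AUTO_INCREMENT,
    email VARCHAR(100),
    nickname VARCHAR(50),
    PRIMARY KEY (id),                 -- primary key index: unique, NULL not allowed
    UNIQUE INDEX uk_email (email),    -- unique index: no duplicates, NULL allowed
    INDEX idx_nickname (nickname)     -- ordinary index: no uniqueness restriction
);

-- Lists PRIMARY, uk_email and idx_nickname.
SHOW INDEX FROM member;
```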
2. Composite Index

An index on two or more columns of a table. Queries must satisfy the leftmost-prefix rule to use it.

3. Full-text indexing (full-text)
  • Before MySQL 5.6, only available in the MyISAM storage engine; from 5.6 onward, InnoDB supports it as well
  • Full-text indexes can only be created on char, varchar, and text columns
  • A full-text index extracts the keywords from a column's content and builds the index from those keywords. It is intended for searches that would otherwise rely on fuzzy LIKE matching, whose low efficiency it is designed to address.

3. Index Management in MySQL

To create an index, you can specify it when creating the table or add it later. Note that a primary key constraint automatically gets a primary key index, and a unique constraint automatically gets a unique index. In MySQL, the statements for viewing and dropping indexes are the same for all index types.

1. Ordinary index

This is the most basic index and has no restrictions. It is the default BTREE index type in MyISAM and the index used in most cases.

  • create index
CREATE INDEX index_name ON table_name (column(length));
ALTER TABLE table_name ADD INDEX index_name (column(length));
CREATE TABLE table_name (id int not null auto_increment,title varchar(30),PRIMARY KEY(id),INDEX index_name(title(5)));
 
show index from test;
create index normal_index on test(title);
# View the execution plan to get a feel for the query cost
explain select * from test where title='plus or minus zero 0';
  • view index
SHOW INDEX FROM [table_name];
SHOW KEYS FROM [table_name]; # KEYS can be used as a synonym for INDEX in MySQL
  • drop index
DROP INDEX index_name ON table_name;
ALTER TABLE table_name DROP INDEX index_name;
ALTER TABLE table_name DROP PRIMARY KEY;
2. Unique Index

Similar to a normal index, except that the values in the indexed column must be unique, although NULL values are allowed (note that this differs from the primary key). For a composite index, the combination of column values must be unique. It is created in much the same way as a normal index.

  • create index
CREATE UNIQUE INDEX index_name ON table_name (column(length));
ALTER TABLE table_name ADD UNIQUE INDEX index_name (column(length));
CREATE TABLE table_name (id int not null auto_increment,title varchar(30),PRIMARY KEY(id),UNIQUE INDEX index_name(title(5)));

create unique index unique_index on teacher(name);
explain select * from teacher where name='teacher 3';
drop index unique_index on teacher;
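The difference between "unique" and "not NULL" can be seen in a small sketch. The table `demo_unique` and the index `uk_code` are hypothetical names used only for this illustration.

```sql
-- Hypothetical table for illustration.
CREATE TABLE demo_unique (
    id INT NOT NULL AUTO_INCREMENT,
    code VARCHAR(20),
    PRIMARY KEY (id),
    UNIQUE INDEX uk_code (code)
);

INSERT INTO demo_unique (code) VALUES (NULL);  -- accepted
INSERT INTO demo_unique (code) VALUES (NULL);  -- also accepted: NULLs are not treated as equal to each other
INSERT INTO demo_unique (code) VALUES ('A1');  -- accepted
INSERT INTO demo_unique (code) VALUES ('A1');  -- rejected with a duplicate-key error
```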
3. Full-text indexing

MySQL has supported full-text indexing and full-text search since version 3.23.23. Before MySQL 5.6, FULLTEXT indexes could only be used with MyISAM tables; from MySQL 5.6 onward they can also be used with InnoDB tables. They can be created on CHAR, VARCHAR, or TEXT columns as part of a CREATE TABLE statement, or added later with ALTER TABLE or CREATE INDEX. For larger data sets, loading the data into a table without a FULLTEXT index and then creating the index is faster than loading the data into a table that already has one. Keep in mind, however, that building a full-text index on a large table takes a lot of time and disk space. By default, full-text indexing does not handle Chinese well: a Chinese search only matches from the leftmost position, whereas an English search can match words in the middle of the text.

  • create index
CREATE FULLTEXT INDEX index_name ON table_name(column);
ALTER TABLE table_name ADD FULLTEXT index_name(column);
CREATE TABLE table_name(id int not null auto_increment,title varchar(30),PRIMARY KEY(id),FULLTEXT index_name(title));
  • A full-text index is used through match(column, column) against('content').
    • The columns in match must be exactly the columns the full-text index was created on. For example, if the full-text index was created on (id,name) and the query uses match(name), the index cannot be used, and a separate full-text index on the name column must be created.
alter table teacher add column(address varchar(200));
update teacher set address='Beijing Haidian';
create fulltext index full_index3 on teacher(name,address);
explain select * from teacher where match(address) against('Beijing');
select * from teacher where match(address) against('bei' in NATURAL LANGUAGE mode);
drop index full_index3 on teacher;
    • AGAINST accepts three search modes:
    • Natural language mode: IN NATURAL LANGUAGE MODE
    • Boolean mode: IN BOOLEAN MODE
    • Query expansion mode: WITH QUERY EXPANSION
    • Natural language mode
update teacher set address='Shanxi Jincheng' where id=2;
update teacher set address='Shanxi Jinzhong' where id=3;
create fulltext index full_index_address on teacher(address);
show index from teacher;
select * from teacher;
# A search for the full value (e.g. 'Beijing Changping') finds the row, but a search for only part of it (e.g. 'Beijing') may not
explain select * from teacher where match(address) against('Shanxi Jincheng' in natural language mode);
    • Boolean mode: supports special operators. Even columns without a full-text index can be searched in this mode, but very slowly. Matching starts from the left of each word, so a word can be found by its beginning but not by its ending.
+ The word must be present (rows that do not contain it are ignored).
- The word must not be present (rows that contain it are excluded).
> Increases the weight of rows that match the word
< Decreases the weight of rows that match the word
~ Makes the word's contribution to relevance negative: rows containing the word rank lower, but unlike - they are not excluded.
* Wildcard. Unlike the other operators, it is appended to the end of the word rather than placed in front of it.
"" A phrase enclosed in double quotation marks must match exactly and cannot be split.

# 'Shanxi*' works, but '*Jincheng' does not
explain select * from teacher where match(address) against('Shanxi*' in boolean mode);
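The operators above can also be combined. The queries below are a sketch that assumes the `teacher` table still has a full-text index on `address` and holds the 'Shanxi Jincheng' and 'Shanxi Jinzhong' rows set up earlier. Which rows actually match also depends on the server's minimum token size and stopword settings.

```sql
-- Rows that contain "Shanxi" but not "Jincheng".
SELECT * FROM teacher
WHERE MATCH(address) AGAINST('+Shanxi -Jincheng' IN BOOLEAN MODE);

-- Rows that contain the exact phrase.
SELECT * FROM teacher
WHERE MATCH(address) AGAINST('"Shanxi Jinzhong"' IN BOOLEAN MODE);

-- "Shanxi" is required; rows that also contain "Jincheng" get a higher score.
SELECT *, MATCH(address) AGAINST('+Shanxi >Jincheng' IN BOOLEAN MODE) AS score
FROM teacher
WHERE MATCH(address) AGAINST('+Shanxi >Jincheng' IN BOOLEAN MODE)
ORDER BY score DESC;
```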
    • Query expansion: the search is widened so that it also returns rows related to the rows that matched the original condition
update teacher set address='oracle beijing changping' where id=1;
update teacher set address='haidian fengtai beijing' where id=2;
update teacher set address='oracle is database' where id=3;
# Only two rows are returned
explain select * from teacher where match(address) against('beijing changping' in natural language mode);
# Three rows are returned: 'oracle' appears in the same row as 'beijing changping', so it is treated as related to the search terms and the search is repeated with 'oracle' included
explain select * from teacher where match(address) against('beijing changping' with query EXPANSION);
  • Chinese word segmentation with ngram

Since Chinese has no spaces between words, MySQL has shipped a built-in ngram parser for Chinese word segmentation since 5.7.6. It splits Chinese text into tokens of a specified size.

    • 1. Add parameters under [mysqld] in my.ini to set the length of word splitting
ngram_token_size=2
    • 2. Create a table and insert data
create table ft (
    id int primary key auto_increment,
    name varchar(20),
    address varchar(200)
);
insert into ft values(1, 'Zhang San', 'No. 85 Courtyard of Building Materials City, Changping District, Beijing');
insert into ft values(2, 'Li Si', 'Shanghai Hongqiao Airport');
insert into ft values(3, 'Wang Wu', 'Baoding, Hebei');
insert into ft values(4, 'Zhao Liu', 'Haidian District, Beijing, People''s Republic of China');
    • 3. Create a full-text index on address. Note the trailing WITH PARSER ngram
create fulltext index index3 on ft(address) with parser ngram;
    • 4. Check whether innodb_ft_aux_table already points to the ft table. If not, set it (in database/table form) so that the table's full-text index data is exposed through the information_schema tables. This only works if the table already has a full-text index.
show variables like '%innodb_ft_aux_table%';
set global innodb_ft_aux_table='optimization/ft';
    • 5. View index information
select * from information_schema.INNODB_FT_INDEX_CACHE;
    • 6. Test. Whatever the search value is, it is split into tokens and matched, and the results are ranked by how well they match.
explain select * from ft where MATCH(address) AGAINST('123 cities in Beijing');
4. Composite Index

A composite index is an index built on at least two columns. Creating a composite index effectively gives you several indexes, one for each leftmost prefix of its columns. So put the most frequently used column on the far left when you build the index.

show index from teacher;
create index mul_index on teacher(name, address);
# Index is used, type=ref
explain select * from teacher where name='teacher 1';
# The leftmost column is missing from the condition, so no index lookup is possible, type=index
explain select * from teacher where address='oracle is a database';
# Index is used, type=ref
explain select * from teacher where name='teacher 1' and address='o';
  • create index
CREATE INDEX index_name ON table_name(column_list);
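The "multiple indexes" in a composite index are its leftmost prefixes. The sketch below uses a hypothetical table `t_abc` and index `idx_a_b_c` to show which conditions line up with those prefixes. The plan you see in practice depends on the MySQL version and the data.

```sql
-- Hypothetical table and index for illustration.
CREATE TABLE t_abc (
    id INT NOT NULL AUTO_INCREMENT,
    a INT,
    b INT,
    c INT,
    PRIMARY KEY (id),
    INDEX idx_a_b_c (a, b, c)
);

-- These conditions start from the leftmost column, so idx_a_b_c can be used for lookups:
EXPLAIN SELECT * FROM t_abc WHERE a = 1;
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2 AND c = 3;

-- These skip the leftmost column, so the leftmost-prefix rule is not satisfied:
EXPLAIN SELECT * FROM t_abc WHERE b = 2;
EXPLAIN SELECT * FROM t_abc WHERE b = 2 AND c = 3;
```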
5. Index optimization in MySQL

Everything above is about the benefits of indexes, but indexes can be overused, and they have drawbacks. Although an index greatly improves query speed, it slows down updates to the table, because on every INSERT, UPDATE, and DELETE MySQL has to save not only the data but also the index. If a table receives more INSERT, UPDATE, and DELETE operations than queries, consider doing without the index. Indexes also consume disk space. This is usually not a serious problem, but if you create several composite indexes on a large table, the index files grow very quickly. Indexes are only one of the factors that improve efficiency. If your MySQL has tables with large amounts of data, you need to spend time working out the best indexes or optimizing the query statements.

  • Use a short index (prefix index): when indexing a string column, specify a prefix length if possible. For example, if you have a CHAR(255) column and most values are unique within the first 10 or 20 characters, don't index the entire column. A short index not only improves query speed but also saves disk space and I/O operations.
CREATE INDEX index_name ON table_name(column(length));
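One way to pick the prefix length is to compare the selectivity of several candidate prefixes with that of the full column. The sketch below assumes a hypothetical `customer` table with an `email VARCHAR(255)` column. The lengths 5, 10 and 20 are arbitrary starting points.

```sql
-- Hypothetical table: customer(id, email VARCHAR(255)).
-- Compare how selective different prefix lengths are against the full column.
SELECT
    COUNT(DISTINCT LEFT(email, 5))  / COUNT(*) AS sel_5,
    COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
    COUNT(DISTINCT LEFT(email, 20)) / COUNT(*) AS sel_20,
    COUNT(DISTINCT email)           / COUNT(*) AS sel_full
FROM customer;

-- Pick the shortest prefix whose selectivity is close to sel_full, for example:
CREATE INDEX idx_email_prefix ON customer (email(10));
```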
  • Index columns and sorting: a MySQL query generally uses only one index per table, so if an index is already used for the WHERE clause, the columns in ORDER BY will not use an index. Therefore, do not add a sort operation if the database's default ordering already meets the requirements; try not to sort on multiple columns, and if you must, create a composite index on those columns.
  • When sorting, sorting too many rows may prevent the index from being used.
  • Try not to use * to query all columns (index columns included), as this may prevent the index from being used.
  • LIKE: LIKE operations are generally discouraged. If one cannot be avoided, how it is written matters: like '%333%' will not use an index, while like 'aaa%' (no leading wildcard) can use one, at the range level.
explain select * from teacher where address like '%oracle%';
  • Do not perform operations on columns. For example, select * from users where YEAR(adddate) < 2007; applies a function to every row and cannot use an index on adddate, whereas select * from users where adddate < '2007-01-01'; can. Calculations should be done in the business code rather than in the database.
  • Range columns: the range conditions that can use an index are <, <=, >, >=, BETWEEN, and so on. A range column can use an index (in a composite index it must be part of the leftmost prefix), but the columns after the range column cannot. An index can serve at most one range column, so if the query has two range conditions, the index cannot be used for both. Therefore, place the most important condition first in the WHERE clause.
alter table teacher add column(age int(3));
alter table teacher add column(weight int(3));
select * from teacher;
update teacher set age=10,weight=90 where id=1;
update teacher set age=20,weight=100 where id=2;
update teacher set age=30,weight=100 where id=3;
 
create index age_index on teacher(age);
create index weight_index on teacher(weight);
 
explain select * from teacher where age between 10 and 20 and weight between 90 and 100;
  • Type conversion invalidates the index: when the column is a string type and the condition compares it with a numeric value, the index is not used.
explain select * from teacher where name=20;
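The usual fix is to compare the column with a literal of its own type. The sketch below assumes that `teacher.name` is a string column covered by an index, such as the `mul_index` created earlier.

```sql
-- name is a string column: comparing it with a number forces a conversion of the
-- column value on every row, so an index on name cannot be used for the lookup.
EXPLAIN SELECT * FROM teacher WHERE name = 20;

-- Comparing with a string literal keeps the index usable.
EXPLAIN SELECT * FROM teacher WHERE name = '20';
```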
6. Index Summary

Finally, to summarize, MySQL only uses indexes for the following operations: <, <=, =, >, >=, BETWEEN, IN, and sometimes LIKE (when the pattern does not begin with the wildcard % or _). In theory, up to 16 indexes can be created in each table, but unless the amount of data is really large, using too many indexes does more harm than good.

  • Recommendation: The number of indexes in a table should not exceed 6. If there are too many, consider whether it is necessary to build indexes on some columns that are not frequently used.
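To check this recommendation against an existing schema, the index metadata in `information_schema` can be counted per table. This is a sketch: it looks only at the currently selected database, and the threshold of 6 is simply the figure suggested above.

```sql
-- Tables in the current database that have more than 6 indexes.
SELECT table_name,
       COUNT(DISTINCT index_name) AS index_count
FROM information_schema.statistics
WHERE table_schema = DATABASE()
GROUP BY table_name
HAVING COUNT(DISTINCT index_name) > 6
ORDER BY index_count DESC;
```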
7. Common optimization strategies for SQL in MySQL
  • NULL values: in old versions, a column containing NULL could not use an index; in MySQL 5.7 it can, but "unforeseen results" may occur. So add a NOT NULL constraint or a default value to the column when creating the table.
  • Avoid full table scans: to optimize a query, first consider creating indexes on the columns used in WHERE and ORDER BY.
  • Avoid negative conditions: try to avoid the != or <> operators in WHERE clauses, otherwise the engine may give up using the index and perform a full table scan. Negative conditions include !=, <>, NOT IN, NOT EXISTS, NOT LIKE, and so on.
explain select * from teacher where address != 'aa';
  • Avoid OR logic: try to avoid using OR to connect conditions in the WHERE clause, otherwise the engine may give up using the index and perform a full table scan. For example:
select id from t where num = 10 or num = 20;
This can be rewritten as:
select id from t where num = 10
union all
select id from t where num = 20;
  • In current MySQL 5.7, OR can already use indexes; older versions cannot.
  • Use IN and NOT IN with caution. Logically, IN can replace the UNION ALL rewrite; it costs slightly more CPU, but the difference is negligible. Even so, IN and NOT IN should be used with care, otherwise they may cause a full table scan, such as:
select id from t1 where num in(select id from t2 where id > 10);
At this point the outer query performs a full table scan and the index is not used. It can be modified to:

select t1.id from t1, (select id from t2 where id > 10) t2 where t1.num = t2.id;
Now the index is used, which can significantly improve query efficiency.
  • Be careful with fuzzy queries: the query below also results in a full table scan
select id from t where name like '%abc%';

If a fuzzy query is unavoidable, use a pattern without a leading wildcard, such as select id from t where name like 'abc%', and the index will be used. If matching with a leading wildcard is required, a full-text search engine (Elasticsearch, Lucene, Solr, etc.) is recommended.

  • Avoid calculations on fields in query conditions: try to avoid expression operations on fields in WHERE clauses. This causes the engine to give up using indexes and perform a full table scan. For example:
select id from t where num/2=100;
should be changed to
select id from t where num = 100*2;
  • Avoid function calls on fields in query conditions: likewise, try to avoid applying functions to fields in WHERE clauses. This causes the engine to give up using indexes and perform a full table scan. For example:
select id from t where substring(name, 1, 3)='abc';
To find the ids whose name starts with abc, this should be changed to
select id from t where name like 'abc%';
  • Watch the left side of '=' in the WHERE clause: do not apply functions, arithmetic, or other expressions to the left side of "=", otherwise the system may not be able to use the index correctly.
  • Composite index fields: when using an indexed field as a condition, if the index is a composite index, the first field of the index must be used as a condition for the system to be able to use the index. The order of the fields in the condition should also match the index order as closely as possible.
  • Don't write meaningless queries, such as one that only generates an empty table structure:
select col1,col2 into #t from t where 1=0;
This kind of code returns no result set but still consumes system resources. It should be changed to:
create table #t(...)
  • EXISTS: using exists instead of in is often a good choice:
select num from a where num in (select num from b)
Replace with the following statement
select num from a where exists(select 1 from b where num = a.num)
  • Indexes may also go unused. Not every index is useful for a query. The optimizer chooses a plan based on the data in the table, and when the indexed column contains a large proportion of duplicate values, the query may not use the index. For example, if a table has a sex field in which male and female each account for about half of the rows, an index on sex does little for query efficiency.
  • Choice of field types: use numeric fields where possible. Fields that hold only numeric information should not be designed as character types, which reduces query and join performance and increases storage cost. The engine compares strings character by character when processing queries and joins, whereas a number needs only one comparison. Use varchar instead of char where possible: variable-length fields take less storage space, and searching within a smaller field is more efficient.
  • Fields in the select list: do not use select * from t anywhere; replace "*" with a specific field list, and do not return fields that are not used.
  • Index-independent optimization

Do not use *; try to avoid union, union all and similar keywords; try to avoid the or keyword; prefer equality comparisons.

Joining more than 5 tables is not recommended. If a query needs more than 5, reconsider the table design (in Internet applications).

For table joins, outer joins are preferred over inner joins. An outer join has a base table: in A left join B, the base table is A. A inner join B has no base table, so conceptually the full Cartesian product is formed first and the inner join result set is then filtered by the join conditions.

When paging through tables with a large volume of data, if the page number is very large, use a subquery to implement the paging logic.

select * from table_name limit 1000000, 10;
select * from table_name where id in (select id from (select id from table_name order by id limit 1000000, 10) as page);
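A different technique that is often used for deep paging, and is not the subquery approach described above, is keyset (seek) pagination: instead of an offset, each page starts after the last key of the previous page. The sketch below assumes a hypothetical table `t` with an increasing primary key `id`. The value 1000010 is a made-up example of the last id seen.

```sql
-- Assumes a hypothetical table t with an indexed, increasing primary key id.
-- First page:
SELECT * FROM t ORDER BY id LIMIT 10;

-- Next page: start after the largest id of the previous page
-- (1000010 is a made-up example value).
SELECT * FROM t WHERE id > 1000010 ORDER BY id LIMIT 10;
```

The trade-off is that this approach only supports moving page by page, not jumping straight to an arbitrary page number.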
 

Tags: MySQL

Posted by threaders on Fri, 01 Jul 2022 01:55:23 +0930