MySQL联合索引中不同区分度列的顺序对查询性能的影响

Posted Oct 9, 2024

By Bao Tingyu

4 min read

当构建联合索引时，需要考虑索引的顺序问题，除了考虑使用场景，索引的顺序是如何影响查询性能的呢？先来做个小实验。

首先构建以下表：

  
create table if not exists test.flow
(
    id            bigint auto_increment primary key,
    random_string varchar(255) not null,
    binary_int    int          not null
);

插入100w条数据，其中random_string字段为随机字符串，区分度高，而binary_int为0和1，区分度低，数据采样如下

id,random_string,binary_int
229594,00000dc2-cba2-4991-a53a-1c5dda022b5d,0
826489,000010dc-3b23-4e9d-b10d-658b5830d09f,1
331127,00002c6e-f375-4c17-81ef-7cf446386a5b,1
548640,000031c1-7f99-4793-96e4-99212b3938c2,0
181729,000031f9-d23c-4ff0-929f-d5ecb5adb6fb,1
211099,00003b8b-f832-41be-bc1c-e0b4ffc77db7,1
669026,00004837-47cb-4bf8-9b67-bffb2919bd98,0
156546,00004cf6-5b5a-46c4-9f5c-64bbf58c15a2,0
154805,00004e1e-2b6b-4d08-90f2-f63881d5b96c,1
191821,00005155-575e-4e3c-b08f-2ed901940ca3,1
809912,0000550b-f255-4c29-b0ae-850fed20943b,0
585914,00005c98-16e3-4f29-b674-5094946fdb2a,0
222754,00006c08-3a8e-4a33-be34-3e86d1c64253,0
260696,00007c12-a386-4567-8d01-333fb6a35e1a,0

分别对该表添加联合索引:

alter table flow add index idx_rstr_bint (binary_int, random_string)
alter table flow add index idx_bint_rstr (random_string, binary_int)

分别在不同联合索引情况下执行以下等值查询的explain语句：

  
explain select *
from flow where binary_int = 0 and random_string = '0000dc2f-4b56-4e54-910c-a88ba2511ec9';

执行结果如下：

select_type	table	partitions	type	possible_keys	key	ken_len	ref	rows	filtered	Extra
SIMPLE	flow	null	ref	idx_rstr_bint	idx_rstr_bint	1022	const,const	1	100	Using index

select_type	table	partitions	type	possible_keys	key	ken_len	ref	rows	filtered	Extra
SIMPLE	flow	null	ref	idx_bint_rstr	idx_bint_rstr	1026	const,const	1	100	Using index

可以看到等值查询的执行扫描行数都是一样的，即无影响。

如果对random_string进行范围查询呢？分别在不同联合索引情况下执行以下等值查询的explain语句：

  
explain select *
from flow where binary_int = 0 and random_string < '0000dc2f-4b56-4e54-910c-a88ba2511ec9';

select_type	table	partitions	type	possible_keys	key	ken_len	ref	rows	filtered	Extra
SIMPLE	flow	null	range	idx_rstr_bint	idx_rstr_bint	1022	null	21	10	Using where; Using index

select_type	table	partitions	type	possible_keys	key	ken_len	ref	rows	filtered	Extra
SIMPLE	flow	null	range	idx_bint_rstr	idx_bint_rstr	1026	null	10	100	Using index

可以看到当索引顺序是 (random_string, binary_int)时，扫描的行数会多些，是先利用索引筛选出 random_string < '0000dc2f-4b56-4e54-910c-a88ba2511ec9' 的数据行，然后使用where binary_int = 0进行过滤；当索引顺序是(binary_int, random_string)是，直接走索引查出返回的数据行。

可以看到等值查询时，列的顺序不影响联合索引的查询性能，而范围查询时，范围查询列应放在联合索引的列的最后。

explain

结合15.8.2 EXPLAIN Statement，回顾一下explain语句。

explain可以对 SELECT, DELETE, INSERT, REPLACE, UPDATE, TABLE的执行过程进行分析展示，不过最常用来分析SELECT的执行过程。

看下对SELECT语句explain的字段含义：

Column	Meaning
select_type	The SELECT type
table	The table for the output row
partitions	The matching partitions ，非分区表为null
type	The join type
possible_keys	The possible indexes to choose
key	The index actually chosen
key_len	The length of the chosen key
ref	The columns compared to the index
rows	Estimate of rows to be examined
filtered	Percentage of rows filtered by table condition
Extra	Additional information

关于字段具体内容可参考：10.8.2 EXPLAIN Output Format

explain可指定输入格式为TREE：

  
explain format = TREE select * from flow where random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b';

结果：

  
-> Filter: (flow.random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b')  (cost=36116 rows=99568)
    -> Covering index skip scan on flow using idx_binaryint_randomstring over random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b'  (cost=36116 rows=99568)

explain可指定输入格式为JSON：

explain format = JSON select * from flow where random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b';

结果：

  
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "36116.17"
    },
    "table": {
      "table_name": "flow",
      "access_type": "range",
      "possible_keys": [
        "idx_binaryint_randomstring"
      ],
      "key": "idx_binaryint_randomstring",
      "used_key_parts": [
        "binary_int",
        "random_string"
      ],
      "key_length": "1026",
      "rows_examined_per_scan": 99568,
      "rows_produced_per_join": 99568,
      "filtered": "100.00",
      "using_index_for_skip_scan": true,
      "cost_info": {
        "read_cost": "26159.38",
        "eval_cost": "9956.80",
        "prefix_cost": "36116.18",
        "data_read_per_join": "98M"
      },
      "used_columns": [
        "id",
        "random_string",
        "binary_int"
      ],
      "attached_condition": "(`test`.`flow`.`random_string` = '00002c6e-f375-4c17-81ef-7cf446386a5b')"
    }
  }
}

参考

编程, MySQL

mysql

This post is licensed under CC BY 4.0 by the author.

explain

参考

Trending Tags