Post

MySQL联合索引中不同区分度列的顺序对查询性能的影响

当构建联合索引时,需要考虑索引的顺序问题,除了考虑使用场景,索引的顺序是如何影响查询性能的呢?先来做个小实验。

首先构建以下表:

1
2
3
4
5
6
create table if not exists test.flow
(
    id            bigint auto_increment primary key,
    random_string varchar(255) not null,
    binary_int    int          not null
);

插入100w条数据,其中random_string字段为随机字符串,区分度高,而binary_int为0和1,区分度低,数据采样如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
id,random_string,binary_int
229594,00000dc2-cba2-4991-a53a-1c5dda022b5d,0
826489,000010dc-3b23-4e9d-b10d-658b5830d09f,1
331127,00002c6e-f375-4c17-81ef-7cf446386a5b,1
548640,000031c1-7f99-4793-96e4-99212b3938c2,0
181729,000031f9-d23c-4ff0-929f-d5ecb5adb6fb,1
211099,00003b8b-f832-41be-bc1c-e0b4ffc77db7,1
669026,00004837-47cb-4bf8-9b67-bffb2919bd98,0
156546,00004cf6-5b5a-46c4-9f5c-64bbf58c15a2,0
154805,00004e1e-2b6b-4d08-90f2-f63881d5b96c,1
191821,00005155-575e-4e3c-b08f-2ed901940ca3,1
809912,0000550b-f255-4c29-b0ae-850fed20943b,0
585914,00005c98-16e3-4f29-b674-5094946fdb2a,0
222754,00006c08-3a8e-4a33-be34-3e86d1c64253,0
260696,00007c12-a386-4567-8d01-333fb6a35e1a,0

分别对该表添加联合索引:

  • alter table flow add index idx_rstr_bint (binary_int, random_string)
  • alter table flow add index idx_bint_rstr (random_string, binary_int)

分别在不同联合索引情况下执行以下等值查询的explain语句:

1
2
explain select *
from flow where binary_int = 0 and random_string = '0000dc2f-4b56-4e54-910c-a88ba2511ec9';

执行结果如下:

select_typetablepartitionstypepossible_keyskeyken_lenrefrowsfilteredExtra
SIMPLEflownullrefidx_rstr_bintidx_rstr_bint1022const,const1100Using index
select_typetablepartitionstypepossible_keyskeyken_lenrefrowsfilteredExtra
SIMPLEflownullrefidx_bint_rstridx_bint_rstr1026const,const1100Using index

可以看到等值查询的执行扫描行数都是一样的,即无影响。

如果对random_string进行范围查询呢? 分别在不同联合索引情况下执行以下等值查询的explain语句:

1
2
explain select *
from flow where binary_int = 0 and random_string < '0000dc2f-4b56-4e54-910c-a88ba2511ec9';
select_typetablepartitionstypepossible_keyskeyken_lenrefrowsfilteredExtra
SIMPLEflownullrangeidx_rstr_bintidx_rstr_bint1022null2110Using where; Using index
select_typetablepartitionstypepossible_keyskeyken_lenrefrowsfilteredExtra
SIMPLEflownullrangeidx_bint_rstridx_bint_rstr1026null10100Using index

可以看到当索引顺序是 (random_string, binary_int)时,扫描的行数会多些,是先利用索引筛选出 random_string < '0000dc2f-4b56-4e54-910c-a88ba2511ec9' 的数据行,然后使用where binary_int = 0进行过滤;当索引顺序是(binary_int, random_string)是,直接走索引查出返回的数据行。

可以看到等值查询时,列的顺序不影响联合索引的查询性能,而范围查询时,范围查询列应放在联合索引的列的最后。

explain

结合15.8.2 EXPLAIN Statement,回顾一下explain语句。

explain可以对 SELECT, DELETE, INSERT, REPLACE, UPDATE, TABLE的执行过程进行分析展示,不过最常用来分析SELECT的执行过程。

看下对SELECT语句explain的字段含义:

ColumnMeaning
select_typeThe SELECT type
tableThe table for the output row
partitionsThe matching partitions ,非分区表为null
typeThe join type
possible_keysThe possible indexes to choose
keyThe index actually chosen
key_lenThe length of the chosen key
refThe columns compared to the index
rowsEstimate of rows to be examined
filteredPercentage of rows filtered by table condition
ExtraAdditional information

关于字段具体内容可参考:10.8.2 EXPLAIN Output Format

explain可指定输入格式为TREE:

1
explain format = TREE select * from flow where random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b';

结果:

1
2
3
-> Filter: (flow.random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b')  (cost=36116 rows=99568)
    -> Covering index skip scan on flow using idx_binaryint_randomstring over random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b'  (cost=36116 rows=99568)

explain可指定输入格式为JSON:

1
explain format = JSON select * from flow where random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b';

结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "36116.17"
    },
    "table": {
      "table_name": "flow",
      "access_type": "range",
      "possible_keys": [
        "idx_binaryint_randomstring"
      ],
      "key": "idx_binaryint_randomstring",
      "used_key_parts": [
        "binary_int",
        "random_string"
      ],
      "key_length": "1026",
      "rows_examined_per_scan": 99568,
      "rows_produced_per_join": 99568,
      "filtered": "100.00",
      "using_index_for_skip_scan": true,
      "cost_info": {
        "read_cost": "26159.38",
        "eval_cost": "9956.80",
        "prefix_cost": "36116.18",
        "data_read_per_join": "98M"
      },
      "used_columns": [
        "id",
        "random_string",
        "binary_int"
      ],
      "attached_condition": "(`test`.`flow`.`random_string` = '00002c6e-f375-4c17-81ef-7cf446386a5b')"
    }
  }
}

参考

This post is licensed under CC BY 4.0 by the author.