如何優化MySQL反連接

發布時間：2021-11-01 11:53:43 來源：億速云閱讀：111 作者：小新欄目：MySQL數據庫

這篇文章給大家分享的是有關如何優化MySQL反連接的內容。小編覺得挺實用的，因此分享給大家做個參考，一起跟隨小編過來看看吧。

總體來看這個環境還是相對比較繁忙的，線程大概是200多個。
# mysqladmin pro|less|wc -l
235
帶著好奇查看慢日志，馬上定位到這個語句，已做了脫敏處理。
# Time: 161013 9:51:45
# User@Host: root[root] @ localhost []
# Thread_id: 24630498 Schema: test Last_errno: 1160 Killed: 0
# Query_time: 61213.561106 Lock_time: 0.000082 Rows_sent: 7551 Rows_examined: 201945890920 Rows_affected: 0 Rows_read: 7551
# Bytes_sent: 0 Tmp_tables: 1 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 2F8E5A82
SET timestamp=1476323505;
select account from t_fund_info
where money >=300 and account not in
(select distinct(login_account) from t_user_login_record where login_time >='2016-06-01')
into outfile '/tmp/data.txt';
從慢日志來看，執行時間達61213s,這個是相當驚人了，也就意味著這個語句跑了一整天。
這引起了我的好奇和興趣，這個問題有得搞頭了。
表t_fund_info數據量近200萬，存在一個主鍵在id列，唯一性索引在account上。
CREATE TABLE `t_fund_info`
。。。
PRIMARY KEY (`id`),
UNIQUE KEY `account` (`account`)
) ENGINE=InnoDB AUTO_INCREMENT=1998416 DEFAULT CHARSET=utf8
表t_user_login_record數據量2千多萬，存在主鍵列id
CREATE TABLE `t_user_login_record`
。。。
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=22676193 DEFAULT CHARSET=utf8
從語句可以看出，是在做一個批量的大查詢，是希望把查詢結果生成一個文本文件來，但是過濾條件很有限。目前根據查詢來看肯定是全表掃描。
先簡單看了下過濾條件，從t_fund_info這個表中，根據一個過濾條件能過濾掉絕大多數的數據，得到1萬多數據，還是比較理想的。
> select count(*)from t_fund_info where money >=300;
+----------+
| count(*) |
+----------+
|    13528 |
+----------+
1 row in set (0.99 sec)
那問題的瓶頸看來是在后面的子查詢了。
把下面的語句放入一個SQL腳本query.sql
select distinct(login_account) from t_user_login_record where login_time >='2016-06-01';
導出數據，大概耗時1分鐘。
time mysql test < query.sql > query_rt.log
real    0m59.149s
user    0m0.394s
sys     0m0.046s
過濾后的數據有50多萬，相對還是比較理想的過濾情況。
# less query_rt.log|wc -l
548652
我們來解析一下這個語句，看看里面的Not in的條件是怎么解析的。
explain extended select account from t_fund_info
where money >=300 and account not in
(select distinct(login_account) from t_user_login_record where login_time >='2016-06-01')；
show warnings;
結果如下：
Message: select `test`.`t_fund_info`.`account` AS `account` from `test`.`t_fund_info` where ((`test`.`t_fund_info`.`money` >= 300) and (not((`test`.`t_fund_info`.`account`,(select distinct 1 from `test`.`t_user_login_record` where ((`test`.`t_user_login_record`.`login_time` >= '2016-06-01') and (((`test`.`t_fund_info`.`account`) = `test`.`t_user_login_record`.`login_account`) or isnull(`test`.`t_user_login_record`.`login_account`))) having (`test`.`t_user_login_record`.`login_account`))))))
可以看到整個解析的過程非常復雜，原本簡單的一個語句，經過解析，竟然變得如此復雜。

因為MySQL里面的優化改進空間相比Oracle還是少很多，我決定循序漸進來嘗試優化。因為這個環境還是很重要的，所以我在從庫端使用mysqldump導出數據，導入到另外一個測試環境，放開手腳來測試了。
首先對于not in的部分，是否是因為生成臨時表的消耗代價太高導致，所以我決定建立一個臨時表來緩存子查詢的數據。
> create table test_tab as select distinct(login_account) login_account from t_user_login_record where login_time >='2016-06-01';
Query OK, 548650 rows affected (1 min 3.78 sec)
Records: 548650 Duplicates: 0 Warnings: 0
這樣查看這個臨時表就很輕松了，不到1秒就出結果。
> select count(*)from test_tab;
+----------+
| count(*) |
+----------+
|   548650 |
+----------+
1 row in set (0.38 sec)
然后再次查看使用臨時表后的查詢是否解析會有改善。
explain extended select account from t_fund_info
where money >=300 and account not in(select login_account from test_tab);
show warnings;
發現還是一樣，可見臨時表的改進效果不大。
| Note | 1003 | select `test`.`t_fund_info`.`account` AS `account` from `test`.`t_fund_info` where ((`test`.`t_fund_info`.`money` >= 300) and (not((`test`.`t_fund_info`.`account`,(select 1 from `test`.`test_tab` where (((`test`.`t_fund_info`.`account`) = `test`.`test_tab`.`login_account`) or isnull(`test`.`test_tab`.`login_account`)) having (`test`.`test_tab`.`login_account`)))))) |
是否是因為子查詢中的數據量太大導致整個反連接的查詢過程中回表太慢，那我縮小一下子查詢的數據條數。
select account from t_fund_info
where money >=300 and not exists (select login_account from test_tab where login_account=t_fund_info.account limit 1,10);
這種方式依舊很卡，持續了近半個小時還是沒有反應，所以果斷放棄。
是不是t_fund_info的過濾查詢導致了性能問題，我們也創建一個臨時表
> create table test_tab1 as select account from t_fund_info
    -> where money >=300;
Query OK, 13528 rows affected (1.38 sec)
Records: 13528 Duplicates: 0 Warnings: 0
再次查詢效果依舊很不理想。
select account from test_tab1
where not exists (select login_account from test_tab where login_account=test_tab1.account limit 1,10);
持續了20多分鐘還是沒有反應，所以還是果斷放棄。
這個時候能想到就是索引了，我們在臨時表test_tab上創建索引。
> create index ind_tmp_login_account on test_tab(login_account);
Query OK, 0 rows affected (4.31 sec)
Records: 0 Duplicates: 0 Warnings: 0
在臨時表test_tab1上也創建索引。
> create index ind_tmp_account on test_tab1(account);
Query OK, 0 rows affected (0.18 sec)
Records: 0 Duplicates: 0 Warnings: 0
再次查看性能就變得很好了，運行時間0.15秒，簡直不敢相信。
explain select account from test_tab1
where not exists (select login_account from test_tab where login_account=test_tab1.account );
11364 rows in set (0.15 sec)

執行計劃如下：
如何優化MySQL反連接
可見通過這種拆分，不斷的猜測和排除，已經找到了一些思路。
我們開始抓住問題的本質。
首先刪除test_tab1上的索引，看看執行效果如何。
> alter table test_tab1 drop index ind_tmp_account;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
語句如下，執行時間0.15秒
select account from test_tab1
     where not exists (select login_account from test_tab where login_account=test_tab1.account );
+--------------------------------+
11364 rows in set (0.15 sec)
是否not in的方式會有很大的差別呢，持續0.18秒，有差別，但差別不大。
select account from test_tab1
     where account not in (select login_account from test_tab );
+--------------------------------+
11364 rows in set (0.18 sec)
我們逐步恢復原來的查詢，去除臨時表test_tab1,整個查詢持續了1.12秒。
select account from t_fund_info
where money >=300 and account not in(select login_account from test_tab);
+--------------------------------+
11364 rows in set (1.12 sec)
使用explain extended解析的內容如下：
Message: select `test`.`t_fund_info`.`account` AS `account` from `test`.`t_fund_info` where ((`test`.`t_fund_info`.`money` >= 300) and (not((`test`.`t_fund_info`.`account`,(((`test`.`t_fund_info`.`account`) in test_tab on ind_tmp_login_account checking NULL having (`test`.`test_tab`.`login_account`)))))))
這個時候，問題已經基本定位了。在反連接的查詢中，在這個問題場景中，需要對子查詢的表添加一個索引基于login_account，可以和外層的查詢字段映射，提高查詢效率。
當然在一個數據量龐大，業務相對繁忙的系統中，添加一個臨時需求的索引可能不是一個很好的方案。不過我們還是在測試環境體驗一下。
> create index ind_tmp_account1 on t_user_login_record(login_account);
Query OK, 0 rows affected (4 min 45.48 sec)
Records: 0 Duplicates: 0 Warnings: 0
添加索引的過程持續了近4分鐘，在這個時候我們使用最開始的查詢語句，性能如何呢。
select account from t_fund_info where money >=300 and account not in (select distinct(login_account) from t_user_login_record where);
+--------------------------------+
11364 rows in set (2.52 sec)
只要2.52秒就可以完成之前20多個小時查詢結果，性能簡直就是天壤之別。
不過話說回來，跑批查詢可以在從庫上執行，從庫上創建一個這樣的索引，用完再刪掉也是不錯的選擇，要么就是創建一個臨時表，在臨時表上創建索引，臨時表的意義就在于此，不是為了做查詢結果緩存而是創建索引來提高數據過濾效率。
在此有個問題就是臨時表只有一個字段，創建索引的意義在哪里呢。
我畫一個圖來解釋一下。
如何優化MySQL反連接
首先這個查詢的數據是以t_fund_info的過濾條件為準，從200萬數據中過濾得到1萬條數據，然后兩個字段通過account=login_account的條件關聯，而不是先關聯子查詢的過濾條件 login_time,過濾完之后account的值之后再過濾login_time，最后根據not in的邏輯來取舍數據，整個數據集就會大大減少。如此一來，子查詢的表千萬行，性能的差別就不會是指數級的。

感謝各位的閱讀！關于“如何優化MySQL反連接”這篇文章就分享到這里了，希望以上內容可以對大家有一定的幫助，讓大家可以學到更多知識，如果覺得文章不錯，可以把它分享出去讓更多的人看到吧！

向AI問一下細節

亚洲激情专区-91九色丨porny丨老师-久久久久久久女国产乱让韩-国产精品午夜小视频观看

如何優化MySQL反連接

猜你喜歡

亚洲激情专区-91九色丨porny丨老师-久久久久久久女国产乱让韩-国产精品午夜小视频观看

如何優化MySQL反連接

猜你喜歡

最新資訊

相關推薦

相關標簽