PostgreSQL SQL HINT的使用說明

發布時間：2020-08-11 08:44:47 來源：ITPUB博客閱讀：238 作者：sqysl 欄目：關系型數據庫

本文來自： http://www.023dns.com/Database_mssql/5974.html

PostgreSQL優化器是基于成本的 (CBO) , (當然, 如果開啟了GEQO的話, 在關聯表數量超過一定閾值后, 會采用GEQO, 這主要是由于在關聯表太多的情況下, 窮舉法可能帶來巨大的PLAN開銷, 所以GEQO輸出的執行計劃不一定是最優的)

　　本文要談的和GEQO沒什么關系, 主要是與CBO相關.

　　當PostgreSQL使用CBO時, 就一定能每次都輸出最優的執行計劃嗎?

　　1. 首選我們看看CBO考察了哪些因素, 它是如何計算成本的?

　　成本和掃描方式, 關聯方式, 操作符, 成本因子, 數據集等都有關, 具體的計算方法可參考如下代碼:

　　src/backend/optimizer/path/costsize.c

　　我們這里簡單的列舉一下, 哪些因素會影響成本計算的結果, 具體算法見costsize.c :

　　-- 表有多少條記錄, 影響全表掃描的 CPU處理記錄的COST.

　　-- 表有多少個數據塊, 影響掃描數據塊的成本; 例如全表掃描, 索引掃描, 都需要掃描數據塊.

　　-- 成本因子, 影響成本的計算結果; 例如連續或隨機掃描單個數據塊的成本因子, CPU從HEAP塊處理一條記錄的成本因子, 從INDEX塊處理一條索引記錄的成本因子, 執行一個操作符或函數的成本因子.

　　-- 數據存儲物理順序和索引順序的離散度, 影響索引掃描的計算成本.

　　-- 內存大小, 影響索引掃描的計算成本.

　　-- 列統計信息(列寬, 空值比例, 唯一值比例, 高頻值及其比例, bucket, 物理順序和索引順序的離散度, 數組的話還有數組的統計信息, 等), 影響選擇性, 即結果集行數, 最終影響索引掃描的計算成本.

　　-- 創建函數或操作符時設置的成本.

　　2. 然后我們看看哪些因素CBO沒有考慮進去, 還有哪些因素CBO考慮進去了, 但是可能會隨時發生變化的.

　　PostgreSQL是否能動態的跟上這些變化?

　　2.1 PostgreSQL開啟自動analyze, 可以適時更新的因素如下 :

　　-- 表有多少數據塊, 記錄數, 更新pg_class.relpages, pg_class.reltuples

　　-- 列統計信息, 數據存儲物理順序和索引順序的離散度, 更新pg_statistic

　　2.2 靜態配置因素 :

　　-- 實際可用作緩存的內存, 因為數據庫所在的操作系統中可能還運行了其他程序, 可用作緩存的內存可能會發生變化. 即使沒有運行其他程序, 當數據庫會話中有大量使用了work_mem時, 也會造成可用做緩存的內存發生變化.

　　-- 創建函數或操作符時設置的成本, 當函數因為內部SQL或處理邏輯等變化, 可能導致函數本身的處理時間發生變化.

　　2.3 未考慮的因素 :

　　-- 塊設備的的預讀, 一般情況下一次讀取時, 會預讀128KB的數據.

　　# blockdev --getra /dev/sda

　　256

　　這又有什么影響呢? 如果你要讀取的數據在連續的128KB數據塊中, 那么只需要一次塊設備的IO. 對于數據庫來說, 掃描數據時掃多少個數據塊可不管這個, 都會計算成本, 因此對于不同的塊設備預讀配置, 或者對于不同的塊設備(如機械盤和SSD), 掃描成本可能不一樣. PostgreSQL塊設備的性能反映在成本計算方面, 就是seq_page_cost, random_page_cost.

　　這兩個參數可以針對表空間設置, 也就是說, 對于不同的表空間, 可以設置不同的值, 比如我們有在SSD建立的表空間, 也有在普通機械盤上創建的表空間, 當然需要設置不同的seq_page_cost, random_page_cost值.

　　但是對于預讀來說, 如果發生了變更, 對實際的性能會有細微的影響, 一般應該不會一天到晚變更塊設備的read ahead吧.

　　2.4 generic plan cache, 即執行計劃緩存.

　　PostgreSQL 通過choose_custom_plan選擇重新規劃執行計劃還是使用緩存的執行計劃, 當cached plan成本大于custom的平均成本時, 會選擇custom plan , 所以當統計信息正確的情況下, 可以及時發現緩存執行計劃的問題并及時規劃新的執行計劃.

　　詳情請見 : src/backend/utils/cache/plancache.c

　　2.5 采樣精度參數default_statistics_target , 影響bucket個數, 采樣的精度.

　　經過一番分析, PostgreSQL使用了CBO, 就一定能"每次"都輸出最優的執行計劃嗎?

　　1. 首選要確保人為設置成本因子準確, 另外還需要打開自動analyze(適時更新列統計信息, 塊, 離散度等),

　　2. 影響成本的因素還有一些是靜態配置的 : 比如可用作BUFFER的內存, 函數的成本.

　　3. 還有沒考慮的: 預讀 (甚微).

　　在大多數情況下, 如果我們設置了合理的配置,那么很少需要使用hint的. 除了以上2,3提到的兩點.

　　同時hint也存在比較嚴重的弊端, 如果將hint寫在程序代碼中, 一旦需要變更執行計劃, 還需要改程序代碼, 不靈活.

　　當然, 我們不排除另一種用HINT的出發點, 比如調試. 我就想看看不同執行計劃下執行效率是否和想象的一樣.

　　(我們也可以使用開關來控制執行計劃, 但是有HINT不是更直接一點嘛)

　　從長遠來看, 如果僅僅從性能角度來說, 不斷地改進數據庫本身的優化器是比較靠譜的. 但是對于例如調試這樣的需求, 有HINT更方便也是對的.

　　進入主題, 大多數Oracle用戶在接觸到PostgreSQL后, 會問PG有沒有SQL hint?

　　為了讓數據庫按照用戶的想法輸出執行計劃, 一般來說PostgreSQL提供了一些開關, 比如關閉全表掃描, 讓它去走索引.

　　關閉索引掃描, 讓它去走bitmap或全表掃描, 關閉嵌套循環, 讓他去走hash join或merge join等.

　　但是僅僅有這些開關, 還不是非常的好用, 那么到底有沒有直接點的HINT呢?

　　有一個插件可以解決你的問題,：pg_hint_plan.

　　pg_hint_plan利用PostgreSQL 開放的hook接口, 所以不需要改PG代碼就實現了注入HINT的功能.

　　* Module load callbacks

　　void

　　_PG_init(void)

　　{

　　...

　　}

　　由于不同PostgreSQL 版本, plan部分的代碼可能會不一致, 所以pg_hint_plan也是分版本發布的源碼.

　　比如我要在PostgreSQL 9.4.1中測試一下這個工具.

　　接下來測試一下 :

　　安裝

　　# wget http://iij.dl.sourceforge.jp/pghintplan/62456/pg_hint_plan94-1.1.3.tar.gz

　　# tar -zxvf pg_hint_plan94-1.1.3.tar.gz

　　# cd pg_hint_plan94-1.1.3

　　[root@db-172-16-3-150 pg_hint_plan94-1.1.3]# export PATH=/opt/pgsql/bin:$PATH

　　[root@db-172-16-3-150 pg_hint_plan94-1.1.3]# which psql

　　/opt/pgsql/bin/psql

　　[root@db-172-16-3-150 pg_hint_plan94-1.1.3]# psql -V

　　psql (PostgreSQL) 9.4.1

　　# gmake clean

　　# gmake

　　# gmake install

　　[root@db-172-16-3-150 pg_hint_plan94-1.1.3]# ll -rt /opt/pgsql/lib|tail -n 1

　　-rwxr-xr-x 1 root root 78K Feb 18 09:31 pg_hint_plan.so

　　[root@db-172-16-3-150 pg_hint_plan94-1.1.3]# su - postgres

　　$ vi $PGDATA/postgresql.conf

　　shared_preload_libraries = 'pg_hint_plan'

　　pg_hint_plan.enable_hint = on

　　pg_hint_plan.debug_print = on

　　pg_hint_plan.message_level = log

　　$ pg_ctl restart -m fast

　　postgres@db-172-16-3-150-> psql

　　psql (9.4.1)

　　Type "help" for help.

　　postgres=# create extension pg_hint_plan;

　　CREATE EXTENSION

　　用法舉例說明 :

　　postgres=# create table a(id int primary key, info text, crt_time timestamp);

　　CREATE TABLE

　　postgres=# create table b(id int primary key, info text, crt_time timestamp);

　　CREATE TABLE

　　postgres=# insert into a select generate_series(1,100000), 'a_'||md5(random()::text), clock_timestamp();

　　INSERT 0 100000

　　postgres=# insert into b select generate_series(1,100000), 'b_'||md5(random()::text), clock_timestamp();

　　INSERT 0 100000

　　postgres=# analyze a;

　　ANALYZE

　　postgres=# analyze b;

　　ANALYZE

　　postgres=# explain select a.*,b.* from a,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　-----------------------------------------------------------------------

　　Nested Loop (cost=0.58..83.35 rows=9 width=94)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

　　Index Cond: (id < 10)

　　-> Index Scan using b_pkey on b (cost=0.29..8.31 rows=1 width=47)

　　Index Cond: (id = a.id)

　　(5 rows)

　　在沒有pg_hint_plan時, 我們需要使用開關來改變PostgreSQL的執行計劃

　　postgres=# set enable_nestloop=off;

　　SET

　　postgres=# explain select a.*,b.* from a,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　-----------------------------------------------------------------------------

　　Hash Join (cost=8.56..1616.65 rows=9 width=94)

　　Hash Cond: (b.id = a.id)

　　-> Seq Scan on b (cost=0.00..1233.00 rows=100000 width=47)

　　-> Hash (cost=8.45..8.45 rows=9 width=47)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

　　Index Cond: (id < 10)

　　(6 rows)

　　postgres=# set enable_nestloop=on;

　　SET

　　postgres=# explain select a.*,b.* from a,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　-----------------------------------------------------------------------

　　Nested Loop (cost=0.58..83.35 rows=9 width=94)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

　　Index Cond: (id < 10)

　　-> Index Scan using b_pkey on b (cost=0.29..8.31 rows=1 width=47)

　　Index Cond: (id = a.id)

　　(5 rows)

　　使用pg_hint_plan來改變PostgreSQL的執行計劃，如下所示 :

　　postgres=# /*+

　　HashJoin(a b)

　　SeqScan(b)

　　*/ explain select a.*,b.* from a,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　-----------------------------------------------------------------------------

　　Hash Join (cost=8.56..1616.65 rows=9 width=94)

　　Hash Cond: (b.id = a.id)

　　-> Seq Scan on b (cost=0.00..1233.00 rows=100000 width=47)

　　-> Hash (cost=8.45..8.45 rows=9 width=47)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

　　Index Cond: (id < 10)

　　(6 rows)

　　postgres=# /*+ SeqScan(a) */ explain select * from a where id<10;< p="">

　　QUERY PLAN

　　------------------------------------------------------

　　Seq Scan on a (cost=0.00..1483.00 rows=10 width=47)

　　Filter: (id < 10)

　　(2 rows)

　　postgres=# /*+ BitmapScan(a) */ explain select * from a where id<10;< p="">

　　QUERY PLAN

　　---------------------------------------------------------------------

　　Bitmap Heap Scan on a (cost=4.36..35.17 rows=9 width=47)

　　Recheck Cond: (id < 10)

　　-> Bitmap Index Scan on a_pkey (cost=0.00..4.36 rows=9 width=0)

　　Index Cond: (id < 10)

　　(4 rows)

　　目前pg_hint_plan支持的HINT

　　http://pghintplan.sourceforge.jp/hint_list.html

　　The available hints are listed below.

　　 PostgreSQL SQL HINT的使用說明

　　[參考]

　　1. http://pghintplan.sourceforge.jp/pg_hint_plan-en.html

　　2. http://pghintplan.sourceforge.jp/pg_hint_plan.html

　　3. http://pghintplan.sourceforge.jp/hint_list.html

　　4. http://pghintplan.sourceforge.jp/

　　5. src/backend/optimizer/path/costsize.c

　　6. src/backend/utils/cache/plancache.c

向AI問一下細節

亚洲激情专区-91九色丨porny丨老师-久久久久久久女国产乱让韩-国产精品午夜小视频观看

PostgreSQL SQL HINT的使用說明

猜你喜歡

亚洲激情专区-91九色丨porny丨老师-久久久久久久女国产乱让韩-国产精品午夜小视频观看

PostgreSQL SQL HINT的使用說明

猜你喜歡

最新資訊

相關推薦

相關標簽