您好,登錄后才能下訂單哦!
這篇文章主要介紹“PostgreSQL中remove_useless_joins的實現邏輯是怎樣的”,在日常操作中,相信很多人在PostgreSQL中remove_useless_joins的實現邏輯是怎樣的問題上存在疑惑,小編查閱了各式資料,整理出簡單好用的操作方法,希望對大家解答”PostgreSQL中remove_useless_joins的實現邏輯是怎樣的”的疑惑有所幫助!接下來,請跟著小編一起來學習吧!
query_planner代碼片段:
//... /* * Remove any useless outer joins. Ideally this would be done during * jointree preprocessing, but the necessary information isn't available * until we've built baserel data structures and classified qual clauses. */ joinlist = remove_useless_joins(root, joinlist);//清除無用的外連接 /* * Also, reduce any semijoins with unique inner rels to plain inner joins. * Likewise, this can't be done until now for lack of needed info. */ reduce_unique_semijoins(root);//消除半連接 /* * Now distribute "placeholders" to base rels as needed. This has to be * done after join removal because removal could change whether a * placeholder is evaluable at a base rel. */ add_placeholders_to_base_rels(root);//在"base rels"中添加PH //...
PlaceHolderVar
上一小節已介紹過PHInfo
/* * Placeholder node for an expression to be evaluated below the top level * of a plan tree. This is used during planning to represent the contained * expression. At the end of the planning process it is replaced by either * the contained expression or a Var referring to a lower-level evaluation of * the contained expression. Typically the evaluation occurs below an outer * join, and Var references above the outer join might thereby yield NULL * instead of the expression value. * * Although the planner treats this as an expression node type, it is not * recognized by the parser or executor, so we declare it here rather than * in primnodes.h. */ typedef struct PlaceHolderVar { Expr xpr; Expr *phexpr; /* the represented expression */ Relids phrels; /* base relids syntactically within expr src */ Index phid; /* ID for PHV (unique within planner run) */ Index phlevelsup; /* > 0 if PHV belongs to outer query */ } PlaceHolderVar;
SpecialJoinInfo
/* * "Special join" info. * * One-sided outer joins constrain the order of joining partially but not * completely. We flatten such joins into the planner's top-level list of * relations to join, but record information about each outer join in a * SpecialJoinInfo struct. These structs are kept in the PlannerInfo node's * join_info_list. * * Similarly, semijoins and antijoins created by flattening IN (subselect) * and EXISTS(subselect) clauses create partial constraints on join order. * These are likewise recorded in SpecialJoinInfo structs. * * We make SpecialJoinInfos for FULL JOINs even though there is no flexibility * of planning for them, because this simplifies make_join_rel()'s API. * * min_lefthand and min_righthand are the sets of base relids that must be * available on each side when performing the special join. lhs_strict is * true if the special join's condition cannot succeed when the LHS variables * are all NULL (this means that an outer join can commute with upper-level * outer joins even if it appears in their RHS). We don't bother to set * lhs_strict for FULL JOINs, however. * * It is not valid for either min_lefthand or min_righthand to be empty sets; * if they were, this would break the logic that enforces join order. * * syn_lefthand and syn_righthand are the sets of base relids that are * syntactically below this special join. (These are needed to help compute * min_lefthand and min_righthand for higher joins.) * * delay_upper_joins is set true if we detect a pushed-down clause that has * to be evaluated after this join is formed (because it references the RHS). * Any outer joins that have such a clause and this join in their RHS cannot * commute with this join, because that would leave noplace to check the * pushed-down clause. (We don't track this for FULL JOINs, either.) * * For a semijoin, we also extract the join operators and their RHS arguments * and set semi_operators, semi_rhs_exprs, semi_can_btree, and semi_can_hash. * This is done in support of possibly unique-ifying the RHS, so we don't * bother unless at least one of semi_can_btree and semi_can_hash can be set * true. (You might expect that this information would be computed during * join planning; but it's helpful to have it available during planning of * parameterized table scans, so we store it in the SpecialJoinInfo structs.) * * jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching * the inputs to make it a LEFT JOIN. So the allowed values of jointype * in a join_info_list member are only LEFT, FULL, SEMI, or ANTI. * * For purposes of join selectivity estimation, we create transient * SpecialJoinInfo structures for regular inner joins; so it is possible * to have jointype == JOIN_INNER in such a structure, even though this is * not allowed within join_info_list. We also create transient * SpecialJoinInfos with jointype == JOIN_INNER for outer joins, since for * cost estimation purposes it is sometimes useful to know the join size under * plain innerjoin semantics. Note that lhs_strict, delay_upper_joins, and * of course the semi_xxx fields are not set meaningfully within such structs. */ typedef struct SpecialJoinInfo { NodeTag type; Relids min_lefthand; /* base relids in minimum LHS for join */ Relids min_righthand; /* base relids in minimum RHS for join */ Relids syn_lefthand; /* base relids syntactically within LHS */ Relids syn_righthand; /* base relids syntactically within RHS */ JoinType jointype; /* always INNER, LEFT, FULL, SEMI, or ANTI */ bool lhs_strict; /* joinclause is strict for some LHS rel */ bool delay_upper_joins; /* can't commute with upper RHS */ /* Remaining fields are set only for JOIN_SEMI jointype: */ bool semi_can_btree; /* true if semi_operators are all btree */ bool semi_can_hash; /* true if semi_operators are all hash */ List *semi_operators; /* OIDs of equality join operators */ List *semi_rhs_exprs; /* righthand-side expressions of these ops */ } SpecialJoinInfo;
remove_useless_joins
清除無用的連接,比如以下的SQL語句:
select t1.dwbh from t_grxx t1 left join t_dwxx t2 on t1.dwbh = t2.dwbh;
左連接,而且t_dwxx.dwbh唯一,這樣的連接是不需要的連接,直接查詢t_grxx即可.
從執行計劃來看,PG只對t_grxx進行掃描:
testdb=# explain verbose select t1.dwbh from t_grxx t1 left join t_dwxx t2 on t1.dwbh = t2.dwbh; QUERY PLAN -------------------------------------------------------------------- Seq Scan on public.t_grxx t1 (cost=0.00..14.00 rows=400 width=38) Output: t1.dwbh (2 rows)
源代碼如下:
/* * remove_useless_joins * Check for relations that don't actually need to be joined at all, * and remove them from the query. * * We are passed the current joinlist and return the updated list. Other * data structures that have to be updated are accessible via "root". */ List * remove_useless_joins(PlannerInfo *root, List *joinlist) { ListCell *lc; /* * We are only interested in relations that are left-joined to, so we can * scan the join_info_list to find them easily. */ restart: foreach(lc, root->join_info_list)//遍歷連接信息鏈表 { SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); int innerrelid; int nremoved; /* Skip if not removable */ if (!join_is_removable(root, sjinfo))//判斷是否可以清除連接 continue; /* * Currently, join_is_removable can only succeed when the sjinfo's * righthand is a single baserel. Remove that rel from the query and * joinlist. */ innerrelid = bms_singleton_member(sjinfo->min_righthand); remove_rel_from_query(root, innerrelid, bms_union(sjinfo->min_lefthand, sjinfo->min_righthand));//從查詢中刪除相應的Rel /* We verify that exactly one reference gets removed from joinlist */ nremoved = 0; joinlist = remove_rel_from_joinlist(joinlist, innerrelid, &nremoved); if (nremoved != 1) elog(ERROR, "failed to find relation %d in joinlist", innerrelid); /* * We can delete this SpecialJoinInfo from the list too, since it's no * longer of interest. */ //更新連接鏈表信息 root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo); /* * Restart the scan. This is necessary to ensure we find all * removable joins independently of ordering of the join_info_list * (note that removal of attr_needed bits may make a join appear * removable that did not before). Also, since we just deleted the * current list cell, we'd have to have some kluge to continue the * list scan anyway. */ goto restart; } return joinlist; }
reduce_unique_semijoins
把可以簡化的半連接轉化為內連接.
比如以下的SQL語句:
select t1.* from t_grxx t1 where dwbh IN (select t2.dwbh from t_dwxx t2);
由于子查詢"select t2.dwbh from t_dwxx t2"的dwbh是PK,子查詢提升后,t_grxx的dwbh只對應t_dwxx唯一的一條記錄,因此可以把半連接轉換為內連接,執行計劃如下:
testdb=# explain verbose select t1.* from t_grxx t1 where dwbh IN (select t2.dwbh from t_dwxx t2); QUERY PLAN ----------------------------------------------------------------------------- Hash Join (cost=1.07..20.10 rows=6 width=176) Output: t1.dwbh, t1.grbh, t1.xm, t1.xb, t1.nl Inner Unique: true Hash Cond: ((t1.dwbh)::text = (t2.dwbh)::text) -> Seq Scan on public.t_grxx t1 (cost=0.00..14.00 rows=400 width=176) Output: t1.dwbh, t1.grbh, t1.xm, t1.xb, t1.nl -> Hash (cost=1.03..1.03 rows=3 width=38) Output: t2.dwbh -> Seq Scan on public.t_dwxx t2 (cost=0.00..1.03 rows=3 width=38) Output: t2.dwbh (10 rows)
跟蹤分析:
(gdb) n 199 reduce_unique_semijoins(root); (gdb) step reduce_unique_semijoins (root=0x1702968) at analyzejoins.c:520 520 for (lc = list_head(root->join_info_list); lc != NULL; lc = next) (gdb) n 522 SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); (gdb)
查看SpecialJoinInfo內存結構:
528 next = lnext(lc); (gdb) p *sjinfo $1 = {type = T_SpecialJoinInfo, min_lefthand = 0x1749818, min_righthand = 0x1749830, syn_lefthand = 0x1749570, syn_righthand = 0x17495d0, jointype = JOIN_SEMI, lhs_strict = true, delay_upper_joins = false, semi_can_btree = true, semi_can_hash = true, semi_operators = 0x17496c8, semi_rhs_exprs = 0x17497b8}
內表(innerrel,即t_dwxx)如支持唯一性,則可以考慮把半連接轉換為內連接
550 if (!rel_supports_distinctness(root, innerrel)) ... 575 root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo); ...
源代碼如下:
/* * reduce_unique_semijoins * Check for semijoins that can be simplified to plain inner joins * because the inner relation is provably unique for the join clauses. * * Ideally this would happen during reduce_outer_joins, but we don't have * enough information at that point. * * To perform the strength reduction when applicable, we need only delete * the semijoin's SpecialJoinInfo from root->join_info_list. (We don't * bother fixing the join type attributed to it in the query jointree, * since that won't be consulted again.) */ void reduce_unique_semijoins(PlannerInfo *root) { ListCell *lc; ListCell *next; /* * Scan the join_info_list to find semijoins. We can't use foreach * because we may delete the current cell. */ for (lc = list_head(root->join_info_list); lc != NULL; lc = next) { SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);//特殊連接信息,先前通過deconstruct函數生成 int innerrelid; RelOptInfo *innerrel; Relids joinrelids; List *restrictlist; next = lnext(lc); /* * Must be a non-delaying semijoin to a single baserel, else we aren't * going to be able to do anything with it. (It's probably not * possible for delay_upper_joins to be set on a semijoin, but we * might as well check.) */ if (sjinfo->jointype != JOIN_SEMI || sjinfo->delay_upper_joins) continue; if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid)) continue; innerrel = find_base_rel(root, innerrelid); /* * Before we trouble to run generate_join_implied_equalities, make a * quick check to eliminate cases in which we will surely be unable to * prove uniqueness of the innerrel. */ if (!rel_supports_distinctness(root, innerrel)) continue; /* Compute the relid set for the join we are considering */ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); /* * Since we're only considering a single-rel RHS, any join clauses it * has must be clauses linking it to the semijoin's min_lefthand. We * can also consider EC-derived join clauses. */ restrictlist = list_concat(generate_join_implied_equalities(root, joinrelids, sjinfo->min_lefthand, innerrel), innerrel->joininfo); /* Test whether the innerrel is unique for those clauses. */ if (!innerrel_is_unique(root, joinrelids, sjinfo->min_lefthand, innerrel, JOIN_SEMI, restrictlist, true)) continue; /* OK, remove the SpecialJoinInfo from the list. */ root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);//刪除特殊連接信息 } }
add_placeholders_to_base_rels
把PHV分發到base rels中,代碼較為簡單
/* * add_placeholders_to_base_rels * Add any required PlaceHolderVars to base rels' targetlists. * * If any placeholder can be computed at a base rel and is needed above it, * add it to that rel's targetlist. This might look like it could be merged * with fix_placeholder_input_needed_levels, but it must be separate because * join removal happens in between, and can change the ph_eval_at sets. There * is essentially the same logic in add_placeholders_to_joinrel, but we can't * do that part until joinrels are formed. */ void add_placeholders_to_base_rels(PlannerInfo *root) { ListCell *lc; foreach(lc, root->placeholder_list)//遍歷PH鏈表 { PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc); Relids eval_at = phinfo->ph_eval_at; int varno; if (bms_get_singleton_member(eval_at, &varno) && bms_nonempty_difference(phinfo->ph_needed, eval_at))//添加到需要的RelOptInfo中 { RelOptInfo *rel = find_base_rel(root, varno); rel->reltarget->exprs = lappend(rel->reltarget->exprs, copyObject(phinfo->ph_var)); /* reltarget's cost and width fields will be updated later */ } } }
到此,關于“PostgreSQL中remove_useless_joins的實現邏輯是怎樣的”的學習就結束了,希望能夠解決大家的疑惑。理論與實踐的搭配能更好的幫助大家學習,快去試試吧!若想繼續學習更多相關知識,請繼續關注億速云網站,小編會繼續努力為大家帶來更多實用的文章!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。