您好,登錄后才能下訂單哦!
這篇文章主要介紹“PostgreSQL查詢優化中對Having和Group By子句的簡化處理分析”,在日常操作中,相信很多人在PostgreSQL查詢優化中對Having和Group By子句的簡化處理分析問題上存在疑惑,小編查閱了各式資料,整理出簡單好用的操作方法,希望對大家解答”PostgreSQL查詢優化中對Having和Group By子句的簡化處理分析”的疑惑有所幫助!接下來,請跟著小編一起來學習吧!
簡化Having語句
把Having中的約束條件,如滿足可以提升到Where條件中的,則移動到Where子句中,否則仍保留在Having語句中.這樣做的目的是因為Having過濾在Group by之后執行,如能把Having中的過濾提升到Where中,則可以提前執行"選擇"運算,減少Group by的開銷.
以下語句,條件dwbh='1002'提升到Where中執行:
testdb=# explain verbose select a.dwbh,a.xb,count(*) testdb-# from t_grxx a testdb-# group by a.dwbh,a.xb testdb-# having count(*) >= 1 and dwbh = '1002'; QUERY PLAN ----------------------------------------------------------------------------- GroupAggregate (cost=15.01..15.06 rows=1 width=84) Output: dwbh, xb, count(*) Group Key: a.dwbh, a.xb Filter: (count(*) >= 1) -- count(*) >= 1 仍保留在Having中 -> Sort (cost=15.01..15.02 rows=2 width=76) Output: dwbh, xb Sort Key: a.xb -> Seq Scan on public.t_grxx a (cost=0.00..15.00 rows=2 width=76) Output: dwbh, xb Filter: ((a.dwbh)::text = '1002'::text) -- 提升到Where中,掃描時過濾Tuple (10 rows)
如存在Group by & Grouping sets則不作處理:
testdb=# explain verbose testdb-# select a.dwbh,a.xb,count(*) testdb-# from t_grxx a testdb-# group by testdb-# grouping sets ((a.dwbh),(a.xb),()) testdb-# having count(*) >= 1 and dwbh = '1002' testdb-# order by a.dwbh,a.xb; QUERY PLAN ------------------------------------------------------------------------------- Sort (cost=28.04..28.05 rows=3 width=84) Output: dwbh, xb, (count(*)) Sort Key: a.dwbh, a.xb -> MixedAggregate (cost=0.00..28.02 rows=3 width=84) Output: dwbh, xb, count(*) Hash Key: a.dwbh Hash Key: a.xb Group Key: () Filter: ((count(*) >= 1) AND ((a.dwbh)::text = '1002'::text)) -- 掃描數據表后再過濾 -> Seq Scan on public.t_grxx a (cost=0.00..14.00 rows=400 width=76) Output: dwbh, grbh, xm, xb, nl (11 rows)
簡化Group by語句
如Group by中的字段列表已包含某個表主鍵的所有列,則該表在Group by語句中的其他列可以刪除,這樣的做法有利于提升在Group by過程中排序或Hash的性能,減少不必要的開銷.
testdb=# explain verbose select a.dwbh,a.dwmc,count(*) testdb-# from t_dwxx a testdb-# group by a.dwbh,a.dwmc testdb-# having count(*) >= 1; QUERY PLAN -------------------------------------------------------------------------- HashAggregate (cost=13.20..15.20 rows=53 width=264) Output: dwbh, dwmc, count(*) Group Key: a.dwbh, a.dwmc -- 分組鍵為dwbh & dwmc Filter: (count(*) >= 1) -> Seq Scan on public.t_dwxx a (cost=0.00..11.60 rows=160 width=256) Output: dwmc, dwbh, dwdz (6 rows) testdb=# alter table t_dwxx add primary key(dwbh); -- 添加主鍵 ALTER TABLE testdb=# explain verbose select a.dwbh,a.dwmc,count(*) from t_dwxx a group by a.dwbh,a.dwmc having count(*) >= 1; QUERY PLAN ----------------------------------------------------------------------- HashAggregate (cost=1.05..1.09 rows=1 width=264) Output: dwbh, dwmc, count(*) Group Key: a.dwbh -- 分組鍵只保留dwbh Filter: (count(*) >= 1) -> Seq Scan on public.t_dwxx a (cost=0.00..1.03 rows=3 width=256) Output: dwmc, dwbh, dwdz (6 rows)
相關處理的源碼位于文件subquery_planner.c中,主函數為subquery_planner,代碼片段如下:
/* * In some cases we may want to transfer a HAVING clause into WHERE. We * cannot do so if the HAVING clause contains aggregates (obviously) or * volatile functions (since a HAVING clause is supposed to be executed * only once per group). We also can't do this if there are any nonempty * grouping sets; moving such a clause into WHERE would potentially change * the results, if any referenced column isn't present in all the grouping * sets. (If there are only empty grouping sets, then the HAVING clause * must be degenerate as discussed below.) * * Also, it may be that the clause is so expensive to execute that we're * better off doing it only once per group, despite the loss of * selectivity. This is hard to estimate short of doing the entire * planning process twice, so we use a heuristic: clauses containing * subplans are left in HAVING. Otherwise, we move or copy the HAVING * clause into WHERE, in hopes of eliminating tuples before aggregation * instead of after. * * If the query has explicit grouping then we can simply move such a * clause into WHERE; any group that fails the clause will not be in the * output because none of its tuples will reach the grouping or * aggregation stage. Otherwise we must have a degenerate (variable-free) * HAVING clause, which we put in WHERE so that query_planner() can use it * in a gating Result node, but also keep in HAVING to ensure that we * don't emit a bogus aggregated row. (This could be done better, but it * seems not worth optimizing.) * * Note that both havingQual and parse->jointree->quals are in * implicitly-ANDed-list form at this point, even though they are declared * as Node *. */ newHaving = NIL; foreach(l, (List *) parse->havingQual)//存在Having條件語句 { Node *havingclause = (Node *) lfirst(l);//獲取謂詞 if ((parse->groupClause && parse->groupingSets) || contain_agg_clause(havingclause) || contain_volatile_functions(havingclause) || contain_subplans(havingclause)) { /* keep it in HAVING */ //如果有Group&&Group Sets語句 //保持不變 newHaving = lappend(newHaving, havingclause); } else if (parse->groupClause && !parse->groupingSets) { /* move it to WHERE */ //只有group語句,可以加入到jointree的條件中 parse->jointree->quals = (Node *) lappend((List *) parse->jointree->quals, havingclause); } else//既沒有group也沒有grouping set,拷貝一份到jointree的條件中 { /* put a copy in WHERE, keep it in HAVING */ parse->jointree->quals = (Node *) lappend((List *) parse->jointree->quals, copyObject(havingclause)); newHaving = lappend(newHaving, havingclause); } } parse->havingQual = (Node *) newHaving;//調整having子句 /* Remove any redundant GROUP BY columns */ remove_useless_groupby_columns(root);//去掉group by中無用的數據列
remove_useless_groupby_columns
/* * remove_useless_groupby_columns * Remove any columns in the GROUP BY clause that are redundant due to * being functionally dependent on other GROUP BY columns. * * Since some other DBMSes do not allow references to ungrouped columns, it's * not unusual to find all columns listed in GROUP BY even though listing the * primary-key columns would be sufficient. Deleting such excess columns * avoids redundant sorting work, so it's worth doing. When we do this, we * must mark the plan as dependent on the pkey constraint (compare the * parser's check_ungrouped_columns() and check_functional_grouping()). * * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE * index as the determining columns. But as with check_functional_grouping(), * there's currently no way to represent dependency on a NOT NULL constraint, * so we consider only the pkey for now. */ static void remove_useless_groupby_columns(PlannerInfo *root) { Query *parse = root->parse;//查詢樹 Bitmapset **groupbyattnos;//位圖集合 Bitmapset **surplusvars;//位圖集合 ListCell *lc; int relid; /* No chance to do anything if there are less than two GROUP BY items */ if (list_length(parse->groupClause) < 2)//如果只有1個ITEMS,無需處理 return; /* Don't fiddle with the GROUP BY clause if the query has grouping sets */ if (parse->groupingSets)//存在Grouping sets,不作處理 return; /* * Scan the GROUP BY clause to find GROUP BY items that are simple Vars. * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k * that are GROUP BY items. */ //用于分組的屬性 groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) * (list_length(parse->rtable) + 1)); foreach(lc, parse->groupClause) { SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList); Var *var = (Var *) tle->expr; /* * Ignore non-Vars and Vars from other query levels. * * XXX in principle, stable expressions containing Vars could also be * removed, if all the Vars are functionally dependent on other GROUP * BY items. But it's not clear that such cases occur often enough to * be worth troubling over. */ if (!IsA(var, Var) || var->varlevelsup > 0) continue; /* OK, remember we have this Var */ relid = var->varno; Assert(relid <= list_length(parse->rtable)); groupbyattnos[relid] = bms_add_member(groupbyattnos[relid], var->varattno - FirstLowInvalidHeapAttributeNumber); } /* * Consider each relation and see if it is possible to remove some of its * Vars from GROUP BY. For simplicity and speed, we do the actual removal * in a separate pass. Here, we just fill surplusvars[k] with a bitmapset * of the column attnos of RTE k that are removable GROUP BY items. */ surplusvars = NULL; /* don't allocate array unless required */ relid = 0; //如某個Relation的分組鍵中已含主鍵列,去掉其他列 foreach(lc, parse->rtable) { RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc); Bitmapset *relattnos; Bitmapset *pkattnos; Oid constraintOid; relid++; /* Only plain relations could have primary-key constraints */ if (rte->rtekind != RTE_RELATION) continue; /* Nothing to do unless this rel has multiple Vars in GROUP BY */ relattnos = groupbyattnos[relid]; if (bms_membership(relattnos) != BMS_MULTIPLE) continue; /* * Can't remove any columns for this rel if there is no suitable * (i.e., nondeferrable) primary key constraint. */ pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid); if (pkattnos == NULL) continue; /* * If the primary key is a proper subset of relattnos then we have * some items in the GROUP BY that can be removed. */ if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1) { /* * To easily remember whether we've found anything to do, we don't * allocate the surplusvars[] array until we find something. */ if (surplusvars == NULL) surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) * (list_length(parse->rtable) + 1)); /* Remember the attnos of the removable columns */ surplusvars[relid] = bms_difference(relattnos, pkattnos); /* Also, mark the resulting plan as dependent on this constraint */ parse->constraintDeps = lappend_oid(parse->constraintDeps, constraintOid); } } /* * If we found any surplus Vars, build a new GROUP BY clause without them. * (Note: this may leave some TLEs with unreferenced ressortgroupref * markings, but that's harmless.) */ if (surplusvars != NULL) { List *new_groupby = NIL; foreach(lc, parse->groupClause) { SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList); Var *var = (Var *) tle->expr; /* * New list must include non-Vars, outer Vars, and anything not * marked as surplus. */ if (!IsA(var, Var) || var->varlevelsup > 0 || !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber, surplusvars[var->varno])) new_groupby = lappend(new_groupby, sgc); } parse->groupClause = new_groupby; } }
到此,關于“PostgreSQL查詢優化中對Having和Group By子句的簡化處理分析”的學習就結束了,希望能夠解決大家的疑惑。理論與實踐的搭配能更好的幫助大家學習,快去試試吧!若想繼續學習更多相關知識,請繼續關注億速云網站,小編會繼續努力為大家帶來更多實用的文章!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。