This article walks through the data structures PostgreSQL uses when it executes aggregate functions. Each structure is introduced briefly and followed by its definition from the PostgreSQL source.
AggState
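AggState is the executor state for an Agg node while aggregate functions are being evaluated. It holds pointers to AggStatePerAgg and the other per-aggregate structures described below.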
/* ---------------------
 *  AggState information
 *
 *  ss.ss_ScanTupleSlot refers to output of underlying plan.
 *  (ss = ScanState, ps = PlanState)
 *
 *  Note: ss.ps.ps_ExprContext contains ecxt_aggvalues and
 *  ecxt_aggnulls arrays, which hold the computed agg values for the current
 *  input group during evaluation of an Agg node's output tuple(s).  We
 *  create a second ExprContext, tmpcontext, in which to evaluate input
 *  expressions and run the aggregate transition functions.
 * ---------------------
 */
/* these structs are private in nodeAgg.c: */
typedef struct AggStatePerAggData *AggStatePerAgg;
typedef struct AggStatePerTransData *AggStatePerTrans;
typedef struct AggStatePerGroupData *AggStatePerGroup;
typedef struct AggStatePerPhaseData *AggStatePerPhase;
typedef struct AggStatePerHashData *AggStatePerHash;

typedef struct AggState
{
    ScanState   ss;             /* its first field is NodeTag */
    List       *aggs;           /* all Aggref nodes in targetlist & quals */
    int         numaggs;        /* length of list (could be zero!) */
    int         numtrans;       /* number of pertrans items */
    AggStrategy aggstrategy;    /* strategy mode */
    AggSplit    aggsplit;       /* agg-splitting mode, see nodes.h */
    AggStatePerPhase phase;     /* pointer to current phase data */
    int         numphases;      /* number of phases (including phase 0) */
    int         current_phase;  /* current phase number */
    AggStatePerAgg peragg;      /* per-Aggref information */
    AggStatePerTrans pertrans;  /* per-Trans state information */
    ExprContext *hashcontext;   /* econtexts for long-lived data (hashtable) */
    ExprContext **aggcontexts;  /* econtexts for long-lived data (per GS) */
    ExprContext *tmpcontext;    /* econtext for input expressions */
#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
    ExprContext *curaggcontext; /* currently active aggcontext */
    AggStatePerAgg curperagg;   /* currently active aggregate, if any */
#define FIELDNO_AGGSTATE_CURPERTRANS 16
    AggStatePerTrans curpertrans;   /* currently active trans state, if any */
    bool        input_done;     /* indicates end of input */
    bool        agg_done;       /* indicates completion of Agg scan */
    int         projected_set;  /* The last projected grouping set */
#define FIELDNO_AGGSTATE_CURRENT_SET 20
    int         current_set;    /* The current grouping set being evaluated */
    Bitmapset  *grouped_cols;   /* grouped cols in current projection */
    List       *all_grouped_cols;   /* list of all grouped cols in DESC order */

    /* These fields are for grouping set phase data */
    int         maxsets;        /* The max number of sets in any phase */
    AggStatePerPhase phases;    /* array of all phases */
    Tuplesortstate *sort_in;    /* sorted input to phases > 1 */
    Tuplesortstate *sort_out;   /* input is copied here for next phase */
    TupleTableSlot *sort_slot;  /* slot for sort results */

    /* these fields are used in AGG_PLAIN and AGG_SORTED modes: */
    AggStatePerGroup *pergroups;    /* grouping set indexed array of per-group
                                     * pointers */
    HeapTuple   grp_firstTuple; /* copy of first tuple of current group */

    /* these fields are used in AGG_HASHED and AGG_MIXED modes: */
    bool        table_filled;   /* hash table filled yet? */
    int         num_hashes;     /* number of hash tables (length of perhash) */
    AggStatePerHash perhash;    /* array of per-hashtable data */
    AggStatePerGroup *hash_pergroup;    /* grouping set indexed array of
                                         * per-group pointers */

    /* support for evaluation of agg input expressions: */
#define FIELDNO_AGGSTATE_ALL_PERGROUPS 34
    AggStatePerGroup *all_pergroups;    /* array of first ->pergroups, then
                                         * ->hash_pergroup */
    ProjectionInfo *combinedproj;   /* projection machinery */
} AggState;

/* Primitive options supported by nodeAgg.c: */
#define AGGSPLITOP_COMBINE      0x01    /* substitute combinefn for transfn */
#define AGGSPLITOP_SKIPFINAL    0x02    /* skip finalfn, return state as-is */
#define AGGSPLITOP_SERIALIZE    0x04    /* apply serializefn to output */
#define AGGSPLITOP_DESERIALIZE  0x08    /* apply deserializefn to input */

/* Supported operating modes (i.e., useful combinations of these options): */
typedef enum AggSplit
{
    /* Basic, non-split aggregation: */
    AGGSPLIT_SIMPLE = 0,
    /* Initial phase of partial aggregation, with serialization: */
    AGGSPLIT_INITIAL_SERIAL = AGGSPLITOP_SKIPFINAL | AGGSPLITOP_SERIALIZE,
    /* Final phase of partial aggregation, with deserialization: */
    AGGSPLIT_FINAL_DESERIAL = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE
} AggSplit;

/* Test whether an AggSplit value selects each primitive option: */
#define DO_AGGSPLIT_COMBINE(as)     (((as) & AGGSPLITOP_COMBINE) != 0)
#define DO_AGGSPLIT_SKIPFINAL(as)   (((as) & AGGSPLITOP_SKIPFINAL) != 0)
#define DO_AGGSPLIT_SERIALIZE(as)   (((as) & AGGSPLITOP_SERIALIZE) != 0)
#define DO_AGGSPLIT_DESERIALIZE(as) (((as) & AGGSPLITOP_DESERIALIZE) != 0)
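The AggSplit modes are just combinations of the four AGGSPLITOP_* bits, so the DO_AGGSPLIT_* macros answer questions such as "does this node run the combine function instead of the transition function?". The following is a minimal standalone sketch: the macros and enum are copied from the listing above, while per_row_function() and emits_raw_state() are hypothetical helpers written only for this illustration and are not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copied from the listing above so the example compiles on its own. */
#define AGGSPLITOP_COMBINE      0x01
#define AGGSPLITOP_SKIPFINAL    0x02
#define AGGSPLITOP_SERIALIZE    0x04
#define AGGSPLITOP_DESERIALIZE  0x08

typedef enum AggSplit
{
    AGGSPLIT_SIMPLE = 0,
    AGGSPLIT_INITIAL_SERIAL = AGGSPLITOP_SKIPFINAL | AGGSPLITOP_SERIALIZE,
    AGGSPLIT_FINAL_DESERIAL = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE
} AggSplit;

#define DO_AGGSPLIT_COMBINE(as)   (((as) & AGGSPLITOP_COMBINE) != 0)
#define DO_AGGSPLIT_SKIPFINAL(as) (((as) & AGGSPLITOP_SKIPFINAL) != 0)

/* Hypothetical helper: name of the function applied to each input row. */
static const char *
per_row_function(AggSplit as)
{
    return DO_AGGSPLIT_COMBINE(as) ? "combinefn" : "transfn";
}

/* Hypothetical helper: true if the node outputs the raw state (no finalfn). */
static bool
emits_raw_state(AggSplit as)
{
    return DO_AGGSPLIT_SKIPFINAL(as);
}
```

Under this reading, the initial phase of a partial aggregate still runs the ordinary transition function but skips the final function, while the final phase combines partial states and does run the final function.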
AggStatePerAggData
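Per-aggregate information. This structure holds what is needed to call the final function, which turns a state value into the final aggregate result. If a query contains several identical Aggrefs, they can all share the same per-agg data.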
/*
 * AggStatePerAggData - per-aggregate information
 *
 * This contains the information needed to call the final function, to produce
 * a final aggregate result from the state value. If there are multiple
 * identical Aggrefs in the query, they can all share the same per-agg data.
 *
 * These values are set up during ExecInitAgg() and do not change thereafter.
 */
typedef struct AggStatePerAggData
{
    /*
     * Link to an Aggref expr this state value is for.
     *
     * There can be multiple identical Aggref's sharing the same per-agg. This
     * points to the first one of them.
     */
    Aggref     *aggref;

    /* index to the state value which this agg should use */
    int         transno;

    /* Optional Oid of final function (may be InvalidOid) */
    Oid         finalfn_oid;

    /*
     * fmgr lookup data for final function --- only valid when finalfn_oid is
     * not InvalidOid.
     */
    FmgrInfo    finalfn;

    /*
     * Number of arguments to pass to the finalfn.  This is always at least 1
     * (the transition state value) plus any ordered-set direct args. If the
     * finalfn wants extra args then we pass nulls corresponding to the
     * aggregated input columns.
     */
    int         numFinalArgs;

    /* ExprStates for any direct-argument expressions */
    List       *aggdirectargs;

    /*
     * We need the len and byval info for the agg's result data type in order
     * to know how to copy/delete values.
     */
    int16       resulttypeLen;
    bool        resulttypeByVal;

    /*
     * "shareable" is false if this agg cannot share state values with other
     * aggregates because the final function is read-write.
     */
    bool        shareable;
} AggStatePerAggData;
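The transno field is what lets several aggregates read the same transition state: each per-agg entry stores only an index into the array of state values plus its own final function. The sketch below is a toy model of that indirection, not PostgreSQL code. All names (ToyTransState, ToyPerAgg, toy_run and so on) are invented for this illustration, and the "mean" final function returns 0.0 for an empty input where SQL would return NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy transition state shared by two aggregates. */
typedef struct ToyTransState
{
    double      sum;
    long        count;
} ToyTransState;

typedef double (*ToyFinalFn) (const ToyTransState *state);

/* Toy counterpart of AggStatePerAggData: state index plus final function. */
typedef struct ToyPerAgg
{
    int         transno;        /* which state value this aggregate reads */
    ToyFinalFn  finalfn;
} ToyPerAgg;

static double
final_total(const ToyTransState *s)
{
    return s->sum;
}

static double
final_mean(const ToyTransState *s)
{
    /* toy simplification: 0.0 instead of NULL for an empty group */
    return s->count ? s->sum / (double) s->count : 0.0;
}

/* The single transition function both aggregates rely on. */
static void
toy_advance(ToyTransState *s, double x)
{
    s->sum += x;
    s->count++;
}

/* Feed all inputs once, then finalize aggregate number "which" (0 or 1). */
static double
toy_run(const double *xs, size_t n, int which)
{
    ToyTransState states[1] = {{0.0, 0}};
    const ToyPerAgg peraggs[2] = {{0, final_total}, {0, final_mean}};
    const ToyPerAgg *agg = &peraggs[which];

    for (size_t i = 0; i < n; i++)
        toy_advance(&states[0], xs[i]);     /* state updated once per row */

    return agg->finalfn(&states[agg->transno]);
}
```

Both toy aggregates point at transno 0, so the input is accumulated once and only the final step differs.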
AggStatePerTransData
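Per aggregate state value information: the working state used to update an aggregate's state value by calling the transition function for each input row. This structure does not store what is needed to produce the final aggregate result from the transition state; that lives in AggStatePerAggData.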
/*
 * AggStatePerTransData - per aggregate state value information
 *
 * Working state for updating the aggregate's state value, by calling the
 * transition function with an input row. This struct does not store the
 * information needed to produce the final aggregate result from the transition
 * state, that's stored in AggStatePerAggData instead. This separation allows
 * multiple aggregate results to be produced from a single state value.
 */
typedef struct AggStatePerTransData
{
    /*
     * These values are set up during ExecInitAgg() and do not change
     * thereafter:
     */

    /*
     * Link to an Aggref expr this state value is for.
     *
     * There can be multiple Aggref's sharing the same state value, so long as
     * the inputs and transition functions are identical and the final
     * functions are not read-write.  This points to the first one of them.
     */
    Aggref     *aggref;

    /*
     * Is this state value actually being shared by more than one Aggref?
     */
    bool        aggshared;

    /*
     * Number of aggregated input columns.  This includes ORDER BY expressions
     * in both the plain-agg and ordered-set cases.  Ordered-set direct args
     * are not counted, though.
     */
    int         numInputs;

    /*
     * Number of aggregated input columns to pass to the transfn.  This
     * includes the ORDER BY columns for ordered-set aggs, but not for plain
     * aggs.  (This doesn't count the transition state value!)
     */
    int         numTransInputs;

    /* Oid of the state transition or combine function */
    Oid         transfn_oid;

    /* Oid of the serialization function or InvalidOid */
    Oid         serialfn_oid;

    /* Oid of the deserialization function or InvalidOid */
    Oid         deserialfn_oid;

    /* Oid of state value's datatype */
    Oid         aggtranstype;

    /*
     * fmgr lookup data for transition function or combine function.  Note in
     * particular that the fn_strict flag is kept here.
     */
    FmgrInfo    transfn;

    /* fmgr lookup data for serialization function */
    FmgrInfo    serialfn;

    /* fmgr lookup data for deserialization function */
    FmgrInfo    deserialfn;

    /* Input collation derived for aggregate */
    Oid         aggCollation;

    /* number of sorting columns */
    int         numSortCols;

    /* number of sorting columns to consider in DISTINCT comparisons */
    /* (this is either zero or the same as numSortCols) */
    int         numDistinctCols;

    /* deconstructed sorting information (arrays of length numSortCols) */
    AttrNumber *sortColIdx;
    Oid        *sortOperators;
    Oid        *sortCollations;
    bool       *sortNullsFirst;

    /*
     * Comparators for input columns --- only set/used when aggregate has
     * DISTINCT flag. equalfnOne version is used for single-column
     * comparisons, equalfnMulti for the case of multiple columns.
     */
    FmgrInfo    equalfnOne;
    ExprState  *equalfnMulti;

    /*
     * initial value from pg_aggregate entry
     */
    Datum       initValue;
    bool        initValueIsNull;

    /*
     * We need the len and byval info for the agg's input and transition data
     * types in order to know how to copy/delete values.
     *
     * Note that the info for the input type is used only when handling
     * DISTINCT aggs with just one argument, so there is only one input type.
     */
    int16       inputtypeLen,
                transtypeLen;
    bool        inputtypeByVal,
                transtypeByVal;

    /*
     * Slots for holding the evaluated input arguments.  These are set up
     * during ExecInitAgg() and then used for each input row requiring either
     * FILTER or ORDER BY/DISTINCT processing.
     */
    TupleTableSlot *sortslot;   /* current input tuple */
    TupleTableSlot *uniqslot;   /* used for multi-column DISTINCT */
    TupleDesc   sortdesc;       /* descriptor of input tuples */

    /*
     * These values are working state that is initialized at the start of an
     * input tuple group and updated for each input tuple.
     *
     * For a simple (non DISTINCT/ORDER BY) aggregate, we just feed the input
     * values straight to the transition function.  If it's DISTINCT or
     * requires ORDER BY, we pass the input values into a Tuplesort object;
     * then at completion of the input tuple group, we scan the sorted values,
     * eliminate duplicates if needed, and run the transition function on the
     * rest.
     *
     * We need a separate tuplesort for each grouping set.
     */
    Tuplesortstate **sortstates;    /* sort objects, if DISTINCT or ORDER BY */

    /*
     * This field is a pre-initialized FunctionCallInfo struct used for
     * calling this aggregate's transfn.  We save a few cycles per row by not
     * re-initializing the unchanging fields; which isn't much, but it seems
     * worth the extra space consumption.
     */
    FunctionCallInfoData transfn_fcinfo;

    /* Likewise for serialization and deserialization functions */
    FunctionCallInfoData serialfn_fcinfo;
    FunctionCallInfoData deserialfn_fcinfo;
} AggStatePerTransData;
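The comment on sortstates describes how a DISTINCT aggregate is evaluated: inputs are buffered and sorted, and at the end of the group the sorted values are scanned, duplicates are skipped, and the transition function runs on what remains. The sketch below imitates that flow for a single-column sum(DISTINCT x), using the C library's qsort in place of a Tuplesort object. The function name sum_distinct and the use of plain long values are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison for qsort; avoids overflow from subtraction. */
static int
cmp_long(const void *a, const void *b)
{
    long        x = *(const long *) a;
    long        y = *(const long *) b;

    return (x > y) - (x < y);
}

/*
 * Toy sum(DISTINCT x) over one group.  The input array is sorted in place,
 * which stands in for feeding the values to a sort object.
 */
static long
sum_distinct(long *vals, size_t n)
{
    long        state = 0;      /* transition value, initial value 0 */

    qsort(vals, n, sizeof(long), cmp_long);

    for (size_t i = 0; i < n; i++)
    {
        /* equal neighbours in sorted order are duplicates: skip them */
        if (i > 0 && vals[i] == vals[i - 1])
            continue;
        state += vals[i];       /* "transition function" on distinct values */
    }
    return state;
}
```

Sorting first is what makes the duplicate check a simple comparison with the previous value, which matches the role the equalfnOne comparator plays for single-column DISTINCT.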
AggStatePerGroupData
Per-aggregate, per-group working state. These values are initialized at the start of each input tuple group and updated for every input tuple in that group.
/*
 * AggStatePerGroupData - per-aggregate-per-group working state
 *
 * These values are working state that is initialized at the start of
 * an input tuple group and updated for each input tuple.
 *
 * In AGG_PLAIN and AGG_SORTED modes, we have a single array of these
 * structs (pointed to by aggstate->pergroup); we re-use the array for
 * each input group, if it's AGG_SORTED mode.  In AGG_HASHED mode, the
 * hash table contains an array of these structs for each tuple group.
 *
 * Logically, the sortstate field belongs in this struct, but we do not
 * keep it here for space reasons: we don't support DISTINCT aggregates
 * in AGG_HASHED mode, so there's no reason to use up a pointer field
 * in every entry of the hashtable.
 */
typedef struct AggStatePerGroupData
{
#define FIELDNO_AGGSTATEPERGROUPDATA_TRANSVALUE 0
    Datum       transValue;     /* current transition value */
#define FIELDNO_AGGSTATEPERGROUPDATA_TRANSVALUEISNULL 1
    bool        transValueIsNull;

#define FIELDNO_AGGSTATEPERGROUPDATA_NOTRANSVALUE 2
    bool        noTransValue;   /* true if transValue not set yet */

    /*
     * Note: noTransValue initially has the same value as transValueIsNull,
     * and if true both are cleared to false at the same time.  They are not
     * the same though: if transfn later returns a NULL, we want to keep that
     * NULL and not auto-replace it with a later input value. Only the first
     * non-NULL input will be auto-substituted.
     */
} AggStatePerGroupData;
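The difference between noTransValue and transValueIsNull is easiest to see by stepping a strict transition function whose initial value is NULL. The sketch below is a simplified model of that behaviour and not the code from nodeAgg.c: ToyPerGroup mirrors the three fields with a long instead of a Datum, and ToyTransFn, capped_sum, plain_sum and run_group are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counterpart of AggStatePerGroupData. */
typedef struct ToyPerGroup
{
    long        transValue;
    bool        transValueIsNull;
    bool        noTransValue;
} ToyPerGroup;

/* Toy strict transfn: returns false to signal a NULL result. */
typedef bool (*ToyTransFn) (long state, long input, long *result);

static bool
plain_sum(long state, long input, long *result)
{
    *result = state + input;
    return true;
}

/* Returns NULL once the running sum would exceed 100. */
static bool
capped_sum(long state, long input, long *result)
{
    if (state + input > 100)
        return false;
    *result = state + input;
    return true;
}

static void
advance_strict(ToyPerGroup *g, bool inputIsNull, long input, ToyTransFn fn)
{
    long        r;

    if (inputIsNull)
        return;                 /* strict: NULL inputs are skipped */
    if (g->noTransValue)
    {
        /* first non-NULL input is substituted for the missing state */
        g->transValue = input;
        g->transValueIsNull = false;
        g->noTransValue = false;
        return;
    }
    if (g->transValueIsNull)
        return;                 /* an earlier NULL result is kept as is */
    if (fn(g->transValue, input, &r))
        g->transValue = r;
    else
        g->transValueIsNull = true;
}

/* Run one group; both flags start true because the initial value is NULL. */
static ToyPerGroup
run_group(const long *vals, const bool *nulls, size_t n, ToyTransFn fn)
{
    ToyPerGroup g = {0, true, true};

    for (size_t i = 0; i < n; i++)
        advance_strict(&g, nulls[i], vals[i], fn);
    return g;
}
```

With inputs 60, 50, 1 and capped_sum, the second call returns NULL; because noTransValue is already false, the later input 1 does not replace that NULL, which is exactly the case the comment warns about.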
AggStatePerPhaseData
Per-grouping-set-phase state. Grouping sets are divided into "phases", where each phase can be processed in a single pass over the input.
/*
 * AggStatePerPhaseData - per-grouping-set-phase state
 *
 * Grouping sets are divided into "phases", where a single phase can be
 * processed in one pass over the input. If there is more than one phase, then
 * at the end of input from the current phase, state is reset and another pass
 * taken over the data which has been re-sorted in the mean time.
 *
 * Accordingly, each phase specifies a list of grouping sets and group clause
 * information, plus each phase after the first also has a sort order.
 */
typedef struct AggStatePerPhaseData
{
    AggStrategy aggstrategy;    /* strategy for this phase */
    int         numsets;        /* number of grouping sets (or 0) */
    int        *gset_lengths;   /* lengths of grouping sets */
    Bitmapset **grouped_cols;   /* column groupings for rollup */
    ExprState **eqfunctions;    /* expression returning equality, indexed by
                                 * nr of cols to compare */
    Agg        *aggnode;        /* Agg node for phase data */
    Sort       *sortnode;       /* Sort node for input ordering for phase */
    ExprState  *evaltrans;      /* evaluation of transition functions */
} AggStatePerPhaseData;
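One way to picture why a sorted phase can serve several grouping sets in one pass: if every set in the phase is a prefix of the sort key (as with ROLLUP), then comparing consecutive rows on their leading columns tells you which sets have just finished a group. The sketch below is a deliberately simplified model of that idea under the stated prefix assumption; matching_prefix and sets_to_close are hypothetical helpers, rows are plain int arrays, and the real executor logic is considerably more involved.

```c
#include <assert.h>

/* Number of leading columns on which two sorted rows agree. */
static int
matching_prefix(const int *prev, const int *cur, int ncols)
{
    int         k = 0;

    while (k < ncols && prev[k] == cur[k])
        k++;
    return k;
}

/*
 * Count the grouping sets whose current group ends between prev and cur.
 * gset_lengths[i] is the number of leading sort columns in set i, e.g.
 * {2, 1, 0} for ROLLUP(a, b).  A set closes its group when one of its own
 * columns changed, i.e. when its length exceeds the matching prefix.
 */
static int
sets_to_close(const int *prev, const int *cur, int ncols,
              const int *gset_lengths, int numsets)
{
    int         k = matching_prefix(prev, cur, ncols);
    int         closed = 0;

    for (int i = 0; i < numsets; i++)
    {
        if (gset_lengths[i] > k)
            closed++;
    }
    return closed;
}
```

In this model the empty grouping set (length 0) never closes between rows, so it is only emitted at the end of input, matching the intuition of a grand total.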
AggStatePerHashData
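Per-hashtable state. When grouping sets are evaluated by hashing, there is one of these structures for each grouping set; plain hashing without grouping sets uses just one.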
/*
 * AggStatePerHashData - per-hashtable state
 *
 * When doing grouping sets with hashing, we have one of these for each
 * grouping set. (When doing hashing without grouping sets, we have just one of
 * them.)
 */
typedef struct AggStatePerHashData
{
    TupleHashTable hashtable;   /* hash table with one entry per group */
    TupleHashIterator hashiter; /* for iterating through hash table */
    TupleTableSlot *hashslot;   /* slot for loading hash table */
    FmgrInfo   *hashfunctions;  /* per-grouping-field hash fns */
    Oid        *eqfuncoids;     /* per-grouping-field equality fns */
    int         numCols;        /* number of hash key columns */
    int         numhashGrpCols; /* number of columns in hash table */
    int         largestGrpColIdx;   /* largest col required for hashing */
    AttrNumber *hashGrpColIdxInput; /* hash col indices in input slot */
    AttrNumber *hashGrpColIdxHash;  /* indices in hashtbl tuples */
    Agg        *aggnode;        /* original Agg node, for numGroups etc. */
} AggStatePerHashData;
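The central idea behind the hashtable field is "one entry per group": each input row is hashed on its grouping key, the matching entry is found or created, and the per-group state in that entry is advanced. The sketch below shows that loop for a count(*) grouped by a single int key, using a small fixed-size open-addressing table. It is a toy under loud assumptions: ToyHashTable and every toy_* function are invented here, there is no resizing, and it does not use PostgreSQL's TupleHashTable API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_SIZE 64       /* power of two; toy capacity, no resizing */

typedef struct ToyGroupEntry
{
    bool        used;
    int         key;            /* grouping key */
    long        count;          /* per-group state for count(*) */
} ToyGroupEntry;

typedef struct ToyHashTable
{
    ToyGroupEntry slots[TOY_TABLE_SIZE];
    int         ngroups;
} ToyHashTable;

/* Find the entry for key, creating it if absent; NULL if the table is full. */
static ToyGroupEntry *
toy_lookup_or_insert(ToyHashTable *t, int key)
{
    unsigned int h = ((unsigned int) key * 2654435761u) & (TOY_TABLE_SIZE - 1);

    for (unsigned int probe = 0; probe < TOY_TABLE_SIZE; probe++)
    {
        ToyGroupEntry *e = &t->slots[(h + probe) & (TOY_TABLE_SIZE - 1)];

        if (!e->used)
        {
            e->used = true;
            e->key = key;
            e->count = 0;
            t->ngroups++;
            return e;
        }
        if (e->key == key)
            return e;
    }
    return NULL;
}

/* Load all rows; returns the number of groups, or -1 if the table overflowed. */
static int
toy_fill(ToyHashTable *t, const int *keys, size_t n)
{
    memset(t, 0, sizeof(*t));
    for (size_t i = 0; i < n; i++)
    {
        ToyGroupEntry *e = toy_lookup_or_insert(t, keys[i]);

        if (e == NULL)
            return -1;
        e->count++;             /* advance this group's state */
    }
    return t->ngroups;
}

static int
toy_distinct_groups(const int *keys, size_t n)
{
    ToyHashTable t;

    return toy_fill(&t, keys, n);
}

/* count(*) for one key; a key that never appeared yields 0. */
static long
toy_count_for(const int *keys, size_t n, int key)
{
    ToyHashTable t;
    ToyGroupEntry *e;

    if (toy_fill(&t, keys, n) < 0)
        return -1;
    e = toy_lookup_or_insert(&t, key);
    return e ? e->count : -1;
}
```

Unlike the sorted strategies, nothing here depends on input order, which is why a hashed aggregate has to finish loading the table before it can return any group.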
That covers the data structures PostgreSQL uses when executing aggregate functions. How they behave for a particular query is best confirmed by experimenting yourself.