How PostgreSQL calls the mergeruns function

Published: 2021-11-09 11:51:21 | Source: 億速云 | Views: 116 | Author: iii | Category: Relational databases

This article explains how PostgreSQL calls the mergeruns function in tuplesort.c. It first lists the data structures involved, then reads through the annotated source of mergeruns, and finally steps through a real sort with gdb.

TupleTableSlot
The executor stores tuples in a "tuple table", which is a List of independent TupleTableSlots.

/*----------
 * The executor stores tuples in a "tuple table" which is a List of
 * independent TupleTableSlots.  There are several cases we need to handle:
 *      1. physical tuple in a disk buffer page
 *      2. physical tuple constructed in palloc'ed memory
 *      3. "minimal" physical tuple constructed in palloc'ed memory
 *      4. "virtual" tuple consisting of Datum/isnull arrays
 *
 * The first two cases are similar in that they both deal with "materialized"
 * tuples, but resource management is different.  For a tuple in a disk page
 * we need to hold a pin on the buffer until the TupleTableSlot's reference
 * to the tuple is dropped; while for a palloc'd tuple we usually want the
 * tuple pfree'd when the TupleTableSlot's reference is dropped.
 *
 * A "minimal" tuple is handled similarly to a palloc'd regular tuple.
 * At present, minimal tuples never are stored in buffers, so there is no
 * parallel to case 1.  Note that a minimal tuple has no "system columns".
 * (Actually, it could have an OID, but we have no need to access the OID.)
 *
 * A "virtual" tuple is an optimization used to minimize physical data
 * copying in a nest of plan nodes.  Any pass-by-reference Datums in the
 * tuple point to storage that is not directly associated with the
 * TupleTableSlot; generally they will point to part of a tuple stored in
 * a lower plan node's output TupleTableSlot, or to a function result
 * constructed in a plan node's per-tuple econtext.  It is the responsibility
 * of the generating plan node to be sure these resources are not released
 * for as long as the virtual tuple needs to be valid.  We only use virtual
 * tuples in the result slots of plan nodes --- tuples to be copied anywhere
 * else need to be "materialized" into physical tuples.  Note also that a
 * virtual tuple does not have any "system columns".
 *
 * It is also possible for a TupleTableSlot to hold both physical and minimal
 * copies of a tuple.  This is done when the slot is requested to provide
 * the format other than the one it currently holds.  (Originally we attempted
 * to handle such requests by replacing one format with the other, but that
 * had the fatal defect of invalidating any pass-by-reference Datums pointing
 * into the existing slot contents.)  Both copies must contain identical data
 * payloads when this is the case.
 *
 * The Datum/isnull arrays of a TupleTableSlot serve double duty.  When the
 * slot contains a virtual tuple, they are the authoritative data.  When the
 * slot contains a physical tuple, the arrays contain data extracted from
 * the tuple.  (In this state, any pass-by-reference Datums point into
 * the physical tuple.)  The extracted information is built "lazily",
 * ie, only as needed.  This serves to avoid repeated extraction of data
 * from the physical tuple.
 *
 * A TupleTableSlot can also be "empty", holding no valid data.  This is
 * the only valid state for a freshly-created slot that has not yet had a
 * tuple descriptor assigned to it.  In this state, tts_isempty must be
 * true, tts_shouldFree false, tts_tuple NULL, tts_buffer InvalidBuffer,
 * and tts_nvalid zero.
 *
 * The tupleDescriptor is simply referenced, not copied, by the TupleTableSlot
 * code.  The caller of ExecSetSlotDescriptor() is responsible for providing
 * a descriptor that will live as long as the slot does.  (Typically, both
 * slots and descriptors are in per-query memory and are freed by memory
 * context deallocation at query end; so it's not worth providing any extra
 * mechanism to do more.  However, the slot will increment the tupdesc
 * reference count if a reference-counted tupdesc is supplied.)
 *
 * When tts_shouldFree is true, the physical tuple is "owned" by the slot
 * and should be freed when the slot's reference to the tuple is dropped.
 *
 * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
 * on the indicated buffer page; drop the pin when we release the
 * slot's reference to that buffer.  (tts_shouldFree should always be
 * false in such a case, since presumably tts_tuple is pointing at the
 * buffer page.)
 *
 * tts_nvalid indicates the number of valid columns in the tts_values/isnull
 * arrays.  When the slot is holding a "virtual" tuple this must be equal
 * to the descriptor's natts.  When the slot is holding a physical tuple
 * this is equal to the number of columns we have extracted (we always
 * extract columns from left to right, so there are no holes).
 *
 * tts_values/tts_isnull are allocated when a descriptor is assigned to the
 * slot; they are of length equal to the descriptor's natts.
 *
 * tts_mintuple must always be NULL if the slot does not hold a "minimal"
 * tuple.  When it does, tts_mintuple points to the actual MinimalTupleData
 * object (the thing to be pfree'd if tts_shouldFreeMin is true).  If the slot
 * has only a minimal and not also a regular physical tuple, then tts_tuple
 * points at tts_minhdr and the fields of that struct are set correctly
 * for access to the minimal tuple; in particular, tts_minhdr.t_data points
 * MINIMAL_TUPLE_OFFSET bytes before tts_mintuple.  This allows column
 * extraction to treat the case identically to regular physical tuples.
 *
 * tts_slow/tts_off are saved state for slot_deform_tuple, and should not
 * be touched by any other code.
 *----------
 */
/* This layout matches PostgreSQL 11, before TupleTableSlotOps was introduced */
typedef struct TupleTableSlot
{
    NodeTag     type;           /* node tag */
    bool        tts_isempty;    /* true = slot is empty */
    bool        tts_shouldFree; /* should pfree tts_tuple? */
    bool        tts_shouldFreeMin;  /* should pfree tts_mintuple? */
#define FIELDNO_TUPLETABLESLOT_SLOW 4
    bool        tts_slow;       /* saved state for slot_deform_tuple */
#define FIELDNO_TUPLETABLESLOT_TUPLE 5
    HeapTuple   tts_tuple;      /* physical tuple, or NULL if virtual */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 6
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
    MemoryContext tts_mcxt;     /* slot itself is in this context */
    Buffer      tts_buffer;     /* tuple's buffer, or InvalidBuffer */
#define FIELDNO_TUPLETABLESLOT_NVALID 9
    int         tts_nvalid;     /* # of valid values in tts_values */
#define FIELDNO_TUPLETABLESLOT_VALUES 10
    Datum      *tts_values;     /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 11
    bool       *tts_isnull;     /* current per-attribute isnull flags */
    MinimalTuple tts_mintuple;  /* minimal tuple, or NULL if none */
    HeapTupleData tts_minhdr;   /* workspace for minimal-tuple-only case */
#define FIELDNO_TUPLETABLESLOT_OFF 14
    uint32      tts_off;        /* saved state for slot_deform_tuple */
    bool        tts_fixedTupleDescriptor;   /* descriptor can't be changed */
} TupleTableSlot;
/*
 * base tuple table slot type
 * (this layout matches PostgreSQL 12 and later, where format-specific
 * behavior lives in TupleTableSlotOps)
 */
typedef struct TupleTableSlot
{
    NodeTag     type;           /* node tag */
#define FIELDNO_TUPLETABLESLOT_FLAGS 1
    uint16      tts_flags;      /* Boolean states */
#define FIELDNO_TUPLETABLESLOT_NVALID 2
    AttrNumber  tts_nvalid;     /* # of valid values in tts_values */
    const TupleTableSlotOps *const tts_ops; /* implementation of slot */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
#define FIELDNO_TUPLETABLESLOT_VALUES 5
    Datum      *tts_values;     /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 6
    bool       *tts_isnull;     /* current per-attribute isnull flags */
    MemoryContext tts_mcxt;     /* slot itself is in this context */
} TupleTableSlot;
/* routines for a TupleTableSlot implementation */
struct TupleTableSlotOps
{
    /* Minimum size of the slot */
    size_t          base_slot_size;
    /* Initialization. */
    void (*init)(TupleTableSlot *slot);
    /* Destruction. */
    void (*release)(TupleTableSlot *slot);
    /*
     * Clear the contents of the slot. Only the contents are expected to be
     * cleared and not the tuple descriptor. Typically an implementation of
     * this callback should free the memory allocated for the tuple contained
     * in the slot.
     */
    void (*clear)(TupleTableSlot *slot);
    /*
     * Fill up first natts entries of tts_values and tts_isnull arrays with
     * values from the tuple contained in the slot. The function may be called
     * with natts more than the number of attributes available in the tuple,
     * in which case it should set tts_nvalid to the number of returned
     * columns.
     */
    void (*getsomeattrs)(TupleTableSlot *slot, int natts);
    /*
     * Returns value of the given system attribute as a datum and sets isnull
     * to false, if it's not NULL. Throws an error if the slot type does not
     * support system attributes.
     */
    Datum (*getsysattr)(TupleTableSlot *slot, int attnum, bool *isnull);
    /*
     * Make the contents of the slot solely depend on the slot, and not on
     * underlying resources (like another memory context, buffers, etc).
     */
    void (*materialize)(TupleTableSlot *slot);
    /*
     * Copy the contents of the source slot into the destination slot's own
     * context. Invoked using callback of the destination slot.
     */
    void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);
    /*
     * Return a heap tuple "owned" by the slot. It is slot's responsibility to
     * free the memory consumed by the heap tuple. If the slot can not "own" a
     * heap tuple, it should not implement this callback and should set it as
     * NULL.
     */
    HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
    /*
     * Return a minimal tuple "owned" by the slot. It is slot's responsibility
     * to free the memory consumed by the minimal tuple. If the slot can not
     * "own" a minimal tuple, it should not implement this callback and should
     * set it as NULL.
     */
    MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);
    /*
     * Return a copy of heap tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not "owned" by
     * the slot, i.e. the caller has to take responsibility to free the memory
     * consumed by the copy.
     */
    HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
    /*
     * Return a copy of minimal tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not "owned" by
     * the slot, i.e. the caller has to take responsibility to free the memory
     * consumed by the copy.
     */
    MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);
};
typedef struct tupleDesc
{
    int         natts;          /* number of attributes in the tuple */
    Oid         tdtypeid;       /* composite type ID for tuple type */
    int32       tdtypmod;       /* typmod for tuple type */
    int         tdrefcount;     /* reference count, or -1 if not counting */
    TupleConstr *constr;        /* constraints, or NULL if none */
    /* attrs[N] is the description of Attribute Number N+1 */
    FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
}  *TupleDesc;
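
The lazy extraction described in the TupleTableSlot comment (tts_nvalid counts how many leading columns have already been pulled out of the physical tuple) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToySlot`, `toy_store_tuple`, `toy_getsomeattrs`, and `toy_getattr` are hypothetical names, and a tuple is reduced to a packed array of ints rather than a real HeapTuple.

```c
#include <assert.h>

#define TOY_MAX_ATTS 8

/* Toy stand-in for a slot that holds a "physical" tuple (hypothetical type). */
typedef struct ToySlot
{
    const int  *tuple;                  /* packed physical tuple: natts ints */
    int         natts;                  /* number of columns in the descriptor */
    int         nvalid;                 /* columns already extracted, left to right */
    int         values[TOY_MAX_ATTS];   /* extracted values, valid in [0, nvalid) */
    int         extract_calls;          /* how many columns were actually copied */
} ToySlot;

/* Store a new tuple: nothing is extracted yet, so nvalid starts at zero. */
static void
toy_store_tuple(ToySlot *slot, const int *tuple, int natts)
{
    assert(natts <= TOY_MAX_ATTS);
    slot->tuple = tuple;
    slot->natts = natts;
    slot->nvalid = 0;
    slot->extract_calls = 0;
}

/* Ensure the first natts columns are extracted; columns already done are skipped. */
static void
toy_getsomeattrs(ToySlot *slot, int natts)
{
    if (natts > slot->natts)
        natts = slot->natts;
    while (slot->nvalid < natts)
    {
        slot->values[slot->nvalid] = slot->tuple[slot->nvalid];
        slot->nvalid++;
        slot->extract_calls++;
    }
}

/* Fetch column attnum (1-based), extracting only as far as needed. */
static int
toy_getattr(ToySlot *slot, int attnum)
{
    assert(attnum >= 1 && attnum <= slot->natts);
    toy_getsomeattrs(slot, attnum);
    return slot->values[attnum - 1];
}
```

Because extraction always proceeds from left to right, asking for column 2 and then column 1 copies only two columns in total, which is the "no holes" property the comment mentions.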

SortState
Run-time state information for a Sort node.

/* ----------------
 *   SortState information
 * ----------------
 */
typedef struct SortState
{
    ScanState   ss;             /* base class; its first field is NodeTag */
    bool        randomAccess;   /* need random access to sort output? */
    bool        bounded;        /* is the result set bounded? */
    int64       bound;          /* if bounded, how many tuples are needed */
    bool        sort_Done;      /* sort completed yet? */
    bool        bounded_Done;   /* value of bounded we did the sort with */
    int64       bound_Done;     /* value of bound we did the sort with */
    void       *tuplesortstate; /* private state of tuplesort.c */
    bool        am_worker;      /* are we a worker? */
    SharedSortInfo *shared_info;    /* one entry per worker */
} SortState;
/* ----------------
 *   Shared memory container for per-worker sort information
 * ----------------
 */
typedef struct SharedSortInfo
{
    int         num_workers;    /* number of workers */
    TuplesortInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];    /* sort statistics, one entry per worker */
} SharedSortInfo;

TuplesortInstrumentation
Data structures for reporting sort statistics.

/*
 * Data structures for reporting sort statistics.  Note that
 * TuplesortInstrumentation can't contain any pointers because we
 * sometimes put it in shared memory.
 */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,    /* sort still in progress */
    SORT_TYPE_TOP_N_HEAPSORT,           /* top-N heapsort */
    SORT_TYPE_QUICKSORT,                /* quicksort */
    SORT_TYPE_EXTERNAL_SORT,            /* external sort */
    SORT_TYPE_EXTERNAL_MERGE            /* external merge */
} TuplesortMethod;
typedef enum
{
    SORT_SPACE_TYPE_DISK,               /* space is on disk */
    SORT_SPACE_TYPE_MEMORY              /* space is in memory */
} TuplesortSpaceType;
typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod; /* sort algorithm used */
    TuplesortSpaceType spaceType;   /* type of space spaceUsed represents */
    long        spaceUsed;      /* space consumption, in kB */
} TuplesortInstrumentation;
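
As a rough illustration of how such a pointer-free statistics record might be turned into readable text, the sketch below redeclares the three types locally and formats one record. The helper names `sketch_method_name`, `sketch_space_name`, and `sketch_describe`, as well as the output format, are assumptions made for this example and are not PostgreSQL's own reporting code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the types listed above, so the sketch compiles on its own. */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,
    SORT_TYPE_TOP_N_HEAPSORT,
    SORT_TYPE_QUICKSORT,
    SORT_TYPE_EXTERNAL_SORT,
    SORT_TYPE_EXTERNAL_MERGE
} TuplesortMethod;

typedef enum
{
    SORT_SPACE_TYPE_DISK,
    SORT_SPACE_TYPE_MEMORY
} TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod;     /* sort algorithm used */
    TuplesortSpaceType spaceType;   /* type of space spaceUsed represents */
    long        spaceUsed;          /* space consumption, in kB */
} TuplesortInstrumentation;

/* Hypothetical helper: map the enum to a human-readable label. */
static const char *
sketch_method_name(TuplesortMethod m)
{
    switch (m)
    {
        case SORT_TYPE_STILL_IN_PROGRESS: return "still in progress";
        case SORT_TYPE_TOP_N_HEAPSORT:    return "top-N heapsort";
        case SORT_TYPE_QUICKSORT:         return "quicksort";
        case SORT_TYPE_EXTERNAL_SORT:     return "external sort";
        case SORT_TYPE_EXTERNAL_MERGE:    return "external merge";
    }
    return "unknown";
}

/* Hypothetical helper: label for the kind of space that was measured. */
static const char *
sketch_space_name(TuplesortSpaceType t)
{
    return (t == SORT_SPACE_TYPE_DISK) ? "Disk" : "Memory";
}

/* Format one record into buf; returns what snprintf returns. */
static int
sketch_describe(const TuplesortInstrumentation *si, char *buf, size_t buflen)
{
    assert(si != NULL && buf != NULL);
    return snprintf(buf, buflen, "%s, %s: %ldkB",
                    sketch_method_name(si->sortMethod),
                    sketch_space_name(si->spaceType),
                    si->spaceUsed);
}
```

Since the struct holds only enums and a long, a worker can copy it by value into the `sinstrument` array in shared memory and the leader can read it without chasing any pointers.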

2. Source code walkthrough

mergeruns merges all the completed initial runs.

/*
 * mergeruns -- merge all the completed initial runs.
 *
 * This implements steps D5, D6 of Algorithm D.  All input data has
 * already been written to initial runs on tape (see dumptuples).
 */
static void
mergeruns(Tuplesortstate *state)
{
    int         tapenum,
                svTape,
                svRuns,
                svDummy;
    int         numTapes;
    int         numInputTapes;
    Assert(state->status == TSS_BUILDRUNS);
    Assert(state->memtupcount == 0);
    if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
    {
        /*
         * If there are multiple runs to be merged, when we go to read back
         * tuples from disk, abbreviated keys will not have been stored, and
         * we don't care to regenerate them.  Disable abbreviation from this
         * point on.
         */
        state->sortKeys->abbrev_converter = NULL;
        state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
        /* Not strictly necessary, but be tidy */
        state->sortKeys->abbrev_abort = NULL;
        state->sortKeys->abbrev_full_comparator = NULL;
    }
    /*
     * Reset tuple memory.  We've freed all the tuples that we previously
     * allocated.  We will use the slab allocator from now on.
     */
    MemoryContextDelete(state->tuplecontext);
    state->tuplecontext = NULL;
    /*
     * We no longer need a large memtuples array.  (We will allocate a smaller
     * one for the heap later.)
     */
    FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
    pfree(state->memtuples);
    state->memtuples = NULL;
    /*
     * If we had fewer runs than tapes, refund the memory that we imagined we
     * would need for the tape buffers of the unused tapes.
     *
     * numTapes and numInputTapes reflect the actual number of tapes we will
     * use.  Note that the output tape's tape number is maxTapes - 1, so the
     * tape numbers of the used tapes are not consecutive, and you cannot just
     * loop from 0 to numTapes to visit all used tapes!
     */
    if (state->Level == 1)
    {
        numInputTapes = state->currentRun;
        numTapes = numInputTapes + 1;
        FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
    }
    else
    {
        numInputTapes = state->tapeRange;
        numTapes = state->maxTapes;
    }
    /*
     * Initialize the slab allocator.  We need one slab slot per input tape,
     * for the tuples in the heap, plus one to hold the tuple last returned
     * from tuplesort_gettuple.  (If we're sorting pass-by-val Datums,
     * however, we don't need to allocate anything.)
     *
     * From this point on, we no longer use the USEMEM()/LACKMEM() mechanism
     * to track memory usage of individual tuples.
     */
    if (state->tuples)
        init_slab_allocator(state, numInputTapes + 1);
    else
        init_slab_allocator(state, 0);
    /*
     * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
     * from each input tape.
     */
    state->memtupsize = numInputTapes;
    state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
    USEMEM(state, GetMemoryChunkSpace(state->memtuples));
    /*
     * Use all the remaining memory we have available for read buffers among
     * the input tapes.
     *
     * We don't try to "rebalance" the memory among tapes, when we start a new
     * merge phase, even if some tapes are inactive in the new phase.  That
     * would be hard, because logtape.c doesn't know where one run ends and
     * another begins.  When a new merge phase begins, and a tape doesn't
     * participate in it, its buffer nevertheless already contains tuples from
     * the next run on same tape, so we cannot release the buffer.  That's OK
     * in practice, merge performance isn't that sensitive to the amount of
     * buffers used, and most merge phases use all or almost all tapes,
     * anyway.
     */
#ifdef TRACE_SORT
    if (trace_sort)
        elog(LOG, "worker %d using " INT64_FORMAT " KB of memory for read buffers among %d input tapes",
             state->worker, state->availMem / 1024, numInputTapes);
#endif
    state->read_buffer_size = Max(state->availMem / numInputTapes, 0);
    USEMEM(state, state->read_buffer_size * numInputTapes);
    /* End of step D2: rewind all output tapes to prepare for merging */
    for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
        LogicalTapeRewindForRead(state->tapeset, tapenum, state->read_buffer_size);
    for (;;)
    {
        /* ------------- main merge loop ------------- */
        /*
         * At this point we know that tape[T] is empty.  If there's just one
         * (real or dummy) run left on each input tape, then only one merge
         * pass remains.  If we don't have to produce a materialized sorted
         * tape, we can stop at this point and do the final merge on-the-fly.
         */
        if (!state->randomAccess && !WORKER(state))
        {
            bool        allOneRun = true;
            Assert(state->tp_runs[state->tapeRange] == 0);
            for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
            {
                if (state->tp_runs[tapenum] + state->tp_dummy[tapenum] != 1)
                {
                    allOneRun = false;
                    break;
                }
            }
            if (allOneRun)
            {
                /* Tell logtape.c we won't be writing anymore */
                LogicalTapeSetForgetFreeSpace(state->tapeset);
                /* Initialize for the final merge pass */
                beginmerge(state);
                state->status = TSS_FINALMERGE;
                return;
            }
        }
        /* Step D5: merge runs onto tape[T] until tape[P] is empty */
        while (state->tp_runs[state->tapeRange - 1] ||
               state->tp_dummy[state->tapeRange - 1])
        {
            bool        allDummy = true;
            for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
            {
                if (state->tp_dummy[tapenum] == 0)
                {
                    allDummy = false;
                    break;
                }
            }
            if (allDummy)
            {
                state->tp_dummy[state->tapeRange]++;
                for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
                    state->tp_dummy[tapenum]--;
            }
            else
                mergeonerun(state);
        }
        /* Step D6: decrease level */
        if (--state->Level == 0)
            break;
        /* rewind output tape T to use as new input */
        LogicalTapeRewindForRead(state->tapeset, state->tp_tapenum[state->tapeRange],
                                 state->read_buffer_size);
        /* rewind used-up input tape P, and prepare it for write pass */
        LogicalTapeRewindForWrite(state->tapeset, state->tp_tapenum[state->tapeRange - 1]);
        state->tp_runs[state->tapeRange - 1] = 0;
        /*
         * reassign tape units per step D6; note we no longer care about A[]
         */
        svTape = state->tp_tapenum[state->tapeRange];
        svDummy = state->tp_dummy[state->tapeRange];
        svRuns = state->tp_runs[state->tapeRange];
        for (tapenum = state->tapeRange; tapenum > 0; tapenum--)
        {
            state->tp_tapenum[tapenum] = state->tp_tapenum[tapenum - 1];
            state->tp_dummy[tapenum] = state->tp_dummy[tapenum - 1];
            state->tp_runs[tapenum] = state->tp_runs[tapenum - 1];
        }
        state->tp_tapenum[0] = svTape;
        state->tp_dummy[0] = svDummy;
        state->tp_runs[0] = svRuns;
    }
    /*
     * Done.  Knuth says that the result is on TAPE[1], but since we exited
     * the loop without performing the last iteration of step D6, we have not
     * rearranged the tape unit assignment, and therefore the result is on
     * TAPE[T].  We need to do it this way so that we can freeze the final
     * output tape while rewinding it.  The last iteration of step D6 would be
     * a waste of cycles anyway...
     */
    state->result_tape = state->tp_tapenum[state->tapeRange];
    if (!WORKER(state))
        LogicalTapeFreeze(state->tapeset, state->result_tape, NULL);
    else
        worker_freeze_result_tape(state);
    state->status = TSS_SORTEDONTAPE;
    /* Release the read buffers of all the other tapes, by rewinding them. */
    for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
    {
        if (tapenum != state->result_tape)
            LogicalTapeRewindForWrite(state->tapeset, tapenum);
    }
}
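
The tape reassignment at the end of step D6 is a rotation of three parallel arrays: the output tape that was just written becomes logical tape 0, every input tape shifts up by one position, and the emptied tape P ends up in the output position T. A minimal sketch, using plain int arrays in place of the Tuplesortstate fields (`rotate_tapes` is a hypothetical helper name, not a PostgreSQL function):

```c
#include <assert.h>

/*
 * Rotate the logical tape assignment by one position, as the D6 block in
 * mergeruns does. All three arrays have tapeRange + 1 entries; index
 * tapeRange is the current output tape T.
 */
static void
rotate_tapes(int *tp_tapenum, int *tp_dummy, int *tp_runs, int tapeRange)
{
    int         svTape = tp_tapenum[tapeRange];
    int         svDummy = tp_dummy[tapeRange];
    int         svRuns = tp_runs[tapeRange];
    int         t;

    assert(tapeRange >= 1);

    /* Shift every entry up by one, starting from the top. */
    for (t = tapeRange; t > 0; t--)
    {
        tp_tapenum[t] = tp_tapenum[t - 1];
        tp_dummy[t] = tp_dummy[t - 1];
        tp_runs[t] = tp_runs[t - 1];
    }

    /* The old output tape becomes the first input tape. */
    tp_tapenum[0] = svTape;
    tp_dummy[0] = svDummy;
    tp_runs[0] = svRuns;
}
```

With four logical tapes holding physical tapes 0, 1, 2, 3 and run counts 2, 1, 0, 3, one rotation yields physical tapes 3, 0, 1, 2 with run counts 3, 2, 1, 0: the freshly written tape moves to the front and the emptied one becomes the next output.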

3. Trace analysis

Test script

select * from t_sort order by c1,c2;

Trace

(gdb) b mergeruns
Breakpoint 1 at 0xa73508: file tuplesort.c, line 2570.
(gdb) 
Note: breakpoint 1 also set at pc 0xa73508.
Breakpoint 2 at 0xa73508: file tuplesort.c, line 2570.

Input parameters

(gdb) c
Continuing.
Breakpoint 1, mergeruns (state=0x2b808a8) at tuplesort.c:2570
2570        Assert(state->status == TSS_BUILDRUNS);
(gdb) p *state
$1 = {status = TSS_BUILDRUNS, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, 
  tuples = true, availMem = 3164456, allowedMem = 4194304, maxTapes = 16, tapeRange = 15, sortcontext = 0x2b80790, 
  tuplecontext = 0x2b827a0, tapeset = 0x2b81480, comparetup = 0xa7525b <comparetup_heap>, 
  copytup = 0xa76247 <copytup_heap>, writetup = 0xa76de1 <writetup_heap>, readtup = 0xa76ec6 <readtup_heap>, 
  memtuples = 0x7f0cfeb14050, memtupcount = 0, memtupsize = 37448, growmemtuples = false, slabAllocatorUsed = false, 
  slabMemoryBegin = 0x0, slabMemoryEnd = 0x0, slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, 
  currentRun = 3, mergeactive = 0x2b81350, Level = 1, destTape = 2, tp_fib = 0x2b80d58, tp_runs = 0x2b81378, 
  tp_dummy = 0x2b813d0, tp_tapenum = 0x2b81428, activeTapes = 0, result_tape = -1, current = 0, eof_reached = false, 
  markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0, nParticipants = -1, 
  tupDesc = 0x2b67ae0, sortKeys = 0x2b80cc0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate = 0x0, heapRel = 0x0, 
  indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0, datumTypeLen = 0, 
  ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, 
        tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, 
        __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0, 
        __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {
        ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, 
        __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
        ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)

Sort key information

(gdb) n
2571        Assert(state->memtupcount == 0);
(gdb) 
2573        if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
(gdb) p *state->sortKeys
$2 = {ssup_cxt = 0x2b80790, ssup_collation = 0, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 2, 
  ssup_extra = 0x0, comparator = 0x4fd4af <btint4fastcmp>, abbreviate = true, abbrev_converter = 0x0, abbrev_abort = 0x0, 
  abbrev_full_comparator = 0x0}
(gdb) p *state->sortKeys->abbrev_converter
Cannot access memory at address 0x0

Reset tuple memory; the large memtuples array is no longer needed.

(gdb) n
2593        MemoryContextDelete(state->tuplecontext);
(gdb) 
2594        state->tuplecontext = NULL;
(gdb) 
(gdb) n
2600        FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
(gdb) 
2601        pfree(state->memtuples);
(gdb) 
2602        state->memtuples = NULL;
(gdb) 
2613        if (state->Level == 1)
(gdb)

Compute the number of tapes

(gdb) n
2615            numInputTapes = state->currentRun;
(gdb) p state->currentRun
$3 = 3
(gdb) p state->Level
$4 = 1
(gdb) p state->tapeRange
$5 = 15
(gdb) p state->maxTapes
$6 = 16
(gdb) n
2616            numTapes = numInputTapes + 1;
(gdb) 
2617            FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
(gdb) 
2634        if (state->tuples)
(gdb) p numInputTapes
$7 = 3
(gdb) p numTapes
$8 = 4
(gdb)
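
The tape arithmetic traced above is small enough to restate as a stand-alone function. The sketch below uses the traced values (Level = 1, currentRun = 3, tapeRange = 15, maxTapes = 16); `TapePlan` and `plan_tapes` are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Result of the tape-count decision (hypothetical helper type). */
typedef struct TapePlan
{
    int         numInputTapes;  /* tapes that will be read during the merge */
    int         numTapes;       /* input tapes plus the output tape */
    int         unusedTapes;    /* tapes whose buffer memory can be refunded */
} TapePlan;

/*
 * At level 1 there are fewer runs than tapes, so only currentRun input
 * tapes (plus one output tape) are really used. Otherwise every tape is.
 */
static TapePlan
plan_tapes(int level, int currentRun, int tapeRange, int maxTapes)
{
    TapePlan    p;

    if (level == 1)
    {
        p.numInputTapes = currentRun;
        p.numTapes = p.numInputTapes + 1;
    }
    else
    {
        p.numInputTapes = tapeRange;
        p.numTapes = maxTapes;
    }
    p.unusedTapes = maxTapes - p.numTapes;
    return p;
}
```

For the traced sort this gives 3 input tapes, 4 tapes in total, and 12 unused tapes; the FREEMEM call refunds 12 times TAPE_BUFFER_OVERHEAD bytes to the memory budget.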

Initialize the slab allocator, allocate a new 'memtuples' array for the heap, and rewind all output tapes to prepare for merging

(gdb) n
2635            init_slab_allocator(state, numInputTapes + 1);
(gdb) n
2643        state->memtupsize = numInputTapes;
(gdb) 
2644        state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
(gdb) 
2645        USEMEM(state, GetMemoryChunkSpace(state->memtuples));
(gdb) p state->memtupsize
$9 = 3
(gdb) n
2662        if (trace_sort)
(gdb) 
2667        state->read_buffer_size = Max(state->availMem / numInputTapes, 0);
(gdb) 
2668        USEMEM(state, state->read_buffer_size * numInputTapes);
(gdb) p state->read_buffer_size
$10 = 1385762
(gdb) n
2671        for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2672            LogicalTapeRewindForRead(state->tapeset, tapenum, state->read_buffer_size);
(gdb) p state->tapeRange
$11 = 15
(gdb) p state->status
$12 = TSS_BUILDRUNS
(gdb)
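
The read buffer size is simply the remaining memory divided evenly among the input tapes, clamped at zero. In the sketch below the input 4157286 is chosen so that the result matches the traced read_buffer_size of 1385762 with three input tapes; the trace does not print availMem at that point, so that input is an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-tape read buffer: an even share of what is left, never negative. */
static int64_t
sketch_read_buffer_size(int64_t availMem, int numInputTapes)
{
    int64_t     per_tape;

    assert(numInputTapes > 0);
    per_tape = availMem / numInputTapes;
    return (per_tape > 0) ? per_tape : 0;
}

/* Memory left in the budget once every tape's buffer has been charged. */
static int64_t
sketch_mem_after_buffers(int64_t availMem, int numInputTapes)
{
    return availMem -
        sketch_read_buffer_size(availMem, numInputTapes) * numInputTapes;
}
```

Only the remainder of the integer division stays in the budget, which reflects the comment in the source: all remaining memory goes to the read buffers.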

Enter the loop

2671        for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2682            if (!state->randomAccess && !WORKER(state))
(gdb) 
2684                bool        allOneRun = true;
(gdb) p state->randomAccess
$15 = false
(gdb) p WORKER(state)
$16 = 0
(gdb)

Loop over the tapes to decide allOneRun (it stays true here)

2687                for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2695                if (allOneRun)
(gdb) p allOneRun
$19 = true
(gdb)
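
The allOneRun test asks whether every input tape holds exactly one run, counting real and dummy runs together. A minimal sketch with plain arrays follows; `all_one_run` is a hypothetical name, and the array contents in the usage note are assumed, since the trace does not print tp_runs or tp_dummy.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when each of the tapeRange input tapes carries exactly one
 * run in total (real runs plus dummy runs), so a single merge pass remains.
 */
static bool
all_one_run(const int *tp_runs, const int *tp_dummy, int tapeRange)
{
    int         t;

    for (t = 0; t < tapeRange; t++)
    {
        if (tp_runs[t] + tp_dummy[t] != 1)
            return false;
    }
    return true;
}
```

For example, with four input tapes where three hold one real run each and the fourth holds one dummy run, the check succeeds and the final merge can be done on the fly. If any tape held two runs, another full merge pass would be needed first.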

Begin the merge, set the status, and return

(gdb) n
2698                    LogicalTapeSetForgetFreeSpace(state->tapeset);
(gdb) 
2700                    beginmerge(state);
(gdb) 
2701                    state->status = TSS_FINALMERGE;
(gdb) 
2702                    return;
(gdb) 
2779    }
(gdb) 
tuplesort_performsort (state=0x2b808a8) at tuplesort.c:1866
1866                state->eof_reached = false;
(gdb)
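
The final merge that begins here is a k-way merge over the input tapes, driven by a heap that holds one tuple per tape (the small memtuples array allocated earlier). The sketch below shows that technique on in-memory int arrays. `Run`, `HeapEntry`, `sift_down`, and `merge_runs` are hypothetical names, and real tapes, SortTuples, and comparators are replaced by plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNS 16

/* One sorted input run, standing in for an input tape. */
typedef struct Run
{
    const int  *data;
    size_t      len;
    size_t      pos;            /* next element to read */
} Run;

/* Heap entry: a value plus the run it came from. */
typedef struct HeapEntry
{
    int         value;
    int         srcRun;
} HeapEntry;

/* Restore the min-heap property below index i. */
static void
sift_down(HeapEntry *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t      l = 2 * i + 1;
        size_t      r = l + 1;
        size_t      m = i;
        HeapEntry   tmp;

        if (l < n && heap[l].value < heap[m].value)
            m = l;
        if (r < n && heap[r].value < heap[m].value)
            m = r;
        if (m == i)
            break;
        tmp = heap[i];
        heap[i] = heap[m];
        heap[m] = tmp;
        i = m;
    }
}

/* Merge nruns sorted runs into out; returns the number of values written. */
static size_t
merge_runs(Run *runs, int nruns, int *out, size_t outcap)
{
    HeapEntry   heap[MAX_RUNS];
    size_t      n = 0;
    size_t      produced = 0;
    size_t      i;
    int         r;

    assert(nruns <= MAX_RUNS);

    /* Load the first element of each non-empty run into the heap. */
    for (r = 0; r < nruns; r++)
    {
        runs[r].pos = 0;
        if (runs[r].len > 0)
        {
            heap[n].value = runs[r].data[0];
            heap[n].srcRun = r;
            runs[r].pos = 1;
            n++;
        }
    }
    for (i = n / 2; i-- > 0;)
        sift_down(heap, n, i);

    /* Repeatedly emit the smallest value and refill from the same run. */
    while (n > 0 && produced < outcap)
    {
        int         src = heap[0].srcRun;

        out[produced++] = heap[0].value;
        if (runs[src].pos < runs[src].len)
            heap[0].value = runs[src].data[runs[src].pos++];
        else
            heap[0] = heap[--n];
        sift_down(heap, n, 0);
    }
    return produced;
}
```

The heap never holds more than one entry per run, which is why the memtuples array for the merge only needs numInputTapes slots.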

Sorting completes

(gdb) n
1867                state->markpos_block = 0L;
(gdb) 
1868                state->markpos_offset = 0;
(gdb) 
1869                state->markpos_eof = false;
(gdb) 
1870                break;
(gdb) 
1878        if (trace_sort)
(gdb) 
1890        MemoryContextSwitchTo(oldcontext);
(gdb) 
1891    }
(gdb) 
ExecSort (pstate=0x2b67640) at nodeSort.c:123
123         estate->es_direction = dir;
(gdb) c
Continuing.

That concludes this look at how PostgreSQL calls the mergeruns function. Reading the source alongside a debugger session is a good way to make the logic stick, so try stepping through it yourself.
