
PostgreSQL Source Code Walkthrough (133) - MVCC#17 (vacuum process - lazy_vacuum_index function #2)

Published: 2020-08-11 16:11:39 Source: ITPUB Blog Views: 174 Author: husthxd Category: Relational Databases

This section briefly describes how PostgreSQL processes a manually executed vacuum, focusing on how a B-Tree index is vacuumed. The implementing function is btvacuumscan.

I. Data Structures

IndexVacuumInfo
The input-argument struct passed to ambulkdelete/amvacuumcleanup


/*
 * Struct for input arguments passed to ambulkdelete and amvacuumcleanup
 *
 * num_heap_tuples is accurate only when estimated_count is false;
 * otherwise it's just an estimate (currently, the estimate is the
 * prior value of the relation's pg_class.reltuples field).  It will
 * always just be an estimate during ambulkdelete.
 */
typedef struct IndexVacuumInfo
{
    //index relation
    Relation    index;          /* the index being vacuumed */
    //ANALYZE only (no actual vacuum)?
    bool        analyze_only;   /* ANALYZE (without any actual vacuum) */
    //if true, num_heap_tuples is an estimate
    bool        estimated_count;    /* num_heap_tuples is an estimate */
    //log level for progress messages
    int         message_level;  /* ereport level for progress messages */
    //number of tuples remaining in the heap
    double      num_heap_tuples;    /* tuples remaining in heap */
    //buffer access strategy
    BufferAccessStrategy strategy;  /* access strategy for reads */
} IndexVacuumInfo;

IndexBulkDeleteResult
The statistics struct returned by ambulkdelete/amvacuumcleanup


/*
 * Struct for statistics returned by ambulkdelete and amvacuumcleanup
 * 
 * This struct is normally allocated by the first ambulkdelete call and then
 * passed along through subsequent ones until amvacuumcleanup; however,
 * amvacuumcleanup must be prepared to allocate it in the case where no
 * ambulkdelete calls were made (because no tuples needed deletion).
 * Note that an index AM could choose to return a larger struct
 * of which this is just the first field; this provides a way for ambulkdelete
 * to communicate additional private data to amvacuumcleanup.
 *
 * Note: pages_removed is the amount by which the index physically shrank,
 * if any (ie the change in its total size on disk).  pages_deleted and
 * pages_free refer to free space within the index file.  Some index AMs
 * may compute num_index_tuples by reference to num_heap_tuples, in which
 * case they should copy the estimated_count field from IndexVacuumInfo.
 */
typedef struct IndexBulkDeleteResult
{
    //pages remaining in the index
    BlockNumber num_pages;      /* pages remaining in index */
    //pages removed during the vacuum operation
    BlockNumber pages_removed;  /* # removed during vacuum operation */
    //is num_index_tuples an estimate?
    bool        estimated_count;    /* num_index_tuples is an estimate */
    //tuples remaining in the index
    double      num_index_tuples;   /* tuples remaining */
    //tuples removed during the vacuum operation
    double      tuples_removed; /* # removed during vacuum operation */
    //unused pages in the index
    BlockNumber pages_deleted;  /* # unused pages in index */
    //pages available for reuse
    BlockNumber pages_free;     /* # pages available for reuse */
} IndexBulkDeleteResult;
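
The comment above notes that an index AM may return a larger struct whose first field is IndexBulkDeleteResult, so that ambulkdelete can hand private data to amvacuumcleanup. A minimal standalone sketch of that pattern follows. DemoBulkDeleteResult, demo_bulkdelete, demo_cleanup_scans and the scans_done field are hypothetical names, and calloc stands in for PostgreSQL's own allocator; this is not how any real AM is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

/* Copy of the struct above so that the sketch compiles on its own. */
typedef struct IndexBulkDeleteResult
{
    BlockNumber num_pages;
    BlockNumber pages_removed;
    bool        estimated_count;
    double      num_index_tuples;
    double      tuples_removed;
    BlockNumber pages_deleted;
    BlockNumber pages_free;
} IndexBulkDeleteResult;

/* Hypothetical AM-private result; the public struct must come first. */
typedef struct DemoBulkDeleteResult
{
    IndexBulkDeleteResult pub;  /* the part the core vacuum code reads */
    uint32_t    scans_done;     /* hypothetical private counter */
} DemoBulkDeleteResult;

/* Hypothetical bulk-delete step: allocate on the first call, then accumulate. */
static IndexBulkDeleteResult *
demo_bulkdelete(IndexBulkDeleteResult *stats, double removed)
{
    DemoBulkDeleteResult *mine = (DemoBulkDeleteResult *) stats;

    if (mine == NULL)
    {
        mine = calloc(1, sizeof(DemoBulkDeleteResult));
        if (mine == NULL)
            return NULL;
    }
    mine->pub.tuples_removed += removed;
    mine->scans_done++;
    return &mine->pub;          /* same address as the larger struct */
}

/* Hypothetical cleanup step: recover the private part from the public pointer. */
static uint32_t
demo_cleanup_scans(IndexBulkDeleteResult *stats)
{
    assert(stats != NULL);
    return ((DemoBulkDeleteResult *) stats)->scans_done;
}
```

Because pub is the first member, a pointer to the larger struct and a pointer to pub have the same address, which is what makes the cast in demo_cleanup_scans valid.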

BTPageOpaque
At the end of every page, pointers to both of the page's siblings in the tree are stored


/*
 *  BTPageOpaqueData -- At the end of every page, we store a pointer
 *  to both siblings in the tree.  This is used to do forward/backward
 *  index scans.  The next-page link is also critical for recovery when
 *  a search has navigated to the wrong page due to concurrent page splits
 *  or deletions; see src/backend/access/nbtree/README for more info.
 *
 *  In addition, we store the page's btree level (counting upwards from
 *  zero at a leaf page) as well as some flag bits indicating the page type
 *  and status.  If the page is deleted, we replace the level with the
 *  next-transaction-ID value indicating when it is safe to reclaim the page.
 *
 *  We also store a "vacuum cycle ID".  When a page is split while VACUUM is
 *  processing the index, a nonzero value associated with the VACUUM run is
 *  stored into both halves of the split page.  (If VACUUM is not running,
 *  both pages receive zero cycleids.)  This allows VACUUM to detect whether
 *  a page was split since it started, with a small probability of false match
 *  if the page was last split some exact multiple of MAX_BT_CYCLE_ID VACUUMs
 *  ago.  Also, during a split, the BTP_SPLIT_END flag is cleared in the left
 *  (original) page, and set in the right page, but only if the next page
 *  to its right has a different cycleid.
 *
 *  NOTE: the BTP_LEAF flag bit is redundant since level==0 could be tested
 *  instead.
 */
typedef struct BTPageOpaqueData
{
    //left sibling, or P_NONE if this is the leftmost page
    BlockNumber btpo_prev;      /* left sibling, or P_NONE if leftmost */
    //right sibling, or P_NONE if this is the rightmost page
    BlockNumber btpo_next;      /* right sibling, or P_NONE if rightmost */
    union
    {
        //tree level, 0 for leaf pages
        uint32      level;      /* tree level --- zero for leaf pages */
        //if the page is deleted, the next transaction ID
        TransactionId xact;     /* next transaction ID, if deleted */
    }           btpo;//union
    //flag bits
    uint16      btpo_flags;     /* flag bits, see below */
    //vacuum cycle ID of the latest split
    BTCycleId   btpo_cycleid;   /* vacuum cycle ID of latest split */
} BTPageOpaqueData;
typedef BTPageOpaqueData *BTPageOpaque;
/* Bits defined in btpo_flags */
#define BTP_LEAF        (1 << 0)    /* leaf page, i.e. not internal page */
#define BTP_ROOT        (1 << 1)    /* root page (has no parent) */
#define BTP_DELETED     (1 << 2)    /* page has been deleted from tree */
#define BTP_META        (1 << 3)    /* meta-page */
#define BTP_HALF_DEAD   (1 << 4)    /* empty, but still in tree */
#define BTP_SPLIT_END   (1 << 5)    /* rightmost page of split group */
#define BTP_HAS_GARBAGE (1 << 6)    /* page has LP_DEAD tuples */
#define BTP_INCOMPLETE_SPLIT (1 << 7)   /* right sibling's downlink is missing */
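
To make the flag bits and the level/xact union concrete, here is a small standalone sketch. The struct and the three flag values are copied from the listing above; the helper names (page_is_leaf, page_is_deleted, page_is_ignorable, mark_page_deleted) are hypothetical and only approximate what nbtree's own macros and page-deletion code do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef uint32_t BlockNumber;
typedef uint32_t TransactionId;
typedef uint16_t BTCycleId;

typedef struct BTPageOpaqueData
{
    BlockNumber btpo_prev;
    BlockNumber btpo_next;
    union
    {
        uint32_t      level;    /* valid while the page is live */
        TransactionId xact;     /* valid once the page is deleted */
    }           btpo;
    uint16_t    btpo_flags;
    BTCycleId   btpo_cycleid;
} BTPageOpaqueData;

#define BTP_LEAF        (1 << 0)
#define BTP_DELETED     (1 << 2)
#define BTP_HALF_DEAD   (1 << 4)

static bool
page_is_leaf(const BTPageOpaqueData *o)
{
    return (o->btpo_flags & BTP_LEAF) != 0;
}

static bool
page_is_deleted(const BTPageOpaqueData *o)
{
    return (o->btpo_flags & BTP_DELETED) != 0;
}

/* Deleted and half-dead pages are both skipped by ordinary searches. */
static bool
page_is_ignorable(const BTPageOpaqueData *o)
{
    return (o->btpo_flags & (BTP_DELETED | BTP_HALF_DEAD)) != 0;
}

/* Hypothetical helper: deleting a page reuses the union slot for an XID. */
static void
mark_page_deleted(BTPageOpaqueData *o, TransactionId next_xid)
{
    assert(o != NULL);
    o->btpo_flags &= ~BTP_HALF_DEAD;
    o->btpo_flags |= BTP_DELETED;
    o->btpo.xact = next_xid;    /* overwrites btpo.level */
}
```

After mark_page_deleted the level is gone, which is why the code below reads btpo.xact only after it has tested the deleted flag.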

II. Source Code Walkthrough

lazy_vacuum_index->index_bulk_delete->…btbulkdelete->btvacuumscan
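btvacuumscan scans the index to carry out the vacuum
This function combines three tasks:
A. looking for leaf tuples that are deletable according to the vacuum callback;
B. looking for empty pages that can be deleted;
C. looking for old deleted pages that can be recycled.
Both btbulkdelete and btvacuumcleanup call it (the latter only if no btbulkdelete call occurred)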

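Its main processing logic is as follows:
1. Initialize the statistics (the IndexBulkDeleteResult struct)
2. Initialize the vstate state (the BTVacState struct)
3. Create a temporary memory context
4. Loop over the pages
4.1 Take the relation-extension lock (if needed) and read the current relation length
4.2 Iterate over the blocks, calling btvacuumpage on each
4.3 Recheck the relation length and loop again if the relation has grown
5. Issue a final WAL record if needed
6. Delete the temporary memory context
7. Vacuum the free space map if any recyclable pages were found
8. Update the statistics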
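
Step 4 is the part that is easy to get wrong: the relation can grow while it is being scanned, so the length has to be re-read after every pass. A minimal standalone sketch of that loop, assuming a hypothetical FakeRelation that grows once when a chosen block is visited (none of these names exist in PostgreSQL, and locking is only mentioned in a comment):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef uint32_t BlockNumber;

/* Hypothetical stand-in for an index file that may grow during the scan. */
typedef struct FakeRelation
{
    BlockNumber nblocks;        /* current length in blocks */
    BlockNumber grow_at;        /* visiting this block triggers one extension */
    BlockNumber grow_by;        /* blocks added by that extension */
} FakeRelation;

static BlockNumber
fake_get_nblocks(const FakeRelation *rel)
{
    return rel->nblocks;
}

/* Visit one page; simulates a concurrent page split extending the file. */
static void
fake_vacuum_page(FakeRelation *rel, BlockNumber blkno)
{
    if (blkno == rel->grow_at && rel->grow_by > 0)
    {
        rel->nblocks += rel->grow_by;
        rel->grow_by = 0;
    }
}

/* Returns the number of pages visited, skipping block 0 (the metapage). */
static BlockNumber
scan_all_pages(FakeRelation *rel)
{
    BlockNumber blkno = 1;
    BlockNumber visited = 0;

    assert(rel != NULL);
    for (;;)
    {
        /* The real code holds the relation-extension lock around this read. */
        BlockNumber num_pages = fake_get_nblocks(rel);

        if (blkno >= num_pages)
            break;              /* nothing new appeared since the last pass */
        for (; blkno < num_pages; blkno++)
        {
            fake_vacuum_page(rel, blkno);
            visited++;
        }
    }
    return visited;
}
```

With a 5-block relation that grows by 2 blocks while block 3 is visited, the first pass covers blocks 1 to 4 and the second pass picks up blocks 5 and 6.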


/*
 * btvacuumscan --- scan the index for VACUUMing purposes
 *
 * This combines the functions of looking for leaf tuples that are deletable
 * according to the vacuum callback, looking for empty pages that can be
 * deleted, and looking for old deleted pages that can be recycled.  Both
 * btbulkdelete and btvacuumcleanup invoke this (the latter only if no
 * btbulkdelete call occurred).
 *
 * The caller is responsible for initially allocating/zeroing a stats struct
 * and for obtaining a vacuum cycle ID if necessary.
 */
static void
btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
             IndexBulkDeleteCallback callback, void *callback_state,
             BTCycleId cycleid, TransactionId *oldestBtpoXact)
{
    Relation    rel = info->index;
    BTVacState  vstate;
    BlockNumber num_pages;
    BlockNumber blkno;
    bool        needLock;
    /*
     * Reset counts that will be incremented during the scan; needed in case
     * of multiple scans during a single VACUUM command
     */
    stats->estimated_count = false;
    stats->num_index_tuples = 0;
    stats->pages_deleted = 0;
    /* Set up info to pass down to btvacuumpage */
    vstate.info = info;
    vstate.stats = stats;
    vstate.callback = callback;
    vstate.callback_state = callback_state;
    vstate.cycleid = cycleid;
    vstate.lastBlockVacuumed = BTREE_METAPAGE;  /* Initialise at first block */
    vstate.lastBlockLocked = BTREE_METAPAGE;
    vstate.totFreePages = 0;
    vstate.oldestBtpoXact = InvalidTransactionId;
    /* Create a temporary memory context to run _bt_pagedel in */
    vstate.pagedelcontext = AllocSetContextCreate(CurrentMemoryContext,
                                                  "_bt_pagedel",
                                                  ALLOCSET_DEFAULT_SIZES);
    /*
     * The outer loop iterates over all index pages except the metapage, in
     * physical order (we hope the kernel will cooperate in providing
     * read-ahead for speed).  It is critical that we visit all leaf pages,
     * including ones added after we start the scan, else we might fail to
     * delete some deletable tuples.  Hence, we must repeatedly check the
     * relation length.  We must acquire the relation-extension lock while
     * doing so to avoid a race condition: if someone else is extending the
     * relation, there is a window where bufmgr/smgr have created a new
     * all-zero page but it hasn't yet been write-locked by _bt_getbuf(). If
     * we manage to scan such a page here, we'll improperly assume it can be
     * recycled.  Taking the lock synchronizes things enough to prevent a
     * problem: either num_pages won't include the new page, or _bt_getbuf
     * already has write lock on the buffer and it will be fully initialized
     * before we can examine it.  (See also vacuumlazy.c, which has the same
     * issue.)  Also, we need not worry if a page is added immediately after
     * we look; the page splitting code already has write-lock on the left
     * page before it adds a right page, so we must already have processed any
     * tuples due to be moved into such a page.
     *
     * We can skip locking for new or temp relations, however, since no one
     * else could be accessing them.
     */
    //is locking needed?
    needLock = !RELATION_IS_LOCAL(rel);
    //the first page of the index relation is the metapage, so skip it
    blkno = BTREE_METAPAGE + 1;
    for (;;)//loop
    {
        /* Get the current relation length */
        if (needLock)
            //take the relation-extension lock if needed
            LockRelationForExtension(rel, ExclusiveLock);
        //get the current number of pages
        num_pages = RelationGetNumberOfBlocks(rel);
        if (needLock)
            //release the lock
            UnlockRelationForExtension(rel, ExclusiveLock);
        /* Quit if we've scanned the whole relation */
        if (blkno >= num_pages)
            break;
        /* Iterate over pages, then loop back to recheck length */
        for (; blkno < num_pages; blkno++)
        {
            btvacuumpage(&vstate, blkno, blkno);
        }
    }
    /*
     * Check to see if we need to issue one final WAL record for this index,
     * which may be needed for correctness on a hot standby node when non-MVCC
     * index scans could take place.
     *
     * If the WAL is replayed in hot standby, the replay process needs to get
     * cleanup locks on all index leaf pages, just as we've been doing here.
     * However, we won't issue any WAL records about pages that have no items
     * to be deleted.  For pages between pages we've vacuumed, the replay code
     * will take locks under the direction of the lastBlockVacuumed fields in
     * the XLOG_BTREE_VACUUM WAL records.  To cover pages after the last one
     * we vacuum, we need to issue a dummy XLOG_BTREE_VACUUM WAL record
     * against the last leaf page in the index, if that one wasn't vacuumed.
     */
    if (XLogStandbyInfoActive() &&
        vstate.lastBlockVacuumed < vstate.lastBlockLocked)
    {
        Buffer      buf;
        /*
         * The page should be valid, but we can't use _bt_getbuf() because we
         * want to use a nondefault buffer access strategy.  Since we aren't
         * going to delete any items, getting cleanup lock again is probably
         * overkill, but for consistency do that anyway.
         */
        buf = ReadBufferExtended(rel, MAIN_FORKNUM, vstate.lastBlockLocked,
                                 RBM_NORMAL, info->strategy);
        LockBufferForCleanup(buf);
        _bt_checkpage(rel, buf);
        _bt_delitems_vacuum(rel, buf, NULL, 0, vstate.lastBlockVacuumed);
        _bt_relbuf(rel, buf);
    }
    //delete the temporary memory context
    MemoryContextDelete(vstate.pagedelcontext);
    /*
     * If we found any recyclable pages (and recorded them in the FSM), then
     * forcibly update the upper-level FSM pages to ensure that searchers can
     * find them.  It's possible that the pages were also found during
     * previous scans and so this is a waste of time, but it's cheap enough
     * relative to scanning the index that it shouldn't matter much, and
     * making sure that free pages are available sooner not later seems
     * worthwhile.
     *
     * Note that if no recyclable pages exist, we don't bother vacuuming the
     * FSM at all.
     */
    if (vstate.totFreePages > 0)
        //vacuum the index's free space map
        IndexFreeSpaceMapVacuum(rel);
    /* update statistics */
    stats->num_pages = num_pages;
    stats->pages_free = vstate.totFreePages;
    if (oldestBtpoXact)
        *oldestBtpoXact = vstate.oldestBtpoXact;
}

lazy_vacuum_index->index_bulk_delete->…btbulkdelete->btvacuumscan->btvacuumpage
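btvacuumpage --- VACUUM one page
btvacuumscan() calls this routine to process a single page. In some cases previously scanned pages must be re-examined; the routine recurses when necessary to handle that case.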

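Its main processing logic is as follows:
1. Initialize the local variables
2. Call ReadBufferExtended to read the block into a buffer, lock the buffer, and get the page
3. If the page is not new, check it and get its BTPageOpaque
4. If the block number differs from the original one (we are recursing) and the page is recyclable, ignorable, not a leaf, or carries a different cycleid, call _bt_relbuf and return; otherwise continue
5. Decide what to do with the page
5.1 If the page is recyclable, record it as a free page
5.2 If the page is deleted but cannot be recycled yet, update the statistics
5.3 If the page is half-dead, try to delete it (set delete_now to true)
5.4 If the page is a leaf
5.4.1 Initialize variables
5.4.2 Trade the read lock on the buffer for a cleanup lock
5.4.3 Remember the highest leaf page number on which a cleanup lock has been taken
5.4.4 Check whether we need to recurse back to an earlier page
5.4.5 Scan all items and collect those that the callback reports as deletable (into the deletable array)
5.4.6 If the array is not empty, call _bt_delitems_vacuum and record the related information
If the array is empty and the page was split during this vacuum cycle, clear btpo_cycleid and mark the buffer dirty
5.4.7 If the page is now empty, try to delete it (set delete_now to true, unless recursing); otherwise count the live tuples
6. If delete_now is true (set in 5.3 or 5.4.7), call _bt_pagedel to delete the page and update the statistics
Otherwise call _bt_relbuf
7. If recurse_to != P_NONE, restart with that block; otherwise return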
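
The check for recursing back to an earlier page (step 5.4.4) is a single compound condition. The sketch below isolates it as a pure function so that each clause can be tried separately. recurse_target is a hypothetical name, the parameters stand in for fields of BTVacState and BTPageOpaqueData, and P_NONE is assumed to be block 0 as in nbtree.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint16_t BTCycleId;

#define P_NONE          0           /* "no sibling" marker */
#define BTP_SPLIT_END   (1 << 5)    /* rightmost page of split group */

/*
 * Returns the right sibling to re-examine, or P_NONE if no recheck is needed.
 * A recheck is needed only when the page was split during this vacuum cycle,
 * it is not the last page of its split group, and its right sibling sits at
 * a block number the outer scan has already passed.
 */
static BlockNumber
recurse_target(BTCycleId vacuum_cycleid, BTCycleId page_cycleid,
               uint16_t page_flags, BlockNumber next, BlockNumber orig_blkno)
{
    assert(orig_blkno != P_NONE);   /* the scan never starts at the metapage */

    if (vacuum_cycleid != 0 &&
        page_cycleid == vacuum_cycleid &&
        !(page_flags & BTP_SPLIT_END) &&
        next != P_NONE &&
        next < orig_blkno)
        return next;
    return P_NONE;
}
```

If the right sibling has a higher block number than orig_blkno, nothing needs to be done: the outer loop in btvacuumscan will reach it on its own.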


/*
 * btvacuumpage --- VACUUM one page
 * 
 * This processes a single page for btvacuumscan().  In some cases we
 * must go back and re-examine previously-scanned pages; this routine
 * recurses when necessary to handle that case.
 *
 * blkno is the page to process.  orig_blkno is the highest block number
 * reached by the outer btvacuumscan loop (the same as blkno, unless we
 * are recursing to re-examine a previous page).
 */
static void
btvacuumpage(BTVacState *vstate, BlockNumber blkno, BlockNumber orig_blkno)
{
    IndexVacuumInfo *info = vstate->info;//IndexVacuumInfo
    IndexBulkDeleteResult *stats = vstate->stats;//statistics
    //typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
    IndexBulkDeleteCallback callback = vstate->callback;//callback function
    void       *callback_state = vstate->callback_state;//callback state
    Relation    rel = info->index;//index relation
    bool        delete_now;//delete the page now?
    BlockNumber recurse_to;//block to recurse to
    Buffer      buf;//buffer
    Page        page;//page
    BTPageOpaque opaque = NULL;//special-space data of the page
restart:
    delete_now = false;
    recurse_to = P_NONE;
    /* call vacuum_delay_point while not holding any buffer lock */
    vacuum_delay_point();
    /*
     * We can't use _bt_getbuf() here because it always applies
     * _bt_checkpage(), which will barf on an all-zero page. We want to
     * recycle all-zero pages, not fail.  Also, we want to use a nondefault
     * buffer access strategy.
     */
    buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
                             info->strategy);
    LockBuffer(buf, BT_READ);
    page = BufferGetPage(buf);
    if (!PageIsNew(page))
    {
        _bt_checkpage(rel, buf);
        opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    }
    /*
     * If we are recursing, the only case we want to do anything with is a
     * live leaf page having the current vacuum cycle ID.  Any other state
     * implies we already saw the page (eg, deleted it as being empty).
     */
    if (blkno != orig_blkno)
    {
        //the block number differs, so we are recursing
        if (_bt_page_recyclable(page) ||
            P_IGNORE(opaque) ||
            !P_ISLEAF(opaque) ||
            opaque->btpo_cycleid != vstate->cycleid)
        {
            _bt_relbuf(rel, buf);
            return;
        }
    }
    /* Page is valid, see what to do with it */
    if (_bt_page_recyclable(page))
    {
        /* Okay to recycle this page */
        RecordFreeIndexPage(rel, blkno);
        vstate->totFreePages++;
        stats->pages_deleted++;
    }
    else if (P_ISDELETED(opaque))
    {
        /* Already deleted, but can't recycle yet */
        //update the statistics
        stats->pages_deleted++;
        /* Update the oldest btpo.xact */
        if (!TransactionIdIsValid(vstate->oldestBtpoXact) ||
            TransactionIdPrecedes(opaque->btpo.xact, vstate->oldestBtpoXact))
            vstate->oldestBtpoXact = opaque->btpo.xact;
    }
    else if (P_ISHALFDEAD(opaque))
    {
        /* Half-dead, try to delete */
        delete_now = true;
    }
    else if (P_ISLEAF(opaque))
    {
        //------- leaf page
        //array of deletable offsets
        OffsetNumber deletable[MaxOffsetNumber];
        int         ndeletable;//number of deletable items
        OffsetNumber offnum,
                    minoff,
                    maxoff;
        /*
         * Trade in the initial read lock for a super-exclusive write lock on
         * this page.  We must get such a lock on every leaf page over the
         * course of the vacuum scan, whether or not it actually contains any
         * deletable tuples --- see nbtree/README.
         */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        LockBufferForCleanup(buf);
        /*
         * Remember highest leaf page number we've taken cleanup lock on; see
         * notes in btvacuumscan
         */
        if (blkno > vstate->lastBlockLocked)
            vstate->lastBlockLocked = blkno;
        /*
         * Check whether we need to recurse back to earlier pages.  What we
         * are concerned about is a page split that happened since we started
         * the vacuum scan.  If the split moved some tuples to a lower page
         * then we might have missed 'em.  If so, set up for tail recursion.
         * (Must do this before possibly clearing btpo_cycleid below!)
         */
        if (vstate->cycleid != 0 &&
            opaque->btpo_cycleid == vstate->cycleid &&
            !(opaque->btpo_flags & BTP_SPLIT_END) &&
            !P_RIGHTMOST(opaque) &&
            opaque->btpo_next < orig_blkno)
            recurse_to = opaque->btpo_next;
        /*
         * Scan over all items to see which ones need deleted according to the
         * callback function.
         */
        ndeletable = 0;
        //lowest data offset on the page
        minoff = P_FIRSTDATAKEY(opaque);
        //highest offset on the page
        maxoff = PageGetMaxOffsetNumber(page);
        if (callback)
        {
            //a callback was supplied
            for (offnum = minoff;
                 offnum <= maxoff;
                 offnum = OffsetNumberNext(offnum))
            {
                //walk the offsets in ascending order
                IndexTuple  itup;//index tuple
                ItemPointer htup;//heap tuple pointer (TID)
                //fetch the index tuple
                itup = (IndexTuple) PageGetItem(page,
                                                PageGetItemId(page, offnum));
                htup = &(itup->t_tid);//get the heap TID it points to
                /*
                 * During Hot Standby we currently assume that
                 * XLOG_BTREE_VACUUM records do not produce conflicts. That is
                 * only true as long as the callback function depends only
                 * upon whether the index tuple refers to heap tuples removed
                 * in the initial heap scan. When vacuum starts it derives a
                 * value of OldestXmin. Backends taking later snapshots could
                 * have a RecentGlobalXmin with a later xid than the vacuum's
                 * OldestXmin, so it is possible that row versions deleted
                 * after OldestXmin could be marked as killed by other
                 * backends. The callback function *could* look at the index
                 * tuple state in isolation and decide to delete the index
                 * tuple, though currently it does not. If it ever did, we
                 * would need to reconsider whether XLOG_BTREE_VACUUM records
                 * should cause conflicts. If they did cause conflicts they
                 * would be fairly harsh conflicts, since we haven't yet
                 * worked out a way to pass a useful value for
                 * latestRemovedXid on the XLOG_BTREE_VACUUM records. This
                 * applies to *any* type of index that marks index tuples as
                 * killed.
                 */
                if (callback(htup, callback_state))
                    //the callback returned true: record the offset in the array
                    deletable[ndeletable++] = offnum;
            }
        }
        /*
         * Apply any needed deletes.  We issue just one _bt_delitems_vacuum()
         * call per page, so as to minimize WAL traffic.
         */
        if (ndeletable > 0)
        {
            //--------------- the deletable array is not empty
            /*
             * Notice that the issued XLOG_BTREE_VACUUM WAL record includes
             * all information to the replay code to allow it to get a cleanup
             * lock on all pages between the previous lastBlockVacuumed and
             * this page. This ensures that WAL replay locks all leaf pages at
             * some point, which is important should non-MVCC scans be
             * requested. This is currently unused on standby, but we record
             * it anyway, so that the WAL contains the required information.
             *
             * Since we can visit leaf pages out-of-order when recursing,
             * replay might end up locking such pages an extra time, but it
             * doesn't seem worth the amount of bookkeeping it'd take to avoid
             * that.
             */
            _bt_delitems_vacuum(rel, buf, deletable, ndeletable,
                                vstate->lastBlockVacuumed);
            /*
             * Remember highest leaf page number we've issued a
             * XLOG_BTREE_VACUUM WAL record for.
             */
            if (blkno > vstate->lastBlockVacuumed)
                vstate->lastBlockVacuumed = blkno;
            stats->tuples_removed += ndeletable;
            /* must recompute maxoff */
            maxoff = PageGetMaxOffsetNumber(page);
        }
        else
        {
            /*
             * If the page has been split during this vacuum cycle, it seems
             * worth expending a write to clear btpo_cycleid even if we don't
             * have any deletions to do.  (If we do, _bt_delitems_vacuum takes
             * care of this.)  This ensures we won't process the page again.
             *
             * We treat this like a hint-bit update because there's no need to
             * WAL-log it.
             */
            if (vstate->cycleid != 0 &&
                opaque->btpo_cycleid == vstate->cycleid)
            {
                opaque->btpo_cycleid = 0;
                MarkBufferDirtyHint(buf, true);
            }
        }
        /*
         * If it's now empty, try to delete; else count the live tuples. We
         * don't delete when recursing, though, to avoid putting entries into
         * freePages out-of-order (doesn't seem worth any extra code to handle
         * the case).
         */
        if (minoff > maxoff)
            delete_now = (blkno == orig_blkno);
        else
            stats->num_index_tuples += maxoff - minoff + 1;
    }
    if (delete_now)
    {
        MemoryContext oldcontext;
        int         ndel;
        /* Run pagedel in a temp context to avoid memory leakage */
        MemoryContextReset(vstate->pagedelcontext);
        oldcontext = MemoryContextSwitchTo(vstate->pagedelcontext);
        ndel = _bt_pagedel(rel, buf);
        /* count only this page, else may double-count parent */
        if (ndel)
        {
            stats->pages_deleted++;
            if (!TransactionIdIsValid(vstate->oldestBtpoXact) ||
                TransactionIdPrecedes(opaque->btpo.xact, vstate->oldestBtpoXact))
                vstate->oldestBtpoXact = opaque->btpo.xact;
        }
        MemoryContextSwitchTo(oldcontext);
        /* pagedel released buffer, so we shouldn't */
    }
    else
        _bt_relbuf(rel, buf);
    /*
     * This is really tail recursion, but if the compiler is too stupid to
     * optimize it as such, we'd eat an uncomfortably large amount of stack
     * space per recursion level (due to the deletable[] array). A failure is
     * improbable since the number of levels isn't likely to be large ... but
     * just in case, let's hand-optimize into a loop.
     */
    if (recurse_to != P_NONE)
    {
        blkno = recurse_to;
        goto restart;
    }
}
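
Both places where btvacuumpage updates vstate->oldestBtpoXact go through TransactionIdPrecedes rather than a plain <, because transaction IDs wrap around. The sketch below is a simplified standalone stand-in for that comparison and for the "keep the oldest" update. xid_precedes and track_oldest are hypothetical names, and the constants are assumed to mirror PostgreSQL's InvalidTransactionId and FirstNormalTransactionId.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId        0   /* "not set yet" */
#define FirstNormalTransactionId    3   /* lower IDs are special */

/* Circular comparison: a precedes b if it is "behind" b modulo 2^32. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    /* Special IDs are compared as plain unsigned integers. */
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID seen so far, as done for vstate->oldestBtpoXact. */
static void
track_oldest(TransactionId *oldest, TransactionId xact)
{
    assert(oldest != NULL);
    if (*oldest == InvalidTransactionId || xid_precedes(xact, *oldest))
        *oldest = xact;
}
```

Under this comparison an XID of 10 does not precede 4000000000: it is treated as a newer transaction created after the counter wrapped around.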

lazy_tid_reaped
Callback function; it calls the C library function bsearch to check whether a TID is deletable


/*
 *  lazy_tid_reaped() -- is a particular tid deletable?
 *
 *      This has the right signature to be an IndexBulkDeleteCallback.
 *
 *      Assumes dead_tuples array is in sorted order.
 */
static bool
lazy_tid_reaped(ItemPointer itemptr, void *state)
{
    LVRelStats *vacrelstats = (LVRelStats *) state;
    ItemPointer res;
    //vac_cmp_itemptr is the comparison function
    res = (ItemPointer) bsearch((void *) itemptr,
                                (void *) vacrelstats->dead_tuples,
                                vacrelstats->num_dead_tuples,
                                sizeof(ItemPointerData),
                                vac_cmp_itemptr);
    return (res != NULL);
}
/*
 * Comparator routines for use with qsort() and bsearch().
 * Compares the block number first, then the offset within the block:
 * returns 0 if equal, -1 if left < right, and 1 if left > right.
 */
static int
vac_cmp_itemptr(const void *left, const void *right)
{
    BlockNumber lblk,
                rblk;
    OffsetNumber loff,
                roff;
    lblk = ItemPointerGetBlockNumber((ItemPointer) left);
    rblk = ItemPointerGetBlockNumber((ItemPointer) right);
    if (lblk < rblk)
        return -1;
    if (lblk > rblk)
        return 1;
    loff = ItemPointerGetOffsetNumber((ItemPointer) left);
    roff = ItemPointerGetOffsetNumber((ItemPointer) right);
    if (loff < roff)
        return -1;
    if (loff > roff)
        return 1;
    return 0;
}
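
The lookup above only works because dead_tuples is sorted by (block, offset). A minimal standalone sketch of the same idea follows, with a hypothetical FakeTid in place of ItemPointerData and hypothetical function names; only bsearch itself is the real C library call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ItemPointerData: (block, offset within block). */
typedef struct FakeTid
{
    uint32_t    block;
    uint16_t    offset;
} FakeTid;

/* Order by block number first, then by offset, like vac_cmp_itemptr. */
static int
fake_tid_cmp(const void *left, const void *right)
{
    const FakeTid *l = (const FakeTid *) left;
    const FakeTid *r = (const FakeTid *) right;

    if (l->block < r->block)
        return -1;
    if (l->block > r->block)
        return 1;
    if (l->offset < r->offset)
        return -1;
    if (l->offset > r->offset)
        return 1;
    return 0;
}

/* Is tid present in the sorted array of dead TIDs? */
static bool
fake_tid_reaped(const FakeTid *tid, const FakeTid *dead, size_t ndead)
{
    assert(ndead == 0 || dead != NULL);
    if (ndead == 0)
        return false;
    return bsearch(tid, dead, ndead, sizeof(FakeTid), fake_tid_cmp) != NULL;
}
```

Each index tuple costs one binary search over the dead-TID array, so the callback stays cheap even when the array holds many entries.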

III. Trace Analysis

Test script: delete rows, then run vacuum


14:24:12 (xdb@[local]:5432)testdb=# delete from t1 where id < 1300;
DELETE 100
14:24:23 (xdb@[local]:5432)testdb=# checkpoint;
CHECKPOINT
14:24:26 (xdb@[local]:5432)testdb=# 
14:25:28 (xdb@[local]:5432)testdb=# vacuum verbose t1;

btvacuumscan
Start gdb and set a breakpoint


(gdb) b btvacuumscan
Breakpoint 1 at 0x509951: file nbtree.c, line 959.
(gdb) c
Continuing.
Breakpoint 1, btvacuumscan (info=0x7ffd33d29b70, stats=0x23ea988, callback=0x6bf507 <lazy_tid_reaped>, 
    callback_state=0x23eaaf8, cycleid=37964, oldestBtpoXact=0x7ffd33d29a40) at nbtree.c:959
959     Relation    rel = info->index;
(gdb)

Input parameters


(gdb) p *info
$1 = {index = 0x7f6b76bcc688, analyze_only = false, estimated_count = true, message_level = 17, num_heap_tuples = 14444, 
  strategy = 0x2413708}
(gdb) p *stats
$2 = {num_pages = 0, pages_removed = 0, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, 
  pages_deleted = 0, pages_free = 0}
(gdb) p *oldestBtpoXact
$3 = 869440096
(gdb) 
(gdb) p (LVRelStats *)callback_state
$4 = (LVRelStats *) 0x23eaaf8
(gdb) p *(LVRelStats *)callback_state
$5 = {hasindex = true, old_rel_pages = 124, rel_pages = 124, scanned_pages = 52, pinskipped_pages = 0, 
  frozenskipped_pages = 1, tupcount_pages = 52, old_live_tuples = 14444, new_rel_tuples = 14840, new_live_tuples = 14840, 
  new_dead_tuples = 0, pages_removed = 0, tuples_deleted = 100, nonempty_pages = 124, num_dead_tuples = 100, 
  max_dead_tuples = 36084, dead_tuples = 0x7f6b76ad7050, num_index_scans = 0, latestRemovedXid = 397077, 
  lock_waiter_detected = false}

1. Initialize the statistics (IndexBulkDeleteResult struct)


(gdb) n
969     stats->estimated_count = false;
(gdb) 
970     stats->num_index_tuples = 0;
(gdb) 
971     stats->pages_deleted = 0;
(gdb)

2. Initialize the vstate state (BTVacState struct)


974     vstate.info = info;
(gdb) 
975     vstate.stats = stats;
(gdb) 
976     vstate.callback = callback;
(gdb) 
977     vstate.callback_state = callback_state;
(gdb) 
978     vstate.cycleid = cycleid;
(gdb) 
979     vstate.lastBlockVacuumed = BTREE_METAPAGE;  /* Initialise at first block */
(gdb) 
980     vstate.lastBlockLocked = BTREE_METAPAGE;
(gdb) 
981     vstate.totFreePages = 0;
(gdb) 
982     vstate.oldestBtpoXact = InvalidTransactionId;
(gdb) 
(gdb) p vstate
$6 = {info = 0x7ffd33d29b70, stats = 0x23ea988, callback = 0x6bf507 <lazy_tid_reaped>, callback_state = 0x23eaaf8, 
  cycleid = 37964, lastBlockVacuumed = 0, lastBlockLocked = 0, totFreePages = 0, oldestBtpoXact = 0, 
  pagedelcontext = 0x23c1d00}

3. Create a temporary memory context


985     vstate.pagedelcontext = AllocSetContextCreate(CurrentMemoryContext,
(gdb)

4. Loop over the pages
4.1 Take the relation extension lock and read the current number of blocks


1012        needLock = !RELATION_IS_LOCAL(rel);
(gdb) p vstate
$6 = {info = 0x7ffd33d29b70, stats = 0x23ea988, callback = 0x6bf507 <lazy_tid_reaped>, callback_state = 0x23eaaf8, 
  cycleid = 37964, lastBlockVacuumed = 0, lastBlockLocked = 0, totFreePages = 0, oldestBtpoXact = 0, 
  pagedelcontext = 0x23c1d00}
(gdb) 
(gdb) n
1014        blkno = BTREE_METAPAGE + 1;
(gdb) 
1018            if (needLock)
(gdb) p needLock
$7 = true
(gdb) n
1019                LockRelationForExtension(rel, ExclusiveLock);
(gdb) 
1020            num_pages = RelationGetNumberOfBlocks(rel);
(gdb) 
1021            if (needLock)
(gdb) p num_pages
$8 = 60
(gdb) n
1022                UnlockRelationForExtension(rel, ExclusiveLock);
(gdb) 
1025            if (blkno >= num_pages)
(gdb) p blkno
$9 = 1
(gdb) n
1028            for (; blkno < num_pages; blkno++)
(gdb)

4.2 Iterate over the blocks, calling btvacuumpage on each


(gdb) 
1030                btvacuumpage(&vstate, blkno, blkno);
(gdb) 
1028            for (; blkno < num_pages; blkno++)
(gdb) 
1030                btvacuumpage(&vstate, blkno, blkno);
(gdb)

4.3 Scan the relation again if needed (the outer loop re-reads the block count and stops once blkno has reached it)


(gdb) b nbtree.c:1018
Breakpoint 2 at 0x509a1f: file nbtree.c, line 1018.
(gdb) c
Continuing.
Breakpoint 2, btvacuumscan (info=0x7ffd33d29b70, stats=0x23ea988, callback=0x6bf507 <lazy_tid_reaped>, 
    callback_state=0x23eaaf8, cycleid=37964, oldestBtpoXact=0x7ffd33d29a40) at nbtree.c:1018
1018            if (needLock)
(gdb) n
1019                LockRelationForExtension(rel, ExclusiveLock);
(gdb) 
1020            num_pages = RelationGetNumberOfBlocks(rel);
(gdb) 
1021            if (needLock)
(gdb) 
1022                UnlockRelationForExtension(rel, ExclusiveLock);
(gdb) 
1025            if (blkno >= num_pages)
(gdb) p blkno
$11 = 60
(gdb) n
1026                break;
(gdb)

5. WAL record handling


(gdb) n
1048        if (XLogStandbyInfoActive() &&
(gdb)

6. Delete the temporary memory context


(gdb) 
1067        MemoryContextDelete(vstate.pagedelcontext);
(gdb)

7. Handle free space


(gdb) 
1081        if (vstate.totFreePages > 0)
(gdb) 
1082            IndexFreeSpaceMapVacuum(rel);
(gdb)

8. Update the statistics


(gdb) 
1085        stats->num_pages = num_pages;
(gdb) 
1086        stats->pages_free = vstate.totFreePages;
(gdb) 
1088        if (oldestBtpoXact)
(gdb) 
1089            *oldestBtpoXact = vstate.oldestBtpoXact;
(gdb) p oldestBtpoXact
$12 = (TransactionId *) 0x7ffd33d29a40
(gdb) p *oldestBtpoXact
$13 = 869440096
(gdb) p vstate.oldestBtpoXact
$14 = 397078
(gdb) n
1090    }
(gdb) p *stats
$15 = {num_pages = 60, pages_removed = 0, estimated_count = false, num_index_tuples = 8701, tuples_removed = 100, 
  pages_deleted = 7, pages_free = 6}
(gdb)

The call completes


(gdb) n
btbulkdelete (info=0x7ffd33d29b70, stats=0x23ea988, callback=0x6bf507 <lazy_tid_reaped>, callback_state=0x23eaaf8)
    at nbtree.c:880
880         _bt_update_meta_cleanup_info(info->index, oldestBtpoXact,
(gdb)

btvacuumpage


14:50:45 (xdb@[local]:5432)testdb=# vacuum verbose t1;
...............
(gdb) b btvacuumpage
Breakpoint 3 at 0x509b82: file nbtree.c, line 1106.
(gdb) 
(gdb) c
Continuing.
Breakpoint 3, btvacuumpage (vstate=0x7ffd33d298d0, blkno=1, orig_blkno=1) at nbtree.c:1106
1106        IndexVacuumInfo *info = vstate->info;
(gdb)

Input parameters


(gdb) p *vstate
$16 = {info = 0x7ffd33d29b70, stats = 0x24157e8, callback = 0x6bf507 <lazy_tid_reaped>, callback_state = 0x2415958, 
  cycleid = 37965, lastBlockVacuumed = 0, lastBlockLocked = 0, totFreePages = 0, oldestBtpoXact = 0, 
  pagedelcontext = 0x23ea7a0}
(gdb)

1. Initialize the local variables


(gdb) n
1107        IndexBulkDeleteResult *stats = vstate->stats;
(gdb) 
1108        IndexBulkDeleteCallback callback = vstate->callback;
(gdb) 
1109        void       *callback_state = vstate->callback_state;
(gdb) 
1110        Relation    rel = info->index;
(gdb) 
1115        BTPageOpaque opaque = NULL;
(gdb) 
1118        delete_now = false;
(gdb) 
1119        recurse_to = P_NONE;
(gdb) 
1122        vacuum_delay_point();
(gdb) p *info
$17 = {index = 0x7f6b76b0c268, analyze_only = false, estimated_count = true, message_level = 17, num_heap_tuples = 14840, 
  strategy = 0x2403478}
(gdb) p *stats
$18 = {num_pages = 0, pages_removed = 0, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, 
  pages_deleted = 0, pages_free = 0}
(gdb) p rel
$19 = (Relation) 0x7f6b76b0c268
(gdb) p *rel
$20 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50823}, rd_smgr = 0x23d0270, rd_refcnt = 1, rd_backend = -1, 
  rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 0 '\000', rd_statvalid = false, 
  rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7f6b76bccd20, rd_att = 0x7f6b76bcc9b8, rd_id = 50823, 
  rd_lockInfo = {lockRelId = {relId = 50823, dbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, 
  rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0, 
  rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x0, rd_oidindex = 0, rd_pkindex = 0, rd_replidindex = 0, 
  rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0, 
  rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x7f6b76bcc8d8, rd_indextuple = 0x7f6b76bcc8a0, 
  rd_amhandler = 330, rd_indexcxt = 0x236b340, rd_amroutine = 0x236b480, rd_opfamily = 0x236b598, rd_opcintype = 0x236b5b8, 
  rd_support = 0x236b5d8, rd_supportinfo = 0x236b600, rd_indoption = 0x236b738, rd_indexprs = 0x0, rd_indpred = 0x0, 
  rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x236b718, 
  rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x23c4198}
(gdb)

2. Call ReadBufferExtended to read the block into a buffer, lock the buffer, and get the page


(gdb) 
1130        buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
(gdb) n
1132        LockBuffer(buf, BT_READ);
(gdb) 
1133        page = BufferGetPage(buf);
(gdb) 
1134        if (!PageIsNew(page))
(gdb) p *page
$21 = 1 '\001'
(gdb) p page
$22 = (Page) 0x7f6b4add8380 "\001"
(gdb) p *(PageHeader)page
$23 = {pd_lsn = {xlogid = 1, xrecoff = 1320733408}, pd_checksum = 0, pd_flags = 0, pd_lower = 28, pd_upper = 8168, 
  pd_special = 8176, pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x7f6b4add8398}
(gdb)
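
The header fields printed above can be decoded by hand. The sketch below does the arithmetic, assuming the standard layout in which the page header occupies 24 bytes before pd_linp and each line pointer takes 4 bytes; `DemoPageInfo` and `demo_decode_header` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_HEADER_SIZE 24u   /* bytes before pd_linp (assumed layout) */
#define DEMO_ITEMID_SIZE       4u   /* bytes per line pointer (assumed) */

typedef struct DemoPageInfo
{
    unsigned    page_size;      /* high byte of pd_pagesize_version */
    unsigned    layout_version; /* low byte of pd_pagesize_version */
    unsigned    nitems;         /* number of line pointers */
    unsigned    free_bytes;     /* hole between pd_lower and pd_upper */
    unsigned    special_bytes;  /* size of the special area at the end */
} DemoPageInfo;

static DemoPageInfo
demo_decode_header(uint16_t pd_lower, uint16_t pd_upper,
                   uint16_t pd_special, uint16_t pd_pagesize_version)
{
    DemoPageInfo info;

    assert(pd_lower >= DEMO_PAGE_HEADER_SIZE);
    assert(pd_lower <= pd_upper && pd_upper <= pd_special);

    info.page_size = pd_pagesize_version & 0xFF00u;
    info.layout_version = pd_pagesize_version & 0x00FFu;
    info.nitems = (pd_lower - DEMO_PAGE_HEADER_SIZE) / DEMO_ITEMID_SIZE;
    info.free_bytes = (unsigned) pd_upper - pd_lower;
    info.special_bytes = info.page_size - pd_special;
    return info;
}
```

For the values in the trace (pd_lower = 28, pd_upper = 8168, pd_special = 8176, pd_pagesize_version = 8196) this gives an 8192-byte page with layout version 4, one line pointer, and a 16-byte special area, which is where the BTPageOpaque data lives.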

3. If the page is not new, check it and get its BTPageOpaque


(gdb) n
1136            _bt_checkpage(rel, buf);
(gdb) 
1137            opaque = (BTPageOpaque) PageGetSpecialPointer(page);
(gdb) p buf
$24 = 224
(gdb) n
1145        if (blkno != orig_blkno)
(gdb) p opaque
$25 = (BTPageOpaque) 0x7f6b4adda370
(gdb) p *opaque
$26 = {btpo_prev = 0, btpo_next = 33, btpo = {level = 397073, xact = 397073}, btpo_flags = 5, btpo_cycleid = 0}
(gdb)
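
In the output above, btpo_flags is 5 and both members of btpo print as 397073. The sketch below shows one way to read that, assuming the flag bit values from nbtree.h of this PostgreSQL generation (BTP_LEAF = 1, BTP_DELETED = 4, BTP_HALF_DEAD = 16) and that btpo is a union of a tree level and a transaction id. All `Demo`/`demo_` names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as assumed from nbtree.h (only the ones used here). */
#define DEMO_BTP_LEAF       (1 << 0)
#define DEMO_BTP_DELETED    (1 << 2)
#define DEMO_BTP_HALF_DEAD  (1 << 4)

/* A live page stores its tree level here; a deleted page stores an XID. */
typedef union DemoBtpo
{
    uint32_t    level;
    uint32_t    xact;
} DemoBtpo;

static bool
demo_is_leaf(uint16_t flags)
{
    return (flags & DEMO_BTP_LEAF) != 0;
}

static bool
demo_is_deleted(uint16_t flags)
{
    return (flags & DEMO_BTP_DELETED) != 0;
}

static bool
demo_is_half_dead(uint16_t flags)
{
    return (flags & DEMO_BTP_HALF_DEAD) != 0;
}

/* Only meaningful for deleted pages, so insist on that. */
static uint32_t
demo_deletion_xid(uint16_t flags, DemoBtpo btpo)
{
    assert(demo_is_deleted(flags));
    return btpo.xact;
}
```

Under these assumptions, flags = 5 means a deleted leaf page, so 397073 is to be read as the transaction id stored at deletion rather than as a tree level. gdb prints the same number for both union members because they share storage.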

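4. If blkno differs from orig_blkno, we are in a recursive call: if the page is recyclable, can be ignored, is not a leaf, or has a different cycleid, call _bt_relbuf and return; otherwise continue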


(gdb) n
1145        if (blkno != orig_blkno)

5. Classify the page
5.1 If the page is recyclable, recycle it


(gdb) n
1158        if (_bt_page_recyclable(page))
(gdb) 
1161            RecordFreeIndexPage(rel, blkno);
(gdb) 
1162            vstate->totFreePages++;
(gdb) p blkno
$27 = 1
(gdb) n
1163            stats->pages_deleted++;
(gdb)

5.2 If the page is deleted but cannot be recycled yet, update the statistics


N/A

5.3 If the page is half-dead, try to delete it (set delete_now to true)


N/A

5.5 If deletion should be attempted (delete_now is true), call _bt_pagedel and update the statistics;
otherwise call _bt_relbuf


1329        if (delete_now)
(gdb) 
1353            _bt_relbuf(rel, buf);

5.6 Check recurse_to != P_NONE: if true, restart on that block; otherwise return


1362        if (recurse_to != P_NONE)
(gdb) p recurse_to
$29 = 0
(gdb) p P_NONE
$30 = 0
(gdb) n
1367    }
(gdb)

Now step into the branch taken when the page is a leaf
5.4 If the page is a leaf


(gdb) del 
Delete all breakpoints? (y or n) y
(gdb) b nbtree.c:1182
Breakpoint 5 at 0x509e61: file nbtree.c, line 1182.
(gdb) c
Continuing.
Breakpoint 5, btvacuumpage (vstate=0x7ffd33d298d0, blkno=6, orig_blkno=6) at nbtree.c:1194
1194            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
(gdb)

5.4.1 Initialize variables


N/A

5.4.2 Lock the buffer (release the read lock, then take a cleanup lock)


1194            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
(gdb) N
1195            LockBufferForCleanup(buf);
(gdb)

5.4.3 Record the highest leaf page number on which a cleanup lock has been taken


(gdb) 
1201            if (blkno > vstate->lastBlockLocked)
(gdb) p blkno
$31 = 6
(gdb) p vstate->lastBlockLocked
$32 = 0
(gdb) n
1202                vstate->lastBlockLocked = blkno;
(gdb)

5.4.4 Check whether we need to recurse back to an earlier page


(gdb) 
1211            if (vstate->cycleid != 0 &&
(gdb) p vstate->cycleid
$33 = 37965
(gdb) p opaque->btpo_cycleid
$34 = 0
(gdb) p vstate->cycleid
$35 = 37965
(gdb) 
(gdb) n
1212                opaque->btpo_cycleid == vstate->cycleid &&
(gdb) 
1211            if (vstate->cycleid != 0 &&
(gdb) 
1222            ndeletable = 0;
(gdb) 
1223            minoff = P_FIRSTDATAKEY(opaque);
(gdb) 
1224            maxoff = PageGetMaxOffsetNumber(page);
(gdb) 
1225            if (callback)
(gdb) p minoff
$36 = 2
(gdb) p maxoff
$37 = 174
(gdb)

5.4.5 Scan all items and use the callback to find the ones to delete (their offsets are written to the deletable array)


(gdb) n
1227                for (offnum = minoff;
(gdb) 
1234                    itup = (IndexTuple) PageGetItem(page,
(gdb) 
1236                    htup = &(itup->t_tid);
(gdb) p *itup
$38 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 103}, ip_posid = 138}, t_info = 16}
(gdb) n
1259                    if (callback(htup, callback_state))
(gdb) p *htup
$41 = {ip_blkid = {bi_hi = 0, bi_lo = 103}, ip_posid = 138}
(gdb)

Step into the callback lazy_tid_reaped


(gdb) step
lazy_tid_reaped (itemptr=0x7f6b4addfd40, state=0x2415958) at vacuumlazy.c:2140
2140        LVRelStats *vacrelstats = (LVRelStats *) state;
(gdb)

Call bsearch to look for a match; it returns NULL, so this TID is not in the dead-tuple list


(gdb) n
2145                                    vacrelstats->num_dead_tuples,
(gdb) 
2143        res = (ItemPointer) bsearch((void *) itemptr,
(gdb) 
2144                                    (void *) vacrelstats->dead_tuples,
(gdb) 
2143        res = (ItemPointer) bsearch((void *) itemptr,
(gdb) 
2149        return (res != NULL);
(gdb) p res
$42 = (ItemPointer) 0x0
(gdb) 
(gdb) n
2150    }
(gdb) 
btvacuumpage (vstate=0x7ffd33d298d0, blkno=6, orig_blkno=6) at nbtree.c:1229
1229                     offnum = OffsetNumberNext(offnum))
(gdb)
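
Step 5.4.5 as a whole is a loop from minoff to maxoff that asks the callback about each item. A minimal sketch of that loop follows, with heap TIDs reduced to plain integers and a hypothetical callback `demo_below_cutoff` that treats every id below a cutoff as dead; none of these names exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same shape as IndexBulkDeleteCallback, with an int in place of a TID. */
typedef bool (*demo_callback) (int heap_tid, void *state);

/* Hypothetical callback: ids below the cutoff in *state count as dead. */
static bool
demo_below_cutoff(int heap_tid, void *state)
{
    return heap_tid < *(const int *) state;
}

/*
 * Walk offsets minoff..maxoff (inclusive, 1-based) and record each offset
 * whose heap id the callback reports as dead. Returns how many were found.
 */
static size_t
demo_collect_deletable(const int *tid_at_offset,
                       unsigned minoff, unsigned maxoff,
                       demo_callback callback, void *state,
                       unsigned *deletable)
{
    size_t      ndeletable = 0;
    unsigned    offnum;

    assert(callback != NULL && deletable != NULL);
    for (offnum = minoff; offnum <= maxoff; offnum++)
    {
        if (callback(tid_at_offset[offnum], state))
            deletable[ndeletable++] = offnum;
    }
    return ndeletable;
}
```

Starting at minoff = 2, as in the trace, skips the item at offset 1, which on a non-rightmost leaf page is the high key rather than a data item.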

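5.4.6 If the array is not empty, call _bt_delitems_vacuum and record the results.
If the array is empty, check whether the page was split during this vacuum cycle; if so, clear its btpo_cycleid and mark the buffer dirty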


(gdb) del
Delete all breakpoints? (y or n) y
(gdb) b nbtree.c:1284
Breakpoint 7 at 0x50a035: file nbtree.c, line 1284.
(gdb) c
Continuing.
Breakpoint 7, btvacuumpage (vstate=0x7ffd33d298d0, blkno=48, orig_blkno=48) at nbtree.c:1284
1284                _bt_delitems_vacuum(rel, buf, deletable, ndeletable,
(gdb) 
(gdb) n
1291                if (blkno > vstate->lastBlockVacuumed)
(gdb) p blkno
$43 = 48
(gdb) p vstate->lastBlockVacuumed
$44 = 0
(gdb) n
1292                    vstate->lastBlockVacuumed = blkno;
(gdb) 
1294                stats->tuples_removed += ndeletable;
(gdb) 
1296                maxoff = PageGetMaxOffsetNumber(page);
(gdb) 
1323            if (minoff > maxoff)
(gdb) p minoff
$45 = 2
(gdb) p maxoff
$46 = 67
(gdb) n
1326                stats->num_index_tuples += maxoff - minoff + 1;
(gdb)

DONE!

IV. References

PG Source Code
