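Introduction
This article describes a case at a customer site in which the resmgr:cpu quantum wait event caused heavy CPU consumption. After we confirmed the cause and disabled the Resource Manager feature behind the CPU spike, the alert log began to show that many automatic scheduled jobs were no longer running and were failing with the following error: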
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_255"
ORA-29373: resource manager is not on
Customer operating system: RHEL 6
Database version: 11.2.0.3.11
The sections below introduce Resource Manager and then walk through the troubleshooting and resolution process.
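Background
Resource Manager is a feature that was already available in Oracle 10g and was refined further in 11g. It controls how resources are divided among sessions by controlling the scheduling of work inside the database. By deciding which sessions run and for how long, Resource Manager can ensure that resources are used and allocated according to the configured plan. The main resources it controls are:
CPU usage of Oracle sessions
Degree of parallelism
Execution time of SQL statements
Session idle time
Number of active sessions
Undo usage
In real production environments Resource Manager is still not widely used. Starting with 11g (11.1.0.6 to 11.1.0.7 and 11gR2), however, Oracle enables a Resource Manager plan by default, as the following alert log excerpt shows: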
Setting Resource Manager plan SCHEDULER[0x51B5]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Thu Feb 05 22:00:03 2009
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Thu Feb 05 22:00:39 2009
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
The following comparison shows the different default Resource Manager behavior in 10g and 11g:
Subject | 10G | 11G
Maintenance Window | 2 windows, WEEK and WEEKEND | Each day has its own window
Resource manager | Not enabled per default | Default resource plan specified
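Fault Analysis
After the customer's new system went live, the database showed a large number of resmgr:cpu quantum waits. This wait event is fairly common in 11g. It was consuming a large amount of CPU and degrading system performance. After discussing it with the customer, we decided to disable the feature. The steps were as follows: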
ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE SID='*';
#1. Change the active windows to use the null resource manager plan (or another nonrestrictive plan) using:
execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');
#2. Since 11g has more maintenance windows, we should add them too:
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
#3. Then, for each window_name (WINDOW_NAME from DBA_SCHEDULER_WINDOWS), run:
execute dbms_scheduler.set_attribute('<window name>','RESOURCE_PLAN','');
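Step #3 can be scripted instead of typed once per window. The sketch below is only an illustration: `gen_clear_window_plan` is a hypothetical helper name, and it assumes the window names have already been read from DBA_SCHEDULER_WINDOWS and are passed in as arguments. It only prints the statements and does not connect to the database.

```bash
#!/usr/bin/env bash
# Print one set_attribute call per scheduler window name given as an argument.
# Assumption: the caller supplies valid WINDOW_NAME values from DBA_SCHEDULER_WINDOWS.
gen_clear_window_plan() {
  for window in "$@"; do
    printf "execute dbms_scheduler.set_attribute('%s','RESOURCE_PLAN','');\n" "$window"
  done
}
```

For example, `gen_clear_window_plan MONDAY_WINDOW TUESDAY_WINDOW > clear_plans.sql` would write two statements to a file (the file name is arbitrary) that can be reviewed before it is run in SQL*Plus.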
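Disabling Resource Manager in this way had a clear effect: the system ran stably and CPU usage returned to a normal level. After some time, however, the alert log started to report errors: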
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186"
ORA-29373: resource manager is not on
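The error appears to be related to Resource Manager. Judging by its name, the job is the "SQL Tuning Advisor Job", which diagnoses and monitors high-load SQL and provides SQL tuning recommendations for ADDM. As a first step, we disabled the failing job, that is, SQL Tuning Advisor, as follows: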
exec DBMS_AUTO_TASK_ADMIN.DISABLE('SQL TUNING ADVISOR',NULL, NULL);
After it was disabled, a different job started to fail:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_255"
ORA-29373: resource manager is not on
We checked the recent execution history of the background jobs:
SQL> select client_name,window_name,job_name,job_status from dba_autotask_job_history;
CLIENT_NAME                      WINDOW_NAME       JOB_NAME              JOB_STATUS
-------------------------------- ----------------- --------------------- ----------
auto optimizer stats collection  WEDNESDAY_WINDOW  ORA$AT_OS_OPT_SY_102  FAILED
auto optimizer stats collection  FRIDAY_WINDOW     ORA$AT_OS_OPT_SY_105  FAILED
auto optimizer stats collection  TUESDAY_WINDOW    ORA$AT_OS_OPT_SY_99   FAILED
auto optimizer stats collection  MONDAY_WINDOW     ORA$AT_OS_OPT_SY_96   FAILED
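The output shows many failing background jobs, and according to the alert log essentially all of them fail because Resource Manager is off. Yet switching Resource Manager on or off should not affect normal job scheduling, and on AIX with the same database version we did not see the problem. During the investigation we noticed that, although we had disabled Resource Manager, the DBRM process was still present. DBRM is the Oracle background process that manages Resource Manager, so finding it still running after Resource Manager has been disabled is itself suspicious. This gave us a direction: Resource Manager might not have been shut down completely. In addition, the call stack in the detailed trace file showed no Resource Manager function calls in the current process. The process was simply posting messages to another process over and over.
We therefore suspected that the error was most likely caused by DBRM not having been shut down properly.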
# ps -ef | grep dbrm
oracle 34920 1 0 Apr18 ? 00:00:59 ora_dbrm_nfdb1
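When checking for DBRM with `ps -ef | grep dbrm`, the grep command itself can appear in the output, which makes a quick visual check easy to misread. Below is a minimal sketch of a stricter check. It assumes the background process is named ora_dbrm_<SID>, as in the output above, and that the name is the last field of the `ps -ef` line. `has_dbrm` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Read `ps -ef` output on stdin; exit 0 if a DBRM background process is listed,
# exit 1 otherwise. Matching on the last field ignores the "grep dbrm" line itself.
has_dbrm() {
  awk '$NF ~ /^ora_dbrm_/ { found = 1 } END { exit !found }'
}
```

A typical call would be `ps -ef | has_dbrm && echo "DBRM is running"`.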
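After tracing and analyzing the problem in detail, we opened an SR with Oracle. Oracle's response was to set a second hidden parameter: _resource_manager_always_on set to FALSE
We set the parameter and restarted the database so that it took effect, after which the DBRM process disappeared: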
# ps -ef | grep dbrm
root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm
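The related jobs also ran normally. To confirm the diagnosis, we ran three experiments:
1. Set the "_resource_manager_always_on" hidden parameter and clear the resource plan on the scheduler windows
2. Remove the hidden parameter and only clear the resource plan on the scheduler windows
3. Add both hidden parameters and clear the resource plan on the scheduler windows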
First simulation: shut down the database, adjust the time, set the _resource_manager_always_on hidden parameter, and clear the window plans
Shut down the database:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
Adjust the time:
[root@rhel6 ~]# date -s 2014/05/14
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Wed May 14 21:57:35 CST 2014
Add the hidden parameter and start up to OPEN:
SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
SQL> startup force
(This is a test environment, so the instance is restarted with force. Do not do this in production.)
Set the resource manager plan:
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
Watching the alert log at 22:00, we saw that the errors did start to appear:
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27"
ORA-29373: resource manager is not on
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28"
ORA-29373: resource manager is not on
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29"
ORA-29373: resource manager is not on
Wed May 14 22:00:04 2014
XDB installed.
XDB initialized.
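When several automatic jobs fail in the same second, it can help to pull just the job names out of the alert log. The sketch below assumes the layout shown above, where the ORA-29373 line directly follows the ORA-12012 line. `failed_rm_jobs` is a hypothetical helper name, not an Oracle tool.

```bash
#!/usr/bin/env bash
# Read alert-log text on stdin and print, de-duplicated, the quoted name of every
# job whose ORA-12012 line is immediately followed by an ORA-29373 line.
failed_rm_jobs() {
  awk '
    /ORA-12012:/ {
      # Remember the quoted owner.job pair found on this line.
      job = ""
      if (match($0, /"[^"]+"\."[^"]+"/)) job = substr($0, RSTART, RLENGTH)
      next
    }
    /ORA-29373:/ && job != "" { print job; job = ""; next }
    { job = "" }   # any other line breaks the pair
  ' | sort -u
}
```

It would be called as `failed_rm_jobs < alert.log`, where the alert log path is a placeholder for the real file under the diag trace directory.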
Checking the DBRM process shows that it is present:
[root@rhel6 ~]# ps -ef | grep dbrm
ora11g 4346 1 0 21:57 ? 00:00:00 ora_dbrm_ora11g
Check the background job execution history view:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME                      WINDOW_NAME       JOB_NAME             JOB_STATUS  JOB_START_TIME
-------------------------------- ----------------- -------------------- ----------- --------------------------------
auto optimizer stats collection  WEDNESDAY_WINDOW  ORA$AT_OS_OPT_SY_27  FAILED      14-MAY-14 10.00.03.233570 PM PRC
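The simulation reproduces the behavior seen in the customer environment.
Next, we ran the second simulation:
Shut down the database, adjust the time, remove the hidden parameter, and clear the window plans
Shut down the database:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
Adjust the time:
[root@rhel6 ~]# date -s 2014/05/14
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Wed May 14 21:57:35 CST 2014
Remove the hidden parameter and start up to OPEN: run create pfile from spfile; then delete the spfile, edit the pfile to remove the hidden parameter, and start the database from the pfile.
Set the resource manager plan: (already set earlier, so there is no need to set it again)
Watching the alert log at 22:00, we saw no errors: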
Tue May 13 22:00:03 2014
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Tue May 13 22:00:05 2014
XDB installed.
XDB initialized.
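Check the DBRM process:
[root@rhel6 ~]# ps -ef | grep dbrm
ora11g 3844 1 0 21:55 ? 00:00:00 ora_dbrm_ora11g
Note: at this point only the resource plan on the scheduler windows has been cleared. Resource Manager itself has not really been disabled, so the process is still present.
Check the background job execution history view: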
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME                      WINDOW_NAME     JOB_NAME             JOB_STATUS  JOB_START_TIME
-------------------------------- --------------- -------------------- ----------- --------------------------------
auto optimizer stats collection  TUESDAY_WINDOW  ORA$AT_OS_OPT_SY_24  SUCCEEDED   13-MAY-14 10.00.02.102741 PM PRC
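This shows that with the hidden parameter removed, the job runs normally and the alert log reports no errors.
Third simulation: both hidden parameters set together with the cleared resource manager plan
Shut down the database, adjust the time, add the hidden parameters, and clear the window plans
Shut down the database:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
Adjust the time:
[root@rhel6 ~]# date -s 2014/05/15
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Thu May 15 21:57:35 CST 2014
Add the hidden parameters and start up to OPEN:
SQL> alter system set "_resource_manager_always_off"=true scope=spfile;
SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
SQL> startup force
(This is a test environment, so the instance is restarted with force. Do not do this in production.)
Set the resource manager plan: (already set earlier, so there is no need to set it again)
Watching the alert log at 22:00, we saw no errors: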
Thu May 15 22:00:03 2014
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Thu May 15 22:00:05 2014
XDB installed.
XDB initialized.
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Check the DBRM process:
[root@rhel6 ~]# ps -ef | grep dbrm
root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm
Note: the DBRM process is now gone.
Check the background job execution history view:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME                      WINDOW_NAME      JOB_NAME             JOB_STATUS  JOB_START_TIME
-------------------------------- ---------------- -------------------- ----------- --------------------------------
auto optimizer stats collection  THURSDAY_WINDOW  ORA$AT_OS_OPT_SY_30  SUCCEEDED   15-MAY-14 10.00.02.115232 PM PRC
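This shows that with both hidden parameters added, that is, with Resource Manager completely disabled, the job runs normally and the alert log reports no errors.
Technical Conclusion
The tests above indicate that the job failures occur when Resource Manager has been switched off through "_resource_manager_always_on"=FALSE alone while the DBRM process, which manages Resource Manager, is still active. Either removing both "_resource_manager_always_off"=true and "_resource_manager_always_on"=FALSE,
or setting both parameters, avoids the error, and automatic statistics collection runs as scheduled.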
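The three test results can be summarized as a small lookup. This is only a restatement of what was observed in the simulations above, not a description of Oracle internals. `observed_outcome` is a hypothetical helper name, and the combination that was not tested is reported as such.

```bash
#!/usr/bin/env bash
# Map the two hidden-parameter settings to the outcome observed in the tests.
# $1: "set" if "_resource_manager_always_off"=true is present, otherwise "unset"
# $2: "set" if "_resource_manager_always_on"=FALSE is present, otherwise "unset"
observed_outcome() {
  case "$1/$2" in
    unset/set)   echo "jobs fail with ORA-29373, DBRM still running" ;;
    unset/unset) echo "jobs succeed, DBRM still running" ;;
    set/set)     echo "jobs succeed, DBRM gone" ;;
    *)           echo "not covered by the tests" ;;
  esac
}
```

For example, `observed_outcome unset set` corresponds to the first simulation, which is the configuration that reproduced the customer's errors.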
After analyzing and confirming the problem, we submitted the findings to Oracle through the SR, and Oracle eventually confirmed a related bug: Bug 18748456 : AUTO TASK JOBS FAILED WITH ORA-29373 ERROR