This article explains in detail what to do when an ASM disk group dismount brings down one node of a RAC cluster.
ASM log:
/u01/app/grid/diag/asm/+asm/+ASM1/trace
Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
NOTE: process _b000_+asm1 (38695) initiating offline of disk 0.3915941304 (DATA2_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 1.3915941302 (DATA2_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 2.3915941303 (DATA2_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 28, osid 38695
ERROR: no read quorum in group: required 2, found 0 disks
Dirty Detach Reconfiguration complete
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2) <-- the disk group dismounted itself
SQL> alter diskgroup DATA2 dismount force /* ASM SERVER:3009967451 */
Thu Jul 30 02:11:24 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA2 was mounted <-- and then mounted again by itself
SUCCESS: ALTER DISKGROUP DATA2 MOUNT /* asm agent *//* {0:31:15779} */
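When an alert log is long, it can help to tally the PST write-wait warnings per timestamp, group and disk before reading it line by line. The following is a minimal Python sketch, assuming the log has the same line shapes as the excerpt above (a timestamp line followed by its messages). The function names `parse_timestamp` and `summarize_pst_waits` are hypothetical helpers written for this illustration, not part of any Oracle tool.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
#   WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"
)
# Timestamp lines in the excerpt look like "Thu Jul 30 02:10:46 2015".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return a datetime if the whole line is a timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def summarize_pst_waits(lines):
    """Count PST write-wait warnings per (timestamp, group, disk)."""
    counts = Counter()
    current_ts = None  # most recent timestamp line seen so far
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        match = PST_WAIT.search(line)
        if match:
            _secs, disk, group = (int(g) for g in match.groups())
            counts[(current_ts, group, disk)] += 1
    return counts


# Sample input copied from the alert-log excerpt in this article.
sample = """\
Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
""".splitlines()

counts = summarize_pst_waits(sample)
for (ts, group, disk), n in sorted(counts.items()):
    print(f"{ts}  group {group}  disk {disk}: {n} warning(s)")
```

Grouping by timestamp makes it easy to compare the warnings against known heavy-IO windows such as backup jobs.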
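The alert log shows that the ASM disk group was dismounted and that the trigger was the "Waited 15 secs for write IO to PST" warning. This is ASM's own heartbeat timeout check: the ASM instance periodically verifies that every ASM disk responds normally.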
Generally, this kind of message appears in the ASM alert log in the following situations:
Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,
thus the ASM instance dismounts the diskgroup. By default, it is 15 seconds.
By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
but the heart beat delays do not dismount external redundancy diskgroup directly.
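The description above can be summarized in the following points:
1. The ASM instance periodically checks the disks of every disk group to confirm that they respond normally.
2. Only normal and high redundancy disk groups are dismounted as a result of this check; heartbeat delays do not directly dismount an external redundancy disk group.
3. The default timeout is 15 seconds: if the disks have still not responded to the ASM instance after 15 seconds, the disk group is dismounted. Problems in the storage network can trigger this error. In other words, when a disk does not answer ASM's periodic check within 15 seconds, ASM treats it as inaccessible.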
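The points above can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions and not Oracle's actual PST algorithm: it assumes a PST disk is taken offline once its heartbeat write has waited at least the timeout, and that a strict majority of PST copies must remain readable, which is inferred from the log line "required 2, found 0 disks" for three PST disks. The function names are hypothetical.

```python
def pst_disks_timed_out(io_wait_secs, timeout_secs=15):
    """Return indexes of PST disks whose heartbeat write waited >= timeout.

    Assumption: the comparison is 'greater than or equal to' the timeout.
    """
    return [i for i, waited in enumerate(io_wait_secs) if waited >= timeout_secs]


def has_read_quorum(total_pst_disks, offline_disks):
    """Return (quorum_ok, required, found) under an assumed majority rule."""
    required = total_pst_disks // 2 + 1  # assumed: strict majority of copies
    found = total_pst_disks - len(offline_disks)
    return found >= required, required, found


# All three PST disks waited 15 seconds, as in the alert log above.
waits = [15, 15, 15]

offline = pst_disks_timed_out(waits, timeout_secs=15)
print(offline, has_read_quorum(len(waits), offline))

# The same waits with a 120-second timeout leave every disk online.
offline_120 = pst_disks_timed_out(waits, timeout_secs=120)
print(offline_120, has_read_quorum(len(waits), offline_120))
```

Under these assumptions the 15-second timeout reproduces the "required 2, found 0" situation from the log, while a longer timeout would have ridden out the same IO stall.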
In this case, the 02:10 timestamp above coincides with the scheduled full database backup, so the heavy write load probably slowed IO response.

The parameter behind this check appears only from 11.2.0.3.0 onward; in other words, the ASM instance's disk timeout detection was introduced in 11.2.0.3. The related hidden parameters can be listed with the following query:

set pages 9999;

SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV('Instance')
AND y.inst_id = USERENV('Instance')
AND x.indx = y.indx
AND upper(x.ksppinm) like '%ASM_H%';

The output is as follows:
_asm_hbeatiowait
number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum
quantum used to compute time-to-wait for a PST Hbeat check
If the storage network is not in good condition, the check timeout can be lengthened; in 12.1.0.2 the default is already 120 seconds.
alter system set "_asm_hbeatiowait"=120 scope=spfile;
Restart ASM and continue to monitor.