同事对rac集群生成性能报告时发现rac集群有一个实例没有生成awr快照,另一个实例快照正常。下面是具体处理步骤。
1号实例没有生成awr快照
SQL> select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1;
no rows selected
2号实快照正常
SQL> set long 200
SQL> set linesize 200
SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=2 order by SNAP_ID desc) where rownum < =10;
SNAP_ID END_INTERVAL_TIME INSTANCE_NUMBER
---------- --------------------------------------------------------------------------- ---------------
24405 17-AUG-19 07.00.47.595 PM 2
24404 17-AUG-19 06.00.42.150 PM 2
24403 17-AUG-19 05.00.37.041 PM 2
24402 17-AUG-19 04.00.31.774 PM 2
24401 17-AUG-19 03.00.26.414 PM 2
24400 17-AUG-19 02.00.21.176 PM 2
24399 17-AUG-19 01.00.16.316 PM 2
24398 17-AUG-19 12.00.10.997 PM 2
24397 17-AUG-19 11.00.05.446 AM 2
24396 17-AUG-19 10.00.59.801 AM 2
10 rows selected.
mmon进程与awr快照相关,mmnl与ash相关,如是查看两个实例的mmon与mmnl进程
2号实例
[root@db2 ~]# ps -ef | grep mmon
root 128329 127956 0 18:11 pts/2 00:00:00 grep mmon
oracle 201527 1 0 2018 ? 17:17:11 ora_mmon_RLZY2
[root@db2 ~]# ps -ef | grep mmnl
root 131772 127956 0 18:17 pts/2 00:00:00 grep mmnl
oracle 201531 1 0 2018 ? 1-06:06:24 ora_mmnl_RLZY2
1号实例
[root@db1 ~]# ps -ef | grep mmon
root 239020 238963 0 18:52 pts/2 00:00:00 grep mmon
[root@db1 ~]# ps -ef | grep mmnl
root 239052 238963 0 18:52 pts/2 00:00:00 grep mmnl
可以看到1号实例没有mmon与mmnl进程了。
如是查看1号实例的mmon进程的跟踪文件
[root@db1 trace]# ls -lrt *mmon*.trc
-rw-r----- 1 oracle asmadmin 1351052 Jan 19 2018 RLZY1_mmon_20073.trc
-rw-r----- 1 oracle asmadmin 173031 Jan 22 2018 RLZY1_mmon_49119.trc
[root@db1 trace]# more RLZY1_mmon_49119.trc
Trace file /u01/app/oracle/diag/rdbms/rlzy/RLZY1/trace/RLZY1_mmon_49119.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: db1
Release: 3.8.13-68.3.4.el6uek.x86_64
Version: #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine: x86_64
Instance name: RLZY1
Redo thread mounted by this instance: 1
Oracle process number: 36
Unix process pid: 49119, image: oracle@db1 (MMON)
*** 2018-01-19 13:55:20.030
*** SESSION ID:(1369.1) 2018-01-19 13:55:20.030
*** CLIENT ID:() 2018-01-19 13:55:20.030
*** SERVICE NAME:() 2018-01-19 13:55:20.030
*** MODULE NAME:() 2018-01-19 13:55:20.030
*** ACTION NAME:() 2018-01-19 13:55:20.030
minact-scn slave-status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
*** 2018-01-19 14:00:20.643
minact-scn master-status: grec-scn:0x0e0e.55ad96ef gmin-scn:0x0e0e.55abf256 gcalc-scn:0x0e0e.55ac2a0a
..........
KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751
minact-scn master-status: grec-scn:0x0e0e.5f0ebf8c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
DDE rules only execution for: ORA 12751
*** 2018-01-22 07:06:04.060
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 300 seconds
Time limit violation detected at:
ksedsts()+465< -kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kge
selv()+276<-ksesecl0()+162
<-ksucin()+147<-kcbzwb()+2727<-kcbgtcr()+31325<-ktucloUsMinScn()+539<-ktucloUsegScan()+992<-ksb_run_managed_action()+384<-ksbcti()+2490<-ksbabs()+1735<-kebm_mmon_main()+209<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai
_real()+250<-ssthrdmain()+265
<-main()+201<-__libc_start_main()+253Current Wait Stack: 0: waiting for 'gc buffer busy acquire' file#=0x5, block#=0x278, class#=0x49 wait_id=255378 seq_num=59358 snap_id=1 wait times: snap=5 min 5 sec, exc=5 min 5 sec, total=5 min 5 sec wait times: max=infinite, heur=5 min 5 sec wait counts: calls=358 os=358 in_wait=1 iflags=0x15a2 There is at least one session blocking this session. Dumping 1 direct blocker(s): inst: 1, sid: 990, ser: 1 Dumping final blocker: inst: 1, sid: 990, ser: 1 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: elapsed time of 0.000061 sec since current wait 0: waited for 'gc cr block 2-way' =0x5, =0x258, =0x47 wait_id=255377 seq_num=59357 snap_id=1 wait times: snap=0.000478 sec, exc=0.000478 sec, total=0.000478 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000122 sec of elapsed time 1: waited for 'gc cr block 2-way' =0x5, =0x228, =0x45 wait_id=255376 seq_num=59356 snap_id=1 wait times: snap=0.000741 sec, exc=0.000741 sec, total=0.000741 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000120 sec of elapsed time 2: waited for 'gc cr block 2-way' =0x5, =0x138, =0x43 wait_id=255375 seq_num=59355 snap_id=1 wait times: snap=0.000528 sec, exc=0.000528 sec, total=0.000528 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000111 sec of elapsed time 3: waited for 'gc cr block 2-way' =0x5, =0xb8, =0x41 wait_id=255374 seq_num=59354 snap_id=1 wait times: snap=0.000583 sec, exc=0.000583 sec, total=0.000583 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000139 sec of elapsed time 4: waited for 'gc cr block 2-way' =0x5, =0x110, =0x37 wait_id=255373 seq_num=59353 snap_id=1 wait times: snap=0.000541 sec, exc=0.000541 sec, total=0.000541 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000110 sec of elapsed time 5: waited for 'gc cr block 2-way' =0x5, =0x100, =0x35 wait_id=255372 seq_num=59352 snap_id=1 wait times: snap=0.000629 sec, exc=0.000629 sec, total=0.000629 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000158 sec of elapsed time 6: waited for 'gc cr block 2-way' =0x5, =0xf0, =0x33 wait_id=255371 seq_num=59351 snap_id=1 wait times: snap=0.000617 sec, exc=0.000617 sec, total=0.000617 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000128 sec of elapsed time 7: waited for 'gc cr block 2-way' =0x5, =0xe0, =0x31 wait_id=255370 seq_num=59350 snap_id=1 wait times: snap=0.000561 sec, exc=0.000561 sec, total=0.000561 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000124 sec of elapsed time 8: waited for 'gc cr block 2-way' =0x5, =0xd0, =0x2f wait_id=255369 seq_num=59349 snap_id=1 wait times: snap=0.000565 sec, exc=0.000565 sec, total=0.000565 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000128 sec of elapsed time 9: waited for 'gc cr block 2-way' =0x5, =0xc0, =0x2d wait_id=255368 seq_num=59348 snap_id=1 wait times: snap=0.000555 sec, exc=0.000555 sec, total=0.000555 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000125 sec of elapsed time Sampled Session History of session 1369 serial 1 --------------------------------------------------- The sampled session history is constructed by sampling the target session every 1 second. The sampling process captures at each sample if the session is in a non-idle wait, an idle wait, or not in a wait. If the session is in a non-idle wait then one interval is shown for all the samples the session was in the same non-idle wait. If the session is in an idle wait or not in a wait for consecutive samples then one interval is shown for all the consecutive samples. Though we display these consecutive samples in a single interval the session may NOT be continuously idle or not in a wait (the sampling process does not know). The history is displayed in reverse chronological order. sample interval: 1 sec, max history 120 sec --------------------------------------------------- [121 samples, 07:04:03 - 07:06:03] waited for 'gc buffer busy acquire', seq_num: 59358 p1: 'file#'=0x5 p2: 'block#'=0x278 p3: 'class#'=0x49 time_waited: >= 120 sec (still in wait)
---------------------------------------------------
Sampled Session History Summary:
longest_non_idle_wait: 'gc buffer busy acquire'
[121 samples, 07:04:03 - 07:06:03]
time_waited: >= 120 sec (still in wait)
---------------------------------------------------
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 1 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751
minact-scn master-status: grec-scn:0x0e0e.5f0ec4ce gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
*** 2018-01-22 07:11:11.071
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
ORA12751的错误原因是陈旧的SYS对象统计数据会导致生成次优执行计划,从而使AWR自动刷新从操作的语句运行更长时间和超时。
解决方法就是收集新的SYS对象统计信息,为优化器提供更好的统计信息,并生成更高效的执行计划
SQL> EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
PL/SQL procedure successfully completed.
SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('SYS');
PL/SQL procedure successfully completed.
下面就是重启mmon和mmnl进程
SQL> alter system enable restricted session;
System altered.
SQL> alter system disable restricted session;
System altered.
查看alert日志可以看到mmon和mmnl进程已经重启了
Sat Aug 17 19:18:22 2019
Starting background process MMON
Sat Aug 17 19:18:22 2019
Starting background process MMNL
MMON started with pid=399, OS id=10373
Sat Aug 17 19:18:22 2019
MMNL started with pid=405, OS id=10375
ALTER SYSTEM enable restricted session;
Sat Aug 17 19:18:25 2019
Some DDE async actions failed or were cancelled
Sat Aug 17 19:18:25 2019
Sweep [inc][48021]: completed
Sweep [inc][48011]: completed
Sweep [inc][48002]: completed
Sweep [inc][35010]: completed
Sweep [inc][34706]: completed
Sweep [inc][34242]: completed
Sweep [inc][33546]: completed
Sweep [inc][33394]: completed
Sweep [inc2][48021]: completed
Sweep [inc2][48011]: completed
Sweep [inc2][48002]: completed
Sweep [inc2][35010]: completed
Sweep [inc2][34706]: completed
Sweep [inc2][34242]: completed
Sweep [inc2][33546]: completed
Sweep [inc2][33394]: completed
minact-scn: Inst 1 is a slave inc#:30 mmon proc-id:10373 status:0x2
minact-scn status: grec-scn:0x0e0e.61cb2e9c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
Sat Aug 17 19:18:29 2019
db_recovery_file_dest_size of 10240 MB is 20.87% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Sat Aug 17 19:18:42 2019
ALTER SYSTEM disable restricted session;
再查看1号实例的mmon与mmnl进程状态
[root@db1 ~]# ps -ef | grep mmnl
oracle 10375 1 0 19:18 ? 00:00:00 ora_mmnl_RLZY1
root 10611 238963 0 19:18 pts/2 00:00:00 grep mmnl
[root@db1 ~]# ps -ef | grep mmon
oracle 10373 1 7 19:18 ? 00:00:02 ora_mmon_RLZY1
root 10630 238963 0 19:18 pts/2 00:00:00 grep mmon
过了两个小时去查看1号实例已经生成了两条快照信息
SQL> set long 200
SQL> set linesize 200
SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1 order by SNAP_ID desc) where rownum < =10;
SNAP_ID END_INTERVAL_TIME INSTANCE_NUMBER
---------- --------------------------------------------------------------------------- ---------------
24407 17-AUG-19 09.00.58.595 PM 1
24406 17-AUG-19 08.00.40.244 PM 1
到此问题解决了。