近期,一金融客户oracle 11.2.0.4 rac集群delete不当导致等待事件enq: TM - contention严重引起大范围会话堆积,记录的相关分析工作如下。
1、登录集群任意节点,查看集群全局等待事件
SQL> select event,count(*) from gv$session where wait_class<>'Idle' group by event;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
enq: TM - contention 24
2、查看等待事件在两个节点的分布
节点1:
EVENT COUNT(*)
---------------------------------------------------------------- ----------
SQL*Net message from client 79
rdbms ipc message 64
enq: TM - contention 22
gcs remote message 10
Space Manager: slave idle wait 5
wait for unread message on broadcast channel 4
DIAG idle wait 4
class slave wait 3
Streams AQ: waiting for time management or cleanup tasks 2
Streams AQ: qmn coordinator idle wait 2
PX Deq: Execution Msg 2
VKTM Logical Idle Wait 2
GCR sleep 2
Streams AQ: qmn slave idle wait 2
smon timer 2
pmon timer 2
ges remote message 2
ASM background timer 2
PING 2
PX Deq: Execute Reply 1
节点2:
EVENT COUNT(*)
---------------------------------------------------------------- ----------
SQL*Net message from client 49
rdbms ipc message 32
gcs remote message 5
wait for unread message on broadcast channel 2
class slave wait 2
DIAG idle wait 2
Space Manager: slave idle wait 2
enq: TM - contention 2
smon timer 1
PING 1
Streams AQ: qmn slave idle wait 1
SQL*Net message to client 1
GCR sleep 1
ges remote message 1
VKTM Logical Idle Wait 1
Streams AQ: qmn coordinator idle wait 1
ASM background timer 1
pmon timer 1
Streams AQ: waiting for time management or cleanup tasks 1
19 rows selected.
3、查看引起enq: TM - contention等待事件的会话信息
SID USERNAME SQL_ID MODULE MACHINE PROGRAM
---------- ---------- ------------- ---------- ---------------------------------------------------------------- ------------------------------------------------
363 MW_SYS 74j1zd36h5n96 localhost.localdomain
387 MW_SYS 74j1zd36h5n96 localhost.localdomain
602 MW_SYS 74j1zd36h5n96 localhost.localdomain
626 MW_SYS 74j1zd36h5n96 localhost.localdomain
674 MW_SYS 74j1zd36h5n96 localhost.localdomain
1106 MW_SYS 74j1zd36h5n96 localhost.localdomain
1441 MW_SYS 74j1zd36h5n96 localhost.localdomain
1730 MW_SYS 74j1zd36h5n96 localhost.localdomain
1946 MW_SYS 74j1zd36h5n96 localhost.localdomain
1969 MW_SYS 74j1zd36h5n96 localhost.localdomain
1993 MW_SYS 74j1zd36h5n96 localhost.localdomain
1994 MW_SYS 74j1zd36h5n96 localhost.localdomain
2041 MW_SYS 74j1zd36h5n96 localhost.localdomain
2042 MW_SYS 74j1zd36h5n96 localhost.localdomain
2066 MW_SYS 74j1zd36h5n96 localhost.localdomain
2067 MW_SYS 74j1zd36h5n96 localhost.localdomain
2091 MW_SYS 74j1zd36h5n96 localhost.localdomain
2114 MW_SYS 74j1zd36h5n96 localhost.localdomain
2137 MW_SYS 74j1zd36h5n96 localhost.localdomain
2138 MW_SYS 74j1zd36h5n96 localhost.localdomain
2161 MW_SYS 74j1zd36h5n96 localhost.localdomain
2162 MW_SYS 74j1zd36h5n96 localhost.localdomain
2209 MW_SYS 74j1zd36h5n96 localhost.localdomain
2233 MW_SYS 74j1zd36h5n96 localhost.localdomain
24 rows selected.
4、查看SQL语句74j1zd36h5n96文本信息
SQL_FULLTEXT
--------------------------------------------------------------------------------
delete from mw_sys.mwt_is_user m where m.user_id not in('0D3C0ACD-C95B-42DA-A5DE
-70F019DAB52B','8697D72E-C77B-4C70-A434-ED7D488ADC36');
5、查看SQL语句74j1zd36h5n96的执行计划
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID 74j1zd36h5n96, child number 0
-------------------------------------
delete from mw_sys.mwt_is_user m where m.user_id not
in('0D3C0ACD-C95B-42DA-A5DE-70F019DAB52B','8697D72E-C77B-4C70-A434-ED7D4
88ADC36')
Plan hash value: 3271715000
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | | | 210 (100)| |
| 1 | DELETE | MWT_IS_USER | | | | |
|* 2 | TABLE ACCESS FULL| MWT_IS_USER | 18739 | 951K| 210 (1)| 00:00:03 |
----------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - DEL$1
2 - DEL$1 / M@DEL$1
Outline Data
-------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("M"."USER_ID"<>'0D3C0ACD-C95B-42DA-A5DE-70F019DAB52B' AND
"M"."USER_ID"<>'8697D72E-C77B-4C70-A434-ED7D488ADC36'))
Column Projection Information (identified by operation id):
-----------------------------------------------------------
2 - (cmp=2; cpy=2,3) "M".ROWID[ROWID,10], "M"."USER_ID"[CHARACTER,36],
"M"."USER_NAME"[VARCHAR2,64], "USER_ISSYSTEM"[CHARACTER,1],
"USER_STATUS"[CHARACTER,1]
55 rows selected.
6、查询SQL语句74j1zd36h5n96要保留的数据量
SQL> select count(*) from mw_sys.mwt_is_user where user_id ='0D3C0ACD-C95B-42DA-A5DE-70F019DAB52B';
COUNT(*)
----------
1
SQL>
SQL> select count(*) from mw_sys.mwt_is_user where user_id ='8697D72E-C77B-4C70-A434-ED7D488ADC36';
COUNT(*)
----------
1
7、查看SQL语句74j1zd36h5n96要删除的数据量
SQL> select count(*) from mw_sys.mwt_is_user m where m.user_id not in('0D3C0ACD-C95B-42DA-A5DE
-70F019DAB52B','8697D72E-C77B-4C70-A434-ED7D488ADC36'); 2
COUNT(*)
----------
18740
8、问题
由以上信息可知,24个会话分布在两个节点上通知对同一张表执行相同的delete操作,而且删除的数据量是大范围的,只保留2条记录,
因此导致严重的enq:TM锁。
9、建议
delete操作分批执行,且控制会话间执行的delete语句删除的数据没有交集。