近日遇到一个RAC节点hang导致节点被重启的问题,最后经过分析,发现在系统运行一段时间后,系统内存就会耗尽,原本256G的内存,最后只剩几百M。
1. 问题时间段的TOP输出可以看到,内存只剩7G,而分析内存问题,TOP输出是不够的,一般情况下,Database的SGA和PGA是内存使用大户,所以,在TOP很难发现谁是使用内存最多的。
除非某些进程内存使用的格外明显
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Linux OSWbb v7.3.3
zzz ***Tue Feb 21 00:00:10 CST 2017
top - 00:00:12 up 14:16, 10 users, load average: 2.97, 2.31, 2.05
Tasks: 3087 total, 11 running, 3076 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.7%us, 2.8%sy, 0.0%ni, 83.7%id, 0.9%wa, 0.0%hi, 0.9%si, 0.0%st
Mem: 257948M total, 250464M used, 7484M free, 113M buffers
Swap: 65537M total, 0M used, 65537M free, 59868M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1156 oracle 20 0 4232 568 380 R 101 0.0 0:01.67 gzip
20019 root RT 0 308m 89m 57m S 13 0.0 24:09.96 osysmond.bin
1160 oracle 20 0 11252 3492 836 R 9 0.0 0:00.17 top
49793 oracle 20 0 128g 1.2g 1.2g S 7 0.5 36:00.74 oracle
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2. 通过AWR,可以看到数据库很忙
3. 但是Oracle的物理内存使用百分比只有33%,并不是oracle耗尽的主机内存。
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memory Statistics
Begin End
Host Mem (MB): 257,948.4 257,948.4
SGA use (MB): 77,824.0 77,824.0
PGA use (MB): 8,938.9 6,416.3
% Host Mem used for SGA+PGA: 33.64 32.66
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4. 注意:一般情况下Oracle的全部进程,如smon, pmon,lgwr等,都会分别使用SGA, PGA,以及一小部分内存作为进程本身使用(这部分一般很小)。
所以,这里的33%,可以代表Oracle全部使用的物理内存。
当然,出现一些bug的情况,如LMS异常使用内存等情况,就另当别论了。
参考案例:
RAC: LMS uses huge memory (Doc ID 1954701.1)
RAC LMS processes using huge PGA memory:
SQL> select pid,spid,program,pga_used_mem,pga_alloc_mem,pga_freeable_mem,pga_max_mem from v$process where program like '%LMS%';
PID SPID USER PROGRAM PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM PGA_MAX_MEM
---------- ------------------------ --------------- ------------------------------------------------ ------------
13 23698 oracle@grid06.prod.quova.com (LMS0) 1.0644E+10 1.6525E+10 0 1.6525E+10
14 23702 oracle@grid06.prod.quova.com (LMS1) 1.0644E+10 1.6525E+10 0 1.6525E+10
15 23706 oracle@grid06.prod.quova.com (LMS2) 1.0407E+10 1.6157E+10 0 1.6157E+10
16 23710 oracle@grid06.prod.quova.com (LMS3) 1.0599E+10 1.6455E+10 0 1.6455E+10
其中涉及BUG 16412220 - DLM USES EXCEESIVE PGA SEND MBUFS AND NOT RELEASE BACK TO PGA MEMORY POOL
5. 分析过程中,也确认了一下,PGA曾经使用过的最大内存情况,可以看到PGA最大也就是使用10G,对应256G物理内存来说,很少。不是问题点
select * from dba_hist_pgastat
SNAP_ID DBID INSTANCE_NUMBER NAME VALUE
1054 602741423 1 aggregate PGA target parameter 5.5835E+10
1054 602741423 1 aggregate PGA auto target 4.3041E+10
1054 602741423 1 global memory bound 1073741824
1054 602741423 1 total PGA inuse 8010343424
1054 602741423 1 total PGA allocated 9373099008
1054 602741423 1 maximum PGA allocated 1.0711E+10
1054 602741423 1 total freeable PGA memory 396361728
1054 602741423 1 process count 2232
1054 602741423 1 max processes count 3053
1054 602741423 1 PGA memory freed back to OS 6.3224E+11
1054 602741423 1 maximum PGA used for auto workareas 5028864
1054 602741423 1 maximum PGA used for manual workareas 542720
1054 602741423 1 bytes processed 1036738560
1054 602741423 1 cache hit percentage 100
1054 602741423 1 recompute count (total) 7478
1055 602741423 2 aggregate PGA target parameter 4.8050E+10
1055 602741423 2 aggregate PGA auto target 3.7282E+10
1055 602741423 2 global memory bound 1073741824
1055 602741423 2 total PGA inuse 6643825664
1055 602741423 2 total PGA allocated 7995835392
1055 602741423 2 maximum PGA allocated 9677304832
1055 602741423 2 total freeable PGA memory 420085760
1055 602741423 2 process count 2107
1055 602741423 2 max processes count 2365
1055 602741423 2 PGA memory freed back to OS 8.2417E+11
1055 602741423 2 maximum PGA used for auto workareas 33622016
1055 602741423 2 maximum PGA used for manual workareas 542720
1055 602741423 2 bytes processed 1.3889E+10
1055 602741423 2 cache hit percentage 100
1055 602741423 2 recompute count (total) 8519
Line 384: 997 602741423 2 maximum PGA allocated 1.0699E+10
Line 967: 998 602741423 2 maximum PGA allocated 1.0699E+10 <<<<<<<<<<<<<<<10G
Line 1380: 983 602741423 1 maximum PGA allocated 1.0598E+10
Line 1436: 986 602741423 1 maximum PGA allocated 1.1655E+10
Line 1808: 1056 602741423 1 maximum PGA allocated 1.1055E+10
Line 2029: 997 602741423 1 maximum PGA allocated 1.3501E+10
Line 2350: 1018 602741423 1 maximum PGA allocated 1.0049E+10
Line 2376: 985 602741423 1 maximum PGA allocated 1.1624E+10
6. 最后,查看meminfo,发现了问题,PageTables占用了168G的内存, 加上SGA和PGA的使用,刚刚好250G左右。
PageTables是内存表,是不共享的,在内存很大的情况下,如果很大process访问内存的话,就会每个process都copy一份PageTables,最终导致大量内存自耗的情况
node3_meminfo_17.02.21.0000.dat
zzz ***Tue Feb 21 00:00:10 CST 2017
MemTotal: 264139120 kB ===> 260 GB
MemFree: 7720156 kB ===> 7 GB
Buffers: 116576 kB
Cached: 60954824 kB ===> 60GB (include SGA)
SwapCached: 0 kB
Active: 61768656 kB
Inactive: 12761292 kB
Active(anon): 61284872 kB ===>
Inactive(anon): 11620960 kB
Active(file): 483784 kB ===> 500 MB
Inactive(file): 1140332 kB ===> 1GB
Unevictable: 333944 kB
Mlocked: 223568 kB
SwapTotal: 67110908 kB
SwapFree: 67110780 kB
Dirty: 3764 kB
Writeback: 0 kB
AnonPages: 13793504 kB
Mapped: 58621868 kB
Shmem: 59376696 kB
Slab: 1354844 kB ===> 1 GB
SReclaimable: 351496 kB
SUnreclaim: 1003348 kB
KernelStack: 29248 kB
PageTables: 176260660 kB ===> 168 GB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 199180468 kB
Committed_AS: 88076096 kB
关于PageTables,参考下图
7. 检查之前正常时间段的meminfo,可以发现,刚启动数据库时,PageTables只有700M,但是随着进程的增加,很快PageTables就增长上来了
meminfo_17.02.20.1400.dat
zzz ***Mon Feb 20 14:19:05 CST 2017
MemTotal: 264139120 kB
MemFree: 222005744 kB
Buffers: 112332 kB
SUnreclaim: 258840 kB
KernelStack: 11320 kB
PageTables: 747560 kB <<<<<<<<<<<<<<<
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
Line 1113: PageTables: 752060 kB
Line 1157: PageTables: 769128 kB
Line 1201: PageTables: 769252 kB
Line 1245: PageTables: 758068 kB
Line 1289: PageTables: 2995368 kB <<<<<<<<<<<<<<<<<
Line 1333: PageTables: 4314036 kB
Line 1377: PageTables: 5717752 kB
Line 1421: PageTables: 6107780 kB
Line 1465: PageTables: 6427636 kB
Line 1509: PageTables: 7307184 kB
Line 1553: PageTables: 8552708 kB
Line 1597: PageTables: 9382396 kB
Line 1641: PageTables: 10236492 kB
8. 既然问题找到了,如何解决呢?
Hugepages是解决这种问题的最好方案。
hugepages的内存块是2M(普通内存块是4K),首先内存管理的成本就降低500倍,而且hugepages的内存表是可以共享的。
9. 最终,配置hugepages,解决问题。
相关hugepages文档,请参考另两篇blog
Hugepages你用了吗?----原理概念篇
Hugepages你用了吗?----测试案例篇
题外话,oswatcher是Oracle分析和解决问题,非常有用的一个工具,在很多问题的分析上,都能提供很大的帮助。 所以强烈建议部署。