ASM单实例由Oracle Restart引发的系列故障分析(Final Version)
置顶 aaron8219 2013-08-26 03:35:50 5816 收藏
分类专栏: Oracle ASM
版权
今天重新打开上次安装完的一个ASM单实例环境,突然报错
SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file "+DATA/asmsingle/spfileasmsingle.ora"
ORA-17503: ksfdopn:2 Failed to open file +DATA/asmsingle/spfileasmsingle.ora
ORA-29701: unable to connect to Cluster Synchronization Service
以为是监听问题造成的无法读取asm磁盘组,于是运行netca删除监听和服务名,准备重新创建。
当我删除监听后,准备再重新创建一个时,报了一个错误,如下图:
需要重启Oracle Restart后再配置netca,否则不能注册到Oracle Restart
google了一下,Oracle Restart是11gR2用来管理单实例组件的一个组件,在装完Grid Infrastructure以后自动安装的
但是我的环境现在crsctl命令也不能用,看来是GI环境配置有点问题
在安装GI的时候,执行完root.sh脚本后,通常要执行一下roothas.pl,如:
[root@dg1 ~]# /u01/app/11.2.0/grid/root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl
To configure Grid Infrastructure for a Cluster execute the following command:
/u01/app/11.2.0/grid/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.
[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Improper Oracle Grid Infrastructure configuration found on this host
Deconfigure the existing cluster configuration before starting
to configure a new Grid Infrastructure
run "/u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig"
to configure existing failed configuration and then rerun root.sh
根据执行roothas.pl失败的指示,先用-deconfig参数来删除原来GI的配置:
[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Oracle Restart stack is not active on this node
Restart the SIHA stack (use /u01/app/11.2.0/grid/bin/crsctl start has) and retry
Failed to write the checkpoint:"" with status:FAIL.Error code is 256
Failed to verify HA resources
注意上面红色部分字体,提示Oracle Restart并没有在本节点启动,也无法删除原有GI配置,难道没办法删除了吗?
[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
CLSU-00100: Operating System function: opendir failed with error data: 2
CLSU-00101: Operating System error message: No such file or directory
CLSU-00103: error location: scrsearch1
CLSU-00104: additional error information: cant open scr home dir scls_scr_getval
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
Successfully deconfigured Oracle Restart stack
发现提示不一样了,已经顺利地重新配置了Oracle Restart,然后重新执行roothas.pl来启动Oracle Restart:
[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
LOCAL ADD MODE
Creating OCR keys for user "grid", privgrp "oinstall"..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user "root", privgrp "root"..
Operation successful.
CRS-4664: Node dg1 successfully pinned.
Adding Clusterware entries to inittab
dg1 2013/08/25 23:30:16 /u01/app/11.2.0/grid/cdata/dg1/backup_20130825_233016.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
成功地重新配置完成了一个Stand-Alone,此时crsctl命令已经可以使用了,再执行root.sh中提示的那条命令就会报错了,提示已经配置了CRS:
[root@dg1 ~]# /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
CRS is already configured on this node for crshome=/u01/app/11.2.0/grid
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.
[grid@dg1 ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online
[grid@dg1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
OFFLINE OFFLINE dg1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE dg1
Oracle Restart问题暂时解决了,但是随后又发现一个问题:
[grid@dg1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 152
Available space (kbytes) : 261968
ID : 1711295372
Device/File Name : /u01/app/11.2.0/grid/cdata/localhost/local.ocr
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
按理说OCR磁盘的路径应该在+DATA/asmsingle上的,怎么变成/u01/...这个本地路径了,而且grid用于居然执行不了下面这条命令:
[grid@dg1 ~]$ crsctl query css votedisk
Parse error:
"css" is an invalid argument
此时突然醒悟,怎么这里主机名是dg1呢?明明之前配置过/etc/hosts里的主机名是asm-single的,查看一下:
[root@dg1 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.99 asm-single
用hostname重新改一下:
[root@dg1 ~]# hostname asm-single
[root@dg1 ~]#su - grid
[grid@dg1 ~]# exit
[root@asm-single ~]#
修改完以后切换下用户就显示正确的主机名了,再查看一下GI资源情况:
[grid@asm-single ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
OFFLINE OFFLINE dg1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE dg1
仍然是dg1作为主机名,看来要重新执行roothas.pl再配置一次了
[root@asm-single ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Failed to write the checkpoint:"" with status:FAIL.Error code is 256
Can"t open /etc/oracle/scls_scr/asm-single/grid/ohasdstr for write: No such file or directory at /u01/app/11.2.0/grid/crs/install/s_crsconfig_lib.pm line 1332.
该命令无法执行,提示找不到asm-single路径,查看了一下,确实没有asm-single这一路径:
[root@asm-single ~]# cd /etc/oracle/scls_scr
[root@asm-single scls_scr]# ll
total 4
drwxr-x--- 4 root oinstall 4096 Aug 25 23:29 dg1
[root@asm-single scls_scr]#
既然已经认定主机是dg1,那么就先将错就错一下吧,把主机名改回dg1,直接修改/etc/hosts中的内容
然后用NETCA重新配置一下监听和NET SERVICE NAME,并把监听和数据库实例添加到Oracle Restart配置中:
[grid@dg1 ~]$ srvctl add database -d asmsingle -o /u01/app/oracle/product/11.2.0/dbhome_1
PRCD-1025 : Failed to create database asmsingle
PRKH-1014 : Current user "grid" is not the oracle owner user "oracle" of oracle home "/u01/app/oracle/product/11.2.0/dbhome_1"
应该是grid用户没有操作ORACLE_HOME的权限,用oracle用户试一下:
[oracle@dg1 ~]$ srvctl add database -d asmsingle -o /u01/app/oracle/product/11.2.0/dbhome_1
没有报错,此时再用grid用户查看一下Oracle Restart资源配置情况:
[grid@dg1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE dg1
ora.ons
OFFLINE OFFLINE dg1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asmsingle.db
1 OFFLINE OFFLINE
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE dg1
可以看到,数据库实例已经添加进去了,同时监听也自动进来了,但是ASM和磁盘组都还没有,试试启动CRS
[grid@dg1 ~]$ crsctl start crs
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Start failed, or completed with errors.
这是RAC中使用的命令,这里单实例ASM不能用它来启动全部资源,查看一下数据库的配置:
[grid@dg1 ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups:
Services:
[grid@dg1 ~]$
此处可以看到,与ASM有关的都没有相应内容,先启动ASM实例,配置磁盘组DG
[grid@dg1 ~]$ export ORACLE_SID=+ASM
[grid@dg1 ~]$ sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:27:04 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
启动错误,无法连接CSS,查看一下CSS进程:
SQL> !
[grid@dg1 ~]$ crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
[grid@dg1 ~]$ ps -ef|grep cssd
grid 13704 13680 0 01:28 pts/5 00:00:00 grep cssd
[grid@dg1 ~]$ ps -ef|grep has
root 11620 1 0 Aug25 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 11652 1 0 Aug25 ? 00:00:21 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 13706 13680 0 01:28 pts/5 00:00:00 grep has
[grid@dg1 ~]$ ps -ef|grep d.bin
grid 11652 1 0 Aug25 ? 00:00:21 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 11834 1 0 Aug25 ? 00:00:06 /u01/app/11.2.0/grid/bin/oraagent.bin
grid 11850 1 0 Aug25 ? 00:00:00 /u01/app/11.2.0/grid/bin/evmd.bin
grid 11887 11850 0 Aug25 ? 00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
grid 13156 1 0 00:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 13714 13680 0 01:28 pts/5 00:00:00 grep d.bin
[grid@dg1 ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online
has进程正常启动的,但是没有找到cssd的demon进程
[grid@dg1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
这里发现cssd和diskmon都是OFFLINE的,这2个服务是依赖于HAS维护的,分别查看这2个进程的参数:
[grid@dg1 ~]$ crs_stat -p ora.cssd
NAME=ora.cssd
TYPE=ora.cssd.type
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AUTO_START=never
CHECK_INTERVAL=30
DESCRIPTION="Resource type for CSSD"
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=600
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
[grid@dg1 ~]$ crs_stat -p ora.diskmon
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AUTO_START=never
CHECK_INTERVAL=3
DESCRIPTION="Resource type for Diskmon"
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
START_TIMEOUT=600
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
发现他们并不会随HAS启动而自己启动,于是手动启动它们:
[grid@dg1 ~]$ crsctl start res ora.cssd
CRS-2672: Attempting to start "ora.cssd" on "dg1"
CRS-2672: Attempting to start "ora.diskmon" on "dg1"
CRS-2676: Start of "ora.diskmon" on "dg1" succeeded
CRS-2676: Start of "ora.cssd" on "dg1" succeeded
这里发现启动了cssd后,diskmon也被启动了,其实他们是被绑在一起的2个服务,随便先启动哪个,另一个也会跟着启动,查看一下HAS资源情况
[grid@dg1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
但事实好像有点出入,尽管现实diskmon也是succeeded,但这里仍然是OFFLINE状态,就算单独再启动一次diskmon,也是一样
[grid@dg1 ~]$ crsctl start res ora.diskmon
CRS-2672: Attempting to start "ora.diskmon" on "dg1"
CRS-2676: Start of "ora.diskmon" on "dg1" succeeded
[grid@dg1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
先不管diskmon服务了,用grid启动ASM实例:
[grid@dg1 ~]$ sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:46:13 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ORA-01031: insufficient privileges
SQL> exit
Disconnected
注意:这里必须要用sysasm来连接ASM实例,在管理ASM时,他的权限比sysdba还要大
[grid@dg1 ~]$ export ORACLE_SID=+ASM
[grid@dg1 ~]$ sqlplus "/as sysasm"
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:46:57 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ORA-15110: no diskgroups mounted
先查看一下磁盘组,那些磁盘没有挂载,这里并没有显示:
SQL> select name,state,type from v$asm_diskgroup;
no rows selected
因为环境是自己搭建的,所以知道有2个磁盘组,分别是DATA和FRA,直接进行挂载:
SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
SQL> alter diskgroup FRA mount;
alter diskgroup FRA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "FRA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "FRA"
DATA和FRA是之前为单实例创建的2个磁盘组,用的是EXTERNAL模式,DATA给了4个盘,FRA给了3个盘,每个盘都是3G
现在挂载失败,提示发现数量不足,EXTERNAL模式应该没理由盘不够啊,查看一下asm_diskstring参数:
SQL> show parameter string
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string
原来asm_diskstring参数值为空,难怪说磁盘数量不足,原来是找不到磁盘,一个盘都没有,数量当然不足了
SQL> alter system set asm_diskstring="/dev/asm*";
System altered.
SQL> show parameter string
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string /dev/asm*
SQL> alter system set asm_diskstring="/dev/asm*" scope=both;
alter system set asm_diskstring="/dev/asm*" scope=both
*
ERROR at line 1:
ORA-32001: write to SPFILE requested but no SPFILE is in use
这里本来是想把变更加到spfile里去的,但是这里并不是用spfile启动的,所以报错了,先不管,再试试挂载一下磁盘组
SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL> alter diskgroup FRA mount;
Diskgroup altered.
此处其实还有另外一条命令可以用:
SQL> alter system set asm_diskgroups=data,fra;(用此法加入磁盘组后是否直接是挂载状态还有待测试)
再查看一下数据库的配置情况:
[grid@dg1 ~]$ srvctl config database -t asmsingle
PRKO-2002 : Invalid command line option: -t
[grid@dg1 ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups:
Services:
磁盘组依旧没有添加进内容,查看参数asm_diskgroup已经有这2个磁盘组了
SQL> show parameter asm_diskgroup
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string DATA, FRA
SQL> select name,state,type from v$asm_diskgroup;
NAME STATE TYPE
------------------------------ ----------- ------
DATA MOUNTED EXTERN
FRA MOUNTED EXTERN
SQL> !
[grid@dg1 ~]$ asmcmd
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 12288 10350 0 10350 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 9216 8984 0 8984 0 N FRA/
磁盘组已经正常挂载了,由于是用pfile启动的,想把结果保存到spfile中去:
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string
SQL> create spfile from pfile
2 ;
create spfile from pfile
*
ERROR at line 1:
ORA-29786: SIHA attribute GET failed with error [Attribute "SPFILE" sts[200]
lsts[0]]
可能是由于ASM资源并没有注册到HAS
[grid@dg1 ~]$ srvctl config asm
PRCR-1001 : Resource ora.asm does not exist
[grid@dg1 ~]$ srvctl add asm
[grid@dg1 ~]$ srvctl config asm
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile:
ASM diskgroup discovery string: ++no-value-at-resource-creation--never-updated-through-ASM++
这里不知道为何没有值,按理说应该已经能够识别到asm_diskstring了
ASM的alert日志在路径/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log
查看了一下,有如下报错信息:
Mon Aug 26 02:25:32 2013
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_rbal_14046.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute "ASM_DISKSTRING" sts[200] lsts[0]]
[grid@dg1 ~]$ srvctl status asm
ASM is not running.
[grid@dg1 ~]$ srvctl start asm
[grid@dg1 ~]$ srvctl status asm
ASM is running on dg1
[grid@dg1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora.asm ora.asm.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
SQL> create spfile from pfile;
create spfile from pfile
*
ERROR at line 1:
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file
"/u01/app/11.2.0/grid/dbs/init+ASM.ora"
查看了一下路径,确实没有init+ASM.ora这个pfile,所以用pfile来创建spfile这个方法看来不行了
SQL> create spfile from memory;
File created.
但是可以从内存创建spfile,但是要注意,这条命令执行以后,会在$GRID_HOME/dbs创建spfile,全名是”spfile+ASM.ora“
对于单实例而言,这个问题还不大,如果是RAC环境,则必须要在spfile文件名后面指定在ASM中的路径,如:”+DATA/asmsingle/spfile+ASM.ora“
那么这里,我再重新创建一个spfile,来修改默认的spfile存放路径,这里涉及到GPNP profile内容的更新问题,原则是,会按照最新的保存位置来更新spfile
SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /u01/app/11.2.0/grid/dbs/spfil
e+ASM.ora
我们可以来看一下ASM中文件的存放路径:
ASMCMD> ls
DATA/
FRA/
ASMCMD> cd data
ASMCMD> ls
ASM/
ASMSINGLE/
ASMCMD> cd asmsingle
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileasmsingle.ora
ASMCMD> cd parameterfile
ASMCMD> ls
spfile.266.824164223
注意,如果是RAC环境,刚才的创建spfile的语句就要改成:SQL> create spfile="+DATA/asmsingle/" from memory;
否则会造成RAC中其他节点无法访问spfile而破坏RAC环境,这个spfile.266.824164223的文件是系统自动生成的,具体作用还未研究过
注意这里RAC用的spfile是上面那个spfileasmsingle.ora,这点千万不要搞错了
做完以上全部操作以后,再启动数据库:
[oracle@dg1 ~]$ sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 03:26:14 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 313159680 bytes
Fixed Size 2227944 bytes
Variable Size 117440792 bytes
Database Buffers 188743680 bytes
Redo Buffers 4747264 bytes
Database mounted.
Database opened.
[grid@dg1 ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups: DATA,FRA
Services:
[grid@dg1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE dg1
ora.FRA.dg ora....up.type ONLINE ONLINE dg1
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora.asm ora.asm.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type ONLINE ONLINE dg1
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
OK,故障解决
PS:关于本测试还有2个疑问
测试重新在ASM上创建一个spfile,看默认是不是用这个spfile启动,即GPNP profile有没有生效(已解决)
ASMCMD> spget
/u01/app/11.2.0/grid/dbs/spfile+ASM.ora
ASMCMD> pwd
+data/asmsingle --注意这里是数据库实例,是我搞错了,应该进入的路径是+data/asm
ASMCMD> ls -l
Type Redund Striped Time Sys Name
Y CONTROLFILE/
Y DATAFILE/
Y ONLINELOG/
Y PARAMETERFILE/
Y TEMPFILE/
N spfileasmsingle.ora => +DATA/ASMSINGLE/PARAMETERFILE/spfile.266.824164223
spfileasmsingle.ora->spfile.266.824164223(OMF格式)是一个映射关系,都是数据库实例的spfile
ASMCMD> spset +DATA/asmsingle/spfileasmsingle.ora --之前错把数据库实例的spfile当做了asm实例的spfile,这里spset了一个错误的路径
ASMCMD> spget
+DATA/asmsingle/spfileasmsingle.ora
ASMCMD> exit
[grid@dg1 ~]$ srvctl config asm -a
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asmsingle/spfileasmsingle.ora
ASM diskgroup discovery string: ++no-value-at-resource-creation--never-updated-through-ASM++
ASM is enabled.
(注:上面的spfile其实是数据库实例的,而并不是asm实例的,所以这里ASM diskgroup discovery string提示没有值)
SQL> show parameter diskgroup
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string DATA, FRA
SQL> show parameter diskstring
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string /dev/asm* --其实是有diskgroup string的
用spset设置spfile新的路径后是即时生效的,但前提是设置的这个spfile必须本来就是存在的,否则,只有通过下面的办法先创建一个ASM实例的spfile:
SQL>create spfile="+DATA/asm/asmparameterfile/asmspfile.ora" from pfile="$ORACLE_HOME/dbs/init+ASM.ora";
SQL> create spfile="+DATA/asm/asmparameterfile/asmspfile.ora" from pfile; --此处可以省略pfile的路径,默认就是去dbs路径下去找init+ASM.ora文件的
File created.
SQL> !
[grid@dg1 ~]$ asmcmd
ASMCMD> cd data/asm
ASMCMD> ls
ASMPARAMETERFILE/
ASMCMD> cd asmparameterfile
ASMCMD> ls
REGISTRY.253.824517131
asmspfile.ora
ASMCMD> spget
+DATA/asm/asmparameterfile/asmspfile.ora
可以看到,创建完后会直接更新GPNP profile,现在ASM实例的spfile已经是新指定的+DATA路径了
这里REGISTRY.253.824517131是系统自动生成的ASM实例的spfile文件,是OMF格式的,下面的asmspfile.ora是刚才那条命令创建的
此时如果以不带参数文件的startup启动ASM实例,则使用的是+DATA这一ASM磁盘上的spfile,而不再是之前本地磁盘上的spfile了
SQL> startup force
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +DATA/asm/asmparameterfile/asm
spfile.ora
注意:这里已经用了刚才新创建的spfile指定的路径了,之前的理论得到验证
ASMCMD> cd +data/asm/asmparameterfile
ASMCMD> pwd
+data/asm/asmparameterfile
ASMCMD> ls -l
Type Redund Striped Time Sys Name
ASMPARAMETERFILE UNPROT COARSE AUG 27 00:00:00 Y REGISTRY.253.824517131
N asmspfile.ora => +DATA/ASM/ASMPARAMETERFILE/REGISTRY.253.824517131
当我们在ASM磁盘组上创建spfile后,不仅是更新GPNP profile后直接生效,还会把新建的spfile和系统原有的spifle做一个映射,asmspfile.ora->REGISTRY.253.824517131的映射,这个和数据库实例的spifle是类似的
最后在来验证一下:
ASMCMD> exit
[grid@dg1 ~]$ srvctl config asm -a
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asm/asmparameterfile/asmspfile.ora
ASM diskgroup discovery string: /dev/asm*
ASM is enabled.
这时候发现ASM diskgroup discovery string里的值已经可以正确识别了,因为配置了正确的ASM实例的spfile
为何状态都正常的情况下,ora.diskmon还是OFFLINE状态(已解决)
关于ora.diskmon这个resource是专门为EXADATA环境准备的,对于11.2.0.3的非EXADATA环境,默认是被DISABLE的,我也是查了一篇帖子才知道,
它和CSS有依赖关系,虽然CSS启动的时候diskmon也显示succeeded,但是默认不会启动,让我困惑了好一会
这个是关于ora.diskmon在非EXADATA环境会DISABLE的链接:http://blog.csdn.net/mrluoe/article/details/7525457
后记:当从pfile创建spfile后,启动asm实例时,又引发了另一个错误:ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
为ASM指定的参数过期或废弃了
SQL> set line 200 pages 999
SQL> col name for a30
SQL> col value for a50
SQL> select t.NAME,t.value from v$parameter t where t.ISDEPRECATED="TRUE";
NAME VALUE
------------------------------ --------------------------------------------------
lock_name_space
remote_os_authent FALSE
background_dump_dest /u01/app/grid/diag/asm/+asm/+ASM/trace
user_dump_dest /u01/app/grid/diag/asm/+asm/+ASM/trace
sql_trace FALSE
发现有5个过期参数,asm实例的报警日志里也记录了2个参数,内容如下:
Deprecated system parameters with specified values:
background_dump_dest
user_dump_dest
End of deprecated system parameter listing
都共同指向了background_dump_dest和user_dump_dest,那我们只要把这两个参数注释掉,就可以解决问题了
先修改pfile,把其中的关于这两个参数的行注释掉,然后重新用pfile再创建一次spfile,再用spfile启动:
SQL> startup force pfile="/u01/app/11.2.0/grid/dbs/init+ASM.ora" --用注释掉那2个参数的本地pfile启动
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> create spfile="+DATA/asm/asmparameterfile/asmspfile.ora" from pfile; --重新用修改好的pfile创建一次spfile
File created.
SQL> startup force --重新启动后,不报ORA-32004错误了
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
至此,一路碰到的ASM单实例故障全部解决,完!
————————————————
版权声明:本文为CSDN博主「aaron8219」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/aaron8219/java/article/details/10313397