0%

记录一次oracle rac数据库启动故障修复过程

周末测试服务器意外断电关机了,早上启动起来机器,发现数据库起不来了,以下记录一下排查过程。
经过查看sx21和orcl两个数据库都是“Instance Shutdown”状态。

查看rac启动状态

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
[grid@rac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.OCR.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.VOTINGDISK.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.cvu
1 ONLINE ONLINE rac1
ora.oc4j
1 ONLINE ONLINE rac2
ora.orcl.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE OFFLINE Instance Shutdown
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac1
ora.sx21.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE OFFLINE Instance Shutdown

手动启动数据库

手动启动sx21数据库报错

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[grid@rac1 ~]$ srvctl start database -d sx21
PRCR-1079 : Failed to start resource ora.sx21.db
CRS-5017: The resource action "ora.sx21.db start" encountered the following error:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac1/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-5017: The resource action "ora.sx21.db start" encountered the following error:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/u01/app/oracle/11g/dbs/E:APPADMINISTRATORORADATASX21ALK.DBF'
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac2/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.sx21.db' on 'rac1' failed
CRS-2674: Start of 'ora.sx21.db' on 'rac2' failed
CRS-2632: There are no more servers to try to place resource 'ora.sx21.db' on that would satisfy its placement policy

手动启动orcl数据库报错

1
2
3
4
5
6
7
8
9
10
[grid@rac1 ~]$ srvctl start database -d orcl
PRCC-1014 : orcl was already running
PRCR-1004 : Resource ora.orcl.db is already running
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac2/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.orcl.db' on 'rac2' failed
CRS-2528: Unable to place an instance of 'ora.orcl.db' as all possible servers are occupied by the resource

恢复数据库sx21

另外开一个窗口,连接sx21数据库;恢复数据库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
[root@rac1 ~]# su - oracle
[oracle@rac1 ~]$ export ORACLE_SID=sx211
[oracle@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Mon Apr 2 09:43:52 2018

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 2471931904 bytes
Fixed Size 2255752 bytes
Variable Size 687867000 bytes
Database Buffers 1761607680 bytes
Redo Buffers 20201472 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [],
[], [], [], [], [], [], [], []


SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery


SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

恢复数据库orcl

另外orcl实例,情况不太一样;直接恢复的话,会报错;需要我们手动指定恢复文件。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
[oracle@rac1 ~]$ export ORACLE_SID=orcl11
[oracle@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Mon Apr 2 09:51:24 2018

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 2471931904 bytes
Fixed Size 2255752 bytes
Variable Size 687867000 bytes
Database Buffers 1761607680 bytes
Redo Buffers 20201472 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[115], [49383], [49388], [], [], [], [], [], [], []

-- 查看数据库模式
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED
--此时数据库启动到mount状态,尝试以resetlogs方式open数据库,也无法打开
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery


SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krr_read_5], [115], [49383], [],
[], [], [], [], [], [], [], []
--查看当前日志文件情况
SQL> select v1.group#, member, sequence#, first_change# from v$log v1, v$logfile v2 where v1.group# = v2.group#;

GROUP#
----------
MEMBER
--------------------------------------------------------------------------------
SEQUENCE# FIRST_CHANGE#
---------- -------------
1
+DATA/orcl/onlinelog/group_1.261.970078827
115 4674751

2
+DATA/orcl/onlinelog/group_2.262.970078829
114 4622551

GROUP#
----------
MEMBER
--------------------------------------------------------------------------------
SEQUENCE# FIRST_CHANGE#
---------- -------------

3
+DATA/orcl/onlinelog/group_3.266.970078993
65 4646182

4
+DATA/orcl/onlinelog/group_4.267.970078995

GROUP#
----------
MEMBER
--------------------------------------------------------------------------------
SEQUENCE# FIRST_CHANGE#
---------- -------------
66 4688519

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 4674751 generated at 04/01/2018 15:33:57 needed for thread 1
ORA-00289: suggestion : /u01/app/oracle/11g/dbs/arch1_115_970078828.dbf
ORA-00280: change 4674751 for thread 1 is in sequence #115
-- 记录住这里提示的change号和sqeuence号,去上边记录中查找文件名

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA/orcl/onlinelog/group_1.261.970078827
-- 输入文件路径名称,进行恢复
ORA-00279: change 4674751 generated at 04/01/2018 10:23:59 needed for thread 2
ORA-00289: suggestion : /u01/app/oracle/11g/dbs/arch2_65_970078828.dbf
ORA-00280: change 4674751 for thread 2 is in sequence #65

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA/orcl/onlinelog/group_3.266.970078993
ORA-00279: change 4688519 generated at 04/01/2018 18:01:42 needed for thread 2
ORA-00289: suggestion : /u01/app/oracle/11g/dbs/arch2_66_970078828.dbf
ORA-00280: change 4688519 for thread 2 is in sequence #66
ORA-00278: log file '+DATA/orcl/onlinelog/group_3.266.970078993' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA/orcl/onlinelog/group_4.267.970078995
Log applied.
Media recovery complete.
--恢复完成后,再次可以正常打开数据库,重启数据库正常。
SQL> alter database open resetlogs;

Database altered.

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

参考链接:
https://www.cnblogs.com/jionjionyou/p/5602019.html
https://blog.csdn.net/linucle/article/details/8778046