Problem:
For one of our clients, our backup strategy is to take an RMAN hot backup
every night of all the EBS environments, which run Oracle Database 11.2.0.3.
Last night, however, the RMAN hot backup of our test environment
failed, or rather was killed, with the following error messages:
RMAN-03009: failure of backup command on dsk_9 channel at 03/27/2014 03:19:39
ORA-00028: your session has been killed
ORA-00028: your session has been killed
continuing other job steps, job failed will not be re-run
At the same time, we found the following error message in the
alert log:
Thu Mar 27 03:18:11 2014
Errors in file /mnt/TSTapps/oracle/trctstdb/11.2.0/dbhome_1/log/diag/rdbms/trctst/TRCTST/trace/TRCTST_ora_70736.trc (incident=234835):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 66183'
Incident details in: /mnt/TSTapps/oracle/trctstdb/11.2.0/dbhome_1/log/diag/rdbms/trctst/TRCTST/incident/incdir_234835/TRCTST_ora_70736_i234835.trc
Cause:
This behavior can be caused by high load on the server, high concurrency
on resources, or I/O waits, any of which can keep the Oracle background
processes from receiving the resources they need.
We checked with the sar -u 2 5 command and found high I/O waits on our
server. This is probably why the kill blocker interface killed our RMAN
hot backup process.
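The sar check above can be scripted so that the I/O wait figure is easy to log or compare. This is a minimal sketch, assuming the Linux sysstat output format, where the column is labelled %iowait and the last row starts with "Average:"; other platforms label the column differently. The function name avg_iowait is hypothetical.

```shell
# Print the average %iowait from `sar -u` output read on stdin.
# Assumption: Linux sysstat layout (a header row containing "%iowait"
# and a final summary row starting with "Average:").
avg_iowait() {
  awk '
    # Header row: remember how far %iowait sits from the last column,
    # because the header may carry an AM/PM field that the summary row lacks.
    /%iowait/ {
      for (i = 1; i <= NF; i++) if ($i == "%iowait") { off = NF - i; seen = 1 }
    }
    # Summary row printed after the last sample.
    /^Average:/ && seen { print $(NF - off) }
  '
}

# Usage: sar -u 2 5 | avg_iowait
```

Counting the column from the right keeps the function working with both 12-hour and 24-hour timestamp layouts.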
Solutions:
The kill blocker interface (ORA-494) was introduced in 10.2.0.4.
This mechanism will kill *any* kind of blocking process, background or
non-background. If the blocking process is a background process, the
instance crashes too.
This situation can be avoided in two ways:
1. To prevent any blocker (background or non-background process)
from being killed, set:
- _kill_controlfile_enqueue_blocker=false
With this setting, no type of blocker will be killed anymore.
2. To prevent a background blocker from being killed, set the
following init.ora parameter to 1 (the default is 3):
- _kill_enqueue_blocker=1
With this setting, an enqueue holder that is a background process
will not be killed, so the instance will not crash.
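If the instance runs from an spfile rather than an init.ora text file, the two options above would be set with ALTER SYSTEM. The sketch below only prints the statements so they can be reviewed first. It assumes an spfile is in use and that the change is meant to take effect at the next restart (SCOPE=SPFILE); the helper name hidden_param_sql is hypothetical. Hidden (underscore) parameters are normally changed only on the advice of Oracle Support.

```shell
# Emit an ALTER SYSTEM statement for a hidden (underscore) parameter.
# Hidden parameter names must be enclosed in double quotes in ALTER SYSTEM.
# Assumption: the instance uses an spfile; SCOPE=SPFILE defers the change
# until the next instance restart.
hidden_param_sql() {
  printf 'ALTER SYSTEM SET "%s"=%s SCOPE=SPFILE;\n' "$1" "$2"
}

# Option 1: never kill any CF enqueue blocker.
#   hidden_param_sql _kill_controlfile_enqueue_blocker FALSE
# Option 2: do not kill a blocker that is a background process.
#   hidden_param_sql _kill_enqueue_blocker 1
```

The printed statement can then be pasted into a SQL*Plus session connected as SYSDBA.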
NOTE: A common root cause of ORA-494 is very frequent log switches, which can be seen in the alert.log. In those cases the enqueue is normally held by the CKPT process.
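To judge whether log switches are unusually frequent, the alert log can be summarized per hour. This is a rough sketch, assuming the text alert log uses timestamp lines in the format shown earlier (Thu Mar 27 03:18:11 2014) and records each switch with a line containing "advanced to log sequence"; the function name log_switches_per_hour is hypothetical.

```shell
# Count redo log switches per hour in a text alert log.
# Reads the files given as arguments, or stdin when none are given.
log_switches_per_hour() {
  awk '
    # Timestamp lines look like: Thu Mar 27 03:18:11 2014
    NF == 5 && $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      hour = $2 " " $3 " " $5 " " substr($4, 1, 2) ":00"
    }
    # Assumed switch message: "Thread 1 advanced to log sequence N (LGWR switch)"
    /advanced to log sequence/ { count[hour]++ }
    END { for (h in count) print count[h], h }
  ' "$@" | sort -rn
}

# Usage: log_switches_per_hour alert_TRCTST.log | head
```

The busiest hours are listed first, so a burst of switches around the time of the ORA-494 stands out at the top.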
References:
ORA-00494: Enqueue [CF] Held For Too Long Causing Database To Crash [ID 1101862.1]
PHYSICAL: ORA-00494: enqueue [CF] held for too long after Node Crash [ID 747071.1]
ORA-00494 Or ORA-600 [2103] During High Load After 10.2.0.4 Upgrade [ID 779552.1]
Database Crashes With ORA-00494 [ID 753290.1]