“Hello, Mr. Li!”

“Xiao Zhang, look at the ERP database quickly, the application can’t be opened again!”

“Yes, right away.”

Xiao zhang took out the computer from the black backpack, connected to the mobile phone hot spot began to check, just connected to the database, the phone rang again…..

This scenario is all too familiar to Oracle DBAs: whenever an application has a problem, anywhere, it is always the first to receive a call, and in severe cases it can lead to a barrage of calls.

Such problems between novice and expert, first is the mentality, the novice encounter problems lie in the heart, I do not know where laid a hand on him, timid carelessness, because experts experience, often calm, strategizing, cobwebs, calibrated boldness, but experience it is to eat salt, through the bridge, must do more, learn more than person. And another very important is the diagnostic ideas and auxiliary scripts, this article describes the common processing ideas under various scenarios, share some of the scripts used to help you quickly locate problems and solve, reduce business interruption events, become an expert as soon as possible, promotion and pay rise, marry…

>>>> View the operating system load

After boarding the database server, the first is to confirm whether the CPU, memory, I/O is abnormal through the system command, each system command is not the same, the common have top, topas, vmstat, iostat.

>>>> to view wait events

The second step is to connect to the database view events of waiting, this is the most basic monitoring, inspection, diagnosis database, usually 81% of the problems can all be waiting for the event as a reason, it is the most direct embodiment of the database operation, the following script is to look at each waiting for the number of events, the waiting time, in addition to some of the common side by side IDLE wait event.

Wait_event col event for a45 SELECT inst_id, event, SUM(DECODE(WAIT_TIME, 0, 0, 1)) "Prev", SUM(DECODE(WAIT_TIME, 0, 0, 1)) "Prev" 0, 1, 0)) "Curr", COUNT(*) "Tot" , sum(SECONDS_IN_WAIT) SECONDS_IN_WAIT FROM GV$SESSION_WAIT WHERE event NOT IN ('smon timer','pmon timer','rdbms ipc message','SQL*Net message from client','gcs remote message') AND event NOT LIKE '%idle%' AND event NOT LIKE '%Idle%' AND Event NOT LIKE '%Streams AQ%' GROUP BY inst_id, event ORDER BY 1,5 desc;

Here, we need to understand the reasons for some common abnormal waiting events and form a conditioned reflex. Library cache lock, read by other session, row cache lock, buffer busy file, latch:shared pool, gc buffer busy file, cursor: datafile. Pin S on X, Direct Path Read, Log File Sync, ENQ: TX-Index Contention, PX Deq Credit: Send BLKD, Latch Free, ENQ: TX-Row lock contention, and so on. If the number of events you can see on hold or on hold for a long time, that’s where you can see the main reason for contention.

>>>> Search sessions based on wait events

After we get an exception wait event, we look up the session details based on the wait event, which session execute which SQL is waiting, and we also look up the user name and machine name, and whether it is blocked. In addition, the following script can be written to look up sessions by user, look up sessions by SQL\_ID, and so on.

Session_by_event SELECT /*+rule */ sid, s.serial#, spid, event, sql_id, seconds_in_wait ws, row_wait_obj# obj, s.username, s.machine, BLOCKING_INSTANCE||'.'||blocking_session b_sess FROM v$session s, v$process p WHERE event='&event_name' AND s.paddr = p.addr order by 6;

>>>> to find details of a session

Once you have the list of sessions, you can query the details of a session based on the following SQL, such as SQL\_ID of the last execution, login time, etc., or this SQL can be written to multiple sessions.

Session_by_sid SELECT s.sid, s.serial#, spid, event, sql_id, PREV_SQL_ID, seconds_in_wait ws, row_wait_obj# obj, seconds_in_wait s.username, s.machine, module,blocking_session b_sess,logon_time FROM v$session s, v$process p WHERE sid = '&sid' AND s.paddr = p.addr;

>>>> to query object information

You can see the object ID that the session is waiting for from both of the previous SQL statements, and you can query the object details with the following SQL.

Obj_info col OBJECT_NAME for a30 select owner, OBJECT_NAME,subobject_name,object_type from dba_objects where object = 1 object_id=&oid;

>>>> query SQL statement by SQL\_ID, HASH\_VALUE If not found in V $SQLAREA, you can try the DBA\_HIST\_SQLTEXT view.

Sql_text select sql_id,SQL_fullTEXT from v$sqlarea where (sql_id=' &SQLID 'or hash_value=to_number('&hashvale')) and rownum<2;

SQL statement execution plan, object statistics, performance diagnosis, trace SQL, etc. I will not expand here, later planned a similar series, please pay attention to.

>>>> Query session blocking

Query how many sessions are blocked by a particular session with the following SQL.

Select count(*),blocking_session from v$session where blocking_session is not null group by blocking_session;

>>>> to query the database locks

If the ctime is greater than 100 seconds, 30% of the time is due to manipulation of the lock table, causing the application SQL to block and not be able to run.

Mysql > select username for a15 col owner for a15 col OBJECT_NAME for a30 col SPID for a10 select /*+rule*/SESSION_ID,OBJECT_ID,ORACLE_USERNAME,OS_USER_NAME,PROCESS, LOCKED_MODE from gv$locked_object where session_id=&sid; Select /*+rule*/* from v$lock where ctime >100 and type in ('TX','TM') order by 3,9; , query the database select lock / * + * / s.s rule id, p. pid, l.t ype, round (Max (l.c time) / 60, 0) lock_min,s.sql_id,s.USERNAME,b.owner,b.object_type,b.object_name from v$session s, v$process p,v$lock l,v$locked_object o,dba_objects b where o.SESSION_ID=s.sid and s.sid=l.sid and o.OBJECT_ID=b.OBJECT_ID and s.paddr = p.addr and l.ctime >100 and l.type in ('TX','TM','FB') group by S.s id, p. pid, l.t ype, s.s ql_id, s.U SERNAME, b.o wner, b.o bject_type, b.o bject_name order by 9,1,3

>>>> Keep evidence at the scene

The following script is the systemstate dump and hanganalyze steps. If SQLPLUS fails to log in, you can add the -prelim parameter.

--systemstate dump
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit;
oradebug dump systemstate 266;
--wait for 1 min
oradebug dump systemstate 266;
--wait for 1 min
oradebug dump systemstate 266;
oradebug tracefile_name;

 --hanganalyze
oradebug setmypid
oradebug unlimit;
oradebug dump hanganalyze 3
 --wait for 1 min
oradebug dump hanganalyze 3
--wait for 1 min
oradebug dump hanganalyze 3
oradebug tracefile_name

> > > > kill session

Usually, in order to quickly restore the business after the initial problem, you need to kill some sessions, especially batch kill sessions, and sometimes directly kill all the processes with LOCAL=NO. When you kill a session, you must check and confirm it, but not execute it on other nodes or other servers.

- the ink head sheave kill_sess set line 199 col event format a35 - kill a SID session SELECT / * + * / rule SID, s.s erial#, 'kill 9' | | spid, event, blocking_session b_sess FROM v$session s, v$process p WHERE sid='&sid' AND s.paddr = p.addr order by 1; - according to SELECT SQL_ID kill session / * + * / rule sid, s.s erial#, 'kill 9' | | spid, event, blocking_session b_sess FROM v $session s, v$process p WHERE sql_id='&sql_id' AND s.paddr = p.addr order by 1; - according to wait for events to kill session SELECT / * + * / rule sid, s.s erial#, 'kill 9' | | spid, event, blocking_session b_sess FROM v $session s, v$process p WHERE event='&event' AND s.paddr = p.addr order by 1; - according to user's kill session SELECT / * + * / rule sid, s.s erial#, 'kill 9' | | spid, event, blocking_session b_sess FROM v $session s, v$process p WHERE username='&username' AND s.paddr = p.addr order by 1; Ps - ef - kill all LOCAL = NO process | grep LOCAL = NO | grep $ORACLE_SID | grep -v grep | awk '{print $2}' | xargs ki

>>>> Restart Dafa

tail -f alert_.log
alter system checkpoint;
alter system switch logfile;
shutdown immediate;

startu

Static parameters, such as the need to modify the problem such as memory, need to restart the database, (don’t think resume is very LOW, in many cases, in order to quickly recover business often use this from the cafe room to spin), remember don’t screwing problem cause at this time, as a subject research, our priority is to restore the business.

>>>>CRT button tips

In addition, a small tip is to organize the commonly used scripts into the Button Bar of SecureCRT. Just click the Button set, which is equivalent to directly execute the corresponding SQL statement, so that you don’t have to paste and copy the execution every time, or upload the script to each server. However, do not set DDL and other operational button, so as not to miss the point.

The above is some of the scripts used in database problems, especially when the application is slow and stuck. In addition, it is recommended to read the scripts first and then use them. You can also rewrite them according to your own environment, so as to integrate and accumulate experience.

I put these scripts are organized into the ink wheel common scripts, above there are many other monitoring, management, diagnosis tool scripts, you can go to the above free copy download to use. (Click below to download a free copy to download ~)

It can be downloaded from cs.enmotech.com/scripts