Who is monitoring the OEM server?


In all cases I’m relying on OEM to report incidents that occur on my monitored targets. But is no news always good news? I learned the answer the hard way when I found my OEM down and production systems screaming for help.
To avoid a repeat, I created a custom script to check the availability of OEM itself. This script checks the components that need to be running for OEM to work its magic!

This quick peek is part of a larger script which acts as a “heartbeat” to provide us with this necessary information.
On top of this checking mechanism, we also keep track of heartbeat calls. When a call is not received within a specific window, an e-mail is sent to notify the support department.
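The window check described above could look roughly like the sketch below. It is not the original script: the function name, the timestamp file, the 600-second window and the mail address are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: decide whether the last heartbeat is too old.
# Arguments: last heartbeat (epoch seconds), current time (epoch seconds),
#            allowed window in seconds.
# Returns 0 (true) when the heartbeat is stale, 1 otherwise.
heartbeat_is_stale() {
    local last=$1 now=$2 window=$3
    [ $(( now - last )) -gt "$window" ]
}

# Possible wiring (file path, window and address are assumptions):
# last=$(cat /home/oracle/scripts/heartbeat.last)
# if heartbeat_is_stale "$last" "$(date +%s)" 600; then
#     echo "No OEM heartbeat received for over 10 minutes" |
#         mailx -s "OEM heartbeat missed" support@example.com
# fi
```

Passing the current time in as an argument keeps the function easy to test. The watcher should run on a different host than OEM, otherwise both can go down together.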


LOG='/home/oracle/scripts/heartbeat.log'

# grep -E is needed for the | alternation; the * must be escaped to match a literal asterisk
OMS_HOME=$(grep -Ev '^#|^\*|^agent' /etc/oratab | cut -f2 -d: -s | grep oms | sort -u)
DB_HOME=$(grep -Ev '^#|^\*|^agent' /etc/oratab | cut -f2 -d: -s | grep db10g | sort -u)

# =========== FUNCTIONS ================

########################################
# Check OPMN status of our EM GRID
########################################

check_oms(){
    echo "---[ Oms ] ------------------------------------------------" >> "$LOG"
    # Array holding the whitespace-split opmnctl output
    declare -a array
    # Variables
    OMS_OK=1
    count=0

    # Check the environment
    OPMN=`$OMS_HOME/opmn/bin/opmnctl status -fmt %prt20%sta10%utm -noheaders | grep -v logloader | grep -v DSA | grep -v dcm`
    array=($OPMN)

    num=${#array[@]}

    # If OPMN is not running, the output is empty and the array has 0 elements
    if [ "$num" -ne 0 ]; then

        # OPMN is running, so check the component states.
        # Each component spans 5 array elements: the name at offset 0,
        # the state at offset 2 and the uptime at offset 4.
        while [ "$count" -lt "$num" ]
        do
            base=$count
            state=$(( count + 2 ))
            time=$(( count + 4 ))
            if [ "${array[state]}" != "Alive" ]; then
                echo "Component ${array[base]} has a ${array[state]} state" >> "$LOG"
                OMS_OK=0
            fi
            echo "Component ${array[base]} has a ${array[state]} state and is running for ${array[time]}" >> "$LOG"

            # OHS - 11g , OC4J_EM - 10g
            if [ "${array[base]}" = "OHS" ] || [ "${array[base]}" = "OC4J_EM" ]; then
                OHSUP=${array[time]}
            fi

            count=$(( count + 5 ))
        done

    else
        OMS_OK=0
        echo "OPMN is not running!" >> "$LOG"
    fi

}

When this function runs on an OEM server, it appends output like the following to the log:

---[ Oms ] ------------------------------------------------
Component HTTP_Server has a Alive state and is running for 02:43:47
Component home has a Alive state and is running for 02:44:04
Component OC4J_EM has a Alive state and is running for 02:44:04
Component OC4J_EMPROV has a Alive state and is running for 02:44:04
Component OCMRepeater has a Alive state and is running for 02:44:04
Component WebCache has a Alive state and is running for 02:44:04
Component WebCacheAdmin has a Alive state and is running for 02:44:04
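
The fixed stride of five in check_oms depends on every status line splitting into exactly five tokens. A line-by-line parse is one possible alternative. The sketch below is not part of the original script. It assumes the fields are separated by a "|" character, and the function name find_dead_components is hypothetical.

```shell
#!/bin/bash
# Sketch: read "name | state | uptime" lines from stdin, one per component.
# Assumption: fields are separated by "|" and padded with spaces.
# Prints the name of every component that is not Alive;
# returns 1 if at least one such component was found, 0 otherwise.
find_dead_components() {
    local name state uptime rc=0
    while IFS='|' read -r name state uptime; do
        # Remove the padding around each field
        name=${name//[[:space:]]/}
        state=${state//[[:space:]]/}
        # Skip empty lines
        [ -z "$name" ] && continue
        if [ "$state" != "Alive" ]; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

# Possible usage inside the heartbeat script (an assumption, not the original code):
# "$OMS_HOME"/opmn/bin/opmnctl status -fmt %prt20%sta10%utm -noheaders | find_dead_components
```

Reading one line per component means an unexpected extra token on one line cannot shift the indexes of every component after it.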