WEM Process Monitor

The Process Monitor is a script that monitors and maintains WEM application software processes. It can run as a stand-alone program or as a fully functional background daemon, and it can log to syslog and to a log file, with configurable e-mail notifications. Because it can detect and recover failed processes, and because its configuration is flexible and easy to change, it helps keep WEM services available and the server performing well.

Processes are monitored according to rules defined in a configuration file stored on the server. Each rule specifies the conditions to check for and the action to take when those conditions are met. (For example, if a process becomes unresponsive, the rule can cause the process to be re-spawned.)

Basic parameters pertaining to the Process Monitor were configured as part of the WEM installation or upgrade process. However, additional parameters are available for fine-tuning its configuration to meet the needs of your network.

IMPORTANT:

When installing redundant servers, WEM services that are started by the /<ems_dir>/server/servstart command and monitored by the Process Monitor are not started automatically. Refer to Appendix A for information on installing redundant servers using Oracle Cluster software.

This chapter provides information and procedures for modifying the Process Monitor’s operation using the configuration file.

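This chapter includes the following topics:
  • Process Monitor Configuration File
  • Default Rules
  • Verifying the Process Monitor Status
  • Manually Stopping the Process Monitor
  • Manually Starting the Process Monitor
  • Running the Process Monitor as a Stand-alone Application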

IMPORTANT:

Unless otherwise specified, all information in this chapter applies to both Sun Solaris- and Red Hat Enterprise Linux-based WEM systems.

Process Monitor Configuration File

Processes are monitored according to rules defined in a plain-text configuration file called psmon.cfg, which is located in the /users/ems/server/etc directory by default.

Rules are defined within the file using the following syntax:
<Process process_id>
    processing-directives directive_variables
</Process>
The following table provides the syntax descriptions.
Table 1. psmon.cfg Rule Syntax Descriptions
Syntax Description
<Process process_id>

Identifies the process to be monitored.

The process can be identified by name (e.g. ssh) or by full path (e.g. /users/ems/server/bin/server). A wildcard (*) can also be specified to monitor all processes.

Note:

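Use the wildcard with extreme care: any process scope directive that kills processes (such as TTL, PctCpu, or PctMem) then applies to every process on the server, which can disrupt normal system operation.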

processing-directives directive_variables

Identifies the criteria that must be met before action is taken as well as the action to be taken. The criteria are referred to as “Directives”, while the actions are referred to as “Process Scope Directives”.

Processing directives can be specified within a specific rule or outside all rules. A directive specified outside all rules is applied globally to every rule. Table 2 (psmon.cfg Processing Directives) describes the supported processing directives.

</Process>

Indicates the end of a process rule.
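As a minimal sketch of the two scopes, the following psmon.cfg fragment declares three global directives outside any rule and then overrides one of them inside the EMS Server rule from Table 3. All values are documented defaults or entries from the supported lists; directive names are written in lowercase, as in the installer-generated rules.

```text
adminemail root@localhost
loglevel   LOG_NOTICE
frequency  30

<Process /<ems_dir>/server/bin/server>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/server)
    pidfile  /<ems_dir>/server/server.pid
    numretry 10
    tmintval 330
    loglevel LOG_WARNING
</Process>
```

Here, notifications for the EMS Server are logged at LOG_WARNING, while every other rule uses the global LOG_NOTICE setting.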




Table 2. psmon.cfg Processing Directives
Directive Description

Facility

Identifies the syslog facility used for logging. The following facilities are supported:
  • LOG_KERN
  • LOG_USER
  • LOG_MAIL
  • LOG_DAEMON
  • LOG_AUTH
  • LOG_SYSLOG
  • LOG_LPR
  • LOG_NEWS
  • LOG_UUCP
  • LOG_CRON
  • LOG_LOCAL0
  • LOG_LOCAL1
  • LOG_LOCAL2
  • LOG_LOCAL3
  • LOG_LOCAL4
  • LOG_LOCAL5
  • LOG_LOCAL6
  • LOG_LOCAL7

Default: LOG_DAEMON

LogLevel

Identifies the log level priority used to mark notifications to syslog. The following levels are supported:
  • LOG_EMERG
  • LOG_ALERT
  • LOG_CRIT
  • LOG_ERR
  • LOG_WARNING
  • LOG_NOTICE
  • LOG_INFO
  • LOG_DEBUG

Note:

For any failed action, the notification's log level is automatically raised one level in severity (for example, from LOG_NOTICE to LOG_WARNING) to indicate the failure.

This directive may also be used as a Process Scope Directive (within a rule), where it takes precedence over a global declaration.

Default: LOG_NOTICE

KillLogLevel (previously KillPIDLogLevel)

Identical to the LogLevel directive, but applies only to process kill actions.

This directive takes precedence over the LogLevel directive. It may also be used as a Process Scope Directive (within a rule), where it takes precedence over a global declaration.

SpawnLogLevel

Identical to the LogLevel directive, but applies only to process spawn actions.

This directive takes precedence over the LogLevel directive. It may also be used as a Process Scope Directive (within a rule), where it takes precedence over a global declaration.

AdminEmail

Specifies the e-mail address to which notification e-mails should be sent.

This directive corresponds to the To Email-ID parameter configured during the WEM installation. The installer stores it as a global declaration; however, it may also be used as a Process Scope Directive (within a rule), where it takes precedence over a global declaration.

Default: root@localhost

NotifyEmailFrom

Specifies the e-mail address used in the “From” field of sent notification e-mails.

Default: <username>@hostname

Frequency

Specifies the interval (in seconds) at which the Process Monitor attempts to communicate with a process.

This directive corresponds to the Poll Interval parameter configured during the WEM installation. The installer stores it as a global declaration; however, it may also be used as a Process Scope Directive (within a rule), where it takes precedence over a global declaration.

Default: 30 seconds

LastSafePID

Specifies the highest process identification number that the Process Monitor must not kill. When this directive is specified, the Process Monitor never attempts to kill a process whose ID is less than or equal to this value.

Note:

The Process Monitor never attempts to kill itself or any process whose ID is less than or equal to 1.

This directive is treated as a global directive by default.

Default: 800

ProtectSafePIDsQuietly

Enables or disables the suppression of all notifications for process IDs protected by the LastSafePID directive.

“On” enables this functionality. “Off” disables it.

This directive is treated as a global directive by default.

Default: Off

SMTPHost

Specifies the IP address or hostname of the Simple Mail Transfer Protocol (SMTP) server used for sending e-mail notifications.

This directive corresponds to the SMTP Server Name parameter configured during the WEM installation. The installer stores it as a global declaration; however, it may also be used as a Process Scope Directive (within a rule), where it takes precedence over a global declaration.

Default: localhost

SMTPTimeout

Specifies the timeout (measured in seconds) used during SMTP connections.

Default: 20 seconds

SendmailCmd

Specifies the sendmail command used to send e-mail notifications if the SMTP connection to the host specified by the SMTPHost directive fails.

Default: /usr/sbin/sendmail -t

Dryrun

Forces the Process Monitor to behave as if the --dryrun command-line switch had been specified.

This can be used to make a specific configuration file only report information, without ever taking automated action.

This directive is treated as a global directive by default.

Default: False (disabled)

NotifyDetail

Specifies the verbosity of the notification e-mails. The following levels are supported:
  • Simple
  • Verbose
  • Debug

Default: Verbose

PROCESS SCOPE DIRECTIVES

SpawnCmd

Identifies the full command line to be executed in order to re-spawn a dead process.

KillCmd

Identifies the full command line to execute in order to gracefully shut down or kill a rogue process.

If the command returns a non-zero exit status, it is assumed to have failed. If no KillCmd is specified or the command fails, the process is killed by sending it a SIGKILL signal with the standard kill() function.

PIDFile

Identifies the full path and filename of a file, created by the process, that contains the process identification number of its main parent process.

NUMRETRY

The number of times the Process Monitor attempts to communicate with an unresponsive process before taking action.

If the process has not responded to the final attempt within the configured timeout interval, the system considers it unreachable and takes action.

This directive corresponds to the Number of Retries parameter configured during the WEM installation. During the installation, it is stored as a Process Scope Directive in each rule defined for a WEM process.

Default: 10

TMINTVAL

The amount of time (in seconds) the system waits before re-attempting to communicate with an unresponsive process.

Once the time interval has been reached, the system re-attempts communication, up to the configured number of retries.

This directive corresponds to the Timeout Interval parameter configured during the WEM installation. During the installation, it is stored as a Process Scope Directive in each rule defined for a WEM process.

Default: 330 seconds

TTL

Specifies a maximum time-to-live (in seconds) for a process.

The process is killed once it has been running longer than this value, and its process identification number is removed from the defined pidfile.

PctCpu

Specifies the maximum allowable percentage of CPU time a process may use.

The process is killed once its CPU usage exceeds this threshold and its process identification number is removed from the defined pidfile.

PctMem

Specifies the maximum allowable percentage of total system memory a process may use.

The process is killed once its memory usage exceeds this threshold and its process identification number is removed from the defined pidfile.

Instances

Specifies the maximum number of instances of a process that are allowed to run simultaneously.

Once the number of running instances exceeds this value, the excess instances are killed.

NoEmailOnKill

Enables or disables the suppression of e-mail notifications for killed processes.

Default: False (disabled)

NoEmailOnSpawn

Enables or disables the suppression of e-mail notifications for spawned processes.

Default: False (disabled)

NoEmail

Enables or disables the suppression of all e-mail notifications.

Default: False (disabled)

NeverKillPID

Specifies a list of process identification numbers (separated by spaces) that are never to be killed.

Default: 1

NeverKillProcessName

Specifies a list of process names (separated by spaces) that are never to be killed.

Default: kswapd kupdated mdrecoveryd
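The following psmon.cfg fragment is a sketch of how several directives from Table 2 might be combined: global logging and mail settings, followed by resource limits added to the Bulkstat Parser rule from Table 3. The addresses ops@example.com and mailhost.example.com are placeholders, and the pctcpu, pctmem, and instances values are illustrative, not recommendations. Directive names are written in lowercase, as in the installer-generated rules.

```text
facility      LOG_DAEMON
loglevel      LOG_NOTICE
spawnloglevel LOG_WARNING
adminemail    ops@example.com
smtphost      mailhost.example.com
smtptimeout   20
notifydetail  Simple
lastsafepid   800

<Process /<ems_dir>/server/bin/bulkstatparser>
    spawncmd  (cd /<ems_dir>/server; /<ems_dir>/server/bin/bulkstatparser)
    pidfile   /<ems_dir>/server/bulkstatparser.pid
    numretry  10
    tmintval  330
    pctcpu    90
    pctmem    50
    instances 1
</Process>
```

With this configuration, spawn notifications are logged at LOG_WARNING and all other notifications at LOG_NOTICE. The Bulkstat Parser is killed if it uses more than 90% of CPU time or 50% of system memory, and excess instances are killed if more than one is running; because the rule also defines spawncmd, the Process Monitor can then re-spawn it.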


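The kill-protection rules above (the built-in protection of PID 1 and of the monitor itself, LastSafePID, and NeverKillPID) and the KillCmd fallback can be illustrated with the following POSIX shell sketch. It is not the Process Monitor's implementation: the variables LAST_SAFE_PID and NEVER_KILL_PID and the functions is_protected and safe_kill are hypothetical, and NeverKillProcessName is not modeled.

```shell
#!/bin/sh
# Illustrative sketch only, NOT the Process Monitor's own code.
# LAST_SAFE_PID and NEVER_KILL_PID are hypothetical stand-ins for the
# LastSafePID and NeverKillPID directives (documented defaults shown).
LAST_SAFE_PID=${LAST_SAFE_PID:-800}
NEVER_KILL_PID=${NEVER_KILL_PID:-1}

# is_protected PID: exit status 0 if PID must never be killed.
is_protected() {
    pid=$1
    [ "$pid" -le 1 ] && return 0                # PID 1 and below
    [ "$pid" -eq "$$" ] && return 0             # this shell (stands in for the monitor)
    [ "$pid" -le "$LAST_SAFE_PID" ] && return 0 # LastSafePID
    for p in $NEVER_KILL_PID; do                # NeverKillPID list
        [ "$pid" -eq "$p" ] && return 0
    done
    return 1
}

# safe_kill PID [KILLCMD ARGS...]
# Returns 2 for a protected PID and 0 if KILLCMD succeeds; otherwise
# returns the exit status of the SIGKILL fallback.
safe_kill() {
    pid=$1
    shift
    if is_protected "$pid"; then
        echo "refusing to kill protected PID $pid" >&2
        return 2
    fi
    # A non-zero exit status from KILLCMD is treated as failure.
    if [ $# -gt 0 ] && "$@"; then
        return 0
    fi
    # No KILLCMD given, or it failed: send SIGKILL.
    kill -9 "$pid" 2>/dev/null
}
```

For example, safe_kill 4242 /<ems_dir>/apache/bin/apachectl stop (4242 being a hypothetical PID) would try apachectl stop first and send SIGKILL only if that command exits with a non-zero status.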

Default Rules

During installation, WEM gives you the option of automatically defining rules for the following processes:
  • EMS Server: Enabled by default
  • Bulkstat Server: Disabled by default
  • Bulkstat Parser: Enabled by default
  • Script Server: Disabled by default
  • Northbound (NB) Server: Disabled by default
  • Notification Service: Disabled by default

IMPORTANT:

Two additional WEM processes are pre-configured to be monitored by the Process Monitor: the Postgres database process and the Apache Webserver process. Their settings appear in the psmon.cfg file, but they cannot be changed during the WEM installation process.

The following table identifies the default rules configured for each of the above processes.
Table 3. Default Rules for WEM Process Monitor Processes
Process Default Rule
EMS Server

<Process /<ems_dir>/server/bin/server>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/server)
    pidfile /<ems_dir>/server/server.pid
    numretry 10
    tmintval 330
</Process>

Bulkstat Server

<Process /<ems_dir>/server/bin/bulkstatserver>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/bulkstatserver)
    pidfile /<ems_dir>/server/bsserver.pid
    numretry 10
    tmintval 330
</Process>

Bulkstat Parser

<Process /<ems_dir>/server/bin/bulkstatparser>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/bulkstatparser)
    pidfile /<ems_dir>/server/bulkstatparser.pid
    numretry 10
    tmintval 330
</Process>

Script Server

<Process /<ems_dir>/server/bin/scriptsrv>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/scriptsrv)
    pidfile /<ems_dir>/server/script.pid
    numretry 10
    tmintval 330
</Process>

Northbound Server

<Process /<ems_dir>/server/bin/nbserver>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/nbserver)
    pidfile /<ems_dir>/server/nbserver.pid
    numretry 10
    tmintval 330
</Process>

Notify Service

<Process /<ems_dir>/server/bin/Notify_Service>
    spawncmd (cd /<ems_dir>/server; /<ems_dir>/server/bin/nbSrvr)
    pidfile /<ems_dir>/server/notify_service.pid
    numretry 10
    tmintval 330
</Process>

Postgres Database

<Process /<ems_dir>/postgresx.x.x/bin/postmaster -i>
    spawncmd /<ems_dir>/server/scripts/postgresctl start
    pidfile /<ems_dir>/postgresx.x.x/data/postmaster.pid
    numretry 10
    tmintval 330
</Process>

Apache Webserver

<Process /<ems_dir>/apache/bin/httpd -f /<ems_dir>/apache/conf/httpd.conf>
    spawncmd /<ems_dir>/apache/bin/apachectl start
    pidfile /<ems_dir>/apache/logs/httpd.pid
    numretry 10
    tmintval 330
</Process>
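Each default rule names a pidfile. The following POSIX shell sketch, which is not part of WEM, checks whether the PID recorded in a pidfile still belongs to a running process, which can help diagnose a stale pidfile. The names check_pidfile and report are hypothetical, and EMS_DIR is an assumed variable defaulting to /users/ems, the installation directory used in this chapter's default paths. It reads only the first line because PostgreSQL's postmaster.pid stores additional information on later lines. Run it as root, since for an unprivileged user kill -0 also fails on processes owned by other users; on Solaris 10, where /bin/sh is the Bourne shell, run it with /usr/xpg4/bin/sh or ksh.

```shell
#!/bin/sh
# Illustrative sketch, not part of WEM. check_pidfile, report, and
# EMS_DIR are hypothetical names.

# check_pidfile FILE: exit status 0 if the PID on the first line of FILE
# belongs to a running process, 1 otherwise.
check_pidfile() {
    [ -r "$1" ] || return 1
    pid=$(head -n 1 "$1")            # PID is on the first line
    case $pid in
        ''|*[!0-9]*) return 1 ;;     # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null       # signal 0 only tests that the PID exists
}

# Assumed installation directory; /users/ems matches this chapter's
# default paths.
EMS_DIR=${EMS_DIR:-/users/ems}

# report: print the status of the pidfiles for the rules enabled by
# default, plus the Apache Webserver.
report() {
    for f in "$EMS_DIR/server/server.pid" \
             "$EMS_DIR/server/bulkstatparser.pid" \
             "$EMS_DIR/apache/logs/httpd.pid"; do
        if check_pidfile "$f"; then
            echo "running:     $f"
        else
            echo "not running: $f"
        fi
    done
}
```

After sourcing the file, run report; add the other pidfiles from Table 3 to its list as needed.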


Verifying the Process Monitor Status

The status of the Process Monitor can be checked at any time by executing either of the following commands (run the second from the /<ems_dir>/server directory):

ps -ef | grep psmon
or
./serv monitor
The first command indicates whether an active psmon process is running (the grep command itself also appears in the output and can be ignored). The second command performs one of the following:
  • If the Process Monitor is stopped, executing this command is equivalent to executing the ./serv monitor start command (refer to the Manually Starting the Process Monitor section in this chapter).
  • If the Process Monitor is currently running, a message is displayed indicating that it is running. A prompt is also provided that allows you to restart the process. To restart the process, enter yes.
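In a script, the plain ps -ef | grep psmon check is awkward because the grep command matches its own entry in the process list. The following POSIX shell sketch (proc_running is a hypothetical helper name) filters out grep lines first, so its exit status reflects only real matches.

```shell
#!/bin/sh
# Sketch, not part of WEM: a scriptable form of "ps -ef | grep psmon".
# proc_running is a hypothetical helper name.

# proc_running PATTERN: exit status 0 if any ps -ef line matches the
# grep PATTERN, ignoring lines for grep processes (including the
# grep commands in this pipeline).
proc_running() {
    ps -ef | grep -v grep | grep "$1" >/dev/null
}
```

For example, proc_running psmon && echo "Process Monitor is running" prints the message only when a psmon process is found. The pattern matches anywhere in the command line, so keep it specific.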

Manually Stopping the Process Monitor

The Process Monitor starts automatically when WEM is installed. This section provides instructions for stopping it manually, which can be useful when making changes to the configuration file.

Follow the instructions below to manually stop the Process Monitor.

  1. Log in as the root user.
  2. Go to the directory in which the WEM Server application file is located. By default, this is the /<ems_dir>/server directory. Enter the following command:
    cd /<ems_dir>/server
    
  3. Stop the Process Monitor by entering the following command:
    ./serv monitor stop
    
  4. Verify that the Process Monitor has stopped by executing the following command:
    ps -ef | grep psmon
    
    If the Process Monitor was successfully stopped, the output shows no psmon process other than the grep command itself.
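When stopping the Process Monitor from a script, steps 3 and 4 can be combined by polling until the psmon process is gone. The following POSIX shell sketch is not part of WEM; wait_until_stopped is a hypothetical helper name, and the 1-second poll and 30-second default timeout are arbitrary choices.

```shell
#!/bin/sh
# Sketch, not part of WEM: wait for processes to exit after a stop
# request. wait_until_stopped is a hypothetical helper name; the
# 1-second poll and 30-second default timeout are arbitrary.

# wait_until_stopped PATTERN [TIMEOUT]: exit status 0 once no ps -ef
# line matches PATTERN (grep lines excluded), 1 if the process is still
# running after about TIMEOUT seconds.
wait_until_stopped() {
    pattern=$1
    remaining=${2:-30}
    while ps -ef | grep -v grep | grep "$pattern" >/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, from the /<ems_dir>/server directory: ./serv monitor stop; wait_until_stopped psmon 30 && echo "Process Monitor stopped".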

Manually Starting the Process Monitor

The Process Monitor starts automatically when WEM is installed. If it has been stopped, use the following instructions to start it again.

  1. Log in as the root user.
  2. Go to the directory in which the WEM Server application file is located. By default, this is the /<ems_dir>/server directory. Enter the following command:
    cd /<ems_dir>/server
    
  3. Start the Process Monitor by entering the following command:
    ./serv monitor start
    
    Once the Process Monitor starts, a status message is displayed, along with the process identification number assigned to the psmon process and the directory containing the log file it created.

Running the Process Monitor as a Stand-alone Application

As mentioned previously, the Process Monitor can be run either as a background daemon (the default operation when it is enabled during the WEM installation) or as a stand-alone application.

This section provides information and instructions for running the Process Monitor as a stand-alone application from the command line interface.

To run the Process Monitor from the command line, use the following instructions:

  1. Log in as the root user.
  2. Go to the directory in which the Process Monitor application file is located. By default, this is the /<ems_dir>/server directory. Enter the following command:
    cd /<ems_dir>/server
    
  3. Launch the Process Monitor application by entering the following command:
    ./psmon [--conf=filename] [--daemon] [--cron] [--user=user] [--adminemail=emailaddress] [--dryrun] [--help] [--version]
    
    Keyword/Variable            Description
    --conf=filename             Specifies an alternate configuration file name (other than the psmon.cfg file).
    --daemon                    Starts the Process Monitor as a background daemon.
    --cron                      Suppresses “already running” errors when launching (e.g. with the --daemon option).
    --user=user                 Specifies that only processes running under the specified username are scanned.
    --adminemail=emailaddress   Specifies the e-mail address to which notifications are sent.
    --dryrun                    Provides notifications but does not kill or spawn processes.
    --help                      Displays the supported keywords.
    --version                   Displays the version information.


    The Process Monitor then applies the rules defined in the configuration file (psmon.cfg, or the file named by --conf), according to the options specified.
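One way to try out a new rule without letting the Process Monitor act on it is to place the rule in a scratch configuration file with the Dryrun directive set. In this sketch, the file name /tmp/psmon-test.cfg and the pctmem value are hypothetical, and the On value for dryrun is an assumption based on the On/Off values documented for ProtectSafePIDsQuietly.

```text
dryrun On

<Process /<ems_dir>/server/bin/server>
    pidfile /<ems_dir>/server/server.pid
    pctmem  50
</Process>
```

Launch it from the /<ems_dir>/server directory with ./psmon --conf=/tmp/psmon-test.cfg. The notifications then report what would have been killed or spawned, but no action is taken; passing --dryrun on the command line has the same effect without editing the file.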