Aggregate CPU, Disk, and Network Bandwidth Utilization
You can monitor the aggregate CPU, disk, and network bandwidth utilization across all the hosts in a cluster. The metrics are collected in the following ways:
- Aggregate CPU and disk metrics: On every host that runs the job, the job's process ID (PID) is used to collect the percentage of CPU and memory that the job uses. The sum of these percentages across all hosts gives the aggregate CPU and disk metrics.
- Aggregate network bandwidth metrics: To obtain the aggregate network bandwidth of one node, the network bandwidth on each of its network interfaces is measured and the values are added together. Network bandwidth is measured in the same way for every node in the cluster. The sum of these per-node values gives the aggregate network bandwidth for the cluster.
- Duration of long-running jobs: A REST API collects the start time, elapsed time, and end time of each job identified on the cluster. For a completed job, the duration is the difference between its end time and its start time. For a job that is still running, the elapsed time reports the duration so far.
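The arithmetic behind these three metrics can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names, host names, interface names, and sample values are all hypothetical, and the sketch assumes the per-host and per-interface readings have already been collected.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def aggregate_usage(per_host_percent: Dict[str, float]) -> float:
    """Sum the usage percentages reported for one job on each host."""
    return sum(per_host_percent.values())


def node_bandwidth(per_interface: Dict[str, float]) -> float:
    """Add the bandwidth measured on every network interface of one node."""
    return sum(per_interface.values())


def cluster_bandwidth(per_node: Dict[str, Dict[str, float]]) -> float:
    """Add the per-node bandwidth totals for all nodes in the cluster."""
    return sum(node_bandwidth(ifaces) for ifaces in per_node.values())


def job_duration(start: datetime,
                 end: Optional[datetime],
                 elapsed: timedelta) -> timedelta:
    """Completed job: end minus start. Running job: the elapsed time."""
    if end is not None:
        return end - start
    return elapsed


# Hypothetical sample readings (percent of CPU used by the job per host).
cpu_by_host = {"host-a": 12.5, "host-b": 30.0, "host-c": 7.5}

# Hypothetical bandwidth readings per node and interface (any consistent unit).
bandwidth_by_node = {
    "node-1": {"eth0": 100.0, "eth1": 50.0},
    "node-2": {"eth0": 75.0},
}

start_time = datetime(2024, 1, 1, 10, 0)
end_time = datetime(2024, 1, 1, 10, 45)

print(aggregate_usage(cpu_by_host))            # 50.0
print(cluster_bandwidth(bandwidth_by_node))    # 225.0
print(job_duration(start_time, end_time, timedelta(minutes=45)))  # 0:45:00
print(job_duration(start_time, None, timedelta(minutes=10)))      # 0:10:00
```

Because the aggregate is a plain sum of per-host percentages, it can exceed 100 when a job runs on several hosts.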