Some customers have
experienced DHCP server aborts that generate a core file, with DHCP server log
messages indicating that the process aborted because it was unable to create a
thread. Before the server aborts, memory usage may increase significantly (by
100 MB or more) after a reload. This occurs on Linux, typically affects users
with fairly large configurations (where the DHCP server uses 2 GB or more of
memory), and is triggered by a reload (after anywhere from a few to hundreds of
reloads). For more information, refer to CSCus91865.
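The memory growth described above can be checked by comparing the process's memory figures before and after a reload. The following is a minimal sketch, not a supported tool: it assumes a Linux system where /proc/<pid>/status reports VmSize and VmRSS in kB, and the function names (parse_status_kb, grew_by_mb, read_status) are hypothetical helpers introduced only for illustration. The DHCP server process ID must be supplied by the operator.

```python
import re


def parse_status_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def grew_by_mb(before, after, threshold_mb=100, field="VmSize"):
    """Return True if the given field grew by at least threshold_mb megabytes."""
    growth_mb = (after[field] - before[field]) / 1024.0
    return growth_mb >= threshold_mb


def read_status(pid):
    """Read and parse /proc/<pid>/status for a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_status_kb(handle.read())
```

A typical use would be to call read_status(pid) just before a reload and again once the reload completes, then pass both results to grew_by_mb. The 100 MB default threshold mirrors the figure quoted above and can be adjusted.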
When the server
aborts, due either to the inability to create a thread or to a lack of memory,
cnrservagt automatically starts a new DHCP server process. Thus, the impact to
most users is:
- Slightly longer reloads.
- Core files (3.5 GB to just over 4 GB) in the /opt/nwreg2/local directory must be
periodically removed to avoid running out of disk space. Whether and how these
core files are created depends on the system settings (see man pages for
take a long time while reloading prior to exiting. The server is found to be
using 100% CPU on one processor and spending most of its time in memory
allocation system calls.
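The periodic removal of core files from /opt/nwreg2/local can be scripted. The following is a minimal sketch under stated assumptions: the file name pattern "core*" and the seven-day age limit are illustrative guesses (actual core file names depend on the system settings), and the helpers stale_core_files and remove_files are hypothetical. The removal step defaults to a dry run so that nothing is deleted until the listed files have been reviewed.

```python
import glob
import os
import time


def stale_core_files(directory, pattern="core*", min_age_days=7, now=None):
    """List regular files matching pattern that are older than min_age_days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    stale = []
    for path in glob.glob(os.path.join(directory, pattern)):
        # Skip directories and anything modified more recently than the cutoff.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            stale.append(path)
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Delete the given files unless dry_run is set; return the paths handled."""
    for path in paths:
        if not dry_run:
            os.remove(path)
    return list(paths)


if __name__ == "__main__":
    # Directory taken from the text above; pattern and age are assumptions.
    for path in remove_files(stale_core_files("/opt/nwreg2/local")):
        print("would remove:", path)
```

Keeping the most recent core file may be worthwhile if it is needed for a support case; raising min_age_days is one way to do that.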
Work with
Red Hat on this issue determined that it results from the interaction between
the behavior of the glibc MALLOC library and the pattern of memory allocations
and thread usage within the DHCP server. The two do not work well together.
The MALLOC library
uses the concept of ARENAs (memory pools) to improve performance by reducing
the need for locks and the resulting lock contention. However, at times the
ARENAs are reused differently than they were earlier in the life of the
process, so memory held by an ARENA is not necessarily reused or freed to the
system.
This can result in many ARENAs holding large amounts of memory, which
increases the memory required for the DHCP server process. Eventually, most of
the memory space is in use (or what is still available is fragmented). When
the server then requests the system to create a thread, the system is unable to
obtain the necessary contiguous mappable space for the thread, so thread
creation fails. The server considers this "fatal" and (by design) aborts.
This is known to
occur on Red Hat Enterprise Linux (RHEL)/CentOS 5.x with Network Registrar 8.2
and earlier. It may also occur on RHEL/CentOS 6.x with Network Registrar 8.3.
Several
workarounds are possible, as described in the following sections. The table
below lists the workaround options: