Introduction
This document describes how to troubleshoot high disk space utilization on the /dev/vda3 filesystem in RCM.
Prerequisites
Requirements
Cisco recommends that you have knowledge of:
- StarOS Control and User Plane Separation (CUPS) system architecture and administration.
- Basic Linux/Unix commands for filesystem and disk usage monitoring.
Components Used
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Overview
In Cisco Ultra Packet Core deployments with Control and User Plane Separation (CUPS), the Redundancy Control Manager (RCM) plays a critical role in control plane operations and management. Stable filesystem utilization on RCM nodes is important to ensure smooth functioning of logging, monitoring, and subscriber session management.
High disk space utilization on the root filesystem (/dev/vda3) can cause system instability, failures in log writes, or even service restarts if left unchecked. This article outlines the analysis, troubleshooting steps, and preventive measures to address high disk utilization in RCM nodes.
Analysis and Observation
During monitoring, it was found that the RCM node reached 72% utilization on its root filesystem.
Disk Utilization Snapshot
df -kh
Filesystem Size Used Avail Use% Mounted on
tmpfs 6.3G 9.7M 6.3G 1% /run
/dev/vda3 39G 27G 11G 72% /
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/vda2 488M 48K 452M 1% /var/tmp
/dev/vda1 488M 76K 452M 1% /tmp
On further investigation, it was observed that journal logs under /var/log/journal/ had grown significantly. Logs generated during July alone accounted for ~3 GB of space.
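Before cleanup, the journal's footprint can be quantified directly. The sketch below (paths as observed on the RCM node; it assumes a systemd-based host and falls back to du if journalctl is unavailable) reports how much space journald is consuming:

```shell
# Report the journal's on-disk footprint; fall back to du if
# journalctl is not present on the node
if command -v journalctl >/dev/null 2>&1; then
    journal_usage=$(journalctl --disk-usage 2>/dev/null || echo "unavailable")
else
    journal_usage=$(du -sh /var/log/journal/ 2>/dev/null || echo "unavailable")
fi
echo "journal usage: ${journal_usage}"
```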


Troubleshooting Process
To bring disk utilization under control, the following changes were implemented:
Step 1: Clean Up Old Logs Using journalctl Vacuum
Retain only the last 2 weeks of logs:
sudo journalctl --vacuum-time=2weeks
Or limit journal size (for example, keep only 600 MB):
sudo journalctl --vacuum-size=600M
Step 2: Configure journald Retention for Future Prevention
Edit journald configuration:
vi /etc/systemd/journald.conf
Add/modify parameter:
MaxRetentionSec=2week
Apply configuration:
sudo systemctl restart systemd-journald
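For reference, the relevant portion of journald.conf could look like the fragment below. SystemMaxUse is an additional, optional size cap and is not part of the change applied above; the values shown are examples:

```ini
[Journal]
# Drop journal entries older than two weeks
MaxRetentionSec=2week
# Optionally also cap the journal's total disk usage
SystemMaxUse=600M
```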
Optional Step 3: Resolve Restart Error
While restarting the systemd-journald service in Step 2, you may encounter this error:
Error: Failed to allocate directory watch: Too many open files
- systemd-journald uses inotify to watch log directories for changes.
- Each watch or instance it sets up counts toward kernel-enforced limits.
The current limits on the affected RCM are:
cat /proc/sys/fs/inotify/max_user_watches
501120
cat /proc/sys/fs/inotify/max_user_instances
128
ulimit -n
1024
From the collected output:
- Max inotify watches: 501120
- Max inotify instances: 128
- Open file descriptor limit for journald: 1024
Any of these limits could have been reached, which would produce the error. To narrow it down, compare the current usage against each limit:
sudo lsof -p $(pidof systemd-journald) | wc -l
65
echo "Root inotify instances: $(sudo find /proc/*/fd -user root -type l -lname 'anon_inode:inotify' 2>/dev/null | wc -l) / $(cat /proc/sys/fs/inotify/max_user_instances)"
Root inotify instances: 126 / 128
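To see which processes are holding these instances, the usage can be attributed per process by scanning /proc. This is a sketch (file descriptors of other users may be unreadable without privileges, so the visible total can undercount):

```shell
# List the processes holding inotify instances by scanning /proc
holders=$(
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
            pid=${fd#/proc/}; pid=${pid%%/*}
            cat "/proc/$pid/comm" 2>/dev/null
        fi
    done | sort | uniq -c | sort -rn
)
echo "$holders" | head
inotify_total=$(echo "$holders" | awk '{s += $1} END {print s + 0}')
echo "total inotify instances visible: $inotify_total"
```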
The root user is already consuming 126 of the 128 allowed inotify instances, which leaves journald almost no room to create a new instance when it restarts.
To resolve the error, increase the max_user_instances value and then restart the service:
# Temporarily increase the limit (resets at next reboot)
echo 256 | sudo tee /proc/sys/fs/inotify/max_user_instances
sudo systemctl restart systemd-journald
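Writing to /proc only changes the running kernel, so the higher limit is lost at reboot. To persist it, the value can be recorded in a sysctl drop-in file, which the system applies at boot (the filename below is an example, not a Cisco-mandated path):

```ini
# /etc/sysctl.d/90-inotify.conf (example filename)
fs.inotify.max_user_instances = 256
```

After creating the file, running `sudo sysctl --system` reloads it without a reboot.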
Post-Change Verification
After applying the changes, disk utilization dropped to 61%, restoring the node to normal operating state.
df -kh
Filesystem Size Used Avail Use% Mounted on
tmpfs 6.3G 9.7M 6.3G 1% /run
/dev/vda3 39G 23G 15G 61% /
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/vda2 488M 48K 452M 1% /var/tmp
/dev/vda1 488M 76K 452M 1% /tmp
Recommendation
- Implement the same configuration across all RCM nodes in the deployment to keep disk utilization within safe limits.
- Always place the target RCM in standby mode before performing the changes to avoid impact to live traffic.
- Periodically monitor /dev/vda3 utilization and journald log growth as part of proactive system health checks.
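The periodic check in the last recommendation can be sketched as a small script suitable for cron. The 70% threshold and the warning text are assumptions for illustration, not Cisco guidance:

```shell
#!/bin/sh
# Warn when root filesystem usage crosses a threshold (sketch for cron)
THRESHOLD=70   # assumed alert threshold, in percent
usage=$(df -kP / | awk 'NR==2 { sub("%", "", $5); print $5 }')
if [ "$usage" -gt "$THRESHOLD" ]; then
    echo "WARNING: / is at ${usage}% (threshold ${THRESHOLD}%)"
    # journald growth is the usual suspect on RCM nodes
    journalctl --disk-usage 2>/dev/null || true
else
    echo "OK: / is at ${usage}%"
fi
```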