Introduction
This document describes how to troubleshoot high disk space utilization on the /dev/vda3 filesystem in RCM.
Prerequisites
Requirements
Cisco recommends that you have knowledge of:
- StarOS Control and User Plane Separation (CUPS) system architecture and administration.
- Basic Linux/Unix commands for filesystem and disk usage monitoring.
Components Used
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Overview
In Cisco Ultra Packet Core deployments with Control and User Plane Separation (CUPS), the Redundancy Control Manager (RCM) plays a critical role in control plane operations and management. Stable filesystem utilization on RCM nodes is important to ensure smooth functioning of logging, monitoring, and subscriber session management.
High disk space utilization on the root filesystem (/dev/vda3) can cause system instability, failures in log writes, or even service restarts if left unchecked. This article outlines the analysis, troubleshooting steps, and preventive measures to address high disk utilization in RCM nodes.
Analysis and Observation
During monitoring, it was found that the RCM node reached 72% utilization on its root filesystem.
Disk Utilization Snapshot
df -kh
Filesystem Size Used Avail Use% Mounted on
tmpfs 6.3G 9.7M 6.3G 1% /run
/dev/vda3 39G 27G 11G 72% /
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/vda2 488M 48K 452M 1% /var/tmp
/dev/vda1 488M 76K 452M 1% /tmp
On further investigation, it was observed that journal logs under /var/log/journal/ had grown significantly. Logs generated during July alone accounted for ~3 GB of space.
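Before cleanup, the journal's footprint can be quantified directly. The sketch below (paths as observed on the RCM node; it assumes a systemd-based host and falls back to du if journalctl is unavailable) reports how much space journald is consuming:

```shell
# Report the journal's on-disk footprint; fall back to du if
# journalctl is not present on the node
if command -v journalctl >/dev/null 2>&1; then
    journal_usage=$(journalctl --disk-usage 2>/dev/null || echo "unavailable")
else
    journal_usage=$(du -sh /var/log/journal/ 2>/dev/null || echo "unavailable")
fi
echo "journal usage: ${journal_usage}"
```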


Troubleshooting Process
To bring disk utilization under control, the following changes were implemented:
Step 1: Clean Up Old Logs Using journalctl Vacuum
Retain only the last 2 weeks of logs:
sudo journalctl --vacuum-time=2weeks
Or limit journal size (for example, keep only 600 MB):
sudo journalctl --vacuum-size=600M
Step 2: Configure journald Retention for Future Prevention
Edit journald configuration:
vi /etc/systemd/journald.conf
Add/modify parameter:
MaxRetentionSec=2week
Apply configuration:
sudo systemctl restart systemd-journald
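For reference, the relevant portion of journald.conf could look like the fragment below. SystemMaxUse is an additional, optional size cap and is not part of the change applied above; the values shown are examples:

```ini
[Journal]
# Drop journal entries older than two weeks
MaxRetentionSec=2week
# Optionally also cap the journal's total disk usage
SystemMaxUse=600M
```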
Optional Step 3: Resolve Restart Error
While restarting the systemd-journald service in Step 2, you may encounter this error:
Error: Failed to allocate directory watch: Too many open files
- systemd-journald uses inotify to watch log directories for changes.
- Each watch or instance it sets up counts toward kernel-enforced limits.
The current limits on the affected RCM are:
cat /proc/sys/fs/inotify/max_user_watches
501120
cat /proc/sys/fs/inotify/max_user_instances
128
ulimit -n
1024
From the collected output:
- Max inotify watches: 501120
- Max inotify instances: 128
- Open file descriptor limit for journald: 1024
Any of these limits could have been reached, which would produce the error. To narrow it down, compare the current usage against each limit:
sudo lsof -p $(pidof systemd-journald) | wc -l
65
echo "Root inotify instances: $(sudo find /proc/*/fd -user root -type l -lname 'anon_inode:inotify' 2>/dev/null | wc -l) / $(cat /proc/sys/fs/inotify/max_user_instances)"
Root inotify instances: 126 / 128
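To see which processes are holding these instances, the usage can be attributed per process by scanning /proc. This is a sketch (file descriptors of other users may be unreadable without privileges, so the visible total can undercount):

```shell
# List the processes holding inotify instances by scanning /proc
holders=$(
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
            pid=${fd#/proc/}; pid=${pid%%/*}
            cat "/proc/$pid/comm" 2>/dev/null
        fi
    done | sort | uniq -c | sort -rn
)
echo "$holders" | head
inotify_total=$(echo "$holders" | awk '{s += $1} END {print s + 0}')
echo "total inotify instances visible: $inotify_total"
```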
The root user is already consuming 126 of the 128 allowed inotify instances, which leaves journald almost no room to create a new instance when it restarts.
To resolve the error, increase the max_user_instances value and then restart the service:
# Temporarily increase the limit (resets at next reboot)
echo 256 | sudo tee /proc/sys/fs/inotify/max_user_instances
sudo systemctl restart systemd-journald
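Writing to /proc only changes the running kernel, so the higher limit is lost at reboot. To persist it, the value can be recorded in a sysctl drop-in file, which the system applies at boot (the filename below is an example, not a Cisco-mandated path):

```ini
# /etc/sysctl.d/90-inotify.conf (example filename)
fs.inotify.max_user_instances = 256
```

After creating the file, running `sudo sysctl --system` reloads it without a reboot.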
Post-Change Verification
After applying the changes, disk utilization dropped to 61%, restoring the node to normal operating state.
df -kh
Filesystem Size Used Avail Use% Mounted on
tmpfs 6.3G 9.7M 6.3G 1% /run
/dev/vda3 39G 23G 15G 61% /
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/vda2 488M 48K 452M 1% /var/tmp
/dev/vda1 488M 76K 452M 1% /tmp
Recommendation
- Implement the same configuration across all RCM nodes in the deployment to keep disk utilization within safe limits.
- Always place the target RCM in standby mode before performing the changes to avoid impact to live traffic.
- Periodically monitor /dev/vda3 utilization and journald log growth as part of proactive system health checks.
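The periodic check in the last recommendation can be sketched as a small script suitable for cron. The 70% threshold and the warning text are assumptions for illustration, not Cisco guidance:

```shell
#!/bin/sh
# Warn when root filesystem usage crosses a threshold (sketch for cron)
THRESHOLD=70   # assumed alert threshold, in percent
usage=$(df -kP / | awk 'NR==2 { sub("%", "", $5); print $5 }')
if [ "$usage" -gt "$THRESHOLD" ]; then
    echo "WARNING: / is at ${usage}% (threshold ${THRESHOLD}%)"
    # journald growth is the usual suspect on RCM nodes
    journalctl --disk-usage 2>/dev/null || true
else
    echo "OK: / is at ${usage}%"
fi
```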