Scale Your Deployment for Large Data Volumes

To support high data volumes and many users while maintaining performance and reliability, you should plan your deployment architecture carefully. This chapter explains when a single-instance setup is sufficient and when you should move to a distributed Splunk deployment.

Best Practices for Scaling Your Splunk Deployment

Scaling is crucial for handling multiple inputs and large data volumes in your Splunk app. Ensure your deployment can scale to match your infrastructure's capabilities.

For larger data sets, add more servers or storage as needed.

Scale your indexers, search heads, and forwarders independently. Ensure that any connected APIs can handle increased load.

To enable automatic, seamless scaling based on your data volume and search needs, use Splunk Cloud.

Sample Cisco Security Cloud App Use Case

Here is sample performance and scale data for a Cisco Security Cloud App deployment:

Load on the App

  • up to 90 GB data ingested per month

  • up to 15 concurrent users

Configuration of the deployment

  • One Splunk Enterprise single-instance deployment on AWS

  • One EC2 c5.4xlarge instance (16 vCPUs, 32 GB memory) with a 100 GiB GP2 EBS volume, which provides a baseline of 300 IOPS

Metrics collected

  • top CPU usage: 32%

  • top memory usage: 5 GB

Crash issue

The disk filled up, which caused the instance to crash.

Mitigation

Set up a disk usage alarm in Amazon CloudWatch and configure index retention in Splunk. For more information, refer to https://docs.splunk.com/Documentation/Splunk/8.0.0/Indexer/Configureindexstorage
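As a concrete example, index retention can be capped in indexes.conf so that the disk cannot fill up again. The index name and values below are illustrative, not taken from the sample deployment:

```ini
# indexes.conf -- illustrative retention settings (index name and values are examples)
[cisco_security_cloud]
# Delete or archive buckets whose newest event is older than 90 days (in seconds)
frozenTimePeriodInSecs = 7776000
# Also cap the total size of this index at roughly 50 GB
maxTotalDataSizeMB = 51200
```

Restart Splunk after editing indexes.conf for the retention settings to take effect.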

When to Migrate from a Single Node Deployment to Distributed Deployment

We recommend that you consider one or more of the following factors before migrating to a distributed deployment.

Data Volume

  • Single Instance Deployment: A single instance deployment can typically handle moderate amounts of data ingestion effectively, especially on a powerful instance such as c5.4xlarge.

  • Threshold: A single instance can handle data ingestion of 50-100 GB per day. Beyond this limit, indexing, searching, and overall system performance can degrade, particularly with complex queries or large datasets.

  • Distributed Deployment: If your data ingestion often exceeds 100 GB/day, we recommend that you migrate to a distributed deployment, where indexing is not a bottleneck and search performance remains optimal.
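To check where your deployment stands against this threshold, you can chart daily ingestion from the license usage log. This is a common SPL pattern, shown here as a sketch:

```spl
index=_internal source=*license_usage.log* type="Usage"
| eval GB = b / 1024 / 1024 / 1024
| timechart span=1d sum(GB) AS daily_ingest_GB
```

Here, b is the ingested volume in bytes. Run the search on the instance that holds the license manager role, where license_usage.log is indexed.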

Number of Users and Search Load

  • Single Instance Deployment: A single instance setup normally supports 10-20 active users with moderate search activity (e.g., running searches every few minutes).

  • Threshold: If you observe more than 20 concurrent users regularly, especially with heavy search loads or complex queries, the performance of the single-instance setup may be affected. You may notice increased search latency, slower indexing, and higher resource contention (CPU and memory).

  • Distributed Deployment: If you anticipate or experience growth beyond 20 active users, or if search activity becomes increasingly complex (e.g., frequent ad-hoc searches, dashboards with real-time monitoring), we recommend that you migrate to a distributed setup with separate search heads.
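To estimate concurrent search users, the audit index records each search that is run. Counting distinct users per minute gives a rough concurrency figure; this is a sketch that assumes the default _audit index is enabled:

```spl
index=_audit action=search info=granted earliest=-24h
| timechart span=1m dc(user) AS concurrent_users
| stats max(concurrent_users) AS peak_concurrent_users
```

If the peak regularly exceeds 20, the user-load threshold above applies.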

Number of Inputs (Data Sources)

  • Single Instance Deployment: A single instance deployment can typically handle dozens of inputs without issue, provided they’re not excessively high-volume.

  • Threshold: If the number of inputs exceeds 50-100, particularly if some inputs are high-volume (e.g., syslog, large application logs), the single instance may become overburdened, leading to dropped data, delays in indexing, or reduced search performance.

  • Distributed Deployment: If you have varied data sources or high-volume inputs that could lead to data ingestion rates beyond what a single instance can handle, we recommend that you migrate to a distributed setup with dedicated forwarders for data collection and indexers for data storage.

Indicators for Scaling

Monitor the following indicators to determine when to scale your deployment.

  • CPU and Memory Utilization: CPU usage is consistently above 75% and memory usage is above 80% during peak times.

  • Disk Latency and IOPS: Disk latency increases and IOPS approach the provisioned limit, especially during data ingestion or searches.

  • Search Performance: Increasing search times, especially for complex or wide-ranging queries, could indicate the need for additional search heads.

  • Data Latency: If there's a noticeable delay between when data is ingested and when it becomes available for search, indexing may be falling behind.
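The data-latency indicator can be measured directly, because every event carries both its own timestamp (_time) and the time it was indexed (_indextime):

```spl
index=* earliest=-15m
| eval latency_secs = _indextime - _time
| stats avg(latency_secs) AS avg_latency perc95(latency_secs) AS p95_latency BY index
| where avg_latency > 60
```

Indexes that appear in the results are lagging by more than a minute on average; a steadily growing latency suggests that indexing is falling behind ingestion.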

Recommended Distributed Architecture

We recommend the following specifications for a distributed architecture.

  • Search Head (SH):

    • Instance Type: c5.4xlarge or c5.9xlarge for high search concurrency.

    • Handles search requests, dashboards, and user interactions.

  • Indexer (IDX):

    • Instance Type: i3.4xlarge or r5d.4xlarge for high IOPS and storage needs.

    • Manages data indexing and storage.

  • Forwarder (HF/UF):

    • Instance Type: t3.medium for Universal Forwarders (UF) or c5.large for Heavy Forwarders (HF).

    • Collects data from various sources and forwards it to the indexers.
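In the distributed layout above, each forwarder needs an outputs.conf that points at the indexer tier. The hostnames and port below are placeholders:

```ini
# outputs.conf on each forwarder -- hostnames and port are placeholders
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Forwarders automatically load-balance across the listed indexers
server = idx1.example.com:9997, idx2.example.com:9997
```

Enable receiving on each indexer (for example, on port 9997) before pointing forwarders at it.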

Summary

  • Single instance deployment: Can handle up to 100 GB/day of data ingestion, 20 active users, and 50-100 inputs.

  • Migrate to distributed deployment: When data ingestion exceeds 100 GB/day, user load surpasses 20 concurrent users with heavy search activity, or when the number of inputs becomes too large or diverse.
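The thresholds in this summary can be collapsed into a simple check. The function below is only a sketch that mirrors the numbers above; real migration decisions should also weigh the scaling indicators discussed earlier:

```python
def needs_distributed_deployment(gb_per_day: float,
                                 concurrent_users: int,
                                 num_inputs: int) -> bool:
    """Return True when the thresholds in this chapter suggest
    migrating from a single instance to a distributed deployment."""
    return (gb_per_day > 100          # ingestion beyond 100 GB/day
            or concurrent_users > 20  # more than 20 concurrent users
            or num_inputs > 100)      # inputs beyond the 50-100 range

# The sample use case: ~3 GB/day (90 GB/month), 15 users, a handful of inputs
print(needs_distributed_deployment(3.0, 15, 20))    # → False
# A deployment ingesting 150 GB/day should be distributed
print(needs_distributed_deployment(150.0, 10, 30))  # → True
```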