
Investigate high disk I/O on Linux

Learn to identify and investigate high disk I/O on Linux servers using tools like iotop, iostat, and vmstat to diagnose and resolve performance bottlenecks.

High disk I/O (Input/Output) can be a significant bottleneck for your server’s performance, leading to slow application response times and overall system sluggishness. Understanding how to identify and investigate high disk I/O is crucial for maintaining a healthy and efficient Linux server. This guide will walk you through the basics of spotting high disk I/O and pinpointing its cause.

What is disk I/O?

Disk I/O refers to the read and write operations that your server’s storage devices (like Hard Disk Drives or Solid State Drives) perform. Every time your server needs to access or store data on its disks, it generates I/O operations. While a certain amount of disk activity is normal, excessively high I/O can indicate an underlying problem or an overloaded system.

When disk I/O is too high, your applications might have to wait longer to access data, leading to:

  • Slow website loading times
  • Delayed database queries
  • Unresponsive services
  • Increased CPU wait times (%iowait)

Identifying high disk I/O

Several command-line tools are available on Linux to help you monitor disk activity and identify which processes are responsible for high I/O.

Using iotop

The iotop command provides a real-time, per-process view of disk I/O activity, similar to how top shows CPU usage. It’s incredibly useful for quickly seeing which processes are currently reading from or writing to the disk most heavily.

Installation:

If iotop is not already installed, you can typically install it using your distribution’s package manager:

For Debian/Ubuntu systems:

sudo apt update
sudo apt install iotop

For RHEL/CentOS/AlmaLinux systems (use yum in place of dnf on older releases):

sudo dnf install iotop

Running iotop:

To run iotop, execute the command; it typically requires root privileges:

sudo iotop

The output will show a list of processes along with their disk read and write rates. Key columns to watch include:

  • TID: Thread ID. iotop lists individual threads by default; pass -P (--processes) to aggregate per process.
  • USER: The user running the process.
  • DISK READ: The speed at which the process is reading from the disk.
  • DISK WRITE: The speed at which the process is writing to the disk.
  • IO>: The percentage of time the process spent waiting for I/O.
  • COMMAND: The command that initiated the I/O.

A particularly useful option is -o or --only, which tells iotop to only show processes or threads that are actually doing I/O:

sudo iotop -o

Press q to exit iotop.
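If iotop isn't available, or you want to script a check, the kernel exposes the same per-process counters that iotop reads in /proc/<PID>/io. A minimal sketch; the use of the current shell's PID ($$) is purely illustrative, so substitute the PID you actually want to inspect:

```shell
# Print cumulative disk I/O counters for a process from /proc.
# read_bytes/write_bytes count bytes actually sent to the storage layer.
# $$ (the current shell) is used here only as an example PID.
pid=$$
grep -E '^(read_bytes|write_bytes):' "/proc/$pid/io"
```

These counters are cumulative since the process started, so sample them twice and take the difference to derive a rate.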

Using iostat

The iostat command (part of the sysstat package) provides more detailed statistics about your storage devices. It can show you metrics like operations per second, blocks read/written per second, and device utilization.

Installation:

If iostat or the sysstat package isn’t installed, use your package manager:

For Debian/Ubuntu systems:

sudo apt update
sudo apt install sysstat

For RHEL/CentOS/AlmaLinux systems (use yum in place of dnf on older releases):

sudo dnf install sysstat

Running iostat:

A common way to use iostat for ongoing monitoring is to specify an interval (in seconds) and a count. For extended device statistics, use the -x flag. The -m flag shows statistics in megabytes per second.

iostat -x -m 5 10

This command will display extended statistics every 5 seconds for a total of 10 reports. Look for these important metrics for each device:

  • r/s, w/s: Read and write operations per second.
  • rMB/s, wMB/s (shown as MB_read/s and MB_wrtn/s in older sysstat releases): Megabytes read or written per second.
  • await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
  • %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

High %util combined with long await times often indicates a disk bottleneck. Be aware that %util can be misleading on SSDs, especially NVMe drives that serve many requests in parallel: the device may report close to 100% utilization while still having spare capacity, so weigh await alongside it.
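When you capture iostat output over a long run, a short awk filter can flag saturated devices automatically. The sketch below embeds sample output so it is self-contained; the device-name pattern and the 80% threshold are illustrative choices, and %util is taken from the last column as printed by iostat -x:

```shell
# Flag devices whose %util (last column of `iostat -x` output) exceeds 80%.
# Sample lines are embedded for illustration; in practice pipe real data:
#   iostat -x -m 5 10 | awk '$1 ~ /^(sd|nvme|vd)/ && $NF+0 > 80 { print $1, $NF }'
awk '$1 ~ /^(sd|nvme|vd)/ && $NF+0 > 80 { print $1, $NF }' <<'EOF'
Device r/s w/s rMB/s wMB/s await %util
sda 10.0 5.0 0.5 0.2 3.1 12.0
nvme0n1 900.0 450.0 110.0 80.0 25.4 97.5
EOF
```

With the sample data above, only nvme0n1 is reported, since sda sits at 12% utilization.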

Using vmstat

The vmstat command reports virtual memory statistics but also includes information about block I/O.

Running vmstat:

You can run vmstat with an interval and count, similar to iostat:

vmstat 5 10

This will provide updates every 5 seconds for 10 iterations. Note that the first report contains averages since the last boot, so pay most attention to the later lines. In the output, focus on the io columns:

  • bi: Blocks received from a block device (blocks/s). This corresponds to disk reads.
  • bo: Blocks sent to a block device (blocks/s). This corresponds to disk writes.

Significant numbers in these columns, especially if sustained, point to heavy disk activity.
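bi and bo are reported in blocks per second, where a block is 1024 bytes on current Linux versions of vmstat (check man vmstat on your system to confirm). A quick conversion to MB/s:

```shell
# Convert a vmstat bi/bo value (blocks/s, 1024-byte blocks assumed) to MB/s.
# Example: 20480 blocks/s corresponds to 20 MB/s of disk traffic.
blocks_per_sec=20480
awk -v b="$blocks_per_sec" 'BEGIN { printf "%.1f MB/s\n", b / 1024 }'
```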

Common causes of high disk I/O

Once you’ve confirmed high disk I/O, the next step is to understand its origin. Common causes include:

  • Databases: Intensive read/write operations, poorly optimized queries, or unindexed tables in databases like MySQL or PostgreSQL.
  • Logging: Excessive logging by applications, especially if debug logging is enabled.
  • Backup Processes: Server backups can be very I/O intensive.
  • File Synchronization: Services like rsync or cloud synchronization tools.
  • Swap Usage: If your server is running out of RAM, it may start using swap space on the disk heavily, which is much slower than RAM and causes high I/O. Check the swpd column in vmstat or free -m.
  • Search Indexing: Processes that update search indexes can consume a lot of I/O.
  • Web Server Activity: Serving large files or handling a high volume of requests that require disk access.
  • Malware or Compromised Processes: Malicious software can sometimes cause unusual disk activity.
  • Insufficient RAM: Besides triggering swapping, too little memory shrinks the page cache, so more reads have to hit the disk directly.

Analyzing processes causing high I/O

After identifying a problematic process using tools like iotop, you might need to dig deeper.

Using lsof

The lsof (List Open Files) command can show you which files a specific process has open. This is very helpful for understanding what a process is reading from or writing to.

First, get the Process ID (PID) from iotop or ps aux. Then, use lsof with the -p option:

sudo lsof -p <PID>

Replace <PID> with the actual process ID. Review the output for file paths that might explain the disk activity.
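If lsof isn't installed, the same information is available straight from /proc: a process's open files appear as symlinks under /proc/<PID>/fd. Using the current shell's PID ($$) purely as an example:

```shell
# List the files a process has open via /proc (an lsof alternative).
# $$ (the current shell) is used here only as an example PID;
# inspecting another user's process requires root.
ls -l "/proc/$$/fd"
```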

Checking process-specific logs

Many applications (databases, web servers, etc.) maintain their own logs. If you’ve identified a specific application causing high I/O, check its logs for errors, unusual activity, or clues about what operations it’s performing.

Understanding system logs

System logs can also provide insights, especially if there are hardware issues with the disk itself. Check logs such as:

  • /var/log/syslog (Debian/Ubuntu)
  • /var/log/messages (RHEL/CentOS/AlmaLinux)
  • The output of the dmesg command, or journalctl -k on systemd-based systems (especially for kernel-level messages about disk errors).
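Kernel-level disk problems typically surface as "I/O error" or ata/sd messages. One pattern you can grep for is sketched below; the log lines are fabricated samples so the example is self-contained, and in practice you would pipe real dmesg output into the same grep:

```shell
# Search kernel messages for common disk-error signatures.
# The here-doc stands in for `dmesg` output; normally you would run:
#   dmesg | grep -iE 'i/o error|ata[0-9]+.*(error|failed)|sd[a-z].*(error|offline)'
grep -iE 'i/o error|ata[0-9]+.*(error|failed)|sd[a-z].*(error|offline)' <<'EOF'
[12345.678] usb 1-1: new high-speed USB device
[12346.001] blk_update_request: I/O error, dev sda, sector 123456
[12347.002] ata1.00: failed command: READ FPDMA QUEUED
EOF
```

Only the two disk-related sample lines match; the unrelated USB message is filtered out.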

General tips for reducing disk I/O

Addressing high disk I/O often involves optimizing applications or system configurations:

  • Optimize Applications: For databases, ensure queries are efficient and tables are properly indexed. For web applications, implement caching strategies.
  • Adjust Logging Levels: Reduce the verbosity of application logging unless debugging.
  • Schedule Intensive Tasks: Run I/O-heavy tasks like backups or large data processing jobs during off-peak hours.
  • Increase RAM: Adding more RAM can reduce the need for swapping, which is a major cause of disk I/O.
  • Consider Faster Storage: Upgrading from HDDs to SSDs (especially NVMe SSDs) can dramatically improve I/O performance.
  • Use a Content Delivery Network (CDN): Offload serving static assets (images, CSS, JavaScript) to a CDN to reduce I/O on your origin server.
  • Tune Filesystem Mount Options: Consider using mount options like noatime or relatime for your filesystems to reduce unnecessary write operations that update file access times. Edit your /etc/fstab file carefully if you choose to make these changes.
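As an illustration of the last point, an /etc/fstab entry using noatime might look like the following (the UUID and mount point are placeholders, not values to copy; verify changes with sudo mount -a before rebooting):

```
# Example /etc/fstab entry with noatime (UUID and mount point are placeholders)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /var  ext4  defaults,noatime  0  2
```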

Conclusion

Investigating high disk I/O on your Linux server involves using the right tools to monitor activity, identify offending processes, and then analyze the behavior of those processes. By understanding the common causes and knowing how to look for them, you can effectively diagnose and resolve performance bottlenecks, ensuring your server runs smoothly and efficiently.
