
Monitoring Disk I/O with iostat

iostat monitors server disk I/O and CPU. It helps identify storage bottlenecks using metrics like request size, queue depth, wait times, and device utilization.

Understanding your server’s disk performance is crucial for identifying bottlenecks and ensuring smooth operation. The iostat (input/output statistics) command is a system monitoring tool that collects and displays statistics about your storage devices. It is well suited to diagnosing performance issues on local disks and on remote storage accessed via network file systems like NFS. While primarily focused on I/O, it also reports basic CPU utilization.

Key metrics explained

iostat provides a wealth of data, but a few key metrics are particularly useful for diagnosing disk performance:

  • %user (CPU): Percentage of CPU utilization that occurred while executing at the user level (applications).
  • %system (CPU): Percentage of CPU utilization that occurred while executing at the system level (kernel).
  • %iowait (CPU): Percentage of time the CPU was idle during which the system had outstanding disk I/O requests. High %iowait often indicates a disk bottleneck.
  • %idle (CPU): Percentage of time the CPU was idle and the system did not have an outstanding disk I/O request.
  • r/s (Reads per second): The number of read requests completed per second.
  • w/s (Writes per second): The number of write requests completed per second.
  • rsec/s (Read sectors per second): The number of sectors read from the device per second. iostat counts sectors in 512-byte units.
  • wsec/s (Write sectors per second): The number of sectors written to the device per second, also in 512-byte units.
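
Sector counts are easier to reason about once converted to throughput. The following is a minimal sketch of that arithmetic, assuming the 512-byte sector unit that iostat uses on Linux; the function name sectors_to_kib_per_sec is hypothetical and chosen only for illustration.

```python
# Assumption: iostat reports sectors as 512-byte units on Linux,
# independent of the drive's physical sector size.
SECTOR_BYTES = 512


def sectors_to_kib_per_sec(sectors_per_sec: float) -> float:
    """Convert an rsec/s or wsec/s value to KiB per second."""
    return sectors_per_sec * SECTOR_BYTES / 1024


# A device writing 776 sectors per second moves 388 KiB/s.
print(sectors_to_kib_per_sec(776.00))  # 388.0
```

In other words, halving the sector figure gives the rate in KiB/s.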

Usage example: Advanced disk statistics

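To get a detailed view of your disk I/O, including advanced metrics, use the -x flag. Adding a number after the flag sets the reporting interval in seconds, so iostat -x 1 prints a fresh report every second for a near-real-time view. Note that the first report shows averages since the system booted; each later report covers only the preceding interval.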

Command:

iostat -x 1

Example Output:

Linux 2.6.18-274.3.1.el5xen (mysql.cerberus.infusedsites.com)       01/19/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0.26    0.00    0.00    7.59    0.00   92.15

Device:           rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await   svctm  %util
xvda                0.00    67.00  0.00 30.00     0.00   776.00    25.87     0.47    15.60    9.47  28.40
xvda1               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00
xvda2               0.00    67.00  0.00 30.00     0.00   776.00    25.87     0.47    15.60    9.47  28.40
xvdd                0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00
xvdb                0.00     4.00  0.00  4.00     0.00    64.00    16.00     0.02     5.00    4.00   1.60
xvdb1               0.00     4.00  0.00  4.00     0.00    64.00    16.00     0.02     5.00    4.00   1.60
dm-0                0.00     0.00  0.00 97.00     0.00   776.00     8.00     0.92     9.44    2.93  28.40
dm-1                0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00
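
If you want to process output like the above in a script rather than read it by eye, the device table can be split into per-device values. This is a rough sketch, not an official parser: the function name parse_device_report and the SAMPLE text (three rows copied from the example output) are assumptions for illustration, and the column layout differs between sysstat versions, so the header line is read rather than hard-coded.

```python
SAMPLE = """\
Device:           rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await   svctm  %util
xvda                0.00    67.00  0.00 30.00     0.00   776.00    25.87     0.47    15.60    9.47  28.40
xvdb                0.00     4.00  0.00  4.00     0.00    64.00    16.00     0.02     5.00    4.00   1.60
dm-0                0.00     0.00  0.00 97.00     0.00   776.00     8.00     0.92     9.44    2.93  28.40
"""


def parse_device_report(text):
    """Return {device: {column: value}} from iostat -x style text.

    If the text contains several reports, later rows for a device
    overwrite earlier ones, so the last report wins.
    """
    columns = None
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Header row: remember the metric names that follow "Device:".
            columns = fields[1:]
            continue
        if columns is None or len(fields) != len(columns) + 1:
            # Banner, avg-cpu section, or a row that does not match the header.
            continue
        devices[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return devices


report = parse_device_report(SAMPLE)
print(report["xvda"]["await"])  # 15.6
```

Reading the header first means the same sketch should cope with versions that label columns differently, as long as each device row has one value per column.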

Interpreting the output

While iostat -x provides a lot of data, here are some of the most important values to focus on when analyzing disk performance:

  • avgrq-sz (Average Request Size): This metric shows the average size (in sectors) of the I/O requests made to the device. A high value might indicate that applications are making large, efficient requests, while many small requests could point to inefficient I/O patterns.
  • avgqu-sz (Average Queue Size): The average number of requests outstanding on the device, whether waiting in the queue or being serviced, over the reporting interval. A consistently high avgqu-sz suggests that the disk is struggling to keep up with incoming requests, which often correlates with high %iowait in the avg-cpu line and in tools like top.
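  • await (Average Wait Time): The average time (in milliseconds) that I/O requests take to complete, counting both time spent waiting in the queue and time being serviced by the device. A higher await value indicates slower disk response times. For example, if await is 200 ms and 20 requests are queued (avgqu-sz), working through that queue could take up to roughly 4 seconds in the worst case where requests are serviced one at a time. Because await already includes queueing time, treat this as an upper bound rather than a precise estimate.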
  • %util (Device Utilization): This crucial metric shows how busy the device is. It’s the percentage of time the device was busy handling I/O requests. While brief spikes to 100% might be normal, if %util consistently stays near 100%, it indicates that the drive is saturated and overworked. Note that saturation doesn’t necessarily mean you’re hitting the maximum bandwidth; it often means the drive is receiving more requests than it can efficiently handle.
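
These four values are not independent, and recomputing them from the basic counters is a useful sanity check when reading a report. The sketch below uses approximate relationships (average request size as sectors per request, queue size via Little's law, and utilization as request rate times service time); the function name derived_metrics is hypothetical, and the inputs are the xvda row from the example output.

```python
def derived_metrics(r_s, w_s, rsec_s, wsec_s, await_ms, svctm_ms):
    """Approximate avgrq-sz, avgqu-sz and %util from the other columns."""
    iops = r_s + w_s  # requests completed per second
    return {
        # Sectors transferred per request.
        "avgrq_sz": (rsec_s + wsec_s) / iops if iops else 0.0,
        # Little's law: requests in flight = arrival rate x time in system.
        "avgqu_sz": iops * await_ms / 1000.0,
        # Fraction of each second spent servicing requests, as a percentage.
        "util_pct": iops * svctm_ms / 10.0,
    }


# Values from the xvda row of the example output.
m = derived_metrics(r_s=0.0, w_s=30.0, rsec_s=0.0, wsec_s=776.0,
                    await_ms=15.60, svctm_ms=9.47)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

The results land close to the reported 25.87, 0.47, and 28.40 for xvda, with small differences due to rounding in the displayed columns.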

Conclusion

iostat is an invaluable tool for understanding and troubleshooting disk I/O performance. By focusing on key metrics like avgrq-sz, avgqu-sz, await, and %util, you can quickly identify whether your storage devices are a bottleneck and take appropriate action to optimize your server’s performance. As always, if you need assistance interpreting iostat output or optimizing your server, Liquid Web’s Heroic Support® team is here to help!
