Troubleshoot high iowait issue on Linux

High iowait is usually caused by slow I/O, most often poor disk performance or poor NFS performance. This article looks at how to check both.

Check Disk IO Performance on Linux

On Linux, we can use the iostat command to get performance data for disks. If the issue happened in the past, we can use the sar command to retrieve historical data and analyze what was going on at that time.

We can also use monitoring tools such as Telegraf to collect metrics like disk IOPS, disk I/O bytes, and disk time.

Next, we can use iotop to check which processes are generating the workload on our disks.
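
When iotop is not installed, a rough picture of per-process disk I/O can also be read from /proc/<pid>/io. The following is a minimal Python sketch, assuming a Linux /proc filesystem; the helper names parse_proc_io and top_io_processes are illustrative only. Processes that cannot be read without root are skipped. Unlike iotop, it reports cumulative bytes since each process started, not current rates.

```python
import os


def parse_proc_io(text):
    """Parse the content of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def top_io_processes(limit=5):
    """Return (total_bytes, pid, name) tuples for the heaviest disk users."""
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/io") as fh:
                io = parse_proc_io(fh.read())
            with open(f"/proc/{pid}/comm") as fh:
                name = fh.read().strip()
        except OSError:
            # Process exited, or we lack permission to read its counters.
            continue
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        rows.append((total, int(pid), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__" and os.path.exists("/proc/self/io"):
    for total, pid, name in top_io_processes():
        print(f"{pid:>8} {name:<20} {total / 1024:.0f} kB")
```

Running the sketch twice a few seconds apart and comparing the totals gives an approximate rate per process.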

10 iostat commands on Linux

The iostat command reports CPU and I/O statistics. Common invocations are listed below. The most commonly used form is -xk followed by an interval.

For example: 

iostat -xk 3 /dev/sda

Here 3 means print performance data for disk sda every 3 seconds until we press Ctrl+C.
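
To make the interval idea concrete, here is a minimal Python sketch that takes two samples of /proc/diskstats and derives per-second rates from the difference, similar in spirit to what iostat -x prints. It assumes the standard /proc/diskstats field layout with 512-byte sectors; the function names parse_diskstats and disk_rates are illustrative and not part of any tool.

```python
import os
import time


def parse_diskstats(text):
    """Map device name -> selected counters from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "reads": int(f[3]),            # reads completed
            "sectors_read": int(f[5]),     # 512-byte sectors read
            "writes": int(f[7]),           # writes completed
            "sectors_written": int(f[9]),  # 512-byte sectors written
            "io_ms": int(f[12]),           # milliseconds spent doing I/O
        }
    return stats


def disk_rates(before, after, interval_s):
    """Turn two parsed samples into per-second rates and utilization."""
    rates = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue
        delta = {k: new[k] - old[k] for k in new}
        rates[dev] = {
            "r/s": delta["reads"] / interval_s,
            "w/s": delta["writes"] / interval_s,
            "rkB/s": (delta["sectors_read"] * 512 / 1024) / interval_s,
            "wkB/s": (delta["sectors_written"] * 512 / 1024) / interval_s,
            "util%": min(100.0, 100.0 * delta["io_ms"] / (interval_s * 1000.0)),
        }
    return rates


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    interval = 1.0
    with open("/proc/diskstats") as fh:
        first = parse_diskstats(fh.read())
    time.sleep(interval)
    with open("/proc/diskstats") as fh:
        second = parse_diskstats(fh.read())
    for dev, r in sorted(disk_rates(first, second, interval).items()):
        print(dev, {k: round(v, 1) for k, v in r.items()})
```

A device that stays close to 100 in util% while iowait is high is a likely candidate for the bottleneck.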

iostat: Display CPU and device statistics since boot.
iostat -x: Show extended statistics.
iostat -c: Show only CPU statistics.
iostat -d: Display only the device report.
iostat -xd: Show extended I/O statistics for devices only.
iostat -k: Display statistics in kilobytes per second (use -m for megabytes per second).
iostat -k 2 3: Display CPU and device statistics in kilobytes, 3 reports at 2-second intervals.
iostat -j ID mmcblk0 sda6 -x -m 2 2: Display extended statistics in megabytes using persistent device names, 2 reports at 2-second intervals.
iostat -p: Display statistics for block devices and their partitions.

Here are the tools we can use to check disk performance in Linux.

  1. iostat: The iostat command provides detailed information about disk I/O statistics, including read/write rates, average response time, and utilization. You can run iostat -dx to display the disk I/O statistics for all devices. The output will show metrics such as the number of reads and writes per second, the average read and write sizes, and the average response time.
  2. sar: The sar command is another powerful tool for monitoring system performance, including disk I/O. Running sar -d will display disk I/O statistics, including the average I/O rates, transfer rates, and service time. You can also specify the sampling interval, such as sar -d 1 10 to collect data every 1 second for 10 iterations.
  3. iotop: The iotop command provides real-time monitoring of disk I/O activities. It shows which processes are generating the most disk I/O and provides information such as read/write rates and the percentage of I/O utilization by each process. Run iotop with root privileges to see all processes or use the -u <username> option to filter results by a specific user.
  4. blktrace: The blktrace tool allows you to trace and analyze block layer I/O events. It provides detailed information about disk I/O operations, including I/O size, duration, and latency. You can use blktrace to capture and analyze I/O patterns to identify performance bottlenecks.
  5. Filesystem-specific tools: Different filesystems provide their own tools that can help when investigating disk I/O performance. For example, for the ext4 filesystem, you can use debugfs or dumpe2fs to inspect filesystem metadata and settings that affect I/O behavior.
  6. Monitoring tools: System monitoring tools like top, htop, or graphical tools also help. They display overall system performance metrics, including CPU and memory usage, in a user-friendly interface; top, for example, shows the iowait percentage (wa) in its CPU summary line.

 

These tools provide insights into disk I/O performance and help identify any performance bottlenecks, such as high I/O wait times or disk saturation. Analyzing these metrics can assist in optimizing disk I/O performance, tuning system configurations, or identifying problematic processes.

Check NFS IO Performance on Linux

To check NFS IO performance issues on Linux, you can follow these steps:

Check Network Connectivity: Ensure that the network connectivity between the client and server is stable and without any issues. You can use tools like ping or traceroute to verify network connectivity and identify any network latency or packet loss.

Check NFS Server Load: Monitor the server’s load, CPU usage, memory usage, and disk I/O. High resource utilization on the server can affect NFS performance. Use commands like top or htop to check the server’s resource utilization.

Check NFS Server Logs: Review the NFS server logs for any error messages or warnings related to NFS operations. The logs can provide insights into any performance-related issues or errors occurring on the server. The log file location varies depending on the Linux distribution and NFS implementation.

Measure Network Bandwidth: Use tools like iperf or nload to measure the network bandwidth between the client and server. This can help identify any network bottlenecks or limitations affecting NFS performance.

Check NFS Mount Options: Review the NFS mount options on the client side. Ensure that the options are appropriately set for performance. Common options include rw (read-write access), hard (retry requests indefinitely if the server stops responding), tcp (use TCP protocol for NFS communication), and noac (disable attribute caching, which improves cache coherence but reduces performance).
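
The options actually in effect on the client can be read from /proc/mounts. Below is a minimal Python sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options); the function name nfs_mounts is illustrative.

```python
import os


def nfs_mounts(mounts_text):
    """Map NFS mount point -> dict of mount options from /proc/mounts content."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            # Flags such as "hard" or "rw" carry no value; store True for them.
            options[key] = value or True
        result[fields[1]] = options
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as fh:
        for mount_point, opts in nfs_mounts(fh.read()).items():
            print(mount_point)
            for name in ("vers", "proto", "rsize", "wsize", "hard", "soft", "noac"):
                if name in opts:
                    print(f"  {name} = {opts[name]}")
```

This makes it easy to spot mounts that ended up with small rsize/wsize values or with noac set.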

Test NFS Performance: Run benchmark tests to measure the NFS performance. Tools like dd or iozone can be used to assess the read and write performance of NFS shares, and nfsstat shows the NFS operation counters while the tests run. By running these tests, you can identify any performance bottlenecks and compare the results with expected performance levels.
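
As a rough stand-in for a dd write test, the following Python sketch writes a temporary file into a directory, forces it to storage with fsync, and reports the throughput. It is only a sketch under simple assumptions: sequential writes of zero-filled blocks, a single run, and a hypothetical function name write_throughput_mb_s. To test an NFS share, pass a directory on that mount.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, block_kb=1024):
    """Write about total_mb of zeros into directory and return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = max(1, (total_mb * 1024) // block_kb)
    path = os.path.join(directory, f"throughput_test_{os.getpid()}.tmp")
    start = time.monotonic()
    try:
        with open(path, "wb") as fh:
            for _ in range(blocks):
                fh.write(block)
            fh.flush()
            os.fsync(fh.fileno())  # include the time needed to reach storage
        elapsed = time.monotonic() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    written_mb = blocks * block_kb / 1024
    return written_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    target = tempfile.gettempdir()  # replace with a directory on the NFS mount
    print(f"{target}: {write_throughput_mb_s(target, total_mb=16):.1f} MB/s")
```

Comparing the result for a local directory with the result for the NFS mount gives a quick first impression before running a fuller benchmark such as iozone.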

Tune NFS Parameters: Adjust NFS-related parameters to optimize performance. Parameters like rsize and wsize (NFS read and write sizes), async (asynchronous writes), or actimeo (attribute cache timeout) can be tuned to improve performance. However, it is essential to carefully test and monitor the impact of parameter changes.

Consider Network and Storage Upgrades: If you consistently experience poor NFS performance and have identified the network or storage infrastructure as the bottleneck, consider upgrading the network switches, routers, or storage devices to improve performance.

Poor NFS performance can also cause high iowait. nfsiostat is a commonly used command for checking NFS performance. It reports the per-mount workload, such as operations per second and throughput, along with latency figures (average round-trip time and average execution time).

High iowaits on specific CPU cores

The I/O workload is not evenly distributed across CPU cores. This is expected behavior of the Linux kernel.

When a CPU encounters a task that requires an I/O operation, it sends a request to an I/O controller. The responsibility of the I/O controller is to fulfill this request.

During this time, the task enters the ‘D’ (uninterruptible sleep) state, indicating that it is waiting for the I/O operation to complete. If the CPU has nothing else to run, it remains idle, and that idle time is accounted as IO_WAIT until the I/O controller finishes serving the request.
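
Tasks currently in the ‘D’ state can be listed by reading /proc/<pid>/stat. The following minimal Python sketch assumes a Linux /proc filesystem; the names parse_stat_state and uninterruptible_tasks are illustrative. It looks only at each process's main thread, not at the individual threads under /proc/<pid>/task.

```python
import os


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the content of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks():
    """List (pid, comm, state) for processes currently in the D state."""
    tasks = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as fh:
                info = parse_stat_state(fh.read())
        except (OSError, ValueError, IndexError):
            # Process exited while we were reading, or the line was malformed.
            continue
        if info[2] == "D":
            tasks.append(info)
    return tasks


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    for pid, comm, state in uninterruptible_tasks():
        print(pid, comm, state)
```

Processes that show up repeatedly across several runs are good candidates for a closer look with iotop.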

On a system with multiple processors, the CPU that was running the task accrues iowait while the task waits for its I/O, and the other processors are assigned to other runnable tasks. Seeing iowait concentrated on particular CPUs is therefore expected behavior of the Linux kernel.
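
The per-CPU distribution can be observed directly from /proc/stat. The Python sketch below takes two samples and computes the share of time each CPU spent in iowait between them. It assumes the standard /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal); the function names parse_cpu_times and iowait_percent are illustrative.

```python
import os
import time


def parse_cpu_times(stat_text):
    """Map "cpu", "cpu0", "cpu1", ... -> list of time counters from /proc/stat."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def iowait_percent(before, after):
    """Return iowait as a percentage of elapsed CPU time, per CPU."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        # Sum only the first eight columns: guest time is already
        # included in user time, so adding it would count it twice.
        total = sum(new[:8]) - sum(old[:8])
        if total > 0:
            result[cpu] = 100.0 * (new[4] - old[4]) / total
    return result


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as fh:
        first = parse_cpu_times(fh.read())
    time.sleep(1.0)
    with open("/proc/stat") as fh:
        second = parse_cpu_times(fh.read())
    for cpu, pct in iowait_percent(first, second).items():
        print(f"{cpu:<6} {pct:5.1f}% iowait")
```

On a busy system it is normal for one or two CPUs to show a much higher figure than the aggregate "cpu" line.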
