Using Linux nfsiostat to troubleshoot NFS performance issues
Updated: Jun 7
The Linux command nfsiostat reports the average round-trip time (avg RTT) in milliseconds, which is a good indicator of storage performance issues. The nfsiostat command is available in later versions of the nfs-utils package.
What does nfsiostat output mean?
nfsiostat reads /proc/self/mountstats and reports the input/output performance of the NFS shares mounted on the system.
We usually run the command as nfsiostat 3 /mountpoint, where 3 is the reporting interval in seconds.
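To make the data source more concrete, here is a minimal Python sketch that pulls the per-operation counters out of mountstats-style text. It is not how nfsiostat itself is implemented. The field order (ops, transmissions, timeouts, bytes sent, bytes received, queue time, RTT time, execute time) is an assumption based on the kernel's per-op statistics layout, and the SAMPLE text and its numbers are made up for illustration.

```python
from collections import namedtuple

# Cumulative per-operation counters; the field order is an assumption about
# the "per-op statistics" lines in /proc/self/mountstats.
OpStats = namedtuple(
    "OpStats",
    "ops trans timeouts bytes_sent bytes_recv queue_ms rtt_ms exe_ms",
)

# Made-up sample in the general shape of /proc/self/mountstats.
SAMPLE = """
device proc mounted on /proc with fstype proc
device nfs-server:/export mounted on /mnt/nfs-export with fstype nfs statvers=1.1
    opts:   rw,vers=3
    age:    100
    per-op statistics
            NULL: 0 0 0 0 0 0 0 0
            READ: 0 0 0 0 0 0 0 0
           WRITE: 1000 1000 0 80000000 136000 24000 2900 27000
"""

def parse_mountstats(text):
    """Return {mount_point: {op_name: OpStats}} for the NFS mounts in text."""
    mounts = {}
    current = None       # per-op dict of the NFS mount being parsed, if any
    in_per_op = False    # True once the "per-op statistics" header was seen
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "device":
            # "device <src> mounted on <mount point> with fstype <type> ..."
            in_per_op = False
            current = None
            if len(words) >= 8 and words[6] == "fstype" and words[7].startswith("nfs"):
                current = mounts.setdefault(words[4], {})
        elif current is not None and words[:2] == ["per-op", "statistics"]:
            in_per_op = True
        elif current is not None and in_per_op and words[0].endswith(":") and len(words) >= 9:
            # Keep the first eight counters; newer kernels may append more.
            current[words[0].rstrip(":")] = OpStats(*(int(w) for w in words[1:9]))
    return mounts

stats = parse_mountstats(SAMPLE)
write = stats["/mnt/nfs-export"]["WRITE"]
print(f"WRITE avg RTT since mount: {write.rtt_ms / write.ops:.3f} ms")
```

On a real Linux host the same function could be fed the contents of /proc/self/mountstats instead of SAMPLE.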
# nfsiostat 3
nfs-server:/export mounted on /mnt/nfs-export:

   op/s         rpc bklog
1019.40              0.00
read:      ops/s        kB/s      kB/op    retrans   avg RTT (ms)   avg exe (ms)
           0.000       0.000      0.000   0 (0.0%)          0.000          0.000
write:     ops/s        kB/s      kB/op    retrans   avg RTT (ms)   avg exe (ms)
        1019.200   77549.706     76.089   0 (0.0%)          2.895         27.149
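The counters in /proc/self/mountstats are cumulative, so per-interval figures like the ones above have to come from the difference between two snapshots. The sketch below shows one way to do that calculation. It assumes each snapshot is a plain dict with cumulative 'ops', 'rtt_ms' and 'exe_ms' values; these names and the example numbers are hypothetical, and this is not nfsiostat's actual code.

```python
def interval_averages(prev, curr, interval_s):
    """Derive ops/s, avg RTT and avg exe for one interval.

    prev and curr are dicts of cumulative counters ('ops', 'rtt_ms',
    'exe_ms') sampled interval_s seconds apart.
    """
    ops = curr["ops"] - prev["ops"]
    if ops <= 0:
        # No operations completed in this interval, so there is nothing to average.
        return {"ops_per_s": 0.0, "avg_rtt_ms": 0.0, "avg_exe_ms": 0.0}
    return {
        "ops_per_s": ops / interval_s,
        "avg_rtt_ms": (curr["rtt_ms"] - prev["rtt_ms"]) / ops,
        "avg_exe_ms": (curr["exe_ms"] - prev["exe_ms"]) / ops,
    }

# Hypothetical snapshots taken 3 seconds apart.
before = {"ops": 1000, "rtt_ms": 3000, "exe_ms": 27000}
after = {"ops": 4000, "rtt_ms": 11700, "exe_ms": 108000}
print(interval_averages(before, after, 3))
```

Dividing the time deltas by the operation delta gives an average per operation for that interval only, so a slow period earlier in the mount's lifetime does not skew the current reading.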
Understanding nfsiostat command output
The output above gives us the main NFS performance metrics: IOPS, bandwidth, and latency. Each field means the following.
op/s: The number of operations per second.
rpc bklog: The length of the RPC backlog queue.
kB/s: The number of kB read or written per second.
kB/op: The number of kB read or written per operation.
retrans: The number of retransmissions.
avg RTT (ms): The time from when the client's kernel sends the RPC request until it receives the reply.
avg exe (ms): The time from when the NFS client issues the RPC request to its kernel until the RPC request is completed. This includes the RTT above.
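As a quick check of how these fields relate, the following sketch redoes the arithmetic for the write line of the sample output above. The only inputs are the numbers from that output. Reading the difference between avg exe and avg RTT as time spent on the client side is an interpretation of the definitions above, not something nfsiostat prints.

```python
# Values copied from the write line of the sample nfsiostat output.
ops_per_s = 1019.200
kb_per_s = 77549.706
avg_rtt_ms = 2.895
avg_exe_ms = 27.149

# kB/op is the bandwidth divided by the operation rate.
kb_per_op = kb_per_s / ops_per_s

# avg exe includes avg RTT, so the remainder is time the request spent
# on the client outside the network and server round trip.
client_side_ms = avg_exe_ms - avg_rtt_ms

print(f"kB/op             = {kb_per_op:.3f}")
print(f"avg exe - avg RTT = {client_side_ms:.3f} ms")
```

In this sample the round trip accounts for only about 2.9 ms of the 27.1 ms total, which suggests that most of the time per write is spent on the client rather than on the network or the storage.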
How to troubleshoot NFS performance?
Avg RTT is an important metric for NFS performance. It tells us the average NFS latency as seen from the host side: avg RTT = network latency + NFS storage latency.
For network latency, we can use the tcpdump command to capture network packets and then analyze them for problems such as packet drops and TCP retransmissions.
For storage latency, we can engage our storage vendor to check the response time from the storage side.
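The relation avg RTT = network latency + NFS storage latency can also be used for a rough first estimate before involving anyone else. The sketch below assumes we have a separately measured network round-trip time to the NFS server; the 0.3 ms figure is purely hypothetical, and the result is only an approximation.

```python
def estimate_storage_latency_ms(avg_rtt_ms, network_rtt_ms):
    """Estimate the storage share of avg RTT (avg RTT = network + storage)."""
    # Clamp at zero in case the measured network RTT exceeds avg RTT.
    return max(avg_rtt_ms - network_rtt_ms, 0.0)

avg_rtt_ms = 2.895      # write avg RTT from the sample nfsiostat output
network_rtt_ms = 0.3    # hypothetical measured network round-trip time
storage_ms = estimate_storage_latency_ms(avg_rtt_ms, network_rtt_ms)
print(f"estimated storage latency: {storage_ms:.3f} ms")
```

If the estimate is dominated by the storage share, the storage side is the more likely place to look; if the network share is large, the packet capture is the better starting point.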
We can use Telegraf with a simple script to collect this performance data and display it in Grafana, which makes it easy to see how the NAS performs from the system side.
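One possible shape for such a script is sketched below. It formats a set of metrics as an InfluxDB line-protocol record, which Telegraf's exec input can read when its data_format is set to "influx". The measurement name nfsiostat, the tag names, and the field names are hypothetical choices, and the values are taken from the sample output rather than collected live.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces in a line-protocol tag value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value

def to_line_protocol(measurement, tags, fields):
    """Build one line-protocol record without a timestamp.

    Telegraf assigns the collection time when no timestamp is given.
    """
    tag_str = ",".join(f"{k}={escape_tag(v)}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str}"

# Hypothetical record built from the write line of the sample output.
line = to_line_protocol(
    "nfsiostat",
    {"mount": "/mnt/nfs-export", "op": "write"},
    {"ops_per_s": 1019.2, "avg_rtt_ms": 2.895, "avg_exe_ms": 27.149},
)
print(line)
```

A real collector would replace the hard-coded values with figures read from nfsiostat or /proc/self/mountstats on each run, and Grafana would then chart the stored avg_rtt_ms and avg_exe_ms fields per mount.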