I was recently tasked with solving a tricky disk performance issue on a Linux server. At first, I thought it would be a straightforward task. After all, I had been working in IT for several years and had seen my fair share of disk performance issues.
But as soon as I started looking into it, I realized that this wasn’t going to be an easy fix. The system had five disks, and disk performance seemed to vary from time to time.
Check disk status in Linux
I started by trying some of the more obvious checks: disk space, hardware status, partitions, RAID status, and so on. None of them turned up any problems.
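A minimal sketch of what such first-pass checks can look like, assuming a system with `df`, `lsblk`, and the smartmontools package installed. The function names `full_filesystems` and `first_pass_checks`, the 90% threshold, and the default device `/dev/sda` are illustrative assumptions, not details from the original incident.

```shell
# Print mount point and usage for filesystems at or above a threshold
# (percent). Reads `df -P` output on stdin, so it also works on saved output.
# Note: mount points containing spaces are not handled by this sketch.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)            # "95%" -> "95"
            if (use + 0 >= limit) print $6, $5
        }
    '
}

# Run the basic checks in one go. The device name is a placeholder.
first_pass_checks() {
    dev="${1:-/dev/sda}"
    df -P | full_filesystems 90          # filesystems that are nearly full
    lsblk                                # disks, partitions and mount points
    if [ -r /proc/mdstat ]; then
        cat /proc/mdstat                 # software RAID state, if md is in use
    fi
    sudo smartctl -H "$dev"              # SMART overall health assessment
}
```

Calling `first_pass_checks /dev/sdb` would run the same checks against a different disk. Hardware RAID controllers usually need their vendor's own tool instead of `/proc/mdstat`.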
At this point, I was beginning to lose hope. But then I remembered something my mentor had said when I first started in IT: “When all else fails, go back to basics.”
I decided to start from the very beginning. I checked the physical connections of the disks, making sure all cables were securely connected and in the right places.
Check disk performance in Linux
Once that was done, I moved on to checking disk performance. I monitored each disk with the iostat command to see if I could identify where performance was suffering. After a few hours, I found one disk that had abnormally high latency.
After carefully examining the disk, I was able to pinpoint a specific area where data transfers were taking much longer than usual. After further investigation, I realized that this issue was caused by a high workload on the disk.
During the problem window, IOPS on that disk had increased threefold and disk utilization was at 100%.
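As a rough sketch of this kind of monitoring, the helper below filters extended `iostat` output for saturated devices. It assumes the sysstat package is installed. The function name `busy_disks` and the 90% threshold are illustrative choices. Column layout differs between sysstat versions, so the `%util` column is located from the header line rather than by position.

```shell
# Print devices whose %util is at or above a threshold (percent),
# reading `iostat -dx` output on stdin.
busy_disks() {
    awk -v limit="${1:-90}" '
        # Header line: remember which column holds %util.
        /^Device/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data line: print device name and utilization if over the limit.
        col && NF >= col && $col + 0 >= limit { print $1, $col }
    '
}

# Example: 12 extended per-device reports, 5 seconds apart. -y skips the
# first since-boot report, and LC_ALL=C keeps a dot as decimal separator.
#   LC_ALL=C iostat -dxy 5 12 | busy_disks 90
```

In the full `iostat -dx` output, the request rate columns (`r/s`, `w/s`) give IOPS, and the await columns give per-request latency. Those are the figures to compare between a normal period and a slow one.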
Check disk workload by process in Linux
Then I used the iotop command to check which process was generating so much load on this disk. It was a database process.
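A small sketch of how `iotop` can be run non-interactively for this kind of check, assuming it is installed and that root access is available through `sudo`. The wrapper name `io_snapshot` and its default delay and sample count are illustrative assumptions.

```shell
# Take a few batch-mode samples of per-process disk I/O with iotop.
#   -o  show only processes that are actually doing I/O
#   -b  batch (non-interactive) mode, suitable for logging
#   -P  show processes rather than individual threads
#   -k  report in kilobytes
#   -d  seconds between samples
#   -n  number of samples to take
io_snapshot() {
    delay="${1:-5}"
    samples="${2:-3}"
    sudo iotop -o -b -P -k -d "$delay" -n "$samples"
}
```

For example, `io_snapshot 10 6 > iotop.log` would record one minute of samples for later review. Note that iotop reports I/O per process across the whole system rather than per disk, so it helps to capture samples while the disk in question is the one under load.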
I checked this with the database team. It turned out that they had recently added a SQL query that triggered a full table scan.
They fixed the query by adding an index on the database side. After that, the latency dropped significantly and the overall performance of the disk improved drastically. Problem solved!
It took me a few hours to fix the issue, but in the end it was worth it. I learned an important lesson that day: Sometimes, if you take the time to go back to basics and examine things from a different angle, it can make all the difference.
Now, I can proudly say that I have a deep understanding of disk performance in Linux and the skill set to tackle any issue that might arise. And it all started with that one disk performance issue.
Conclusion
The moral of this story is simple: don’t give up. With the right knowledge and approach, any problem can be solved, even those that seem impossible at first. This was true for me when I faced my disk performance issue in Linux — and it can be true for you, too. Good luck!