In this two-part series, we outline the steps to take when investigating where server load originates and what is causing your server to become overloaded. High load issues often crop up on servers that host multiple websites. To find out how and why this occurs, read on.
What is Server Load?
Server load is a measure of the amount of work a server is performing. Load averages represent the average system load over a period of time, and servers calculate them as an exponentially damped (weighted) moving average of the load number. The three load average values cover the past one, five, and fifteen minutes of system operation. On a single-CPU system, the load average can be read as a fraction of system utilization over that period, so a load of 1.0 means the CPU was roughly fully occupied. If you have multiple CPUs, divide the load average by the number of processors to get a comparable percentage. To find the number of processors on the server, run the following command.
root@host [~]# grep processor /proc/cpuinfo | wc -l
4
root@host [~]#
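The division described above can be sketched as a small shell helper. This is only an illustration: the function name `load_percent` and the sample load values are assumptions, not part of the original article.

```shell
# Express a load average as a percentage of total CPU capacity.
# Usage: load_percent LOAD CORES   (load_percent is a hypothetical helper name)
load_percent() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", (load / cores) * 100 }'
}

# Sample values for the 4-processor server shown above.
load_percent 2.0 4    # prints 50
load_percent 8.0 4    # prints 200

# On a live Linux host, the inputs could come from the system itself:
#   load_percent "$(cut -d' ' -f1 /proc/loadavg)" "$(grep -c processor /proc/cpuinfo)"
```

A result above 100 suggests there is more runnable work than the processors can handle at once.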
Addressing Load Issues
The first step in addressing any load issue on a server is having a benchmark in place that captures the server's resting performance. While this may seem like an inopportune time to run a benchmark, we need to establish a baseline to see how well our adjustments are working. Often we see performance improvements using proper configuration and caching. We recommend running this benchmark when the server is at its least busy point. The main command used for benchmarking is shown below.
root@host:~ # ab -lt 10 -c3 -H "Accept-Encoding: gzip,deflate,br" "http://www.domain.com/"
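The apachebench (or ab) command is used here to provide a standard we can judge performance against. In this example, -t 10 limits the test to ten seconds, -c3 sends three requests at a time, -l tolerates responses whose length varies between requests, and -H adds the Accept-Encoding header so the server can return compressed content.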
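Once a baseline exists, later runs can be compared against it. The sketch below computes the percentage change in requests per second between two runs; the helper name `rps_change` and the figures are hypothetical and only illustrate the arithmetic.

```shell
# Percentage change in requests per second between a baseline run and a later run.
# Usage: rps_change BASELINE_RPS NEW_RPS   (rps_change is a hypothetical helper name)
rps_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Hypothetical figures: 120 requests/sec at baseline, 150 after tuning.
rps_change 120 150    # prints +25.0%

# ab reports this figure on its "Requests per second:" line, so the value
# could be captured from a run like this (not executed here):
#   ab -lt 10 -c3 "http://www.domain.com/" | awk '/^Requests per second/ { print $4 }'
```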
Now that we have completed our benchmark, the next item we want to look at is how many processes are waiting for CPU resources. This measurement is expressed as an average over a period of time, which the top command reports as the one-, five-, and fifteen-minute load averages. The term "high load" is relative to the amount of CPU resources the server has available.
An experienced Linux admin had this to say.
“The server load should not be higher than the total number of cores the server has. If the server has eight cores, and it is running at a load of eight, all eight cores are working at 100%.”
A dedicated server with an octa-core processor and a one-minute load average of 8.0 does not necessarily have what would be defined as a "high load." However, a VPS with a quad-core processor and a one-minute load average of 8.0 is likely experiencing a "high load," because there is twice as much work queued as the cores can handle (200% of capacity).
We must also take into account the server's responsiveness. If a dual-core Managed Cloud server has a one-minute load average of 4.0, keeps up with incoming requests, and is responsive, the server is most likely not experiencing "high load."
We can see the one-minute, five-minute, and fifteen-minute load averages using the w command or the top command.
root@host [~]# w
 17:17:40 up 6 days,  8:13,  0 users,  load average: 0.00, 0.03, 0.07
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root@host [~]#
root@host [~]# top
top - 17:18:30 up 6 days,  8:13,  0 users,  load average: 0.00, 0.02, 0.06
Tasks: 159 total,   1 running, 157 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.4 st
KiB Mem :  3766984 total,   195752 free,  1307504 used,  2263728 buff/cache
KiB Swap:  2047996 total,   763476 free,  1284520 used.  2004704 avail Mem
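Reading the three averages together also hints at the direction load is moving: a one-minute figure above the fifteen-minute figure suggests load is climbing, and the reverse suggests it is easing. A minimal sketch of that comparison follows; the helper name `load_trend` is an assumption made for illustration.

```shell
# Compare the 1-minute and 15-minute load averages to describe the trend.
# Usage: load_trend ONE_MIN FIVE_MIN FIFTEEN_MIN   (hypothetical helper name)
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        if (one + 0 > fifteen + 0)      print "rising"
        else if (one + 0 < fifteen + 0) print "falling"
        else                            print "steady"
    }'
}

# Values from the w output above: 0.00, 0.03, 0.07
load_trend 0.00 0.03 0.07    # prints falling

# On a live Linux host the three values can be read from /proc/loadavg:
#   load_trend $(cut -d' ' -f1-3 /proc/loadavg)
```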
Generally, server load is caused by one or more services or their related applications. Here are four main resources that ordinarily cause an overloaded server:
- CPU
- RAM
- Disk I/O
- Network
The type of load a server experiences is found using one of several commands. The top command should be our first choice when evaluating server load as it prints a system summary using the three load averages, system task stats, system CPU stats, system RAM stats, and system swap stats.
While the top command is running, we can press the "1" key, which shows the CPU stats for every CPU core on the system. These stats are broken down into eight percentages, each showing the share of time that CPU spent in a given state.
top - 17:21:36 up 6 days,  8:16,  0 users,  load average: 0.09, 0.08, 0.08
Tasks: 158 total,   1 running, 156 sleeping,   0 stopped,   1 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.3 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.7 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.7 st
%Cpu3  :  0.3 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem :  3766984 total,   346856 free,  1283940 used,  2136188 buff/cache
KiB Swap:  2047996 total,   761516 free,  1286480 used.  2028524 avail Mem
The table below defines the identifiers above.
| Task Label | Task Description |
|---|---|
| us, user | The time running user processes |
| sy, system | The time running kernel processes |
| ni, nice | The time running niced user processes |
| id, idle | Time spent in the kernel idle handler |
| wa, IO-wait | Time waiting for I/O completion |
| hi | The time spent servicing hardware interrupts |
| si | The time spent servicing software interrupts |
| st | The time stolen from the VM by the hypervisor |
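The identifiers in the table can be used to spot which state dominates a CPU line. The sketch below reads one `%Cpu` line from top and prints the largest non-idle field. The function name `busiest_cpu_field` is hypothetical, and the parsing assumes the comma-separated layout shown in the output above.

```shell
# Print the largest non-idle field from a single "%Cpu" line of top output.
# busiest_cpu_field is a hypothetical helper; it reads the line on stdin.
busiest_cpu_field() {
    # Drop the "%Cpu...:" prefix, put each "value label" pair on its own
    # line, then keep the pair with the highest value (ignoring "id").
    sed 's/^[^:]*://' | tr ',' '\n' | awk '
        ($2 != "id") && ($1 + 0 > max) { max = $1 + 0; label = $2 }
        END { if (label != "") print label, max; else print "idle" }
    '
}

# The summary line from the earlier top output:
printf '%s\n' '%Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.4 st' \
    | busiest_cpu_field    # prints: st 0.4
```

A high us or sy value points at busy processes, while a high wa value points at disk I/O and a high st value points at contention on the hypervisor.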
CPU usage is measured as the percentage of time the CPU spends processing actual workloads. If the largest share of CPU time is spent on user processes or system processes, the server is likely tasked with too many resource-intensive processes. Here are a few examples of processes that cause an overload of a CPU(s) on a server:
RAM (Random Access Memory) usage is measured at the server level using the free -m command. This command shows us the total memory, used memory, free memory, shared memory, buffers/cache memory, and finally, available memory.
root@host [~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3678        1415         403         125        1859        1869
Swap:          1999        1271         728
root@host [~]#
The used memory accounts for the memory used by all running processes, including the kernel. In the output above, it is calculated as total minus free minus buff/cache, so it does not include the buffers/cache memory. The available memory estimates how much memory is available for starting new processes without swapping.
Memory usage by a process is viewed using the ps command. The %MEM column is the ratio of the process's Resident Set Size to the machine's physical memory, expressed as a percentage. The Resident Set Size (or RSS) is the portion of the process's memory that is held in physical RAM. To put it another way, %MEM is the percentage of physical RAM used by the process.
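The %MEM ratio described above is simple to reproduce by hand. The following sketch assumes both figures are in KiB; the helper name `mem_percent` and the process size are hypothetical, while the total comes from the top output shown earlier.

```shell
# %MEM as described above: resident set size divided by physical memory.
# Usage: mem_percent RSS_KIB TOTAL_KIB   (mem_percent is a hypothetical helper name)
mem_percent() {
    awk -v rss="$1" -v total="$2" 'BEGIN { printf "%.1f\n", (rss / total) * 100 }'
}

# Hypothetical process holding 376698 KiB on the 3766984 KiB server shown earlier.
mem_percent 376698 3766984    # prints 10.0

# With the procps version of ps, the largest consumers could be listed like
# this (not run here):
#   ps -eo pid,rss,%mem,comm --sort=-%mem | head
```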
Here are a few examples of things that can max out the RAM of a server:
- PHP/Apache - If PHP's worst-case memory use (memory_limit multiplied by PHP-FPM's pm.max_children, or by mod_fcgid's FcgidMaxProcesses) exceeds the amount of RAM on the server, a crash is possible due to memory exhaustion.
- MySQL - If MySQL's maximum configured memory limit exceeds the amount of RAM available on a server, a crash is possible due to memory exhaustion.
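The PHP case above comes down to one multiplication and one comparison. The sketch below works through it with hypothetical settings (a 128 MB memory_limit and 50 PHP-FPM children) against the 3678 MB of RAM reported by free -m earlier. The function names are assumptions made for illustration.

```shell
# Worst-case PHP memory use: memory_limit multiplied by the number of workers.
# Usage: fpm_worst_case_mb MEMORY_LIMIT_MB MAX_CHILDREN   (hypothetical helper)
fpm_worst_case_mb() {
    echo $(( $1 * $2 ))
}

# Succeeds when the worst case fits in RAM.
# Usage: fits_in_ram NEEDED_MB RAM_MB   (hypothetical helper)
fits_in_ram() {
    [ "$1" -le "$2" ]
}

# Hypothetical settings: memory_limit = 128M, pm.max_children = 50.
needed=$(fpm_worst_case_mb 128 50)
if fits_in_ram "$needed" 3678; then
    echo "worst case of ${needed} MB fits in 3678 MB of RAM"
else
    echo "worst case of ${needed} MB exceeds 3678 MB of RAM"
fi
# prints: worst case of 6400 MB exceeds 3678 MB of RAM
```

In practice not every worker reaches its limit at the same time, so this is an upper bound rather than a prediction.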
Disk I/O refers to the transfer of data to and from a physical disk. When we read data from files on a disk, the CPU must wait for the read to complete. The same applies to writing. Disk I/O can contribute to an increase in load in multiple ways. Here are a few examples of things that can saturate the disk I/O of a server:
- Running out of RAM and swapping memory to disk.
- MySQL queries writing temporary tables to disk.
- A substantial amount of email being sent from the server.
- Large backups running over a long period of time.
The iotop command shows the actual amount of disk I/O being utilized.
root@host [~]# iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 22
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
    6 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
    9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
   10 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [lru-add-drain]
   11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
   12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
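Swapping, the first cause listed above, can also be checked directly. On Linux, /proc/vmstat exposes a cumulative pswpout counter of pages written to swap; if it grows between two samples, the server is actively swapping. The sketch below is one way to read it, and the function names are hypothetical.

```shell
# Print the cumulative count of pages swapped out, read from vmstat-style
# "name value" lines on stdin (the layout of /proc/vmstat on Linux).
swapped_out_pages() {
    awk '$1 == "pswpout" { print $2 }'
}

# Pages swapped out during an interval (Linux only; samples /proc/vmstat twice).
# Usage: swap_out_delta SECONDS   (hypothetical helper name)
swap_out_delta() {
    before=$(swapped_out_pages < /proc/vmstat)
    sleep "$1"
    after=$(swapped_out_pages < /proc/vmstat)
    echo $(( ${after:-0} - ${before:-0} ))
}

# Example on a live host (not run here):
#   swap_out_delta 5
```

A result that stays at zero means no pages were written to swap during the interval, even if swap space is still occupied from earlier activity.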
Network performance can affect the overall load on the server. If a significant amount of traffic is received, network bottlenecks can occur. Typically, this happens when communication between devices lacks the necessary bandwidth or processing power to complete a task quickly. During high traffic periods, DDoS attacks, or Slowloris attacks, the network throughput needed to fulfill requests to the server can exceed capacity, or connections can be held open. When this occurs, server load can increase. An optimized website usually does not encounter this type of load.
Using a command line tool like iftop provides a solid overview of network usage.
root@host [~]# iftop
interface: eth0
IP address is: 188.8.131.52
MAC address is: 52:54:00:91:87:6f
                 12.5Kb            25.0Kb            37.5Kb            50.0Kb            62.5Kb
└─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────
host.domain.com  => 430461.cloudwaysapps.com                    40.1Kb  15.1Kb  12.5Kb
                 <=                                             7.12Kb  1.99Kb  1.65Kb
host.domain.com  => dc2-176.vpn.domain.com                      1.39Kb  6.24Kb  5.46Kb
                 <=                                               160b  1.49Kb  1.29Kb
host.domain.com  => 184.108.40.206                                  0b  1.14Kb    973b
                 <=                                                 0b  5.34Kb  4.45Kb
host.domain.com  => lvps87-230-15-219.dedicated.hosteurope.de       0b  1.23Kb  1.02Kb
                 <=                                                 0b  3.77Kb  3.14Kb
host.domain.com  => 10.10.10.10                                     0b  1.50Kb  1.33Kb
                 <=                                                 0b  2.78Kb  2.44Kb
host.domain.com  => 10.30.9.124                                     0b  1.40Kb  1.17Kb
                 <=                                                 0b    920b    767b
host.domain.com  => 10.30.9.125                                     0b  1.40Kb  1.16Kb
                 <=                                                 0b    920b    767b
host.domain.com  => 10.30.9.122                                 2.91Kb    640b    656b
                 <=                                             1.50Kb    353b    473b
host.domain.com  => 10.30.9.138                                     0b    468b    427b
                 <=                                                 0b    310b    295b
──────────────────────────────────────────────────────────────────────────────────────────
TX:             cum:   48.8KB   peak:   46.7Kb   rates:   46.7Kb  30.3Kb  32.5Kb
RX:                    24.6KB           56.6Kb            11.3Kb  18.8Kb  16.4Kb
TOTAL:                 73.3KB           81.9Kb
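Where iftop is not installed, a rough throughput figure can be derived from two readings of an interface's byte counter. The sketch below shows only the arithmetic. The helper name `kbits_per_sec` is hypothetical, the sample numbers are made up, and a kilobit is taken as 1000 bits, which may not match the units iftop displays.

```shell
# Average throughput between two byte-counter readings, in kilobits per second.
# Usage: kbits_per_sec BYTES_BEFORE BYTES_AFTER SECONDS   (hypothetical helper)
kbits_per_sec() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) * 8 / 1000 / s }'
}

# Made-up readings: 12500 bytes received over 2 seconds.
kbits_per_sec 0 12500 2    # prints 50.0

# On Linux the counters can be read from sysfs, for example (not run here):
#   cat /sys/class/net/eth0/statistics/rx_bytes
```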
This ends part one of our two-part article series on investigating server load. Server load is the measure of work that a server experiences. When issues arise with CPU usage, RAM deficits, increased Disk I/O, or network congestion, load issues will manifest. In part two of this series, we will explore the means and methods to locate and address load issues on the server.
We pride ourselves on being The Most Helpful Humans In Hosting™!
Our Support Teams are filled with experienced Linux technicians and talented system administrators who have intimate knowledge of multiple web hosting technologies, especially those discussed in this article. Should you have any questions regarding this information, we are always available to answer any inquiries related to this article, 24 hours a day, 7 days a week, 365 days a year.
If you are a Fully Managed VPS server, Cloud Dedicated, VMWare Private Cloud, Private Parent server, Managed Cloud Servers, or Dedicated server owner and you are uncomfortable with performing any of the steps outlined, we can be reached via phone at 800.580.4985, chat, or support ticket to assist you with this process.
Our Sales and Support teams are available 24 hours by phone or e-mail to assist.