How to Investigate Server Load: Part 1

Introduction

In this two-part series, we outline the steps to take when investigating where server load originates and what causes a server to become overloaded. High load issues often crop up on servers that host multiple websites. To find out how and why this occurs, read on.

What is Server Load?

Server load is a measure of the amount of work a server is doing. On Linux, the load average is the average number of processes that are running on a CPU, waiting for a CPU, or blocked in uninterruptible sleep (usually waiting on disk I/O). The kernel calculates load averages as exponentially damped (weighted) moving averages of these numbers. The three values refer to the past one, five, and fifteen minutes of system operation. On a single-CPU system, a load average of 1.00 means the CPU was fully occupied on average during that period. If you have multiple CPUs, divide the load average by the number of processors to get a comparable figure. To find the number of processors on the server, run the following command.

root@host [~]# grep processor /proc/cpuinfo | wc -l
4
root@host [~]#
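The division described above is easy to script. This is a minimal sketch, assuming a Bash shell with awk and coreutils' nproc available; the function name per_core_load is our own, hypothetical name. Note that nproc reports the processing units available to the current process, which can be fewer than the count from /proc/cpuinfo if CPU affinity restricts it.

```shell
#!/usr/bin/env bash
# Print a load average divided by the number of CPU cores.
# Usage: per_core_load [LOAD] [CORES]
# With no arguments, it reads the one-minute load average from /proc/loadavg
# and the number of available processing units from nproc.
per_core_load() {
  local load="${1:-$(cut -d ' ' -f 1 /proc/loadavg)}"
  local cores="${2:-$(nproc)}"
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

# Example: a quad-core server with a one-minute load average of 8.0
per_core_load 8.0 4   # prints 2.00
```

A result near 1.00 means the cores were, on average, fully occupied; values well above 1.00 mean processes were queuing for CPU time.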

Addressing Load Issues

The first step in addressing any load issue on a server is to benchmark it so we know its resting performance. While a load problem may seem like an inopportune time to run a benchmark, we need a baseline to measure how well our adjustments are working, and proper configuration and caching often bring noticeable performance improvements. We recommend running this benchmark when the server is at its least busy. The main command we use for benchmarking is shown below.

root@host [~]# ab -l -t 10 -c 3 -H "Accept-Encoding: gzip,deflate,br" "http://www.domain.com/"

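ApacheBench (ab) gives us a standard to judge performance against. In this command, -t 10 runs the test for ten seconds, -c 3 sends three requests at a time, -l stops ab from reporting responses of varying length as errors (useful for dynamic pages), and -H adds an Accept-Encoding header so the server can return compressed responses as a browser would. Replace www.domain.com with your own site.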
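To make later comparisons easy, it can help to save each ab run and pull out its headline throughput figure. This is a minimal sketch, assuming Bash and awk; ab_rps and the baseline filename are hypothetical names. It relies on the line in ab's report that begins with "Requests per second:".

```shell
#!/usr/bin/env bash
# Extract the mean "Requests per second" figure from ApacheBench output on stdin.
ab_rps() {
  awk '/^Requests per second:/ { print $4 }'
}

# Hypothetical usage: keep the full report as a dated baseline file and print
# only the throughput figure. www.domain.com is the placeholder used above.
# ab -l -t 10 -c 3 -H "Accept-Encoding: gzip,deflate,br" "http://www.domain.com/" \
#   | tee "ab-baseline-$(date +%F-%H%M).txt" | ab_rps
```

Rerunning the same command after each configuration change gives a single number to compare against the saved baseline.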

Load Average

Now that we have completed our benchmark, the next item to look at is how many processes are running or waiting for CPU resources. This measurement, the load average, is expressed as an average over a period of time, and the top command displays it alongside other statistics, refreshing it every few seconds. The term “high load” is relative to the amount of CPU resources the server has available.

An experienced Linux admin had this to say.

“The server load should not be higher than the total number of cores the server has. If the server has eight cores, and it is running at a load of eight, all eight cores are working at 100%.”

A dedicated server with an octa-core processor and a one-minute load average of 8.0 does not necessarily have what would be defined as a “high load.” However, a VPS with a quad-core processor and a one-minute load average of 8.0 is likely experiencing “high load,” because on average twice as many processes are running or waiting as there are cores to serve them, so the CPUs are effectively at 200% of capacity.

We must also take the server’s responsiveness into account. If a dual-core Managed Cloud server has a one-minute load average of 4.0 but keeps up with incoming requests and stays responsive, it is most likely not experiencing “high load.”

Note: The best time to investigate server load is while it is happening, because that is when you get the clearest picture of the issue.

We can see the one-minute, five-minute, and fifteen-minute load averages using the w command or the top command.

root@host [~]# w
 17:17:40 up 6 days,  8:13,  0 users,  load average: 0.00, 0.03, 0.07

USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root@host [~]#

root@host [~]# top
top - 17:18:30 up 6 days,  8:13,  0 users,  load average: 0.00, 0.02, 0.06
Tasks: 159 total,   1 running, 157 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.4 st
KiB Mem :  3766984 total,   195752 free,  1307504 used,  2263728 buff/cache
KiB Swap:  2047996 total,   763476 free,  1284520 used.  2004704 avail Mem
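Because load is easiest to diagnose while it is happening, recording the load averages at regular intervals lets you look back at exactly when a spike started. This is a minimal sketch, assuming Bash; log_load and the log path are hypothetical names, and the same function could be run from cron instead of a loop.

```shell
#!/usr/bin/env bash
# Append a timestamped copy of the kernel's load figures to a log file.
# /proc/loadavg holds the 1-, 5- and 15-minute load averages, then
# running/total scheduling entities, then the most recently assigned PID.
# Usage: log_load LOGFILE [SOURCE]   (SOURCE defaults to /proc/loadavg)
log_load() {
  local logfile="$1" src="${2:-/proc/loadavg}"
  printf '%s %s\n' "$(date '+%F %T')" "$(cat "$src")" >> "$logfile"
}

# Hypothetical usage: sample every 60 seconds until interrupted.
# while true; do log_load /root/load.log; sleep 60; done
```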

Load Types

Generally, server load is caused by one or more services or their related applications. Four main resources are ordinarily involved when a server becomes overloaded:

CPU
RAM
Disk I/O
Networking

CPU

Several commands can show which type of load a server is experiencing. The top command should be our first choice, as it prints a system summary that includes the three load averages along with task, CPU, RAM, and swap statistics.

While top is running, press the “1” key to show the statistics for each CPU core on the system. Each core’s line is broken down into eight percentages that show how that core’s time was divided among the states defined in the table below.

top - 17:21:36 up 6 days,  8:16,  0 users,  load average: 0.09, 0.08, 0.08
Tasks: 158 total,   1 running, 156 sleeping,   0 stopped,   1 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.3 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.7 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.7 st
%Cpu3  :  0.3 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem :  3766984 total,   346856 free,  1283940 used,  2136188 buff/cache
KiB Swap:  2047996 total,   761516 free,  1286480 used.  2028524 avail Mem

The table below defines the identifiers above.

CPU State	Description
us, user	Time running un-niced user processes
sy, system	Time running kernel processes
ni, nice	Time running niced user processes
id, idle	Time spent in the kernel idle handler
wa, IO-wait	Time waiting for I/O completion
hi	Time spent servicing hardware interrupts
si	Time spent servicing software interrupts
st	Time stolen from this VM by the hypervisor
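top derives these percentages from cumulative tick counters in /proc/stat, comparing two readings taken a moment apart. The sketch below reproduces that calculation for an overall "busy" figure. It assumes Bash and awk; cpu_busy_pct and sample_cpu_busy are hypothetical names, and counting I/O wait as "not busy" is our own choice.

```shell
#!/usr/bin/env bash
# Percentage of CPU time spent busy between two samples of a "cpu" line
# from /proc/stat. Fields after the label are cumulative ticks:
# user nice system idle iowait irq softirq steal [guest guest_nice].
# guest and guest_nice are already included in user and nice, so only the
# first eight counters are summed. idle and iowait count as "not busy".
cpu_busy_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    for (i = 2; i <= 9; i++) {
      d = y[i] - x[i]
      total += d
      if (i == 5 || i == 6) idle += d   # idle (field 5) and iowait (field 6)
    }
    if (total > 0) printf "%.1f\n", 100 * (total - idle) / total
    else print "0.0"
  }'
}

# Sample the aggregate line INTERVAL seconds apart (default 1). Per-core lines
# (cpu0, cpu1, ...) can be compared the same way, like pressing "1" in top.
sample_cpu_busy() {
  local first second
  first=$(grep '^cpu ' /proc/stat)
  sleep "${1:-1}"
  second=$(grep '^cpu ' /proc/stat)
  cpu_busy_pct "$first" "$second"
}
```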

CPU usage is measured as the percentage of time the CPU spends processing actual workloads rather than sitting idle. If most CPU time is spent on user (us) or system (sy) processes, the server is likely tasked with too many resource-intensive processes. Here are a few examples of processes that can overload a server’s CPUs:

PHP Scripts
Multiple background processes
Inefficient or poorly optimized MySQL queries
Apache processes
Malware scanning

RAM

RAM (Random Access Memory) usage is measured at the server level using the free -m command, which reports figures in mebibytes. It shows total memory, used memory, free memory, shared memory, buffer/cache memory, and finally, available memory.

root@host [~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3678        1415         403         125        1859        1869
Swap:          1999        1271         728
root@host [~]#

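The used column does not include memory held in buffers and cache: in the output above, used is roughly total minus free minus buff/cache (3678 - 403 - 1859 ≈ 1415). Memory in buff/cache is reported separately and can largely be reclaimed when applications need it. The available column estimates how much memory is available for starting new applications without swapping.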
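To check the same estimate from a script, you can read /proc/meminfo directly; free takes its available column from the kernel’s MemAvailable field (present since Linux 3.14). This is a minimal sketch, assuming Bash and awk; mem_available_pct is a hypothetical name.

```shell
#!/usr/bin/env bash
# Report MemAvailable as a percentage of MemTotal.
# Values in /proc/meminfo are in kB.
# Usage: mem_available_pct [MEMINFO_FILE]   (defaults to /proc/meminfo)
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", 100 * avail / total }' \
    "${1:-/proc/meminfo}"
}
```

A steadily shrinking percentage, together with growing swap usage, is a sign that the server is running short of RAM.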

Memory usage by an individual process can be viewed with the ps command. Its %MEM column is the ratio of the process’s Resident Set Size (RSS) to the machine’s physical memory, where RSS is the portion of the process’s memory currently held in physical RAM. In other words, %MEM is the percentage of physical RAM that the process occupies. Because RSS includes shared memory such as shared libraries, adding up %MEM across processes can overstate total usage.
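The %MEM calculation, and a quick way to list the largest memory consumers, can be sketched as follows. This assumes Bash, awk, and the procps version of ps (which supports --sort); pct_mem and top_mem are hypothetical helper names.

```shell
#!/usr/bin/env bash
# %MEM as ps computes it: resident set size divided by total physical memory.
# Usage: pct_mem RSS_KB MEMTOTAL_KB
pct_mem() {
  awk -v r="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * r / t }'
}

# List the N processes (default 10) with the largest resident set size.
# head keeps N+1 lines so the ps header row is included.
top_mem() {
  ps -eo pid,user,rss,%mem,comm --sort=-rss | head -n "$(( ${1:-10} + 1 ))"
}
```

For example, on the server from the top output above (3,766,984 KiB of RAM), a process with an RSS of 376,698 KiB would show a %MEM of about 10.0.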

A few examples of things that can max out a server’s RAM:

PHP/Apache – If PHP’s memory_limit multiplied by the maximum number of PHP processes (pm.max_children for PHP-FPM, or FcgidMaxProcesses for mod_fcgid) exceeds the amount of RAM on the server, a crash is possible due to memory exhaustion.
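MySQL – If the memory MySQL is configured to use (global buffers such as innodb_buffer_pool_size, plus per-connection buffers multiplied by max_connections) exceeds the amount of RAM available on the server, a crash is possible due to memory exhaustion.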
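The PHP rule of thumb above is simple multiplication. This is a minimal sketch, assuming Bash; php_worst_case_mb and php_fits_in_ram are hypothetical names, and the values below are illustrative, not recommendations. It only considers PHP, so memory for MySQL, the web server, and the operating system still has to fit alongside it.

```shell
#!/usr/bin/env bash
# Worst-case PHP memory if every worker reaches memory_limit at once.
# Usage: php_worst_case_mb MEMORY_LIMIT_MB MAX_CHILDREN
php_worst_case_mb() {
  echo $(( $1 * $2 ))
}

# Return 0 if the worst case fits within TOTAL_MB, 1 otherwise.
# Usage: php_fits_in_ram MEMORY_LIMIT_MB MAX_CHILDREN TOTAL_MB
php_fits_in_ram() {
  local need
  need=$(php_worst_case_mb "$1" "$2")
  if (( need <= $3 )); then
    echo "OK: ${need} MB worst case within ${3} MB"
  else
    echo "WARNING: ${need} MB worst case exceeds ${3} MB" >&2
    return 1
  fi
}
```

For example, a memory_limit of 256 MB with pm.max_children set to 20 gives a worst case of 5,120 MB, which would not fit on the 3,678 MB server from the free -m output. Most PHP processes use far less than memory_limit, so this is a pessimistic upper bound.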

Disk I/O

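Disk I/O refers to reading data from and writing data to a physical disk. When a process reads a file from disk (or writes one), it often has to wait for the disk to finish the operation. On Linux, processes blocked this way are counted in the load average even though they are not using the CPU, and top reports the time CPUs sit idle while I/O is outstanding as wa (I/O wait). Disk I/O can therefore contribute to an increase in load in multiple ways. Here are a few examples of activity that can exceed the disk I/O capacity of a server: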

Running out of RAM and swapping memory to a disk.
MySQL queries writing temporary tables to a disk.
A substantial amount of email being sent from the server.
Large backups running over a long period of time.

The iotop command (run as root) shows how much disk I/O each process or thread is currently using.

root@host [~]# iotop

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 22
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
     6 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
     7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    10 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [lru-add-drain]
    11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
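iotop reports per-process activity. For a per-device view, the kernel’s cumulative counters in /proc/diskstats can be sampled twice and compared, which is roughly what iostat does. This is a minimal sketch, assuming Bash and awk; disk_rates and sample_disk are hypothetical names, and sda is only an example device name. The utilization figure is the share of wall-clock time the device had I/O in flight, which can understate remaining headroom on SSDs or RAID arrays that serve requests in parallel.

```shell
#!/usr/bin/env bash
# Compare two /proc/diskstats lines for the same device taken INTERVAL
# seconds apart. After major, minor and device name, the fields used here are:
#   $6 sectors read, $10 sectors written, $13 milliseconds spent doing I/O.
# Sectors in /proc/diskstats are always 512 bytes.
# Usage: disk_rates LINE1 LINE2 INTERVAL_SECONDS
disk_rates() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    printf "read_kB/s=%.1f write_kB/s=%.1f util=%.1f%%\n",
      (y[6] - x[6]) * 512 / 1024 / s,
      (y[10] - x[10]) * 512 / 1024 / s,
      100 * (y[13] - x[13]) / (s * 1000)
  }'
}

# Sample one device one second apart (sda is an example name).
sample_disk() {
  local dev="${1:-sda}" first second
  first=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
  sleep 1
  second=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
  disk_rates "$first" "$second" 1
}
```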

Networking

Ethernet performance can affect the overall load on the server. When a large amount of traffic arrives, network bottlenecks can occur; typically, this happens when the link between devices lacks the bandwidth, or the server lacks the processing power, to complete requests quickly. During high-traffic periods, DDoS attacks, or Slowloris attacks (which tie up a server by holding many connections open), the throughput or connections needed to fulfill requests can exceed the server’s capacity, and server load increases. A well-optimized website is less likely to run into this type of load under normal traffic.

A command-line tool like iftop provides a solid overview of network usage per connection.

root@host [~]# iftop
interface: eth0
IP address is: 68.228.87.126
MAC address is: 52:54:00:91:87:6f

12.5Kb               25.0Kb               37.5Kb              50.0Kb              62.5Kb
└──────────────────────┴────────────────────┴───────────────────┴───────────────────┴────────────────────────────────────
host.domain.com   => 430461.cloudwaysapps.com                       40.1Kb  15.1Kb  12.5Kb     <=     7.12Kb  1.99Kb  1.65Kb
host.domain.com   => dc2-176.vpn.domain.com                         1.39Kb  6.24Kb  5.46Kb     <=     160b   1.49Kb  1.29Kb
host.domain.com   => 151.139.128.11                                 0b   1.14Kb   973b         <=     0b   5.34Kb  4.45Kb
host.domain.com   => lvps87-230-15-219.dedicated.hosteurope.de      0b   1.23Kb  1.02Kb        <=     0b   3.77Kb  3.14Kb
host.domain.com   => 10.10.10.10                                    0b   1.50Kb  1.33Kb        <=     0b   2.78Kb  2.44Kb
host.domain.com   => 10.30.9.124                                    0b   1.40Kb  1.17Kb        <=     0b    920b    767b
host.domain.com   => 10.30.9.125                                    0b   1.40Kb  1.16Kb        <=     0b    920b    767b
host.domain.com   => 10.30.9.122                                    2.91Kb   640b    656b      <=     1.50Kb   353b    473b
host.domain.com   => 10.30.9.138                                    0b    468b    427b         <=     0b    310b    295b
───────────────────────────────────────────────────────────────────────────────────────────
TX:             cum:   48.8KB   peak:   46.7Kb     rates:   46.7Kb  30.3Kb  32.5Kb
RX:                    24.6KB           56.6Kb              11.3Kb  18.8Kb  16.4Kb
TOTAL:                 73.3KB           81.9Kb
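iftop breaks traffic down by connection. For a quick per-interface total without installing anything, the byte counters in /proc/net/dev can be sampled one second apart. This is a minimal sketch, assuming Bash and awk; net_rates and sample_net are hypothetical names, and eth0 is only an example interface name.

```shell
#!/usr/bin/env bash
# Compare two /proc/net/dev lines for one interface taken INTERVAL seconds
# apart and print receive/transmit rates in kilobits per second
# (1 kbit = 1000 bits). After the "iface:" label, the 1st value is received
# bytes and the 9th value is transmitted bytes.
# Usage: net_rates LINE1 LINE2 INTERVAL_SECONDS
net_rates() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    sub(/:/, " ", a); sub(/:/, " ", b)   # handle lines with no space after the colon
    split(a, x, " "); split(b, y, " ")
    printf "rx_kbit/s=%.1f tx_kbit/s=%.1f\n",
      (y[2] - x[2]) * 8 / 1000 / s,
      (y[10] - x[10]) * 8 / 1000 / s
  }'
}

# Sample one interface one second apart (eth0 is an example name).
sample_net() {
  local ifc="${1:-eth0}" first second
  first=$(grep "^ *${ifc}:" /proc/net/dev)
  sleep 1
  second=$(grep "^ *${ifc}:" /proc/net/dev)
  net_rates "$first" "$second" 1
}
```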

Conclusion

This ends part one of our two-part article series on investigating server load. Server load is the measure of work that a server experiences. When issues arise with CPU usage, RAM deficits, increased Disk I/O, or network congestion, load issues will manifest. In part two of this series, we will explore the means and methods to locate and address load issues on the server.

We pride ourselves on being The Most Helpful Humans In Hosting™!

Our Support Teams are filled with experienced Linux technicians and talented system administrators who have intimate knowledge of multiple web hosting technologies, especially those discussed in this article. If you have any questions about this information, we are always available to help.