Understanding How To Use the AWK Command
AWK is named after its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan: the name combines the initials of their last names. AWK searches its input for lines that match given patterns and performs actions on them. It is both a full scripting language and a complete text manipulation toolkit. It is data-driven: you define a set of actions to perform on the provided text, and AWK sends the results to standard output.
With AWK, we can:
- Scan a file line by line.
- Split each input line into fields.
- Compare input lines or fields to patterns.
- Perform actions on matched lines.
Regular expression patterns are enclosed in slashes (//), actions are enclosed in braces ({}), and the entire AWK program is enclosed in single quotes ('). The default field delimiter is any run of whitespace characters, such as spaces or tabs. If an awk command has no pattern, every line of the input is matched. If it has a pattern but no action, each matching line is printed in full.
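The three program shapes described above can be sketched as follows. The fruit data is made up purely for illustration, and the variable names are arbitrary.

```shell
# Hypothetical sample input: three whitespace-separated records.
data='apple 3
banana 7
cherry 5'

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$data" | awk '/an/')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $2 }')

# Pattern and action: print the first field of lines containing "rr".
both=$(printf '%s\n' "$data" | awk '/rr/ { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

Only banana contains "an" and only cherry contains "rr", so the first and third commands each print a single line.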
Let’s see the contents of the current folder with the ls -l command.
[mstevens@host public_html]$ ls -l
total 12
-rw-rw-r--. 1 mstevens mstevens 6426 Feb 9 08:00 access_log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 config.php
-rw-r--r--. 1 mstevens mstevens 3661 Mar 19 04:31 dovecot.log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 error_log
-rwxrwxrwx. 1 mstevens mstevens 0 Mar 19 04:49 everyone.txt
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 index.php
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:49 list.php
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:49 login.php
-rw-rw-r--. 1 mstevens mstevens 0 Mar 24 03:14 php.ini
The first line of the ls -l output shows the total number of blocks (12 in this case). Each of the remaining lines contains nine fields (from left to right):
- Permissions
- The number of hard links
- User
- Group
- Size
- Month
- Day
- Time of the last update
- Filename
If, for example, we only need to print permissions and filenames, we can pipe the ls -l command into AWK and tell it to print the first and ninth fields.
The simple AWK program below has no pattern, only an action, so it matches every line of the provided text and prints only the first and ninth fields of each line.
[mstevens@host public_html]$ ls -l | awk '{print $1,$9}'
total
-rw-rw-r--. access_log
-rw-rw-r--. config.php
-rw-r--r--. dovecot.log
-rw-rw-r--. error_log
-rwxrwxrwx. everyone.txt
-rw-rw-r--. index.php
-rw-rw-r--. list.php
-rw-rw-r--. login.php
-rw-rw-r--. php.ini
As you can see, the ls command output has 10 lines of text, including the line with the word total. The word total is the first field on its line, and the number 12 is the second. Only total appears in the output for that line because the awk command requested the first and ninth fields, and the line has no ninth field. To avoid matching lines that are not needed, we can provide a pattern, and only lines that match this pattern will be output.
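One possible way to drop the total line is to test the number of fields on each line. This sketch uses the built-in variable NF (covered later in the article) and a few lines copied from the ls -l output above; the shell variable names are arbitrary.

```shell
# A few lines copied from the ls -l output shown earlier.
listing='total 12
-rw-rw-r--. 1 mstevens mstevens 6426 Feb 9 08:00 access_log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 config.php'

# NF is the number of fields on the current line.
# The "total" line has only two fields, so it fails the NF >= 9 test.
files_only=$(printf '%s\n' "$listing" | awk 'NF >= 9 { print $1, $9 }')

printf '%s\n' "$files_only"
```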
Pattern Matching
Patterns in AWK restrict an action to the lines that match. The grep command can also find specific information in text or files, but with AWK we don’t need to chain multiple commands to filter lines and then process them; a single awk command does both.
AWK supports different types of patterns:
- Regular expression patterns
- Relational expression patterns
- Range patterns
- Special expressions
Regular Expression Patterns
The most basic example is string matching. If we want to get only lines with the word php, we can add a pattern in the awk command between slashes (//). As shown below, those files are displayed in the output no matter where the word php is located in the line.
[mstevens@host public_html]$ ls -l | awk '/php/ {print $1,$9}'
-rw-rw-r--. config.php
-rw-rw-r--. index.php
-rw-rw-r--. list.php
-rw-rw-r--. login.php
-rw-rw-r--. php.ini
Regex Syntax Characters
A regular expression is a pattern that describes a set of strings. To avoid confusion with “regular expression pattern,” which is one of the awk pattern types, I will use the term “regex,” which is also widely used in IT.
Certain characters have special meanings when used in regex.
Anchors
Anchors do not match any character. Instead, they match a position before or after characters.
Anchor | Function |
---|---|
^ | Indicates the beginning of the line. |
$ | Indicates the end of a line. |
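\A | Denotes the beginning of a string in PCRE-style regex engines. awk does not support it; use ^ instead. |
\z | Denotes the end of a string in PCRE-style regex engines. awk does not support it; use $ instead. |
\b | Marks a word boundary in PCRE-style regex engines. In awk, \b is a backspace character; GNU awk uses \y for a word boundary. |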
Characters
You can match characters that follow specific rules.
Character | Function |
---|---|
[ae] | Selects a or e. |
[a-e] | Selects any character from a to e (a, b, c, d, or e). |
[^a-e] | Selects any character except a through e (f, g, h, etc.). |
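\w | Selects any word character (letter, digit, or underscore). GNU awk supports this as an extension. |
\s | Selects any whitespace character. GNU awk supports this as an extension. |
\d | Selects any digit in PCRE-style regex engines. awk does not support \d; use [0-9] or [[:digit:]] instead. |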
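As a rough illustration of bracket expressions in awk itself, the sketch below filters a short list of names. The name report2024.txt is hypothetical and added only to have a digit to match; awk's regex dialect has no backslash shorthand for digits, so [0-9] is used.

```shell
# File names from the listing above, plus one hypothetical name with digits.
names='access_log
config.php
list.php
php.ini
report2024.txt'

# [cl] matches a single c or l; ^ pins it to the start of the name.
starts_with_c_or_l=$(printf '%s\n' "$names" | awk '/^[cl]/')

# [0-9] matches any single digit.
has_digit=$(printf '%s\n' "$names" | awk '/[0-9]/')

# [^a-z.] matches any character that is NOT a lowercase letter or a dot.
has_other=$(printf '%s\n' "$names" | awk '/[^a-z.]/')

printf '%s\n' "$starts_with_c_or_l" "$has_digit" "$has_other"
```

The last pattern picks up access_log because of the underscore and report2024.txt because of the digits.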
Quantifiers
Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.
Quantifier | Function |
---|---|
. | Matches any single character. It is not a quantifier itself, but it is often combined with one, as in .* |
+ | Matches the preceding item one or more times. |
* | Matches the preceding item zero or more times. |
? | Matches the preceding item zero or one time. |
{n} | Matches the preceding item exactly n times. |
{n,} | Matches the preceding item n or more times. |
{n,m} | Matches the preceding item between n and m times. |
With this information, we can now find all PHP files. The regex /php$/ matches text that ends with php. The command below applies it to the ninth field (the filename) with the ~ operator, which is covered in the next section.
[mstevens@host public_html]$ ls -l | awk '$9 ~ /php$/ {print $1,$9}'
-rw-rw-r--. config.php
-rw-rw-r--. index.php
-rw-rw-r--. list.php
-rw-rw-r--. login.php
In the current folder, there are only four PHP files. The file php.ini was excluded because php is not at the end of the string.
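A slightly stricter variant escapes the dot so that only names with a real .php extension match. This is a sketch on a plain list of names; notphp is a hypothetical name included to show the difference.

```shell
# Names from the listing above, plus one hypothetical name without a dot.
names='config.php
list.php
login.php
php.ini
notphp'

# \. matches a literal dot, so "notphp" is rejected even though it ends in php.
php_ext=$(printf '%s\n' "$names" | awk '/\.php$/')

# ^ anchors the match to the start of the line instead of the end.
starts_php=$(printf '%s\n' "$names" | awk '/^php/')

printf '%s\n' "$php_ext" "$starts_php"
```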
Relational Expression Patterns
By default, regular expression patterns are matched against the whole line. Relational expression patterns compare the content of a specific field with a pattern or value.
To match a regex against a field, use the match operator (~) or its negation (!~):
- Match lines: $n ~ /pattern/
- Non-matching lines: $n !~ /pattern/
The placeholder $n refers to the nth field, which is tested against the provided pattern. Now let’s use our previous example.
ls -l | awk '$9 ~ /php/ {print $1,$9}'
The expression $9 ~ /php/ matches lines whose ninth field contains php.
[mstevens@host public_html]$ ls -l | awk '$9 ~ /php/ {print $1,$9}'
-rw-rw-r--. config.php
-rw-rw-r--. index.php
-rw-rw-r--. list.php
-rw-rw-r--. login.php
-rw-rw-r--. php.ini
If I used the first field (permissions) instead, there wouldn’t be any results, since the first field only contains strings like -rwxr-xr--. (r, w, and x stand for read, write, and execute).
[mstevens@host public_html]$ ls -l | awk '$1 ~ /php/ {print $1,$9}'
[mstevens@host public_html]$
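The negated operator and a numeric comparison can be sketched the same way. The lines are copied from the ls -l output above; the shell variable names are arbitrary.

```shell
# Lines copied from the ls -l output shown earlier.
listing='-rw-rw-r--. 1 mstevens mstevens 6426 Feb 9 08:00 access_log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 config.php
-rwxrwxrwx. 1 mstevens mstevens 0 Mar 19 04:49 everyone.txt
-rw-rw-r--. 1 mstevens mstevens 0 Mar 24 03:14 php.ini'

# !~ keeps only the lines whose ninth field does NOT match the regex.
not_php=$(printf '%s\n' "$listing" | awk '$9 !~ /php/ { print $9 }')

# Relational patterns also allow numeric comparisons on a field.
# Here only files with a size (field 5) greater than zero are printed.
non_empty=$(printf '%s\n' "$listing" | awk '$5 > 0 { print $9, $5 }')

printf '%s\n' "$not_php" "$non_empty"
```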
Range Patterns
Range patterns consist of two patterns separated by a comma. This allows us to print all records from the line that matches the first pattern until the second pattern is matched.
/pattern1/, /pattern2/
In this example, I want to print all files from the line that matches config up to the line that matches index. The command is shown below.
[mstevens@host public_html]$ ls -l | awk '/config/,/index/ { print $0 }'
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 config.php
-rw-r--r--. 1 mstevens mstevens 3661 Mar 19 04:31 dovecot.log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 error_log
-rwxrwxrwx. 1 mstevens mstevens 0 Mar 19 04:49 everyone.txt
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 index.php
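Two details of range patterns are worth seeing in isolation: both the start and end lines are included, and a range whose end pattern never matches runs to the end of the input. The sketch below uses a plain list of names, and no_such_name is a deliberately unmatched placeholder.

```shell
# A short list of file names from the listing above.
lines='access_log
config.php
dovecot.log
index.php
list.php'

# The range starts at the first line matching /config/ and ends,
# inclusively, at the next line matching /index/.
closed=$(printf '%s\n' "$lines" | awk '/config/,/index/')

# If the end pattern never matches, the range runs to the end of the input.
open=$(printf '%s\n' "$lines" | awk '/dovecot/,/no_such_name/')

printf '%s\n' "$closed" "$open"
```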
We can also match characters that follow defined rules. Let’s say you want to find all filenames containing the letter l followed by the letter o or i. Use the command below.
[mstevens@host public_html]$ ls -l | awk '$9 ~ /l[oi]/ {print $1,$9}'
-rw-rw-r--. access_log
-rw-r--r--. dovecot.log
-rw-rw-r--. error_log
-rw-rw-r--. list.php
-rw-rw-r--. login.php
As shown above, log, list, and login contain matches for the regex used in the awk command.
Quantifiers can be used if there is a certain character repeating in the provided text. I have created a file with the following content.
[mstevens@host public_html]$ cat test.txt
1. a b c d
2. d c b a
3. aa bb cc dd
4. dd cc bb aa
5. aaa bbb ccc ddd
6. ddd ccc bbb aaa
To find all lines that contain three a characters (aaa) and have at least one subsequent c character, I would use the following command.
awk '/a{3}.*c/ {print $0}' test.txt
The output indicates one line contains aaa with at least one character c following afterward.
[mstevens@host public_html]$ awk '/a{3}.*c/ {print $0}' test.txt
5. aaa bbb ccc ddd
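The other quantifiers behave similarly. As a small sketch, the made-up word list below shows how ?, +, and * differ when applied to the same character; the words and variable names are illustrative only.

```shell
# Hypothetical word list for comparing quantifiers.
words='color
colour
colouur
clr'

# ? makes the preceding u optional (zero or one occurrence).
optional_u=$(printf '%s\n' "$words" | awk '/^colou?r$/')

# + requires one or more u characters.
one_or_more_u=$(printf '%s\n' "$words" | awk '/^colou+r$/')

# * allows any number of u characters, including none.
any_u=$(printf '%s\n' "$words" | awk '/^colou*r$/')

printf '%s\n' "$optional_u" "$one_or_more_u" "$any_u"
```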
Special Expressions
Variables within AWK can be set at any line in the program. AWK includes the following special patterns:
- BEGIN - Carries out its corresponding action before the first record is read and is generally used to define variables for the entire program.
- END - Performs its action after the last record is read from the input file.
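A minimal sketch of both special patterns in one program: BEGIN prints a header, the main action adds up the size field, and END prints the sum. The input lines are copied from the ls -l output above; the variable name total is arbitrary.

```shell
# Lines copied from the ls -l output shown earlier.
listing='-rw-rw-r--. 1 mstevens mstevens 6426 Feb 9 08:00 access_log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 config.php
-rw-r--r--. 1 mstevens mstevens 3661 Mar 19 04:31 dovecot.log'

report=$(printf '%s\n' "$listing" | awk '
  BEGIN { print "size name" }          # runs once, before any input is read
        { total += $5; print $5, $9 }  # runs for every input line
  END   { print "total", total }       # runs once, after the last line
')

printf '%s\n' "$report"
```

The last line of the report is the sum of the three sizes, 6426 + 0 + 3661 = 10087.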
AWK has several built-in variables that allow you to control how the program is processed. Here are some of the most common built-in variables.
Variable | Function |
---|---|
NF | The number of fields in the record. |
NR | The number of the current record. |
FILENAME | The name of the input file that is currently processed. |
FS | Field separator. |
RS | Record separator. |
OFS | Output field separator. |
ORS | Output record separator. |
Now let’s use NR in our command to check the number of lines in test.txt. As we see below, there are six lines within the file.
[mstevens@host public_html]$ awk 'END { print FILENAME, "contains", NR, "lines." }' test.txt
test.txt contains 6 lines.
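NR and NF can also be printed for each line, and OFS controls what print puts between comma-separated arguments. The sketch below uses three sample lines in the style of test.txt; the third line is shortened on purpose so the field counts differ.

```shell
# Sample lines in the style of test.txt (the last one is shortened).
sample='1. a b c d
2. d c b a
3. aa bb'

# NR is the current line number and NF the number of fields on that line.
counts=$(printf '%s\n' "$sample" | awk '{ print NR, NF }')

# OFS is inserted wherever print arguments are separated by commas.
joined=$(printf '%s\n' "$sample" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

printf '%s\n' "$counts" "$joined"
```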
Changing the Separator
The separator is any character that divides lines of text into fields. The default field separator is any number of whitespace characters like space or tab, but you can change the separator with the FS variable or -F flag in the awk command.
Using the FS Variable
First, we will show how to use the FS variable. Below we have the current lines in test.txt with the fields separated by white spaces.
[mstevens@host public_html]$ cat test.txt
1. a b c d
2. d c b a
3. aa bb cc dd
4. dd cc bb aa
5. aaa bbb ccc ddd
6. ddd ccc bbb aaa
For easier readability, the image below shows the information above, with the white spaces highlighted in green.

Now, I will separate the fields by the c character and print the first field. This means the existing white spaces no longer separate fields; they are treated as regular characters. Everything before the first c in a line is part of the first field and will be printed. All remaining information on the line belongs to subsequent fields and will not be included in the output.
[mstevens@host public_html]$ awk 'BEGIN { FS = "c" } { print $1 }' test.txt
1. a b
2. d
3. aa bb
4. dd
5. aaa bbb
6. ddd
Again, we have the output from above shown below with the separator (c).

Because every occurrence of the separator starts a new field, the number of c characters on a line determines the number of fields: a line with n separators has n + 1 fields. Lines 1 and 2 have two fields, lines 3 and 4 have three, and lines 5 and 6 have four. We can better see this in the image below, where the area between each green separator represents an additional field.
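The field counts can be checked directly by printing NF with the separator set to c. This sketch uses lines 1, 3, and 5 of test.txt; the shell variable names are arbitrary.

```shell
# Lines 1, 3, and 5 of test.txt.
sample='1. a b c d
3. aa bb cc dd
5. aaa bbb ccc ddd'

# With FS set to "c", every c starts a new field, so a line with
# one, two, or three c characters has two, three, or four fields.
field_counts=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = "c" } { print NF }')

printf '%s\n' "$field_counts"
```

Adjacent separators produce empty fields, which is why cc yields three fields rather than two.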

Using the -F Flag
Now we will change the separator in an awk command using the -F flag and work through another example.
awk -F'c' '{ print $1 }' test.txt
The listing below repeats the folder contents from earlier in the article.
[mstevens@host public_html]$ ls -l
total 12
-rw-rw-r--. 1 mstevens mstevens 6426 Feb 9 08:00 access_log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 config.php
-rw-r--r--. 1 mstevens mstevens 3661 Mar 19 04:31 dovecot.log
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 error_log
-rwxrwxrwx. 1 mstevens mstevens 0 Mar 19 04:49 everyone.txt
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:48 index.php
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:49 list.php
-rw-rw-r--. 1 mstevens mstevens 0 Mar 19 04:49 login.php
-rw-rw-r--. 1 mstevens mstevens 0 Mar 24 03:14 php.ini
Using a few records from dovecot.log, we can use the awk command to determine whether someone is trying to access the email accounts. Below are examples of a failed and a successful connection.
Failed Connection | Successful Connection |
---|---|
Mar 19 04:21:20 host dovecot: imap-login: Disconnected (auth failed, 1 attempts in 2 secs): user=<mstevens@liquidweb.com>, method=PLAIN, rip=50.50.50.50, lip=5.6.7.8, TLS, TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits), session=<sc1GnVq+wiYf2SVh> | Mar 19 04:37:33 host dovecot: imap-login: Login: user=<mstevens@liquidweb.com>, method=PLAIN, rip=1.2.3.4, lip=5.6.7.8, mpid=20273, TLS, TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits), session=<Nk1RJOK9Qp5dj5y/> |
It’s not pretty, but we can break the connection output into smaller pieces. The most important values in these logs to focus on are:
- imap-login - Indicates that someone tried to log into an email account.
- user= - Shows which email account that person is trying to access.
- rip= - The IP that is trying to connect.
The following command will output the IP address and email account for every failed login attempt.
[mstevens@host public_html]$ awk -F'rip=' '/imap-login/&&/failed/ {print $1, $2}' dovecot.log | awk -F'user=' '{print $2}' | awk -F, '{print $3,$1}'
127.0.0.1 <mstevens@liquidweb.com>
127.0.0.1 <mstevens@liquidweb.com>
127.0.0.1 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
50.50.50.50 <mstevens@liquidweb.com>
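As an alternative sketch, a single awk program can extract the IP with sub() and count failed attempts per address in an associative array. The log lines below are shortened, hypothetical records modeled on the examples above (the session values are placeholders), and the names ip and count are arbitrary.

```shell
# Shortened, hypothetical log records modeled on the dovecot examples.
log='Mar 19 04:21:20 host dovecot: imap-login: Disconnected (auth failed, 1 attempts in 2 secs): user=<mstevens@liquidweb.com>, method=PLAIN, rip=50.50.50.50, lip=5.6.7.8, TLS, session=<a>
Mar 19 04:21:25 host dovecot: imap-login: Disconnected (auth failed, 1 attempts in 2 secs): user=<mstevens@liquidweb.com>, method=PLAIN, rip=50.50.50.50, lip=5.6.7.8, TLS, session=<b>
Mar 19 04:22:01 host dovecot: imap-login: Disconnected (auth failed, 1 attempts in 2 secs): user=<mstevens@liquidweb.com>, method=PLAIN, rip=127.0.0.1, lip=5.6.7.8, TLS, session=<c>
Mar 19 04:37:33 host dovecot: imap-login: Login: user=<mstevens@liquidweb.com>, method=PLAIN, rip=1.2.3.4, lip=5.6.7.8, mpid=20273, TLS, session=<d>'

failed_by_ip=$(printf '%s\n' "$log" | awk '
  /imap-login/ && /failed/ {
      ip = $0
      sub(/.*rip=/, "", ip)   # drop everything up to and including "rip="
      sub(/,.*/, "", ip)      # drop everything from the next comma onward
      count[ip]++             # tally one failed attempt for this address
  }
  END { for (ip in count) print count[ip], ip }
' | sort -rn)

printf '%s\n' "$failed_by_ip"
```

The output is piped through sort -rn because awk does not guarantee any order when looping over an array, and sorting also puts the most active address first.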
If you see suspicious activity, someone could be attempting a brute force attack on your server. Update your password(s) as soon as possible and take steps to prevent attacks in the future, like implementing two-factor authentication (2FA) and enabling CAPTCHA.
Using AWK With sub() and gsub()
AWK features several functions that perform find-and-replace actions like the sed command. The sub function substitutes the first matched entity in a record with a provided string. I’m going to show this on the test.txt file.
The part of the command that reads sub(/a/, "X", $2); will substitute the first letter a with the letter X in the second field. Only the first, third, and fifth lines will be affected since those lines contain the letter a in the second field.
[mstevens@host public_html]$ awk '{sub(/a/, "X", $2); print $0}' test.txt
1. X b c d
2. d c b a
3. Xa bb cc dd
4. dd cc bb aa
5. Xaa bbb ccc ddd
6. ddd ccc bbb aaa
While this change is only shown in the terminal and doesn’t modify the file, we can redirect the output to a different file to save the changes. The sub function is useful when we need to replace certain information within a file, like a site URL in an SQL file, while still preserving the original file.
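A sketch of that workflow is shown below. The file names dump.sql and dump_new.sql, the table name, and both URLs are hypothetical, and the files live in a temporary directory that is removed at the end.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical one-line SQL dump containing the old site URL.
printf '%s\n' "INSERT INTO wp_options VALUES ('siteurl', 'http://old.example.com');" > "$workdir/dump.sql"

# With no third argument, sub() works on the whole line ($0).
# The result is redirected to a new file; dump.sql itself is not modified.
awk '{ sub(/old\.example\.com/, "new.example.com"); print }' "$workdir/dump.sql" > "$workdir/dump_new.sql"

original=$(cat "$workdir/dump.sql")
updated=$(cat "$workdir/dump_new.sql")
printf '%s\n' "$original" "$updated"

rm -rf "$workdir"
```

Writing to a second file rather than back to the input avoids truncating the input before awk has read it.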
The second function is gsub. It has the same syntax, and the only difference is that it replaces every match in the provided field, not just the first one. Again, the first, third, and fifth lines are affected, but instead of only the first a character in the field changing to X, all a characters in the second field are changed to X.
[mstevens@host public_html]$ awk '{gsub(/a/, "X", $2); print $0}' test.txt
1. X b c d
2. d c b a
3. XX bb cc dd
4. dd cc bb aa
5. XXX bbb ccc ddd
6. ddd ccc bbb aaa
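When the third argument is left out, gsub works on the whole line, and its return value is the number of replacements made. A short sketch on two lines of test.txt, with n as an arbitrary variable name:

```shell
# Lines 1 and 5 of test.txt.
sample='1. a b c d
5. aaa bbb ccc ddd'

# gsub() without a target operates on $0 and returns the replacement count,
# which is printed here in front of the modified line.
replaced=$(printf '%s\n' "$sample" | awk '{ n = gsub(/a/, "X"); print n, $0 }')

printf '%s\n' "$replaced"
```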
Conclusion
AWK is a powerful tool that can replace commands like grep, sed, and many others to find patterns within files. Depending on what is needed, all patterns can be changed to output the desired information. Test out the commands mentioned in this article on your own server and see what patterns you can find!
To learn more about Liquid Web's solutions, please visit our product overview page. Our Managed Hosting line of products is robust enough for businesses of every size, from early-stage startups to mature businesses requiring enterprise hosting environments.
About the Author: Matthew Stevens
I'm a system administrator, developer, and I'm constantly improving and learning new skills. In my spare time I keep myself in shape with dancing... breakdancing!