How to Install and Configure Elasticsearch

What is Elasticsearch?

Elasticsearch is a distributed, open-source, full-text search engine with multi-tenant capabilities for analyzing many types of data. It stores and indexes data shipped from sources such as Logstash or Beats, and it is commonly explored through Kibana. Elasticsearch can then be queried for specific data to return useful information, such as application logs, application performance data, or other metrics.

In this tutorial, we will cover how to install and configure Elasticsearch on Ubuntu 18.04. Aggregating and transforming log information from disparate systems is an essential part of application management and monitoring in a distributed system. Being able to gather that information is important, but being able to contextualize it and comb through it for relevant data is paramount. That’s where Elasticsearch comes in.

Preflight Check

A server running Ubuntu 18.04 LTS
Java installed on the server (the next section shows how to check for it and install it)
Working knowledge of the command-line interface (CLI)

Install Dependencies

Elasticsearch runs on Java. The Elasticsearch 7.x packages ship with a bundled OpenJDK, but this tutorial also installs a system-wide Java Development Kit (JDK). We can check for an existing Java installation on our Ubuntu server using this command.

root@ubuntu18:~# java -version
-bash: java: command not found

If Java is not installed, you can run the command below to install it or review our KB article for more detailed instructions.

root@ubuntu18:~# apt install openjdk-8-jdk

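Now, we can verify the installation by running the same command again. The version reported depends on which JDK is installed: the openjdk-8-jdk package reports a version beginning with 1.8.0, while the sample output below is from a system running OpenJDK 13.

root@ubuntu18:~# java -version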
openjdk version "13.0.2" 2020-01-14
OpenJDK Runtime Environment (build 13.0.2+8)
OpenJDK 64-Bit Server VM (build 13.0.2+8, mixed mode, sharing)
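Java 8 and Java 9+ format their version strings differently (1.8.0_xxx versus 13.0.2, for example), so a script that needs the major version has to handle both. The sketch below uses a hypothetical helper, java_major_version, which is not part of Java or Ubuntu. Note that java -version writes to standard error, so the usage section redirects it before parsing.

```shell
#!/bin/sh
# Hypothetical helper: print the Java major version from a version string
# such as "1.8.0_252" (Java 8 scheme) or "13.0.2" (Java 9+ scheme).
java_major_version() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;             # Java 8 and earlier: drop the leading "1."
  esac
  printf '%s\n' "${v%%[._+-]*}"     # keep everything before the first separator
}

# Usage on a live system. java -version prints to stderr, and its first line
# looks like: openjdk version "13.0.2" 2020-01-14
if command -v java >/dev/null 2>&1; then
  raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  echo "Java major version: $(java_major_version "$raw")"
else
  echo "java not found on PATH"
fi
```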

Prepare the Environment

Next, as a best practice, we should refresh the system's package index by running the following command.

root@ubuntu18:~# apt update -y

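Next, we will run the following wget command to download the Elastic GPG public signing key and add it to apt's trusted keys. This key verifies packages from the Elastic package repositories, which host Elasticsearch as well as Logstash and Kibana.

root@ubuntu18:~# wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -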

This step may not be necessary on all systems, since newer versions of apt support HTTPS repositories natively. To be certain that all prerequisite packages are available and that apt can reach our repositories over HTTPS, we will install the apt-transport-https package.

root@ubuntu18:~# apt install apt-transport-https -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  apt-transport-https
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 1692 B of archives.
After this operation, 153 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 apt-transport-https all 1.6.12ubuntu0.1 [1692 B]
Fetched 1692 B in 1s (2311 B/s)
Selecting previously unselected package apt-transport-https.
(Reading database ... 35064 files and directories currently installed.)
Preparing to unpack .../apt-transport-https_1.6.12ubuntu0.1_all.deb ...
Unpacking apt-transport-https (1.6.12ubuntu0.1) ...
Setting up apt-transport-https (1.6.12ubuntu0.1) ...
root@ubuntu18:~#

Finally, we will add the Elastic 7.x repository definition to the /etc/apt/sources.list.d/elastic-7.x.list file using the following command.

root@ubuntu18:~# echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
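Because tee -a appends to the file, running the command above more than once adds the same deb line again, which leads to the "configured multiple times" message described below. A minimal sketch of an idempotent alternative, using a hypothetical helper named add_line_once, appends the line only when an identical line is not already present:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is absent.
add_line_once() {
  line="$1"
  file="$2"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
  # If the file does not exist yet, grep fails and the line is appended.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root):
# add_line_once "deb https://artifacts.elastic.co/packages/7.x/apt stable main" \
#   /etc/apt/sources.list.d/elastic-7.x.list
```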

Install Elasticsearch

Now that we have added the Elastic repository, apt's package index needs to be refreshed so that apt is aware of the new source.

root@ubuntu18:~# apt update -y

Now we can install Elasticsearch like any other software package using the apt command.

root@ubuntu18:~# apt install elasticsearch -y

If you see the following message after running the install command above:

Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list.d/elastic-7.x.list:1 and /etc/apt/sources.list.d/elastic-7.x.list:2

use vim or nano to edit the /etc/apt/sources.list.d/elastic-7.x.list file.

root@ubuntu18:~# vim /etc/apt/sources.list.d/elastic-7.x.list

Once the file is open, you will see two identical entries like the ones below. In vim, move the cursor to one of them and type dd (press the d key twice) to delete that line, then save the file and exit with :wq.

deb https://artifacts.elastic.co/packages/7.x/apt stable main 
deb https://artifacts.elastic.co/packages/7.x/apt stable main
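If you prefer not to edit the file interactively, the duplicate can also be removed with standard tools. The sketch below uses a hypothetical helper, dedupe_lines, that strips trailing whitespace before comparing lines (otherwise two entries that differ only by a trailing space would both be kept) and then keeps only the first occurrence of each line. Run apt update afterwards to confirm the message is gone.

```shell
#!/bin/sh
# Hypothetical helper: remove duplicate lines from a file, keeping the first
# occurrence of each. Trailing spaces/tabs are stripped before comparison.
# Note: repeated blank lines are also reduced to a single blank line.
dedupe_lines() {
  file="$1"
  work=$(mktemp) || return 1
  awk '{ sub(/[ \t]+$/, "") } !seen[$0]++' "$file" > "$work" &&
    cat "$work" > "$file"        # rewrite in place, keeping owner and mode
  status=$?
  rm -f "$work"
  return "$status"
}

# Usage (as root):
# dedupe_lines /etc/apt/sources.list.d/elastic-7.x.list && apt update
```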

Now, rerun the apt install command.

root@ubuntu18:~# apt install elasticsearch -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  elasticsearch
0 upgraded, 1 newly installed, 0 to remove and 19 not upgraded.
Need to get 318 MB of archives.
After this operation, 531 MB of additional disk space will be used.
Get:1 https://artifacts.elastic.co/packages/7.x/apt stable/main amd64 elasticsearch amd64 7.8.1 [318 MB]
Selecting previously unselected package elasticsearch.
(Reading database ... 35068 files and directories currently installed.)
Preparing to unpack .../elasticsearch_7.8.1_amd64.deb ...
Creating elasticsearch group... OK
Creating elasticsearch user... OK
Unpacking elasticsearch (7.8.1) ...
Setting up elasticsearch (7.8.1) ...
System has not been booted with systemd as init system (PID 1). Can't operate.
Created elasticsearch keystore in /etc/elasticsearch/elasticsearch.keystore
Processing triggers for ureadahead (0.100.0-21) ...
Processing triggers for systemd (237-3ubuntu10.41) ...
root@ubuntu18:~#

Configure Elasticsearch

Elasticsearch has a basic configuration in place after installation, but we can modify the default elasticsearch.yml configuration file located in the /etc/elasticsearch directory. Nearly all Elasticsearch settings are found in the following files and directories.

/etc/elasticsearch/elasticsearch.yml – Main configuration file
/etc/elasticsearch/jvm.options – Configures Elasticsearch JVM settings
/etc/elasticsearch/log4j2.properties – Configures Elasticsearch logging
/etc/default/elasticsearch – Service start-up environment settings (for example, Java options)
/var/lib/elasticsearch – Main Elasticsearch data directory

root@ubuntu18:~# cd /etc/elasticsearch
root@ubuntu18:/etc/elasticsearch# ll
total 32
drwxr-s--- 1 root elasticsearch  4096 Aug 13 08:53 ./
drwxr-xr-x 1 root root           4096 Aug 13 08:04 ../
-rw-r--r-- 1 root elasticsearch    76 Aug 12 18:02 .elasticsearch.keystore.initial_md5sum
-rw-rw---- 1 root elasticsearch   199 Aug 12 18:02 elasticsearch.keystore
-rw-rw---- 1 root elasticsearch  2847 Jul 21 12:47 elasticsearch.yml
-rw-rw---- 1 root elasticsearch  2373 Jul 21 12:47 jvm.options
drwxr-s--- 1 root elasticsearch  4096 Jul 21 12:47 jvm.options.d/
-rw-rw---- 1 root elasticsearch 17419 Jul 21 12:47 log4j2.properties
-rw-rw---- 1 root elasticsearch   473 Jul 21 12:47 role_mapping.yml
-rw-rw---- 1 root elasticsearch   197 Jul 21 12:47 roles.yml
-rw-rw---- 1 root elasticsearch     0 Jul 21 12:47 users
-rw-rw---- 1 root elasticsearch     0 Jul 21 12:47 users_roles
root@ubuntu18:/etc/elasticsearch#

There are other configuration files in this directory that we will not cover here; more information is available in the official documentation on the elastic.co website.

Default Configuration File

Below is the default Elasticsearch YAML file. It contains the configuration options for the cluster, node, paths, memory, network, discovery, gateway, and various other settings. Again, most of the defaults are fine to leave as is, but your settings may vary depending on your project's requirements.

Note: Since Elasticsearch’s configuration file is in YAML format, care should be taken to keep the format intact. This means we should not add extra spaces, and we should never use tabs, as we edit the file.
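Because YAML does not allow tab characters for indentation, a quick check before restarting Elasticsearch can catch a common editing mistake. The helper below, check_yaml_tabs, is a hypothetical example rather than a tool shipped with Elasticsearch; it uses grep to list any lines that contain a tab.

```shell
#!/bin/sh
# Hypothetical helper: report tab characters in a YAML file.
# Prints the offending lines with their line numbers and returns 1 if any
# tabs are found; returns 0 if the file contains none.
check_yaml_tabs() {
  tab=$(printf '\t')
  if grep -n -- "$tab" "$1"; then
    echo "error: $1 contains tab characters; use spaces instead" >&2
    return 1
  fi
  return 0
}

# Usage (as root, since the file is only readable by root and the elasticsearch group):
# check_yaml_tabs /etc/elasticsearch/elasticsearch.yml
```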

# =========== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

In this tutorial, we are only going to modify the network settings. To change a default setting, remove the # character at the beginning of its line and then edit its value.

To restrict access to Elasticsearch's built-in REST API from unwanted external hosts, find the line that specifies network.host, uncomment it, and replace its value with localhost, as shown below.

# ------------------------ Network -----------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: localhost

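By default, Elasticsearch listens for HTTP requests on port 9200 (set by http.port) and binds only to the loopback address. To have it listen on a specific network interface instead, replace localhost with that interface's IP address. This is required if remote hosts, such as the trusted host allowed through the firewall later in this tutorial, need to reach Elasticsearch.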
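The network.host edit can also be scripted, which is handy when configuring several nodes. The following is a minimal sketch using a hypothetical helper, set_setting, built on grep and sed. It assumes the key appears at most once in the file (commented or not) and that the value contains no | or & characters. Restart Elasticsearch after changing its configuration.

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" in a YAML file such as elasticsearch.yml.
# An existing line for the key, commented ("#key:") or not, is replaced in place;
# otherwise the setting is appended. Assumes the key appears at most once and
# that the value contains no "|" or "&" characters (they are special to sed).
set_setting() {
  file="$1"; key="$2"; value="$3"
  re_key=$(printf '%s' "$key" | sed 's/\./\\./g')   # escape dots for the regex
  if grep -Eq "^#?${re_key}:" "$file"; then
    work=$(mktemp) || return 1
    sed -E "s|^#?${re_key}:.*|${key}: ${value}|" "$file" > "$work" &&
      cat "$work" > "$file"          # rewrite in place, keeping owner and mode
    status=$?
    rm -f "$work"
    return "$status"
  fi
  printf '%s: %s\n' "$key" "$value" >> "$file"
}

# Usage (as root), backing up the original first:
# cp /etc/elasticsearch/elasticsearch.yml /root/elasticsearch.yml.bak
# set_setting /etc/elasticsearch/elasticsearch.yml network.host localhost
# or, to listen on a specific interface:
# set_setting /etc/elasticsearch/elasticsearch.yml network.host 192.168.0.1
```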

Server Roles

One of the most important settings defines the role each server plays in the cluster. In a multi-server cluster, nodes are typically configured as master-eligible nodes, which manage the cluster state, or as data nodes, which store data and handle indexing and search workloads. In larger deployments, several master-eligible nodes are set up to maintain the stability and health of the cluster.

Depending on the needs of the project, the cluster.name and node.name settings in elasticsearch.yml may need to be uncommented and modified to set the names of the cluster and the server. If we do not modify these settings, the cluster name defaults to elasticsearch and the node name defaults to the server's hostname, as the verification output at the end of this tutorial shows.

In Elasticsearch 7.8, the settings that determine a node's role are node.master and node.data. Both default to true, so every node is master-eligible and stores data unless configured otherwise, and the cluster elects its master from the master-eligible nodes. If you only have one Elasticsearch node, leave node.master set to its default value of true, because a cluster always needs an elected master. Alternatively, to configure a node as a data node that can never become master, set node.master to false.
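To make the roles concrete, the sketch below prints the relevant elasticsearch.yml lines for three cases. It is a hypothetical helper, role_settings, used with example cluster and node names; it assumes the node.master and node.data settings of the 7.8 release installed above (newer releases replace them with node.roles), and its output is meant to be reviewed and added to elasticsearch.yml by hand.

```shell
#!/bin/sh
# Hypothetical helper: print role settings for an Elasticsearch 7.8 node.
#   role "master"  -> master-eligible node that holds no data
#   role "data"    -> data node that can never become master
#   anything else  -> the default (master-eligible and holds data)
role_settings() {
  cluster="$1"; node="$2"; role="$3"
  printf 'cluster.name: %s\n' "$cluster"
  printf 'node.name: %s\n' "$node"
  case "$role" in
    master) printf 'node.master: true\nnode.data: false\n' ;;
    data)   printf 'node.master: false\nnode.data: true\n' ;;
    *)      printf 'node.master: true\nnode.data: true\n' ;;
  esac
}

# Example with hypothetical names:
role_settings my-application node-2 data
```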

Enable Elasticsearch

Once Elasticsearch is installed, we will run the following command to enable it to run on startup, and start it in our current session.

root@ubuntu18:~# systemctl enable --now elasticsearch
 * Starting Elasticsearch Server 
root@ubuntu18:~#

This command may take some time to complete.
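The installation output above includes the message "System has not been booted with systemd as init system (PID 1). Can't operate.", which appears when systemd is not the init process (for example, in some containers). On such systems, systemctl cannot manage services, and the package's SysV init script is used through the service command instead. A minimal sketch, assuming a hypothetical helper named start_elasticsearch, picks the command by checking for /run/systemd/system, a directory that exists only while systemd is running as init. The SysV fallback starts the service but does not enable it at boot; Elastic's documentation uses update-rc.d for that.

```shell
#!/bin/sh
# Hypothetical helper: start Elasticsearch with systemd when the system was
# booted with it, otherwise fall back to the SysV init script via "service".
# SYSTEMD_DIR can be overridden for testing; DRY_RUN=1 prints the command
# instead of running it.
start_elasticsearch() {
  systemd_dir="${SYSTEMD_DIR:-/run/systemd/system}"
  if [ -d "$systemd_dir" ]; then
    set -- systemctl enable --now elasticsearch
  else
    set -- service elasticsearch start     # starts now; does not enable at boot
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (as root):
# start_elasticsearch
```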

Firewall Modifications

Next, we will configure the UFW firewall in Ubuntu to allow a trusted host to reach the default Elasticsearch port, 9200. To allow access, run the following command, replacing 192.190.221.246 with the IP address of your trusted host.

root@ubuntu18:~# ufw allow from 192.190.221.246 to any port 9200

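Once that is complete, enable UFW with the following command. If you are connected over SSH, allow SSH first (for example, with ufw allow OpenSSH) so that enabling the firewall does not cut off your session. Type Y when prompted; the follow-up message confirms that UFW is active and enabled.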

root@ubuntu18:~# ufw enable 
Command may disrupt existing ssh connections. Proceed with operation (y|n)? Y
Firewall is active and enabled on system startup

Finally, check the status of UFW with the following command.

root@ubuntu18:~# ufw status
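The firewall rule can also be wrapped in a small helper that refuses obviously malformed addresses. This is a hypothetical sketch, allow_es_from, that adds proto tcp (Elasticsearch's HTTP interface uses TCP) and uses 192.0.2.10, an address reserved for documentation, in its example. Keep in mind that the rule only matters if Elasticsearch listens on an address the trusted host can reach; with network.host: localhost, remote connections are refused regardless of the firewall.

```shell
#!/bin/sh
# Hypothetical helper: allow one trusted IPv4 address to reach Elasticsearch's
# HTTP port over TCP. The address check is a rough sanity check (digits and
# dots only), not full validation. DRY_RUN=1 prints the ufw command instead.
allow_es_from() {
  ip="$1"
  port="${2:-9200}"
  case "$ip" in
    *[!0-9.]*|'')
      echo "error: '$ip' does not look like an IPv4 address" >&2
      return 1 ;;
  esac
  set -- ufw allow proto tcp from "$ip" to any port "$port"
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage (as root), with a documentation address standing in for your trusted host:
# allow_es_from 192.0.2.10
```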

Verify Configuration

Lastly, we can verify that Elasticsearch is running by sending it a GET request with curl.

root@ubuntu18:~# curl -X GET "localhost:9200"
{
  "name" : "ubuntu18.awesome.com",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ZPbl_EHjS5Kfb43P3O8o-w",
  "version" : {
    "number" : "7.8.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "b5ca9c58fb664ca8bf9e4057fc229b3396bf3a89",
    "build_date" : "2020-07-21T16:40:44.668009Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline": "Liquid Web Rocks!"
}
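When checking the result from a script rather than by eye, the version number can be pulled out of the response. The sketch below uses a hypothetical helper, es_version; it is a plain text match with sed rather than a real JSON parser, so a dedicated tool such as jq is a sturdier choice if it is installed.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON returned by "curl localhost:9200" on stdin
# and print the value of the first "number" field (the Elasticsearch version).
# This is a text match, not a JSON parser.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
# curl -s "localhost:9200" | es_version
```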

If you see output similar to the above, Elasticsearch is working as expected. If you get an error similar to the following:

root@ubuntu18:~# curl -X GET "localhost:9200"
curl: (7) Failed to connect to localhost port 9200: Connection refused

wait a few seconds and run the command again, since Elasticsearch can take a little while to start.
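Rather than retrying by hand, the check can be polled until Elasticsearch answers. The sketch below assumes a hypothetical helper named retry that runs a command up to a given number of times with a fixed delay between attempts; the usage line polls localhost:9200 with curl, where -f makes HTTP error responses count as failures.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 as soon as the command succeeds, 1 otherwise.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage: poll for up to about a minute while Elasticsearch starts.
# retry 12 5 curl -fs -o /dev/null "localhost:9200" && echo "Elasticsearch is up"
```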

Elasticsearch is now up and running and ready to be connected to various data sources to collect and index data. It can also be deployed in a distributed way, with multiple nodes in play. Depending on your needs, it can make the logs coming from your applications and systems searchable and more meaningful.

Conclusion

Elasticsearch is now a mainstay of distributed search and data analysis across many large industries. Its ease of use, configurability, feature set, and scalability are just some of the reasons it is so widely used today. Additionally, because it is open-source and free to use, corporate adoption has grown rapidly over the years. Lastly, Elastic provides excellent support and documentation for everything from small test environments to major global providers. Liquid Web provides multiple platforms on which this software can be used. Want to know more?

Our Solutions team is standing by day and night to discuss your needs and provide educated and reliable advice on how we can implement a system like this for you!

We pride ourselves on being The Most Helpful Humans In Hosting™!

Our talented Support Teams are full of experienced Linux technicians and system administrators who have intimate knowledge of multiple web hosting technologies, especially those discussed in this article. We are always available to assist with any issues related to this article, 24 hours a day, 7 days a week, 365 days a year.

If you own a Fully Managed VPS server, Cloud Dedicated, VMWare Private Cloud, Private Reseller VPS Parent server, or a Dedicated server and you are uncomfortable performing any of the steps outlined, you can reach us via chat or support ticket and we will assist you with this process.