How to Install Scikit-Learn on Ubuntu 18.04


In this tutorial, we walk through how to install scikit-learn on an Ubuntu 18.04 server in two ways: in a Python virtual environment using the Python package manager, Pip, and via Anaconda.

Scikit-learn is a Python library designed to provide developers with an interface for building machine learning software. Compared with other Python libraries that cover similar ground, such as TensorFlow, scikit-learn provides a higher-level interface and ships with a broad set of ready-to-use machine learning algorithms. It focuses on traditional machine learning (classification, regression, clustering, and dimensionality reduction) rather than deep learning.
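To give a feel for that higher-level interface, here is a minimal sketch that loads a bundled dataset, trains a classifier, and scores it. The choice of LogisticRegression, the 25% test split, and random_state=0 are illustrative assumptions, not recommendations; any scikit-learn estimator follows the same fit/score pattern.

```python
# A minimal sketch of scikit-learn's estimator API (model and split are illustrative choices)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the 150-sample iris dataset as a feature matrix X and a label vector y
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for evaluation; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every estimator is trained with fit() and evaluated with score() or predict()
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))
```

Swapping in a different algorithm is usually a one-line change to the `model = ...` line, which is much of what makes scikit-learn convenient for traditional machine learning.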

Once scikit-learn is installed on your server, you could even combine it with a Python web framework and some HTML, CSS, and JavaScript to build a frontend that exposes your machine learning model to the web!

Pre-flight Check

  • These instructions were performed on an Ubuntu 18.04 LTS server as the root user.
  • Python 3.5 or later
  • Anaconda is a prerequisite for the Anaconda installation portion of this tutorial. If you need to get Anaconda installed, check out our tutorial here!
  • The Python module venv is required for the Pip installation portion of this tutorial. If you need to install venv or want an introduction to using Python virtual environments, check out our tutorial here!
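If you are unsure which Python version your server has, a quick check like the sketch below can confirm it meets the 3.5 minimum. The helper name `meets_minimum` is hypothetical; the snippet uses `str.format` rather than f-strings so it still runs on Python 3.5. On Ubuntu 18.04, run it with `python3`, since `python` may point to Python 2.

```python
import sys

# Minimum Python version from the pre-flight check (3.5 or later)
MINIMUM = (3, 5)


def meets_minimum(version, minimum=MINIMUM):
    """Compare the (major, minor) part of a version tuple against the minimum."""
    return tuple(version[:2]) >= minimum


current = sys.version_info
print("Python {}.{}.{}".format(current.major, current.minor, current.micro))
print("OK" if meets_minimum(current) else "Python 3.5 or later is required")
```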

Install Scikit-Learn via Pip 

Step 1: Create a new Python virtual environment

First, as a best practice, refresh the package index and make sure all installed packages are up to date:

root@ubuntu:~# apt-get update -y && apt-get upgrade -y

Once everything is up to date, let’s create and change into a directory for our project:

root@ubuntu:~# mkdir machine_learning
root@ubuntu:~# cd machine_learning/
root@ubuntu:~/machine_learning#

Now that we have a fresh space to work in, let’s create our new Python virtual environment:

root@ubuntu:~/machine_learning# python3 -m venv scikit_is_cool

Finally, let’s go ahead and activate the newly created virtual environment:

root@ubuntu:~/machine_learning# source scikit_is_cool/bin/activate
(scikit_is_cool) root@ubuntu:~/machine_learning#
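To confirm that `python` now resolves to the virtual environment's interpreter rather than the system one, you can run a small check like this sketch from inside the activated environment. It relies on the fact that venv sets `sys.prefix` to the environment directory while `sys.base_prefix` keeps pointing at the base installation; the helper name `in_virtualenv` is our own, not part of any library.

```python
import sys


def in_virtualenv():
    """Return True when running inside a virtual environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the base interpreter. Older
    # virtualenv releases set sys.real_prefix instead, so check that too.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

With scikit_is_cool active, this should report True, and the interpreter path should sit under the scikit_is_cool/bin directory.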

Step 2: Install dependencies

One of the major differences between installing scikit-learn via Pip and via Anaconda is that, with Pip, we manage scikit-learn’s dependencies ourselves. In practice, this just means installing a few extra Python modules before we install scikit-learn.

Note:
Scikit-learn’s dependencies can be found here.

Next, we install the Python modules that scikit-learn requires (NumPy, SciPy, and joblib), along with some optional libraries (Matplotlib, scikit-image, and pandas) that scikit-learn does not strictly require but that its plotting features and some of its examples use:
(scikit_is_cool) root@ubuntu:~/machine_learning# pip install numpy scipy joblib matplotlib scikit-image pandas
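Before moving on, you can check that each dependency is importable from the virtual environment. This sketch uses `importlib.util.find_spec`, which locates a module without importing it; the helper name `missing_modules` is hypothetical. Note that the scikit-image package is imported under the name `skimage`.

```python
import importlib.util

# Import names for the packages installed above ("scikit-image" imports as "skimage")
MODULES = ["numpy", "scipy", "joblib", "matplotlib", "skimage", "pandas"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```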

Step 3: Install and Test Scikit-learn

The environment is now ready for us to install scikit-learn:

(scikit_is_cool) root@ubuntu:~/machine_learning# pip install -U scikit-learn

Scikit-learn is now installed! To test it, let’s drop into a Python shell and load a couple of its bundled example datasets:

(scikit_is_cool) root@ubuntu:~/machine_learning# python
Python 3.6.8 (default, Oct  7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Once the shell is open, copy and paste in this Python snippet and hit enter:

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)

The output should look something like this:

[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 ...
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]
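Because NumPy truncates the printed array, the output above shows only a corner of the data. As an extra sanity check, a sketch like the following prints the installed scikit-learn version along with the dimensions of both datasets: iris has 150 samples with 4 features each, and digits has 1,797 samples, each an 8x8 image flattened into 64 pixel values.

```python
import sklearn
from sklearn import datasets

iris = datasets.load_iris()
digits = datasets.load_digits()

# Show which scikit-learn release pip installed
print("scikit-learn", sklearn.__version__)

# (samples, features) for each dataset
print("iris:", iris.data.shape)      # 150 samples, 4 features
print("digits:", digits.data.shape)  # 1797 samples, 64 features

# digits.images holds the same pixels in their original 8x8 layout
print("digits images:", digits.images.shape)
```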

That’s it! Scikit-learn is now available in the Python virtual environment we just created. There is a wealth of great examples in scikit-learn’s documentation here.

Install Scikit-Learn via Anaconda

In some ways, installing scikit-learn with conda, the package manager that ships with Anaconda, is more straightforward, because conda manages scikit-learn’s dependencies for us. When we install scikit-learn with conda, all of its dependencies are installed along with it by default.

Step 1: Set up and install scikit-learn

Conda lets us create an isolated environment for our scikit-learn installation, similar to the virtual environment we created in the Pip portion of this tutorial. With conda, we can create the environment and install scikit-learn with a single command:

root@ubuntu:~# conda create --name conda-scikit scikit-learn

You should see output similar to the following, ending with a prompt. The prompt asks for permission to download and install the packages scikit-learn requires. Type ‘y’ into the terminal and hit enter to continue with the installation:

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  blas               pkgs/main/linux-64::blas-1.0-mkl
  ca-certificates    pkgs/main/linux-64::ca-certificates-2019.10.16-0
  certifi            pkgs/main/linux-64::certifi-2019.9.11-py37_0
  intel-openmp       pkgs/main/linux-64::intel-openmp-2019.4-243
  joblib             pkgs/main/linux-64::joblib-0.13.2-py37_0
  libedit            pkgs/main/linux-64::libedit-3.1.20181209-hc058e9b_0
  libffi             pkgs/main/linux-64::libffi-3.2.1-hd88cf55_4
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  mkl                pkgs/main/linux-64::mkl-2019.4-243
  mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py37he904b0f_0
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.0.15-py37ha843d7b_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.1.0-py37hd6b4f25_0
  ncurses            pkgs/main/linux-64::ncurses-6.1-he6710b0_1
  numpy              pkgs/main/linux-64::numpy-1.17.3-py37hd14ec0e_0
  numpy-base         pkgs/main/linux-64::numpy-base-1.17.3-py37hde5b4d6_0
  openssl            pkgs/main/linux-64::openssl-1.1.1d-h7b6447c_3
  pip                pkgs/main/linux-64::pip-19.3.1-py37_0
  python             pkgs/main/linux-64::python-3.7.5-h0371630_0
  readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
  scikit-learn       pkgs/main/linux-64::scikit-learn-0.21.3-py37hd81dba3_0
  scipy              pkgs/main/linux-64::scipy-1.3.1-py37h7c811a0_0
  setuptools         pkgs/main/linux-64::setuptools-41.6.0-py37_0
  six                pkgs/main/linux-64::six-1.12.0-py37_0
  sqlite             pkgs/main/linux-64::sqlite-3.30.1-h7b6447c_0
  tk                 pkgs/main/linux-64::tk-8.6.8-hbc83047_0
  wheel              pkgs/main/linux-64::wheel-0.33.6-py37_0
  xz                 pkgs/main/linux-64::xz-5.2.4-h14c3975_4
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3


Proceed ([y]/n)? y

Step 2: Activate the conda environment and test scikit-learn

Once the installation completes, scikit-learn and all of its dependencies are installed in a freshly created conda environment! Conda also prints instructions for activating and deactivating the new environment, which works much like activating and deactivating a virtual environment created with the venv module:

#
# To activate this environment, use
#
#     $ conda activate conda-scikit
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Let’s go ahead and activate the conda environment to make sure scikit-learn is available:

root@ubuntu:~# conda activate conda-scikit
(conda-scikit) root@ubuntu:~#

Now that the conda environment is active, let’s drop into a Python shell and load the same default datasets:

(conda-scikit) root@ubuntu:~# python

Python 3.7.5 (default, Oct 25 2019, 15:51:11)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Next, paste in the following snippet and hit enter:

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)

You should see this output in the Python shell:

[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 ...
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]
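While still in the same Python shell, you can double-check that it is running inside the conda-scikit environment. This sketch reads the `CONDA_DEFAULT_ENV` variable, which `conda activate` exports with the name of the active environment and which the Python process inherits from the shell; the helper name `active_conda_env` is hypothetical.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if there is none."""
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment's name
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Interpreter:", sys.executable)
print("Active conda environment:", active_conda_env())
```

If the environment is active, this should print conda-scikit, and the interpreter path should point inside that environment’s directory.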

Start Your AI Journey Today!

This technology is extremely hot right now, and with good reason! Being able to use the information you may already have to identify new areas of opportunity and growth strikes a deep chord with every business owner.

Firms are utilizing machine learning to:

  • Enhance operational workflows
  • Increase the speed and flow of information to their clients
  • Improve the customer experience through greater personalization
  • Streamline hiring processes
  • Boost employee engagement and retention
  • Take better advantage of predictive analytics
  • Make better-informed decisions with enhanced metrics

Give us a call at 800.580.4985, or open a chat or ticket with us to speak with one of our knowledgeable Solutions or Experienced Hosting advisors to learn how to begin your journey today!