How to Install Scikit-Learn on Ubuntu 18.04

Reading Time: 5 minutes

In this tutorial, we are going to walk through how to install scikit-learn on an Ubuntu 18.04 server. We are going to walk through the installation both in a virtual environment with the Python package manager, Pip, and via Anaconda.

Scikit-learn is a Python library that gives developers an interface for building machine learning software. Compared with other Python libraries that cover similar ground, such as TensorFlow, scikit-learn provides a higher-level interface and ships with ready-to-use machine learning algorithms. For this reason, scikit-learn lands more squarely in the field of traditional machine learning.

Once scikit-learn is installed on your server, you could even combine it with a Python web framework, some CSS, JavaScript, and HTML to build a frontend that exposes your machine learning model to the web!

Pre-flight Check

  • These instructions are being performed on an Ubuntu 18.04 LTS server as the root user.
  • Python Version (>= 3.5)
  • Anaconda is a prerequisite for the Anaconda installation portion of this tutorial. If you need to get Anaconda installed, check out our tutorial here!
  • The Python module venv is required for the Pip installation portion of this tutorial. If you need to install venv and get some basics on utilizing a Python virtual environment, check out our tutorial here!

Install Scikit-Learn via Pip 

Step 1: Create a new Python virtual environment

First, as a best practice, refresh the package index so that apt knows about the latest available package versions:

root@ubuntu:~# apt-get update -y

Once the package index is refreshed, let’s create and change into a directory for our project:

root@ubuntu:~# mkdir machine_learning
root@ubuntu:~# cd machine_learning/

Now that we have a fresh space to work in, let’s create our new Python virtual environment:

root@ubuntu:~/machine_learning# python3 -m venv scikit_is_cool

Finally, let’s go ahead and activate the newly created virtual environment:

root@ubuntu:~/machine_learning# source scikit_is_cool/bin/activate
(scikit_is_cool) root@ubuntu:~/machine_learning#
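
If you would like to confirm from inside Python that the virtual environment is really active, one option is to compare the interpreter's prefixes. The following is a minimal sketch; the helper name in_virtualenv is our own and is not part of the venv module:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside an environment created with `python3 -m venv`, sys.prefix points at
    # the environment directory while sys.base_prefix still points at the
    # system Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run with the environment active, the second line should print a path ending in scikit_is_cool.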

Step 2: Install dependencies

One of the major differences between installing scikit-learn via Pip as opposed to Anaconda is that we install the supporting Python modules ourselves. Pip resolves scikit-learn’s required dependencies (NumPy, SciPy, and joblib) on its own when scikit-learn is installed, but installing them explicitly up front makes it clear what is in the environment. The full list of scikit-learn’s dependencies can be found in its installation documentation.

We are also going to install some additional libraries (matplotlib, scikit-image, and pandas) that scikit-learn doesn’t strictly depend on, but that allow us to take full advantage of scikit-learn’s functionality:

(scikit_is_cool) root@ubuntu:~/machine_learning# pip install numpy scipy joblib matplotlib scikit-image pandas
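
To double-check that these packages landed in the environment, a short script along the following lines can help. This is only a sketch: the helper name installed_versions and the MODULES list are our own, and it relies on the common (but not universal) convention that a package exposes a __version__ attribute:

```python
import importlib

# Import names for the packages installed above. Note that scikit-image is
# imported as "skimage".
MODULES = ["numpy", "scipy", "joblib", "matplotlib", "skimage", "pandas"]


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules that do not define __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(MODULES).items():
        print("{:<12} {}".format(name, version or "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED can be installed again with pip while the virtual environment is active.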

Step 3: Install and Test Scikit-learn

The environment is now ready for us to install scikit-learn:

(scikit_is_cool) root@ubuntu:~/machine_learning# pip install -U scikit-learn

Scikit-learn is now installed! To test this out, let’s drop into a Python shell and try to load up one of its default datasets:

(scikit_is_cool) root@ubuntu:~/machine_learning# python
Python 3.6.8 (default, Oct  7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

Once the shell is open, copy and paste in this Python snippet and hit enter:

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)

The output should look something like this:

[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]

That’s it! Scikit-learn is now available in the Python virtual environment we just created. There is a wealth of great examples in scikit-learn’s documentation here.
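
For a slightly more thorough check than loading a dataset, the sketch below trains a small classifier on the digits dataset and reports its accuracy on held-out data. The function name smoke_test and the choice of a k-nearest-neighbors model are our own, picked only because the model trains quickly; this is not the only or the recommended way to validate an install:

```python
def smoke_test(random_state=0):
    """Return held-out accuracy on the digits dataset, or None if scikit-learn is missing."""
    try:
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
    except ImportError:
        return None

    digits = load_digits()
    # Hold back a quarter of the samples so the score reflects unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=random_state
    )
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    accuracy = smoke_test()
    if accuracy is None:
        print("scikit-learn is not importable in this environment")
    else:
        print("held-out accuracy: {:.3f}".format(accuracy))
```

If the script prints an accuracy value, scikit-learn and its numerical dependencies are working together.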

Install Scikit-Learn via Anaconda

In some ways, using conda, the package manager that comes along with Anaconda, to install scikit-learn is a bit more straightforward. The reason is that conda takes the management of scikit-learn’s dependencies out of our hands: when we install scikit-learn with conda, we get all of its dependencies as part of the install by default.

Step 1: Setup and install scikit-learn

Conda lets us create a discrete environment for our scikit-learn installation to live in, similar to the virtual environment used in the Pip installation portion of this tutorial. With conda, we can create the environment and install scikit-learn with one command:

root@ubuntu:~# conda create --name conda-scikit scikit-learn

You should see an output similar to this, with a prompt. This prompt is requesting permission for conda to go out and grab any dependencies scikit-learn might require. Type ‘y’ into the terminal and hit enter to continue with the installation:

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  blas               pkgs/main/linux-64::blas-1.0-mkl
  ca-certificates    pkgs/main/linux-64::ca-certificates-2019.10.16-0
  certifi            pkgs/main/linux-64::certifi-2019.9.11-py37_0
  intel-openmp       pkgs/main/linux-64::intel-openmp-2019.4-243
  joblib             pkgs/main/linux-64::joblib-0.13.2-py37_0
  libedit            pkgs/main/linux-64::libedit-3.1.20181209-hc058e9b_0
  libffi             pkgs/main/linux-64::libffi-3.2.1-hd88cf55_4
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  mkl                pkgs/main/linux-64::mkl-2019.4-243
  mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py37he904b0f_0
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.0.15-py37ha843d7b_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.1.0-py37hd6b4f25_0
  ncurses            pkgs/main/linux-64::ncurses-6.1-he6710b0_1
  numpy              pkgs/main/linux-64::numpy-1.17.3-py37hd14ec0e_0
  numpy-base         pkgs/main/linux-64::numpy-base-1.17.3-py37hde5b4d6_0
  openssl            pkgs/main/linux-64::openssl-1.1.1d-h7b6447c_3
  pip                pkgs/main/linux-64::pip-19.3.1-py37_0
  python             pkgs/main/linux-64::python-3.7.5-h0371630_0
  readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
  scikit-learn       pkgs/main/linux-64::scikit-learn-0.21.3-py37hd81dba3_0
  scipy              pkgs/main/linux-64::scipy-1.3.1-py37h7c811a0_0
  setuptools         pkgs/main/linux-64::setuptools-41.6.0-py37_0
  six                pkgs/main/linux-64::six-1.12.0-py37_0
  sqlite             pkgs/main/linux-64::sqlite-3.30.1-h7b6447c_0
  tk                 pkgs/main/linux-64::tk-8.6.8-hbc83047_0
  wheel              pkgs/main/linux-64::wheel-0.33.6-py37_0
  xz                 pkgs/main/linux-64::xz-5.2.4-h14c3975_4
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3

Proceed ([y]/n)? 

Step 2: Activate the conda environment and test scikit-learn

Scikit-learn is now installed into a freshly created conda environment along with all of its dependencies! Once the installation is complete, we will see output indicating how to activate and deactivate the new conda environment. It’s very similar to activating and deactivating a virtual environment created with the venv module:

# To activate this environment, use
#     $ conda activate conda-scikit
# To deactivate an active environment, use
#     $ conda deactivate

Let’s go ahead and activate the conda environment to make sure scikit-learn is available:

root@ubuntu:~# conda activate conda-scikit
(conda-scikit) root@ubuntu:~#
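
The prompt prefix is the quickest indicator, but the active environment can also be confirmed from inside Python. The sketch below reads the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper name active_conda_env is our own:

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment's name.
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("active conda environment:", active_conda_env())
    print("interpreter path:", sys.executable)
```

With the environment from this tutorial active, the first line should report conda-scikit.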

Now that the conda environment is active, let’s hop into a Python shell and hit that default dataset:

(conda-scikit) root@ubuntu:~# python
Python 3.7.5 (default, Oct 25 2019, 15:51:11)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

Next, paste in the following snippet and hit enter:

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)

You should see this output in the Python shell:

[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]
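
The snippet above also loads the iris dataset without looking at it. As a rough sketch of how to inspect a bundled dataset, the following summarizes its shape, feature names, and class names; the helper name describe_iris is our own:

```python
def describe_iris():
    """Summarize the iris dataset, or return None if scikit-learn is missing."""
    try:
        from sklearn.datasets import load_iris
    except ImportError:
        return None

    iris = load_iris()
    return {
        "shape": iris.data.shape,  # (number of samples, number of features)
        "features": list(iris.feature_names),
        "classes": [str(name) for name in iris.target_names],
    }


if __name__ == "__main__":
    summary = describe_iris()
    if summary is None:
        print("scikit-learn is not importable in this environment")
    else:
        for key, value in summary.items():
            print("{}: {}".format(key, value))
```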

Start Your AI Journey Today!!!

This technology is extremely hot right now and with good reason! Being able to take advantage of the information you may already possess to identify new areas of opportunity and growth strikes a deep chord with every business owner.

Firms are utilizing machine learning to:

  • Enhance operational workflows
  • Increase the speed and flow of information to clients
  • Improve the customer experience by heightening personalization
  • Streamline hiring processes
  • Boost employee engagement and retention
  • Take better advantage of predictive analytics
  • Make better-informed decisions with enhanced metrics

Give us a call at 800.580.4985, or open a chat or ticket with us to speak with one of our knowledgeable Solutions or Experienced Hosting advisors to learn how to begin your journey today!

How to Install PyTorch on Ubuntu

Reading Time: 4 minutes

Data analysis via machine learning is becoming increasingly important in the modern world. PyTorch is a machine learning Python library, developed by the Facebook AI research group, that acts as a high-level interface for developers to create applications like natural language processors. In this tutorial, we are going to cover how to install PyTorch via Anaconda and PIP.

How To Install TensorFlow on Ubuntu 18.04

Reading Time: 2 minutes

In this tutorial, we are going to set up TensorFlow in a virtual Python environment on Ubuntu 18.04. TensorFlow is an open-source framework, developed by the Google Brain team, designed to be a high-level interface for implementing machine learning and mathematical operations. This library provides developers an avenue to work on complex projects like neural networks through an easy-to-use Python API. One of the significant benefits of having a Python front-end is that it is portable between operating systems like Linux and Windows.

Install TensorFlow on Windows

Reading Time: 2 minutes

Whether you’re a beginner or a professional, TensorFlow is an end-to-end platform that makes building and deploying Machine Learning models a snap! Because TensorFlow is based on the Python system, you can install it on multiple operating systems, including Windows. This article will take you through the necessary steps to get TensorFlow installed on your Windows server.

What is Machine Learning?

Reading Time: 3 minutes

It was 2017 when American businessman Mark Cuban said that if you don’t understand artificial intelligence, deep learning and machine learning “you’ll be a dinosaur within three years.” Time will tell whether he is right, but if his theory has substance, some companies are well into the final 12-month countdown to becoming extinct.

What is Machine Learning?

In its purest form, machine learning teaches computers to learn in the same way that humans do. It collects and interprets data from the world around us and makes decisions on what to do with that information. Machine learning is one of the first applications of artificial intelligence.

Just think about every time you start a search using Google. How can it find all the relevant matches to your terms? Considering there are 30 trillion unique web pages that search engines trawl to retrieve what you need, it is even more impressive. It’s impossible for a human to explore that many pages in a lifetime. This is the essence of machine learning: without intervention, computers learn to use data to accomplish human tasks in a fraction of the time.


Machine Learning and Data

It is almost impossible to stress just how vital data is to machine learning; in fact, they are just about synonymous with each other. This is probably best summarised within the Data Science Hierarchy of Needs penned by Rogati, 2017.

At the top of the hierarchy is the AI or Deep Learning algorithm. This might be the algorithm that recommends which Netflix show to watch, or Amazon Alexa responding to your voice command. However, at the very start of the journey is data collection and the quality of what feeds the algorithm.

As an example, marketing teams use machine learning applications to hyper-personalize communications. This is why we tend to get emails or notifications that are highly relevant and tailored to our needs. The machine has studied our data and knows exactly what we need and when we want it. Had the initial data been incorrect or “dirty” in any way, customers would receive communications that are not relevant. What if somebody had accidentally entered a customer location as the U.K. on an order form instead of the U.S., so that all pricing is calculated in pounds instead of dollars? The customer would soon unsubscribe from the email list because it doesn’t pertain to them.

A company can have the best algorithms in the industry, but without quality data, they are effectively useless and possibly detrimental. To counter these problems, companies deploying machine learning technology will usually start by designing a data quality or governance strategy which negates the risk. Adopting AI is a journey and must begin with getting the simple things right.
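
One small piece of a data quality strategy can be an automated consistency check that runs before data reaches a model. The sketch below flags orders whose country disagrees with the country on the customer's account, which would catch the U.K./U.S. mix-up described above. The field names (order_id, customer_id, country) and the function find_country_mismatches are hypothetical and chosen only for illustration:

```python
def find_country_mismatches(orders, account_countries):
    """Return the IDs of orders whose country disagrees with the customer's account."""
    flagged = []
    for order in orders:
        expected = account_countries.get(order["customer_id"])
        # Flag the order when the account has a country on file and it differs.
        if expected is not None and order["country"] != expected:
            flagged.append(order["order_id"])
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data: customer c1 is based in the US.
    account_countries = {"c1": "US", "c2": "UK"}
    orders = [
        {"order_id": 101, "customer_id": "c1", "country": "US"},
        {"order_id": 102, "customer_id": "c1", "country": "UK"},  # entered incorrectly
        {"order_id": 103, "customer_id": "c2", "country": "UK"},
    ]
    print(find_country_mismatches(orders, account_countries))  # prints [102]
```

Flagged records can then be reviewed or corrected before they influence pricing or personalization.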


Machine Learning Framework

Hiring a team to design and deploy machine learning applications can be costly. While Data Scientists are usually specialists in statistical methods and incredibly adept with coding languages like Python and R, they often find it hard to present findings to Data Analysts or Insight Managers. The algorithms also need to be deployed onto platforms, which requires a Data Engineer or Developer. There also need to be duplicate roles to avoid single points of failure, and of course, everybody needs powerful processors that can analyze vast amounts of data. Suddenly, one Data Scientist has become a team of 8 people with expensive hardware, and costs have escalated!

The role of machine learning has been growing rapidly in the last few years, and it looks set to continue with recent developments in cloud, edge and quantum computing, which will only increase the potential processing power. Companies that fail to realize the capability of AI will fall behind the competition.

Our Cloud Sites service is a fine example of how machine learning works in a hosting environment. This PaaS allows your websites to scale as your site grows, without having to worry about scheduling downtime to resize and upgrade your server! Our one-click install of popular CMSs makes working on your sites that much easier.