Don’t use Anaconda: How to set up a decent machine learning environment?🚠

2/8/2020·spencer woo·

Anaconda is bloated. It comes with an installation size of over 2 gigabytes and also installs a bunch of software that we normally won’t use, such as the Python IDE: Spyder. (I mean, it’s 2020 already, who doesn’t use VS Code?)

After tinkering with my machine learning environment for over a month, I’ve put together the following instructions and techniques to help you set up a decent, modern Python environment for studying machine learning without cluttering up your existing local dev environment, so you can get a move on with your research project, graduation thesis, etc.

🍭 Editor’s note: This is part of my personal machine learning Wiki, where I document my entire learning process on adversarial examples, which is the research direction for my graduation thesis. I personally think that this particular section of my Wiki is useful, so I organized it into a separate article, which you are reading now. Find out more at: Adversarial Attacks Targeted on Neural Networks, on Spencer’s Wiki.

Before we begin, do keep in mind that it’ll be best if you were to begin your journey into machine learning on a *NIX environment, like Linux. Let’s move on.

Installing Anaconda (Miniconda)

Wait, what? Didn’t we just say we won’t use Anaconda? Well, yes, we won’t be using Anaconda exactly. Instead, we’ll be installing Miniconda, the unbloated version of Anaconda. The relationship between Anaconda, Miniconda and Conda is best explained here: The Definitive Guide to Conda Environments (Towards Data Science). In short, Conda is a tool for managing Python dependencies and creating virtual environments. Both Anaconda and Miniconda include Conda, but Anaconda is much larger than Miniconda and bundles components we don’t need.

🍚 Note: You won’t need to install Python beforehand, as Miniconda will install and manage the dedicated version of Python that you need. Installing yet another Python alongside the system’s preinstalled one may lead to conflicts.

Downloading the installer

💻 Windows: On Windows, we have a useful CLI installer (or package manager, if you will): scoop. I recommend using scoop to install CLI software. See my introduction to scoop, the Windows package manager:「一行代码」搞定软件安装卸载,用 Scoop 管理你的 Windows 软件 (in Chinese; roughly, “Install and uninstall software with one line of code: managing your Windows software with Scoop”).

First, install scoop and add the extras bucket:

# Install scoop
iwr -useb get.scoop.sh | iex
# Add the extras bucket
scoop bucket add extras

Then install Miniconda with the following command:

scoop install miniconda3

And we’re done! It’s just that easy.

Of course, you can also download the Miniconda installer for Windows directly from its official website. The result is basically the same, but with scoop you won’t have to deal with environment variables and other inconveniences.

The Miniconda Windows installer

📟 Linux: Miniconda doesn’t come with a package-manager-managed version (for example via APT, Ubuntu’s Advanced Package Tool; see the CLI/APT section of Dev on Windows with WSL). We’ll use the official installer script to install it instead.

First, go to Miniconda’s homepage: Docs » Miniconda, and fetch the link for the latest version of Miniconda released with Python 3 on Linux:

Choosing the installer script for Miniconda: Linux 64-bit with Python 3

With the installer link copied to your clipboard, we can simply run the following command to download the installer:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

The command downloads the installer script via wget. Then we can run the script with bash (bash Miniconda3-latest-Linux-x86_64.sh):

Running the script via bash

You will be prompted to view the Miniconda License and start the installation process by entering “yes” in the terminal.

🍚 Note: This installation process may require a system installed version of Python, which we won’t use. But if the installer complains about not being able to find a working version of Python, we can install one by running sudo apt install python3.

Dealing with post-installation issues

💻 Windows: Since we’ll be using PowerShell, we first need to initialize conda in PowerShell’s user configuration:

conda init powershell

This command creates a PowerShell profile inside your PowerShell user configuration folder, usually at ~\Documents\WindowsPowerShell\profile.ps1, and, in my case, puts the following code in it:

#region conda initialize
# !! Contents within this block are managed by 'conda init' !!
(& "C:\Users\Spencer\scoop\apps\miniconda3\current\Scripts\conda.exe" "shell.powershell" "hook") | Out-String | Invoke-Expression
#endregion

Close and reopen PowerShell to see Miniconda take effect:

Miniconda in PowerShell

📟 Linux: If you installed with the default configuration, there’s a good chance you’ll end up without a conda executable in your path: the installer assumes we use bash and modifies .bashrc, while many of us use zsh or fish instead.

You can find Miniconda’s bin directory, and the conda tool itself, here: ~/miniconda3/bin. We’ll need to initialize Miniconda manually, which edits our shell’s configuration file for us. (That will be ~/.zshrc for zsh and ~/.config/fish/config.fish for fish.)

Run the following command by invoking the conda executable by its full path:

~/miniconda3/bin/conda init {THE_SHELL_YOU_USE}
Running “conda init fish” via its full path

In my case, conda init fish actually added the following content into my shell's config:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
eval /home/spencer/miniconda3/bin/conda "shell.fish" "hook" $argv | source
# <<< conda initialize <<<

Close then reopen the terminal to see Miniconda take effect:

The green circle with the text “base” indicates that we have activated the Conda base environment

📢 Both OS: By default, Conda initializes itself and activates the “base” environment whenever we open a terminal, which I personally don’t want. The following command disables this behavior, so that we activate a Conda environment manually only when we need one:

conda config --set auto_activate_base false
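
Before moving on, it can be worth confirming that a conda executable is actually reachable. The following is a minimal Python sketch, not part of the original setup: the helper name find_conda is hypothetical, and the fallback directory ~/miniconda3/bin is simply the default Linux location mentioned above (a scoop install on Windows lives elsewhere). Any Python 3 interpreter can run it.

```python
import shutil
from pathlib import Path


def find_conda(fallback_dirs=None):
    """Return the path of a conda executable, or None if none is found.

    Looks on PATH first, then in each fallback directory (an assumption:
    by default only ~/miniconda3/bin, the default Linux install location).
    """
    on_path = shutil.which("conda")
    if on_path is not None:
        return on_path
    if fallback_dirs is None:
        fallback_dirs = [Path.home() / "miniconda3" / "bin"]
    for directory in fallback_dirs:
        candidate = shutil.which("conda", path=str(directory))
        if candidate is not None:
            return candidate
    return None


if __name__ == "__main__":
    location = find_conda()
    if location is None:
        print("conda not found; try running conda init via its full path")
    else:
        print(f"conda found at {location}")
```

If the script only finds conda through the fallback directory, your shell has most likely not been initialized yet.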

Using Conda to manage our project

After installing Conda, we will use it to:

  • Create a new virtual environment to host our simple machine learning project
  • Install our friendly neighborhood machine learning frameworks, TensorFlow and Keras, inside our virtual environment
  • Install our extremely useful scientific notebook for writing and developing machine learning code: Jupyter Notebook

All of this takes just a few commands. Let’s get started.

Creating a new virtual environment

Before everything, let’s create a folder to contain all our code files.

# Making a directory called adversarial-attacks
mkdir adversarial-attacks
# Navigating into the directory
cd adversarial-attacks

Next up, we’ll create a virtual environment to help manage our code and project. If you are going to deploy your environment on different machines and platforms, it’s considered best practice to create an environment.yml that defines the environment's name, dependencies, channels and more. This way, we won't have to deal with incompatible dependencies across operating systems.

Create a file named environment.yml at the root of our project folder, and inside, we'll need to define:

  • Our environment’s name: name
  • Which channels Conda will install our dependencies from: channels
  • Which dependencies Conda will actually install: dependencies

At the end of the day, our environment.yml will be something like this:

name: adversarial-attacks
channels:
  - defaults
dependencies:
  - python
  - tensorflow
  - numpy
  - matplotlib
  - pylint
  - autopep8
  - notebook
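
The three keys described above (name, channels, dependencies) are all the file needs. Purely as an illustration, here is a small Python sketch that renders such a file from plain lists. The function name render_environment_yml is hypothetical, and the hand-rolled formatting assumes simple package names that need no YAML quoting.

```python
def render_environment_yml(name, channels, dependencies):
    """Build the text of a minimal Conda environment.yml file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_environment_yml(
        name="adversarial-attacks",
        channels=["defaults"],
        dependencies=["python", "tensorflow", "numpy", "matplotlib",
                      "pylint", "autopep8", "notebook"],
    )
    # Print the file contents; redirect to environment.yml to save them.
    print(text, end="")
```

Writing the file by hand is just as good; the sketch only makes the nesting of the list items under channels and dependencies explicit.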

As you can see, I have named our environment adversarial-attacks and added the dependencies that are essential to our project. After that, we can create our environment and install all our dependencies from this file with the following command:

conda env create --file environment.yml

Then, after successfully creating our virtual environment, we can activate it with:

conda activate adversarial-attacks # or your environment name
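
Once the environment is active, a quick sanity check can confirm that Python sees the packages from environment.yml. The sketch below is a helper of my own rather than anything Conda provides: missing_packages and active_conda_env are hypothetical names, the package list mirrors the file above, and the script relies on the CONDA_DEFAULT_ENV variable that conda activate sets.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def active_conda_env():
    """Return the active Conda environment's name, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active Conda environment:", active_conda_env())
    missing = missing_packages(["tensorflow", "numpy", "matplotlib", "notebook"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages can be found.")
```

Run it with the python from the activated environment; a system Python would report on its own packages instead.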

And if you want to add dependencies to your environment, just add them directly to the environment.yml file, then update your environment with:

conda env update --file environment.yml

Deactivate it with:

conda deactivate

Running Jupyter Notebook

Run Jupyter Notebook from the command line:

# Launching the default browser at the same time, or ...
jupyter notebook
# Launching the notebook server only (When running inside WSL)
jupyter notebook --no-browser
Launching Jupyter Notebook and accessing it from Chrome

🍚 Note: When running inside WSL, the command jupyter notebook tries to invoke the default browser inside Windows but fails tragically. We recommend adding the --no-browser flag and copying the URL manually.
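
The WSL workaround can also be automated. The following is a rough Python sketch, assuming the common heuristic that a WSL kernel's release string contains "microsoft". The function names are hypothetical, and only the --no-browser flag comes from the text above.

```python
import platform


def running_in_wsl(release=None):
    """Guess whether we are inside WSL from the kernel release string.

    Heuristic (an assumption): WSL kernels report a release containing
    "microsoft", as seen in the output of `uname -r`.
    """
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


def jupyter_command(release=None):
    """Build the Jupyter launch command, adding --no-browser under WSL."""
    command = ["jupyter", "notebook"]
    if running_in_wsl(release):
        command.append("--no-browser")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(jupyter_command()))
```

The returned list could be passed to subprocess.run if you prefer to launch the server from the script itself.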

Using VS Code as our workbench

VS Code is an amazing code editor that we can use as our main Python development environment. Personally, I use VS Code for almost every project I have, whether it’s Rust, Go, Node.js or something else. What’s more, if you are trying to use WSL, you can hook your Windows side VS Code onto your Ubuntu WSL environment using a plugin called Remote — WSL. See here for more details: 🇺🇸Developing in WSL | 🇨🇳Visual Studio Code — Dev on Windows with WSL.

Then, install the Anaconda Extension Pack, which includes a copy of the necessary Python extension, and language support for YAML.

VS Code Anaconda Extension Pack

Now, you’ll be able to code, lint, debug and run Python files. Also, you can now run Jupyter Notebook directly inside VS Code.

Running Jupyter Notebook directly inside VS Code like a PRO!

That’s all. This tutorial covers basically everything you’ll need to set up a Conda-based development environment, and with environment.yml, we can migrate our environment across operating systems and platforms with ease. Thank you for reading.


Attribution, non-commercial, and sharealike.
