One of the most common questions people ask is which IDE, environment or tool to use when working on data science projects. As you would expect, there is no dearth of options available, from language-specific IDEs like RStudio and PyCharm to editors like Sublime Text or Atom, and the choice can be intimidating for a beginner.
If there is one tool which every data scientist should use or must be comfortable with, it is Jupyter Notebooks (previously known as IPython notebooks). Jupyter Notebooks are powerful, versatile and shareable, and they let you perform data visualization in the same environment.
Jupyter Notebooks allow data scientists to create and share their documents, from code to full-blown reports. They help data scientists streamline their work, boost their productivity and collaborate easily. For these and several other reasons you will see below, Jupyter Notebooks are one of the most popular tools among data scientists.
In this article, we will introduce you to Jupyter Notebooks and deep dive into their features and advantages.
By the time you reach the end of the article, you will have a good idea as to why you should leverage it for your machine learning projects and why Jupyter Notebooks are considered better than other standard tools in this domain!
Are you ready to learn? Let’s begin!
Jupyter Notebook is an open-source web application that allows us to create and share code and documents.
It provides an environment where you can document your code, run it, look at the outcome, visualize data and see the results without leaving the environment. This makes it a handy tool for performing end-to-end data science workflows: data cleaning, statistical modeling, building and training machine learning models, visualizing data, and many, many other uses.
Jupyter Notebooks really shine when you are still in the prototyping phase. This is because your code is written in independent cells, which are executed individually. This allows you to test a specific block of code in a project without having to execute the code from the start of the script. Many other IDE environments (like RStudio) also offer this in various ways, but I have personally found Jupyter’s individual cell structure to be the best of the lot.
As you will see in this article, these Notebooks are incredibly flexible, interactive and powerful tools in the hands of a data scientist. They even allow you to run languages besides Python, like R and SQL. Since they are more interactive than an IDE platform, they are widely used to present code in a more pedagogical manner.
As you might have guessed by now, you need to have Python installed on your machine first. Either Python 2.7 or Python 3.3 (or greater) will do.
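If you are not sure which version you have, Python can tell you itself. The sketch below uses only the standard library; the helper name `meets_requirement` is hypothetical, and its thresholds simply mirror the "2.7, or 3.3 and later" requirement stated above.

```python
import sys

def meets_requirement(version_info=sys.version_info):
    """Hypothetical helper: True for Python 2.7 or Python 3.3 and later."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    return (major, minor) >= (3, 3)

# Print the full version string of the interpreter running this code
print(sys.version)
print("Supported:", meets_requirement())
```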
For new users, the general consensus is that you should use the Anaconda distribution to install both Python and the Jupyter notebook.
Anaconda installs both these tools and includes quite a lot of packages commonly used in the data science and machine learning community. You can download the latest version of Anaconda from here.
If, for some reason, you decide not to use Anaconda, then you need to make sure your machine is running the latest version of pip. How do you do that? If you already have Python installed, pip is most likely there too (it ships with Python 2.7.9+ and 3.4+). To upgrade to the latest pip version, run the commands below:
# Linux and OS X
pip install -U pip setuptools
# Windows
python -m pip install -U pip setuptools
Once pip is ready, you can go ahead and install Jupyter:
# For Python 2
pip install jupyter
# For Python 3
pip3 install jupyter
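Once the install finishes, one quick way to confirm it worked is to check whether the `jupyter` command is on your PATH. This is a minimal sketch using only the standard library; the helper name `jupyter_path` is hypothetical, while `jupyter --version` is the regular command that prints the versions of the installed Jupyter components.

```python
import shutil
import subprocess

def jupyter_path():
    """Hypothetical helper: full path of the `jupyter` command, or None if not on PATH."""
    return shutil.which("jupyter")

path = jupyter_path()
if path is None:
    print("jupyter not found on PATH; the install may have failed or PATH needs updating")
else:
    # `jupyter --version` lists the installed Jupyter components and their versions
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    print(result.stdout)
```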
You can view the official Jupyter installation documentation here.
We’ve now learned all about what these notebooks are and how to go about setting them up on our own machines. Time to get the party started!
To run your Jupyter notebook, simply type the below command and you’re good to go!
jupyter notebook
Once you do this, the Jupyter notebook will open up in your default web browser at the URL below:
http://localhost:8888/tree
In some cases, it might not open up automatically. Instead, a URL containing a token key will be printed in the terminal/command prompt. Copy and paste this entire URL, including the token key, into your browser to open the Notebook.
Once the Notebook is open, you’ll see three tabs at the top: Files, Running and Clusters. Files lists all the files in the directory you launched Jupyter from, Running shows you the terminals and notebooks you currently have open, and Clusters is provided by IPython parallel.
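The token in that URL is an ordinary query parameter, typically in the shape `http://localhost:8888/?token=...`. If you ever need to pull the token out on its own (for example, to paste it into the login page), the standard library can parse it. A minimal sketch; the helper `extract_token` and the example URL and token are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def extract_token(url):
    """Hypothetical helper: value of the `token` query parameter in a Jupyter URL, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("token")
    return values[0] if values else None

# Made-up URL in the shape Jupyter prints to the terminal
url = "http://localhost:8888/?token=abc123def456"
print(extract_token(url))  # abc123def456
```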
To open a new Jupyter notebook, click on the ‘New’ option on the right-hand side of the page. Here, you get four options to choose from:
In a Text File, you are given a blank slate. Type whatever letters, words and numbers you wish. It basically works as a text editor (similar to the default text editor on Ubuntu). You can also choose a language (there are plenty to pick from) so you can write a script in it. You also have the ability to find and replace words in the file.
The Folder option does what the name suggests: you can create a new folder to put your documents in, then rename or delete it as required.
The Terminal works exactly like the terminal on your Mac or Linux machine (cmd on Windows), supporting terminal sessions within your web browser. Type python in this terminal and voila! You can start writing Python code right away.
But in this article, we are going to focus on the notebook, so we will select the Python 3 option from the ‘New’ menu. You will see the screen below:
You can then start things off by importing the most common Python libraries: pandas and numpy. In the menu just above the code, you have options to play around with the cells: add, edit, cut, move cells up and down, run the code in the cell, stop the code, save your work and restart the kernel.
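A typical first cell might look like the sketch below. The tiny DataFrame is made up purely to confirm that both libraries import and work; the column names `x` and `y` are arbitrary.

```python
# A typical first cell: import the two most common data science libraries
import numpy as np
import pandas as pd

# Build a small DataFrame to confirm both libraries work
df = pd.DataFrame({
    "x": np.arange(5),        # 0, 1, 2, 3, 4
    "y": np.arange(5) ** 2,   # 0, 1, 4, 9, 16
})

# In a notebook, the last expression of a cell is displayed automatically
df.describe()
```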
In the drop-down menu (shown above), you have four options:
The developers have built in predefined magic functions that make your life easier and your work far more interactive. You can run the command below to see a list of these functions (note: the “%” is usually not needed, because Automagic is turned on by default):
%lsmagic
You’ll see a lot of options listed and you might even recognise a few! Functions like %clear, %autosave, %debug and %mkdir are some you may have seen before. Magic commands run in two ways: line-wise and cell-wise.
As the names suggest, line-wise magics execute a single command line, while cell-wise magics execute the entire block of code in the cell.
In line-wise mode, all commands must start with the % character, while in cell-wise mode, all commands must begin with %%. Let’s look at the example below to get a better understanding:
Line-wise:
%time a = range(10)
Cell-wise:
%%timeit a = range (10)
min(a)
I suggest you run these commands and see the difference for yourself!
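IPython’s %time and %timeit play the same role as Python’s standard `time` and `timeit` modules. As a rough plain-Python analogue of the two commands above (a sketch, not IPython’s actual implementation), the snippet times a single run with `time.perf_counter` and then repeats a statement many times with `timeit.Timer`. The 5 rounds of 10,000 loops are arbitrary choices; %timeit picks its own counts and reports its own statistics.

```python
import time
import timeit

# Rough analogue of the line magic `%time a = range(10)`: time one execution
start = time.perf_counter()
a = range(10)
elapsed = time.perf_counter() - start
print(f"single run: {elapsed:.2e} s")

# Rough analogue of the cell magic `%%timeit a = range(10)` followed by `min(a)`:
# the setup statement runs before each round, the body is repeated many times
timer = timeit.Timer(stmt="min(a)", setup="a = range(10)")
runs = timer.repeat(repeat=5, number=10_000)  # 5 rounds of 10,000 loops each
best_per_loop = min(runs) / 10_000
print(f"best of 5: {best_per_loop:.2e} s per loop")
```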
And the magic doesn’t stop there. You can even use other languages in your Notebook, like R, Julia and JavaScript. I personally love the ‘ggplot2’ package in R, so being able to use it for exploratory data analysis is a huge, huge bonus.
To enable R in Jupyter, you will need the ‘IRKernel’ (a dedicated kernel for R), which is available on GitHub. It’s an 8-step process and has been explained in detail, along with screenshots to guide you, here.
If you are a Julia user, you can use that within Jupyter Notebooks too! Check out this comprehensive article which is focused on learning data science for a Julia user and includes a section on how to leverage it within the Jupyter environment.
If you prefer working in JavaScript, I recommend using the ‘IJavascript’ kernel. Check out this GitHub repository, which walks you through the steps required for installing this kernel on different operating systems. Note that you will need to have Node.js and npm installed before you can use it.
Before you go about adding widgets, you need to import the widgets package:
from ipywidgets import widgets
The basic types of widgets are the typical text input, other input-based widgets, and buttons. See the example below, taken from Domino Data Lab, of what an interactive widget looks like:
You can check out a comprehensive guide to widgets here.
Shortcuts are one of the best things about Jupyter Notebooks. When you want to run a code block, all you need to do is press Ctrl+Enter. Jupyter Notebooks offer many more keyboard shortcuts that save a lot of time.
Below are a few shortcuts we hand-picked that will be of immense use to you when starting out. I highly recommend trying them out one by one as you read. You won’t know how you lived without them!
A Jupyter Notebook offers two different keyboard input modes: Command and Edit. Command mode binds the keyboard to notebook-level commands and is indicated by a grey cell border with a blue left margin. Edit mode allows you to type text (or code) into the active cell and is indicated by a green cell border.
Switch to command mode with Esc and to edit mode with Enter. Try it out right now!
Once you are in command mode (that is, you are not typing inside a cell), you can try out the shortcuts below:
When in edit mode (press Enter in command mode to get into edit mode), you will find the shortcuts below handy:
To see the entire list of keyboard shortcuts, press ‘H’ in command mode or go to Help > Keyboard Shortcuts. Check this list regularly, as new shortcuts are added in new releases.
Extensions are a great way of enhancing your productivity in Jupyter Notebooks. One of the best tools I have found for installing and using extensions is ‘Nbextensions’. It takes two simple steps to install it on your machine (there are other methods as well, but I found this the most convenient):
Step 1: Install it from pip:
pip install jupyter_contrib_nbextensions
Step 2: Install the associated JavaScript and CSS files:
jupyter contrib nbextension install --user
Once you’re done with this, you’ll see an ‘Nbextensions’ tab at the top of your Jupyter Notebook home page. And voila! There is a collection of awesome extensions you can use for your projects.
To enable an extension, just click on it to activate it. I have mentioned 4 extensions below that I have found most useful:
These are just some of the extensions you have at your disposal. I highly recommend checking out their entire list and experimenting with them.
This is one of the most important and awesome features of a Jupyter Notebook. When I have to write a blog post and my code and comments are in a Jupyter file, I need to first convert them into another format. Remember that these notebooks are stored in JSON format, which isn’t really helpful when it comes to sharing them. I can’t go about posting the different cell blocks in an email or on the blog, right?
Go to the ‘File’ menu and you’ll see a ‘Download as’ option there:
You can save your Notebook in any of the 7 formats provided. The most commonly used are .ipynb, so the other person can replicate your code on their machine, and .html, which opens as a web page (this comes in handy when you want to save the images embedded in the Notebook).
You can also use nbconvert to convert your notebook manually into a different format, such as HTML or PDF.
You can also use JupyterHub, which lets you host notebooks on its own server and share them with multiple users. Many top-notch research projects use it for collaboration.
JupyterLab was launched in February this year and is considered the evolution of Jupyter Notebooks. It offers a more flexible and powerful way of working on projects, built from the same components that Jupyter Notebooks have. The JupyterLab environment will feel familiar to any Jupyter Notebook user, but it provides a more productive experience.
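To see what “JSON format” means in practice, the minimal sketch below builds a tiny notebook by hand, following the nbformat 4 layout (`cells`, `metadata`, `nbformat`, `nbformat_minor`), and pulls out the source of its code cells. The helper name `code_cells_as_script` and the sample cell contents are hypothetical; for real work, the `nbformat` and `nbconvert` packages handle this more robustly.

```python
import json

# A tiny notebook, written out as the JSON a .ipynb file contains (nbformat 4 layout)
notebook_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [],
         "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})

def code_cells_as_script(raw):
    """Hypothetical helper: join the source of every code cell into one script string."""
    nb = json.loads(raw)
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # `source` may be stored as a list of lines or as a single string
            chunks.append("".join(source) if isinstance(source, list) else source)
    return "\n\n".join(chunks)

print(code_cells_as_script(notebook_json))
```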
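On the command line, nbconvert is invoked as `jupyter nbconvert --to <format> <notebook>`. If you want to script conversions, the sketch below wraps that command with Python’s `subprocess`. The helper names and the notebook name `analysis.ipynb` are placeholders, and PDF output additionally requires a LaTeX installation.

```python
import subprocess

def nbconvert_command(notebook, fmt="html"):
    """Hypothetical helper: argument list for `jupyter nbconvert --to <fmt> <notebook>`."""
    return ["jupyter", "nbconvert", "--to", fmt, notebook]

def convert(notebook, fmt="html"):
    """Run nbconvert; raises CalledProcessError if the conversion fails."""
    subprocess.run(nbconvert_command(notebook, fmt), check=True)

# Placeholder notebook name; uncomment to run against a real file
# convert("analysis.ipynb", "html")
print(" ".join(nbconvert_command("analysis.ipynb", "pdf")))
```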
JupyterLab enables you to arrange your work area with notebooks, terminals, text files and outputs – all in one window! You just have to drag and drop the cells where you want them. You can also edit popular file formats like Markdown, CSV and JSON with a live preview to see the changes happening in real time in the actual file.
You can see the installation instructions here if you want to try it out on your machine. The long-term aim of the developers is for JupyterLab to eventually replace Jupyter Notebooks, but that point is still some way off.
While working alone on projects can be fun, most of the time you’ll find yourself working within a team. In that situation, it’s very important to follow guidelines and best practices so that your code and Jupyter Notebooks are annotated properly and consistently with your team members’ work. Here are a few best-practice pointers you should definitely follow while working on a Jupyter Notebook:
Another bonus tip! When you think of creating a presentation, the first tools that come to mind are PowerPoint and Google Slides. But your Jupyter Notebooks can create slides too! Remember when I said it’s super flexible? I wasn’t exaggerating.
To convert your Notebook into slides, go to ‘View’ -> ‘Cell Toolbar’ and click on ‘Slideshow’. Boom! Each cell now displays a ‘Slide Type’ drop-down option on the right. You will get the five options below:
Play around with each option to understand it better. It will change the way you present your code!
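Behind the scenes, the ‘Slide Type’ choice is stored in each cell’s metadata under a `slideshow` key, with values such as `slide`, `subslide`, `fragment`, `skip`, `notes` or `-` (no special role). As a hedged sketch, the snippet below sets these values directly on a notebook’s JSON; the helper `set_slide_types` and the file name `talk.ipynb` are hypothetical, and the Cell Toolbar remains the simpler route. The finished notebook can then be rendered with `jupyter nbconvert --to slides`.

```python
import json

# Slide types stored by the slideshow cell toolbar
VALID_TYPES = {"slide", "subslide", "fragment", "skip", "notes", "-"}

def set_slide_types(nb, slide_types):
    """Hypothetical helper: assign slide_types[i] to cell i of a notebook dict, in place."""
    if len(slide_types) != len(nb["cells"]):
        raise ValueError("need exactly one slide type per cell")
    bad = [t for t in slide_types if t not in VALID_TYPES]
    if bad:
        raise ValueError(f"unknown slide type(s): {bad}")
    for cell, slide_type in zip(nb["cells"], slide_types):
        cell.setdefault("metadata", {})["slideshow"] = {"slide_type": slide_type}
    return nb

# Usage with a real file (hypothetical name):
# with open("talk.ipynb") as f:
#     nb = json.load(f)
# set_slide_types(nb, ["slide", "fragment", "skip"])
# with open("talk.ipynb", "w") as f:
#     json.dump(nb, f, indent=1)

demo = {"cells": [{"cell_type": "markdown", "source": "# Title"},
                  {"cell_type": "code", "metadata": {}, "source": "1 + 1"}]}
set_slide_types(demo, ["slide", "fragment"])
print(demo["cells"][1]["metadata"])
```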
Do note that this is not an exhaustive list of things you can do with your Jupyter notebook. There is so much more to it, and you will pick these things up the more you use it. The key, as with so many things, is practice and experimentation.
Check out this GitHub repository which contains a collection of fascinating Jupyter Notebooks.
This guide is just the starting point in your data science journey and I’m glad you are taking it with me! Let me know your take on Jupyter Notebooks and how they have helped you in the comments section below. Also, if you have any questions – let me know!
Wow such a wonderful article for a fresher
Glad you found it useful, Kamal. :)
A great article. Many thanks.
Thanks, Rahan!
I am learning Python now...This article will help me a lot think so...Can you please share some articles regarding MI and AI as well
Hi Thrilochan, All the articles on Analytics Vidhya are machine learning focused. Are you looking for any specific domain or technique based articles? I can help you out with that. You can also check out the training platform to get started with your data science journey: https://trainings.analyticsvidhya.com/
Nice article. Data science is indeed an important subject to talk about.
Great article. Thanks a lot
This article is very useful for beginners those who want to do python in jupyter. Very useful commands , tips and more.
thanks for this article pranav ..it really helps
Innovative detailed information get.. thanks!!!
I don't get the Notebook Extensions! Any help ?
Hi Amna, Were you able to install using the steps provided?
I was not able to execute Step 7: install.packages(c('rzmq','repr','IRkernel','IRdisplay'), repos = 'http://irkernel.github.io/', type = 'source'). Can you please suggest next steps? Error message: Warning: unable to access index for repository http://irkernel.github.io/src/contrib: cannot open URL 'http://irkernel.github.io/src/contrib/PACKAGES' Warning message: packages 'rzmq', 'repr', 'IRkernel', 'IRdisplay' are not available (for R version 3.5.1)
Hi Anupama, It seems the irkernel page has been updated hence that step as mentioned on the Discussion portal might not work. You can check out the irkernel page directly to see the installation instructions: https://irkernel.github.io/docs/