This is a start-to-finish guide on how to analyze data with Python libraries using Jupyter Notebooks. Over the next few notebooks you will:

  • Source and setup data
  • Manipulate data with pandas
  • Visualize the data with matplotlib
  • (Extra) Understand how to use data in a machine learning algorithm

We will be using a few tools to achieve the goals outlined above. Many of these may sound familiar to you:

  • Jupyter Notebooks
  • Pandas
  • Matplotlib

(Extra) And for the machine learning guide:

  • NumPy
  • scikit-learn (sklearn)
  • Seaborn

Install Pandas and Matplotlib

pandas is a third-party library, developed independently of the core Python developers, so it is not part of Python's standard library. To use pandas you must install it separately.

(base) id:~$ python --version # check that your Python version is 3.6 or higher before installing pandas; run this in a command prompt/terminal

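If your Python version is older than 3.6, update or install a newer version of Python first. Recent pandas releases require a newer Python than 3.6, so check the pandas installation guide for the current minimum version.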
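If you would rather check from inside Python, for example in a notebook cell, the sketch below compares sys.version_info against a minimum version. MIN_PYTHON and python_is_supported are hypothetical names chosen for this example, and 3.6 is simply the minimum this guide states.

```python
import sys

# Hypothetical constant: the minimum version this guide names.
# Newer pandas releases may need a newer Python than this.
MIN_PYTHON = (3, 6)

def python_is_supported(minimum=MIN_PYTHON):
    """Hypothetical helper: True if the running interpreter is at least `minimum`."""
    # sys.version_info[:2] is a (major, minor) tuple, so tuple comparison works.
    return sys.version_info[:2] >= minimum

print(sys.version)            # full version string of the running interpreter
print(python_is_supported())  # True if this Python is 3.6 or newer
```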

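Next, create a virtual environment for your project using pipenv. pipenv is installed with pip, so first make sure pip is available (the apt command below is for Debian/Ubuntu-based systems), then install pipenv:

(base) id:~$ sudo apt install python3-pip # install pip for Python 3
(base) id:~$ python3 -m pip install --user pipenv # install pipenv for your user account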
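To confirm the tools are actually reachable from your terminal, a small Python sketch can look them up on your PATH with shutil.which. The command names in REQUIRED_TOOLS (pip3 and pipenv) and the helper missing_tools are assumptions for this example; adjust them to match your system.

```python
import shutil

# Hypothetical list of command names the setup steps rely on.
REQUIRED_TOOLS = ["pip3", "pipenv"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Hypothetical helper: return the names in `tools` that are not found on PATH."""
    # shutil.which returns None when a command cannot be found.
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
print("missing:", missing if missing else "none")
```

If pipenv was installed with pip's --user option and is reported missing, the user scripts directory (often ~/.local/bin on Linux) is probably not on your PATH.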

Create a new project directory and move into it. It will most likely live inside your _notebooks directory.

(base) id:~$ cd _notebooks # change directory into _notebooks
(base) id:~/_notebooks$ mkdir my_data_project # make new directory for your project
(base) id:~/_notebooks$ cd my_data_project # change directory into newly made project
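(base) id:~/_notebooks/my_data_project$ # resulting change in terminal prompt
(base) id:~/_notebooks/my_data_project$ pipenv shell # create (if needed) and activate this project's virtual environment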
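Once the virtual environment is active, you can confirm that Python is really running inside it. This is a minimal sketch assuming a venv/virtualenv-style environment (pipenv builds on virtualenv); in_virtualenv is a hypothetical helper name. It does not detect conda environments such as the "(base)" one shown in the prompt, because a conda environment is a full Python installation whose sys.prefix and sys.base_prefix are the same.

```python
import sys

def in_virtualenv():
    """Hypothetical helper: True when running inside a venv/virtualenv environment."""
    # venv and virtualenv 20+ point sys.prefix at the environment while
    # sys.base_prefix still points at the base Python installation.
    # Older virtualenv releases set sys.real_prefix instead.
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")

print("sys.prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```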

Install Pandas

(base) id:~/_notebooks/my_data_project$ pip install pandas # install pandas
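To check that the install worked, and which version you got, a short sketch can read Python's package metadata without importing the package. It uses importlib.metadata, which is in the standard library from Python 3.8 onward; installed_version is a hypothetical helper name. Run it with the same Python you installed pandas into.

```python
from importlib import metadata

def installed_version(package):
    """Hypothetical helper: return the installed version string of `package`, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None

print("pandas:", installed_version("pandas"))
```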

Install matplotlib

(base) id:~/_notebooks/my_data_project$ python -m pip install -U pip # upgrade pip itself first
(base) id:~/_notebooks/my_data_project$ python -m pip install -U matplotlib # install (or upgrade) matplotlib
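A quick way to confirm matplotlib imports correctly is to print its version and the backend it selected. This is a minimal sanity check rather than a required step; the backend you see depends on where you run it (a notebook, a terminal, or a headless server).

```python
import matplotlib

# The installed matplotlib version, e.g. to compare against what pip reported.
print("matplotlib:", matplotlib.__version__)

# The backend matplotlib will use to render figures in this environment.
print("backend:", matplotlib.get_backend())
```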

Importing pandas and matplotlib in notebooks

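import pandas as pd  # pd is the conventional alias for pandas
import matplotlib.pyplot as plt  # pyplot is matplotlib's plotting interface; plt is the usual alias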
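With both libraries imported, a small end-to-end sketch shows how they fit together: pandas holds the data in a DataFrame, and DataFrame.plot draws it with matplotlib. The month and sales values below are made-up sample data for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data, purely for illustration.
df = pd.DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "sales": [120, 135, 150, 110],
    }
)

print(df.head())           # first rows of the DataFrame
print(df["sales"].mean())  # average of the "sales" column → 128.75

# DataFrame.plot draws with matplotlib and returns a matplotlib Axes.
ax = df.plot(x="month", y="sales", kind="bar", legend=False)
ax.set_ylabel("sales")
ax.set_title("Sales by month (sample data)")
plt.tight_layout()
# In a notebook the figure is displayed at the end of the cell;
# in a plain script, call plt.show() to open a window.
```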