A. Overview, Tools Setup
Introduction to working with data.
This is a start-to-finish guide on how to analyze data with Python libraries using Jupyter Notebooks. Over the next few notebooks you will:
- Source and set up data
- Manipulate data with pandas
- Visualize the data with matplotlib
- (Extra) Understand how to use data in a machine learning algorithm
We will be using a few tools to achieve the goals outlined above. Many of these may sound familiar to you:
- Jupyter Notebooks
- Pandas
- Matplotlib
(Extra) And for the machine learning guide:
- Numpy
- Sklearn
- Seaborn
Pandas is a third-party library, developed independently of the core Python developers. It does not ship with Python, so you must install it separately.
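(base) id:~$ python --version # check that your Python version is 3.6 or higher; run this in a command prompt/terminal
If your Python version is older than 3.6, click here for the commands to update/install. Note that recent pandas releases require a newer Python than 3.6, so check the pandas installation documentation for the current minimum.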
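The same version check can also be run from inside Python, for example in a notebook cell. This is a minimal sketch: the helper name `python_is_supported` and the constant `MIN_VERSION` are hypothetical, and the minimum of 3.6 is simply the figure stated in this guide.

```python
import sys

# Minimum version stated in this guide (an assumption; newer pandas needs more).
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the running interpreter's version and whether it meets the minimum.
print(sys.version.split()[0], python_is_supported())
```

Comparing tuples rather than version strings avoids mistakes such as treating "3.10" as older than "3.6".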
Next, create a virtual environment for your project using pipenv. pipenv is installed through pip, so first make sure pip is installed (the command below is for Debian/Ubuntu):
(base) id:~$ sudo apt install python3-pip
Create or change into a new project directory. This will most likely be inside your _notebooks directory.
(base) id:~$ cd _notebooks # change directory into _notebooks
(base) id:~/_notebooks$ mkdir my_data_project # make new directory for your project
(base) id:~/_notebooks$ cd my_data_project # change directory into newly made project
(base) id:~/_notebooks/my_data_project$ # resulting change in terminal prompt
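If you prefer to create the project directory from Python rather than the terminal, something like the following sketch should work. The helper `make_project_dir` is hypothetical, and the demonstration runs in a temporary directory so that it does not touch your real `_notebooks` folder.

```python
import tempfile
from pathlib import Path


def make_project_dir(base, name="my_data_project"):
    """Create base/name (and any missing parents) and return its path."""
    project = Path(base) / name
    # exist_ok=True makes the call safe to repeat if the directory exists.
    project.mkdir(parents=True, exist_ok=True)
    return project


# Demonstration in a throwaway directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    project = make_project_dir(Path(tmp) / "_notebooks")
    created = project.is_dir()
    print(project.name, created)
```

To use it for real, pass the path to your own `_notebooks` directory as `base`.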
Install Pandas
(base) id:~/_notebooks/my_data_project$ pip install pandas # install pandas
Install matplotlib
(base) id:~/_notebooks/my_data_project$ python -m pip install -U pip # upgrade pip first
(base) id:~/_notebooks/my_data_project$ python -m pip install -U matplotlib # install or upgrade matplotlib
import pandas as pd
import matplotlib.pyplot as plt
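If either import above fails, it can help to confirm which packages are actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and `importlib.metadata` assumes Python 3.8 or newer.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string for `package`, or None if missing."""
    # find_spec returns None when the top-level module cannot be imported.
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        return None


# Report the status of the two libraries installed in this guide.
for name in ("pandas", "matplotlib"):
    version = installed_version(name)
    print(name, version if version else "not installed")
```

A "not installed" result usually means the install commands were run in a different environment from the one the notebook kernel is using.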