Intro to Pandas

September 6th, 2016

What is Pandas?

Pandas is "an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language." - Python Data Analysis Library

Installation

# [Mac]
pip install pandas

# [Windows]
setx path "%path%;C:\Python27;C:\Python27\Scripts;"
pip install -U pandas
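
To confirm the install worked, a quick check like the one below should be enough. It is only a sanity test: it imports the module and prints whichever version pip installed.

```python
import pandas as pd

# Prints the installed version string.
# An ImportError on the line above means the install did not succeed.
print(pd.__version__)
```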

Once pandas is installed, go to this website to download pokemon.csv. The data set covers 721 Pokemon and lists each one's number, name, first and second type, and basic stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed.

Then use the pandas module to read the CSV file.

import pandas as pd
df = pd.read_csv('your_path_to_pokemon.csv')
print(df)

How many rows are there?

print(len(df))  # Answer is 800 (some Pokemon appear in more than one form).
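
If you don't have the file at hand yet, here is a minimal sketch of the same steps on a three-row sample typed in by hand. The header names are an assumption about how pokemon.csv is laid out, and `sample_csv` and `sample` are just illustrative names.

```python
import io
import pandas as pd

# Hand-typed sample in the assumed layout of pokemon.csv (not the full data set).
sample_csv = io.StringIO(
    "#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary\n"
    "1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False\n"
    "4,Charmander,Fire,,309,39,52,43,60,50,65,1,False\n"
    "7,Squirtle,Water,,314,44,48,65,50,64,43,1,False\n"
)

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(sample_csv)

print(len(sample))    # number of rows
print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # column types inferred by read_csv
```

The empty second type for Charmander and Squirtle is read as a missing value (NaN), which is worth remembering when you filter on that column later.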

Let's change the column names.

df.columns = ["ID", "Name", "Type_1", "Type_2", "Total", "HP", "Atk", "Def", "Sp_Atk", "Sp_Def", "Speed", "Generation", "Legendary"]
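
Assigning to `df.columns` needs one name for every column, in order. If you only want to rename a few, `rename` with a mapping is an alternative. The sketch below uses a tiny stand-in frame, and the original header names in it are assumed.

```python
import pandas as pd

# Tiny stand-in frame using three of the assumed original headers.
sample = pd.DataFrame({
    "#": [1, 4],
    "Type 1": ["Grass", "Fire"],
    "Sp. Atk": [65, 60],
})

# rename returns a new DataFrame; columns missing from the mapping keep their names.
renamed = sample.rename(columns={"#": "ID", "Type 1": "Type_1", "Sp. Atk": "Sp_Atk"})

print(list(renamed.columns))
print(list(sample.columns))  # the original frame is unchanged
```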

Let’s check if it worked.

# Return first 5 records.
print(df.head(5))

# Return last 5 records.
print(df.tail(5))

Let’s filter by column.

# Method 1
print(df['Name'])
# Method 2 -- Usable only if column labels do not contain spaces, dashes, etc.
print(df.Name)
# Select multiple columns.
print(df[['Name', 'Generation', 'Legendary']])
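
One detail that is easy to miss: a single label gives you back a Series, while a list of labels gives you a DataFrame, even when the list has only one element. Here is a small sketch on a hand-made frame (the `sample` name is just for illustration).

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Generation": [1, 1, 1],
    "Legendary": [False, False, False],
})

names = sample["Name"]                   # single label -> Series
subset = sample[["Name", "Legendary"]]   # list of labels -> DataFrame
one_col = sample[["Name"]]               # one-element list -> still a DataFrame

print(type(names).__name__, type(subset).__name__, type(one_col).__name__)
```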

You can also filter rows with conditions.

# Filter by a series of booleans
print(df[df.Total > 400])

# Filter by multiple conditionals (the Attack column was renamed to Atk above)
print(df[(df.Atk > 130) & (df.Legendary == False)])

# Filter by string methods
print(df[df.Name.str.startswith("Char")])
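
To see what is going on under the hood, it helps to look at the boolean Series on its own before using it as a filter. The sketch below uses a four-row hand-made frame with the renamed columns. Masks are combined with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard", "Mewtwo"],
    "Total": [318, 309, 534, 680],
    "Atk": [49, 52, 84, 110],
    "Legendary": [False, False, False, True],
})

# A comparison produces a boolean Series with one value per row.
strong = sample["Total"] > 400
print(strong.tolist())

# "and" plus "not": strong Pokemon that are not legendary.
strong_non_legendary = sample[(sample["Total"] > 400) & ~sample["Legendary"]]
print(strong_non_legendary["Name"].tolist())

# "or": name starts with "Char", or the Pokemon is legendary.
char_or_legendary = sample[sample["Name"].str.startswith("Char") | sample["Legendary"]]
print(char_or_legendary["Name"].tolist())
```

Python's plain `and` and `or` don't work here, because they try to reduce the whole Series to a single True or False.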

# Use the Type_1 column as the index.
df = df.set_index(["Type_1"])
print(df.head(10)) # Type_1 is now the index, shown before the ID column.
print(df.loc["Steel"]) # Label-based referencing uses loc.

# Return your index to its original column form.
print(df.reset_index(["Type_1"]))

# Rearrange index in descending order
print(df.sort_index(ascending=False).head(5))
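
Here is the whole index round trip on a small hand-made frame, as a sketch you can run without the CSV. Note that `set_index`, `reset_index` and `sort_index` all return new DataFrames, so the result has to be assigned to a name if you want to keep it.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Charizard"],
    "Type_1": ["Grass", "Fire", "Water", "Fire"],
    "Total": [318, 309, 314, 534],
})

indexed = sample.set_index("Type_1")   # new frame; sample itself is unchanged
fire = indexed.loc["Fire"]             # label appears twice -> DataFrame with both rows
restored = indexed.reset_index()       # Type_1 is a regular column again, placed first

print(fire["Name"].tolist())
print(list(restored.columns))
print(indexed.sort_index(ascending=False).index.tolist())
```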

Done! 🙂

Reference

An Introduction to Scientific Python – Pandas