Filter a pandas data frame using a mask

After using pandas for quite some time now, I started to question if I was really using it effectively. After two MOOCs in R about 2 or 3 years ago, I realised that because my GIS work wasn’t in analysis, I would not be able to use it properly.

Similarly, because pandas is essentially the R of Python, I thought I wouldn’t be able to use all the features it had to offer. As it stands, I’m still hovering around in the data munging side of pandas.

import pandas as pd
in_xls = r"C:\Some\Arb\Folder\test.xlsx"
columns = [0, 2, 3, 4, 6, 8]
# Use a function to define the mask
# to create a subset of the data frame
def mask(df, key, value):
return df[df[key] == value]
pd.DataFrame.mask = mask
# Out of the 101 rows, only 50 are stored in the data frame
df = pd.read_excel(in_xls,0, parse_cols=columns).mask('Create', 'Y')

I used a pandas mask to filter a spreadsheet (or csv) based on some value. I originally used this to filter out which feature classes need to be created from a list of dozens of templates, but I’ve also used it to filter transactions in the money tracking app I made for my household.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.