Talk: Analyzing census data with pandas @ PyCon US 2019

May 01, 2019

About

Census 2020 is coming!

Did you know the government budgeted 12.5 billion dollars to count EVERY SINGLE PERSON IN THE COUNTRY in 2020? Imagine how much data you could acquire with 12.5 billion dollars.

Not so excited about Census data? How about cool `pandas` tricks?

In this tutorial you will go from a simple data exploration and analysis workflow to the more advanced techniques social scientists apply when working with Census data.

If you’ve been interested in honing your pandas skills, or you’d just love to learn how to calculate the demographically-adjusted employment rate gap for your county using Python, you’ve come to the right place.

This tutorial is perfect for novice data analysts, pythonistas, social scientists, and journalists who want to learn about the powerful pandas library and how to use it to analyze public use microdata. It is also for those who already use pandas but could learn a trick or two to make their workflow even more effective.

Do the acronyms ACS, CPS, PUMA, or IPUMS mean anything to you? If not, all the more reason to join! Come learn something new!

Details

Date: 2019-05-01
Place: Cleveland, Ohio.
Event Website: us.pycon.org/2019
Slides: RISE presentation
Video: YouTube
GitHub: github.com/chekos/analyzing-census-data

Description

This tutorial will guide you through a typical data analysis project using Census data acquired from IPUMS. It’s split into two notebooks:

Data Preparation
Data Analysis

In the first notebook you will:

Work with compressed data with pandas.
Retrieve high-level descriptive analytics of your data.
Drop columns.
Slice data (boolean indexing).
Work with categorical data.
Work with weighted data.
Use Python’s pathlib library to make your code more reproducible across platforms.
Develop a reproducible data prep workflow for future projects.
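
The steps above can be sketched in a few lines of pandas. This is only an illustration, not the tutorial's actual notebook: the toy data, the temporary file, and the column names and codes (`AGE`, `SEX`, `EMPSTAT`, `PERWT`, `UNUSED`) are assumptions modeled loosely on an IPUMS-style extract.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy extract; real IPUMS column names and codes may differ.
raw = pd.DataFrame(
    {
        "AGE": [25, 40, 67, 15, 33],
        "SEX": [1, 2, 2, 1, 2],
        "EMPSTAT": [1, 1, 3, 0, 2],  # assumed coding: 1 means employed
        "PERWT": [100.0, 150.0, 80.0, 120.0, 50.0],  # assumed person weight
        "UNUSED": [0, 0, 0, 0, 0],
    }
)

# pathlib builds the path with the right separator on any platform.
data_dir = Path(tempfile.mkdtemp())
csv_path = data_dir / "extract.csv.gz"

# pandas infers gzip compression from the .gz suffix when writing and reading.
raw.to_csv(csv_path, index=False)
df = pd.read_csv(csv_path)

# High-level descriptive statistics for the numeric columns.
summary = df.describe()

# Drop a column that is not needed for the analysis.
df = df.drop(columns=["UNUSED"])

# Boolean indexing: keep only people aged 16 or older.
adults = df[df["AGE"] >= 16].copy()

# Turn numeric codes into a labeled categorical column.
adults["SEX"] = adults["SEX"].map({1: "Male", 2: "Female"}).astype("category")

# Weighted share of adults who are employed: sum the weights of employed
# people and divide by the sum of all adult weights.
employed = adults["EMPSTAT"] == 1
weighted_rate = adults.loc[employed, "PERWT"].sum() / adults["PERWT"].sum()
```

Summing weights rather than counting rows is what makes the result an estimate for the population instead of a statistic about the sample.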

On top of that, in the second notebook you will:

Aggregate data.
Learn about .groupby().
Learn about cross-sections with .xs().
Learn about pivot tables and crosstabs.
Develop a reproducible data analysis workflow for future projects.
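
As a rough sketch of how these aggregation tools fit together (again with made-up data; the column names `state`, `sex`, `employed`, and `weight` are hypothetical and not taken from the tutorial's notebooks):

```python
import pandas as pd

# Hypothetical person-level records with a survey weight per person.
people = pd.DataFrame(
    {
        "state": ["CA", "CA", "CA", "OH", "OH", "OH"],
        "sex": ["Female", "Male", "Female", "Male", "Female", "Male"],
        "employed": [1, 0, 1, 1, 0, 1],
        "weight": [100.0, 200.0, 300.0, 150.0, 50.0, 100.0],
    }
)

# Weight carried only by employed people (zero for everyone else).
people["employed_weight"] = people["employed"] * people["weight"]

# groupby: weighted totals per state and sex, then a weighted employment share.
totals = people.groupby(["state", "sex"])[["employed_weight", "weight"]].sum()
totals["rate"] = totals["employed_weight"] / totals["weight"]

# xs: take a cross-section of the MultiIndex result by one level's value.
ca = totals.xs("CA", level="state")
females = totals.xs("Female", level="sex")

# pivot_table: states as rows, sexes as columns, summed weights as values.
pivot = people.pivot_table(
    index="state", columns="sex", values="weight", aggfunc="sum"
)

# crosstab: the same table built directly from the columns.
xtab = pd.crosstab(
    people["state"], people["sex"], values=people["weight"], aggfunc="sum"
)
```

Here `pivot` and `xtab` hold the same weighted counts; which one to use is mostly a matter of whether you start from a DataFrame or from separate Series.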

Contact

Project owners:

Sergio Sánchez Zavala (https://github.com/chekos)

Licence

GNU General Public License v3.0

# pandas # jupyter # python # talk # tutorial