Talk: Analyzing census data with pandas @ PyCon US 2019
May 01, 2019
Census 2020 is coming!
Did you know the government budgeted 12.5 billion dollars to count EVERY SINGLE PERSON IN THE COUNTRY in 2020? Imagine how much data you could acquire with 12.5 billion dollars.
Not so excited about Census data? How about cool pandas tricks?
In this tutorial you will progress from a simple data exploration and analysis workflow to the more advanced techniques social scientists apply when working with Census data.
If you’ve been meaning to hone your pandas skills, or you’d just love to learn how to calculate the demographically adjusted employment rate gap for your county using Python, you’ve come to the right place.
This tutorial is perfect for novice data analysts, Pythonistas, social scientists, and journalists who want to learn about the powerful pandas library and how to use it to analyze public use microdata. It also suits those who already use pandas but could pick up a trick or two to make their workflow even more effective.
Do the acronyms ACS, CPS, PUMA, or IPUMS mean anything to you? If not, all the more reason to join! Come learn something new!
Details
- Date: 2019-05-01
- Place: Cleveland, Ohio
- Event Website: us.pycon.org/2019
- Slides: RISE presentation
- Video: YouTube
- GitHub: github.com/chekos/analyzing-census-data
Description
This tutorial guides you through a typical data analysis project using Census data acquired from IPUMS. It’s split into two notebooks:
- Data Preparation
- Data Analysis
In the first notebook you will:
- Read compressed data files with pandas.
- Retrieve high-level descriptive analytics of your data.
- Drop columns.
- Slice data (boolean indexing).
- Work with categorical data.
- Work with weighted data.
- Use Python’s pathlib module to make your code more portable across platforms.
- Develop a reproducible data prep workflow for future projects.
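The data preparation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook’s actual code: the toy DataFrame, its values, the file name `toy_extract.csv.gz`, and the code-to-label map are all hypothetical. The column names only mimic IPUMS-style variables (for example `PERWT` as the person weight).

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical toy extract: column names mimic IPUMS-style variables
# (PERWT = person weight), but every value here is invented.
toy = pd.DataFrame(
    {
        "YEAR": [2017] * 6,
        "STATEFIP": [6, 6, 6, 39, 39, 39],
        "PERWT": [100.0, 80.0, 120.0, 90.0, 110.0, 50.0],
        "SEX": [1, 2, 2, 1, 2, 1],
        "AGE": [34, 29, 71, 45, 16, 52],
    }
)

# pathlib builds paths that work the same on Windows, macOS, and Linux.
data_dir = Path(tempfile.mkdtemp())
raw_file = data_dir / "toy_extract.csv.gz"  # hypothetical file name

# pandas infers gzip compression from the ".gz" suffix when writing and reading.
toy.to_csv(raw_file, index=False)
df = pd.read_csv(raw_file)

# High-level descriptive statistics for every numeric column.
summary = df.describe()

# Drop a column we don't need for the analysis.
df = df.drop(columns=["YEAR"])

# Boolean indexing: keep only working-age respondents (16 to 64).
df = df[(df["AGE"] >= 16) & (df["AGE"] <= 64)].copy()

# Turn coded integers into a labeled categorical (hypothetical code map).
sex_labels = {1: "Male", 2: "Female"}
df["SEX"] = df["SEX"].map(sex_labels).astype("category")

# Weighted data: each row represents PERWT people, so sum the weights
# instead of counting rows.
weighted_pop = df["PERWT"].sum()
unweighted_rows = len(df)

print(unweighted_rows, weighted_pop)
```

The contrast between `unweighted_rows` and `weighted_pop` is the point of the last step: with survey microdata, a row count describes the sample, while the summed weights estimate the population.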
On top of that, in the second notebook you will:
- Aggregate data.
- Learn about .groupby().
- Take cross-sections with .xs().
- Build pivot tables and crosstabs with .pivot_table() and pd.crosstab().
- Develop a reproducible data analysis workflow for future projects.
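A rough sketch of how the analysis tools above fit together on weighted survey data follows. It assumes a tiny, invented person-level table with a `PERWT` weight column. The column names and values are hypothetical, and the weighted employment rate shown is just one simple way to aggregate, not necessarily the method used in the tutorial notebooks.

```python
import pandas as pd

# Hypothetical person-level toy data in the spirit of an IPUMS extract;
# PERWT is the person weight and all values are invented.
df = pd.DataFrame(
    {
        "state": ["CA", "CA", "CA", "CA", "OH", "OH", "OH", "OH"],
        "sex": ["Male", "Female", "Male", "Female",
                "Male", "Female", "Male", "Female"],
        "employed": [True, True, False, True, True, False, True, True],
        "PERWT": [100, 80, 50, 70, 90, 60, 40, 30],
    }
)

# groupby: weighted population by state and sex (a MultiIndex Series).
pop = df.groupby(["state", "sex"])["PERWT"].sum()

# Weighted employment rate = weight of employed people / weight of everyone.
df["emp_wt"] = df["PERWT"] * df["employed"]
emp_rate = df.groupby(["state", "sex"])["emp_wt"].sum() / pop

# xs: cross-section of one MultiIndex level, here Ohio only.
ohio_rates = emp_rate.xs("OH", level="state")

# pivot_table: the weighted employed counts laid out as a state x sex grid.
emp_grid = df.pivot_table(index="state", columns="sex",
                          values="emp_wt", aggfunc="sum")

# crosstab: weighted counts of employment status by state.
emp_counts = pd.crosstab(df["state"], df["employed"],
                         values=df["PERWT"], aggfunc="sum")

print(ohio_rates)
print(emp_grid)
print(emp_counts)
```

Keeping the weights in every aggregation, rather than calling `.size()` or `.count()`, is what turns sample counts into population estimates.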
Contact
Project owners:
- Sergio Sánchez Zavala (https://github.com/chekos)
Licence
GNU General Public License v3.0