For Monday’s PSL Demo Day, I showed how to use the scf and microdf PSL Python packages from the Google Colab web-based Jupyter notebook interface.
The scf package extracts data from the Federal Reserve’s Survey of Consumer Finances, the canonical source of US wealth microdata. It has a single function, load(years, columns), which returns a pandas DataFrame with the specified column(s), each record’s survey weight, and the year (when multiple years are requested).
The microdf package analyzes survey microdata, such as that returned by the scf.load function. It offers a consistent paradigm for calculating summary statistics such as means, medians, and sums, as well as inequality measures like the Gini index. Most functions are structured as f(df, col, w, groupby), where df is a pandas DataFrame of survey microdata, col is the name of the column(s) to summarize, w is the weight column, and groupby is the column(s) by which to group records before summarizing.
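To make the f(df, col, w, groupby) pattern concrete, here is a minimal sketch in plain pandas and NumPy. It is not microdf's implementation: the helper names (weighted_mean_sketch, weighted_sum_sketch), the toy data, and the column names networth, wgt, and year are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for survey microdata. The column names are assumptions,
# chosen to resemble a value column, a survey weight, and a year.
df = pd.DataFrame({
    "year": [2016, 2016, 2016, 2019, 2019, 2019],
    "networth": [10.0, 50.0, 200.0, 20.0, 60.0, 400.0],
    "wgt": [1.0, 2.0, 1.0, 1.0, 2.0, 1.0],
})


def _summarize(stat, df, col, w, groupby=None):
    """Apply stat(values, weights) to the whole frame or to each group."""
    if groupby is None:
        return float(stat(df[col].to_numpy(), df[w].to_numpy()))
    return pd.Series({
        key: float(stat(group[col].to_numpy(), group[w].to_numpy()))
        for key, group in df.groupby(groupby)
    })


def weighted_mean_sketch(df, col, w, groupby=None):
    """Weighted mean of col, using w as the weight column."""
    return _summarize(lambda x, wt: np.average(x, weights=wt), df, col, w, groupby)


def weighted_sum_sketch(df, col, w, groupby=None):
    """Weighted total of col, using w as the weight column."""
    return _summarize(lambda x, wt: np.sum(x * wt), df, col, w, groupby)


overall_mean = weighted_mean_sketch(df, "networth", "wgt")
mean_by_year = weighted_mean_sketch(df, "networth", "wgt", groupby="year")
total_by_year = weighted_sum_sketch(df, "networth", "wgt", groupby="year")
```

Every statistic takes the same four arguments, so switching from a mean to a total, or from a single figure to a per-year breakdown, changes only the function name or the groupby argument.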
Using Google Colab, I showed how to use these packages to quickly calculate mean, median, and total wealth from the SCF data, without having to install any software or leave the browser. I also demonstrated how to use the groupby argument of microdf functions to show how different measures of wealth inequality have changed over time. Finally, I previewed some of what’s to come with scf and microdf: imputations, extrapolations, inflation, visualization, and documentation, to name a few priorities.
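As an illustration of tracking inequality over time with a groupby argument, the sketch below computes a weighted Gini index per year from the area under the Lorenz curve. It is an assumption-laden example rather than microdf's own code: gini_sketch, the toy data, and the column names are hypothetical, and the formula assumes non-negative values with a positive weighted total (real net worth data can include negatives, which this sketch does not handle).

```python
import numpy as np
import pandas as pd

# Toy data (assumed column names): wealth is equal in 2016 and
# concentrated in half of the weighted population in 2019.
toy = pd.DataFrame({
    "year": [2016, 2016, 2016, 2019, 2019, 2019],
    "networth": [50.0, 50.0, 50.0, 0.0, 0.0, 100.0],
    "wgt": [1.0, 2.0, 1.0, 1.0, 1.0, 2.0],
})


def gini_sketch(df, col, w, groupby=None):
    """Weighted Gini index of col, optionally computed within groups."""
    if groupby is not None:
        return pd.Series({
            key: gini_sketch(group, col, w)
            for key, group in df.groupby(groupby)
        })
    ordered = df.sort_values(col)
    x = ordered[col].to_numpy(dtype=float)
    wt = ordered[w].to_numpy(dtype=float)
    # Lorenz curve: cumulative population share against cumulative wealth
    # share, with the origin (0, 0) prepended.
    pop_share = np.concatenate([[0.0], np.cumsum(wt) / wt.sum()])
    wealth_share = np.concatenate([[0.0], np.cumsum(x * wt) / (x * wt).sum()])
    # Trapezoid rule: this sum equals twice the area under the Lorenz curve.
    twice_area = np.sum(np.diff(pop_share) * (wealth_share[1:] + wealth_share[:-1]))
    return float(1.0 - twice_area)


gini_by_year = gini_sketch(toy, "networth", "wgt", groupby="year")
```

With this toy data the index is 0 for 2016, where everyone holds the same wealth, and 0.5 for 2019, where half of the weighted population holds all of it.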
Resources: