I'm going through the Khan Academy course on Statistics as a refresher from my college days, and as a way to get up to speed on pandas and the rest of the scientific Python stack.
I've got a table that looks like this from Khan Academy:
             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500
I would like to recreate this table using pandas. Of course I could create a DataFrame using something like
    pd.DataFrame(
        {
            "Graduate": {...},
            "Undergraduate": {...},
            "Total": {...},
        }
    )
But that seems like a naive approach, one that would fall over quickly and not be extensible.
I've got the non-totals part of the table like this:
    import pandas as pd

    df = pd.DataFrame(
        {
            "Undergraduate": {"Straight A's": 240, "Not": 3_760},
            "Graduate": {"Straight A's": 60, "Not": 440},
        }
    )
    df
I've been looking and found a couple of promising things, like:
    df['Total'] = df.sum(axis=1)
But I didn't find anything terribly elegant.
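For completeness, here is a minimal sketch of where that snippet leads if I keep going by hand: sum across the columns for a Total column, then sum down the rows for a Total row. The name `with_totals` is just mine for illustration, and I'm relying on doing the column first so that the new row also picks up the grand total.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Work on a copy so the original counts stay untouched.
with_totals = df.copy()

# Row-wise sums become a new "Total" column.
with_totals["Total"] = with_totals.sum(axis=1)

# Column-wise sums (now including the "Total" column) become a new "Total" row.
with_totals.loc["Total"] = with_totals.sum(axis=0)

print(with_totals)
```

It works, but it is two separate steps whose order matters, which is the part that doesn't feel elegant to me.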
I did find the crosstab function, which looks like it should do what I want, but it seems that in order to use it I'd have to create a DataFrame of 1/0 values for all of these observations, which seems silly because I've already got the aggregate.
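From what I can tell, `pd.crosstab` can also work from counts that are already aggregated, through its `values` and `aggfunc` parameters, with `margins=True` adding the totals. Below is a rough sketch, assuming the table is first rewritten in long form; the `counts` frame and its `grades`, `level`, and `n` column names are made up for illustration.

```python
import pandas as pd

# Long-form version of the aggregate: one row per cell of the table.
counts = pd.DataFrame(
    {
        "grades": ["Straight A's", "Straight A's", "Not", "Not"],
        "level": ["Undergraduate", "Graduate", "Undergraduate", "Graduate"],
        "n": [240, 60, 3_760, 440],
    }
)

# Sum the pre-aggregated counts per (grades, level) cell and
# let crosstab append a "Total" row and column via margins.
table = pd.crosstab(
    index=counts["grades"],
    columns=counts["level"],
    values=counts["n"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

print(table)
```

This does produce the totals, but crosstab sorts the labels alphabetically, so the layout no longer matches the Khan Academy table, and I've had to restate the data in a different shape just to get there.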
I have found some approaches that manually build a new totals row, but it seems like there should be a better way, something like:
    totals(df, rows=True, columns=True)
or something.
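To make the wish concrete, here is a sketch of what I imagine such a helper doing. `totals` is a hypothetical function, not part of pandas, and I'm assuming `rows=True` means "append a totals row" and `columns=True` means "append a totals column"; the `name` parameter is also my own addition.

```python
import pandas as pd


def totals(df, rows=True, columns=True, name="Total"):
    """Hypothetical helper: return a copy of df with totals appended.

    columns=True appends a totals column (row-wise sums);
    rows=True appends a totals row (column-wise sums).
    """
    out = df.copy()
    if columns:
        # Add the column first so the totals row also covers it.
        out[name] = out.sum(axis=1)
    if rows:
        out.loc[name] = out.sum(axis=0)
    return out


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

full = totals(df)
print(full)
```

Something like this is easy enough to write myself; what I'm after is whether pandas already ships an equivalent.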
Does this exist in pandas, or do I have to just cobble together my own approach?