Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
368 views
in Technique[技术] by (71.8m points)

python - How do I create a sum row and sum column in pandas?

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},
"Undergraduate": {...},
"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd

df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})

#Total sum per column: 
df.loc['Total',:]= df.sum(axis=0)

#Total sum per row: 
df.loc[:,'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...