Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How can I update my cumulative column in a pandas DataFrame?

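I have a pandas DataFrame named 'df' with 4 columns: date, game_name, total_registered, newly_registered.

The total_registered column is cumulative: each day's value should equal the previous day's total_registered plus newly_registered for the same game.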

Sample:

+------------+-----------+------------------+------------------+
|    date    | game_name | total_registered | newly_registered |
+------------+-----------+------------------+------------------+
| 2020-12-1  | abc       |               10 |                4 |
| 2020-12-2  | abc       |               14 |                3 |
| 2020-12-3  | abc       |                0 |                5 |
| 2020-12-1  | zzz       |               20 |                2 |
| 2020-12-2  | zzz       |               22 |               10 |
| 2020-12-3  | zzz       |                0 |                5 |
+------------+-----------+------------------+------------------+

As you can see, my cumulative column broke on the 3rd: total_registered is 0 for both 'abc' and 'zzz'.

To be safe, I would like to recompute the cumulative column starting on the 2nd. How can I set each row's total_registered to the prior day's total_registered + newly_registered for the same game?

Desired Output:

+------------+-----------+------------------+------------------+
|    date    | game_name | total_registered | newly_registered |
+------------+-----------+------------------+------------------+
| 2020-12-1  | abc       |               10 |                4 |
| 2020-12-2  | abc       |               14 |                3 |
| 2020-12-3  | abc       |               17 |                5 |
| 2020-12-1  | zzz       |               20 |                2 |
| 2020-12-2  | zzz       |               22 |               10 |
| 2020-12-3  | zzz       |               32 |                5 |
+------------+-----------+------------------+------------------+
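The rule in the question (a row's total_registered equals the prior day's total_registered + newly_registered for the same game) can also be used to locate the broken rows. A minimal sketch, assuming rows are already sorted by date within each game; the frame is rebuilt by hand from the sample table:

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity).
df = pd.DataFrame({
    "date": ["2020-12-1", "2020-12-2", "2020-12-3"] * 2,
    "game_name": ["abc"] * 3 + ["zzz"] * 3,
    "total_registered": [10, 14, 0, 20, 22, 0],
    "newly_registered": [4, 3, 5, 2, 10, 5],
})

# Previous row's values within each game (assumes date order per game).
prev = df.groupby("game_name")[["total_registered", "newly_registered"]].shift()
expected = prev["total_registered"] + prev["newly_registered"]

# The first row of each game has no predecessor (NaN), so it is never flagged.
broken = df[expected.notna() & (df["total_registered"] != expected)]
print(broken)
```

With the sample data this flags the two 2020-12-3 rows.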


1 Reply


Concept

  1. For each game, take the first total_registered value and store it as newly_registered in a synthetic row dated at the Unix epoch (1970-01-01).
  2. concat() the synthetic rows with the actual data.
  3. Recalculate total_registered per game with cumsum() followed by shift(), so the seed value from step 1 is included.
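To see why step 3 works, here is the same arithmetic on game "abc" alone, as a hand-built Series (the "seed" label is purely illustrative and stands in for the synthetic epoch row):

```python
import pandas as pd

# Game "abc": the seed row carries the first total_registered (10)
# in the newly_registered position, as in step 1.
newly = pd.Series([10, 4, 3, 5],
                  index=["seed", "2020-12-1", "2020-12-2", "2020-12-3"])

# Step 3: cumulative sum, then shift down one row so each date only
# sees registrations from earlier rows (the seed row becomes NaN).
totals = newly.cumsum().shift()
print(totals.drop("seed").astype(int).tolist())  # [10, 14, 17]
```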
import io
import datetime as dt

import pandas as pd

# parse the pipe-delimited sample; the empty first field of each data row
# becomes an implicit index (dropped), the trailing "|" yields "Unnamed: 4"
df = pd.read_csv(io.StringIO("""    date    | game_name | total_registered | newly_registered |
| 2020-12-1  | abc       |               10 |                4 |
| 2020-12-2  | abc       |               14 |                3 |
| 2020-12-3  | abc       |                0 |                5 |
| 2020-12-1  | zzz       |               20 |                2 |
| 2020-12-2  | zzz       |               22 |               10 |
| 2020-12-3  | zzz       |                0 |                5 |"""),
                 sep=r"\s*\|\s*", skipinitialspace=True, engine="python"
                 ).reset_index(drop=True).drop(columns="Unnamed: 4")

df["date"] = pd.to_datetime(df["date"])

df = (pd.concat([df,
                 # synthesize first total_registered into a new row as newly_registered
                 (df.groupby("game_name").first()
                  .reset_index()
                  .assign(date=pd.to_datetime(dt.date(1970, 1, 1)),
                          newly_registered=lambda dfa: dfa["total_registered"]))
                 ])
      .sort_values(["game_name", "date"])
      # total_registered is now a simple cumsum with a shift within each group
      .assign(total_registered=lambda dfa: dfa.groupby("game_name")["newly_registered"]
              .transform(lambda x: x.cumsum().shift())
              .fillna(0)
              .astype(int))
      # remove synthesized epoch rows
      .loc[lambda dfa: dfa["date"].dt.year > 1970]
      )

output

      date game_name  total_registered  newly_registered
2020-12-01       abc                10                 4
2020-12-02       abc                14                 3
2020-12-03       abc                17                 5
2020-12-01       zzz                20                 2
2020-12-02       zzz                22                10
2020-12-03       zzz                32                 5
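A possibly simpler variant, not the answerer's code: skip the synthetic epoch rows and add each game's first total_registered to a shifted cumulative sum of newly_registered. This sketch assumes the first row of each game is correct, as in the sample, and that there is one row per day:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-03"] * 2),
    "game_name": ["abc"] * 3 + ["zzz"] * 3,
    "total_registered": [10, 14, 0, 20, 22, 0],
    "newly_registered": [4, 3, 5, 2, 10, 5],
})

df = df.sort_values(["game_name", "date"])
g = df.groupby("game_name")

# Seed: each game's first total_registered, broadcast to all of its rows.
seed = g["total_registered"].transform("first")

# Registrations from all earlier rows of the same game (0 for the first row).
earlier = g["newly_registered"].transform(lambda s: s.shift(fill_value=0).cumsum())

df["total_registered"] = seed + earlier
print(df)
```

Using shift(fill_value=0) keeps the column integer, so no fillna()/astype() step is needed.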
