Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Merge values of a dataframe where other columns match

I have a dataframe storing a date, car_brand, color and a city:

 date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london

From this I want to build a new dataframe that merges rows in which car_brand, color and city match on consecutive days. For the example above, the result should be:

 date                             car_brand    color     city
 ["2020-01-01","2020-01-02"]      porsche      red       paris
 ["2020-01-03"]                   porsche      red       london
 ["2020-01-04"]                   porsche      red       paris
 ["2020-01-05"]                   porsche      red       london
 ["2020-01-01"]                   audi         blue      munich
 ["2020-01-02"]                   audi         red       munich
 ["2020-01-03","2020-01-05"]      audi         red       london

How can I achieve that? I tried pd.concat and pd.merge, but nothing has worked so far. Thanks!


1 Reply

If consecutiveness matters, you can check for it inside a list comprehension. This extends the usual technique of collecting each group's values into a list with a lambda function.
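For reference, the base technique in its simplest form is sketched below on a small made-up frame (the values are illustrative only): group on the key column, then aggregate the date column with a lambda that returns a list.

```python
import pandas as pd

# Small made-up frame, only to illustrate the "list per group" technique
df = pd.DataFrame({
    "car_brand": ["porsche", "porsche", "audi"],
    "date": ["2020-01-01", "2020-01-02", "2020-01-01"],
})

# Each group's "date" values are collected into a plain Python list
lists = df.groupby("car_brand")["date"].agg(lambda x: list(x)).reset_index()
print(lists)
```

groupby sorts the keys by default, so the audi row comes first.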

import io
import pandas as pd

df = pd.read_csv(io.StringIO(""" date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london"""), sep=r"\s+")  # split on any run of whitespace
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    # group on every column except "date", i.e. car_brand, color and city
    .groupby([c for c in df.columns if c != "date"])["date"]
    # only include a date if it is the group's first or exactly one day after the previous date
    .agg(lambda x: [xx for i, xx in enumerate(x) if i == 0 or xx == (list(x)[i-1] + pd.DateOffset(1))])
    .reset_index()
)
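The consecutiveness test inside the lambda rebuilds list(x) for every element, which is quadratic in the group size. A vectorized equivalent using Series.diff might look like the sketch below; keep_consecutive is a hypothetical helper name, and it assumes timezone-naive dates, for which adding pd.DateOffset(1) and pd.Timedelta(days=1) give the same result.

```python
import pandas as pd


def keep_consecutive(x: pd.Series) -> list:
    """Hypothetical helper (assumes tz-naive dates): keep the group's first date plus
    every date exactly one day after the date directly before it (same test as the lambda)."""
    is_next_day = x.diff().eq(pd.Timedelta(days=1))  # first row's diff is NaT, which compares False
    is_next_day.iloc[0] = True                        # the first date is always kept
    return x[is_next_day].tolist()


# porsche/red/paris dates from the question: 2020-01-04 is not one day after 2020-01-02
dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]))
kept = keep_consecutive(dates)
```

It can be passed as .agg(keep_consecutive) in place of the lambda. Like the original, it compares each date with the one directly before it in the group, not with the last date that was kept.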

Output:

car_brand color   city                                                            date
     audi  blue munich                                           [2020-01-01 00:00:00]
     audi   red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
     audi   red munich                                           [2020-01-02 00:00:00]
  porsche   red london                                           [2020-01-03 00:00:00]
  porsche   red  paris                      [2020-01-01 00:00:00, 2020-01-02 00:00:00]
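Note that the comprehension drops dates instead of splitting them into separate rows: porsche/red/paris on 2020-01-04 and porsche/red/london on 2020-01-05 are missing from the output above, while the question wants them as rows of their own. A common alternative, sketched here as an assumption rather than as this answer's method, labels runs of consecutive rows with shift and cumsum and then groups on that label. It assumes the rows are already ordered by brand and then by date, as in the example.

```python
import io

import pandas as pd

# Sample data from the question (dates unquoted here for brevity)
data = """date car_brand color city
2020-01-01 porsche red paris
2020-01-02 porsche red paris
2020-01-03 porsche red london
2020-01-04 porsche red paris
2020-01-05 porsche red london
2020-01-01 audi blue munich
2020-01-02 audi red munich
2020-01-03 audi red london
2020-01-04 audi red london
2020-01-05 audi red london"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])

keys = ["car_brand", "color", "city"]

# Assumption: rows are already ordered by brand, then by date (as in the question)
prev = df.shift()  # previous row, aligned with the current one
keys_changed = (df[keys] != prev[keys]).any(axis=1)                 # brand/color/city differs
day_skipped = (df["date"] - prev["date"]).ne(pd.Timedelta(days=1))  # not exactly one day later
run_id = (keys_changed | day_skipped).cumsum()                      # new run id at every break

out = (
    df.groupby(run_id, sort=False)
    .agg(
        date=("date", list),
        car_brand=("car_brand", "first"),
        color=("color", "first"),
        city=("city", "first"),
    )
    .reset_index(drop=True)
)
print(out)
```

A new run starts whenever car_brand, color or city differs from the previous row or the date is not exactly one day later, so the result has the seven rows from the question in the same order, with the audi/red/london row holding all three dates from 2020-01-03 to 2020-01-05.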
