Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
206 views
in Technique[技术] by (71.8m points)

python - How to pair two dataframe

The first datafrane is the dividendyield ratio for each stock of all trade days from 20200601 to 20201031

ds       code                    
20200601 000001.SZ           1.64
         000002.SZ           3.96
         000004.SZ           0.00
         000005.SZ           0.00
         000006.SZ           3.68
         000007.SZ           0.00
         000008.SZ           0.33
         000009.SZ           0.28
...                           ...
20201031 688567.SH           0.00
         688568.SH           0.00
         688569.SH           0.00
         688571.SH           0.00
         688777.SH           0.00
         688788.SH           0.00
         688981.SH           0.00

[580720 rows x 1 columns]

The second dataframe stands for the stock pool on each day that I actually need

               code
ds                 
20200601  000001.SZ
20200601  000002.SZ
20200601  000004.SZ
20200601  000008.SZ
20200601  000009.SZ
20200601  000012.SZ
...             ...
20201030  603989.SH
20201030  605199.SH
20201030  605333.SH
20201030  605388.SH
20201030  605399.SH

[204000 rows x 1 columns]

What I suppose to do is that, on each tradeday, selecting all the stocks in the first dataframe that also lie in the second dataframe. To be more specific, the usable stock pool for every tradeday only consists of 2000 different individual stock, but the original data (i.e. dataframe 1) includes much more individual stocks. The goal is actually to find all the usable data. Take 20200601 as an example. My ideal result is:

20200601 000001.SZ           1.64
         000002.SZ           3.96
         000004.SZ           0.00
         000008.SZ           0.33
         000009.SZ           0.28

Any help would be appreciated.

question from:https://stackoverflow.com/questions/65897581/how-to-pair-two-dataframe

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
import pandas as pd

df1 = pd.DataFrame({'ds': [20200601, 20200601, 20200601, 20200601, 20200601, 
                           20200601, 20200601, 20200601],
                    'code': ['000001.SZ', '000002.SZ', '000004.SZ', '000005.SZ', 
                             '000006.SZ', '000007.SZ', '000008.SZ', '000009.SZ'],
                    'ratio': [1.64, 3.96, 0.00, 0.00, 3.68, 0.00, 0.33, 0.28]})
df1 = pd.DataFrame(df1.groupby(['ds','code'])['ratio'].apply(sum))

df2 = pd.DataFrame({'ds': [20200601, 20200601, 20200601, 20200601, 20200601],
                    'code': ['000001.SZ', '000002.SZ', '000004.SZ', '000008.SZ', 
                             '000009.SZ']}).set_index('ds')

result = df1.reset_index().merge(df2.reset_index(), how='inner', left_on=['ds','code'], right_on=['ds','code'])
result = pd.DataFrame(result.groupby(['ds','code'])['ratio'].apply(sum))
print(result)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...