python - Stratified samples from Pandas

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a pandas DataFrame which looks approximately as follows:

cli_id | X1 | X2 | X3 | ... | Xn |  Y  |
----------------------------------------
123    | 1  | A  | XX | ... | 4  | 0.1 |
456    | 2  | B  | XY | ... | 5  | 0.2 |
789    | 1  | B  | XY | ... | 5  | 0.3 |
101    | 2  | A  | XX | ... | 4  | 0.1 |
...

Each row has a client id, a few categorical attributes, and Y, the probability of an event, which takes values from 0 to 1 in steps of 0.1.

I need to take a stratified sample of size 200 from every group of Y (so 10 folds).

I often use this to take a stratified sample when splitting into train/test:

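from sklearn.model_selection import StratifiedShuffleSplit

def stratifiedSplit(X, y, size):
    # One stratified shuffle split; size is the share (or count) held out for testing.
    sss = StratifiedShuffleSplit(n_splits=1, test_size=size, random_state=0)

    # split() yields positional indices, hence .iloc below.
    for train_index, test_index in sss.split(X, y):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    return X_train, X_test, y_train, y_test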

But I don't know how to modify it in this case.


1 Reply

深蓝 · Answer 1 · 2021-10-23T19:18:10+0000

If you want the same number of samples from every group, or the same proportion of every group, you could try something like

df.groupby('Y').apply(lambda x: x.sample(n=200))

or

df.groupby('Y').apply(lambda x: x.sample(frac=.1))
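As a rough end-to-end illustration of the two one-liners above, here is a minimal sketch on synthetic data. The frame below is made up for the example (10 Y groups of 500 rows each), and it uses `DataFrameGroupBy.sample`, available in pandas 1.1 and later, as a shorthand for the `apply`/`lambda` form; treat it as one possible way to write it rather than the only one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the question's data (an assumption, not the real frame):
# 10 distinct Y values, 500 rows per value.
y_values = np.round(np.arange(1, 11) * 0.1, 1)
df = pd.DataFrame({
    "cli_id": np.arange(5000),
    "X1": rng.integers(1, 3, size=5000),
    "Y": np.repeat(y_values, 500),
})

# Fixed number of rows per Y group.
sample_n = df.groupby("Y").sample(n=200, random_state=0)

# Fixed proportion of each Y group.
sample_frac = df.groupby("Y").sample(frac=0.1, random_state=0)

print(sample_n["Y"].value_counts().sort_index())
```

Both results keep the original row index, so the sampled rows can be traced back to `df` with `df.loc[sample_n.index]`.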

To perform stratified sampling with respect to more than one variable, just group with respect to more variables. It may be necessary to construct new binned variables to this end.
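The binning idea can be sketched as follows. The data, the column name `Y_bin`, and the choice of five equal-width bins are all assumptions made for the example; `pd.cut` with `labels=False` simply returns an integer bin code per row, which is then used as a second grouping key next to a categorical column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration): one categorical column, one continuous Y.
df = pd.DataFrame({
    "X2": rng.choice(["A", "B"], size=4000),
    "Y": rng.random(4000),
})

# Bin the continuous Y into 5 equal-width strata, encoded as integers 0..4.
df["Y_bin"] = pd.cut(df["Y"], bins=np.linspace(0, 1, 6),
                     labels=False, include_lowest=True)

# Stratify on the combination of X2 and the binned Y.
sample = df.groupby(["X2", "Y_bin"]).sample(frac=0.1, random_state=0)

print(sample.groupby(["X2", "Y_bin"]).size())
```

With more grouping variables the strata get smaller quickly, so it is worth checking the group sizes before sampling.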

However, if a group is too small relative to the proportion, e.g. a group of size 1 with a proportion of .25, no item will be returned from it. This happens because the number of rows to draw (proportion times group size) is converted to an integer, and 0.25 becomes 0.
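One way to guard against small groups is to compute the per-group sample size yourself. The helper below is a hypothetical sketch, not part of pandas: `sample_group` and its parameters are names invented for this example. It caps `n` at the group size (a plain `sample(n=200)` raises an error when a group has fewer than 200 rows and `replace` is not set) and forces at least one row when a fraction would round down to zero.

```python
import pandas as pd

# Tiny made-up frame with groups of size 3, 1 and 10.
df = pd.DataFrame({
    "cli_id": range(14),
    "Y": [0.1] * 3 + [0.2] * 1 + [0.3] * 10,
})

def sample_group(g, n=None, frac=None, seed=0):
    """Hypothetical helper: sample from one group without failing on small groups."""
    if n is not None:
        k = min(len(g), n)               # never ask for more rows than exist
    else:
        k = max(1, round(frac * len(g)))  # keep at least one row per group
    return g.sample(n=k, random_state=seed)

# Iterate over the groups explicitly and stitch the pieces back together.
by_frac = pd.concat(sample_group(g, frac=0.25) for _, g in df.groupby("Y"))
by_n = pd.concat(sample_group(g, n=5) for _, g in df.groupby("Y"))

print(by_frac["Y"].value_counts().sort_index())
```

Whether forcing a minimum of one row is acceptable depends on the analysis, since it over-represents tiny strata relative to a strict proportion.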
