python - filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

Question

Welcome To Ask or Share your Answers For Others

python - filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

posted Feb 19, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

I have a dataframe with > 300 unique samples, there are 2 columns of similar information per sample, and I'd like to filter for 34 specific values in one of those columns per sample. I've included a screenshot of the data to help visualize this problem. I basically want to generate a new dataframe with only the information from the 34 values that I specify. My apologies if this question is difficult to understand, I hope the screenshot helps to define the problem better.

In this screenshot, each column with "sampleID_r.variant" needs to be filtered for specific values I have in a separate dataframe. There are only 34 I'm interested in. With that, I'd like to store the corresponding value to the left in the column "sampleID_reads" along with it, like a dictionary. If anyone can help with this, I'd greatly appreciate it. Thank you so much.

EDIT: the original dataframe is in the following format:

sampleID_reads	sampleID_r.variant
1	r.79_80ins79+1_79+76
64	r.79_80ins79+10857_79+10938
53	r.79_80ins80-13725_80-13587
72	r.79_80ins80-5488_80-5435
16	r.79_80ins79+2861_79+2900

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-02-19T04:10:45+0000

Here is some sample data

d = {'sample1_reads': [1, 64, 53, 72, 16],
    'sample1_r.variant': ['r.79_80ins79+1_79+76', 'r.79_80ins79+10857_79+10938', 
                         'r.79_80ins80-13725_80-13587', 'r.79_80ins80-5488_80-5435', 'r.79_80ins79+2861_79+2900'], 
    'sample2_reads': [0, 3, 6, 9, 11], 
    'sample2_r.variant': ['r.5333_5406del', 'r.4186_4188del', 'r.5333_54106del', 'r.2345_2345fad', 'r.65456_w56sjfy']}
df = pd.DataFrame(d)
rdot = pd.DataFrame(['r.79_80ins79+1_79+76', 'r.646_729del', 'r.5333_5406del', 'r.79_80ins80-5488_80-5435', 'r.79_80ins79+2861_79+2900'], columns=['r_dot'])

If you just want to filter for first frame based on the second frame then you can do the following

# reshape your current data frame 
new_df = pd.DataFrame(df.values.reshape((-1,2)), columns=['reads', 'variant'])
# use boolean indexing to filter your new data frame
df_f = new_df[new_df['variant'].isin(rdot['r_dot'])]

  reads                    variant
0     1       r.79_80ins79+1_79+76
1     0             r.5333_5406del
6    72  r.79_80ins80-5488_80-5435
8    16  r.79_80ins79+2861_79+2900

Categories

python - filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

python - filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags