Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - transform dataframe to dataframe with continuous index and columns

Let's assume we have the following data frame p:

x    1     2     5
y                 
1  0.5  0.00  0.25
3  0.0  0.25  0.00 

As you can see, the x values skip 3 and 4 and the y values skip 2, so neither the column labels nor the index labels are continuous. Since I want to plot the array with imshow, I need to extend the data frame p with the missing labels, filled with zeros, resulting in:

x    1     2  3  4     5
y                       
1  0.5  0.00  0  0  0.25
2  0.0  0.00  0  0  0.00
3  0.0  0.25  0  0  0.00

I can achieve this by writing custom functions:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def make_columns_continuous(p):
    # add a zero-filled column for every integer label missing between min and max
    for val in range(p.columns.min(),p.columns.max()):
        if val not in p.columns:
            p[val] = 0
            
    # sort the column labels so the new columns end up in order
    p = p.sort_index(axis=1)
    return p

def make_rows_continuous(p):
    # add a zero-filled row for every integer label missing between min and max
    for val in range(p.index.min(),p.index.max()):
        if val not in p.index:
            # DataFrame.append was removed in pandas 2.0; enlarge with .loc instead
            p.loc[val] = 0
            
    p = p.sort_index(axis=0)
    return p

df = pd.DataFrame({'x':[1,1,2,5],'y':[1,1,3,1]})

p = pd.crosstab(df.y,df.x,normalize=True)

#creates the data frame
#x    1     2     5
#y                 
#1  0.5  0.00  0.25
#3  0.0  0.25  0.00 


p = make_columns_continuous(p)
p = make_rows_continuous(p)

#yields:
#x    1     2  3  4     5
#y                       
#1  0.5  0.00  0  0  0.25
#2  0.0  0.00  0  0  0.00
#3  0.0  0.25  0  0  0.00

Is there a better way to achieve this transformation? Are there built-in pandas functions for it, something like a DataFrame-to-sparse-matrix conversion?

question from:https://stackoverflow.com/questions/65830215/transform-dataframe-to-dataframe-with-continuous-index-and-columns

1 Reply

Another option would be to create a new DataFrame of the final size you want, and then fill it with the data from p:

p.columns = p.columns.astype(int)
newrows = range(p.index.min(), p.index.max()+1)
newcols = range(p.columns.min(), p.columns.max()+1)

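# start from float zeros so that assigning the float values of p
# does not require changing the column dtypes
df = pd.DataFrame(index=newrows, columns=newcols, data=0.0)

#      1    2    3    4    5
# 1  0.0  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0  0.0
# 3  0.0  0.0  0.0  0.0  0.0

# the assignment aligns on row and column labels
df.loc[p.index, p.columns] = p

#      1     2    3    4     5
# 1  0.5  0.00  0.0  0.0  0.25
# 2  0.0  0.00  0.0  0.0  0.00
# 3  0.0  0.25  0.0  0.0  0.00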
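
A more compact route, sketched here under the assumption that the index and column labels are integers, is the built-in DataFrame.reindex method with fill_value. It returns a new frame with the requested labels and fills every label that is absent from p with the given value. The names full_rows, full_cols and filled are illustrative only.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({'x': [1, 1, 2, 5], 'y': [1, 1, 3, 1]})
p = pd.crosstab(df.y, df.x, normalize=True)

# Full integer ranges, inclusive of the largest label
full_rows = range(p.index.min(), p.index.max() + 1)
full_cols = range(p.columns.min(), p.columns.max() + 1)

# reindex returns a new frame; labels missing from p are filled with 0.0
filled = p.reindex(index=full_rows, columns=full_cols, fill_value=0.0)
print(filled)
```

Because reindex does not modify p, the original table stays available, and filled.to_numpy() can then be passed to imshow.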
