Let's assume that we have the following data frame p:
x    1     2     5
y
1  0.5  0.00  0.25
3  0.0  0.25  0.00
As you can see, the x values 3 and 4 and the y value 2 are missing, so the column and index labels are not continuous.
Since I want to plot the array via imshow, I need to extend the data frame p with the missing values, resulting in:
x    1     2  3  4     5
y
1  0.5  0.00  0  0  0.25
2  0.0  0.00  0  0  0.00
3  0.0  0.25  0  0  0.00
I can achieve this by writing custom functions:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def make_columns_continuous(p):
    # Add a zero-filled column for every integer label missing between min and max.
    for val in range(p.columns.min(), p.columns.max()):
        if val not in p.columns:
            p[val] = 0
    # Sort by column label so the new columns end up in order.
    p = p.sort_index(axis=1)
    return p

def make_rows_continuous(p):
    # Add a zero-filled row for every integer label missing between min and max.
    for val in range(p.index.min(), p.index.max()):
        if val not in p.index:
            # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
            row = pd.DataFrame(0, index=pd.Index([val], name=p.index.name), columns=p.columns)
            p = pd.concat([p, row])
    # Sort by row label so the new rows end up in order.
    p = p.sort_index(axis=0)
    return p

df = pd.DataFrame({'x': [1, 1, 2, 5], 'y': [1, 1, 3, 1]})
p = pd.crosstab(df.y, df.x, normalize=True)
# creates the data frame
# x    1     2     5
# y
# 1  0.5  0.00  0.25
# 3  0.0  0.25  0.00

p = make_columns_continuous(p)
p = make_rows_continuous(p)
# yields:
# x    1     2  3  4     5
# y
# 1  0.5  0.00  0  0  0.25
# 2  0.0  0.00  0  0  0.00
# 3  0.0  0.25  0  0  0.00
Is there a better way to achieve this transformation? Are there even built-in pandas functions? Something like DataFrame to sparse matrix?
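One built-in candidate is DataFrame.reindex, which accepts new row and column labels together with a fill_value. The following is only a minimal sketch under the assumption that both the index and the columns hold integers, as in the example above; the name p_full is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2, 5], 'y': [1, 1, 3, 1]})
p = pd.crosstab(df.y, df.x, normalize=True)

# Full integer ranges, including the largest existing label (hence the + 1).
full_rows = range(p.index.min(), p.index.max() + 1)
full_cols = range(p.columns.min(), p.columns.max() + 1)

# reindex keeps the existing cells and fills every newly created cell with 0.
p_full = p.reindex(index=full_rows, columns=full_cols, fill_value=0)
print(p_full)
```

If this behaves as expected, p_full.to_numpy() could then be passed to plt.imshow. Unlike the custom functions, reindex returns a new data frame and leaves p untouched.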
question from:
https://stackoverflow.com/questions/65830215/transform-dataframe-to-dataframe-with-continuous-index-and-columns