I want to build a visualization that shows how often one state changes to another, where the states are represented as numbers.
My data is in a DataFrame called df_sankey, with the columns State_A, State_B and Freq.
I was thinking of a Sankey diagram, following the example in the Plotly documentation. I want one column of nodes for the states A, labeled I1, I2, ..., I20, and another column for the states B, labeled F1, F2, ..., F20. The frequency of each pair of values would then be drawn as a weighted link between the two columns.
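To make the intended layout concrete, here is a minimal sketch of how the labels and the link indices line up. The names `N_STATES`, `source_index` and `target_index` are only for illustration and are not part of Plotly; the sketch assumes states are numbered from 1 to 20.

```python
N_STATES = 20  # assumed number of states on each side

# One flat label list: I1..I20 first, then F1..F20.
labels = [f"I{k}" for k in range(1, N_STATES + 1)] + \
         [f"F{k}" for k in range(1, N_STATES + 1)]


def source_index(state_a):
    """Position of the label 'I<state_a>' in the flat label list."""
    return state_a - 1


def target_index(state_b):
    """Position of the label 'F<state_b>' in the flat label list."""
    return state_b - 1 + N_STATES


# A transition from state 3 to state 7 should link "I3" to "F7".
assert labels[source_index(3)] == "I3"
assert labels[target_index(7)] == "F7"
```

Links in a Sankey trace refer to nodes by their position in the label list, which is why the target states are offset by the number of source states.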
However, I can't get the nodes in each column sorted by state number, which is what I want to achieve.
This is what I have tried:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

# Create labels: I1..I20 for the source column, F1..F20 for the target column
source = pd.Series(np.arange(1, 21), name='source').apply(lambda x: 'I' + str(x))
target = pd.Series(np.arange(1, 21), name='target').apply(lambda x: 'F' + str(x))
labels = pd.concat([source, target], axis=0).reset_index(drop=True)

# X position of each node: 0.1 for the source column, 1.0 for the target column
x_node = np.concatenate((np.full(len(source), 0.1), np.ones(len(target))), axis=None)

# Y position of each node: evenly spaced from 0 to 100, repeated for both columns
y_node = np.tile(np.linspace(0, 100, len(source)), 2)

# Collect labels and positions in one DataFrame
df_nodes = pd.DataFrame(data={'label': labels, 'X': x_node, 'Y': y_node})

# Plot
fig = go.Figure(data=[go.Sankey(
    arrangement='snap',
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=df_nodes['label'],
        color="blue",
        x=df_nodes['X'],
        y=df_nodes['Y']
    ),
    link=dict(
        # Link endpoints are indices into the label list:
        # I1..I20 are 0..19 and F1..F20 are 20..39
        source=df_sankey['State_A'] - 1,
        target=df_sankey['State_B'] + 20 - 1,
        value=df_sankey['Freq']
    ))])

fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()
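One thing that may be worth checking: as far as I can tell from the Plotly reference, node.x and node.y are normalized positions, so y values between 0 and 100 would fall outside the expected range. Below is a minimal sketch of positions kept strictly inside (0, 1), combined with arrangement='fixed' so that the nodes stay where they are placed. The helper names `node_positions` and `build_figure` and the margin values are assumptions for illustration, and I have not confirmed that this alone produces the sorted layout. I have also seen reports that values of exactly 0 or 1 are not handled well, which is why the sketch avoids them; treat that as an assumption too.

```python
def node_positions(n_states, x_left=0.1, x_right=0.9, y_min=0.05, y_max=0.95):
    """Return (x, y) lists for n_states source nodes followed by n_states target nodes.

    All values stay strictly between 0 and 1 (the margins are arbitrary choices).
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if n_states == 1:
        ys = [(y_min + y_max) / 2]
    else:
        step = (y_max - y_min) / (n_states - 1)
        ys = [y_min + i * step for i in range(n_states)]
    x = [x_left] * n_states + [x_right] * n_states
    y = ys + ys  # same vertical order in both columns
    return x, y


def build_figure(df_sankey, n_states=20):
    """Build the Sankey figure from a DataFrame with State_A, State_B and Freq columns."""
    # Imported here so node_positions can be used without Plotly installed.
    import plotly.graph_objects as go

    labels = [f"I{k}" for k in range(1, n_states + 1)] + \
             [f"F{k}" for k in range(1, n_states + 1)]
    x, y = node_positions(n_states)
    return go.Figure(go.Sankey(
        arrangement="fixed",  # nodes are stationary at the given x/y
        node=dict(pad=15, thickness=20,
                  line=dict(color="black", width=0.5),
                  label=labels, color="blue", x=x, y=y),
        link=dict(source=df_sankey["State_A"] - 1,
                  target=df_sankey["State_B"] + n_states - 1,
                  value=df_sankey["Freq"]),
    ))
```

With the DataFrame from above, `build_figure(df_sankey).show()` would render the diagram.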
Any ideas?
question from:
https://stackoverflow.com/questions/65905327/how-to-sort-nodes-in-a-sankey-diagram-plotly