"Full" legend
If the hue is in numeric format, seaborn will assume that it represents a continuous quantity and will display only what it considers a representative sample along the color dimension.
You can circumvent this by using legend="full".
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':[0,1,2,3,4]})
sns.relplot(x='column2', y='column1', hue='cluster', data=df, legend="full")
plt.show()
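To check which entries the legend actually contains, the legend texts can be read back from the grid that relplot returns. This is a minimal sketch, assuming a made-up frame with ten numeric cluster ids; note that `_legend` is a private attribute of the grid and may change between seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot
import pandas as pd
import seaborn as sns

# Hypothetical data: ten numeric cluster ids, one point per cluster.
df = pd.DataFrame({
    "column1": list(range(10)),
    "column2": list(range(10, 0, -1)),
    "cluster": list(range(10)),
})

g = sns.relplot(x="column2", y="column1", hue="cluster", data=df, legend="full")

# _legend is a private attribute of the grid (assumption: it is set in your seaborn version).
labels = [t.get_text() for t in g._legend.texts]
print(labels)
```

With legend="full", every cluster id should appear among the printed labels; depending on the seaborn version, the variable name may show up as an extra entry.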
Categoricals
An alternative is to make sure the values are treated as categorical.
Unfortunately, even if you pass the numbers as strings, they will be converted back to numbers, falling back to the same mechanism described above. This may be seen as a bug.
However, one option is to use real categories, e.g. single letters. 'cluster':list("ABCDE") works fine:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
d = {'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':list("ABCDE")}
df = pd.DataFrame(data=d)
sns.relplot(x='column2', y='column1', hue='cluster', data=df)
plt.show()
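If the clusters arrive as integers, they can be mapped to letters before plotting. This is a minimal sketch, assuming at most 26 clusters numbered from 0; the `letters` lookup is a helper name introduced here for illustration.

```python
import string
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3, 4, 5],
                   'column2': [2, 4, 5, 2, 3],
                   'cluster': [0, 1, 2, 3, 4]})

# Hypothetical lookup: 0 -> "A", 1 -> "B", ... (covers ids 0..25 only).
letters = dict(enumerate(string.ascii_uppercase))

# Replace each numeric id with its letter so the hue column holds real categories.
df['cluster'] = df['cluster'].map(letters)
print(df['cluster'].tolist())
```

The resulting column can be passed as hue exactly as in the example above.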
Strings with customized palette
An alternative to the above is to use numbers converted to strings, and then make sure to use a custom palette with as many colors as there are unique hues.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
d = {'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':[1,2,3,4,5]}
df = pd.DataFrame(data=d)
df["cluster"] = df["cluster"].astype(str)
sns.relplot(x='column2', y='column1', hue='cluster', data=df,
            palette=["b", "g", "r", "indigo", "k"])
plt.show()
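Rather than typing the colors by hand, the palette can be built from the data so that its length always matches the number of unique hues. This is a minimal sketch; the choice of the "tab10" palette and the names `levels` and `colors` are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot
import pandas as pd
import seaborn as sns

d = {'column1': [1, 2, 3, 4, 5], 'column2': [2, 4, 5, 2, 3], 'cluster': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data=d)
df["cluster"] = df["cluster"].astype(str)

# One color per unique hue value; "tab10" is an arbitrary choice of base palette.
levels = sorted(df["cluster"].unique())
colors = sns.color_palette("tab10", n_colors=len(levels))

# A dict makes the level-to-color assignment explicit.
palette = dict(zip(levels, colors))

g = sns.relplot(x='column2', y='column1', hue='cluster', data=df, palette=palette)
```

Passing a dict instead of a list keeps each cluster tied to the same color even if the row order changes.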