I have a pandas data frame with three columns containing probabilities:
Prob0 Prob1 Prob2
0.1 0.6 0.3
0.2 0.1 0.7
I need to generate a column that contains, for each row, the value 0 with probability Prob0, the value 1 with probability Prob1 and the value 2 with probability Prob2.
Alternatively, I would be happy with a column that contains the value Prob0 with probability Prob0, the value Prob1 with probability Prob1 and the value Prob2 with probability Prob2.
I have tried the sample function, but it does not work:
population['ChoiceProba'] = population[['Prob0', 'Prob1', 'Prob2']].sample(weights=population[['Prob0', 'Prob1', 'Prob2']], axis=1)
I receive the error message:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
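For context: with axis=1, DataFrame.sample samples whole columns of the frame, and its weights argument expects one weight per column (a 1-D list, array or Series), not a DataFrame of per-row weights. The sketch below uses the two example rows and some illustrative fixed weights (an assumption, not taken from the question). Even with valid weights, the call returns a single column shared by every row, so it cannot produce an independent draw per row:

```python
import pandas as pd

population = pd.DataFrame(
    {"Prob0": [0.1, 0.2], "Prob1": [0.6, 0.1], "Prob2": [0.3, 0.7]}
)
cols = ["Prob0", "Prob1", "Prob2"]

# With axis=1, `weights` needs one value per column (here a plain list of
# illustrative weights). sample() then picks ONE column for the whole frame.
picked = population[cols].sample(n=1, weights=[0.2, 0.5, 0.3], axis=1, random_state=0)
print(picked)  # a single column, the same for every row
```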
I have also considered the numpy.random.choice function, but did not manage to combine it with a pandas statement without a loop.
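For reference, the straightforward per-row use of NumPy's sampler might look like the sketch below. It uses the numpy.random.Generator.choice method (the newer Generator-based counterpart of numpy.random.choice). The column name ChoiceLoop is hypothetical. Each row gets its own draw using that row's probabilities, but the Python-level loop is exactly the part that becomes slow at this scale:

```python
import numpy as np
import pandas as pd

population = pd.DataFrame(
    {"Prob0": [0.1, 0.2], "Prob1": [0.6, 0.1], "Prob2": [0.3, 0.7]}
)
cols = ["Prob0", "Prob1", "Prob2"]
rng = np.random.default_rng(0)

# One rng.choice call per row: draws 0, 1 or 2 with that row's probabilities.
# Correct, but it is a Python loop over every row.
population["ChoiceLoop"] = [
    rng.choice(len(cols), p=row) for row in population[cols].to_numpy()
]
print(population)
```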
I would like to avoid a loop, as I have 1,000,000 rows.
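One loop-free approach is inverse-CDF sampling. This is a sketch, not the only option. The idea is to draw one uniform random number per row and count how many of that row's cumulative probabilities it has passed. That count is the chosen column index. The helper sample_choices and the column name Choice are hypothetical. The sketch assumes every row holds non-negative probabilities with a positive sum. Each row is renormalised, so small rounding errors in the row sums are tolerated. The last line builds the alternative output from the question, the probability value of the chosen column:

```python
import numpy as np
import pandas as pd


def sample_choices(probs, rng):
    """Hypothetical helper: draw one column index per row of `probs`.

    Vectorised inverse-CDF sampling: one uniform number per row is compared
    with that row's cumulative probabilities, with no Python loop over rows.
    Assumes non-negative probabilities with a positive row sum.
    """
    p = np.asarray(probs, dtype=float)
    cum = p.cumsum(axis=1)
    cum = cum / cum[:, -1:]  # renormalise: last cumulative value is exactly 1.0
    u = rng.random(len(p))[:, None]  # shape (n_rows, 1), values in [0, 1)
    # Number of cumulative bounds <= u is the chosen index (0 .. n_cols - 1).
    return (cum <= u).sum(axis=1)


population = pd.DataFrame(
    {"Prob0": [0.1, 0.2], "Prob1": [0.6, 0.1], "Prob2": [0.3, 0.7]}
)
cols = ["Prob0", "Prob1", "Prob2"]
rng = np.random.default_rng(42)

choice = sample_choices(population[cols], rng)
population["Choice"] = choice  # 0, 1 or 2
# Alternative output: the probability value of the chosen column in each row.
population["ChoiceProba"] = population[cols].to_numpy()[np.arange(len(population)), choice]
print(population)
```

All the work is done by whole-array NumPy operations (cumsum, one call to rng.random, a broadcast comparison), so the cost grows linearly with the number of rows and there is no Python-level loop over the million rows.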
Question from:
https://stackoverflow.com/questions/65897751/pandas-sampling-columns-based-on-weights