Without using an external library (see the EDIT for a Pandas
solution), you can do it as follows:
d = {ni: indi for indi, ni in enumerate(set(names))}
numbers = [d[ni] for ni in names]
Brief explanation:
In the first line, you assign a number to each unique element in your list (stored in the dictionary d; you can easily create it using a dictionary comprehension; set returns the unique elements of names).
Then, in the second line, you use a list comprehension to look up each name in d and store the actual numbers in the list numbers.
One example to illustrate that it also works fine for unsorted lists:
# 'll' appears all over the place
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
This is one possible output for numbers (the exact values can differ between runs, because a set has no guaranteed order):
[1, 1, 3, 3, 3, 2, 2, 1, 2, 0, 0, 0, 1]
As you can see, the number associated with ll (here 1) appears in the correct places.
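Note that set does not guarantee any order, so which number each name gets can change from run to run. If you want reproducible numbers, one option is to number the names in order of first appearance. A minimal sketch, assuming Python 3.7+ (where dictionaries keep insertion order); swapping set for dict.fromkeys is my variation, not part of the two-liner above:

```python
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']

# dict.fromkeys keeps each name once, in the order it first appears
d = {ni: indi for indi, ni in enumerate(dict.fromkeys(names))}

# look up the number of every name, exactly as before
numbers = [d[ni] for ni in names]

print(numbers)  # [0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]
```

Here ll is always 0 because it is the first name in the list, hl is 1, and so on.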
EDIT
If you have Pandas available, you can also use pandas.factorize
(which seems to be quite efficient for huge lists and also works fine for lists of tuples as explained here):
import pandas as pd
pd.factorize(names)
will then return
(array([0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]),
 array(['ll', 'hl', 'LL', 'HL'], dtype=object))
Therefore,
numbers = pd.factorize(names)[0]
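The second element of the returned tuple holds the unique values, so each number can be turned back into its name by indexing into it. A small plain-Python sketch of that idea; codes and uniques are illustrative names, and the two lists are copied from the output shown above as stand-ins for the arrays:

```python
# stand-ins for the two arrays returned by pd.factorize(names)
codes = [0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]
uniques = ['ll', 'hl', 'LL', 'HL']

# each code is simply a position in uniques
decoded = [uniques[c] for c in codes]

print(decoded[:5])  # ['ll', 'll', 'hl', 'hl', 'hl']
```

This gives back the original names list.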