How are lists and other sequences compared in Python?
Lists (and other sequences) in Python are compared lexicographically and not based on any other parameter.
Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.
What is lexicographic sorting?
From the Wikipedia page on lexicographic sorting
lexicographic or lexicographical order (also known as lexical order, dictionary order, alphabetical order or lexicographic(al) product) is a generalization of the way the alphabetical order of words is based on the alphabetical order of their component letters.
The min
function returns the smallest value in the iterable. So the lexicographic value of [1,2]
is the least in that list. You can check by using [1,2,21]
>>> my_list=[[1,2,21],[1,3],[1,2]]
>>> min(my_list)
[1, 2]
What is happening in this case of min
?
Going element wise on my_list
, firstly [1,2,21]
and [1,3]
. Now from the docs
If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively.
Thus the value of [1,1,21]
is less than [1,3]
, because the second element of [1,3]
, which is, 3
is lexicographically higher than the value of the second element of [1,1,21]
, which is, 1
.
Now comparing [1,2]
and [1,2,21]
, and adding another reference from the docs
If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one.
[1,2]
is an initial sub-sequence of [1,2,21]
. Therefore the value of [1,2]
on the whole is smaller than that of [1,2,21]
. Hence [1,2]
is returned as the output.
This can be validated by using the sorted
function
>>> sorted(my_list)
[[1, 2], [1, 2, 21], [1, 3]]
What if the list has multiple minimum elements?
If the list contains duplicate min elements the first is returned
>>> my_list=[[1,2],[1,2]]
>>> min(my_list)
[1, 2]
This can be confirmed using the id
function call
>>> my_list=[[1,2],[1,2]]
>>> [id(i) for i in my_list]
[140297364849368, 140297364850160]
>>> id(min(my_list))
140297364849368
What do I need to do to prevent lexicographic comparison in min
?
If the required comparison is not lexicographic then the key
argument can be used (as mentioned by Padraic)
The min
function has an additional optional argument called key
. The key
argument takes a function.
The optional key argument specifies a one-argument ordering function
like that used for list.sort()
. The key argument, if supplied, must be
in keyword form (for example, min(a,b,c,key=func)
).
For example, if we need the smallest element by length, we need to use the len
function.
>>> my_list=[[1,2,21],[1,3],[1,2]]
>>> min(my_list,key=len) # Notice the key argument
[1, 3]
As we can see the first shortest element is returned here.
What if the list is heterogeneous?
Until Python2
If the list is heterogeneous type names are considered for ordering, check Comparisions,
Objects of different types except numbers are ordered by their type names
Hence if you put an int
and a list
there you will get the integer value as the smallest as i
is of lower value than l
. Similarly '1'
would be of higher value than both of this.
>>> my_list=[[1,1,21],1,'1']
>>> min(my_list)
1
Python3 and onwards
However this confusing technique was removed in Python3. It now raises a TypeError
. Read What's new in Python 3.0
The ordering comparison operators (<
, <=
, >=
, >
) raise a TypeError
exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < ''
, 0 > None
or len <= len
are no longer valid, and e.g. None < None
raises TypeError
instead of returning False
. A corollary is that sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other.
>>> my_list=[[1,1,21],1,'1']
>>> min(my_list)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < list()
But it works for Comparable types, For example
>>> my_list=[1,2.0]
>>> min(my_list)
1
Here we can see that the list
contains float
values and int
values. But as float
and int
are comparable types, min
function works in this case.