EDIT: This question is about why the behavior is what it is, not how to get around it, which is what the alleged duplicate is about.
I've used the following notation to create lists of a certain size in different cases. For example:
>>> [None] * 5
[None, None, None, None, None]
>>>
This appears to work as expected and is shorter than:
>>> [None for _ in range(5)]
[None, None, None, None, None]
>>>
I then tried to create a list of lists using the same approach:
>>> [[]] * 5
[[], [], [], [], []]
>>>
Fair enough. It seems to work as expected.
However, while stepping through the debugger, I noticed that all the sub-lists had the same value, even though I had appended an item to only one of them. For example:
>>> t = [[]] * 5
>>> t
[[], [], [], [], []]
>>> t[1].append(4)
>>> t
[[4], [4], [4], [4], [4]]
>>> t[0] is t[1]
True
>>>
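To make the aliasing explicit, `[[]] * 5` can be compared with a plain loop that evaluates `[]` once and appends that same object five times. This is a minimal sketch using only built-in list behavior; the names `inner`, `repeated`, and `star` are illustrative.

```python
# Evaluate [] once, then store a reference to that single object five times.
inner = []
repeated = []
for _ in range(5):
    repeated.append(inner)  # the same object is appended on every iteration

# Sequence repetition: the single inner list is referenced five times, not copied.
star = [[]] * 5

# In both lists, every slot refers to one and the same object.
print(all(x is repeated[0] for x in repeated))  # True
print(all(x is star[0] for x in star))          # True
print(len({id(x) for x in star}))               # 1
```

The same sharing also happens with `[None] * 5`; it goes unnoticed there because `None` cannot be mutated, so no change made through one slot can show up in another.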
I was not expecting all top-level list elements to be references to a single sub-list; I expected 5 independent sub-lists.
For that, I had to write code like so:
>>> t = [[] for _ in range(5)]
>>> t
[[], [], [], [], []]
>>> t[2].append(4)
>>> t
[[], [], [4], [], []]
>>> t[0] is t[1]
False
>>>
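The difference is that the comprehension re-evaluates the expression `[]` on every iteration, so each slot receives a newly created list. A minimal sketch comparing the comprehension with the equivalent explicit loop (the names `fresh` and `fresh_loop` are illustrative):

```python
# The comprehension evaluates [] once per iteration: five distinct lists.
fresh = [[] for _ in range(5)]
print(len({id(x) for x in fresh}))  # 5

# Equivalent explicit loop: a new [] is created inside the loop body each time.
fresh_loop = []
for _ in range(5):
    fresh_loop.append([])
print(len({id(x) for x in fresh_loop}))  # 5

# Mutating one inner list leaves the others untouched.
fresh[2].append(4)
print(fresh)  # [[], [], [4], [], []]
```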
I'm clearly missing something, probably a historical fact or simply a different way in which the consistency here is viewed.
Can someone explain why two snippets that one would reasonably expect to be equivalent actually produce different and (IMO) non-obvious results, especially given the Zen of Python's emphasis on being explicit and obvious?
Please note that I'm already aware of this question, which is different to what I'm asking.
I'm simply looking for a detailed explanation/justification. If there are historical, technical, and/or theoretical reasons for this behavior, then please be sure to include a reference or two.