Without duplicates
If your objects are hashable and your lists have no duplicates, you can create an inverted index of the first list and then traverse the second list. This traverses each list only once and is thus O(n).
```python
def find_matching_index(list1, list2):
    inverse_index = {element: index for index, element in enumerate(list1)}
    return [(index, inverse_index[element])
            for index, element in enumerate(list2) if element in inverse_index]

find_matching_index([1, 2, 3], [3, 2, 1])  # [(0, 2), (1, 1), (2, 0)]
```
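Note that the dictionary comprehension keeps only the *last* index of each element in `list1`, so with duplicated elements some matches are silently dropped — a quick self-contained check of this behaviour (restating the function above):

```python
def find_matching_index(list1, list2):
    inverse_index = {element: index for index, element in enumerate(list1)}
    return [(index, inverse_index[element])
            for index, element in enumerate(list2) if element in inverse_index]

# The comprehension keeps only the last index of each element in list1,
# so index 0 of list1 never appears in the result:
print(find_matching_index([1, 1], [1, 1]))  # [(0, 1), (1, 1)]
```

This is why duplicates need the set-based variant below.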
With duplicates
You can extend the previous solution to account for duplicates by keeping track of multiple indices with a set.
```python
def find_matching_index(list1, list2):
    # Create an inverse index whose values are now sets of indices
    inverse_index = {}
    for index, element in enumerate(list1):
        if element not in inverse_index:
            inverse_index[element] = {index}
        else:
            inverse_index[element].add(index)

    # Traverse the second list
    matching_index = []
    for index, element in enumerate(list2):
        # Create one pair per index stored in the inverse index set
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index

find_matching_index([1, 1, 2], [2, 2, 1])  # [(2, 0), (2, 1), (0, 2), (1, 2)]
```
Unfortunately, this is no longer O(n). Consider the case where you input [1, 1] and [1, 1]: the output contains all four pairs (0, 0), (0, 1), (1, 0) and (1, 1). By the size of the output alone, the worst case cannot be better than O(n^2). That said, this solution is still O(n) when there are no duplicates.
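To make that lower bound concrete, here is a self-contained check (an equivalent restatement of the function above, using `setdefault`) showing that a fully duplicated input of size n already produces n * n pairs:

```python
def find_matching_index(list1, list2):
    # Equivalent to the set-based version above, written with setdefault
    inverse_index = {}
    for index, element in enumerate(list1):
        inverse_index.setdefault(element, set()).add(index)
    matching_index = []
    for index, element in enumerate(list2):
        if element in inverse_index:
            matching_index.extend((x, index) for x in inverse_index[element])
    return matching_index

n = 100
pairs = find_matching_index([1] * n, [1] * n)
print(len(pairs))  # 10000: the output alone has size n * n
```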
Non-hashable objects
Now comes the case where your objects are not hashable but comparable. The idea here is to sort your lists in a way that preserves the original index of each element. Then we can group runs of equal elements to get matching indices.

Since we make heavy use of groupby and product in the following code, I made find_matching_index return a generator for memory efficiency on long lists.
```python
from itertools import groupby, product

def find_matching_index(list1, list2):
    sorted_list1 = sorted((element, index) for index, element in enumerate(list1))
    sorted_list2 = sorted((element, index) for index, element in enumerate(list2))

    list1_groups = groupby(sorted_list1, key=lambda pair: pair[0])
    list2_groups = groupby(sorted_list2, key=lambda pair: pair[0])

    current2 = None
    for element1, group1 in list1_groups:
        try:
            # Advance list2 until its group key catches up with element1
            if current2 is None:
                current2 = next(list2_groups)
            while element1 > current2[0]:
                current2 = next(list2_groups)
        except StopIteration:
            break

        element2, group2 = current2
        if element2 > element1:
            # element1 has no match in list2; keep this group for later keys
            continue

        indices_product = product((i for _, i in group1), (i for _, i in group2))
        yield from indices_product
        # In versions prior to Python 3.3, the line above must be
        # for x in indices_product:
        #     yield x
        current2 = None  # this group is consumed, fetch a new one next time
```
```python
list1 = [[], [1, 2], []]
list2 = [[1, 2], []]

list(find_matching_index(list1, list2))  # [(0, 1), (2, 1), (1, 0)]
```
It turns out that time complexity does not suffer that much. Sorting of course takes O(n log(n)), but groupby then provides generators that recover all elements by traversing our lists only twice. The conclusion is that our complexity is primarily bound by the size of the output of product, giving a best case of O(n log(n)) and a worst case that is once again O(n^2).
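If groupby is unfamiliar, here is what it yields on one of the sorted lists — each distinct element together with the original indices it came from (a minimal illustration of the grouping step above):

```python
from itertools import groupby

list1 = [[], [1, 2], []]
sorted_list1 = sorted((element, index) for index, element in enumerate(list1))
# sorted_list1 == [([], 0), ([], 2), ([1, 2], 1)]

# Each group pairs a distinct element with the indices where it occurred
groups = [(element, [index for _, index in group])
          for element, group in groupby(sorted_list1, key=lambda pair: pair[0])]
print(groups)  # [([], [0, 2]), ([1, 2], [1])]
```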