python - For data with a `set[int]` value, what fast means exist for grouping based on having at least one common member?

Question

posted Jan 24, 2021 in Technique by 深蓝 (71.8m points)

Currently, I tackle this by iterating over each set and then over each of its members. A `memory` set records which members have already been processed (because they were seen while looking at some other set). For each member not yet in `memory`, every set containing that member is "reindexed" to the union of all of those sets.

In code:

from typing import Set

from pandas import DataFrame

df = DataFrame({"set": [frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])], 'data': [1, 2, 3]})
memory: Set[int] = set()  # members that have already been processed
membership: frozenset
for membership in df["set"]:  # "for each set"
    localMembers = membership  # running union for the current group
    for i in membership:  # "for each element if not in memory"
        if i not in memory:
            memory.add(i)
            others: frozenset
            # every set in the column that contains member i
            for others in [m for m in df["set"] if i in m]:
                superset = localMembers.union(others)
                # overwrite all rows holding either operand with the union
                for toChange in df.index[df["set"] == localMembers].tolist():
                    df.at[toChange, "set"] = superset
                for toChange in df.index[df["set"] == others].tolist():
                    df.at[toChange, "set"] = superset
                localMembers = superset

giving:

>> df
         set  data
0  (1, 2, 3)     1
1  (1, 2, 3)     2
2     (4, 5)     3

This is, of course, extremely slow, because the whole column is rescanned for every new member, so I am wondering what other approaches I could look into to speed it up. I imagine one approach would be to work out the groups ("categories") first and then do all of the assignments at the end.
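One way the "work out the groups first, assign at the end" idea might be sketched is with a disjoint-set (union-find) structure over the individual members. This is not the code from the question but a swapped-in, commonly used technique. The function name `merge_overlapping` and its helper `find` are hypothetical names chosen for illustration, and the sketch works on a plain list of frozensets rather than on the DataFrame directly.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def merge_overlapping(sets: Iterable[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Hypothetical helper: for each input set, return the union of every
    set that is linked to it, directly or transitively, by a shared member."""
    sets = list(sets)
    parent: Dict[int, int] = {}  # member -> parent member in the forest

    def find(x: int) -> int:
        # Walk up to the root, shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pass 1: link all members of each set to the root of its first member.
    for s in sets:
        if not s:
            continue  # an empty set shares nothing with anyone
        members = iter(s)
        root = find(next(members))
        for m in members:
            parent[find(m)] = root

    # Pass 2: collect the members belonging to each root.
    groups: Dict[int, Set[int]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    frozen = {root: frozenset(g) for root, g in groups.items()}

    # Pass 3: map every input set to its group's merged set, in input order.
    return [frozen[find(next(iter(s)))] if s else s for s in sets]


merged = merge_overlapping([frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])])
```

Each set is visited a fixed number of times instead of rescanning the whole column per member, so this should scale far better than the nested loops. Assuming the column layout from the question, the result could presumably be written back in one step with something like `df["set"] = pd.Series(merged, index=df.index)`.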
