Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - For data with a `set[int]` value, what fast methods exist for grouping rows whose sets have at least one member in common?

Currently, I tackle this problem by iterating over each set and picking each of its members. A member that is already in a `memory` set is skipped, because it was already handled while looking at some other set. Otherwise it is added to `memory`, and every set that contains it is "reindexed" to the union of all of those sets.

In code:

```python
from typing import Set

from pandas import DataFrame

df = DataFrame({"set": [frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])], 'data': [1, 2, 3]})
memory: Set[int] = set()  # members that have already been processed
membership: frozenset
for membership in df["set"]:  # "for each set"
    localMembers = membership
    for i in membership:  # "for each element if not in memory"
        if i not in memory:
            memory.add(i)
            others: frozenset
            # every set that contains i (a full scan of the column)
            for others in [m for m in df["set"] if i in m]:
                superset = localMembers.union(others)
                # rewrite every row holding either set to their union
                for toChange in df.index[df["set"] == localMembers].tolist():
                    df.at[toChange, "set"] = superset
                for toChange in df.index[df["set"] == others].tolist():
                    df.at[toChange, "set"] = superset
                localMembers = superset
```

giving:

```
>> df
         set  data
0  (1, 2, 3)     1
1  (1, 2, 3)     2
2     (4, 5)     3
```

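This is, of course, extremely slow: every unseen member triggers a scan of the whole column plus two element-wise comparisons for each matching set. So I was wondering what other approaches I could look into to speed the process up. I imagine one option would be to compute the groups (categories) first and then do all the assignment in one go at the end.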
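A minimal sketch of that "compute the groups first, assign once" idea, assuming what is wanted is the connected components of the "shares at least one member" relation (so grouping is transitive, as in the output above). It swaps in a union-find (disjoint-set) structure over the members: every member of a set is linked to one representative, and each row's merged set is then looked up in a single pass. `merge_overlapping` is a hypothetical helper name, and it works on plain Python frozensets, so it does not depend on pandas.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def merge_overlapping(sets: Iterable[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Hypothetical helper: for each input set, return the union of every
    set connected to it (directly or transitively) through shared members."""
    sets = list(sets)
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        # Register x on first sight, then walk to its root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every member of a set to that set's first member.
    for s in sets:
        members = iter(s)
        first = next(members, None)
        if first is None:
            continue  # empty set: nothing to link
        find(first)  # registers single-member sets too
        for m in members:
            union(first, m)

    # Build the full member set of each component exactly once.
    components: Dict[int, Set[int]] = {}
    for m in parent:
        components.setdefault(find(m), set()).add(m)
    frozen = {root: frozenset(ms) for root, ms in components.items()}

    # Map each row back to its component's merged set (empty stays empty).
    return [frozen[find(next(iter(s)))] if s else frozenset() for s in sets]
```

With the `df` from the question, all the assignment then happens once at the end, e.g. `df["set"] = merge_overlapping(df["set"])`, which should give the same grouping as the output shown above. The design choice is that each member is visited a small, fixed number of times (plus the union-find lookups) instead of rescanning and comparing the whole column for every unseen member.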


...