Currently, I tackle this problem by iterating over each set and, for each member, checking it against a memory: set variable (to see whether that number has already been handled while looking at some other set). If the member is new, I add it to memory and then "reindex" every set that contains it to be the union of all of those sets.
In code:
from typing import Set
from pandas import DataFrame

df = DataFrame({"set": [frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])], 'data': [1, 2, 3]})
memory: Set[int] = set()
membership: frozenset
for membership in df["set"]:  # "for each set"
    localMembers = membership
    for i in membership:  # "for each element if not in memory"
        if i not in memory:
            memory.add(i)
            others: frozenset
            for others in [m for m in df["set"] if i in m]:
                superset = localMembers.union(others)
                for toChange in df.index[df["set"] == localMembers].tolist():
                    df.at[toChange, "set"] = superset
                for toChange in df.index[df["set"] == others].tolist():
                    df.at[toChange, "set"] = superset
                localMembers = superset
giving:
>> df
         set  data
0  (1, 2, 3)     1
1  (1, 2, 3)     2
2     (4, 5)     3
This is, of course, extremely slow, so I was wondering what other approaches I could look into to speed the process up. I imagine one approach would be to compute the categories first and then do all of the assignments at the end.
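To make the "get the categories first, then set at the end" idea concrete, here is a minimal sketch that assumes a union-find (disjoint-set) structure is an acceptable way to compute the categories. The names merge_overlapping and find are hypothetical helpers introduced only for illustration, and the sketch works on a plain list of frozensets rather than on the DataFrame itself.

```python
from typing import Dict, FrozenSet, List, Set


def merge_overlapping(sets: List[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """For each input set, return the union of all sets that share a member with it,
    directly or through a chain of other sets. (Hypothetical helper, illustration only.)"""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        # Follow parent links up to the root, shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pass 1: link every member of a set to the root of that set's first member.
    for s in sets:
        members = iter(s)
        first = next(members, None)
        if first is None:
            continue  # an empty set shares nothing with any other set
        root = find(first)
        for m in members:
            parent[find(m)] = root

    # Pass 2: collect the members of each category, keyed by root.
    groups: Dict[int, Set[int]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    frozen = {root: frozenset(g) for root, g in groups.items()}

    # Pass 3: look up the merged category for each original set.
    return [frozen[find(next(iter(s)))] if s else s for s in sets]


example = [frozenset([1, 3]), frozenset([2, 3]), frozenset([5, 4])]
merged = merge_overlapping(example)
```

Under these assumptions, the result could then be written back in a single step, for example with df["set"] = merge_overlapping(df["set"].tolist()), so the DataFrame is only touched once instead of inside the loops.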