I have a fairly large (roughly 2000x2000, though not necessarily square) dataframe that is very sparse, looking something like this:
      col1  col2  col3  col4
row1     0     0     1     0
row2     1     1     0     0
row3     0     1     0     1
row4     0     0     0     1
You can recreate it with:
import pandas as pd
df = pd.DataFrame([[0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]], columns=["col1", "col2", "col3", "col4"], index=["row1", "row2", "row3", "row4"])
In this example, row2 and row3 share a non-zero element in col2, and row4 shares a non-zero element (col4) with row3, so all three form one group (row2, row3, row4). row1 has no non-zero column in common with any other row, so it is its own group.
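The pairwise overlaps above can be checked directly with boolean masks; a minimal sketch (the helper `shared_cols` is just for illustration, not part of the question):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4"],
    index=["row1", "row2", "row3", "row4"],
)

def shared_cols(a, b):
    # Columns that are non-zero in both rows.
    mask = (df.loc[a] != 0) & (df.loc[b] != 0)
    return mask[mask].index.tolist()

print(shared_cols("row2", "row3"))  # ['col2']
print(shared_cols("row3", "row4"))  # ['col4']
print(shared_cols("row1", "row2"))  # []
```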
What I would like is a reasonably efficient way to find all of these groups of rows, where no group shares a non-zero column with any other group.
The only strategy I've come up with is to loop over all rows, find the rows each one shares a column with, and keep looping until I have tied together all combinations, but that seems really inefficient.
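For concreteness, a minimal sketch of that loop-and-merge strategy (names like `overlaps` and `groups` are my own, not from the question): start with each row in its own group, then repeatedly merge any two groups that share a non-zero column, until nothing changes.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4"],
    index=["row1", "row2", "row3", "row4"],
)

# Each row starts in its own group.
groups = [{r} for r in df.index]

def overlaps(g1, g2):
    # Two groups overlap if some column is non-zero in both.
    cols1 = (df.loc[list(g1)] != 0).any()
    cols2 = (df.loc[list(g2)] != 0).any()
    return bool((cols1 & cols2).any())

merged = True
while merged:
    merged = False
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            if overlaps(groups[i], groups[j]):
                groups[i] |= groups.pop(j)
                merged = True
                break
        if merged:
            break

print(sorted(sorted(g) for g in groups))
# [['row1'], ['row2', 'row3', 'row4']]
```

This repeatedly rescans all group pairs after every merge, which is why it feels quadratic (or worse) on 2000 rows.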
Does anyone have a better way of generating these distinct groups?
question from:
https://stackoverflow.com/questions/65847279/split-a-dataframe-into-chunks-where-each-chunk-has-no-common-non-zero-element-wi