I'm having difficulty constructing a 3D DataFrame in Pandas. I want something like this:
     A               B               C
start    end    start    end    start    end  ...
    7     20       42     52       90    101
   11     21                      213     34
   56     74                        9     45
   45     12
Where A, B, etc. are the top-level descriptors and start and end are subdescriptors. The numbers that follow are in pairs, and there aren't the same number of pairs for A, B, etc. Observe that A has four such pairs, B has only 1, and C has 3.
I'm not sure how to proceed in constructing this DataFrame. Modifying this example didn't give me the desired output:
import numpy as np
import pandas as pd
A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end']*3)
C = [np.random.randint(10, 99, 6)]*6
df = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
df.set_index(['A', 'B'], inplace=True)
df
yielded:
                                    C
A     B
one   start  [22, 19, 16, 20, 63, 54]
      end    [22, 19, 16, 20, 63, 54]
two   start  [22, 19, 16, 20, 63, 54]
      end    [22, 19, 16, 20, 63, 54]
three start  [22, 19, 16, 20, 63, 54]
      end    [22, 19, 16, 20, 63, 54]
Is there any way of breaking up the lists in C into their own columns?
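For reference, here is a minimal sketch of one way the lists could be split into columns. It assumes every list in C has the same length, as in the example above. The fixed values (copied from the printed output instead of random numbers) and the name `wide` are only illustrative.

```python
import numpy as np
import pandas as pd

A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end'] * 3)
# Assumption: plain lists with the values from the printed output,
# used instead of random numbers so the result is reproducible.
C = [[22, 19, 16, 20, 63, 54]] * 6

df = pd.DataFrame(list(zip(A, B, C)), columns=['A', 'B', 'C'])
df = df.set_index(['A', 'B'])

# Turn each list in column C into one row of scalar columns (0..5),
# keeping the original (A, B) MultiIndex on the rows.
wide = pd.DataFrame(df['C'].tolist(), index=df.index)
print(wide)
```

This only works cleanly when the lists are all the same length, which is not the case for the real data described in the EDIT.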
EDIT: The structure of my C is important. It looks like the following:
C = [[7,11,56,45], [20,21,74,12], [42], [52], [90,213,9], [101, 34, 45]]
And the desired output is the one at the top. It represents the starting and ending points of subsequences within a certain sequence (A, B, C are the different sequences). Depending on the sequence itself, there are a differing number of subsequences that satisfy a given condition I'm looking for. As a result, there are a differing number of start:end pairs for A, B, etc.
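To make the target layout concrete, here is a rough sketch of one way it might be built from the C given in the EDIT. It assumes the lists alternate start, end for A, B, C in that order. The names `columns` and `result` are made up for illustration. Shorter columns are padded with NaN, so those columns become floats.

```python
import pandas as pd

# Data from the EDIT: lists alternate start, end for sequences A, B, C.
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# Two-level column labels: (A, start), (A, end), (B, start), ...
columns = pd.MultiIndex.from_product([['A', 'B', 'C'], ['start', 'end']])

# Wrapping each list in a Series lets concat align on row position
# and fill the missing rows of the shorter lists with NaN.
result = pd.concat([pd.Series(values) for values in C], axis=1)
result.columns = columns
print(result)
```

A single sequence can then be selected with `result['A']`, and the padding can be dropped per column with `dropna()`. If integer values are needed despite the gaps, passing `dtype='Int64'` to each `pd.Series` is one option to consider.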