Construct a csv with array strings:
In [385]: arr = np.empty(1, object)
In [386]: arr[0]=np.arange(12).reshape(3,4)
In [387]: S = pd.Series(arr,name='x')
In [388]: S
Out[388]:
0 [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Name: x, dtype: object
In [389]: S.to_csv('series.csv')
/usr/local/bin/ipython3:1: FutureWarning: The signature of `Series.to_csv` was aligned to that of `DataFrame.to_csv`, and argument 'header' will change its default value from False to True: please pass an explicit value to suppress this warning.
#!/usr/bin/python3
In [390]: cat series.csv
0,"[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]"
load it:
In [391]: df = pd.read_csv('series.csv',header=None)
In [392]: df
Out[392]:
0 1
0 0 [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
In [394]: astr=df[1][0]
In [395]: astr
Out[395]: '[[ 0 1 2 3]\n [ 4 5 6 7]\n [ 8 9 10 11]]'
parse the string representation of the array:
In [396]: astr.split('\n')
Out[396]: ['[[ 0 1 2 3]', ' [ 4 5 6 7]', ' [ 8 9 10 11]]']
In [398]: astr.replace('[','').replace(']','').split('\n')
Out[398]: [' 0 1 2 3', ' 4 5 6 7', ' 8 9 10 11']
In [399]: [i.split() for i in _]
Out[399]: [['0', '1', '2', '3'], ['4', '5', '6', '7'], ['8', '9', '10', '11']]
In [400]: np.array(_, int)
Out[400]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
No guarantee that this is the prettiest or cleanest parsing, but it gives an idea of the work you have to do. I'm reinventing the wheel, but searching for a duplicate was taking too long.
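The steps above can be collected into one helper. This is only a sketch: the name parse_array_string is made up for illustration, and it assumes a 2d array of integers whose string was not truncated with '...'.

```python
def parse_array_string(text):
    """Turn the str() of a 2d integer array into a nested list of ints.

    Hypothetical helper; assumes integer values and no '...' truncation.
    """
    # Drop the brackets, as in In [398].
    cleaned = text.replace('[', '').replace(']', '')
    # One row per line, one token per whitespace-separated value, as in In [399].
    rows = [line.split() for line in cleaned.splitlines()]
    # Convert tokens to int and skip any blank lines.
    return [[int(tok) for tok in row] for row in rows if row]


astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'
rows = parse_array_string(astr)
print(rows)
```

Passing the result to np.array(rows, int) gives the same 3x4 array as In [400].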
If possible, try to avoid saving such a dataframe as csv. The csv format is meant for a clean 2d table: simple, consistent columns separated by a delimiter.
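If a csv is unavoidable, one option (my assumption, not something tested against your data) is to serialize each nested cell to JSON before writing, so that reading it back is a single json.loads call rather than hand parsing. A minimal sketch with the standard library only, using a nested list in place of the array (arr.tolist() would produce one):

```python
import csv
import io
import json

# A nested list standing in for the 3x4 array.
cell = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# Write: turn the nested value into a JSON string before it reaches the csv writer.
buf = io.StringIO()  # in-memory stand-in for a file
csv.writer(buf).writerow([0, json.dumps(cell)])

# Read: the csv reader returns the JSON string unchanged, and json.loads parses it.
buf.seek(0)
index, text = next(csv.reader(buf))
restored = json.loads(text)
print(restored == cell)
```

The csv writer quotes the cell because it contains commas, so the row still has exactly two fields.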
And for the most part avoid dataframes/series like this. A Series can have object dtype, and each object element can be complex, such as a list, dictionary, or array. But I don't think pandas has special functions to handle those cases.
numpy also has object dtypes (as in my arr), but a list is often just as good, if not better. Constructing such an array can be tricky. Math on such an array is hit or miss. Iteration on an object array is slower than iteration on a list.
===
re might work as well. For example, replacing whitespace with commas:
In [408]: re.sub(r'\s+',',',astr)
Out[408]: '[[,0,1,2,3],[,4,5,6,7],[,8,9,10,11]]'
Still not quite right. There are leading commas that will choke eval.
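One way around the leading commas is to insert a comma only where whitespace sits between two values or two rows. The pattern below is my own sketch, not a tested general solution. It assumes integer values (possibly negative) and an untruncated string, and it uses ast.literal_eval as a safer stand-in for eval.

```python
import ast
import re

astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'

# Replace whitespace with a comma only when it is preceded by a digit or ']'
# and followed by a digit, '-' or '['. Padding right after '[' is left alone,
# so no leading commas are produced.
fixed = re.sub(r'(?<=[\d\]])\s+(?=[\d\[-])', ',', astr)

# literal_eval only accepts Python literals, so it cannot run arbitrary code.
nested = ast.literal_eval(fixed)
print(nested)
```

As before, np.array(nested) turns the nested list back into a 2d array.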