Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

matlab - Pad cell array with whitespace and rearrange

I have a 2D cell-array (A = 2x3) containing numerical vectors of unequal length, in this form:

1x3 1x4 1x2
1x7 1x8 1x3

*Size of A (in both dimensions) can be variable

I want to pad each vector with whitespace {' '} so that they all have the same length, lens = max(max(cellfun('length',A))); (in this case every vector becomes 1x8), and then rearrange the cell array into the form below so that it can be converted to a columnar table with cell2table (shown with sample data):

4   1   2   1   3   4
8   5   8   4   7   9
10  12  11  5   []  11
[]  13  21  7   []  []
[]  15  []  11  []  []
[]  18  []  23  []  []
[]  21  []  29  []  []
[]  []  []  32  []  []

[ ] = Whitespace

i.e. columns are in the order A{1,1}, A{2,1}, A{1,2}, A{2,2}, A{1,3} and A{2,3}.

If A = 4x3, the first five columns after the rearrangement would be A{1,1}, A{2,1}, A{3,1}, A{4,1} and A{1,2}.


1 Reply

My version of MATLAB (R2013a) does not have cell2table, so, like Stewie Griffin, I'm not sure which exact format you need for the conversion.

I am also not sure that padding vectors of doubles with whitespace is a good idea. Strings and doubles do not mix conveniently, especially if, as in your case, you want each column to be of a homogeneous type (as opposed to a column in which every element is its own cell). It means you have to:

  • convert your numbers to strings first (e.g. a char array).
  • since each column will be a char array, its rows must all have the same width, so you have to find the longest string and make them all that length.
  • finally, pad each char array column with the necessary number of whitespace rows.

One way to do that requires several cellfun calls to gather all the information we need before we can actually do the padding/reshaping:

%// get the length of the longest vector
Lmax = max(cellfun( @numel , A(:) )) ;
%// get the number of digits of the largest value (assumes positive integers)
n = max(cellfun( @(x) floor(log10(max(x(:))))+1 , A(:) )) ;
%// prepare string format based on "n" (zero-padded, fixed width)
fmt = sprintf('%%0%dd',n) ;
%// convert to char and pad columns with the necessary number of whitespace rows
b = cellfun( @(c) [num2str(c(:),fmt) ; repmat(' ', Lmax-numel(c),n)], A ,'uni',0 ) ;
%// reshape to get final desired result (columns in column-major order of A)
b = b(:).' 

b = 
    [8x2 char]    [8x2 char]    [8x2 char]    [8x2 char]    [8x2 char]    [8x2 char]

Note that calling str2num on each of these columns gives back your original cell array (almost: it still needs a reshape), because str2num ignores the whitespace rows (they evaluate to empty).

>> bf = cellfun( @str2num , b,'un',0 )
bf = 
    [3x1 double]    [7x1 double]    [4x1 double]    [8x1 double]    [2x1 double]    [3x1 double]
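
The missing reshape step can be sketched as follows. This is a minimal, self-contained example built on the sample data from the question; the names A2 and same, and the hard-coded n = 2 (enough digits for this data only), are assumptions made for illustration.

```matlab
%// sample data from the question (2x3 cell array of row vectors)
A = { [4 8 10],             [2 8 11 21],           [3 7] ; ...
      [1 5 12 13 15 18 21], [1 4 5 7 11 23 29 32], [4 9 11] } ;

%// build the padded char columns (n = 2 is an assumption valid for this data)
Lmax = max(cellfun( @numel , A(:) )) ;
n    = 2 ;
fmt  = sprintf('%%0%dd',n) ;
b    = cellfun( @(c) [num2str(c(:),fmt) ; repmat(' ', Lmax-numel(c),n)], ...
                A , 'UniformOutput',false ) ;
b    = b(:).' ;

%// convert back to numbers
bf = cellfun( @str2num , b , 'UniformOutput',false ) ;

%// undo the rearrangement: column vectors -> row vectors, 1x6 -> 2x3
A2 = reshape( cellfun( @(v) v.' , bf , 'UniformOutput',false ) , size(A) ) ;

%// compare with the original cell array
same = isequal(A2, A) ;
```

The reshape works because b(:) lists the cells in column-major order, which is the same order reshape uses to fill the 2x3 result.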

If I were dealing with numbers, I would definitely prefer padding with a numeric type (it also makes the operation slightly easier). Here's an example padding with NaNs:

%// get the length of the longest vector
Lmax = max(max(cell2mat(cellfun( @numel , A  , 'un',0)))) ;
%// pad columns with necessary number of NaN
b = cellfun( @(c) [c(:) ; NaN(Lmax-numel(c),1)], A ,'un',0 ) ;
%// reshape to get final desired result
b = b(:).' 

b = 
    [8x1 double]    [8x1 double]    [8x1 double]    [8x1 double]    [8x1 double]    [8x1 double]
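
Since every column is now a numeric vector of the same length, the row of cells can be concatenated into a single matrix, which may be the simplest thing to hand to a table. The sketch below uses the question's sample data. The names M, nValid and T are illustrative, and array2table is not part of the original answer: tables only exist from R2013b onwards, so that call is guarded.

```matlab
%// sample data from the question (2x3 cell array of row vectors)
A = { [4 8 10],             [2 8 11 21],           [3 7] ; ...
      [1 5 12 13 15 18 21], [1 4 5 7 11 23 29 32], [4 9 11] } ;

%// pad every vector with NaN up to the longest length
Lmax = max(cellfun( @numel , A(:) )) ;
b = cellfun( @(c) [c(:) ; NaN(Lmax-numel(c),1)], A , 'UniformOutput',false ) ;
b = b(:).' ;

%// concatenate the six 8x1 columns into one 8x6 matrix
M = cell2mat(b) ;

%// the original lengths can be recovered by counting the non-NaN entries
nValid = sum( ~isnan(M) , 1 ) ;

%// optional: convert to a table where the function is available (assumption)
if exist('array2table','file')
    T = array2table(M) ;
end
```

Most numeric functions propagate NaN, so the padded entries have to be masked with isnan (as above) before computing sums or means.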

If you do not like operating with NaNs, you could choose a numeric value which is not among the possible values of your dataset. For example if all your values are supposed to be positive integers, -1 is a good indicator of a special value.

%// choose your NULL value indicator
nullNumber = -1 ;
%// pad columns with the necessary number of NULL values
b = cellfun( @(c) [c(:) ; repmat(nullNumber, Lmax-numel(c),1)], A ,'un',0 ) ;
b = b(:).' 

cell2mat(b)
ans =
     4     1     2     1     3     4
     8     5     8     4     7     9
    10    12    11     5    -1    11
    -1    13    21     7    -1    -1
    -1    15    -1    11    -1    -1
    -1    18    -1    23    -1    -1
    -1    21    -1    29    -1    -1
    -1    -1    -1    32    -1    -1

Note:

If -1 is a possible value in your data set and you still don't want to use NaN, a value widely used in my industry (which is totally allergic to NaN) as a null indicator for all real numbers is -999.25. Unless you have a very specific application, the probability of getting exactly this value during normal operation is so small that it is OK for most software to treat -999.25 as a null value whenever it comes across it. (Sometimes only -999 is used, when dealing with integers only.)
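
A short sketch of how such a sentinel might be detected afterwards, using the question's sample data. It relies on -999.25 being exactly representable as a double (999 + 1/4), so a plain equality test is safe for values that were assigned rather than computed. The names M, isNull and colMean are illustrative only.

```matlab
%// sample data from the question (2x3 cell array of row vectors)
A = { [4 8 10],             [2 8 11 21],           [3 7] ; ...
      [1 5 12 13 15 18 21], [1 4 5 7 11 23 29 32], [4 9 11] } ;

%// pad with the sentinel instead of NaN
nullNumber = -999.25 ;
Lmax = max(cellfun( @numel , A(:) )) ;
b = cellfun( @(c) [c(:) ; repmat(nullNumber, Lmax-numel(c),1)], ...
             A , 'UniformOutput',false ) ;
M = cell2mat( b(:).' ) ;

%// logical mask of the padded (null) entries
isNull = (M == nullNumber) ;

%// example use: per-column mean that skips the null entries
colMean = sum( M .* ~isNull , 1 ) ./ sum( ~isNull , 1 ) ;
```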

Also note the use of c(:) in the cellfun calls. This makes sure that the vector in each cell is arranged as a column regardless of its original shape (your initial vectors are actually row vectors, as shown in your example).
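
To illustrate the point about c(:), here is a small sketch with two made-up vectors (r and c are hypothetical names): the colon index gives a column for either orientation, whereas a plain transpose only helps when the input is a row.

```matlab
r = [4 8 10] ;        %// 1x3 row vector
c = [4 ; 8 ; 10] ;    %// 3x1 column vector

%// (:) always yields a column, whatever the input orientation
sameShape = isequal( size(r(:)) , size(c(:)) ) ;

%// so the padding expression works for both
p1 = [r(:) ; NaN(2,1)] ;    %// 5x1
p2 = [c(:) ; NaN(2,1)] ;    %// 5x1

%// by contrast, c.' is 1x3, so [c.' ; NaN(2,1)] would fail with a
%// dimension mismatch error
```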

