I am looking for methods, built-in functions, or good practices for appending new data to a matrix when the rows and columns of the two data sets do not match.
The data I deal with is structured as follows:
A.values: Ta x Ma matrix of values
A.dates: Ta x 1 vector of datenum
A.id: 1 x Ma cell array of ids
Now the challenge is how to deal with new (potentially overlapping) data B that I load in and would like to combine with A into a new struct C.
When new data comes in, the matrix can expand both vertically (new dates) and horizontally (new ids). B can have dates that fall before min(A.dates), after max(A.dates), or between min(A.dates) and max(A.dates). The ids in B can all be new, or some can overlap with the ids in A.
Here is an example:
A.values = [2.1 2.4 2.5 2.6; ...
4.1 4.4 4.5 4.6; ...
6.1 6.4 6.5 6.6];
A.dates = [730002; ...
730004; ...
730006];
A.id = {'x1', 'x4', 'x5', 'x6'};
Now new data comes in:
B.values = [1.2 1.9 1.5 1.6 1.7; ...
3.2 3.9 3.5 3.6 3.7; ...
7.2 7.9 7.5 7.6 7.7; ...
8.2 8.9 8.5 8.6 8.7];
B.dates = [730001; ...
730003; ...
730007; ...
730008];
B.id = {'x2', 'x9', 'x5', 'x6', 'x7'};
How do we now efficiently and quickly construct the new struct C?
C.values = [NaN 1.2 NaN 1.5 1.6 1.7 1.9; ...
2.1 NaN 2.4 2.5 2.6 NaN NaN; ...
NaN 3.2 NaN 3.5 3.6 3.7 3.9; ...
4.1 NaN 4.4 4.5 4.6 NaN NaN; ...
6.1 NaN 6.4 6.5 6.6 NaN NaN; ...
NaN 7.2 NaN 7.5 7.6 7.7 7.9; ...
NaN 8.2 NaN 8.5 8.6 8.7 8.9];
C.dates = [730001; ...
730002; ...
730003; ...
730004; ...
730006; ...
730007; ...
730008];
C.id = {'x1', 'x2', 'x4', 'x5', 'x6', 'x7', 'x9'};
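One possible way to build C without timetables is to take the sorted union of the dates and ids, preallocate a NaN matrix, and place each input by index. The following is only a minimal sketch: the function name mergeByUnion is hypothetical, it assumes dates and ids are unique within each struct, and it assumes that B should overwrite A wherever both supply the same (date, id) cell. Note that union sorts ids lexicographically, so 'x10' would come before 'x2'.

```matlab
function C = mergeByUnion(A, B)
% mergeByUnion  Hypothetical helper (not a built-in): merge two structs that
% each have the fields values (T x M), dates (T x 1) and id (1 x M cell).
% Assumption: where both inputs cover the same (date, id) cell, B wins,
% even if the value in B is NaN.

% Sorted, duplicate-free row keys (dates) and column keys (ids)
C.dates = union(A.dates(:), B.dates(:));          % column vector, ascending
C.id    = reshape(union(A.id, B.id), 1, []);      % row cell array, sorted

% Start from all-NaN so that cells covered by neither input stay NaN
C.values = NaN(numel(C.dates), numel(C.id));

% Locate A's rows and columns inside C and copy the block over
[~, rowA] = ismember(A.dates, C.dates);
[~, colA] = ismember(A.id, C.id);
C.values(rowA, colA) = A.values;

% Same for B; written second, so it overwrites A on overlapping cells
[~, rowB] = ismember(B.dates, C.dates);
[~, colB] = ismember(B.id, C.id);
C.values(rowB, colB) = B.values;
end
```

Calling C = mergeByUnion(A, B) on the example structs above should give the C.dates, C.id and C.values listed in the expected result.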
Update with timetable
Following the comments, I tried to achieve this with timetable as follows:
function dfmerged = in_mergeCache(dfA, dfB)
    % datenum2datetime can be found here: https://stackoverflow.com/a/46685634/4262057
    dtA = datenum2datetime(dfA.dates);
    dtB = datenum2datetime(dfB.dates);
    % Build one timetable per input, using the ids as variable names
    TTa = array2timetable(dfA.values, 'RowTimes', dtA, 'VariableNames', dfA.id);
    TTb = array2timetable(dfB.values, 'RowTimes', dtB, 'VariableNames', dfB.id);
    % Align both timetables on the union of their row times
    TTs = synchronize(TTa, TTb);
    dfmerged.id = TTs.Properties.VariableNames;
    dfmerged.values = table2array(TTs);
    dfmerged.dates = datenum(TTs.Time); % convert back to datenum
end
Problem: this gave me a big timetable in which the rows were indeed synchronized, but the overlapping columns were duplicated rather than merged (9 columns instead of the expected 7). How can I also synchronize the columns?
C =
struct with fields:
id: {'x1' 'x4' 'x5_TTa' 'x6_TTa' 'x2' 'x9' 'x5_TTb' 'x6_TTb' 'x7'}
values: [7×9 double]
dates: [7×1 double]