How to join tables in Matlab (2018) by matching time intervals? - matlab

I have two tables A and B. I want to join them based on their validity time intervals.
A has product quality (irregular times) and B has hourly settings during the production period. I need to create a table like C that includes the parameters p1 and p2 for all A's RefDates that fall in the time range of B's ValidFrom ValidTo.
A
RefDate result
'11-Oct-2017 00:14:00' 17
'11-Oct-2017 00:14:00' 19
'11-Oct-2017 00:20:00' 5
'11-Oct-2017 01:30:00' 25
'11-Oct-2017 01:30:00' 18
'11-Oct-2017 03:03:00' 28
B
ValidFrom ValidTo p1 p2
'11-Oct-2017 00:13:00' '11-Oct-2017 01:12:59' 2 1
'11-Oct-2017 01:13:00' '11-Oct-2017 02:12:59' 3 1
'11-Oct-2017 02:13:00' '11-Oct-2017 03:12:59' 4 5
'11-Oct-2017 03:13:00' '11-Oct-2017 04:12:59' 6 1
'11-Oct-2017 04:13:00' '11-Oct-2017 05:12:59' 7 9
I need to get something like this.
C
RefDate res p1 p2
'11-Oct-2017 00:14:00' 17 2 1
'11-Oct-2017 00:14:00' 19 2 1
'11-Oct-2017 00:20:00' 5 2 1
'11-Oct-2017 01:30:00' 25 3 1
'11-Oct-2017 01:30:00' 18 3 1
'11-Oct-2017 03:03:00' 28 4 5
I know how to do this in SQL and I think I have figured out how to do this row by row in MatLab but this is horribly slow. The data set is rather large. I just assume there must be a more elegant way that I just couldn't find.
Something that caused many of my approaches to fail is that the RefDate column is not unique.
edit:
the real tables have thousands of rows and hundreds of variables.
C (in reality)
RefDate res res2 ... res200 p1 p2 ... p1000
11-Oct-2017 00:14:00 17 2 1
11-Oct-2017 00:14:00 19 2 1
11-Oct-2017 00:20:00 5 2 1
11-Oct-2017 01:30:00 25 3 1
11-Oct-2017 01:30:00 18 3 1
11-Oct-2017 03:03:00 28 4 5

This can actually be done in a single line of code. Assuming your ValidTo value always ends immediately before the ValidFrom in the next row (which it does in your example), you only need to use your ValidFrom values. First, convert those and your RefDate values to serial date numbers using datenum. Then use the discretize function to bin the RefDate values using the ValidFrom values as the edges, which will give you the row index in B that contains each time in A. Then use that index to extract the p1 and p2 values and append them to A:
>> C = [A B(discretize(datenum(A.RefDate), datenum(B.ValidFrom)), 3:end)]
C =
RefDate result p1 p2
______________________ ______ __ __
'11-Oct-2017 00:14:00' 17 2 1
'11-Oct-2017 00:14:00' 19 2 1
'11-Oct-2017 00:20:00' 5 2 1
'11-Oct-2017 01:30:00' 25 3 1
'11-Oct-2017 01:30:00' 18 3 1
'11-Oct-2017 03:03:00' 28 4 5
The above solution should work for any number of columns pN in B.
If there are any times in A that don't fall in any of the ranges in B, you will have to break the solution into multiple lines so you can check whether or not the index returned from discretize contains NaN values. Assuming you want to exclude those rows from C, this would be the new solution:
index = discretize(datenum(A.RefDate), datenum(B.ValidFrom));
C = [A(~isnan(index), :) B(index(~isnan(index)), 3:end)];

The following code does exactly what you are asking for:
% convert to datetime
A.RefDate = datetime(A.RefDate);
B.ValidFrom = datetime(B.ValidFrom);
B.ValidTo = datetime(B.ValidTo);
% for each row in A, find the matching row in B
i = cellfun(#find, arrayfun(#(x) (x >= B.ValidFrom) & (x <= B.ValidTo), A.RefDate, 'UniformOutput', false), 'UniformOutput', false);
% find rows in A that where not matched
j = cellfun(#isempty, i, 'UniformOutput', false);
% build the result
C = [B(cell2mat(i),:) A(~cell2mat(j),:)];
% display output
C

Related

Summing specific columns for each row in a matrix of double

I would like to sum specific columns of each row in a matrix using a for loop. Below I have included a simplified version of my problem. As of right now, I am calculating the column sums individually, but this is not effective as my actual problem has multiple matrices (data sets).
a = [1 2 3 4 5 6; 4 5 6 7 8 9];
b = [2 2 3 4 4 6; 3 3 3 4 5 5];
% Repeat the 3 lines of code below for row 2 of matrix a
% Repeat the entire process for matrix b
c = sum(a(1,1:3)); % Sum columns 1:3 of row 1
d = sum(a(1,4:6)); % Sum columns 4:6 of row 1
e = sum(a(1,:)); % Sum all columns of row 1
I would like to know how to create a for loop that automatically loops through and sums the specific columns of each row for each matrix that I have.
Thank you.
Here is a solution that you don't need to use for loop.
Assuming that you have a matrix a of size 2x12, and you want to do the row sums every 4 columns, then you can use reshape() and squeeze() to get the final result:
k = 4;
a = [1:12
13:24];
% a =
% 1 2 3 4 5 6 7 8 9 10 11 12
% 13 14 15 16 17 18 19 20 21 22 23 24
s = squeeze(sum(reshape(a,size(a,1),k,[]),2));
and you will get
s =
10 26 42
58 74 90

How to create a matrix B from a matrix A using conditions in MATLAB

If I have this matrix:
A:
X Y Z
1 1 2
0 3 4
0 5 6
2 7 8
7 9 10
8 11 12
3 13 14
12 14 16
15 17 18
How could I create new matrix B, C, D and E which contains:
B:
0 3 4
0 5 6
C:
X Y Z
1 1 2
2 7 8
3 13 14
D:
7 9 10
8 11 12
E:
12 14 16
15 17 18
The idea is to construct a loop asking if 0<A<1 else 1<A<5 else 6<A<10 else 11<A<15. and create new matrix from that condition. Any idea about how to store the results of the loop?
I suggest you an approach that uses the discretize function in order to group the matrix rows into different categories based on their range. Here is the full implementation:
A = [
1 1 2;
0 3 4;
0 5 6;
2 7 8;
7 9 10;
8 11 12;
3 13 14;
12 14 16;
15 17 18
];
A_range = [0 1 5 10 15];
bin_idx = discretize(A(:,1),A_range);
A_split = arrayfun(#(bin) A(bin_idx == bin,:),1:(numel(A_range) - 1),'UniformOutput',false);
celldisp(A_split);
Since you want to consider 5 different ranges based on the first column values, the arguments passed to discretize must be the first matrix column and a vector containing the group limits (first number inclusive left, second number exclusive right, second number inclusive left, third number exclusive right, and so on...). Since your ranges are a little bit messed up, feel free to adjust them to respect the correct output. The latter is returned in the form of a cell array of double matrices in which every element contains the rows belonging to a distinct group:
A_split{1} =
0 3 4
0 5 6
A_split{2} =
1 1 2
2 7 8
3 13 14
A_split{3} =
7 9 10
8 11 12
A_split{4} =
12 14 16
15 17 18
Instead of using a loop, use logical indexing to achieve what you want. Use the first column of A and check for the ranges that you want to look for, then use this to subset into the final matrix A to get what you want.
For example, to create the matrix C, find all locations in the first column of A that are between 1 and 5, then subset the matrix along the rows using these locations:
m = A(:,1) >= 1 & A(:,1) <= 5;
C = A(m,:);
You can repeat this in a similar way for the rest of the matrices you want to create.

calculate similarity between two vectors

I have a matrix M which is a 29 x 18 double, something like this:
1 1 1 ...
2 1 1 ...
3 1 2 ...
2 2 2 ...
2 1 3 ...
3 1 3 ...
1 3 3 ...
...
For each possible pair of two columns in M, I want to calculate the number of times the values of the same row between two columns are identical. Take column 1 and 2 for instance, the number of times values of the same row are identical is 2 since M(1,1) = M(1,2) and M(4,1) = M(4,2). This calculation is repeated 18 time for each column as each column is paired with each of the total number of columns, including itself. Thus, the output (called N) would be 18 x 18 matrix with the each value indicating how many instances the values of the same row from the original two corresponding columns are identical. Something like this
29 4 5 3 ...
4 29 6 0 ...
5 6 29 7 ...
...
Since N(2,1) = 4, this would indicate that column 1 and column 2 matrix M have 4 matching values of the same row.
How do I do this?
you can do a double for loop like that:
result = zeros(18);
for i = 1:18
for j = 1:18
result(i,j) = nnz(M(:,i) == M(:,j));
end
end

Sorting data in MATLAB dependant on one column

How do I sort a column based on the values in another column in MATLAB?
Column A shows position data (it is neither ascending or descending in order) Column B contains another column of position data. Finally column C contains numerical values. Is it possible to link the first position value in B with its numerical value in the first cell of C? Then after this I want to sort B such that it is in the same order as column A with the C values following their B counterparts?The length of my columns would be 1558 values.
Before case;
A B C
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
After Case;
A B C
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
Basically A and B became the same and Column C followed B.
Since you don't want things necessarily in ascending or descending order, I don't think any built-in sorting functions like sortrows() will help here. Instead you are matching elements in one column with elements in another column.
Using [~,idx]=ismember(A,B) will tell you where each element of B is in A. You can use that to sort the desired columns.
M=[1 4 10
4 1 20
3 5 30
5 2 40
2 3 50];
A=M(:,1); B=M(:,2); C=M(:,3);
[~,idx]=ismember(A,B);
sorted_matrix = [A B(idx) C(idx)]
Powerful combo of bsxfun and matrix-multiplication solves it and good for code-golfing too! Here's the implementation, assuming M as the input matrix -
[M(:,1) bsxfun(#eq,M(:,1),M(:,2).')*M(:,2:3)]
Sample run -
>> M
M =
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
>> [M(:,1) bsxfun(#eq,M(:,1),M(:,2).')*M(:,2:3)]
ans =
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
Given M = [A B C]:
M =
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
You need to sort the rows of the matrix excluding the first column:
s = sortrows(M(:,2:3));
s =
1 20
2 40
3 50
4 10
5 30
Then use the first column as the indices to reorder the resulting submatrix:
s(M(:,1),:);
ans =
1 20
4 10
3 50
5 30
2 40
This would be used to build the output matrix:
N = [M(:,1) s(M(:,1),:)];
N =
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
The previous technique will obviously only work if A and B are permutations of the values (1..m). If this is not the case, then we need to find the ranking of each value in the array. Let's start with new values for our arrays:
A B C
1 5 60
6 1 80
9 6 60
-4 9 40
5 -4 30
We construct s as before:
s = sortrows([B C]);
s =
-4 30
1 80
5 60
6 60
9 40
We can generate the rankings one of two ways. If the elements of A (and B) are unique, we can use the third output of unique as in this answer:
[~, ~, r] = unique(A);
r =
2
4
5
1
3
If the values of A are not unique, we can use the second return value of sort, the indices in the original array of the elements in sorted order, to generate the rank of each element:
[~, r] = sort(A);
r =
4
1
5
2
3
[~, r] = sort(r);
r =
2
4
5
1
3
As you can see, the resulting r is the same, it just takes 2 calls to sort rather than 1 to unique. We then use r as the list of indices for s above:
M = [A s(r, :)];
M =
1 1 80
6 6 60
9 9 40
-4 -4 30
5 5 60
If you must retain the order of A then use something like this
matrix = [1 4 10; 4 1 20; 3 5 30; 5 2 40; 2 3 50];
idx = arrayfun(#(x) find(matrix(:,2) == x), matrix(:,1));
sorted = [matrix(:,1), matrix(idx,2:3)];

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row..
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a fourth column 'Total' that sums across each row?
why not just:
update Total: TypeA+TypeB+TypeC+TypeD from rCom
?
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
Sum has map reduce which will be better for a huge table.
One quick point regarding summing across rows. You should be careful about nulls in 1 column resulting in a null result for the sum. Borrowing #WooiKent Lee's example.
We put a null into the first position of the a column. Notice how our sum now becomes null
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls in q are treated. If you sum across a simple list, the nulls are ignored
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
Or alternatively, make it s.t. you are actually summing across simple lists rather than general lists.
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
For a more complete reference on null treatment please see the reference website
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)