Matlab - find a missing time from a matrix and insert the missing time with a value - matlab

I have a series of times and returns in various matrices lets call them a b c. They are all x by 2 with column 1 being times in seconds and column 2 returns. While all the returns are over a set series of time intervals like 15s 30s 45s etc the problem is not all the matrices have all the time buckets so while a might be a 30 by 2 , b might only be a 28 by 2. Because it is missing say time 45 seconds and a return. I want to go through each matrix and where I am missing a time bucket I want to insert the bucket with a zero return - I am happy to create a control 30 by 1 matrix with all the times that need to be cross referenced

You can use ismember to locate these missing positions, so if a is the control vector and b is the missing data vector ind=find(ismember(a,b)==0); will give you the indices of a that are missing in b.
For example:
a=1:10;
b=[1:2 4:5 7:10];
ind=find(ismember(a,b)==0);
ind =
3 6
In order to to add zeros in the right places for b just
for n=1:numel(ind)
b=[b(1:ind(n)-1) , 0 , b(ind(n):end)];
end
b =
1 2 0 4 5 0 7 8 9 10

Related

Matlab - filter matrix by having specific values

I have a matrix A and a vector b. I don't know their sizes, the size varies because it is the output of another function. What I want to do is filter A by a column (let's say jth column) which has at least one value that is in b.
How do I do this without measuring the size of b and concatenating every filtered result. Right now, the code is like this (assume j is a given value)
bsize=size(b,1);
for i=1:bsize
if i==1
a=A(A(:,j)==b(i),:);
else
a=[a; A(A(:,j)==b(i),:)];
end
end
I want to code a faster solution.
I am adding a numerical example just to make it clear. Let's say
A=[2 4
7 14
11 13
15 14]
and b=[4 14]
What I'm trying to do is filter to obtain the A matrix whose values are 4 and 14 in the second column, the elements of b to obtain the following output.
A=[2 4
7 14
15 14]
In my data A has more than 12000 rows and b has more than 100 elements. It doesn't always have to be the second column, sometimes the column index changes but that's not the problem now.
Use the ismember function to create a logical index based on column j=2 of A and vector b, and use that index into the rows of A:
output = A(ismember(A(:,j), b), :);

Finding the rows of a matrix with specified elements

I want to find the rows of a matrix which contain specified element of another matrix.
For example, a=[1 2 3 4 5 6 7] and b=[1 2 0 4;0 9 10 11;3 1 2 12]. Now, I want to find the rows of b which contain at least three element of a. For this purpose, I used bsxfun command as following:
c=find(sum(any(bsxfun(#eq, b, reshape(a,1,1,[])), 2), 3)>=3);
It works good for low dimension matrices but when I want to use this for high dimension matrices, for example, when the number of rows of b is 192799, MATLAB gives following error:
Requested 192799x4x48854 (35.1GB) array exceeds maximum array size preference.
Creation of arrays greater than this limit may take a long time and cause MATLAB
to become unresponsive. See array size limit or preference panel for more information.
Is there any other command which does this task without producing the behaviour like above for high dimension matrices?
a possible solution:
a=[1 2 3 4 5 6 7]
b=[1 2 0 4;0 9 10 11;3 1 2 12]
i=ismember(b,a)
idx = sum(i,2)
idx = find(idx>=3)

Counting the frequency of varying day-events

Let's say I have a few matrices where the first column is a serial date, and the second column is the information for that specific date. These matrices are organized in a way that all of the dates are at a minimum number of consecutive days. In the example below (A), this number is 3. This being said, A has runs of 3 consecutive days, 4 consecutive days, 5 consecutive days, and so forth, making the consecutive day count 3+. My total set of matrices range from 2+ to 5+.
A=
694094 91
694095 92
694096 94
694097 86
694157 95
694158 99
694159 99
694160 97
694183 100
694184 99
694185 96
694505 94
694506 92
694507 89
...
I want to find a way to count the varying amount of consecutive days per year. That is, count the amount of 3-day events, 4-day events, and onward. So, using the above example, the output would look like:
B=
1900 3
1901 1
....
Which states that there are three 3+ consecutive day events in 1900, and according to the example, only one 3+ consecutive day event in 1901. The years come from the serial date numbers, and the documentation on that can be found here. My data ranges from 1900 to 2013.
So far I've tried to use the diff function to try and split the day events by the amount of 1s in a string, find those indices, and then use histc to count the events per year but I'm realizing that this is a failed approach. I'm sure accumarray could help in this situation -- but I'm still foggy on the function after going through examples on mathworks and SO.
Code
N = 3; %// 3 for 3+ events. Change it to 2 or 5 for 2+ and 5+ events respectively
%// Year IDs
year_ID = str2num(datestr(A(:,1),'yyyy'))
%// Binary array, where ones represent consecutive dates starting with zero
%// as the start of a pack of consecutive dates
diffA1 = [0 ; diff(A(:,1))==1]' %//'
%// Row numbers of A that signal the start of N+ events.
%// STRFIND here works like a "sliding-matcher" if I may call it that way.
%// It works with a matching window that slides across diffA1 to find N+ events
%// using a proper filter. Here a filter [0 1 1] is used for 3+ events
row_ID = strfind(diffA1,[0 ones(1,N-1)])
%// N+ events for each year
Nplus_event = year_ID(row_ID)
%// Desired output as a count of such N+ events against each year
B = [unique(Nplus_event) histc(Nplus_event,unique(Nplus_event))]

MATLAB: Subtracting matrix subsets by specific rows

Here is an example of a subset of the matrix I would like to use:
1 3 5
2 3 6
1 1 1
3 5 4
5 5 5
8 8 0
This matrix is in fact 3000 x 3.
For the first 3 rows, I wish to subtract each of these rows with the first row of these three.
For the second 3 rows, I wish to subtract each of these rows with the first of these three, and so on.
As such, the output matrix will look like:
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
What code in MATLAB will do this for me?
You could also do this completely vectorized by using mat2cell, cellfun, then cell2mat. Assuming our matrix is stored in A, try:
numBlocks = size(A,1) / 3;
B = mat2cell(A, 3*ones(1,numBlocks), 3);
C = cellfun(#(x) x - x([1 1 1], :), B, 'UniformOutput', false);
D = cell2mat(C); %//Output
The first line figures out how many 3 x 3 blocks we need. This is assuming that the number of rows is a multiple of 3. The second line uses mat2cell to decompose each 3 x 3 block and places them into individual cells. The third line then uses cellfun so that for each cell in our cell array (which is a 3 x 3 matrix), it takes each row of the 3 x 3 matrix and subtracts itself with the first row. This is very much like what #David did, except I didn't use repmat to minimize overhead. The fourth line then takes each of these matrices and stacks them back so that we get our final matrix in the end.
Example (this is using the matrix that was defined in your post):
A = [1 3 5; 2 3 6; 1 1 1; 3 5 4; 5 5 5; 8 8 0];
numBlocks = size(A,1) / 3;
B = mat2cell(A, 3*ones(1, numBlocks), 3);
C = cellfun(#(x) x - x([1 1 1], :), B, 'UniformOutput', false);
D = cell2mat(C);
Output:
D =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
In hindsight, I think #David is right with respect to performance gains. Unless this code is repeated many times, I think the for loop will be more efficient. Either way, I wanted to provide another alternative. Cool exercise!
Edit: Timing and Size Tests
Because of our discussion earlier, I have decided to do timing and size tests. These tests were performed on an Intel i7-4770 # 3.40 GHz CPU with 16 GB of RAM, using MATLAB R2014a on Windows 7 Ultimate. Basically, I did the following:
Test #1 - Set the random seed generator to 1 for reproducibility. I wrote a loop that cycled 10000 times. For each iteration in the loop, I generated a random integer 3000 x 3 matrix, then performed each of the methods that were described here. I took note of how long it took for each method to complete after 10000 cycles. The timing results are:
David's method: 0.092129 seconds
rayryeng's method: 1.9828 seconds
natan's method: 0.20097 seconds
natan's bsxfun method: 0.10972 seconds
Divakar's bsxfun method: 0.0689 seconds
As such, Divakar's method is the fastest, followed by David's for loop method, followed closely by natan's bsxfun method, followed by natan's original kron method, followed by the sloth (a.k.a mine).
Test #2 - I decided to see how fast this would get as you increase the size of the matrix. The set up was as follows. I did 1000 iterations, and at each iteration, I increase the size of the matrix rows by 3000 each time. As such, iteration 1 consisted of a 3000 x 3 matrix, the next iteration consisted of a 6000 x 3 matrix and so on. The random seed was set to 1 again. At each iteration, the time taken to complete the code was taken a note of. To ensure fairness, the variables were cleared at each iteration before the processing code began. As such, here is a stem plot that shows you the timing for each size of matrix. I subsetted the plot so that it displays timings from 200000 x 3 to 300000 x 3. Take note that the horizontal axis records the number of rows at each iteration. The first stem is for 3000 rows, the next is for 6000 rows and so on. The columns remain the same at 3 (of course).
I can't explain the random spikes throughout the graph.... probably attributed to something happening in RAM. However, I'm very sure I'm clearing the variables at each iteration to ensure no bias. In any case, Divakar and David are closely tied. Next comes natan's bsxfun method, then natan's kron method, followed last by mine. Interesting to see how Divakar's bsxfun method and David's for method are side-by-side in timing.
Test #3 - I repeated what I did for Test #2, but using natan's suggestion, I decided to go on a logarithmic scale. I did 6 iterations, starting at a 3000 x 3 matrix, and increasing the rows by 10 fold after. As such, the second iteration had 30000 x 3, the third iteration had 300000 x 3 and so on, up until the last iteration, which is 3e8 x 3.
I have plotted on a semi-logarithmic scale on the horizontal axis, while the vertical axis is still a linear scale. Again, the horizontal axis describes the number of rows in the matrix.
I changed the vertical limits so we can see most of the methods. My method is so poor performing that it would squash the other timings towards the lower end of the graph. As such, I changed the viewing limits to take my method out of the picture. Essentially what was seen in Test #2 is verified here.
Here's another way to implement this with bsxfun, slightly different from natan's bsxfun implementation -
t1 = reshape(a,3,[]); %// a is the input matrix
out = reshape(bsxfun(#minus,t1,t1(1,:)),[],3); %// Desired output
a slightly shorter and vectorized way will be (if a is your matrix) :
b=a-kron(a(1:3:end,:),ones(3,1));
let's test:
a=[1 3 5
2 3 6
1 1 1
3 5 4
5 5 5
8 8 0]
a-kron(a(1:3:end,:),ones(3,1))
ans =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
Edit
Here's a bsxfun solution (less elegant, but hopefully faster):
a-reshape(bsxfun(#times,ones(1,3),permute(a(1:3:end,:),[2 3 1])),3,[])'
ans =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
Edit 2
Ok, this got me curios, as I know bsxfun starts to be less efficient for bigger array sizes. So I tried to check using timeit my two solutions (because they are one liners it's easy). And here it is:
range=3*round(logspace(1,6,200));
for n=1:numel(range)
a=rand(range(n),3);
f=#()a-kron(a(1:3:end,:),ones(3,1));
g=#() a-reshape(bsxfun(#times,ones(1,3),permute(a(1:3:end,:),[2 3 1])),3,[])';
t1(n)=timeit(f);
t2(n)=timeit(g);
end
semilogx(range,t1./t2);
So I didn't test for the for loop and Divkar's bsxfun, but you can see that for arrays smaller than 3e4 kron is better than bsxfun, and this changes at larger arrays (ratio of <1 means kron took less time given the size of the array). This was done at Matlab 2012a win7 (i5 machine)
Simple for loop. This does each 3x3 block separately.
A=randi(5,9,3)
B=A(1:3:end,:)
for i=1:length(A(:,1))/3
D(3*i-2:3*i,:)=A(3*i-2:3*i,:)-repmat(B(i,:),3,1)
end
D
Whilst it may be possible to vectorise this, I don't think the performance gains would be worth it, unless you will do this many times. For a 3000x3 matrix it doesn't take long at all.
Edit: In fact this seems to be pretty fast. I think that's because Matlab's JIT compilation can speed up simple for loops well.
You can do it using just indexing:
a(:) = a(:) - a(3*floor((0:numel(a)-1)/3)+1).';
Of course, the 3 above can be replaced by any other number. It works even if that number doesn't divide the number of rows.

Matlab general matrix indexing for accesing several rows

Edit for clarity:
I have two matrices, p.valor 2x1000 and p.clase 1x1000. p.valor consists of random numbers spanning from -6 to 6. p.clase contains, in order, 200 1:s, 200 2:s and 600 3:s. What I wan´t to do is
Print p.valor using a diferent color/prompt for each clase determined in p.clase, as in following figure.
I first wrote this, in order to find out which locations in p.valor represented where the 1,2 respective 3 where in p.clase
%identify the locations of all 1,2 respective 3 in p.clase
f1=find(p.clase==1);
f2=find(p.clase==2);
f3=find(p.clase==3);
%define vectors in p.valor representing the locations of 1,2,3 in p.clase
x1=p.valor(f1);
x2=p.valor(f2);
x3=p.valor(f3);
There is 200 ones (1) in p.valor, thus, is x1=(1:200). The problem is that each number one(1) (and, respectively 2 and 3) represents TWO elements in p.valor, since p.valor has 2 rows. So even though p.clase and thus x1 now only have one row, I need to include the elements in the same colums as all locations in f1.
So the different alternatives I have tried have not yet been succesfull. Examples:
plot(x1(:,1), x1(:,2),'ro')
hold on
plot(x2(:,1),x2(:,2),'k.')
hold on
plot(x3(:,1),x3(:,2),'b+')
and
y1=p.valor(201:400);
y2=p.valor(601:800);
y3=p.valor(1401:2000);
scatter(x1,y1,'k+')
hold on
scatter(x2,y1,'b.')
hold on
scatter(x3,y1,'ro')
and
y1=p.valor(201:400);
y2=p.valor(601:800);
y3=p.valor(1401:2000);
plot(x1,y1,'k+')
hold on
plot(x2,y2,'b.')
hold on
plot(x3,y3,'ro')
My figures have the axisies right, but the plotted values does not match the correct figure provided (see top of the question).
Ergo, my question is: how do I include tha values on the second row in p.valor in my plotted figure?
I hope this is clearer!
Values from both rows simultaneously can be accessed using this syntax:
X=p.value(:,findX)
In this case, resulting X matrix will be a matrix having 2 rows and length(findX) columns.
M = magic(5)
M =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
M2 = M(1:2, :)
M2 =
17 24 1 8 15
23 5 7 14 16
Matlab uses column major indexing. So to get to the next row, you actually just have to add 1. Adding 2 to an index on M2 here gets you to the next column, or adding 5 to an index on M
e.g. M2(3) is 24. To get to the next row you just add one i.e. M2(4) returns 5.To get to the next column add the number of rows so M2(2 + 2) gets you 1. If you add the number of columns like you suggested you just get gibberish.
So your method is very wrong. Freude's method is 100% correct, it's much easier to use subscript indexing than linear indexing for this. But I just wanted to explain why what you were trying doesn't work in Matlab. (aside from the fact that X=p.value(findX findX+1000) gives you a syntax error, I assume you meant X=p.value([findX findX+1000]))