Matlab: Count till sum equals 360 > insert event 1, next 360 > insert event 2, etc.

I have been trying to solve this problem for a while now and I would appreciate a push in the right direction.
I have a matrix called Turn. This matrix contains 1 column of data and somewhere between 10000 and 15000 rows (the number of rows is variable). What I would like to do is as follows:
Start at row 1 and add the values of row 2, row 3, etc. till sum==360. When sum==360, insert 'event 1' in column 2 at that specific row.
Start counting at the next row (after 'event 1') till sum==360. When sum==360, insert 'event 2' in column 2 at that specific row, etc.
So I basically want to group my data into partitions of sum==360;
these will be called events.
The row number at which sum==360 is reached is important to me as well (every row is a time point, so it tells me the duration of an event). I want to put those row numbers in a new matrix: row 1 holds the row number where event 1 happened, row 2 the row number where event 2 happened, etc.

You can find the row indices where events occur using the following code. Basically you're going to use the modulo operator to find where the cumulative sum of the first column of Turn is a multiple of 360.
mod360 = mod(cumsum(Turn(:,1)),360);
eventInds = find(mod360 == 0);
You could then loop over eventInds to place whatever values you'd like in the appropriate rows in the second column of Turn.
I don't think you'll be able to place the string 'event 1' in the column though, as a character array acts like a vector and will result in a dimension mismatch. You could just store the numerical value 1 for the first event, 2 for the second event, and so on.
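For example, a minimal sketch of that loop (assuming you simply store the event number in a second column of Turn) might look like:
for k = 1:numel(eventInds)
    Turn(eventInds(k),2) = k;   % mark the row where event k completes with its event number
end
eventInds itself is the list of row numbers you asked for, so you can also keep it as a separate matrix.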

Ryan's answer looks like the way to go. But if your condition is such that you need to find the row numbers where the cumulative sum is not exactly 360, then you would need to do a little more work. For that case, you may use this -
Try this vectorized (no loops) code to get the row IDs where the 360 grouping occurs -
threshold = 360;
cumsum_val = cumsum(Turn);
ind1 = find(cumsum_val>=threshold,1)
num_events = floor(cumsum_val(end)/threshold);
[x1,y1] = find(bsxfun(@gt,cumsum_val,threshold.*(1:num_events)));
[~,b,~] = unique(y1,'first');
row_nums = x1(b)
After that you can get the event data, like this -
event1 = Turn(1:row_nums(1));
event2 = Turn(row_nums(1)+1:row_nums(2));
event3 = Turn(row_nums(2)+1:row_nums(3));
...
event21 = Turn(row_nums(20)+1:row_nums(21));
...
eventN = Turn(row_nums(N-1)+1:row_nums(N));
Edit 1
Sample case:
We create a small data set of 20 random integers instead of the 15000 rows used in the original problem. Also, we use a threshold of 30 instead of 360 to account for the small data size.
Code
Turn = randi(10,[20 1]);
threshold = 30;
cumsum_val = cumsum(Turn);
ind1 = find(cumsum_val>=threshold,1)
num_events = floor(cumsum_val(end)/threshold);
[x1,y1] = find(bsxfun(@gt,cumsum_val,threshold.*(1:num_events)));
[~,b,~] = unique(y1,'first');
row_nums = x1(b);
Run
Turn =
7
6
3
4
5
3
9
2
3
2
3
5
4
10
5
2
10
10
5
2
threshold =
30
row_nums =
7
14
18
The run shows row_nums as 7, 14, 18, which means that the second grouping starts at the 7th index in Turn, the third grouping starts at the 14th index, and so on. Of course, you can append 1 at the beginning of row_nums to indicate that the first grouping starts at the 1st index.
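If you would rather not write event1, event2, ... out by hand, a sketch that collects every grouping into a cell array (assuming row_nums has been computed as above) is:
bounds = [0; row_nums(:)];                     % prepend 0 so the first event starts at row 1
events = cell(numel(row_nums),1);
for k = 1:numel(row_nums)
    events{k} = Turn(bounds(k)+1:bounds(k+1)); % rows belonging to event k
end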

Given a column vector x, say,
x = randi(100,10,1)
the following would give you the index of the last row at which the cumulative sum of all the items up to and including that row is still at most 360:
i = max( find( cumsum(x) <= 360) )
Then, you would have to use that index to find the next set of cumulative sums that add up to 360, something like
offset = max( find( cumsum(x(i+1:end)) <= 360 ) )
i_new = i + offset
You might need to add +1/-1 to the offset and the index.
>> x = randi(100,10,1)'
x =
90 47 47 44 8 79 45 9 91 6
>> cumsum(x)
ans =
90 137 184 228 236 315 360 369 460 466
>> i = max(find(cumsum(x)<=360))
i =
7
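To collect every such index rather than just the first, one possible sketch building on the same idea (and keeping the off-by-one caveat in mind) is:
block_ends = [];            % row indices where each 360-block ends
start = 1;
while start <= numel(x)
    offset = max(find(cumsum(x(start:end)) <= 360));
    if isempty(offset)
        break;              % the very next value already exceeds 360 on its own
    end
    block_ends(end+1) = start + offset - 1;
    start = block_ends(end) + 1;
end
Note that the last entry may correspond to a partial block if the remaining values sum to less than 360.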

Related

find indices of subsets in MATLAB

This question is motivated by a very specific combinatorial optimization problem, where the search space is defined as the space of permuted subsets of an unsorted vector of discrete values with multiplicities.
I am looking for an efficient function (fast enough, vectorized, or any other more clever solution) which is able to find indices of subsets in the following manner:
t = [1 1 3 2 2 2 3 ]
is an unsorted vector of all possible values, including their multiplicities.
item = [2 3 1; 2 1 2; 3 1 1; 1 3 3]
is a list of permuted subsets of vector t.
I need to find the list of indices into t that correspond to each subset in item. So, for the above example we have:
item =
2 3 1
2 1 2
3 1 1
1 3 3
t =
1 1 3 2 2 2 3
ind = item2ind(item,t)
ind =
4 3 1
4 1 5
3 1 2
1 3 7
So, for item = [2 3 1] we get ind = [4 3 1], which means, that:
first value "2" at item corresponds to the first value "2" at t on position "4",
second value "3" at item corresponds to the first value "3" at t on position "3" and
third value "1" at item corresponds to the first value "1" at t on position "1".
In the case item = [2 1 2] we get ind = [4 1 5], which means that:
first value "2" at item corresponds to the first value "2" at t on position "4",
second value "1" at item corresponds to the first value "1" at t on position "1", and
third value "2" at item corresponds to the second(!!!) value "1" at t on position "5".
For
item = [1 1 1]
no solution exists, because vector t contains only two "1"s.
My current version of the function "item2ind" is a trivial serial code, which can easily be parallelized by changing the "for" loop to "parfor":
function ind = item2ind(item,t)
    [nlp,N] = size(item);
    ind = zeros(nlp,N);
    for i = 1:nlp
        auxitem = item(i,:);
        auxt = t;
        for j = 1:N
            I = find(auxitem(j) == auxt,1,'first');
            if ~isempty(I)
                auxt(I) = 0;
                ind(i,j) = I;
            else
                error('Incompatible content of item and t.');
            end
        end
    end
end
But I need something definitely more clever ... and faster:)
Test case for larger input data:
t = 1:10;                            % 10 unique values in vector t
t = repmat(t,1,5);                   % unsorted vector t with multiplicity 5 for every unique value
nlp = 100000;                        % number of item rows
[~,p] = sort(rand(nlp,length(t)),2); % 100000 random permutations
item = t(p);                         % transform permutations to items
item = item(:,1:30);                 % truncate item to a shorter subset
tic;ind = item2ind(item,t);toc       % running and timing of the original function
tic;ind_ = item2ind_new(item,t);toc  % running and timing of the new function
isequal(ind,ind_)                    % comparison of solutions
To vectorize the code, I have assumed that the error case won't be present. It should be checked for and discarded first, with a simple procedure I will present below.
Method
First, let's compute the indexes of all elements in t:
t = t(:);
mct = max(accumarray(t,1));
G = accumarray(t,1:length(t),[],@(x) {sort(x)});
G = cellfun(@(x) padarray(x.',[0 mct-length(x)],0,'post'), G, 'UniformOutput', false);
G = vertcat(G{:});
Explanation: after putting the input into column-vector shape, we compute the maximum number of occurrences of each possible value in t using accumarray. Then we form the array of all indexes of all numbers. This gives a cell array, as there may not be the same number of occurrences for each value. In order to form a matrix, we pad each array independently to the max length (named mct). Then we can transform the cell array into a matrix. At this step, we have:
G =
1 11 21 31 41
2 12 22 32 42
3 13 23 33 43
4 14 24 34 44
5 15 25 35 45
6 16 26 36 46
7 17 27 37 47
8 18 28 38 48
9 19 29 39 49
10 20 30 40 50
Now, we process item. For that, let's figure out how to create the cumulative count of occurrences of values inside a vector. For example, if I have:
A = [1 1 3 2 2 2 3];
then I want to get:
B = [1 2 1 1 2 3 2];
Thanks to implicit expansion, we can have it in one line:
B = diag(cumsum(A==A'));
As easy as this. The syntax A==A' expands into a matrix where each element is A(i)==A(j). Taking the cumulative sum along only one dimension and then the diagonal gives us the desired result: entry i is the number of times the value A(i) has occurred up to and including position i.
To use this trick with item, which is 2-D, we should use a 3-D array. Let's call m=size(item,1) and n=size(item,2). So:
C = cumsum(reshape(item,m,1,n)==item,3);
is a (big) 3-D array of all cumulative occurrences. The last thing to do is select the entries that lie on the diagonal along dimensions 2 and 3:
ia = C(sub2ind(size(C),repelem((1:m).',1,n),repelem(1:n,m,1),repelem(1:n,m,1)));
Now, with all these matrices, indexing is easy:
ind = G(sub2ind(size(G),item,ia));
Finally, let's recap the code of the function:
function ind = item2ind_new(item,t)
    t = t(:);
    [m,n] = size(item);
    mct = max(accumarray(t,1));
    G = accumarray(t,1:length(t),[],@(x) {sort(x)});
    G = cellfun(@(x) padarray(x.',[0 mct-length(x)],0,'post'), G, 'UniformOutput', false);
    G = vertcat(G{:});
    C = cumsum(reshape(item,m,1,n)==item,3);
    ia = C(sub2ind(size(C),repelem((1:m).',1,n),repelem(1:n,m,1),repelem(1:n,m,1)));
    ind = G(sub2ind(size(G),item,ia));
end
Results
Running the provided script on an old 4-core, I get:
Elapsed time is 4.317914 seconds.
Elapsed time is 0.556803 seconds.
ans =
logical
1
The speed-up is substantial (more than 8x), though memory consumption is also higher (because of the matrix C). I guess some improvements can be made to this part to save more memory.
EDIT
For generating ia, this procedure can cost a lot of memory. A way to save memory is to use a for-loop to generate this array directly:
ia = zeros(size(item));
for i = unique(t(:)).'
    ia = ia + cumsum(item==i, 2).*(item==i);
end
In all cases, once ia (and thus ind) has been computed, it's easy to test whether item contains an error relative to t:
any(ind(:)==0)
A simple solution to get items in error (as a mask) is then
min(ind,[],2)==0

Count how many values exceed a given threshold in a moving window in Matlab

I have a time vs. values plot, with time = 100. I want to select time 1 to 4 and then count how many values exceed 20. I.e., for time 1 to 4 the values are 16 43 94 21, so 3 values exceed 20 and the count should be 3. Then I want to move the window so time is 2 to 5 and count the number of values exceeding 20; the last window would be 97 to 100. I tried the following code but it is only showing 0 and 1:
N = 4;          % length of window
d = length(t);  % t has 100 values so took length
for e = 0:d-N
    for x = 1+e:N+e
        y(x) = sum(t(x)>20); % t contains values so took t(x)
    end
end
How do I do it?
You can use a logical index showing where t is greater than 20, then use movsum to count how many values in the sliding window exceed 20:
N =4;
idx = t > 20;
result = movsum(idx,N)
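Note that movsum uses a centered window by default; if you want each count to cover samples t(k) through t(k+N-1) (time 1 to 4, then 2 to 5, and so on), one option is to pass a two-element window and drop the trailing partial windows:
N = 4;
idx = t > 20;
counts = movsum(idx, [0 N-1]);   % counts(k) covers t(k) ... t(k+N-1)
counts = counts(1:end-N+1);      % keep only full windows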

Using IF loop and monotonic data in Matlab

I am trying to select specific data within a time vector to assign a specific start point.
Vit_lim = 5*(max(dcursor))/100
A = find(dcursor > Vit_lim)
A = [1 2 3 4 5 6 7 8 158 159 160.........318]
The start point is being detected as the first value.
The initial 8 values are a false positive and do not represent the real start point (158).
I need to add a condition that finds start point if first value increases monotonically for 20 consecutive values.
This is within a larger loop.
So,
A = [1 2 3 4 5 6 7 8 158 159 160.........318]
found = 0;
monoticSum = 0;
tempValue = A(1);
idx = 2;
while found == 0
    temp = A(idx);
    if (tempValue+1) == temp
        monoticSum = monoticSum + 1;
    else
        monoticSum = 0;
    end
    tempValue = temp;   % always advance the reference value
    if monoticSum == 20
        found = 1;
        break
    end
    idx = idx + 1;
end
This should work.
Actually, this is a nice starting point. But you need to restart the variable monoticSum if you find any transition less than 20. I've updated.
I'm not sure what you mean by 20 consecutive values, given that your sample data has 8 false-start values. But here's an idea that finds a sample that's at least 20 away from the previous one:
b = find(diff(A)>20);
start_idx = A(b+1);
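If the condition really is 20 consecutive values that each increase by exactly 1, a different sketch (swapping the gap test above for a run-length test via strfind; the run length of 20 is taken from the question) would be:
runLen = 20;
isStep = diff(A(:).') == 1;                       % true where A increases by exactly 1
hit = strfind(double(isStep), ones(1,runLen-1));  % starts of runs of runLen increasing values
if ~isempty(hit)
    start_idx = A(hit(1));                        % first value of the first such run, e.g. 158
end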

difference of each two elements of a column in the matrix

I have a matrix like this:
fd =
x y z
2 5 10
2 6 10
3 5 11
3 9 11
4 3 11
4 9 12
5 4 12
5 7 13
6 1 13
6 5 13
My problem has two parts:
1) I want to calculate the difference of each two elements in a column.
So I tried the following code:
for i = 1:10
    n = 10-i;
    for j = 1:n
        sdiff1 = diff([fd(i,1); fd(i+j,1)],1,1);
        sdiff2 = diff([fd(i,2); fd(i+j,2)],1,1);
        sdiff3 = diff([fd(i,3); fd(i+j,3)],1,1);
    end
end
I want all the differences such as:
x1-x2, x1-x3, x1-x4, ..., x1-x10
x2-x3, x2-x4, ..., x2-x10
...
x9-x10
and the same for the y and z value differences.
All the values should then be stored in sdiff1, sdiff2 and sdiff3.
2) What I want next: for the same z values, I want to keep the original data points. For different z values, I want to merge those points which are close to each other. By close I mean:
if abs(sdiff3)== 0
keep the original data
for abs(sdiff3) > 1
if abs(sdiff1) < 2 & abs(sdiff2) < 2
then I need mean x, mean y and mean z of the points.
So I tried the whole programme as:
for i = 1:10
    n = 10-i;
    for j = 1:n
        sdiff1 = diff([fd(i,1); fd(i+j,1)],1,1);
        sdiff2 = diff([fd(i,2); fd(i+j,2)],1,1);
        sdiff3 = diff([fd(i,3); fd(i+j,3)],1,1);
        if (abs(sdiff3(:,1))) > 1
            continue
            mask1 = (abs(sdiff1(:,1)) < 2) & (abs(sdiff2(:,1)) < 2) & (abs(sdiff3(:,1)) > 1);
            subs1 = cumsum(~mask1);
            xmean1 = accumarray(subs1,fd(:,1),[],@mean);
            ymean1 = accumarray(subs1,fd(:,2),[],@mean);
            zmean1 = accumarray(subs1,fd(:,3),[],@mean);
            fd = [xmean1(subs1) ymean1(subs1) zmean1(subs1)];
        end
    end
end
My final output should be:
2.5 5 10.5
3.5 9 11.5
5 4 12
5 7 13
6 1 13
where points (1,2,3), (4,6) and (5,7,10) are merged to their mean positions (according to the threshold difference < 2), whereas the 8th and 9th points keep their original data.
I am stuck at finding the differences between each pair of elements in a column and storing them. My code is not giving me the desired output.
Can somebody please help?
Thanks in advance.
This can be greatly simplified using vectorised notation. You can do for instance
fd(:,1) - fd(:,2)
to get the difference between columns 1 and 2 (or equivalently diff(fd(:,[1 2]), 1, 2)). You can make this more elegant/harder to read and debug with pdist but if you only have three columns it's probably more trouble than it's worth.
I suspect your first problem is with the third argument to diff. If you use diff(X, 1, 1) it will do the first order diff in direction 1, which is to say between adjacent rows (downwards). diff(X, 1, 2) will do it between adjacent columns (rightwards), which is what you want. Matlab uses the opposite convention to spreadsheets in that it indexes rows first then columns.
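For instance, a quick illustration of the dimension argument:
X = [1 4 9; 2 6 12];
diff(X, 1, 1)   % between adjacent rows:    [1 2 3]
diff(X, 1, 2)   % between adjacent columns: [3 5; 4 6]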
Once you have your diffs you can then test the elements:
thesame = find(sdiff3 < 2); % for example
this will yield a vector of the row indices of sdiff3 where the value is less than 2. Then you can use
fd(thesame,:)
to select the elements of fd at those indexes. To remove matching rows you would do the opposite test
notthesame = find(sdiff3 > 2);
to find the ones to keep, then extract those into a new array
keepers = fd(notthesame,:);
These won't give you the exact solution but it'll get you on the right track. For the syntax of these commands and lots of examples you can run e.g. doc diff in the command window.
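As a side note, if what you are actually after is every pairwise difference within a single column (x1-x2, x1-x3, ..., x9-x10), a vectorized sketch using implicit expansion (R2016b or later; use bsxfun(@minus,x,x.') on older versions) could be:
x = fd(:,1);                            % the same idea works for columns 2 and 3
D = x - x.';                            % D(i,j) = x(i) - x(j), all pairs at once
mask = logical(triu(ones(numel(x)),1)); % keep each pair (i,j) with i < j exactly once
sdiff1 = D(mask);                       % pairwise differences as a column vector
Note that D(mask) returns the pairs in column-major order, not strictly x1-x2, x1-x3, ...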

Generate pairs of points using a nested for loop

As an example, I have a matrix [1,2,3,4,5]'. This matrix contains one column and 5 rows, and I have to generate pairs of points like (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5).
I have to store these values in 2 columns in a matrix. I have the following code, but it isn't quite giving me the right answer.
for s = 1:5
    for tb = (s+1):5
        if tb > s
            in = sub2ind(size(pairpoints),(tb-1),1);
            pairpoints(in) = s;
            in = sub2ind(size(pairpoints),(tb-1),2);
            pairpoints(in) = tb;
        end
    end
end
With this code, I got (1,2),(2,3),(3,4),(4,5). What should I do, and what is the general formula for the number of pairs?
One way, though it is limited depending upon how many different elements there are to choose from, is to use nchoosek as follows:
pairpoints = nchoosek([1:5],2)
pairpoints =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
See the limitations of this function in the provided link.
An alternative is to just iterate over each element and combine it with the remaining elements in the list (assumes that all are distinct)
pairpoints = [];
data = [1:5]';
len = length(data);
for k = 1:len
    pairpoints = [pairpoints ; [repmat(data(k),len-k,1) data(k+1:end)]];
end
This method just concatenates each element in data with the remaining elements in the list to get the desired pairs.
Try either of the above and see what happens!
Another suggestion I can add to the mix if you don't want to rely on nchoosek is to generate an upper triangular matrix full of ones, disregarding the diagonal, and use find to generate the rows and columns of where the matrix is equal to 1. You can then concatenate both of these into a single matrix. By generating an upper triangular matrix this way, the locations of the matrix where they're equal to 1 exactly correspond to the row and column pairs that you are seeking. As such:
%// Highest value in your data
N = 5;
[rows,cols] = find(triu(ones(N),1));
pairPoints = [rows,cols]
pairPoints =
1 2
1 3
2 3
1 4
2 4
3 4
1 5
2 5
3 5
4 5
Bear in mind that this will be unsorted (i.e. not in the order that you specified in your question). If order matters to you, then use the sortrows command in MATLAB so that we can get this into the proper order that you're expecting:
pairPoints = sortrows(pairPoints)
pairPoints =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
Take note that I specified an additional parameter to triu which denotes how much of an offset you want away from the diagonal. The default offset is 0, which includes the diagonal when you extract the upper triangular matrix. I specified 1 as the second parameter because I want to move away from the diagonal towards the right by 1 unit so I don't want to include the diagonal as part of the upper triangular decomposition.
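For example:
triu(ones(3))     % offset 0 keeps the diagonal:    [1 1 1; 0 1 1; 0 0 1]
triu(ones(3), 1)  % offset 1 excludes the diagonal: [0 1 1; 0 0 1; 0 0 0]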
for loop approach
If you truly desire the for loop approach, going with your model, you'll need two for loops, and you need to keep track of the row we are currently at so that the inner loop can start at the next column and run to the end. You can also use @GeoffHayes' approach of using just a single for loop to generate your indices, but when you're new to a language, one key piece of advice I will always give is to code for readability and not for efficiency. Once you get it working, if you have some way of measuring performance, you can then try to make the code faster and more efficient. This kind of programming is also endorsed by Jon Skeet, the resident StackOverflow ninja, and I got that from this post here.
As such, you can try this:
pairPoints = []; %// Initialize
N = 5; %// Highest value in your data
for row = 1 : N
    for col = row + 1 : N
        pairPoints = [pairPoints; [row col]]; %// Add row-column pair to matrix
    end
end
We get the equivalent output:
pairPoints =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
Small caveat
This method will only work if your data is enumerated from 1 to N.
Edit - August 20th, 2014
You wish to generalize this to any array of values. You also want to stick with the for loop approach. You can still keep the original for loop code there. You would simply have to add a couple more lines to index your new array. As such, supposing your data array was:
dat = [12, 45, 56, 44, 62];
You would use the pairPoints matrix and use each of its columns to index into the data array to access your values. Also, you need to make sure your data is a column vector, or this won't work. If it weren't, we would be creating a 1-D array and concatenating rows, and that's obviously not what we're looking for. In other words:
dat = [12, 45, 56, 44, 62];
dat = dat(:); %// Make column vector - Important!
N = numel(dat); %// Total number of elements in your data array
pairPoints = []; %// Initialize
%// Skip if the array is empty
if (N ~= 0)
    for row = 1 : N
        for col = row + 1 : N
            pairPoints = [pairPoints; [row col]]; %// Add row-column pair to matrix
        end
    end
    vals = [dat(pairPoints(:,1)) dat(pairPoints(:,2))];
else
    vals = [];
end
Take note that I have made a provision where if the array is empty, don't even bother doing any calculations. Just output an empty matrix.
We thus get:
vals =
12 45
12 56
12 44
12 62
45 56
45 44
45 62
56 44
56 62
44 62