find all similar index of array in Matlab - matlab

I have a array X=[1 2 3 1.01 2.01 4 5 1.01 3.01] I want to all index of this array are similar and difference<=0.01 in matlab answer is
X1=[1 4 8], X2=[2 5],X3=[3 9],X4=[6],X5=[7]
many thanks

Here a solution using for-loop. Not sure about efficiency though. Gonna try to find another solution as well.
X=[1 2 3 1.01 2.01 4 5 1.01 3.01]
result=cell(length(X),1);
boarder = 0.01;
for n=1:length(X)
helper = X(n);
Y=X;
Y(X>helper+boarder)=0;
Y(X<helper-boarder)=0;
result(n,1)={find(Y)};
end
I predefine a cellarray (result) which contains all index (for each element of X). Then I loop over the elements setting those who are outside of your boarder to 0. Last but not least I save the index to the result array.
Obviously some results are the same but this way you get also results for the case: X=[ 1.01, 1.02, 1.03, 1.04,...];
And if you want to delete those elements which are the same you could loop over your data again and get unique results.

Y=cell(5,1)
for idx=1:numel(Y)
Y{idx}=find(abs(X-idx)<=.2);
end

I think the submission "unique with tolerance" in the FileExchange is for you.
You should note, that creating the variables X1...X5 as separate variables is a bad idea and a bad practice, because this makes referencing these values in later code (which is either vectorized or loop-based) cumbersome and inefficient. More correct alternatives to storing the data are cells (like in the solution suggested by Daniel) or in structs.
Having said that, if for some reason you still want to create uniquely named variables, this is possible using a mix of the aforementioned submission (uniquetol) and eval:
[~,b,c]=uniquetol(X,0.01+eps);
for ind1 = 1:length(b)
eval(sprintf('X%d = (find(c==X(b(%d))))'';',ind1,ind1));
end

Related

How do you get a list of numbers from 1 to n in Julia?

In MATLAB you could write 1:n. Is there something similar to this notation in Julia?
Yes, there is something very similar to Matlab's 1:n in Julia. It's 1:n.
There are some differences, though. Matlab's 1:n creates a "row vector," which is important since Matlab iterates over columns. If you look at Matlab's 1:n funny, it magically turns into a dense array that stores all its elements, but if you carefully avoid looking at it, I believe it can avoid allocating the space for it entirely — this is why Matlab's linter recommends (1:n) instead of [1:n].
In contrast, Julia's 1:n is a true column vector that always just uses two integers to define itself. The only time it'll actually store all its elements into memory is if you ask it to (e.g., with collect). In nearly all cases, though, you can use it like a real vector without storing its results; it'll very efficiently generate its elements on the fly. It may look a little strange since it just prints out as 1:n, but it really is an array. You can even do linear algebra with it:
julia> r = 1:4
1:4
julia> r[3]
3
julia> A = rand(0:2, 3, 4)
3×4 Array{Int64,2}:
0 1 2 2
1 1 1 2
1 0 2 1
julia> A * r
3-element Array{Int64,1}:
16
14
11
I think I was looking for
collect(1:n)

Double for loop in MATLAB, storing the information

I have two for loops in MATLAB.
One of the for loops leads to different variables being inserted into the model, which are 43 and then I have 5 horizons.
So I estimate the model 215 times.
My problem is I want to store this in 215x5 matrix, the reason I have x5 is that I am estimating 5 variables, 4 are fixed and the other one comes from the for loop.
I have tried to do this in two ways,
Firstly, I create a variable called out,
out=zeros(215,5);
The first for loop is,
for i=[1,2,3,4,5];
The second for loop is,
for ii=18:60;
The 18:60 is how I define my variables using XLS read, e.g. they are inserted into the model as (data:,ii).
I have tried to store the data in two ways, I want to store OLS which contains the five estimates
First,
out(i,:)=OLS;
This method creates a 5 x 5 matrix, with the estimates for one of the (18:60), at each of the horizons.
Second,
out(ii,:)=OLS;
This stores the variables for each of the variables (18:60), at just one horizon.
I want to have a matrix which stores all of the estimates OLS, at each of the horizons, for each of my (18:60).
Minimal example
clear;
for i=[1,2,3,4,5];
K=i;
for ii=18:60
x=[1,2,3,i,ii];
out(i,:)=x;
end
end
So the variable out will store 1 2 3 5 60
I want the variable out to store all of the combinations
i.e.
1 2 3 1 1
1 2 3 1 2
...
1 2 3 5 60
Thanks
The simplest solution is to use a 3D matrix:
for jj=[1,2,3,4,5];
K=jj;
for ii=18:60
x=[1,2,3,jj,ii];
out(ii-17,jj,:)=x;
end
end
If you now reshape the out matrix you get the same result as the first block in etmuse's answer:
out = reshape(out,[],size(out,3));
(Note I replaced i by jj. i and ii are too similar to use both, it leads to confusion. It is better to use different letters for loop indices. Also, i is OK to use, but it also is the built-in imaginary number sqrt(-1). So I prefer to use ii over i.)
As you've discovered, using just one of your loop variables to index the output results in most of the results being overwritten, leaving only the results from the final iteration of the relevant loop.
There are 2 ways to create an indexing variable.
1- You can use an independent variable, initialised before the loops and incremented at the end of the internal loop.
kk=1;
for i=1:5
for ii=18:60
%calculate OLC
out(kk,:)=OLC;
kk = kk+1;
end
end
2- Use a calculation of i and ii
kk = i + 5*(ii-18)
(Use in loop as before, without increment)

matlab percentage change between cells

I'm a newbie to Matlab and just stumped how to do a simple task that can be easily performed in excel. I'm simply trying to get the percent change between cells in a matrix. I would like to create a for loop for this task. The data is setup in the following format:
DAY1 DAY2 DAY3...DAY 100
SUBJECT RESULTS
I could only perform getting the percent change between two data points. How would I conduct it if across multiple days and multiple subjects? And please provide explanation
Thanks a bunch
FOR EXAMPLE, FOR DAY 1 SUBJECT1(RESULT=1), SUBJECT2(RESULT=4), SUBJECT3(RESULT=5), DAY 2 SUBJECT1(RESULT=2), SUBJECT2(RESULT=8), SUBJECT3(RESULT=10), DAY 3 SUBJECT1(RESULT=1), SUBJECT2(RESULT=4), SUBJECT3(RESULT=5).
I WANT THE PERCENT CHANGE SO OUTPUT WILL BE DAY 2 SUBJECT1(RESULT=100%), SUBJECT2(RESULT=100%), SUBJECT3(RESULT=100%). DAY3 SUBJECT1(RESULT=50%), SUBJECT2(RESULT=50%), SUBJECT3(RESULT=50%)
updated:
Hi thanks for responding guys. sorry for the confusion. zebediah49 is pretty close to what I'm looking for. My data is for example a 10 x 10 double. I merely wanted to get the percentage change from column to column. For example, if I want the percentage change from rows 1 through 10 on all columns (from columns 2:10). I would like the code to function for any matrix dimension (e.g., 1000 x 1000 double) zebediah49 could you explain the code you posted? thanks
updated2:
zebediah49,
(data(1:end,100)- data(1:end,99))./data(1:end,99)
output=[data(:,2:end)-data(:,1:end-1)]./data(:,1:end-1)*100;
Observing the code above, How would I go about modifying it so that column 100 is used as the index against all of the other columns(1-99)? If I change the code to the following:
(data(1:end,100)- data(1:end,:))./data(1:end,:)
matlab is unable because of exceeding matrix dimensions. How would I go about implementing that?
UPDATE 3
zebediah49,
Worked perfectly!!! Originally I created a new variable for the index and repmat the index to match the matrices which was not a good idea. It took forever to replicate when dealing with large numbers.
Thanks for you contribution once again.
Thanks Chris for your contribution too!!! I was looking more on how to address and manipulate arrays within a matrix.
It's matlab; you don't actually want a loop.
output=input(2:end,:)./input(1:end-1,:)*100;
will probably do roughly what you want. Since you didn't give anything about your matlab structure, you may have to change index order, etc. in order to make it work.
If it's not obvious, that line defines output as a matrix consisting of the input matrix, divided by the input matrix shifted right by one element. The ./ operator is important, because it means that you will divide each element by its corresponding one, as opposed to doing matrix division.
EDIT: further explanation was requested:
I assumed you wanted % change of the form 1->1->2->3->1 to be 100%, 200%, 150%, 33%.
The other form can be obtained by subtracting 100%.
input(2:end,:) will grab a sub-matrix, where the first row is cut off. (I put the time along the first dimension... if you want it the other way it would be input(:,2:end).
Matlab is 1-indexed, and lets you use the special value end to refer to the las element.
Thus, end-1 is the second-last.
The point here is that element (i) of this matrix is element (i+1) of the original.
input(1:end-1,:), like the above, will also grab a sub-matrix, except that that it's missing the last column.
I then divide element (i) by element (i+1). Because of how I picked out the sub-matrices, they now line up.
As a semi-graphical demonstration, using my above numbers:
input: [1 1 2 3 1]
input(2,end): [1 2 3 1]
input(1,end-1): [1 1 2 3]
When I do the division, it's first/first, second/second, etc.
input(2:end,:)./input(1:end-1,:):
[1 2 3 1 ]
./ [1 1 2 3 ]
---------------------
== [1.0 2.0 1.5 0.3]
The extra index set to (:) means that it will do that procedure across all of the other dimension.
EDIT2: Revised question: How do I exclude a row, and keep it as an index.
You say you tried something to the effect of (data(1:end,100)- data(1:end,:))./data(1:end,:). Matlab will not like this, because the element-by-element operators need them to be the same size. If you wanted it to only work on the 100th column, setting the second index to be 100 instead of : would do that.
I would, instead, suggest setting the first to be the index, and the rest to be data.
Thus, the data is processed by cutting off the first:
output=[data(2:end,2:end)-data(2:end,1:end-1)]./data(2:end,1:end-1)*100;
OR, (if you neglect the start, matlab assumes 1; neglect the end and it assumes end, making (:) shorthand for (1:end).
output=[data(2:,2:end)-data(2:,1:end-1)]./data(2:,1:end-1)*100;
However, you will probably still want the indices back, in which case you will need to append that subarray back:
output=[data(1,1:end-1) data(2:,2:end)-data(2:,1:end-1)]./data(2:,1:end-1)*100];
This is probably not how you should be doing it though-- keep data in one matrix, and time or whatever else in a separate array. That makes it much easier to do stuff like this to data, without having to worry about excluding time. It's especially nice when graphing.
Oh, and one more thing:
(data(:,2:end)-data(:,1:end-1))./data(:,1:end-1)*100;
is identically equivalent to
data(:,2:end)./data(:,1:end-1)*100-100;
Assuming zebediah49 guessed right in the comment above and you want
1 4 5
2 8 10
1 4 5
to turn into
1 1 1
-.5 -.5 -.5
then try this:
data = [1,4,5; 2,8,10; 1,4,5];
changes_absolute = diff(data);
changes_absolute./data(1:end-1,:)
ans =
1.0000 1.0000 1.0000
-0.5000 -0.5000 -0.5000
You don't need the intermediate variable, you can directly write diff(data)./data(1:end,:). I just thought the above might be easier to read. Getting from that result to percentage numbers is left as an exercise to the reader. :-)
Oh, and if you really want 50%, not -50%, just use abs around the final line.

Why does crossvalind fail?

I am using cross valind function on a very small data... However I observe that it gives me incorrect results for the same. Is this supposed to happen ?
I have Matlab R2012a and here is my output
crossvalind('KFold',1:1:11,5)
ans =
2
5
1
3
2
1
5
3
5
1
5
Notice the absence of set 4.. Is this a bug ? I expected atleast 2 elements per set but it gives me 0 in one... and it happens a lot that is the values are not uniformly distributed in the sets.
The help for crossvalind says that the form you are using is: crossvalind(METHOD, GROUP, ...). In this case, GROUP is the e.g. the class labels of your data. So 1:11 as the second argument is confusing here, because it suggests no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.
I tried doing:
numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))
and it reliably gave 5 as a result, which is what I would expect; my example would correspond to a two-class problem (I would guess that, as a general rule, you'd want something like numel(unique(group)) <= numel(group) / folds) - my hypothesis would be that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other, with a difference between fold sizes of no more than 1 - but I haven't looked in the code to verify this.
It is possible that you mean to do:
crossvalind('KFold', 11, 5);
which would compute 5 folds for 11 data points - this doesn't attempt to do anything clever with labels, so you would be sure that there will be K folds.
However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross validation, which you could do with:
crossvalind('LeaveMOut', 11, 1);
although a better method would be:
for leave_out=1:11
fold_number = (1:11) ~= leave_out;
<code here; where fold_number is 0, this is the leave-one-out example. fold_number = 1 means that the example is in the main fold.>
end

Performace Issue on MATLAB's vector addressing

I'm wondering, what is faster for addressing a single Element of a vector:
1) direct access via
result = a(index)
or
2) access an element via a matrix multiplication e.g
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my oppinion (which comes from "classic" programming like C++) the first method must be more performant, because of the direct access...the second method would need a iteration through both vectors(?).
The reason why I'm asking is, that matlab is very performant on matrix and vector operations, maybe I am missing any aspect and the second method is more effective...
A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious(, and you should have done that yourself before asking this question).
Reason behind that:
No matter how efficient MATLAB's calculation is, MATLAB still needs to fetch the number 1 by 1, and do multiplication 1 by 1, and sum up. There is no hope to be faster than a single access.
In your special case, there are all 0's except 1, but it is useless to do optimization for single special case in my opinion, and the best optimization I can come up with still needs to access all the elements for at least once each.
EDIT:
It seems I am in quite good mood today....
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that boundary checking (and/or other errands) take more time than the access and calculation.
Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB could be smart enough to do a test first before the multiply, it must then still do many tests.
I'm asking this question to make a point, that the dot product cannot possibly be at all efficient. Even if MATLAB were smart enough to know that there was only one element that was non-zero, to know that, it would need to do a search for the non-zero element. And how would MATLAB be smart enough to know that what you have written as a vector*vector dot product is actually intended just to access a single element, instead of a true dot product for nefarious purposes unknown to it?
How about
3) access an element by a boolean index matrix:
a = [1 2 3 4]';
b = [0 0 1 0];
result = a(b)
It's almost certainly going to be faster than (2), slower than (1).