Comparing columns of matrix and giving boolean output - matlab

I have checked other questions but didn't find my answer. I have a matrix of size n-by-2. I want to compare the 1st and 2nd columns and, based on which is greater, assign 0/1 to the respective index. Suppose I have
a = 1 2
4 3
7 8
I want the output like this
out = 0 1
1 0
0 1
I did this :
o1 = a(:,1) > a(:,2)
o2 = not(o1)
out = [o1, o2]
This does the job, but I am sure there's a better way to do this. I need suggestions on that.
Forgot to mention, the datatype is float in the matrix.

A more generic solution that can handle matrices with more than two columns:
out = bsxfun(@eq, a, max(a,[],2));
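As a quick check, here is that row-maximum comparison run on the matrix from the question. This is only a sketch: the variable name rowMax is introduced here for illustration, and the commented-out short form assumes R2016b or later, where implicit expansion is available.

```matlab
a = [1 2; 4 3; 7 8];             % example matrix from the question
rowMax = max(a, [], 2);          % column vector holding the maximum of each row
out = bsxfun(@eq, a, rowMax);    % logical matrix, true where the row maximum sits
% On R2016b and later, implicit expansion allows the shorter form:
% out = (a == rowMax);
disp(out)
```

Note that when a row contains a tie, every tied column is marked true, which differs from the o1/o2 approach in the question (there a tie yields 0 1).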

What you did is good. The number of lines doesn't really matter; what matters is the complexity of the operation in each line. Following the comments, I think you could also gain some time by avoiding copies and multiple allocations:
out = false(size(a));
out(:,1) = a(:,1) > a(:,2);
out(:,2) = ~out(:,1);
It is good practice to preallocate in Matlab, and in general to avoid copies in any programming language.
Optimizing the runtime further by using different operations is pointless IMO. If you really need speed you could write a MEX function to spare one iteration through the rows (the second assignment). It's literally a dozen lines of C, although you'd have to be careful about how you write the loop (the naive way would cause a cache miss at each iteration).
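If you want to see whether preallocation matters on your machine, a rough comparison could look like the sketch below. The matrix size is an arbitrary assumption, and a single tic/toc run is noisy, so treat the numbers as indicative only (timeit gives steadier measurements).

```matlab
a = rand(1e6, 2);                 % hypothetical test size

tic
o1 = a(:,1) > a(:,2);
outConcat = [o1, ~o1];            % concatenation allocates a new array and copies both columns
tConcat = toc;

tic
outPre = false(size(a));          % single allocation up front
outPre(:,1) = a(:,1) > a(:,2);
outPre(:,2) = ~outPre(:,1);
tPre = toc;

fprintf('concatenate: %.4f s, preallocate: %.4f s\n', tConcat, tPre);
```

Both variants produce the same logical matrix; only the allocation pattern differs.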

Related

Compare different sized vectors with different values

I'm still fairly new to working with Matlab and programming. I have a dataset with n trials, of these (in this case) m are relevant. So I have an m-by-1 vector with the indices of the relevant trials (rel). I have another vector (Correct which is n-by-1) that consists of 0 and 1. n is always bigger than m. I need to know which trials (of the m-by-1 relevant trials) have a 1 in the n-by-1 vector. I have tried for-loops but I always get an error 'Index exceeds matrix dimensions.'
Here is my code:
for i=1:length(rel);
CC=rel(find(Correct==1));
end;
I think it should be fairly simple but I don't know yet how to explain to Matlab what I want...
Thank you all for your answers. I realized that my question was not as clear as I thought (also a learning process I guess..) so your suggestions weren't exactly what I need. I'm sorry for being unclear.
Correct is not a logical. It does contain 0 and 1, but these refer to a correct or incorrect answer (I'm actually not sure if this matters, but I thought I'd let you know).
rel is a subset of the original data with all trials (all trials=n trials), Correct is the same length as the original vector with all trials (n trials). So rel contains the indices of the (for me) relevant trials of the original data and is that way connected to Correct.
I hope this makes my question a bit more clear, if not, let me know!
Thank you!
It's not quite clear from your question what you are trying to do but I think I have an idea.
You have a vector n similar to
>> n = round(rand(1, 10))
n =
0 1 1 0 0 0 1 0 0 1
and m is indices of this vector similar to
>> m = [1 3 7 9];
Now we use m to index n as n(m), which returns the values of n corresponding to the elements in m. Next we check these for equality with 1 as n(m) == 1, and finally we figure out which values of m have n equal to 1, again by indexing. Putting this all together we get
>> m(n(m) == 1)
ans =
3 7
To find the indices of m that are being returned you can use
>> find(n(m) == 1)
ans =
2 4
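Translated to the variable names in the question, the same indexing might look like the sketch below. The sample values for Correct and rel are made up for illustration, and it assumes every entry of rel is a valid index into Correct.

```matlab
Correct = [0 1 1 0 0 0 1 0 0 1]';   % hypothetical n-by-1 vector of 0/1 answers
rel     = [1 3 7 9]';               % hypothetical m-by-1 vector of relevant trial indices

CC  = rel(Correct(rel) == 1);       % relevant trials that were answered correctly
pos = find(Correct(rel) == 1);      % positions of those trials within rel
disp(CC')
disp(pos')
```

Because the comparison == 1 produces a logical mask, this works whether Correct is stored as double or as logical.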
I'm assuming that Correct is of type logical (i.e. it contains trues and falses instead of 0s and 1s).
You actually don't need a loop here (this is clear in your case, since you are looping over i and never actually use i in your loop):
m = numel(rel)
CC = rel(Correct(1:m))
The reason you are getting that error is because Correct has more elements than rel so you are attempting to address elements beyond the end of rel. I solve this above by only considering the first m elements of Correct.

Remove highly correlated components

I have a problem with removing highly correlated components. Can I ask how to do this?
For example, I have 40 instances with 20 features (randomly created). Features 2 and 18 are highly correlated with feature 4, and feature 6 is highly correlated with feature 10. How do I remove the highly correlated (redundant) features such as 2, 18 and 10? Essentially, I need the indices of the remaining features 1, 3, 4, 5, 6, ..., 9, 11, ..., 17, 19, 20.
Matlab codes:
x = randn(40,20);
x(:,2) = 2.*x(:,4);
x(:,18) = 3.*x(:,4);
x(:,6) = 100.*x(:,10);
x_corr = corr(x);
size(x_corr)
figure, imagesc(x_corr),colorbar
The correlation matrix x_corr looks like this: [image not included]
edit:
I worked out a way:
x_corr = x_corr - diag(diag(x_corr));
[x_corrX, x_corrY] = find(x_corr>0.8);
for i = 1:size(x_corrX,1)
xx = find(x_corrY == x_corrX(i));
x_corrX(xx,:) = 0;
x_corrY(xx,:) = 0;
end
x_corrX = unique(x_corrX);
x_corrX = x_corrX(2:end);
im = setxor(x_corrX, (1:20)');
Am I right? If you have a better idea, please post it. Thanks.
edit2: Is this method the same as using PCA?
It seems quite clear that this idea of yours, to simply remove highly correlated variables from the analysis is NOT the same as PCA. PCA is a good way to do rank reduction of what seems to be a complicated problem, into one that turns out to have only a few independent things happening. PCA uses an eigenvalue (or svd) decomposition to achieve that goal.
Anyway, you might have a problem. For example, suppose that A is highly correlated to B, and B is highly correlated to C. However, it need not be true that A and C are highly correlated. Since correlation can be viewed as a measure of the angle between those vectors in their corresponding high dimensional vector space, this can be easily made to happen.
As a trivial example, I'll create two variables, A and B, that are correlated at a "moderate" level.
n = 50;
A = rand(n,1);
B = A + randn(n,1)/2;
corr([A,B])
ans =
1 0.55443
0.55443 1
So here 0.55 is the correlation. I'll create C to be virtually the average of A and B. It will be highly correlated by your definition.
C = [A + B]/2 + randn(n,1)/100;
corr([A,B,C])
ans =
1 0.55443 0.80119
0.55443 1 0.94168
0.80119 0.94168 1
Clearly C is the bad guy here. But if one were to simply look at the pair [A,C] and remove A from the analysis, then do the same with the pair [B,C] and then remove B, we would have made the wrong choices. And this was a trivially constructed example.
In fact, it is true that the eigenvalues of the correlation matrix might be of interest.
[V,D] = eig(corr([A,B,C]))
V =
-0.53056 -0.78854 -0.311
-0.57245 0.60391 -0.55462
-0.62515 0.11622 0.7718
D =
2.5422 0 0
0 0.45729 0
0 0 0.00046204
The fact that D has two significant diagonal elements, and a tiny one tells us that really, this is a two variable problem. What PCA will not easily tell us is which vector to simply remove though, and the problem would only be less clear with more variables, with many interactions between all of them.
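One way to read those eigenvalues is as shares of the total variance: for a 3-by-3 correlation matrix they sum to 3, so the values shown above correspond to roughly 85%, 15% and almost nothing. A minimal sketch of that calculation follows; it uses corrcoef from base MATLAB instead of corr, and the names ev and explained are introduced here for illustration. The data is random, so the exact numbers will differ from the run above.

```matlab
n = 50;
A = rand(n,1);
B = A + randn(n,1)/2;
C = (A + B)/2 + randn(n,1)/100;

ev = sort(eig(corrcoef([A,B,C])), 'descend');  % eigenvalues, largest first
explained = ev / sum(ev);                      % fraction of total variance per component
disp(explained')
```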
I think the answer of woodchips is quite good. But when you're using eigenvalues, you can run into some trouble. If the dataset is large enough, there will always be some small eigenvalues, but you won't be sure what they tell you.
Instead, consider grouping your data by a simple clustering method. It's easy to implement in Matlab.
http://www.mathworks.de/de/help/stats/cluster-analysis-1-1.html
edit:
If you disregard the points that woodchips made, your solution is okay as an algorithm.
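For reference, the pairwise-threshold idea from the question can also be written as a short greedy pass. This is a sketch under several assumptions, not the asker's exact method: it uses corrcoef (base MATLAB) rather than corr, takes the 0.8 cutoff from the question, treats strong negative correlations as redundant too (via abs), and keeps whichever member of a correlated group appears first. The caveat raised by woodchips still applies.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
x = randn(40,20);
x(:,2)  = 2.*x(:,4);
x(:,18) = 3.*x(:,4);
x(:,6)  = 100.*x(:,10);

R = abs(corrcoef(x));                     % absolute pairwise correlations
threshold = 0.8;                          % cutoff taken from the question
keep = [];                                % indices of features retained so far
for j = 1:size(x,2)
    % keep feature j only if it is not highly correlated with any kept feature
    if all(R(j,keep) < threshold)
        keep(end+1) = j; %#ok<AGROW>
    end
end
disp(keep)
```

Because the pass goes left to right, feature 2 survives and feature 4 is dropped here; reorder the columns (or iterate in a different order) if you would rather keep 4.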

Mathematica Table function to Matlab

I need to convert this to Matlab code, and am struggling without the "table" function.
Table[{i,1000,ability,savingsrate,0,RandomInteger[{15,30}],1,0},{i,nrhhs}];
So basically, these values are all just numbers, and I think I need to use a function handle, or maybe a for loop. I'm no expert, so I really need some help?
I'm not an expert in Mathematica (I just used it a long time ago). According to the documentation for the Table function, you are using this form:
Table[expr, {i, imax}]
generates a list of the values of expr when i runs from 1 to imax.
It looks like your statement will produce a list of lists: the list in the first argument is repeated with i increasing from 1 to nrhhs, and with a different random number each time.
In MATLAB the output can be equivalent to a matrix or a cell array.
To create a matrix with rows as your lists you can do:
result = [ (1:nrhhs)', repmat([1000,ability,savingsrate,0],nrhhs,1), ...
randi([15 30],nrhhs,1), repmat([1,0],nrhhs,1) ];
You can convert the above matrix to a cell array:
resultcell = mat2cell(result, ones(nrhhs,1));
The "Table" example you gave creates a list of nrhhs sub-lists, each of which contains 8 numbers (i, 1000, ability, savingsrate, 0, a random integer between 15 and 30 inclusive, 1, and 0). This is essentially (though not exactly) the same as an nrhhs x 8 matrix.
Assuming you do just want a matrix out, though, an analogous for loop in Matlab would be:
result = zeros(nrhhs,8); % preallocate memory for the result
for i = 1:nrhhs
result(i,:) = [i 1000 ability savingsrate 0 randi([15 30]) 1 0];
end
This method is likely slower than yuk's answer (which makes much more efficient use of vectors to avoid the for loop), but might be a little easier to pick apart depending on how familiar you are with Matlab.
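To try either approach, the variables need concrete values. The sketch below uses made-up values for nrhhs, ability and savingsrate, builds the matrix in the vectorized way, and then splits it into a cell array with one row per cell, which is closer to Mathematica's list of sub-lists.

```matlab
nrhhs = 5; ability = 0.7; savingsrate = 0.1;   % hypothetical values for illustration

% Vectorized construction: one row per household, 8 columns
result = [(1:nrhhs)', repmat([1000, ability, savingsrate, 0], nrhhs, 1), ...
          randi([15 30], nrhhs, 1), repmat([1, 0], nrhhs, 1)];

% nrhhs-by-1 cell array, each cell holding one 1-by-8 row
resultcell = mat2cell(result, ones(nrhhs,1));
disp(resultcell{1})
```

num2cell(result, 2) gives the same cell array and may read more naturally.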

Performance Issue on MATLAB's vector addressing

I'm wondering, what is faster for addressing a single Element of a vector:
1) direct access via
result = a(index)
or
2) access an element via a matrix multiplication e.g
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my opinion (which comes from "classic" programming like C++) the first method must be faster because of the direct access; the second method would need an iteration through both vectors(?).
The reason I'm asking is that Matlab is very fast at matrix and vector operations, so maybe I am missing some aspect and the second method is more efficient...
A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious (and you should have done that yourself before asking this question).
Reason behind that:
No matter how efficient MATLAB's calculation is, MATLAB still needs to fetch the numbers one by one, do the multiplications one by one, and sum up. There is no hope of being faster than a single access.
In your special case all entries are 0 except one, but in my opinion it is useless to optimize for a single special case, and the best optimization I can come up with still needs to access every element at least once.
EDIT:
It seems I am in quite good mood today....
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that boundary checking (and/or other errands) take more time than the access and calculation.
Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB could be smart enough to do a test first before the multiply, it must then still do many tests.
I'm asking this question to make a point, that the dot product cannot possibly be at all efficient. Even if MATLAB were smart enough to know that there was only one element that was non-zero, to know that, it would need to do a search for the non-zero element. And how would MATLAB be smart enough to know that what you have written as a vector*vector dot product is actually intended just to access a single element, instead of a true dot product for nefarious purposes unknown to it?
How about
3) access an element by a logical index vector:
a = [1 2 3 4]';
b = logical([0 0 1 0]); % must be logical; a double vector containing 0 is not a valid index
result = a(b)
It's almost certainly going to be faster than (2), slower than (1).
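To compare the three variants on your own machine, something like the following sketch could be used. It relies on timeit (available since R2013b); the variable names mask, tDirect, tDot and tLogical are introduced here for illustration. For operations this cheap, timeit may warn that the measurement is imprecise, so read the results as rough orders of magnitude.

```matlab
a = [1 2 3 4]';
b = [0 0 1 0];
mask = logical(b);                    % logical version of b for variant 3

tDirect  = timeit(@() a(3));          % 1) direct indexing
tDot     = timeit(@() b*a);           % 2) dot product with an indicator vector
tLogical = timeit(@() a(mask));       % 3) logical indexing

fprintf('direct: %.3g s, dot product: %.3g s, logical: %.3g s\n', ...
        tDirect, tDot, tLogical);
```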

List all permutations of the numbers 1,...,n in lexicographic order

I'm trying to write a Matlab program to list all permutations of the numbers 1 through n in lexicographic order. What I have so far is below. I am using recursion to try to write a program that works for n=3 first, and then see if I can gain insight into writing the program for any n. So far I have 2 of the 6 rows for n=3: P=[1 2 3;1 3 2]. I need the next two rows to simply swap the ones and the twos. I don't know how to begin to do that.
function [P] = shoes(n)
if n == 1
P = 1;
elseif n == 2
P = [1 2; 2 1];
else
T = shoes(n-1) + 1;
G = ones(factorial(n-1),1);
P(1:2,1:3) = [G T];
end
See the documentation for a start. If by lexicographic order you mean alphabetical by English name, you may want to populate your input with the names, sort them, then permute that.
If I've misunderstood what you're wanting, comment or edit the question & I'll check back later.
Hints:
The permutations of an empty list are easy to find.
Induction is an important concept in mathematics. You should be familiar with it.
The environment you are working in supports recursion
The permutations of a longer list can be produced in the order you want by recursion; first figure out what you want the first element to be, and then figure out the rest.
If you get stuck again, edit the question posting what you've gotten so far and where/why you think you're stuck.
Hints after seeing your code.
Your core function permutes a vector, and so should take a vector as argument, not an integer
Don't start solving the n=3 case; try the n=0 case (it's []) and then go straight to the n=20 case.
Think about the n=20 case before you write any code. What is the first column going to look like? Are there any examples of the n=19 case hidden in the answer to the n=20 case? (The answer is yes, and they are all different).
Reread the first set of hints
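For readers who want to see where the hints lead, one possible recursive sketch is shown below. The function name lexperms is hypothetical, and the code assumes the input is a row vector of distinct values that is already sorted in ascending order. It picks each element in turn as the first entry and fills the remaining columns with the permutations of the rest.

```matlab
function P = lexperms(v)
% LEXPERMS  All permutations of the sorted row vector v, one per row,
% in lexicographic order. (Hypothetical helper, saved as lexperms.m.)
    n = numel(v);
    if n <= 1
        P = v(:)';                        % zero or one element: a single permutation
        return
    end
    blk = factorial(n-1);                 % number of rows sharing the same first element
    P = zeros(factorial(n), n);
    for k = 1:n
        rest = v([1:k-1, k+1:n]);         % everything except the chosen first element
        rows = (k-1)*blk + (1:blk);       % block of rows that start with v(k)
        P(rows, 1)   = v(k);
        P(rows, 2:n) = lexperms(rest);    % recurse on the shorter list
    end
end
```

Calling lexperms(1:3) returns the six rows starting with 1 2 3 and ending with 3 2 1. The built-in perms lists the same rows in a different order, so sortrows(perms(1:n)) is a convenient cross-check.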
You appear to have asked this question twice. Instead of reposting questions, you should simply click the "edit" link below your question and update it. I'll repost here the answer I gave to your other question, but you should really remove one of them.
If you have the following matrix:
A = [1 2 3; 1 3 2];
and you want all the ones to become twos and the twos to become ones, the following would be the simplest way to do it:
B = A;
B(A == 1) = 2;
B(A == 2) = 1;