Matlab division, cannot get back the answer

H = [1 2; 3 4; 5 6; 7 8; 9 10; 11 12; 13 14; 15 16];
X = [7; 9];
Y = H*X;
H1 = Y/X;
This is my code. As you can see, I was trying to recover the values of H. However, it gave me something else. I tried to use inv(), but that is not possible because X is not a square matrix.

You can't get a value of rank 2 back by dividing a value of rank 1. The system is underconstrained.
Both mrdivide and pinv (the pseudo-inverse) can be used to get a solution to the system. Because there are multiple solutions, it won't necessarily be the one you started with. Instead you'll get a "simplest" solution, either in the sense of lowest cardinality or lowest 2-norm, depending on whether you use mrdivide or pinv.
The pinv documentation page probably explains this more precisely than I can; just note that it discusses X\Y rather than Y/X:
If A has more rows than columns and is not of full rank, then the overdetermined least squares problem
minimize norm(A*x-b)
does not have a unique solution. Two of the infinitely many solutions are
x = pinv(A)*b
and
y = A\b
These two are distinguished by the facts that norm(x) is smaller than the norm of any other solution and that y has the fewest possible nonzero components.
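To make this concrete, here is a small sketch (my own, not part of the quoted documentation) showing that both solutions reproduce Y exactly, yet neither equals the original H:
H = [1 2; 3 4; 5 6; 7 8; 9 10; 11 12; 13 14; 15 16];
X = [7; 9];
Y = H*X;
H1 = Y/X;         % mrdivide: basic solution (fewest nonzero entries)
H2 = Y*pinv(X);   % pseudo-inverse: minimum 2-norm solution
norm(H1*X - Y)    % ~0, so H1 solves H1*X = Y
norm(H2*X - Y)    % ~0, so H2 solves H2*X = Y
isequal(H1, H)    % false: the recovered matrix is not the original H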

Matlab Dimensions Swapped in Meshgrid

After something that cost me several hours, I have found an inconsistency in how Matlab deals with dimensions. If somebody can explain it to me, OR tell me how to report it to MathWorks, please enlighten me.
For size, ones, zeros, mean, std, and almost every other long-established command in Matlab, the dimension arrangement follows the classical, intended standard (as reported by size for every dimension): the first dimension runs along a column vector, the second dimension runs along a row vector, and the following dimensions are the non-graphical trailing indices.
>x(:,:,1)=[1 2 3 4;5 6 7 8];
>x(:,:,2)=[9 10 11 12;13 14 15 16];
>m=mean(x,1)
m(:,:,1) = 3 4 5 6
m(:,:,2) = 11 12 13 14
>m=mean(x,2)
m(:,:,1) =
2.5000
6.5000
m(:,:,2) =
10.5000
14.5000
>m=mean(x,3)
m = 5  6  7  8
    9 10 11 12
>d=size(x)
d = 2 4 2
However, for graphical commands like stream3, streamline, and others relying on the meshgrid output format, dimensions 1 and 2 are swapped: the first dimension corresponds to the row vector, the second dimension corresponds to the column vector, and the following (third) one is the non-graphical index.
> [x,y]=meshgrid(1:2,1:3)
x = 1 2
1 2
1 2
y = 1 1
2 2
3 3
Then, for stream3 to operate with classically arranged matrices, we have to apply permute(XXX,[2 1 3]) to every 3D argument:
xyz=stream3(permute(x,[2 1 3]),permute(y,[2 1 3]),permute(z,[2 1 3])...
,permute(u,[2 1 3]),permute(v,[2 1 3]),permute(w,[2 1 3])...
,xs,ys,zs);
If anybody can explain why this happens, and why this is not a bug, I would welcome it.
This behavior is not a bug because it is clearly documented as the intended behavior: https://www.mathworks.com/help/matlab/ref/meshgrid.html. Specifically:
[X,Y,Z]= meshgrid(x,y,z) returns 3-D grid coordinates defined by the vectors x, y, and z. The grid represented by X, Y, and Z has size length(y)-by-length(x)-by-length(z).
Without speaking to the original authors, the exact motivation may be a bit obscure, but I suspect it has to do with the fact that the y-axis is generally associated with the rows of an image, while x is associated with the columns.
Columns are either "j" or "x" in the documentation, rows are either "i" or "y".
Some functions deal with spatial coordinates. The documentation will refer to "x, y, z". These functions will thus take column values before row values as input arguments.
Some functions deal with array indices. The documentation will refer to "i, j" (or sometimes "i1, i2, i3, ..., in", or using specific names instead of "i" before the dimension number). These functions will thus take row values before column values as input arguments.
Yes, this can be confusing. But if you pay attention to the names of the variables in the documentation, you will quickly figure out the right order.
With meshgrid in particular, if the "x, y, ..." order of arguments is confusing, use ndgrid instead, which takes arguments in array indexing order.
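For example (a quick sketch, not part of the original answer), the grid from above built with ndgrid keeps the first input along the first dimension:
> [x,y]=ndgrid(1:2,1:3)   % size 2-by-3: length(first input)-by-length(second input)
x = 1 1 1
    2 2 2
y = 1 2 3
    1 2 3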

matlab k-means clustering evaluation [duplicate]

How can I effectively evaluate the performance of the standard Matlab k-means implementation?
For example I have a matrix X
X = [1 2;
3 4;
2 5;
83 76;
97 89]
For every point I have a gold-standard clustering. Let's assume that (83,76), (97,89) form the first cluster and (1,2), (3,4), (2,5) form the second cluster. Then we run in Matlab
idx = kmeans(X,2)
And get the following results
idx = [1; 1; 2; 2; 2]
According to the NOMINAL values it is a very bad clustering, because only (2,5) is labelled correctly; but we don't care about nominal values, we only care about which points are clustered together. Therefore we somehow have to identify that only (2,5) ended up in the incorrect cluster.
For me, a newbie in Matlab, evaluating the performance of a clustering is not a trivial task. I would appreciate it if you could share your ideas on how to evaluate the performance.
Evaluating the "best clustering" is somewhat ambiguous, especially if you have points in two different groups that eventually cross over with respect to their features. When you get this case, how exactly do you define which cluster those points get merged to? Here's an example from the Fisher Iris dataset that comes preloaded with MATLAB. Let's specifically take the petal length and petal width, which are the third and fourth columns of the data matrix, and plot the versicolor and virginica classes:
load fisheriris;
plot(meas(101:150,3), meas(101:150,4), 'b.', meas(51:100,3), meas(51:100,4), 'r.', 'MarkerSize', 24)
This is what we get (scatter plot omitted):
You can see that towards the middle, there is some overlap. You are lucky in that you knew what the clusters were before hand and so you can measure what the accuracy is, but if we were to get data such as the above and we didn't know what labels each point belonged to, how do you know which cluster the middle points belong to?
Instead, what you should do is try and minimize these classification errors by running kmeans more than once. Specifically, you can override the behaviour of kmeans by doing the following:
idx = kmeans(X, 2, 'Replicates', num);
The 'Replicates' flag tells kmeans to run for a total of num times. After running kmeans num times, the output memberships are those that the algorithm deemed best over all of those runs; specifically, the replicate with the lowest total sum of point-to-centroid distances is the one returned.
Not setting the Replicates flag defaults to running kmeans once. As such, try increasing the total number of times kmeans runs so that you have a higher probability of getting high-quality cluster memberships. Setting num = 10, this is what we get with your data:
X = [1 2;
3 4;
2 5;
83 76;
97 89];
num = 10;
idx = kmeans(X, 2, 'Replicates', num)
idx =
2
2
2
1
1
You'll see that the first three points belong to one cluster while the last two points belong to another. Even though the IDs are flipped, it doesn't matter as we want to be sure that there is a clear separation between the groups.
Minor note with regards to random algorithms
If you take a look at the comments above, you'll notice that several people tried running the kmeans algorithm on your data and received different clustering results. The reason is that when kmeans chooses the initial points for your cluster centres, these are chosen in a random fashion. As such, depending on the state of their random number generator, the initial points chosen by one person are not guaranteed to be the same as those chosen by another person.
Therefore, if you want reproducible results, you should set the seed of your random number generator to the same value before running kmeans. On that note, use rng with an integer that is known beforehand, like 123. If we do this before the code above, everyone who runs it will be able to reproduce the same results.
As such:
rng(123);
X = [1 2;
3 4;
2 5;
83 76;
97 89];
num = 10;
idx = kmeans(X, 2, 'Replicates', num)
idx =
1
1
1
2
2
Here the labels are reversed, but I guarantee that anyone else who runs the above code will get the same labelling as what was produced above, each time.
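To come back to the original evaluation question: a minimal sketch (my own, not part of the answer above) is to score the kmeans output against the gold-standard labels, taking the best accuracy over all relabelings of the cluster IDs. This is feasible for a small number of clusters:
gold = [2; 2; 2; 1; 1];     % hypothetical gold-standard labels for X above
k = max(gold);
P = perms(1:k);             % all k! possible relabelings of the cluster IDs
best = 0;
for p = 1:size(P,1)
    relabeled = P(p, idx)'; % map each cluster ID through one relabeling
    best = max(best, mean(relabeled == gold));
end
best                        % fraction of points clustered correctly
With idx = [1; 1; 1; 2; 2] as above, the relabeling 1->2, 2->1 matches the gold standard perfectly, so best is 1; with the idx from the question, best would be 4/5, flagging exactly one misplaced point, namely (2,5).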

Power to elements in a matrix

I have a question about being able to bring some number to the power of an element in a matrix.
I know that, if A is a matrix, then one can write A.^2 to square each number in that matrix. My question is: is there any way to do something like B = 2.^A, such that the resulting matrix B is the same size as A, and each element of B is equal to 2 raised to the power of the corresponding element of A?
Thanks for any help!
You already answered yourself! Use B = 2.^A.
For example:
>> A = [1 2; 3 4]
A =
     1     2
     3     4
>> B = 2.^A
B =
     2     4
     8    16
You could also use power(2,A), which is the same thing.
Matlab is a very interactive platform, so feel free to experiment and see for yourself if something works or not. In this case your intuition was correct.

Remove highly correlated components

I have a problem: removing highly correlated components. Can I ask how to do this?
For example, I have 40 instances with 20 features (randomly created). Features 2 and 18 are highly correlated with feature 4, and feature 6 is highly correlated with feature 10. How can I remove the highly correlated (redundant) features, i.e. 2, 18, and 10? Essentially, I need the indices of the remaining features: 1, 3, 4, 5, 6, ..., 9, 11, ..., 17, 19, 20.
Matlab code:
x = randn(40,20);
x(:,2) = 2.*x(:,4);
x(:,18) = 3.*x(:,4);
x(:,6) = 100.*x(:,10);
x_corr = corr(x);
size(x_corr)
figure, imagesc(x_corr),colorbar
The correlation matrix x_corr, displayed with imagesc, shows the correlated feature pairs as bright off-diagonal entries (figure omitted).
edit:
I worked out a way:
x_corr = x_corr - diag(diag(x_corr));     % zero out the diagonal
[x_corrX, x_corrY] = find(x_corr > 0.8);  % row/column indices of highly correlated pairs
for i = 1:size(x_corrX,1)
    xx = find(x_corrY == x_corrX(i));     % pairs pointing back at a feature already kept
    x_corrX(xx,:) = 0;
    x_corrY(xx,:) = 0;
end
x_corrX = unique(x_corrX);
x_corrX = x_corrX(2:end);                 % drop the 0 introduced by the zeroing above
im = setxor(x_corrX, (1:20)');            % indices of the remaining features
Am I right? Or if you have a better idea, please post it. Thanks.
edit2: Is this method the same as using PCA?
It seems quite clear that this idea of yours, to simply remove highly correlated variables from the analysis, is NOT the same as PCA. PCA is a good way to reduce the rank of what seems to be a complicated problem into one where only a few independent things turn out to be happening. PCA uses an eigenvalue (or SVD) decomposition to achieve that goal.
Anyway, you might have a problem. For example, suppose that A is highly correlated with B, and B is highly correlated with C. However, it need not be true that A and C are highly correlated. Since correlation can be viewed as a measure of the angle between those vectors in the corresponding high-dimensional vector space, this situation is easy to construct.
As a trivial example, I'll create two variables, A and B, that are correlated at a "moderate" level.
n = 50;
A = rand(n,1);
B = A + randn(n,1)/2;
corr([A,B])
ans =
1 0.55443
0.55443 1
So here 0.55 is the correlation. Now I'll create C to be virtually the average of A and B; it will be highly correlated with both, by your definition.
C = (A + B)/2 + randn(n,1)/100;
corr([A,B,C])
ans =
1 0.55443 0.80119
0.55443 1 0.94168
0.80119 0.94168 1
Clearly C is the bad guy here. But if one were to simply look at the pair [A,C] and remove A from the analysis, then do the same with the pair [B,C] and then remove B, we would have made the wrong choices. And this was a trivially constructed example.
In fact, the eigenvalues of the correlation matrix might be of interest here.
[V,D] = eig(corr([A,B,C]))
V =
-0.53056 -0.78854 -0.311
-0.57245 0.60391 -0.55462
-0.62515 0.11622 0.7718
D =
2.5422 0 0
0 0.45729 0
0 0 0.00046204
The fact that D has two significant diagonal elements and one tiny one tells us that this is really a two-variable problem. What PCA will not easily tell us, though, is which vector to remove, and the problem would only become less clear with more variables and many interactions between all of them.
I think the answer of woodchips is quite good. But when you're using eigenvalues, you can run into trouble: if the dataset is large enough, there will always be some small eigenvalues, and you won't be sure what they tell you.
Instead, consider grouping your data by a simple clustering method. It's easy to implement in Matlab.
http://www.mathworks.de/de/help/stats/cluster-analysis-1-1.html
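A minimal sketch of that idea (my own, not from the linked page; assumes the Statistics Toolbox is available): hierarchically cluster the features on a correlation-based distance and keep one feature per group. Which member of each group gets kept is arbitrary here.
D = 1 - abs(corr(x));              % distance: 0 for perfectly correlated features
D = (D + D')/2;                    % enforce exact symmetry for squareform
Z = linkage(squareform(D), 'complete');
grp = cluster(Z, 'cutoff', 0.2, 'criterion', 'distance');  % threshold is a free choice
[~, keep] = unique(grp, 'first');  % first feature index in each group
keep = sort(keep)                  % indices of the features to retain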
edit:
If you disregard the points that woodchips made, your solution is okay, as an algorithm.

Optimization with discrete parameters in Matlab

I have 12 sets of vectors (about 10-20 vectors each) and I want to pick one vector from each set so that a function f that takes the sum of these vectors as its argument is maximized. In addition, I have constraints on some components of that sum.
Example:
a_1 = [3 2 0 5], a_2 = [3 0 0 2], a_3 = [6 0 1 1], ... , a_20 = [2 12 4 3]
b_1 = [4 0 4 -2], b_2 = [0 0 1 0], b_3 = [2 0 0 4], ... , b_16 = [0 9 2 3]
...
l_1 = [4 0 2 0], l_2 = [0 1 -2 0], l_3 = [4 4 0 1], ... , l_19 = [3 0 9 0]
s = [s_1 s_2 s_3 s_4] = a_x + b_y + ... + l_z
Constraints:
s_1 > 40
s_2 < 100
s_4 > -20
Target: choose x, y, ... , z to maximize f(s):
f(s) -> max
Where f is a nonlinear function that takes the vector s and returns a scalar.
Brute-forcing takes too long because there are about 5.9 trillion combinations, and since I need the maximum (or even better, the top 10 combinations), I cannot use any of the greedy algorithms that came to my mind.
The vectors are quite sparse; about 70-90% of the entries are zeros, if that helps somehow...?
The Matlab Optimization Toolbox didn't help either, since it doesn't have much support for discrete optimization.
Basically this is a lock-picking problem, where the lock's pins have 20 distinct positions, and there are 12 pins. Also:
some of the pins' positions will be blocked, depending on the positions of all the other pins;
depending on the specifics of the lock, there may be multiple keys that fit.
...interesting!
Based on Rasman's approach and Phpdna's comment, and the assumption that you are using int8 as data type, under the given constraints there are
>> d = double(intmax('int8'));
>> (d-40) * (d+100) * (d+20) * 2*d
ans =
737388162
possible vectors s (give or take a few; I haven't thought about the +1's etc.). About 740 million evaluations of your relatively simple f(s) shouldn't take more than 2 seconds, and having found all s that maximize f(s), you are left with the problem of finding linear combinations in your vector set that add up to one of those solutions s.
Of course, this finding of combinations is no easy feat, and the whole method breaks down anyway if you are dealing with
int16: ans = 2.311325368800510e+018
int32: ans = 4.253529737045237e+037
int64: ans = 1.447401115466452e+076
So, I'll discuss a more direct and more general approach here.
Since we're talking integers and a fairly large search space, I'd suggest using a branch-and-bound algorithm. But unlike the bintprog algorithm, you'd have to use different branching strategies, and of course, these should be based on a non-linear objective function.
Unfortunately, there is nothing like this in the Optimization Toolbox (or on the File Exchange, as far as I could find). fmincon is a no-go, since it uses gradient and Hessian information (which will usually be all-zero for integer problems), and fminsearch is a no-go, since you'll need a really good initial estimate and its rate of convergence is (roughly) O(N), meaning that for this 12-dimensional problem you'll have to wait quite long before convergence, without any guarantee of having found the global solution.
An interval method could be a possibility, however, I personally have very little experience with this. There is no native interval-related stuff in MATLAB or any of its toolboxes, but there's the freely available INTLAB.
So, if you're not feeling like implementing your own non-linear binary integer programming algorithm, or are not in the mood for an adventure with INTLAB, there's really only one thing left: heuristic methods. In this link there is a similar situation, with an outline of the solution: use the genetic algorithm (ga) from the Global Optimization toolbox.
I would implement the problem roughly like so:
function [sol, fval, exitflag] = bintprog_nonlinear()

    %// insert your data here
    %// Any sparsity you may have here will only make this more
    %// *memory* efficient, not *computationally*
    data = [...
        ... %// this will be an array with size 4-by-20-by-12
        ... %// (or some permutation of that you find more intuitive)
        ];

    %// offsets into the 3D array to facilitate indexing a bit
    offsets = bsxfun(@plus, ...
        repmat(1:size(data,1), size(data,3), 1), ...
        (0:size(data,3)-1)' * size(data,1)*size(data,2));

    %// your objective function
    function val = obj(X)
        %// limit "X" to integers in [1 20]
        X = min(max(round(X), 1), size(data,2));
        %// "X" will be a collection of 12 integers between 1 and 20, which are
        %// indices into the data matrix
        %// form "s" from "X"
        s = sum(data(bsxfun(@plus, offsets, X(:)*size(data,1) - size(data,1))), 1);
        %// XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX
        %// Compute the NEGATIVE VALUE of your function here
        %// XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX
    end

    %// your "non-linear" constraint function
    function [C, Ceq] = nonlcon(X)
        %// limit "X" to integers in [1 20]
        X = min(max(round(X), 1), size(data,2));
        %// form "s" from "X"
        s = sum(data(bsxfun(@plus, offsets, X(:)*size(data,1) - size(data,1))), 1);
        %// we have no equality constraints
        Ceq = [];
        %// Compute inequality constraints
        %// NOTE: solver is trying to solve C <= 0, so:
        C = [...
            40 - s(1)
            s(2) - 100
            -20 - s(4)];
    end

    %// useful GA options
    options = gaoptimset(...
        'UseParallel', 'always'...
        );

    %// The rest really depends on the specifics of the problem.
    %// Useful to look at will be at least 'TolCon', 'Vectorized', and of course,
    %// 'PopulationType', 'Generations', etc.

    %// THE OPTIMIZATION
    [sol, fval, exitflag] = ga(...
        @obj, size(data,3), ... %// objective function, taking a vector of 12 values
        [],[], [],[], ...       %// no linear (in)equality constraints
        1, size(data,2), ...    %// lower and upper limits
        @nonlcon, options);     %// your "nonlinear" constraints

end
Note that even though your constraints are essentially linear, the way by which you must compute the value for your s necessitates the use of a custom constraint function (nonlcon).
Especially note that this is currently (probably) a sub-optimal way to use ga -- I don't know the specifics of your objective function, so a lot more may be possible. For instance, I currently use a simple round() to convert the input X to integers, but using 'PopulationType', 'custom' (with a custom 'CreationFcn', 'MutationFcn' etc.) might produce better results. Also, 'Vectorized' will likely speed things up a lot, but I don't know whether your function is easily vectorized.
And yes, I use nested functions (I just love those things!); it prevents these huge, usually identical lists of input arguments if you use sub-functions or stand-alone functions, and they can really be a performance boost because there is little copying of data. But, I realize that their scoping rules make them somewhat akin to goto constructs, and so they are -ahum- "not everyone's cup of tea"...you might want to convert them to sub-functions to prevent long and useless discussions with your co-workers :)
Anyway, this should be a good place to start. Let me know if this is useful at all.
Unless you define some intelligence in how the vector sets are organized, there will be no intelligent way of solving your problem other than pure brute force.
Say you find s such that f(s) is maximal given the constraints on s; you still need to figure out how to build s out of twelve 4-element vectors (an overdetermined system if there ever was one), where each vector has 20 possible values. Sparsity may help, although I'm not sure how a vector with four elements can be 70-90% zero, and sparsity would only be useful if there were some yet-to-be-described methodology in how the vectors are organized.
So I'm not saying you can't solve the problem; I'm saying you need to rethink how the problem is set up.
I know, this answer is reaching you really late.
Unfortunately, the problem, as stated, shows few patterns to exploit beyond brute force (branch and bound, master/slave, etc.). Trying a master/slave approach, i.e. first solving the continuous nonlinear problem as the master and then the discrete selection as the slave, could help; but with this many combinations, and without any further information about the vectors, there is not much room to work with.
But given functions that are continuous almost everywhere, built from combinations of sums, products, and their inverses, the sparsity is a clear point to exploit here. If 70-90% of the vector entries are zero, a good part of the solution space will be close to zero or close to infinite. Hence an 80-20 pseudo-solution could easily discard the 'zero' combinations and use only the 'infinite' ones.
In this way, the brute force could be guided.