Meaning of ISUPPZ in LAPACK's DSYEVR - lapack

I want to calculate the eigensystem to some large real symmetric matrices and found the routine DSYEVR should suit my needs.
dsyevr (JOBZ, RANGE, UPLO, N, A, LDA, VL, VU, IL, IU, ABSTOL, M, W, Z, LDZ, ISUPPZ, WORK, LWORK, IWORK, LIWORK, INFO)
While including and testing this routine, I noticed the parameter isuppz which is an integer array of length 2*m, where m specifies the number of calculated eigenvalues (which might change depending on the input data).
According to the documentation, the array isuppz specifies the position of non-zero elements in the array of eigenvectors z. I.e. in particular, the ith eigenvector has non-zero elements between positions isuppz(2*i-1) and isuppz(2*i).
Now, I have the problem, that this doesn't really match my observations. E.g. while calculating the eigensystem to an example system of size N=400, the second eigenvector has, according to isuppz, only non-zero elements between positions 349 and 400. Looking at the eigenvector, I can see, that this is clearly not the case, and comparing with Mathematica, I can clearly see, that this shouldn't be the case, too.
Actually, the eigensystem from DSYEVR is in principle the same as the eigensystem calculated by Mathematica. As a result, I assume that there is nothing wrong in general.
So, what is the actual meaning of the parameter isuppz?

Related

what is the difference between defining a vector using linspace and defining a vector using steps?

i am trying to learn the basics of matlab ,
i wanted to write a mattlab script ,
in this script i defined a vector x with a "d" step that it's length is (2*pi/1000)
and i wanted to plot two sin function according to x :
the first sin is with a frequency of 1, and the second sin frequency 10.3 ..
this is what i did:
d=(2*pi/1000);
x=-pi:d:pi;
first=sin(x);
second=sin(10.3*x);
plot(x,first,x,second);
my question:
what is the different between :
x=linspace(-pi,pi,1000);
and ..
d=(2*pi/1000);
x=-pi:d:pi;
? i am asking because i got confused since i think they both are the same but i think there is something wrong with my assumption ..
also is there is a more sufficient way to write sin function with a giveng frequency ?
The main difference can be summarizes as predefined size vs predefined step. And your example highlights it very well, indeed (1000 elements vs 1001 elements).
The linspace function produces a fixed-length vector (the length being defined by the third input argument, which defaults to 100) whose lower and upper limits are set, respectively, by the first and the second input arguments. The correct step to use is internally computed by the function itself (step = (x2 - x1) / n).
The colon operator defines a vector of elements whose values range between the specified lower and upper limits. The step, which is an optional parameter that defaults to 1, is the discriminant of the vector length. This means that the length of the result is determined by the number of steps that must be accomplished in order to reach the upper limit, starting from the lower one. On an side note, on this MathWorks thread you can find a very interesting discussion concerning the behavior of the colon operator in respect of floating-point management.
Another difference, related to the first one, is that linspace always includes the upper limit value while the colon operator only contains it if the specified step allows it (0:5:14 = [0 5 10]).
As a general rule, I prefer to use the former when I want to produce a vector of a predefined length (pretty obvious, isn't it?), and the latter when I need to create a sequence whose length has only a marginal relevance (or no relevance at all)

Are the conditional probabilities of MATLAB's mnrval correct?

An MWE (stats toolbox required, tested on MATLAB R2014b):
x = (1:3)';
b = mnrfit(x,x,'model','hierarchical');
pihat = mnrval(b,x,'model','hierarchical','type','conditional')
Output:
pihat =
1 1
2.2204e-16 1
2.2204e-16 2.2204e-16
(Ignore the issued warning, it's because of the trivial example, which is linearly separable (I'm predicting x using itself). It doesn't matter: I've tried this as well with a non-trivial (and not-so minimal) example without the warning and the results are similar.)
My problem is the result. I've specified I want the conditional probabilities. According to MATLAB's documentation on mnrval:
Specify ['conditional'] to return predictions [...] in terms of the first k – 1 conditional category probabilities [...], i.e., the probability [...] for category j, given an outcome in category j or higher.
In my example this means rows of pihat contain the probability of
x=1 given x>=1
x=2 given x>=2
(A third column for x=3 is not necessary, because if the first two probabilities are known, the third is too. It follows logically from P(x=1) + P(x=2) + P(x=3) = 1.)
Am I interpreting this correctly? Thus, if x=1 is predicted, then the first column value should be large (close to one), because P(x=1) given x>=1 is large. The second column should be close to zero, because P(x=2) given x>=2 can't be large if x=1.
However, as you can see in the first row, the second column value is large as well as the first! I believe this is incorrect according to what the documentation specifies, am I right? The current (incorrect?) result implies the predicted probabilities in the rows are not of x=j given x>=j, but what are they then? Or how should I be interpreting them?
They are not equal to the cumulative probabilities, i.e. the probability of x<=j, which increases with j. I've checked this by calculating pihat2 = mnrval(b,x,'model','hierarchical','type','cumulative'); pihat2-pihat.

Finding the best threshold level without a for loop in MATLAB

Let A and B be two matrices of the same size. For a matrix M, let ht(M,t) threshold all the entries of M by t. That is All entries whose absolute value is less than t are set to 0. Suppose I want to find the optimal threshold t such that norm(ht(A,t)-B,'fro')^2 is minimized.
The only way that I can see to do this is deficient: do a for loop over the unique values of A and threshold A and setting C=ht(A,t)-B, compute sum(sum(C.*C)).
This is just too slow when A is large. I have considered sorting the elements of A and finding some efficient way to set a few entries to zero at a time, but I'm not sure this can all be done without a for loop.
Is there a way to do it?
Here's a very simple example (so simple a for loop works easily in this case):
B =
0.101508820368332 0
0 0.301996943246957
Set
A=B+.1*ones(2)
A =
0.201508820368332 0.1
0.1 0.401996943246957
Simple inspection shows that if we zero out the off-diagonal entries of A we minimize the difference between A and B. There are 3 possible threshold values, given by unique(A)=[.1,.2015,.402]. Given a potential threshold value t, we can hard threshold A by:
function [A_thresholded] = ht(A,t)
%
A_thresholded = A .* (abs(A)>t);
The form of the data in a matrix is irrelevant. You can convert them to vectors and simply compute the square-norm. In fact, you can sort the contents of A in increasing order (and permute B to preserve pairing). When you increase the threshold to include one more value in A, the norm only changes by that one increment. Therefore, you can find your solution in O(n log n). Hope this helps.

Partitioning a number into a number of almost equal partitions

I would like to partition a number into an almost equal number of values in each partition. The only criteria is that each partition must be in between 60 to 80.
For example, if I have a value = 300, this means that 75 * 4 = 300.
I would like to know a method to get this 4 and 75 in the above example. In some cases, all partitions don't need to be of equal value, but they should be in between 60 and 80. Any constraints can be used (addition, subtraction, etc..). However, the outputs must not be floating point.
Also it's not that the total must be exactly 300 as in this case, but they can be up to a maximum of +40 of the total, and so for the case of 300, the numbers can sum up to 340 if required.
Assuming only addition, you can formulate this problem into a linear programming problem. You would choose an objective function that would maximize the sum of all of the factors chosen to generate that number for you. Therefore, your objective function would be:
(source: codecogs.com)
.
In this case, n would be the number of factors you are using to try and decompose your number into. Each x_i is a particular factor in the overall sum of the value you want to decompose. I'm also going to assume that none of the factors can be floating point, and can only be integer. As such, you need to use a special case of linear programming called integer programming where the constraints and the actual solution to your problem are all in integers. In general, the integer programming problem is formulated thusly:
You are actually trying to minimize this objective function, such that you produce a parameter vector of x that are subject to all of these constraints. In our case, x would be a vector of numbers where each element forms part of the sum to the value you are trying to decompose (300 in your case).
You have inequalities, equalities and also boundaries of x that each parameter in your solution must respect. You also need to make sure that each parameter of x is an integer. As such, MATLAB has a function called intlinprog that will perform this for you. However, this function assumes that you are minimizing the objective function, and so if you want to maximize, simply minimize on the negative. f is a vector of weights to be applied to each value in your parameter vector, and with our objective function, you just need to set all of these to -1.
Therefore, to formulate your problem in an integer programming framework, you are actually doing:
(source: codecogs.com)
V would be the value you are trying to decompose (so 300 in your example).
The standard way to call intlinprog is in the following way:
x = intlinprog(f,intcon,A,b,Aeq,beq,lb,ub);
f is the vector that weights each parameter of the solution you want to solve, intcon denotes which of your parameters need to be integer. In this case, you want all of them to be integer so you would have to supply an increasing vector from 1 to n, where n is the number of factors you want to decompose the number V into (same as before). A and b are matrices and vectors that define your inequality constraints. Because you want equality, you'd set this to empty ([]). Aeq and beq are the same as A and b, but for equality. Because you only have one constraint here, you would simply create a matrix of 1 row, where each value is set to 1. beq would be a single value which denotes the number you are trying to factorize. lb and ub are the lower and upper bounds for each value in the parameter set that you are bounding with, so this would be 60 and 80 respectively, and you'd have to specify a vector to ensure that each value of the parameters are bounded between these two ranges.
Now, because you don't know how many factors will evenly decompose your value, you'll have to loop over a given set of factors (like between 1 to 10, or 1 to 20, etc.), place your results in a cell array, then you have to manually examine yourself whether or not an integer decomposition was successful.
num_factors = 20; %// Number of factors to try and decompose your value
V = 300;
results = cell(1, num_factors);
%// Try to solve the problem for a number of different factors
for n = 1 : num_factors
x = intlinprog(-ones(n,1),1:n,[],[],ones(1,n),V,60*ones(n,1),80*ones(n,1));
results{n} = x;
end
You can then go through results and see which value of n was successful in decomposing your number into that said number of factors.
One small problem here is that we also don't know how many factors we should check up to. That unfortunately I don't have an answer to, and so you'll have to play with this value until you get good results. This is also an unconstrained parameter, and I'll talk about this more later in this post.
However, intlinprog was only released in recent versions of MATLAB. If you want to do the same thing without it, you can use linprog, which is the floating point version of integer programming... actually, it's just the core linear programming framework itself. You would call linprog this way:
x = linprog(f,A,b,Aeq,beq,lb,ub);
All of the variables are the same, except that intcon is not used here... which makes sense as linprog may generate floating point numbers as part of its solution. Due to the fact that linprog can generate floating point solutions, what you can do is if you want to ensure that for a given value of n, you could loop over your results, take the floor of the result and subtract with the final result, and sum over the result. If you get a value of 0, this means that you had a completely integer result. Therefore, you'd have to do something like:
num_factors = 20; %// Number of factors to try and decompose your value
V = 300;
results = cell(1, num_factors);
%// Try to solve the problem for a number of different factors
for n = 1 : num_factors
x = linprog(-ones(n,1),[],[],ones(1,n),V,60*ones(n,1),80*ones(n,1));
results{n} = x;
end
%// Loop through and determine which decompositions were successful integer ones
out = cellfun(#(x) sum(abs(floor(x) - x)), results);
%// Determine which values of n were successful in the integer composition.
final_factors = find(~out);
final_factors will contain which number of factors you specified that was successful in an integer decomposition. Now, if final_factors is empty, this means that it wasn't successful in finding anything that would be able to decompose the value into integer factors. Noting your problem description, you said you can allow for tolerances, so perhaps scan through results and determine which overall sum best matches the value, then choose whatever number of factors that gave you that result as the final answer.
Now, noting from my comments, you'll see that this problem is very unconstrained. You don't know how many factors are required to get an integer decomposition of your value, which is why we had to semi-brute-force it. In fact, this is a more general case of the subset sum problem. This problem is NP-complete. Basically, what this means is that it is not known whether there is a polynomial-time algorithm that can be used to solve this kind of problem and that the only way to get a valid solution is to brute-force each possible solution and check if it works with the specified problem. Usually, brute-forcing solutions requires exponential time, which is very intractable for large problems. Another interesting fact is that modern cryptography algorithms use NP-Complete intractability as part of their ciphertext and encrypting. Basically, they're banking on the fact that the only way for you to determine the right key that was used to encrypt your plain text is to check all possible keys, which is an intractable problem... especially if you use 128-bit encryption! This means you would have to check 2^128 possibilities, and assuming a moderately fast computer, the worst-case time to find the right key will take more than the current age of the universe. Check out this cool Wikipedia post for more details in intractability with regards to key breaking in cryptography.
In fact, NP-complete problems are very popular and there have been many attempts to determine whether there is or there isn't a polynomial-time algorithm to solve such problems. An interesting property is that if you can find a polynomial-time algorithm that will solve one problem, you will have found an algorithm to solve them all.
The Clay Mathematics Institute has what are known as Millennium Problems where if you solve any problem listed on their website, you get a million dollars.
Also, that's for each problem, so one problem solved == 1 million dollars!
(source: quickmeme.com)
The NP problem is amongst one of the seven problems up for solving. If I recall correctly, only one problem has been solved so far, and these problems were first released to the public in the year 2000 (hence millennium...). So... it has been about 14 years and only one problem has been solved. Don't let that discourage you though! If you want to invest some time and try to solve one of the problems, please do!
Hopefully this will be enough to get you started. Good luck!

MATLAB: What's [Y,I]=max(AS,[],2);?

I just started matlab and need to finish this program really fast, so I don't have time to go through all the tutorials.
can someone familiar with it please explain what the following statement is doing.
[Y,I]=max(AS,[],2);
The [] between AS and 2 is what's mostly confusing me. And is the max value getting assigned to both Y and I ?
According to the reference manual,
C = max(A,[],dim) returns the largest elements along the dimension of A specified by scalar dim. For example, max(A,[],1) produces the maximum values along the first dimension (the rows) of A.
[C,I] = max(...) finds the indices of the maximum values of A, and returns them in output vector I. If there are several identical maximum values, the index of the first one found is returned.
I think [] is there just to distinguish itself from max(A,B).
C = max(A,[],dim) returns the largest elements along the dimension of A specified by scalar dim. For example, max(A,[],1) produces the maximum values along the first dimension (the rows) of A.
Also, the [C, I] = max(...) form gives you the maximum values in C, and their indices (i.e. locations) in I.
Why don't you try an example, like this? Type it into MATLAB and see what you get. It should make things much easier to see.
m = [[1;6;2] [5;8;0] [9;3;5]]
max(m,[],2)
AS is matrix.
This will return the largest elements of AS in its 2nd dimension (i.e. its columns)
This function is taking AS and producing the maximum value along the second dimension of AS. It returns the max value 'Y' and the index of it 'I'.
note the apparent wrinkle in the matlab convention; there are a number of builtin functions which have signature like:
xs = sum(x,dim)
which works 'along' the dimension dim. max and min are the oddbal exceptions:
xm = max(x,dim); %this is probably a silent semantical error!
xm = max(x,[],dim); %this is probably what you want
I sometimes wish matlab had a binary max and a collapsing max, instead of shoving them into the same function...