I have a theoretical question. I understand the concept of kernel scale with the Gaussian kernel, but when I run 'OptimizeHyperparameters' in fitcsvm in MATLAB, it returns values other than one, and I would like to understand what that means.
What does a high value of the kernel scale mean for a linear-kernel SVM? And for a polynomial kernel?
Please pay attention to these paragraphs from the MATLAB help:
You cannot use any cross-validation name-value pair argument along with the 'OptimizeHyperparameters' name-value pair argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value pair argument.
OptimizeHyperparameters values override any values you set using other name-value pair arguments. For example, setting OptimizeHyperparameters to 'auto' causes the 'auto' values to apply.
MATLAB divides all elements of the predictor matrix X by the value of KernelScale. Then, the software applies the appropriate kernel norm to compute the Gram matrix. Therefore a high value for the kernel scale means that all elements of the predictor matrix are divided by a large value.
fitcsvm searches among positive values of KernelScale, by default log-scaled in the range [1e-3,1e3].
If you specify KernelScale together with your own kernel function, for example 'KernelFunction','kernel', then the software throws an error. You must apply scaling within the kernel function itself.
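To make the scaling concrete, here is a toy NumPy sketch (my own illustration, not MATLAB's implementation) of how dividing the predictors by KernelScale changes the Gaussian Gram matrix:

import numpy as np

def rbf_gram(X, kernel_scale=1.0):
    # Mimics (as I understand it) fitcsvm's behaviour: divide every element
    # of the predictor matrix by KernelScale, then apply the Gaussian kernel.
    Xs = X / kernel_scale
    sq = np.sum(Xs ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Xs @ Xs.T  # pairwise squared distances
    return np.exp(-d2)

X = np.random.randn(5, 3)
# A large KernelScale shrinks the scaled distances, so the kernel values move
# toward 1 (a wider, smoother kernel); a small scale sharpens it instead.
print(rbf_gram(X, 0.1).round(3))
print(rbf_gram(X, 10.0).round(3))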
MATLAB's default 'Extreme Value' distribution (also called a Gumbel distribution) models the extreme MIN case.
Given the mean m and standard deviation s of Gumbel-distributed random variables for the extreme MAX case, I can get the location and scale parameters using the following (standard) equations from this website: scale = s*sqrt(6)/pi and location = m - 0.5772*scale, where 0.5772 is the Euler-Mascheroni constant.
My question is how to transform MATLAB's 'Extreme Value' distribution from the MIN case to the MAX case (MATLAB says "using the negative of the original values").
I would like to use MATLAB's icdf function, so do I need to negate the location and scale parameters of the inputs?
Judging by the last paragraph of your question, you want the inverse CDF of the maximum Gumbel distribution. MATLAB offers the inverse CDF of the Gumbel min distribution as follows:
X = evinv(P,mu,sigma);
You can get the inverse CDF of the Gumbel max by:
X = -evinv(1-P, -mu, sigma);
Note that for computing the PDF or CDF different expressions hold (that can be similarly worked out based on the definition of the two distributions).
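A quick numerical check of this identity, using SciPy's gumbel_l in place of evinv (the parameter values are made up):

import numpy as np
from scipy.stats import gumbel_l, gumbel_r

mu, sigma = 2.0, 1.5                 # example location/scale (made up)
p = np.array([0.05, 0.5, 0.95])
# gumbel_l.ppf plays the role of MATLAB's evinv (Gumbel MIN);
# gumbel_r is the Gumbel MAX distribution we actually want.
direct  = gumbel_r.ppf(p, loc=mu, scale=sigma)
via_min = -gumbel_l.ppf(1 - p, loc=-mu, scale=sigma)
print(np.allclose(direct, via_min))  # True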
I have been working on the same problem and this is what I've concluded:
To create the probability distribution of the extreme value type I (Gumbel) distribution for the maximum case in MATLAB using mu and sigma (the location and scale parameters), you can use the makedist function with the generalized extreme value distribution and set the k parameter equal to zero. This creates a mirror image of ev, the extreme value (minimum) distribution MATLAB uses for Gumbel. The mirror of the minimum case of Gumbel is the maximum case of Gumbel.
pd = makedist('GeneralizedExtremeValue','k',0,'sigma',sigma,'mu',mu);
So using the above command, all you have to do is replace sigma and mu with the values you've got.
I am a student and this is my understanding of this problem.
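A quick check of this equivalence with SciPy, whose genextreme with shape 0 is exactly the Gumbel maximum (note that SciPy's shape parameter c is the negative of MATLAB's k, but the two conventions agree at zero):

import numpy as np
from scipy.stats import genextreme, gumbel_r

mu, sigma = 2.0, 1.5                  # made-up parameters
x = np.linspace(-5.0, 15.0, 9)
# With the shape parameter equal to zero, the GEV reduces to the Gumbel MAX
# distribution, which is what makedist(...,'k',0,...) exploits.
print(np.allclose(genextreme.cdf(x, c=0, loc=mu, scale=sigma),
                  gumbel_r.cdf(x, loc=mu, scale=sigma)))  # True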
Given a base bitmap s and a set of possible successor bitmaps s1, ..., sN, how can I train a TensorFlow graph to compute a probability distribution over these successors?
Each bitmap sK could be processed as a single input by the same network to give a real value representing its likelihood, which could then be mapped through a softmax layer to give a probability distribution.
However, doing this by hand wouldn't allow me to use backpropagation and the implemented optimizers, and it's not even guaranteed that it would accept variable-length inputs and outputs.
Is this even possible? Input and output tensors seem to necessarily have a fixed size, apart from the batch dimension.
If I understand correctly, you have a base bitmap s and, conditioned on it, you want to model the distribution of the sequence s1, s2, ..., sN.
This can be achieved with a sequence-labeling RNN model. To condition on s, you can concatenate the feature vector of s to every input in the sequence: at time-step t, the input vector to the RNN is the concatenation of the feature vectors for s_t and s, and after the softmax the expected output is s_{t+1} (so a loss comparing the two, e.g. cross-entropy, can be used).
Variable-length sequences can very well be used. TensorFlow allows any dimension of a placeholder to be None, i.e. of variable size when running the graph. So along with a variable batch size, you can have variable-length sequences.
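For what it's worth, the idea from the question (score every candidate with the same network, then softmax over the scores) is also directly expressible. Here is a minimal sketch using the tf.keras API (which post-dates the question's graph-mode framing); the feature size and the scorer's architecture are made up:

import tensorflow as tf

feat_dim = 64                                   # assumed flattened bitmap size
scorer = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                   # one real-valued score per candidate
])

def successor_distribution(base, candidates):
    # base: [feat_dim]; candidates: [n, feat_dim] where n may vary per call.
    n = tf.shape(candidates)[0]
    pair = tf.concat([tf.repeat(base[None, :], n, axis=0), candidates], axis=1)
    scores = tf.squeeze(scorer(pair), axis=-1)  # shared weights, so backprop works
    return tf.nn.softmax(scores)                # distribution over the successors

probs = successor_distribution(tf.random.normal([feat_dim]),
                               tf.random.normal([7, feat_dim]))
print(float(tf.reduce_sum(probs)))              # ~1.0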
I'm trying to convert MATLAB code to NumPy and found that NumPy gives a different result for the std function.
in MATLAB:
std([1,3,4,6])
ans = 2.0817
in NumPy:
np.std([1,3,4,6])
1.8027756377319946
Is this normal? And how should I handle this?
The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:
>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326
To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.
But if we select a random sample of N elements from a larger distribution and calculate the variance, dividing by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.
Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
The default behaviour of MATLAB's std is to correct the bias for sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
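To see the two divisors side by side on the numbers from the question:

import numpy as np

x = np.array([1, 3, 4, 6], dtype=float)
n, mean = x.size, x.mean()
ss = ((x - mean) ** 2).sum()      # sum of squared deviations
print(np.sqrt(ss / n))            # 1.8027... == np.std(x)         (divide by N)
print(np.sqrt(ss / (n - 1)))      # 2.0816... == np.std(x, ddof=1) (divide by N-1)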
The nice answer by @hbaderts gives further mathematical details.
The standard deviation is the square root of the variance. The variance of a random variable $X$ is defined as
$$\operatorname{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big].$$
An estimator for the variance would therefore be
$$\hat{\sigma}_N^2 = \frac{1}{N}\sum_{i=1}^{N}\,(x_i - \bar{x})^2,$$
where $\bar{x}$ denotes the sample mean. For randomly selected $x_i$, it can be shown that this estimator does not converge to the real variance $\sigma^2$, but to
$$\frac{N-1}{N}\,\sigma^2.$$
If you randomly select samples and estimate the sample mean and variance, you will have to use a corrected (unbiased) estimator
$$\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\,(x_i - \bar{x})^2,$$
which will converge to $\sigma^2$. The correction term $N-1$ is also called Bessel's correction.
Now by default, MATLAB's std calculates the unbiased estimator with the correction term n-1. NumPy, however (as @ajcr explained), calculates the biased estimator with no correction term by default. The parameter ddof allows you to set any correction term n-ddof; by setting it to 1 you get the same result as in MATLAB.
Similarly, MATLAB's std accepts a second parameter w, which specifies the weighting scheme. The default, w=0, results in the correction term n-1 (unbiased estimator), while for w=1, only n is used as the correction term (biased estimator).
For people who aren't great with statistics, a simplistic guide is:
Include ddof=1 if you're calculating np.std() for a sample taken from your full dataset.
Use ddof=0 if you're calculating np.std() for the full population.
The ddof is included for samples in order to counterbalance the bias that sampling from a larger distribution introduces.
I have a function V which depends on two variables v1 and v2 and a parameter array p containing 15 parameters.
I want to minimize my function V with respect to v1 and v2, but there is no closed-form expression for my function, so I can't build and use the derivatives.
The problem is the following: to calculate the value of my function I need the eigenvalues of two 4x4 matrices (which should be symmetric and real by construction, but sometimes the eigensolver does not return real eigenvalues). I calculate these eigenvalues with the Eigen package. The entries of the matrices are given by v1, v2 and p.
There are certain input sets for which some of these eigenvalues become negative. I want to ignore these input sets in my calculation, as they lead to a complex function value and my function is only allowed to have real values.
Is there a way to include this? My first attempt was a Nelder-Mead simplex algorithm using the GSL library together with a way too high output value for the function whenever one of the eigenvalues becomes negative, but this doesn't work.
Thanks for any suggestions.
For the Nelder-Mead simplex, you could reject new points as vertices for the simplex, unless they have the desired properties.
Your method of artificially increasing the function value for forbidden points is also known as a penalty or barrier function. You might want to re-design your penalty function, as in the sketch below.
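As an illustration of the graded-penalty idea, here is a small sketch using SciPy's Nelder-Mead instead of GSL; V_true and min_eigenvalue are made-up stand-ins for the real objective and eigenvalue computation:

import numpy as np
from scipy.optimize import minimize

def min_eigenvalue(v):
    # Hypothetical stand-in: smallest eigenvalue of a 2x2 symmetric matrix
    # built from v; the real code would use the two 4x4 Eigen matrices.
    A = np.array([[v[0], 0.3], [0.3, v[1]]])
    return np.linalg.eigvalsh(A)[0]

def V_true(v):
    return (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2   # toy objective

def V_penalized(v):
    lam = min_eigenvalue(v)
    if lam < 0:
        # Grow the penalty with the size of the violation instead of returning
        # one flat huge constant, so the simplex still feels a downhill
        # direction back toward the feasible region.
        return 1e6 * (1.0 + lam ** 2)
    return V_true(v)

res = minimize(V_penalized, x0=[3.0, 3.0], method="Nelder-Mead")
print(res.x, res.fun)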
Another optimization method without derivatives is the Simulated Annealing method. Again, you could modify the method to avoid forbidden points.
What do you mean by "doesn't work"? Does it take too long? Are the resulting function values too high?
Depending on the cost of a function evaluation, another approach might be to simply scan a 2D interval: evaluate all width x height function values and drill down into the tile with the lowest function values.
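A minimal sketch of that scan-and-drill-down idea (the grid resolution, the shrink rule and the toy objective are all my own choices):

import numpy as np

def grid_refine(f, lo, hi, n=21, rounds=4):
    # Evaluate an n x n grid over [lo, hi], then zoom into the best tile.
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    for _ in range(rounds):
        g1 = np.linspace(lo[0], hi[0], n)
        g2 = np.linspace(lo[1], hi[1], n)
        vals = np.array([[f((a, b)) for b in g2] for a in g1])
        i, j = np.unravel_index(np.nanargmin(vals), vals.shape)
        span = (hi - lo) / (n - 1)
        lo = np.array([g1[i], g2[j]]) - span   # shrink the window around the
        hi = np.array([g1[i], g2[j]]) + span   # best grid point found so far
    return (g1[i], g2[j]), vals[i, j]

# Forbidden points can simply return np.nan and are then skipped by nanargmin.
best, fbest = grid_refine(lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
                          [-5, -5], [5, 5])
print(best, fbest)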
Suppose I have a bunch of instances and I want to find the K instances closest to a particular instance. Moreover, I have some weights representing the strength of each dimension when computing the distances. How can I incorporate these weights into the KNN search in MATLAB?
There are two methods that allow you to do this. Looking at the knnsearch documentation, you can use the 'seuclidean' flag, which computes the standardized Euclidean distance: each co-ordinate difference between two points is scaled by dividing by the corresponding value in a scale vector S, which by default holds the standard deviation of each co-ordinate. You can specify these scales manually through the Scale parameter, passing a vector whose components scale each dimension instead of the per-dimension standard deviations.
As such, the more a co-ordinate should contribute, the smaller its scale value should be: every co-ordinate difference is divided by its scale before the squared differences are aggregated, so small scales inflate a dimension's influence on the distance and large scales suppress it. This is essentially the same thing as weighting the strengths of the dimensions.
Alternatively, you can provide your own function that computes the distance between two vectors. Define the weights in your workspace beforehand, then create an anonymous function wrapper that accesses them when computing whatever distance measure you want. The anonymous function can only take in two vectors, corresponding to two co-ordinate vectors in KNN, so use it to access the weights already defined in the workspace and go from there.
Check out: http://www.mathworks.com/help/stats/knnsearch.html
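If it helps to see the arithmetic, here is a NumPy sketch of the first approach (the data, weights and variable names are all made up); a weight vector w corresponds to knnsearch's Scale = 1./sqrt(w):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # instances (hypothetical data)
q = rng.normal(size=4)                 # the query instance
w = np.array([4.0, 1.0, 1.0, 0.25])    # per-dimension strengths (made up)
K = 5

# Weighted Euclidean distance: d(x, q) = sqrt(sum_j w_j * (x_j - q_j)^2),
# equivalent to standardized Euclidean with scale S_j = 1/sqrt(w_j).
d = np.sqrt((((X - q) ** 2) * w).sum(axis=1))
nearest = np.argsort(d)[:K]            # indices of the K closest instances
print(nearest, d[nearest])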