So I understand that in an SVM, w is the normal to the hyperplane and ||w|| is its norm. I'm wondering, though, whether this ||w|| ever changes in LibSVM. I ask because I want to find the distance from certain vectors to the hyperplane. The problem is that MATLAB LibSVM doesn't natively do this. It does, however, give decision values that are related to the distance and ||w||.
tl;dr: is the value of ||w|| in LibSVM constant?
Obviously not. The norm of the normal to the hyperplane is the whole point of the SVM optimization. I would say it is the only thing truly "variable" in the SVM. What the model is doing is trying to minimize this term, so it cannot be constant.
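To make the relation between decision values, ||w|| and distance concrete, here is a minimal Python sketch for the linear-kernel case. It assumes the usual LibSVM layout, where w is the sum of the support vectors weighted by sv_coef and the offset is -rho. All numbers are made up for illustration and do not come from a real trained model. Once a model is trained, its w (and therefore ||w||) is fixed, so the distance is just |decision value| / ||w||.

```python
import math

# Hypothetical stand-ins for the SVs, sv_coef and rho of a trained linear model.
support_vectors = [[3.0, 4.0], [0.0, 0.0]]
sv_coef = [1.0, -1.0]
rho = 2.0

def weight_vector(svs, coefs):
    """w = sum_i coef_i * SV_i (valid for a linear kernel only)."""
    dim = len(svs[0])
    return [sum(c * sv[j] for c, sv in zip(coefs, svs)) for j in range(dim)]

def decision_value(w, rho, x):
    """Linear decision function w.x - rho."""
    return sum(wj * xj for wj, xj in zip(w, x)) - rho

w = weight_vector(support_vectors, sv_coef)
w_norm = math.sqrt(sum(wj * wj for wj in w))

x = [1.0, 1.0]
dec = decision_value(w, rho, x)
distance = abs(dec) / w_norm  # geometric distance from x to the hyperplane

print(w, w_norm, dec, distance)
```

The absolute value is used because the sign of the decision value depends on which class the model treats as positive.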
I am fitting data to a system of nonlinear ODEs to estimate model parameters using Matlab's lsqcurvefit.
The fit depends very strongly on the initial guesses that I pass to lsqcurvefit.
For example, if I use x0=5 as an initial guess I get a residual norm of 30, whereas if I choose x0=5.2 I get a residual norm of 1.5.
1) What does the residual norm (resnorm) in Matlab represent? Is it the sum of the squared errors? Is there a way to decide what range of values for resnorm is acceptable?
2) When the fit depends so much on initial guess, is there a way to deal with these problems? How would I know whether a better fit can be obtained from a different initial guess?
3) In using lsqcurvefit is it required to check whether the residuals are normally distributed?
lsqcurvefit fits your data in the least-squares sense. Thus it all comes down to a minimisation, and as your model is non-linear, you have no guarantee that the minimum found is the global one, nor that it is unique.
E.g. consider the function sin(x). Which x-value minimises this function? Well, all x = 2*pi*n + 3/2*pi for n = 0, 1, 2, ... do, but your numerical method can only return one solution, which will then depend on your initial guess.
To elaborate further on the problem: the simplest (in my opinion) minimisation algorithm is known as steepest descent. It uses the idea, known from calculus, that the direction of steepest descent is minus the gradient. Thus it finds the gradient at the current point, takes a step in the negative gradient direction (scaled by some step size), and continues to do this until the step/derivative is sufficiently small.
However, even if you consider the function cos(3 pi x)/x from 0.5 to infinity, which does have a unique global minimum near x = 1, you only find it if your guess lies between 0.7 and 1.3 (or so). All other guesses will find their respective local minima.
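The dependence on the initial guess can be reproduced with a few lines of Python. This is only a sketch of fixed-step steepest descent on cos(3 pi x)/x. The step size, tolerance and the two starting points are choices made for this illustration, and lsqcurvefit itself uses more sophisticated algorithms.

```python
import math

def f(x):
    return math.cos(3 * math.pi * x) / x

def df(x):
    """Analytic derivative of cos(3*pi*x)/x."""
    u = 3 * math.pi * x
    return -3 * math.pi * math.sin(u) / x - math.cos(u) / x**2

def steepest_descent(x0, step=0.005, tol=1e-10, max_iter=10000):
    """Repeatedly step against the gradient until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        delta = -step * df(x)
        x += delta
        if abs(delta) < tol:
            break
    return x

x_near = steepest_descent(0.9)  # starts inside the basin of the global minimum
x_far = steepest_descent(1.5)   # starts in a neighbouring basin

print(x_near, f(x_near))
print(x_far, f(x_far))
```

The first run settles close to x = 1, the second in the next local minimum to the right (around x = 1.66), where the function value is clearly worse.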
With this we can answer your questions:
1) resnorm is the squared 2-norm of the residuals, i.e. the sum of the squared errors. What would it mean for a specific value to be acceptable? The algorithm is looking for a minimum. If you are already at a minimum, what would it mean to continue the search?
2) Not in a (pseudo) exact sense. What is typically done is to use your knowledge to come up with a sensible initial guess. If this is not possible, you simply have to repeatedly make random initial guesses and then keep the best.
3) Depends on what you want to do. If you want to make statistical tests that depend on the residuals being normally distributed, then YES. If you are solely interested in fitting the function with the lowest residual norm, then NO.
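The "random initial guesses, keep the best" strategy from point 2 can be sketched in Python as follows. It uses the cos(3 pi x)/x function from the answer as a stand-in objective. In a real fit, the objective would be the resnorm returned by the solver. The search interval [0.7, 3.0], the number of starts and the fixed seed are assumptions made for this illustration.

```python
import math
import random

def f(x):
    return math.cos(3 * math.pi * x) / x

def df(x):
    u = 3 * math.pi * x
    return -3 * math.pi * math.sin(u) / x - math.cos(u) / x**2

def local_descent(x0, step=0.005, tol=1e-10, max_iter=10000):
    """Simple local optimiser: fixed-step steepest descent."""
    x = x0
    for _ in range(max_iter):
        delta = -step * df(x)
        x += delta
        if abs(delta) < tol:
            break
    return x

def multistart(n_starts, lo, hi, seed=0):
    """Run the local optimiser from random starts and keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, math.inf
    for _ in range(n_starts):
        x = local_descent(rng.uniform(lo, hi))
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

best_x, best_f = multistart(50, 0.7, 3.0)
print(best_x, best_f)
```

With enough starts, at least one lands in the basin around x = 1 and the global minimum is kept. There is still no guarantee in general, only a probability that grows with the number of starts.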
I want to fit a curve from a theoretical model to experimental data points. The model consists of 5 parameters. I can easily get the closest fit but I want something different. I need the closest fit possible but it should never go below the experimental curve. In other words, every y-value of the fit should be greater than or equal to the corresponding y-value from the experiment.
I would highly appreciate any ideas on how this could be implemented. Thanks!
Have you tried adding nonlinear constraints to your genetic algorithm?
More details are given here
https://www.mathworks.com/help/gads/examples/constrained-minimization-using-the-genetic-algorithm.html
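In your case, all you would need to do is assign the 'c' inequality constraint value in your nonlinear constraint function to the difference between the actual y values and the new (fitted) y values, i.e. c = y_actual - y_fit. MATLAB treats the constraint as satisfied when c <= 0, which here means the fit never drops below the data. The genetic algorithm should do the rest.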
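As a language-neutral sketch of that constraint (written in Python here, not as a MATLAB constraint function), the inequality values are just the data minus the fit, and a parameter set is feasible when none of them is positive. The two-parameter straight-line model and the data are hypothetical placeholders for the real 5-parameter model.

```python
def model(params, x):
    """Hypothetical stand-in for the theoretical model."""
    slope, intercept = params
    return slope * x + intercept

def inequality_constraints(params, x_data, y_data):
    """c_i = y_data_i - y_fit_i; feasible when every c_i <= 0."""
    return [y - model(params, x) for x, y in zip(x_data, y_data)]

def is_feasible(params, x_data, y_data):
    return all(c <= 0.0 for c in inequality_constraints(params, x_data, y_data))

x_data = [0.0, 1.0, 2.0, 3.0]
y_data = [1.0, 2.5, 2.0, 4.0]

above = is_feasible((1.0, 1.5), x_data, y_data)  # fit touches or exceeds all data
below = is_feasible((1.0, 1.0), x_data, y_data)  # fit dips under the point at x = 1

print(above, below)
```

In MATLAB, the same vector would be returned as the first output (c) of the nonlinear constraint function, with the equality output left empty.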
I'm running a series of SVM classifiers for a binary classification problem, and am getting very nice results as far as classification accuracy.
The next step of my analysis is to understand how the different features contribute to the classification. According to the documentation, Matlab's fitcsvm function returns a class, SVMModel, which has a field called "Beta", defined as:
Numeric vector of trained classifier coefficients from the primal linear problem. Beta has length equal to the number of predictors (i.e., size(SVMModel.X,2)).
I'm not quite sure how to interpret these values. I assume higher values represent a greater contribution of a given feature to the support vector? What do negative weights mean? Are these weights somehow analogous to beta parameters in a linear regression model?
Thanks for any help and suggestions.
----UPDATE 3/5/15----
In looking closer at the equations describing the linear SVM, I'm pretty sure Beta must correspond to w in the primal form.
The only other parameter is b, which is just the offset.
Given that, and given this explanation, it seems that taking the square or absolute value of the coefficients provides a metric of relative importance of each feature.
As I understand it, this interpretation only holds for the linear binary SVM problem.
Does that all seem reasonable to people?
Intuitively, one can think of the absolute value of a feature weight as a measure of its importance. However, this is not true in the general case, because the weights represent how much a marginal change in the feature value would affect the output, which means that they depend on the feature's scale. For instance, if we have a feature for "age" that is measured in years, but then we change it to months, the corresponding coefficient will be divided by 12, but clearly it doesn't mean that age is less important now!
The solution is to scale the data (which is usually a good practice anyway).
If the data is scaled, your intuition is correct, and in fact there is a feature selection method that does just that: choosing the features with the highest absolute weight. See http://jmlr.csail.mit.edu/proceedings/papers/v3/chang08a/chang08a.pdf
Note that this holds only for linear SVMs.
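The age example can be checked numerically. The Python sketch below uses made-up weights and a made-up sample. It shows that switching a feature from years to months leaves the linear score unchanged while the ranking by absolute weight flips.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical weights for [age_in_years, other_feature] and one sample.
w_years = [0.6, 0.2]
x_years = [30.0, 5.0]

# Same model with age expressed in months: feature * 12, weight / 12.
x_months = [x_years[0] * 12, x_years[1]]
w_months = [w_years[0] / 12, w_years[1]]

score_years = dot(w_years, x_years)
score_months = dot(w_months, x_months)

print(score_years, score_months)  # identical up to rounding
print(w_years, w_months)          # but |weight| of age changes rank
```

This is why the absolute weights are only comparable after the features have been brought to a common scale.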
I need to calculate some kind of distance between two curves.
Those are general curves, and may not be functions - that is, some values of x may be mapped to more than one value.
EDIT
The curves are given as a list of X,Y pairs, and the logical curve is the line passing through all the points in the order given. A typical data set will include about 1000 points.
As noted, the curve may not be a function, but is usually similar to a function.
This issue is what prevents using interp1 or the curve fitting toolbox (in Matlab).
The distance measure I was thinking of is the area of the region between the curves - but any reasonable alternative is ok.
EDIT
A sample illustration of two curves, and the area I want to compute.
A Matlab solution is preferred, but other languages are also fine.
If you have functions of the type y = f(x) that are defined over the same domain, then a common way to find the "distance" is to use the L2 norm, as explained here: http://en.wikipedia.org/wiki/L2_norm#p-norm. This is simply the square root of the integral of the squared difference between the functions. If you have parametric curves then you cannot directly employ this approach. If the L2 norm is not good enough for your requirements then you will need to provide a more concrete definition of what you mean by "distance". If you are unclear as to what you need, try taking a look at different types of mathematical norm and see if any of the commonly used ones are what you need (e.g. L1 norm, uniform norm). The Wikipedia link above is a good starting point. If the L2 norm is good enough then you need a way to calculate the integral that you have - there are many, many numerical integration techniques out there and I suggest Google is your friend here (or a good numerical analysis textbook).
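For sampled data, the L2 distance can be approximated with the trapezoidal rule. The following Python sketch assumes both functions are sampled on the same x grid. The test functions f(x) = x and g(x) = 0 on [0, 1] are chosen only because the exact answer, sqrt(1/3), is known.

```python
import math

def l2_distance(f_vals, g_vals, x):
    """sqrt of the trapezoidal integral of (f - g)^2 over the grid x."""
    sq = [(a - b) ** 2 for a, b in zip(f_vals, g_vals)]
    integral = sum(
        0.5 * (sq[i] + sq[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1)
    )
    return math.sqrt(integral)

x = [i / 1000 for i in range(1001)]
f_vals = list(x)           # f(x) = x
g_vals = [0.0] * len(x)    # g(x) = 0

d = l2_distance(f_vals, g_vals, x)
print(d)  # close to sqrt(1/3)
```

In MATLAB, the same quantity would typically be computed with trapz on the squared difference.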
If you do have parametric type curves then this is very nontrivial. Using the "area" between curves is not a good idea, as there is no clear way to define this area, and it would become even more complicated in the general case where you could have self-intersecting curves. If your curves are parametrized in the same way you could try a very crude measurement where you evaluate points on each curve at equally spaced values over the parameter range, then calculate the distance between each pair, sum and take the average as a notion of "closeness". I.e. partition your parameter range into a set {u_0, ... , u_n} and evaluate curve1(u_i) and curve2(u_i) for each i to generate a set of n + 1 paired points. Then sum the Euclidean distances between each pair of points and divide by the number of pairs.
This is very, very crude though, and if the parametrizations are different then it won't be much use.
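The crude measure described above can be sketched in Python as follows. The two curves here are concentric circles, used only because the expected mean distance (the difference of the radii) is obvious. Both curves are assumed to share the same parametrization over u in [0, 1].

```python
import math

def mean_paired_distance(curve1, curve2, n):
    """Average Euclidean distance between curve1(u_i) and curve2(u_i)
    over the n + 1 equally spaced parameter values u_0, ..., u_n."""
    us = [i / n for i in range(n + 1)]
    total = sum(math.dist(curve1(u), curve2(u)) for u in us)
    return total / len(us)

def circle(radius):
    """Parametric circle, u in [0, 1] (illustrative test curve)."""
    return lambda u: (
        radius * math.cos(2 * math.pi * u),
        radius * math.sin(2 * math.pi * u),
    )

d = mean_paired_distance(circle(1.0), circle(2.0), 100)
print(d)  # about 1.0, the difference of the radii
```

If the same two circles were traversed from different starting angles, the result would change, which is exactly the weakness noted above.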
You need to define what you mean by distance between the curves. If it is the closest approach between two general curves, then it becomes quite difficult to solve the problem.
If the "curves" are not even representable as single valued functions of x, then it becomes more complex yet.
Merely telling us that you need to define "some kind of distance" is too broad of a statement to be on-topic here, and it says that you have not yet thought out the problem you wish to solve.
If all you are willing to tell us is that the curves are two totally general parametric curves, which may be closed or not, or they may not even lie over the same domain, then the question becomes so totally ill-posed as to be impossible to answer. What is the area between two curves in that case?
If the curves are defined over the SAME support, then subtracting them and integration of the absolute value or square of the difference will be adequate. But you have already told us that these "curves" may be multi-valued. In that case, it is essentially impossible to do what you are asking.
I have a problem where I am fitting a high-order polynomial to (not very) noisy data using linear least squares. Currently I'm using polynomial orders around 15 - 25, which work surprisingly well: the dependence is very nearly linear, but the accuracy of modelling the 'very nearly' is critical. I'm using Matlab's polyfit() function, and (obviously) normalising the x-data. This generally works fine, but I have come across an issue with some recent datasets: the fitted polynomial has extrema within the x-data interval. For the application I'm working on this is a no-no. The polynomial model must have no stationary points over the x-interval.
So I need to add a constraint to the least-squares problem: the derivative of the fitted polynomial must be strictly positive over a known x-range (or strictly negative - this depends on the data but a simple linear fit will quickly tell me which it is.) I have had a quick look at the available optimisation toolbox functions, but I admit I'm at a loss to know how to go about this. Does anyone have any suggestions?
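As a diagnostic only (this does not perform the constrained fit), the condition can at least be checked on a grid. The Python sketch below assumes coefficients ordered from highest power to lowest, as polyfit returns them. The grid size is an arbitrary choice, so a positive result is evidence, not proof. Note that the derivative at any fixed x is linear in the coefficients, so grid conditions of this kind are linear inequalities.

```python
def poly_derivative(coeffs):
    """Derivative coefficients; input is ordered highest power first."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def poly_eval(coeffs, x):
    """Horner evaluation, highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def derivative_is_positive(coeffs, x_min, x_max, n_grid=1001):
    """True if p'(x) > 0 at every grid point in [x_min, x_max]."""
    d = poly_derivative(coeffs)
    xs = [x_min + (x_max - x_min) * i / (n_grid - 1) for i in range(n_grid)]
    return all(poly_eval(d, x) > 0 for x in xs)

monotone = derivative_is_positive([1.0, 0.0, 1.0, 0.0], -1.0, 1.0)   # x^3 + x
has_extrema = derivative_is_positive([1.0, 0.0, -1.0, 0.0], -1.0, 1.0)  # x^3 - x

print(monotone, has_extrema)
```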
[I appreciate there are probably better models than polynomials for this data, but in the short term it isn't feasible to change the form of the model]
[A closing note: I have finally got the go-ahead to replace this awful polynomial model! I am going to adopt a nonparametric approach, spline smoothing, using the excellent SPLINEFIT code by Jonas Lundgren. This has the advantage that I'm already using a spline model in the end-user application, so I already have C# code available to evaluate a spline model]
You could use cftool and use the exclude data points option.