I want to train an SVM with a non-linear decision boundary. The form of the boundary is known and is given by the formula
y = sgn( (w11*x1 + w12*x2 + w13*x3) * (w21*x4 + w22*x5 + w23*x6) ), where [x1 x2 ... x6] are 1-bit inputs and [w11 w12 w13 w21 w22 w23] are unknown parameters.
How can I learn [w11 w12 w13 w21 w22 w23] from training data?
An SVM is not the right algorithm for this task. An SVM has its own criterion to maximize (the margin), which has nothing to do with the shape of the decision boundary (ok, not nothing, but it is hard to convert one into the other). One could try to define a custom kernel function to do so, but this seems like an almost unsolvable problem (I can't think of any reproducing kernel Hilbert space with such decision boundaries).
In short: your question is a bit like "how do I make a watermelon remove nails from the wall?". Sure, you can do some pretty hard "magic" to make it work, but this is not what watermelons are for.
I am taking an econometrics course and have been trying to use Python rather than the proprietary Stata and EViews that the assignments are set in.
In one of the questions, I have consumption data over time, and I am asked to model it in two ways.
The first way is to fit a model of the form consumption = A*exp(B*t); the second is to take logs of both sides and run OLS on log(consumption) = alpha + B*t.
I know how to do the second way. However, when I try the first way it goes wrong. Using statsmodels, I can exponentiate the time data (after normalising), but this fits a regression of the form consumption = A*exp(t) + B, which is not what I want (I want to specify where the parameters go). In sklearn I could find polynomial regression, but not exponential.
Then I found scipy.optimize.curve_fit.
However, this seems to have two problems:
(1) It seems to rely on initial guesses for the parameters, which means my output may end up different from that of the proprietary software (whereas output for things like OLS is the same). I assume initial guesses mean that some iterative solution is computed, which is helpful for very weird and wonderful functions, but I would have thought fairly standard results hold for exponential regression.
(2) Every time I try to implement it, it just returns the guess parameters.
Here is my code
`consumption_data = pd.read_csv(......\consumption.csv")
def func(x,a,b):
    return a * np.exp(b*x)
xdata = consumption_data.YEAR
ydata = consumption_data.CONSUMPTION
ydata = (ydata - 1948)/100
popt, pcov = curve_fit(func, xdata, ydata, (1,1))
print(popt)
plt.plot(xdata, func(xdata, *popt), 'g--',)
`
The scipy.optimize code is basically just copy-pasted from their tutorial
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
Short answer: use statsmodels GLM.
statsmodels does not have nonlinear least squares. The best Python library for that is lmfit https://pypi.org/project/lmfit/
curve_fit, lmfit and nonlinear least squares algorithms in general solve the optimization problem iteratively. Even though we have to provide starting values, the solution is in many cases the same across packages up to the convergence tolerance, e.g. 1e-5 or 1e-6.
Many standard models in statistics and econometrics have a single global optimum with well-behaved data. However, in other cases, such as mixture models, there might be many local optima and the estimation might converge to any one of them.
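As a rough illustration of the point about starting values, the sketch below fits the exponential model with scipy.optimize.curve_fit from two different initial guesses and compares the results. The data are synthetic stand-ins (the true values 2.0 and 3.0, the noise level and the year range are all assumptions, not the asker's data). Time is rescaled before fitting. In the question's code the raw years are passed as x while the rescaling is applied to y, so with b = 1 as a starting value exp(b*x) overflows, which is one plausible (unverified) reason for getting the starting values back.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(t, a, b):
    """Exponential growth model a * exp(b * t)."""
    return a * np.exp(b * t)


# Synthetic stand-in for the consumption series (hypothetical values).
years = np.arange(1948, 1998)
t = (years - years[0]) / 100.0   # rescale time so that exp() stays finite
rng = np.random.default_rng(0)
y = 2.0 * np.exp(3.0 * t) + rng.normal(scale=0.01, size=t.size)

# Two different starting points for the iterative solver.
popt_1, _ = curve_fit(func, t, y, p0=(1.0, 1.0))
popt_2, _ = curve_fit(func, t, y, p0=(5.0, 0.5))

print(popt_1)
print(popt_2)
```

For a well-behaved problem like this one, both runs should agree to within the convergence tolerance.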
To the specific case:
consumption = A exp(B t)
can be rewritten as
consumption = exp(a + B t), where a = log(A)
So this is just a single index model or a generalized linear model with an exponential mean function.
The general version has the expectation of the dependent variable as a nonlinear function of a linear combination of the explanatory variables:
E(y | x) = g(x b)
This can be estimated in statsmodels using GLM with the Gaussian family and a log link.
Aside: In econometrics, there is a literature on using Poisson quasi-likelihood as an estimator for exponential mean models instead of taking the log of the dependent variable.
Poisson usually uses the log-link function as in the above.
However, using GLM allows us to use log-link, i.e. exponential mean function, with any of the supported distribution families. The main difference is in the underlying variance assumption. Gaussian assumes constant variance, Poisson assumes that the variance is proportional to the mean and Gamma assumes that the variance is quadratic in the mean.
If we use a robust sandwich covariance estimator for parameter inference, then standard errors and inference are correct even if the variance function is misspecified.
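To make the Gaussian family with log link more concrete, here is a minimal NumPy sketch of the iteratively reweighted least squares (IRLS) idea behind such a GLM fit. It is not the statsmodels implementation; the function name fit_gaussian_log_link, the starting values taken from the log-linear OLS fit, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_gaussian_log_link(t, y, n_iter=50, tol=1e-10):
    """Fit E[y | t] = exp(a + b*t) by IRLS (Gaussian family, log link)."""
    X = np.column_stack([np.ones_like(t), t])
    # Start from the log-linear OLS fit (requires y > 0).
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu      # working response
        w = mu ** 2                  # working weights under constant variance
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic data (hypothetical): consumption = 2 * exp(3 * t) plus noise.
t = np.linspace(0.0, 0.5, 50)
rng = np.random.default_rng(0)
y = 2.0 * np.exp(3.0 * t) + rng.normal(scale=0.01, size=t.size)

a_hat, b_hat = fit_gaussian_log_link(t, y)
A_hat = np.exp(a_hat)                # back-transform the intercept to A
print(A_hat, b_hat)
```

With constant-variance Gaussian errors this targets the same least squares problem as fitting consumption = A exp(B t) directly, which is why the GLM route and nonlinear least squares should give matching estimates here.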
I have a neural network with N input nodes and N output nodes, and possibly multiple hidden layers and recurrences, but let's forget about those for now. The goal of the network is to learn an N-dimensional variable Y*, given an N-dimensional value X. Let's say the output of the network is Y, which should be close to Y* after learning. My question is: is it possible to invert the neural network for the output Y*? That is, how do I get the value X* that would yield Y* (or something close to it) when fed into the network?
A major part of the problem is that N is very large, typically on the order of 10000 or 100000, but if anyone knows how to solve this for small networks with no recurrences or hidden layers, that might already be helpful. Thank you.
If you can choose the neural network such that the number of nodes in each layer is the same, the weight matrices are non-singular, and the transfer function is invertible (e.g. leaky ReLU), then the function will be invertible.
This kind of neural network is simply a composition of matrix multiplications, bias additions and transfer functions. To invert it, you just need to apply the inverse of each operation in reverse order. That is, take the output, apply the inverse transfer function, subtract the bias, multiply by the inverse of the last weight matrix, then apply the inverse transfer function again, subtract the bias, multiply by the inverse of the second-to-last weight matrix, and so on.
This is a task that can perhaps be solved with autoencoders. You might also be interested in generative models like Restricted Boltzmann Machines (RBMs), which can be stacked to form Deep Belief Networks (DBNs). RBMs build an internal model h of the data v that can be used to reconstruct v. In DBNs, h of the first layer will be v of the second layer and so on.
zenna is right.
If you are using bijective (invertible) activation functions, you can invert layer by layer: apply the inverse activation, subtract the bias and multiply by the pseudoinverse of the weight matrix (if every layer has the same number of neurons, this is also the exact inverse, under some mild regularity conditions).
To repeat the conditions: dim(X) == dim(Y) == dim(layer_i), det(Wi) != 0
An example:
Y = tanh( W2*tanh( W1*X + b1 ) + b2 )
X = W1p*( tanh^-1( W2p*(tanh^-1(Y) - b2) ) -b1 ), where W2p and W1p represent the pseudoinverse matrices of W2 and W1 respectively.
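The two-layer tanh example above can be checked numerically with a small sketch like the following. The layer size, the random weights and the helper names forward and inverse are assumptions chosen for illustration; the weights are kept close to the identity so that the matrices are well conditioned and tanh does not saturate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Square, well-conditioned weight matrices and small biases (illustrative values).
W1 = np.eye(n) + 0.1 * rng.standard_normal((n, n))
W2 = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b1 = 0.1 * rng.standard_normal(n)
b2 = 0.1 * rng.standard_normal(n)


def forward(x):
    """Y = tanh(W2 * tanh(W1 * X + b1) + b2)."""
    return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)


def inverse(y):
    """X = W1p * (atanh(W2p * (atanh(Y) - b2)) - b1)."""
    W1p = np.linalg.pinv(W1)
    W2p = np.linalg.pinv(W2)
    h = W2p @ (np.arctanh(y) - b2)      # undo the second layer
    return W1p @ (np.arctanh(h) - b1)   # undo the first layer


x = rng.uniform(-1.0, 1.0, size=n)
x_rec = inverse(forward(x))
print(np.max(np.abs(x - x_rec)))        # reconstruction error
```

For square non-singular matrices np.linalg.pinv coincides with the ordinary inverse. For very large N, forming the inverse explicitly may be impractical, and solving a linear system per layer is likely the better choice.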
The following paper is a case study in inverting a function learned by a neural network. It comes from industry and looks like a good starting point for understanding how to set up the problem.
An alternative way to get an x that yields the desired y is to start with a random x (or a given input as a seed) and then repeatedly adjust x by gradient descent until it yields a y that is close to the desired y. The algorithm is similar to backpropagation, except that instead of taking derivatives with respect to the weights and biases, you take derivatives with respect to x; mini-batching is not needed. An advantage of this approach is that it accepts a seed (a starting x, if not randomly selected). I also have a hypothesis that the final x will bear some similarity to the initial x (the seed), which would imply that this algorithm has the ability to transpose, depending on the context of the neural network application.
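A minimal sketch of this gradient-descent-on-the-input idea, assuming a single tanh layer with fixed, hand-picked weights (the matrix W, the bias b, the learning rate and the step count are all illustrative assumptions). The gradient with respect to x is written out by hand instead of relying on an autodiff library.

```python
import numpy as np

# Fixed "trained" network: a single tanh layer (illustrative values).
W = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.2],
              [0.2, 0.0, 1.0]])
b = np.array([0.1, -0.1, 0.0])


def net(x):
    return np.tanh(W @ x + b)


def invert_by_gradient_descent(y_target, x_seed, lr=0.2, n_steps=2000):
    """Adjust x (not the weights) to minimize 0.5 * ||net(x) - y_target||^2."""
    x = x_seed.astype(float).copy()
    for _ in range(n_steps):
        y = net(x)
        residual = y - y_target
        # Chain rule: d(loss)/dx = W^T [ residual * tanh'(W x + b) ], tanh' = 1 - y^2
        grad = W.T @ (residual * (1.0 - y ** 2))
        x -= lr * grad
    return x


x_true = np.array([0.5, -0.3, 0.8])
y_target = net(x_true)
x_found = invert_by_gradient_descent(y_target, x_seed=np.zeros(3))
print(x_found)
```

In this toy case the network is invertible, so the search recovers the unique preimage. For networks that are not invertible, the result would generally depend on the seed.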
I would like to measure the goodness of fit of my data to an exponential decay curve. I am using the MATLAB function lsqcurvefit. Someone suggested that I do a chi-square test.
I would like to use the MATLAB function chi2gof, but I am not sure how to tell it that the data is being fitted to an exponential curve.
The chi2gof function tests the null hypothesis that a set of data, say X, is a random sample drawn from some specified distribution (such as the exponential distribution).
From your description in the question, it sounds like you want to see how well your data X fits an exponential decay function. I really must emphasize, this is completely different to testing whether X is a random sample drawn from the exponential distribution. If you use chi2gof for your stated purpose, you'll get meaningless results.
The usual approach for testing the goodness of fit for some data X to some function f is least squares, or some variant on least squares. Further, a least squares approach can be used to generate test statistics that test goodness-of-fit, many of which are distributed according to the chi-square distribution. I believe this is probably what your friend was referring to.
EDIT: I have a few spare minutes, so here's something to get you started. DISCLAIMER: I've never worked specifically on this problem, so what follows may not be correct.
I'm going to assume you have a set of data x_n, n = 1, ..., N, and the corresponding timestamps t_n, n = 1, ..., N. The exponential decay function is x_n = x_0 * e^{-b * t_n}. Taking the natural logarithm of both sides gives ln(x_n) = ln(x_0) - b * t_n. This suggests using OLS to estimate the linear model ln(x_n) = ln(x_0) - b * t_n + e_n. Nice! Now we can test goodness of fit using the standard R^2 measure, which MATLAB returns in the stats output if you use the regress function to perform OLS.
Hope this helps. Again, I came up with this off the top of my head in a couple of minutes, so there may be good reasons why what I've suggested is a bad idea. Also, if you know the initial value of the process (i.e. x_0), you may want to look into constrained least squares, where you fix the parameter ln(x_0) at its known value.
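The log-linear approach can be sketched as follows, in Python with NumPy, not MATLAB. The function name fit_log_linear_decay and the synthetic data (x_0 = 3, b = 0.7, multiplicative noise) are assumptions for illustration. Note that the R^2 computed here measures the fit on the log scale, not on the original scale.

```python
import numpy as np


def fit_log_linear_decay(t, x):
    """OLS fit of ln(x) = ln(x0) - b*t; returns (x0, b, R^2 of the log-scale fit)."""
    log_x = np.log(x)
    slope, intercept = np.polyfit(t, log_x, 1)
    fitted = intercept + slope * t
    ss_res = np.sum((log_x - fitted) ** 2)          # residual sum of squares
    ss_tot = np.sum((log_x - log_x.mean()) ** 2)    # total sum of squares
    return np.exp(intercept), -slope, 1.0 - ss_res / ss_tot


# Synthetic decay data with multiplicative noise (keeps x positive).
t = np.linspace(0.0, 5.0, 50)
rng = np.random.default_rng(1)
x = 3.0 * np.exp(-0.7 * t) * np.exp(rng.normal(scale=0.05, size=t.size))

x0_hat, b_hat, r2 = fit_log_linear_decay(t, x)
print(x0_hat, b_hat, r2)
```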
I am trying to implement a routine for fitting electrophoretic data from my experiments.
The aim is to derive kinetic parameters for the interaction of biomolecules from the relative areas of the peaks in the electropherogram.
Since all relevant differential equations are known and since the set of equations has an analytical solution, as described here:
Analytical solution manuscript
I set about entering the relevant equations (6, 8, 13, ... from the referenced manuscript) in matlab.
The thus created function works and I can use it to simulate electropherograms of interacting species.
Obviously, I would now like to use the function to fit experimental data and retrieve the parameters (8 in total: Va, Vc, MUa, MUc, k, A0, C0, baseline noise).
Some of these will obviously be correlated. Example values might be (to give an idea of their magnitude):
params0 = [ ...
8.44E-02; ... % Va
1.25E-01; ... % Vc
5.32E-05; ... % MUa
8.87E-05; ... % MUc
4.48E-03; ... % k
6.06E-01; ... % A0
3.00E-00; ... % C0
4.64E-03 ... % noise
];
My problem is, if I supply experimental data and try something like lsqcurvefit:
[x,resnorm,residual] = lsqcurvefit(@(param,xdata) Electropherogram2(param,xdata,column), params0, time, ydata, lb, ub);
I often get very poor results, because I either run out of iterations or end up in some (obviously poorly fitting) local minimum.
Only if I tinker a lot with the starting values and the allowed intervals (which I can do because I know likely values from other experiments) do I end up with more or less decent fits, and even then the fits are not as good as those reported in the original manuscript (fig. 3).
The authors of that manuscript used Excel Solver and were kind enough to provide the original data used in Fig. 3, but I still cannot get fits as good as theirs without supplying starting values that are already nearly correct.
I am not experienced enough to know what I could tweak to make this process less trial-and-error.
Would something like the global optimization toolbox help me?
Any tips are welcome...
In the mentioned paper ("Analytical solution manuscript") it is implied that the free optimization parameters are five (Va, Vc, MUa, MUc, k) and not eight because the (Aeq/Ceq) ratio can be computed from their representative equations, eq. 8 for Aeq and (obviously) eq. 6 for Ceq.
In my opinion, what's even more troubling is the appearance of the following products in the model, comprised of the free optimization parameters:
k and Va in eq. 12
MUc and Va in the equation for epsilon_A in eq. 12
MUa and Vc in the equation for epsilon_A in eq. 12
In general, non-linear optimization algorithms have real trouble optimizing the free parameters when pairs of them appear as products in the non-linear model.
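One way to see why products of parameters cause trouble is to look at the Jacobian of a toy model in which two parameters enter only through their product. The model y = a * b * x below is a hypothetical extreme case, not the electrophoresis model from the manuscript, where the products appear alongside other terms and the degeneracy may only be partial.

```python
import numpy as np


def jacobian(params, x):
    """Jacobian of the toy model y = a * b * x with respect to (a, b)."""
    a, b = params
    return np.column_stack([b * x, a * x])   # columns: dy/da, dy/db


x = np.linspace(0.0, 1.0, 20)
J = jacobian((2.0, 3.0), x)
rank = np.linalg.matrix_rank(J)
print(rank)

# Different parameter pairs with the same product give identical predictions.
same_fit = np.allclose(2.0 * 3.0 * x, 1.0 * 6.0 * x)
print(same_fit)
```

The two Jacobian columns are proportional, so a least squares step cannot separate a from b. Only their product is determined by the data, and the optimizer can wander along the valley where the product stays constant.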
This question has been confusing me for several days. I asked senior students, but they could not answer it either.
We have ten ODEs, and a noise term should be added to each of them. The noise is defined as follows. Since I cannot upload a picture, the formulas below may not be very clear. To follow along, you can either read my explanation or go to this address: Plos one. The description of the equations is directly above the Supporting Information section on that page.
The white noise term epsilon_i(t) is assumed to be Gaussian. epsilon_i(t) is the value of the noise for equation i at time t.
The auto-correlation of the noise is given by:
(EQ.1)
where delta(t) is the Dirac delta function and the diffusion matrix D is defined by
(EQ.2)
Our problem is how to interpret the Dirac delta function in the diffusion matrix. Since the Dirac delta function satisfies delta(0) = Inf and delta(t) = 0 for t != 0, we don't know how to calculate epsilon if we try to take the square root of 2D(x, t)delta(t-t'). So we simply assumed that delta(0) = 1 and delta(t) = 0 for t != 0, but we don't know whether this is right. Could you please tell me how to handle the delta function of the diffusion equation in MATLAB?
This question is related to stochastic processes in MATLAB, so we reviewed different stochastic processes for ideas. In MATLAB, the increments of a Wiener process are often generated as a = sqrt(dt) * randn(1, N), where N is the number of steps and dt is the step length. Correspondingly, the Brownian motion can be obtained as b = cumsum(a). However, neither of these is related to a white noise process with a constraint on its auto-correlation matrix, denoted by D.
Then we considered simply using randn(1, 10) to generate a vector representing the noise. However, since the noise must satisfy equation (2), this does not give the noise terms in different equations the predefined correlation (D_ij). Then we tried to use mvnrnd to generate a multivariate normal sample at each time step. Unfortunately, the function mvnrnd in MATLAB returns a matrix, but we need a vector of length 10.
We are rather confused, so could you please point us in the right direction? Thanks so much!
NOTE: I see two hazy questions in here: 1) how to deal with a stochastic term in a DE and 2) how to deal with a delta function in a DE. Both of these are math related questions and http://www.math.stackexchange.com will be a better place for this. If you had a question pertaining to MATLAB, I haven't been able to pin it down, and you should perhaps add code examples to better illustrate your point. That said, I'll answer the two questions briefly, just to put you on the right track.
What you have here are not ODEs, but Stochastic differential equations (SDE). I'm not sure how you're using MATLAB to work with this, but routines like ode45 or ode23 will not be of any help. For SDEs, your usual mathematical tools of separation of variables/method of characteristics etc don't work and you'll need to use Itô calculus and Itô integrals to work with them. The solutions, as you might have guessed, will be stochastic. To learn more about SDEs and working with them, you can consider Stochastic Differential Equations: An Introduction with Applications by Bernt Øksendal and for numerical solutions, Numerical Solution of Stochastic Differential Equations by Peter E. Kloeden and Eckhard Platen.
Coming to the delta function part, you can easily deal with it by taking the Fourier transform of the ODE. Recall that the Fourier transform of a delta function is 1. This greatly simplifies the DE and you can take an inverse transform in the very end to return to the original domain.
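For the numerical side, a common starting point is the Euler-Maruyama scheme. The sketch below is written in Python with NumPy, not MATLAB, and rests on several assumptions that are not confirmed by the question: that the noise correlation has the form <eps_i(t) eps_j(t')> = 2 D_ij delta(t - t'), that D is constant and positive definite, and that the drift is the placeholder -x. Under those assumptions the noise integrated over one step of length dt is Gaussian with covariance 2 D dt, so the delta function never has to be evaluated, and one correlated noise vector is drawn per time step via a Cholesky factor.

```python
import numpy as np


def euler_maruyama(drift, D, x0, dt, n_steps, rng):
    """Integrate dx/dt = drift(x) + eps(t), <eps_i(t) eps_j(t')> = 2 D_ij delta(t - t')."""
    # Noise integrated over one step has covariance 2 * D * dt (assumed form).
    L = np.linalg.cholesky(2.0 * D * dt)
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for k in range(n_steps):
        dW = L @ rng.standard_normal(len(x0))   # one correlated noise vector per step
        x[k + 1] = x[k] + drift(x[k]) * dt + dW
    return x


# Hypothetical 2-dimensional example with a positive definite diffusion matrix.
D = np.array([[0.10, 0.03],
              [0.03, 0.05]])
dt = 0.01
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: -x, D, x0=np.array([1.0, 0.0]),
                      dt=dt, n_steps=1000, rng=rng)
print(path.shape)
```

For the ten-equation system, D would be a 10 x 10 matrix and each step would draw a noise vector of length 10. If D depends on the state or on time, the Cholesky factor would have to be recomputed inside the loop.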