Efficient recalculation of weighted least squares regression when weights change [closed] - linear-regression

I am performing weighted least squares regression as described on wiki: WLS
I need to solve this equation: $B = (X^T W X)^{-1} X^T W y$
I use SVD to find $(X^T W X)^{-1}$ and store it in a matrix. In addition I store the matrix $H = (X^T W X)^{-1} X^T W$ and simply do the following for any new value of y: B = Hy. This way I am able to save the cost of repeating the SVD and matrix multiplications as the y's change.
W is a diagonal matrix and generally does not change. However, sometimes I change one or two elements on the diagonal of W. In that case I need to do the SVD again and recalculate the H matrix. This is clearly slow and time consuming.
My question is: If I know what changed in W and nothing changes in X, is there a more efficient method to recalculate $(X^T W X)^{-1}$?
Or, put differently: is there an efficient analytic method to find B, given that only diagonal elements of W can change by a known amount?
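For concreteness, here is a minimal MATLAB sketch of the caching scheme being described (assumptions: X is the n-by-p design matrix, w holds the n diagonal weights, y is the response vector, and inv stands in for the SVD-based inversion mentioned in the question):
W = diag(w);             % diagonal weight matrix
C = inv(X' * W * X);     % stored inverse (computed via SVD in the question)
H = C * X' * W;          % stored matrix H
B = H * y;               % cheap to recompute for each new y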

There is such a method, in the case that the inverse you compute is a true inverse and not a generalised inverse (i.e. none of the singular values are 0). However, some caution in using it is recommended. If you were doing your sums in infinite precision, all would be well. With finite precision, and particularly with nearly singular problems -- where the ratio of the largest to the smallest singular value is very large -- these formulae could result in loss of precision.
I'll call the inverse you store C. If you add d (which can be positive or negative) to the m'th weight, then the modified C matrix, C~ say, and the modified H, H~, can be computed like this:
(' denotes transpose, and e_m is the row vector that is all 0 except that the m'th slot is 1)
Let
c = the m'th column of H, divided by the original m'th weight
a = m'th row of the data matrix X
f = e_m - a*H
gamma = 1/d + a*c
(so c is a column vector, while a and f are row vectors)
Then
C~ = C - c*c'/gamma
H~ = H + c*f/gamma
If you want to find the new B, B~ say, for a given y, it can be calculated via:
r = y[m] - a*B
B~ = B + (r/gamma) * c
The derivation of these formulae is straightforward, but tedious, matrix algebra. The matrix inversion lemma comes in handy.
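To make the recipe concrete, here is a minimal MATLAB sketch of the update (assumptions: C, H and B are the stored quantities from above, X is the data matrix, m is the index of the changed weight, w_old is its original value, and d is the amount added to it):
c = H(:, m) / w_old;          % column vector
a = X(m, :);                  % m'th row of the data matrix (row vector)
e_m = zeros(1, size(X, 1));   % row vector, all zeros ...
e_m(m) = 1;                   % ... except the m'th slot
f = e_m - a * H;              % row vector
gamma = 1/d + a * c;
Cnew = C - (c * c') / gamma;
Hnew = H + (c * f) / gamma;
r = y(m) - a * B;             % for a given y
Bnew = B + (r / gamma) * c;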

Related

Fit f(x,y,z)=0 to a set of 3D points using MATLAB

The statement is the following:
Given a 3D set of points (x,y,z), fit a surface defined by a certain number of parameters IMPLICITLY
I'm definitely no expert in programming, but I need to get this done in one way or another. I've considered other programs, such as OriginPro, which can solve this problem pretty easily, but I want to have it done in MATLAB.
The surface is defined by:
A*x^2+B*y^2+C*z^2+D*x+E*y+F*z+G*xy+H*xz+I*yz+J=0
Considering that the Curve Fitting Toolbox can only fit explicit functions, what would you guys suggest?
IMPORTANT REMARK: I'm not asking for a solution, just advice on how to proceed
This can be chalked up to solving a linear system of equations where each point forms a constraint, or equation, in your system. You would thus find the set of coefficients in your surface equation that satisfies all of the points. Using the equation in your question as is, one would find the null space of the linear system defined by the surface equation. Given a set of m points with x, y and z coordinates, we can reformulate the above equation for a single point as a matrix-vector multiplication, where the first factor is technically a matrix of one row and the vector holds the coefficients that fit your surface. This is important before we proceed to the null-space part of this problem.
In particular, you can agree with me that we can represent the above in the following matrix-vector multiplication:
[x^2 y^2 z^2 x y z xy xz yz 1][A] = [0]
                              [B]
                              [C]
                              [D]
                              [E]
                              [F]
                              [G]
                              [H]
                              [I]
                              [J]
Our objective is to find the coefficients A, B, ..., J that would satisfy the constraint above. Now moving onto the more general case, since you have m points, we can build our linear system and thus a matrix of coefficients on the left side of this expression:
[x_1^2 y_1^2 z_1^2 x_1 y_1 z_1 x_1*y_1 x_1*z_1 y_1*z_1 1][A]   [0]
[x_2^2 y_2^2 z_2^2 x_2 y_2 z_2 x_2*y_2 x_2*z_2 y_2*z_2 1][B]   [0]
[x_3^2 y_3^2 z_3^2 x_3 y_3 z_3 x_3*y_3 x_3*z_3 y_3*z_3 1][C]   [0]
[                          ...                          ][D]   [0]
[                          ...                          ][E] = [0]
[                          ...                          ][F]   [0]
[                          ...                          ][G]   [0]
[                          ...                          ][H]   [0]
[                          ...                          ][I]   [0]
[x_m^2 y_m^2 z_m^2 x_m y_m z_m x_m*y_m x_m*z_m y_m*z_m 1][J]   [0]
We now build this linear system, and solve to find our coefficients. The trick is to build the matrix that you see on the left hand side of this linear system, which I will call M. Each row is such that you create [x_i^2 y_i^2 z_i^2 x_i y_i z_i x_i*y_i x_i*z_i y_i*z_i 1] with x_i, y_i and z_i being the ith (x,y,z) coordinate in your dataset.
Once you build this, you would thus find the null space of this system. There are many methods to do this in MATLAB. One way is to simply use the null function on the matrix you build above and it will return to you a matrix where each column is a potential solution to the surface you are fitting above. That is, each column directly corresponds to the coefficients A, B, ..., J that would fit your data to the surface. You can also try using the singular value decomposition or QR decomposition if you like, but the null function is a good place to start as it uses singular value decomposition already.
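As a minimal sketch of this approach (assuming your coordinates are stored in column vectors x, y and z):
M = [x.^2, y.^2, z.^2, x, y, z, x.*y, x.*z, y.*z, ones(numel(x), 1)];
coeffs = null(M);   % each column is a candidate [A B C D E F G H I J].'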
I would like to point out that the above will only work if the matrix you provide is not of full column rank. To simplify things, this is guaranteed if the number of points you have is less than the number of parameters, so this method would only work if you had up to 9 points in your data. If you have exactly 9, this method will work very well. If you have fewer than 9, there will be more potential solutions as the number of degrees of freedom increases: the null space will have dimension 10 - m, and any vector (or combination of vectors) in it is a valid solution. If you have 10 or more points and they are in general position, the matrix becomes full rank and the only null-space solution is the trivial one, with all of the coefficients set to 0.
You probably just want one solution, and you most likely have 10 or more points to fit, so to escape the possibility of the null space containing only the trivial all-zero solution, or of it providing more than one solution, an alternative method is a simple extension of the above that does not require finding the null space. Specifically, you relax one of the coefficients, say J, and set it to any value you wish, for example J = 1. The system of equations then changes: J disappears from the unknowns and moves to the right-hand side of the system:
[x_1^2 y_1^2 z_1^2 x_1 y_1 z_1 x_1*y_1 x_1*z_1 y_1*z_1][A]   [-1]
[x_2^2 y_2^2 z_2^2 x_2 y_2 z_2 x_2*y_2 x_2*z_2 y_2*z_2][B]   [-1]
[x_3^2 y_3^2 z_3^2 x_3 y_3 z_3 x_3*y_3 x_3*z_3 y_3*z_3][C]   [-1]
[                         ...                         ][D]   [-1]
[                         ...                         ][E] = [-1]
[                         ...                         ][F]   [-1]
[                         ...                         ][G]   [-1]
[                         ...                         ][H]   [-1]
[x_m^2 y_m^2 z_m^2 x_m y_m z_m x_m*y_m x_m*z_m y_m*z_m][I]   [-1]
You can thus find the parameters A, B, ..., I using linear least squares, where the solution can be found with the pseudoinverse. The benefit of this approach is that as long as the matrix has full column rank, there is one and only one solution, so it is unique. Additionally, this formulation is nice because if there is an exact solution to the linear system, solving with the pseudoinverse provides that exact solution. If there is no exact solution, meaning that not all constraints can be satisfied, the solution provided is the one that minimizes the squared error between the data and the fitted parameters.
MATLAB already has an awesome utility to solve a system through linear least squares - in fact, the core functionality of MATLAB is to solve linear algebra problems (if you didn't know that already). You can use matrix left division to solve the problem. Simply put, if the matrix of coefficients built above after relaxing J is again called M, the solution is simply coeff = M\(-ones(m,1));, with m being the number of points and coeff holding the coefficients of the surface equation that fit your points. The -ones(m,1) statement creates a column vector of m elements, all equal to -1.
The least-squares approach gives a more stable and unique solution, as you are specifically constraining one of the coefficients, J, to be 1. The null-space approach only works if you have fewer points than parameters, and it may give you more than one solution, since any vector in the null space is a valid set of coefficients. Specifically, you will get a null space of dimension 10 - m, and all of its solutions are equally good at fitting your data.
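For completeness, a minimal sketch of the least-squares variant (again assuming column vectors x, y and z, with 10 or more points):
M = [x.^2, y.^2, z.^2, x, y, z, x.*y, x.*z, y.*z];
coeff = M \ (-ones(numel(x), 1));   % [A B C D E F G H I].'
coeff = [coeff; 1];                 % append the relaxed coefficient J = 1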
I hope this is enough to get you started and good luck!

Computing the SVD of a rectangular matrix

I have a matrix M of size K x N, where K = 49152 is the dimension of the problem and N = 52 is the number of observations.
I have tried to use [U,S,V] = svd(M), but doing this I run out of memory.
I found other code which uses [U,S,V] = svd(cov(M)) and it works well. My questions are: what is the meaning of using the cov(M) command inside the SVD, and what is the meaning of the resulting [U,S,V]?
Finding the SVD of the covariance matrix is a method to perform Principal Components Analysis, or PCA for short. I won't get into the mathematical details here, but PCA performs what is known as dimensionality reduction. If you would like a more formal treatment of the subject, you can read up on my post about it here: What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?. Simply put, dimensionality reduction projects your data stored in the matrix M onto a lower dimensional surface with the least amount of projection error. In this matrix, we are assuming that each column is a feature or a dimension and each row is a data point. I suspect the reason why applying the SVD to the actual data matrix M itself occupies more memory than applying it to the covariance matrix is that you have a significant number of data points and a small number of features. The covariance matrix finds the covariance between pairs of features. If M is an m x n matrix, where m is the total number of data points and n is the total number of features, doing cov(M) gives you an n x n matrix, so you are applying the SVD to something far smaller in memory than M.
As for the meaning of U, S and V, for dimensionality reduction specifically, the columns of V are what are known as the principal components. V is ordered so that the first column is the axis of your data that describes the greatest amount of variability possible. As you move from the second column up to the nth column, you introduce more axes into your data and the variability described by each one decreases. When you reach the nth column, you are essentially describing your data in its entirety without reducing any dimensions. The diagonal values of S denote what is called the variance explained and respect the same ordering as V. As you progress through the singular values, they tell you how much of the variability in your data is described by each corresponding principal component.
To perform the dimensionality reduction, you can take your data that is mean subtracted and multiply it by V; equivalently, if U and S come from the SVD of the mean-subtracted data itself rather than of its covariance, you can take that U and multiply by its S. In other words, supposing X is the matrix M where each column has its mean computed and then subtracted, and [U, S, V] = svd(X, 'econ'), the following relationship holds:
US = XV
To actually perform the final dimensionality reduction, you take either US or XV and retain the first k columns where k is the total amount of dimensions you want to retain. The value of k depends on your application, but many people choose k to be the total number of principal components that explains a certain percentage of your variability in your data.
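As a minimal sketch of that procedure (assuming M is your data matrix with one data point per row and one feature per column, and k is the number of dimensions you wish to keep):
X = bsxfun(@minus, M, mean(M, 1));          % mean-centre each feature
[U, S, V] = svd(X, 'econ');                 % SVD of the centred data
k = 2;                                      % hypothetical choice of k
reduced = X * V(:, 1:k);                    % equivalently U(:, 1:k) * S(1:k, 1:k)
explained = diag(S).^2 / sum(diag(S).^2);   % fraction of variance per component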
For more information about the link between SVD and PCA, please see this post on Cross Validated: https://stats.stackexchange.com/q/134282/86678
Instead of [U, S, V] = svd(M), which tries to build a matrix U that is 49152 by 49152 (= 18 GB 😱!), do svd(M, 'econ'). That returns the "economy-class" SVD, where U will be 49152 by 52, S is 52 by 52, and V is also 52 by 52.
cov(M) will remove each dimension's mean and evaluate the inner product, giving you a 52 by 52 covariance matrix. You can implement your own version of cov, called mycov, as
function [C] = mycov(M)
    M = bsxfun(@minus, M, mean(M, 1)); % subtract each dimension's mean over all observations
    C = (M' * M) / (size(M, 1) - 1);   % unbiased normalization, matching MATLAB's cov
end
(You can verify this works by looking at mycov(randn(49152, 52)), which should be close to eye(52), since each element of that array is IID-Gaussian.)
There are a lot of magical linear algebraic properties and relationships between the SVD and EVD (i.e., singular value vs eigenvalue decompositions): because the covariance matrix cov(M) is a Hermitian matrix, its left- and right-singular vectors are the same, and are in fact also cov(M)'s eigenvectors. Furthermore, cov(M)'s singular values are also its eigenvalues: so svd(cov(M)) is just an expensive way to get eig(cov(M)) 😂, up to ±1 and reordering.
As @rayryeng explains at length, people usually use svd(M, 'econ') because they want eig(cov(M)) without having to form cov(M) at all, since computing cov(M) explicitly is numerically unstable. I recently wrote an answer that showed, in Python, how to compute eig(cov(M)) using svd(M2, 'econ'), where M2 is the 0-mean version of M, used in the practical application of color-to-grayscale mapping, which might help you get more context.
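If you want to convince yourself of that equivalence, here is a small illustrative sketch (sizes shrunk so it runs quickly):
M = randn(1000, 52);
X0 = bsxfun(@minus, M, mean(M, 1));             % zero-mean each column
[~, S, V] = svd(X0, 'econ');
evalsFromSvd = diag(S).^2 / (size(M, 1) - 1);   % should match eig(cov(M))
[Q, D] = eig(cov(M));
[evalsFromEig, idx] = sort(diag(D), 'descend');
% evalsFromSvd and evalsFromEig agree to numerical precision, and
% V(:, j) equals Q(:, idx(j)) up to sign.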

Exponential curve fit matlab

I have the following equation:
y = e^(A*u^2 + B*a^3)
I want to do an exponential curve fit in MATLAB for the above equation, where y = f(u,a). y is my output while (u,a) are my inputs. I want to find the coefficients A and B for a set of provided data.
I know how to do this for simple polynomials by defining states. As an example, if states = [ones(size(u)), u, u.^2], this will give me L + M*u + N*u^2, with L, M and N being the regression coefficients.
However, this is not the case for the above equation. How could I do this in MATLAB?
Building on what @eigenchris said, simply take the natural logarithm (log in MATLAB) of both sides of the equation. If we do this, we are in fact linearizing the equation in log space. In other words, given your original equation:
y = e^(A*u^2 + B*a^3)
we get:
log(y) = A*u^2 + B*a^3
However, this isn't exactly polynomial regression. This is more of a least squares fitting of your points. Specifically, what you would do is: given a set of y values and a set of (u,a) pairs, you would build a system of equations and solve this system via least squares. In other words, given y = (y_0, y_1, y_2, ..., y_N) and (u,a) = ((u_0, a_0), (u_1, a_1), ..., (u_N, a_N)), where N is the number of points that you have, you would build your system of equations like so:
log(y_0) = A*u_0^2 + B*a_0^3
log(y_1) = A*u_1^2 + B*a_1^3
...
log(y_N) = A*u_N^2 + B*a_N^3
This can be written in matrix form:
[log(y_0)]   [u_0^2 a_0^3]
[log(y_1)] = [u_1^2 a_1^3] [A]
[   ...  ]   [    ...    ] [B]
[log(y_N)]   [u_N^2 a_N^3]
To solve for A and B, you simply need to find the least-squares solution. You can see that it's in the form of:
Y = AX
To solve for X, we use what is called the pseudoinverse. As such:
X = A^{*} * Y
A^{*} is the pseudoinverse. This can eloquently be done in MATLAB using the \ or mldivide operator. All you have to do is build a vector of y values with the log taken, as well as build the matrix of u and a values. Therefore, if your (u,a) points are stored in the column vectors u and a, and your y values in the column vector y, you would simply do this:
x = [u.^2 a.^3] \ log(y);
x(1) will contain the coefficient for A, while x(2) will contain the coefficient for B. As A. Donda has noted in his answer (which I embarrassingly forgot about), the values of A and B are obtained assuming that the errors with respect to the exact curve you are trying to fit to are normally (Gaussian) distributed with a constant variance. The errors also need to be additive. If this is not the case, then your parameters achieved may not represent the best fit possible.
See this Wikipedia page for more details on what assumptions least-squares fitting takes:
http://en.wikipedia.org/wiki/Least_squares#Least_squares.2C_regression_analysis_and_statistics
One approach is to use a linear regression of log(y) with respect to u² and a³:
Assuming that u, a, and y are column vectors of the same length:
AB = [u .^ 2, a .^ 3] \ log(y)
After this, AB(1) is the fit value for A and AB(2) is the fit value for B. The computation uses Matlab's mldivide operator; an alternative would be to use the pseudo-inverse.
The fit values found this way are Maximum Likelihood estimates of the parameters under the assumption that deviations from the exact equation are constant-variance normally distributed errors additive to A u² + B a³. If the actual source of deviations differs from this, these estimates may not be optimal.
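As a quick sanity check on either formulation, you can generate synthetic data from known coefficients (the values below are arbitrary) and confirm that the fit recovers them:
trueA = 0.5;  trueB = -0.2;                  % hypothetical true coefficients
u = rand(100, 1);  a = rand(100, 1);
y = exp(trueA * u.^2 + trueB * a.^3) .* exp(0.01 * randn(100, 1));   % small noise in log space
AB = [u.^2, a.^3] \ log(y);                  % AB(1) ~ trueA, AB(2) ~ trueB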

How do I determine the coefficients for a linear regression line in MATLAB? [closed]

I'm going to write a program where the input is a data set of 2D points and the output is the regression coefficients of the line of best fit, found by minimizing the mean squared error (MSE).
I have some sample points that I would like to process:
X Y
1.00 1.00
2.00 2.00
3.00 1.30
4.00 3.75
5.00 2.25
How would I do this in MATLAB?
Specifically, I need to get the following formula:
y = A + Bx + e
A is the intercept and B is the slope while e is the residual error per point.
Judging from the link you provided, and my understanding of your problem, you want to calculate the line of best fit for a set of data points. You also want to do this from first principles. This will require some basic Calculus as well as some linear algebra for solving a 2 x 2 system of equations. If you recall from linear regression theory, we wish to find the best slope m and intercept b such that for a set of points ([x_1,y_1], [x_2,y_2], ..., [x_n,y_n]) (that is, we have n data points), we want to minimize the sum of squared residuals between this line and the data points.
In other words, we wish to minimize the cost function F(m,b,x,y):
F(m,b,x,y) = sum_{i=1..n} (y_i - (m*x_i + b))^2
m and b are our slope and intercept for this best fit line, while x and y are a vector of x and y co-ordinates that form our data set.
This function is convex, so there is an optimal minimum that we can determine. The minimum can be determined by finding the derivative with respect to each parameter, and setting these equal to 0. We then solve for m and b. The intuition behind this is that we are simultaneously finding m and b such that the cost function is jointly minimized by these two parameters. In other words:
dF/dm = 0   and   dF/db = 0
OK, so let's find the first quantity, dF/dm:
dF/dm = sum_{i=1..n} 2*(y_i - m*x_i - b)*(-x_i) = 0
We can drop the factor 2 from the derivative as the other side of the equation is equal to 0, and we can also do some distribution of terms by multiplying the -x_i term throughout:
sum_{i=1..n} (-x_i*y_i + m*x_i^2 + b*x_i) = 0
Next, let's tackle the next parameter, b:
dF/db = sum_{i=1..n} 2*(y_i - m*x_i - b)*(-1) = 0
We can again drop the factor of 2 and distribute the -1 throughout the expression:
sum_{i=1..n} (-y_i + m*x_i + b) = 0
Knowing that sum_{i=1..n} 1 is simply n, we can simplify the above to:
-sum_{i=1..n} y_i + m*sum_{i=1..n} x_i + n*b = 0
Now, we need to simultaneously solve for m and b with the above two equations. This will jointly minimize the cost function which finds the best line of fit for our data points.
Doing some re-arranging, we can isolate m and b on one side of the equations and the rest on the other side:
m*sum(x_i^2) + b*sum(x_i) = sum(x_i*y_i)
m*sum(x_i)   + b*n        = sum(y_i)
As you can see, we can formulate this as a 2 x 2 system of equations to solve for m and b. Specifically, let's re-arrange the two equations above into matrix form:
[sum(x_i^2)  sum(x_i)] [m]   [sum(x_i*y_i)]
[sum(x_i)    n       ] [b] = [sum(y_i)    ]
With regards to the above, we can treat the problem as solving a linear system Ax = b. All you have to do is solve for x, which is x = A^{-1}*b. To find the inverse of a 2 x 2 system, given the matrix:
A = [p q]
    [r s]
The inverse is simply:
A^{-1} = (1 / (p*s - q*r)) * [ s -q]
                             [-r  p]
Therefore, by substituting our quantities into the above equation, we solve for m and b in matrix form, and it simplifies to this:
[m]                   1                   [ n         -sum(x_i) ] [sum(x_i*y_i)]
[b] = ----------------------------------- [-sum(x_i)  sum(x_i^2)] [sum(y_i)    ]
          n*sum(x_i^2) - (sum(x_i))^2
Carrying out this multiplication and solving for m and b individually, this gives:
m = (sum(x_i)*sum(y_i) - n*sum(x_i*y_i)) / ((sum(x_i))^2 - n*sum(x_i^2))
b = (sum(x_i*y_i)*sum(x_i) - sum(y_i)*sum(x_i^2)) / ((sum(x_i))^2 - n*sum(x_i^2))
As such, to find the best slope and intercept to best fit your data, you need to calculate m and b using the above equations.
Given your data specified in the link in your comments, we can do this quite easily:
%// Define points
X = 1:5;
Y = [1 2 1.3 3.75 2.25];
%// Get total number of points
n = numel(X);
%// Define relevant quantities for computing the coefficients
sumxi = sum(X);
sumyi = sum(Y);
sumxiyi = sum(X.*Y);
sumxi2 = sum(X.^2);
sumyi2 = sum(Y.^2); %// not needed for m and b, but shown for completeness
%// Determine slope and intercept
m = (sumxi * sumyi - n*sumxiyi) / (sumxi^2 - n*sumxi2);
b = (sumxiyi * sumxi - sumyi * sumxi2) / (sumxi^2 - n*sumxi2);
%// Display them
disp([m b])
... and we get:
0.4250 0.7850
Therefore, the line of best fit that minimizes the error is:
y = 0.4250*x + 0.7850
However, if you want to use built-in MATLAB tools, you can use polyfit (credit goes to Luis Mendo for providing the hint). polyfit determines the line (or nth order polynomial curve rather...) of best fit by linear regression by minimizing the sum of squared errors between the best fit line and your data points. How you call the function is so:
coeff = polyfit(x,y,order);
x and y are the x and y points of your data while order determines the order of the line of best fit you want. As an example, order=1 means that the line is linear, order=2 means that the line is quadratic and so on. Essentially, polyfit fits a polynomial of order order given your data points. Given your problem, order=1. As such, given the data in the link, you would simply do:
X = 1:5;
Y = [1 2 1.3 3.75 2.25];
coeff = polyfit(X,Y,1)
coeff =
0.4250 0.7850
The way coeff works is that these are the coefficients of the regression line, starting from the highest order in decreasing value. As such, the above coeff variable means that the regression line was fitted as:
y = 0.4250*x + 0.7850
The first coefficient is the slope while the second coefficient is the intercept. You'll also see that this matches up with the link you provided.
If you want a visual representation, here's a plot of the data points as well as the regression line that best fits these points:
plot(X, Y, 'r.', X, polyval(coeff, X));
polyval takes an array of coefficients (usually produced by polyfit), and you provide a set of x co-ordinates and it calculates what the y values are given the values of x. Essentially, you are evaluating what the points are along the best fit line.
Edit - Extending to higher orders
If you want to extend so that you're finding the best fit for any nth order polynomial, I won't go into the details, but it boils down to constructing the following linear system. Given the relationship for the ith point (x_i, y_i):
y_i = a_0 + a_1*x_i + a_2*x_i^2 + ... + a_m*x_i^m + e_i
You would construct the following linear system:
[y_1]   [1 x_1 x_1^2 ... x_1^m] [a_0]   [e_1]
[y_2]   [1 x_2 x_2^2 ... x_2^m] [a_1]   [e_2]
[...] = [          ...        ] [...] + [...]
[y_n]   [1 x_n x_n^2 ... x_n^m] [a_m]   [e_n]
Basically, you would create a vector of points y, and you would construct a matrix X such that each column is your vector of points x raised to a given power. Specifically, the first column is the zero-th power, the second column is the first power, the third column is the second power, and so on. You would do this up until m, which is the order of polynomial you want. The vector e would be the residual error for each point in your set.
Specifically, the formulation of the problem can be written in matrix form as:
y = X*a + e
Once you construct this matrix, you would find the parameters by least squares by calculating the pseudo-inverse. How the pseudo-inverse is derived, you can read up on the Wikipedia article I linked to, but this is the basis for minimizing a system by least squares. The pseudo-inverse is the backbone behind least-squares minimization. Specifically:
a = (X^{T}*X)^{-1}*X^{T}*y
(X^{T}*X)^{-1}*X^{T} is the pseudo-inverse. X itself is a very popular matrix, known as the Vandermonde matrix, and MATLAB has a command called vander to help you compute it. A small note is that the matrix vander returns in MATLAB is in reverse order: the powers decrease from m-1 down to 0 across its columns. If you want to have this reversed, you'd need to call fliplr on that output matrix. Also, you will need to append one more column at the end of it, which is the vector with all of its elements raised to the m'th power.
I won't go into how you'd repeat your example for anything higher order than linear. I'm going to leave that to you as a learning exercise, but simply construct the vector y, the matrix X with vander, then find the parameters by applying the pseudo-inverse of X with the above to solve for your parameters.
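If you want a skeleton to start from, here is a minimal sketch of that recipe for a hypothetical order ord, building the design matrix column by column rather than through vander (assuming x and y are column vectors of your data):
ord = 2;                       % hypothetical polynomial order
Xmat = ones(numel(x), 1);      % column of x.^0
for p = 1:ord
    Xmat = [Xmat, x(:).^p];    % append columns x.^1, ..., x.^ord
end
coeffs = Xmat \ y(:);          % least-squares solution, lowest order first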
Good luck!

AX=0 and introducing error matrix [closed]

I have a matrix of 10 rows and 5 columns that I have called A.
I would like to make AX=0 and determine X which contains 5 unknown parameters.
What I did is
null(A)
But it seems that this treats what I did as an exact linear algebra problem.
I would like to introduce another matrix E which contains the errors, which would give me a priori:
AX + E = 0
because the results are not exact; still, I would like to find the closest parameters for the unknown vector X and get the error matrix.
Can you help me with that?
You can do an eigenvalue decomposition of A^T A.
[Q, D] = eig(A' * A)
Then take the eigenvectors that correspond to the smallest eigenvalues.
That is, after sorting the eigenvalues, take y columns of Q, those associated with the smallest eigenvalues.
y is the number of columns of X.
And then E = -AX
====edit=====
OK, actually you want to minimize E. But since E is a matrix, I assume you want to minimize the sum of squares of all the elements in E.
That is:
||E||_F^2 = ||-AX||_F^2 = trace(X^T(A^TA)X)
You can do an eigendecomposition of A^T A:
[Q, D] = eig(A' * A)
(A^T A) Q = Q D  =>  D = Q^T (A^T A) Q
To make the error as small as possible, X should consist of the columns of Q that correspond to the smallest eigenvalues on the diagonal of D.
If an eigenvalue equals 0, then AX can be made exactly 0 and E = 0.
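A minimal MATLAB sketch of this recipe (assuming A is your 10-by-5 matrix and you want a single best-fit parameter vector):
[Q, D] = eig(A' * A);       % eigendecomposition of the 5-by-5 matrix A^T A
[~, idx] = min(diag(D));    % position of the smallest eigenvalue
X = Q(:, idx);              % best unit-norm solution to A*X ~ 0
E = -A * X;                 % the resulting error vector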