I am trying to understand, from the linked MATLAB documentation on linear regression, how the coefficients beta0 and beta1 are computed for the relation y = beta0 + beta1*x.
I understand the first computation of beta1, which is essentially a simple least-squares regression with only one parameter to find (the slope coefficient).
In the "accidents" example, why do they append a column of ones to the x array to compute the two coefficients?
X = [ones(length(x),1) x];
b = X\y
Result:
b =
1.0e+02 *
1.427120171726537
0.000001256394274
What is the underlying calculation with this column of ones? Could anyone explain it to me?
This is more like a comment, but I am not allowed to post comments, so I am writing it as an answer.
They are adding the column of ones to make the problem suitable for matrix multiplication. You have y = beta0 + beta1*x. In matrix-multiplication form it can be written as y = [1 x] * [beta0 beta1]'. Please note the transpose sign on the beta matrix.
For reasons unknown to me, vectorization of variables is encouraged in MATLAB and R. As far as I know, vectorization is expected to reduce resource consumption.
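A minimal sketch of that matrix form, with made-up numbers (beta0 = 2, beta1 = 3):
% Toy data: y = 2 + 3*x, written as y = [1 x] * [beta0; beta1]
x = (1:5)';
beta = [2; 3];                 % [beta0; beta1]
X = [ones(length(x),1) x];     % the column of ones multiplies beta0
y = X * beta;
max(abs(y - (2 + 3*x)))        % 0, the two forms agree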
Ones are often added to introduce a "bias" term. In your case, try visualizing this equation:
y = w1 * x + c
The ones are added to represent another input, one that is always equal to one:
y = w1 * x1 + c * x2 (where x2 is always 1)
So, to model equations with constants (bias terms) in them, ones are added to the input.
Because in the equation y = beta0 + beta1 * x, beta0 is implicitly multiplied by 1.
Put another way, consider the i-th (x, y) pair:
y[i] = beta0 + beta1 * x[i]
= beta0 * 1 + beta1 * x[i]
That 1 multiplying beta0 for every i is where the ones vector comes from.
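To make the underlying calculation concrete, here is a small sketch with synthetic data (the accidents dataset itself is not reproduced here) showing that b = X\y is the least-squares solution, i.e. it agrees with the normal equations (X'*X)\(X'*y):
% Noisy line with made-up coefficients beta0 = 1.5, beta1 = 0.8
x = (1:10)';
y = 1.5 + 0.8*x + 0.1*randn(10,1);
X = [ones(length(x),1) x];
b_backslash = X\y;                 % QR-based least squares
b_normal    = (X'*X)\(X'*y);       % normal equations
max(abs(b_backslash - b_normal))   % tiny, the two agree up to round-off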
I have MATLAB code from my class in which the professor assigns each data point to the nearest cluster. In the code below, c is the centroids matrix and x is the data matrix:
% norm squared of the centroids
c2 = sum(c.^2, 1);
% For each data point x, compute min_j -2 * x' * c_j + c_j^2;
% note that this is implemented as a max, so the difference is negated.
tmpdiff = bsxfun(@minus, 2*x'*c, c2);
[val, labels] = max(tmpdiff, [], 2);
I am not sure how this is equivalent to the textbook definition of this step, in which the cluster assignment is done through
% for every data point x_i, minimizing over all centroids c_j
labels(i) = argmin_j ||x_i - c_j||^2
Can anyone explain how this works? Essentially, how is computing
min_j -2 * x' * c_j + c_j^2
equivalent to
argmin_j ||x_i - c_j||^2
If we have a triangle whose side lengths are a, b, c, then we know (from the law of cosines) that
a^2 = c^2 + b^2 - 2*b*c*cos(alpha)
where alpha is the angle between the side of length b and the side of length c.
Now, consider the triangle formed by the three vertices x, c_j and O (the origin of R^n). Writing theta for the angle between x and c_j, we have
argmin_j||x-c_j||^2
=argmin_j (||x||^2+||c_j||^2 - 2*||x||* ||c_j|| * cos(theta) )
which, since ||x|| * ||c_j|| * cos(theta) = x^t * c_j, is equal to
argmin_j ( ||x||^2 + ||c_j||^2 - 2 * x^t * c_j )
Now, remember that x is fixed in this minimization, so we can drop the ||x||^2 term and the expression above reduces to
argmin_j(||c_j||^2 - 2 x^t c_j)
which is the equation you minimize in your code.
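As a quick numerical check (with made-up sizes), the vectorized max and the brute-force argmin produce the same labels:
d = 3; N = 7; K = 4;
x = randn(d,N);                         % data points as columns
c = randn(d,K);                         % centroids as columns
% vectorized version from the question
c2 = sum(c.^2, 1);
tmpdiff = bsxfun(@minus, 2*x'*c, c2);
[~, labels_fast] = max(tmpdiff, [], 2);
% brute-force argmin over squared distances
labels_slow = zeros(N,1);
for i = 1:N
    dist2 = sum(bsxfun(@minus, c, x(:,i)).^2, 1);  % ||x_i - c_j||^2 for all j
    [~, labels_slow(i)] = min(dist2);
end
isequal(labels_fast, labels_slow)       % should be true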
I have an equation like this:
dy/dx = a(x)*y + b
where a(x) is non-constant (a(x) = 1/x) and b is a vector (10,000 rows).
How can I solve this equation?
Let me assume you would like to write a generic numerical solver for dy/dx = a(x)*y + b. You can pass the function a(x) into the right-hand-side function of one of the ODE solvers by capturing it in an anonymous function, e.g.
a = @(x) 1/x;
xdomain = [1 10];
b = rand(10000,1);
y0 = ones(10000,1);
[x,y] = ode45(@(x,y) a(x)*y + b, xdomain, y0);
plot(x,y)
Here, I've specified the domain of x as xdomain, and the initial value of y at the lower limit of x as y0.
Following up on my comments: you can solve this without MATLAB. Assuming x is non-zero, you can use an integrating factor to get a 10,000-by-1 solution y(x)
y_i(x) = b_i*x*ln(x) + c_i*x
with 10000-by-1 vector of constants c, where y_i(x), b_i and c_i are the i-th entries of y(x), b and c respectively. The constant vector c can be determined at some point x0 as
c_i = y_i(x0)/x0 - b_i*ln(x0)
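A quick numerical check (using a made-up size of 5 instead of 10,000, and x0 = 1) that this closed form agrees with ode45:
m  = 5;                          % 5 instead of 10000 rows, for speed
b  = rand(m,1);
y0 = ones(m,1);
x0 = 1;
c  = y0/x0 - b*log(x0);          % constants from the initial condition
[x,Y]  = ode45(@(x,y) y/x + b, [1 10], y0);
Yexact = x.*log(x)*b' + x*c';    % rows: x values, columns: components of y
max(abs(Y(:) - Yexact(:)))       % small, on the order of ode45's tolerance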
The GMRES algorithm and its MATLAB implementation are meant to solve linear systems of equations, such as
%Ax = b
A = rand(4);
b = rand(4,1);
x = gmres(A,b);
One can also use a function handle
foo = @(x) A*x + conj(A)*5*x;
y = gmres(foo,b);
What I want is to solve the following
B = rand(4);
H = rand(4);
foo2 = @(B) H*B + B*H;
X = gmres(foo2, B) %Will not run!
--Error using gmres (line 94)
--Right hand side must be a column vector of length 30 to match the coefficient matrix.
Mathematically speaking I don't see why gmres couldn't apply to this problem as well.
Note: What I'm really trying to solve is an implicit Euler step for the PDE dB/dt = B_xx + B_yy, so H is in fact a second-derivative matrix built by finite differences.
Thank you
Amir
If I've understood right, you want to use GMRES to solve a Sylvester equation
A*X + X*A = C
for n-by-n matrices A, X and C.
(I asked a related question yesterday over at SciComp and got this great answer.)
To use GMRES you can express this matrix-matrix equation as a matrix-vector equation of size n^2. For convenience we can use the Kronecker product, implemented in MATLAB as kron:
A = randn(5);
X = randi(3,[5 5]);
C = A*X + X*A;
% Use the Kronecker product to form the n^2-by-n^2 matrix
% that represents the map X -> A*X + X*A on vec(X)
bigA = (kron(eye(5),A) + kron(A.',eye(5)));
% Quick check that we're getting the same answer
norm(bigA*X(:) - C(:))
% Use GMRES to calculate X from A and C.
vec_X_gmres = gmres(bigA,C(:));
X_gmres = reshape(vec_X_gmres,5,5);
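As a sanity check (continuing the snippet above), the recovered X should match the original up to the GMRES tolerance:
norm(X_gmres - X, 'fro')   % should be near gmres's default tolerance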
We are given a D-dimensional tensor, represented as a vector of size n^D.
The vector represents a D-dimensional distribution of a random variable X in {1,...,n}^D. That is, the (i_1, i_2, ..., i_D) entry of the tensor is the probability that X_1 = i_1, X_2 = i_2, ..., X_D = i_D.
I need to compute, for each dimension d and each value i in [n], the marginal probability P(X_d = i).
That is, each value P(X_d = i) is a sum of n^(D-1) entries of the vector.
For example, if D=2 and n=4, we have a vector x of size (16,1) and the probability of the first dimension being equal to 1 is
P(X_1 = 1) = x(1) + x(2) + x(3) + x(4)
The probability of the second dimension being equal to 3 is
P(X_2 = 3) = x(3) + x(7) + x(11) + x(15)
I'm writing Matlab code that needs to compute these marginal distributions, but I'm not familiar enough with Matlab to do it in a simple way (it is doable using some ugly recursion, but there has to be a better option).
To calculate P(X_k=z) for a D-dimensional matrix you can use
xD = reshape(x, n*ones(1,D));
B = permute(xD, [k setdiff(1:D, k)]);
P = sum(B(z,:));
It first reshapes x into a D-dimensional array, then brings the dimension of interest k to the front, selects the z-th slice along it, and sums over the remaining elements.
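One caveat worth checking against the question's D = 2, n = 4 example: reshape is column-major (the first index varies fastest), while the question enumerates entries with the last index varying fastest, so the question's X_2 corresponds to MATLAB's dimension 1. Under that reading (my assumption), the code reproduces the hand-computed sum:
n = 4; D = 2;
x = (1:n^D)'; x = x/sum(x);            % a toy distribution
k = 1; z = 3;                          % question's X_2 = 3 maps to MATLAB dim 1
xD = reshape(x, n*ones(1,D));
B = permute(xD, [k setdiff(1:D, k)]);
P = sum(B(z,:));
abs(P - (x(3)+x(7)+x(11)+x(15)))       % 0 up to round-off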
Mohsen Nosratinia's answer would be my first option. As an alternative, it can be done without reshaping or permuting dimensions, which can result in faster code:
k = 2; %// chosen dimension
z = 3; %// chosen value (along d-th dimension)
result = sum(x(mod(floor((0:end-1)/n^(k-1)), n)==z-1));
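If you need the whole marginal vector P(X_k = 1), ..., P(X_k = n) at once rather than a single value, a sketch of one more alternative (same column-major convention as above) is to sum out every other dimension of the reshaped array:
xD = reshape(x, n*ones(1,D));
marg = xD;
for d = setdiff(1:D, k)
    marg = sum(marg, d);     % sum out every dimension except k
end
marg = marg(:);              % n-by-1 marginal along dimension k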
I have an equation of the type c = Ax + By, where c, x and y are vectors of size 50,000-by-1, and A and B are matrices of size 50,000-by-50,000.
Is there any way in Matlab to find matrices A and B when c, x and y are known?
I have about 100,000 samples of c, x, and y. A and B remain the same for all.
Let X be the collection of all 100,000 x vectors you got, such that the i-th column of X equals the i-th vector x_i.
In the same manner we can define Y and C as 2D collections of ys and cs respectively.
What you wish to solve is for A and B such that
C = AX + BY
You have 2 * 50,000^2 unknowns (all the entries of A and B) and numel(C) equations.
So, if you have exactly 100,000 data vectors, you have a single solution (barring linearly dependent samples). If you have more than 100,000 samples, you can seek a least-squares solution.
Re-writing:
C = [A B] * [X ; Y] ==> [X' Y'] * [A';B'] = C'
So, I suppose
[A' ; B'] = pinv( [X' Y'] ) * C'
In matlab:
ABt = pinv( [X' Y'] ) * C';
A = ABt(1:50000,:)';
B = ABt(50001:end,:)';
Correct me if I'm wrong...
EDIT:
It seems like there is quite a fuss around dimensionality here. So, I'll try and make it as clear as possible.
Model: There are two (unknown) matrices A and B, each of size 50,000x50,000 (total 5e9 unknowns).
An observation is a triplet of vectors: (x,y,c) each such vector has 50,000 elements (total of 150,000 observed points at each sample). The underlying model assumption is that an observation is generated by c = Ax + By in this model.
The task: given n observations (that is n triplets of vectors { (x_i, y_i, c_i) }_i=1..n) the task is to uncover A and B.
Now, each sample (x_i, y_i, c_i) induces 50,000 equations of the form c_i = A*x_i + B*y_i in the unknowns A and B. If the number of samples n is greater than 100,000, then there are more than 50,000 * 100,000 (> 5e9) equations and the system is over-constrained.
To write the system in a matrix form I proposed to stack all observations into matrices:
A matrix X of size 50,000-by-n whose i-th column equals the observed x_i
A matrix Y of size 50,000-by-n whose i-th column equals the observed y_i
A matrix C of size 50,000-by-n whose i-th column equals the observed c_i
With these matrices we can write the model as:
C = A*X + B*Y
I hope this clears things up a bit.
Thank you @Dan and @woodchips for your interest and enlightening comments.
EDIT (2):
I submitted the following code to Octave. In this example, instead of dimension 50,000 I work with only 2, and instead of n = 100,000 observations I settled for n = 100:
n = 100;
A = rand(2,2);
B = rand(2,2);
X = rand(2,n);
Y = rand(2,n);
C = A*X + B*Y + .001*randn(size(X)); % adding noise to observations
ABt = pinv( [ X' Y'] ) * C';
Checking the difference between the ground-truth model (A and B) and the recovered ABt:
ABt - [A' ; B']
Yields
ans =
5.8457e-05 3.0483e-04
1.1023e-04 6.1842e-05
-1.2277e-04 -3.2866e-04
-3.1930e-05 -5.2149e-05
which is close enough to zero (remember, the observations were noisy and the solution is a least-squares one).
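To finish the toy example, the individual matrices can be read off ABt the same way as in the 50,000-dimensional case (here the split is at row 2):
A_rec = ABt(1:2,:)';    % estimate of A
B_rec = ABt(3:4,:)';    % estimate of B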