Error using * Inner matrix dimensions must agree in using Least Squares - how to make the regressor array for multiple independent variables - matlab

I am trying to learn how to code linear regression. The data in statistics_data holds the yeast growth year in the first column, the value of a chemical component in the second column and the population in the third column. Once theta is calculated using the least squares formulation, I want to predict the population using:
pred_year = 2020;
pred_year_val = [1 2020];
which gives this error:
Error using *
Inner matrix dimensions must agree.
Error in main_normal_equation (line 44)
pred_value = pred_year_val * theta;
Below is the code:
statistics_data = [2007, 2, 9182927;
2008,3,9256347;
2009,3.5,9340682;
2010,4,9415570;
2011,5,9482855;
2012,4.8,9555893;
2013,4.9,9644864;
2014,5,9747355;
2015,5,9851017;
2016,5,9995153;
2017,5,10120242;];
% Convert to independent variable matrix and response
X = (statistics_data(:,1:2));
y = (statistics_data(:,3));
% Convert matrix values to double
X = double(X);
y = double(y);
hold on;
% Set the x-axis label
xlabel('Year');
% Set the y-axis label
ylabel('Population');
% Plot population data
plot(X, y, 'rx', 'MarkerSize', 10);
m = length(y);
% Add ones column
X = [ones(m, 1) X];
% Normal Equation
theta = (pinv(X'*X))*X'*y
% Predict population for 2020
pred_year = 2020;
pred_year_val = [1 2020];
% Calculate predicted value
pred_value = pred_year_val * theta;
% Plot linear regression line
plot(X(:,2), X*theta, '-')
fprintf('Predicted population in 2020 is %d people\n ', int64(pred_value));

In MATLAB, the * operator performs matrix multiplication. Matrix multiplication has strict rules about the dimensions of its operands: the number of columns of the left operand must equal the number of rows of the right operand.
Inspecting your code, it does not seem that your intent is to do a matrix multiply.
You can multiply a matrix by a scalar using *, which scales each value in the matrix accordingly.
You can also multiply element by element (array multiplication) using the .* operator.
To resolve your issue you must clarify whether you intended a matrix multiplication, a scalar multiplication, or an element-by-element multiplication. Then you must set your operands and operator to reflect what you aim to achieve.
It isn't clear to me exactly how the math in your code is supposed to be executed; otherwise I could show you where your operators and operands must be changed.
You could start by reviewing the documentation here: https://www.mathworks.com/help/matlab/matlab_prog/array-vs-matrix-operations.html
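As a quick illustration of the three operations mentioned above, here is a minimal sketch. The matrix A and the vector v are made-up values, not taken from the question.

```matlab
A = [1 2; 3 4];   % example 2x2 matrix (made-up values)
v = [1 2];        % example 1x2 row vector (made-up values)

s = 2 * A;        % scalar times matrix: every element is doubled
e = A .* A;       % element-by-element product: sizes must match
p = v * A;        % matrix product: 1x2 times 2x2 gives 1x2

% A * v would fail, because A has 2 columns but v has only 1 row.
```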

So pred_year_val has size [1 2] while theta has size [3 1] (one coefficient for the ones column plus one for each of the two regressors). Matrix multiplication requires the number of columns of the left operand to equal the number of rows of the right operand. Here pred_year_val has 2 columns but theta has 3 rows, so the product is undefined, i.e. the execution of
pred_value = pred_year_val * theta;
is bound to fail. So it seems like you need to add a value for the chemical component to pred_year_val.
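A minimal sketch of that fix, using a few rows of the question's data. The chemical value assumed for 2020 (assumed_chem) is hypothetical and only there so the prediction row has three entries; the backslash operator is swapped in for the pinv-based normal equation as another way to get the least squares solution.

```matlab
% Subset of the question's data: year, chemical component, population
statistics_data = [2008, 3,   9256347;
                   2009, 3.5, 9340682;
                   2010, 4,   9415570;
                   2011, 5,   9482855;
                   2012, 4.8, 9555893;
                   2013, 4.9, 9644864];

m = size(statistics_data, 1);
X = [ones(m, 1), statistics_data(:, 1:2)];   % m-by-3 regressor matrix
y = statistics_data(:, 3);

theta = X \ y;                 % least squares solution, size 3-by-1

assumed_chem = 5;              % hypothetical chemical value for 2020
pred_year_val = [1, 2020, assumed_chem];     % 1-by-3, matches theta

pred_value = pred_year_val * theta;          % 1x3 times 3x1 gives a scalar
```

The prediction row must list the regressors in the same order as the columns of X.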

Related

How do I write correlation coefficient manually in matlab?

The following is a function that takes two equal sized vectors X and Y, and is supposed to return a vector containing single correlation coefficients for image correspondence. The function is supposed to work similarly to the built in corr(X,Y) function in matlab if given two equal sized vectors. Right now my code is producing a vector containing multiple two-number vectors instead of a vector containing single numbers. How do I fix this?
function result = myCorr(X, Y)
meanX = mean(X);
meanY = mean(Y);
stdX = std(X);
stdY = std(Y);
for i = 1:1:length(X),
X(i) = (X(i) - meanX)/stdX;
Y(i) = (Y(i) - meanY)/stdY;
mult = X(i) * Y(i);
end
result = sum(mult)/(length(X)-1);
end
Edit: To clarify, I want myCorr(X,Y) above to produce the same output as MATLAB's corr(X,Y) when given equal sized vectors of image intensity values.
Edit 2: Now the format of the output vector is correct, however the values are off by a lot.
I recommend you use r=corrcoef(X,Y). It returns the normalized correlation coefficients in a 2x2 matrix, and you can just return the r(2,1) entry as your answer. For row vectors X and Y, doing this is equivalent to
r=(X-mean(X))*(Y-mean(Y))'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
However, if you really want to do what you mentioned in the question you can also do
r=(X)*(Y)'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
Note that this second expression only matches the first when at least one of X and Y has zero mean, because its numerator is not centered.
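For comparison, here is a minimal sketch of the standardize-then-average approach from the question, written without the loop and checked against corrcoef. The sample vectors are made up for illustration.

```matlab
X = [1 2 3 4 5];              % made-up sample data
Y = [2.1 3.9 6.2 8.1 9.8];    % made-up sample data

% Standardize both vectors (std normalizes by N-1 by default)
zx = (X - mean(X)) / std(X);
zy = (Y - mean(Y)) / std(Y);

% Sum ALL element-wise products, then divide by N-1
r_manual = sum(zx .* zy) / (numel(X) - 1);

% Reference value from the built-in
R = corrcoef(X, Y);
r_builtin = R(2, 1);
```

The loop in the question overwrites mult on every iteration, so only the last product reaches the sum. Keeping all products, as zx .* zy does here, avoids that.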

Numerical derivative of a vector

I have a problem with numerical derivative of a vector that is x: Nx1 with respect to another vector t (time) that is the same size of x.
I do the following (x is chosen to be sine function as an example):
t=t0:ts:tf;
x=sin(t);
xd=diff(x)/ts;
but the answer xd is (N-1)x1 and I figured out that it does not compute derivative corresponding to the first element of x.
Is there any other way to compute this derivative?
You are looking for the numerical gradient I assume.
t0 = 0;
ts = pi/10;
tf = 2*pi;
t = t0:ts:tf;
x = sin(t);
dx = gradient(x)/ts
The purpose of this function is a different one (vector fields), but it offers what diff doesn't: input and output vector of equal length.
gradient calculates the central difference between data points. For an
array, matrix, or vector with N values in each row, the ith value is
defined by (A(i+1) - A(i-1))/2 for unit spacing.
The gradient at the end points, where i=1 and i=N, is calculated with
a single-sided difference between the endpoint value and the next
adjacent value within the row. If two or more outputs are specified,
gradient also calculates central differences along other dimensions.
Unlike the diff function, gradient returns an array with the same
number of elements as the input.
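A short sketch contrasting the two functions on the sine example. Passing the step ts as the second argument of gradient is equivalent to dividing by ts afterwards.

```matlab
ts = pi/10;
t  = 0:ts:2*pi;
x  = sin(t);

xd_diff = diff(x) / ts;      % forward differences, N-1 elements
xd_grad = gradient(x, ts);   % central differences inside, one-sided at the ends, N elements

% Largest deviation from the exact derivative cos(t)
err_grad = max(abs(xd_grad - cos(t)));
```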
I know I'm a little late to the game here, but you can also get an approximation of the numerical derivative by taking the derivative of the piecewise polynomial (cubic) spline that runs through your data:
function dy = splineDerivative(x,y)
% the spline has continuous first and second derivatives
pp = spline(x,y); % could also use pp = pchip(x,y);
[breaks,coefs,K,r,d] = unmkpp(pp);
% pre-allocate the coefficient vector
dCoeff = zeros(K,r-1);
% Columns are ordered from highest to lowest power. Both spline and pchip
% return n-by-4 coefficient matrices (one row per piece), ordered from 3rd
% to zeroth power. (Thanks to the anonymous person who suggested this edit).
dCoeff(:, 1) = 3 * coefs(:, 1); % d(ax^3)/dx = 3ax^2;
dCoeff(:, 2) = 2 * coefs(:, 2); % d(ax^2)/dx = 2ax;
dCoeff(:, 3) = 1 * coefs(:, 3); % d(ax^1)/dx = a;
dpp = mkpp(breaks,dCoeff,d);
dy = ppval(dpp,x);
The spline polynomial is always guaranteed to have continuous first and second derivatives at each point. I have not tested and compared this against using pchip instead of spline, but that might be another option as it too has continuous first derivatives (but not second derivatives) at every point.
The advantage of this is that there is no requirement that the step size be even.
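To illustrate the uneven step size, here is a minimal script-style sketch of the same idea. The sample points in t are made up, and the coefficient scaling is written in one line instead of column by column.

```matlab
% Unevenly spaced sample points (made-up values)
t = [0 0.1 0.25 0.4 0.7 0.9 1.2 1.4 1.7 2.0];
x = sin(t);

pp = spline(t, x);                                   % cubic spline through the data
[breaks, coefs, nPieces, order, dim] = unmkpp(pp);   % coefs is nPieces-by-4

% Differentiate each cubic piece: multiply by the powers 3, 2, 1
% and drop the constant term (last column)
powers = repmat(order-1:-1:1, nPieces, 1);
dCoefs = coefs(:, 1:order-1) .* powers;

dpp = mkpp(breaks, dCoefs, dim);                     % piecewise quadratic derivative
dx  = ppval(dpp, t);                                 % same length as t

err = max(abs(dx - cos(t)));                         % compare with exact derivative
```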
There are some options to work around your issue.
First: you can make your domain larger. Instead of N, use N+1 grid points.
Second: depending on the end point of interest, you can use
Forward difference: (F(x + dx) - F(x)) / dx
Backward difference: (F(x) - F(x - dx)) / dx
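A minimal sketch of the second option on the sine example: diff already gives the forward differences for the first N-1 points, so one backward difference at the last point brings the result to N elements.

```matlab
ts = pi/10;
t  = 0:ts:2*pi;
x  = sin(t);

fwd  = diff(x) / ts;                    % forward differences at t(1) ... t(N-1)
last = (x(end) - x(end-1)) / ts;        % backward difference at t(N)

xd = [fwd, last];                       % N elements, same size as x
```

With a uniform step, the backward difference at the last point equals the forward difference at the point before it, so this only pads the vector; it adds no new information.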

Solving a Second Order Differential with Matrix input

I am trying to solve a second order differential equation using ode45 in MATLAB with matrices as inputs. I am stuck with a couple of errors, including:
"In an assignment A(I) = B, the number of elements in B and
I must be the same."
Double order differential equations given below:
dy(1)= diag(ones(1,100) - 0.5*y(2))*Co;
dy(2)= -1 * Laplacian(y(1)) * y(2);
Main function call is:
[T,Y] = ode45(@rigid,[0.000 100.000],[Co Xo]);
Here, Co is Matrix of size 100x100 and Xo is a column matrix of size 100x1. Laplacian is a pre-defined function to compute matrix laplacian.
I will appreciate any help in this. Should I reshape input matrices and vectors to fall in same dimensions or something?
Your guess is correct. The MATLAB ode suite can only solve vector-valued ODEs, i.e. an ODE of the form y'=f(t,y) where y is a column vector. In your case you should convert y and dy back and forth between a matrix and a column vector by using reshape.
To be more precise, the initial condition will be transformed into the array
y0 = reshape([Co Xo], 100*101, 1);
while y will be obtained with
y_matrix = reshape(y, 100, 101);
y1 = y_matrix(:,1:100);
y2 = y_matrix(:,101);
After having computed the matrices dy1 and dy2 you will have to convert them into a column vector with
dy = reshape([dy1 dy2], 100*101, 1);
Aside from the limitations of ode45, your code gives that error because, in MATLAB, matrices are not indexed in that way: a single index such as y(1) refers to one element, not to a whole block. In fact, if you define A = magic(5), A(11) gives the eleventh element of A in column-major order, i.e. 1.
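The pack and unpack pattern can be sketched on a small 3x3 case. Everything specific here is an assumption for illustration: the size n, the initial values C0 and x0, the time span, and in particular lap, which is only a hypothetical stand-in for the asker's Laplacian function. Note that [C0(:); x0] gives the same vector as reshape([C0 x0], n*(n+1), 1).

```matlab
n  = 3;                                  % small size for illustration
C0 = [0 1 0; 1 0 1; 0 1 0];              % assumed initial matrix
x0 = ones(n, 1);                         % assumed initial vector

% Hypothetical stand-in for the asker's Laplacian function
lap = @(C) diag(sum(C, 2)) - C;

% Helpers that unpack the state vector into its matrix and vector parts
unpackC = @(y) reshape(y(1:n*n), n, n);
unpackX = @(y) y(n*n+1:end);

% Right-hand side: compute both derivatives, then pack them into one column
rhs = @(t, y) [reshape(diag(1 - 0.5*unpackX(y)) * C0, n*n, 1); ...
               -lap(unpackC(y)) * unpackX(y)];

y0 = [C0(:); x0];                        % packed initial condition
[T, Y] = ode45(rhs, [0 1], y0);

% Each row of Y is one packed state; unpack the final one
C_end = reshape(Y(end, 1:n*n), n, n);
x_end = Y(end, n*n+1:end).';
```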

Matlab Vectorization of Multivariate Gaussian Basis Functions

I have the following code for calculating the result of a linear combination of Gaussian functions. What I'd really like to do is to vectorize this somehow so that it's far more performant in Matlab.
Note that y is a column vector (output), x is a matrix where each column corresponds to a data point and each row corresponds to a dimension (i.e. 2 rows = 2D), variance is a double, gaussians is a matrix where each column is a vector corresponding to the mean point of the gaussian and weights is a row vector of the weights in front of each gaussian. Note that the length of weights is 1 bigger than gaussians as weights(1) is the 0th order weight.
function [ y ] = CalcPrediction( gaussians, variance, weights, x )
basisFunctions = size(gaussians, 2);
xvalues = size(x, 2);
if length(weights) ~= basisFunctions + 1
ME = MException('TRAIN:CALC', 'The number of weights should be equal to the number of basis functions plus one');
throw(ME);
end
y = weights(1) * ones(xvalues, 1);
for xIdx = 1:xvalues
for i = 1:basisFunctions
diff = x(:, xIdx) - gaussians(:, i);
y(xIdx) = y(xIdx) + weights(i+1) * exp(-(diff')*diff/(2*variance));
end
end
end
You can see that at the moment I simply iterate over the x vectors and then the gaussians inside 2 for loops. I'm hoping that this can be improved - I've looked at meshgrid but that seems to only apply to vectors (and I have matrices)
Thanks.
Try this
diffx = bsxfun(@minus,x,permute(gaussians,[1,3,2])); % binary operation with singleton expansion
diffx2 = squeeze(sum(diffx.^2,1)); % squared distances, shape is now [XVALUES,BASISFUNCTIONS]
weight_col = weights(:); % make sure weights is a column vector
y = weight_col(1) + exp(-diffx2/2/variance)*weight_col(2:end); % a column vector of length XVALUES
Note, I changed diff to diffx since diff is a builtin. I'm not sure this will improve performance, as the cost of allocating the intermediate arrays may offset the gain from vectorization.
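A small check that a vectorized version reproduces the double loop from the question, including the constant term weights(1). All input values are made up, and reshape is used instead of squeeze so the shape also holds when there is only one data point.

```matlab
gaussians = [0 1 2; 0 1 0];          % 2-D means of 3 basis functions (made up)
variance  = 0.5;
weights   = [0.1 1 -2 0.5];          % weights(1) is the 0th order weight
x         = [0 0.5 1 1.5; 0 0.5 1 0];% 4 data points, one per column (made up)

nX = size(x, 2);
nG = size(gaussians, 2);

% Reference: the double loop from the question
y_loop = weights(1) * ones(nX, 1);
for k = 1:nX
    for i = 1:nG
        d = x(:, k) - gaussians(:, i);
        y_loop(k) = y_loop(k) + weights(i+1) * exp(-(d' * d) / (2 * variance));
    end
end

% Vectorized: D-by-nX-by-nG differences, then squared distances as nX-by-nG
diffx  = bsxfun(@minus, x, permute(gaussians, [1 3 2]));
sqDist = reshape(sum(diffx.^2, 1), nX, nG);
y_vec  = weights(1) + exp(-sqDist / (2 * variance)) * weights(2:end).';
```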

Matlab disregarding NaN's in matrix

I have a matrix (X) of doubles containing time series. Some of the observations are set to NaN when there is a missing value. I want to calculate the standard deviation per column to get a std dev value for each column. Since I have NaNs mixed in, a simple std(X) will not work, and if I try std(X(~isnan(X))) I end up getting the std dev for the entire matrix instead of one per column.
Is there a way to simply omit the NaNs from std dev calculations along the 1st dim without resorting to looping?
Please note that I only want to ignore individual values as opposed to entire rows or cols in case of NaNs. Obviously I cannot set NaNs to zero or any other value as that would impact calculations.
Have a look at nanstd (stat toolbox).
The idea is to center the data using nanmean, then to replace NaN with zero, and finally to compute the standard deviation.
See nanmean below.
% maximum admissible fraction of missing values
max_miss = 0.6;
[m,n] = size(x);
% replace NaNs with zeros.
inan = find(isnan(x));
x(inan) = zeros(size(inan));
% determine number of available observations on each variable
[i,j] = ind2sub([m,n], inan); % subscripts of missing entries
nans = sparse(i,j,1,m,n); % indicator matrix for missing values
nobs = m - sum(nans);
% set nobs to NaN when there are too few entries to form robust average
minobs = m * (1 - max_miss);
k = find(nobs < minobs);
nobs(k) = NaN;
mx = sum(x) ./ nobs;
See nanstd below.
flag = 1; % default: normalize by nobs-1
% center data
xc = x - repmat(mx, m, 1);
% replace NaNs with zeros in centered data matrix
xc(inan) = zeros(size(inan));
% standard deviation
sx = sqrt(sum(conj(xc).*xc) ./ (nobs-flag));
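The same idea in a compact form, as a minimal sketch for real-valued data without the max_miss threshold. The matrix X is made up, and the result is checked against a per-column std over the non-NaN entries. Recent MATLAB releases also accept std(X, 0, 1, 'omitnan'), which should give the same per-column result without the toolbox.

```matlab
X = [ 1   2  NaN;
      4  NaN  6;
      7   8   9;
     10  11  12];                      % made-up data with missing values

mask = isnan(X);
nobs = sum(~mask, 1);                  % available observations per column

Xz = X;
Xz(mask) = 0;                          % zeros do not contribute to the sum
mx = sum(Xz, 1) ./ nobs;               % column means ignoring NaNs

Xc = X - repmat(mx, size(X, 1), 1);    % center the data
Xc(mask) = 0;                          % missing entries add nothing to the variance
sx = sqrt(sum(Xc.^2, 1) ./ (nobs - 1));

% Reference: std of the non-NaN entries, one column at a time
sx_ref = arrayfun(@(j) std(X(~mask(:, j), j)), 1:size(X, 2));
```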