I recently came across the Matlab smooth function used as follows:
ans = smooth(x, y, span, 'moving');
The Matlab documentation states
yy = smooth(x,y,...) additionally specifies x data. If x is not provided, methods that require x data assume x = 1:length(y). You should specify x data when it is not uniformly spaced or sorted. If x is not uniform and you do not specify method, lowess is used. If the smoothing method requires x to be sorted, the sorting occurs automatically.
However, I am unclear as to what this actually means for the 'moving' average case. Is x an index for the y data, and if so how do non-integer values of x affect the 'moving' average of y?
Moving average means each value of the yy (or ans in your case) is an average of the n closest points. https://en.wikipedia.org/wiki/Moving_average
There are several methods to calculate it - it depends on WHICH POINTS we will use. For example:
( (i-1) + (i-2) + ... + (i-n) )/n;
where n - is a span or linear filtration level.
This mean that first three points can't be calculated (there are no data for it). And sometimes result must be shifted (because really - average of the first 4 points not really corresponds to 4th elements).
So Matlab use another method:
yy(1) = y(1)
yy(2) = (y(1) + y(2) + y(3))/3
yy(3) = (y(1) + y(2) + y(3) + y(4) + y(5))/5
yy(4) = (y(2) + y(3) + y(4) + y(5) + y(6))/5
...
It's more useful.
About x and y - it is usual 2d-data: each x corresponds to each y. you can avoid setting of x, then matlab will use [1, 2, 3, ..., length(y)] sequence for this. But if you have some non-uniformly distributed data you have to set it for getting right result. So if you have non-integer values it will works correct, scaling it for x axis. Here the easiest example from my head:
Say you have data y corrupted by noise and let's assume y = [2.1, 3.2, 1.7, 4.5, 5.8, 6.9]. Let's say that you have decided to use moving average of 3 window filter to smooth y.
smoothedY1 = (2.1 + 3.2 + 1.7)/3 = 2.3333
smoothedY2 = (3.2 + 1.7 + 4.5)/3 = 3.1333
smoothedY3 = (1.7 + 4.5 + 5.8)/3 = 4.0000
smoothedY3 = (4.5 + 5.8 + 6.9)/3 = 5.7333
Pay attention to the way corrupted data is being shifted to the left by one element per iteration. Now let's use smooth() in Matlab.
y = [2.1, 3.2, 1.7, 4.5, 5.8, 6.9];
smooth(y, 3, 'moving')
The above script yields the following result
ans =
2.1000
2.3333 <----
3.1333 | (smoothed data)
4.0000 |
5.7333 <----
6.9000
To answer your original question, the "x" data is just used for sorting, but is otherwise ignored when the method is 'moving':
>> x = rand(10, 1);
>> y = (1:10)' + 0.1*randn(10,1);
>> isequal(smooth(x,y,'moving'), smooth(y,'moving'))
ans =
0
>> z = sortrows([x y], 1);
>> isequal(smooth(z(:,1),z(:,2),'moving'), smooth(z(:,2),'moving'))
ans =
1
The "x" values aren't actually taken into account for the averaging, they are just used to sort "y" by increasing index.
Related
I have a matrix DirModel of dimension 2x2x29x1739. I want to add 360 to all negative values in this matrix, but the code I use doesn't keep up the dimensions of this matrix, it makes it into an array:
Neg=DirModel<0;
DirModel2=DirModel(Neg)+360;
I found several combinations of this online but non of them seem to keep up the dimensions. I would like to do it without using loops.
Any answer is appreciated!
You could boil it all down to a one-liner by making use of logical indices:
DirModel(DirModel<0) = DirModel(DirModel<0) + 360
How about the following code?
>> DirModel = rand(2, 2, 29, 1739) - 0.5;
>> Neg = (DirModel < 0);
>> DirModel2 = DirModel;
>> DirModel2(Neg) = DirModel2(Neg) + 360;
>> DirModel(:, :, 1, 1)
ans =
0.169128 -0.180931
0.055867 0.339892
>> DirModel2(:, :, 1, 1)
ans =
0.169128 359.819069
0.055867 0.339892
Let's check what's happening:
>> Neg = (DirModel < 0);
Neg is a 2 x 2 x 29 x 1739 logical matrix, in which 1s indicate indices where DirModel has negative values.
>> DirModel2 = DirModel;
This assignment ensures, that all values are copied and that the matrix dimensions are preserved.
>> DirModel2(Neg) = DirModel2(Neg) + 360;
Only add 360 to those matrix elements, whose value has been negative in the original DirModel.
Your assignment
DirModel2=DirModel(Neg)+360
initiates the new DirModel2 matrix directly, and only addresses the negative values of DirModel. Check the total number of elements of your resulting DirModel2! Therefore, MATLAB has no reason to preserve the matrix dimensions.
As a one-line answer, I suggest DirModel2 = mod(DirModel,360) + ~mod(DirModel,360)*360;
mod(DirModel,360) alone will add 360 to any negative number but it will also bring 360 to 0. To avoid this, we add 360 if the result of mod(DirModel,360) is 0 (i.e. ~mod(DirModel,360) is 1).
With new information from OP, initial data being between -180 and 180, and no problem if 0 is changed to 360, I simply recommand
DirModel2 = mod(DirModel,360);
So, I need to vectorize some for loops into a single line. I understand how vectorize one and two for-loops, but am really struggling to do more than that. Essentially, I am computing a "blur" matrix M2 of size (n-2)x(m-2) of an original matrix M of size nxm, where s = size(M):
for x = 0:1
for y = 0:1
m = zeros(1, 9);
k = 1;
for i = 1:(s(1) - 1)
for j = 1:(s(2) - 1)
m(1, k) = M(i+x,j+y);
k = k+1;
end
end
M2(x+1,y+1) = mean(m);
end
end
This is the closest I've gotten:
for x=0:1
for y=0:1
M2(x+1, y+1) = mean(mean(M((x+1):(3+x),(y+1):(3+y))))
end
end
To get any closer to a one-line solution, it seems like there has to be some kind of "communication" where I assign two variables (x,y) to index over M2 and index over M; I just don't see how it can be done otherwise, but I am assured there is a solution.
Is there a reason why you are not using MATLAB's convolution function to help you do this? You are performing a blur with a 3 x 3 averaging kernel with overlapping neighbourhoods. This is exactly what convolution is doing. You can perform this using conv2:
M2 = conv2(M, ones(3) / 9, 'valid');
The 'valid' flag ensures that you return a size(M) - 2 matrix in both dimensions as you have requested.
In your code, you have hardcoded this for a 4 x 4 matrix. To double-check to see if we have the right results, let's generate a random 4 x 4 matrix:
rng(123);
M = rand(4, 4);
s = size(M);
If we run this with your code, we get:
>> M2
M2 =
0.5054 0.4707
0.5130 0.5276
Doing this with conv2:
>> M2 = conv2(M, ones(3) / 9, 'valid')
M2 =
0.5054 0.4707
0.5130 0.5276
However, if you want to do this from first principles, the overlapping of the pixel neighbourhoods is very difficult to escape using loops. The two for loop approach you have is good enough and it tackles the problem appropriately. I would make the size of the input instead of being hard coded. Therefore, write a function that does something like this:
function M2 = blur_fp(M)
s = size(M);
M2 = zeros(s(1) - 2, s(2) - 2);
for ii = 2 : s(1) - 1
for jj = 2 : s(2) - 1
p = M(ii - 1 : ii + 1, jj - 1 : jj + 1);
M2(ii - 1, jj - 1) = mean(p(:));
end
end
The first line of code defines the function, which we will call blur_fp. The next couple lines of code determine the size of the input matrix as well as initialising a blank matrix to store out output. We then loop through each pixel location in the matrix that is possible without the kernel going outside of the boundaries of the image, we grab a 3 x 3 neighbourhood with each pixel location serving as the centre, we then unroll the matrix into a single column vector, find the average and store it in the appropriate output. For small kernels and relatively large matrices, this should perform OK.
To take this a little further, you can use user Divakar's im2col_sliding function which takes overlapping neighbourhoods and unrolls them into columns. Therefore, each column represents a neighbourhood which you can then blur the input using vector-matrix multiplication. You would then use reshape to reshape the result back into a matrix:
T = im2col_sliding(M, [3 3]);
V = ones(1, 9) / 9;
s = size(M);
M2 = reshape(V * T, s(1) - 2, s(2) - 2);
This unfortunately cannot be done in a single line unless you use built-in functions. I'm not sure what your intention is, but hopefully the gamut of approaches you have seen here have given you some insight on how to do this efficiently. BTW, using loops for small matrices (i.e. 4 x 4) may be better in efficiency. You will start to notice performance changes when you increase the size of the input... then again, I would argue that using loops are competitive as of R2015b when the JIT has significantly improved.
I am trying to use mvregress with the data I have with dimensionality of a couple of hundreds. (3~4). Using 32 gb of ram, I can not compute beta and I get "out of memory" message. I couldn't find any limitation of use for mvregress that prevents me to apply it on vectors with this degree of dimensionality, am I doing something wrong? is there any way to use multivar linear regression via my data?
here is an example of what goes wrong:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
%without residual term:
A_hat=mvregress(X',Y');
%wit residual term:
[B, y_hat]=mlrtrain(X,Y)
where
function [B, y_hat]=mlrtrain(X,Y)
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = [kron([Xmat(i,:)],eye(d))];
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
end
the error is:
Error using bsxfun
Out of memory. Type HELP MEMORY for your options.
Error in kron (line 36)
K = reshape(bsxfun(#times,A,B),[ma*mb na*nb]);
Error in mvregress (line 319)
c{j} = kron(eye(NumSeries),Design(j,:));
and this is result of whos command:
whos
Name Size Bytes Class Attributes
A 400x400 1280000 double
N 400x1000 3200000 double
X 400x1000 3200000 double
Y 400x1000 3200000 double
dataVariance 1x1 8 double
dim 1x1 8 double
mixtureCenters 400x1 3200 double
noiseVariance 1x1 8 double
nsamp 1x1 8 double
Okay, I think I have a solution for you, short version first:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = kron(Xmat(i,:),speye(d));
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
Strangely, I could not access the function's workspace, it did not appear in the call stack. This is why I put the function after the script here.
Here's the explanation that might also help you in the future:
Looking at the kron definition, the result when inserting an m by n and a p by q matrix has size mxp by nxq, in your case 400 by 1001 and 1000 by 1000, that makes a 400000 by 1001000 matrix, which has 4*10^11 elements. Now you have four hundred of them, and each element takes up 8 bytes for double precision, that is a total size of about 1.281 Petabytes of memory (or 1.138 Pebibytes, if you prefer), well out of reach even with your grand 32 Gibibyte.
Seeing that one of your matrices, the eye one, contains mostly zeros, and the resulting matrix contains all possible element product combinations, most of them will be zero, too. For such cases specifically, MATLAB offers the sparse matrix format, which saves a lot of memory depending on the number of zero elements in a matrix by only storing nonzero ones. You can convert a full matrix to a sparse representation with sparse(X), or you get an eye matrix directly by using speye(n), which is what I did above. The sparse property propagates to the result, which you should now have enough memory for (I have with 1/4 of your memory available, and it works).
However, what remains is the problem Matthew Gunn mentioned in a comment. I get an error saying:
Error using mvregress (line 260)
Insufficient data to estimate either full or least-squares models.
Preface
If your regressors are all the same across each regression equation and you're interested in the OLS estimate, you can replace a call to mvregress with a simple call to \.
It appears in the call to mlrtrain you had a matrix transposition error (since corrected). In the language of mvregress, n is the number of observations, d is the number of outcome variables. You generate a matrix Y that is d by n. But THEN when you should call mlrtrain(X', Y') not mlrtrain(X, Y).
If below isn't specifically, what you're looking for, I suggest you precisely define what you're trying to estimate.
What I would have written if I were you
So much that's been said here is completely off base that I'm posting code of what I would have written if I were you. I've reduced the dimensionality to show the equivalence in your special case to simply calling \. I've also written stuff in a more standard way (i.e. having observations run down the rows and not making matrix transposition errors).
dim=5; % These can go way higher but only if you use my code
nsamp=20; % rather than call mvregress
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X = randn(nsamp, dim)*sqrt(dataVariance ) + repmat(mixtureCenters', nsamp, 1); %'
E = randn(nsamp, dim)*sqrt(noiseVariance); %noise should be mean zero
B = 2*eye(dim);
Y = X*B+E;
% without constant:
B_hat = mvregress(X,Y); %<-------- slow, blows up with high dimension
B_hat2 = X \ Y; %<-------- fast, fine with higher dimensions
norm(B_hat - B_hat2) % show numerical equivalent if basically 0
% with constant:
B_constant_hat = mlrtrain(X,Y) %<-------- slow, blows up with high dimension
B_constant_hat2 = [ones(nsamp, 1), X] \ Y; % <-- fast, and fine with higher dimensions
norm(B_constant_hat - B_constant_hat2) % show numerical equivalent if basically 0
Explanation
I'll assume you have:
An nsamp by dim sized data matrix X.
An nsamp by ny sized matrix of outcome variables Y
You want the results from regressing each column of Y on data matrix X. That is, we're doing multivariate regression but there's a common data matrix X.
That is, we're estimating:
y_{ij} = \sum_k b_k * x_{ik} + e_{ijk} for i=1...nsamp, j = 1...ny, k=1...dim
If you're trying to do something different than this, you need to clearly state what you're trying to do!
To regress Y on X you could do:
[beta_mvr, sigma_mvr, resid_mvr] = mvregress(X, Y);
This appears to be horribly slow. The following should match mvregress for the case where you're using the same data matrix for each regression.
beta_hat = X \ Y; % estimate beta using least squares
resid = Y - X * beta_hat; % calculate residual
If you want to construct a new data matrix with a vector of ones, you would do:
X_withones = [ones(nsamp, 1), X];
Further clarification for some that are confused
Let's say we want to run the regression
y_i = \sum_j x_{ij} + e_i i=1...n, j=1...k
We can construct the data matrix n by k datamatrix X and an n by 1 outcome vector y. The OLS estimate is bhat = pinv(X' * X) * X' * y which can also be computed in MATLAB with bhat = X \ y.
If you want to do this multiple times (i.e. run multivariate regression on the same data matrix X), you can construct an outcome matrix Y where EACH column represents a separate outcome variable. Y = [ya, yb, yc, ...]. Trivially, the OLS solution is B = pinv(X'*X)*X'*Y which can be computed as B = X \ Y. The first column of B is the result of regressing Y(:,1) on X. The second column of B is the result of regressing Y(:,2) on X, etc... Under these conditions, this is equivalent to a call to B = mvregress(X, Y)
Even more test code
If regressors are the same and estimation is by simple OLS, there is an equivalence between multivariate regression and equation by equation ordinary least squares.
d = 10;
k = 15;
n = 100;
C = RandomCorr(d + k, 1); %Use any method you like to generate a random correlation matrix
s = randn(d+k , 1) * 10;
S = (s * s') .* C; % generate covariance matrix
mu = randn(d+k,1);
data = mvnrnd(ones(n, 1) * mu', S);
Y = data(:,1:d);
X = data(:,d+1:end);
[b1, sigma] = mvregress(X, Y);
b2 = X \ Y;
norm(b1 - b2)
You will notice b1 and b2 are numerically equivalent. They are equivalent even though sigma is EXTREMELY different from zero.
In an attempt to speed up for loops (or eliminate all together), I've been trying to pass matrices into functions. I have to use sine and cosine as well. However, when I attempt to find the integral of a matrix where the elements are composed of sines and cosines, it doesn't work and I can't seem to find a way to make it do so.
I have a matrix SI that is composed of sines and cosines with respect to a variable that I have defined using the Symbolic Math Toolbox. As such, it would actually be even better if I could just pass the SI matrix and receive a matrix of values that is the integral of the sine/cosine function at every location in this matrix. I would essentially get a square matrix back. I am not sure if I phrased that very well, but I have the following code below that I have started with.
I = [1 2; 3 4];
J = [5 6; 7 8];
syms o;
j = o*J;
SI = sin(I + j);
%SI(1,1) = sin(5*o + 1)
integral(#(o) o.*SI(1,1), 0,1);
Ideally, I would want to solve integral(#(o) o*SI,0,1) and get a matrix of values. What should I do here?
Given that A, B and C are all N x N matrices, for the moment, let's assume they're all 2 x 2 matrices to make the example I'm illustrating more succinct to understand. Let's also define o as a mathematical symbol based on your comments in your question above.
syms o;
A = [1 2; 3 4];
B = [5 6; 7 8];
C = [9 10; 11 12];
Let's also define your function f according to your comments:
f = o*sin(A + o*B + C)
We thus get:
f =
[ o*sin(5*o + 10), o*sin(6*o + 12)]
[ o*sin(7*o + 14), o*sin(8*o + 16)]
Remember, for each element in f, we take the corresponding elements in A, B and C and add them together. As such, for the first row and first column of each matrix, we have 1, 5 and 9. As such, A + o*B + C for the first row, first column equates to: 1 + 5*o + 9 = 5*o + 10.
Now if you want to integrate, just use the int command. This will find the exact integral, provided that the integral can be solvable in closed form. int also can handle matrices so it will integrate each element in the matrix. You can call it like so:
out = int(f,a,b);
This will integrate f for each element from the lower bound a to the upper bound b. As such, supposing our limits were from 0 to 1 as you said. Therefore:
out = int(f,0,1);
We thus get:
out =
[ sin(15)/25 - sin(10)/25 - cos(15)/5, sin(18)/36 - sin(12)/36 - cos(18)/6]
[ sin(21)/49 - sin(14)/49 - cos(21)/7, sin(24)/64 - sin(16)/64 - cos(24)/8]
Bear in mind that out is defined in the symbolic math toolbox. If you want the actual numerical values, you need to cast the answer to double. Therefore:
finalOut = double(out);
We thus get:
finalOut =
0.1997 -0.1160
0.0751 -0.0627
Obviously, this can generalize for any size M x N matrices, so long as they all share the same dimensions.
Caveat
sin, cos, tan and the other related functions have their units in radians. If you wish for the degrees equivalent, append a d at the end of the function (i.e. sind, cosd, tand, etc.)
I believe this is the answer you're after. Good luck!
I'm trying to find the max machine number x that satisfies the following equation: x+a=a, where a is a given integer. (I'm not allowed to use eps.)
Here's my code (which is not really working):
function [] = Largest_x()
a=2184;
x=0.0000000001
while (x+a)~=a
x=2*x;
end
fprintf('The biggest value of x in order that x+a=a \n (where a is equal to %g) is : %g \n',a,x);
end
Any help would be much appreciated.
The answer is eps(a)/2.
eps is the difference to the next floating point number, so if you add half or less than that to a float, it won't change. For example:
100+eps(100)/2==100
ans =
1
%# divide by less than two
100+eps(100)/1.9==100
ans =
0
%# what is that number x?
eps(100)/2
ans =
7.1054e-15
If you don't want to rely on eps, you can calculate the number as
2^(-53+floor(log2(a)))
You're small algorithm is certainly not correct. The only conditions where A = X + A are when X is equal to 0. By default matlab data types are doubles with 64 bits.
Lets pretend that matlab were instead using 8 bit integers. The only way to satisfy the equation A = X + A is for X to have the binary representation of [0 0 0 0 0 0 0 0]. So any number between 1 and 0 would work as decimal points are truncated from integers. So again if you were using integers A = A + X would resolve to true if you were to set the value of X to any value between [0,1). However this value is meaningless because X would not take on this value but rather it would take on the value of 0.
It sounds like you are trying to find the resolution of matlab data types. See this: http://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html
The correct answer is that, provided by Jonas: 0.5 * eps(a)
Here is an alternative for the empirical and approximate solution:
>> a = 2184;
>> e = 2 .^ (-100 : 100); % logarithmic scale
>> idx = find(a + e == a, 1, 'last')
idx =
59
>> e(idx)
ans =
2.2737e-013