Matrix dimension must agree error in matlab? - matlab

I have adapted some existing code for my program, but I am getting an error whose cause I cannot find. I have data with N observations, and my goal is to break the data into increasingly smaller subsamples and do calculations on each subsample. To determine how the subsample size will change, the program picks a length OptN close to N that has many divisors and stores the divisors of OptN in an array d.
dmin = 2;
% Find OptN such that it has the largest number of
% divisors among all natural numbers in the interval [0.99*N,N]
N = length(x);
N0 = floor(0.99*N);
dv = zeros(N-N0+1,1);
for i = N0:N,
dv(i-N0+1) = length(divisors(i,dmin));
end
OptN = N0 + find(max(dv)==dv) - 1;
% Use the first OptN values of x for further analysis
x = x(1:OptN);
% Find the divisors >= dmin for OptN
d = divisors(OptN,dmin);
function d = divisors(n,n0)
% Find all divisors of the natural number N greater or equal to N0
i = n0:floor(n/2);
d = find((n./i)==floor(n./i))' + n0 - 1; % Problem line
The problem occurs in the function divisors, where I get 'Error using ./ Matrix dimensions must agree.' This worked with input data of length 60, but with data of length 1058 it gives the above error.

I think that with a large dataset, find(max(dv)==dv) can return multiple indices when several candidates tie for the largest number of divisors. OptN then becomes a vector, not a scalar.
Inside divisors, n is then a vector too. The colon operator in n0:floor(n/2) uses only the first element of a non-scalar operand, so the length of i (by the way, not a good variable name in MATLAB, since i is also the imaginary unit) generally differs from the length of n, which causes the dimension error in the next statement, n./i.
You can try find(max(dv)==dv,1) instead to get only the first match, or loop over every value of OptN.
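The tie described above can be reproduced for N = 1058 with a short sketch. The helper countDivisors is a hypothetical stand-in for the question's divisors function (it only counts divisors in the range n0 to floor(n/2)); it is not part of the original code.

```matlab
% Sketch: reproduce the tie for N = 1058 and force a scalar OptN.
% countDivisors is a hypothetical helper mirroring the question's divisors():
% it counts the divisors of n in the range [n0, floor(n/2)].
countDivisors = @(n, n0) nnz(mod(n, n0:floor(n/2)) == 0);

N = 1058;
dmin = 2;
N0 = floor(0.99*N);                       % 1047
candidates = N0:N;
dv = arrayfun(@(c) countDivisors(c, dmin), candidates);

tied = candidates(dv == max(dv));         % every candidate with the maximal count
OptN = N0 + find(dv == max(dv), 1) - 1;   % first match only, always a scalar

disp(tied)    % 1050 and 1056 share the maximal divisor count
disp(OptN)    % 1050
```

For data of length 60 the candidates are only 59 and 60, so there is no tie, which would explain why the original code worked there.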

Related

Explanation for a function within xcorr in MATLAB

Looking within the xcorr function, most of it is pretty straightforward, except for one function within xcorr called "findTransformLength".
function m = findTransformLength(m)
m = 2*m;
while true
r = m;
for p = [2 3 5 7]
while (r > 1) && (mod(r, p) == 0)
r = r / p;
end
end
if r == 1
break;
end
m = m + 1;
end
With no comments, I fail to understand what this function is meant to achieve and what the significance of p = [2 3 5 7] is. Why those numbers specifically? Why not take a fixed FFT size instead? Is there a disadvantage (could it cause errors) to taking a fixed FFT size?
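This part finds the smallest integer greater than or equal to 2*m that can be written in the form 2^a * 3^b * 5^c * 7^d, where a, b, c and d are non-negative integers.
Either:
m (after the doubling) is already of this form, then the loop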
for p = [2 3 5 7]
while (r > 1) && (mod(r, p) == 0)
r = r / p;
end
end
will decrease r down to 1 and the break will be reached.
Or m has at least one other prime factor, and r will not reach 1. You go back to the outer loop with m+1, and so on until you reach a number of the right form.
As for why they do this, see the fft documentation, in the Input Arguments section:
n — Transform length [] (default) | nonnegative integer scalar
Transform length, specified as [] or a nonnegative integer scalar.
Specifying a positive integer scalar for the transform length can
increase the performance of fft. The length is typically specified as
a power of 2 or a value that can be factored into a product of small
prime numbers. If n is less than the length of the signal, then fft
ignores the remaining signal values past the nth entry and returns the
truncated result. If n is 0, then fft returns an empty matrix.
Example: n = 2^nextpow2(size(X,1))
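As a rough illustration of the search described above, the following sketch runs the same loop on a few arbitrary example values of m and compares the result with the next power of 2. The variable names (mValues, smoothLen, pow2Len) are made up for this example.

```matlab
% Sketch: compare the length found by the findTransformLength logic
% (only prime factors 2, 3, 5, 7) with a plain power-of-2 length.
mValues = [11 13 50 61];            % arbitrary example inputs
smoothLen = zeros(size(mValues));
for idx = 1:numel(mValues)
    n = 2*mValues(idx);
    while true
        r = n;
        for p = [2 3 5 7]
            % strip every factor p from r
            while (r > 1) && (mod(r, p) == 0)
                r = r / p;
            end
        end
        if r == 1                   % nothing but 2, 3, 5, 7 was left
            break;
        end
        n = n + 1;                  % try the next candidate
    end
    smoothLen(idx) = n;
end
pow2Len = 2.^nextpow2(2*mValues);

disp([2*mValues; smoothLen; pow2Len])
% rows: 2*m, length with small prime factors, next power of 2
%   22   26  100  122
%   24   27  100  125
%   32   32  128  128
```

The lengths in the second row are never larger than the powers of 2 in the third row, so less zero padding is needed while the length still factors into small primes.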

How do you find an 'M' by 'M' submatrix in the center of an input 'N' by 'N' matrix? (In Matlab)

How do I write a function in Matlab to output the M x M submatrix at the center of an N x N input matrix? The function should have two input arguments—the N x N input matrix (2D array) and the size of the square submatrix, M, to be extracted from the input matrix. The sole output should be the M x M submatrix at the center of the input matrix. The function should use for loops to extract the submatrix and not use colon notation or any built-in functions for this part of the code. The function should work for any square input matrix where N ≥ 3. If N is even, M should be even. If N is odd, M should be odd.
Here is a picture of my flowchart so far.
Using For-Loops and Offsetting Indexing
Preface:
Here I like to visualize this question as trimming the matrix. I denote the amount to trim from each side as Trim_Amount. The Trim_Amount dictates the size of the sub-matrix and the start point to begin reading/saving the sub-matrix.
Since the trim amount is always taken from each side, you can expect the sub-matrix to have dimensions of the form below (note that in this answer M denotes the size of the input matrix, which the question calls N):
Sub-Matrix Width = M - (2 × Trim_Amount)
2 × Trim_Amount is always an even number, so the sub-matrix size has the same parity as M:
if M is even → M - (Even Number) → Even Number
if M is odd → M - (Even Number) → Odd Number
Test Output Results:
I recommend going through the code to filter through any unexpected issues.
Full Script:
Dimension = 7;
Matrix = round(100*rand(Dimension));
Trim_Amount = 1;
[Sub_Matrix] = Grab_Sub_Matrix(Matrix,Trim_Amount);
Matrix
Sub_Matrix
%Function definition%
function [Sub_Matrix] = Grab_Sub_Matrix(Matrix,Trim_Amount)
%Here M is the size of the input matrix (N in the question)%
[M,~] = size(Matrix);
%Ensuring the trimming factor does not go over possible range:%
%at least a 1x1 (odd M) or 2x2 (even M) sub-matrix must remain%
Max_Trimming_Factor = floor((M - 1)/2);
if(Trim_Amount > Max_Trimming_Factor)
Trim_Amount = Max_Trimming_Factor;
end
%Fill in the boundaries%
Row_Start_Limit = Trim_Amount + 1;
Column_Start_Limit = Trim_Amount + 1;
%Creating sub-matrix based on amount of trimming%
Sub_Matrix = zeros(M-(2*Trim_Amount),M-(2*Trim_Amount));
for Row = 1: length(Sub_Matrix)
for Column = 1: length(Sub_Matrix)
% fprintf("(%d,%d)\n",Row,Column);
Sub_Matrix(Row,Column) = Matrix(Row + Row_Start_Limit-1,Column + Column_Start_Limit-1);
end
end
end
Ran using MATLAB R2019b
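The question asks for the sub-matrix size M as the input rather than a trim amount. Under the assumption that N and M have the same parity, the two are related by offset = (N - M)/2, which can be sketched as follows. The names A, B and offset are chosen for this example only, and magic(7) is just a convenient test matrix.

```matlab
% Sketch: extract the central M x M block of an N x N matrix with for loops.
% Assumes N and M are both even or both odd, so the offset is an integer.
A = magic(7);            % example N x N input (N = 7)
M = 3;                   % requested sub-matrix size
N = size(A, 1);
offset = (N - M) / 2;    % rows/columns trimmed from each side

B = zeros(M);
for r = 1:M
    for c = 1:M
        % shift every index by the trimmed border
        B(r, c) = A(r + offset, c + offset);
    end
end
disp(B)
```

For N = 7 and M = 3 the offset is 2, so B holds rows 3 to 5 and columns 3 to 5 of A.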

Optimize algorithm that generates the number of units in each binary state

TL;DR: I need to find all possible combinations of N row vectors (of size 1xB), whose row-wise sum produces the desired result vector (also of size 1xB).
I have a binary matrix (1 or 0 entries only) of size N x B where N denotes the number of units and B denotes the number of bins. Each unit, i.e., each row, of the matrix can be in one of 2^B states. That is, if B=2, the states possible are {0,0}, {0,1}, {1,0} or {1,1}. If B=3, then the possible states are {0,0,0}, {0,0,1}, {0,1,0}, {0,1,1}, {1,0,0}, {1,0,1}, {1,1,0} or {1,1,1}. Basically the binary representation of the numbers from 0 to 2^B-1.
For the matrix, I know the sum over the rows of the matrix, for example, {1,2}. This sum can be achieved through different binary matrices like [0,0;0,1;1,1] or [0,1;0,1;1,0]. The number of units in each state are {1,1,0,1} and {0,2,1,0}, respectively for each of the matrices, where the first number corresponds to the first state {0,0}, second to the second state {0,1} and so on in increasing order. My problem is to find all possible vectors of these numbers of states that satisfy a particular matrix sum.
Now to implement this in MATLAB, I used recursion and a global variable. This to me was the easiest approach, however, it takes a lot of time. The code I used is given below:
function output = getallstate()
global nState % stores all the possible vectors
global nStateRow % stores the current row of the vector
global statebin %stores the binary representation of all the possible states
nState = [];
nStateRow = 1;
nBin = 2; % number of columns or B
v = [1 2]; % should always be of the size 1 x nBin
N = 3; % number of units
statebin = de2bi(0:(2 ^ nBin - 1), nBin) == 1; % stored as logical because I use it to index later
getnstate(v, 2 ^ nBin - 1, nBin) % the main function
checkresult(v, nState, nBin) % will result in false if even one of the results is incorrect
% adjust for max number of units, because the total of each row cannot exceed this number.
output = nState(1:end-1, :); % last row is always repeated (needs to be fixed somehow)
output(:, 1) = N - sum(output(:, 2:end), 2); % the first column, that is the number of units in the all 0 state is always determined by the number of units in the other states
if any(output(:, 1) < 0)
output(output(:, 1) < 0, :) = [];
end
end
function getnstate(r, state, nBin)
global nState
global nStateRow
global statebin
if state == 0
if all(r == 0)
nStateRow = nStateRow + 1;
nState(nStateRow, :) = nState(nStateRow - 1, :);
end
else
for a = 0:min(r(statebin(state + 1, :)))
nState(nStateRow, state + 1) = a;
getnstate(r - a * statebin(state + 1, :), state - 1, nBin);
end
end
end
function allOk = checkresult(r, nState, nBin)
% just a function that checks whether the obtained vectors all result in the correct sum
allstate = de2bi(0:(2 ^ nBin - 1), nBin);
allOk = true;
for iRow = 1:size(nState, 1)
sumR = sum(bsxfun(@times, allstate, nState(iRow, :).'), 1);
allOk = allOk & isequal(sumR,r);
end
end
function b = de2bi(d, n)
d = d(:);
[~, e] = log2(max(d));
b = rem(floor(d * pow2(1-max(n, e):0)), 2);
end
The above code works fine and gives all possible states but, as is expected, it gets slower as you increase the number of columns (B) and the number of units (N). Also, it uses globals. The following are my questions:
Is there a way to generate these without using globals?
Is there a non-recursive way for this algorithm?
EDIT 1
Is there a way to do the above and still have an optimised algorithm that is faster than the current version?
EDIT 2
Added the de2bi function to remove dependency on the Communications Toolbox.
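The small example from the question (B = 2, N = 3, row sum {1,2}) can be checked by brute force without globals or recursion. This is only a sketch for tiny inputs, not an optimised replacement: it enumerates all (N+1)^(2^B) count vectors, and the variable names are chosen for this example. It uses dec2bin from base MATLAB to list the states in increasing order.

```matlab
% Sketch: brute-force enumeration for B = 2 bins and N = 3 units.
B = 2;
N = 3;
v = [1 2];                               % desired sum over the rows
states = dec2bin(0:2^B-1, B) - '0';      % rows: 00, 01, 10, 11

% Every candidate count vector with entries 0..N (one column per state).
[c1, c2, c3, c4] = ndgrid(0:N);
C = [c1(:) c2(:) c3(:) c4(:)];

% Keep vectors that use exactly N units and reproduce the desired sum.
ok = (sum(C, 2) == N) & ismember(C*states, v, 'rows');
solutions = sortrows(C(ok, :));
disp(solutions)
%   0 2 1 0
%   1 1 0 1
```

The two rows match the count vectors {0,2,1,0} and {1,1,0,1} given in the question.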

Generating all ordered samples with replacement

I would like to generate an array which contains all ordered samples of length k taken from a set of n elements {a_1,...,a_n}, that is all the k-tuples (x_1,...,x_k) where each x_j can be any of the a_i (repetition of elements is allowed), and whose total number is n^k.
Is there a built-in function in Matlab to obtain it?
I have tried to write code that iteratively uses the datasample function, but I have not obtained the desired result so far.
An alternative way to get all the tuples is based on the base-n integer representation.
If you take the base-n representation of all integers from 0 to n^k - 1, each written with k digits, it gives you all possible sets of k indexes, knowing that these indexes start at 0.
Now, implementing this idea is quite straightforward. You can use dec2base if n is between 2 and 10:
X = A(dec2base(0:(n^k-1), n) - '0' + 1);
For n between 11 and 36, you can still use dec2base, but you must take care of letters, as there is a gap in character codes between '9' and 'A'. Correct the indexes before applying them to A:
idx = dec2base(0:(n^k-1), n) - '0' + 1;
idx(idx>=18) = idx(idx>=18) - 7; % 'A' maps to 18 and must become 11
X = A(idx);
Above 36, you must use custom code for retrieving the representation of the integer. But IMO you may not need this, as 36^k is already huge even for small k.
What you are looking for is ndgrid: it generates the grid elements in any dimension.
In the case k is fixed at the moment of coding, get all indexes of all elements a this way:
[X_1, ..., X_k] = ndgrid(1:n);
Then build the matrix X from vector A:
X = [A(X_1(:)), ..., A(X_k(:))];
If k is a parameter, my advice would be to look at the code of ndgrid and adapt it in a new function so that the output is a matrix of values instead of storing them in varargout.
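When k is only known at run time, one option that avoids rewriting ndgrid is to collect its outputs in a cell array. The following is a minimal sketch under that assumption, for k of at least 2. The example set A and the names G and idx are illustrative.

```matlab
% Sketch: all ordered samples with replacement for a run-time k (k >= 2).
A = [10 20 30];          % example set, n = 3
k = 2;
n = numel(A);

G = cell(1, k);
[G{:}] = ndgrid(1:n);    % one index grid per tuple position

% Flatten every grid into a column and join them: n^k rows, k columns.
idx = cell2mat(cellfun(@(g) g(:), G, 'UniformOutput', false));
X = A(idx);              % each row of X is one k-tuple
disp(X)
```

Here X has n^k = 9 rows, one for each ordered pair drawn from A.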
What about this solution? I don't know if it's as fast as yours, but do you think it is correct?
function Y = ordsampwithrep(X,K)
%ordsampwithrep Ordered samples with replacement
% Generates an array Y containing in its rows all ordered samples with
% replacement of length K with elements of vector X
X = X(:);
nX = length(X);
Y = zeros(nX^K,K);
Y(1,:) = datasample(X,K)';
k = 2;
while k < nX^K +1
temprow = datasample(X,K)';
%checknew = find (temprow == Y(1:k-1,:));
if not(ismember(temprow,Y(1:k-1,:),'rows'))
Y(k,:) = temprow;
k = k+1;
end
end
end

"out of memory" error for mvregress in matlab

I am trying to use mvregress on data with a dimensionality of a few hundred (300 to 400). With 32 GB of RAM, I cannot compute beta and I get an "out of memory" message. I couldn't find any documented limitation of mvregress that prevents applying it to vectors of this dimensionality. Am I doing something wrong? Is there any way to run multivariate linear regression on my data?
here is an example of what goes wrong:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
%without residual term:
A_hat=mvregress(X',Y');
%with residual term:
[B, y_hat]=mlrtrain(X,Y)
where
function [B, y_hat]=mlrtrain(X,Y)
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = [kron([Xmat(i,:)],eye(d))];
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
end
the error is:
Error using bsxfun
Out of memory. Type HELP MEMORY for your options.
Error in kron (line 36)
K = reshape(bsxfun(@times,A,B),[ma*mb na*nb]);
Error in mvregress (line 319)
c{j} = kron(eye(NumSeries),Design(j,:));
and this is result of whos command:
whos
Name Size Bytes Class Attributes
A 400x400 1280000 double
N 400x1000 3200000 double
X 400x1000 3200000 double
Y 400x1000 3200000 double
dataVariance 1x1 8 double
dim 1x1 8 double
mixtureCenters 400x1 3200 double
noiseVariance 1x1 8 double
nsamp 1x1 8 double
Okay, I think I have a solution for you, short version first:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = kron(Xmat(i,:),speye(d));
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
Strangely, I could not access the function's workspace, it did not appear in the call stack. This is why I put the function after the script here.
Here's the explanation that might also help you in the future:
Looking at the kron definition, the result for an m by n and a p by q matrix has size m*p by n*q. In mlrtrain, with X and Y passed untransposed, n = 400 and d = 1000, so each Xcell{i} is the Kronecker product of a 1 by 1001 row of Xmat with a 1000 by 1000 identity matrix: a 1000 by 1001000 matrix with about 10^9 elements. At 8 bytes per double-precision element that is about 8 gigabytes per cell, and you have four hundred of them, for a total of roughly 3.2 terabytes, well out of reach even with your 32 gigabytes.
Seeing that one of your matrices, the eye one, contains mostly zeros, and the resulting matrix contains all possible element product combinations, most of them will be zero, too. For such cases specifically, MATLAB offers the sparse matrix format, which saves a lot of memory depending on the number of zero elements in a matrix by only storing nonzero ones. You can convert a full matrix to a sparse representation with sparse(X), or you get an eye matrix directly by using speye(n), which is what I did above. The sparse property propagates to the result, which you should now have enough memory for (I have with 1/4 of your memory available, and it works).
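The effect of speye on the Kronecker product can be seen on a scaled-down example. The sizes below are arbitrary, and row is a made-up stand-in for one row of Xmat.

```matlab
% Sketch: full versus sparse Kronecker product on a small example.
d = 50;
row = 1:21;                          % stand-in for one row of Xmat (no zeros)

Kfull   = kron(row, eye(d));         % full storage, mostly zeros
Ksparse = kron(row, speye(d));       % sparse storage, same values

fprintf('size: %d by %d\n', size(Kfull, 1), size(Kfull, 2));
fprintf('values stored: full %d, sparse %d\n', numel(Kfull), nnz(Ksparse));
% size: 50 by 1050
% values stored: full 52500, sparse 1050
```

Only one in every d entries is nonzero, so the saving from the sparse format grows with d.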
However, what remains is the problem Matthew Gunn mentioned in a comment. I get an error saying:
Error using mvregress (line 260)
Insufficient data to estimate either full or least-squares models.
Preface
If your regressors are all the same across each regression equation and you're interested in the OLS estimate, you can replace a call to mvregress with a simple call to \.
It appears that in the call to mlrtrain you had a matrix transposition error (since corrected). In the language of mvregress, n is the number of observations and d is the number of outcome variables. You generate a matrix Y that is d by n, so you should call mlrtrain(X', Y'), not mlrtrain(X, Y).
If the code below isn't specifically what you're looking for, I suggest you define precisely what you're trying to estimate.
What I would have written if I were you
So much that's been said here is completely off base that I'm posting code of what I would have written if I were you. I've reduced the dimensionality to show the equivalence in your special case to simply calling \. I've also written stuff in a more standard way (i.e. having observations run down the rows and not making matrix transposition errors).
dim=5; % These can go way higher but only if you use my code
nsamp=20; % rather than call mvregress
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X = randn(nsamp, dim)*sqrt(dataVariance ) + repmat(mixtureCenters', nsamp, 1); %'
E = randn(nsamp, dim)*sqrt(noiseVariance); %noise should be mean zero
B = 2*eye(dim);
Y = X*B+E;
% without constant:
B_hat = mvregress(X,Y); %<-------- slow, blows up with high dimension
B_hat2 = X \ Y; %<-------- fast, fine with higher dimensions
norm(B_hat - B_hat2) % show numerical equivalent if basically 0
% with constant:
B_constant_hat = mlrtrain(X,Y) %<-------- slow, blows up with high dimension
B_constant_hat2 = [ones(nsamp, 1), X] \ Y; % <-- fast, and fine with higher dimensions
norm(B_constant_hat - B_constant_hat2) % show numerical equivalent if basically 0
Explanation
I'll assume you have:
An nsamp by dim sized data matrix X.
An nsamp by ny sized matrix of outcome variables Y
You want the results from regressing each column of Y on data matrix X. That is, we're doing multivariate regression but there's a common data matrix X.
That is, we're estimating:
y_{ij} = \sum_k b_{kj} * x_{ik} + e_{ij} for i=1...nsamp, j = 1...ny, k=1...dim
If you're trying to do something different than this, you need to clearly state what you're trying to do!
To regress Y on X you could do:
[beta_mvr, sigma_mvr, resid_mvr] = mvregress(X, Y);
This appears to be horribly slow. The following should match mvregress for the case where you're using the same data matrix for each regression.
beta_hat = X \ Y; % estimate beta using least squares
resid = Y - X * beta_hat; % calculate residual
If you want to construct a new data matrix with a vector of ones, you would do:
X_withones = [ones(nsamp, 1), X];
Further clarification for some that are confused
Let's say we want to run the regression
y_i = \sum_j b_j * x_{ij} + e_i for i=1...n, j=1...k
We can construct the n by k data matrix X and an n by 1 outcome vector y. The OLS estimate is bhat = pinv(X' * X) * X' * y, which can also be computed in MATLAB with bhat = X \ y.
If you want to do this multiple times (i.e. run multivariate regression on the same data matrix X), you can construct an outcome matrix Y where EACH column represents a separate outcome variable. Y = [ya, yb, yc, ...]. Trivially, the OLS solution is B = pinv(X'*X)*X'*Y which can be computed as B = X \ Y. The first column of B is the result of regressing Y(:,1) on X. The second column of B is the result of regressing Y(:,2) on X, etc... Under these conditions, this is equivalent to a call to B = mvregress(X, Y)
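The column-by-column equivalence described above can be checked without the Statistics Toolbox. This is a small sketch on random data; the sizes and the names Btrue, B_all, B_cols and B_normal are chosen for this example.

```matlab
% Sketch: X \ Y solves every column's least-squares problem at once.
n = 50;  k = 4;  ny = 3;
X = randn(n, k);
Btrue = randn(k, ny);
Y = X*Btrue + 0.1*randn(n, ny);      % noisy outcomes, one column per variable

B_all = X \ Y;                       % all outcome columns in one call

B_cols = zeros(k, ny);
for j = 1:ny
    B_cols(:, j) = X \ Y(:, j);      % one regression per outcome column
end

B_normal = (X'*X) \ (X'*Y);          % normal equations, for comparison

disp(norm(B_all - B_cols))           % should be numerically zero
disp(norm(B_all - B_normal))         % should be numerically zero
```

Both norms are at the level of floating-point round-off, since all three computations solve the same least-squares problems.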
Even more test code
If regressors are the same and estimation is by simple OLS, there is an equivalence between multivariate regression and equation by equation ordinary least squares.
d = 10;
k = 15;
n = 100;
C = RandomCorr(d + k, 1); %Use any method you like to generate a random correlation matrix
s = randn(d+k , 1) * 10;
S = (s * s') .* C; % generate covariance matrix
mu = randn(d+k,1);
data = mvnrnd(ones(n, 1) * mu', S);
Y = data(:,1:d);
X = data(:,d+1:end);
[b1, sigma] = mvregress(X, Y);
b2 = X \ Y;
norm(b1 - b2)
You will notice b1 and b2 are numerically equivalent. They are equivalent even though sigma is EXTREMELY different from zero.