I am a beginner in matlab and I have a particular z matrix of size m×1 with values 0,1,3,5,2 etc..with above values repeating. Now I have 4 other column matrix x1,x2,x3 and y and I want to do regression.
I have used lm = LinearModel.fit(x,y,'linear') specifying columns.Now I want to do regression only for values in matrix x1,x2,x3 and y for those corresponding to z matrix with value of 1 and neglect the other rows.How do I do it?
That's very simple. I'm going to assume that your matrix of predictor variables and outputs are also of size m (number of samples). All you have to do is find the locations within z that are 1, subset your 3 column matrix of x1,x2,x3 and y, then use LinearModel.fit to fit your data. Assuming your matrix of predictors is stored in X, and your outputs are stored in y, you would do this:
ind = z == 1;
xOut = X(ind,:);
yOut = y(ind);
lm1 = LinearModel.fit(xOut, yOut, 'linear');
BTW, these are very simple subsetting operations in MATLAB. Suggest you read a tutorial before asking any further questions here.
Related
I am trying to run a nearest neighbour search in Julia using NearestNeighbors.jl package. The corresponding Matlab code is
X = rand(10);
Y = rand(100);
Z = zeros(size(Y));
Z = knnsearch(X, Y);
This generates Z, a vector of length 100, where the i-th element is the index of X whose element is nearest to the i-th element in Y, for all i=1:100.
Could really use some help converting the last line of the Matlab code above to Julia!
Use:
X = rand(1, 10)
Y = rand(1, 100)
nn(KDTree(X), Y)[1]
The storing the intermediate KDTree object would be useful if you wanted to reuse it in the future (as it will improve the efficiency of queries).
Now what is the crucial point of my example. The NearestNeighbors.jl accepst the following input data:
It can either be:
a matrix of size nd × np with the points to insert in the tree where nd is the dimensionality of the points and np is the number of points
a vector of vectors with fixed dimensionality, nd, which must be part of the type.
I have used the first approach. The point is that observations must be in columns (not in rows as in your original code). Remember that in Julia vectors are columnar, so rand(10) is considered to be 1 observation that has 10 dimensions by NearestNeighbors.jl, while rand(1, 10) is considered to be 10 observations with 1 dimension each.
However, for your original data since you want a nearest neighbor only and it is single-dimensional and is small it is enough to write (here I assume X and Y are original data you have stored in vectors):
[argmin(abs(v - y) for v in X) for y in Y]
without using any extra packages.
The NearestNeighbors.jl is very efficient for working with high-dimensional data that has very many elements.
The following is a function that takes two equal sized vectors X and Y, and is supposed to return a vector containing single correlation coefficients for image correspondence. The function is supposed to work similarly to the built in corr(X,Y) function in matlab if given two equal sized vectors. Right now my code is producing a vector containing multiple two-number vectors instead of a vector containing single numbers. How do I fix this?
function result = myCorr(X, Y)
meanX = mean(X);
meanY = mean(Y);
stdX = std(X);
stdY = std(Y);
for i = 1:1:length(X),
X(i) = (X(i) - meanX)/stdX;
Y(i) = (Y(i) - meanY)/stdY;
mult = X(i) * Y(i);
end
result = sum(mult)/(length(X)-1);
end
Edit: To clarify I want myCorr(X,Y) above to produce the same output at matlab's corr(X,Y) when given equal sized vectors of image intensity values.
Edit 2: Now the format of the output vector is correct, however the values are off by a lot.
I recommend you use r=corrcoef(X,Y) it will give you a normalized r value you are looking for in a 2x2 matrix and you can just return the r(2,1) entry as your answer. Doing this is equivalent to
r=(X-mean(X))*(Y-mean(Y))'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
However, if you really want to do what you mentioned in the question you can also do
r=(X)*(Y)'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
I am trying to generate a 2D data set with the following parameters:
x= N(-5,1) y= N(0,1) n= 1000
Where N(mean, std dev) and n = number of samples.
I tried:
x = normrnd(-5, 1, [100,10])
y = normrnd(0,1,[100,10])
to generate a 100 x 10 array with the appropriate values. I now need to find a way to output the values from these two arrays into an N(x,y) format that can be analyzed by Weka. Any suggestions on how to do this would be appreciated.
Given your comments, you want to generate a N x 2 matrix where each row is a pair of values that both come from different normal distributions.
You can either generate the 2D matrices of each separately and unroll them into single vectors and concatenate them both.... or the simplest way is to just generate 100 x 10 = 1000 elements in a 1D vector from each distribution and concatenate these together.
Method #1 - 2D matrix unrolling
x = normrnd(-5, 1, [100,10]);
y = normrnd(0, 1, [100,10]);
N = [x(:) y(:)];
Method #2 - 1D vector concatenation
x = normrnd(-5, 1, [1000,1]); %// Change
y = normrnd(0, 1, [1000,1]); %// Change
N = [x y];
If you wish to write this to a CSV file, where you have a pair of x,y values separated by a comma and you have Class_A at the end, a call to fopen to open up a file for writing, fwrite to write our stuff to the file and fclose to finally close the file is needed. You also require that the digits are 3 digits of precision. Something like this comes to mind:
f = fopen('numbers.csv', 'w'); %// Open up the file
fprintf(f,'%.3f,%.3f,Class_A\n', N.'); %'// Write the data
fclose(f); %// Close the file
It's important to look at the second statement carefully. Note that I'm writing the transpose of N because MATLAB writes values in column-major order. This means that if you want the rows to be written to the file, you have to transpose the matrix to do that. numbers.csv is what the file is called when it is written. If you examine this file now, you'll see that it's in the form of x,y,Class_A where x,y is a pair of values from both normal distributions.
You can use mvnrnd(mu_vector, sigma_matrix)
mu = [-5;0];
Sigma = [1,0;0,1];
n = 1000;
X = mvnrnd(mu, Sigma, n);
Which of the following statements will find the minimum difference between any pair of elements (a,b) where a is from the vector A and b is from the vector B.
A. [X,Y] = meshgrid(A,B);
min(abs(X-Y))
B. [X,Y] = meshgrid(A,B);
min(abs(min(Y-X)))
C. min(abs(A-B))
D. [X,Y] = meshgrid(A,B);
min(min(abs(X-Y)))
Can someone please explain to me?
By saying "minimum difference between any pair of elements(a,b)", I presume you mean that you are treating A and B as sets and you intend to find the absolute difference in any possible pair of elements from these two sets. So in this case you should use your option D
[X,Y] = meshgrid(A,B);
min(min(abs(X-Y)))
Explanation: Meshgrid turns a pair of 1-D vectors into 2-D grids. This link can explain what I mean to say:
http://www.mathworks.com/help/matlab/ref/meshgrid.html?s_tid=gn_loc_drop
Hence (X-Y) will give the difference in all possible pairs (a,b) such that a belongs to A and b belongs to B. Note that this will be a 2-D matrix.
abs(X-Y) would return the absolute values of all elements in this matrix (the absolute difference in each pair).
To find the smallest element in this matrix you will have to use min(min(abs(X-Y))). This is because if Z is a matrix, min(Z) treats the columns of Z as vectors, returning a row vector containing the minimum element from each column. So a single min command will give a row vector with each element being the min of the elements of that column. Using min for a second time returns the min of this row vector. This would be the smallest element in the entire matrix.
This can help:
http://www.mathworks.com/help/matlab/ref/min.html?searchHighlight=min
Options C is correct if you treat A and B as vectors and not sets. In this case you won't be considering all possible pairs. You'll end up finding the minimum of (a-b) where a,b are both in the same position in their corresponding vectors (pair-wise difference).
D. [X,Y] = meshgrid(A,B);
min(min(abs(X-Y)))
meshgrid will generate two grids - X and Y - from the vectors, which are arranged so that X-Y will generate all combinations of ax-bx where ax is in a and bx is in b.
The rest of the expression just gets the minimum absolute value from the array resulting from the subtraction, which is the value you want.
CORRECT ANSWER IS D
Let m = size(A) and n = size(B)
You want to subtract each pair of (a,b) such that a is from vector A and b is from vector B.
meshgrid(A,B) creates two matrices X Y both of size nxm where X have rows sames have vector A while Yhas columns same as vector B .
Hence , Z = X-Y will give you a matrix with n*m values corresponding to the difference between each pair of values taken from A and B . Now all you have to do is to find the absolute minimum among all values of Z.
You can do that by
req_min = min(min(abs(z)))
The whole code is
[X Y ] = meshgrid(A,B);
Z= X-Y;
Z = abs(Z);
req_min = min(min(Z));
You could also use bsxfun instead of meshgrid:
min(min(abs(bsxfun(#minus, A(:), B(:).'))))
Or use pdist2:
min(min(pdist2(A(:),B(:))))
Imagine a set of data with given x-values (as a column vector) and several y-values combined in a matrix (row vector of column vectors). Some of the values in the matrix are not available:
%% Create the test data
N = 1e2; % Number of x-values
x = 2*sort(rand(N, 1))-1;
Y = [x.^2, x.^3, x.^4, x.^5, x.^6]; % Example values
Y(50:80, 4) = NaN(31, 1); % Some values are not avaiable
Now i have a column vector of new x-values for interpolation.
K = 1e2; % Number of interplolation values
x_i = rand(K, 1);
My goal is to find a fast way to interpolate all y-values for the given x_i values. If there are NaN values in the y-values, I want to use the y-value which is before the missing data. In the example case this would be the data in Y(49, :).
If I use interp1, I get NaN-values and the execution is slow for large x and x_i:
starttime = cputime;
Y_i1 = interp1(x, Y, x_i);
executiontime1 = cputime - starttime
An alternative is interp1q, which is about two times faster.
What is a very fast way which allows my modifications?
Possible ideas:
Do postprocessing of Y_i1 to eliminate NaN-values.
Use a combination of a loop and the find-command to always use the neighbour without interpolation.
Using interp1 with spline interpolation (spline) ignores NaN's.