MATLAB Finding the number of data points within confidence intervals

I have a 6256x48 data matrix in MATLAB, for which I have found the mean, standard deviation, and confidence intervals using:
[muhat1,sigmahat1,muci1,sigmaci1] = normfit(data);
My question is: how can I find the number of data points in each column of the original data that lie within the confidence interval given by the corresponding column of muci1?
The muci1 array has 2 rows of 48 values, the top row being the lower bound and the bottom row being the upper bound.

data = rand(6256,48); %// stand-in data with the same size as in the question
[A,B]=size(data); %// size of your data
[muhat1,sigmahat1,muci1,sigmaci1] = normfit(data); %//stats of your data
mask = false(A,B); %// preallocate logical output mask
for ii = 1:B
    mask(:,ii) = data(:,ii)<muci1(2,ii) & data(:,ii)>muci1(1,ii); %// fill mask
end
FinalResult = sum(mask,1); %// number of points within CI per column
finalresult2 = sum(FinalResult); %// number of points within all CIs total
The for loop searches for entries in each column that are between the two bounds as given by muci1. If a number is between the bounds it gets a 1 in the mask, otherwise it becomes a 0.
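The same count can be obtained without the loop. The following is a minimal sketch, assuming the Statistics Toolbox is available for normfit and using random stand-in data with a fixed seed (both are assumptions for illustration). bsxfun compares every column against its own pair of bounds. Keep in mind that muci1 is the confidence interval for the estimated mean, so with thousands of rows it is narrow and usually only a small fraction of the points falls inside it.

```matlab
rng(0);                          % fixed seed, assumed only for repeatability
data = rand(6256,48);            % stand-in data, same size as in the question
[~,~,muci1] = normfit(data);     % muci1 is 2x48: row 1 lower, row 2 upper bound

% Compare each column with its own bounds in one step (no loop)
inside = bsxfun(@gt, data, muci1(1,:)) & bsxfun(@lt, data, muci1(2,:));

countPerColumn = sum(inside,1);      % points within the CI, per column
countTotal = sum(countPerColumn);    % points within the CIs, all columns
```

On R2016b and later, implicit expansion lets you write data > muci1(1,:) & data < muci1(2,:) directly instead of using bsxfun.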

Related

Distance along x-axis between Global Maximum of a Wave Pattern and next Consecutive Maximum/Minimum

I have a set of wave patterns (one example wave pattern included with the highest peak visible somewhere around x = 420) with various alternating peaks and troughs and have been able to determine the global maximum of the wave pattern using max. I now need to determine the distance along the 'x'-axis from this global maximum to the next consecutive local minimum immediately afterward and the distance to the next consecutive local maximum.
I have used findpeaks to find positive and negative peaks using the code as follows:
pospks = findpeaks(h_dat(end,xi(i):xf(i)));
negpks = findpeaks(-((h_dat(end,xi(i):xf(i)))))
since the negative version should return the minima. h_dat(end,xi(i):xf(i)) is just the code which returns the wave pattern based on my data (corresponding to multiple wave patterns with different parameters at 'end times' when a stable state has been reached with a corresponding stable global maximum).
I have tried the sort function, but I am not sure it helps: it puts the peaks in ascending/descending order, and as you can see from my image the heights of the local maxima and minima are arbitrary, so sorting by height does not identify which maximum or minimum comes next after the highest peak (call it hmax). My idea was to use a distance function to find the distance from hmax to the closest following member of pospks, and then to the closest following member of negpks, to get both distances, but after researching it I am still not sure how to do this (apologies if this is very simple, as I am quite new to Matlab).
Simply search for the first minimum that is after your global maximum, or the end of the data set if the maximum is last.
%% create the data set
x = rand(150,1);
idx = 55:85;
xx = x(idx);
xx = xx*max(x(:))/max(xx(:))*1.2; %% create a peak that will be in the middle of the data
x(idx) = xx;
x = smooth(x);
x = smooth(x);
%% find the next minimum
[pospks,locks] = findpeaks(x); % find all the maxima
[~,locksN] = findpeaks(-x); % find only the locations of all the minima
[~,maxIdx] = max(pospks); % find which peak is the (first) global maximum
maxIdx = locks(maxIdx); % convert it to an index into x
minIdxs = locksN(locksN > maxIdx); % indices of all the minima after the global maximum
if isempty(minIdxs)
    minIdx = length(x); % take the last element of the data set
else
    minIdx = minIdxs(1); % take the first minimum after the global maximum
end
plot(1:length(x),x,'green');
hold on;
plot(maxIdx,x(maxIdx),'bo');
plot(minIdx,x(minIdx),'r*');
hold off;
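The result is a plot of the data in green, with the global maximum marked by a blue circle and the following minimum by a red star.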
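The question also asks for the distance to the next local maximum, which can be found the same way. Below is a rough sketch, assuming the Signal Processing Toolbox for findpeaks and using a made-up damped sine as the wave pattern (the signal and the variable names gIdx, nextMin, nextMax, distToMin and distToMax are illustrative assumptions, not the asker's data).

```matlab
% Deterministic test signal: a sine under a Gaussian envelope (assumed data)
t = (1:400)';
x = sin(2*pi*t/50) .* exp(-((t-200)/80).^2);

[pks,maxLocs] = findpeaks(x);     % heights and locations of all local maxima
[~,minLocs]   = findpeaks(-x);    % locations of all local minima

[~,k] = max(pks);                 % which peak is the global maximum
gIdx  = maxLocs(k);               % its index into x

% First minimum and first maximum strictly after the global maximum
% (both are empty if the global maximum is the last extremum)
nextMin = minLocs(find(minLocs > gIdx, 1));
nextMax = maxLocs(find(maxLocs > gIdx, 1));

distToMin = nextMin - gIdx;       % distance in samples
distToMax = nextMax - gIdx;
```

These distances are in samples. If the x-axis is not simply the sample index, multiply by the sample spacing or look the indices up in the x-coordinate vector.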

MATLAB Staying in bounds of Complex matrix

I have a complex matrix cdata, that is 2144x2048. I am getting elements from cdata that are greater than a specified threshold by doing the following:
[row, col] = find(abs(cdata) > threshold);
row and col can have multiple values. For each row/col pair, I then extract N samples of the real x-data and 33 samples of the y-data as follows:
xdata = real(cdata(row(i),col(i)-bw:col(i)+bw))
ydata = real(cdata(row(i)-bw:row(i)+bw,col(i)-bw:col(i)+bw))
where bw is a constant that determines the number of samples I need. During this calculation, specifically in the column range of cdata for the xdata and the row range of cdata for the ydata, I exceed the bounds of the matrix and MATLAB throws the following error:
??? Subscript indices must either be real positive integers or logicals
How can I ensure that I don't exceed the bounds? I'm ok with having to skip a row/col pair if it is going to exceed the bounds.
You're having problems because your search is not restricted to elements at least bw away from the edges of the matrix. This means it's possible to find values above the threshold near the edges. When you add or subtract bw from these indices, you end up out of bounds. You can restrict your search like this:
[row, col] = find(abs(cdata(bw+1:end-bw,bw+1:end-bw)) > threshold);
row = row + bw;
col = col + bw;
This guarantees your row and column indices are within the bounds so when you grab a region surrounding them you won't go out of bounds.
On a side note, the ydata variable in your code indexes an entire square region of the matrix, while xdata only indexes a section of a row. Should your ydata actually be ydata = real(cdata(row(i)-bw:row(i)+bw, col(i)))?
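To see the restricted search end to end, here is a small sketch on a made-up 20x30 complex matrix (the matrix size, seed, bw and threshold values are assumptions chosen for illustration). It uses the column form of ydata suggested in the side note.

```matlab
rng(1);                                        % fixed seed, assumed for repeatability
cdata = complex(randn(20,30), randn(20,30));   % small stand-in for the 2144x2048 matrix
bw = 3;
threshold = 2;

% Search only the interior, then shift the indices back to full-matrix coordinates
[row, col] = find(abs(cdata(bw+1:end-bw, bw+1:end-bw)) > threshold);
row = row + bw;
col = col + bw;

for i = 1:numel(row)
    % Both windows are guaranteed to stay inside cdata
    xdata = real(cdata(row(i), col(i)-bw:col(i)+bw));   % 1 x (2*bw+1)
    ydata = real(cdata(row(i)-bw:row(i)+bw, col(i)));   % (2*bw+1) x 1
end
```

Hits that lie within bw of an edge are simply never returned, which matches the stated willingness to skip such row/col pairs.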

How do I visualize n-dimensional features?

I have two matrices A and B. The size of A is 200*1000 double (here: 1000 represents 1000 different features). Matrix A belongs to group 1, where I use ones(200,1) as the label vector. The size of B is also 200*1000 double (here: 1000 also represents 1000 different features). Matrix B belongs to group 2, where I use -1*ones(200,1) as the label vector.
My question is how do I visualize matrices A and B so that I can clearly distinguish them based on the given groups?
I'm assuming each sample in your matrices A and B is determined by a row in either matrix. If I understand you correctly, you want to draw a series of 1000-dimensional vectors, which is impossible. We can't physically visualize anything beyond three dimensions.
As such, what I suggest you do is perform a dimensionality reduction to reduce your data so that each input is reduced to either 2 or 3 dimensions. Once you reduce your data, you can plot them normally and assign a different marker to each point, depending on what group they belonged to.
If you want to achieve this in MATLAB, use Principal Component Analysis, specifically the pca function, which returns the principal component coefficients along with the samples reprojected onto the principal components (the scores). I'm assuming you have the Statistics Toolbox... if you don't, then sorry, this won't work.
Specifically, given your matrices A and B, you would do this:
[coeffA, scoreA] = pca(A);
[coeffB, scoreB] = pca(B);
numDimensions = 2;
scoreAred = scoreA(:,1:numDimensions);
scoreBred = scoreB(:,1:numDimensions);
The second output of pca gives you the reprojected values, so you simply choose how many dimensions you want by extracting the first N columns, where N is the desired number of dimensions.
I chose 2 for now, and we can see what it looks like in 3 dimensions after. Once we have what we need for 2 dimensions, it's just a matter of plotting:
plot(scoreAred(:,1), scoreAred(:,2), 'rx', scoreBred(:,1), scoreBred(:,2), 'bo');
This will produce a plot where the samples from matrix A are drawn as red crosses and the samples from matrix B as blue circles.
Here's a sample run given completely random data:
rng(123); %// Set seed for reproducibility
A = rand(200,1000); B = rand(200,1000); %// Generate random data
%// Code as before
[coeffA, scoreA] = pca(A);
[coeffB, scoreB] = pca(B);
numDimensions = 2;
scoreAred = scoreA(:,1:numDimensions);
scoreBred = scoreB(:,1:numDimensions);
%// Plot the data
plot(scoreAred(:,1), scoreAred(:,2), 'rx', scoreBred(:,1), scoreBred(:,2), 'bo');
We get this:
If you want three dimensions, simply change numDimensions = 3, then change the plot code to use plot3:
plot3(scoreAred(:,1), scoreAred(:,2), scoreAred(:,3), 'rx', scoreBred(:,1), scoreBred(:,2), scoreBred(:,3), 'bo');
grid;
With those changes, this is what we get:
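Running pca separately on A and B gives each group its own set of axes, so the two point clouds are not drawn in a common coordinate system. One alternative, offered as a sketch rather than as part of the original answer, is to stack both groups, run pca once, and split the scores by label. The variable names score, labels, scoreA2 and scoreB2 are assumptions, and the Statistics Toolbox is again required.

```matlab
rng(123);                                 % same seed as the sample run above
A = rand(200,1000);
B = rand(200,1000);
labels = [ones(200,1); -1*ones(200,1)];   % group labels as in the question

[~, score] = pca([A; B]);                 % one shared basis for both groups

numDimensions = 2;
scoreA2 = score(labels ==  1, 1:numDimensions);   % rows belonging to group 1
scoreB2 = score(labels == -1, 1:numDimensions);   % rows belonging to group 2

plot(scoreA2(:,1), scoreA2(:,2), 'rx', scoreB2(:,1), scoreB2(:,2), 'bo');
legend('Group 1 (A)', 'Group 2 (B)');
```

With purely random data the two groups overlap, as expected. With real features, any separation visible in this plot reflects differences between the groups along the same axes.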

How to create matrix of nearest neighbours from dataset using matrix of indices

I have an Nx2 matrix of data points where each row is a data point. I also have an NxK matrix of indices of the K nearest neighbours from the knnsearch function. I am trying to create a matrix that contains in each row the data point followed by the K neighbouring data points, i.e. for K = 2 we would have something like [data1, neighbour1, neighbour2] for each row.
I have been messing around with loops and attempting to index with matrices, but to no avail; the fact that each data point is 1x2 is confusing me.
My ultimate aim is to calculate gradients to train an RBF network in a similar manner to:
D = (x_dist - y_dist)./(y_dist+(y_dist==0));
temp = y';
neg_gradient = -2.*sum(kron(D, ones(1,2)) .* ...
(repmat(y, 1, ndata) - repmat((temp(:))', ndata, 1)), 1);
neg_gradient = (reshape(neg_gradient, net.nout, ndata))';
You could use something along these lines:
K = 2;
N = size(data,1); %// number of data points
nearest = knnsearch(data, data, 'K', K+1); %// gets the point itself and the K nearest ones
mat = reshape(data(nearest.',:).',[],N).'; %// extracts the coordinates
We generate data(nearest.',:) to get a (K+1)*N-by-2 matrix (3*N-by-2 here), where every K+1 consecutive rows are the points that belong together. We transpose this to get the xy-coordinates of each point into the same column (MATLAB is column-major, i.e. values in a column are stored consecutively). Then we reshape the data so that every column contains the xy-coordinates for one row of nearest. So we only need to transpose once more at the end.
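The reshape trick can be checked against a plain loop on a tiny data set. This is only a sketch with made-up random points (the seed, the size N = 6, and the name check are assumptions); knnsearch requires the Statistics Toolbox.

```matlab
rng(2);                        % fixed seed, assumed for repeatability
N = 6;                         % number of data points (tiny, for illustration)
K = 2;                         % number of neighbours per point
data = rand(N,2);              % each row is one [x y] data point

nearest = knnsearch(data, data, 'K', K+1);       % N x (K+1): self plus K neighbours
mat = reshape(data(nearest.',:).', [], N).';     % N x 2*(K+1)

% Same layout built column block by column block, for comparison
check = zeros(N, 2*(K+1));
for k = 1:K+1
    check(:, 2*k-1:2*k) = data(nearest(:,k), :);  % [x y] of the k-th entry
end
```

Each row of mat then reads [x y] of the point itself, followed by [x y] of each of its K neighbours.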

mean squared displacement from multiple trajectories

I have a matrix of multiple particle trajectories that I would like to analyze separately. The trajectory number is one of the columns of the matrix, so I am trying to sort based on that number. I am using some of the code from this answer: MSD with matlab (which was very helpful, thank you!) to calculate MSD, but I am having difficulty parsing out the individual trajectories. To explain in more detail: my trajectory outputs are in matrix format, with one column for trajectory number, one column for x-position, one column for y-position, etc. I want to calculate the mean-squared displacement for each trajectory. To do this, I have to distinguish data points by trajectory number (which is listed in column 7 of mymatrix). This seems to be where I am having trouble. The important columns in this matrix are 1: x-position, 2: y-position and 7: trajectory number. So far I have
total_rows=size(mymatrix,1);
max_trajectory_number=mymatrix(total_rows,7);
nData=0;
msd=zeros(total_rows, 4)
for i=0:max_trajectory_number
trajectornumber= mymatrix(i,7);
if trajectorynumber.equals(i)
nData=nData+1; %counts the number of instances of this trajectory number, which is the number of data points in the trajectory
for dt = 1:nData
deltaCoords = mymatrix(1+dt:end,1:2) - traj0mat(1:end-dt,1:2); %calculates time-averaged MSD based on y and y positions in colums 1 and 2 respectively
squaredDisplacement = sum(deltaCoords.^2,2); %# dx^2+dy^2+dz^2
msd(dt,1) = trajectorynumber; %trajectory number
msd(dt,2) = mean(squaredDisplacement); %# average
msd(dt,3) = std(squaredDisplacement); %# std
msd(dt,4) = length(squaredDisplacement); %# n
end
end
Unfortunately, when I run this on mymatrix, the resulting msd matrix remains all zeros. I think this is likely due to an error in sorting based on the trajectory number. I do not get an error, just not the results I was looking for.
If anyone has any suggestions on how to fix this, it would be greatly appreciated.
It looks like you want to bundle all rows identified by the same trajectory number. I assume that they show up in chronological order as you continue down a column. Then try something like
dt = 1; % time lag in rows (assumed here; loop over dt to get a full MSD curve)
tnumbs = unique(mymatrix(:,7)); % identify unique trajectory numbers
msd = zeros(length(tnumbs),4); % one row of statistics per trajectory
for i=1:length(tnumbs) % loop through trajectories
    icurr = find(mymatrix(:,7)==tnumbs(i)); % find indices to entries for current trajectory
    % perform your averaging (mymatrix replaces the undefined traj0mat)
    deltaCoords = mymatrix(icurr(1+dt:end),1:2) - mymatrix(icurr(1:end-dt),1:2); % displacements from the x and y positions in columns 1 and 2
    squaredDisplacement = sum(deltaCoords.^2,2); % dx^2+dy^2
    msd(i,1) = tnumbs(i); % trajectory number
    msd(i,2) = mean(squaredDisplacement); % average
    msd(i,3) = std(squaredDisplacement); % std
    msd(i,4) = length(squaredDisplacement); % n
end
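To get the MSD as a function of time lag for every trajectory, the per-trajectory selection can be combined with a loop over lags. The following is a sketch under stated assumptions: mymatrix is replaced by two synthetic random walks, rows within a trajectory are in chronological order with equal time steps, and the results are stored in a cell array (msdAll, a hypothetical name) because trajectories can differ in length.

```matlab
% Synthetic stand-in for mymatrix: two random-walk trajectories (assumed data)
rng(3);
n1 = 50;
n2 = 30;
mymatrix = zeros(n1+n2, 7);
mymatrix(:,1:2) = [cumsum(randn(n1,2)); cumsum(randn(n2,2))];  % x, y positions
mymatrix(:,7)   = [ones(n1,1); 2*ones(n2,1)];                  % trajectory number

tnumbs = unique(mymatrix(:,7));          % unique trajectory numbers
msdAll = cell(numel(tnumbs),1);          % one table per trajectory
for i = 1:numel(tnumbs)
    xy = mymatrix(mymatrix(:,7) == tnumbs(i), 1:2);   % positions of this trajectory
    nData = size(xy,1);
    msdAll{i} = zeros(nData-1, 3);       % columns: mean, std, n
    for dt = 1:nData-1                   % loop over time lags
        d  = xy(1+dt:end,:) - xy(1:end-dt,:);   % displacements over lag dt
        sq = sum(d.^2, 2);                      % dx^2 + dy^2
        msdAll{i}(dt,:) = [mean(sq), std(sq), numel(sq)];
    end
end
```

Row dt of msdAll{i} holds the time-averaged MSD of trajectory tnumbs(i) at lag dt, its standard deviation, and the number of displacement pairs that went into it.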