Nearest neighbor analysis that provides me a distribution of adjacent values? - matlab

I am trying to perform a nearest neighbor type of analysis on an array based on a given set of (x,y) coordinate indexes (NN.HEME_indices). The array on which I am evaluating is a 142x128 double that has a number of positive values throughout (ATV_ng_g_raw). In this work, I’m trying to limit this nearest neighbor analysis to within 10 pixels of the indexes (including 0 and 10).
My goal is to retrieve two histograms from this analysis. One graph is where the x-axis is the number of pixels away from the indexes (from 0 to 10, inclusive) and the y-axis is the average of the non-zero numbers at each distance. The second histogram is the same but the y-axis is the average of all numbers (including the 0’s) at each distance.
I’ve placed a .mat file (“NN_04062021.mat”) here. This includes:
“NN” data structure – contains the coordinate indices that serve as the central points (HEME_indices; 2046x2 double). This was derived from a separate 142x128 double.
ATV_ng_g_raw – contains the array on which I would be performing the nearest neighbor analysis (142x128 double)
Anyone have any thoughts? Happy to clarify further or provide additional workspace variables if needed. Thank you!
yname='yname';
xname='xname';
indexname='raw_index';
[NN.(yname),NN.(xname)]=find(ATV_ng_g_raw); % dynamic field names, consistent with the lines below
NN.(indexname)=[NN.(xname),NN.(yname)];
nnidxname='HEME_raw_idx';
nndistname='HEME_raw_dist';
[NN.(nnidxname),NN.(nndistname)]=knnsearch(NN.HEME_indices,NN.(indexname)); % returns, for each non-zero pixel, the nearest HEME index and the distance to it
figure,histogram(NN.(nndistname),'BinMethod','integers');
ylabel('Counts of HEME+ voxels');
xlabel('Distance to ARVs (voxels)');
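For the two histograms described above, one approach (a sketch, untested against the posted .mat file, and assuming HEME_indices stores [x y] pairs in the same order as the raw_index built above) is to query knnsearch from every pixel, round the distances to integers, and average with accumarray:

```matlab
% Sketch: mean intensity vs. distance (0..10 px) to the nearest HEME index.
% Assumes NN.HEME_indices holds [x y] pairs, like NN.raw_index above.
[nRows, nCols] = size(ATV_ng_g_raw);
[Y, X] = ndgrid(1:nRows, 1:nCols);          % coordinates of every pixel
[~, d] = knnsearch(NN.HEME_indices, [X(:), Y(:)]);
d = round(d);                                % integer distance per pixel
keep = d <= 10;                              % limit to 0..10 pixels inclusive

vals = ATV_ng_g_raw(:);
% Average of all values (zeros included) at each distance
meanAll = accumarray(d(keep)+1, vals(keep), [11 1], @mean);
% Average of the non-zero values only
nz = keep & vals ~= 0;
meanNonzero = accumarray(d(nz)+1, vals(nz), [11 1], @mean);

figure, bar(0:10, meanNonzero);
xlabel('Distance to HEME indices (pixels)'); ylabel('Mean of non-zero values');
figure, bar(0:10, meanAll);
xlabel('Distance to HEME indices (pixels)'); ylabel('Mean of all values');
```

accumarray applies @mean within each integer distance bin; pixels farther than 10 pixels from every HEME index are simply discarded.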

Related

MATLAB: count number of peaks

I have a graph like this and I want to determine the number of peaks. Since it's a wave function, the whole graph has many small oscillations, which is why I was unsuccessful with functions such as findpeaks: for the graph below it returns around 3000 peaks, whereas I want the answer to be 7.
My idea was to write a for or while loop that counts the number of instances where the average is higher than 0.5. Ultimately I want a function that iterates over the graph in segments and returns the number of peaks, along with the range of y-index values over which each peak occurs (I think the best way to do this would be to save them to a preallocated zeros matrix).
link of file data: https://www.dropbox.com/s/vv8vqv28mqzfr9l/Example_data.mat?dl=0
Do you mean you are trying to count the 'on' parts of your data?
You're on the right track using findpeaks. If you scroll to the bottom of the documentation you'll see that you can actually tweak the routine in various ways, for example specifying the minimum distance between peaks, or the minimum difference between a point and its neighbour before it is considered a peak.
By defining the minimum distance between peaks, I detected the following 7 peaks. Code is included below. Alternatively you can play around with the other parameters you can pass into findpeaks.
The only other thing to note is that I took the absolute value of your data.
load('Example_data.mat')                % provides the vector number11
indx = 1:numel(number11);               % x-locations passed to findpeaks
% With x-locations supplied, MinPeakDistance is measured in those units
[pks, locs] = findpeaks(abs(number11), indx, 'MinPeakDistance', 0.25e4);
hold on
plot(number11)
plot(locs, pks, 'rx')                   % mark the detected peaks
disp(numel(pks))                        % number of peaks found

K-means Clustering, major understanding issue

Suppose that we have a 64-dimensional dataset to cluster; say the matrix is dt = 64x150.
Using the vl_kmeans function from the VLFeat library, I cluster my dataset into 20 centers:
[centers, assignments] = vl_kmeans(dt, 20);
centers is a 64x20 matrix.
assignments is a 1x150 matrix with values inside it.
According to manual: The vector assignments contains the (hard) assignments of the input data to the clusters.
I still cannot understand what those numbers in the matrix assignments mean. I don't get it at all. Would anyone mind helping me out here? An example or something would be great. What do these values represent anyway?
In k-means the problem you are trying to solve is the problem of clustering your 150 points into 20 clusters. Each point is a 64-dimension point and thus represented by a vector of size 64. So in your case dt is the set of points, each column is a 64-dim vector.
After running the algorithm you get centers and assignments. centers are the 20 positions of the cluster centers in 64-dim space, in case you want to visualize them, measure distances between points and clusters, etc. assignments, on the other hand, contains the actual assignment of each 64-dim point in dt. So if assignments(7) is 15, it indicates that the 7th vector in dt belongs to the 15th cluster.
For example, here you can see a clustering of many 2D points, say 1000, into 3 clusters. In this case dt would be 2x1000, centers would be 2x3, and assignments would be 1x1000, holding numbers ranging from 1 to 3 (or 0 to 2 if you're using OpenCV).
EDIT:
The code to produce this image is located here: http://pypr.sourceforge.net/kmeans.html#k-means-example along with a tutorial on kmeans for pyPR.
In OpenCV it is the number of the cluster that each of the input points belongs to.
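To make the assignments vector concrete, here is a minimal sketch (assuming VLFeat is set up on your MATLAB path, and using random stand-in data in place of your real dt):

```matlab
% Sketch: interpreting assignments from vl_kmeans (random stand-in data)
dt = rand(64, 150);                          % 150 points, each 64-dim (one per column)
[centers, assignments] = vl_kmeans(dt, 20);  % centers: 64x20, assignments: 1x150

k = assignments(7);                          % cluster the 7th point was assigned to
membersOfK = dt(:, assignments == k);        % every point in that same cluster
dist7 = norm(dt(:,7) - centers(:,k));        % distance from point 7 to its center
```

So assignments is just a lookup table from point index to cluster index; logical indexing on it pulls out all members of any given cluster.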

different sized bins in matlab

In MATLAB I have a vector Muen which I want to reduce in size by dividing it into different-length bins. The vector has a few values that need high-accuracy bins and many values that are roughly equal and could be collected into bins of up to a few hundred values.
I also need to know the indices of all old bins going into a new bin, in order to shorten a second vector, fluence.
The goal is to speed up the summation sum(fluence.*Muen) by using different-sized bins determined by Muen, and to sum fluence into the new bins before the vector multiplication.
For this I try to use
edges=[min(Muen):0.0001:Muen(13),Muen(12:-1:1)];
[N,bin]=histc(Muen,edges)
The problem is how to make the vector edges, as there is a large difference between the maximum and minimum of Muen and a small difference between other values. Is there a way to make the steps of edges depend on the derivative of Muen?
To get the shorter version of Muen I tried something like
MuenShort=N.*edges;
but it did not work quite right (could be a fault in edges); any suggestions?
I also do not really understand how bin gives the indices of the values that go into the new bins.
clarification:
what I want to do is take the elements of a vector m (Muen) that are roughly equal, replace them with one element, and at the same time keep track of which elements go into each entry of a new vector n (MuenShort). Example:
{m1}->n1,(1), {m2}->n2,(2), {m3,m4}-> m3=m4=n3,(3,4), {m5,m6,m7,m8}-> m5=m6=m7=m8=n4,(5,6,7,8)...
where n1>>n2, but the difference between n3 and n4 might not be so large. The number of m-elements in each n-element should be determined by how many m-elements are roughly equal to each other, or rather lie between two limits. So the bin size should vary between one element and a few hundred elements.
Then I want to use indexes to make the fluence vector shorter
fluenceShort(1:length(MuenShort)) = [sum(fluence(1)), sum(fluence(2)), sum(fluence(3:4)), sum(fluence(5:8)), ...];
goal=sum(fluenceShort.*MuenShort)
Is there a way to implement this in Matlab?
Even if I don't understand your question completely, I would suggest this: sort your vector muen, pick a fixed number n, and define each bin so that it contains exactly n values of muen. For simplicity, assume the length of muen is a multiple of n:
n = 10;
muen_sorted = sort(muen);              % sort first, before computing the bin count
m = length(muen_sorted)/n;             % number of bins
edges = [-inf mean([muen_sorted(n:n:end-1); muen_sorted(n+1:n:end)]) inf];
muen_short = mean(reshape(muen_sorted, n, m));
Note that m+1 edges (vector edges) are obtained, corresponding to m bins. Bin edges lie exactly between the closest values of neighbouring bins. Thus, the upper edge of the first bin is (muen_sorted(n)+muen_sorted(n+1))/2; the upper edge of the next bin is (muen_sorted(2*n)+muen_sorted(2*n+1))/2, and so on.
The "representative value" of each bin (vector muen_short) is computed as the mean of the values that lie in that bin. Or perhaps the median would make more sense, depending on your application.
As a result of this code, muen_short(1) is the value corresponding to the bin with edges edges(1) and edges(2); muen_short(2) is the value corresponding to the bin with edges edges(2) and edges(3), etc.
You can now use the variable edges to build the histogram of fluence with those same edges.
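Putting this together with the question's fluence vector, a sketch (assuming muen and fluence are column vectors of equal length, and reusing edges and muen_short from the code above):

```matlab
% Sketch: shorten fluence with the same bins, then compare the two sums
[~, bin] = histc(muen, edges);                 % bin(i): which new bin muen(i) falls in
fluence_short = accumarray(bin(:), fluence(:), [numel(muen_short) 1]);  % sum within bins
approx = sum(fluence_short(:) .* muen_short(:));
exact  = sum(fluence .* muen);                 % for checking the approximation
```

Because the -inf/inf outer edges catch everything, every element of muen lands in one of the m bins, so bin can be used directly as the subscript vector for accumarray.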

Controlled random number/dataset generation in MATLAB

Say, I have a cube of dimensions 1x1x1 spanning between coordinates (0,0,0) and (1,1,1). I want to generate a random set of points (assume 10 points) within this cube which are somewhat uniformly distributed (i.e. within certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible using vector/matrix operations then the solution with loops will also do.
Let me provide some more background details about my problem (This will help in terms of what I exactly need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$\int F(x,y,z)\,\mathrm{d}V \approx \sum_{i} F(x_i,y_i,z_i)\, V_i$
Here, $F(x_i,y_i,z_i)$ is the value of the function at point $(x_i,y_i,z_i)$ and $V_i$ is the weight. So to calculate the integral accurately, I need a set of random points that are neither too close to each other nor too far from each other (sorry, but I don't know what this range is myself; I will only be able to figure it out with a parametric study once I have a working code). Also, I need to do this for a 3D mesh with multiple polyhedrons, hence I want to avoid loops to speed things up.
Check out this nice random-vectors-with-fixed-sum generator on the MATLAB File Exchange.
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."
Use P = rand(3,10), where each column corresponds to one point and each row corresponds to the coordinate along one axis (x, y, z). (Avoid naming the variable i, which shadows MATLAB's imaginary unit.)
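If you do need a minimum separation and a margin from the boundaries, a simple rejection loop works (a sketch; margin and dmin are illustrative values you would tune, and the implicit expansion in pts - c needs R2016b or later):

```matlab
% Sketch: 10 points in the unit cube, kept away from the faces and from
% each other by rejection sampling. margin and dmin are illustrative.
nPts = 10; margin = 0.1; dmin = 0.2;
pts = zeros(3, 0);
while size(pts, 2) < nPts
    c = margin + (1 - 2*margin) * rand(3, 1);            % candidate point
    if isempty(pts) || min(sqrt(sum((pts - c).^2, 1))) >= dmin
        pts = [pts, c];                                  % accept the candidate
    end
end
```

Note that if dmin is too large for the requested number of points, the loop may never terminate, so sanity-check the values before running it on many polyhedrons.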

Using triplequad to calculate density (in Matlab)

As I've explained in a previous question: I have a dataset consisting of a large, semi-random collection of points in three-dimensional Euclidean space. In this collection of points, I am trying to find the point that is closest to the area with the highest density of points.
As High Performance Mark answered:
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested I use triplequad for this, but this is not a function I am familiar with, or understand very well. Does anyone have any pointers on how I could go about using this function in MATLAB for what I am trying to do?
For example, say I have a random normally distributed array A = randn([300,300,300]); how could I use triplequad to find the point I am looking for? Because as I understand it, I also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10);
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
    counts(ceil(A(ix,1)), ceil(A(ix,2)), ceil(A(ix,3))) = ...
        counts(ceil(A(ix,1)), ceil(A(ix,2)), ceil(A(ix,3))) + 1;
end
will count up the number of points in each of the voxels in counts.
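Once counts is filled, finding the densest voxel and the data point nearest its centre takes only a few more lines (a sketch continuing from the arrays above; the implicit expansion in A - centre needs R2016b or later):

```matlab
% Sketch: densest voxel, then the point of A closest to that voxel's centre
[~, linIdx] = max(counts(:));
[vx, vy, vz] = ind2sub(size(counts), linIdx);  % subscripts of the fullest voxel
centre = [vx, vy, vz] - 0.5;                   % centre of that unit voxel
[~, iNear] = min(sum((A - centre).^2, 2));     % row of A nearest the centre
A(iNear, :)                                    % the point we were after
```

If several voxels tie for the maximum, max returns the first; with real data you might instead sub-divide the winning voxel as Mark suggested.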
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.