different sized bins in matlab - matlab

In Matlab I have a vector Muen which I want to reduce in size by dividing it into bins of different lengths. The vector has a few values that need high-accuracy bins and a lot of values that are roughly equal and could be collected into bins of up to a few hundred values each.
I also need to know the index of every old element that goes into a new bin, in order to shorten a second vector, fluence.
The goal is to speed up a summation of two vectors, sum(fluence.*Muen), by using different sized bins determined by Muen and summing fluence into the new bins before the vector multiplication.
For this I try to use
edges=[min(Muen):0.0001:Muen(13),Muen(12:-1:1)];
[N,bin]=histc(Muen,edges)
The problem is how to build the vector edges, as there is a large difference between the maximum and minimum of Muen and only a small difference between the other values. Is there a way to make the steps of edges depend on the derivative of Muen?
To get the shorter version of Muen I would use something like
MuenShort=N.*edges;
but it did not work quite right (could be a fault in edges), any suggestions?
I also do not really get how bin gives the index of the values that go into the new bins?
clarification:
What I want to do is take the elements of a vector m (or Muen) that are roughly equal and replace them with one element, while keeping track of the index of each element that goes into the new vector n (or MuenShort). Example:
{m1}->n1,(1), {m2}->n2,(2), {m3,m4}-> m3=m4=n3,(3,4),{m5,m6,m7,m8}-> m5=m6=m7=m8=n4,(5,6,7,8)...
where n1>>n2, but the difference between n3 and n4 might not be so large. The number of m-elements in each n-element should be determined by the number of m-elements that are roughly equal to each other, or rather that lie between two limits. So the bin size should vary from one element to a few hundred elements.
Then I want to use indexes to make the fluence vector shorter
fluenceShort(1:length(MuenShort))= [sum(fluence(1)),sum(fluence(2)),sum(fluence(3:4)),sum(fluence(5:8))...];
goal=sum(fluenceShort.*MuenShort)
Is there a way to implement this in Matlab?

Even if I don't understand your question clearly, I would suggest this. Perhaps you could sort your vector muen, pick a fixed number n, and define each bin so that it contains exactly n values of muen. For simplicity, the length of muen is assumed to be a multiple of n:
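n = 10;                       % number of values per bin
muen_sorted = sort(muen);     % muen is assumed to be a row vector
m = length(muen_sorted)/n;    % number of bins (sort first, then count)
edges = [-inf mean([muen_sorted(n:n:end-1); muen_sorted(n+1:n:end)]) inf];
muen_short = mean(reshape(muen_sorted,n,m));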
Note that m+1 edges (vector edges) are obtained, corresponding to m bins. Bin edges lie exactly between the closest values of neighbouring bins. Thus, the upper edge of the first bin is (muen_sorted(n)+muen_sorted(n+1))/2; the upper edge of the next bin is (muen_sorted(2*n)+muen_sorted(2*n+1))/2, and so on.
The "representative value" of each bin (vector muen_short) is computed as the mean of the values that lie in that bin. Or perhaps the median would make more sense, depending on your application.
As a result of this code, muen_short(1) is the value corresponding to the bin with edges edges(1) and edges(2); muen_short(2) is the value corresponding to the bin with edges edges(2) and edges(3), etc.
You can now use the variable edges to build the histogram of fluence with those same edges.
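To make the last step concrete, here is a minimal sketch of how the bin index returned by histc could be used to sum fluence into the same bins. The data values, the choice n = 3, and the names fluence_short, approx and exact are illustrative assumptions, not part of the question. The values of muen are chosen to be distinct, because ties that straddle a bin edge would make the bins unequal in size.

```matlab
% Toy data (hypothetical values); both vectors are rows of equal length.
muen    = [9.0 5.0 2.0 1.03 1.02 1.01 1.00 0.99 0.98 0.52 0.51 0.50];
fluence = 1:12;

n = 3;                                   % values per bin
muen_sorted = sort(muen);
m = numel(muen_sorted)/n;                % number of bins
edges = [-inf mean([muen_sorted(n:n:end-1); muen_sorted(n+1:n:end)]) inf];

% bin(k) is the index of the bin that muen(k) falls into (1..m).
[~, bin] = histc(muen, edges);

% Sum fluence, and average muen, over the members of each bin.
fluence_short = accumarray(bin(:), fluence(:), [m 1]).';
muen_short    = (accumarray(bin(:), muen(:), [m 1]) ./ ...
                 accumarray(bin(:), 1, [m 1])).';

approx = sum(fluence_short .* muen_short);   % binned estimate
exact  = sum(fluence .* muen);               % full sum, for comparison
```

The binned estimate is only close to the exact sum when the values inside each bin are nearly equal, so equal-count bins may be too coarse for the few large values; comparing approx with exact on real data is a quick way to judge that.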

Related

Finding length between a lot of elements

I have an image of a cytoskeleton. There are a lot of small objects inside and I want to calculate the length between all of them in every axis and to get a matrix with all this data. I am trying to do this in matlab.
My final aim is to figure out if there is any axis with a constant distance between the object.
I've tried bwdist and to use connected components without any luck.
Do you have any other ideas?
So, the end goal is that you want to globally stretch this image in a certain direction (linearly) so that the distances between nearest pairs end up as close together as possible, hopefully the same? Or may you do more complex stretching? (Note that with an arbitrarily complex one you can always make it work :) )
If it is a linear global stretch, the distance in x' and y' is simply the old distance in x and y multiplied by a factor, applied to every pair of points. So the final Euclidean distance ends up being sqrt((SX*x)^2 + (SY*y)^2), with SX being the stretch in x and SY the stretch in y; x and y are the distances in X and Y between a pair of points.
If you are interested in just "the same" part, solution is not so difficult:
Find all objects of interest and put their X and Y coordinates in a N*2 matrix.
Calculate distances between all pairs of objects in X and Y. You will end up with 2 matrices sized N*N (with 0 on the diagonal, symmetric and real, not sure what is the name for that type of matrix).
Find the minimum distance (say this is between A and B).
You probably already have this. Now:
Take C. Make N-1 transformations, which all end up in C->nearestToC = A->B. It is a simple system of equations: X1^2*SX^2+Y1^2*SY^2 = X2^2*SX^2+Y2^2*SY^2.
So, first say A->B = C->A, then A->B = C->B, then A->B = C->D etc etc. Make sure transformation is normalized => SX^2 + SY^2 = 1. If it cannot be found, the only valid transformation is SX = SY = 0 which means you don't have solution here. Obviously, SX and SY need to be real.
Note that this solution is unique except in case where X1 = X2 and Y1 = Y2. In this case, grab some other point than C to find this transformation.
For each transformation check the remaining points and find all nearest neighbours of them. If distance is always the same as these 2 (to a given tolerance), great, you found your transformation. If not, this transformation does not work and you should continue with the next one.
If you want a transformation that minimizes variations between distances (but doesn't require them to be nearly equal), I would do some optimization method and search for a minimum - I don't know how to find an exact solution otherwise. I would pick this also in case you don't have linear or global stretch.
If I understand your question correctly, the first step is to obtain the center of mass of every object in the image as (x,y) coordinates. Then you can easily compute the distances between all points. I suggest taking a look at a histogram of those distances, which may provide some information about the nature of the distance distribution (for example, whether it is uniformly random or whether any patterns appear).
Obtaining the center of mass points is not an easy task, consider transforming the image into a binary one, or some sort of background subtraction with blob detection or/and edge detector.
For building a histogram you can use histogram.
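As a rough illustration of the distance computation both answers rely on, the sketch below builds the N-by-N matrices of x and y differences from a list of centroids and evaluates the nearest-neighbour distances after a linear stretch. The coordinates in P and the stretch factors SX and SY are made-up values chosen for illustration; real centroids would come from your own segmentation.

```matlab
% Hypothetical centroid coordinates, one object per row: [x y].
P = [0 0; 2 0; 4 0; 0 1; 2 1; 4 1];
N = size(P,1);

% Pairwise coordinate differences (N-by-N, zero diagonal).
DX = bsxfun(@minus, P(:,1), P(:,1).');
DY = bsxfun(@minus, P(:,2), P(:,2).');

% Euclidean distances after stretching by SX in x and SY in y.
SX = 1; SY = 2;                      % assumed stretch factors
D = sqrt((SX*DX).^2 + (SY*DY).^2);

D(1:N+1:end) = inf;                  % ignore self-distances on the diagonal
nn = min(D, [], 2);                  % nearest-neighbour distance of each object
spread = max(nn) - min(nn);          % zero when all nearest distances agree
```

A small spread for some (SX, SY) suggests that a stretch along those axes equalizes the spacing; the values of nn can also be passed to histogram to inspect their distribution.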

Knowing which elements from dataset went into tallest bin in histogram? MATLAB

I want to know which elements from my data set went into the tallest bin in a bivariate histogram, and have not found information on how to do this online. I suspect this is possible since it is fairly useful.
I know I can do some other code that helps me find it but I was wondering if there is a succinct way of doing this. For example I could search through the dataset with a conditional that helps me extract the things falling into the bins but I'm not interested in that. Right now I have written
X = [Eavg,Estdev];
hist3(X,[15 15])
The result is a 15x15 bin bivariate histogram. I want to extract the elements in the tallest bin in a very terse manner.
I'm doing a statistical mechanics (Monte Carlo) simulation, if it's worth mentioning...
The signature [N, CEN] = hist3(... returns the bin counts and the bin centers. Bin centers can be converted to bin edges. The edges can then be used to find which data elements fall into a specific bin.
X = randi([1 100],10,2);
[N, CEN] = hist3(X,[5 5]);
%find row and column of highest value of histogram
%since there may be multiple histogram values that
%are equal to maximum value then we select the first one
[r,c]= find(N==max(N(:)),1);
%convert cell of bin centers to vector
R = [CEN{1}];
C = [CEN{2}];
%convert bin centers to edges
%realmax used to include values that
%are beyond the first and the last computed edges
ER = [-realmax R(1:end-1)+diff(R)/2 realmax];
EC = [-realmax C(1:end-1)+diff(C)/2 realmax];
%logical indices of rows where data fall into specified bin
IDX = X(:,1)>= ER(r) & X(:,1)< ER(r+1) & X(:,2)>= EC(c) & X(:,2)< EC(c+1)
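If a newer MATLAB release is available, a terser route may be histcounts2, which returns the bin index of every data point directly. This is a sketch under that assumption, with made-up data; the edges histcounts2 picks are not guaranteed to match those of hist3, so the tallest bin may differ slightly from the one found above.

```matlab
% Hypothetical data: three points cluster near (0.15, 0.15).
X = [0.1 0.1; 0.2 0.15; 0.15 0.2; 0.9 0.9; 0.6 0.1; 0.1 0.8];

% binX(k) and binY(k) are the row and column of the bin containing X(k,:).
[N, ~, ~, binX, binY] = histcounts2(X(:,1), X(:,2), [2 2]);

% Locate the tallest bin (the first one if there are ties).
[~, k] = max(N(:));
[r, c] = ind2sub(size(N), k);

% Logical index of the rows of X that fall into that bin.
IDX = (binX == r) & (binY == c);
tallest = X(IDX, :);
```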

Matlab : Help in finding minimum distance

I am trying to find the point that is at a minimum distance from the candidate set. Z is a matrix where the rows are the dimension and columns indicate points. Computing the inter-point distances, and then recording the point with minimum distance and its distance as well. Below is the code snippet. The code works fine for a small dimension and small set of points. But, it takes a long time for large data set (N = 1 million data points and dimension is also high). Is there an efficient way?
I suggest that you use pdist to do the heavy lifting for you. This function computes the pairwise distance between every two points in your array. The resulting vector has to be put into matrix form using squareform in order to find the minimal distance for each point:
N = 100;
Z = rand(2,N); % each column is a 2-dimensional point
% pdist assumes that the second index corresponds to dimensions
% so we need to transpose inside pdist()
distmatrix = squareform(pdist(Z.','euclidean')); % output is [N, N] in size
% set diagonal values to infinity to avoid getting 0 self-distance as minimum
distmatrix = distmatrix + diag(inf(1,size(distmatrix,1)));
mindists = min(distmatrix,[],2); % find the minimum for each row
sum_dist = sum(mindists); % sum of minimal distance between each pair of points
This looks at every pair twice when taking the minima, but I think this is also true of your original implementation.
The idea is that pdist computes the pairwise distances between the rows of its input, so we pass the transpose of Z to pdist. Since the full output is always a symmetric matrix with zero diagonal, pdist is implemented such that it only returns the values above the diagonal, in a vector. So a call to squareform is needed to get the proper distance matrix. Then the row-wise minimum of this matrix has to be found, but first we have to exclude the zeros on the diagonal. I was lazy, so I put inf on the diagonal to make sure that the minimum is elsewhere. In the end we just have to sum up the minimal distances.
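For very large N the full N-by-N matrix from squareform will not fit in memory. One possible workaround, sketched here with plain matrix operations rather than pdist, is to process the points in blocks so that only a blk-by-N slice exists at a time. The block size blk, the data sizes and the name minidx are illustrative assumptions; the squared distances use the expansion |a-b|^2 = |a|^2 + |b|^2 - 2*a'*b.

```matlab
N = 500;
Z = rand(3, N);                 % hypothetical data: each column is a point
blk = 100;                      % assumed block size, tune to available memory

sq = sum(Z.^2, 1);              % squared norm of every point (1-by-N)
mindists = zeros(1, N);
minidx   = zeros(1, N);

for s = 1:blk:N
    j = s:min(s+blk-1, N);      % indices of the points in this block
    % Squared distances from the block points to all points.
    D2 = bsxfun(@plus, sq(j).', sq) - 2*(Z(:,j).' * Z);
    % Exclude each point's distance to itself.
    D2(sub2ind(size(D2), 1:numel(j), j)) = inf;
    [d2, k] = min(D2, [], 2);
    mindists(j) = sqrt(max(d2, 0)).';   % clamp tiny negative rounding errors
    minidx(j)   = k.';                  % index of the nearest neighbour
end

sum_dist = sum(mindists);
```

This still does O(N^2) work, so for a million high-dimensional points it only removes the memory problem, not the run time.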

Kullback Leibler Divergence of 2 Histograms in MatLab

I would like a function to calculate the KL distance between two histograms in MatLab. I tried this code:
http://www.mathworks.com/matlabcentral/fileexchange/13089-kldiv
However, it says that I should have two distributions P and Q of sizes n x nbins. However, I am having trouble understanding how the author of the package wants me to arrange the histograms. I thought that providing the discretized values of the random variable together with the number of bins would suffice (I would assume the algorithm would use an arbitrary support to evaluate the expectations).
Any help is appreciated.
Thanks.
The function you link to requires that the two histograms passed be aligned and thus have the same length NBIN x N (not N X NBIN), that is, if N>1 then the number of rows in the inputs should be equal to the number of bins in the histograms. If you are just going to compare two histograms (that is if N=1) it doesn't really matter, you can pass either row or column vector versions of these as long as you are consistent and the order of bins matches.
A generic call to the function looks like this:
dists = kldiv(bins,P,Q)
The implementation allows comparison of multiple histograms to each other (that is, N>1), in which case pairs of columns (with matching column index) in each array are compared and the result is a row vector with distances for each matching pair.
Array bins should be the same size as P and Q and is used to perform a very minimal check that the inputs are of the same size, but is not used in the computation. The routine expects bins to contain the numeric labels of your bins so that it can check for repeated bin labels and warn you if repeats occur, but otherwise doesn't use the information.
You could do away with bins and compute the distance with
KL = sum(P .* (log2(P)-log2(Q)));
without using the Matlab Central versions. However the version you link to performs the abovementioned minimal checks and in addition allows computation of two alternative distances (consult the documentation).
The version linked to by eigenchris checks that no histogram bins are empty (which would make the computation blow up numerically) and, if there are any, removes their contribution to the sum (not sure this is entirely appropriate; consult an expert on the subject). You should probably also be aware of the exact form of the formula: specifically, note the use of log2 above versus the natural logarithm in the version linked to by eigenchris.
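A minimal sketch of the direct computation, assuming both histograms are raw counts over the same bins in the same order. The counts below are made up. Bins where P is zero are skipped, following the usual convention that 0*log(0) contributes nothing; if Q is zero in a bin where P is not, the result is Inf.

```matlab
% Hypothetical bin counts for two histograms on identical bins.
P = [10 20 30 40];
Q = [25 25 25 25];

% Normalize counts to probabilities.
P = P / sum(P);
Q = Q / sum(Q);

% Only bins with P > 0 contribute to the sum.
nz = P > 0;
KL = sum(P(nz) .* (log2(P(nz)) - log2(Q(nz))));   % in bits, because of log2
```

Replacing log2 with log gives the same divergence in nats, which is the main thing to check when comparing against another implementation.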

Selecting values plotted on a scatter3 plot

I have a 3d matrix of 100x100x100. Each point of that matrix has assigned a value that corresponds to a certain signal strength. If I plot all the points the result is incomprehensible and requires horsepower to compute, due to the large amount of points that are painted.
The next picture exemplifies the problem (in this case the matrix was 50x50x50 to reduce the computation time):
[x,y,z] = meshgrid(1:50,1:50,1:50);
scatter3(x(:),y(:),z(:),5,strength(:),'filled')
I would like to plot only the highest values (for example, the top 10). How can I do it?
One simple solution that came to mind is to assign "nan" to the values below the threshold.
Although the results are nice, I think there must be a more elegant solution.
Reshape it into an nx1 vector. Sort that vector in descending order and take the first ten values.
V = M(:);                              % flatten the 3D matrix into an n-by-1 vector
[sorted_V, order] = sort(V,'descend'); % order holds the linear indices into M
top_vals = sorted_V(1:10);             % the ten largest values
ind = order(1:10);                     % their linear indices in M
I am assuming that M is your 3D matrix. This gives you the top ten values in your matrix (top_vals) and their respective linear indices (ind). Then you can use ind2sub() to get the x,y,z.
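The ind2sub() step mentioned above could look like the following sketch. The random matrix, its size, and the names k, r, c and p are placeholders for illustration.

```matlab
M = rand(20, 20, 20);           % hypothetical signal-strength matrix
k = 10;                         % number of points to keep

% Sort all values in descending order, remembering their linear indices.
[sorted_V, order] = sort(M(:), 'descend');
top_vals = sorted_V(1:k);

% Convert the linear indices to row, column and page subscripts.
[r, c, p] = ind2sub(size(M), order(1:k));
```

With the meshgrid convention used in the question, x corresponds to the column subscript and y to the row subscript, so the selected points can be drawn with scatter3(c, r, p, 5, top_vals, 'filled').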