Histogram in MATLAB with huge input

I am trying to make a histogram in MATLAB. My data set is huge: the x and y data are the same size, about 3.5 million elements each.
My original data is a 200x200x88 3D matrix, which I reshaped into a single column.
The code for this:
[dose , size] = Dose('C:\R1')
s = size(1)*size(2)*size(3)
t = reshape(dose, s, [])
When I try the command hist(t), I get only one bar.
My workspace is as follows:
dose <200x200x88 double>
s 3520000
size [200,200,88]
t <3520000x1 double>
Could you tell me how to make a histogram with this data?

I'm able to generate a vector of size 3520000x1 and build a histogram with it.
val=rand(3520000,1);
hist(val)
It's possible your data has a few singular outliers causing your bins to look something like (1,0,0,...,3519999).
If you save your histogram bins like h=hist(data); you can see what happened.
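As a minimal sketch of that diagnosis (the data below is made up: 1000 uniform values plus one extreme outlier, not the asker's dose array), the saved bin counts show how a single large value stretches the 10 default bins:

```matlab
data = [rand(1000,1); 1e6];   % hypothetical data: 1000 values in (0,1) plus one outlier
h = hist(data);               % counts in the 10 default, equally spaced bins
disp(h)                       % nearly everything lands in the first bin

% One possible remedy: plot only the values below a cutoff chosen by eye
cutoff = 1;                   % assumed cutoff for this made-up data
hist(data(data <= cutoff))
```

If the counts look like this for the real data, restricting the range or choosing the bins explicitly should spread the bars out again.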

In order to get a single long vector from your 3D array you can use just the (:) operator. Try the following code:
num_of_bins = 100 ; %change to whatever # you want
hist(dose(:),linspace(min(dose(:)),max(dose(:)),num_of_bins));
hist will then cover only the relevant range of dose (min to max), and you can control the number of bins at will. I've used linspace to create a linearly spaced vector of bin centers, but you can pass a different vector to get a different set of bins.

Related

How to compute histogram using three variables in MATLAB?

I have three variables, e.g., latitude, longitude and temperature. For each latitude and longitude, I have a corresponding temperature value. I want to plot latitude vs. longitude on a 5 degree x 5 degree grid, with the mean temperature value shown in each grid cell instead of the frequency of occurrence.
Data:
[latGrid,lonGrid] = meshgrid(25:45,125:145);
T = table(latGrid(:),lonGrid(:),randi([0,35],size(latGrid(:))),...
'VariableNames',{'lat','lon','temp'});
At the end, I need it somewhat like the following image:
Sounds to me like you want to scale your grid. The easiest way to do this is to smooth and downsample.
While 2d histograms also bin values into a grid, using a histogram is not the way to find the mean of datapoints in a smooth grid. A histogram counts the occurrence of values in a set of ranges. In a 1d example, a histogram would take the input measurements [1, 3, 3, 5] and count the number of ones, the number of threes, etc. A 2d histogram will count occurrences of pairs of numbers. (You might want to use a histogram to help organize measurements taken at irregular intervals, but that would be a different question.)
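To make the counting concrete, here is a small sketch that tallies the same four measurements. It uses accumarray rather than a histogram function, which is my substitution and only works because the values are positive integers:

```matlab
measurements = [1; 3; 3; 5];            % column vector of positive integers
counts = accumarray(measurements, 1);   % counts(k) = number of times k occurs
disp(counts')                           % 1 0 2 0 1
```

The result holds counts, not means, which is why a plain histogram does not answer the question.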
How to smooth and downsample without the Image Processing Toolbox
Keep your data in the 2d matrix format rather than reshaping it into a table. This makes it easier to find the neighbors of each grid location.
%% Sample Data
[latGrid,lonGrid] = meshgrid(25:45,125:145);
temp = rand(size(latGrid));
There are many tools in MATLAB for smoothing matrices. If you want the mean of a 5x5 window, you can write a for-loop, use a convolution, or use filter2. My example uses convolution. For more on convolutional filters, I suggest the Wikipedia page.
%% Mean filter with conv2
M = ones(5) ./ 25; % 5x5 mean or box blur filter
C_temp = conv2(temp, M, 'valid');
C_temp is a blurry version of the original temperature variable with a slightly smaller size because we can't accurately take the mean of the edges. The border is reduced by a frame of 2 measurements. Now, we just need to take every fifth measurement from C_temp to scale down the grid.
%% Subsample result
C_temp = C_temp(1:5:end, 1:5:end);
% Because 'valid' removed a 2-element border, C_temp(1,1) is centered on element (3,3) of the original grid,
% so sample latGrid and lonGrid at the matching window centers
[h, w] = size(latGrid);
latGrid = latGrid(3:5:h-2, 3:5:w-2);
lonGrid = lonGrid(3:5:h-2, 3:5:w-2);
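A quick way to sanity-check the smoothing and subsampling steps is to run them on a predictable temp matrix and compare against a directly computed block mean. This is only a sketch: the variable names C_full, C_sub, latC and lonC are mine, and it assumes the coordinates should be taken at the window centers (every fifth element starting at index 3).

```matlab
[latGrid, lonGrid] = meshgrid(25:45, 125:145);
temp = reshape(1:numel(latGrid), size(latGrid));   % predictable stand-in data

M = ones(5) ./ 25;                       % 5x5 mean filter
C_full = conv2(temp, M, 'valid');        % 17x17: one mean per full 5x5 window
C_sub = C_full(1:5:end, 1:5:end);        % 4x4: non-overlapping windows only

[h, w] = size(latGrid);
latC = latGrid(3:5:h-2, 3:5:w-2);        % coordinates of the window centers
lonC = lonGrid(3:5:h-2, 3:5:w-2);

% C_sub(2,3) should equal the mean of the block it covers
block_mean = mean(mean(temp(6:10, 11:15)));
disp(abs(C_sub(2,3) - block_mean) < 1e-9)
disp(isequal(size(C_sub), size(latC)))
```

Both checks should display 1 if the temperature and coordinate grids line up.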
Here's what the steps look like
If you use a slightly more organized temp variable, it's easier to see that the result is correct.
With Image Processing Toolbox
imresize has a box filter method option that is equivalent to a mean filter. However, you have to do a little calculation to find the scaling factor that is equivalent to using a 5x5 window.
C_temp = imresize(temp, scale, 'box');

Selecting values plotted on a scatter3 plot

I have a 3D matrix of size 100x100x100. Each point of that matrix is assigned a value that corresponds to a certain signal strength. If I plot all the points, the result is incomprehensible and takes a lot of computing power, due to the large number of points being drawn.
The next picture exemplifies the problem (in this case the matrix was 50x50x50 to reduce the computation time):
[x,y,z] = meshgrid(1:50,1:50,1:50);
scatter3(x(:),y(:),z(:),5,strength(:),'filled')
I would like to plot only the highest values (for example, the top 10). How can I do it?
One simple solution that came to mind is to assign NaN to the values below the threshold.
Even though the results are nice, I think there must be a more elegant solution.
Reshape it into an nx1 vector. Sort that vector and take the first ten values.
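V = M(:);                               % flatten the 3D matrix into an n-by-1 vector
[sorted_V, ind] = sort(V, 'descend');   % sorted values and their linear indices
top_vals = sorted_V(1:10);              % the ten largest values
top_ind = ind(1:10);                    % their linear indices into M
I am assuming that M is your 3D matrix. This will give you the top ten values in your matrix and their respective linear indices. Then you can use ind2sub(size(M), top_ind) to get the row, column and page subscripts, which correspond to y, x and z for coordinates built with meshgrid.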
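Since x, y and z from meshgrid have the same shape as the data, the linear indices can also be used on them directly, without ind2sub. A minimal sketch, assuming random numbers as a stand-in for the real strength array and hypothetical names order and top:

```matlab
strength = rand(50,50,50);                   % stand-in for the real signal strengths
[x,y,z] = meshgrid(1:50,1:50,1:50);

[~, order] = sort(strength(:), 'descend');   % linear indices, strongest first
top = order(1:10);                           % indices of the 10 largest values

scatter3(x(top), y(top), z(top), 40, strength(top), 'filled')
colorbar
```

Only ten markers are drawn, so the plot stays readable and cheap to render.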

computing PCA matrix for set of sift descriptors

I want to compute a general PCA matrix for a dataset, and I will use it to reduce dimensions of sift descriptors. I have already found some algorithms to compute it, but I couldn't find a way to compute it by using MATLAB.
Can someone help me?
[coeff, score] = princomp(X)
is the right thing to do, but knowing how to use it is a little tricky.
My understanding is that you did something like:
sift_image = sift_fun(img)
which gives you a binary image: sift_feature?
(Even if not binary, this still works.)
Inputs, formulating X:
To use princomp/pca formulate X so that each column is a numel(sift_image) x 1 vector (i.e. sift_image(:))
Do this for all your images and line them up as columns in X. So X will be numel(sift_image) x num_images.
If your images aren't the same size (e.g. pixel dimensions different, more or less of a scene in the images), then you'll need to bring them into some common space, which is a whole different problem.
Unless your stuff is binary, you'll probably want to de-mean/normalize X, both in the column direction (i.e. normalizing each individual image) and row direction (de-meaning the whole dataset).
Outputs
score is the set of eigenvectors: it will be num_pixels x num_images.
To get, say the first eigen vector back into an image shape, do:
first_component = reshape(score(:,1),size(im));
And so on for the rest of the components. There are as many components as input images.
Each row of coeff is the set of num_images (equal to num_components) weights that can be applied to the components to regenerate one (mean-centered) input image, i.e.
input_image_1 = reshape(score * coeff(1,:)', size(original_im));
where input_image_1 has the correct, original shape,
coeff(1,:)' is a vector (num_images x 1),
score is pixels x num_images
(Disclaimer: I may have the columns/rows mixed up, but the descriptions are correct.)
Does that help?
If you have access to Statistics Toolbox, you can use the command princomp, or in recent versions the command pca.
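If the toolbox is not available, the same decomposition can be sketched with the built-in svd. This is an illustration under my own assumptions (random data as a stand-in, columns treated as variables); the signs of the components may differ from what princomp or pca return.

```matlab
X = rand(128, 6);                        % hypothetical data: 128 rows, 6 columns
Xc = bsxfun(@minus, X, mean(X, 1));      % subtract each column's mean

[U, S, V] = svd(Xc, 'econ');             % Xc = U*S*V'
coeff = V;                               % principal directions (6 x 6)
score = U * S;                           % projections of the rows (128 x 6)

reconstruction = score * coeff';         % equals Xc up to rounding error
disp(norm(reconstruction - Xc) < 1e-10)
```

To reduce dimensions, keep only the first few columns of coeff and score.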

Limit data values displayed in MATLAB histogram

I have a vector of data that I want to plot a histogram for. This data ranges from -100 to +100. The amount of data around the outer edges is insignificant and therefore I don't want to see it. I am most interested in showing data from -20 to +20.
1.) How can I limit that window to print on my histogram?
The amount of data I have at 0 outnumbers the amount of data I have anywhere else in the dataset by at least 10:1. When I plot the histogram, the shape of the frequency distribution is lost because it is dwarfed by the 0 bin.
2.) Is there a way that I can scale the number of 0 values to be three times the number of -1 entries?
I am expecting an exponential drop of this dataset (in general) and therefore three times the frequency of -1 would easily allow me to see the frequency of the other data.
You can use something like
binCenters = -20:5:20;
[N,X] = hist(V,binCenters);
N = N./scalingVector;
bar(X(2:end-1),N(2:end-1));
Note that the code excludes the extremes of N and X from the bar plot, since they contain the number of values smaller than -20 and larger than 20. Also, by building scalingVector appropriately, you can scale N as you please.
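One way scalingVector might be built for the second part of the question is sketched below. The data is made up, and I assume one bin per integer (finer than the -20:5:20 used above) so that separate 0 and -1 bins exist; it also assumes the -1 bin is not empty.

```matlab
% Hypothetical data: many zeros, a few other integers, two values outside [-20,20]
V = [zeros(1,1000), -ones(1,40), ones(1,35), -2*ones(1,15), 2*ones(1,12), -25, 30];

binCenters = -20:1:20;                    % assumed: one bin per integer
[N, X] = hist(V, binCenters);

scalingVector = ones(size(N));            % leave every other bin unchanged
scalingVector(X == 0) = N(X == 0) / (3 * N(X == -1));

N = N ./ scalingVector;                   % the 0 bin is now 3x the -1 bin
bar(X(2:end-1), N(2:end-1));
```

The bar for 0 no longer shows a true count, so it is worth labeling the plot accordingly.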
You could also just toss out any values outside the [-20,20] range by using
subsetData=data(abs(data)<=20)
1) You can limit the histogram range you see on the plot by just setting the X axes limits:
xlim([-20 20])
Setting bins in the hist command is good, but remember that all the values outside the bins will fall into the leftmost and rightmost bins. So you will need to set the axes limits anyway.
2) If there is a big difference between values in different bins, one way is to transform values on Y axes to log scale. Unfortunately just setting Y axes to log (set(gca,'YScale','log')) does not work for bar plot. Calculate the histogram with hist or histc (depending on whether you want to specify bins centers or edges) and log2 the values:
[y, xbin] = hist(data);
bar(xbin, log2(y) ,'hist')
Histogram has a few different methods of calling it. I strongly recommend you read the documentation on the function (doc hist)
What you are looking for is to put in a custom range in the histogram bin. It depends a bit on how many bins you want, but something like this will work.
Data=randn(1000,1)*20;
hist(Data,-20:20);
You could, if you want to, change the width of the bins as well. You could also change the axes so that you only focus on the range from -20 to 20, using an xlim([-20 20]) command. You could also cut off the peak at 0 by using ylim to limit the y values. Without knowing exactly what you want, I can only give you suggestions.

normalize mat file in matlab

I have a mat file with a structure that looks like this:
How do I normalize the data and save it as a .dat file (ascii)
I assume that you want to normalize each column.
There are two ways you can normalize:
(1) Set minimum to 0 and maximum to 1
dataset = bsxfun(@minus,dataset,min(dataset));
dataset = bsxfun(@rdivide,dataset,max(dataset));
(2) Set average to zero, standard deviation to 1 (if you don't have the Statistics Toolbox, use mean and std to subtract and divide, respectively, as above).
dataset = zscore(dataset);
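Without the Statistics Toolbox, option (2) can be sketched with mean and std in the same bsxfun style. The small dataset here is made up, and the names mu, sigma and normalized are mine:

```matlab
dataset = [1 10; 2 20; 3 60];            % hypothetical data, one variable per column
mu = mean(dataset);                      % column means
sigma = std(dataset);                    % column standard deviations

normalized = bsxfun(@rdivide, bsxfun(@minus, dataset, mu), sigma);

disp(mean(normalized))                   % approximately 0 in each column
disp(std(normalized))                    % approximately 1 in each column
```

A constant column has zero standard deviation, so it would need special handling before the division.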
EDIT
Why would anyone ever use option 2 to normalize?
When you calculate the difference (dissimilarity) between different data points, you may want to weigh the different dimensions equally. Since dimensions with large variance will dominate the dissimilarity measure, you normalize the variance to one.
Your normalization:
dataset = dataset-ones(size(dataset,1),1)*min(dataset) % subtract min
dataset = dataset ./ (ones(size(dataset,1),1)*max(dataset)+eps) % divide by max
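For the second half of the question, writing the result as ASCII, a minimal sketch using save with the '-ascii' flag might look like this. The file name and the small matrix are placeholders, not taken from the question:

```matlab
dataset = [0 0.5; 0.25 1; 1 0];                          % stand-in for the normalized data
outFile = fullfile(tempdir, 'dataset_normalized.dat');   % hypothetical file name

save(outFile, 'dataset', '-ascii');                      % plain text, one matrix row per line

check = load(outFile);                                   % read the text file back in
disp(max(abs(check(:) - dataset(:))) < 1e-6)
```

The '-ascii' format keeps about 8 significant digits; '-ascii', '-double' can be used if more precision is needed.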