I have a mat file with a structure that looks like this:
How do I normalize the data and save it as an ASCII .dat file?
I assume that you want to normalize each column.
There are two ways you can normalize:
(1) Set minimum to 0 and maximum to 1
dataset = bsxfun(@minus,dataset,min(dataset));
dataset = bsxfun(@rdivide,dataset,max(dataset));
(2) Set average to zero, standard deviation to 1 (if you don't have the Statistics Toolbox, use mean and std to subtract and divide, respectively, as above).
dataset = zscore(dataset);
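If zscore is not available, option (2) can be sketched with mean and std in the same bsxfun style as option (1). This is only a minimal sketch: the example matrix and the guard for constant columns are assumptions added for illustration.

```matlab
% Example data (assumed for illustration): rows are observations, columns are variables
dataset = [1 10; 2 20; 3 60];

mu    = mean(dataset, 1);     % column means
sigma = std(dataset, 0, 1);   % column standard deviations (normalized by n-1)
sigma(sigma == 0) = 1;        % assumption: avoid dividing by zero for constant columns

dataset = bsxfun(@minus, dataset, mu);       % subtract the mean of each column
dataset = bsxfun(@rdivide, dataset, sigma);  % divide by the standard deviation of each column
```

After this, each column should have mean 0 and standard deviation 1, up to rounding.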
EDIT
Why would anyone ever use option 2 to normalize?
When you calculate the difference (dissimilarity) between different data points, you may want to weigh the different dimensions equally. Since dimensions with large variance will dominate the dissimilarity measure, you normalize the variance to one.
Your normalization:
dataset = dataset-ones(size(dataset,1),1)*min(dataset) % subtract min
dataset = dataset ./ (ones(size(dataset,1),1)*max(dataset)+eps) % divide by max
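For the second half of the question, writing the result to an ASCII .dat file, one possible approach is save with the -ascii flag. This is a sketch: the file name normalized.dat and the small matrix are placeholders, not taken from the question.

```matlab
dataset = [0 0.5; 1 1];                       % placeholder for the normalized data
save('normalized.dat', 'dataset', '-ascii');  % plain text, one matrix row per line
check = load('normalized.dat');               % reading the file back returns a numeric matrix
```

By default -ascii writes about 8 significant digits; adding '-double' as a further argument writes 16 digits if more precision is needed.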
I pulled histograms from images in MATLAB, and now I want to compare the histograms using KL-divergence.
I found this script but I do not understand how I could apply it to my case.
So here I pull my histogram (pretty simple!!):
[N,X]=hist(I,n);
[N1,X1]=hist(I1,n);
KLDiv(N,N1)
% ans=inf
N is the histogram of my image I
As you can see, my result is inf...
Please can you tell me in my case how to use the script?
Thanks
You probably want to calculate the histogram of an image using imhist, instead of the columnwise calculation of the histogram:
I1 = rand(10);
I2 = rand(10);
[N1, X1] = imhist(I1, 10); % limit the number of bins to avoid zero values
[N2, X2] = imhist(I2, 10);
KLDiv(N1.', N2.') % convert to row vectors to correspond with the requested format
KLDiv(N1.', N1.') % the divergence of a histogram with itself is indeed zero
Note that I limited the number of bins to be sure that each bin has at least one point, because the Kullback-Leibler divergence is not defined if Q(i) is zero and P(i) not:
The Kullback–Leibler divergence is defined only if Q(i)=0 implies
P(i)=0, for all i (absolute continuity).
Notes
Range of Kullback–Leibler divergence?
Any non-negative number; it is zero if (and only if) the two distributions are equal: KLD >= 0.
To which base should I take the logarithm? Natural logarithm log or base 2 logarithm log2?
Note that it is just a matter of scaling your results. So in fact, it doesn't matter, but be sure to use the same logarithm if you want to compare your results. Wikipedia suggests the following:
logarithms in these formulae are taken to base 2 if information is
measured in units of bits, or to base e if information is measured in
nats.
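To make the definition and the zero-bin condition concrete, here is a minimal sketch that computes the discrete divergence directly from two count vectors. It is not the KLDiv script referenced in the question, and the bin counts are made up for illustration.

```matlab
% Made-up bin counts for two histograms that share the same bins
N1 = [4 3 2 1];
N2 = [1 2 3 4];

P = N1 / sum(N1);   % normalize counts to probabilities
Q = N2 / sum(N2);

if any(Q == 0 & P > 0)
    kld = Inf;                                 % Q(i) = 0 while P(i) > 0: not defined
else
    nz  = P > 0;                               % terms with P(i) = 0 contribute nothing
    kld = sum(P(nz) .* log2(P(nz) ./ Q(nz)));  % in bits; use log instead of log2 for nats
end
```

An empty bin in the second histogram that is non-empty in the first is exactly the case that produces inf.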
What test can I do in MATLAB to test the spread of a histogram? For example, in the given set of histograms, I am only interested in 1,2,3,5 and 7 (going from left to right, top to bottom) because they are less spread out. How can I obtain a value that will tell me if a histogram is positively skewed?
It may be possible using Chi-Squared tests but I am not sure what the MATLAB code will be for that.
You can use the standard definition of skewness. In other words, you can use:
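Written out, the sample skewness that the code further down follows is presumably the following, where $x_i$ are the $n$ data values and $\bar{x}$ is their mean. For a histogram with bin centres $x_k$ and counts $h_k$, each sum over samples becomes a count-weighted sum over bins, with $n = \sum_k h_k$.

```latex
\text{skewness}
  = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{3}}
         {\left(\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\right)^{3/2}}
  = \frac{\frac{1}{n}\sum_{k} h_k\left(x_k-\bar{x}\right)^{3}}
         {\left(\frac{1}{n-1}\sum_{k} h_k\left(x_k-\bar{x}\right)^{2}\right)^{3/2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{k} h_k x_k
```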
You compute the mean of your data and you use the above equation to calculate skewness. Positive and negative skewness are like so:
Source: Wikipedia
As such, the larger the value, the more positively skewed it is. The more negative the value, the more negatively skewed it is.
Now to compute the mean of your histogram data, it's quite simple. You simply do a weighted sum of the histogram entries and divide by the total number of entries. Given that your histogram is stored in h, the bin centres of your histogram are stored in x, you would do the following. What I will do here is assume that you have bins from 0 up to N-1 where N is the total number of bins in the histogram... judging from your picture:
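h = h(:).'; %// Force a row vector so it lines up with x
x = 0:numel(h)-1; %// Judging from your pictures
num_entries = sum(h);
mu = sum(h.*x) / num_entries;
%// Weight each bin's deviation from the mean by its count
skew = ((1/num_entries)*sum(h.*(x - mu).^3)) / ...
    ((1/(num_entries-1))*sum(h.*(x - mu).^2))^(3/2);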
skew would contain the numerical measure of skewness for a histogram that follows that formula. Therefore, with your problem statement, you will want to look for skewness numbers that are positive and large. I can't really comment on what threshold you should look at, but look for positive numbers that are much larger than most of the histograms that you have.
I have a matrix of data X where rows are time stamps and columns are measurements. I can easily find the lowest sum path through the matrix by:
[r c]=size(X)
for w=1:r
Y(w)=min(X(w,:))
end
result = sum(Y)
This is useful, but what would be really useful is a function that could tell me different paths for a specified frequency. For example, if I group 2 rows together, this halves the frequency. If there were a function that could find different paths with varying frequencies for a specified tolerance and then rank them, that would be perfect!
A lot to ask but there must be a statistical or mathematical tool that does this......
Not sure if I entirely understand the question, but if I read what you want this should do the trick for a fixed frequency:
frequency = 2;
r = size(X,1);
Y = zeros(r,1);
for w=1:frequency:r
Y(w)=min(min(X(w:min(w+frequency-1,r),:))) % clamp the last group so it does not run past row r
end
result = sum(Y)
You can loop over frequencies to find the best path length for each frequency.
Note that finding the optimal path with varying frequencies (so for example first 2 then 3 then 2 again) would be a completely different problem. I think this is much more complex and that you may want to look into linear programming.
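Looping over fixed frequencies could look like the sketch below. The matrix X, the limit maxFrequency, and the variable names are made up for illustration; the last group is allowed to be shorter when the number of rows is not a multiple of the frequency.

```matlab
X = [5 1 4; 2 8 3; 7 6 9; 4 2 5; 3 9 1];    % made-up data: 5 time stamps, 3 measurements
r = size(X, 1);
maxFrequency = 3;                            % assumed upper limit on the group size
results = zeros(maxFrequency, 1);

for frequency = 1:maxFrequency
    total = 0;
    for w = 1:frequency:r
        rows  = w:min(w + frequency - 1, r); % the last group may be shorter
        block = X(rows, :);
        total = total + min(block(:));       % smallest entry in this group of rows
    end
    results(frequency) = total;              % lowest sum for this frequency
end
```

Keep in mind that the totals for different frequencies add up different numbers of terms, so they may need rescaling (for example by the number of groups) before ranking them against each other.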
I am trying to make a histogram in MATLAB. My data set is huge (3.5 million values); the x and y data are the same size (both 3.5 million).
My original data is a 200x200x88 3D matrix, which I reshaped into one column.
the code for this:
[dose , size] = Dose('C:\R1')
s = size(1)*size(2)*size(3)
t = reshape(dose, s, [])
When I try the command hist(t), I get only one bar.
My workspace is as the following:
dose <200x200x88 double>
s 3520000
size [200,200,88]
t <3520000x1 double>
Could you tell me how to make a histogram with this data?
I'm able to generate a vector of size 3520000x1 and build a histogram with it.
val=rand(3520000,1);
hist(val)
It's possible your data has a few singular outliers causing your bins to look something like (1,0,0,...,3519999).
If you save your histogram bins like h=hist(data); you can see what happened.
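A small sketch of that check, using made-up data with one extreme outlier (the data and variable names are only for illustration):

```matlab
data = [rand(1000, 1); 1e6];      % made-up data: 1000 small values plus one outlier
[counts, centers] = hist(data);   % 10 equally spaced bins by default
disp([centers(:) counts(:)]);     % bin centres next to their counts
```

Here the single outlier stretches the bin range so far that all other values fall into the first bin, which is what a one-bar histogram looks like.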
In order to get a single long vector from your 3D array you can use just the (:) operator. Try the following code:
num_of_bins = 100 ; %change to whatever # you want
hist(dose(:),linspace(min(dose(:)),max(dose(:)),num_of_bins));
The hist will take only the relevant limits of dose (min to max) and you can control the # of bins at will. I've used linspace to create a linearly spaced bin vector, but this can be modified also to a different set of bins by assigning a different range vector.
I need to generate a given number N of vectors of a given size, each consisting of uniformly distributed 0s and 1s.
This is what I am doing at the moment, but I noticed that the distribution is strongly peaked at half 1s and half 0s, which is no good for what I am doing:
a = randint(1, sizeOfVector, [0 1]);
The unifrnd function looks promising for what I need, but I can't manage to understand how to output a binary vector of that size.
Can I use the unifrnd function (and if so, how?), or is there a more convenient way to obtain such a set of vectors?
Any help appreciated!
Note 1: just to be sure - here's what the requirement actually says:
randomly choose N vectors of given size that are
uniformly distributed over [0;1]
Note 2: I am generating initial configurations for Cellular Automata, that's why I can have only binary values [0;1].
To generate 1 random vector with elements from {0, 1}:
unidrnd(2,sizeOfVector,1)-1
The other variants are similar.
If you want to get uniformly distributed 0's and 1's, you want to use randi. However, from the requirement, I'd think that the vectors can have real values, for which you'd use rand.
%# create a such that each row contains a vector
a = randi([0 1],N,sizeOfVector); %# for uniformly distributed 0's and 1's (randi(1,...) would return only 1's)
%# if you want, say, 60% 1's and 40% 0's, use rand and threshold
a = rand(N,sizeOfVector) > 0.4; %# maybe you need to call double(a) if you don't want logical output
%# if you want a random number of 1's and 0's, use
a = rand(N,sizeOfVector) > rand(1);
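Note that rand(1) in the last line draws a single threshold for the whole matrix. If each vector should get its own random proportion of 1's, which would avoid every vector being close to half 0's and half 1's, a per-row threshold is one option. This is a sketch; the sizes and the variable p are assumptions for illustration.

```matlab
N = 5;                                       % number of vectors (example value)
sizeOfVector = 20;                           % length of each vector (example value)

p = rand(N, 1);                              % one density of 1's per row, uniform on (0,1)
a = bsxfun(@lt, rand(N, sizeOfVector), p);   % element is 1 with probability p(row)
a = double(a);                               % convert from logical to double if needed
```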