Otsu's Thresholding Implementation not working properly - matlab

I have implemented otsu's threshold implementation which segments an image into foreground and background image.The output of my implementation seems off from the desired one. Any thoughts on it? Thanks in advance! Would really appreciate it someone can tell me how do i resolve the issue.
My output:
For image 1-
For image 2-
My Code:
im1=imread('D:\root-image.pgm');
% im1=rgb2gray(im1);
[n,m]=size(im1);
hst=imhist(im1);
mu=zeros(255,1);
N=0;
for i=1:255
N=N+hst(i);
end
% The total mean level of the original image
for i=1:255
mu(i)=mu(i)+((i.*hst(i))./N);
end
for T=1:254
qb=0;
muT=0;
qo=0;
for i=1:T
qb=qb+(hst(i)./N); % probability of class occurence (background)
m=m+((i.*hst(i))./N);% probability of class mean (background)
end
for i=T+1:255
qo=qo+(hst(i)./N);% probability of class occurence (object)
end
sigma(T)=((mu(T)-(qb*muT))^2)/(qb*qo);
end
[Y,T] = max(sigma);
[n,m]=size(im1);
for i=1:n
for j=1:m
if im1(i,j)>T
im(i,j)=1;
else
im(i,j)=0;
end
end
end
figure(1);
subplot(1,2,1);
imshow(im1);
% subplot(1,3,2);
% imhist(im1);
subplot(1,2,2);
imshow(im);

There are a few issues with your code, and I'll outline where it's wrong:
I'm simply nitpicking, but you can count the total number of pixels (N) by using numel... it's just cleaner :)
In your original code, you are checking for the right threshold between 1 and 254. You really should be checking from 0 to 255, as there are 256 possible intensities in your image.
You also need to change your sigma declaration so that there are 256 elements, not 255. Remember, there are 256 possible intensities in your image.
Within your for loop for checking each intensity, when you are calculating the probability of class occurrences, you need to check for intensity 0 as well. Because of the fact that MATLAB starts indexing arrays at 1, you'll need to offset your access index so that you're starting at 1.
Your definition of the variance between the object and background is slightly off. You need to also calculate the probability of the class mean for the object as well. You can check the code for more details.
Your probability of class mean definitions are slightly inaccurate. You need to divide by qb and qo, not N.
You are using m to calculate then mean when you should be storing it in muT.
Finally, when you find the maximum of the variance between the object and background, you need to subtract by 1, as this will provide an intensity between 0 and 255.
As such, this is what your code looks like. Note that I have omitted the code that thresholds your image. I'm only providing the code that calculates the threshold for your image.
hst=imhist(im1);
sigma = zeros(256,1); %// Change
N = numel(im1); %// Change
for T=0:255 %// Change
qb=0;
muT=0;
qo=0;
muQ=0; %// Change
for i=0:T %// Change
qb=qb+(hst(i+1)./N); % probability of class occurence (background)
end
for i=T+1:255
qo=qo+(hst(i+1)./N);% probability of class occurence (object)
end
for i=0:T%// Change
muT=muT+((i.*hst(i+1))./qb);% probability of class mean (background)
end
for i=T+1:255 %// Change
muQ=muQ+((i.*hst(i+1))./qo);% probability of class mean (object)
end
sigma(T+1) = qb*qo*((muT-muQ)^2); %// Change
end
[Y,T] = max(sigma);
T = T-1; %// Change - For 0 to 255
This code should now work. I ran this code with my own implementation of Otsu, and I get the same calculated threshold. To be honest, I find this code to be rather inefficient due to the many for loops. What I would personally do is vectorize it, but I'll leave that to you as a learning exercise :)
Edit
OK, I'll give in. Here's some code that I wrote for Otsu that is more vectorized. This essentially performs what you are doing above, but in a more vectorized manner. You're more than welcome to use it for your own purposes, but do cite me if you intend to use it of course :)
%// Function that performs Otsu's adaptive bimodal thresholding
%// Written by Raymond Phan - Version 1.0
%// Input - im - Grayscale image
%// Output - out - Thresholded image via Otsu
function [out] = otsu(im)
%// Compute histogram
J = imhist(im);
%// Total number of intensities
L = length(J);
%// Some pre-processing stuff
%// Determine total number of pixels
num_pixels = sum(J(:));
%// Determine PDF
pdf = J / num_pixels;
%// Storing between-class variances for each intensity
s_b = zeros(1,L);
for idx = 1 : L
%// Calculate w_0
w_0 = sum(pdf(1:idx));
%// Calculate w_1
w_1 = 1 - w_0;
%// Calculate u_0
u_0 = sum((0:idx-1)'.*pdf(1:idx)) / w_0;
%// Calculate u_1
u_1 = sum((idx:L-1)'.*pdf(idx+1:L)) / w_1;
% // Calculate \sigma_b^{2}
s_b(idx) = w_0*w_1*((u_1 - u_0)^2);
end
%// Find intensity that provided the biggest variance
[max_s_b T] = max(s_b);
T = T - 1; %// Must subtract by 1, since the index starts from 1
%// Now create an output image that thresholds the input image
out = im >= T;
end
Edits by Divakar
Divakar (thanks!) has created vectorized code to replace the loop portion of the above function code and this essentially gets rid of the pre-allocation for s_b:
w_0 = cumsum(pdf);
w_1 = 1 - w_0;
u_0 = cumsum((0:L-1)'.*pdf)./w_0;
u_1 = flipud([0 ; cumsum((L-1:-1:1)'.*pdf((L:-1:2)))])./w_1;
s_b = w_0.*w_1.*((u_1 - u_0).^2);

Related

How to use the randn function in Matlab to create an array of values (range 0-10) of size 1,000 that follows a Gaussian distribution? [duplicate]

Matlab has the function randn to draw from a normal distribution e.g.
x = 0.5 + 0.1*randn()
draws a pseudorandom number from a normal distribution of mean 0.5 and standard deviation 0.1.
Given this, is the following Matlab code equivalent to sampling from a normal distribution truncated at 0 at 1?
while x <=0 || x > 1
x = 0.5 + 0.1*randn();
end
Using MATLAB's Probability Distribution Objects makes sampling from truncated distributions very easy.
You can use the makedist() and truncate() functions to define the object and then modify (truncate it) to prepare the object for the random() function which allows generating random variates from it.
% MATLAB R2017a
pd = makedist('Normal',0.5,0.1) % Normal(mu,sigma)
pdt = truncate(pd,0,1) % truncated to interval (0,1)
sample = random(pdt,numRows,numCols) % Sample from distribution `pdt`
Once the object is created (here it is pdt, the truncated version of pd), you can use it in a variety of function calls.
To generate samples, random(pdt,m,n) produces a m x n array of samples from pdt.
Further, if you want to avoid use of toolboxes, this answer from #Luis Mendo is correct (proof below).
figure, hold on
h = histogram(cr,'Normalization','pdf','DisplayName','#Luis Mendo samples');
X = 0:.01:1;
p = plot(X,pdf(pdt,X),'b-','DisplayName','Theoretical (w/ truncation)');
You need the following steps
1. Draw a random value from uniform distribution, u.
2. Assuming the normal distribution is truncated at a and b. get
u_bar = F(a)*u +F(b) *(1-u)
3. Use the inverse of F
epsilon= F^{-1}(u_bar)
epsilon is a random value for the truncated normal distribution.
Why don't you vectorize? It will probably be faster:
N = 1e5; % desired number of samples
m = .5; % desired mean of underlying Gaussian
s = .1; % desired std of underlying Gaussian
lower = 0; % lower value for truncation
upper = 1; % upper value for truncation
remaining = 1:N;
while remaining
result(remaining) = m + s*randn(1,numel(remaining)); % (pre)allocates the first time
remaining = find(result<=lower | result>upper);
end

Image histogram implementation with Matlab

I'm tyring to implement (I know there's a custom function for achieving it) the grayscale image histogram in Matlab, so far I've tried:
function h = histogram_matlab(imageSource)
openImage = rgb2gray(imread(imageSource));
[rows,cols] = size(openImage);
histogram_values = [0:255];
for i = 1:rows
for j = 1:cols
p = openImage(i,j);
histogram_values(p) = histogram_values(p) + 1;
end
end
histogram(histogram_values)
However when I call the function, for example: histogram_matlab('Harris.png')
I obtain some graph like:
which is obviously not what I expect, x axis should go from 0 to 255 and y axis from 0 to whatever max value is stored in histogram_values.
I need to obtain something like what imhist offers:
How should I set it up? Am I doing a bad implementation?
Edit
I've changed my code to improvements and corrections suggested by #rayryeng:
function h = histogram_matlab(imageSource)
openImage = rgb2gray(imread(imageSource));
[rows,cols] = size(openImage);
histogram_values = zeros(256,1)
for i = 1:rows
for j = 1:cols
p = double(openImage(i,j)) +1;
histogram_values(p) = histogram_values(p) + 1;
end
end
histogram(histogram_values, 0:255)
However the histogram plot is not that expected:
Here it's noticeable that there's some issue or error on y axis as it definitely would reach MORE than 2.
In terms of calculating the histogram, the computation of the frequency per intensity is correct though there is a slight error... more on that later. Also, I would personally avoid using loops here. See my small note at the end of this post.
Nevertheless, there are three problems with your code:
Problem #1 - Histogram is not initialized properly
histogram_values should contain your histogram, yet you are initializing the histogram by a vector of 0:255. Each intensity value should start with a count of 0, and so you actually need to do this:
histogram_values = zeros(256,1);
Problem #2 - Slight error in for loop
Your intensities range from 0 to 255, yet MATLAB starts indexing at 1. If you ever get intensities that are 0, you will get an out-of-bounds error. As such, the proper thing to do is to take p and add it with 1 so that you start indexing at 1. However, one intricacy I need to point out is that if you have a uint8 precision image, adding 1 to an intensity of 255 will simply saturate the value to 255. It won't go to 256.... so it's also prudent that you cast to something like double to ensure that 256 will be reached.
Therefore:
histogram_values = zeros(256,1);
for i = 1:rows
for j = 1:cols
p = double(openImage(i,j)) + 1;
histogram_values(p) = histogram_values(p) + 1;
end
end
Problem #3 - Not calling histogram right
You should override the behaviour of histogram and include the edges. Basically, do this:
histogram(histogram_values, 0:255);
The second vector specifies where we should place bars on the x-axis.
Small note
You can totally implement the histogram computation yourself without any for loops. You can try this with a combination of bsxfun, permute, reshape and two sum calls:
mat = bsxfun(#eq, permute(0:255, [1 3 2]), im);
h = reshape(sum(sum(mat, 2), 1), 256, 1);
If you'd like a more detailed explanation of how this code works under the hood, see this conversation between kkuilla and myself: https://chat.stackoverflow.com/rooms/81987/conversation/explanation-of-computing-an-images-histogram-vectorized
However, the gist of it as below.
The first line of code creates a 3D vector of 1 column that ranges from 0 to 255 by permute, and then using bsxfun with the eq (equals) function, we use broadcasting so that we get a 3D matrix where each slice is the same size as the grayscale image and gives us locations that are equal to an intensity of interest. Specifically, the first slice tells you where elements are equal to 0, the second slice tells you where elements are equal to 1 up until the last slice where it tells you where elements are equal to 255.
For the second line of code, once we compute this 3D matrix, we compute two sums - first summing each row independently, then summing each column of this intermediate result. We then get the total sum per slice which tells us how many values there were for each intensity. This is consequently a 3D vector, and so we reshape this back into a single 1D vector to finish the computation.
In order to display a histogram, I would use bar with the histc flag. Here's a reproducible example if we use the cameraman.tif image:
%// Read in grayscale image
openImage = imread('cameraman.tif');
[rows,cols] = size(openImage);
%// Your code corrected
histogram_values = zeros(256,1);
for i = 1:rows
for j = 1:cols
p = double(openImage(i,j)) + 1;
histogram_values(p) = histogram_values(p) + 1;
end
end
%// Show histogram
bar(0:255, histogram_values, 'histc');
We get this:
Your code looks correct. The problem is with the call to histogram. You need to supply the number of bins in the call to histogram, otherwise they will be computed automatically.
Try this simple modification which calls stem to get the right plot, instead of relying on histogram
function h = histogram_matlab(imageSource)
openImage = rgb2gray(imread(imageSource));
[rows,cols] = size(openImage);
histogram_values = [0:255];
for i = 1:rows
for j = 1:cols
p = openImage(i,j);
histogram_values(p) = histogram_values(p) + 1;
end
end
stem(histogram_values); axis tight;
EDIT: After some inspection of the code you have a 0/1 error. If you have a pixel of value zero then histogram_value(p) will give you an index error
Try this instead. No need for vectorization for this simple case:
function hv = histogram_matlab_vec(I)
assert(isa(I,'uint8')); % for now we assume uint8 with range [0, 255]
hv = zeros(1,256);
for i = 1 : numel(I)
p = I(i);
hv(p + 1) = hv(p + 1) + 1;
end
stem(hv); axis tight;
end

How to make a vector that follows a certain trend?

I have a set of data with over 4000 points. I want to exclude grooves from them, ideally from the point from which they start. The data look for example like this:
The problem with this is the noise I get at the top of the plateaus. I have an idea, in which I would take an average value of the most common within some boundaries (again, ideally sth like the red line here:
and then I would construct a temporary matrix, which would fill up one by one with Y if they are less than this average. If the Y(i) would rise above average, the matrix would find its minima and compare it with the global minima. If the temporary matrix's minima wouldn't be sth like 80% of the global minima, it would be discarded as noise.
I've tried using mean(Y), interpolating and fitting it in a polynomial (the green line) - none of those method would cut it to the point I would be satisfied.
I need this to be extremely robust and it doesn't need to be quick. The top and bottom values can vary a lot, as well as the shape of the plateaus. The groove width is more or less the same.
Do you have any ideas? Again, the point is to extract the values that would make the groove.
How about a median filter?
Let's define some noisy data similar to yours, and plot it in blue:
x = .2*sin((0:9999)/1000); %// signal
x(1000:1099) = x(1000:1099) + sin((0:99)/50*pi); %// noise: spike
x(5000:5199) = x(5000:5199) - sin((0:199)/100*pi); %// noise: wider spike
x = x + .05*sin((0:9999)/10); %// noise: high-freq ripple
plot(x)
Now apply the median filter (using medfilt2 from the Image Processing Toolbox) and plot in red. The parameter k controls the filter memory. It should chosen to be large compared to noise variations, and small compared to signal variations:
k = 500; %// filter memory. Choose as needed
y = medfilt2(x,[1 k]);
hold on
plot(y, 'r', 'linewidth', 2)
In case you don't have the image processing toolbox and can't use medfilt2 a method that's more manual. Skip the extreme values, and do a curve fit with sin1 as curve type. Note that this will only work if the signal is in fact a sine wave!
x = linspace(0,3*pi,1000);
y1 = sin(x) + rand()*sin(100*x).*(mod(round(10*x),5)<3);
y2 = 20*(mod(round(5*x),5) == 0).*sin(20*x);
y = y1 + y2; %// A messy sine-wave
yy = y; %// Store the messy sine-wave
[~, idx] = sort(y);
y(idx(1:round(0.15*end))) = y(idx(round(0.15*end))); %// Flatten out the smallest values
y(idx(round(0.85*end):end)) = y(idx(round(0.85*end)));%// Flatten out the largest values
[foo goodness output] = fit(x.',y.', 'sin1'); %// Do a curve fit
plot(foo,x,y) %// Plot it
hold on
plot(x,yy,'black')
Might not be perfect, but it's a step in the right direction.

simulating a random walk in matlab

i have variable x that undergoes a random walk according to the following rules:
x(t+1)=x(t)-1; probability p=0.3
x(t+1)=x(t)-2; probability q=0.2
x(t+1)=x(t)+1; probability p=0.5
a) i have to create this variable initialized at zero and write a for loop for 100 steps and that runs 10000 times storing each final value in xfinal
b) i have to plot a probability distribution of xfinal (a histogram) choosing a bin size and normalization!!* i have to report the mean and variance of xfinal
c) i have to recreate the distribution by application of the central limit theorem and plot the probability distribution on the same plot!
help would be appreciated in telling me how to choose the bin size and normalize the histogram and how to attempt part c)
your help is much appreciated!!
p=0.3;
q=0.2;
s=0.5;
numberOfSteps = 100;
maxCount = 10000;
for count=1:maxCount
x=0;
for i = 1:numberOfSteps
random = rand(1, 1);
if random <=p
x=x-1;
elseif random<=(p+q)
x=x-2;
else
x=x+1;
end
end
xfinal(count) = x;
end
[f,x]=hist(xfinal,30);
figure(1)
bar(x,f/sum(f));
xlabel('xfinal')
ylabel('frequency')
mean = mean(xfinal)
variance = var(xfinal)
For the first question, check the help for hist on mathworks homepage
[nelements,centers] = hist(data,nbins);
You do not select the bin size, but the number of bins. nelements gives the elements per bin and center is all the bin centers. So to say, it would be the same to call
hist(data,nbins);
as
[nelements,centers] = hist(data,nbins);
plot(centers,nelements);
except that the representation is different (line or pile). To normalize, simply divide nelements with sum(nelements)
For c, here i.i.d. variables it actually is a difference if the variables are real or complex. However for real variables the central limit theorem in short tells you that for a large number of samples the distribution will limit the normal distribution. So if the samples are real, you simply asssumes a normal distribution, calculates the mean and variance and plots this as a normal distribution. If the variables are complex, then each of the variables will be normally distributed which means that you will have a rayleigh distribution instead.
Mathworks is deprecating hist that is being replaced with histogram.
more details in this link
You are not applying the PDF function as expected, the expression Y doesn't work
For instance Y does not have the right X-axis start stop points. And you are using x as input to Y while x already used as pivot inside the double for loop.
When I ran your code Y generates a single value, it is not a vector but just a scalar.
This
bar(x,f/sum(f));
bringing down all input values with sum(f) division? no need.
On attempting to overlap the ideal probability density function, often one has to do additional scaling, to have both real and ideal visually overlapped.
MATLAB can do the scaling for us, and no need to modify input data /sum(f).
With a dual plot using yyaxis
You also mixed variance and standard deviation.
Instead try something like this
y2=1 / sqrt(2*pi*var1)*exp(-(x2-m1).^2 / (2*var1))
ok, the following solves your question(s)
codehere
clear all;
close all;
clc
p=0.3; % thresholds
q=0.2;
s=0.5;
n_step=100;
max_cnt=10000;
n_bin=30; % histogram amount bins
xf=zeros(1,max_cnt);
for cnt=1:max_cnt % runs loop
x=0;
for i = 1:n_step % steps loop
t_rand1 = rand(1, 1);
if t_rand1 <=p
x=x-1;
elseif t_rand1<=(p+q)
x=x-2;
else
x=x+1;
end
end
xf(cnt) = x;
end
% [f,x]=hist(xf,n_bin);
hf1=figure(1)
ax1=gca
yyaxis left
hp1=histogram(xf,n_bin);
% bar(x,f/sum(f));
grid on
xlabel('xf')
ylabel('frequency')
m1 = mean(xf)
var1 = var(xf)
s1=var1^.5 % sigma
%applying central limit theorem %finding the mean
n_x2=1e3 % just enough points
min_x2=min(hp1.BinEdges)
max_x2=max(hp1.BinEdges)
% quite same as
min_x2=hp1.BinLimits(1)
max_x2=hp1.BinLimits(2)
x2=linspace(min_x2,max_x2,n_x2)
y2=1/sqrt(2*pi*var1)*exp(-(x2-m1).^2/(2*var1));
% hold(ax1,'on')
yyaxis right
plot(ax1,x2,y2,'r','LineWidth',2)
.
.
.
note I have not used these lines
% Xp=-1; Xq=-2; Xs=1; mu=Xp.*p+Xq.*q+Xs.*s;
% muN=n_step.*mu;
%
% sigma=(Xp).^2.*p+(Xq).^2.*q+(Xs).^2.s; % variance
% sigmaN=n_step.(sigma-(mu).^2);
People ususally call sigma to variance^.5
This supplied script is a good start point to now take it to wherever you need it to go.

Dynamic Time Warping for geology time series, Matlab

Background:
Basically I'm using a dynamic time warping algorithm like used in speech recognition to try to warp geological data (filter out noise from environmental conditions) The main difference between these two problems is that dtw prints a warping function that allows both vectors that are input to be warped, whereas for the problem I'm trying to solve I need to keep one reference vector constant while stretching and shrinking the test variable vector to fit.
here is dtw in matlab:
function [Dist,D,k,w]=dtw()
%Dynamic Time Warping Algorithm
%Dist is unnormalized distance between t and r
%D is the accumulated distance matrix
%k is the normalizing factor
%w is the optimal path
%t is the vector you are testing against
%r is the vector you are testing
[t,r,x1,x2]=randomtestdata();
[rows,N]=size(t);
[rows,M]=size(r);
%for n=1:N
% for m=1:M
% d(n,m)=(t(n)-r(m))^2;
% end
%end
d=(repmat(t(:),1,M)-repmat(r(:)',N,1)).^2; %this replaces the nested for loops from above Thanks Georg Schmitz
D=zeros(size(d));
D(1,1)=d(1,1);
for n=2:N
D(n,1)=d(n,1)+D(n-1,1);
end
for m=2:M
D(1,m)=d(1,m)+D(1,m-1);
end
for n=2:N
for m=2:M
D(n,m)=d(n,m)+min([D(n-1,m),D(n-1,m-1),D(n,m-1)]);
end
end
Dist=D(N,M);
n=N;
m=M;
k=1;
w=[];
w(1,:)=[N,M];
while ((n+m)~=2)
if (n-1)==0
m=m-1;
elseif (m-1)==0
n=n-1;
else
[values,number]=min([D(n-1,m),D(n,m-1),D(n-1,m-1)]);
switch number
case 1
n=n-1;
case 2
m=m-1;
case 3
n=n-1;
m=m-1;
end
end
k=k+1;
w=cat(1,w,[n,m]);
end
w=flipud(w)
%w is a matrix that looks like this:
% 1 1
% 1 2
% 2 2
% 3 3
% 3 4
% 3 5
% 4 5
% 5 6
% 6 6
so what this is saying is that the both the first and second points of the second vector should be mapped to the first point of the first vector. i.e. 1 1
1 2
and that the fifth and sixth points on the first vector should be mapped to the second vector at point six. etc. so w contains the x coordinates of the warped data.
Normally I would be able to say
X1=w(:,1);
X2=w(:,2);
for i=1:numel(reference vector)
Y1(i)=reference vector(X1(i));
Y2(i)=test vector(X2(i));
end
but I need not to stretch the reference vector so I need to use the repeats in X1 to know how to shrink Y2 and the repeats in X2 to know how to stretch Y2 rather than using repeats in X1 to stretch Y1 and repeats in X2 to stretch Y2.
I tried using a find method to find the repeats in both X1 and X2 and then average(shrink) or interpolate linearly(stretch) as needed but the code became very complicated and difficult to debug.
Was this really unclear? I had a hard time explaining this problem, but I just need to know how to take w and create a Y2 that is stretched and shrunk accordingly.
First, here's DTW in Matlab translated from the pseudocode on wikipedia:
t = 0:.1:2*pi;
x0 = sin(t) + rand(size(t)) * .1;
x1 = sin(.9*t) + rand(size(t)) * .1;
figure
plot(t, x0, t, x1);
hold on
DTW = zeros(length(x0), length(x1));
DTW(1,:) = inf;
DTW(:,1) = inf;
DTW(1,1) = 0;
for i0 = 2:length(x0)
for i1 = 2:length(x1)
cost = abs(x0(i0) - x1(i1));
DTW(i0, i1) = cost + min( [DTW(i0-1, i1) DTW(i0, i1-1) DTW(i0-1, i1-1)] );
end
end
Whether you are warping x_0 onto x_1, x_1 onto x_0, or warping them onto each other, you can get your answer out of the matrix DTW. In your case:
[cost, path] = min(DTW, [], 2);
plot(t, x1(path));
legend({'x_0', 'x_1', 'x_1 warped to x_0'});
I don't have an answer but I have been playing with the code of #tokkot implemented from the pseudocode in the Wikipedia article. It works, but I think it lacks three requeriments of DTW:
The first and last points of both sequences must be a match, with the use of min(), some (or many) of the first and ending points of one of the sequences are lost.
The output sequence is not monotonically increasing. I have used x1(sort(path)) instead, but I don't believe it is the real minimum distance.
Additionally, for a reason I haven't found yet, some intermediate points of the warped sequences are lost, which I believe is not compatible with DTW.
I'm still searching for an algorithm like DTW in which one of the sequences is fixed (not warped). I need to compare a time series of equally spaced temperature measurements with another sequence. The first one cannot be time shifted, it does not make sense.