Niblack algorithm for document binarization - MATLAB

I have this photo:
and I'm trying to perform document binarization using the Niblack algorithm.
I've implemented the simple Niblack algorithm,
T = mean + K * standardDeviation
and this was its result:
The problem is that in some parts of the image the window doesn't contain any objects, so the algorithm detects the noise as objects and amplifies it.
I tried applying a blurring filter followed by global thresholding;
this was the result:
which won't be fixed by any other filter.
I guess the only solution is to prevent the algorithm from picking up noise when the window is free of objects.
I'm interested in doing this with the Niblack algorithm, not another algorithm, so any suggestions?

I tried the Sauvola algorithm from the paper "Adaptive document image binarization", J. Sauvola, M. Pietikäinen, section 3.3.
It's a modified version of the Niblack algorithm that uses a modified threshold equation,
and it returned pretty good results:
I also tried another modification of Niblack, which is implemented in this paper
in section 5.5, Algorithm No. 9a: Université de Lyon, INSA, France (C. Wolf, J-M Jolion),
and it returned good results as well:
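For reference, a minimal sketch of the Sauvola threshold from section 3.3 of that paper, assuming a grayscale image X scaled to [0, 1]; the window size, k and the dynamic range R below are typical values, not taken from the question:

X = im2double(rgb2gray(imread('document.png')));   % hypothetical input file
w = 25;       % local window size (assumption; tune per image)
k = 0.5;      % Sauvola k parameter (typical value from the paper)
R = 0.5;      % dynamic range of the standard deviation for images in [0, 1]

filt = fspecial('average', w);
local_mean = imfilter(X, filt, 'symmetric');
local_sq   = imfilter(X.^2, filt, 'symmetric');
local_std  = sqrt(max(local_sq - local_mean.^2, 0));

% Sauvola threshold: T = m .* (1 + k .* (s ./ R - 1))
T = local_mean .* (1 + k * (local_std ./ R - 1));
X_bin = X > T;     % white background stays 1, dark text becomes 0; invert if needed
imshow(X_bin);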

Did you look here: https://stackoverflow.com/a/9891678/105037
% filt is a local averaging filter, e.g. filt = fspecial('average', [w w]);
local_mean = imfilter(X, filt, 'symmetric');
% the local std is sqrt(E[X.^2] - E[X].^2), not sqrt(E[X.^2])
local_std = sqrt(max(imfilter(X .^ 2, filt, 'symmetric') - local_mean .^ 2, 0));
X_bin = X >= (local_mean + k_threshold * local_std);
I don't see many options here if you insist on using Niblack. You can change the size and type of the filter, and the threshold.
BTW, it seems that your original image has colours; that information could significantly improve black text detection.

There is a range of methods that can help in this situation:
Of course, you can change the algorithm itself =)
It is also possible to simply apply morphological filters: first apply a maximum filter over the window, and then a minimum filter. You should tune the window size to achieve a better result (see wiki, and the sketch after this list).
You can also choose the hardest but best way and try to improve Niblack's scheme: increase Niblack's window size whenever the standard deviation is smaller than some fixed number (which should be tuned).
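A minimal sketch of that maximum-then-minimum idea (a grayscale closing), assuming a grayscale document image X with dark text; the window size is an assumption and has to be tuned:

se = strel('square', 15);          % window size is an assumption; tune it
X_max = imdilate(X, se);           % maximum filter over the window
X_closed = imerode(X_max, se);     % then a minimum filter (= grayscale closing)
imshow(X_closed);

For dark text on a light page, the closing removes the text and leaves an estimate of the page background, which can then be subtracted from X (or used for normalisation) before thresholding.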

I tried the Niblack algorithm with k = -0.99 and window = 990, using the optimisation from
Shafait – "Efficient Implementation of Local Adaptive Thresholding Techniques Using Integral Images", 2008,
with T = mean + K * standardDeviation; I got this result:
The implementation of the algorithm is taken from here.
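A minimal sketch of the integral-image trick from that paper, assuming a double grayscale image X and a square window of half-size r; the window half-size and k mirror the values quoted above but are otherwise placeholders:

% Local mean and std via integral images (summed-area tables)
X = im2double(X);
[h, w] = size(X);
r = 495;                                   % half of the ~990-pixel window
I1 = cumsum(cumsum(padarray(X,    [1 1], 0, 'pre'), 1), 2);   % integral of X
I2 = cumsum(cumsum(padarray(X.^2, [1 1], 0, 'pre'), 1), 2);   % integral of X.^2

[cols, rows] = meshgrid(1:w, 1:h);
r1 = max(rows - r, 1);  r2 = min(rows + r, h);
c1 = max(cols - r, 1);  c2 = min(cols + r, w);
N  = (r2 - r1 + 1) .* (c2 - c1 + 1);       % pixels in each (clipped) window

winsum = @(I) I(sub2ind([h+1 w+1], r2+1, c2+1)) - I(sub2ind([h+1 w+1], r1, c2+1)) ...
            - I(sub2ind([h+1 w+1], r2+1, c1)) + I(sub2ind([h+1 w+1], r1, c1));
S1 = winsum(I1);
S2 = winsum(I2);

local_mean = S1 ./ N;
local_std  = sqrt(max(S2 ./ N - local_mean.^2, 0));
k = -0.99;
T = local_mean + k * local_std;            % Niblack threshold per pixel
X_bin = X > T;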

Related

How to transform SVM output scores into probabilities using the 'Platt scaling' method?

I am working on the problem of handwritten image recognition. For this, I use support vector machines as the classifier. The score matrix below shows an example of the scores returned by the SVM for 5 samples; the number of classes is also 5. I want to transform this matrix into probabilities.
score = [ 0.2590  -0.6033  -1.1350  -1.2347  -0.9776
         -1.4727  -0.2136  -0.9649   0.1480  -1.4761
         -0.9637  -0.8662   0.0674  -1.0051  -1.1293
         -2.1230  -0.8805  -0.9808  -0.0520  -0.0836
         -1.6976  -1.1578  -0.9205  -1.1101   1.0796];
From research on existing methods, I found that Platt's scaling method is the most appropriate in my case. I found an implementation of this method at this link: Platt scaling, but the problem is that I don't understand the third parameter it expects. Please help me to understand this implementation and to make it executable.
I await your answers, and thank you in advance.
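The linked implementation isn't shown here, so its third parameter can't be explained from this post alone, but as a rough illustration of what Platt scaling does, here is a minimal one-vs-rest sketch that fits the sigmoid P(class | score) = 1 / (1 + exp(A*score + B)) per class with fminsearch. The labels vector and the one-vs-rest setup are assumptions for illustration; with only 5 samples the fit is of course degenerate:

labels = [1 4 3 4 5]';                 % hypothetical true class of each sample
nClasses = size(score, 2);
prob = zeros(size(score));

for c = 1:nClasses
    f = score(:, c);                   % decision values for class c
    t = double(labels == c);           % 1 if the sample belongs to class c
    % Negative log-likelihood of the Platt sigmoid with parameters p = [A B]
    nll = @(p) -sum(t .* log(1 ./ (1 + exp(p(1)*f + p(2))) + eps) + ...
                    (1 - t) .* log(1 - 1 ./ (1 + exp(p(1)*f + p(2))) + eps));
    p = fminsearch(nll, [-1 0]);       % fit A and B
    prob(:, c) = 1 ./ (1 + exp(p(1)*f + p(2)));
end

prob = bsxfun(@rdivide, prob, sum(prob, 2));   % renormalise rows across classes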

Conversion to binary : Matlab

I am trying to extract veins using a thinning algorithm. So far I have written this much code for image enhancement, and it is pretty much working. But when I compute the binary threshold I am not able to separate the veins from the background. Due to the vague output I am not able to do the further processing for thinning. Can anyone tell me what's wrong in this code, or does the thresholding have to be done in some other way?
a = imread('vein.jpg');
cform = makecform('srgb2lab');

% Median-filter each RGB channel before the colour-space conversion
for ii = 1:3
    a(:,:,ii) = medfilt2(a(:,:,ii), [5 5]);
end

lab = applycform(a, cform);
b = lab(:,:,1);              % L (lightness) channel
c = im2bw(b, 0.2);           % fixed threshold on the L channel
neg = 1 - c;

% Mask the original colour image with the binary mask c
color = a;
r = color(:,:,1);  r(~c) = 0;
g = color(:,:,2);  g(~c) = 0;
b = color(:,:,3);  b(~c) = 0;
color = cat(3, r, g, b);

gray = rgb2gray(color);
i1 = imresize(gray, [256 256], 'bilinear');
i2 = histeq(i1, 256);
e = medfilt2(i2, [5 5]);
figure(1), imshow(e);
f = medfilt2(e, [5 5]);
figure(2), imshow(f);

% Otsu threshold computed on the masked green channel
thresh_level = graythresh(g);
BW = im2bw(g, thresh_level);
figure(10), imshow(BW);
graythresh uses Otsu's method, which is a good general method that works on the distribution of intensities. However, it's not ideal for every situation, particularly if you've applied a lot of nonlinear processing to the image first (e.g. clipping the channels).
You could try to generate your own threshold: look at the intensity distribution or, since you have them, the distributions in the colour channels, and see if there's a plausible place to separate them. You could try modelling it as a mixture of Gaussians (2 seems like a good number to start with; look up gmdistribution.fit, as in the sketch below) or do some other type of clustering. Is there any information in the colour channels that you could use?
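A minimal sketch of that two-component Gaussian-mixture idea, assuming the enhanced grayscale image f from the code above; the component count and the rule for picking the vein component are assumptions:

vals = double(f(:));                  % pixel intensities of the enhanced image
gm = gmdistribution.fit(vals, 2);     % (fitgmdist in newer MATLAB versions)

idx = cluster(gm, vals);              % most likely component per pixel
[~, veinComp] = min(gm.mu);           % assume veins are the darker component
BW = reshape(idx == veinComp, size(f));
figure, imshow(BW);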
If you end up the other way - with veins that are much darker but are continuous - then you could use morphological operators on the binarised image to get it back to the expected range. Perhaps this is what the thinning algorithm does.

fit function of Matlab is really slow

Why is the fit function from MATLAB so slow? I'm trying to fit a gauss4 model so I can get the means of the Gaussians.
Here's my plot;
I want to get the means from the blue data and the red data.
I'm fitting a Gaussian there, but this function is really slow.
Is there an alternative?
fa = fit(fn', facm', 'gauss4');
acm = [fa.b1 fa.b2 fa.b3 fa.b4];
a_cm = sort(acm, 'ascend');
I would apply some of the options available with fit. These include smoothing, by setting SmoothingParam (your data is quite noisy; the alternative of applying a time-domain filter may also help), and setting the values of your initial parameter estimates with StartPoint. Your fits may also not be converging because you set your tolerances (TolFun, TolX) too low, although from inspection of your fits that does not appear to be the case; in fact the opposite is likely, and you probably want to increase MaxIter and/or MaxFunEvals.
To figure out how to get going you can also try the Spectr-O-Matic toolbox. It requires MATLAB 7.12. It includes a script called GaussFit.m to fit gauss4 to data; it also uses the fit routine and provides examples of how to set and get parameters.
Note that smoothing will of course broaden your peaks, but you can subtract that contribution after the fact. The effect on the means should not be deleterious; on the contrary, since you are presumably removing noise, they should be more accurate.
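A minimal sketch of passing such options to fit, reusing the fn and facm variables from the question; the StartPoint values are placeholders you would replace with rough estimates read off your plot:

opts = fitoptions('gauss4');
% Rough initial guesses, ordered [a1 b1 c1  a2 b2 c2  a3 b3 c3  a4 b4 c4]
opts.StartPoint  = [1 10 2  1 20 2  1 30 2  1 40 2];   % placeholders only
opts.MaxIter     = 2000;
opts.MaxFunEvals = 4000;

fa  = fit(fn', facm', 'gauss4', opts);
acm = sort([fa.b1 fa.b2 fa.b3 fa.b4], 'ascend');       % the four fitted means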
In general, functions will be faster if you apply them to a shorter series. Hence, if speedup is really important, you could downsample.
For example, if you have a vector that you want to downsample by a factor of 2 (you may need to make sure its length is even first):
n = 2;
x = sin(0.01:0.01:pi);
x_downsampled = x(1:n:end)+x(2:n:end);
You will now see that x_downsampled is much smaller (and should thus be easier to process), but will still have the same shape. In your case I think this is sufficient.
To see what you got try:
plot(x)
Now you can simply process x_downsampled and map your solution, for example
f = find(x_downsampled == max(x_downsampled));
location_of_maximum = f * n;
Needless to say this should be done in combination with the most efficient options that the fit function has to offer.

function parameters in matlab wander off after curve fitting

First a little background: I'm a psychology student, so my background in coding isn't on par with you guys :-)
My problem is as follows, and the most important observation is that curve fitting with two different programs gives completely different results for my parameters, although my graphs stay the same. The main program we have used to fit my longitudinal data is Kaleidagraph, and this should be seen as kind of the 'gold standard'; the program I'm trying to get matching results from is MATLAB.
I was trying to be smart and wrote some code (a lot, at least for me), and the goal of that code was the following:
1. Take an individual longitudinal datafile
2. Curve-fit this data to a non-parametric model using lsqcurvefit
3. Obtain figures and the points where f' and f'' are zero
This all worked well (woohoo :-)), but when I started comparing the function parameters the two programs generate, there is a huge difference. The Kaleidagraph parameters stay close to their original starting values; MATLAB wanders off and sometimes gets larger by a factor of 1000. The graphs, however, stay more or less the same in both cases, and both fit the data well. Still, it would be lovely to know how to make the MATLAB curve fitting more 'conservative' and keep the parameters nearer their original starting values.
validFitPersons = true(nbValidPersons, 1);
for i = 1:nbValidPersons
    personalData = data{validPersons(i), 3};
    personalData = personalData(personalData(:,1) >= minAge, :);
    % Fit a specific model for all valid persons
    try
        opts = optimoptions(@lsqcurvefit, 'Algorithm', 'levenberg-marquardt');
        [personalParams, personalRes, personalResidual] = lsqcurvefit(heightModel, initialValues, personalData(:,1), personalData(:,2), [], [], opts);
    catch
        x = 1;
    end
Above is the part of the code I've written to fit the datafiles to a specific model.
Below is an example of a non-parametric model I use, with its function parameters.
elseif strcmpi(model, 'jpa2')
    % y = a*(1 - 1/(1 + (b1*(t+e))^c1 + (b2*(t+e))^c2 + (b3*(t+e))^c3))
    heightModel = @(params, ages) abs(params(1) .* (1 - 1./(1 + (params(2).*(ages+params(8))).^params(5) + (params(3).*(ages+params(8))).^params(6) + (params(4).*(ages+params(8))).^params(7))));
    modelStrings = {'a','b1','b2','b3','c1','c2','c3','e'};
    % Define initial values
    if strcmpi('male', gender)
        initialValues = [176.76 0.339 0.1199 0.0764 0.42287 2.818 18.52 0.4363];
    else
        initialValues = [161.92 0.4173 0.1354 0.090 0.540 2.87 14.281 0.3701];
    end
I've tried to mimic the curve-fitting process in Kaleidagraph as closely as possible. There I found that they use the Levenberg-Marquardt algorithm, which is why I selected it here. However, the results still vary, and I don't have any more clues about how to change this.
Some extra clarification:
The idea behind this code was the following:
I'm trying to compare different fitting models (they are designed for this purpose). So what I do is: I have 5 models with different parameters and different starting values (the second part of my code), and then I have the general curve-fitting file. Since there are different models, it would be interesting if I could restrict how far my starting values can wander off.
Anyone any idea how this could be done?
Anybody willing to help a psychology student?
Cheers
This is a common issue when dealing with non-linear models.
If I were you, I would try to check whether you can remove some parameters from the model in order to simplify it.
If you really want to keep your solution not too far from the initial point, you can use lower and upper bounds for each variable:
x = lsqcurvefit(fun,x0,xdata,ydata,lb,ub)
defines a set of lower and upper bounds on the design variables in x, so that the solution is always in the range lb ≤ x ≤ ub.
Cheers
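A minimal sketch of how such bounds could be built around the starting values from the question; the ±50% band is an arbitrary assumption, and note that (at least in the MATLAB releases of that time) lsqcurvefit only honours bound constraints with its default trust-region-reflective algorithm, not with 'levenberg-marquardt':

% Keep each parameter within an (assumed) +/-50% band around its start value
lb = initialValues - 0.5 * abs(initialValues);
ub = initialValues + 0.5 * abs(initialValues);

% The default trust-region-reflective algorithm is used so the bounds apply
personalParams = lsqcurvefit(heightModel, initialValues, ...
                             personalData(:,1), personalData(:,2), lb, ub);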
You state:
I'm trying to compare different fitting models (they are designed for
this purpose). So what I do is: I have 5 models with different
parameters and different starting values (the second part of my code),
and then I have the general curve-fitting file.
You will presumably compare the statistics from fits with different models to see whether reductions in the fitting error are unlikely to be due to chance. You may want to rely on that comparison to pick the model that not only fits your data suitably but is also the simplest (which is often referred to as the principle of parsimony).
The problem is really that the model you have shown results in correlated parameters and therefore overfitting, as mentioned by @David. Again, this should be resolved when you compare different models and find that some do just as well (statistically speaking) even though they involve fewer parameters.
Edit:
To drive the point home regarding the problem with the choice of model, here are (1) the results of a trial fit using simulated data and (2) the correlation matrix of the parameters in graphical form:
Note that absolute values of the correlation close to 1 indicate strongly correlated parameters, which is highly undesirable. Note also that the trend in the data is practically linear over a long portion of the dataset, which implies that 2 parameters might suffice over that stretch, so using 8 parameters to describe it seems like overkill.
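A minimal sketch of how such a parameter correlation matrix can be estimated from the Jacobian returned by lsqcurvefit; this is the standard linearised approximation, not necessarily the exact procedure behind the figure above:

% Request the Jacobian (7th output) from lsqcurvefit
[params, resnorm, residual, ~, ~, ~, J] = lsqcurvefit(heightModel, initialValues, ...
    personalData(:,1), personalData(:,2));

% Linearised covariance estimate: sigma^2 * inv(J'*J)
dof    = numel(residual) - numel(params);
sigma2 = resnorm / dof;
C      = sigma2 * inv(full(J)' * full(J));

% Convert covariance to correlation and display it
D = sqrt(diag(C));
R = C ./ (D * D');
imagesc(abs(R)), colorbar, axis square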

"ignore background" in matlab segmentation

I have an image shot with an X-ray, on which I want to test different segmentation algorithms (like the ones found at http://www.academia.edu/913222/segmentation_techniques).
How can I ignore the background in the calculation, i.e. how can I ignore anything that has a gray value under 50,000 (for a 16-bit image)?
The code I'm using right now is:
clc;
clear;
[fn, pn] = uigetfile({'*.TIF', 'Image files'}, 'Select an image');
x = imread(fullfile(pn, fn));
T = graythresh(x);      % Otsu threshold over the whole image
y = im2bw(x, T);
imshow(y);
but I also want to test different segmentation techniques.
I am trying to model the future implementation of a piece of software in order to find the best course of action, and this software will ignore the "background" (I already have a successful implementation of the Otsu algorithm).
Thanks for your wisdom =)
If you want to use Otsu only on the pixel values above 50000, you can simply write
T = graythresh(x(x>50000));
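A slightly fuller sketch of that idea, which also keeps the ignored background black in the output; the 50,000 cut-off is the one from the question, and the file name is hypothetical:

x = imread('radiograph.tif');     % hypothetical 16-bit input image
bg_cutoff = 50000;                % gray values below this are background

fg = x > bg_cutoff;               % foreground mask
T  = graythresh(x(fg));           % Otsu computed on foreground pixels only
y  = im2bw(x, T) & fg;            % threshold, and force the background to 0
imshow(y);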