Grouping together of lines while doing line segmentation of printed text - matlab

I have been trying to segment lines from a printed text document, following this paper:
A Hough Transform based Technique for Text Segmentation, by Satadal Saha, Subhadip Basu, Mita Nasipuri and Dipak Kr. Basu
As per the paper, I used the Hough transform to generate straight lines over the text, restricting the angles to the vicinity of 90 degrees, and then a connected component algorithm to group the generated straight lines and separate out the lines of text.
The Hough transform output is given below:
But the generated straight lines sometimes overlap between two text lines, so more than one text line gets grouped together.
The bounding boxes of the lines in the text are given below:
Can anybody please help me avoid this grouping together of lines of text? Please suggest a method so that the connected component analysis treats the lines of text as separate components.

You are using connected components to group your hough-lines into text-lines. This process is very sensitive to noise: even one misdetected pixel can bring together two lines.
You can make this process more robust if you look at the fraction of "on" pixels in each row of the image:
bw = imread('http://i.stack.imgur.com/tg2xN.png');   % Hough output image from the question
bw = bw > 100;                                       % binarize
figure; plot( mean(bw,2) ); xlabel('image row'); ylabel('fraction of "on" pixels');
The red line shows a 7.5% threshold on the fraction of "on" pixels per row. As you can see, it helps distinguish well-connected hough-lines from falsely connected ones.
Use this threshold to amend the mask:
msk = bsxfun(@times, bw, mean(bw,2)>0.075);
Now you can get the proper bounding boxes:
bb=regionprops(bwlabel(msk,8),'BoundingBox');
Resulting with:
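The effect of the row threshold can also be checked on a small synthetic mask. This is only a sketch: the mask below is made up for illustration (two bands joined by a one-pixel bridge) and is not the image from the question. It uses bsxfun(@and, ...) in place of bsxfun(@times, ...) so the mask stays logical, and it needs the Image Processing Toolbox for bwlabel and regionprops, as the code above does.

```matlab
% Synthetic Hough mask (illustrative only): two text-line bands
% falsely joined by a thin vertical bridge.
bw = false(40, 100);
bw(5:12, :)   = true;     % first text line
bw(25:32, :)  = true;     % second text line
bw(13:24, 50) = true;     % one-pixel-wide bridge merging the two bands

% Without the row threshold, everything is one connected component.
before = regionprops(bwlabel(bw, 8), 'BoundingBox');

% Keep only rows whose fraction of "on" pixels exceeds 7.5%.
rowFill = mean(bw, 2);                       % 40x1 column, values in [0, 1]
msk = bsxfun(@and, bw, rowFill > 0.075);     % bridge rows (1% fill) are dropped
after = regionprops(bwlabel(msk, 8), 'BoundingBox');

fprintf('components before: %d, after: %d\n', numel(before), numel(after));
```

The 7.5% value is the threshold read off the plot above; for other documents it would probably need to be chosen again from the row profile.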

Related

How to find multiple horizontal straight lines in a scatter plot using MATLAB?

Hi,
I have a set of points (scatter plot). I am interested in finding multiple (approximately) horizontal straight lines (or clusters) in this data. For example in the image attached, I have drawn straight lines to show the desired result. Number of total possible straight lines in the data is unknown beforehand.
I shall be really thankful if anyone could tell me how to approach this problem.
Thanks in advance!
I'd use the RANSAC algorithm (see example in the documentation) or the Hough transform (see example here). Once the lines are found, you can put a bound on their slope to select those that are close to horizontal, within some tolerance.
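As a rough illustration of the consensus idea with a slope bound, here is a minimal sketch in base MATLAB. It is not the toolbox RANSAC from the documentation: instead of sampling point pairs at random it tries every pair, which is deterministic and fine for small point sets. The data, maxSlope, tol and minInliers are all assumptions made up for this example.

```matlab
% Synthetic scatter (illustrative only): two nearly horizontal lines.
t = 0:19;
x = [t, t]';
y = [1 + 0.01*t + 0.02*sin(t), 5 - 0.02*t + 0.02*cos(t)]';

maxSlope   = tand(5);   % accept lines within +-5 degrees of horizontal (assumed)
tol        = 0.1;       % max vertical residual for an inlier (assumed)
minInliers = 5;         % stop when no line explains this many points (assumed)

remaining = true(size(x));   % points not yet assigned to a line
fitted = zeros(0, 2);        % one row per line: [slope, intercept]
while nnz(remaining) >= minInliers
    idx = find(remaining);
    best = [];  bestCount = 0;
    % Exhaustive search over point pairs (RANSAC would sample these at random).
    for a = 1:numel(idx)-1
        for b = a+1:numel(idx)
            dx = x(idx(b)) - x(idx(a));
            if dx == 0, continue; end
            m = (y(idx(b)) - y(idx(a))) / dx;
            if abs(m) > maxSlope, continue; end      % enforce the slope bound
            c = y(idx(a)) - m*x(idx(a));
            inl = remaining & abs(y - (m*x + c)) <= tol;
            if nnz(inl) > bestCount
                bestCount = nnz(inl);  best = inl;
            end
        end
    end
    if bestCount < minInliers, break; end
    % Refit by least squares on the consensus set, then remove those points.
    fitted(end+1, :) = polyfit(x(best), y(best), 1);
    remaining(best) = false;
end
disp(fitted)
```

Because each point is removed once it joins a consensus set, no point ends up on more than one line. The number of lines does not need to be known beforehand; the loop stops when no candidate reaches minInliers.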

How to draw a best fit mesh on a set of points in 2D

I have a problem where we have a grid of points and I'd like to fit a "deformed grid" which would best fit the set of points.
The MATLAB data can be found at:
https://drive.google.com/file/d/14fKKEC5BKGDOjzWupWFSmythqUrRXae4/view?usp=sharing
You will see that cenX and cenY are the x and y coordinates of these centroids.
Like in this image. Note that there are points missing, and there are a few extra points. Moreover, you can see that some lines are not one single line from left to right; however, we can safely assume that fitting a line somewhat horizontally (+-5 degrees) would properly link the points into a somewhat deformed grid.
The vertical lines are trivial because that is how we generated these dots. We can find the number of lines required through a mode of the count of points on each of the columns of the grid.
I'd like to be able to ensure that a point is only part of one line, as this is a grid.

Interpolate out of vector range to split image in MATLAB

I have an image, and I'm letting the user draw a line on it to pick a region. Now, I would like to take that line (red line in the attached image) and extend it to get to the ends of the frame from both sides (white line in image).
I tried using interp1 but when I'm trying to get those coordinates on the frame itself, I get NaNs since it's not between the two points that the user picked.
Any suggestions on how to pick those points? Or alternatively, a better way to split the image?
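One possible way around the NaNs is sketched below: interp1 returns NaN outside the range of the sample points by default, but accepts 'extrap' as an extra argument to extrapolate. The point coordinates and frame size here are hypothetical values for illustration, and the sketch assumes the user's line is not vertical.

```matlab
% Two user-picked points and the frame size (hypothetical values).
px = [40 120];   py = [60 100];
imgW = 200;      imgH = 150;

% Evaluate the line at the left and right edges of the frame.
% 'extrap' makes interp1 extrapolate linearly instead of returning NaN.
xEnds = [1 imgW];
yEnds = interp1(px, py, xEnds, 'linear', 'extrap');

% Region split: pixels below the extended line (larger row index).
[cols, rows] = meshgrid(1:imgW, 1:imgH);
below = rows > interp1(px, py, cols, 'linear', 'extrap');
```

For steep lines the extrapolated yEnds can fall outside 1..imgH, so the end points may need clipping to the top and bottom edges; the logical mask below does not have that problem, since it only compares each pixel with the line.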

Remove similar lines provided by Hough transform

The Hough transform finds more lines than I need, and some of them are nearly identical for my purposes.
For example
In this image I have 5 lines but I really need just 2 lines.
How can I remove the unnecessary lines?
My code is
image = cv.Canny(image, 200);
lines = cv.HoughLinesP(image,'Threshold',80,'MinLineLength',100,'MaxLineGap',50);
A simple way would be to check for intersecting lines, but lines can be parallel and very close in certain situations.
Any idea?
My crude method was:
use the Canny edge detector
take the first line from houghlines
draw a thick black line over that line in the houghlines input
repeat until you get no output from houghlines
I used it to detect edges of a card, so I took four best lines.
I would compute the slope and intercept of the lines and compare them to see if they're both within some tolerance you define. The intercept should be described in the same coordinate frame for every line, say with the origin at pixel r,c = (0,0). Lines that match could then be merged. The only failure case I can think of is if you have non-contiguous line segments that have the same slope and intercept: those would be merged with this approach. But in your image you don't seem to have this issue.
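A minimal sketch of that comparison in plain MATLAB is below. The segments, the [x1 y1 x2 y2] row layout and both tolerances are assumptions for illustration; adapt the layout to whatever your HoughLinesP wrapper returns. It also assumes no segment is vertical, since the slope would be infinite.

```matlab
% Hypothetical segments, one per row: [x1 y1 x2 y2].
segs = [ 0 10   100 12;
         5 11   110 13;
         0 80   100 78;
        10 81    90 79;
         0 10.5  50 11.4];
slopeTol = 0.05;   % max slope difference to count as the same line (assumed)
icptTol  = 3;      % max intercept difference in pixels (assumed)

% Slope and intercept at x = 0, so all lines share one coordinate frame.
m = (segs(:,4) - segs(:,2)) ./ (segs(:,3) - segs(:,1));
c = segs(:,2) - m .* segs(:,1);

% Greedy grouping: each unassigned line starts a group and pulls in
% every other unassigned line within both tolerances.
group = zeros(size(m));
nGroups = 0;
for i = 1:numel(m)
    if group(i) ~= 0, continue; end
    nGroups = nGroups + 1;
    similar = group == 0 & abs(m - m(i)) <= slopeTol & abs(c - c(i)) <= icptTol;
    group(similar) = nGroups;
end

% One merged line per group: mean slope and mean intercept.
merged = [accumarray(group, m, [], @mean), accumarray(group, c, [], @mean)];
disp(merged)
```

The grouping is greedy, so the result can depend on the order of the segments when lines sit right at the tolerance boundary; for a handful of clearly distinct lines that should not matter.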

how to do line fitting multiple lines MATLAB?

I'm trying to find all the straight lines that form the border of an image. For example, a stamp has four edges, and I have already found those edges with the edge function in MATLAB. The problem is that they are not truly straight lines, so I need line fitting to get all four borders. But the polyfit function can only fit one line at a time. Is there a solution that can fit all the lines at once?
For example, I have uploaded some pictures here; the image with red lines is what I want. Please be aware that I need four separate lines.
Judging from the image you won't be trying to smooth some lines, or fill in the gaps. Instead it looks more like you need to put your image in the smallest possible box.
Here is an algorithm that you can try:
1. Start from all 4 corners.
2. 'Walk' one of the corners inwards and determine whether all points are still within the four corners.
3. If so, save this corner and go to step 2; otherwise, go to step 2 without saving it.
4. Keep repeating steps 2 and 3 until you have a steady solution.
Are you trying to get rid of the perforations? In that case I would suggest using thresholding to segment out dark areas of the image, and then using regionprops to get their bounding boxes. Then you can figure out the largest rectangle that excludes them.