Zero reference point in Darknet YOLO


I'm trying to manually create label files for some images in YOLO/Darknet, and I need to fill in some values for the bounding boxes.
From the YOLO website (https://pjreddie.com/darknet/yolo/):
Now we need to generate the label files that Darknet uses. Darknet wants a .txt file for each image with a line for each ground truth object in the image that looks like:
[object-class] [x] [y] [width] [height]
Where x, y, width, and height are relative to the image's width and height.
The class, width, and height are straightforward, but I'm wondering how to represent the center coordinates of the box, [x] and [y], as I don't know where the (0,0) reference point is located.
Thanks

For others who might be wondering, I found the answer: the (0,0) reference point is the top-left corner of the image, with x increasing to the right and y increasing downward.
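As an illustration, here is a minimal Python sketch that turns a pixel-space box into one label line, assuming the origin is the top-left corner and that [x] and [y] are the box center. The function name yolo_label and its argument order are made up for this example and are not part of Darknet.

```python
def yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Build one Darknet label line from a pixel-space box.

    Assumes pixel coordinates measured from the top-left corner of the
    image, with x growing to the right and y growing downward.
    """
    x_center = (x_min + x_max) / 2 / img_w   # box center, relative to image width
    y_center = (y_min + y_max) / 2 / img_h   # box center, relative to image height
    width = (x_max - x_min) / img_w          # box width, relative to image width
    height = (y_max - y_min) / img_h         # box height, relative to image height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box from (100, 50) to (300, 250) in a 400x500 image:
print(yolo_label(0, 100, 50, 300, 250, 400, 500))
# prints "0 0.500000 0.300000 0.500000 0.400000"
```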

Related

Cropping the minimum sized rectangle of a shape from an image

I am working on a card recognition project in MATLAB and I am stuck at this point. I have images of cards, and in each image I want to find the smallest rectangle that encloses a card. An example is shown below:
Original image
Converted image
I can currently convert the image to black and white (which leaves only the cards as white regions), and I want to define the rectangles from those white regions. E.g., if I have 3 non-overlapping cards in my image, I want to get 3 images like the one above (it doesn't matter if another card's edge appears in the image; the important part is that the rectangle must pass through the edges of the selected card).
I have tried edge detection methods but wasn't successful. Thanks in advance for your help.

I recommend you use the regionprops function from the Image Processing Toolbox, i.e.,
bb = regionprops(yourImage, 'boundingbox');
which will return the bounding box of each region. There is a nice MathWorks video here, and you can jump to about minute 26 for what you need.
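To show what a per-region bounding box means, here is a small pure-Python sketch. It is not regionprops itself: it is a hand-written flood fill over a binary mask, it assumes 8-connected regions, and it reports zero-based (x, y, width, height) tuples, which is a convention chosen for this example only.

```python
from collections import deque


def bounding_boxes(mask):
    """Return (x, y, width, height) for each 8-connected white region.

    mask is a list of rows containing 0 (black) or 1 (white).
    Coordinates are zero-based pixel indices (an assumption of this sketch).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one region, tracking its extreme rows and columns.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((c0, r0, c1 - c0 + 1, r1 - r0 + 1))
    return boxes


mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
print(bounding_boxes(mask))  # prints [(1, 0, 2, 2), (4, 2, 2, 2)]
```

Each returned box can then be used to crop one card out of the original image.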

How to find height of the specific point in this OCT image?

I am a beginner in MATLAB working on medical image processing of retinal OCT images. My aim is to align all the images to one height value, so I want to find the maximum height of the layer in the eye.
For example, given this input:
the output should be this height:
I have tried the approach outlined in Hand_height, but it returns the height of the complete image.

Iterate over X and, for each column, find the first peak (blue point) in the vertical direction (Y) using findpeaks; together these points form the first layer (blue line).
Then determine the peak with the smallest index in the Y-direction.
Please see the image!
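The column-by-column idea can be sketched in plain Python as follows. This is not MATLAB's findpeaks: the peak test is a simple local-maximum check, and the min_height threshold and both function names are assumptions made for this illustration.

```python
def first_peak_row(column, min_height):
    """Row index of the first local maximum at or above min_height, else None."""
    for i in range(1, len(column) - 1):
        if (column[i] >= min_height
                and column[i] > column[i - 1]
                and column[i] >= column[i + 1]):
            return i
    return None


def top_of_layer(image, min_height):
    """Trace the first layer and locate its highest point.

    image is a list of rows of intensities; row 0 is the top of the image.
    Returns (layer, top), where layer[x] is the first-peak row in column x
    and top is the (x, y) of the peak with the smallest row index.
    """
    cols = len(image[0])
    layer = [first_peak_row([row[x] for row in image], min_height)
             for x in range(cols)]
    found = [(y, x) for x, y in enumerate(layer) if y is not None]
    if not found:
        return layer, None
    top_y, top_x = min(found)
    return layer, (top_x, top_y)


image = [
    [0, 0, 0],
    [0, 8, 0],
    [9, 2, 0],
    [3, 9, 7],
    [0, 0, 0],
]
layer, top = top_of_layer(image, min_height=5)
print(layer, top)  # prints [2, 1, 3] (1, 1)
```

On real OCT data the image would probably need smoothing first, since speckle noise can produce spurious peaks above the layer.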

In order to find the maximum height, you should find the top border of the retina. Here you have an example of how to find it.

Detecting a line in a JPEG image

I'm new to Swift and image processing, but I didn't find any program that does what I want. I have thousands of pages of questionnaires, but the OMR (Optical Mark Recognition) freeware I use fails to detect the boxes. That is because the questionnaires were printed either by me or by the participants in the study, yielding different images (scale and rotation). Straightening the image is not sufficient. Luckily, there is a horizontal line somewhere at the top of each page. So, the algorithm would look something like this:
1. Select all the JPEGs to transform (done)
2. Enter the coordinates of the target line (done)
3. For each JPEG image:
3a. Load the image (NSData? not UIImage since it is an App)
3b. Uncompress the image
3c. Detect the line on top of the page
3d. Calculate and apply the angle and the translation (I found a free source in Java doing that)
3e. Save the image under a modified name
I need your help with steps 3a-3b. For step 3c, should I use a Canny edge detector followed by a Hough transform?
Any thoughts would be appreciated.
---- EDIT ----
Here is an image describing the problem. In the upper part (Patient #1), the coordinates of the top horizontal line are (294, 242) to (1437, 241). In the lower part (Patient #2), the coordinates of the top horizontal line are (299, 230) to (1439, 230). This seems a small difference, but the OMR looks at the ROIs (i.e. boxes) with fixed coordinates. In other scanned images, the difference may be even greater and the top line may not be horizontal (e.g. (X1, Y1) = (320, 235) and (X2, Y2) = (1480, 220)).
My idea is to get a template for the check boxes (the OMR does it) and the coordinates of the top line once and for all (I can get them with Paint or whatever). Then align all the images to this template (using their top line) before running the OMR. A scaling, a rotation and a translation may be needed. In other words, all the images should be perfectly stackable on the template image for the OMR to perform correctly...
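Once the top line has been detected, the scaling, rotation and translation follow directly from its two endpoints. The Python sketch below treats points as complex numbers, so the whole transform is z' = a*z + b. It assumes uniform scaling (the same factor in x and y) and reuses the example coordinates above; the function names are invented for this illustration.

```python
import cmath
import math


def similarity_from_segments(src_a, src_b, dst_a, dst_b):
    """Find a, b such that z' = a*z + b maps segment src onto segment dst.

    abs(a) is the scale factor, the phase of a is the rotation angle,
    and b is the translation. Points are (x, y) tuples.
    """
    s1, s2 = complex(*src_a), complex(*src_b)
    d1, d2 = complex(*dst_a), complex(*dst_b)
    a = (d2 - d1) / (s2 - s1)   # rotation and scale in one complex factor
    b = d1 - a * s1             # translation that pins the first endpoint
    return a, b


def transform_point(a, b, point):
    """Apply z' = a*z + b to an (x, y) point."""
    z = a * complex(*point) + b
    return z.real, z.imag


# Scanned line (tilted) and template line, taken from the question.
a, b = similarity_from_segments((320, 235), (1480, 220), (294, 242), (1437, 241))
print("scale:", abs(a))
print("rotation (degrees):", math.degrees(cmath.phase(a)))
print("second endpoint lands at:", transform_point(a, b, (1480, 220)))
```

Applying the same a and b to every pixel coordinate (or passing the equivalent affine matrix to an image library) would stack the scan onto the template.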
--- EDIT Dec 26th ---
I've translated OpenCV's Probabilistic Hough Transform into Swift (open C++ code from GitHub). Unfortunately, the segments detected are too short (i.e. the entire line segment is not captured). I'm wondering: does it make sense to use a Canny edge detector before the Hough transform to detect a single segment of a black line on a white page?

Resizing command changes image shape

I have to resize an image, e.g., from 3456x5184 to 700x700, because my code needs an image with fewer pixels; otherwise it takes too long to produce results. When I use the imresize command, it changes the dimensions of the image, but it also changes its shape: the circle in the image, which I need to detect, looks like an oval instead of a circle. I would appreciate your suggestions for resolving this problem.

Resizing images is done either by subsampling (to get smaller images) or by some kind of interpolation (to get larger images).
The input is either a scale factor or the final dimensions for width and height.
The only way to fit a rectangle into a square by simply resizing it is to use different scales for width and height, which of course yields a distorted image.
To achieve what you want, you can either crop a 700x700 region from your image or resize the image using the same factor for width and height. In the latter case, you fit the larger dimension into 700 and fill the rest around the other dimension with black or whatever you prefer.
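The arithmetic for the second option can be sketched in Python as follows. The function name fit_into_square and the choice to split the padding evenly on both sides are assumptions made for this example.

```python
def fit_into_square(width, height, side):
    """Uniform-scale a width x height image so it fits a side x side square.

    Returns (new_width, new_height, (pad_left, pad_right), (pad_top, pad_bottom)).
    The same scale factor is used for both dimensions, so circles stay circles.
    """
    scale = side / max(width, height)   # the larger dimension becomes `side`
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = side - new_w                # total horizontal fill
    pad_y = side - new_h                # total vertical fill
    return (new_w, new_h,
            (pad_x // 2, pad_x - pad_x // 2),
            (pad_y // 2, pad_y - pad_y // 2))


print(fit_into_square(3456, 5184, 700))
# prints (467, 700, (116, 117), (0, 0))
```

The resized image is 467x700, and 116 and 117 fill pixels on either side of the shorter dimension bring it to 700x700 without distortion.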

How to auto-crop a barrel-distorted image using ImageMagick?

Using ImageMagick's convert to barrel-distort a photo to correct a strongly visible pincushion distortion, I provide positive a, b or c values (from a database for my lens + focal length). This results in an image that is corrected, has the original width and height, but includes a non-rectangular, bent/distorted border, as the image is corrected towards its center. Simplified example:
convert rose: -virtual-pixel black -distort Barrel '+0.0 +0.1 +0.0' out.png
How can I automatically crop the black, bent border to the largest possible rectangle in the original aspect ratio within the rose?
The ImageMagick website says that a parameter "d" is automatically calculated that could do this (resulting in a linear distortion, effectively zooming into the image and pushing the bent border right outside the image bounds), but the value calculated by ImageMagick seems to aim for something different (v6.6.9 on Ubuntu 12.04). If I guess and manually specify a "d", I can get the intended result:
convert rose: -virtual-pixel black -distort Barrel '+0.0 +0.1 +0.0 +0.6' out.png
The given formula a+b+c+d=1 does not seem to give a proper d for my cropping case. Also, d seems to depend on the aspect ratio of the image and not only on a/b/c. How do I make ImageMagick crop the image, or how do I calculate a proper d?
Update
I found Fred's ImageMagick script innercrop (http://www.fmwconcepts.com/imagemagick/innercrop/index.php), which does roughly what I need but has drawbacks and is not a solution for me. It assumes arbitrary outer areas, so it takes a long time to find the cropping rectangle. It does not work within Unix pipes, and it does not keep the original aspect ratio.
Update 2
Further thought on the problem suggests that calculating a "d" is not the solution, as changing d introduces more or less bending and seems to do more than just zoom. The d=1-(a+b+c) that is calculated by ImageMagick results in the bent image touching the upper/lower bounds (for landscape images) or the left/right bounds (for portrait images). So I think the proper solution would be to calculate where one of the new 4 corners will be, given a/b/c/d, and then crop to those new corners.
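The corner idea from Update 2 can be sketched in Python. This sketch assumes the radial model described in the ImageMagick distortion documentation, Rsrc = r * (a*r^3 + b*r^2 + c*r + d), with r normalized so that r = 1 is half the smaller image dimension. It further assumes the distortion is centered, that a, b and c are non-negative, and that d is positive, so Rsrc grows with r. The function names are invented for this illustration, and the result has not been checked against ImageMagick's actual output.

```python
import math


def source_radius(r, a, b, c, d):
    """Assumed barrel model: destination radius r samples the source at this radius."""
    return r * (a * r**3 + b * r**2 + c * r + d)


def inner_crop(width, height, a, b, c, d=None):
    """Largest centered crop with the original aspect ratio and no border pixels.

    Walks along the image diagonal to find the destination radius whose
    source point lands exactly on the source corner. Returns a
    WxH+X+Y geometry string suitable for -crop.
    """
    if d is None:
        d = 1.0 - (a + b + c)             # default stated in the question
    norm = min(width, height) / 2.0       # r = 1 at half the smaller dimension
    corner = math.hypot(width / 2.0, height / 2.0) / norm
    if source_radius(corner, a, b, c, d) <= corner:
        return f"{width}x{height}+0+0"    # corners already sample inside the source
    lo, hi = 0.0, corner
    for _ in range(60):                   # bisection on the monotonic radius map
        mid = (lo + hi) / 2.0
        if source_radius(mid, a, b, c, d) < corner:
            lo = mid
        else:
            hi = mid
    scale = lo / corner                   # fraction of the diagonal that is valid
    crop_w, crop_h = int(width * scale), int(height * scale)
    x0, y0 = (width - crop_w) // 2, (height - crop_h) // 2
    return f"{crop_w}x{crop_h}+{x0}+{y0}"


print(inner_crop(700, 700, 0.0, 0.1, 0.0))
```

Because the crop is computed from the parameters alone, it needs no pixel scanning and could be passed to -crop in the same pipeline as the -distort step.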

The way I understand the docs, you do not use commas to separate the parameters for the barrel-distort operator.
Here is an example image, alongside the output of the two commands you gave:
convert o.png -virtual-pixel black -distort Barrel '+0.0 +0.1 +0.0' out0.png
convert o.png -virtual-pixel black -distort Barrel '+0.0 +0.1 +0.0 +0.6' out1.png
I created the example image in order to better visualize what you possibly want to achieve.
However, I do not see the point you stated about the automatically calculated parameter 'd', and I do not see the effect you stated about using 'd=+0.6'...
I'm not sure I understand your wanted result correctly, so I'm assuming you want the area marked by the yellow rectangle cropped.
The image on the left is out0.png as created by the first command above.
In order to guess the required coordinates, we have to determine the image dimensions first:
identify out0.png
out0.png PNG 700x700 700x700+0+0 8-bit sRGB 36KB 0.000u 0:00.000
The image in the center is marked up with the white rectangle. The rectangle is there so you can look at it and tell me if that is the region you want cropped. The image on the right is the cropped image (without scaling it back to the original size).
Is this what you want? If yes, I can possibly update the answer in order to automatically determine the required coordinates of the cropping. (For now I've done it based on guessing.)
Update
I think you may have misunderstood the purpose of the barrel-distortion operation. It is meant for correcting a (slight) barrel distortion, as is produced by camera lenses. The 3 parameters a, b and c to be used for any specific combination of camera, lens and current zoom could possibly be stated in your photo's EXIF data. The formula a+b+c+d = 1 is meant to be used when the new, distortion-corrected image should have the same dimensions as the original (distorted) image.
So to imitate the barrel-correction, we should probably use the second image from the last row above as our input:
convert out3.png -virtual-pixel gray -distort barrel '0 -0.2 0' corrected.png
Result: