Annotate images for training YOLO with a custom dataset

I am using MODD (Marine Obstacle Detection Dataset) to detect marine objects with YOLO. I followed the Mark Jay series for training the network on a custom dataset, but the tutorial only covers objects annotated with rectangular bounding boxes. My dataset has three main classes:
Large Objects
Small Objects
Horizons
A sample image from the dataset:
As you can see, the yellow line (defined by a few points) is the horizon. It separates the sea from everything else (i.e. it marks the range within which the boat can move). Following the Mark Jay tutorial for training YOLO with custom objects, I provided annotations in the format below (only for the small and large objects):
<annotation>
  <folder>/home/user/Desktop/dataset/modd_dataset1.0/data/01/images/</folder>
  <filename>17.jpg</filename>
  <segmented>0</segmented>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>largeobjects</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>518.9599375650363</xmin>
      <ymin>145.26586888657653</ymin>
      <xmax>638.835067637877</xmax>
      <ymax>219.8548387096775</ymax>
    </bndbox>
  </object>
  <object>
    <name>largeobjects</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>239.91727367325706</xmin>
      <ymin>22.7268470343393</ymin>
      <xmax>306.5145681581686</xmax>
      <ymax>223.18470343392306</ymax>
    </bndbox>
  </object>
  <object>
    <name>largeobjects</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>2.1649323621228262</xmin>
      <ymin>196.54578563995847</ymin>
      <xmax>8.158688865764859</xmax>
      <ymax>219.18886576482834</ymax>
    </bndbox>
  </object>
  <object>
    <name>smallobjects</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>227.92976066597294</xmin>
      <ymin>218.52289281997923</ymin>
      <xmax>240.58324661810616</xmax>
      <ymax>233.84027055150892</ymax>
    </bndbox>
  </object>
</annotation>
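For reference, darknet-style YOLO training expects one text line per box in the form "class x_center y_center width height" (all normalized). Below is a minimal sketch of converting one of the VOC boxes above; the class-ID mapping is my assumption:

# Minimal sketch: convert one VOC-style box to a YOLO label line.
# The class-ID mapping is an assumption, not part of the dataset.
CLASS_IDS = {"largeobjects": 0, "smallobjects": 1}

def voc_box_to_yolo(name, xmin, ymin, xmax, ymax, img_w=640, img_h=480):
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{CLASS_IDS[name]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example with the first <object> above:
print(voc_box_to_yolo("largeobjects", 518.96, 145.27, 638.84, 219.85))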
In the case of horizons, multiple (x, y) points are provided to draw a line (whereas for small and large objects only two (x, y) points are specified, so that a box can be drawn around the detected object). Is there any way to provide annotations for the horizon in the format above?
If such annotations can be provided, how do I do that? If not, is there another algorithm I can use for this purpose? The model needs to be really fast because it will run on an IoT device for real-time detection.
This has already been implemented using semantic segmentation (but in Matlab). Can semantic segmentation be used on edge devices (with Python and OpenCV)?

Semantic segmentation can be run on edge devices such as the Jetson Nano.
This link might help you. It is very much possible with Python and OpenCV as long as you choose a suitable edge device. This is the official benchmarks page.
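As a rough illustration, OpenCV's DNN module can run a segmentation network directly in Python; the model file name below is only a placeholder for whatever network you export:

# Rough sketch: run a semantic-segmentation network with OpenCV's DNN module.
# "segmentation_model.onnx" is a placeholder, not a specific recommendation.
import cv2
import numpy as np

net = cv2.dnn.readNet("segmentation_model.onnx")
frame = cv2.imread("frame.jpg")

blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0, size=(512, 512),
                             swapRB=True, crop=False)
net.setInput(blob)
out = net.forward()                       # typically (1, num_classes, H, W)

class_map = np.argmax(out[0], axis=0).astype(np.uint8)   # per-pixel class IDs
mask = cv2.resize(class_map, (frame.shape[1], frame.shape[0]),
                  interpolation=cv2.INTER_NEAREST)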

Related

Extracting 2D surface from 3D STEP model

I'm trying to figure out a good way to programmatically generate contours describing a 2D surface from a 3D STEP model. The application is generating NC code for a laser-cutting program from a 3D model.
Note: it's easy enough to do this in a wide variety of CAD systems. I am writing software that needs to do it automatically.
For example, this (a STEP model):
Needs to become this (a vector file, like an SVG or a DXF):
Perhaps the most obvious way of tackling the problem is to parse the STEP model and run some kind of algorithm to detect planes and select the largest as the cut surface, then generate the contour. Not a simple task!
I've also considered using a pre-existing SDK to render the model using an orthographic camera, capture a high-res image, and then operate on it to generate the appropriate contours. This method would work, but it will be CPU-heavy, and its accuracy will be limited to the pixel resolution of the rendered image - not ideal.
This is perhaps a long shot, but does anyone have thoughts about this? Cheers!
I would use a CAD library to load the STEP file (not a CAD API), look for the planar face with the highest number of edge curves in its face loop, and transform those curves onto the XY plane. Afterwards, finding the 2D geometry's min/max for centering etc. would be pretty easy.
Depending on the programming language you are using, I would search Google for "CAD control" or "CAD component" combined with "STEP import".
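A rough sketch of that idea in Python with pythonocc-core (just one possible choice of library, and the exact module/class names can differ between pythonocc versions):

# Rough sketch: find the planar face with the most edges in a STEP file.
# "model.step" is a placeholder file name.
from OCC.Core.STEPControl import STEPControl_Reader
from OCC.Core.TopExp import TopExp_Explorer
from OCC.Core.TopAbs import TopAbs_FACE, TopAbs_EDGE
from OCC.Core.BRepAdaptor import BRepAdaptor_Surface
from OCC.Core.GeomAbs import GeomAbs_Plane
from OCC.Core.TopoDS import topods          # topods_Face in older versions

reader = STEPControl_Reader()
reader.ReadFile("model.step")
reader.TransferRoots()
shape = reader.OneShape()

best_face, best_edge_count = None, -1
face_exp = TopExp_Explorer(shape, TopAbs_FACE)
while face_exp.More():
    face = topods.Face(face_exp.Current())
    if BRepAdaptor_Surface(face).GetType() == GeomAbs_Plane:
        # Count the edges bounding this planar face.
        edge_exp = TopExp_Explorer(face, TopAbs_EDGE)
        n = 0
        while edge_exp.More():
            n += 1
            edge_exp.Next()
        if n > best_edge_count:
            best_face, best_edge_count = face, n
    face_exp.Next()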

Any deep learning models for instance classification in an image, rather than bounding box?

I need to classify pixel-wise instances in an image. Most object detection models, e.g. RetinaNet and the R-CNNs, only detect bounding boxes, and in my case the non-instance region inside a bounding box can be significantly different from the instance. Even the Mask R-CNN model still does object classification based on the bounding-box area. Does anybody know what model I should use? I guess Facebook's MultiPathNet would probably work, but I am not using Linux. Are there any other models? Thanks a lot.
It sounds like you're looking for instance-level segmentation (to use the short term for the long explanation).
Mask R-CNN sounds just right for the job.
It does instance-level segmentation based on the region proposals, not only bounding boxes.
The segmentation is a binary mask of the instance. The classification is made by a dedicated branch.
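For example, here is a minimal sketch using the pre-trained Mask R-CNN shipped with torchvision (one readily available implementation, not the only option):

# Minimal sketch: per-instance masks with torchvision's pre-trained Mask R-CNN.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# torchvision >= 0.13; older versions use pretrained=True instead of weights.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("image.jpg").convert("RGB"))
with torch.no_grad():
    output = model([image])[0]            # one dict per input image

# Each instance has a label, a score, and a soft HxW mask to be thresholded.
for label, score, mask in zip(output["labels"], output["scores"], output["masks"]):
    if score > 0.5:
        binary_mask = mask[0] > 0.5       # pixel-wise instance mask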

Looking for a saliency map detection code for video processing

I am looking for code or an application that can extract the salient object from a video, considering both context and motion,
or
an algorithm just for motion saliency map detection (motion contrast) so I can fuse it with a context_aware salient object detector that I have.
I have already tested a context-aware saliency map detector, but in some frames it detects part of the background as the salient object. I want to involve motion and time in this detection so I can extract the salient object as precisely as possible.
Can anyone help me?
One of the most popular approaches (although a bit dated) in the computer vision community is the graph-based visual saliency (GBVS) model.
It uses a graph-based method to compute visual saliency. First, the same feature maps as in the FSM model are extracted, which leads to three multiscale feature maps: colors, intensity and orientations. Then a fully connected graph is built over all grid locations of each feature map, and a weight is assigned between each pair of nodes. This weight depends on the spatial distance and on the difference in feature-map values between the nodes. Each graph is then treated as a Markov chain to build an activation map, where nodes that are highly dissimilar to their surrounding nodes are assigned high values. Finally, all activation maps are merged into the final saliency map.
You can find Matlab source code here: http://www.vision.caltech.edu/~harel/share/gbvs.php
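If you need a Python/OpenCV starting point rather than Matlab, the saliency module in opencv-contrib offers static and motion saliency detectors that can be fused per frame. Note that this is a different method than GBVS and only a rough baseline:

# Not GBVS: a rough baseline that fuses a static (spectral residual) saliency
# map with a motion saliency map, frame by frame.
# Requires opencv-contrib-python; "video.mp4" is a placeholder.
import cv2

cap = cv2.VideoCapture("video.mp4")
static_detector = cv2.saliency.StaticSaliencySpectralResidual_create()
motion_detector = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    if motion_detector is None:
        motion_detector = cv2.saliency.MotionSaliencyBinWangApr2014_create()
        motion_detector.setImagesize(gray.shape[1], gray.shape[0])
        motion_detector.init()

    _, static_map = static_detector.computeSaliency(frame)   # float in [0, 1]
    _, motion_map = motion_detector.computeSaliency(gray)    # binary mask

    # Naive fusion: average the two maps; replace with a smarter rule as needed.
    fused = 0.5 * static_map + 0.5 * motion_map.astype("float32")

    cv2.imshow("fused saliency", fused)
    if cv2.waitKey(1) == 27:              # Esc to quit
        break

cap.release()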

Matlab 3D reconstruction

I have to complete a multi-view 3D scanning project within two weeks, and I have searched through books, journals and websites on 3D reconstruction, including the MathWorks examples. I wrote code to track matched points between two images and reconstruct them into a 3D plot. However, despite using the detectSURFFeatures() and extractFeatures() functions, some of the object points are still not tracked. How can I also reconstruct those points in my 3D model?
What you are looking for is called "dense reconstruction". The best way to do this is with calibrated cameras. Then you can rectify the images, compute disparity for every pixel (in theory), and then get 3D world coordinates for every pixel. Please check out this Stereo Calibration and Scene Reconstruction example.
The tracking approach you are using is fine but will only get sparse correspondences. The idea is that you would use the best of these to try to determine the difference in camera orientation between the two images. You can then use the camera orientation to get better matches and ultimately to produce a dense match which you can use to produce a depth image.
Tracking every point in an image from frame to frame is hard (it's called scene flow), and you won't achieve it by identifying individual features (such as SURF, ORB, FREAK, SIFT, etc.) because these features are by definition 'special' in that they can be clearly identified between images.
If you have access to the Computer Vision Toolbox of Matlab you could use their matching functions.
You can start for example by checking out this article about disparity and the related matlab functions.
In addition, you can read about different matching techniques such as block matching, semi-global block matching and global optimization procedures, just to name a few keywords. But be aware that stereo matching is a huge topic.
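For illustration, the Python/OpenCV counterpart of semi-global block matching on a rectified pair looks roughly like this (parameters are placeholders, not tuned values):

# Rough sketch: disparity from a rectified stereo pair with semi-global
# block matching (Python/OpenCV analogue of the Matlab functions mentioned).
import cv2

left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,    # must be divisible by 16
    blockSize=5,
)
disparity = matcher.compute(left, right).astype("float32") / 16.0  # fixed-point output

# With the calibration (Q matrix from stereoRectify), the disparity map can be
# reprojected to 3D: points = cv2.reprojectImageTo3D(disparity, Q)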

Automatic Vehicle Plate Recognition system

I am currently doing a project on recognizing the vehicle license plate at the rear of the car. I have done the OCR as a preliminary step, but I have no idea how to detect the rectangular license-plate region (the area of the car I am interested in). I have read a lot of papers, but nowhere did I find useful information about locating the rectangular area of the license plate. I am doing my project in Matlab. Please, can anyone help me with this?
Many Thanks
As you alluded to, there are at least two distinct phases:
Locating the number plate in the image
Recognising the license number from the image
Since number plates do not embed any location marks (as found in QR codes, for example), the complexity of recognising the number plate within the image is reduced by limiting the range of transformations of the incoming image.
The success of many ANPR systems relies on the accuracy of the position and timing of the capturing equipment to obtain an image which places the number plate within a predictable range of distortion.
Once the image is captured the location phase can be handled by using a statistical analysis to locate a "number plate" shaped region within the image, i.e. one which is of the correct proportions for the perspective. This article describes one such approach.
This paper and another one describe using a Sobel edge detector to locate vertical edges in the number plate. The reasoning is that the letters produce more vertical lines than the background does.
Another paper compares the effectiveness of some techniques (including Sobel detection and Haar wavelets) and may be a good starting point.
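A rough Python/OpenCV sketch of that vertical-edge idea (the thresholds and kernel sizes are illustrative guesses, not values from the papers):

# Rough sketch: locate candidate plate regions via vertical edges (Sobel in x),
# close them into blobs, and filter by plate-like proportions.
import cv2

gray = cv2.cvtColor(cv2.imread("car.jpg"), cv2.COLOR_BGR2GRAY)

edges = cv2.Sobel(gray, cv2.CV_8U, dx=1, dy=0, ksize=3)      # vertical edges
_, binary = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Merge the dense vertical strokes of the characters into one blob per plate.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 3))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if h > 0 and 2.0 < w / h < 6.0 and w > 60:    # plate-like proportions
        candidate = gray[y:y + h, x:x + w]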
I did my project on 'OCR-based Vehicle Identification'.
In general, LPR consists of three main phases: license plate extraction from the captured image, image segmentation to extract the individual characters, and character recognition. All of these phases are challenging because they are highly sensitive to weather conditions, lighting conditions, license plate placement, and other artefacts such as frames, symbols or logos placed on the licence plate. In India the license number is written in either one or two rows.
For an LPR system, both speed and accuracy are very important. In some of the literature the accuracy is good but the speed of the system is low; for example, fuzzy logic and neural network approaches reach good accuracy but are time-consuming and complex. In our work we maintained a balance between time complexity and accuracy. We used edge detection plus vertical and horizontal processing for number plate localization, with the edge detection done with the Roberts operator. Connected component analysis (CCA) with appropriate thresholding is used for segmentation. For character recognition we used template matching via a correlation function, and to improve the matching we used an enhanced database.
My Approach for the Project
Input image from webcam/camera.
Convert image into binary.
Detect number plate area.
Segmentation.
Number identification.
Display on GUI.
My Approach for Number Plate Extraction (a rough Python sketch follows this list)
Take input from webcam/camera.
Convert it into a gray-scale image.
Calculate the threshold value.
Edge detection using the Roberts operator.
Calculate the horizontal projection.
Crop the image horizontally by comparing with 1.3 times the threshold value.
Calculate the vertical projection.
Crop the image vertically.
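A rough Python sketch of the extraction steps above; how the "threshold value" in the 1.3x rule is computed is my assumption (the mean of the projection is used here):

# Rough sketch: Roberts-cross edges plus horizontal/vertical projections.
import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread("car.jpg"), cv2.COLOR_BGR2GRAY).astype("float32")

roberts_x = np.array([[1, 0], [0, -1]], dtype="float32")
roberts_y = np.array([[0, 1], [-1, 0]], dtype="float32")
edges = np.abs(cv2.filter2D(gray, -1, roberts_x)) + np.abs(cv2.filter2D(gray, -1, roberts_y))

# Horizontal projection: total edge energy per row; keep the strong rows.
row_proj = edges.sum(axis=1)
rows = np.where(row_proj > 1.3 * row_proj.mean())[0]
band = edges[rows.min():rows.max() + 1, :]

# Vertical projection on the remaining band: keep the strong columns.
col_proj = band.sum(axis=0)
cols = np.where(col_proj > 1.3 * col_proj.mean())[0]
plate = gray[rows.min():rows.max() + 1, cols.min():cols.max() + 1]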
My Approach for Segmentation (a rough Python sketch follows this list)
Convert the extracted image into a binary image.
Compute the complement of the extracted binary image.
Remove connected components whose area is less than 2% of the image area.
Calculate the number of connected components.
For each connected component, find the row and column values.
Calculate the dynamic threshold (DM).
Remove unwanted characters from the segmented characters by applying certain conditions.
Store the segmented characters' coordinates.
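A rough Python sketch of the segmentation steps above; the dynamic-threshold filtering is simplified to a fixed height check here:

# Rough sketch: CCA-based character segmentation of an extracted plate image.
import cv2

plate_gray = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)
# Inverted Otsu threshold plays the role of binarization plus complement.
_, binary = cv2.threshold(plate_gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
plate_area = binary.shape[0] * binary.shape[1]

char_boxes = []
for i in range(1, num):                      # label 0 is the background
    x, y, w, h, area = stats[i]
    if area < 0.02 * plate_area:             # drop components under 2% of the area
        continue
    if h < 0.4 * binary.shape[0]:            # crude stand-in for the dynamic threshold
        continue
    char_boxes.append((x, y, w, h))

char_boxes.sort(key=lambda b: b[0])          # left-to-right reading order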
My Approach for Recognition (a rough Python sketch follows this list)
Initialize templates.
For each segmented character, repeat steps 3 to 7.
Resize the segmented character to the database image size, i.e. 24x42.
Find the correlation coefficient value of the segmented character with each database image and store that value in an array.
Find the index position of the maximum value in the array.
Find the letter linked to that index value.
Store that letter in an array.
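A rough Python sketch of the recognition steps above; TEMPLATES is assumed to be a dictionary of 24x42 character templates loaded elsewhere:

# Rough sketch: template matching by correlation coefficient.
import cv2
import numpy as np

TEMPLATES = {}   # e.g. {"A": 42x24 template image, ...}, loaded elsewhere

def recognise_character(char_img):
    # Resize the segmented character to the database image size (24x42).
    char_vec = cv2.resize(char_img, (24, 42)).astype("float32").ravel()
    letters, scores = [], []
    for letter, template in TEMPLATES.items():
        template_vec = template.astype("float32").ravel()
        # Correlation coefficient between the character and this template.
        scores.append(np.corrcoef(char_vec, template_vec)[0, 1])
        letters.append(letter)
    return letters[int(np.argmax(scores))]   # letter at the maximum correlation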
Check out OpenALPR (http://www.openalpr.com). It recognizes plate regions using OpenCV and the LBP/Haar algorithm. This allows it to recognize both light on dark and dark on light plate regions. After it recognizes the general region, it uses OpenCV to localize based on strong lines/edges in the image.
It's written in C++, so hopefully you can use it. If not, at least it's a reference.