I have seen that Google's AutoML applies some image augmentation before training to increase the robustness of the model.
I have searched the documentation and forums for a way to limit these techniques. For instance, it applies flips to the training images, but in some cases flips hurt the predictions - for example, when recognizing digits in an image. In most fonts, 2's and 5's have features distinct enough to tell apart even when flipped. On a 7-segment display, however, a mirrored 2 has exactly the same representation as a 5.
(Image: 7-segment display example)
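As a quick sanity check of that claim, label the seven segments a-g in the usual convention; a left-right mirror swaps the right-hand segments with the left-hand ones, and the lit segments of a 2 become exactly those of a 5 (just a sketch to verify the reasoning):

% a = top, b = top-right, c = bottom-right, d = bottom, e = bottom-left,
% f = top-left, g = middle; a horizontal mirror swaps b<->f and c<->e.
segs2 = {'a','b','g','e','d'};                 % segments lit for '2'
segs5 = {'a','f','g','c','d'};                 % segments lit for '5'
from  = {'a','b','c','d','e','f','g'};
to    = {'a','f','e','d','c','b','g'};         % where each segment lands after the flip
[~, idx] = ismember(segs2, from);
mirrored = to(idx);                            % '2' after a horizontal flip
isequal(sort(mirrored), sort(segs5))           % true: the mirrored 2 reads as a 5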
I have labeled hundreds of images with many digits in each image. The model continues to confuse the 2's and 5's for the 7-segment displays. It has some success but not an acceptable amount.
Does anyone know if limiting the image augmentation with AutoML is possible?
I want to implement an application that can recognize pictures from camera input. I don't mean classification of objects, but rather detecting the exact image from a given set of images. For example, if I have an album with 500 pictures and I point the camera at one of them, the application should be able to tell me its filename. Most tutorials I find about CoreML are strictly about image classification (recognizing the class of an object), not about recognizing the exact image name from the camera. This needs to work from different angles as well, and all I have for training the network is this album with many different pictures (a single picture per object). Can this somehow be achieved? I can't use ARKit Image Tracking, because there will be about 500 of these images, and I first need to find at least a list of similar ones with CoreML / Vision.
I am not sure, but I guess perceptual hashing might be able to help you.
It works by computing a fingerprint from each reference image; for a given query image you extract a fingerprint as well and then look for the most similar fingerprints.
This way, even if the new image is not 100% identical to the image in the dataset, you can still detect it.
It is actually not very hard to implement, but if you would rather use a library, I think pHash is a good one.
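This is a CoreML / iOS question, so the snippet below is only a sketch of the fingerprint idea, not an iOS implementation; it is written in MATLAB purely for illustration, the file names are placeholders, rgb2gray assumes color JPEGs, and dct2/imresize come from the Image Processing Toolbox. A pHash-style fingerprint is the sign pattern of the low-frequency DCT coefficients of a small thumbnail, and two fingerprints are compared by Hamming distance:

% pHash-style fingerprint sketch (illustrative only; file names are placeholders)
ref = imresize(im2double(rgb2gray(imread('album_001.jpg'))), [32 32]);
qry = imresize(im2double(rgb2gray(imread('camera_frame.jpg'))), [32 32]);

dref = dct2(ref);  bref = dref(1:8, 1:8);      % keep the low-frequency block
dqry = dct2(qry);  bqry = dqry(1:8, 1:8);

fp1 = bref(:) > median(bref(:));               % 64-bit binary fingerprint
fp2 = bqry(:) > median(bqry(:));

distance = nnz(fp1 ~= fp2);                    % Hamming distance; small => same picture

You would pre-compute the fingerprint for each of the 500 album pictures once, then for every camera frame keep the album entry with the smallest distance.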
I want to detect a shape and then describe it (somehow) in order to compare it with data on a server.
So the first question is: is it possible to detect a shape, like a blob, with ARKit?
To be more specific, let me describe my use case in general terms.
I want to scan an image with the phone, extract the specific shape, send it to the server, compare the two images on the server (the server image is the real one; the scanned image will be very similar) and then send back some data. I am not asking about the server side; the only server-side question is what I should compare - images using OpenCV, some mathematical description of both images used to measure similarity, etc.
If the question is hard to understand, let me split it into two simpler questions:
1) How do I scan a 2D object with an iPhone and save it (trimming the specific shape from its background when the object is black and the background is white)?
2) How do I describe the scanned object for comparison with an almost identical object?
ARKit has no use here.
You will probably need a lot of CoreImage (for fixing perspective distortion and binarization) and OpenCV logic.
Perhaps Vision can help you a little bit with getting ROI from the entire frame, especially if the waveform image is located in some kind of rectangle.
Perhaps you can train a custom ML model that will recognize specific waveforms or waveforms in general to use with Vision.
In any case, it is not a trivial task.
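As a rough sketch of the "fix the perspective, then binarize" step (shown with MATLAB's Image Processing Toolbox purely for illustration; on iOS the same operations map to Core Image filters or OpenCV-for-iOS, and the four corner coordinates below are made-up values that would normally come from a rectangle/ROI detector such as Vision):

img = imread('frame.jpg');                        % placeholder input frame
movingPts = [120 80; 520 95; 540 400; 100 390];   % corners of the shape's rectangle in the photo
fixedPts  = [1 1; 400 1; 400 300; 1 300];         % target rectangle, clockwise from top-left
tform  = fitgeotrans(movingPts, fixedPts, 'projective');
warped = imwarp(img, tform, 'OutputView', imref2d([300 400]));   % perspective-corrected patch
gray = rgb2gray(warped);
mask = ~imbinarize(gray);        % black object on white background -> object pixels are true
imwrite(mask, 'shape_mask.png');

A clean binary mask like this is also a reasonable thing to send to the server; the server can then compare shapes with contour- or moment-based measures (for example Hu moments) rather than raw pixels.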
I am currently studying image processing and learning MATLAB for my project.
I need to know whether there is any method to detect a car in a traffic or parking-lot image and then segment it out.
I have googled a lot, but most of the content is video-based, and I don't know anything about image processing.
Preferred language: MATLAB
I am supposed to do this on images only, not videos.
It's a very difficult problem in general. I'd suggest the easier way is to constrain the problem as much as possible - controlled lighting, known size and orientation of the cars to detect, no occlusions.
This kind of constraining is the philosophy image processing followed up until recently. The trend now is that, instead of constraining the problem, you gather a massive amount of example data to train a supervised learning algorithm. In fact, you may be able to use a pre-trained model that detects cars, as suggested in a previous answer.
There has recently been massive progress in object detection in images; here are a few state-of-the-art approaches based on neural networks:
OverFeat
Rich feature hierarchies for accurate object detection and semantic segmentation (R-CNN paper)
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (paper)
Frameworks that you could use include:
Caffe: http://caffe.berkeleyvision.org/
Theano
Torch
You can use the detection by parts method:
http://www.cs.berkeley.edu/~rbg/latent/
It contains a trained model for "car" which you can use to detect cars, surround them with a bounding box and then extract them from the images.
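If you have MATLAB's Automated Driving Toolbox, its pre-trained ACF vehicle detector gives a quick baseline for the same detect-then-crop workflow; this is only a hedged sketch (the toolbox and the image file name are assumptions), but it shows the idea:

img = imread('parking_lot.jpg');                 % placeholder image
detector = vehicleDetectorACF();                 % pre-trained ACF vehicle detector
[bboxes, scores] = detect(detector, img);        % bboxes are [x y width height]
for k = 1:size(bboxes, 1)
    car = imcrop(img, bboxes(k, :));             % extract each detected car
    imwrite(car, sprintf('car_%02d.png', k));
end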
I have read that it's possible to create a depth image from a stereo camera setup (where two cameras with identical focal length/aperture/other camera settings photograph an object from slightly different viewpoints).
Would it be possible to take two snapshots almost immediately after each other (on the iPhone, for example) and use the differences between the two pictures to derive a depth image?
Small amounts of hand movement and shake will obviously rock the camera, creating some angular displacement, and perhaps that displacement can be estimated by looking at the overall shift of features detected in both photographs.
Another way to look at this problem is as structure-from-motion, a nice review of which can be found here.
Generally speaking, resolving spatial correspondence can also be factored as a temporal correspondence problem. If the scene doesn't change, then taking two images simultaneously from different viewpoints - as in stereo - is effectively the same as taking two images using the same camera but moved over time between the viewpoints.
I recently came upon a nice toy example of this in practice - implemented using OpenCV. The article includes some links to other, more robust, implementations.
For a deeper understanding I would recommend you get hold of an actual copy of Hartley and Zisserman's "Multiple View Geometry in Computer Vision" book.
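As a minimal sketch of the correspondence step (written in MATLAB for illustration; the linked toy example does the same with OpenCV), assume the two handheld frames are grayscale and roughly rectified, i.e. the dominant motion between them is a small horizontal translation - real handheld shots would first need feature matching and rectification (e.g. estimateUncalibratedRectification in the Computer Vision Toolbox):

left  = imread('shot1.png');                     % grayscale frame 1 (placeholder)
right = imread('shot2.png');                     % grayscale frame 2, taken a moment later

% Block matching; the disparity range is an assumption and must span a multiple of 16.
disparityMap = disparityBM(left, right, 'DisparityRange', [0 64]);

imshow(disparityMap, [0 64]); colormap jet; colorbar
% Larger disparity = closer object; metric depth = focalLength * baseline / disparity.

With a handheld "two snapshots" baseline the disparity will be noisy and the result coarse, which is consistent with the caveats in the next answer.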
You could probably come up with a depth map from a "cha-cha" stereo image (as it's known in 3D photography circles), but it would be very crude at best.
Matching up the images is EXTREMELY CPU-intensive.
An iPhone is not a great device for doing the number-crunching. Its CPU isn't that fast, and memory bandwidth isn't great either.
Once Apple lets us use OpenCL on iOS you could write OpenCL code, which would help some.
I would like to process infrared imagery in MATLAB - any kind of processing or techniques.
Are there any built-in functions in MATLAB for this?
Can anyone suggest any books or articles, as well as resources for sample far-infrared images?
Thanks!
You may want to have a look at the Image Processing Toolbox. There you will find plenty of built-in functionality for denoising and segmentation of any kind of image.
For a more detailed answer, I suggest you let us know in more detail what kind of processing you want to do.
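For instance, a minimal denoise-and-segment pipeline using only toolbox built-ins might look like the sketch below (the file name is a placeholder and a single-channel infrared frame is assumed):

img = imread('ir_frame.png');          % assumed: grayscale infrared frame
den = medfilt2(img, [5 5]);            % median filtering as simple denoising
bw  = imbinarize(den);                 % Otsu thresholding as a crude segmentation
imshowpair(den, bw, 'montage')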
EDIT
Infrared images are normally grayscale images. Thus, it is very straightforward to false-color them by mapping the gray levels to colors (i.e. by applying a different colormap).
%# load a grayscale image
img = imread('coins.png');
%# display the image
figure
imshow(img,[]);
%# false-color
colormap('hot')
For more information about general techniques, you may want to Google 'infrared image processing' and start looking at the hits related to your specific application.
In general, processing of infrared images is not different from processing other grayscale images. What specific algorithms you apply depends very much on the image and the purpose of the processing.
LWIR imagery can be used for a large number of different applications. In general, each application domain has its own history, terminology and mathematical conventions.
As an example, we can use LWIR imagery for:
Detecting faulty components or components that are likely to fail.
Medical imaging for diagnosis of skin disorders.
Finding humans in Search & Rescue or border-control applications.
Detecting & classifying aircraft, missiles, vehicles etc... for various defense applications.
Geographical or Oceanographic research (using LWIR satellite imagery).
Each of these applications will rely upon very different techniques. The image processing toolbox may well be useful for some of these application areas, but, in general, you need to look at resources (software, textbooks, journals etc...) that are specific to the application domain or the specific sensor system that you will be using.
I am not sure that processing infrared images is really no different from processing visible-color images. As far as I know, when processing infrared images we should work with the raw data image, which contains the temperature information, rather than with a pseudo-color image that only contains color intensities from 0 to 255.
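To illustrate that point (only a sketch: the file name and the linear gain/offset below are made-up placeholders, since real radiometric cameras ship their own calibration), working on the raw counts instead of an 8-bit pseudo-color export could look like this:

raw = double(imread('lwir_raw_16bit.tif'));   % 16-bit radiometric counts (placeholder file)
gain = 0.04;  offset = -273.15;               % hypothetical calibration constants
temperatureC = gain * raw + offset;           % counts -> degrees Celsius
imshow(temperatureC, []); colormap hot; colorbar
title('LWIR frame (radiometric)')

All further processing (thresholding, region statistics, etc.) can then be done directly on temperatureC, with the colormap used only for display.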