Training image orientation classifier

I would like to train a convolutional neural network to detect the correct orientation of images, with only 4 angles (0, 90, 180, and 270 degrees).
The difficulty is that the images will contain different objects: a single person, a group of people, a mountain view, buildings, etc.
I was thinking of training the ConvNet on a big set of images. Each image would appear in all four orientations (0, 90, 180, and 270), and each copy would have a label (0 -> 0, 90 -> 1, 180 -> 2, 270 -> 3).
Are there other examples of orientation ConvNets / four-class ConvNets / RNNs I could use for inspiration? (I'm using the Caffe framework.)
Thank you!

A very interesting problem.
I agree with your observation that looking at specific objects in the photos and using their orientation to decide on the image orientation can be misleading.
Check this image, for example:
It is perfectly oriented, yet the face is not upright.
Therefore, I suppose your approach of treating this as an image labeling problem (i.e., a single orientation label per input image) is a good way to proceed.
I would take any not-too-fancy off-the-shelf net and fine-tune it based on the labeling you suggested.
Good luck!
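If it helps, here is a minimal MATLAB sketch of the data preparation the question describes: every photo is written out in all four orientations together with a Caffe-style "image path + label" list file (the format read by Caffe's ImageData layer). Folder and file names are hypothetical.
% Minimal sketch of building the rotated training set and label list.
% Folder and file names are hypothetical.
srcFiles = dir(fullfile('photos', '*.jpg'));
if ~exist('photos_rotated', 'dir'), mkdir('photos_rotated'); end
fid = fopen('train_list.txt', 'w');          % one "path label" pair per line
angles = [0 90 180 270];
for i = 1:numel(srcFiles)
    im = imread(fullfile('photos', srcFiles(i).name));
    for k = 1:numel(angles)
        rotated = imrotate(im, -angles(k));  % lossless for multiples of 90
        outName = sprintf('rot%d_%s', angles(k), srcFiles(i).name);
        imwrite(rotated, fullfile('photos_rotated', outName));
        fprintf(fid, '%s %d\n', outName, k - 1);   % labels 0..3
    end
end
fclose(fid);
From there, the list file can be pointed at by an ImageData layer, or converted to LMDB with Caffe's convert_imageset tool, for fine-tuning.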

Related

Different image sizes (resolutions) as input for inference on CNNs

This may be a basic conceptual question, but reading about different CNNs such as VGG, AlexNet, GoogLeNet, etc., it seems that once the model has been trained on a specific input image size (let's say 256x256), I can't give the model a different image size (say, 1920x1080) during inference without resizing or cropping. Is this true?
I know that YOLO handles images with different resolutions; does YOLO resize the image before giving it to the convolutional layers?
The requirement I have is to do object recognition on a series of images that may not all have the same size. The obvious approach would be resizing the images, but that may lose information.
If so, do I need to train a model for every image size that I have, and then reload the corresponding model for each specific image?
There are some conceptual issues here: VGG, AlexNet, and GoogLeNet are image classification models, while YOLO is an object detection model. Only a fully convolutional network can accept variable-sized images.
So your only option is to resize the images to a common size. This works well in practice, so you should do it, and evaluate different image sizes to see how accuracy changes. Only after such an experiment can you decide whether resizing is inappropriate.
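For illustration, a minimal MATLAB sketch of the resize-before-inference approach, assuming the 256x256 input size from the question; the file name is hypothetical.
% Bring a variable-sized image to the fixed input size the network was
% trained on (256x256, per the question). File name is hypothetical.
im = imread('frame_1920x1080.jpg');
inputSize = [256 256];
imResized = imresize(im, inputSize);   % consider center crops as an alternative
% ... feed imResized to the trained model ...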

Stitching overlapping images from different cameras in MATLAB

Two cameras take two images of a wooden plank. The images have an overlap of the plank, which I need to stitch together so that it looks natural and preferably seamless to the human eye, for inspection purposes. The images are cropped to the same size and masked to remove the background and most of the non-overlapping areas, but the plank can have a slight tilt on the conveyor belt.
Currently I'm using the normxcorr2 function on the general overlap area, following the MATLAB tutorial for normxcorr2, to identify one of the images in the other and work out an overlay offset. However, this fails quite often, as normxcorr2 returns a zero offset, resulting in a bad stitch:
c = normxcorr2(plank_part1, plank_part2);
% Find the peak in the cross-correlation:
[ypeak, xpeak] = find(c == max(c(:)));
% Account for the padding that normxcorr2 adds:
yoffSet = ypeak - size(plank_part1, 1);
xoffSet = xpeak - size(plank_part1, 2);
[xoffSet, yoffSet]
ans =
0 0
It would seem normxcorr2 cannot always find the correct overlay of the images, or any overlay at all, even though I try to make it easier by increasing the grayscale contrast with histeq. My guess is that the amount of gray-ish area from the sapwood overwhelms the distinct knots, which are the important parts to stitch properly.
Does anyone know of a way to make this stitching process more reliable, maybe with some more preprocessing, or any other MATLAB functions that would make this work better?
P.S. I cannot use anything but freely accessible scripts, as anything else would probably raise license/copyright issues for my project.
Thank you for your time in trying to help!
The term you should be looking for is image registration. There are more advanced methods than normxcorr2; see the sketch below for one of them.
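As one illustration (not necessarily the best method for planks), a sketch of feature-based registration with the Computer Vision System Toolbox; variable and file names are examples, and a 'similarity' transform is assumed to be enough to absorb the slight tilt.
% Feature-based registration (requires the Computer Vision System Toolbox).
% Variable and file names are examples; images are assumed grayscale.
I1 = imread('plank_part1.png');
I2 = imread('plank_part2.png');
pts1 = detectSURFFeatures(I1);
pts2 = detectSURFFeatures(I2);
[f1, vpts1] = extractFeatures(I1, pts1);
[f2, vpts2] = extractFeatures(I2, pts2);
idxPairs = matchFeatures(f1, f2);
matched1 = vpts1(idxPairs(:, 1));
matched2 = vpts2(idxPairs(:, 2));
% RANSAC-based estimate; 'similarity' also absorbs the slight tilt and scale.
tform = estimateGeometricTransform(matched1, matched2, 'similarity');
% Warp image 1 into image 2's frame, ready for blending.
registered = imwarp(I1, tform, 'OutputView', imref2d(size(I2)));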

Static image calibration

I am capturing static images of particulate biological materials on the millimeter scale and then processing them in MATLAB. My routine is working well so far, but I am using a rudimentary calibration procedure: I include some coins in the image, automatically find them based on their size and circularity, count their pixels, and then remove them. This gives me a calibration line with input area (mm²) and output area (pixels), which I then use to convert the pixel area of the particles into physical units of square millimeters.
My question is: is there a better calibrant object that I can use, such as a stage graticule or "phantom" as some people seem to call them? Do you know where I could purchase such a thing? I can't even seem to find a possible vendor. Is there another rigorous way to approach this problem without using calibrant objects in the field of view?
Thanks in advance.
Clay
Image calibration is always done using features of known size or distance.
You could calculate the scale from nominal specifications, but your imaging equipment will always have some production tolerances, and your object distance is only known to a certain accuracy...
So it's always safer and simpler to actually calibrate your scale.
As a calibrant you can use anything that meets your requirements: if you know its size well enough and you are able to extract its dimensions in pixels properly, you can use it.
I don't know your requirements or your budget, but if you want something very precise and fancy, you can use glass masks.
There are temperature-stable glass slides coated with chrome, for example. Many companies produce such masks to custom specifications (IMT AG, BVM maskshop, ...), and most optics lab equipment suppliers (Edmund Optics, Newport, ...) have such things in stock.
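For reference, a minimal sketch of the coin-based scale calibration described in the question, assuming a segmented binary image in which the coin is the most circular blob; the coin size and file name are examples.
% Coin-based calibration sketch. Assumes a binary segmentation in which the
% coin is the most circular blob; coin size and file name are examples.
coinAreaMM2 = pi * (24.26 / 2)^2;                  % e.g., a 24.26 mm diameter coin
bw = imbinarize(rgb2gray(imread('sample.jpg')));   % hypothetical RGB image
stats = regionprops(bw, 'Area', 'Perimeter');
circ = 4 * pi * [stats.Area] ./ [stats.Perimeter].^2;   % 1 for a perfect circle
[~, coinIdx] = max(circ);                               % most circular blob
mm2PerPixel = coinAreaMM2 / stats(coinIdx).Area;        % calibration factor
% particleAreaMM2 = particleAreaPixels * mm2PerPixel;   % convert any blob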

Fit 3D matrices to same gray values

I'm trying to fit two data sets containing the results of measuring the same object with two different measurement devices (X-ray vs. µCT).
I did manage to reconstruct the image data and fit the orientation and offset of the stacks. It looks like this (one image from a stack of about 500 images):
The whole point of this is to compare several denoising algorithms on the X-ray data (left). It is assumed that the data from µCT (right) is close to the real signal, without any noise. So, I want to compare the denoised X-ray data from each of the algorithms to the "pure" signal from µCT to see which algorithm produces the lowest RMS error. Therefore, I need to somehow fit the gray values from the left part to those of the right part without manipulating the noise too much.
The gray values on the right are in the range of 0 to 100, whereas the X-ray data ranges from about 4000 to 30000. The "bubbles" are in a range of about 8000 to 11000. (Those are not real bubbles but holes in an artificial phantom from a 3D printer.)
What I tried to do is (kind of) band-pass those bubbles and map them to ~100 while shifting everything else towards 4 (which is the value for the background in the µCT data).
That's the code for this:
zwst = zwsr;   % keep an unmodified copy to index against
% Below the bubble range: compress towards the µCT background (~4)
zwsr(zwst<=8000) = round(zwst(zwst<=8000)*4/8000);
% Bubble range: map ~8000-11000 to ~100
zwsr(zwst<=11000 & zwst>8000) = round(zwst(zwst<=11000 & zwst>8000)/9500*100);
% Above the bubble range: compress towards the background as well
zwsr(zwst>11000) = round(zwst(zwst>11000)*4/30000);
The results look like this:
Some of those bubbles look distorted and the noise part in the background is gone completely. Is there any better way to fit those gray values while maintaining the noisy part?
EDIT: To clarify things: the µCT data is assumed to be noise-free, while the X-ray data is assumed to be noisy. In other words, µCT = signal, while X-ray = signal + noise. To quantify the quality of my denoising methods, I want to calculate X-ray - µCT = noise.
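A tiny sketch of that comparison, assuming the gray values have already been brought onto a common scale; denoisedXray and muCT are hypothetical, equally sized arrays.
% Score one denoising result against the µCT "ground truth"; names are
% hypothetical and both arrays are assumed to share a gray-value scale.
err  = double(denoisedXray) - double(muCT);
rmse = sqrt(mean(err(:).^2));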
Too long for a comment, and I believe a reasonable answer:
There is a huge subfield of image processing / signal processing called image fusion. There is even a specific MATLAB library for it that uses wavelets (http://uk.mathworks.com/help/wavelet/gs/image-fusion.html).
The idea behind image fusion is: given two images of the same thing but with very different resolution/data, how can we create a single image containing the information of both?
Stitching both images "by hand" generally does not give very good results, so there is a large number of techniques to do it mathematically. Wavelets are very common here.
These techniques are widely used in medical imaging, as (like in your case) different imaging techniques give different information, and doctors want all of it together:
Example (top row: images pasted together, bottom row: image fusion techniques)
Have a look at some papers and some MATLAB tutorials, and you'll probably get there with the easy-to-use MATLAB code, without any fancy state-of-the-art programming.
Good luck!
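As a starting point, a minimal sketch using wfusimg from the Wavelet Toolbox (the function documented at the MathWorks link above); the wavelet, decomposition level, and fusion rules are example choices, and xray/muct are assumed to be registered, equally sized slices.
% Wavelet image fusion with wfusimg (Wavelet Toolbox). xray and muct are
% hypothetical registered slices; 'db2', 5, 'mean', 'max' are example choices.
fused = wfusimg(double(xray), double(muct), 'db2', 5, 'mean', 'max');
imshow(fused, []);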

vision.PeopleDetector function in MATLAB

Has anyone ever used the vision.PeopleDetector function from the Computer Vision System Toolbox in MATLAB?
I've installed it and tried to apply it to images I have.
Although it detects people in the training image, it detects nothing in real photos: either it doesn't detect people at all, or it detects people in parts of the image where they are not present.
Could anyone share the experience of using this function?
Thanks a lot!
Here is a sample image:
The vision.PeopleDetector object does indeed detect upright standing people in images. However, like most computer vision algorithms, it is not 100% accurate. Can you post a sample image where it fails?
There are several things you can try to improve performance.
Try changing the ClassificationModel parameter to 'UprightPeople_96x48'. There are two models that come with the object, trained on different data sets.
How big (in pixels) are the people in your image? If you use the default 'UprightPeople_128x64' model, then you will not be able to detect a person smaller than 128x64 pixels. Similarly, for the 'UprightPeople_96x48' model the smallest size person you can detect is 96x48. If the people in your image are smaller than that, you can up-sample the image using imresize.
Try reducing the ClassificationThreshold parameter to get more detections (see the sketch below).
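A sketch combining these three suggestions; the threshold value, scale factor, and file name are examples.
% Combine the suggestions above; parameter values and file name are examples.
detector = vision.PeopleDetector('ClassificationModel', 'UprightPeople_96x48', ...
    'ClassificationThreshold', 0.5);       % lower threshold => more detections
im = imresize(imread('photo.jpg'), 2);     % up-sample if people are too small
bboxes = step(detector, im);
annotated = insertObjectAnnotation(im, 'rectangle', bboxes, 'person');
imshow(annotated);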
Edit:
Some thoughts on your particular image. My guess would be that the people detector is not working well here because it was not trained on this kind of image. The training sets for both models consist of natural images of pedestrians. Ironically, the fact that your image has a perfectly clean background may be throwing the detector off.
If this image is typical of what you have to deal with, then I have a few suggestions. One possibility is to use simple thresholding to segment out the people. The other is to use vision.CascadeObjectDetector to detect the faces or the upper bodies, which happens to work perfectly on this image:
im = imread('postures.jpg');
% Detect upper bodies instead of full pedestrians:
detector = vision.CascadeObjectDetector('ClassificationModel', 'UpperBody');
bboxes = step(detector, im);
% Draw the detections on the image:
im2 = insertObjectAnnotation(im, 'rectangle', bboxes, 'person', 'Color', 'red');
imshow(im2);