Different image sizes (resolutions) as input for inference on a CNN - neural-network

This may be a basic conceptual question, but reading about different CNNs such as VGG, AlexNet, GoogLeNet, etc., it seems that once a model has been trained on a specific input image size (let's say 256x256), I can't give it a different image size (1920x1080) during inference without resizing or cropping. Is this true?
I know that YOLO handles images with different resolutions; is YOLO resizing the image before passing it to the convolutional layers?
The requirement I have is to do object recognition on a series of images that may not all have the same size. The obvious approach would be resizing the images, but that may lose information.
If so, do I need to train a model for every image size that I have, and then reload the corresponding model for each specific image?

There are a few conceptual issues here: VGG, AlexNet, and GoogLeNet are image classification models, while YOLO is an object detection model. A network can accept variable-sized images only if it is fully convolutional.
So your only option is to resize the images to a common size. This works well in practice, so you should do it and evaluate different image sizes to see how accuracy changes. Only after such an experiment can you decide whether resizing is inappropriate.
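The resize-to-a-common-size step can be sketched in a few lines. This is a nearest-neighbour toy in plain NumPy; in practice you would use cv2.resize or PIL.Image.resize, and `resize_nearest` is just an illustrative name:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an HxW or HxWxC array (toy sketch)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows[:, None], cols]

# e.g. bring a Full HD frame down to the 256x256 the network was trained on
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(resize_nearest(frame, 256, 256).shape)  # (256, 256, 3)
```

Whatever resizer you use, apply the same one at training and inference time, so the network sees the same interpolation artifacts in both phases.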

Related

Automatic assembly of multiple Kinect 2 grayscale depth images to a depth map

The project is about measuring different objects under the Kinect 2. The image-acquisition code sample from the SDK is adapted to save the depth information over its whole range, without clipping the values to 0-255.
The Kinect is mounted on a beam hoist and moved in a straight line over the model. At every stop, multiple images are taken, the mean calculated and the error correction applied. Afterwards the images are put together to have a big depth map instead of multiple depth images.
Due to the reduced image size (to limit the influence of the noisy edges), every image has a size of 350x300 pixels. For the moment, the test is done with three images to be put together. As with the final program, I know in which direction the images are taken. Due to the beam hoist, there is no rotation, only translation.
In Matlab the images are saved as matrices with depth values from 0 to 8000. As I could only find ideas on how to treat images, the depth maps are converted into images with a colorbar; only the color part is saved and fed into the stitching script, i.e. not the axes and the grey border around the image.
The stitching algorithm doesn't work. It seems to me that the grayscale images don't have enough contrast for the algorithms. Maybe I am just searching in the wrong direction?
Any ideas on how to treat this challenge?
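If the stitcher is failing for lack of contrast, it may help to build the grayscale images directly from the depth matrices rather than from a colorbar plot, stretching the raw range to the full 8 bits. A minimal NumPy sketch (the function name and defaults are mine, not part of the SDK):

```python
import numpy as np

def depth_to_gray(depth, d_min=None, d_max=None):
    """Linearly stretch raw depth values (e.g. 0..8000) to 0..255."""
    d_min = depth.min() if d_min is None else d_min
    d_max = depth.max() if d_max is None else d_max
    scaled = (depth.astype(np.float64) - d_min) / (d_max - d_min)
    return np.clip(scaled * 255, 0, 255).astype(np.uint8)
```

Feature-based stitchers respond to local gradients, so using the full 8-bit range (or 16-bit PNGs) preserves far more of the depth variation than a screenshot of a color-mapped figure does.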

Fit 3D matrices to same gray values

I'm trying to fit two data sets. They contain the results of measuring the same object with two different measurement devices (x-ray vs. µCT).
I did manage to reconstruct the image data and fit the orientation and offset of the stacks. It looks like this (one image from a stack of about 500 images):
The whole point of this is to compare several denoising algorithms on the x-ray data (left). It is assumed that the µCT data (right) is close to the real signal, without any noise. So I want to compare the denoised x-ray data from each algorithm to the "pure" µCT signal to see which algorithm produces the lowest RMS error. Therefore, I need to somehow fit the gray values of the left part to those of the right part without manipulating the noise too much.
The gray values in the right are in the range of 0 to 100 whereas the x-ray data ranges from about 4000 to 30000. The "bubbles" are in a range of about 8000 to 11000. (those are not real bubbles but an artificial phantom with holes out of a 3D printer)
What I tried to do is (kind of) band pass those bubbles and map them to ~100 while shifting everything else towards 4 (which is the value for the background on the µCT data).
That's the code for this:
zwst = zwsr;  % keep a copy of the original values
% background below the bubble band: compress towards ~4
zwsr(zwst<=8000) = round(zwst(zwst<=8000)*4/8000);
% bubble band (8000-11000): map so the bubbles land near ~100
zwsr(zwst<=11000 & zwst>8000) = round(zwst(zwst<=11000 & zwst>8000)/9500*100);
% everything above the band: compress towards ~4 as well
zwsr(zwst>11000) = round(zwst(zwst>11000)*4/30000);
The results look like this:
Some of those bubbles look distorted and the noise part in the background is gone completely. Is there any better way to fit those gray values while maintaining the noisy part?
EDIT: To clarify: the µCT data is assumed to be noise-free while the x-ray data is assumed to be noisy. In other words, µCT = signal, while x-ray = signal + noise. To quantify the quality of my denoising methods, I want to calculate x-ray - µCT = noise.
Too long for a comment, but I believe a reasonable answer:
There is a huge subfield of image/signal processing called image fusion. There is even a specific Matlab toolbox for it based on wavelets (http://uk.mathworks.com/help/wavelet/gs/image-fusion.html).
The idea behind image fusion is: given two images of the same thing but with very different resolution/data, how can we create a single image containing the information of both?
Stitching both images "by hand" generally does not give very good results, so there is a large number of techniques to do it mathematically. Wavelets are very common here.
These techniques are widely used in medical imaging, as (like in your case) different imaging modalities give different information, and doctors want all of it together:
Example (top row: images pasted together, bottom row: image fusion techniques)
Have a look at some papers and some Matlab tutorials, and you'll probably get there with the easy-to-use Matlab code, without any fancy state-of-the-art programming.
Good luck!
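To get a feel for what wavelet fusion does, here is a one-level Haar version in plain NumPy (a toy sketch, not the MathWorks toolbox): the coarse approximations are averaged and, for each detail coefficient, the stronger of the two images wins.

```python
import numpy as np

def haar2(x):
    """One-level 2D Haar transform (x must have even height and width)."""
    a = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4
    h = (x[0::2, 0::2] + x[0::2, 1::2] - x[1::2, 0::2] - x[1::2, 1::2]) / 4
    v = (x[0::2, 0::2] - x[0::2, 1::2] + x[1::2, 0::2] - x[1::2, 1::2]) / 4
    d = (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2]) / 4
    return a, h, v, d

def ihaar2(a, h, v, d):
    """Inverse of haar2."""
    x = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    x[0::2, 0::2] = a + h + v + d
    x[0::2, 1::2] = a + h - v - d
    x[1::2, 0::2] = a - h + v - d
    x[1::2, 1::2] = a - h - v + d
    return x

def fuse(img1, img2):
    """Average the approximations, keep the larger-magnitude details."""
    a1, h1, v1, d1 = haar2(img1)
    a2, h2, v2, d2 = haar2(img2)
    pick = lambda c1, c2: np.where(np.abs(c1) >= np.abs(c2), c1, c2)
    return ihaar2((a1 + a2) / 2, pick(h1, h2), pick(v1, v2), pick(d1, d2))
```

Real toolboxes use deeper decompositions and smoother wavelets, but the fusion rule is the same idea.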

vision.PeopleDetector function in Matlab

Has anyone ever used the vision.PeopleDetector function from the Computer Vision System Toolbox in Matlab?
I've installed it and tried to apply it to my images.
Although it detects people in the training image, it detects nothing in real photos. Either it doesn't detect people at all, or it detects people in parts of the image where none are present.
Could anyone share the experience of using this function?
Thanks a lot!
Here is a sample image:
The vision.PeopleDetector object does indeed detect upright standing people in images. However, like most computer vision algorithms it is not 100% accurate. Can you post a sample image where it fails?
There are several things you can try to improve performance.
Try changing the ClassificationModel parameter to 'UprightPeople_96x48'. There are two models that come with the object, trained on different data sets.
How big (in pixels) are the people in your image? If you use the default 'UprightPeople_128x64' model, then you will not be able to detect a person smaller than 128x64 pixels. Similarly, for the 'UprightPeople_96x48' model the smallest size person you can detect is 96x48. If the people in your image are smaller than that, you can up-sample the image using imresize.
Try reducing the ClassificationThreshold parameter to get more detections.
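A quick sanity check on the up-sampling suggestion: compare the size of the people in your image with the model's detection window to get the required scale factor (the 80x40 estimate below is hypothetical).

```python
model_h, model_w = 128, 64      # 'UprightPeople_128x64' detection window
person_h, person_w = 80, 40     # hypothetical person size in your image
scale = max(model_h / person_h, model_w / person_w)
print(scale)  # 1.6 -> up-sample with imresize(im, scale) before detecting
```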
Edit:
Some thoughts on your particular image. My guess would be that the people detector is not working well here because it was not trained on this kind of image. The training sets for both models consist of natural images of pedestrians. Ironically, the fact that your image has a perfectly clean background may be throwing the detector off.
If this image is typical of what you have to deal with, then I have a few suggestions. One possibility is to use simple thresholding to segment out the people. The other is to use vision.CascadeObjectDetector to detect the faces or the upper bodies, which happens to work perfectly on this image:
im = imread('postures.jpg');
% use the upper-body cascade model instead of the full-body people detector
detector = vision.CascadeObjectDetector('ClassificationModel', 'UpperBody');
bboxes = step(detector, im);
% draw the detections as red rectangles labelled 'person'
im2 = insertObjectAnnotation(im, 'rectangle', bboxes, 'person', 'Color', 'red');
imshow(im2);

Scaling different dicom images for feature extraction

I am developing a brain tumor identification algorithm. I have worked that part out and now want to use a neural network to predict the grade of the tumor. For feature extraction, I am using different DICOM images. Since different DICOM images have different skull sizes, zoom levels, etc., I am confused about how to scale all the images so that features such as the Angular Second Moment are comparable across them.
I am attaching two DICOMs.
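One common way to make texture features comparable across scans is to first resample every slice to the same physical pixel spacing, read from the DICOM PixelSpacing tag (mm per pixel). A rough nearest-neighbour sketch in NumPy; the function name and the 0.5 mm target are my assumptions, and scipy.ndimage.zoom or SimpleITK would be the usual tools:

```python
import numpy as np

def resample_to_spacing(img, spacing_mm, target_mm=0.5):
    """Resample a slice so every image ends up at target_mm per pixel."""
    factor = spacing_mm / target_mm
    h, w = img.shape
    out_h, out_w = int(round(h * factor)), int(round(w * factor))
    rows = (np.arange(out_h) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(out_w) / factor).astype(int).clip(0, w - 1)
    return img[rows[:, None], cols]
```

After resampling, GLCM features such as the Angular Second Moment are computed over comparable physical neighbourhoods in every image, regardless of the original zoom level.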

Detecting shape from the predefined shape and cropping the background

I have several images of the pugmark with a lot of irrelevant background region. I cannot use intensity-based algorithms to separate the background from the foreground.
I have tried several methods; one of them is detecting the object in a homogeneous-intensity image,
but this is not working with rough-texture images like
http://img803.imageshack.us/img803/4654/p1030076b.jpg
http://imageshack.us/a/img802/5982/cub1.jpg
http://imageshack.us/a/img42/6530/cub2.jpg
There could be three possible methods:
1) reduce the roughness factor of the image and obtain a smoother texture, i.e. a flatter surface;
2) detect the pugmark-like shape in these images by defining a rough pugmark shape in a database and then remove the background, to obtain an image like http://i.imgur.com/W0MFYmQ.png;
3) detect the regions with depth and separate them from the background based on differences in their depths.
Please tell me whether any of these methods would work and, if yes, how to implement them.
I have a hunch that this problem could benefit from using polynomial texture maps.
See here: http://www.hpl.hp.com/research/ptm/
You might want to consider top-down information in the process. See, for example, this work.
Looks like you're close enough to the pugmark, so I think you should be able to detect pugmarks using the Viola-Jones algorithm. Maybe a PCA-like algorithm such as Eigenfaces would work too; even if you're not trying to recognize a particular pugmark, it can still be used to tell whether or not there is a pugmark in the image.
Have you tried edge detection on your image? I guess it should be possible to fine-tune the Canny edge detector thresholds to get rid of the noise (if that's not good enough, low-pass filter your image first), then do shape recognition on what remains (you would then be in the field of geometric feature learning and structural matching). Viola-Jones and possibly a PCA-like algorithm would be my first try, though.
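The edge-detection idea is easy to prototype: with OpenCV it is just cv2.Canny(img, low, high) with tuned thresholds, and even a bare gradient-magnitude threshold (a minimal NumPy stand-in, sketched below) shows whether the pugmark outline survives the texture noise:

```python
import numpy as np

def gradient_edges(gray, thresh=50):
    """Central-difference gradient magnitude, thresholded (no hysteresis,
    no non-maximum suppression -- a crude stand-in for Canny)."""
    g = gray.astype(np.float64)
    gx = g[1:-1, 2:] - g[1:-1, :-2]
    gy = g[2:, 1:-1] - g[:-2, 1:-1]
    return np.hypot(gx, gy) > thresh
```

Raising thresh (or low-pass filtering first, as suggested above) suppresses the texture noise at the cost of losing weaker true edges.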