Different approach to detecting shapes using CNNs (optic disc in a retina image)

I'm solving the problem of detecting the optic disc in a retina image. As you can see from the image, the optic disc is the point where the blood vessels converge, has an irregular, roughly circular shape, and is brighter than the rest of the retina.
Now I want to use a convolutional neural network to detect it. I know that typical approaches to detecting something in an image with CNNs (consisting mostly of convolution, pooling, dropout and fully connected layers) divide the image into smaller parts, each of which is sent to a classifier that decides whether the object is there or not.
But I'm thinking about another approach: a model that takes a normal RGB image of size Height x Width as input and passes it through several convolution layers in such a way that the spatial size stays the same (Height x Width) while the number of channels grows to, let's say, N. There would be no pooling layers (?), so the final output of the convolutions would have size Height x Width x N.
This output would contain Height x Width feature vectors of size N, each somehow describing the pixel at that position in the original image and its neighbourhood (?). What I'm trying to do is take these individual vectors as inputs to a fully connected network. Its output would be some number describing the position of the input pixel relative to the optic disc (maybe its distance, or the position itself, I don't know yet...). The training data consists of an image and the x, y position of the optic disc.
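The proposed architecture can be sketched in NumPy. This is a minimal, untrained sketch: the layer sizes, the random weights and the choice of "distance to the disc" as the per-pixel target are illustrative assumptions, and `conv2d_same`, `per_pixel_dense`, `distance_target` and `locate_disc` are hypothetical helper names. It shows two things: "same"-padded convolutions keep the Height x Width size, and applying one fully connected layer to every pixel's N-dimensional feature vector is exactly a 1x1 convolution, so the whole model stays fully convolutional.

```python
import numpy as np

def conv2d_same(x, w, b):
    """'Same'-padded 2-D convolution (cross-correlation, as CNN libraries compute it).

    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k, b: (C_out,).
    Zero padding of k // 2 on each side keeps the output at H x W.
    """
    k = w.shape[0]
    p = k // 2
    height, width, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((height, width, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # each kernel tap multiplies a shifted copy of the input
            out += xp[i:i + height, j:j + width, :] @ w[i, j]
    return out + b

def relu(x):
    return np.maximum(x, 0.0)

def per_pixel_dense(features, weights, bias):
    """One fully connected layer applied to every pixel's N-dim feature vector.

    features: (H, W, N), weights: (N, M), bias: (M,) -> (H, W, M).
    Mathematically identical to a 1x1 convolution.
    """
    return features @ weights + bias

def distance_target(height, width, disc_x, disc_y):
    """Hypothetical per-pixel target: distance from each pixel to the disc centre."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.hypot(xs - disc_x, ys - disc_y)

def locate_disc(distance_map):
    """Read the disc position back out as the pixel with the smallest distance."""
    y, x = np.unravel_index(np.argmin(distance_map), distance_map.shape)
    return int(x), int(y)

rng = np.random.default_rng(0)
image = rng.random((32, 48, 3))                        # H x W x RGB
w1, b1 = 0.1 * rng.normal(size=(3, 3, 3, 16)), np.zeros(16)
w2, b2 = 0.1 * rng.normal(size=(3, 3, 16, 16)), np.zeros(16)
features = relu(conv2d_same(relu(conv2d_same(image, w1, b1)), w2, b2))
prediction = per_pixel_dense(features, 0.1 * rng.normal(size=(16, 1)), np.zeros(1))[..., 0]
target = distance_target(32, 48, disc_x=30, disc_y=10)
loss = np.mean((prediction - target) ** 2)             # per-pixel regression loss to minimise
print(features.shape, prediction.shape)                # (32, 48, 16) (32, 48)
```

At inference time, `locate_disc(prediction)` would turn the predicted distance map back into an (x, y) position.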
But I'm not sure about some aspects of this approach. Can I leave out pooling layers? I thought the network might then not be translation invariant, or something like that. I'm also not sure whether what I'm doing in the fully connected layers is correct. I don't understand neural networks well enough to say "it is obvious that this should work", "this is a case where it is not easy to say and it's worth implementing it to see how it works", or "this obviously won't work because...". So my question is simply: which of these three cases is this?
And isn't there some "obvious" method for this, and I'm just trying to solve something that has already been solved? (maybe RNNs or something...)
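The pooling question can be checked numerically. In this minimal NumPy sketch (assuming circular, wrap-around padding so that the effect is exact; `conv2d_circular` is a hypothetical helper), convolution layers without pooling turn out to be translation equivariant rather than invariant: shifting the input shifts the feature map by the same amount, which is the behaviour a per-pixel localisation output needs. Invariance only appears once you pool, shown here with global max pooling. With ordinary zero padding the equivariance still holds everywhere except near the image borders.

```python
import numpy as np

def conv2d_circular(x, w):
    """2-D convolution with circular (wrap-around) padding; keeps H x W.

    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k.
    out[r, c] = sum_{i,j} x[r + i - p, c + j - p] @ w[i, j], indices modulo H and W.
    """
    k = w.shape[0]
    p = k // 2
    out = np.zeros(x.shape[:2] + (w.shape[3],))
    for i in range(k):
        for j in range(k):
            out += np.roll(x, shift=(p - i, p - j), axis=(0, 1)) @ w[i, j]
    return out

rng = np.random.default_rng(0)
x = rng.random((20, 24, 3))
kernel = rng.normal(size=(3, 3, 3, 8))
shift = (5, -7)

y = conv2d_circular(x, kernel)
y_of_shifted = conv2d_circular(np.roll(x, shift, axis=(0, 1)), kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
print(np.allclose(y_of_shifted, np.roll(y, shift, axis=(0, 1))))        # True
# Invariance only after pooling: global max pooling ignores the shift entirely.
print(np.allclose(y_of_shifted.max(axis=(0, 1)), y.max(axis=(0, 1))))   # True
```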


Training U-Net with Negatives

How to train a U-Net with negative examples?
I trained a U-Net on pictures of hands and fingers. The ground truth data are binary masks with white pixels for the foreground object (finger/hand) and black pixels for the background. Now I want to add negatives, i.e. images without a hand or finger, whose ground truth would be completely black. However, the Dice coefficient is then not suitable as a metric or loss function, for the reason described here:
" If smooth is set too low, when the ground truth has few to 0 white pixels and the predicted image has some non-zero number of white pixels, the model will be penalized more heavily. Setting smooth higher means if the predicted image has some low amount of white pixels when the ground truth has none, the loss value will be lower. Depending on how aggressive the model needs to be, though, maybe a lower value is good..."
(Source: "Correct Implementation of Dice Loss in Tensorflow / Keras")
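The effect described in the quote can be checked numerically. This minimal NumPy sketch assumes the common soft-Dice form 1 - (2|T*P| + smooth) / (|T| + |P| + smooth); the exact formula in the linked question may differ. For an all-black ground truth the intersection is zero, so the loss depends only on the number of predicted white pixels and on `smooth`, while a perfectly empty prediction gets zero loss for any positive `smooth`.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss: 1 - (2*sum(T*P) + smooth) / (sum(T) + sum(P) + smooth).

    y_true: binary mask, y_pred: predicted probabilities in [0, 1], same shape.
    """
    y_true = y_true.astype(float).ravel()
    y_pred = y_pred.astype(float).ravel()
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return 1.0 - dice

empty_mask = np.zeros((64, 64))          # a negative example: no hand or finger
prediction = np.zeros((64, 64))
prediction[:2, :5] = 1.0                 # 10 false-positive pixels

for smooth in (1.0, 10.0, 100.0):
    # prints 1.0 0.9091, then 10.0 0.5, then 100.0 0.0909
    print(smooth, round(dice_loss(empty_mask, prediction, smooth), 4))

print(dice_loss(empty_mask, np.zeros((64, 64))))   # 0.0: perfect empty prediction
```

The same 10 wrong pixels cost almost the maximum loss with `smooth=1` but very little with `smooth=100`, which is the trade-off the quote describes.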
My question now is: does anyone have experience with how best to train a U-Net with negatives?

Caffe | data augmentation by random cropping

I am trying to train my own network in Caffe, similar to the ImageNet model, but I am confused by the crop layer. As far as I understand the crop layer in the ImageNet model, during training it takes random 227x227 crops of the image and trains the network on them, but during testing it takes the center 227x227 crop. Don't we lose information from the image when we crop the center 227x227 region out of a 256x256 image? And second, how can we define the number of crops to be taken during training?
Also, I trained the same network twice (same number of layers and same convolution sizes; only the number of FC neurons differs, obviously), first taking 227x227 crops from the 256x256 images and then taking 255x255 crops. My intuition was that the model with 255x255 crops should give the better result, but I am getting higher accuracy with the 227x227 crops. Can anyone explain the intuition behind this, or am I doing something wrong?
Your observations are not specific to Caffe.
The cropped images need to be the same size during training and testing (227x227 in your case), because the layers of the network (in particular the fully connected layers, whose size depends on the input size, as you noticed) require inputs of a fixed size. Random crops are used during training because you want data augmentation. During testing, however, you want to evaluate against a fixed dataset; otherwise, the reported test accuracy would also depend on a shifting test set.
The crops are made dynamically at each iteration. All images in a training batch are randomly cropped. I hope this answers your second question.
Your intuition is incomplete: with a smaller crop (227x227), you get more data augmentation, because there are many more distinct positions at which the crop can be taken. Data augmentation essentially creates "new" training samples from existing ones, which is vital to prevent overfitting. With a bigger crop (255x255), you should expect better training accuracy but lower test accuracy, since the model is more likely to overfit.
Of course, cropping can be overdone: crop too much and you lose too much information from the image. For image classification, the ideal crop size is one that does not change the category of the image (i.e., only background is cropped away).
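To make the crop-size argument concrete, this short NumPy sketch (hypothetical helper names; Caffe's own mirroring is ignored) counts the distinct crop positions and contrasts the training-time random crop with the test-time center crop. A 227x227 crop of a 256x256 image can start at 30 x 30 = 900 positions, while a 255x255 crop has only 2 x 2 = 4.

```python
import numpy as np

def num_crop_positions(image_size, crop_size):
    """Distinct top-left corners for a square crop of a square image."""
    per_axis = image_size - crop_size + 1
    return per_axis * per_axis

def random_crop(image, crop_size, rng):
    """Training-time crop: a new random window every time it is called."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size]

def center_crop(image, crop_size):
    """Test-time crop: always the same central window, so results are reproducible."""
    h, w = image.shape[:2]
    top = (h - crop_size) // 2
    left = (w - crop_size) // 2
    return image[top:top + crop_size, left:left + crop_size]

print(num_crop_positions(256, 227))  # 900
print(num_crop_positions(256, 255))  # 4

rng = np.random.default_rng(0)
image = np.arange(256 * 256 * 3).reshape(256, 256, 3)
batch = [random_crop(image, 227, rng) for _ in range(4)]   # re-drawn every iteration
print(batch[0].shape, center_crop(image, 227).shape)        # (227, 227, 3) (227, 227, 3)
```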

Using a scale-space representation to filter an image

I want to use a scale-space representation to filter an image. Each feature in an image is best captured by Gaussian smoothing with a particular optimal sigma, which means that different features in the same image are best expressed at different scales of the scale-space representation.
For example, take an image containing a tree, and a scale-space representation with three sigma values, sigma0, sigma1 and sigma2. The ground is best expressed in the image smoothed with sigma0, because it mainly contains texture; the branches are best expressed in the image smoothed with sigma1, and the trunk in the image smoothed with sigma2. When I filter the image, I want the filtered pixels for the ground to come from the image smoothed with sigma0.
The filtered pixels for the branches should come from the image smoothed with sigma1, and those for the trunk from the image smoothed with sigma2.
This requires determining, for each pixel, the smoothed image in which it is best expressed. Is this idea plausible?
I am trying to use the difference of Gaussians between successive smoothed images to do this. Is there any other way to combine the three smoothed images?
I use MATLAB to implement the idea. The three sigmas are 1.0, 2.0 and 3.0, with corresponding Gaussian kernel sizes of 3, 5 and 7, and I generate the kernels with the function fspecial. Are these parameters reasonable? Please share your experience with scale-space representations; links to useful papers are welcome too.
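One way to judge whether kernel sizes 3, 5 and 7 suit sigmas 1.0, 2.0 and 3.0 is to measure how much of each sampled Gaussian falls inside the window. This NumPy sketch (hypothetical helper name; odd window sizes assumed) compares each window against one wide enough to be effectively untruncated. The 3x3, 5x5 and 7x7 kernels keep only about 78%, 63% and 58% of the weight, while the common rule of thumb size = 2*ceil(3*sigma) + 1 keeps over 99%. Because fspecial renormalises the truncated kernel to sum to 1, the weight is not lost, but the effective smoothing is narrower than the nominal sigma.

```python
import numpy as np

def captured_mass(size, sigma):
    """Fraction of a sampled 2-D Gaussian's weight inside an odd size x size window.

    The 2-D Gaussian is separable, so the 2-D fraction is the square of the 1-D one.
    """
    big = 2 * int(np.ceil(8 * sigma)) + 1            # effectively untruncated reference
    x_big = np.arange(big) - (big - 1) / 2.0
    full = np.exp(-x_big ** 2 / (2.0 * sigma ** 2)).sum()
    x = np.arange(size) - (size - 1) / 2.0
    inside = np.exp(-x ** 2 / (2.0 * sigma ** 2)).sum()
    return (inside / full) ** 2

for sigma, size in ((1.0, 3), (2.0, 5), (3.0, 7)):
    rule = 2 * int(np.ceil(3 * sigma)) + 1           # 7, 13 and 19 for these sigmas
    print(f"sigma={sigma}: size {size} keeps {captured_mass(size, sigma):.0%}, "
          f"size {rule} keeps {captured_mass(rule, sigma):.1%}")
```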
Your idea is very plausible! You are just one step away from it. I did something very similar once, and it went like this:
After smoothing the image and extracting the edges at each smoothing step (I used a Sobel filter for this, weighted to compensate for the maxima suppression caused by Gaussian filtering, since DoG was not stable enough for my application), you can project (and normalize) the whole stack of edge images into a single image ("cumulative edges") that contains the characteristic edges. You can then compare the cumulative-edges image (using cross-correlation or whatever you like) with every single image in your edge stack; the scale with the largest value of this comparison is the smoothing scale at which the pixel is expressed best.
Hopefully this makes sense after reading it a couple of times.
Also, don't be afraid of using much bigger kernel sizes. It all depends on your application, but I ended up using kernels of size 51 and larger (though I was working with 40 MP images).
T. Lindeberg has literally dozens of papers on this problem. I found this one the most useful, but since you are already on the right track, I don't think reading all 50 pages will make you that much smarter. Perhaps the most important part of it is this:
Principle for scale selection:
In the absence of other evidence, assume that a scale level, at which some (possibly non-linear) combination of normalized derivatives assumes a local maximum over scales, can be treated as reflecting a characteristic length of a corresponding structure in the data.
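The asker's per-pixel idea can be combined with this principle in a short NumPy sketch. Note that it swaps in a different technique from the answer's weighted-Sobel and cumulative-edges method: it uses the scale-normalised Laplacian sigma^2 * |Laplacian(L_sigma)| as the "normalized derivative", and an argmax over the three sampled scales stands in for a true local maximum over scales. All helper names are hypothetical. Each output pixel is taken from the smoothed image at its selected scale.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing, kernel radius ceil(3*sigma), reflect padding."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def laplacian(img):
    """5-point discrete Laplacian with edge replication."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def scale_selected_smoothing(img, sigmas):
    """Per pixel, keep the smoothed value from the scale whose scale-normalised
    Laplacian response sigma^2 * |Laplacian| is largest among the sampled scales."""
    smoothed = np.stack([gaussian_blur(img, s) for s in sigmas])        # (S, H, W)
    response = np.stack([s ** 2 * np.abs(laplacian(L)) for s, L in zip(sigmas, smoothed)])
    best = np.argmax(response, axis=0)                                  # (H, W) scale index
    result = np.take_along_axis(smoothed, best[None], axis=0)[0]
    return result, best

# A Gaussian blob of width 2 should select sigma = 2 at its centre.
yy, xx = np.mgrid[0:41, 0:41]
blob = np.exp(-((xx - 20) ** 2 + (yy - 20) ** 2) / (2 * 2.0 ** 2))
result, best = scale_selected_smoothing(blob, sigmas=(1.0, 2.0, 3.0))
print(best[20, 20])   # 1, i.e. sigma = 2.0
```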

Specifications of Checkerboard (Calibration) for obtaining maximum accuracy in stereo reconstruction

I have to reconstruct an object that will be placed around 1 to 1.5 meters away from the baseline of my stereo setup. The images captured by both cameras are high resolution (10 MP).
I need to detect its position with an accuracy of +/- 0.5 mm along all three coordinate axes. (If you need more details, please let me know.)
Given this, what should the optimal specifications of my calibration checkerboard be?
I only know that it should be an asymmetric board, that it should be placed in the same distance range as the one where the object is expected to be, and that it should be oriented at all possible angles (making sure all corners are seen by both cameras).
What about:
Number of squares horizontally and vertically? (Also, which side should have more squares, or the even number of squares?)
Size of each square on the checkerboard?
What effect does the baseline distance have on this?
Do these checkerboard parameters affect my accuracy in any way? Are there any other parameters I need to consider for calibration?
I am using the MATLAB Stereo Calibrator App.
I will try to answer as well as I can:
Number of squares. As you can guess, the more squares (actually, the corners between squares are what is used!), the better the result, because you have a more overdetermined system of equations to solve. Also, the size of the chequerboard doesn't matter; only the odd/even pattern does.
Size of squares. The size does not matter much for the "mathematical" representation, but it matters in practice. If your squares are very small, your printer probably won't draw the square corners very precisely, which will make your data "noisier". In the past, for a really small calibration setup, I had to go to a specialised print shop so they could print it at the highest possible quality. Of course, if you make the squares very big, you won't be able to fit many of them in the image, which is not good.
The baseline distance only affects how well you can see the corners between squares. The more accurately (in mm, i.e. in real-world distance!) you detect these corners, the better. Obviously, if you make the squares small and place them very far away, you won't see much; this ties in with questions 1 and 2. Another problem you may have is depth of field (which depends on the focal length). In one application I worked on, we needed to image some really small objects at close range. That was a problem during calibration, because the depth range I could see without blur was only around 2 mm. This really crippled my ability to calibrate properly, because I couldn't tilt the board by large angles in the Z direction without getting blurred corners.
TL;DR: You want to have lots of corners between squares of the chequerboard but you want to see them as precisely as possible.
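A rough NumPy sketch of the quantities behind this advice, assuming a pinhole camera and a rectified stereo pair. The board size, pixel pitch, focal length and baseline below are hypothetical values, not the asker's setup, and the helper names are illustrative. It counts the inner corners of a board (each detected corner contributes two equations, x and y, per view), checks the odd/even rule, estimates how many millimetres one pixel covers at the working distance, and uses the standard rectified-stereo relation Z = f*B/d to show how depth error grows with distance and shrinks with baseline.

```python
import numpy as np

def inner_corner_grid(squares_y, squares_x, square_mm):
    """3-D object points (Z = 0) of the inner corners of a squares_y x squares_x board."""
    ys, xs = np.mgrid[0:squares_y - 1, 0:squares_x - 1]
    return np.column_stack([xs.ravel() * square_mm, ys.ravel() * square_mm, np.zeros(xs.size)])

def is_asymmetric(squares_y, squares_x):
    """One side even and the other odd, so the board's orientation is unambiguous."""
    return (squares_y % 2) != (squares_x % 2)

def pixel_footprint_mm(distance_mm, pixel_pitch_mm, focal_mm):
    """Pinhole model: size on the target plane covered by one pixel."""
    return distance_mm * pixel_pitch_mm / focal_mm

def depth_error_mm(distance_mm, baseline_mm, focal_px, disparity_error_px):
    """Rectified stereo, Z = f * B / d, so dZ ~= Z**2 / (f * B) * dd."""
    return distance_mm ** 2 / (focal_px * baseline_mm) * disparity_error_px

corners = inner_corner_grid(9, 12, square_mm=20.0)   # hypothetical 9 x 12 board, 20 mm squares
print(len(corners), is_asymmetric(9, 12))             # 88 True

# Hypothetical optics: 1.67 um pixels, 16 mm lens, target at 1.5 m, 300 mm baseline.
print(pixel_footprint_mm(1500.0, 0.00167, 16.0))      # ~0.157 mm per pixel
print(depth_error_mm(1500.0, baseline_mm=300.0, focal_px=16.0 / 0.00167, disparity_error_px=0.1))
```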

Remove paper texture pattern from a photograph

I've scanned an old photo that has a paper texture pattern, and I would like to remove the texture as much as possible without lowering the image quality. Is there a way to do this, perhaps using the Image Processing Toolbox in MATLAB?
I've tried applying an FFT (using a Photoshop plugin), but I couldn't find any clear white spots to paint over. Perhaps the pattern is not regular enough for this method?
You can see the sample below. If you need the full image I can upload it somewhere.
Unfortunately, you're pretty much stuck in the spatial domain, as the pattern isn't really repetitive enough for Fourier analysis to be of use.
As @Jonas and @michid have pointed out, filtering will help you with a problem like this. With filtering, you face a trade-off between the amount of detail you want to keep and the amount of noise (or unwanted image components) you want to remove. For example, the median filter used by @Jonas removes the paper texture completely (even the round scratch near the bottom edge of the image), but it also removes all texture within the eyes, hair, face and background (although we don't really care about the background so much; it's the foreground that matters). You'll also see a slight decrease in image contrast, which is usually undesirable. This gives the image an artificial look.
Here's how I would handle this problem:
Detect the paper texture pattern:
Apply Gaussian blur to the image (use a large kernel to make sure that all the paper texture information is destroyed)
Calculate the image difference between the blurred and original images
EDIT 2: Apply Gaussian blur to the difference image (use a small 3x3 kernel)
Threshold the above pattern using an empirically-determined threshold. This yields a binary image that can be used as a mask.
Use median filtering (as mentioned by @Jonas) to replace only the parts of the image that correspond to the paper pattern.
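The detection-and-replacement steps above can be sketched in NumPy. This is a minimal sketch, not the answerer's code: `remove_texture` and its helpers are hypothetical, all sigmas, the threshold and the median window size are placeholders to be tuned empirically, the absolute difference is an assumption so that both lighter and darker texture is caught, and the image is assumed to be grayscale in [0, 1].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_blur(img, sigma, radius=None):
    """Separable Gaussian blur; radius defaults to ceil(3*sigma), reflect padding."""
    r = int(np.ceil(3 * sigma)) if radius is None else radius
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, r, mode="reflect")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")

def median_filter(img, size):
    """size x size median filter (odd size) with reflect padding."""
    r = size // 2
    windows = sliding_window_view(np.pad(img, r, mode="reflect"), (size, size))
    return np.median(windows, axis=(-2, -1))

def remove_texture(img, large_sigma=5.0, threshold=0.05, median_size=7):
    blurred = gaussian_blur(img, large_sigma)            # 1a: large blur destroys the texture
    texture = np.abs(img - blurred)                      # 1b: what the blur removed
    texture = gaussian_blur(texture, 0.8, radius=1)      # 1c: small 3x3 blur of the difference
    mask = texture > threshold                           # 2: binary texture mask
    filled = median_filter(img, median_size)
    return np.where(mask, filled, img), mask             # 3: replace only masked pixels
```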
Paper texture pattern (before thresholding):
You want as little actual image information as possible to be present in the image above. You'll see that you can very faintly make out the edge of the face (this isn't good, but it's the best I have time for). You also want this paper texture image to be as even as possible, so that thresholding gives consistent results across the image. Again, the right-hand side of the image above is slightly darker, which makes it harder to threshold well.
Final image:
The result isn't perfect, but it has completely removed the highly-visible paper texture pattern while preserving more high-frequency content than the simpler filtering approaches.
The filled-in areas are typically plain-colored and thus stand out a bit if you look at the image very closely. You could also try adding some low-strength zero-mean Gaussian noise to the filled-in areas to make them look more realistic. You'd have to pick the noise variance to match the background. Determining it empirically may be good enough.
Here's the processed image with the noise added:
Note that the parts where the paper pattern was removed are more difficult to see because the added Gaussian noise is masking them. I used the same Gaussian distribution for the entire image but if you want to be more sophisticated you can use different distributions for the face, background, etc.
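The noise idea can be sketched in NumPy under stated assumptions: `add_matching_noise` is a hypothetical helper, the noise standard deviation is estimated from how much the untouched pixels differ from the median-filtered image (one empirical choice among many), and the optional `region` mask lets you estimate and add noise separately for, e.g., the face and the background.

```python
import numpy as np

def add_matching_noise(result, smoothed, mask, region=None, rng=None):
    """Add zero-mean Gaussian noise to the filled-in pixels of `result`.

    result:   image after the masked median replacement, values in [0, 1]
    smoothed: the median-filtered image used for filling
    mask:     True where pixels were filled in
    region:   optional boolean mask restricting estimation and noise to one area
    """
    rng = np.random.default_rng() if rng is None else rng
    region = np.ones(result.shape, dtype=bool) if region is None else region
    reference = region & ~mask    # untouched pixels: their fine detail sets the noise level
    target = region & mask        # filled-in pixels that receive noise
    sigma = float(np.std(result[reference] - smoothed[reference])) if reference.any() else 0.0
    out = result.astype(float).copy()
    out[target] += rng.normal(0.0, sigma, size=int(target.sum()))
    return np.clip(out, 0.0, 1.0), sigma
```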
A median filter can help you a bit:
```matlab
img = imread('http://i.stack.imgur.com/JzJMS.jpg');
% convert RGB to grayscale
img = rgb2gray(img);
% apply a 15x15 median filter
fimg = medfilt2(img, [15 15]);
% show the result
figure, imshow(fimg)
```
Note that you may want to pad the image first to avoid edge effects.
EDIT: A smaller filter kernel than [15 15] will preserve image texture better, but will leave more visible traces of the filtering.
I have tried a different approach: anisotropic diffusion using the second conduction coefficient, which favours wide regions over smaller ones.
Here is the output I got:
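A minimal NumPy sketch of Perona-Malik anisotropic diffusion, assuming that "the 2nd coefficient" refers to Perona and Malik's second conduction function g(d) = 1 / (1 + (d/kappa)^2), which favours wide regions over smaller ones. The answer does not give its implementation or parameters, so `anisotropic_diffusion`, `kappa`, `step` and `n_iter` are hypothetical placeholders to tune. `kappa` sets the gradient size above which diffusion is suppressed, so it should sit between the contrast of the paper texture and that of the real edges.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.1, step=0.2):
    """Perona-Malik diffusion with g(d) = 1 / (1 + (d / kappa)**2).

    img: 2-D float array in [0, 1]. step <= 0.25 keeps this 4-neighbour scheme stable.
    Edge-replicated padding gives zero flux across the image border.
    """
    def g(d):
        return 1.0 / (1.0 + (d / kappa) ** 2)

    u = img.astype(float).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")
        north = p[:-2, 1:-1] - u
        south = p[2:, 1:-1] - u
        west = p[1:-1, :-2] - u
        east = p[1:-1, 2:] - u
        # small differences (texture) diffuse; large ones (edges) are nearly blocked
        u += step * (g(north) * north + g(south) * south + g(west) * west + g(east) * east)
    return u
```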
From what I can see in the picture, the noise has a relatively high frequency compared to the image itself, so applying a low-pass filter should work. Have a look at the magnitude spectrum, abs(fft2(...)), to determine the cutoff frequency.
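A minimal NumPy sketch of this suggestion (hypothetical helper names): inspect the centred log-magnitude spectrum to pick a cutoff, then low-pass filter. A Gaussian transfer function is used instead of a hard cutoff, a choice made here to avoid ringing. `cutoff` is in cycles per pixel and has to be chosen for the actual scan.

```python
import numpy as np

def log_magnitude_spectrum(img):
    """Centred log(1 + |FFT2|), convenient for picking a cutoff by eye."""
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))

def gaussian_lowpass(img, cutoff):
    """Attenuate frequencies above `cutoff` (cycles per pixel, 0 < cutoff <= 0.5)
    with the transfer function exp(-f**2 / (2 * cutoff**2)); the mean is preserved."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    transfer = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * cutoff ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))
```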