I have a fully convolutional neural network, U-Net, which is described in the paper below.
https://arxiv.org/pdf/1505.04597.pdf
I want to use it to do pixelwise classification of images. I have my training images available in two sizes: 512x512 and 768x768. In the initial step I use reflection padding of size (256,256,256,256) for the former and (384,384,384,384) for the latter. I then pad successively before the convolutions so that the output has the same size as the input.
But since my padding is dependent on the size of the image/input, I can't build a generalised model (I am using Torch).
How is the padding done in such cases?
I am new to deep learning, so any help would be great. Thanks.
Your model will only accept images of the size of the first layer. You have to pre-process all of them before forwarding them to the network. In order to do so, you can use:
image.scale(img, width, height, 'bilinear')
img is the image to scale, width and height are the size of the first layer of your model (if I'm not mistaken it is 572x572), and 'bilinear' is the algorithm used to scale the image.
Keep in mind that it might be necessary to subtract the mean from the image or to convert it to BGR (depending on how the model was trained).
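To illustrate the mean subtraction and BGR conversion mentioned above, here is a minimal sketch in plain Python. It assumes an image stored as rows of (R, G, B) tuples; the function names and the mean values are made up for the example, so substitute whatever your model was actually trained with.

```python
def preprocess_pixel(rgb, mean_bgr):
    """Reorder one (R, G, B) pixel to (B, G, R) and subtract a per-channel mean."""
    r, g, b = rgb
    return (b - mean_bgr[0], g - mean_bgr[1], r - mean_bgr[2])


def preprocess_image(image, mean_bgr):
    """Apply preprocess_pixel to every pixel of an image stored as rows of tuples."""
    return [[preprocess_pixel(px, mean_bgr) for px in row] for row in image]


# Hypothetical per-channel means in B, G, R order (assumption, not from any real model).
MEAN_BGR = (100.0, 110.0, 120.0)

tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (120, 110, 100)]]
print(preprocess_image(tiny, MEAN_BGR))
```

Whether either step is needed at all depends on how the training data was prepared, so check that first.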
The first thing to do is to process all of your images to be the same size. The CONV layer input requires all images to be of the specified dimensions.
Caffe lets you specify a reshape within the prototxt file; in Torch, I think there's a comparable command you can put at the front of createModel, but I don't recall its name. If not, then you'll need to do it outside the model flow.
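On the padding question itself, the size arithmetic can be worked out directly. This is only a sketch, assuming the network is exactly the unpadded architecture from the linked paper (two 3x3 valid convolutions per level, four 2x2 poolings, 2x2 up-convolutions) and that it is applied fully convolutionally; the function names are mine and not part of Torch. Under those assumptions the output side is always 184 pixels shorter than the input side, so the padding does not have to grow with the image. It only has to be rounded up to a size that every pooling layer can halve evenly.

```python
def unet_output_size(n, depth=4):
    """Output side length for an n x n input, or None if n is not usable.

    Assumes the unpadded U-Net of the paper: two 3x3 valid convolutions per
    level (each removes 2 pixels), 2x2 max pooling on the way down and
    2x2 up-convolution on the way up.
    """
    for _ in range(depth):
        n -= 4                      # two valid 3x3 convolutions
        if n <= 0 or n % 2:
            return None             # 2x2 pooling needs a positive, even size
        n //= 2
    n -= 4                          # convolutions at the bottom of the "U"
    for _ in range(depth):
        if n <= 0:
            return None
        n = 2 * n - 4               # up-convolution, then two valid convolutions
    return n if n > 0 else None


def padded_input_size(image_size, depth=4):
    """Smallest usable input side whose output covers image_size pixels."""
    n = image_size
    while True:
        out = unet_output_size(n, depth)
        if out is not None and out >= image_size:
            return n, out
        n += 1


for size in (512, 768):
    n, out = padded_input_size(size)
    print(size, "-> pad to", n, "output", out, "crop", out - size)
```

With these assumptions a 512x512 image is padded to 700x700 (94 pixels per side) and the 516x516 output is cropped by 2 pixels per side; a 768x768 image is padded to 956x956 and yields 772x772. If your Torch model pads inside the network instead, the numbers will differ.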
I am doing a project for uni where I am detecting an object with U-Net and then calculating the width of the object. I trained my U-Net on images of size 300x300. I have now reached a point where I want to improve the accuracy of the width measurement, so I want to feed larger images (say 600x600) into the model. Does this difference in size (training on 300x300, running on 600x600) affect the overall segmentation quality?
I'm guessing it does but am not sure.
I have a style transfer model that was trained in PyTorch and converted via ONNX to an mlmodel. The style image was 1500x2000. Using coremltools, I set two input sizes: 256x256 and 1500x2000.
Now I can pass two image sizes to the prediction process. Here are the results:
On the left is the 1500x2000 image, and on the right is the 256x256 one (scaled up after processing).
Is it possible to pass a big image but get larger brushstrokes, as in the image on the right? I want to keep the image size and quality (1500x2000) but change the size of the style (brushstrokes). Or is that not possible, and does it depend entirely on the size of the style image I used to train the model?
I know how to use the Core ML library to train a model and use it. However, I was wondering if it's possible to feed the model more than one image of the same subject so that it can identify it with better accuracy.
The reason is that I'm trying to build an app that classifies histological slides. Many of them look quite similar, so I thought I could feed the model images at different magnifications to make the identification. Is it possible?
Thank you,
Mehdi
Yes, this is a common technique. You can give Core ML the images at different scales or use different crops from the same larger image.
A typical approach is to take 4 corner crops and 1 center crop, and also horizontally flip these, so you have 10 images total. Then feed these to Core ML as a batch. (Maybe in your case it makes sense to also vertically flip the crops.)
To get the final prediction, take the average of the predicted probabilities for all images.
Note that in order to use images at different sizes, the model must be configured to support "size flexibility". And it must also be trained on images of different sizes to get good results.
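The crop-and-average scheme described above can be sketched in plain Python as follows. This is only an illustration: images are lists of rows, the helper names are made up, and the model call itself is left out because it depends on your Core ML setup.

```python
def ten_crop_boxes(width, height, crop):
    """Return (left, top, flip) for 4 corner crops and 1 center crop,
    each once as-is and once horizontally flipped (10 entries in total)."""
    right, bottom = width - crop, height - crop
    origins = [(0, 0), (right, 0), (0, bottom), (right, bottom),
               (right // 2, bottom // 2)]
    return [(x, y, flip) for flip in (False, True) for (x, y) in origins]


def crop_image(image, left, top, crop, flip=False):
    """Cut a crop x crop window out of an image stored as a list of rows."""
    rows = [row[left:left + crop] for row in image[top:top + crop]]
    return [row[::-1] for row in rows] if flip else rows


def average_probabilities(predictions):
    """Element-wise mean of several probability vectors of equal length."""
    count = len(predictions)
    return [sum(p[i] for p in predictions) / count
            for i in range(len(predictions[0]))]


boxes = ten_crop_boxes(width=8, height=6, crop=4)
print(len(boxes), boxes[:5])

# Stand-in for per-crop model outputs (two classes, two crops).
print(average_probabilities([[0.25, 0.75], [0.5, 0.5]]))
```

In practice you would run the model on each of the ten crops and pass the ten probability vectors to average_probabilities.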
I have already trained the FCN model with fixed-size 256x256 images. How can I train the same model when the image size changes from one image to the next?
I really appreciate your advice.
Thanks
You can choose one of these strategies:
1. Batch = 1 image
By training each image as a separate batch, you can reshape the net in the forward() method (rather than in reshape()) of the data layer, thus changing the net at each iteration.
+Write the reshape once in the forward method and you no longer need to worry about input shapes and sizes.
-Reshaping the net often requires allocating/deallocating CPU/GPU memory, which takes time.
-A batch of a single image may be too small.
For example (assuming you are using a "Python" layer for input):
def reshape(self, bottom, top):
    pass  # you do not reshape here

def forward(self, bottom, top):
    # reshape the blobs themselves (not the numpy views in .data);
    # this propagates the reshape to the rest of the net at each iteration
    top[0].reshape( ... )
    top[1].reshape( ... )
    # feed the data to the net
    top[0].data[...] = current_img
    top[1].data[...] = current_label
2. Random crops
You can decide on a fixed input size and then randomly crop all input images (and the corresponding ground truths).
+No need to reshape every iteration (faster).
+Control over model size during train.
-Need to implement random crops for images and labels
3. Fixed size
Resize all images to the same size (like in SSD).
+Simple
-Images are distorted if not all images have the same aspect ratio.
-You are not invariant to scale.
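For strategy 2, the one thing to get right is that the image and its ground truth are cut at the same position. A minimal sketch in plain Python, assuming both are stored as lists of rows of equal size (the function name is made up and this is independent of Caffe):

```python
import random


def random_crop_pair(image, label, crop_h, crop_w, rng=random):
    """Cut the same randomly placed window out of an image and its label map.

    Both inputs are lists of rows with identical height and width.
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)    # randint includes both end points
    left = rng.randint(0, width - crop_w)

    def window(rows):
        return [row[left:left + crop_w] for row in rows[top:top + crop_h]]

    return window(image), window(label)


# Toy data: the label is the negated image, so alignment is easy to check.
img = [[10 * y + x for x in range(6)] for y in range(5)]
lab = [[-v for v in row] for row in img]
crop_img, crop_lab = random_crop_pair(img, lab, 3, 4, random.Random(0))
print(crop_img)
print(crop_lab)
```

Passing a seeded random.Random makes the crops reproducible, which helps when debugging the data layer.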
I would like to check whether an image has a lot of homogeneous areas. To do so, I would like to compute some kind of value that rates an image by the amount/size of its homogeneous areas (e.g. a value in the range from 0 to 5).
Instead of a value there could be some kind of classification as well.
[many homogeneous areas -> value/class 5 ; few homogeneous areas -> value/class 0]
I would like to do that in Perl. Is there a package or function for this?
What you want seems to be an area of image processing research which I am not familiar with. However, GraphicsMagick's mogrify utility has a -segment option:
Use -segment to segment an image by analyzing the histograms of the color components and identifying units that are homogeneous with the fuzzy c-means technique. The scale-space filter analyzes the histograms of the three color components of the image and identifies a set of classes. The extents of each class is used to coarsely segment the image with thresholding. The color associated with each class is determined by the mean color of all pixels within the extents of a particular class. Finally, any unclassified pixels are assigned to the closest class with the fuzzy c-means technique.
I don't know if this is any use to you. You might have to hit the library on this one, and read some research. You do have access to this through PerlMagick as well. However, it does not look like it gives access to the internals, but just produces an image based on parameters.
In my tests (without really understanding what the parameters do), photos turned entirely black, whereas PNG images with large areas of similar colors were reduced to a sort of average color. Whether you can use that fact to develop a measure is an open question I am not going to investigate ;-)
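A much cruder alternative to -segment, and not what GraphicsMagick does, is to count how many fixed-size blocks of the image have low variance and map that fraction to the 0 to 5 scale from the question. The sketch below is in Python rather than Perl purely for illustration; the function name, block size, and variance threshold are arbitrary assumptions, and the image is taken to be grayscale values stored as a list of rows.

```python
def homogeneity_score(gray, block=8, max_variance=25.0):
    """Score from 0 (few flat blocks) to 5 (all blocks flat) for a grayscale image."""
    height, width = len(gray), len(gray[0])
    flat = total = 0
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            values = [gray[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            total += 1
            if variance <= max_variance:
                flat += 1               # this block counts as homogeneous
    if total == 0:
        raise ValueError("image is smaller than one block")
    return round(5 * flat / total)


uniform = [[128] * 16 for _ in range(16)]
checker = [[255 * ((x + y) % 2) for x in range(16)] for y in range(16)]
print(homogeneity_score(uniform), homogeneity_score(checker))
```

The same loop ports to Perl once you can read pixel values, for example through PerlMagick, but the threshold would need tuning on real images.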