How to train a CNN to differentiate between HTML objects (foreground) and the background in a screenshot of a webpage using Caffe? - neural-network

I am working on a problem where I am trying to train a neural network to detect various HTML objects such as textboxes, radio buttons, buttons and dropdown lists in a given screenshot of a webpage. I am supplying patches generated by a sliding-window operation over the 1500 images (the training set) to my CNN for training. The label set is a 5-channel matrix for the 5 classes of objects (the background is labelled as class 0, and the other object regions are labelled as classes 1, 2, ..., 4).
I tried applying a convolution-deconvolution architecture to this training set using Caffe. But the problem, IMHO, is that there is a strong bias towards class 0 in the output, since most of the area covered by my sliding-window patches is background. Hence the network classifies every pixel in the output as class 0 (background) and is unable to detect the HTML objects of classes 1, 2, ..., 4 in the test images that I supply to it.
Any idea how to work around this problem?

This problem is present in a lot of real-world datasets as well.
One way of solving it would be to present your non-background data (classes 1, 2, 3, ...) to the neural network more often than you present the background data. This can be done by artificially duplicating the data of which you have fewer samples.
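For illustration, a rough sketch of this kind of duplication in Python, applied to the patch list before it is written out as the training set (patches, labels and oversample are purely illustrative names, not from your pipeline):

import random
from collections import Counter

def oversample(patches, labels):
    # Count how many patches each class has; background (class 0) is usually the largest.
    counts = Counter(labels)
    target = max(counts.values())
    balanced = list(zip(patches, labels))
    for cls, n in counts.items():
        if n == target:
            continue
        pool = [(p, l) for p, l in zip(patches, labels) if l == cls]
        # Duplicate minority-class patches until the class reaches the target count.
        balanced.extend(random.choices(pool, k=target - n))
    random.shuffle(balanced)
    return balanced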

You could also set the following in the loss_param of your loss layer, so that background pixels are ignored when the loss is computed:
ignore_label: 0
That might help.
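If you generate the network definition with pycaffe's NetSpec instead of writing the prototxt by hand, the same setting is passed through loss_param of the loss layer. A minimal sketch, where the score/label blobs and their shapes are only stand-ins for your own network:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
# Stand-ins for the real network: in practice these would be the image data,
# the conv/deconv stack producing per-pixel class scores, and the label maps.
n.score = L.Input(shape=dict(dim=[1, 5, 100, 100]))
n.label = L.Input(shape=dict(dim=[1, 1, 100, 100]))
# Loss layer that skips pixels labelled 0 (background) when computing the loss.
n.loss = L.SoftmaxWithLoss(n.score, n.label,
                           loss_param=dict(ignore_label=0))
print(n.to_proto())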

Related

If the size of the input image is different from the size of the images used for training, does that impact the final segmentation/accuracy?

I am doing a project for uni where I am detecting an object with U-Net and then calculating the width of the object. I trained my U-Net on images of size 300x300. Now I have got to a point where I want to improve the accuracy of the width measurement, and for that reason I want to feed larger images (600x600, let's say) into the model. Does this difference in size (training on 300x300, inference on 600x600) affect the overall segmentation quality?
I'm guessing it does, but I'm not sure.

Feed multiple images to CoreML image classification model (swift)

I know how to use the CoreML library to train a model and use it. However, I was wondering whether it is possible to feed the model more than one image so that it can make the identification with better accuracy.
The reason is that I'm trying to build an app that classifies histological slides; however, many of them look quite similar, so I thought maybe I could feed the model images at different magnifications in order to make the identification. Is that possible?
Thank you,
Mehdi
Yes, this is a common technique. You can give Core ML the images at different scales or use different crops from the same larger image.
A typical approach is to take 4 corner crops and 1 center crop, and also horizontally flip these, so you have 10 images total. Then feed these to Core ML as a batch. (Maybe in your case it makes sense to also vertically flip the crops.)
To get the final prediction, take the average of the predicted probabilities for all images.
Note that in order to use images at different sizes, the model must be configured to support "size flexibility". And it must also be trained on images of different sizes to get good results.
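Framework details aside, the crop/flip/average step itself is simple. A small framework-agnostic sketch in Python with NumPy, where model is a placeholder for whatever call returns a probability vector for one image:

import numpy as np

def predict_with_tta(model, crops):
    # crops: the 10 images (4 corner crops + 1 center crop, each also flipped).
    # model(image) is assumed to return a 1-D array of class probabilities.
    probs = np.stack([model(c) for c in crops])  # shape: (10, num_classes)
    return probs.mean(axis=0)                    # average of the predicted probabilities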

Biomedical Image Segmentation

In brain tumor segmentation, can I treat both the images and the labels as color images?
Or can the images have 3 channels while the ground truth/mask/label must be a single channel? Or must both be single-channel? I have used both (images and GT) with 3 channels for a U-Net architecture, and the output I get is a blank colored image. Why is the output like this?
It is not necessary to use color images for biomedical image segmentation. The intensity value of a CT/MR image has a specific meaning, and it distinguishes different structures such as bones or vessels.
If you use 3 channels, I don't know whether the values still carry the same meaning. I also do not recommend storing the GT as a 3-channel image, because each voxel value should denote a class: in your case, perhaps 1 to n for the different kinds of tumor and 0 for the background.
Thus, a 3-channel representation will lose some semantic information and make the problem more complex.
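To follow that advice, the usual fix is to convert each RGB ground-truth mask into a single-channel map of class indices before training. A minimal sketch in Python with NumPy; the color-to-class mapping here is made up and must be replaced with the colors your masks actually use:

import numpy as np

# Hypothetical mapping from mask colors to class indices (0 = background).
COLOR_TO_CLASS = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # tumor class 1
    (0, 255, 0): 2,    # tumor class 2
}

def rgb_mask_to_labels(mask_rgb):
    # mask_rgb: H x W x 3 uint8 array; returns an H x W array of class indices.
    labels = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for color, cls in COLOR_TO_CLASS.items():
        labels[np.all(mask_rgb == color, axis=-1)] = cls
    return labels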

Fully Convolutional Networks with Varied Inputs

I have a fully convolutional neural network, U-Net, described in the paper below:
https://arxiv.org/pdf/1505.04597.pdf
I want to use it to do pixelwise classification of images. My training images are available in two sizes: 512x512 and 768x768. In the initial step I use reflection padding of size (256,256,256,256) for the former and (384,384,384,384) for the latter, and I keep padding before each convolution so that the output has the same size as the input.
But since my padding depends on the image/input size, I can't build a generalised model (I am using Torch).
How is the padding done in such cases?
I am new to deep learning; any help would be great. Thanks.
Your model will only accept images of the size expected by its first layer. You have to pre-process all of them before forwarding them to the network. In order to do so, you can use:
image.scale(img, width, height, 'bilinear')
Here img is the image to scale, width and height are the input size of the first layer of your model (if I'm not mistaken it is 572x572), and 'bilinear' is the interpolation algorithm used to scale the image.
Keep in mind that it might also be necessary to subtract the mean of the image or to convert it to BGR (depending on how the model was trained).
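For reference, the equivalent pre-processing outside Torch would look roughly like this in Python with Pillow and NumPy (the 572x572 size, the mean values and the BGR flag are assumptions to be replaced with whatever your model was trained with):

import numpy as np
from PIL import Image

def preprocess(path, size=(572, 572), mean=(0.0, 0.0, 0.0), to_bgr=False):
    # Scale the image to the size expected by the first layer, using bilinear interpolation.
    img = Image.open(path).convert('RGB').resize(size, Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32)
    if to_bgr:
        arr = arr[:, :, ::-1]                  # RGB -> BGR if the model expects BGR
    arr -= np.asarray(mean, dtype=np.float32)  # subtract the per-channel mean
    return arr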
The first thing to do is to process all of your images to the same size, since the conv layer input requires all images to have the specified dimensions.
Caffe lets you reshape within the prototxt file; in Torch, I think there is a comparable command you can drop at the front of createModel, but I don't recall its name. If not, you'll need to do it outside the model flow.

Perl - Ratio of homogeneous areas of an image

I would like to check whether an image has a lot of homogeneous areas. Therefore I would like to compute some kind of value for an image that expresses a ratio depending on the number/size of its homogeneous areas (e.g. that value could range from 0 to 5).
Instead of a value there could be some kind of classification as well.
[many homogeneous areas -> value/class 5 ; few homogeneous areas -> value/class 0]
I would like to do that in Perl. Is there a package/function or something like that?
What you want seems to be an area of image processing research which I am not familiar with. However, GraphicsMagick's mogrify utility has a -segment option:
Use -segment to segment an image by analyzing the histograms of the color components and identifying units that are homogeneous with the fuzzy c-means technique. The scale-space filter analyzes the histograms of the three color components of the image and identifies a set of classes. The extents of each class is used to coarsely segment the image with thresholding. The color associated with each class is determined by the mean color of all pixels within the extents of a particular class. Finally, any unclassified pixels are assigned to the closest class with the fuzzy c-means technique.
I don't know if this is of any use to you. You might have to hit the library on this one and read some research. You do have access to this through PerlMagick as well; however, it does not look like it gives access to the internals, but just produces an image based on the parameters.
In my tests (without really understanding what the parameters do), photos turned entirely black, whereas PNG images with large areas of similar colors were reduced to a sort of an average color. Whether you can use that fact to develop a measure is an open question I am not going to investigate ;-)