Feed multiple images to CoreML image classification model (swift) - classification

I know how to use the Core ML library to train a model and use it. However, I was wondering if it is possible to feed the model more than one image so that it can identify the subject with better accuracy.
The reason is that I'm trying to build an app that classifies histological slides. Many of them look quite similar, so I thought I could feed the model images at different magnifications to make the identification. Is that possible?
Thank you,
Mehdi

Yes, this is a common technique. You can give Core ML the images at different scales or use different crops from the same larger image.
A typical approach is to take 4 corner crops and 1 center crop, and also horizontally flip these, so you have 10 images total. Then feed these to Core ML as a batch. (Maybe in your case it makes sense to also vertically flip the crops.)
To get the final prediction, take the average of the predicted probabilities for all images.
Note that in order to use images at different sizes, the model must be configured to support "size flexibility". And it must also be trained on images of different sizes to get good results.
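As a rough sketch of that crop-and-average step in Swift: assume an Xcode-generated classifier class called SlideClassifier whose output exposes classLabelProbs: [String: Double] (both names are placeholders, substitute whatever Xcode generates for your model), and that crops already holds the ten CVPixelBuffers. A plain loop is shown for clarity; the batch API (MLArrayBatchProvider plus predictions(from:options:)) works the same way.

import CoreML
import CoreVideo

// Sketch only: average the class probabilities over all crops of one slide.
// SlideClassifier and its "image" input / classLabelProbs output are assumed names.
func averagedPrediction(for crops: [CVPixelBuffer],
                        using model: SlideClassifier) throws -> (label: String, probability: Double)? {
    var summed: [String: Double] = [:]
    for crop in crops {
        let output = try model.prediction(image: crop)
        for (label, p) in output.classLabelProbs {
            summed[label, default: 0] += p
        }
    }
    // Dividing by the crop count does not change which label wins,
    // so only the returned probability is averaged.
    guard let best = summed.max(by: { $0.value < $1.value }) else { return nil }
    return (best.key, best.value / Double(crops.count))
}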

Related

Creating a custom image filter for a Flutter application

I have more or less reverse engineered an image filter. It turned out to be a pixel-by-pixel operation, so I applied it to different images and, by comparing each pixel of the original image with the filtered image (using PIL), I now know which RGB value in the original image becomes which RGB value in the filtered image. For example, RGB(0,0,1) might become RGB(2,3,81), and I know this mapping for all 16,777,216 colors.
Assuming what I did is correct, my question is how I can turn this data into a filter that can be used in Flutter apps. One option is conditional statements, but that is only theoretical, since I would have to write 16,777,216 of them for this single filter. Is there any software, program, or code I can use to build a filter from this data that works in a Flutter app? This is important because that is ultimately where I want to use the filter.
Any help would be much appreciated.
Thank you very much.
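One way to use that data without writing 16,777,216 conditionals would be to store the mapping as a flat lookup table indexed by the packed original RGB value. Here is a rough sketch in Swift (the same idea ports to Dart for Flutter); the table layout and the tightly packed RGBA pixel format are assumptions.

// Sketch: apply a reverse-engineered colour mapping as a lookup table.
// `table` has one UInt32 entry (0xRRGGBB) per possible original RGB colour.
func applyFilter(to pixels: inout [UInt8], using table: [UInt32]) {
    precondition(table.count == 1 << 24, "one entry per possible 24-bit RGB colour")
    // Pixels are assumed to be tightly packed RGBA, 4 bytes per pixel.
    for i in stride(from: 0, to: pixels.count, by: 4) {
        let key = (Int(pixels[i]) << 16) | (Int(pixels[i + 1]) << 8) | Int(pixels[i + 2])
        let mapped = table[key]
        pixels[i]     = UInt8((mapped >> 16) & 0xFF)   // new red
        pixels[i + 1] = UInt8((mapped >> 8) & 0xFF)    // new green
        pixels[i + 2] = UInt8(mapped & 0xFF)           // new blue
        // alpha (i + 3) left unchanged
    }
}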

How to train a CNN to differentiate between HTML objects (foreground) and the background in a screenshot of a webpage using Caffe?

I am working on a problem where I am trying to train a neural network to detect various HTML objects, such as text boxes, radio buttons, buttons, and dropdown lists, in a given screenshot of a webpage. I supply my CNN with patches generated by a sliding-window operation over the 1,500 training images. The label set is a 5-channel matrix for 5 classes of objects (the background is labelled class 0; the other object regions are labelled classes 1 to 4).
I tried applying a conv-deconv architecture to this training set using Caffe. The problem, as far as I can tell, is that there is a strong bias towards class 0 in the output, because most of the area in my sliding-window patches is background. As a result, the network classifies every pixel as class 0 (background) and fails to detect the HTML objects of classes 1 to 4 in the test images I supply.
Any idea how to work around this problem?
This problem is present in a lot of real-world datasets as well.
One way of solving it is to present your non-background data (classes 1, 2, 3, ...) to the neural network more often than you present the background data. This can be done by artificially duplicating the classes you have fewer samples of.
You could also set the following in your Caffe loss layer:
ignore_label: 0
That might help.
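As a rough illustration of the duplication idea (language-agnostic; shown here in Swift, with a made-up samplesByClass structure that maps each class label to the file paths of its training patches):

// Sketch: oversample the rare classes so each class appears roughly as often
// as the most frequent one. The structure of `samplesByClass` is an assumption.
func oversample(_ samplesByClass: [Int: [String]]) -> [(path: String, label: Int)] {
    guard let maxCount = samplesByClass.values.map({ $0.count }).max(), maxCount > 0 else {
        return []
    }
    var balanced: [(path: String, label: Int)] = []
    for (label, paths) in samplesByClass where !paths.isEmpty {
        // Repeat each class until it roughly matches the largest class.
        let repeats = Int((Double(maxCount) / Double(paths.count)).rounded(.up))
        for _ in 0..<repeats {
            for path in paths {
                balanced.append((path: path, label: label))
            }
        }
    }
    return balanced.shuffled()   // shuffle so the duplicates are not back to back
}

The resulting list could then be used to build whatever training list or database you feed to Caffe.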

Fully Convolutional Networks with varied inputs

I have a fully convolutional neural network, U-Net, described in the paper below.
https://arxiv.org/pdf/1505.04597.pdf
I want to use it for pixelwise classification of images. My training images come in two sizes: 512x512 and 768x768. In the initial step I use reflection padding of size (256,256,256,256) for the former and (384,384,384,384) for the latter, and I keep padding before the subsequent convolutions so that the output has the same size as the input.
But since my padding depends on the input image's size, I can't build a generalised model (I am using Torch).
How is the padding done in such cases?
I am new to deep learning, any help would be great. Thanks.
Your model will only accept images of the size of the first layer. You have to pre-process all of them before forwarding them to the network. In order to do so, you can use:
image.scale(img, width, height, 'bilinear')
img is the image to scale; width and height are the input size of the first layer of your model (if I'm not mistaken it is 572x572); 'bilinear' is the interpolation algorithm used to scale the image.
Keep in mind that it might also be necessary to subtract the mean of the image or to convert it to BGR (depending on how the model was trained).
The first thing to do is to process all of your images to be the same size. The CONV layer input requires all images to be of the specified dimensions.
Caffe lets you reshape within the prototxt file; in Torch, I think there is a comparable command you can drop at the front of createModel, but I don't recall its name. If not, you'll need to do the resizing outside the model flow.
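As a sketch of doing that resizing outside the model flow (shown in Swift with Core Graphics purely to illustrate the pre-processing step; 572x572 is the U-Net input size mentioned above and should be adjusted to whatever your first layer expects):

import CoreGraphics

// Sketch: scale an image to the network's fixed input size before feeding it.
func resized(_ image: CGImage, to side: Int = 572) -> CGImage? {
    guard let context = CGContext(
        data: nil,
        width: side,
        height: side,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }
    context.interpolationQuality = .medium   // roughly bilinear resampling
    context.draw(image, in: CGRect(x: 0, y: 0, width: side, height: side))
    return context.makeImage()
}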

Perl - Ratio of homogeneous areas of an image

I would like to check whether an image has a lot of homogeneous areas. Therefore I would like to compute some kind of value for an image that rates it according to the number and size of its homogeneous areas (e.g. the value could range from 0 to 5).
Instead of a value there could be some kind of classification as well.
[many homogeneous areas -> value/class 5 ; few homogeneous areas -> value/class 0]
I would like to do that in Perl. Is there a package or function for this, or something like it?
What you want seems to be an area of image processing research which I am not familiar with. However, GraphicsMagick's mogrify utility has a -segment option:
Use -segment to segment an image by analyzing the histograms of the color components and identifying units that are homogeneous with the fuzzy c-means technique. The scale-space filter analyzes the histograms of the three color components of the image and identifies a set of classes. The extents of each class is used to coarsely segment the image with thresholding. The color associated with each class is determined by the mean color of all pixels within the extents of a particular class. Finally, any unclassified pixels are assigned to the closest class with the fuzzy c-means technique.
I don't know if this is any use to you. You might have to hit the library on this one, and read some research. You do have access to this through PerlMagick as well. However, it does not look like it gives access to the internals, but just produces an image based on parameters.
In my tests (without really understanding what the parameters do), photos turned entirely black, whereas PNG images with large areas of similar colors were reduced to a sort of an average color. Whether you can use that fact to develop a measure is an open question I am not going to investigate ;-)

Algorithm for laying out images of different sizes in a grid-like way

I'm trying to lay out images in a grid, with a few featured ones being 4x as big.
I'm sure it's a well-known layout algorithm, but I don't know what it is called.
The effect I'm looking for is similar to the screenshot shown below. Can anyone point me in the right direction?
UPDATED
To be more specific, let's limit it to the case where there are only the two sizes shown in the example. There can be an infinite number of items, with a set margin between them. Hope that clarifies things.
There is a well-known layout algorithm called treemapping, which is perhaps a bit too generic for your specific problem with some images being 4x as big, but could still be applicable particularly if you decide you want to have arbitrary sizes.
There are several different rectangular treemap algorithms, any of which could be used to visualise photos. Here is a nice example, which uses the strip algorithm to lay out photos with each size proportional to the rating of the photo.
This problem can also be solved with a heatmap or a treemap. Heatmaps often use space-filling curves: a heatmap reduces the 2D complexity to a 1D complexity, and it looks like a quadtree. You may want to look at Nick's Hilbert curve quadtree spatial index blog.
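If you really only ever have the two sizes from your update, a full treemap may be more machinery than you need; a greedy first-fit grid packer is enough. Below is a rough sketch in Swift; the column count, cell size, margin, and the Placement type are all made up for illustration.

import CoreGraphics

struct Placement {
    let index: Int      // index of the image in the input order
    let frame: CGRect   // where to draw it
}

// Greedy first-fit packer for the two-size case: regular items take a 1x1 cell,
// featured items take a 2x2 block. Items are placed in the first free spot,
// scanning row by row.
func layout(featured: [Bool],
            columns: Int,
            cell: CGFloat = 100,
            margin: CGFloat = 8) -> [Placement] {
    precondition(columns >= 2, "need at least two columns for the 2x2 items")
    var occupied = Set<[Int]>()   // cells already taken, stored as [row, column]
    var placements: [Placement] = []

    func isFree(row: Int, col: Int, span: Int) -> Bool {
        for r in row..<(row + span) {
            for c in col..<(col + span) where occupied.contains([r, c]) {
                return false
            }
        }
        return true
    }

    for (index, isFeatured) in featured.enumerated() {
        let span = isFeatured ? 2 : 1
        var row = 0
        search: while true {
            for col in 0...(columns - span) where isFree(row: row, col: col, span: span) {
                // Mark the cells as taken and record the item's frame.
                for r in row..<(row + span) {
                    for c in col..<(col + span) { occupied.insert([r, c]) }
                }
                let side = cell * CGFloat(span) + margin * CGFloat(span - 1)
                let origin = CGPoint(x: CGFloat(col) * (cell + margin),
                                     y: CGFloat(row) * (cell + margin))
                placements.append(Placement(index: index,
                                            frame: CGRect(origin: origin,
                                                          size: CGSize(width: side, height: side))))
                break search
            }
            row += 1   // no room in this row, try the next one
        }
    }
    return placements
}

For example, layout(featured: [true, false, false, false, false], columns: 4) places the first image as a 2x2 tile at the top left and flows the remaining 1x1 tiles into the free cells around it.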