Medical image segmentation with multi-modality learning - image-segmentation

I want to diagnose a disease by segmenting three x-rays of the same patient. From the papers I have read, there is an approach called multi-modality learning, which can be done in three ways: early fusion, layer fusion, and late fusion, with layer fusion generally performing better than the other two. I want to detect two types of lesions on two of the images, and the same lesions plus an additional one on the third image. Since my goal is for the algorithm to consider all three images as belonging to one patient, what is the appropriate method for this?
P.S.: One image covers the whole region, and the other two are composed of parts of that total image.
Two of the images are the same type and the third is different, which is another challenge for me: how can I handle their differences during training?
Thank you.
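For illustration, a layer (feature-level) fusion model for three inputs could look roughly like the sketch below. This is a minimal PyTorch illustration with made-up layer sizes, not a prescription from any paper; sharing one encoder between the two same-type images while giving the different image its own encoder is just one possible way to handle the modality difference, and in practice the part images would need to be resized or registered against the total image first.

```python
# Minimal sketch of feature-level ("layer") fusion for three x-ray inputs,
# assuming PyTorch; channel sizes and layer names are illustrative only.
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class ThreeBranchFusionSeg(nn.Module):
    """One encoder per branch; features are concatenated (layer fusion)
    and decoded into a per-pixel lesion mask."""
    def __init__(self, n_classes=3, feat_ch=32):
        super().__init__()
        # The two same-type images share an encoder; the different image gets its own.
        self.enc_same = SmallEncoder(feat_ch=feat_ch)
        self.enc_other = SmallEncoder(feat_ch=feat_ch)
        self.decoder = nn.Sequential(
            nn.Conv2d(3 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(feat_ch, n_classes, 1),   # per-pixel class logits
        )
    def forward(self, img_a, img_b, img_c):
        fa = self.enc_same(img_a)    # part image 1
        fb = self.enc_same(img_b)    # part image 2 (same modality)
        fc = self.enc_other(img_c)   # total / different image
        fused = torch.cat([fa, fb, fc], dim=1)
        return self.decoder(fused)

model = ThreeBranchFusionSeg()
x = torch.randn(2, 1, 128, 128)      # dummy batch of single-channel x-rays
print(model(x, x, x).shape)          # -> torch.Size([2, 3, 128, 128])
```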

Related

Would the division of classes into subclasses increase the predictive accuracy of image classification?

If the images within a class vary a lot, should the class be further subdivided into subclasses in which the images are more similar, or is that unnecessary because the network can learn the different features anyway?
For example, images should be classified into 3 classes: flowers, cars, and chairs. Is it then okay to pack all the different flowers into one flower class, or would it be better to further subdivide the class into daisies, tulips, dahlias, asters, and so on? If an aster is found, then I also know that it is a flower.
It depends on your problem.
What do you need to predict? If you need only the information whether the input is a flower, a car, or a chair, then there is no need to subdivide it.
However, it is always a good idea to do experiments and see the results.
Train some models using the subdivided classes and others with the target as-is, and compare their performance.
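To make that comparison fair, the subclass model should be evaluated on the coarse task it ultimately has to solve. A hedged sketch of that experiment is below; `train_model`, the label names, and the mapping are placeholders, not a real API.

```python
# Map hypothetical subclass predictions back to coarse classes for evaluation.
SUB_TO_COARSE = {
    "daisy": "flower", "tulip": "flower", "dahlia": "flower", "aster": "flower",
    "sedan": "car", "truck": "car",
    "armchair": "chair", "stool": "chair",
}

def coarse_accuracy(sub_predictions, coarse_targets):
    """Evaluate a subclass model on the coarse (flower/car/chair) task."""
    mapped = [SUB_TO_COARSE[p] for p in sub_predictions]
    correct = sum(m == t for m, t in zip(mapped, coarse_targets))
    return correct / len(coarse_targets)

# Pseudo-usage, assuming trained models and a held-out test set:
# coarse_model = train_model(images, coarse_labels)
# sub_model    = train_model(images, sub_labels)
# acc_coarse   = accuracy(coarse_model.predict(test_images), test_coarse_labels)
# acc_sub      = coarse_accuracy(sub_model.predict(test_images), test_coarse_labels)
```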

Siamese networks: Why does the network need to be duplicated?

The DeepFace paper from Facebook uses a Siamese network to learn a metric. They say that the DNN that extracts the 4096-dimensional face embedding has to be duplicated in a Siamese network, but both duplicates share weights. But if they share weights, every update to one of them will also change the other. So why do we need to duplicate them?
Why can't we just apply one DNN to two faces and then do backpropagation using the metric loss? Do they maybe mean this and just talk about duplicated networks for "better" understanding?
Quote from the paper:
We have also tested an end-to-end metric learning approach, known as Siamese network [8]: once learned, the face recognition network (without the top layer) is replicated twice (one for each input image) and the features are used to directly predict whether the two input images belong to the same person. This is accomplished by: a) taking the absolute difference between the features, followed by b) a top fully connected layer that maps into a single logistic unit (same/not same). The network has roughly the same number of parameters as the original one, since much of it is shared between the two replicas, but requires twice the computation. Notice that in order to prevent overfitting on the face verification task, we enable training for only the two topmost layers.
Paper: https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf
The short answer is that yes, I think that looking at the architecture of the network will help you understand what is going on. You have two networks that are "joined at the hip" i.e. sharing weights. That's what makes it a "Siamese network". The trick is that you want the two images you feed into the network to pass through the same embedding function. So to ensure that this happens both branches of the network need to share weights.
Then we combine the two embeddings into a metric loss (commonly a "contrastive loss"). We can back-propagate as normal; we just have two input branches available so that we can feed in two images at a time.
I think a picture is worth a thousand words, so it helps to look at how a Siamese network is constructed, at least conceptually.
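A small code sketch makes the same point: there is only one embedding module, applied to both images, and the contrastive loss (in the style of Hadsell et al.) compares the two embeddings. This is a conceptual PyTorch sketch with illustrative layer sizes, not the DeepFace architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedder(nn.Module):
    """The ONE network; it is only "duplicated" logically by calling it twice."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, dim)
    def forward(self, x):
        return self.fc(self.conv(x))

def contrastive_loss(z1, z2, same, margin=1.0):
    """same = 1 for matching pairs, 0 otherwise."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

embed = Embedder()
x1, x2 = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(embed(x1), embed(x2), labels)
loss.backward()   # gradients from both branches flow into the same shared weights
```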
The gradients depend on the activation values, so the gradients computed in each branch will be different; the update applied to the shared weights is then based on combining (summing or averaging) the contributions from both branches, which keeps the two copies identical.

Generating Images From Dataset Of Images Using A Neural Network

I'm not looking for a chunk of code as a solution, just the name of the model I'd need to implement or some links would be nice.
My problem is that I have a dataset I've made of a few hundred 128x128 images (abstract paintings). I'd like to simply generate more images similar to these using a neural network (preferably with no input needed for the network, except maybe random values?), but I'm unclear on how I'd go about this.
One solution I've thought about but haven't tried yet is building an LSTM network, turning the paintings into 1D arrays of pixel values, and feeding the arrays to the network (LSTM networks are really good at learning sequences) - but if I wanted to work with larger images, this might not be very practical.
Any info is greatly appreciated. Thanks!
GANs (generative adversarial networks) would be appropriate in this case. A GAN trains two separate neural networks against each other (a generator and a discriminator) and, when properly trained, the generator can produce new images (a process sometimes described as "hallucinating") that are similar to a collection of known images.
There are many examples of using GANs to generate new images of digits from the canonical MNIST dataset; naturally, you can replace MNIST with your abstract paintings.
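As a rough illustration (not a tuned recipe), a bare-bones DCGAN-style pair of networks for 128x128 RGB images could be set up like this in PyTorch; all layer sizes here are assumptions for the sketch.

```python
import torch
import torch.nn as nn

latent_dim = 100

# Generator: random noise vector -> 128x128 image in [-1, 1].
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),            # 8x8
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),             # 16x16
    nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.BatchNorm2d(16), nn.ReLU(),             # 32x32
    nn.ConvTranspose2d(16, 8, 4, 2, 1), nn.BatchNorm2d(8), nn.ReLU(),               # 64x64
    nn.ConvTranspose2d(8, 3, 4, 2, 1), nn.Tanh(),                                   # 128x128
)

# Discriminator: image -> single real/fake logit.
discriminator = nn.Sequential(
    nn.Conv2d(3, 16, 4, 2, 1), nn.LeakyReLU(0.2),    # 64x64
    nn.Conv2d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2),   # 32x32
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # 16x16
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),  # 8x8
    nn.Conv2d(128, 1, 8), nn.Flatten(),              # 1 logit per image
)

z = torch.randn(4, latent_dim, 1, 1)
fake = generator(z)            # -> (4, 3, 128, 128)
score = discriminator(fake)    # -> (4, 1); train both nets with BCEWithLogitsLoss
print(fake.shape, score.shape)
```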

Face Recognition based on Deep Learning (Siamese Architecture)

I want to use a pre-trained model for face identification. I am trying to use a Siamese architecture, which only requires a small number of images. Could you suggest a trained model that I can adapt into a Siamese architecture? How can I change the network so that I can feed in two images and measure their similarity (I do not want to generate images as in the tutorial here)? I only want to use the system for a real-time application. Do you have any recommendations?
I suppose you can use this model, described in Xiang Wu, Ran He, Zhenan Sun, Tieniu Tan, A Light CNN for Deep Face Representation with Noisy Labels (arXiv 2015), as a starting point for your experiments.
As for the Siamese network, what you are trying to learn is a mapping from a face image into some high-dimensional vector space, in which distances between points reflect (dis)similarity between faces.
To do so, you only need one network that takes a face as input and produces a high-dimensional vector as output.
However, to train this single network using the Siamese approach, you are going to duplicate it: you create two instances of the same net (and explicitly link the weights of the two copies). During training you provide pairs of faces to the nets, one to each copy; the single loss layer on top of the two copies then compares the high-dimensional vectors representing the two faces and computes a loss according to the "same/not same" label associated with the pair.
Hence, you only need the duplication for training. At test time ("deploy") you have a single net providing you with a semantically meaningful high-dimensional representation of faces.
For a more advanced Siamese architecture and loss, see this thread.
On the other hand, you might want to consider the approach described in Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua, Learning a Metric Embedding for Face Recognition using the Multibatch Method (arXiv 2016). This approach is more efficient and easier to implement than pair-wise losses over image pairs.
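At deploy time the pipeline reduces to "embed both faces with the single network, then threshold a distance", roughly as in the sketch below. This assumes PyTorch; `face_embedder` is a placeholder for whatever pre-trained model you adapt, and the threshold is something you would calibrate on validation pairs.

```python
import torch
import torch.nn.functional as F

def same_person(face_embedder, face_a, face_b, threshold=0.6):
    """Return True if the two face crops appear to belong to the same identity."""
    with torch.no_grad():
        za = F.normalize(face_embedder(face_a.unsqueeze(0)), dim=1)
        zb = F.normalize(face_embedder(face_b.unsqueeze(0)), dim=1)
    # Cosine distance between L2-normalized embeddings; smaller means more similar.
    return (1.0 - (za * zb).sum()).item() < threshold
```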

Convolutional Neural Network for time-dependent features

I need to do dimensionality reduction from a series of images. More specifically, each image is a snapshot of a ball moving, and the optimal features would be its position and velocity. As far as I know, CNNs are the state of the art for reducing the features for image classification, but in that case only a single frame is provided. Is it possible to also extract time-dependent features given many images at different time steps? Otherwise, what are the state-of-the-art techniques for doing so?
This is the first time I am using CNNs, and I would also appreciate any references or other suggestions.
If you want to be able to have the network somehow recognize a progression which is time dependent, you should probably look into recurrent neural nets (RNN). Since you would be operating on video, you should look into recurrent convolutional neural nets (RCNN) such as in: http://jmlr.org/proceedings/papers/v32/pinheiro14.pdf
Recurrence adds some memory of a previous state of the input data. See this good explanation by Karpathy: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
In your case you need to have the recurrence across multiple images instead of just within one image. It would seem like the first problem you need to solve is the image segmentation problem (being able to pick the ball out of the rest of the image) and the first paper linked above deals with segmentation. (then again, maybe you're trying to take advantage of the movement in order to identify the moving object?)
Here's another thought: perhaps you could only look at differences between sequential frames and use that as your input data to your convnet? The input "image" would then show where the moving object was in the previous frame and where it is in the current one. Larger differences would indicate larger amounts of movement. That would probably have a similar effect to using a recurrent network.
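A rough sketch of that frame-differencing idea, assuming PyTorch: the input to the convnet is the difference between consecutive frames, which encodes where the ball was and where it has moved to. The shapes, layer sizes, and 4-dimensional output (e.g. position and velocity, if you have supervision for them) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

frames = torch.randn(10, 1, 64, 64)   # dummy grayscale video: 10 frames of 64x64
diffs = frames[1:] - frames[:-1]      # 9 "motion images" between consecutive frames

feature_net = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 4),                 # e.g. (x, y, vx, vy) if trained with such targets
)

print(feature_net(diffs).shape)       # -> torch.Size([9, 4])
```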