What is the relationship between instance segmentation and semantic segmentation from the perspective of neural networks?

I am clear about the tasks of instance segmentation and semantic segmentation. However, from the perspective of the neural networks themselves, what is the relationship between them? Namely, is it feasible to realize instance segmentation by improving or modifying a neural network for semantic segmentation, e.g. DeepLab? If so, what operations are usually used? Many thanks.

Let's assume you want to know where a desired class appears in an image. You then build a neural network that predicts, for each pixel, the probability that the pixel belongs to that class: this is semantic segmentation.
When you additionally want to know where each individual instance of that class is (for example, separating two overlapping cars from each other), that is instance segmentation.
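To make the difference concrete, here is a minimal NumPy sketch of the two output representations; all shapes and class indices are made up for illustration:

    import numpy as np

    H, W, C = 256, 256, 21   # image height/width and number of classes (made up)

    # Semantic segmentation: one fixed-size map of per-pixel class probabilities.
    scores = np.random.rand(H, W, C)                      # stand-in for network output
    probs = scores / scores.sum(axis=-1, keepdims=True)   # normalize per pixel
    class_map = probs.argmax(axis=-1)                     # (H, W) map of class indices

    # Instance segmentation: a variable-length list of binary masks, one per
    # object instance, each with its own class label (here: three "car" instances).
    instance_masks = [np.zeros((H, W), dtype=bool) for _ in range(3)]
    instance_labels = [2, 2, 2]   # hypothetical class index 2 = "car"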

Related

How do self-driving cars using vision detection systems handle the N possibilities as inputs?

I understand that convolutional neural networks can be used for this problem, but if you look at videos of self-driving cars, like Tesla Autopilot, they still use vision detection and labeling systems as input for their neural networks. I am wondering how self-driving cars handle having N possible detected objects, where for each detected object there is a varying amount of information to input. As a neural network structure is very rigid, I would imagine this would cause a problem. Any explanation would be greatly helpful; if you have a scientific paper, that would be very appreciated!
These networks do not output a class label such as car, person or sidewalk, but rather a probability distribution over N objects. The final decision is made later, basically taking the highest-rated object in terms of probability as the prediction. The model is trained on lots of images and, as you said, all of these images contain a varying number of objects. But since the model itself outputs probabilities for all N objects regardless of the number of objects in the input, this is already something the model is trained for: it learns to output probabilities close to 0 for object types that are not present in the image.
Since this is something they are trained for, they can also do it during inference. Of course, some problems might occur if a certain object type is very rare in the data, but that is a class imbalance issue.
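For instance, a minimal sketch of such a fixed-size probability output; the class count here is illustrative, not taken from any real self-driving system:

    import numpy as np

    N_CLASSES = 10   # the network always scores the same fixed set of object types

    def decide(logits):
        """Softmax over the fixed-size output, then take the top class."""
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()              # always a distribution over N_CLASSES
        return probs, int(probs.argmax())

    logits = np.random.randn(N_CLASSES)   # stand-in for the network's raw output
    probs, best = decide(logits)
    # A trained model learns to push the probabilities of absent object types
    # toward 0, so the fixed-size output works for any number of objects in view.
    print(best, probs[best])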

How to evaluate the quality of a neural network for object detection in Keras?

I've already trained a neural network in Keras to detect two classes of images (cats and dogs) and measured its accuracy on test data. Is that enough for the conclusion of a master's thesis, or should I take other steps to evaluate the quality of the network (for instance, cross-validation)?
Not really; I would expect more than just accuracy from my students in any classification setup. Accuracy only evaluates that particular network on that particular test set, but you would also have to justify, to some extent, the design choices you made in building the network. Here are some things to consider:
Presumably you have fixed some hyper-parameters; you can investigate how these affect your results. How many filters? How many layers? And, most importantly, why?
An important aspect of object classification is how your model handles noise. Depending on your dataset, one simple test is to pre-process the test data (blur it, invert colours, etc.), and you'll see that your performance drops. Why does it do that? What does the confusion matrix look like then?
What is the performance of the network? Is it fast or slow compared to another system, say VGG?
When you evaluate your project in general, not just the network, asking why things worked helps a lot, not just why things didn't work.
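As a minimal sketch of going beyond a single accuracy number: here model, x_test and y_test are placeholders for your trained Keras model (compiled with an accuracy metric) and your test arrays, not code from the question:

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.metrics import classification_report, confusion_matrix

    # Per-class precision/recall and the confusion matrix, not just accuracy.
    y_prob = model.predict(x_test)                # 'model', 'x_test', 'y_test' are placeholders
    y_pred = (y_prob > 0.5).astype(int).ravel()   # threshold for a binary cats-vs-dogs output
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))

    # Simple robustness check: blur the test images and watch the accuracy drop.
    x_blur = np.stack([gaussian_filter(img, sigma=(2, 2, 0)) for img in x_test])
    print("accuracy on blurred data:", model.evaluate(x_blur, y_test)[1])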

Why do we need CNNs for object detection?

I want to ask a general question. Nowadays deep learning, especially the convolutional neural network (CNN), is used in every field. Sometimes a CNN is not necessary for the problem, but researchers use it anyway, following the trend.
So, is object detection the kind of problem where a CNN is really needed?
That is an awkward question: the title asks about CNNs, but the body asks about deep learning in general.
We don't necessarily need deep learning for object recognition, but trained deep networks get better results, and companies like Google are thankful for every percent of improvement.
As for CNNs, they get better results than "traditional" fully connected ANNs and also have fewer parameters because of weight sharing. CNNs also allow transfer learning: you take the feature detector (the convolution and pooling layers) and connect your own fully connected layers on top of it, as in the sketch below.
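A minimal Keras sketch of that kind of transfer learning; the choice of VGG16 and the layer sizes here are just illustrative assumptions:

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras import layers, models

    # Take the pretrained feature detector: convolution and pooling layers only.
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                      # reuse the learned filters as-is

    # Attach your own fully connected layers on top of the feature detector.
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(2, activation="softmax"),  # e.g. two classes of your own task
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])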
A key concept of CNNs is the idea of translational invariance. In short, using a convolutional kernel on an image allows the machine to learn a set of weights for a specific feature (an edge, or a much more detailed object, depending on the depth of the network) and apply it across the entire image.
Consider detecting a cat in an image. If we learned some set of weights that allowed the model to recognize a cat, we would like those weights to be the same no matter where the cat is in the image! So we would "assign" a filter in a convolutional layer to detecting cats, and then convolve it over the entire image.
Whatever the reason for the recent successes of CNNs, it should be noted that regular fully connected ANNs could, in principle, perform just as well. The problem is that they quickly become computationally infeasible on larger images, whereas CNNs are much more efficient due to parameter sharing, as the comparison below shows.
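A quick back-of-the-envelope calculation makes the parameter-sharing point concrete (the layer sizes are arbitrary):

    # Fully connected: every one of 1000 units sees every pixel of a 224x224 RGB image.
    fc_params = (224 * 224 * 3) * 1000    # about 150 million weights

    # Convolutional: 64 filters of size 3x3 shared across the whole image.
    conv_params = (3 * 3 * 3) * 64 + 64   # weights + biases, under 2 thousand

    print(fc_params, conv_params)         # 150528000 vs 1792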

Neural network: "InverseLayer"

I play around with neural networks. I understand how convolutional layers, fully connected layers and many other things work. I also know what a gradient is and how such a network is trained.
The framework lasagne contains a layer called InverseLayer.
The InverseLayer class performs inverse operations for a single layer of a neural network by applying the partial derivative of the layer to be inverted with respect to its input.
I do not know what this means or when I should use this layer in general. What is the idea behind inverting a layer via its partial derivative?
Thank you very much.
The InverseLayer is needed when creating a deconvolution network. For more details take a look here: http://cvlab.postech.ac.kr/research/deconvnet/
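For illustration, a minimal lasagne sketch of that idea: build a small encoder and mirror it with InverseLayer (the shapes and filter counts are arbitrary):

    from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                                InverseLayer)

    # Encoder: conv + pool, as in an ordinary CNN.
    l_in = InputLayer(shape=(None, 3, 64, 64))
    l_conv = Conv2DLayer(l_in, num_filters=16, filter_size=3)
    l_pool = MaxPool2DLayer(l_conv, pool_size=2)

    # Decoder: each InverseLayer applies the partial derivative of the layer it
    # inverts with respect to that layer's input, i.e. the transposed operation.
    l_unpool = InverseLayer(l_pool, l_pool)    # "unpooling" of the max-pool layer
    l_deconv = InverseLayer(l_unpool, l_conv)  # transposed convolution of the conv layer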

How do I use a pre-trained Caffe model?

I have some questions about how to actually interact with a pre-trained Caffe model. In my case I'm using a model for scene recognition.
In the Caffe git repository, there are some code examples in Python and C++ of image classifier implementations. However, those do not apply to my use case (since they only classify the input image as ONE class).
My goal is an application that takes an input image (jpg) and outputs the highest predicted class label for each pixel in the input image (i.e., indices for sky, beach, road, car).
Could anyone give me some pointers on how to proceed?
There already seem to exist implementations for this. This demo (http://places.csail.mit.edu/demo.html) is kind of what I want.
Thank you!
What you are looking for is not image classification, but rather semantic segmentation.
A recent work by Jonathan Long, Evan Shelhamer and Trevor Darrell ("Fully Convolutional Networks for Semantic Segmentation") is based on Caffe, and can be found here. It uses a fully convolutional network, that is, a network with no "InnerProduct" layers, only convolutional layers, and is therefore capable of producing outputs of different sizes for different sizes of inputs.
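As a rough sketch, per-pixel inference with such a fully convolutional model through the Caffe Python interface could look like this; the file names and the "data"/"score" blob names are assumptions that depend on the particular model definition:

    import numpy as np
    import caffe

    # Hypothetical file names; use the deploy prototxt and weights of your model.
    net = caffe.Net("fcn_deploy.prototxt", "fcn_weights.caffemodel", caffe.TEST)

    im = caffe.io.load_image("input.jpg")            # HxWx3, RGB, values in [0, 1]
    im = im.transpose(2, 0, 1)[np.newaxis] * 255.0   # to 1x3xHxW; real models usually
                                                     # also need BGR order and mean subtraction
    net.blobs["data"].reshape(*im.shape)             # FCNs accept variable input sizes
    net.blobs["data"].data[...] = im

    net.forward()
    # The output blob (often called "score") holds per-pixel class scores (1xCxHxW);
    # the argmax over the class axis gives the highest predicted label per pixel.
    label_map = net.blobs["score"].data[0].argmax(axis=0)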