Back-propagation with multiple top layers in Caffe - neural-network

I am new to Caffe. I have a question about back-propagation with multiple top layers, as the following shows:
My question is: what is the derivative of x_i with respect to x_{i-1}?
Thanks in advance.
xiaox
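A general note, assuming the usual Caffe setup (the exact topology isn't shown): when one blob x_{i-1} is consumed by several layers, each consumer k has its own layer Jacobian ∂x_i^{(k)}/∂x_{i-1}, and the backward pass sums the gradient contributions over all consumers; Caffe inserts a Split layer that performs exactly this accumulation:

∂L/∂x_{i-1} = Σ_k (∂x_i^{(k)}/∂x_{i-1})^T · ∂L/∂x_i^{(k)}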

Related

How to create your own Autoencoder in Matlab?

I am trying to make an autoencoder that will work on the ORL dataset. I have the images ready as vectors (1024 × 400), and I was thinking of making an autoencoder with a linear (fully connected) layer.
Of course, with the help of the Internet and a little searching, you can come across the trainAutoencoder function.
network = trainAutoencoder(fea, 512)
But with this function I can't make an autoencoder with multiple layers. By googling, I found the stacked autoencoder, which solves that problem. But I still have a few questions here, such as how to change the activation function (for example, to ReLU) instead of the sigmoid that comes automatically.
autoenc1 = [featureInputLayer(32*32)               % 1024-dimensional input vector
    fullyConnectedLayer(16*16, "Name", "fc_1")     % encoder
    reluLayer("Name", "relu_1")
    fullyConnectedLayer(8*8, "Name", "fc_2")       % bottleneck (64 units)
    fullyConnectedLayer(16*16, "Name", "fc_3")     % decoder
    reluLayer("Name", "relu_2")
    fullyConnectedLayer(32*32, "Name", "fc_4")     % reconstruction
    classificationLayer("Name", "classoutput")]
Is it possible to write an autoencoder in this way? I know the classification output doesn't make sense for an unsupervised network, but MATLAB was forcing me to set something up. Is it possible to make an autoencoder using Deep Network Designer?

Can I train Word2vec using a Stacked Autoencoder with non-linearities?

Every time I read about Word2vec, the embedding is obtained with a very simple Autoencoder: just one hidden layer, linear activation for the initial layer, and softmax for the output layer.
My question is: why can't I train some Word2vec model using a stacked Autoencoder, with several hidden layers with fancier activation functions? (The softmax at the output would be kept, of course.)
I never found any explanation about this, therefore any hint is welcome.
Word vectors are nothing but the hidden states of a neural network trying to get good at something.
To answer your question: of course you can.
If you are going to do it, why not use fancier networks/encoders as well, such as a BiLSTM or Transformers?
This is what the people who created things like ELMo and BERT did (though their networks were a lot fancier).
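To illustrate, a minimal sketch of my own (in Keras; vocab_size, embed_dim, and the training pairs are placeholders, not part of word2vec itself): a skip-gram-style model with one extra non-linear hidden layer, keeping the softmax output as the asker suggests.

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim = 10000, 128  # placeholder sizes

model = keras.Sequential([
    keras.Input(shape=(1,)),                         # index of the center word
    layers.Embedding(vocab_size, embed_dim),         # the "linear" embedding layer
    layers.Flatten(),
    layers.Dense(embed_dim, activation="relu"),      # the extra non-linearity in question
    layers.Dense(vocab_size, activation="softmax"),  # softmax output kept
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Train on (center_word, context_word) index pairs; afterwards the Embedding
# weights (or the hidden activations) serve as the word vectors.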

Keras - Linear stack of layers?

I started to follow this "guide" to learn how to make a neural network, but I'm already stuck at the first sentence:
https://keras.io/getting-started/sequential-model-guide/
What the hell is a LINEAR stack of layers?
Does it mean the derivative of the stack is a constant? (Kidding, but I'm getting really frustrated by guides that don't define what they're saying.)
A linear stack is a model without any branching: every layer has exactly one input and one output, and the output of one layer is the input of the next layer.
Stacks which are not linear can have layers with multiple inputs and outputs, and thus complex connections between layers.
The term linear stack is used to mean that there is no funny business going on, e.g. recurrence (connections that go backwards) or residual connections (connections that skip layers). The connections between neurons go from one layer to the next, no more, no less.
Fully connected feed-forward layers are an example of a linear stack.
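For illustration, a minimal sketch using the current tf.keras API (the linked guide uses the older standalone keras package): a linear stack built with Sequential, and a non-linear one that needs the functional API.

from tensorflow import keras
from tensorflow.keras import layers

# Linear stack: each layer feeds exactly the next one.
linear_stack = keras.Sequential([
    keras.Input(shape=(32,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Non-linear stack: a branch and a merge require the functional API.
inp = keras.Input(shape=(32,))
a = layers.Dense(64, activation="relu")(inp)  # first path
b = layers.Dense(64)(inp)                     # second path from the same input
out = layers.Dense(10, activation="softmax")(layers.add([a, b]))
branched = keras.Model(inp, out)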

Is nnet package in R only used to fit a neural network with single hidden layer?

In the description of nnet on CRAN (https://cran.r-project.org/web/packages/nnet/nnet.pdf) it says that nnet fits a single hidden layer:
Description: Fit single-hidden-layer neural network, possibly with skip-layer connections
Is it possible for me to specify the number of hidden layers using nnet? My understanding was that the choice of hidden layers and the number of neurons per hidden layer are the parameters that can be changed to improve a model. Is it true that adding or removing hidden layers can help a model? Or are there separate areas of application for single-layered and multi-layered neural networks?
I am new to ANNs. I am working on a classification model with a training sample of size 55000 x 54.
Thanks in advance!
Simple answer: NO. nnet always has a single hidden layer, for which you specify the number of nodes. You can find more information in a similar question here. You will need to use other packages such as neuralnet, or something more sophisticated like h2o or MXNet.
Regarding parameters to improve the model: there are many tunable parts of a neural network beyond the raw architecture (i.e. layers and nodes), including the optimization function, the activation functions, and the batch size, among many others. You will likely want to consult further resources on using neural networks.

Trying to find object coordinates (x,y) in image, my neural network seems to optimize error without learning [closed]

I generate images of a single coin pasted over a 200x200 white background. The coin is randomly chosen among 8 euro coin images (one for each denomination) and has:
random rotation;
random size (between fixed bounds);
random position (so that the coin is not cropped).
Here are two examples (center markers added). [Figure: two dataset examples]
I am using Python + Lasagne. I feed the color image into a neural network whose output layer is 2 fully connected linear neurons, one for x and one for y.
The targets associated to the generated coin images are the coordinates (x,y) of the coin center.
I have tried (following the "Using convolutional neural nets to detect facial keypoints" tutorial):
Dense architectures with various numbers of layers and units (500 max);
A convolutional architecture (with 2 dense layers before the output);
Sum or mean of squared differences (MSE) as the loss function;
Target coordinates in the original range [0, 199] or normalized to [0, 1];
Dropout layers between layers, with a dropout probability of 0.2.
I always used simple SGD, tuning the learning rate to get a nicely decreasing error curve.
I found that as I train the network, the error decreases until the output is always the center of the image, independent of the input: the network simply outputs the average of the targets. Since the coin positions are uniformly distributed over the image, this amounts to a plain minimization of the error. This is not the wanted behavior.
I have the feeling that the network is not learning, but is just adjusting the output coordinates to minimize the mean error against the targets. Am I right? How can I prevent this? I tried removing the bias of the output neurons, because I thought maybe only the bias was being tuned while all other parameters were driven to zero, but this didn't work.
Is it possible for a neural network alone to perform well at this task?
I have read that one can also train a net for present/not-present binary classification and then scan the image to find possible locations of objects. But I wondered whether it was possible using just the forward computation of a neural net.
Question: How can I prevent this [minimizing the mean error without actually learning]?
What needs to be done is to re-architect your neural net. A neural net just isn't going to do a good job of predicting X and Y coordinates directly. It can, though, create a heat map of where it detects a coin; said another way, you could have it turn your color picture into a "coin-here" probability map.
Why? Neurons are well suited to measuring probabilities, not coordinates. Neural nets are not the magic machines they are sold as; they really do follow the program laid out by their architecture. You'd have to lay out a pretty fancy architecture to have the neural net first build an internal representation of where the coins are, then another internal representation of their center of mass, then another that uses the center of mass and the original image size to somehow learn to scale the X coordinate, and then repeat the whole thing for Y.
Easier, much easier, is to create a convolutional coin detector that converts your color image into a matrix of probability-a-coin-is-here values, and then to use that output in your own hand-written code that turns the probability matrix into an X/Y coordinate, as in the sketch below.
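A minimal sketch of that hand-written post-processing step (my own illustration; the function name and the 2-D probability-map layout are assumptions):

import numpy as np

def heatmap_to_xy(prob_map):
    # prob_map: 2-D array of per-pixel "coin-here" probabilities (e.g. 200x200).
    # Option 1: the single most probable pixel.
    y_peak, x_peak = np.unravel_index(np.argmax(prob_map), prob_map.shape)
    # Option 2 (smoother): the probability-weighted center of mass.
    ys, xs = np.indices(prob_map.shape)
    w = prob_map / prob_map.sum()
    x_com, y_com = (xs * w).sum(), (ys * w).sum()
    return (x_peak, y_peak), (x_com, y_com)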
Question: Is it possible for a neural network alone to perform well at this task?
A resounding YES, as long as you set up the right neural net architecture (like the above), but it would probably be much easier to implement and faster to train if you broke the task into steps and only applied the neural net to the coin-detection step. A minimal model sketch follows.
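To make that architecture concrete, here is a minimal sketch (my own assumption, written with Keras rather than the asker's Lasagne): a small fully convolutional net that maps the 200x200 color image to a per-pixel coin probability, to be trained against target maps that mark the coin pixels.

from tensorflow import keras
from tensorflow.keras import layers

# Fully convolutional: the output keeps the 200x200 spatial layout.
heatmap_net = keras.Sequential([
    keras.Input(shape=(200, 200, 3)),
    layers.Conv2D(16, 5, padding="same", activation="relu"),
    layers.Conv2D(16, 5, padding="same", activation="relu"),
    layers.Conv2D(1, 1, activation="sigmoid"),  # per-pixel "coin-here" probability
])
heatmap_net.compile(optimizer="adam", loss="binary_crossentropy")
# The predicted map can then be fed to heatmap_to_xy (above) to recover the center.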