Keras - Linear stack of layers? - neural-network

I started to follow this "guide" to learn how to make a neural network but I'm already stuck at the first sentence
https://keras.io/getting-started/sequential-model-guide/
What the hell is a LINEAR stack of layers?
Does it mean the derivative of the stack is a constant? (Kidding, but I'm getting really frustrated by guides that don't define what they're saying.)

A linear stack is a model without any branching. Every layer has exactly one input and one output, and the output of one layer is the input of the next layer.
Stacks that are not linear can have layers with multiple inputs and outputs, and more complex connections between layers.
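For instance (illustrative layer sizes, using the plain Keras API from the linked guide), a linear stack looks like this:

from keras.models import Sequential
from keras.layers import Dense

# A linear stack: each layer's output feeds straight into the next layer.
model = Sequential([
    Dense(64, activation='relu', input_dim=100),
    Dense(32, activation='relu'),
    Dense(10, activation='softmax'),
])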

The term linear stack is used to mean that there is no funny business going on, e.g. recurrence (connections can go backwards) or residual connections (connections can skip layers). The connections between neurons go from one layer to the next, no more, no less.
Fully connected feed-forward layers are an example of a linear stack.
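By contrast, a non-linear stack needs the functional API; a rough sketch of a skip connection (made-up sizes):

from keras.models import Model
from keras.layers import Input, Dense, add

inputs = Input(shape=(100,))
x = Dense(64, activation='relu')(inputs)
y = Dense(64, activation='relu')(x)
# x skips the middle layer and is merged back in, so the stack is no longer linear.
merged = add([x, y])
outputs = Dense(10, activation='softmax')(merged)
model = Model(inputs=inputs, outputs=outputs)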

Related

Can I train Word2vec using a Stacked Autoencoder with non-linearities?

Every time I read about Word2vec, the embedding is obtained with a very simple Autoencoder: just one hidden layer, linear activation for the initial layer, and softmax for the output layer.
My question is: why can't I train some Word2vec model using a stacked Autoencoder, with several hidden layers with fancier activation functions? (The softmax at the output would be kept, of course.)
I never found any explanation about this, therefore any hint is welcome.
Word vectors are nothing but the hidden states of a neural network trying to get good at something.
To answer your question:
Of course you can.
If you are going to do it, why not use fancier networks/encoders as well, like BiLSTMs or Transformers?
This is what the people who created things like ELMo and BERT did (though their networks were a lot fancier).
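As an illustration only (made-up sizes; this is not how word2vec itself is trained), a deeper, non-linear encoder with the softmax kept at the output could be sketched in Keras like this:

from keras.models import Sequential
from keras.layers import Dense

vocab_size = 10000   # hypothetical vocabulary size
embedding_dim = 128  # hypothetical embedding size

# Several non-linear hidden layers instead of word2vec's single linear one;
# the softmax output (predicting the context word) is kept.
model = Sequential([
    Dense(512, activation='relu', input_dim=vocab_size),
    Dense(256, activation='relu'),
    Dense(embedding_dim, activation='relu'),
    Dense(vocab_size, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')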

Convolutional neural network back propagation - delta calculation in convolutional layer

So I’m trying to make a CNN and so far I think I understand all of the forward propagation and the back propagation in the fully connected layers. However, I’m having some issues with the back prop in the convolutional layers.
Basically I've written out the dimensions of everything at each stage in a CNN with two convolutional layers and two fully connected layers, with the input having a depth of 1 (as it is black and white) and only one filter being applied at each convolutional layer. I haven't bothered to use pooling at this stage as, to my knowledge, it shouldn't have any impact on the calculus, just on where the gradient is assigned, so the dimensions should still fit as long as I also don't include any unpooling in my backprop. I also haven't bothered to write out the dimensions after the application of the activation functions, as they would be the same as their input and I would be writing the same values twice.
The dimensions, as you will see, vary slightly in format. For the convolutional layers I've written them as though they are images, rather than in matrix form, whilst for the fully connected layers I've written the dimensions as the sizes of the matrices used (this will hopefully make more sense when you see it).
The issue is that when calculating the delta for the convolutional layers, the dimensions don't fit. What am I doing wrong?
Websites used:
http://cs231n.github.io/convolutional-networks/
http://neuralnetworksanddeeplearning.com/chap2.html#the_cross-entropy_cost_function
http://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/
Calculation of dimensions:
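For reference (this is not the asker's own write-up), the standard relationship from the articles linked above is that the delta reaching a convolutional layer's input is the 'full' convolution of the output delta with the 180°-flipped kernel; a minimal NumPy/SciPy sketch with made-up sizes:

import numpy as np
from scipy.signal import correlate2d, convolve2d

# Made-up sizes: one 8x8 single-channel input, one 3x3 filter.
x = np.random.randn(8, 8)      # layer input
k = np.random.randn(3, 3)      # filter

# Forward pass: 'valid' cross-correlation -> 6x6 feature map.
out = correlate2d(x, k, mode='valid')

# Suppose backprop has given us the delta w.r.t. this feature map (6x6).
delta_out = np.random.randn(*out.shape)

# Delta w.r.t. the layer input: 'full' convolution of delta_out with the
# flipped kernel (convolve2d flips the kernel for us) -> back to 8x8.
delta_in = convolve2d(delta_out, k, mode='full')

# Gradient w.r.t. the filter: 'valid' cross-correlation of input with delta -> 3x3.
grad_k = correlate2d(x, delta_out, mode='valid')

print(delta_in.shape, grad_k.shape)  # (8, 8) (3, 3)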

General Questions about Neural Networks

I have some general questions about NNs and their training in hope that you can answer them:
Let's suppose I've got an untrained NN with n hidden layers and m neurons in it. I want to train the network to, e.g., recognize voice and thus words. How can I make this possible when my sound input doesn't always have the same length (e.g. one is 1 second, another is 5)? How many layers should my NN have, and what type should it be (recurrent, LSTM, CNN, etc.)? Are there any training algorithms other than normal backpropagation (I thought about having a NN with just one neuron in each layer and then letting new ones grow until the problem could be solved)? And finally, is it recommended/helpful to make connections between the neurons of, e.g., layer 2 and layer 4?
Thank you for your help!
For the record, this is a perfectly valid question.
You should definitely use a recurrent network for voice recognition. That means you feed the audio in small steps, say 1/100 of a second at a time, so for one second of data you activate the network 100 times.
Using an LSTM will make sure that patterns over large time lags are remembered, so the network will essentially remember (useful) parts of previous inputs.
How many layers you should use depends on what exactly you want to recognize. But because voice recognition is not one of the easiest classification tasks, it will have to be a large, deep network (combining convolutional layers with LSTMs).
What you proposed, evolving the network one node by one, is basically called neuroevolution. Libraries such as Neataptic support the evolution of networks towards a certain solution.
Yes, connections that skip layers (e.g. from layer 2 to layer 4) could definitely help, but this can only be found out by trial and error.
PS: I strongly recommend starting with an easier task to develop an understanding of neural networks.
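A minimal sketch of how variable-length clips are usually fed to such a network in Keras: pad them to a common length and mask the padding (feature count, lengths, and class count are made up):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# Two clips of different lengths, 13 features per timestep (e.g. MFCCs).
clip_a = np.random.randn(100, 13)   # ~1 second
clip_b = np.random.randn(500, 13)   # ~5 seconds

# Pad both to the same length so they fit in one batch.
maxlen = max(len(clip_a), len(clip_b))
x = np.zeros((2, maxlen, 13), dtype='float32')
x[0, :len(clip_a)] = clip_a
x[1, :len(clip_b)] = clip_b

model = Sequential([
    Masking(mask_value=0.0, input_shape=(None, 13)),  # ignore the padded timesteps
    LSTM(64),
    Dense(10, activation='softmax'),                  # e.g. 10 word classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')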

Theanets: Removing individual connections

How do you remove connections in Theanets? I'd like to create custom connectivity between an input layer, a single hidden layer, and an output layer. But the only defaults are feedforward all-to-all architectures or recurrent architectures. I'd like to remove specific connections from the all-to-all connectivity and then train the network.
Thanks in advance.
(Developer of theanets here.)
This is currently not directly possible with theanets. For computational efficiency the underlying computations in feedforward networks are implemented as simple matrix operations, which are fast and can be executed on a GPU for sometimes dramatic speedups.
You can, however, initialize the weights in a layer so that some (or many) of the weights are zero. To do this, just pass a dictionary in the layers list, and include a sparsity key:
import theanets

net = theanets.Autoencoder(
    layers=(784, dict(size=1000, sparsity=0.9), 784))
This initializes the weights for the layer so that the given fraction of weights are zeros. The weights are, however, eligible for change during the training process, so this is only an initialization trick.
Alternatively, you can implement a custom Layer subclass that does whatever you like, as long as you stay within the Theano boundaries. You could, for instance, implement a type of feedforward layer that uses a mask to ensure that some weights remain zero during the feedforward computation.
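The masking idea itself is simple; here is a framework-agnostic NumPy sketch (not the theanets Layer API):

import numpy as np

rng = np.random.RandomState(0)
n_in, n_out = 8, 4

w = rng.randn(n_in, n_out) * 0.1       # dense weight matrix
mask = (rng.rand(n_in, n_out) > 0.5)   # True = keep the connection, False = remove it

def feedforward(x):
    # Multiplying by the mask zeroes the removed connections on every forward
    # pass, so they stay disconnected no matter how w is updated.
    return np.tanh(x @ (w * mask))

y = feedforward(rng.randn(3, n_in))
print(y.shape)  # (3, 4)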
For more details you might want to ask on the theanets mailing list.

Is there a rule/good advice on how big an artificial neural network should be?

My last lecture on ANN's was a while ago but I'm currently facing a project where I would want to use one.
So, the basics - like what type (a multi-layer feedforward network), trained by an evolutionary algorithm (that's a given by the project), how many input neurons (8) and how many output neurons (7) - are set.
But I'm currently trying to figure out how many hidden layers I should use and how many neurons in each of these layers (the EA doesn't modify the network itself, but only the weights).
Is there a general rule or maybe a guideline on how to figure this out?
The best approach for this problem is to implement the cascade correlation algorithm, in which hidden nodes are sequentially added as necessary to reduce the error rate of the network. This has been demonstrated to be very useful in practice.
An alternative, of course, is a brute-force test of various values. I don't think simple answers such as "10 or 20 is good" are meaningful because you are directly addressing the separability of the data in high-dimensional space by the basis function.
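The brute-force option is just a loop over candidate layouts; a sketch, where evaluate() is a hypothetical stand-in for building a network with 8 inputs and 7 outputs, training it with the project's evolutionary algorithm, and returning a validation score:

def evaluate(hidden_sizes):
    # Hypothetical: build, train (with the EA) and score a network whose
    # hidden layers have the given sizes; placeholder formula shown here.
    return -sum(abs(s - 12) for s in hidden_sizes)

candidates = [(8,), (16,), (8, 8), (16, 8), (32, 16)]

best_sizes, best_score = None, float('-inf')
for hidden_sizes in candidates:
    score = evaluate(hidden_sizes)
    if score > best_score:
        best_sizes, best_score = hidden_sizes, score

print(best_sizes, best_score)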
A typical neural net relies on hidden layers in order to converge on a particular problem solution. A hidden layer of about 10 neurons is standard for networks with few input and output neurons. However, a trial-and-error approach often works best. Since the neural net will be trained by a genetic algorithm, the number of hidden neurons may not play a significant role in training, since it's the weights and biases on the neurons that would be modified by an algorithm like backpropagation.
As rcarter suggests, trial and error might do fine, but there's another thing you could try.
You could use genetic algorithms in order to determine the number of hidden layers and/or the number of neurons in them.
I did similar things with a bunch of random forests, to try and find the best number of trees, branches, and parameters given to each tree, etc.
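A bare-bones sketch of that idea (mutate the layer layout, keep the fitter one); fitness() is a hypothetical stand-in for training a network with that layout and returning its score:

import random

def fitness(hidden_sizes):
    # Hypothetical stand-in: build a net with these hidden layers, train it,
    # and return a validation score; here just a placeholder formula.
    return -abs(len(hidden_sizes) - 2) - sum(abs(s - 12) for s in hidden_sizes) / 100.0

def mutate(hidden_sizes):
    sizes = list(hidden_sizes)
    roll = random.random()
    if roll < 0.3 and len(sizes) < 4:
        sizes.append(random.randint(4, 32))                   # add a hidden layer
    elif roll < 0.6 and len(sizes) > 1:
        sizes.pop(random.randrange(len(sizes)))               # drop a hidden layer
    else:
        i = random.randrange(len(sizes))
        sizes[i] = max(2, sizes[i] + random.randint(-4, 4))   # resize a layer
    return tuple(sizes)

best = (10,)
for _ in range(50):
    candidate = mutate(best)
    if fitness(candidate) >= fitness(best):
        best = candidate

print(best)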