Does anyone know the default activation function used in the recurrent layers in Keras? https://keras.io/layers/recurrent/
It says the default activation function is linear, but what about the default recurrent activation function? Nothing is mentioned about that. Any help would be highly appreciated.
Thanks in advance
Keras Recurrent is an abstract class for recurrent layers; the concrete implementations are SimpleRNN, LSTM and GRU. In Keras 2.x the default activation is tanh for all three. LSTM and GRU additionally take a recurrent_activation argument (applied to the gates), which defaults to hard_sigmoid (plain sigmoid from Keras 2.3 onwards).
https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L2081
It mentions tanh here for version 2.3.0 :-)
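Rather than relying on the docs, you can read the defaults straight out of whichever version you have installed. A quick sketch (assuming the TensorFlow-bundled Keras):

```python
import inspect
from tensorflow.keras.layers import SimpleRNN, LSTM, GRU

# Print the default (recurrent) activations of the installed version.
for layer_cls in (SimpleRNN, LSTM, GRU):
    params = inspect.signature(layer_cls.__init__).parameters
    defaults = {name: p.default for name, p in params.items()
                if name in ("activation", "recurrent_activation")}
    print(layer_cls.__name__, defaults)
```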
What choices of losses are there for an LSTM autoencoder?
Other than mean squared error (MSE), is it possible to use anything else?
If yes, can you please link me to a Keras (TF backend) implementation or provide one?
Thanks in advance.
If the dataset is highly imbalanced, using median frequency class weighting on top of the reconstruction loss is recommended rather than plain MSE: each class is weighted by the median class frequency divided by its own frequency, so rare classes contribute more to the loss.
The following links should help too:
https://datascience.stackexchange.com/questions/13490/how-to-set-class-weights-for-imbalanced-classes-in-keras
https://github.com/keras-team/keras/issues/3653
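As a rough sketch of the class-weighting idea (median_freq_weights is a hypothetical helper, not a Keras API; the weight for each class is the median class frequency divided by that class's own frequency, and the resulting dictionary is something Keras fit() accepts via its class_weight argument):

```python
import numpy as np

def median_freq_weights(class_counts):
    # weight_c = median(freq) / freq_c, so rare classes weigh more
    freqs = class_counts / class_counts.sum()
    return np.median(freqs) / freqs

counts = np.array([9000.0, 900.0, 100.0])   # toy, highly imbalanced
class_weight = dict(enumerate(median_freq_weights(counts)))
print(class_weight)   # {0: 0.1, 1: 1.0, 2: 9.0}

# model.fit(x_train, y_train, class_weight=class_weight, ...)
```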
I was trying to apply a tansig or tanh function to my fixed-point data, which I am using for my neural network in MATLAB, but when I use these functions on embedded.fi objects, MATLAB says that tanh and tansig do not work on embedded.fi.
I am trying to set up my neural network with fixed-point weights. I would really appreciate it if anyone has a solution for that.
As the error message states, there are no fixed-point versions of the tanh and tansig functions in MATLAB. The complete list of functions that have fixed-point versions and are supported can be found here:
http://www.mathworks.com/help/hdlcoder/ug/fixed-point-run-time-library-support.html
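Not MATLAB, but as a sketch of the usual workaround on fixed-point hardware: approximate tanh with a lookup table generated offline in floating point and then quantized. The Q4.12/Q1.14 formats and the 256-entry table below are assumptions for illustration:

```python
import numpy as np

IN_FRAC, OUT_FRAC, TABLE_BITS = 12, 14, 8   # assumed Q4.12 in, Q1.14 out

# Precompute a quantized tanh table over the representable input range.
xs = np.linspace(-8.0, 8.0, 2 ** TABLE_BITS, endpoint=False)
TABLE = np.round(np.tanh(xs) * (1 << OUT_FRAC)).astype(np.int32)

def fixed_tanh(x_q412):
    """tanh of a Q4.12 integer, returned as a Q1.14 integer."""
    x = x_q412 / float(1 << IN_FRAC)
    idx = int((x + 8.0) / 16.0 * (2 ** TABLE_BITS))
    return int(TABLE[np.clip(idx, 0, 2 ** TABLE_BITS - 1)])

one = 1 << IN_FRAC                          # 1.0 in Q4.12
print(fixed_tanh(one) / (1 << OUT_FRAC))    # ~0.7617, vs tanh(1) = 0.7616
```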
I just want to know whether a neural network can be trained with a single class of data. I have a set of data that I want to train a neural network on. After training, I want to give new data (for testing) to the trained network to check whether it recognizes it as being similar to the training samples or not.
Is this possible with a neural network? If yes, would that be supervised or unsupervised learning?
I know neural networks can be used for classification when there are multiple classes, but I have not seen this done with a single class before. A good explanation and a link to an example would be much appreciated. Thanks
Of course it can be. But in that case it will only recognize the one class you trained it with, and depending on the expected output you can measure the similarity of new data to the training data.
An NN, after training, is just a function. For classification problems you can think of it as a function that takes data as input and returns an integer indicating which class it belongs to. That said, if you have only one class, represented by the integer value 1, and the test data is not similar to that class, you will get something like 1.555; it will not tell you that the data belongs to another class, because you introduced only one, but it will definitely give you a hint about similarity.
NNs trained this way are considered supervised learning, because before training you have to provide both the input and the target, i.e. the expected output.
If you train a network with only a single class of data, the task is popularly known as one-class classification. Various algorithms have been developed for it, such as One-class SVM, Support Vector Data Description (SVDD) and OCKELM. Tax and Duin developed a MATLAB toolbox for this, and it supports various one-class classifiers:
DD Toolbox
One-class SVM
One-class classification based on Kernel Ridge Regression, kernelized ELM, or LSSVM (with bias = 0)
There is a paper, Anomaly Detection Using One-Class Neural Networks, which combines the One-Class SVM and neural networks. Here is source code. However, I've had difficulty connecting the source code to the paper.
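For a quick start, scikit-learn ships a One-class SVM that covers the second item on the list above; a minimal sketch with toy data (the kernel and nu values are illustrative, not tuned):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))            # the single class
X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),   # similar
                    rng.normal(6.0, 1.0, size=(5, 2))])  # dissimilar

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
print(clf.predict(X_test))   # +1 = similar to the training class, -1 = outlier
```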
I don't quite understand why a sigmoid function is seen as more useful (for neural networks) than a step function... hoping someone can explain this for me. Thanks in advance.
The (Heaviside) step function is typically only useful within single-layer perceptrons, an early type of neural networks that can be used for classification in cases where the input data is linearly separable.
However, multi-layer neural networks or multi-layer perceptrons are of more interest because they are general function approximators and they are able to distinguish data that is not linearly separable.
Multi-layer perceptrons are trained using backpropagation. A requirement for backpropagation is a differentiable activation function, because backpropagation uses gradient descent: the gradient of the loss with respect to the weights is computed via the chain rule through each activation, and those gradients drive the weight updates.
The Heaviside step function is non-differentiable at x = 0 and its derivative is 0 everywhere else. This means gradient descent cannot make progress in updating the weights, and backpropagation fails.
The sigmoid or logistic function does not have this shortcoming and this explains its usefulness as an activation function within the field of neural networks.
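To make the gradient argument concrete, here is a minimal NumPy sketch comparing the two derivatives; the sigmoid's gradient is nonzero everywhere, while the step function's is zero almost everywhere:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # strictly positive: a learning signal everywhere

def step_grad(x):
    return np.zeros_like(x)   # 0 almost everywhere (undefined at x = 0)

xs = np.array([-2.0, 0.0, 2.0])
print(sigmoid_grad(xs))   # [0.105 0.25  0.105]
print(step_grad(xs))      # [0. 0. 0.] -> gradient descent cannot move
```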
It depends on the problem you are dealing with. For simple binary classification, a step function is appropriate. Sigmoids can be useful when building more biologically realistic networks by introducing noise or uncertainty. A completely different use of sigmoids is numerical continuation, i.e. when doing bifurcation analysis with respect to some parameter in the model. Numerical continuation is easier with smooth systems (and very tricky with non-smooth ones).
I'm trying to implement Face Detection with Neural Network using Rowley's method.
www.informedia.cs.cmu.edu/documents/rowley-ieee.pdf
My problem is that I can't find anything about the activation function used in the proposed NN. Has anyone tried to implement Rowley's method, and what activation function should be used? Thanks.
I think it is the hyperbolic tangent (tanh) function, because: 1) the paper says "The neural network produces real values between 1 and -1", which rules out the sigmoid; 2) an earlier Rowley paper references LeCun's 1989 work while discussing the network architecture, and that paper explicitly mentions the hyperbolic tangent.
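A quick numeric check of point 1: tanh spans (-1, 1) while the logistic sigmoid stays in (0, 1), so only tanh matches the output range quoted from the paper.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 1001)
sigmoid = 1.0 / (1.0 + np.exp(-x))

print(np.tanh(x).min(), np.tanh(x).max())   # ~ -1.0 ... 1.0
print(sigmoid.min(), sigmoid.max())         # ~  0.0 ... 1.0
```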