Thresholding in intermediate layer using Gumbel Softmax - neural-network

In a neural network, for an intermediate layer, I need to threshold the output. The output of each neuron in the layer is a real value, but I need to binarize it (to 0 or 1). But with hard thresholding, backpropagation won't work. Is there a way to achieve this?
Details:
I have a GAN kind of network i.e. there are 2 neural networks trained end-to-end. The output of first neural network is real values. I need them to be binary values. I read that Gumbel Softmax (Categorical Reparameterization) is used to handle discrete variables in a neural network. Is there a way to use that for my use-case? If yes, how? If not, is there any other way?
From what I could gather in internet is that Gumbel is a probability distribution. Using that we can generate a discrete distribution. But for use-case, I need a function that can take a real input and output a binary value. So, I need an activation function of that form. How can I achieve that?
Thanks!

Related

Can a single input single output neural network with y=x as activation function reflect non-linear behavior?

I am currently learning a little bit about neural networks. One question I can't really get behind is about how neural networks reflect non-linear behavior. From my understanding there is no possibility to reflect non-linear behavior inside a compact set using a neural network.
For example if I would take the function from this question:
y = x^2
and I would use a neural network with a single input and single output the best the neural network could do for each compact set [x0...xn] is a linear function spanning from one end of the set to the other, as at the end all calculations inside the net are linear.
Do I have some misunderstanding about this concept?
The ANN's capabilties to model non-linear behaviour arise from the (usually) non-linear activation function.
If the activation function is linear, then the process of training the network is just another way to create a linear (or multi-linear) fit of input and output data.
Activation function in neural networks is exactly the part, that brings non-linearity. If you use linear activation function, then you cannot train non-linear model (thus fit quadratic or other non-linear functions).
The part, I guess, you are interested in is Universal Approximation Theorem, which says that any continuous function can be approximated with a neural network with a single hidden layer (some assumptions on activation function are applied thou). Take into account, that this theorem does not say anything on optimization of such a network (it does not guarantee you can train such a network with a specific algorithm, but only that such a network exists). Also it does not say anything on the number of neurons you should use.
You can check following links, to get more details:
Original proof with sigmoid activation function: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.441.7873&rep=rep1&type=pdf
And a more friendly derivation: http://mcneela.github.io/machine_learning/2017/03/21/Universal-Approximation-Theorem.html

How do i take a trained neural network and implement in another system?

I have trained a feedforward neural network in Matlab. Now I have to implement this neural network in C language (or simulate the model in Matlab using mathematical equations, without using direct functions). How do I do that? I know that I have to take the weights and bias and activation function. What else is required?
There is no point in representing it as a mathematical function because it won't save you any computations.
Indeed all you need is the weights, biases, activation and your architecture. I'm assuming it is a simple feedforward network as you said, you need to implement some kind of matrix multiplication and addition in C. Also, you'll need to implement the activation function. After that, you're ready to go. Your feed forward NN is ready to be implemented. If the C code will not be used for training, it won't be necessary to implement the backpropagation algorithm in C.
A feedforward layer would be implemented as follows:
Output = Activation_function(Input * weights + bias)
Where,
Input: (1 x number_of_input_parameters_for_this_layer)
Weights: (number_of_input_parameters_for_this_layer x number_of_neurons_for_this_layer)
Bias: (1 x number_of_neurons_for_this_layer)
Output: (1 x number_of_neurons_for_this_layer)
The output of a layer is the input to the next layer.
After some days of searching, I have found the following webpage to be very useful http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
The picture below shows a simple feedforward neural network. Picture taken from the above website.
In this figure, the circles denote the inputs to the network. The circles labeled “+1” are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node). The middle layer of nodes is called the hidden layer, because its values are not observed in the training set. In this example, the neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit.
The mathematical equations representing this feedforward network are
This neural network has parameters (W,b)=(W(l),b(l),W(2),b(2)), where we write Wij(l) to denote the parameter (or weight) associated with the connection between unit j in layer l, and unit i in layer l+1. (Note the order of the indices.) Also, bi(l) is the bias associated with unit i in layer l+1.
So, from the trained model, as Mido mentioned in his answer, we have to take the input weight matrix which is W(1), the layer weight matrix which is W(2), biases, hidden layer transfer function and output layer transfer function. After this, use the above equations to estimate the output hW,b(x). A popular transfer function used for a regression problem is tan-sigmoid transfer function in the hidden layer and linear transfer function in the output layer.
Those who use Matlab, these links are highly useful
try to simulate neural network in Matlab by myself
Neural network in MATLAB
Programming a Basic Neural Network from scratch in MATLAB

How to convert RNN into a regression net?

If the output is a tanh function, then I get a number between -1 and 1.
How do I go about converting the output to the scale of my y values (which happens to be around 15 right now, but will vary depending on the data)?
Or am I restricted to functions which vary within some kind of known range...?
Just remove the tanh, and your output will be an unrestricted number. Your error function should probably be squared error.
You might have to change the gradient calculation for your back-prop, if this isn't done automatically by your framework.
Edit to add: You almost certainly want to keep the tanh (or some other non-linearity) between the recurrent connections, so remove it only for the output connection.
In most RNNs for classification, most people use a softmax layer on top of their LSTM or tanh layers so I think you can replace the softmax with just a linear output layer. This is what some people do for regular neural networks as well as convolutional neural networks. You will still have the nonlinearity from the hidden layers, but your outputs will not be restricted within a certain range such as -1 and 1. The cost function would probably be the squared error like larspars mentioned.

How to train a Matlab Neural Network using matrices as inputs?

I am making 8 x 8 tiles of Images and I want to train a RBF Neural Network in Matlab using those tiles as inputs. I understand that I can convert the matrix into a vector and use it. But is there a way to train them as matrices? (to preserve the locality) Or is there any other technique to solve this problem?
There is no way to use a matrix as an input to such a neural network, but anyway this won't change anything:
Assume you have any neural network with an image as input, one hidden layer, and the output layer. There will be one weight from every input pixel to every hidden unit. All weights are initialized randomly and then trained using backpropagation. The development of these weights does not depend on any local information - it only depends on the gradient of the output error with respect to the weight. Having a matrix input will therefore make no difference to having a vector input.
For example, you could make a vector out of the image, shuffle that vector in any way (as long as you do it the same way for all images) and the result would be (more or less, due to the random initialization) the same.
The way to handle local structures in the input data is using convolutional neural networks (CNN).

How to handle real numbers in a neural network?

I've been playing with neural networks. I started with approximating a XOR function without too many problems. But, then I attacked the problem of approximating the sqrt function.
The problem is that the input as well as the output can be any real numbers, not only numbers in ]0,1[
Is there a way I can handle that in the neural network so that it can output real numbers directly ?
Or do I have to normalize the input and output data to be in the ]0,1[ range ? Isn't that a loss of precision ?
Thanks
You can choose another activation function in your output layer, e.g. g(a) = a (identity). However, you should have a hidden layer with a nonlinear activation function (tanh, logistic) to approximate nonlinear functions.
Finally, I found that the most reasonable and generic solution was to normalize the inputs and then denormalize the outputs.
The user has to set the input / output ranges and then everything works well.
This is what is done by most of the neural networks frameworks.