How to model best an input/output model with neural network? - neural-network

The theme here is the use of neural network in learning time histories.
Lets consider for clarity Y = f(X) where X is a vector 1xN and Y 1xN.
In most of the models I can find or test online, X is directly the time vectorised with regular timesteps (X=T).
The prediction task performed on the time history is done therefore using the output Y, and using a sequence of this let say Y(i:i+Nsample) as an Neural Network input and then the output is Y(i+Nsample+1). Then the prediction is performed moving the window one step at the time ( i= i+1).
Now my question is the following. In the case where we have a vector X which is a generic function whose values are known, the problem to model with neural network is:
knowing X(i:i+Nsample+1) and Y(i:i+Nsample) we want to predict Y(i:i+Nsample+1)
then we can do i=i+1 and proceed forward.
What are the best solutions to design such a system, is there an example with Keras or other project from which I could be inspired?
I see several solutions but without being convinced.
a)Set a multidimensional vector (2xNsample) as input [X(i:i+Nsample ) ; Y(i-1:i+Nsample-1)] and predict Y(:i+Nsample) (treat the output as a second input)
b) set two separate lstm for X and Y and then concatenate them in some way

Related

Is it possible to train a neural netowrk using a loss function on unseen data (data different from input data)?

Normally, a loss function may be defined as
L(y_hat, y) or L(f(X), y), where f is the neural network, X is the input data, y is the target.
Is it possible to implement (preferably in PyTorch) a loss function that depends not only on the input data X, but also on X' (X != X)?
For example, let's say I have a neural network f, input data (X,y) and X'. Can I construct a loss function such that
f(X) is as close as possible to y, and also
f(X') > f(X)?
The first part can be easily implemented (PyTorch: nn.MSELoss()), the second part seems to be way harder.
P.S: this question is a reformulation of Multiple regression while avoiding line intersections using neural nets, which was closed. In the original data, input data and photos with a theoretical example are available.
Yes it is possible.
For instance, you can add a loss term using ReLU as follows:
loss = nn.MSELoss()(f(X),y) + lambd * nn.ReLU()(f(X)-f(X'))
where lambd is a hyperparameter.
Note that this corresponds to f(X') >= f(X) but it's easily modifiable to f(X') > f(X) by adding an epsilon<0 (small enough in absolute value) constant inside the ReLU.

why linear function is useless in multiple layer neural network? How last layer become the linear function of the input of first layer?

I was studying about activation function in NN but could not understand this part properly -
"Each layer is activated by a linear function. That activation in turn goes into the next level as input and the second layer calculates weighted sum on that input and it in turn, fires based on another linear activation function.
No matter how many layers we have, if all are linear in nature, the final activation function of last layer is nothing but just a linear function of the input of first layer! "
This is one of the most interesting concepts that I came across while learning neural networks. Here is how I understood it:
The input Z to one layer can be written as a product of a weight matrix and a vector of the output of nodes in the previous layer. Thus Z_l = W_l * A_l-1 where Z_l is the input to the Lth layer. Now A_l = F(Z_l) where F is the activation function of the layer L. If the activation function is linear then A_l will be simply a factor K of Z_l. Hence, we can write Z_l somewhat as:
Z_l = W_l*W_l-1*W_l-2*...*X where X is the input. So you see the output Y will finally be the multiplication of a few matrices times the input vector for a particular data instance. We can always find a resultant multiplication of the weight matrices. Thus, output Y will be W_Transpose * X. This equation is nothing but a linear equation that we come across in linear regression.
Therefore, if all the input layers have linear activation, the output will only be a linear combination of the input and can be written using a simple linear equation.
It isn't really useless.
If there are multiple linearly activated layers, the results of the calculations in the previous layer would be sent to the next layer as input. Same thing happens in the next layer. It would calculate the input and send it based on another linear activation function to the next layer.
If all layers are linear it doesn't matter how much layers there actually are. The last activation function of final layer will also be a linear function of the input from the first layer.
If you want a good read about Activation Functions you can find one here and here.

Neural Network Multiple Inputs and one output

i want to create a Neural Network with "three (2D) Matrices" as a inputs , and
the output is a 1 (2D) Matrix , so the three inputs is :
1-2D Matrix Contains ( X ,Y ) Coordinates From a device
2-2D Matrix Contains ( X ,Y ) Coordinates From another different Device
3-2D Matrix Contains the True exact( X , Y ) Coordinates that i already
measured it ( i don't know if that exact include from the inputs or What??)
***Note that each input have his own Error and i want to make the Neural Network
to Minimize that error and choose the best result depends on the true exact (X,Y)***
**Notice that : im Working on object tracking that i extracting (x,y) Coordinate
from the camera and the other device is same the data so for example
i will simulate the Coordinates as Follows:
{ (1,2), (1,3), (1,4), (1,5) , (1,6).......}
and so on
For Sure the Output is one 2D Matrix the best or the True Exact (x,y) ,So im a
beginner and i want to understand how to create this Network with this Different
inputs and choose the best training method to have the Best Results ...?!
thanks in Advance
It sounds like what you want is a HxWx2 input where the first channel (depthwise layer) is your 1st input and the 2nd channel is your 2nd input. Your "true exact" coordinates would be the target that your net output is compared to, rather than being an input.
Note that neural nets don't really handle regression (real-valued outputs) very well - you may get better results dividing your coordinate range into buckets and then treating it as a classification problem instead (use softmax loss vs regression mean-squared error loss).
Expanding on the regression vs classification point:
A regression problem is one where you want the net to output a real value such as a coordinate value in range 0-100. A classification problem is one where you want the net to output a set of propabilities that your input belongs to a given class it was trained on (e.g. you train a net on images belonging to classes "cat" "dog" and "rabbit").
It turns out that modern neural nets are much better at classification than regression, because the way they work is basically by subdividing the N-dimensional input spaces into sub-regions corresponding to the outputs they are being trained to make. What they are naturally doing is classifying.
The obvious way to turn a regression problem into a classification problem, which may work better, is to divide your desired output range into sub-ranges (aka buckets) which you treat as classes. e.g. instead of training your net to output a single (or multiple) value in range 0-100, instead train it to output class probabilities representing, for example, each of 10 separate sub-ranges (classes) 0-9, 10-19, 20-20, etc.

Back-propagation until the input layer in neural network

I have a question regarding neural networks back-propagation. Suppose we have a trained DNN for some data. Then we feed a corrupted data into NN and back-propagate the error not until the first hidden layer, but until the input layer (so to say we calculate deltas for the input neurons). Does the error term shows us a mismatch between "clean" and "corrupted" vector?
If I'm interpreting the question correctly, you have two input vectors i1 = (a, b, ...) and i2 = (c, d, ...). You then have two corresponding output vectors o1 = (v, w, ...) and o2 = (x, y, ...).
i1 is part of your valid training data and is used to teach the NN the model. After this is done, you want to use the NN to detect the delta between the invalid o2 and the valid output given a correct application of the model to i2?
If this is the case, train the NN as normal with all of your valid input cases, then feed your test cases (input vectors which correspond to known corrupted output vectors) and collect the results with backpropagation disabled. That is, once the NN has learned the correct model, stop training and simply compare the "clean" results to the corrupted results yourself.
Note: You could alternatively train a neural network to accept as input a set of values corresponding to the input of some other process and a set of values corresponding to the (possibly corrupted) output of that process and produce as output the difference between the clean values and the corrupted values, but the extra training and framing of the data just to avoid doing the subtraction yourself is probably not worth it.

local inverse of a neural network

I have a neural network with N input nodes and N output nodes, and possibly multiple hidden layers and recurrences in it but let's forget about those first. The goal of the neural network is to learn an N-dimensional variable Y*, given N-dimensional value X. Let's say the output of the neural network is Y, which should be close to Y* after learning. My question is: is it possible to get the inverse of the neural network for the output Y*? That is, how do I get the value X* that would yield Y* when put in the neural network? (or something close to it)
A major part of the problem is that N is very large, typically in the order of 10000 or 100000, but if anyone knows how to solve this for small networks with no recurrences or hidden layers that might already be helpful. Thank you.
If you can choose the neural network such that the number of nodes in each layer is the same, and the weight matrix is non-singular, and the transfer function is invertible (e.g. leaky relu), then the function will be invertible.
This kind of neural network is simply a composition of matrix multiplication, addition of bias and transfer function. To invert, you'll just need to apply the inverse of each operation in the reverse order. I.e. take the output, apply the inverse transfer function, multiply it by the inverse of the last weight matrix, minus the bias, apply the inverse transfer function, multiply it by the inverse of the second to last weight matrix, and so on and so forth.
This is a task that maybe can be solved with autoencoders. You also might be interested in generative models like Restricted Boltzmann Machines (RBMs) that can be stacked to form Deep Belief Networks (DBNs). RBMs build an internal model h of the data v that can be used to reconstruct v. In DBNs, h of the first layer will be v of the second layer and so on.
zenna is right.
If you are using bijective (invertible) activation functions you can invert layer by layer, subtract the bias and take the pseudoinverse (if you have the same number of neurons per every layer this is also the exact inverse, under some mild regularity conditions).
To repeat the conditions: dim(X)==dim(Y)==dim(layer_i), det(Wi) not = 0
An example:
Y = tanh( W2*tanh( W1*X + b1 ) + b2 )
X = W1p*( tanh^-1( W2p*(tanh^-1(Y) - b2) ) -b1 ), where W2p and W1p represent the pseudoinverse matrices of W2 and W1 respectively.
The following paper is a case study in inverting a function learned from Neural Networks. It is a case study from the industry and looks a good beginning for understanding how to go about setting up the problem.
An alternate way of approaching the task of getting the desired x that yields desired y would be start with random x (or input as seed), then through gradient decent (similar algorithm to back propagation, difference being that instead of finding derivatives of weights and biases, you find derivatives of x. Also, mini batching is not needed.) repeatedly adjust x until it yields a y that is close to the desired y. This approach has an advantage that it allows an input of a seed (starting x, if not randomly selected). Also, I have a hypothesis that the final x will have some similarity to initial x(seed), which would imply that this algorithm has the ability to transpose, depending on the context of the neural network application.