Is it possible to train a neural netowrk using a loss function on unseen data (data different from input data)? - neural-network

Normally, a loss function may be defined as
L(y_hat, y) or L(f(X), y), where f is the neural network, X is the input data, y is the target.
Is it possible to implement (preferably in PyTorch) a loss function that depends not only on the input data X, but also on X' (X != X)?
For example, let's say I have a neural network f, input data (X,y) and X'. Can I construct a loss function such that
f(X) is as close as possible to y, and also
f(X') > f(X)?
The first part can be easily implemented (PyTorch: nn.MSELoss()), the second part seems to be way harder.
P.S: this question is a reformulation of Multiple regression while avoiding line intersections using neural nets, which was closed. In the original data, input data and photos with a theoretical example are available.

Yes it is possible.
For instance, you can add a loss term using ReLU as follows:
loss = nn.MSELoss()(f(X),y) + lambd * nn.ReLU()(f(X)-f(X'))
where lambd is a hyperparameter.
Note that this corresponds to f(X') >= f(X) but it's easily modifiable to f(X') > f(X) by adding an epsilon<0 (small enough in absolute value) constant inside the ReLU.

Related

How to model best an input/output model with neural network?

The theme here is the use of neural network in learning time histories.
Lets consider for clarity Y = f(X) where X is a vector 1xN and Y 1xN.
In most of the models I can find or test online, X is directly the time vectorised with regular timesteps (X=T).
The prediction task performed on the time history is done therefore using the output Y, and using a sequence of this let say Y(i:i+Nsample) as an Neural Network input and then the output is Y(i+Nsample+1). Then the prediction is performed moving the window one step at the time ( i= i+1).
Now my question is the following. In the case where we have a vector X which is a generic function whose values are known, the problem to model with neural network is:
knowing X(i:i+Nsample+1) and Y(i:i+Nsample) we want to predict Y(i:i+Nsample+1)
then we can do i=i+1 and proceed forward.
What are the best solutions to design such a system, is there an example with Keras or other project from which I could be inspired?
I see several solutions but without being convinced.
a)Set a multidimensional vector (2xNsample) as input [X(i:i+Nsample ) ; Y(i-1:i+Nsample-1)] and predict Y(:i+Nsample) (treat the output as a second input)
b) set two separate lstm for X and Y and then concatenate them in some way

How does the linear transfer function in perceptrons (artificial neural network) work?

I know how the step transfer function works but how does the linear transfer function work? What equation do you use?
Relate answer to AND gate with two inputs and a bias
First of all, in general you want to apply linear transfer function only in the output layer of an MLP and "never" in the hidden layers, where non-linear transfer functions are typically used (logistic function, step. etc.).
Linear transfer function (in the form of f(x) = x for pure linear or purelin as it is mentioned in literature) is typically used for function approximation / regression tasks (this is intuitive because step and logistic functions give binary results where the linear function gives continuous results).
Non- linear transfer functions are used for classification tasks.
Non-linear transfer function(aka: activation function) is the most important factor which assigns the nonlinear approximation capability to the simple fully connected multilayer neural network.
Nevertheless, 'linear' activation function, of course, is one of the many alternatives you might want to adopt. But the problem is, pure linear transfer(f(x) = x) in hidden layers doesn't make sense for us, which means it may be 'in vain' if we try to train a network whose hidden units are activated by pure linear function.
We may understand this process with the following:
Assuming f(x)=x is our activation function, and we try to train a single hidden layer network having 2 input units(x1,x2), 3 hidden units(a1,a2,a3) and 1 output unit(y).
Hence, the network tries to approximate the function :
# hidden units
a1 = f(w11*x1+w12*x2+b1) = w11*x1+w12*x2+b1
a2 = f(w21*x1+w22*x2+b2) = w21*x1+w22*x2+b2
a3 = f(w31*x1+w32*x2+b3) = w31*x1+w32*x2+b3
# output unit
y = c1*a1+c2*a2+c3*a3+b4
if we combine all these equations, it turns out:
y = c1(w11*x1+w12*x2+b1) + c2(w21*x1+w22*x2+b2) + c3(w31*x1+w32*x2+b3) + b4
= (c1*w11+c2*w21+c3*w31)*x1 + (c1*w12+c2*w22+c3*w32)*x2 + (c1*b1+c2*b2+c3*b3+b4)
= A1*x1+A2*x2+C
As shown above, linear activation degenerate the network into a single input-output linear product, regardless of the structure of the network. What was done during the training process is factorizing A1, A2 and C into various factors.
Even one very popular quasi-linear activation function call Relu in deep neural network is also rectified. In other words, no pure linear activation in hidden layers is used unless you want to factorize coefficients.

How can I use a neural network to model a quadratic equation?

A lot of examples I've seen about neural network to model mathematical functions are using sin / cos / etc. These are nicely bounded between 0 and 1.
What if I wanted to model something that was quadratic? y = ax^2 + bx + c? How can I modify my input data to fit this?
Presumably I'll have only one input (x value) and a bias input. The output will be the y. My training data will have negative numbers as well as positive numbers.
Thank you.
You can feed any real number into a neural network and it can theoretically output any number, so long as the last layer of your neural network is linear. If not, you could possibly multiply all the targets by really small number.

local inverse of a neural network

I have a neural network with N input nodes and N output nodes, and possibly multiple hidden layers and recurrences in it but let's forget about those first. The goal of the neural network is to learn an N-dimensional variable Y*, given N-dimensional value X. Let's say the output of the neural network is Y, which should be close to Y* after learning. My question is: is it possible to get the inverse of the neural network for the output Y*? That is, how do I get the value X* that would yield Y* when put in the neural network? (or something close to it)
A major part of the problem is that N is very large, typically in the order of 10000 or 100000, but if anyone knows how to solve this for small networks with no recurrences or hidden layers that might already be helpful. Thank you.
If you can choose the neural network such that the number of nodes in each layer is the same, and the weight matrix is non-singular, and the transfer function is invertible (e.g. leaky relu), then the function will be invertible.
This kind of neural network is simply a composition of matrix multiplication, addition of bias and transfer function. To invert, you'll just need to apply the inverse of each operation in the reverse order. I.e. take the output, apply the inverse transfer function, multiply it by the inverse of the last weight matrix, minus the bias, apply the inverse transfer function, multiply it by the inverse of the second to last weight matrix, and so on and so forth.
This is a task that maybe can be solved with autoencoders. You also might be interested in generative models like Restricted Boltzmann Machines (RBMs) that can be stacked to form Deep Belief Networks (DBNs). RBMs build an internal model h of the data v that can be used to reconstruct v. In DBNs, h of the first layer will be v of the second layer and so on.
zenna is right.
If you are using bijective (invertible) activation functions you can invert layer by layer, subtract the bias and take the pseudoinverse (if you have the same number of neurons per every layer this is also the exact inverse, under some mild regularity conditions).
To repeat the conditions: dim(X)==dim(Y)==dim(layer_i), det(Wi) not = 0
An example:
Y = tanh( W2*tanh( W1*X + b1 ) + b2 )
X = W1p*( tanh^-1( W2p*(tanh^-1(Y) - b2) ) -b1 ), where W2p and W1p represent the pseudoinverse matrices of W2 and W1 respectively.
The following paper is a case study in inverting a function learned from Neural Networks. It is a case study from the industry and looks a good beginning for understanding how to go about setting up the problem.
An alternate way of approaching the task of getting the desired x that yields desired y would be start with random x (or input as seed), then through gradient decent (similar algorithm to back propagation, difference being that instead of finding derivatives of weights and biases, you find derivatives of x. Also, mini batching is not needed.) repeatedly adjust x until it yields a y that is close to the desired y. This approach has an advantage that it allows an input of a seed (starting x, if not randomly selected). Also, I have a hypothesis that the final x will have some similarity to initial x(seed), which would imply that this algorithm has the ability to transpose, depending on the context of the neural network application.

Neural Network (FFW, BP) - function approximation

is it possible to train NN to approximate this function:
If I tun approximation for x^2 or sin or something simple, it works fine, but for this sort of function i got only constant valued line.
My NN has 2 inputs (x, f(x)), one hidden layer (10 neurons), 1 output (f(x))
For training I am using BP, activation functions sigmoid -> tanh
My goal is to get "smooth" function without noise, that catch function on image above.
Or is there any other way with NN or genetic algorithm, how to approximate this ?
You're gping to have major problems because the input (x, f(x)) is discontinuous (not exactly, but sort of).
Therefore, your NN will have to literally memorize the x-f(x) mapping given the large discontinuities.
One approach is to use a four-layer NN which can address the discontinuities.
But really, you may simply want to look at other smoothening methods rather than NN for thos problem.
You have a periodic function so first of all, only use one period, or you will memorize and not generalize.