How do I take a trained neural network and implement it in another system? - matlab

I have trained a feedforward neural network in Matlab. Now I have to implement this neural network in C (or simulate the model in Matlab using the underlying mathematical equations, without calling the toolbox functions directly). How do I do that? I know that I have to take the weights, biases and activation function. What else is required?

There is no point in representing it as a mathematical function because it won't save you any computations.
Indeed, all you need is the weights, biases, activation function, and your architecture. Assuming it is a simple feedforward network as you said, you need to implement some kind of matrix multiplication and addition in C. You'll also need to implement the activation function. After that, you're ready to go: your feedforward NN is ready to be used. If the C code will not be used for training, it won't be necessary to implement the backpropagation algorithm in C.
A feedforward layer would be implemented as follows:
Output = Activation_function(Input * weights + bias)
Where,
Input: (1 x number_of_input_parameters_for_this_layer)
Weights: (number_of_input_parameters_for_this_layer x number_of_neurons_for_this_layer)
Bias: (1 x number_of_neurons_for_this_layer)
Output: (1 x number_of_neurons_for_this_layer)
The output of a layer is the input to the next layer.
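As a concrete illustration, here is a minimal NumPy sketch of that per-layer computation (variable names are made up; the C version is the same arithmetic written with explicit loops over the matrix product):

import numpy as np

def feedforward_layer(inputs, weights, bias, activation=np.tanh):
    # inputs: (1 x n_in), weights: (n_in x n_neurons), bias: (1 x n_neurons)
    # returns the layer output: (1 x n_neurons)
    return activation(inputs @ weights + bias)

# Example with made-up numbers: 3 inputs into a layer of 4 neurons.
x = np.array([[0.5, -1.0, 2.0]])        # (1 x 3)
W = np.random.randn(3, 4) * 0.1         # (3 x 4)
b = np.zeros((1, 4))                    # (1 x 4)
out = feedforward_layer(x, W, b)        # (1 x 4), becomes the next layer's input
print(out.shape)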

After some days of searching, I found the following webpage to be very useful: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
That tutorial includes a figure of a simple feedforward neural network (not reproduced here).
In this figure, the circles denote the inputs to the network. The circles labeled “+1” are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node). The middle layer of nodes is called the hidden layer, because its values are not observed in the training set. In this example, the neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit.
The mathematical equations representing this feedforward network are:

a1(2) = f(W11(1)*x1 + W12(1)*x2 + W13(1)*x3 + b1(1))
a2(2) = f(W21(1)*x1 + W22(1)*x2 + W23(1)*x3 + b2(1))
a3(2) = f(W31(1)*x1 + W32(1)*x2 + W33(1)*x3 + b3(1))
hW,b(x) = a1(3) = f(W11(2)*a1(2) + W12(2)*a2(2) + W13(2)*a3(2) + b1(2))

This neural network has parameters (W,b) = (W(1), b(1), W(2), b(2)), where we write Wij(l) to denote the parameter (or weight) associated with the connection between unit j in layer l and unit i in layer l+1. (Note the order of the indices.) Also, bi(l) is the bias associated with unit i in layer l+1.
So, from the trained model, as Mido mentioned in his answer, we have to take the input weight matrix, which is W(1), the layer weight matrix, which is W(2), the biases, the hidden layer transfer function and the output layer transfer function. After this, use the above equations to estimate the output hW,b(x). A popular choice for a regression problem is a tan-sigmoid transfer function in the hidden layer and a linear transfer function in the output layer.
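Putting those pieces together, here is a rough NumPy sketch of simulating a 3-3-1 network with a tan-sigmoid hidden layer and a linear output layer. The weight matrices here are random placeholders standing in for the values exported from the trained model (in MATLAB terms they correspond roughly to net.IW{1,1}, net.LW{2,1}, net.b{1} and net.b{2}):

import numpy as np

# Placeholder weights; in practice, export these from the trained model.
W1 = np.random.randn(3, 3) * 0.5   # hidden-layer weights, (n_hidden x n_inputs)
b1 = np.random.randn(3, 1) * 0.1   # hidden-layer biases
W2 = np.random.randn(1, 3) * 0.5   # output-layer weights, (n_outputs x n_hidden)
b2 = np.random.randn(1, 1) * 0.1   # output-layer bias

def simulate(x):
    # hW,b(x) for a 3-3-1 network: tan-sigmoid (= tanh) hidden layer,
    # linear output layer, following the equations above.
    a2 = np.tanh(W1 @ x + b1)      # hidden activations a(2)
    return W2 @ a2 + b2            # linear output hW,b(x)

x = np.array([[0.2], [0.7], [-1.1]])   # one input sample as a column vector
print(simulate(x))

Note that if the trained MATLAB network applies input/output normalization (mapminmax is the toolbox default), that mapping has to be reproduced as well, otherwise the simulated outputs will not match the toolbox outputs.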
For those who use Matlab, these links are highly useful:
try to simulate neural network in Matlab by myself
Neural network in MATLAB
Programming a Basic Neural Network from scratch in MATLAB

Related

Can a single input single output neural network with y=x as activation function reflect non-linear behavior?

I am currently learning a little bit about neural networks. One thing I can't really wrap my head around is how neural networks reflect non-linear behavior. From my understanding, there is no way for a neural network to reflect non-linear behavior inside a compact set.
For example if I would take the function from this question:
y = x^2
and I would use a neural network with a single input and single output the best the neural network could do for each compact set [x0...xn] is a linear function spanning from one end of the set to the other, as at the end all calculations inside the net are linear.
Do I have some misunderstanding about this concept?
An ANN's capability to model non-linear behaviour arises from the (usually) non-linear activation function.
If the activation function is linear, then the process of training the network is just another way to create a linear (or multi-linear) fit of input and output data.
The activation function in neural networks is exactly the part that brings non-linearity. If you use a linear activation function, then you cannot train a non-linear model (and thus cannot fit a quadratic or other non-linear function).
The part you are probably interested in is the Universal Approximation Theorem, which says that any continuous function can be approximated with a neural network with a single hidden layer (some assumptions on the activation function apply, though). Take into account that this theorem does not say anything about the optimization of such a network (it does not guarantee you can train such a network with a specific algorithm, only that such a network exists). It also does not say anything about the number of neurons you should use.
You can check the following links to get more details:
Original proof with sigmoid activation function: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.441.7873&rep=rep1&type=pdf
And a more friendly derivation: http://mcneela.github.io/machine_learning/2017/03/21/Universal-Approximation-Theorem.html
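To make the theorem a bit more tangible, here is a small self-contained NumPy sketch (not taken from the links above): fix a single hidden layer of random tanh units, fit only the output weights by least squares, and y = x^2 is already approximated closely on a compact set.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200).reshape(-1, 1)    # compact set [-1, 1]
y = x ** 2                                    # the non-linear target

H = 50                                        # hidden units
W1 = rng.uniform(-5, 5, size=(1, H))          # random (fixed) hidden weights
b1 = rng.uniform(-5, 5, size=(1, H))
hidden = np.tanh(x @ W1 + b1)                 # non-linear hidden features

# Fit only the output layer by linear least squares
A = np.hstack([hidden, np.ones((len(x), 1))]) # add an output bias column
w_out, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ w_out

print("max abs error:", float(np.abs(y_hat - y).max()))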

Are fully connected layers really required in Deep neural networks?

I mean to ask: can I have a neural network classifier with a large number of layers, but without fully connected layers?
Yes, you can make a fully convolutional classifier, one example is SqueezeNet.
The basic working principle is that at the end of the network you insert a convolutional layer with C output channels, where C is the number of classes. Then you proceed to apply global average pooling, which will produce a 1D vector of C elements (independent of input feature map width/height), and you can apply the softmax function to that vector to produce output class probabilities.
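Here is a rough NumPy sketch of just that classifier head (the convolutional backbone is omitted and the shapes and weights are placeholders): a 1x1 convolution is a per-position linear map to C class channels, global average pooling collapses the spatial dimensions, and softmax turns the result into probabilities.

import numpy as np

def gap_softmax_head(feature_map, w, b):
    # feature_map: (H, W, D) activations from the conv backbone
    # w: (D, C) weights of a 1x1 convolution, b: (C,) biases
    # returns class probabilities of shape (C,), independent of H and W
    h, wd, d = feature_map.shape
    class_maps = feature_map.reshape(-1, d) @ w + b   # 1x1 conv -> (H*W, C)
    logits = class_maps.mean(axis=0)                  # global average pooling -> (C,)
    e = np.exp(logits - logits.max())                 # softmax
    return e / e.sum()

probs = gap_softmax_head(np.random.rand(7, 7, 512),
                         np.random.randn(512, 10) * 0.01,
                         np.zeros(10))
print(probs.shape, round(float(probs.sum()), 6))      # (10,) 1.0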

Can a convolutional neural network be built with perceptrons?

I was reading this interesting article on convolutional neural networks. It showed an image explaining that for every receptive field of 5x5 pixels/neurons, a value for a hidden neuron is calculated.
We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.
So max-pooling is applied.
With multiple convolutional layers, it looks something like this:
But my question is, this whole architecture could be built with perceptrons, right?
For every convolutional layer, one perceptron is needed, with layers:
input_size = 5x5;
hidden_size = 10; e.g.
output_size = 1;
Then for every receptive field in the original image, the 5x5 area is inputted into a perceptron to output the value of a neuron in the hidden layer. So basically doing this for every receptive field:
So the same perceptron is used 24×24 times to construct the hidden layer, because (quoting the article):
we're going to use the same weights and bias for each of the 24×24 hidden neurons.
And this works for the hidden layer to the pooling layer as well, input_size = 2x2; output_size = 1;. And in the case of a max-pool layer, it's just a max() function on an array.
and then finally:
The final layer of connections in the network is a fully-connected layer. That is, this layer connects every neuron from the max-pooled layer to every one of the 10 output neurons.
which is a perceptron again.
So my final architecture looks like this:
-> 1 perceptron for every convolutional layer/feature map
-> run this perceptron for every receptive field to create feature map
-> 1 perceptron for every pooling layer
-> run this perceptron for every field in the feature map to create a pooling layer
-> finally input the values of the pooling layer in a regular ALL to ALL perceptron
Or am I overlooking something? Or is this already how they are programmed?
The answer very much depends on what exactly you call a Perceptron. Common options are:
Complete architecture. Then no, simply because it's by definition a different NN.
A model of a single neuron, specifically y = 1 if (w.x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters and w.x denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN (a minimal sketch of this weight-sharing idea appears at the end of this answer). You'll find variants of this idea being used in binary neural networks.
A training algorithm, typically associated with the Perceptron architecture. This reading makes little sense for the question, because the learning algorithm is in principle orthogonal to the architecture. Though you cannot really use the Perceptron algorithm for anything with hidden layers, which would suggest 'no' as the answer in this case.
The loss function associated with the original Perceptron. This notion of Perceptron is orthogonal to the problem at hand; your loss function with a CNN is given by whatever you try to do with your whole model. You could eventually use it, but it is non-differentiable, so good luck :-)
A sidenote rant: you can see people refer to feed-forward, fully-connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer; there are no Perceptrons in MLPs (see e.g. this discussion on Wikipedia), unless you go explore some really weird ideas. It would make more sense to call these networks Multilayer Linear Logistic Regression, because that's what they used to be composed of -- up until about 6 years ago.
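To make the weight-sharing idea from the second option above concrete, here is a rough NumPy sketch (sizes mirror the article's example, weights are random placeholders): one thresholded perceptron whose weights are shared across every 5x5 receptive field of a 28x28 image, which is exactly a convolution producing a 24x24 feature map, followed by 2x2 max-pooling.

import numpy as np

def shared_perceptron_feature_map(image, w, b):
    # Apply ONE perceptron (shared weights w, bias b) to every 5x5
    # receptive field of a 28x28 image -> 24x24 binary feature map.
    out = np.zeros((24, 24))
    for i in range(24):
        for j in range(24):
            patch = image[i:i + 5, j:j + 5]
            out[i, j] = 1.0 if np.sum(w * patch) + b > 0 else 0.0
    return out

def max_pool_2x2(fmap):
    # 2x2 max pooling: just max() over each non-overlapping 2x2 block.
    h, wd = fmap.shape
    return fmap.reshape(h // 2, 2, wd // 2, 2).max(axis=(1, 3))

image = np.random.rand(28, 28)
w = np.random.randn(5, 5)                 # the single shared weight set
fmap = shared_perceptron_feature_map(image, w, b=-1.0)
pooled = max_pool_2x2(fmap)
print(fmap.shape, pooled.shape)           # (24, 24) (12, 12)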

Linear vs nonlinear neural network?

I'm new to machine learning and neural networks. I know how to build a nonlinear classification model, but my current problem has a continuous output. I've been searching for information on neural network regression, but all I encounter is information on linear regression - nothing about nonlinear cases. Which is odd, because why would someone use neural networks to solve a simple linear regression anyway? Isn't that like killing a fly with a nuclear bomb?
So my question is this: what makes a neural network nonlinear? (Hidden layers? Nonlinear activation function?) Or do I have a completely wrong understanding of the word "linear" - can a linear regression NN accurately model datasets that are more complex than y=aX+b? Is the word "linear" used just as the opposite of "logistic"?
(I'm planning to use TensorFlow, but the TensorFlow Linear Model Tutorial uses a binary classification problem as an example, so that doesn't help me either.)
For starters, a neural network can model (essentially) any function, not just linear functions. Have a look at this: http://neuralnetworksanddeeplearning.com/chap4.html
A neural network has non-linear activation layers, which is what gives it its non-linear element.
The function for relating the input and the output is decided by the neural network and the amount of training it gets. If you supply two variables having a linear relationship, then your network will learn this as long as you don't overfit. Similarly, a complex enough neural network can learn any function.
WARNING: I do not advocate the use of linear activation functions only, especially in simple feed forward architectures.
Okay, I think I need to take some time and rewrite this answer explicitly because many people are misinterpreting the point I am trying to make.
First let me point out that we can talk about linearity in parameters or linearity in the variables.
The activation function is NOT necessarily what makes a neural network non-linear (technically speaking).
For example, notice that the following regression's predicted values are considered linear predictions, despite non-linear transformations of the inputs, because the output constitutes a linear combination of the parameters (although this model is non-linear in its variables):

y_hat = b0 + b1*x + b2*x^2 (1)

Now for simplicity, let us consider a single-neuron, single-layer neural network:

a = f(w*p + b) (2)

If the transfer function f is linear, then f(x) = x and:

a = w*p + b (3)

As you have already probably noticed, this is a linear regression. Even if we were to add multiple inputs and neurons, each with a linear activation function, we would now only have an ensemble of regressions (all linear in their parameters, and therefore this simple neural network is linear):

a_k = w_k1*p1 + w_k2*p2 + ... + b_k (4)
Now going back to (3), let's add two layers, so that we have a neural network with 3 layers, one neuron each (both with linear activation functions):

z1 = w1*p + b1 (first layer)

a = w2*z1 + b2 (second layer)

Now notice:

a = w2*(w1*p + b1) + b2

which reduces to:

a = W*p + B

where W = w2*w1 and B = w2*b1 + b2.
This means that our two-layered network (each layer with a single neuron) is not linear in its parameters despite every activation function in the network being linear; however, it is still linear in the variables. Thus, once training has finished, the model will be linear in both variables and parameters. Both of these points are important because you cannot replicate this simple two-layered network with a single regression and still capture all the effects of the model. Further, let me state clearly: if you use a model with multiple layers there is no guarantee that the output will be non-linear in its variables (if you use a simple MLP and linear activation functions, your picture is still going to be a line).
That being said, let's take a look at the following statement from @Pawu regarding this answer:
The answer is very misleading and makes it sound as if we can learn non-linear relationships using only linear transformations, which is simply not true. When we back-propagate, we take the derivative of a single weight w1 and fix everything else. Now, as mentioned above, we are still moving on a linear function.
While you could argue that what @Pawu is saying is technically true, I think they are implying:
The answer is very misleading and makes it sound as if we can learn non-linear relationships using only linear activation functions, which is simply not true.
I would argue that this modified statement is wrong and can easily be demonstrated to be incorrect. There is an implicit assumption being made about the architecture of the model. It is true that if you restrict yourself to certain network architectures, you cannot introduce non-linearities without activation functions, but that is an arbitrary restriction and does not generalize to all network models.
Let me make this concrete. First take the simple XOR problem. This is a basic classification problem where you are attempting to establish a boundary between data points arranged like the four XOR points, where opposite corners belong to the same class.
The kicker about this problem is that it is not linearly separable, meaning no single straight line will be able to classify the points perfectly. Now if you read anywhere on the internet, I am sure they will say that this problem cannot be solved using only linear activation functions in a neural network (notice nothing is said about the architecture). This statement is only true in an extremely limited context and wrong generally.
Allow me to demonstrate. Below is a very simple hand-written neural network. This network takes randomly generated weights between -1 and 1, an "xor_network" function which defines the architecture (notice no sigmoid, hardlim, etc., only linear transformations of the form m*X or m*X + b), and trains using standard backpropagation:
#%% Packages
import numpy as np

#%% Data
data = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])
np.random.shuffle(data)
train_data = data[:, :2]
target_data = data[:, 2]

#%% XOR architecture
class XOR_class():

    def __init__(self, train_data, target_data, alpha=.1, epochs=10000):
        self.train_data = train_data
        self.target_data = target_data
        self.alpha = alpha
        self.epochs = epochs

        # Random weights
        self.W0 = np.random.uniform(low=-1, high=1, size=(2)).T
        self.b0 = np.random.uniform(low=-1, high=1, size=(1))
        self.W2 = np.random.uniform(low=-1, high=1, size=(2)).T
        self.b2 = np.random.uniform(low=-1, high=1, size=(1))

    # xor network (linear transfer functions only)
    def xor_network(self, X0):
        n0 = np.dot(X0, self.W0) + self.b0
        X1 = n0 * X0
        a = np.dot(X1, self.W2) + self.b2
        return (a, X1)

    # Training the xor network
    def train(self):
        for epoch in range(self.epochs):
            for i in range(len(self.train_data)):
                # Forward Propagation:
                X0 = self.train_data[i]
                a, X1 = self.xor_network(X0)

                # Backward Propagation:
                e = self.target_data[i] - a
                s_2 = -2 * e

                # Update Weights:
                self.W0 = self.W0 - (self.alpha * s_2 * X0)
                self.b0 = self.b0 - (self.alpha * s_2)
                self.W2 = self.W2 - (self.alpha * s_2 * X1)
                self.b2 = self.b2 - (self.alpha * s_2)

        # Restart training if we get lost in the parameter space.
        if np.isnan(a) or (a > 1) or (a < -1):
            print('Bad initialization, reinitializing.')
            self.W0 = np.random.uniform(low=-1, high=1, size=(2)).T
            self.b0 = np.random.uniform(low=-1, high=1, size=(1))
            self.W2 = np.random.uniform(low=-1, high=1, size=(2)).T
            self.b2 = np.random.uniform(low=-1, high=1, size=(1))
            self.train()

    # Predicting using the trained weights.
    def predict(self, test_data):
        for i in test_data:
            a, X1 = self.xor_network(i)
            # Decimals past 12 are cut off for convenience, not necessity.
            print(f'input: {i} - output: {np.round(a, 12)}')
Now let's take a look at the output:
#%% Execution
xor = XOR_class(train_data, target_data)
xor.train()
np.random.shuffle(data)
test_data = data[:,:2]
xor.predict(test_data)
input: [1 0] - output: [1.]
input: [0 0] - output: [0.]
input: [0 1] - output: [1.]
input: [1 1] - output: [0.]
And what do you know, I guess we can learn non-linear relationships using only linear activation functions and multiple layers (that's right, classification with pure linear activation functions, no sigmoid needed)...
The only catch here is that I cut off all decimals past 12, but let's be honest, 7.3 x 10^-16 is basically 0.
Now to be fair I am doing a little trick, where I am using the network connections to get the non-linear result, but that's the whole point I am trying to drive home: THE MAGIC OF NON-LINEARITY FOR NEURAL NETWORKS IS IN THE LAYERS, NOT JUST THE ACTIVATION FUNCTIONS.
Thus the answer to your question, "what makes a neural network non-linear" is: non-linearity in the parameters or, obviously, non-linearity in the variables.
This non-linearity in the parameters/variables comes about in two ways: 1) having more than one layer with neurons in your network (as exhibited above), or 2) having activation functions that result in weight non-linearities.
For an example of non-linearity coming about through the activation function, suppose our input space, weights, and biases are all constrained such that they are all strictly positive (for simplicity). Now using (2) (single layer, single neuron) and the activation function f(x) = x^2, we have the following:

a = (w*p + b)^2

which reduces to:

a = A*p^2 + B*p + C

where A = w^2, B = 2*w*b, and C = b^2.

Now, ignoring whatever issues this neural network has, it should be clear that, at the very least, it is non-linear in the parameters and in the variables, and that this non-linearity has been introduced solely by the choice of the activation function.
Finally, yes neural networks can model complex data structures that cannot be modeled by using linear models (see xor example above).
EDIT:
As pointed out by @hH1sG0n3, non-linearity in the parameters does not follow directly from many common activation functions (e.g. sigmoid). This is not to say that common activation functions do not make neural networks nonlinear (they are non-linear in the variables), but that the non-linearity introduced by them is degenerate without parameter non-linearity. For example, a single-layer MLP with sigmoid activation functions will produce outputs that are non-linear in the variables in that the output is not proportional to the input, but in reality this is just an array of Generalized Linear Models. This should be especially obvious if we were to transform the targets by the appropriate link function, where now the activation functions would be linear. Now this is not to say that activation functions don't play an important role in the non-linearity of neural networks (clearly they do), but their role is more to alter/expand the solution space. Said differently, non-linearities in the parameters (usually expressed through many layers/connections) are necessary for non-degenerate solutions that go beyond regression. When we have a model with non-linearity in the parameters, we have a whole different beast than regression.
At the end of the day all I want to do with this post is point out that the "magic" of neural networks is also in the layers and to dispel the ubiquitous myth that a multilayered neural network with linear activation functions is always just a bunch of linear regressions.
When it comes to nonlinear regression, this is referring to how the weights affect the output. If a function is not linear with respect to the weights, then your problem is a nonlinear regression problem. So for example, let's look at a feedforward neural network with one hidden layer where the activation functions in the hidden layer are some function g and the output layer has linear activation functions. Given this, the mathematical representation can be:

y = W2*g(W1*x + b1) + b2

where we assume g can operate on scalars and vectors with this notation, to keep it easy. W1, b1, W2, and b2 are the weights you are aiming to estimate with the regression. If this were linear regression, g(z) would equal z, because that would make y linearly dependent on W1 & b1. But if g is nonlinear, say like tanh(z), then y is nonlinearly dependent on the weights W1 and b1.
Now provided you understand all that, I am surprised you haven't seen discussion of the nonlinear case because that's pretty much all people talk about in textbooks and research. The use of things like stochastic gradient descent, Nonlinear Conjugate Gradient, RProp, and other methods are to help find local minima (and hopefully good local minima) for these nonlinear regression problems, even though a global optimum is not typically guaranteed.
Any non-linearity from the input to output makes the network non-linear. In the way we usually think about and implement neural networks, those non-linearities come from activation functions.
If we are trying to fit non-linear data and only have linear activation functions, our best approximation to the non-linear data will be linear since that's all we can compute. You can see an example of a neural network trying to fit non-linear data with only linear activation functions here.
However, if we change the linear activation function to something non-linear like ReLu, then we can see a better non-linear fitting of the data. You can see that here.
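If you would rather see the difference in a few lines of code than in an interactive demo, here is a small NumPy illustration (my own sketch, not from the links): fit y = x^2 once with a purely linear model and once with a handful of fixed ReLU "hinge" features, fitting only the output weights in both cases.

import numpy as np

x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = x ** 2

# Best purely linear fit: y ~ a*x + b
A_lin = np.hstack([x, np.ones_like(x)])
w_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
err_lin = float(np.abs(A_lin @ w_lin - y).max())

# Same data through fixed ReLU hinges, then a linear read-out
knots = np.linspace(-1, 1, 10)
A_relu = np.hstack([np.maximum(0.0, x - k) for k in knots] + [np.ones_like(x)])
w_relu, *_ = np.linalg.lstsq(A_relu, y, rcond=None)
err_relu = float(np.abs(A_relu @ w_relu - y).max())

print(f"linear max error: {err_lin:.3f}, ReLU-feature max error: {err_relu:.3f}")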
I do not have enough reputation to comment on itwasthekix's post, but I want to share my insight.
Someone asked in the comments whether equation 8 was linear, and the answer was that if w1 were to be varied while all else is held constant, we would move up and down a non-linear function. This is not true. When we vary w1, we essentially only change the output of z1 = (w1*p + b1). Since z1 is linearly transformed later, we will still move on a linear function. If we were to fix everything except w1 AND w2, then we would move on a non-linear function.
A multi-layer ANN can be non-linear in its parameters, because we have a multiplication of parameters. That does not mean it can learn non-linear relationships.
The answer is very misleading and makes it sound as if we can learn non-linear relationships using only linear transformations, which is simply not true. When we back-propagate, we take the derivative of a single weight w1 and fix everything else. Now, as mentioned above, we are still moving on a linear function.
If we take the gradient of w1*w2 and perform gradient descent, we only know the joint gradient; there is no way to determine the influence of the separate parameters without fixing one of them. And if we fix one, we move on a linear function.
If we add a (non-linear) activation function, we linearly transform a non-linear output, enabling us to learn non-linear relationships, since we do not move on a linear function anymore.
Let's look at the case z = w2 * g(w1 * p + b1) + b2, assuming g is a non-linear activation function. Then if we fix everything else and vary w1, we will move on a non-linear function, since w1 * p + b1 is transformed by g.
Non-linearity means different things in communities of regression analysis and neural network machine learning.
In regression analysis, when we say a fitting model is nonlinear, we mean that the model is nonlinear in terms of its parameters (not in terms of the independent variables).
A multiple-layer neural network is usually nonlinear in terms of the weights even if the activation function is linear. This is simple to see because the information propagating in the network corresponds to function composition: f3(f2(f1())), which generally gives nonlinear functions of the weights. Therefore, in terms of regression analysis, all neural networks are nonlinear models.
However, in the neural network community, people talk about linearity in terms of the input variables, rather than the weights/biases. Therefore, they define a neural network with linear activation functions as linear and one with nonlinear activation functions as nonlinear.
I had the same struggle, most online courses use ANNs for classification, but you never actually solve a regression problem with them in the courses.
What does make an ANN non-linear? The activation function.
Even if you have an ANN with thousands of perceptrons and hidden units, if all the activations are linear (or not activated at all) you are just training a plain linear regression.
But be careful: some activation functions (like sigmoid) have a range of values over which they behave like a linear function, and you may get stuck with an effectively linear model even with non-linear activations.
How to predict continuous output with an ANN? The same way as when you classify.
It is the same problem: you just backpropagate the error (label - prediction) and update the weights. But don't forget to CHANGE THE ACTIVATION FUNCTION of the output layer to a continuous function (maybe ReLU if all labels are positive, or no activation on the output at all); the intermediate hidden layers can be activated however you wish.
For small regression problems with ANNs you may need to start with a veeeeeery small learning rate, since there will be a lot of variance because the error will be "unbounded" at first.
Hope this helps :)
I don't want to be impolite, but the current answers are all related to nonlinear ND-polynomials resulting from linear activation functions. That simply doesn't make sense in terms of this question.
I get the point: you will have a polynomial as the objective function to minimize, with coefficients that are products of layer coefficients, and a product is nonlinear. Anyway, such a system will never be able to converge and doesn't make sense at all without extra constraints.
The described system is not only completely unnecessarily nonlinear, but also ill-posed. Don't argue about stuff that leads ad absurdum. The original question actually completely nailed it.
Build a "linear neural network" with layers and try to use it as usual... then you will realise that this goes nowhere and you wasted your time.
So unless there are good reasons to believe this kind of ill-posed setup has been handled, I would never ever consider using a linear activation function. If you have extra constraints this might make sense. If you use stochastic gradient descent, then you will at least skip some of its bad properties.
That the objective function is nonlinear in its parameters gives a wrong and bogus impression. And had the writer known about the optimisation problems connected to terms with a product of coefficients, he would never have written anything like this.
Any objective function can be made nonlinear just by replacing one linear coefficient with a product of two coefficients. But that is nonsense, because you can never determine those coefficients. NEVER. There are infinitely many solutions! And that doesn't even depend on the amount of data.
Because the activation is w*x, which is a linear operation, you need extra elements to make it non-linear.

The mathematical function that relates a given output to three inputs through artificial neural network analysis

I have a data set with three input variables and one output. I want to fit the target column to the input data using an artificial neural network with one layer, for example. My concern is: can I obtain a closed-form or analytical formula that links the modeled target to the three input variables? Please let me know whether the software can do that.
Technically, you can always translate a one-layer neural net into a closed-form solution. Let's call your input variables x, y, and z, your bias node b, and the connections between them and your output cx, cy, cz, and cb respectively. Once you've trained the neural network so that the connection weights are adjusted, the output value should equal x*cx + y*cy + z*cz + cb (technically the bias term is b*cb, but b = 1).
If you were using a more complicated neural network, this equation would get a lot more complicated, particularly as you add hidden layers, but theoretically you can always write it out analytically. It just won't always be tractable to evaluate the analytic form.
For a more detailed answer, see: Deriving equation by weights and biases from a neural network
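As an illustration of how the closed form grows with a hidden layer, here is a small NumPy sketch with made-up weights (none of these numbers come from a real trained model): the no-hidden-layer case is the plain linear expression above, and a tan-sigmoid hidden layer just makes the analytic expression longer, output = sum_j v_j * tanh(w_j1*x + w_j2*y + w_j3*z + b_j) + c.

import numpy as np

# Hypothetical weights of a trained no-hidden-layer network.
cx, cy, cz, cb = 0.4, -1.2, 0.7, 0.05

def no_hidden_layer(x, y, z):
    # Closed form: output = x*cx + y*cy + z*cz + cb
    return x * cx + y * cy + z * cz + cb

# Hypothetical weights of a network with one tanh hidden layer (2 units).
W = np.array([[0.3, -0.8, 0.1],
              [1.1,  0.2, -0.5]])        # hidden weights (2 x 3)
b = np.array([0.0, 0.4])                 # hidden biases
v = np.array([0.9, -0.3])                # output weights
c = 0.1                                  # output bias

def one_hidden_layer(x, y, z):
    # Still a closed form, just a longer one: v . tanh(W [x y z]^T + b) + c
    h = np.tanh(W @ np.array([x, y, z]) + b)
    return float(v @ h + c)

print(no_hidden_layer(1.0, 2.0, 3.0), one_hidden_layer(1.0, 2.0, 3.0))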