I'm using MATLAB's ODE suite to solve a system of 13 differential equations that determines the behavior of a neuron. Now I'd like to add a second neuron that is governed by the same set of differential equations but is influenced by the first neuron. More importantly, this second neuron will also influence the first one. (Feedforward and feedback between the two cells.)
Is there a convenient way to do this? Can I distribute the differential equations over two function files, or do I have to copy them below the original ones so that there is one longer list of equations in a single file? I'd like to have one file per cell and keep things organized, also in case I later want to expand to three or four neurons.
If my question is in any way unclear or not specific enough, please say so; I'll try to explain better what I'm doing.
You have to split the big vector of all variables into sub-arrays holding each neuron's variables, call each neuron's file with its own variables, and then concatenate the resulting derivative vectors.
If the neurons behave similarly, you should think about having one function (file) with a loop for the neuron-internal part of the derivative, followed by a double loop over the interaction terms. Put the connectivity information into a data structure so it remains easy to change.
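A minimal sketch of this split-and-concatenate pattern, written in Python with SciPy purely for illustration (the two-variable `neuron_rhs` and the coupling numbers are made up; the question's 13-equation model would slot in the same way):

```python
import numpy as np
from scipy.integrate import solve_ivp

N_STATES = 2   # states per neuron (13 in the question; 2 here to keep it short)

def neuron_rhs(y, drive):
    # Invented single-cell dynamics; y = [v, w], `drive` is the synaptic input
    v, w = y
    return np.array([-v + np.tanh(w) + drive,
                     -0.5 * w + 0.1 * v])

def coupled_rhs(t, y, coupling):
    # Split the big state vector into one sub-array per neuron, evaluate each
    # neuron's own equations, then stitch the derivative vectors back together.
    n = len(y) // N_STATES
    ys = y.reshape(n, N_STATES)
    drives = coupling @ ys[:, 0]          # each cell driven by the others' v
    return np.concatenate([neuron_rhs(ys[i], drives[i]) for i in range(n)])

coupling = np.array([[0.0, 0.3],    # neuron 1 <- neuron 2 (feedback)
                     [0.5, 0.0]])   # neuron 2 <- neuron 1 (feedforward)
y0 = np.array([1.0, 0.0, 0.0, 0.0])
sol = solve_ivp(coupled_rhs, (0.0, 10.0), y0, args=(coupling,))
```

Adding a third or fourth neuron then only means growing `coupling` and `y0`; the right-hand-side code stays the same.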
I have little experience with MATLAB, but one way I've seen this done in MATLAB is by creating a list (a 1-D matrix?) for each state variable that requires storage. For example, implementing the Hodgkin-Huxley neuron would require one list each for the gating variables 'm', 'h', and 'n', as well as one for 'V'. Each list is as long as the number of neurons in the simulation, so that the ith position in each list corresponds to the ith neuron.
The flow of the simulation would look like the following (let N be the number of neurons):
For each time step in the simulation,
1) let 'index = 1'
2) call the system of ODEs in your file using, as arguments, the element at the current index from each list/matrix of state variables.
3) add one to the index. If the index is now greater than N, then move the timestep forward by one and start over at (1).
It sounds like you'll also need matrices to store information about how the neurons influence one another. While I know a lot of people do it this way, it seems cumbersome on a larger scale (especially if you ever incorporate neurons with different sets of ODEs). In the long run, I highly recommend migrating to a more object-oriented approach. It provides a much easier way to 'bind' each instance of a neuron to its variables and equations, and creating any number of neurons then requires no additional code.
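As an illustration of the one-array-per-state-variable layout described above (Python/NumPy here; a single-variable leaky integrator stands in for the full HH equations, and the weights and drive are invented):

```python
import numpy as np

N = 3                       # number of neurons
dt, steps = 0.01, 100       # integration step and number of timesteps

# One array per state variable; entry i belongs to neuron i.  (A full HH model
# would keep V, m, h and n arrays; a single leaky variable V keeps this short.)
V = np.zeros(N)
W = 0.2 * (np.ones((N, N)) - np.eye(N))   # invented all-to-all coupling weights
I_ext = np.array([1.0, 0.0, 0.0])          # external drive to neuron 0 only

for step in range(steps):
    for i in range(N):                     # the "index" loop from the answer
        syn = W[i] @ np.tanh(V)            # influence of the other neurons
        V[i] += dt * (-V[i] + I_ext[i] + syn)
```

A real implementation would evaluate all neurons from the same previous state (e.g. by copying `V` before the inner loop); this in-place version just mirrors the index walk described in the steps above.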
http://www.mathworks.com/discovery/object-oriented-programming.html
Related
I was running a linear multiple regression as well as a logistic multiple regression in SPSS.
After that, when looking at the results, I realised that in each regression one independent variable was automatically excluded by SPSS. Did we do something wrong, or what do we have to do in order to have all independent variables included in the regression?
Thanks for your help!
In LOGISTIC REGRESSION you can declare a predictor variable as categorical and thus use a single categorical variable directly. If you do this, you can specify the type of contrast coding to use and choose which category serves as the reference category.
The REGRESSION procedure doesn't have facilities for declaring predictors categorical, so if you have an intercept or constant in the model (which of course is the default) and you try to enter K dummy or indicator variables for a K-level categorical variable, one of them will be linearly dependent on the intercept and the other K-1 dummies, and as Kevin said, will be left out.
You may be confused about which one the procedure leaves out. When the default ENTER method is used in REGRESSION, it tries to enter all the independent variables, but it does so one variable at a time and performs a tolerance check to avoid the numerical problems associated with redundant or nearly redundant variables. The way it does this, the last predictor specified is the first to enter; after that comes the variable with the smallest absolute correlation with the previously entered variable, then the remaining candidate with the least relationship to the two already entered, and so on. When there are ties on tolerance values, the later-listed variable is entered.
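The linear dependence described above is easy to see directly: with an intercept in the model, the K dummies for a K-level factor sum to the intercept column, so the design matrix loses rank. A small NumPy check (the data are made up):

```python
import numpy as np

# Hypothetical 3-level categorical variable coded as 3 dummy columns
group = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[group]            # one 0/1 column per category
intercept = np.ones((6, 1))

X_all = np.hstack([intercept, dummies])         # intercept + all K dummies
X_ref = np.hstack([intercept, dummies[:, 1:]])  # intercept + K-1 dummies

# The K dummies sum to the intercept column, so X_all is rank deficient:
print(np.linalg.matrix_rank(X_all))   # 3, not 4 -- one column is redundant
print(np.linalg.matrix_rank(X_ref))   # 3 -- full column rank
```

Dropping one dummy (reference-category coding) restores full rank, which is exactly what the procedure's tolerance check ends up doing for you.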
What does it mean to "unroll an RNN dynamically"? I've seen this mentioned specifically in the TensorFlow source code, but I'm looking for a conceptual explanation that extends to RNNs in general.
In the tensorflow rnn method, it is documented:
If the sequence_length vector is provided, dynamic calculation is
performed. This method of calculation does not compute the RNN steps
past the maximum sequence length of the minibatch (thus saving
computational time),
But in the dynamic_rnn method it mentions:
The parameter sequence_length is optional and is used to
copy-through state and zero-out outputs when past a batch element's
sequence length. So it's more for correctness than performance,
unlike in rnn().
So does this mean rnn is more performant for variable length sequences? What is the conceptual difference between dynamic_rnn and rnn?
From the documentation I understand that what they are saying is that the parameter sequence_length in the rnn method affects performance because, when it is set, dynamic computation is performed and the method stops early.
For example, if the largest input sequence has a length of 50 and the other sequences are shorter, it is better to set sequence_length for each sequence, so that the computation for each sequence stops when that sequence ends rather than processing the padding zeros out to 50 timesteps. However, if sequence_length is not provided, every sequence is treated as having the same length, so the zeros used for padding are processed as normal items in the sequence.
This does not mean that dynamic_rnn is less performant; the documentation says that its sequence_length parameter does not affect performance there, because the computation is already dynamic.
Also according to this post about RNNs in Tensorflow:
Internally, tf.nn.rnn creates an unrolled graph for a fixed RNN length. That means, if you call tf.nn.rnn with inputs having 200 time steps you are creating a static graph with 200 RNN steps. First, graph creation is slow. Second, you’re unable to pass in longer sequences (> 200) than you’ve originally specified.
tf.nn.dynamic_rnn solves this. It uses a tf.While loop to dynamically construct the graph when it is executed. That means graph creation is faster and you can feed batches of variable size. What about performance? You may think the static rnn is faster than its dynamic counterpart because it pre-builds the graph. In my experience that’s not the case.
In short, just use tf.nn.dynamic_rnn. There is no benefit to tf.nn.rnn and I wouldn’t be surprised if it was deprecated in the future.
dynamic_rnn is even faster (or at least as fast), so the author suggests using dynamic_rnn anyway.
To better understand dynamic unrolling, consider how you would create an RNN from scratch in TensorFlow (i.e. without using any RNN library) for a two-timestep input:
Create two placeholders, X1 and X2
Create two variable weights, Wx and Wy, and bias
Calculate the outputs, Y1 = fn(X1·Wx + b) and Y2 = fn(X2·Wx + Y1·Wy + b).
It's clear that we get two outputs, one for each timestep. Keep in mind that Y2 indirectly depends on X1, via Y1.
Now consider that you have 50 timesteps of input, X1 through X50. In this case you'd have to create 50 outputs, Y1 through Y50. This is what TensorFlow does with dynamic unrolling:
it creates these 50 outputs for you via tf.nn.dynamic_rnn().
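The two-timestep unrolling above can be written out directly (NumPy instead of TensorFlow, with made-up shapes, so the weight sharing is explicit):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units, batch = 3, 5, 4   # made-up sizes

# Shared parameters, reused at every timestep (the weight sharing of an RNN)
Wx = rng.normal(size=(n_inputs, n_units))
Wy = rng.normal(size=(n_units, n_units))
b = np.zeros(n_units)

X1 = rng.normal(size=(batch, n_inputs))   # input at timestep 1
X2 = rng.normal(size=(batch, n_inputs))   # input at timestep 2

Y1 = np.tanh(X1 @ Wx + b)
Y2 = np.tanh(X2 @ Wx + Y1 @ Wy + b)       # Y2 sees X1 only through Y1
```

Unrolling to 50 timesteps just repeats the last line 49 more times with the same Wx, Wy, and b, which is exactly the repetition that dynamic unrolling automates inside a graph-level loop.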
I hope this helps.
LSTM (or GRU) cells are the basis of both.
Imagine an RNN as a stacked deep net with
weights sharing (=weights and biases matrices are the same in all layers)
input coming "from the side" into each layer
outputs that are interpreted in higher layers (i.e. by a decoder), one per layer
The depth of this net should depend on (actually be equal to) actual input and output lengths. And nothing else, as weights are the same in all the layers anyway.
Now, the classic way to build this is to group input-output pairs into fixed max-length buckets (i.e. model_with_buckets()). dynamic_rnn breaks with this constraint and adapts to the actual sequence lengths.
So no real trade-off here. Except maybe that you'll have to rewrite older code in order to adapt.
I'm trying to create a sample neural network that can be used for credit scoring. Since this is a complicated structure for me, I'm trying to learn on something small first.
I created a network using back-propagation: an input layer (2 nodes), 1 hidden layer (2 nodes + 1 bias), and an output layer (1 node), which uses sigmoid as the activation function for all layers. I'm trying to test it first on a^2 + b^2 = c^2, which means my inputs would be a and b, and the target output would be c.
My problem is that my input and target output values are real numbers, which can range over (-infinity, +infinity). So when I pass these values to my network, my error function would be something like (target - network output). Would that be correct or accurate? In the sense that I'm taking the difference between the network output (which ranges from 0 to 1) and the target output (which is a large number).
I've read that the solution would be to normalise first, but I'm not really sure how to do this. Should I normalise both the input and the target output values before feeding them to the network? Which normalisation function is best to use, since I've read about several different methods? After getting the optimized weights and using them to test some data, I get an output value between 0 and 1 because of the sigmoid function. Should I revert the computed values to their un-normalised/original form? Or should I only normalise the target output and not the input values? This has had me stuck for weeks, as I'm not getting the desired outcome and am not sure how to incorporate the normalisation idea into my training algorithm and testing.
Thank you very much!!
So, to answer your questions:
The sigmoid function squashes its input into the interval (0, 1). It's usually useful in classification tasks because you can interpret its output as the probability of a certain class. Your network performs a regression task (you need to approximate a real-valued function), so it's better to use a linear activation after your last hidden layer (in your case also the first :) ).
I would also advise you not to use the sigmoid function as the activation in your hidden layers. It's much better to use tanh or relu nonlinearities. A detailed explanation (as well as some useful tips if you want to keep sigmoid as your activation) can be found here.
It's also important to understand that the architecture of your network is not well suited to the task you are trying to solve. You can get a sense of what different networks can learn here.
In the case of normalization: the main reason why you should normalize your data is to avoid giving any spurious prior knowledge to your network. Consider two variables: age and income. The first varies from, e.g., 5 to 90; the second from, e.g., 1000 to 100000. The mean absolute value is much bigger for income than for age, so due to the linear transformations in your model, the ANN treats income as more important at the beginning of training (because of the random initialization). Now suppose you are trying to solve a task where you need to classify whether a given person has grey hair :) Is income truly the more important variable for this task?
There are a lot of rules of thumb for how to normalize your input data. One is to squash all inputs into the [0, 1] interval. Another is to make every variable have mean = 0 and sd = 1. I usually use the second method when the distribution of a given variable is similar to the normal distribution, and the first in other cases.
When it comes to normalizing the output, it's usually also useful when you are solving a regression task (especially in the multiple-regression case), but it's not as crucial as for the inputs.
You should remember to keep the parameters needed to restore the original scale of your inputs and outputs. You should also remember to compute them only on the training set and then apply them to the training, test, and validation sets alike.
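A minimal sketch of the mean/sd recipe, with the parameters computed on the training set only and kept around for restoring the original scale (Python/NumPy, with made-up age/income numbers):

```python
import numpy as np

def fit_scaler(train):
    # Normalization parameters are computed on the TRAINING set only
    return train.mean(axis=0), train.std(axis=0)

def transform(data, mu, sd):
    return (data - mu) / sd

def inverse_transform(data, mu, sd):
    # Restores normalized values (e.g. network outputs) to the original scale
    return data * sd + mu

# Made-up age/income rows echoing the example above
X_train = np.array([[5., 1000.], [90., 100000.], [40., 30000.]])
X_test = np.array([[25., 20000.]])

mu, sd = fit_scaler(X_train)
X_train_n = transform(X_train, mu, sd)
X_test_n = transform(X_test, mu, sd)   # same parameters reused on the test set
```

The same `mu, sd` pair is later used to map the network's normalized predictions back to the original units via `inverse_transform`.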
In my undergrad thesis I am creating a neural network to control an automated shifting algorithm for a vehicle.
I have created the NN from scratch, starting from an .m script, and it works correctly. I tested it on recognizing some shapes.
A brief bit of background information:
An NN consists of neurons, which are mathematical blocks arranged in layers. There are multiple layers; the output of one layer is the input of the next. The actual output is subtracted from the known output, and the error is obtained in this manner. Using the back-propagation algorithm, which amounts to some algebraic equations, the coefficients of the neurons are updated.
What I want to do:
In the code there are 6 input matrices (they don't have to be matrices, just anything) and corresponding outputs; let's call them the x(i) matrices and y(i) vectors. In a for loop I go through each matrix/vector pair to train the network. Finally, using the last updated coefficients, the network gives responses to unknown input.
I couldn't figure out how to reproduce that for loop in Simulink so that it goes through the different input and output pairs. When the network is done with one pair, it should switch to the next input, compare against the corresponding output, and then update the coefficient matrices.
I modeled the layers as described and fed them with just one input, but I need multiple inputs.
When it comes to the automatic-transmission control problem, all of this has to happen in real time: the network should continuously read the output, update the coefficients, and produce the decision.
Check out the "For Each Subsystem", which has existed since R2011b.
To create the input signals you can use the "Concatenate" block, which would have six inputs in your case and a three-dimensional output with x.dim = [1x20x6]; you could then iterate over the third dimension.
This is a very useful pattern for creating smaller models that run faster and for keeping your code DRY (Don't Repeat Yourself).
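Outside of Simulink, the per-pair update loop from the question looks like this in ordinary code (a Python sketch with a plain least-mean-squares update standing in for back-propagation; the six pairs, learning rate, and target function are all invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Six hypothetical input/output pairs, analogous to the x(i)/y(i) in the question
xs = [rng.normal(size=4) for _ in range(6)]
ys = [np.array([x.sum()]) for x in xs]   # made-up target: sum of the inputs

w = np.zeros((4, 1))   # the "coefficient matrix" being learned
lr = 0.1

for epoch in range(500):
    for x, y in zip(xs, ys):             # iterate over the input/output pairs
        err = x @ w - y                  # compare with the known output
        w -= lr * np.outer(x, err)       # update the coefficients
```

The For Each Subsystem plays the role of the inner `for` loop: each slice of the concatenated signal is one (x, y) pair fed through the same update logic.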
I have two datasets at a time (in the form of vectors), and I plot them on the same axes to see how they relate to each other. I specifically look for places where both graphs have a similar shape (i.e. places where both have a seemingly positive/negative gradient over approximately the same intervals). Example:
So far I have been working through the data graphically, but I realize that since the amount of data is so large, plotting the vectors every time I want to check how two sets correlate will take far too much time.
Are there any ideas, scripts, or functions that might be useful to automate this process somewhat?
The first thing you have to think about is the nature of the criteria you want to apply to establish the similarity. There is a wide variety of ways to measure similarity, and the more precisely you can describe what "similar" should mean in your problem, the easier it will be to implement, regardless of the programming language.
Having said that, here are some of the things you could look at:
correlation of the two datasets
difference of the derivatives of the datasets (but I don't think it would be robust enough)
spectral analysis, as mentioned by thron of three
etc.
Knowing the origin of the datasets and their variability can also help a lot in formulating robust enough algorithms.
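For the first item in the list, a windowed correlation gives a per-interval similarity score rather than one number for the whole series (Python/NumPy sketch; the window length and test signals are arbitrary):

```python
import numpy as np

def windowed_correlation(a, b, win):
    # Pearson correlation of a and b over each sliding window of length `win`
    out = np.empty(len(a) - win + 1)
    for i in range(len(out)):
        out[i] = np.corrcoef(a[i:i + win], b[i:i + win])[0, 1]
    return out

# Made-up test signals: b is a noisy copy of a, so their shapes match everywhere
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
a = np.sin(t)
b = np.sin(t) + 0.1 * rng.normal(size=t.size)

corr = windowed_correlation(a, b, win=20)
```

Intervals where `corr` stays close to 1 are candidates for the "similar shape" regions; the window length controls how local the comparison is.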
Sure. Call your two vectors A and B.
1) (Optional) Smooth your data either with a simple averaging filter (Matlab 'smooth') or the 'filter' command. This will get rid of local changes in velocity ("gradient") that appear to be essentially noise (as in the ascending component of the red trace).
2) Differentiate both A and B. Now you are directly representing the velocity of each vector (Matlab 'diff').
3) Add the two differentiated vectors together (element-wise). Call this C.
4) Look for all points in C whose absolute value is above a certain threshold (you'll have to eyeball the data to get a good idea of what this should be). Points above this threshold indicate highly similar velocity.
5) Now look for where a high positive value in C is followed by a high negative value, or vice versa. In between these two points you will have similar curves in A and B.
Notes: a) You could do the smoothing after step 3 rather than after step 1. b) Regarding 5), you could have a situation in which a 'hill' in your data is at the edge of the vector and so is 'cut in half', with the vectors descending to baseline before ascending into the next hill. Then 5) would misidentify the hill as lying between the initial descent and the subsequent ascent. To avoid this, you could also require that the points in A and B between the two points of velocity similarity have high absolute values.
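Steps 1-4 above translate almost line for line (a Python/NumPy sketch; the smoothing window and the threshold rule are placeholders for the eyeballed values the answer mentions):

```python
import numpy as np

def similar_velocity_points(a, b, win=5, thresh=None):
    k = np.ones(win) / win
    a_s = np.convolve(a, k, mode='valid')   # 1) smooth with an averaging filter
    b_s = np.convolve(b, k, mode='valid')
    da, db = np.diff(a_s), np.diff(b_s)     # 2) differentiate both vectors
    c = da + db                             # 3) add the derivatives, element-wise
    if thresh is None:
        thresh = 0.5 * np.abs(c).max()      # 4) threshold |C| (eyeball in practice)
    return c, np.abs(c) > thresh

# Made-up test signals: b is a noisy copy of a
t = np.linspace(0, 4 * np.pi, 400)
a = np.sin(t)
b = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
c, strong = similar_velocity_points(a, b)
```

Step 5 then scans `c` for a strong positive run followed by a strong negative one (or vice versa); the stretch between them is where A and B share a similar curve.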