CAFFE: Run forward pass with fewer nodes in FC layer

I am trying to perform an experiment in Caffe with a very simple single-hidden-layer NN. I am using the MNIST dataset, trained with a single hidden layer of 128 nodes, and I already have all the weights from the fully trained network.
However, during the feed-forward stage I would like to use only a smaller subset of these nodes, e.g. 32 or 64. For example, I would like to calculate the activations of 64 nodes during the feed-forward pass and save them. Then, during the next run, calculate the activations of the other 64 nodes and combine them with the activations of the first 64, so that I end up with the activations of all 128 nodes, computed in two 'passes'.
Is there a way to achieve this in Caffe? Please excuse me, as I am very new to Caffe (I just started using it this week!).
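For reference, the two-pass computation itself is plain linear algebra once the trained weights are extracted from the net (in pycaffe, net.params[name][0].data and net.params[name][1].data hold a layer's weights and biases). Below is a minimal NumPy sketch; the (128, 784) weight shape and the ReLU nonlinearity are assumptions about this particular network, not something Caffe prescribes:

```python
import numpy as np

def hidden_activations(x, W1, b1, rows):
    """Activations of a subset of hidden units.

    x    : input vector, shape (784,)
    W1   : trained hidden-layer weights, shape (128, 784)
    b1   : trained hidden-layer biases, shape (128,)
    rows : indices of the hidden units to evaluate in this pass
    """
    z = W1[rows] @ x + b1[rows]   # pre-activations of the chosen units only
    return np.maximum(z, 0)       # assuming a ReLU nonlinearity

# Pass 1: first 64 units; pass 2: the remaining 64.
# first  = hidden_activations(x, W1, b1, np.arange(0, 64))
# second = hidden_activations(x, W1, b1, np.arange(64, 128))
# full   = np.concatenate([first, second])  # same as one 128-unit pass
```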

Related

Building a neural network chess engine

I am building a single-layer NN to play chess.
The input is a matrix [n, m] of n training samples with m = 64 features each, where each feature represents a square.
The training output is a matrix [n, 1] where each output is a centipawn score obtained from Stockfish. For example, the score can be 100, -200, 1405, etc.
I want to build a NN that has 64 input nodes, 500 hidden nodes, and 1 output node for outputting a score.
I know that NNs are used for classification, but I was wondering whether I can build a NN that outputs arbitrary integers as well?
Thanks
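A network with a linear (identity) output unit trained with a squared-error loss does exactly this: it performs regression and can output arbitrary real values. A minimal sketch with scikit-learn's MLPRegressor, using random placeholder data in place of the real board features and Stockfish scores:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 64))                         # placeholder board features
y = rng.integers(-500, 1500, 1000).astype(float)   # placeholder centipawn scores

# MLPRegressor uses an identity output unit with squared-error loss, so
# any real-valued target works; rescaling y (e.g. dividing by 100) often
# makes training easier with wide-ranging scores.
model = MLPRegressor(hidden_layer_sizes=(500,), activation='relu', max_iter=500)
model.fit(X, y)
print(model.predict(X[:1]))   # a real-valued score; round it if integers are needed
```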

Neural net fitting in matlab

I am trying to find the optimum number of neurons to use when running the Neural Net Fitting tool in Matlab's Neural Networks app.
I am currently using 62000 samples of 64 elements as input and 62000 samples of 1 element as target. I tried to reproduce results obtained through other means, but with 1-12 neurons the tool's results are not even similar. When I ran it with 64 neurons, the results were closer to what was expected.
Is there any way to know how many neurons to use based on the number of elements/samples?
Any suggestions on how to select the number of neurons when running the tests?
Thanks.
Even for simple datasets like MNIST I will use at minimum 128 neurons. Possible values to check are 128, 256, 512, and maybe 1024. These numbers are just easy to remember; they are not magical, nor the consequence of a known formula. Alternatively, sample a few random values from [100, 500] and see which number of neurons works best. Harder tasks tend to require more neurons, and when you have many neurons you need to consider regularizing your network with L2 regularization or dropout.
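Since the Neural Net Fitting tool runs one configuration at a time, it can help to script the sweep. A sketch of the same experiment in Python with scikit-learn (the data are random placeholders shaped like the 62000 x 64 problem above; MLPRegressor's alpha parameter is its L2 penalty):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((62000, 64))   # placeholder for the real inputs
y = rng.random(62000)         # placeholder for the real targets
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for n in (128, 256, 512, 1024):
    model = MLPRegressor(hidden_layer_sizes=(n,), alpha=1e-4,  # alpha = L2 penalty
                         max_iter=200)
    model.fit(X_tr, y_tr)
    print(n, model.score(X_val, y_val))  # R^2 on held-out data; pick the best n
```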

Hybrid SOM (with MLP)

Could someone please provide some information on how to properly combine a self-organizing map with a multilayer perceptron?
I recently read some articles comparing this technique to regular MLPs, and it performed much better in prediction tasks. So, I want to use the SOM as a front-end for dimension reduction by clustering the input data, and pass the results to an MLP back-end.
My current idea for implementing it is to train the SOM with a couple of training sets and determine the clusters. Afterwards, I initialize the MLP with as many input units as there are SOM clusters. The next step would be to train the MLP using the SOM's output (which value? the weights of the BMU?) as input for the network (the SOM's output for the input unit matching the cluster, and zeros for all other input units?).
There is no single way of doing that. Let me list some possibilities:
1. The one you describe. But then your MLP will need to have K*D inputs, where K is the number of clusters and D is the input dimension. There is no dimensionality reduction.
2. Similar to your idea, but instead of using the weights, just send 1 for the BMU and 0 for the remaining clusters. Then your MLP will need K inputs.
3. Same as above, but instead of 1 or 0, send the distance from the input vector to each cluster.
4. Same as above, but instead of the distance, compute a Gaussian activation for each cluster.
5. Since the SOM preserves topology, send only the 2D coordinates of the BMU (possibly normalized between 0 and 1). Then your MLP will need only 2 inputs, and you achieve really extreme dimensionality reduction.
You can read about those ideas and some more here: Principal temporal extensions of SOM: Overview. It is not about feeding the output of a SOM into an MLP, but a SOM into itself; still, it will help you understand the various possibilities for producing some output from a SOM.
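Options 2-4 differ only in how the per-cluster feature is computed. A minimal NumPy sketch, assuming som_weights is a (K, D) array of trained codebook vectors (the sigma of the Gaussian is a free parameter):

```python
import numpy as np

def som_features(x, som_weights, sigma=1.0):
    """Encode input x as K features for the MLP, one per SOM unit.

    x           : input vector, shape (D,)
    som_weights : trained SOM codebook vectors, shape (K, D)
    """
    d = np.linalg.norm(som_weights - x, axis=1)  # option 3: distance to each cluster
    g = np.exp(-d**2 / (2 * sigma**2))           # option 4: Gaussian activation
    one_hot = (d == d.min()).astype(float)       # option 2: 1 for the BMU, 0 elsewhere
    return one_hot, d, g
```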

How can I add concurrency to neural network processing?

The basics of neural networks, as I understand them, are that there are several inputs, weights and outputs. There can be hidden layers that add to the complexity of the whole thing.
If I have 100 inputs, 5 hidden layers and one output (yes or no), presumably there will be a LOT of connections. Somewhere on the order of 100^5. Doing backpropagation via gradient descent seems like it would take a VERY long time.
How can I set up the backpropagation in a way that is parallel (concurrent), to take advantage of multicore processors (or multiple processors)?
This is a language agnostic question because I am simply trying to understand structure.
If you have 5 hidden layers (assuming 100 nodes each) you have 5 * 100^2 weights (assuming the bias node is included in the 100 nodes), not 100^5, because there are only 100^2 weights between two consecutive layers.
If you use gradient descent, you'll have to calculate the contribution of each training sample to the gradient, so a natural way of distributing this across cores would be to spread the training samples across the cores and sum the contributions to the gradient at the end.
With backpropagation, you can use batch backpropagation (accumulate weight changes from several training samples before updating the weights, see e.g. https://stackoverflow.com/a/11415434/288875 ).
I would think that the first option is much more cache friendly (updates need to be merged only once between processors in each step).
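A hedged sketch of the first option (spread the samples, sum the per-chunk gradients), using Python's multiprocessing and a linear least-squares model in place of a real network to keep the gradient a one-liner:

```python
import numpy as np
from multiprocessing import Pool

def chunk_gradient(args):
    """Gradient contribution of one chunk of training samples.

    Uses a linear model y = X @ w with squared-error loss for brevity;
    a real network would run backpropagation over the chunk here.
    """
    X, y, w = args
    return X.T @ (X @ w - y)  # d/dw of 0.5 * ||X @ w - y||^2

def parallel_gradient(X, y, w, n_workers=4):
    # Each worker handles a slice of the training samples; the per-chunk
    # gradients are merged only once per step, by summing at the end.
    chunks = list(zip(np.array_split(X, n_workers),
                      np.array_split(y, n_workers),
                      [w] * n_workers))
    with Pool(n_workers) as pool:
        return sum(pool.map(chunk_gradient, chunks))

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    X, y, w = rng.random((1000, 10)), rng.random(1000), np.zeros(10)
    w -= 0.01 * parallel_gradient(X, y, w)  # one gradient-descent step
```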

Artificial neural network presented with unclassified inputs

I am trying to classify portions of time series data using a feed-forward neural network with 20 neurons in a single hidden layer and 3 outputs corresponding to the 3 events I would like to be able to recognize. There are many other things that I could classify in the data (obviously), but I don't really care about them for the time being. Neural network creation and training were performed using Matlab's neural network toolbox for pattern recognition, as this is a classification problem.
In order to do this I am sequentially populating a moving window, then feeding the window into the neural network. The issue I have is that I am obviously not able to train on every possible shape the time series takes. Because of this, I often get windows filled with data that look very different from the windows I used to train the neural network, yet still produce outputs near 1.
Essentially, the 3 things I trained the ANN on are windows from 20 different data sets corresponding to three shapes: steady state, a curve that starts with a negative slope and levels off to 0 slope (essentially the left half of a parabola that opens upwards), and a curve that starts at 0 slope and quickly declines (the right half of a parabola that opens downwards).
Am I incorrect in thinking that if I input data that doesn't correspond to any of the items I trained the ANN on, it should output values near 0 for all outputs?
Or is it likely due to the fact that these basically cover all the bases of steady state, increasing and decreasing, despite large differences in slope, and therefore something is always classified?
I guess I just need a nudge in the right direction.
Neural network output values
A neural network cannot be expected to guarantee specific output values unless those input values / expected output values were presented during the training period.
A neural network will not consistently output 0 for untrained input values.
A solution is to explicitly present the network with an array of input values for which the expected output is 0 on all three outputs, i.e. add 'none of the above' examples to the training set.
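A minimal sketch of that augmentation, assuming the existing training data are windows of 20 samples with one-hot targets over the 3 classes (all arrays here are random placeholders for the real data):

```python
import numpy as np

rng = np.random.default_rng(0)
windows = rng.random((600, 20))               # placeholder training windows
targets = np.eye(3)[rng.integers(0, 3, 600)]  # placeholder one-hot labels

# "None of the above" examples: window shapes outside the three trained
# classes, each with an all-zero target so the network learns to stay
# low on unfamiliar inputs. (An explicit fourth "reject" class works too.)
neg_windows = rng.random((200, 20)) * 2 - 1   # placeholder negative windows
neg_targets = np.zeros((200, 3))

X = np.vstack([windows, neg_windows])
Y = np.vstack([targets, neg_targets])
# Train the 20-neuron network on (X, Y) instead of (windows, targets).
```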