Typically, a simple neural network that solves XOR has 2 inputs, 2 neurons in the hidden layer, and 1 neuron in the output layer.
However, the following example implementation has 2 output neurons, and I don't get it:
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/xor/XorExample.java
Why did the author put 2 output neurons in there?
Edit:
Edit: the author of the example noted that he is using 4 neurons in the hidden layer and 2 neurons in the output layer. But I still don't get why: why a shape of {4,2} instead of {2,1}?
This is called one-hot encoding. The idea is that you have one neuron per class, and each neuron gives the probability of that class.
I don't know why he uses 4 hidden neurons. 2 should be enough (if I remember correctly).
The author uses the Evaluation class at the end (for statistics on how often the network gives the correct result). This class needs one output neuron per class to work correctly, i.e. one neuron for true and one for false.
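To make the one-hot idea concrete, here is a minimal sketch in plain NumPy (not the DL4J API from the linked example), using the convention that column 0 stands for true and column 1 for false:

    import numpy as np

    # XOR inputs and their one-hot labels.
    # Convention here: column 0 = "true", column 1 = "false" (which column
    # means which class is arbitrary, as long as training and evaluation agree).
    inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
    labels = np.array([[0, 1],   # 0 XOR 0 = false
                       [1, 0],   # 1 XOR 0 = true
                       [1, 0],   # 0 XOR 1 = true
                       [0, 1]])  # 1 XOR 1 = false

    # A softmax output layer yields one probability per class; the predicted
    # class is simply the index of the largest output.
    network_output = np.array([0.8, 0.2])   # hypothetical forward-pass result
    print(np.argmax(network_output))        # 0 -> "true"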
It might be helpful to think of it like this:
    Training Set          Label Set
        0 | 1                 0 | 1
    0 | 0 | 0             0 | 0 | 1
    1 | 1 | 0             1 | 1 | 0
    2 | 0 | 1             2 | 1 | 0
    3 | 1 | 1             3 | 0 | 1
So the Training Set reads as [input, row] pairs: [[0,0], 0], [[1,0], 1], etc.
In the two-column Label Set, column 0 corresponds to true and column 1 to false.
Thus, the input [0,0] correctly maps to false (label [0,1]), and the input [1,0] correctly maps to true (label [1,0]), etc.
A pretty good article that slightly modifies the original can be found here: https://medium.com/autonomous-agents/how-to-teach-logic-to-your-neuralnetworks-116215c71a49
I have the following type of data:
The *.edge file has the connections between the ids of different users:
1 23
4 67
...
The *.feat file contains properties of the ids. The first column (column 0) holds the user ids; the other columns represent features named in another file. For example, user id 1 does not have the feature of column 1 (0), but user id 4 does (1):
1: 0 0 1 0 1 1 0 1 1
4: 1 0 1 1 1 0 1 1 1
...
Now I want to cluster the data using different algorithms like k-means, DBSCAN, hierarchical clustering, and so on. But from what I have read, there are several problems with multidimensional data?
There are problems with very high-dimensional data, but 10 is not high. You have other problems: k-means needs coordinates to compute means, not a graph with edges. Also, the values should be continuous, not binary. You need to study these methods in more detail. If you say "But as I read ...", then try to give a reference.
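If you do want to cluster the binary feature vectors on their own (ignoring the edge graph for a moment), one option is hierarchical clustering with a distance suited to binary data, such as Jaccard. A minimal sketch with SciPy, assuming the *.feat rows have been loaded into a 0/1 matrix:

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    # Hypothetical 0/1 feature matrix: one row per user id from the *.feat file
    features = np.array([
        [0, 0, 1, 0, 1, 1, 0, 1, 1],   # user 1
        [1, 0, 1, 1, 1, 0, 1, 1, 1],   # user 4
        [0, 1, 0, 0, 1, 1, 0, 0, 1],
        [1, 0, 1, 1, 0, 0, 1, 1, 0],
    ])

    # Jaccard distance handles binary vectors better than Euclidean means
    distances = pdist(features, metric="jaccard")
    tree = linkage(distances, method="average")

    # Cut the dendrogram into 2 flat clusters
    cluster_ids = fcluster(tree, t=2, criterion="maxclust")
    print(cluster_ids)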
Suppose a dataset comprises independent variables that are continuous and binary. Usually the label/outcome column is converted to a one-hot vector, and continuous variables can be normalized. But what needs to be applied to binary variables?
    AGE        RACE   GENDER   NEURO   EMOT
    15.95346   0      0        3       1
    14.57084   1      1        0       0
    15.8193    1      0        0       0
    15.59754   0      1        0       0
How does this apply to logistic regression and neural networks?
If the range of a continuous variable is small, encode it into binary form and use each bit of that binary form as a predictor.
For example, the number 2 is 10 in binary, therefore:
predictor_bit_0 = 0
predictor_bit_1 = 1
Try it and see if it works. Just to warn you, this method is very subjective and may or may not yield good results for your data. I'll keep you posted if I find a better solution.
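For what it's worth, here is a small sketch of that bit-encoding idea (my own illustration, with made-up values for a NEURO-like column); each bit of the integer becomes its own 0/1 predictor:

    import numpy as np

    def bits_as_predictors(values, n_bits):
        """Expand small-range non-negative integers into one 0/1 column per bit."""
        values = np.asarray(values, dtype=int)
        # Column i holds bit i, least significant bit first
        return np.stack([(values >> i) & 1 for i in range(n_bits)], axis=1)

    neuro = [2, 3, 0, 1]                  # hypothetical values, all fit in 2 bits
    print(bits_as_predictors(neuro, n_bits=2))
    # [[0 1]      2 -> predictor_bit_0 = 0, predictor_bit_1 = 1 (the example above)
    #  [1 1]
    #  [0 0]
    #  [1 0]]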
Is there a kind of NN that can give importance to some inputs?
I have a problem like this (actually solved by 2 different NNs):
SITUATION 1)
inputs: 1 0 1 0 1 0 1 : target: 23
SITUATION 2)
inputs: 1 0 1 0 1 0 1 : target: 29
Can I use the same NN for both inputs, using the SITUATION as another INPUT to a single NN?
One problem of this approach is that I have 50 different SITUATIONS.
Anyone with a good idea?
Andre
I think your best bet would be adding another 50 input neurons and lighting one of them up to signal your situation.
To make it smaller, you could use just 6 input neurons and light them up in binary code (situation 13 = 001101 in binary as input for the input neurons).
Another solution would be training a neural network for each situation and saving its weights and biases. Then, to solve a case, you would first load the weights and biases corresponding to the situation and then compute the outputs.
The last option I can think of would be creating 50 different neural networks and using the one you need.
I think that the solution of having 6 additional neurons in binary is the right way to go.
You can have up to 64 different situations. Adding a 7th neuron extends your situation count to 128, and every further neuron doubles that.
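As a sketch of the binary-code option (my own illustration; the array shapes are made up), the situation id would simply be appended to the feature vector as extra 0/1 inputs:

    import numpy as np

    N_SITUATION_BITS = 6   # 2**6 = 64 >= 50 situations

    def with_situation(inputs, situation_id):
        """Append the situation id, binary-encoded, as extra input features."""
        # Least significant bit first
        bits = [(situation_id >> i) & 1 for i in range(N_SITUATION_BITS)]
        return np.concatenate([inputs, bits])

    x = np.array([1, 0, 1, 0, 1, 0, 1])        # the 7 original inputs
    print(with_situation(x, situation_id=13))  # 13 = 001101 -> bits [1, 0, 1, 1, 0, 0]
    # [1 0 1 0 1 0 1 1 0 1 1 0 0]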
I have been fighting all day to understand Dijkstra's algorithm and implement it, with no significant results. I have a matrix of cities and their distances. What I want to do is, given an origin point and a destination point, find the shortest path between the cities.
Example:
__0__ __1__ __2__
0 | 0 | 34 | 0 |
|-----|-----|-----|
1 | 34 | 0 | 23 |
|-----|-----|-----|
2 | 0 | 23 | 0 |
----- ----- -----
I started wondering if there is another way to solve this. What if I apply Prim's algorithm from the origin point and then loop through the whole tree created until I find the destination point?
You could apply Prim's algorithm and then walk the resulting tree, but your answer may be wrong. Assume that you have a graph where each edge has the same weight. Prim's algorithm simply chooses a minimal-weight edge in the set of edges that could be added to the tree. It is possible that you will not choose an edge that would lead to a shortest path between two nodes.
Assume:
__0__ __1__ __2__
0 | 0 | 1 | 1 |
|-----|-----|-----|
1 | 1 | 0 | 1 |
|-----|-----|-----|
2 | 1 | 1 | 0 |
----- ----- -----
Starting from node 0 you could, via Prim's, choose the edges 0-1 and 0-2 to make your tree. Alternatively, you could pick edges 0-1 and 1-2. Under the first edge set you would find the minimum-length path from 0 to 2, but under the second you would not. Since you can't determine a priori which edges get added by Prim's algorithm, you can't use it to find a shortest path.
You could consider the Bellman-Ford algorithm, but unless you're dealing with negative edge weights I find Dijkstra's algorithm preferable.
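For the matrix format in the question, where 0 means "no edge", a minimal Dijkstra sketch could look like the following (the function name and the 0-means-no-edge convention are assumptions based on the example matrix):

    import heapq

    def dijkstra(dist_matrix, origin, destination):
        """Shortest path on an adjacency matrix where 0 means 'no edge'."""
        best = {origin: 0}                    # best known distance to each node
        prev = {}                             # predecessor on the best path
        heap = [(0, origin)]                  # (distance so far, node)
        while heap:
            d, node = heapq.heappop(heap)
            if node == destination:
                break
            if d > best.get(node, float("inf")):
                continue                      # stale queue entry, skip it
            for nxt, w in enumerate(dist_matrix[node]):
                if w == 0:
                    continue                  # 0 = no edge in this matrix format
                nd = d + w
                if nd < best.get(nxt, float("inf")):
                    best[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(heap, (nd, nxt))
        # Rebuild the path by walking predecessors back from the destination
        path, node = [destination], destination
        while node != origin:
            node = prev[node]
            path.append(node)
        return best[destination], path[::-1]

    matrix = [[ 0, 34,  0],
              [34,  0, 23],
              [ 0, 23,  0]]
    print(dijkstra(matrix, 0, 2))             # (57, [0, 1, 2])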
I was asked to write an algorithm that makes seven single-layer perceptrons learn to show a seven-segment number according to four binary inputs (encoded as -1/1), for example:
-1 -1 -1 -1 ==> 1 1 1 1 1 1 -1 % 0
-1 -1 -1 1 ==> -1 -1 -1 -1 1 1 -1 % 1
...
Can anyone help me, please?
So, if I'm interpreting this correctly, you give your net a binary representation of a digit and you want it to tell you what line segments are needed to display that digit seven-segments style.
Luckily, since there are only 10 digits, you can just hand write a training set where each digit is correctly matched to the segments needed, and then use the standard perceptron training algorithm: the delta rule.
This algorithm will change the weights of the network until every input pattern is associated with the correct output pattern.
Implementation note: make sure all 4 input units are connected to all 7 output units, and that all of the connection weights start out at some small random value.
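Putting that together, here is a minimal sketch of the delta-rule training loop, assuming the -1/1 encoding from the question; only the two digit rows given above are included, and the remaining eight would be hand-written the same way:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hand-written training set in the question's -1/1 encoding
    X = np.array([[-1, -1, -1, -1],               # binary code for digit 0
                  [-1, -1, -1,  1]])              # binary code for digit 1
    T = np.array([[ 1,  1,  1,  1,  1,  1, -1],   # segments lit for digit 0
                  [-1, -1, -1, -1,  1,  1, -1]])  # segments lit for digit 1

    W = rng.normal(0.0, 0.1, size=(4, 7))   # small random weights, 4 inputs x 7 outputs
    b = np.zeros(7)
    lr = 0.1                                 # learning rate

    for epoch in range(100):
        for x, t in zip(X, T):
            y = np.where(x @ W + b >= 0, 1, -1)   # threshold activation per output unit
            # Delta rule: move each weight by (target - output), scaled by the input
            W += lr * np.outer(x, t - y)
            b += lr * (t - y)

    print(np.where(X @ W + b >= 0, 1, -1))   # should reproduce T after training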