It is very confusing: while googling for LSTM gates, I find that some references and articles say there are three gates (input, forget, and output), while others say there are four (learn, forget, remember, and use).
So, which is right?
There are four gates: the input modulation gate, input gate, forget gate, and output gate, representing four sets of parameters.
In the LSTM diagram below, the four sets of parameters (8 matrices) are colored in blue, where f stands for the forget gate, g and i for the add gate, and o for the output gate. Since the add gate needs two sets of parameters, g and i can be counted together as a single gate.
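For reference, the gate equations in one standard formulation (notation follows the figure; the book's symbols may differ slightly) are:

$$
\begin{aligned}
f_t &= \sigma(U_f h_{t-1} + W_f x_t) \\
i_t &= \sigma(U_i h_{t-1} + W_i x_t) \\
g_t &= \tanh(U_g h_{t-1} + W_g x_t) \\
o_t &= \sigma(U_o h_{t-1} + W_o x_t) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Each of f, i, g, and o has its own pair of weight matrices U and W, which is where the four sets of parameters (8 matrices) come from; since g and i together perform the "add" to the cell state, counting them as one gate gives the three-gate description.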
Reference:
Speech and Language Processing
Long Short-Term Memory networks, usually just called "LSTMs", are a special kind of RNN capable of learning long-term dependencies. LSTMs have a chain-like structure like RNNs, but the repeating module has a different structure: instead of a single neural network layer, there are four, interacting in a very special way. However, there are many variants of LSTMs.
Ref: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
I'm reading Digital Design and Computer Architecture by David Harris and Sarah Harris. The authors give the following definition of combinational logic:
A combinational circuit’s outputs depend only on the current values of
the inputs; in other words, it combines the current input values to
compute the output... A combinational circuit is memoryless, but a
sequential circuit has memory. The functional specification of a
combinational circuit expresses the output values in terms of the
current input values.
However, they claim this circuit is not combinational:
because "node n6 connects to the output terminals of both I3 and I4". Indeed, it's one of the designated signs when a scheme can not be combinational but, according to the authors:
Certain circuits that disobey these rules are still combinational, so
long as the outputs depend only on the current values of the inputs.
As far as I can tell, the aforementioned circuit is such a case: its output is 1 if and only if both of its inputs are 1, and 0 otherwise. So the output is defined as a function of the inputs (the AND function).
In fact, there was already a question about this circuit on the Computer Science network, and it has an accepted answer. Here's an excerpt from it:
Circuit (d) cannot be written in this form [of formula], since the
outputs of I3 and I4 are wired together. What is the relation between
the input to the rightmost gate and the outputs of I3 and I4? Not
something that can be described combinatorially.
Unfortunately, I'm still confused, for two reasons:
The circuit, regarded as a black box, still fits the definition of combinational logic: its output values depend only on the current values of the inputs.
The relation between the input to the rightmost gate and the outputs of I3 and I4 can be described as the NAND of the circuit inputs, and that function is quite "memoryless". It's not obvious to me why we can't drive one gate input from multiple outputs of other gates.
I need some elaboration. Maybe things would fall into place if someone provided an example circuit in which two gate outputs are connected to one input and this actually causes "memory" (in contrast to the sample considered here).
Circuit (d) is not combinational because it is not a logic gate circuit at all.
I think it's a very silly example to explain combinational vs sequential circuits.
In a logic circuit, an output wire cannot go to another output wire. You assumed that the outputs, when connected together, will act as a logical OR or AND of themselves.
This is not true (otherwise why would we use AND/OR gates in the first place?).
What will happen depends on the specific implementation of the gates (i.e. specific IC or manufacturing process you used) and this is not something that a logic circuit is meant to model.
A logic circuit must behave the same, no matter what brand you are using.
In circuit (d), the output of I3 will feed both the input of the rightmost NOT and the output of I4 (and vice versa).
Most ICs will break if current flows into their outputs; others won't, but they will interfere with the ability of the rightmost NOT to sense its input.
Logic circuits are still circuits, so you should, in theory, perform a full circuit analysis, which includes solving differential equations, to solve for their output.
Digital electronics is a branch that abstracts from these "low-level" details but at the cost of making some assumptions, one of which is: outputs are never merged without a gate.
The whole point of a combinational circuit is that you can write out = f(in0, in1, ..., ink), but that's not always possible.
Take, for example, an edge detector: it is just f(A) = (NOT A) AND A, which should, by the law of non-contradiction, always output 0.
But it will not, because the NOT A path takes slightly longer to reach the AND input.
How can you describe this dynamic behaviour with an f(A) function?
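As a small illustration (a sketch of my own, not from the book): as a pure function of the current input, (NOT A) AND A can never be true, yet the real circuit emits a brief 1-pulse on a rising edge because the inverted path arrives late.

```java
public class EdgeDetectorModel {
    // Pure combinational model: out = f(A) = (NOT A) AND A.
    static boolean f(boolean a) {
        return !a && a; // always false -- the model has no way to express the glitch
    }

    public static void main(String[] args) {
        for (boolean a : new boolean[] {false, true}) {
            System.out.println("A=" + a + " -> f(A)=" + f(a));
        }
        // The physical circuit briefly outputs 1 when A rises, because the NOT A
        // path reaches the AND gate slightly later. That timing behaviour is
        // exactly what a memoryless f(A) cannot capture.
    }
}
```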
Don't think too much about it; when you get to sequential circuits you'll spot the difference immediately (if you need a preview, look up "latch circuit").
I have created my truth table and derived from it a Boolean expression (f = B'A' + CA' + DC' + DB + D'CB'), which I have then attempted to convert into a circuit using Quartus.
I am new to digital logic and I need some help from someone with experience who can tell me whether what I have attempted looks correct.
I am unable to compile the circuit as I don't have 'device support installed'. If anyone could point me in the right direction for how to obtain that, that would be greatly appreciated.
This is the circuit I created based on the boolean expression.
This is my truth table. The circuit corresponds to the f column.
Everything to the left of your AND gates looks mathematically correct (although not very efficient); you can drastically reduce the number of NOT gates used.
Instead of splitting the signals before the NOT gates so that each branch has its own NOT gate, you can split the signals after the NOT gate, reducing the total number of NOT gates needed.
Anyway, the fundamental reason your circuit is invalid is the rightmost part of it, here:
You are shorting together the outputs of two gates, which is not allowed. A single node cannot simultaneously have two separate voltages.
What you need to do to fix this problem is route each of the five AND gate outputs to a separate input of a 5-input OR gate.
Something like this:
If the software package you are using doesn't support five-input gates, then you can split it up like so:
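As a separate sanity check before wiring anything up, you can also evaluate the sum-of-products in code and compare it line by line with your truth table. This is just a sketch of mine, and it assumes D is the most significant bit of the row index, which may not match your table's ordering:

```java
public class TruthTableCheck {
    // f = B'A' + CA' + DC' + DB + D'CB'
    static boolean f(boolean a, boolean b, boolean c, boolean d) {
        return (!b && !a) || (c && !a) || (d && !c) || (d && b) || (!d && c && !b);
    }

    public static void main(String[] args) {
        System.out.println("D C B A | f");
        for (int i = 0; i < 16; i++) {
            boolean d = (i & 8) != 0, c = (i & 4) != 0, b = (i & 2) != 0, a = (i & 1) != 0;
            System.out.printf("%d %d %d %d | %d%n",
                    d ? 1 : 0, c ? 1 : 0, b ? 1 : 0, a ? 1 : 0, f(a, b, c, d) ? 1 : 0);
        }
    }
}
```

If every row of the printout matches your f column, the expression (and therefore the gate network built from it) is consistent with your truth table.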
I am trying to build a complex neural network using Computation Graph implementation in Deeplearning4J. I need to have multiple outputs so that's why I can't go with the generic MultiLayerConfiguration.
However, my problem is that in this case I do not know how to do the evaluation of my model and I would like at least to know the accuracy.
Has anybody worked with Comp Graphs in dl4j?
First of all, yes: tons of people use the computation graph. They usually start from our existing examples, though, and tend to use it mainly for things like seq2seq.
As for your question on evaluation, it's conceptually the same as for a multi-layer network. How you evaluate is likely going to be task specific, though. If you think about where evaluation happens, it's always tied to a task (classification, regression, binary classification, ...) with an output layer. In the most common case you have only one output, which produces a classification; in that case you can just use the first array it outputs.
Otherwise, for multiple outputs, you'd have to define what you're evaluating. Usually tasks merge into one path.
If they don't, you'd have multiple output layers, and you'd want one evaluation object per output.
Computation graphs and multi-layer networks both have an .output method that gives you raw arrays; that is typically what you pass to eval.eval, as in the sketch below.
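For the common single-output classification case, a minimal sketch looks something like this (assuming graph is an already-trained ComputationGraph and testData yields DataSet objects with features and one-hot labels; package names may differ slightly between dl4j versions):

```java
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class GraphEvalSketch {
    // Evaluate a single-output ComputationGraph: run the features through
    // output() and pass the first output array plus the labels to Evaluation.eval().
    static Evaluation evaluate(ComputationGraph graph, DataSetIterator testData, int numClasses) {
        Evaluation eval = new Evaluation(numClasses);
        while (testData.hasNext()) {
            DataSet ds = testData.next();
            INDArray[] outputs = graph.output(ds.getFeatures()); // one INDArray per output layer
            eval.eval(ds.getLabels(), outputs[0]);               // evaluate against the first output
        }
        return eval;
    }
}
```

eval.accuracy() or eval.stats() then gives you the numbers; with several output layers you would keep one Evaluation object per output index instead of using only outputs[0].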
According to this answer, one should never use more than two hidden layers of neurons.
According to this answer, a middle layer should contain at most twice the number of input or output neurons (so if you have 5 input neurons and 10 output neurons, one should use at most 20 middle neurons per layer).
Does that mean that all data will be modeled within that number of neurons?
So if, for example, one wants to do anything from modeling weather (a million input nodes from data from different weather stations) to simple OCR (of scanned text at a resolution of 1000x1000 DPI), one would need the same number of nodes?
PS.
My last question was closed. Is there another SE site where these kinds of questions are on topic?
You will likely overfit your data (a.k.a. high variance). Think of it like this: the more neurons and layers you have, the more parameters you have with which to fit your data.
Remember that for a first-layer node the equation is Z = sigmoid(sum(W*x)), and a second-layer node becomes Z2 = sigmoid(sum(W*Z)).
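In more standard notation (with $\sigma$ the sigmoid, $x$ the input vector, and $W^{(1)}, W^{(2)}$ the weight matrices of the two layers):

$$
z^{(1)} = \sigma\!\left(W^{(1)} x\right), \qquad z^{(2)} = \sigma\!\left(W^{(2)} z^{(1)}\right)
$$

Every entry of $W^{(1)}$ and $W^{(2)}$ is a parameter to be fitted, so adding neurons or layers grows the parameter count and, with it, the risk of overfitting.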
Look into the machine learning class taught at Stanford... it's a great online course and a good reference.
More than two hidden layers can be useful in certain architectures
such as cascade correlation (Fahlman and Lebiere 1990) and in special
applications, such as the two-spirals problem (Lang and Witbrock 1988)
and ZIP code recognition (Le Cun et al. 1989).
Fahlman, S.E. and Lebiere, C. (1990), "The Cascade Correlation Learning Architecture," NIPS 2, 524-532.
Le Cun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., and Jackel, L.D. (1989), "Backpropagation applied to handwritten ZIP code recognition," Neural Computation, 1, 541-551.
Check out the sections "How many hidden layers should I use?" and "How many hidden units should I use?" in the comp.ai.neural-nets FAQ for more information.
Suppose I've trained a region to recognize the 2D image of a letter "A". How do I interface this to an external module that wants signals (possibly fuzzy) of the form A or not A?
We've been working with NuPIC not for image processing but for transaction processing, and so far we have gotten very good results.
What we do is connect the classifier node to an effector node that writes the result of the inference to a socket.
Maybe we need more detail about your question. If you've trained a NuPIC network to identify just the letter "A", then maybe you have a classifier trained with two categories: let's say 1 for "A" and 0 for anything else. In this case you can take the output of the classifier via an effector node that connects to your interface, maybe a socket or a database.
Regards