Can a virtual machine be implemented as a neural network? - neural-network

Disclaimer: I'm not a mathematical genius, nor do I have any experience with writing neural networks. So, please, forgive whatever idiotic things I happen to say here. ;)
I've always read about neural networks being used for machine learning, but while experimenting with writing simple virtual machines, I began to wonder if they could be applied in another way.
Specifically, can a virtual machine be created as a neural network? If so, how would it work (feel free to use an abstract description here, if you have to)?
I've heard of the Joycean Machine, but I can't find any information other than very, very vague explanations.
EDIT: What I'm looking for here is an explanation of exactly how a neural network-based VM would interpret assembly. How would inputs be handled, etc? Would each individual input be a memory address? Let's brainstorm!

You really made my day buddy...
Since an already trained neural network won't be much different from a regular state machine, there is no point in writing a neural network VM for a deterministic instruction set.
It might be interesting to train such a VM on multiple instruction sets, or on an unknown set. However, I doubt such training would be practical, and even a 99%-correct interpreter would be of little use for conventional bytecode.
The only use of a neural network VM I can think of is executing a program that contains fuzzy logic constructs or AI algorithm heuristics.
Some silly stack machine example to demonstrate the idea:
push [x1]
push [y1] ;start coord
push [x2]
push [y2] ;end coord
pushmap [map] ;some struct
stepastar ;push the next step of A* heuristics to accumulator and update the map
pop ;do sth with it and pop
stepastar ;next step again
... ;stack top is a map
reward ;we liked the coordinate. reinforce the heuristic
stepastar
... ;stack top is a map
punish ;we didn't like the next coordinate. try something different
There is no explicit heuristic here. Just assume we keep all state in *map, including the heuristic algorithm.
It looks silly and isn't completely context sensitive, but a neural network is of little value here if it doesn't learn online.
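To make the "learn online" idea concrete, here is a minimal Python sketch of what a reinforced step instruction could look like under the hood. The class name, features, and learning rule are my own assumptions rather than anything implied by the VM above; it only shows how reward/punish could nudge a learned heuristic between steps.

# Hypothetical sketch of an online-learned "step" instruction: a linear scorer
# picks the next coordinate, and reward/punish adjust its weights in place.
class LearnedStepper:
    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * n_features
        self.lr = lr
        self.last_features = None

    def step(self, candidates):
        """candidates: list of (coord, feature_vector); returns the chosen coord."""
        score = lambda feats: sum(w * f for w, f in zip(self.weights, feats))
        coord, feats = max(candidates, key=lambda c: score(c[1]))
        self.last_features = feats          # remember what we chose
        return coord

    def reward(self):
        if self.last_features is not None:  # reinforce the last chosen step
            self._nudge(+self.lr)

    def punish(self):
        if self.last_features is not None:  # discourage it; try something different
            self._nudge(-self.lr)

    def _nudge(self, delta):
        self.weights = [w + delta * f
                        for w, f in zip(self.weights, self.last_features)]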

Of course. With a rather complex network no doubt.
Much of the parsing of bytecodes/opcodes is pattern matching, which neural networks excel at.

You could certainly do this with a neural network - I could easily see learning the correct state transitions for a given piece of bytecode.
Input could be something like:
Value at top of stack
Value in current accumulator
Byte code at current instruction pointer
Byte value at current data pointer
Previous flags
Output could be something like:
Change to instruction pointer
Change to data pointer
Change to accumulator
Stack operation (push, pop, or nothing)
Memory operation (read to accumulator, write accumulator or nothing)
New flags
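As a minimal sketch of that state-transition idea, here is what one "tick" of such a network could look like in Python/numpy. The encoding (raw numbers rather than one-hot opcodes), the layer sizes, and the vm_step name are my assumptions; with untrained random weights it of course computes nothing meaningful yet.

# Rough sketch of the state-transition network described above (numpy).
import numpy as np

rng = np.random.default_rng(0)
IN, HIDDEN, OUT = 5, 64, 6                    # 5 inputs and 6 outputs, as listed above
W1, b1 = rng.normal(0, 0.1, (HIDDEN, IN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(0, 0.1, (OUT, HIDDEN)), np.zeros(OUT)

def vm_step(stack_top, accumulator, opcode, data_byte, flags):
    """One 'clock tick': map the current VM state to proposed state changes."""
    x = np.array([stack_top, accumulator, opcode, data_byte, flags], dtype=float)
    h = np.tanh(W1 @ x + b1)                  # hidden layer
    y = W2 @ h + b2                           # raw outputs
    return {
        "d_ip": y[0],                         # change to instruction pointer
        "d_dp": y[1],                         # change to data pointer
        "d_acc": y[2],                        # change to accumulator
        "stack_op": y[3],                     # would be rounded to push/pop/nothing
        "mem_op": y[4],                       # would be rounded to read/write/nothing
        "new_flags": y[5],
    }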
However, I'm not sure why you would want to do this in the first place. A neural network would be much less efficient (and would potentially make mistakes unless you trained it well enough) compared to just executing the bytecode directly. You'd probably need to write an accurate bytecode evaluator anyway just to create enough training data...
Also, in my experience neural networks tend to be good at pattern recognition but very bad at learning logical operations (like binary addition or XORs) once you get beyond a certain scale (i.e. more than a few bits). So depending on the complexity of your instruction set, the network could take a very large amount of time to train.
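To illustrate that point about training data: you would run an ordinary, exact evaluator and log (state, next_state) pairs from it as examples for the network. The three opcodes below are invented purely for illustration.

# Hypothetical reference evaluator used only to generate training pairs.
def reference_step(state, program, memory):
    ip, dp, acc = state["ip"], state["dp"], state["acc"]
    op = program[ip]
    if op == 0x01:                            # LOAD: acc <- memory[dp]
        acc = memory[dp]
    elif op == 0x02:                          # ADD: acc <- (acc + memory[dp]) mod 256
        acc = (acc + memory[dp]) & 0xFF
    elif op == 0x03:                          # STORE: memory[dp] <- acc
        memory[dp] = acc
    return {"ip": ip + 1, "dp": dp, "acc": acc}

def make_training_pairs(program, memory, steps):
    state = {"ip": 0, "dp": 0, "acc": 0}
    pairs = []
    for _ in range(steps):
        if state["ip"] >= len(program):
            break
        nxt = reference_step(state, program, memory)
        pairs.append((state, nxt))            # (network input, network target)
        state = nxt
    return pairs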

Related

Suggestions for optimising FPGA design

I need to figure out optimisations for this FPGA design. I've got a few ideas and I'd like to know if they sound reasonable for my design. I'd also like to ask if anyone has any other ideas to improve my design's efficiency.
The design I have to optimise is an ensemble of neurons, I've included two images below.
My current ideas
Add pipeline registers between each neuron and each adder
'Register the inputs and outputs' by inserting registers between each logic block
Convert the adder tree into an adder chain
Use time division multiplexers to share the LUTs between logic blocks
Do my current ideas for improving performance make any sense? I don't know very much about FPGAs, so I'm not sure if my optimisations will make much of an improvement or if they even make sense.
Any help would be greatly appreciated.
Links to PDFs of my neuron and ensemble (the image quality is higher):
https://francismcnamee.com/pdfs/neuron_ensemble.pdf
https://francismcnamee.com/pdfs/single_neuron.pdf
Ensemble of neurons (Each subsystem is a single neuron, the design of each neuron is shown below)
A single neuron
To start with "less area usage and/or faster speed": forget about the "and". You can optimise for area or for speed; you will not get both.
"Use time division multiplexers to share the LUTs between logic blocks"
Multiplexers are also built from LUTs, so you lose area before you gain some. Then the TDM needs a controller, and the interim results need to be stored and retrieved. All in all it is not trivial, and I would only do it if you are rather good at logic design. You may gain area, but you will lose speed.
Convert the adder tree into an adder chain.
No, you don't touch the adder tree. The FPGA synthesis tool will select the optimal adder configuration for you. It will balance area against speed and come up with a much better result than you could yourself.
In fact this applies to every part of the design: let the synthesis tool do its work. You will not be able to outperform it.
Add pipeline registers between each neuron and each adder
'Register the inputs and outputs' by inserting registers between each logic block
Sorry again but: Nope! Working with registers is not that simple.
You need to balance the registers. Ideally the logic delay between each pipeline stage should be the same.
Let's say your multiply takes** 10 ns and an adder takes 3 ns. Then you should place a pipeline stage after every set of 3 adders; the total delay will be ~20 ns. If you placed a pipeline stage after each adder, the total delay would be ~40 ns.
Now you get to the core of speeding up a design: do you use 4 pipeline stages so you can run at 200 MHz, or 2 pipeline stages and run at 100 MHz? In both cases the total latency through the pipeline is the same; what changes is how often you can feed in new data.
Beware that each register stage also costs you time: you need to meet the set-up time of the register. As such the lowest-latency design is the one with no registers: the data falls through at maximum speed. But then you may need to wait a long time before you can present the next set of data.
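To make that trade-off concrete, here is a tiny back-of-the-envelope calculation in Python using the made-up numbers above; the 1 ns register overhead (setup plus clock-to-output) is also made up.

# Back-of-the-envelope pipeline timing with the made-up numbers above.
t_mul, t_add, t_reg = 10e-9, 3e-9, 1e-9       # multiply, adder, register overhead

def pipeline(stage_delays):
    t_clk = max(stage_delays) + t_reg          # clock period limited by the slowest stage
    return {"f_max_MHz": 1e-6 / t_clk,
            "latency_ns": len(stage_delays) * t_clk * 1e9}

print(pipeline([t_mul, 3 * t_add]))            # register after every 3 adders: 2 stages, ~22 ns
print(pipeline([t_mul, t_add, t_add, t_add]))  # register after each adder: 4 stages, ~44 ns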
As you may gather: balancing registers is not easy and is rather an art. The best approach would be to run the design without any registers through the synthesis tool, then run a timing analysis on it and look at the worst-case timing path. From that, try to figure out where to put the register stages. But again, that is easier said than done. To me, reading those timing analysis reports is easy, but to a novice they may all seem like abracadabra.
Sorry if I leave you hanging here, but unfortunately there is no "magic trick" in these cases. Ideally you could let an experienced designer play with your code for a few hours and see what (s)he can do.
**The numbers I use have been made up

Evaluating neural networks built with Comp Graph dl4j

I am trying to build a complex neural network using the Computation Graph implementation in Deeplearning4J. I need to have multiple outputs, so I can't go with the generic MultiLayerConfiguration.
However, my problem is that in this case I do not know how to do the evaluation of my model and I would like at least to know the accuracy.
Has anybody worked with Comp Graphs in dl4j?
First of all yes: tons of people use computation graph. They usually start from our existing examples though and tend to mainly use it for things like seq2seq.
As for your question on evaluation, it's conceptually the same as for a multi layer network. How you evaluate is likely going to be task specific, though. If you think about where evaluation happens, it's always tied to a task (classification, regression, binary classification, ...) with an output layer. In the most common case you have only one output, which produces a classification. In that case you can just use the first array it outputs.
Otherwise, for multiple outputs, you'd have to define what you're evaluating. Usually tasks merge into one path.
If they don't, you'd have multiple output layers, and you'd want one evaluation object per output.
Computation graphs and multi layer networks both use a .output method to give you raw arrays. That is typically what you pass to eval.eval.

How to employ Learning-to-rank models (CNN, LSTM) in short-pair ranking?

In common applied learning-to-rank tasks, the inputs are usually semantic and have good syntactic structure, as in question-answer ranking tasks. In that scenario, a CNN or LSTM is a good structure to capture the latent information (local or long-range dependencies) in QA pairs.
But in reality, sometimes we just have short pairs of discrete words. In this case, is a CNN or LSTM still a fair choice? Or is there some more appropriate method that can handle this?
The bigger question is how much training data you have. There's a lot of interesting work, but the reason that the deep neural network approaches tend to use QA ranking tasks is because those tasks typically have hundreds of thousands or millions of training examples.
When you have shorter queries, i.e. title or web queries, you will possibly need even more data to learn, because less of the network will be exercised by each training instance. It is possible, but the method you choose should be based on the training data you have available, rather than the size of your queries, in general.
[0-50 queries] -> Hand-tuned, time-tested models such as Query Likelihood or BM25 (or, if you want better results, n-gram models such as SDM; if you want more recall, pseudo-relevance-feedback models such as RM3). A minimal BM25 sketch follows this list.
[50-1000 queries] -> Linear or Tree-based learning-to-rank methods
[1000-millions] -> Deep approach, or possibly still learning-to-rank. I'm not sure any of the deep papers have truly dominated a state-of-the-art gradient-boosted-regression-tree setup.
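Since BM25 comes up as the baseline in that first tier, here is a minimal sketch of it in Python; k1 and b are the usual default parameters rather than anything from this answer, and the corpus statistics are passed in as plain dicts.

# Minimal BM25 scorer, as an example of the "hand-tuned, time-tested" tier.
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len, k1=1.2, b=0.75):
    score, doc_len = 0.0, len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score

# e.g. bm25_score(["short", "query"], ["a", "short", "document"],
#                 doc_freq={"short": 12, "query": 3}, n_docs=1000, avg_doc_len=8.0)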
A recent paper by one of my labmates used pseudo-labels from BM25 to bootstrap a DNN. They got good results (better than BM25), but they literally had to be Google (training-time-wise) to pull it off.

Deep-learning for mapping large binary input

This question may come across as too broad, but I will try to make every sub-topic as specific as possible.
My setting:
Large binary input (2-4 KB per sample) (no images)
Large binary output of the same size
My target: Using Deep Learning to find a mapping function from my binary input to the binary output.
I have already generated a large training set (> 1'000'000 samples), and can easily generate more.
In my (admittedly limited) knowledge of Neural networks and deep learning, my plan was to build a network with 2000 or 4000 input nodes, the same number of output nodes and try different amounts of hidden layers.
Then train the network on my data set (waiting several weeks if necessary), and checking whether there is a correlation between in- and output.
Would it be better to input my binary data as single bits into the net, or as larger entities (like 16 bits at a time, etc)?
For bit-by-bit input:
I have tried "Neural Designer", but the software crashes when I try to load my data set (even on small ones with 6 rows), and I had to edit the project save files to set Input and Target properties. And then it crashes again.
I have tried OpenNN, but it tries to allocate a matrix of size (hidden_layers * input nodes) ^ 2, which, of course, fails (sorry, no 117GB of RAM available).
Is there a suitable open-source framework available for this kind of binary mapping function regression? Do I have to implement my own?
Is Deep learning the right approach?
Does anyone have experience with this kind of task?
Sadly, I could not find any papers on deep learning + binary mapping.
I will gladly add further information, if requested.
Thank you for providing guidance to a noob.
You have a dataset containing pairs of binary valued vectors, with a max length of 4,000 bits. You want to create a mapping function between the pairs. On the surface, that doesn't seem unreasonable - imagine a 64x64 image with binary pixels – this only contains 4,096 bits of data and is well within the reach of modern neural networks.
As you're dealing with binary values, a multi-layered Restricted Boltzmann Machine would seem like a good choice. How many layers you add to the network really depends on the level of abstraction in the data.
You don’t mention the source of the data, but I assume you expect there to be a decent correlation. Assuming the location of each bit is arbitrary and is independent of its near neighbours, I would rule out a convolutional neural network.
A good open source framework to experiment with is Torch - a scientific computing framework with wide support for machine learning algorithms. It has the added benefit of utilising your GPU to speed up processing thanks to its CUDA implementation. This would hopefully save you from waiting several weeks for a result.
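Purely as an illustration of the bit-vector-in, bit-vector-out setup you describe (and not the stacked-RBM/Lua-Torch route suggested above), here is a rough PyTorch sketch with a plain dense network and a per-bit sigmoid output; the layer sizes and the 4096-bit width are assumptions.

# Illustrative only: a plain dense bit->bit mapper (PyTorch, my substitution).
import torch
import torch.nn as nn

N_BITS = 4096                                  # assumed sample width in bits

model = nn.Sequential(
    nn.Linear(N_BITS, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, N_BITS), nn.Sigmoid(),     # one probability per output bit
)
loss_fn = nn.BCELoss()                         # per-bit cross-entropy
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x_bits, y_bits):
    """x_bits, y_bits: float tensors of 0.0/1.0 with shape (batch, N_BITS)."""
    opt.zero_grad()
    loss = loss_fn(model(x_bits), y_bits)
    loss.backward()
    opt.step()
    return loss.item()

# Random batch just to show the shapes; real batches come from your generated samples.
x = torch.randint(0, 2, (32, N_BITS)).float()
y = torch.randint(0, 2, (32, N_BITS)).float()
print(train_step(x, y))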
If you provide more background, then maybe we can home in on a solution…

Estimating possible # of actors in Scala

How can I estimate the number of actors that a Scala program can handle?
For context, I'm contemplating what is essentially a neural net that will be creating and forgetting cells at a high rate. I'm contemplating making each cell an actor, but there will be millions of them. I'm trying to decide whether this design is worth pursuing, but can't estimate the limits of number of actors. My intent is that it should totally run on one system, so distributed limits don't apply.
For that matter, I haven't definitely settled on Scala, if there's some better choice, but the cells do have state, as in, e.g., their connections to other cells, the weights of the connections, etc. Though this COULD be done as "Each cell is final. Changes mean replacing the current cell with a new one bearing the same id#."
P.S.: I don't know Scala. I'm considering picking it up to do this project. I'm also considering lots of other alternatives, including Java, Object Pascal and Ada. But actors seem a better map to what I'm after than thread pools (and Java can't handle enough threads to make a thread-per-cell design feasible).
P.S.: At all times, most of the actors will be quiescent, but there will need to be a way of cycling through the entire collection of them. If there isn't one built into the language, then this can be managed via first/next links within each cell. (Both links are needed, to allow cells in the middle to be extracted for release.)
With a neural net simulation, the real question is how much of the computational effort will be spent communicating, and how much will be spent computing something within a cell? If most of the effort is in communication then actors are perhaps a good choice for correctness, but not a good choice at all for efficiency (even with Akka, which performs reasonably well; AsyncFP might do the trick, though). Millions of neurons sounds slow--efficiency is probably a significant concern. If the neurons have some pretty heavy-duty computations to do themselves, then the communications overhead is no big deal.
If communications is the bottleneck, and you have lots of tiny messages, then you should design a custom data structure to hold the network, plus custom thread handling that takes advantage of all the processors you have and minimizes the amount of locking you must do. For example, if you have space, each neuron could hold an array of input values from the neurons linked to it; when calculating its output it would just read that array directly with no locking, and the input neurons would likewise update those values with no locking as they went. You can then just dump all your neurons into one big pool and have a master distribute them in chunks of, say, ten thousand at a time, each to its own thread. Scala will work fine for this sort of thing, but expect to do a lot of low-level work yourself, or wait a really long time for the simulation to finish.
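As a structural sketch only, here is roughly what that flat, lock-free layout could look like (numpy; the fan-in and chunk size are made-up numbers). In Scala/JVM you would hand each chunk to its own thread as described; this single-threaded version just shows the data layout rather than the threading.

# Flat-array neuron pool: every neuron reads its inputs straight from a shared
# value buffer and writes its own output back, processed in fixed-size chunks.
import numpy as np

N, FAN_IN, CHUNK = 1_000_000, 8, 10_000        # neurons, links per neuron, chunk size
rng = np.random.default_rng(0)
links = rng.integers(0, N, size=(N, FAN_IN))   # which neurons feed each neuron
weights = rng.normal(0, 0.1, size=(N, FAN_IN)).astype(np.float32)
values = rng.random(N).astype(np.float32)      # last output of every neuron

def update_chunk(start):
    idx = slice(start, min(start + CHUNK, N))
    inputs = values[links[idx]]                # read shared buffer, no locking
    values[idx] = np.tanh((weights[idx] * inputs).sum(axis=1))

for start in range(0, N, CHUNK):               # a master would farm these chunks out
    update_chunk(start)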