How to interpret Turing Machine illustrations on p79 of Stephen Wolfram's “A New Kind of Science” book?

I am reading Stephen Wolfram's "A New Kind of Science".
At present, I cannot understand how the cellular automata illustrations on p79 are created.
In the patterns, the active cell, representing the head, appears to change orientation between up and -45 degrees. However, none of the rules seem to include an active cell with an orientation other than up or down. How does the active cell orientation of -45 degrees come about in the patterns?
Am I missing something obvious (I am a beginner in this area)?

You have a simple rule. Just a mapping from 3 binary digits to 1 binary digit. For example:
111 - 0
110 - 0
101 - 0
100 - 1
011 - 1
010 - 1
001 - 1
000 - 0
Then you have some sequence of digits at time t0, for example 00111010. To find what happens at time t1, you apply this mapping to each group of three neighbouring cells. So 001 becomes 1, then 011 becomes 1, 111 becomes 0, then ... and 010 becomes 1. This way you get a sequence of the same length for the second generation (t1). Then you keep going, generation after generation, until you see repetition.
So in those pictures your X axis is this sequence (empty square = 0, gray square = 1), and your Y axis is how the sequence evolves under a specific rule. In my example it was rule 30 (because binary 00011110 = 30).
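Here is a minimal sketch in C of one update step, assuming the row wraps around at the edges so the next generation has the same length:
#include <stdio.h>
#include <string.h>

/* Apply an elementary CA rule (e.g. 30 = binary 00011110) for one generation.
   Each output cell is bit (4*left + 2*center + right) of the rule number. */
void step(const char *in, char *out, int n, unsigned char rule)
{
    for (int i = 0; i < n; i++) {
        int left   = in[(i - 1 + n) % n] - '0';   /* wrap around at the edges */
        int center = in[i] - '0';
        int right  = in[(i + 1) % n] - '0';
        out[i] = '0' + ((rule >> (left * 4 + center * 2 + right)) & 1);
    }
    out[n] = '\0';
}

int main(void)
{
    char row[] = "00111010", next[sizeof row];
    for (int t = 0; t < 5; t++) {                 /* print a few generations */
        puts(row);
        step(row, next, (int)strlen(row), 30);    /* rule 30 */
        strcpy(row, next);
    }
    return 0;
}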
You can read a high-level overview here. Also, although these rules are simple, they can give rise to complex behavior.
P.S. The paper was published in Nature (a high-profile science journal) and was considered revolutionary because it showed that complicated structures and motifs, like the dots on a leopard's skin or the pattern on a shell, can arise from really simple rules.

I think that it is an inconsistency between the printing of the rules and the diagrams.
If the downward (-90 degrees) arrows in the rules are replaced with arrows pointing to the bottom right (-45 degrees) then the rules seem to make sense.


How to use Morton Order(z order curve) in range search?

From the wiki, in the paragraph "Use with one-dimensional data structures for range searching", it says:
"the range being queried (x = 2, ..., 3, y = 2, ..., 6) is indicated
by the dotted rectangle. Its highest Z-value (MAX) is 45. In this
example, the value F = 19 is encountered when searching a data
structure in increasing Z-value direction. ......BIGMIN (36 in the
example).....only search in the interval between BIGMIN and MAX...."
My questions are:
1) Why is F 19? Why should F not be 16?
2) How do you get the BIGMIN?
3) Are there any blogs that demonstrate how to do the range search?
EDIT: The AWS Database Blog now has a detailed introduction to this subject.
This blog post does a reasonable job of illustrating the process.
When searching the rectangular space x=[2,3], y=[2,6]:
The minimum Z Value (12) is found by interleaving the bits of the lowest x and y values: 2 and 2, respectively.
The maximum Z value (45) is found by interleaving the bits of the highest x and y values: 3 and 6, respectively.
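In code, the interleaving for this small 3-bit example might look like this (with the y bit placed above the x bit at each position, which reproduces the 12 and 45 from the article):
#include <stdio.h>

/* Interleave the bits of x and y (y above x at each bit position)
   to get the Z-value / Morton code for small 3-bit coordinates. */
unsigned morton(unsigned x, unsigned y)
{
    unsigned z = 0;
    for (int b = 0; b < 3; b++) {
        z |= ((x >> b) & 1u) << (2 * b);
        z |= ((y >> b) & 1u) << (2 * b + 1);
    }
    return z;
}

int main(void)
{
    printf("MIN = %u\n", morton(2, 2));   /* lowest corner  (x=2, y=2) -> 12 */
    printf("MAX = %u\n", morton(3, 6));   /* highest corner (x=3, y=6) -> 45 */
    return 0;
}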
Having found the min and max Z values (12 and 45), we now have a linear range that we can iterate across that is guaranteed to contain all of the entries inside of our rectangular space. The data within the linear range is going to be a superset of the data we actually care about: the data in the rectangular space. If we simply iterate across the entire range, we are going to find all of the data we care about and then some. You can test each value you visit to see if it's relevant or not.
An obvious optimization is to try to minimize the amount of superfluous data that you must traverse. This is largely a function of the number of 'seams' that you cross in the data -- places where the 'Z' curve has to make large jumps to continue its path (e.g. from Z-value 31 to 32).
This can be mitigated by employing the BIGMIN and LITMAX functions to identify these seams and navigate back to the rectangle. To minimize the amount of irrelevant data we evaluate, we can:
Keep a count of the number of consecutive pieces of junk data we've visited.
Decide on a maximum allowable value (maxConsecutiveJunkData) for this count. The blog post linked at the top uses 3 for this value.
If we encounter maxConsecutiveJunkData pieces of irrelevant data in a row, we initiate BIGMIN and LITMAX. Importantly, at the point at which we've decided to use them, we're now somewhere within our linear search space (Z values 12 to 45) but outside the rectangular search space. In the Wikipedia article, they appear to have chosen a maxConsecutiveJunkData value of 4; they started at Z=12 and walked until they were 4 values outside of the rectangle (beyond 15) before deciding that it was now time to use BIGMIN. Because maxConsecutiveJunkData is left to your tastes, BIGMIN can be used on any value in the linear range (Z values 12 to 45). Somewhat confusingly, the article only shows the area from 19 on as crosshatched because that is the subrange of the search that will be optimized out when we use BIGMIN with a maxConsecutiveJunkData of 4.
When we realize that we've wandered outside of the rectangle too far, we can conclude that the rectangle is non-contiguous in Z-order. BIGMIN and LITMAX are used to identify the nature of the split. BIGMIN is designed to, given any value in the linear search space (e.g. 19), find the next smallest value that will be back inside the half of the split rectangle with larger Z values (i.e. jumping us from 19 to 36). LITMAX is similar, helping us to find the largest value that will be inside the half of the split rectangle with smaller Z values. The implementations of BIGMIN and LITMAX are explained in depth in the zdivide function explanation in the linked blog post.
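To make the control flow concrete, here is a self-contained sketch of that walk. The real BIGMIN is computed with bit manipulation (see the zdivide explanation in the linked post); the nextInRect helper below is a deliberately naive stand-in that just scans forward until it is back inside the rectangle, so only the overall flow should be taken literally:
#include <stdio.h>

/* Decode a 6-bit Z-value back into x and y (inverse of the interleaving). */
static void deinterleave(unsigned z, unsigned *x, unsigned *y)
{
    *x = *y = 0;
    for (int b = 0; b < 3; b++) {
        *x |= ((z >> (2 * b))     & 1u) << b;
        *y |= ((z >> (2 * b + 1)) & 1u) << b;
    }
}

static int inRect(unsigned z)
{
    unsigned x, y;
    deinterleave(z, &x, &y);
    return x >= 2 && x <= 3 && y >= 2 && y <= 6;   /* the query rectangle */
}

/* Naive stand-in for BIGMIN: smallest Z >= z that lies inside the rectangle.
   The real algorithm finds this without scanning, purely by bit twiddling. */
static unsigned nextInRect(unsigned z, unsigned max)
{
    while (z <= max && !inRect(z)) z++;
    return z;
}

int main(void)
{
    const unsigned MIN = 12, MAX = 45, maxConsecutiveJunk = 4;
    unsigned junk = 0;
    for (unsigned z = MIN; z <= MAX; z++) {
        if (inRect(z)) {
            junk = 0;
            printf("hit %u\n", z);         /* visit a relevant cell */
        } else if (++junk >= maxConsecutiveJunk) {
            z = nextInRect(z, MAX) - 1;    /* jump over the seam ("BIGMIN") */
            junk = 0;
        }
    }
    return 0;
}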
It appears that the quoted example in the Wikipedia article has not been edited to clarify the context and assumptions. The approach used in that example is applicable to linear data structures that only allow sequential (forward and backward) seeking; that is, it is assumed that one cannot randomly seek to a storage cell in constant time using its morton index alone.
With that constraint, one's strategy begins with a full range from the minimum morton index (16) to the maximum morton index (45). To make optimizations, one tries to find and eliminate large swaths of subranges that are outside the query rectangle. The hatched area in the diagram refers to what would have been accessed (sequentially) if such optimization (eliminating subranges) had not been applied.
After discussing the main optimization strategy for linear sequential data structures, the article goes on to talk about other data structures with better seeking capability.

How can I prevent my program from getting stuck at a local maximum (Feed forward artificial neural network and genetic algorithm)

I'm working on a feed-forward artificial neural network (ffann) that will take input in the form of a simple calculation and return the result (acting as a pocket calculator). The outcome won't be exact.
The artificial network is trained using a genetic algorithm on the weights.
Currently my program gets stuck at a local maximum at:
5-6% correct answers, with 1% error margin
30% correct answers, with 10% error margin
40% correct answers, with 20% error margin
45% correct answers, with 30% error margin
60% correct answers, with 40% error margin
I currently use two different genetic algorithms:
The first is a basic selection, picking two random individuals from my population, naming the one with the best fitness the winner, and the other the loser. The loser receives one of the weights from the winner.
The second is mutation, where the loser from the selection receives a slight modification based on the amount of resulting errors (the fitness is decided by correct and incorrect answers).
So if the network outputs a lot of errors, it will receive a big modification, whereas if it has many correct answers, we are close to an acceptable goal and the modification will be smaller.
So to the question: What are ways I can prevent my ffann from getting stuck at local maxima?
Should I modify my current genetic algorithm to something more advanced with more variables?
Should I create additional mutation or crossover?
Or should I maybe try to modify my mutation variables to something bigger/smaller?
This is a big topic, so if I missed any information that could be needed, please leave a comment.
Edit:
Tweaking the numbers of the mutation to a more suited value has gotten me a better answer rate, but it's still far from acceptable:
10% correct answers, with 1% error margin
33% correct answers, with 10% error margin
43% correct answers, with 20% error margin
65% correct answers, with 30% error margin
73% correct answers, with 40% error margin
The network is currently a very simple 3 layered structure with 3 inputs, 2 neurons in the only hidden layer, and a single neuron in the output layer.
The activation function used is Tanh, placing values in between -1 and 1.
The selection-type crossover is very simple, working like the following:
[a1, b1, c1, d1] // Selected as winner due to most correct answers
[a2, b2, c2, d2] // Loser
The loser will end up receiving one of the values from the winner, moving the value straight down since I believe the position in the array (of weights) matters to how it performs.
The mutation is very simple, adding a very small value (currently somewhere between about 0.01 and 0.001) to a random weight in the losers array of weights, with a 50/50 chance of being a negative value.
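Here is a rough sketch in C of that tournament-plus-mutation step (the weight count, mutation size, and the stand-in fitness function are placeholders for whatever your program actually uses):
#include <stdlib.h>

#define NUM_WEIGHTS 4   /* placeholder: however many weights the network has */

/* Stand-in fitness so the sketch compiles; the real one would run the network
   on the training data and count correct answers within the error margin. */
static double fitness(const double *w)
{
    double s = 0;
    for (int i = 0; i < NUM_WEIGHTS; i++) s += w[i];
    return s;
}

static double frand(void) { return (double)rand() / RAND_MAX; }

/* One step: pick two at random, the fitter one is the winner; the loser copies
   one weight (same index) from the winner and then gets a small mutation. */
void evolveOnce(double population[][NUM_WEIGHTS], int popSize)
{
    int a = rand() % popSize, b;
    do { b = rand() % popSize; } while (b == a);          /* two distinct individuals */
    double *winner = population[a], *loser = population[b];
    if (fitness(loser) > fitness(winner)) { double *t = winner; winner = loser; loser = t; }

    int i = rand() % NUM_WEIGHTS;
    loser[i] = winner[i];                        /* crossover: copy straight down */

    int j = rand() % NUM_WEIGHTS;
    double delta = 0.001 + frand() * 0.009;      /* roughly 0.001 .. 0.01 */
    loser[j] += (rand() % 2) ? delta : -delta;   /* 50/50 chance of being negative */
}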
Here are a few examples of training data:
1, 8, -7 // the -7 represents + (1+8)
3, 7, -3 // -3 represents - (3-7)
7, 7, 3 // 3 represents * (7*7)
3, 8, 7 // 7 represents / (3/8)
Use a niching technique in the GA. The score of every solution (some form of quadratic error, I think) is adjusted to take into account its similarity to the rest of the population. This maintains diversity inside the population and avoids premature convergence and traps into local optima.
Take a look here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.7342
A common problem when using GAs to train ANNs is that the population becomes highly correlated as training progresses.
You could try increasing mutation chance and/or effect as the error-change decreases.
In plain English: the population becomes genetically similar due to crossover and fitness selection as a local minimum is approached. You can introduce variation by increasing the chance of mutation.
You can do a simple modification to the selection scheme: the population can be viewed as having a 1-dimensional spatial structure - a circle (consider the first and last locations to be adjacent).
The production of an individual for location i is permitted to involve only parents from i's local neighborhood, where the neighborhood is defined as all individuals within distance R of i. Aside from this restriction no changes are made to the genetic system.
It's only one or a few lines of code and it can help to avoid premature convergence.
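A sketch of that restriction (population size, radius R, and how you wire it into the rest of the GA are up to you): instead of drawing parents from the whole population, each slot i only draws from indices within R of i on the ring.
#include <stdlib.h>

#define POP_SIZE 100
#define RADIUS   5    /* neighbourhood radius R; a placeholder value */

/* Pick a parent index for the individual that will fill slot i, restricted to
   i's local neighbourhood on a ring (first and last slots are adjacent). */
int pickLocalParent(int i)
{
    int offset = (rand() % (2 * RADIUS + 1)) - RADIUS;       /* -R .. +R */
    return ((i + offset) % POP_SIZE + POP_SIZE) % POP_SIZE;  /* wrap around the ring */
}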
References:
TRIVIAL GEOGRAPHY IN GENETIC PROGRAMMING (2005) - Lee Spector, Jon Klein

Why Does MIDI Offer 127 Notes

Are the 127 note values in MIDI musically significant (a certain number of octaves or something)? Or was it set at 127 due to the binary file format, i.e. for the purposes of computing?
In the MIDI protocol there are status bytes (think commands, such as note-on or note-off) and there are data bytes (think parameters, such as pitch value and velocity). The way to determine the difference between them is by the first bit. If that first bit is 1, then it is a status byte. If the first bit is 0, then it is a data byte. This leaves only 7 bits available for the rest of the status or data byte value.
So to answer your question in short: this has more to do with the protocol specification, but it just so happens to line up nicely with a good number of available pitch values.
Now, these pitch values do not correspond to specific pitches. Yes it is true that typically a pitch value of 60 will give you C4, or middle C. Most synths work this way, but certainly not all. It isn't even a requirement that the synth uses the pitch value for pitches! MIDI doesn't care... it is just a protocol. You may be wondering how alternate tunings work... they work just fine. It is up to the synthesizer to produce the correct pitches for these alternate tunings. MIDI simply provides for a selection of 128 different values to be sent.
Also, if you are wondering why it is so important for that first bit to signify what the data is... There are system realtime messages that can be interjected in the middle of some other command. These are things like the timing clock which is often used to sync up LFOs among other things.
You can read more about the types of MIDI messages here: http://www.midi.org/techspecs/midimessages.php
127 = 2^7 - 1
It's the maximum positive value of an 8-bit signed integer, and so is a meaningful limit in file formats--it's the highest value you can store in a byte (on most systems) without making it unsigned.
I think what you are missing is that MIDI was created in the early 1980's, not to run on personal computers, but to run on musical instruments with extremely limited processing and storage capabilities. Storing 127 values seemed GIANT back then, especially when the largest keyboard typically has only 88 keys, and most electronic instruments only had 48. If you think MIDI is doing something in a strange way, it likely stems from its jurassic heritage.
Yes it is true that typically a pitch value of 60 will give you C4,
or middle C. Most synths work this way, but certainly not all.
Yes ... there has always been a disagreement about where middle C is in MIDI. On Yamaha keyboards it is C3, on Roland keyboards it is C4. Yamaha did it one way and Roland did it another.
Now, these pitch values do not correspond to specific pitches.
Not originally. However, in the "General MIDI" standard, A = 440, which is standard tuning. General MIDI also describes which patch is a piano, which is a guitar, and so on, so that MIDI files become portable across multitimbral sound sources.
Simple efficiency.
As a serial protocol, MIDI was designed around simple serial chips of the time which would take 8 data bits in and transmit them as a stream out of one separate serial data pin at a prescribed rate. In the MIDI world this was 31,250 bits per second. It added stop and start bits so all data could travel over one wire.
It was designed to be cheap and simple and the simplicity was extended into the data format.
The most significant bit of the 8 data bits was used to signal whether the byte was a command or data. So:
To send a middle C Note On on channel 1 at a velocity of 56, a command byte is sent first,
and the command for Note On is the upper 4 bits of that command byte: 1001. Notice the 1 in the most significant bit. This is followed by the channel ID for channel 1: 0000 (computers preferring to start counting from 0), giving
10010000 or 128 + 16 = 144
This was followed by the actual Note data
72 for Middle C or 01001000
and then the velocity data, again specified in the range 0-127 with a 0 MSB:
56 in our case
00111000
So what would go down the wire (ignoring stop, start & sync bits) was
144, 72, 56
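In code, assembling those three bytes and telling status bytes from data bytes by the most significant bit looks roughly like this:
#include <stdio.h>

int main(void)
{
    unsigned char noteOn   = 0x90;            /* 1001 0000: Note On, channel 1 = 144 */
    unsigned char note     = 72;              /* 0100 1000: the note number           */
    unsigned char velocity = 56;              /* 0011 1000: the velocity              */
    unsigned char msg[3] = { noteOn, note, velocity };

    for (int i = 0; i < 3; i++)
        printf("%d is a %s byte\n", msg[i],
               (msg[i] & 0x80) ? "status" : "data");   /* MSB set => status byte */
    return 0;
}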
For the almost brain dead microcomputers of the time in electronic keyboards the ability to separate command from data by simply looking at the first bit was a godsend.
As has been stated, 127 note values cover pretty much any western keyboard you care to mention. So it made perfect logical sense, and the protocol's survival long after many serial protocols have disappeared into obscurity is a great compliment to Dave Smith of Sequential Circuits (http://en.wikipedia.org/wiki/Dave_Smith_(engineer)), who started the discussions with other manufacturers to set all this in place.
Modern music and composition would be considerably different without him and them.
Enjoy!
127 is enough to cover all piano keys
0 ~ 127 fits nicely for ADC conversions.
Many MIDI hardware devices rely on performing Analog to Digital conversions (ADC). Considering MIDI is a real-time communication protocol, when performing an ADC conversion using successive approximation (a commonly used algorithm), a good rule of thumb is to use 10-bit resolution for fast computation. This will yield values in the 0 ~ 1023 range, which can be converted to MIDI range by dividing by 8.
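For instance, scaling a 10-bit ADC reading down to the MIDI range is a single shift (a sketch, with a made-up function name):
/* Sketch: convert a 10-bit ADC reading (0..1023) to the MIDI range 0..127. */
unsigned char adcToMidi(unsigned int adcValue)
{
    return (unsigned char)(adcValue >> 3);   /* divide by 8 */
}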

Help designing a hash function to detect duplicate records?

Let me explain my program thus far. It is a Rubik's cube solver. I am given a scrambled cube (this is the initial state). This becomes the root node of a graph. I am using iterative deepening depth-first search to "brute force" this scrambled cube to a recognizable state, which I can then solve using pattern recognition.
As you can imagine, this is a very large graph, so I would like to come up with some sort of hashing functionality to detect duplicate nodes in this graph (thus speeding up the traversal).
I am largely unfamiliar with hashing functions, but here is what I am thinking... Each node is essentially a different state of the Rubik's cube. So if I come to a cube state (node) that has already been seen, I want to skip over it. So I need a hashing function that takes me from the state variable to a checksum, where the state variable is a 54-character string. The only allowed characters are y, r, g, o, b, w (which correspond to colors).
Any help designing this hash function would be greatly appreciated.
For the fastest duplicate detection and removal - avoid generating many of the repeated positions in the first place. This is easy to do and quicker than generating and then finding the repeats. So, for example, if you have moves like F and B and you allow the subsequence FB, don't also allow BF, which gives the same result. If you've just done 3F, don't follow it with F. You can generate a small look-up table of allowed next moves, given the last three moves.
For the remaining duplicates you want a fast hash because there are a lot of positions. To make your hash go fast, as others have commented, you want what it hashes from, the representation of the position, to be small. There are 12 edge cubies and 8 corner cubies. Representing each cubie's position and orientation takes only five bits per cubie, i.e. 100 bits (12.5 bytes) total. For edges it's four bits for position and one for flip. For corners it's three bits for position and 2 for spin. You can ignore the last edge cubie since its position and flip are fixed by the others. With this representation you are already down to 12 bytes for the position.
You have about 70 real bits of information in a Rubik's cube position, and 96 bits is close enough to 70 to make it actually counterproductive to hash those bits further. I.e. treat this representation of the board as your hash. That may sound a bit strange, but from your question I'm envisaging you at the same time experimenting with a less compact representation of the cube that is more amenable to your pattern matching. In that case the 12-byte value can be treated as if it were a hash, with the advantage that it's a hash that never has a collision. That makes the duplicate-testing code and new-value insertion shorter, simpler and faster. It's going to be cheaper than the MD5 solutions suggested so far.
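A rough sketch of that packing, assuming your solver already tracks each cubie's slot and orientation in arrays (the array names here are placeholders):
#include <stdint.h>
#include <string.h>

/* Append the low nbits of value to a little-endian bit string. */
static void putBits(uint8_t *buf, int *pos, unsigned value, int nbits)
{
    for (int i = 0; i < nbits; i++, (*pos)++)
        if ((value >> i) & 1u)
            buf[*pos / 8] |= (uint8_t)(1u << (*pos % 8));
}

/* Pack 11 edges (4-bit slot + 1-bit flip) and 8 corners (3-bit slot +
   2-bit twist) into 95 bits; the 12th edge is implied by the rest. */
typedef struct { uint8_t key[12]; } CubeKey;

CubeKey packCube(const int edgeSlot[12], const int edgeFlip[12],
                 const int cornerSlot[8], const int cornerTwist[8])
{
    CubeKey k;
    int pos = 0;
    memset(k.key, 0, sizeof k.key);
    for (int e = 0; e < 11; e++) {               /* last edge is implied */
        putBits(k.key, &pos, (unsigned)edgeSlot[e], 4);
        putBits(k.key, &pos, (unsigned)edgeFlip[e], 1);
    }
    for (int c = 0; c < 8; c++) {
        putBits(k.key, &pos, (unsigned)cornerSlot[c], 3);
        putBits(k.key, &pos, (unsigned)cornerTwist[c], 2);
    }
    return k;    /* 95 bits used; compare keys directly with memcmp */
}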
There are many other tricks you could use to cut down the work in searching for repeated positions. Have a look at http://cube20.org/ for ideas.
You can always try a cryptographic hash function. Since your problem is not a question of security (there is no attacker purposely trying to find distinct states which hash to the same value), you can use a broken hash function. I recommend trying MD4, which is quite fast. Your 54-character string is quite appropriate for MD4 input (MD4 can process inputs up to 55 bytes as a single block).
A basic 2.4 GHz PC can hash about 12 million such strings per second, using a single core, with a simple unrolled C implementation (e.g. one which would look like the MD4Transform() function in the sample code included in RFC 1320). This may be enough for your needs.
1) Don't Use A Hash
You have 9*6 = 54 separate facelets on a Rubik's cube. Even wastefully using 1 byte per facelet this is 432 bits, so hashing won't save you too much space. A better packing of 3 bits per facelet comes to 162 bits (21 bytes). It sounds to me like you need a compact way to represent the cube.
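For example, packing the 54-character colour string down to 3 bits per facelet (21 bytes) could look something like this (the colour-to-code mapping is arbitrary):
#include <stdint.h>
#include <string.h>

/* Map each of the six sticker colours to a 3-bit code. */
static unsigned colourCode(char c)
{
    switch (c) {
    case 'y': return 0; case 'r': return 1; case 'g': return 2;
    case 'o': return 3; case 'b': return 4; default:  return 5;  /* 'w' */
    }
}

/* Pack a 54-character state string (y/r/g/o/b/w) into 54*3 = 162 bits,
   stored in a 21-byte buffer that can be compared with memcmp. */
void packState(const char state[54], uint8_t out[21])
{
    memset(out, 0, 21);
    for (int i = 0; i < 54; i++) {
        unsigned code = colourCode(state[i]);
        int bit = i * 3;
        for (int b = 0; b < 3; b++)
            if ((code >> b) & 1u)
                out[(bit + b) / 8] |= (uint8_t)(1u << ((bit + b) % 8));
    }
}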
OTOH, if you are looking to store a set of many many previously-visited states then I've found that using a bloom filter instead of a true set gets me decent results (but often non-optimal) with much lower space utilization.
2) If you are married to the idea of a hash:
Just use MD5, it's slightly more compact than the proposed rubik states, rather fast, and has good collision properties - it's not like you have a malicious adversary trying to cause rubik cube hash collisions ;-).
EDIT: Using cryptographic hash functions, such as MD4/MD5, is usually simple once you have a library or function implementing the algorithm (ex: OpenSSL, GNU TLS, and many stand-alone implementations exist). Usually the function is something like void md5(unsigned char *buf, size_t len, unsigned char *digest) where digest points to a pre-allocated 16 byte buffer and buf is the data to be hashed (your rubik cube structure). Here is some untested C code:
#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

#define BUFLEN 54   /* one byte per facelet colour */

int main(void)
{
    unsigned char digest[MD5_DIGEST_LENGTH];   /* 16 bytes */
    unsigned char buf[BUFLEN];
    memset(buf, 'w', BUFLEN);                  /* stand-in for a real cube state string */
    MD5(buf, BUFLEN, digest);                  /* this is the openssl function */
    for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);             /* print the digest in hex */
    printf("\n");
    return 0;
}
And be sure to compile/link with -lcrypto.
8 corner cubes:
You can assign each of these corners to 8 positions which each require 3 bits to determine which corner cube is at which position for a total of 24 bits.
You can further reduce this to just recording 7-of-8 positions as you can easily use a process of elimination to determine what the 8th corner is (for 21 bits).
However, this can be reduced further as the 8 corners can only be arranged in 8! = 40320 permutations and 40320 can be represented in 16 bits.
Each corner cube can be orientated correctly or be rotated 120° clockwise or anti-clockwise to be in three different positions (represented as 0, 1 and 2 respectively).
This requires 2 bits per corner to represent.
However, the sum of the orientations (modulo 3) is always 0; so, if you know 7-of-8 orientations then (assuming you have a solvable cube) you can calculate the orientation of the 8th corner (giving a total of 14 bits).
Or for a further reduction, seven ternary (base 3) digits can represent the orientation of the corners and this can be represented in 12 binary digits (bits).
So the corner cubes can be represented in 28 bits, if you want to decode the permutations, or in 33 bits, if you want to directly record the positions of 7-of-8 corners.
12 edge cubes:
Each can be represented in 4 bits (for a total of 48 bits), which can be reduced to 44 bits by only recording the position of 11-of-12 edges.
However, the 12! = 479001600 permutations of the edges can be stored in 29 bits.
Each edge can either be oriented correctly or flipped:
This requires 1 bit to represent.
However, edges are always flipped in pairs so the parity of the flipped edges will always be zero (again, meaning that you only need to record 11-of-12 orientations for the edges) giving a total of 11 bits required.
So edge cubes can be represented in 40 bits, if you want to decode the permutations, or in 55 bits if you want to record all the positions and flips of 11-of-12 edges.
6 centre cubes
You do not need to record any information about the centre cubes - they are fixed relative to the core at the centre of the Rubik's cube and (assuming you are not worried about the orientation of any logos on the cube) are immobile.
Total:
Using permutations: 68 bits
Using positions: 88 bits
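To make the "8! fits in 16 bits" point concrete, here is one way (a Lehmer-code style ranking, sketched in C for the 8 corners) to turn a permutation into a single index; the 12 edges work the same way:
#include <stdio.h>

/* Rank a permutation of n items (n <= 12) into a single integer in [0, n!).
   For n = 8 the result fits comfortably in 16 bits (8! = 40320). */
unsigned long rankPermutation(const int perm[], int n)
{
    unsigned long rank = 0;
    for (int i = 0; i < n; i++) {
        int smaller = 0;
        for (int j = i + 1; j < n; j++)       /* count later items smaller than perm[i] */
            if (perm[j] < perm[i]) smaller++;
        rank = rank * (n - i) + smaller;      /* mixed-radix (factorial base) digit */
    }
    return rank;
}

int main(void)
{
    int corners[8] = { 7, 6, 5, 4, 3, 2, 1, 0 };     /* fully reversed arrangement */
    printf("%lu\n", rankPermutation(corners, 8));    /* prints 40319 = 8! - 1 */
    return 0;
}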
Just to establish the theoretical minimum representation - the state space of a valid Rubik's cube is about 4.3*10^19. Log2(4.3*10^19) will then determine how many bits you need to represent that full space, the ceiling of which is 66. So in theory, if you could number every valid state, any given state could be uniquely represented in 66 bits.
While you may want to follow others' advice and find a more compact way of representing the cube, consider representing the state in terms of edge, corner, and face pieces. Due to the swapping laws of legal cube moves, you should be able to concatenate a sequence of 12 4-bit edge locations, 8 3-bit corner locations, and 6 3-bit face locations. This should result in a unique representation using 90 bits.
This representation may not be conducive to the way you are creating your tree, but it is unique, easily comparable, and should be possible to find given a state in your existing representation.

Actual note duration from MIDI duration

I'm currently implementing an application to perform some tasks on MIDI files, and my current problem is to output the notes I've read to a LilyPond file.
I've merged note_on and note_off events into single note objects with absolute start and absolute duration, but I don't really see how to convert that duration to actual music notation. I've guessed that a duration of 376 is a quarter note in the file I'm reading because I know the song, and obviously 188 is an eighth note, but this certainly does not generalise to all MIDI files.
Any ideas?
By default a MIDI file is set to a tempo of 120 bpm and the MThd chunk in the file will tell you the resolution in terms of "pulses per quarter note" (ppqn).
If the ppqn is, say, 96 then a delta of 96 ticks is a quarter note.
Should you be interested in the real duration (in seconds) of each sound you should also consider the "tempo" that can be changed by an event "FF 51 03 tt tt tt"; the three bytes are the microseconds for a quarter note.
With these two values you should find what you need. Beware that the duration in the MIDI file can be approximate, especially if the MIDI file is the recording of a human player.
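Putting those two numbers together (division from the MThd chunk, tempo from the most recent FF 51 03 event), the conversion is, as a sketch:
/* Convert a delta time in ticks to seconds.
   ppqn:     division from the MThd chunk (pulses per quarter note)
   tempo_us: microseconds per quarter note from the FF 51 03 event
             (500000 if none has been seen, i.e. 120 bpm) */
double ticksToSeconds(unsigned long ticks, unsigned ppqn, unsigned long tempo_us)
{
    return (double)ticks * (double)tempo_us / ((double)ppqn * 1000000.0);
}
With the default tempo (500,000 microseconds per quarter note) and a division of 96, a delta of 96 ticks comes out to 0.5 seconds, i.e. a quarter note at 120 bpm.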
I've put together a C library to read/write midifiles a long time ago: https://github.com/rdentato/middl in case it may be helpful (it's quite some time I don't look at the code, feel free to ask if there's anything unclear).
I would suggest following this approach:
choose a "minimal note" that is compatible with your division (e.g. 1/128) and use it as a sort of grid.
Align each note to the closest grid line (i.e. to the closest integer multiple of the minimal note).
Convert it to standard notation (e.g. a quarter note, a dotted eighth note, etc.).
In your case, take 1/32 as minimal note and 384 as division (that would be 48 ticks). For your note of 376 tick you'll have 376/48=7.8 which you round to 8 (the closest integer) and 8/32 = 1/4.
If you find a note whose duration is 193 ticks you can see it's a 1/8 note as 193/48 is 4.02 (which you can round to 4) and 4/32 = 1/8.
Continuing this reasoning you can see that a note of duration 671 ticks should be a double dotted quarter note.
In fact, 671 should be approximated to 672 (the closest multiple of 48) which is 14*48. So your note is a 14/32 -> 7/16 -> (1/16 + 2/16 + 4/16) -> 1/16 + 1/8 + 1/4.
If you are comfortable using binary numbers, you could notice that 14 is 1110 and from there, directly derive the presence of 1/16, 1/4 and 1/8.
As a further example, a note of 480 ticks of duration is a quarter note tied with a 1/16 note since 480=48*10 and 10 is 1010 in binary.
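A sketch of that procedure (1/32 grid, binary decomposition of the multiple) for the simple, non-triplet cases:
#include <stdio.h>

/* Quantise a duration in ticks to a 1/32-note grid and print the note values
   it decomposes into. division = ticks per quarter note (e.g. 384). */
void printNoteValue(int ticks, int division)
{
    int grid = division / 8;                 /* ticks in a 1/32 note            */
    int n = (ticks + grid / 2) / grid;       /* round to nearest grid multiple  */
    printf("%d ticks ~ %d/32 =", ticks, n);
    for (int bit = 0; bit < 6; bit++)        /* n as a sum of powers of two     */
        if (n & (1 << bit))
            printf(" 1/%d", 32 >> bit);      /* bit 0 -> 1/32, bit 1 -> 1/16, ... */
    printf("\n");
}

int main(void)
{
    printNoteValue(376, 384);   /* -> 8/32  = 1/4               (quarter note)           */
    printNoteValue(671, 384);   /* -> 14/32 = 1/16 + 1/8 + 1/4  (double dotted quarter)  */
    printNoteValue(480, 384);   /* -> 10/32 = 1/16 + 1/4        (quarter tied to 1/16)   */
    return 0;
}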
Triplets and other groups would make things a little bit more complex. It's not by chance that the most common division values are 96 (3*2^5), 192 (3*2^6) and 384 (3*2^7); this way triplets can be represented with an integer number of ticks.
You might have to guess or simplify in some situations; that's why no "MIDI to standard notation" program can be 100% accurate.