I'm currently implementing an application to perform some tasks on MIDI files, and my current problem is to output the notes I've read to a LilyPond file.
I've merged note_on and note_off events into single note objects with an absolute start time and an absolute duration, but I don't really see how to convert that duration to actual music notation. I've guessed that a duration of 376 is a quarter note in the file I'm reading because I know the song, and obviously 188 is an eighth note, but this certainly does not generalise to all MIDI files.
Any ideas?
By default a MIDI file is set to a tempo of 120 bpm and the MThd chunk in the file will tell you the resolution in terms of "pulses per quarter note" (ppqn).
If the ppqn is, say, 96, then a delta of 96 ticks is a quarter note.
Should you be interested in the real duration (in seconds) of each sound, you should also consider the tempo, which can be changed by the meta event "FF 51 03 tt tt tt"; the three bytes tt tt tt give the number of microseconds per quarter note.
With these two values you should find what you need. Beware that the durations in a MIDI file can be approximate, especially if the file is a recording of a human player.
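The two values can be combined as in the minimal sketch below. The function names and the integer microsecond return type are choices made for this illustration, not part of any MIDI library, and the tempo is assumed to stay constant over the span being converted.

```c
#include <assert.h>
#include <stdint.h>

/* Tempo in effect when no Set Tempo event has been seen: 120 bpm. */
#define DEFAULT_TEMPO_US 500000u

/* Decode the data bytes tt tt tt of a Set Tempo meta event
   (FF 51 03 tt tt tt) into microseconds per quarter note. */
uint32_t tempo_from_bytes(const uint8_t tt[3])
{
    return ((uint32_t)tt[0] << 16) | ((uint32_t)tt[1] << 8) | tt[2];
}

/* Convert a tick count to microseconds, given the header's ppqn and the
   current tempo. Integer division truncates toward zero. */
uint64_t ticks_to_us(uint32_t ticks, uint16_t ppqn, uint32_t tempo_us)
{
    assert(ppqn > 0);
    return (uint64_t)ticks * tempo_us / ppqn;
}
```

With a ppqn of 96 and the default tempo, ticks_to_us(96, 96, DEFAULT_TEMPO_US) gives 500000 microseconds, i.e. one quarter note lasts half a second.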
I put together a C library to read/write MIDI files a long time ago: https://github.com/rdentato/middl in case it may be helpful (it's been quite some time since I looked at the code; feel free to ask if anything is unclear).
I would suggest the following approach:
Choose a "minimal note" that is compatible with your division (e.g. 1/128) and use it as a sort of grid.
Align each note to the closest grid line (i.e. to the closest integer multiple of the minimal note).
Convert it to standard notation (e.g. a quarter note, a dotted eighth note, etc.).
In your case, take 1/32 as the minimal note and 384 as the division (so the minimal note is 48 ticks). For your note of 376 ticks you'll have 376/48 = 7.8, which you round to 8 (the closest integer), and 8/32 = 1/4.
If you find a note whose duration is 193 ticks you can see it's a 1/8 note as 193/48 is 4.02 (which you can round to 4) and 4/32 = 1/8.
Continuing this reasoning you can see that a note of duration 671 ticks should be a double dotted quarter note.
In fact, 671 should be approximated to 672 (the closest multiple of 48) which is 14*48. So your note is a 14/32 -> 7/16 -> (1/16 + 2/16 + 4/16) -> 1/16 + 1/8 + 1/4.
If you are comfortable using binary numbers, you could notice that 14 is 1110 and from there, directly derive the presence of 1/16, 1/4 and 1/8.
As a further example, a note of 480 ticks of duration is a quarter note tied with a 1/16 note since 480=48*10 and 10 is 1010 in binary.
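The grid alignment and the binary reading of the result can be sketched as follows. The function names are made up for this illustration, the minimal note is assumed to be a power-of-two fraction of a whole note, and triplets are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Round a duration in ticks to the nearest multiple of the minimal note.
   division: ticks per quarter note; min_den: the minimal note is 1/min_den
   of a whole note (32 for a 1/32 note). Returns minimal-note units. */
unsigned quantize(unsigned ticks, unsigned division, unsigned min_den)
{
    unsigned grid = division * 4 / min_den;   /* ticks per minimal note */
    assert(grid > 0);
    return (ticks + grid / 2) / grid;         /* round to nearest */
}

/* Split a length in minimal-note units into tied note values by reading
   its binary digits. Writes denominators (4 = quarter, 8 = eighth, ...)
   to out, longest first, and returns how many were written. Whole notes
   (denominator 1) are emitted first for lengths of a whole note or more. */
size_t split_notes(unsigned units, unsigned min_den, unsigned out[], size_t cap)
{
    size_t n = 0;
    while (units >= min_den && n < cap) {
        out[n++] = 1;
        units -= min_den;
    }
    for (unsigned bit = min_den / 2; bit > 0 && n < cap; bit /= 2) {
        if (units & bit)
            out[n++] = min_den / bit;
    }
    return n;
}
```

For the 671-tick note, quantize(671, 384, 32) gives 14 and split_notes(14, 32, ...) yields 4, 8, 16: a quarter tied to an eighth tied to a sixteenth, which is the double dotted quarter (written 4.. in LilyPond).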
Triplets and other groups would make things a little bit more complex. It's not by chance that the most common division values are 96 (3*2^5), 192 (3*2^6) and 384 (3*2^7); this way triplets can be represented with an integer number of ticks.
You might have to guess or simplify in some situations; that's why no "MIDI to standard notation" program can be 100% accurate.
I've been using MATLAB to solve some boundary value problems lately, and I've noticed an annoying quirk. Suppose I start with the interval [0,1], and I want to search inside it. Naturally, one would perform a binary search, so I would subdivide the interval into [0,0.5] and [0.5,1]. Excellent: let's now suppose we narrow down our search to [0.5,1]. Now we divide the interval into [0.5,0.75] and [0.75,1]. No apparent problem yet. However, as we keep going, the representation of powers of 2 in base 10 becomes less and less natural. For example, 2^-22 in binary is just 22 bits, while in decimal it is 16 digits. However, keep in mind that each decimal digit is really encoding ~ 4 bits. In other words, representing these fractions as decimal is extremely inefficient.
MATLAB's precision only extends to 16-digit decimal floats, so a binary search going to 2^-22 is as good as you can do. However, 2^-22 ~ 10^-7, which is much bigger than 10^-16, so the best search strategy in MATLAB seems to be a decimal search! In any case, this is what I have done so far: to take full advantage of the 16-digit precision, I've had to subdivide the interval [0,1] into 10 pieces.
Hopefully I've made my problem clear. So, my question is: how do I make MATLAB count in native binary? I want to work with 64-bit binary floats!
My question is regarding the division field of the header chunk when it is in SMPTE format, i.e. when its value lies between 0x8000 and 0xFFFF.
Let's say the division value is 0xE728. In this case bit 15 is 1, which means it is in SMPTE format. After we have concluded that it is SMPTE, do we need to get rid of the 1 at bit 15? Or do we simply store 0xE7 as the SMPTE format and 0x28 as the ticks per frame?
I am really confused and I was not able to understand the format descriptions online either. Thank you.
The Standard MIDI Files 1.0 specification says:
If bit 15 of <division> is a one, delta-times in a file correspond to subdivisions of a second, in a way consistent with SMPTE and MIDI time code. Bits 14 thru 8 contain one of the four values -24, -25, -29, or -30, corresponding to the four standard SMPTE and MIDI time code formats (-29 corresponds to 30 drop frame), and represents the number of frames per second. These negative numbers are stored in two's complement form.
It would be possible to mask bit 15 off. But in two's complement form, the most significant bit indicates a negative number, so you can simply interpret the entire byte (bits 15…8) as a signed 8-bit value (e.g., signed char in C), and it will have one of the four values.
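A minimal sketch of that interpretation follows. The struct and function names are invented for this example; the upper byte is converted with plain arithmetic rather than a cast to signed char, so the result does not depend on how the compiler converts out-of-range values.

```c
#include <stdint.h>

/* Decoded <division> word. Only the fields of the active mode are set. */
struct division {
    int is_smpte;            /* 1 if bit 15 was set */
    int ticks_per_quarter;   /* metrical time (is_smpte == 0) */
    int frames_per_second;   /* 24, 25, 29 (30 drop frame) or 30 */
    int ticks_per_frame;     /* resolution within a frame */
};

struct division parse_division(uint16_t word)
{
    struct division d = {0, 0, 0, 0};
    if (word & 0x8000) {
        int high = word >> 8;     /* upper byte, 0x80..0xFF */
        int fmt = high - 256;     /* same byte read as signed 8-bit */
        d.is_smpte = 1;
        d.frames_per_second = -fmt;
        d.ticks_per_frame = word & 0xFF;
    } else {
        d.ticks_per_quarter = word;
    }
    return d;
}
```

For the value 0xE728 from the question, the upper byte 0xE7 reads as -25, so the file uses 25 frames per second with 0x28 = 40 ticks per frame.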
The last word of a MIDI header chunk specifies the division. It contains information about whether delta times should be interpreted as ticks per quarter note or ticks per frame (where a frame is a subdivision of a second). If bit 15 of this word is set, then the information is in ticks per frame. The next 7 bits (bit 14 through bit 8) specify the number of frames per second and can contain one of four values: -24, -25, -29, or -30 (they are negative).
Does anyone know whether bit 15 counts towards this negative value? In other words, are the values which specify fps actually 8 bits long (15 through 8) or 7 bits long (14 through 8)? The documentation I am reading is very unclear about this, and I cannot find info anywhere else.
Thanks
The MMA's Standard MIDI-File Format Spec says:
The third word, <division>, specifies the meaning of the delta-times.
It has two formats, one for metrical time, and one for time-code-based
time:
+---+-----------------------------------------+
| 0 |         ticks per quarter-note          |
+---+-----------------------+-----------------+
| 1 | negative SMPTE format | ticks per frame |
+---+-----------------------+-----------------+
|15 |14                   8 |7              0 |
[...]
If bit 15 of <division> is a one, delta times in a file correspond
to subdivisions of a second, in a way consistent with SMPTE and MIDI
Time Code. Bits 14 thru 8 contain one of the four values -24, -25, -29,
or -30, corresponding to the four standard SMPTE and MIDI Time Code
formats (-29 corresponds to 30 drop frame), and represents the
number of frames per second. These negative numbers are stored in
two's complement form. The second byte (stored positive) is the
resolution within a frame [...]
Two's complement representation allows you to sign-extend a negative value without changing it, by adding more-significant bits of value 1.
So it does not matter whether you take 7 or 8 bits.
In practice, this value is designed to be interpreted as a signed 8-bit value, because otherwise it would have been stored as a positive value.
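The equivalence of the 7-bit and 8-bit readings can be checked with a small helper. This is only a sketch; sign_extend is a name chosen for this example and is limited to widths of up to 16 bits.

```c
#include <assert.h>

/* Interpret the low `bits` bits of value as a two's complement number. */
int sign_extend(unsigned value, int bits)
{
    assert(bits >= 1 && bits <= 16);
    unsigned sign = 1u << (bits - 1);   /* weight of the sign bit */
    value &= (sign << 1) - 1;           /* keep only the low `bits` bits */
    return (int)(value ^ sign) - (int)sign;
}
```

For a division word of 0xE728, sign_extend(0xE7, 8) and sign_extend(0x67, 7) both give -25, because bit 15 is always 1 in SMPTE format and therefore equals the sign bit of the 7-bit field.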
Are the 127 note values in MIDI musically significant (a certain number of octaves or something)? Or was the limit set at 127 because of the binary file format, i.e. for the purposes of computing?
In the MIDI protocol there are status bytes (think commands, such as note-on or note-off) and there are data bytes (think parameters, such as pitch value and velocity). The way to tell them apart is the first (most significant) bit. If that bit is 1, it is a status byte. If it is 0, it is a data byte. This leaves only 7 bits available for the rest of the status or data byte value.
So to answer your question in short, this has more to do with the protocol specification, but it just so happens to line up nicely with a good number of available pitch values.
Now, these pitch values do not correspond to specific pitches. Yes it is true that typically a pitch value of 60 will give you C4, or middle C. Most synths work this way, but certainly not all. It isn't even a requirement that the synth uses the pitch value for pitches! MIDI doesn't care... it is just a protocol. You may be wondering how alternate tunings work... they work just fine. It is up to the synthesizer to produce the correct pitches for these alternate tunings. MIDI simply provides for a selection of 128 different values to be sent.
Also, if you are wondering why it is so important for that first bit to signify what the data is... There are system realtime messages that can be interjected in the middle of some other command. These are things like the timing clock which is often used to sync up LFOs among other things.
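The top-bit test takes one line of code, which is part of why it suited simple hardware. A minimal sketch, with function names chosen for this example; the range 0xF8 to 0xFF used for system realtime messages (0xF8 being the timing clock) comes from the MIDI message tables.

```c
#include <stdint.h>

/* A status byte has its most significant bit set. */
int is_status_byte(uint8_t b)
{
    return (b & 0x80) != 0;
}

/* System realtime messages are single status bytes in the range
   0xF8..0xFF and may appear in the middle of another message. */
int is_realtime_byte(uint8_t b)
{
    return b >= 0xF8;
}

/* The 7-bit payload of a data byte, 0..127. */
uint8_t data_value(uint8_t b)
{
    return b & 0x7F;
}
```

A parser can therefore handle a realtime byte immediately and then carry on collecting the data bytes of the message it was in the middle of.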
You can read more about the types of MIDI messages here: http://www.midi.org/techspecs/midimessages.php
127 = 2^7 - 1
It's the maximum positive value of an 8-bit signed integer, and so is a meaningful limit in file formats--it's the highest value you can store in a byte (on most systems) without making it unsigned.
I think what you are missing is that MIDI was created in the early 1980s, not to run on personal computers, but to run on musical instruments with extremely limited processing and storage capabilities. Storing 127 values seemed GIANT back then, especially when the largest keyboard typically has only 88 keys, and most electronic instruments only had 48. If you think MIDI is doing something in a strange way, it is likely that this stems from its Jurassic heritage.
"Yes it is true that typically a pitch value of 60 will give you C4, or middle C. Most synths work this way, but certainly not all."
Yes ... there has always been a disagreement about where middle C is in MIDI. On Yamaha keyboards it is C3, on Roland keyboards it is C4. Yamaha did it one way and Roland did it another.
"Now, these pitch values do not correspond to specific pitches."
Not originally. However, in the "General MIDI" standard, A = 440, which is standard tuning. General MIDI also describes which patch is a piano, which is a guitar, and so on, so that MIDI files become portable across multitimbral sound sources.
Simple efficiency.
As a serial protocol, MIDI was designed around the simple serial chips of the time, which would take 8 data bits in and transmit them as a stream out of one separate serial data pin at a prescribed rate. In the MIDI world this was 31,250 bits per second. The chip added start and stop bits so all data could travel over one wire.
It was designed to be cheap and simple and the simplicity was extended into the data format.
The most significant bit of the 8 data bits was used to signal whether the byte was a command or data. So:
To send a middle C Note On on channel 1 at a velocity of 56, a command byte is sent first.
The command for Note On is the upper 4 bits of that command byte, 1001 (notice the 1 in the most significant bit). It is followed by the channel ID for channel 1, 0000 (computers prefer to start counting from 0):
10010000, or 128 + 16 = 144
This is followed by the actual note data:
60 for middle C, or 00111100
and then the velocity data, again specified in the range 0-127 with a 0 MSB:
56 in our case, or
00111000
So what would go down the wire (ignoring start and stop bits) is:
144, 60, 56
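Assembling such a message can be sketched like this. The function name is hypothetical, and the usage note takes middle C to be note number 60, as stated earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 3-byte Note On message. channel is 0..15 (channel 1 is 0),
   note and velocity are 0..127. */
void note_on(uint8_t channel, uint8_t note, uint8_t velocity, uint8_t msg[3])
{
    assert(channel < 16 && note < 128 && velocity < 128);
    msg[0] = (uint8_t)(0x90 | channel);   /* 1001cccc: status, MSB set */
    msg[1] = note;                        /* 0nnnnnnn: data byte */
    msg[2] = velocity;                    /* 0vvvvvvv: data byte */
}
```

Calling note_on(0, 60, 56, msg) fills msg with 144, 60, 56.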
For the almost brain dead microcomputers of the time in electronic keyboards the ability to separate command from data by simply looking at the first bit was a godsend.
As has been stated, 128 values cover pretty much any Western keyboard you care to mention. So it made perfectly logical sense, and the protocol's survival long after many serial protocols have disappeared into obscurity is a great compliment to Dave Smith of Sequential Circuits (http://en.wikipedia.org/wiki/Dave_Smith_(engineer)), who started the discussions with other manufacturers to set all this in place.
Modern music and composition would be considerably different without him and them.
Enjoy!
127 is enough to cover all piano keys
0 ~ 127 fits nicely for ADC conversions.
Many MIDI hardware devices rely on performing analog-to-digital conversions (ADC). Considering MIDI is a real-time communication protocol, when performing an ADC conversion using successive approximation (a commonly used algorithm), a good rule of thumb is to use 10-bit resolution for fast computation. This will yield values in the 0 ~ 1023 range, which can be converted to the MIDI range by dividing by 8.
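As a small sketch of that scaling, assuming a 10-bit reading in the range 0 to 1023 (the function name is made up for this example):

```c
#include <assert.h>

/* Scale a 10-bit ADC reading (0..1023) to a 7-bit MIDI value (0..127). */
unsigned adc10_to_midi(unsigned reading)
{
    assert(reading <= 1023);
    return reading >> 3;   /* dropping 3 bits is the same as dividing by 8 */
}
```

The full-scale reading 1023 maps to 127 and mid-scale 512 maps to 64.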
Let me explain my program thus far. It is a Rubik's Cube solver. I am given a scrambled cube (this is the initial state). This becomes the root node of a graph. I am using iterative deepening depth-first search to "brute force" this scrambled cube to a recognizable state, which I can then solve using pattern recognition.
As you can imagine, this is a very large graph, so I would like to come up with some sort of hashing functionality to detect duplicate nodes in this graph (thus speeding up the traversal).
I am largely unfamiliar with hashing functions, but here is what I am thinking... Each node is essentially a different state of the Rubik's Cube. So if I come to a cube state (node) that has already been seen, I want to skip over it. So I need a hashing function that takes me from the state variable to a checksum, where the state variable is a 54-character string. The only allowed characters are y, r, g, o, b, w (which correspond to colors).
Any help designing this hash function would be greatly appreciated.
For the fastest duplicate detection and removal - avoid generating many of the repeated positions in the first place. This is easy to do and quicker than generating and then finding the repeats. So for example if you have moves like F and B, if you allow the sub sequence FB don't also allow BF, which gives the same result. If you've just done 3F, don't follow it with F. You can generate a small look-up table for allowed next moves, given the last three moves.
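A simplified variant of this pruning is sketched below. It differs from the description above in one assumption: each move is taken to be a face plus a number of quarter turns (1, 2 or 3), so only the previous face matters and no three-move table is needed. The enum layout and the function name are choices made for this example.

```c
/* Faces are numbered so that opposite faces differ only in the lowest
   bit: U/D, F/B, L/R. NO_MOVE marks the start of a sequence. */
enum face { NO_MOVE = -1, U, D, F, B, L, R };

/* Returns 1 if a turn of face `next` may follow a turn of face `last`. */
int move_allowed(int last, int next)
{
    if (last == NO_MOVE)
        return 1;
    if (next == last)
        return 0;   /* two turns of one face merge into a single turn */
    if (next == (last ^ 1) && next < last)
        return 0;   /* opposite faces commute: keep F B, skip B F */
    return 1;
}
```

Under these assumptions the branching factor drops from 18 to 15 after a move on the first face of a pair, and to 12 after a move on the second.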
For the remaining duplicates you want a fast hash because there are a lot of positions. To make your hash go fast, as others have commented, you want what it hashes from, the representation of the position, to be small. There are 12 edge cubies and there are 8 corner cubies. Representing each cubie's position and orientation need take only five bits per cubie, i.e. 100 bits (12.5 bytes) total. For edges it's four bits for position and one for flip. For corners it's three bits for position and two for spin. You can ignore the last edge cubie since its position and flip are fixed by the others. With this representation you are already down to 12 bytes for the position.
You have about 70 real bits of information in a Rubik's Cube position, and 96 bits is close enough to 70 to make it actually counterproductive to hash those bits further. I.e. treat this representation of the board as your hash. That may sound a bit strange, but from your question I'm envisaging you at the same time experimenting with a less compact representation of the cube that is more amenable to your pattern matching. In that case the 12-byte value can be treated as if it were a hash, with the advantage that it's a hash that never has a collision. That makes the duplicate testing code and new value insertion shorter, simpler and faster. It's going to be cheaper than the MD5 solutions suggested so far.
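One way to build such a 12-byte key is sketched below. The struct cube layout and the bit order are assumptions made for this example; any fixed order works as long as it is used consistently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cubie-level state: where each cubie sits and how it is turned. */
struct cube {
    uint8_t edge_pos[12];     /* 0..11 */
    uint8_t edge_flip[12];    /* 0 or 1 */
    uint8_t corner_pos[8];    /* 0..7 */
    uint8_t corner_spin[8];   /* 0, 1 or 2 */
};

/* Append the low `bits` bits of value to key, starting at bit offset *at. */
static void put_bits(uint8_t key[12], unsigned *at, unsigned value, unsigned bits)
{
    for (unsigned i = 0; i < bits; i++, (*at)++) {
        if ((value >> i) & 1u)
            key[*at / 8] |= (uint8_t)(1u << (*at % 8));
    }
}

/* Pack a state into 12 bytes: 11 edges x 5 bits + 8 corners x 5 bits = 95 bits.
   The twelfth edge is skipped because the other cubies determine it. */
void pack_cube(const struct cube *c, uint8_t key[12])
{
    unsigned at = 0;
    memset(key, 0, 12);
    for (int i = 0; i < 11; i++) {
        put_bits(key, &at, c->edge_pos[i], 4);
        put_bits(key, &at, c->edge_flip[i], 1);
    }
    for (int i = 0; i < 8; i++) {
        put_bits(key, &at, c->corner_pos[i], 3);
        put_bits(key, &at, c->corner_spin[i], 2);
    }
    assert(at == 95);
}
```

Two states are then equal exactly when memcmp on their keys returns 0, so the key can be stored directly in a set or used as a table key.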
There are many other tricks you could use to cut down the work in searching for repeated positions. Have a look at http://cube20.org/ for ideas.
You can always try a cryptographic hash function. Since your problem is not a question of security (there is no attacker purposely trying to find distinct states which hash to the same value), you can use a broken hash function. I recommend trying MD4, which is quite fast. Your 54-character string is quite appropriate for MD4 input (MD4 can process inputs up to 55 bytes as a single block).
A basic 2.4 GHz PC can hash about 12 million such strings per second, using a single core, with a simple unrolled C implementation (e.g. one which would look like the MD4Transform() function in the sample code included in RFC 1320). This may be enough for your needs.
1) Don't Use A Hash
You have 9*6 = 54 separate faces on a rubik cube. Even wastefully using 1 byte per face this is 432 bits, so hashing won't save you too much space. A better packing of 3 bits per face comes to 162 bits (21 bytes). It sounds to me like you need a compact way to represent the rubik.
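The 3-bits-per-face packing might look like the following sketch. The function name and the colour numbering (the order of the letters in "yrgobw") are arbitrary choices for this example.

```c
#include <stdint.h>
#include <string.h>

/* Pack a 54-character state over the letters y, r, g, o, b, w into
   21 bytes, 3 bits per facelet. Returns 0 on success, -1 on bad input. */
int pack_facelets(const char *state, uint8_t out[21])
{
    static const char colors[] = "yrgobw";
    if (strlen(state) != 54)
        return -1;
    memset(out, 0, 21);
    for (unsigned i = 0; i < 54; i++) {
        const char *p = strchr(colors, state[i]);
        if (p == NULL)
            return -1;                       /* not one of the six letters */
        unsigned code = (unsigned)(p - colors);
        for (unsigned b = 0; b < 3; b++) {   /* copy the 3 bits of the code */
            unsigned at = i * 3 + b;
            if ((code >> b) & 1u)
                out[at / 8] |= (uint8_t)(1u << (at % 8));
        }
    }
    return 0;
}
```

The 21-byte result can be compared with memcmp or stored directly as a set key.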
OTOH, if you are looking to store a set of many many previously-visited states then I've found that using a bloom filter instead of a true set gets me decent results (but often non-optimal) with much lower space utilization.
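A very small Bloom filter over packed cube states could be sketched as below. The filter size, the use of three probes and the seeded FNV-1a hash are all arbitrary choices for this illustration, not tuned values.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOOM_BITS (1u << 20)   /* filter size in bits; an arbitrary choice */

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };   /* must start zeroed */

/* FNV-1a, seeded so that two different hashes can be derived. */
static uint32_t fnv1a(const uint8_t *data, size_t len, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Mark a state as visited by setting three bits. */
void bloom_add(struct bloom *f, const uint8_t *key, size_t len)
{
    uint32_t h1 = fnv1a(key, len, 0), h2 = fnv1a(key, len, 0x9e3779b9u);
    for (uint32_t k = 0; k < 3; k++) {
        uint32_t bit = (h1 + k * h2) % BLOOM_BITS;
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 if the state was definitely never added, 1 if it probably was. */
int bloom_maybe_contains(const struct bloom *f, const uint8_t *key, size_t len)
{
    uint32_t h1 = fnv1a(key, len, 0), h2 = fnv1a(key, len, 0x9e3779b9u);
    for (uint32_t k = 0; k < 3; k++) {
        uint32_t bit = (h1 + k * h2) % BLOOM_BITS;
        if (!(f->bits[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}
```

A false positive makes the search skip a state it has not actually visited, which is why the results can be non-optimal.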
2) If you are married to the idea of a hash:
Just use MD5; it's slightly more compact than the proposed Rubik's Cube states, rather fast, and has good collision properties. It's not like you have a malicious adversary trying to cause Rubik's Cube hash collisions ;-).
EDIT: Using cryptographic hash functions, such as MD4/MD5, is usually simple once you have a library or function implementing the algorithm (ex: OpenSSL, GNU TLS, and many stand-alone implementations exist). Usually the function is something like void md5(unsigned char *buf, size_t len, unsigned char *digest) where digest points to a pre-allocated 16 byte buffer and buf is the data to be hashed (your rubik cube structure). Here is some untested C code:
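#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

#define BUFLEN 54   /* one byte per facelet */

int main(void)
{
    unsigned char digest[MD5_DIGEST_LENGTH];   /* 16 bytes */
    unsigned char buf[BUFLEN];
    memset(buf, 'w', BUFLEN);    /* placeholder: copy your cube state here */
    MD5(buf, BUFLEN, digest);    /* OpenSSL one-shot MD5 */
    for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);   /* print the digest as hex */
    printf("\n");
    return 0;
}
And be sure to link with -lcrypto (the MD5 functions live in libcrypto, not libssl).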
8 corner cubes:
You can assign each of these corners to 8 positions which each require 3 bits to determine which corner cube is at which position for a total of 24 bits.
You can further reduce this to just recording 7-of-8 positions as you can easily use a process of elimination to determine what the 8th corner is (for 21 bits).
However, this can be reduced further as the 8 corners can only be arranged in 8! = 40320 permutations and 40320 can be represented in 16 bits.
Each corner cube can be oriented correctly or be rotated 120° clockwise or anti-clockwise, giving three different orientations (represented as 0, 1 and 2 respectively).
This requires 2 bits per corner to represent.
However, the sum of the orientations (modulo 3) is always 0; so, if you know 7-of-8 orientations then (assuming you have a solvable cube) you can calculate the orientation of the 8th corner (giving a total of 14 bits).
Or for a further reduction, seven ternary (base 3) digits can represent the orientation of the corners and this can be represented in 12 binary digits (bits).
So the corner cubes can be represented in 28 bits (16 + 12), if you want to decode the permutations, or in 33 bits (21 + 12), if you want to directly record the positions of 7-of-8 corners.
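The two compact corner encodings can be sketched as follows. The function names are invented for this example; the permutation is ranked in lexicographic order, which is one of several possible numberings.

```c
#include <assert.h>

/* Rank a permutation of 0..7 in lexicographic order: 0 .. 8!-1 (40319). */
unsigned rank_corners(const unsigned char perm[8])
{
    unsigned rank = 0;
    for (int i = 0; i < 8; i++) {
        unsigned smaller = 0;   /* later entries smaller than perm[i] */
        for (int j = i + 1; j < 8; j++)
            if (perm[j] < perm[i])
                smaller++;
        rank = rank * (unsigned)(8 - i) + smaller;
    }
    return rank;
}

/* Encode seven corner orientations (each 0, 1 or 2) as a base-3 number,
   0 .. 3^7-1 (2186); the eighth follows from the sum being 0 modulo 3. */
unsigned encode_spins(const unsigned char spin[8])
{
    unsigned code = 0, sum = 0;
    for (int i = 0; i < 7; i++) {
        code = code * 3 + spin[i];
        sum += spin[i];
    }
    assert((sum + spin[7]) % 3 == 0);   /* holds for any solvable cube */
    return code;
}
```

The rank fits in 16 bits and the orientation code in 12 bits, giving the 28 bits mentioned above.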
12 edge cubes:
Each can be represented in 4 bits (for a total of 48 bits), which can be reduced to 44 bits by only recording the position of 11-of-12 edges.
However, the 12! = 479001600 permutations of the edges can be stored in 29 bits.
Each edge can either be oriented correctly or flipped:
This requires 1 bit to represent.
However, edges are always flipped in pairs so the parity of the flipped edges will always be zero (again, meaning that you only need to record 11-of-12 orientations for the edges) giving a total of 11 bits required.
So edge cubes can be represented in 40 bits, if you want to decode the permutations, or in 55 bits if you want to record all the positions and flips of 11-of-12 edges.
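The 11-bit flip encoding, and the recovery of the twelfth flip from the parity rule, can be sketched like this (function names are made up for this example):

```c
#include <assert.h>

/* Pack the flips (0 or 1) of the first 11 edges into the low 11 bits. */
unsigned pack_flips(const unsigned char flip[12])
{
    unsigned packed = 0, parity = 0;
    for (int i = 0; i < 11; i++) {
        packed |= (unsigned)(flip[i] & 1u) << i;
        parity ^= flip[i] & 1u;
    }
    assert(parity == (flip[11] & 1u));   /* holds for any solvable cube */
    return packed;
}

/* Recover the twelfth flip: the number of flipped edges is always even. */
unsigned twelfth_flip(unsigned packed)
{
    unsigned parity = 0;
    for (int i = 0; i < 11; i++)
        parity ^= (packed >> i) & 1u;
    return parity;
}
```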
6 centre cubes:
You do not need to record any information about the centre cubes. They are fixed relative to the ball at the centre of the Rubik's cube, so (assuming you are not worried about the orientation of any logos on the cube) they are immobile.
Total:
Using permutations: 68 bits
Using positions: 88 bits
Just to establish the theoretical minimum representation - the state space of a valid Rubik's cube is about 4.3*10^19. Log2(4.3*10^19) will then determine how many bits you need to represent that full space, the ceiling of which is 66. So in theory, if you could number every valid state, any given state could be uniquely represented in 66 bits.
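For reference, the count behind that figure can be written out. The factors are the corner permutations, the corner orientations, the edge permutations (halved because the corner and edge permutation parities must match) and the edge orientations:

```latex
\[
N = 8! \cdot 3^{7} \cdot \frac{12!}{2} \cdot 2^{11}
  = 43\,252\,003\,274\,489\,856\,000 \approx 4.3 \times 10^{19},
\qquad
\lceil \log_2 N \rceil = \lceil 65.2\ldots \rceil = 66 .
\]
```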
While you may want to follow others' advice and find a more compact way of representing the cube, consider representing the state in terms of edge, corner, and face pieces. Due to the swapping laws of legal cube moves, you should be able to concatenate a sequence of 12 4-bit edge locations, 8 3-bit corner locations, and 6 3-bit face locations. This results in a representation using 90 bits. Note that locations alone do not record edge flips or corner twists, so orientation bits would have to be added for the representation to be truly unique.
This representation may not be conducive to the way you are creating your tree, but it is unique, easily comparable, and should be possible to find given a state in your existing representation.