Tempo and time signatures from MIDI

I'm currently building software for displaying musical notes from a MIDI file. I can get the letter name of each tone from the NoteOn and NoteOff events, but I don't know how to get or calculate the note types (whole, half, eighth...) and the other timing information such as the time signature.
How can I get this? I looked for some examples but without success.

MIDI doesn't represent note lengths as absolute quantities the way classical notation does. Instead, a note lasts until a corresponding note-off event is parsed (it's also quite common for MIDI files to use a note-on event with velocity 0 as a note-off, so keep this in mind). So basically you will need to translate the time in ticks between the two events into musical time to know whether to use a whole, half, quarter note, etc.
This translation obviously depends on knowing the tempo and time signature, which are MIDI meta events. More information about parsing those can be found here:
http://www.sonicspot.com/guide/midifiles.html
Basically you combine the PPQ (ticks per quarter note) with the tempo (microseconds per quarter note) to find the number of milliseconds per tick, and from there the length of a quarter note, half note, and so on in milliseconds; the time signature tells you how those notes fit into a measure. There are some answers on StackOverflow with this conversion, but I'm writing this post on my phone and can't be bothered to look them up right now. :-)
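For example, here is a minimal sketch of that conversion in Python (the PPQ comes from the file header and the tempo from the set_tempo meta event; the thresholds below are illustrative, not taken from any spec):

# Classify a note's duration from the tick distance between its
# note-on and note-off events.
def classify_note(delta_ticks, ppq):
    quarters = delta_ticks / ppq            # length measured in quarter notes
    if quarters >= 4:
        return "whole"
    if quarters >= 2:
        return "half"
    if quarters >= 1:
        return "quarter"
    if quarters >= 0.5:
        return "eighth"
    return "sixteenth or shorter"

def ms_per_tick(tempo_us_per_quarter, ppq):
    return tempo_us_per_quarter / 1000.0 / ppq

# e.g. with PPQ = 480, a note lasting 960 ticks is a half note:
# classify_note(960, 480)  ->  "half"

Real performances rarely land exactly on these boundaries, so in practice you would quantize delta_ticks to the nearest subdivision before classifying.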
Hope this points you in the right direction!

Related

How to find how long notes are in MIDI

I have one file that is 4/4 time, has 24 clocks per click, and 8 32nd notes per beat. I have no other information that could plausibly relate to the tempo. By experimentation, I think each tick (or whatever measurement delta time uses) is around one millisecond.
I have another file that is also 4/4 time, has 24 clocks per click, and 8 32nd notes per beat. It also has a tempo of 500000, which from what I can find is the default. By experimentation, each tick is about one 380th of a second.
Googling around was not helpful. I keep finding things that talk about stuff like pulses per quarter note, which would be great if that was one of the numbers in the MIDI file. And they convert it to beats per minute, which isn't what I need, though I do at least know what that means.
Is there an equation I can use to find how long a tick is using only numbers that are actually given in the MIDI file?
I'm using Mido to read the MIDI files, if that matters. Neither file has messages that fail to parse that would plausibly contain any missing information on tempo.
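For what it's worth, the conversion mentioned in the answer to the previous question looks roughly like this with Mido. This is only a sketch: the filename is a placeholder, and 500000 is just the default tempo (any set_tempo meta message overrides it).

import mido

mid = mido.MidiFile("example.mid")    # placeholder filename
ppq = mid.ticks_per_beat              # pulses per quarter note, read from the file header
tempo = 500000                        # microseconds per quarter note (default when no set_tempo is present)

seconds_per_tick = tempo / (ppq * 1_000_000)

# Mido also ships a helper that does the same arithmetic:
# mido.tick2second(ticks, mid.ticks_per_beat, tempo)

Note that ppq here is the header's division value, not the "24 clocks per click" or "8 32nd notes per beat" fields, which belong to the time signature meta event.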

AudioKit AKMetronome callback timing seems imprecise or quantized

I'm new to AudioKit and digital audio in general, so I'm sure there must be something I'm missing.
I'm trying to get precise timing from AKMetronome by getting the timestamp of each callback. The timing seems to be quantized in some way though, and I don't know what it is.
Example: if my metronome is set to 120 BPM, each callback should be exactly 0.5 seconds apart. But if I calculate the difference from one tick to the next, I get this:
0.49145491666786256
0.49166241666534916
0.5104563333334227
0.4917322500004957
0.5104953749978449
0.49178879166720435
0.5103940000008151
0.4916401666669117
It's always one of 2 values, within a very small margin of error. I want to be able to calculate when the next tick is coming so I can trigger animation a few frames ahead, but this makes it difficult. Am I overlooking something?
Edit: I came up with a solution after originally posting this question, but I'm not sure it's the only or best one.
I set the buffer to the smallest size using AKSettings.BufferLength.veryShort.
With the smallest buffer, the timestamp is always within a millisecond or two of where it should be. I'm still not sure, though, whether I'm doing this right, or whether this is the intended behavior of the AKCallback. It seems like the callback should be on time even with a longer buffer.
Are you using Timer to calculate the time difference? From my point of view and based on my findings, the issue is related to Timer, which is not meant to be precise on iOS; see the thread Accuracy of NSTimer.
Alternatively, you can look into AVAudioTime (https://audiokit.io/docs/Extensions/AVAudioTime.html).

Processing accelerometer data

I would like to know if there are any libraries/algorithms/techniques that help to extract the user's context (walking/standing) from accelerometer data (extracted from any smartphone)?
For example, I would collect accelerometer data every 5 seconds for a definite period of time and then identify the user context (ex. for the first 5 minutes, the user was walking, then the user was standing for a minute, and then he continued walking for another 3 minutes).
Thank you very much in advance :)
Check out the new activity recognition APIs:
http://developer.android.com/google/play-services/location.html
It's still a research topic; please look at this paper, which discusses the algorithm:
http://www.enggjournals.com/ijcse/doc/IJCSE12-04-05-266.pdf
I don't know of any such library.
It is a very time-consuming task to write such a library. Basically, you would build a database of the "user contexts" that you wish to recognize.
Then you collect data and compare it to those in the database. As for how to compare, see Store orientation to an array - and compare; the same holds for accelerometer data.
Walking/running data is analogous to heart-rate data in a lot of ways. In terms of filtering out the noise and getting smooth peaks, look into noise-filtering and peak-detection algorithms. The following is used to obtain heart-rate information for heart patients and should be a good starting point: http://www.docstoc.com/docs/22491202/Pan-Tompkins-algorithm-algorithm-to-detect-QRS-complex-in-ECG
Think about how you want to filter out the noise and detect peaks; the filters will obviously depend on the raw data you gather, but it's good to have a general idea of what kind of filtering you'd want to do on your data. Then think about what needs to be done once you have filtered data. In your case, think about how you would design an algorithm to find out when the data indicates activity (walking, running, etc.) and when it shows the user being stationary. This is a fairly challenging problem to solve once you consider the dynamics of the device itself (how it's positioned while the user is walking/running), and the fact that there are very few (if any) benchmarked algorithms that do this with raw smartphone data.
Start with determining the appropriate algorithms, and then tackle the complexities (mentioned above) one by one.
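If it helps, here is a minimal sketch of that kind of pipeline in Python/NumPy; the sampling rate, cutoff, window length, and threshold are all assumptions you would have to tune on real data:

import numpy as np
from scipy.signal import butter, filtfilt

def classify_activity(accel_xyz, fs=50.0, window_s=5.0, threshold=0.5):
    """Label fixed-size windows of raw accelerometer data as 'walking' or 'standing'.

    accel_xyz: array of shape (n_samples, 3), in m/s^2
    fs:        sampling rate in Hz (assumed)
    threshold: std-dev of the filtered magnitude above which a window counts as 'walking'
    """
    magnitude = np.linalg.norm(accel_xyz, axis=1)        # orientation-independent signal
    b, a = butter(2, 3.0 / (fs / 2), btype="lowpass")    # keep the low band where gait energy lives
    smooth = filtfilt(b, a, magnitude)

    labels = []
    win = int(window_s * fs)
    for start in range(0, len(smooth) - win + 1, win):
        segment = smooth[start:start + win]
        labels.append("walking" if segment.std() > threshold else "standing")
    return labels

Merging consecutive windows with the same label then gives intervals like "walking for 5 minutes, standing for 1 minute".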

How can I Compare 2 Audio Files Programmatically?

I want to compare 2 audio files programmatically.
For example: I have a sound file in my iPhone app, and then I record another one. I want to check whether the existing sound matches the recorded sound or not (similar to voice recognition).
How can I accomplish this?
Have a server do the audio fingerprinting computation; it isn't suitable for a mobile device anyway. Your mobile app then uploads your files to the server and gets the analysis result back for display. So I don't think the programming language used to implement it matters much. Following are a few audio fingerprinting implementations:
Java: http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
VC++: http://code.google.com/p/musicip-libofa/
C#: https://web.archive.org/web/20190128062416/https://www.codeproject.com/Articles/206507/Duplicates-detector-via-audio-fingerprinting
I know the question has been asked a long time ago, but a clear answer could help someone else.
The libraries from Echoprint (website: echoprint.me/start) will help you solve the following problems:
De-duplicate a big collection
Identify (Track, Artist ...) a song on a hard drive or on a server
Run an Echoprint server with your data
Identify a song on an iOS device
PS: For more music-oriented features, you can check the list of APIs here.
If you want to implement Fingerprinting by yourself, you should read the docs listed as references here, and probably have a look at musicip-libofa on Google Code
Hope this will help ;)
Apply a bandpass filter to reduce noise
Normalize for amplitude
Calculate the cross-correlation
It can be fairly CPU-intensive.
The DSP details are in the well-known text Digital Signal Processing by Alan V. Oppenheim and Ronald W. Schafer.
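A rough sketch of those three steps with NumPy/SciPy, assuming both clips are already loaded as mono float arrays at the same sample rate (the band edges and the match threshold are guesses you would tune):

import numpy as np
from scipy.signal import butter, filtfilt, correlate

def similarity(a, b, fs=44100.0, low=300.0, high=3000.0):
    """Bandpass, normalize and cross-correlate two mono clips.

    Returns the peak normalized correlation (0..1) and the lag in samples at which it occurs.
    """
    bc, ac = butter(4, [low / (fs / 2), high / (fs / 2)], btype="bandpass")
    a = filtfilt(bc, ac, a)
    b = filtfilt(bc, ac, b)

    # Normalize for amplitude so loudness differences don't dominate the score
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)

    xcorr = correlate(a, b, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(b) - 1)
    return float(np.max(np.abs(xcorr))), lag

# A "match" is then a peak above some experimentally chosen threshold, e.g. 0.6.

As the next answer points out, the threshold depends heavily on the quality of the recordings, so pick it by experimenting with known matching and non-matching pairs.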
I think you may also try selecting a few-second sample from both audio tracks, normalising them in amplitude, reducing noise with a bandpass filter, and then using a correlator.
For instance, you could take a 5-second sample of one of the two and slide it over the second one, computing a cross-correlation for every shift. (Be careful: if you take too small a packet you may get high correlation where it isn't expected, and you will suffer side effects due to the cropping of the signal in the cross-correlation.)
Then you can collect an array with all the results of the cross-correlation and get the index of the maximum.
You should then experimentally set a threshold to decide when you assume the packets to be the same. This will change depending on the quality of the audio tracks you are comparing.
I implemented a correlator to receive and distinguish preambles in wireless communication. My script is actually written in MATLAB. If you are interested, I can try to find the common part and send it to you.
It would be too long to paste here in the forum. If you want it, just let me know and I will send it to you ASAP.
Cheers

How to export sound from timeline of sounds on iOS with OpenAL

I'm not sure if it's possible to achieve what I want, but basically I have an NSDictionary which represents a recording. It's a timeline of which sound id was played at which point in time.
I have it so that you can play back this timeline/recording, and it works perfectly.
I'm wondering if there is any way to take this timeline and export it as a single sound that could be saved to a computer when the device is synced with iTunes.
So basically I'm asking if I can take a timeline of sounds, play it back and have these sounds stitched together as a single sound, that can then be exported.
I'm using OpenAL as my sound framework and the sound files are all CAFs.
Any help or guidance is appreciated.
Thanks!
You will need:
A good understanding of linear PCM audio format (See Wikipedia's Linear PCM page).
A good understanding of audio sample-rates and some basic maths to convert your timings into sample-offsets.
An awareness of how two's-complement binary numbers (signed/unsigned, 16-bit, 32-bit, etc.) are stored in computers, and how the endian-ness of a processor affects this.
Patience, interest in learning, and a strong desire to get this working.
Here's what to do:
Enable file sharing in your app (UIFileSharingEnabled=YES in Info.plist and write files to the /Documents directory).
Render the sounds you use into memory buffers containing linear PCM audio data (if they are not already, i.e. if they are compressed). You can do this using the offline rendering functionality of Audio Queues (see Apple's audio queue docs). It will make things a lot easier if you render them all to the same PCM format and sample rate (for example 16-bit signed samples at 44,100 Hz; I'll use this format for all examples), and use the same format for your output. I recommend starting off with a mono format, then adding stereo once you get it working.
Choose an uncompressed output format and mix your sounds into a single stream:
3.1. Allocate a buffer large enough, or open a file stream to write to.
3.2. Write out any headers (for example if using WAV format output instead of raw PCM) and write zeros (or the mid-point of your sample range if not using a signed sample format) for any initial silence before your first sound starts. For example, if you want 0.1 seconds of silence before your first sound, write 4410 (0.1 * 44100) zero samples, i.e. 4410 16-bit shorts, all zero.
3.3. Now keep track of all 'currently playing' sounds and mix them together. Start with an empty list of 'currently playing' sounds, and keep track of the 'current time' of the sample you are mixing; for each sample you write out, increment the 'current time' by 1.0/sample_rate. When it's time for another sound to start, add it to the 'currently playing' list with a sample offset of 0. To do the mixing, iterate through all of the 'currently playing' sounds, add together their current samples, increment the sample offset for each of them, and write the summed value into the output buffer. For example, if soundA starts at 0.1 seconds (after the silence) and soundB starts at 0.2 seconds, you will be doing the equivalent of output[8820] = soundA[4410] + soundB[0]; for sample 8820, and then output[8821] = soundA[4411] + soundB[1]; for sample 8821, etc. As a sound ends (you get to the end of its samples), simply remove it from the 'currently playing' list, and keep going until the end of your audio data. (A sketch of this mixing loop is given after these steps.)
3.4. The simple mixing (sum of samples) described above does have some problems. For example, if two samples have values that add up to a number larger than 32767, this cannot be stored in a signed 16-bit number; this is called clipping. For now, just clamp the value to 32767 and get it working... later on, come back and implement a simple limiter (see the description at the end).
Now that you have a mixed version of your track in an uncompressed linear PCM format, that might be enough, so write it to /Documents. If you want to write it in a compressed format, you will need to get the source for an audio encoder and run your linear PCM output through that.
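For illustration only, here is a compact sketch of steps 3.2-3.4 in Python/NumPy (the answer above is about doing this natively on iOS; the names, sample rate, and hard clamp are assumptions):

import numpy as np

def mix_timeline(sounds, sample_rate=44100):
    """Mix (start_time_in_seconds, int16 sample array) pairs into one 16-bit mono stream."""
    total = max(int(start * sample_rate) + len(s) for start, s in sounds)
    mix = np.zeros(total, dtype=np.int32)        # wider type so the sums don't wrap

    for start, samples in sounds:
        offset = int(start * sample_rate)        # leading zeros are the initial silence
        mix[offset:offset + len(samples)] += samples

    # Crude clamp (the "get it working" version); replace with the limiter described below.
    return np.clip(mix, -32768, 32767).astype(np.int16)

# e.g. soundA at 0.1 s and soundB at 0.2 s:
# output = mix_timeline([(0.1, soundA), (0.2, soundB)])
# output.tofile("mix.raw")   # raw 16-bit PCM in the machine's native byte order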
Simple limiter:
Let's choose to limit the top 10% of the sample range, so if the absolute value is greater than 29490 (int limitBegin = (int)(32767 * 0.9f);) we will scale down the value. The maximum possible peak would be int maxSampleValue = 32767 * numPlayingSounds; and we want to scale values above limitBegin to peak at 32767. So do the summation into sampleValue as per the very simple mixer described above, then:
if (sampleValue > limitBegin)
{
    float overLimit = (sampleValue - limitBegin) / (float)(maxSampleValue - limitBegin);
    sampleValue = limitBegin + (int)(overLimit * (32767 - limitBegin));
}
If you're paying attention, you will have noticed that when numPlayingSounds changes (for example when a new sound starts), the limiter becomes more (or less) harsh and this may result in abrupt volume changes (within the limited range) to accommodate the extra sound. You can use the maximum number of playing sounds instead, or devise some clever way to ramp up the limiter over a few milliseconds.
Remember that this is operating on the absolute value of sampleValue (which may be negative in signed formats), so the code here is just to demonstrate the idea. You'll need to write it properly to handle limiting at both ends (peak and trough) of your sample range. Also, there are some tricks you can do to optimize all of the above during the mixing - you will probably spot these while you're writing the mixer, be careful and get it working first, then go back and refactor/optimize if needed.
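For completeness, a sketch of that limiter applied symmetrically (written in Python only for brevity; the arithmetic maps straight back to the C above):

def limit(sample_value, max_sample_value, limit_begin=int(32767 * 0.9)):
    """Scale |sample_value| above limit_begin so the output peaks at 32767."""
    magnitude = abs(sample_value)
    if magnitude > limit_begin:
        over = (magnitude - limit_begin) / float(max_sample_value - limit_begin)
        magnitude = limit_begin + int(over * (32767 - limit_begin))
    return magnitude if sample_value >= 0 else -magnitude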
Also remember to consider the endian-ness of the platform you are using and the file-format you are writing to, as you may need to do some byte-swapping.
One approach which isn't too hard, if your files are stored in a simple format, is just to combine them manually. That is, create a new file in the CAF format and manually put together the pieces you want.
This will be really easy if the sounds are uncompressed (linear PCM). But first read the documentation on the CAF file format here:
http://developer.apple.com/library/mac/#documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html#//apple_ref/doc/uid/TP40001862-CH210-SW1