I am retrieving values from SFVoiceAnalytics "Pitch." My goal is to transform the data to the raw fundamental frequency. According to the documentation, the values are returned as a natural logarithm (log_e).
When I apply exp() to the values returned I get the following ranges:
Male voice: [0.25, 1.85], expected: [85, 180]
Female voice: [0.2,1.6], expected: [165, 255]
For the sake of simplicity, I am using Apple's sample code "Recognizing Speech in Live Audio."
Thanks for the help!!
Documentation: https://developer.apple.com/documentation/speech/sfvoiceanalytics/3229976-pitch
if let result = result {
    // returned pitch values
    for segment in result.bestTranscription.segments {
        if let pitchSegment = segment.voiceAnalytics?.pitch.acousticFeatureValuePerFrame {
            for p in pitchSegment {
                let pitch = exp(p)
                print(pitch)
            }
        }
    }
    // Update the text view with the results.
    self.textView.text = result.bestTranscription.formattedString
    isFinal = result.isFinal
}
I ran into a similar problem lately and ultimately used another solution to retrieve pitch data.
I went with a pitch detection library for Swift called Beethoven. It detects pitches in real-time, whereas the voice analytics of SFSpeechRecognizer only returns them once the transcription is complete.
Beethoven hasn't been updated to work with Swift 5, but I didn't find it too difficult to get it to work.
Also, upon digging up why the values in voiceAnalytics were as they were, I found out via the documentation that the pitch is a normalized pitch estimate:
The value is a logarithm (base e) of the normalized pitch estimate for each frame.
My interpretation is that the values were likely normalized (divided) by the fundamental frequency, so I'm not sure it's possible to use this data to recover the absolute frequencies. It seems best used to convey interval changes from pitch to pitch.
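If that interpretation holds, the unknown normalization constant cancels when two frames are compared, so relative intervals can still be computed. Here is a minimal sketch under that assumption; the function name and the sample values are made up for illustration and are not part of the Speech framework:

```swift
import Foundation

/// Interval in semitones between two frames of log-normalized pitch.
/// Assumes both values share the same (unknown) normalization constant,
/// so the constant cancels in the ratio exp(b) / exp(a).
func semitones(from a: Double, to b: Double) -> Double {
    // 12 * log2(exp(b) / exp(a)) simplifies to 12 * (b - a) / ln(2)
    return 12.0 * (b - a) / log(2.0)
}

// Hypothetical values standing in for acousticFeatureValuePerFrame.
let frames: [Double] = [-0.20, -0.10, 0.05, 0.30]
for (previous, current) in zip(frames, frames.dropFirst()) {
    print(semitones(from: previous, to: current))
}
```

A difference of ln(2) between two frames would then correspond to one octave (12 semitones), regardless of the absolute frequency.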
I have a non-geographic map, i.e. a flat image, using CRS.Simple extended by a custom transformation. Everything works fine so far, but now I want to add a distance measurement button. I'm confident I could implement a distance measurement between two markers myself, but the dynamic line drawing and measuring is still a bit beyond my skills, so I hoped I could use a plugin. None of the ones I found offered this, though. After looking at the Leaflet plugins page, I tried this fork https://github.com/aprilandjan/leaflet.measure of leaflet.measure, originally by https://github.com/jtreml/leaflet.measure, as it seemed to offer the ability to add custom units (in my case, pixels).
I added this:
L.control.measure({
    // distance formatter: output map units instead of km
    formatDistance: function (val) {
        return Math.round(1000 * val / scaleFactor) / 1000 + 'mapUnits';
    }
}).addTo(map)
Unfortunately, the result is a number far too big compared to the pixel size of the map (4096x4096). distance() returns the expected 1414.213562373095 between a point at 1000,1000 and one at 2000,2000. Calculating distanceTo returns 8009572.105082839 instead, though. I use this at the beginning of my file:
var yx = L.latLng;
var xy = function(x, y) {
    if (L.Util.isArray(x)) { // When doing xy([x, y]);
        return yx(x[1], x[0]);
    }
    return yx(y, x); // When doing xy(x, y);
};
If I log val to the console, I get things like this:
20411385.176805027
7118674.47741132
20409736.502863288
7117025.8034695815
20409186.004645467
20409736.502863288
That's likely some problem of the function trying to calculate latlng without a proper reference system.
Anyone got an idea how to solve this? I feel like it can't be overly difficult, but I don't know exactly where to start.
I found a way to do it, even though it feels a bit 'hacky':
I replaced the line
var distance = e.latlng.distanceTo(this._lastPoint)
in the _mouseMove and the _mouseClick events of leaflet.measure with
var currentPoint = e.latlng;
var lastPoint = this._lastPoint;
var distance = map.distance(currentPoint, lastPoint);
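This works because the map's distance() method uses the map's own CRS: it returns meters on a geographic map and, with CRS.Simple on a flat image, plain pixel values, whereas latlng.distanceTo() always assumes an Earth-based CRS. Those pixel values can then be converted to whatever unit we want in our flat image.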
If anyone has a more elegant way, I'm all ears!
I have written a sample project in Swift to try out the relatively new Core Audio V3 API stuff. Everything seems to work around creating a custom Audio Unit and loading it in process. But the actual audio rendering isn't going so well. I've often read that the rendering code needs to be in C or C++ but I've also heard Swift is fast and thought I could write some minimal audio rendering code in it.
the rendering code
override var internalRenderBlock: AUInternalRenderBlock {
    get {
        return {
            (_ actionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>,
             _ timeStamp: UnsafePointer<AudioTimeStamp>,
             _ frameCount: AUAudioFrameCount,
             _ outputBusNumber: Int,
             _ bufferList: UnsafeMutablePointer<AudioBufferList>,
             _ renderEvent: UnsafePointer<AURenderEvent>?,
             _ pull: AudioToolbox.AURenderPullInputBlock?) -> AUAudioUnitStatus in
            let bufferList = bufferList.pointee
            let theBuffers = bufferList.mBuffers // only one (AudioBuffer) ??
            guard let theBufferData = theBuffers.mData?.assumingMemoryBound(to: Float.self) else {
                return 1 // come up with better error?
            }
            let amountFrames = Int(frameCount)
            for frame in 0...amountFrames / 2 {
                let frame = theBufferData.advanced(by: frame)
                frame.pointee = sin(self.phase)
                self.phase += 0.0001
            }
            return noErr
        }
    }
}
Sounds Bad
The resulting sound is not what I'd expect. My initial thought is that Swift is the wrong choice. Yet, interestingly, AudioToolbox does provide a typealias for this AUAudioUnit's rendering property, which looks like:
public typealias AUInternalRenderBlock = (UnsafeMutablePointer<AudioUnitRenderActionFlags>, UnsafePointer<AudioTimeStamp>, AUAudioFrameCount, Int, UnsafeMutablePointer<AudioBufferList>, UnsafePointer<AURenderEvent>?, AudioToolbox.AURenderPullInputBlock?) -> AUAudioUnitStatus
This would lead me to believe that it is perhaps possible to write rendering code in Swift.
observed problems
But still, there are a few things going wrong here. (aside from my obvious lack of competency with Swift memory management stuff).
A) Despite bufferList saying that its mNumberBuffers is 2, theBuffers winds up not being an array but rather of type (AudioBuffer). I don't understand the need for the parentheses. I can't find a second AudioBuffer.
B) More importantly, when I write a basic sine wave to the one AudioBuffer I can access, the resulting sound is distorted and inconsistent. Could this be Swift's fault? Is it just impossible to write any audio unit rendering code in Swift? Or have I made some assumptions here that are breaking my rendering somehow?
Finally
If it is simply the case that writing this part in Swift is infeasible, then I would like to have some resources on interoperating Swift and C for Audio Unit rendering blocks. So, could the property returning the closure be written in Swift, but the closure's implementation calls down into C? or does the property have to simply return a C function whose prototype matches the closure's type?
Thanks in advance.
The rest of this project can be seen here for context.
The main reason you heard a distorted sound is that the phase increment of 0.0001 is too small: it takes about 62832 samples to complete one period of the sine wave, which is merely 0.70 Hz (assuming a sample rate of 44100 Hz).
On top of that ultra-low-frequency sine wave, you were hearing a tone of about 44100 / 512 = 86.1 Hz, because you were filling only half of the audio buffer (amountFrames / 2). So the sound was a near-rectangular wave whose period equals your audio rendering period, with an amplitude that varies slowly at about 0.70 Hz.
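The arithmetic behind those numbers can be checked with a small sketch. The helper names are mine, and the 44100 Hz sample rate and 512-frame render size are the same assumptions as in the text:

```swift
import Foundation

/// Frequency in Hz produced by adding `increment` radians to the phase once per sample.
func frequency(phaseIncrement increment: Double, sampleRate: Double) -> Double {
    return increment * sampleRate / (2.0 * Double.pi)
}

/// Phase increment in radians per sample needed for a given frequency.
func phaseIncrement(frequency: Double, sampleRate: Double) -> Double {
    return 2.0 * Double.pi * frequency / sampleRate
}

let sampleRate = 44_100.0   // assumed, as in the text
print(frequency(phaseIncrement: 0.0001, sampleRate: sampleRate))  // roughly 0.70 Hz
print(phaseIncrement(frequency: 440, sampleRate: sampleRate))     // roughly 0.0627 rad per sample
print(sampleRate / 512)                                            // roughly 86.1 Hz for 512-frame render calls
```

So an audible 440 Hz tone needs an increment several hundred times larger than 0.0001.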
I could write a working sine wave generator unit based on your code:
override var internalRenderBlock: AUInternalRenderBlock {
    return { ( _, _, frameCount, _, bufferList, _, _) in
        let srate = Float(self.bus.format.sampleRate)
        var phase = self.phase
        for buffer in UnsafeMutableAudioBufferListPointer(bufferList) {
            phase = self.phase
            assert(buffer.mNumberChannels == 1, "interleaved channel not supported")
            let frames = buffer.mData!.assumingMemoryBound(to: Float.self)
            for i in 0 ..< Int(frameCount) {
                frames[i] = sin(phase)
                phase += 2 * .pi * 440 / srate // 440 Hz
                if phase > 2 * .pi {
                    phase -= 2 * .pi // to avoid floating point inaccuracy
                }
            }
        }
        self.phase = phase
        return noErr
    }
}
Regarding observed problem A, AudioBufferList is a variable-length C struct, where the first field, mNumberBuffers, gives the number of buffers (i.e. the number of non-interleaved channels) and the second field is a variable-length array:
typedef struct AudioBufferList {
    UInt32 mNumberBuffers;
    AudioBuffer mBuffers[1];
} AudioBufferList;
The user of this struct, in Objective-C or C++, is expected to allocate room for the header plus mNumberBuffers * sizeof(AudioBuffer) bytes, which is enough to store multiple mBuffers entries. Since C does not perform bounds checks on arrays, users can simply write mBuffers[1] or mBuffers[2] to access the second or third buffer.
Because Swift doesn't have this variable length array feature, Apple provides UnsafeMutableAudioBufferListPointer, which can be used like a Swift collection of AudioBuffers; I used this in the outer for loop above.
Finally, I tried not to access self in the innermost loop, because accessing a Swift or Objective-C object might involve unexpected delays, which is why Apple recommends writing the rendering loop in C/C++. But for simple cases like this, I would say writing it in Swift is a lot easier and the latency is still manageable.
I'm trying to fill an AVAudioPCMBuffer programmatically in Swift to build a metronome. This is the first real app I'm trying to build, so it's also my first audio app. Right now I'm experimenting with different frameworks and methods of getting the metronome looping accurately.
I'm trying to build an AVAudioPCMBuffer with the length of a measure/bar so that I can use the .Loops option of the AVAudioPlayerNode's scheduleBuffer method. I start by loading my file (2 ch, 44100 Hz, Float32, non-interleaved; *.wav and *.m4a both have the same issue) into a buffer, then copying that buffer frame by frame, separated by empty frames, into the barBuffer. The loop below is how I'm accomplishing this.
If I schedule the original buffer to play, it plays back in stereo, but when I schedule the barBuffer, I only get the left channel. As I said, I'm a beginner at programming and have no experience with audio programming, so this might be my lack of knowledge about 32-bit float channels, or about the data type UnsafePointer<UnsafeMutablePointer<Float>>. When I look at the floatChannelData property in Swift, the description makes it sound like this should be copying two channels.
var j = 0
for i in 0..<Int(capacity) {
    barBuffer.floatChannelData.memory[j] = buffer.floatChannelData.memory[i]
    j += 1
}
j += Int(silenceLengthInSamples)
// loop runs 4 times for 4 beats per bar.
edit: I removed the glaring mistake i += 1, thanks to hotpaw2. The right channel is still missing when barBuffer is played back though.
Unsafe pointers in swift are pretty weird to get used to.
floatChannelData.memory[j] only accesses the first channel of data. To access the other channel(s), you have a couple choices:
Using advancedBy
// Where current channel is at 0
// Get a channel pointer aka UnsafePointer<UnsafeMutablePointer<Float>>
let channelN = floatChannelData.advancedBy( channelNumber )
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = channelN.memory
// Get first two floats of channel channelNumber
let floatOne = channelNData.memory
let floatTwo = channelNData.advancedBy(1).memory
Using Subscript
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = floatChannelData[ channelNumber ]
// Get first two floats of channel channelNumber
let floatOne = channelNData[0]
let floatTwo = channelNData[1]
Using subscript is much clearer, and the step of advancing and then manually accessing memory is implicit.
For your loop, try accessing all channels of the buffer by doing something like this:
for i in 0..<Int(capacity) {
    for n in 0..<Int(buffer.format.channelCount) {
        barBuffer.floatChannelData[n][j] = buffer.floatChannelData[n][i]
    }
    j += 1 // advance the write position once per frame, after all channels are copied
}
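To see the whole bar-building idea without AVFoundation in the way, here is a rough sketch that uses plain arrays in place of floatChannelData (one [Float] per channel). The function and parameter names are hypothetical; the per-channel copy is the same as in the loop above:

```swift
import Foundation

/// Builds one bar of non-interleaved audio: the click followed by silence, repeated per beat.
/// `click` holds one [Float] per channel; all channels are assumed to have equal length.
func makeBar(click: [[Float]], silenceLength: Int, beatsPerBar: Int) -> [[Float]] {
    let clickLength = click.first?.count ?? 0
    let beatLength = clickLength + silenceLength
    // Start from silence so only the click samples need to be written.
    var bar = click.map { _ in [Float](repeating: 0, count: beatLength * beatsPerBar) }
    for beat in 0..<beatsPerBar {
        var j = beat * beatLength          // write position inside the bar
        for i in 0..<clickLength {
            for n in 0..<click.count {     // copy the same frame for every channel
                bar[n][j] = click[n][i]
            }
            j += 1
        }
    }
    return bar
}

let stereoClick: [[Float]] = [[1, 2], [3, 4]]   // made-up 2-frame stereo click
let bar = makeBar(click: stereoClick, silenceLength: 1, beatsPerBar: 2)
print(bar)
```

With a real AVAudioPCMBuffer you would write into barBuffer.floatChannelData[n][j] instead of bar[n][j].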
Hope this helps!
This looks like a misunderstanding of Swift "for" loops. The Swift "for" loop automatically increments the "i" array index. But you are incrementing it again in the loop body, which means that you end up skipping every other sample (the Right channel) in your initial buffer.
This is an extension of my previous question: https://dsp.stackexchange.com/questions/28095/choosing-low-pass-filter-parameters
I am recording people from an overhead camera. I have tracks of each person's head from some software. I want to remove the periodicity in the tracks caused by head wobbling.
I apply a low-pass Butterworth filter. I want the starting point and ending point of the filtered tracks to be the same as those of the unfiltered tracks.
Data:
K>> [xcor_i,ycor_i ]
ans =
-101.7000 -77.4040
-102.4200 -77.4040
-103.6600 -77.4040
-103.9300 -76.6720
-103.9900 -76.5130
-104.0000 -76.4780
-105.0800 -76.4710
-106.0400 -77.5660
-106.2500 -77.8050
-106.2900 -77.8570
-106.3000 -77.8680
-106.3000 -77.8710
-107.7500 -78.9680
-108.0600 -79.2070
-108.1200 -79.2590
-109.9500 -80.3680
-111.4200 -80.6090
-112.8200 -81.7590
-113.8500 -82.3750
-115.1500 -83.2410
-116.1500 -83.4290
-116.3700 -83.8360
-117.5000 -84.2910
-117.7400 -84.3890
-118.8800 -84.7770
-119.8400 -85.2270
-121.1400 -85.3250
-123.2200 -84.9800
-125.4700 -85.2710
-127.0400 -85.7000
-128.8200 -85.7930
-130.6500 -85.8130
-132.4900 -85.8180
-134.3300 -86.5500
-136.1700 -87.0760
-137.6500 -86.0920
-138.6900 -86.9760
-140.3600 -87.9000
-142.1600 -88.4660
-144.7200 -89.3210
Code (answer by @SleuthEye):
dataOut_x = xcor_i(1)+filter(b,a,xcor_i-xcor_i(1));
dataOut_y = ycor_i(1)+filter(b,a,ycor_i-ycor_i(1));
Output:
In the above example, the endpoint (on the left) differs between the filtered and unfiltered tracks. How can I ensure it is the same?
Your question is pretty ambiguous, and doesn't really have a specific question. I'm assuming you want to have your filtered data start at the same points as the measured data, but are unsure why this is not happening already, and how to do so.
A low pass filter is a filter which lowers the effect of rapid changes. One way of doing this, and the method which appears to be used here, is by using a rolling average. A rolling average is simply an average (mean) of the previous data points. It looks like you are using a rolling average of 5 data points. Therefore you need five points of raw data before your filter will give you a single data point.
-101.7000 -77.4040 }
-102.4200 -77.4040 } }
-103.6600 -77.4040 } }
-103.9300 -76.6720 } }
-103.9900 -76.5130 } Filter point 1. }
-104.0000 -76.4780 } Filter point 2.
-105.0800 -76.4710
-106.0400 -77.5660
-106.2500 -77.8050
-106.2900 -77.8570
-106.3000 -77.8680
-106.3000 -77.8710
In order to solve this problem, you could just append the first data point to the data set four times, as this means that the filter will produce the same number of points. This is a pretty rough solution, however, as you are creating new data. This could be achieved quite simply, for example if your dataset is called myArray:
firstEntry = myArray(1,:);
myNewArray = [firstEntry; firstEntry; firstEntry; firstEntry; myArray];
This will create four data points equal to your first data point, which should then allow you to apply the low pass filter to your data, and have it start at the same point.
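As a rough illustration of the padding idea, here is a sketch in Swift of a trailing 5-point moving average with the first sample repeated in front. Note that this is the rolling average described above, not the Butterworth filter from the question, and the function name is made up:

```swift
import Foundation

/// Trailing moving average over `window` points. The first sample is repeated
/// `window - 1` times in front, so the output has the same length as the input
/// and its first value equals the first input value.
func paddedMovingAverage(_ data: [Double], window: Int) -> [Double] {
    guard let first = data.first, window > 0 else { return [] }
    let padded = [Double](repeating: first, count: window - 1) + data
    return (0..<data.count).map { (start: Int) -> Double in
        let slice = padded[start..<(start + window)]
        return slice.reduce(0, +) / Double(window)
    }
}

// First few x coordinates from the question's data.
let x = [-101.70, -102.42, -103.66, -103.93, -103.99, -104.00]
let smoothed = paddedMovingAverage(x, window: 5)
print(smoothed.count == x.count)   // same number of points as the input
print(smoothed[0])                  // starts at the first measured value
```

The padded samples are invented data, so the first few outputs are biased toward the first measurement.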
Hope this helps, although it's worth bearing in mind that filtering ALWAYS results in a loss of data.
Because you don't want to implement it but want someone else to:
The theory as above is correct, but instead you need to add 2 values at the end of your vectors:
x_last = xcor_i(end);
y_last = ycor_i(end);
xcor_i = [xcor_i;x_last;x_last];
ycor_i = [ycor_i;y_last;y_last];
This gives the following:
As you can see the ends are pretty close to being the same now.
I'm looking into developing an iPhone app that will potentially involve a "simple" analysis of the audio it receives from the standard phone mic. Specifically, I am interested in the highs and lows the mic picks up, and really everything in between is irrelevant to me. Is there an app that does this already (just so I can see what it's capable of)? And where should I look to get started on such code? Thanks for your help.
Look in the Audio Queue framework. This is what I use to get a high water mark:
AudioQueueRef audioQueue; // Imagine this is correctly set up
UInt32 dataSize = sizeof(AudioQueueLevelMeterState) * recordFormat.mChannelsPerFrame;
AudioQueueLevelMeterState *levels = (AudioQueueLevelMeterState*)malloc(dataSize);
float channelAvg = 0;
OSStatus rc = AudioQueueGetProperty(audioQueue, kAudioQueueProperty_CurrentLevelMeter, levels, &dataSize);
if (rc) {
    NSLog(@"AudioQueueGetProperty(CurrentLevelMeter) returned %d", (int)rc);
} else {
    for (int i = 0; i < recordFormat.mChannelsPerFrame; i++) {
        channelAvg += levels[i].mPeakPower;
    }
}
free(levels);
// This works because one channel always has an mAveragePower of 0.
return channelAvg;
You can get peak power either in dB Full Scale (with kAudioQueueProperty_CurrentLevelMeterDB) or simply as a float in the interval [0.0, 1.0] (with kAudioQueueProperty_CurrentLevelMeter).
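If you need to move between the two scales yourself, the usual relationship is 20 dB per factor of ten in amplitude. A small sketch in Swift, assuming that amplitude convention applies to these meter values (the helper names are my own, not Audio Queue API):

```swift
import Foundation

/// Converts a level in dB relative to full scale into a linear amplitude (0 dB maps to 1.0).
func linearAmplitude(fromDecibels dB: Double) -> Double {
    return pow(10.0, dB / 20.0)
}

/// Converts a linear amplitude into dB; zero or negative input maps to -infinity.
func decibels(fromLinearAmplitude amplitude: Double) -> Double {
    return amplitude > 0 ? 20.0 * log10(amplitude) : -Double.infinity
}

print(linearAmplitude(fromDecibels: -20))        // about 0.1
print(decibels(fromLinearAmplitude: 0.5))        // about -6 dB
```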
Don't forget to activate level metering for AudioQueue first:
UInt32 d = 1;
OSStatus status = AudioQueueSetProperty(mQueue, kAudioQueueProperty_EnableLevelMetering, &d, sizeof(UInt32));
Check the 'SpeakHere' sample code. It will show you how to record audio using the AudioQueue API. It also contains some code to analyze the audio in real time to show a level meter.
You might actually be able to use most of that level meter code to respond to 'highs' and 'lows'.
The AurioTouch example code performs Fourier analysis on the mic input. Could be a good starting point:
https://developer.apple.com/iPhone/library/samplecode/aurioTouch/index.html
Probably overkill for your application.