I am retrieving values from the SFVoiceAnalytics "pitch" property. My goal is to transform the data back to the raw fundamental frequency. According to the documentation, the values are returned as natural logarithms (log_e).
When I apply exp() to the values returned I get the following ranges:
Male voice: [0.25, 1.85], expected: [85, 180]
Female voice: [0.2, 1.6], expected: [165, 255]
For the sake of simplicity I am using Apple's sample code "Recognizing Speech in Live Audio."
Thanks for the help!!
Documentation: https://developer.apple.com/documentation/speech/sfvoiceanalytics/3229976-pitch
if let result = result {
    // returned pitch values
    for segment in result.bestTranscription.segments {
        if let pitchSegment = segment.voiceAnalytics?.pitch.acousticFeatureValuePerFrame {
            for p in pitchSegment {
                let pitch = exp(p)
                print(pitch)
            }
        }
    }
    // Update the text view with the results.
    self.textView.text = result.bestTranscription.formattedString
    isFinal = result.isFinal
}
I ran into a similar problem lately and ultimately used another solution to retrieve pitch data.
I went with a pitch detection library for Swift called Beethoven. It detects pitches in real-time, whereas the voice analytics of SFSpeechRecognizer only returns them once the transcription is complete.
Beethoven hasn't been updated to work with Swift 5, but I didn't find it too difficult to get it to work.
Also, while digging into why the voiceAnalytics values looked the way they did, I found in the documentation that the pitch is a normalized pitch estimate:
The value is a logarithm (base e) of the normalized pitch estimate for each frame.
My interpretation is that the values were likely normalized by (divided by) some baseline fundamental frequency before the logarithm was taken, which would explain why exp() yields values clustered around 1 rather than absolute frequencies in Hz. I'm not sure it's possible to recover the absolute frequencies from this data; it seems best used to convey interval changes from pitch to pitch.
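If interval changes are enough for your purposes, the unknown baseline conveniently cancels out: each value is ln(f / f_ref), so the difference between two frames is ln(f2 / f1) regardless of f_ref, and that converts directly to semitones. A minimal sketch, assuming the log_e-of-normalized-pitch interpretation above is correct:

import Foundation

// Converts per-frame log_e normalized pitch values into semitone intervals
// between consecutive frames. The normalization baseline cancels out:
// 12 * log2(f2 / f1) = 12 * (ln(f2 / ref) - ln(f1 / ref)) / ln(2)
func semitoneIntervals(fromLogPitch logPitch: [Float]) -> [Float] {
    guard logPitch.count > 1 else { return [] }
    return zip(logPitch.dropFirst(), logPitch).map { next, current in
        12 * (next - current) / Float(M_LN2)
    }
}

So a value difference of about 0.0578 between two frames corresponds to one semitone, whatever the absolute pitch was.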
I have problems reading out an MTLTexture that has a pixel format of .rgba16Float; the main reason is that Swift does not seem to have a corresponding SIMD4 element type.
For .rgba32Float I can simply use the SIMD4<Float> type, like so:
if let texture = texture {
    let region = MTLRegionMake2D(x, y, 1, 1)
    var texArray = [SIMD4<Float>](repeating: SIMD4<Float>(repeating: 0), count: 1)
    texArray.withUnsafeMutableBytes { ptr in
        texture.getBytes(ptr.baseAddress!,
                         bytesPerRow: MemoryLayout<SIMD4<Float>>.stride * texture.width,
                         from: region,
                         mipmapLevel: 0)
    }
    let value = texArray[0]
}
This works fine because the Swift Float type is 32-bit. How can I do the same for a 16-bit .rgba16Float texture?
You can use vImage to convert the buffer from 16-bit float to 32-bit float first. Check out vImageConvert_Planar16FtoPlanarF. Note, though, that the documentation on that page is wrong (it's from another function...). I found this utility that demonstrates the process.
It would be more efficient, however, if you could use Metal to convert the texture into 32-bit float (or directly render into a 32-bit texture in the first place).
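For reference, here is a rough sketch of the vImage route (not production code; readRGBA16Float is a hypothetical helper name). It treats the interleaved RGBA payload as one planar run of width * height * 4 half-float components, which is fine because the conversion is purely element-wise:

import Accelerate
import Metal

// Reads an .rgba16Float texture's bytes and widens every half-float
// component to Float using vImage.
func readRGBA16Float(from texture: MTLTexture) -> [Float] {
    let componentCount = texture.width * texture.height * 4
    let bytesPerRow = texture.width * 4 * MemoryLayout<UInt16>.stride
    var halfData = [UInt16](repeating: 0, count: componentCount)
    let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
    halfData.withUnsafeMutableBytes { raw in
        texture.getBytes(raw.baseAddress!, bytesPerRow: bytesPerRow,
                         from: region, mipmapLevel: 0)
    }

    var floatData = [Float](repeating: 0, count: componentCount)
    halfData.withUnsafeMutableBytes { src in
        floatData.withUnsafeMutableBytes { dst in
            var srcBuffer = vImage_Buffer(data: src.baseAddress,
                                          height: 1,
                                          width: vImagePixelCount(componentCount),
                                          rowBytes: src.count)
            var dstBuffer = vImage_Buffer(data: dst.baseAddress,
                                          height: 1,
                                          width: vImagePixelCount(componentCount),
                                          rowBytes: dst.count)
            let err = vImageConvert_Planar16FtoPlanarF(&srcBuffer, &dstBuffer,
                                                       vImage_Flags(kvImageNoFlags))
            assert(err == kvImageNoError)
        }
    }
    return floatData
}

From there you can regroup each run of four values into a SIMD4<Float> if that layout is more convenient.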
I've been going through the documentation looking for an answer to this. I see that AudioKit can draw waveforms in real time as you record or play back, but I was wondering if you could load a file and draw its waveform in full, so I can see the whole file's waveform without playing it back.
Any help would be greatly appreciated. Even just a pointer to what object I should look into.
You can also use the Objective-C EZAudioPlot class, which is in AudioKitUI:
let file = EZAudioFile(url: url)
guard let data = file?.getWaveformData() else { return }

let waveform = EZAudioPlot()
addSubview(waveform)
waveform.frame = NSMakeRect(0, 0, 200, 20)
waveform.plotType = EZPlotType.buffer
waveform.shouldFill = true
waveform.shouldMirror = true
waveform.color = NSColor.black
waveform.updateBuffer(data.buffers[0], withBufferSize: data.bufferSize)
I haven't benchmarked the two methods, but the plot version is very fast for longer files. For stereo files, make two stacked plots using buffers[0] and buffers[1], as sketched below.
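A minimal sketch of that stereo variant, assuming data was obtained as above and contains at least two channels (the frame origins here are arbitrary):

// One plot per channel, stacked vertically.
let left = EZAudioPlot()
let right = EZAudioPlot()
left.frame = NSMakeRect(0, 20, 200, 20)
right.frame = NSMakeRect(0, 0, 200, 20)
for plot in [left, right] {
    plot.plotType = EZPlotType.buffer
    plot.shouldFill = true
    plot.shouldMirror = true
    plot.color = NSColor.black
    addSubview(plot)
}
left.updateBuffer(data.buffers[0], withBufferSize: data.bufferSize)
right.updateBuffer(data.buffers[1], withBufferSize: data.bufferSize)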
There is a waveform drawn in the Tables playground example here: http://audiokit.io/playgrounds/Basics/Tables/
Basically:
let file = try AKAudioFile(readFileName: "drumloop.wav")
let fileTable = AKTable(file: file)
...UI set up...
addView(AKTableView(fileTable))
I have a 128 by 128 pixel image.
It's broken down into an 8 by 8 grid.
Each grid block contains 16 by 16 pixels.
Requirement
I want to count how many black pixels my image contains.
The straightforward way:
I could do this by going row by row, column by column, over the whole image and checking whether each pixel is black or not.
The GPU way
...but I'd like to know if using the GPU, I could break down the image into chunks/blocks and count all pixels in each block and then sum the results.
For example:
If you look at the top left of the image:
First block, 'A1' (Row A, Column 1), contains a grid of 16 by 16 pixels; I know by counting them manually that there are 16 black pixels.
Second block, 'A2' (Row A, Column 2), contains a grid of 16 by 16 pixels; I know by counting them manually that there are 62 black pixels.
All other blocks for this example are blank/empty.
If I ran my image through my program, I should get the answer: 16 + 62 = 78 Black pixels.
Reasoning
It's my understanding that the GPU can operate on a lot of data in parallel, effectively running a small program on a chunk of data spread across multiple GPU threads.
I'm not worried about speed/performance, I'd just like to know if this is something the GPU can/could do?
Indeed, general-purpose GPUs (such as those in Apple devices from the A8 on, for example) are not only capable of solving such parallel data processing problems but are intended for exactly that.
Apple introduced data-parallel processing using Metal on its platforms, and with some simple code you can solve problems like yours using the GPU. Even though this can also be done using other frameworks, I am including some code for the Metal + Swift case as a proof of concept.
The following runs as a Swift command-line tool on macOS Sierra and was built using Xcode 9 (yup, I know it's beta). You can get the full project from my github repo.
As main.swift:
import Foundation
import Metal
import CoreGraphics
import AppKit
guard FileManager.default.fileExists(atPath: "./testImage.png") else {
    print("./testImage.png does not exist")
    exit(1)
}

let url = URL(fileURLWithPath: "./testImage.png")
let imageData = try Data(contentsOf: url)

guard let image = NSImage(data: imageData),
      let imageRef = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    print("Failed to load image data")
    exit(1)
}
let bytesPerPixel = 4
let bytesPerRow = bytesPerPixel * imageRef.width
var rawData = [UInt8](repeating: 0, count: bytesPerRow * imageRef.height)

let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedFirst.rawValue).union(.byteOrder32Big)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGContext(data: &rawData,
                        width: imageRef.width,
                        height: imageRef.height,
                        bitsPerComponent: 8,
                        bytesPerRow: bytesPerRow,
                        space: colorSpace,
                        bitmapInfo: bitmapInfo.rawValue)

let fullRect = CGRect(x: 0, y: 0, width: CGFloat(imageRef.width), height: CGFloat(imageRef.height))
context?.draw(imageRef, in: fullRect, byTiling: false)
// Get access to the GPU
guard let device = MTLCreateSystemDefaultDevice() else {
    exit(1)
}
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm,
    width: imageRef.width,
    height: imageRef.height,
    mipmapped: true)
guard let texture = device.makeTexture(descriptor: textureDescriptor) else {
    print("Failed to create texture")
    exit(1)
}

let region = MTLRegionMake2D(0, 0, imageRef.width, imageRef.height)
texture.replace(region: region, mipmapLevel: 0, withBytes: &rawData, bytesPerRow: bytesPerRow)
// Queue to handle an ordered list of command buffers,
// plus a buffer for storing encoded commands that are sent to the GPU
guard let commandQueue = device.makeCommandQueue(),
      let commandBuffer = commandQueue.makeCommandBuffer() else {
    print("Failed to create command queue / command buffer")
    exit(1)
}

// Access to the Metal functions that are stored in the Shaders.metal file, e.g. countBlack()
guard let defaultLibrary = device.makeDefaultLibrary() else {
    print("Failed to create default metal shader library")
    exit(1)
}

// Encoder for GPU commands
guard let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder() else {
    print("Failed to create compute command encoder")
    exit(1)
}
// hardcoded to 16 for now (recommendation: read about threadExecutionWidth)
let threadsPerGroup = MTLSize(width: 16, height: 16, depth: 1)
let numThreadgroups = MTLSizeMake(texture.width / threadsPerGroup.width,
                                  texture.height / threadsPerGroup.height,
                                  1)
// Set up a compute pipeline with the countBlack() function and add it to the encoder
guard let countBlackProgram = defaultLibrary.makeFunction(name: "countBlack") else {
    print("Failed to find countBlack() in the shader library")
    exit(1)
}
let computePipelineState = try device.makeComputePipelineState(function: countBlackProgram)
computeCommandEncoder.setComputePipelineState(computePipelineState)
// Set the input texture for the countBlack() function, i.e. inArray;
// index: 0 here corresponds to texture(0) in the countBlack() function
computeCommandEncoder.setTexture(texture, index: 0)

// Create the output counter for the countBlack() function, i.e. counter;
// index: 0 here corresponds to buffer(0) in the countBlack() function
guard let counterBuffer = device.makeBuffer(length: MemoryLayout<UInt32>.size,
                                            options: .storageModeShared) else {
    exit(1)
}
computeCommandEncoder.setBuffer(counterBuffer, offset: 0, index: 0)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)
computeCommandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
// a. Get GPU data
// counterBuffer.contents() returns an UnsafeMutableRawPointer, roughly equivalent to void* in C
let data = NSData(bytesNoCopy: counterBuffer.contents(),
                  length: MemoryLayout<UInt32>.size,
                  freeWhenDone: false)
// b. Prepare a Swift array large enough to receive the data from the GPU
var finalResultArray = [UInt32](repeating: 0, count: 1)
// c. Get the data from the GPU into the Swift array
data.getBytes(&finalResultArray, length: MemoryLayout<UInt32>.size)
print("Found \(finalResultArray[0]) non-white pixels")
// d. YOU'RE ALL SET!
Also, in Shaders.metal:
#include <metal_stdlib>
using namespace metal;

kernel void
countBlack(texture2d<float, access::read> inArray [[texture(0)]],
           volatile device uint *counter [[buffer(0)]],
           uint2 gid [[thread_position_in_grid]]) {
    // Atomic, as we need to sync between threadgroups
    device atomic_uint *atomicBuffer = (device atomic_uint *)counter;
    float3 inColor = inArray.read(gid).rgb;
    if (inColor.r != 1.0 || inColor.g != 1.0 || inColor.b != 1.0) {
        atomic_fetch_add_explicit(atomicBuffer, 1, memory_order_relaxed);
    }
}
I used this question as an opportunity to learn a bit about Metal and data-parallel computing, so most of the code started as boilerplate from online articles and was then edited. Please take the time to visit the sources mentioned below for more examples. Also, the code is pretty much hardcoded for this particular problem, but you shouldn't have much trouble adapting it.
Sources:
http://flexmonkey.blogspot.com.ar/2016/05/histogram-equalisation-with-metal.html
http://metalbyexample.com/introduction-to-compute/
http://memkite.com/blog/2014/12/15/data-parallel-programming-with-metal-and-swift-for-iphoneipad-gpu/
Your Question: I'd just like to know if this is something the GPU can/could do?
The Answer: Yes, the GPU can handle your computation. All the numbers look very GPU-friendly:
warp size: 32 (16x2)
Maximum number of threads per block: 1024 (8x128) (8x8x16)
Maximum number of threads per multiprocessor: 2048 ...etc
You could try many block/thread configurations to get optimum performance.
The Procedure: Generally, using the GPU means that you copy data from CPU memory to GPU memory, then perform the calculations on the GPU, and finally copy the result back to the CPU for further processing. An important thing to consider is that all this data transfer is done through the PCI-e link between the CPU and GPU, which is very slow compared to both memories.
My Opinion: In this case, in the time it takes to copy the image to GPU memory, you would already have the result even using a single CPU thread. This is because your process is not math/computationally intensive: you are just reading the data, comparing it to a black color, and incrementing an accumulator or counter to get a total (which itself raises a race condition that you'd have to resolve).
My Advice: If after analyzing (profiling) your whole program you think that this routine of getting the black pixel count is a real bottleneck, try:
a divide-and-conquer recursive algorithm, or
parallelizing your calculations across multiple CPU cores (see the sketch below).
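For the second option, here is a minimal sketch of what that could look like (an assumption on my part: the image has been flattened into a [UInt8] of grayscale values where 0 means black; each chunk keeps a private counter, which sidesteps the race condition mentioned above):

import Foundation

// Counts black (0-valued) pixels by splitting the array into chunks and
// letting each CPU core count its own chunk into a private slot.
func countBlackPixels(in pixels: [UInt8], chunks: Int = 8) -> Int {
    let chunkSize = (pixels.count + chunks - 1) / chunks
    var partialCounts = [Int](repeating: 0, count: chunks)
    partialCounts.withUnsafeMutableBufferPointer { counts in
        DispatchQueue.concurrentPerform(iterations: chunks) { i in
            let start = min(i * chunkSize, pixels.count)
            let end = min(start + chunkSize, pixels.count)
            var local = 0
            for j in start..<end where pixels[j] == 0 { local += 1 }
            counts[i] = local // each iteration writes only its own slot, so no race
        }
    }
    return partialCounts.reduce(0, +)
}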
There is a lot a GPU can do here.
I am not sure if you are looking for an algorithm here, but I can point you to a widely used GPU library that implements an efficient counting procedure. Take a look at the count function within the Thrust library: https://thrust.github.io/doc/group__counting.html
It works by taking a predicate function as input and counting the number of occurrences of input data that satisfy the predicate.
The following counts the number of elements in data that are equal to zero.
#include <thrust/count.h>
#include <thrust/device_vector.h>

template <typename T>
struct zero_pixel {
    __host__ __device__ bool operator()(const T &x) const { return x == 0; }
};

// assuming data is a thrust::device_vector<T>
int result = thrust::count_if(data.begin(), data.end(), zero_pixel<T>());
A working example here: https://github.com/thrust/thrust/blob/master/testing/count.cu
You should write a predicate that tests whether a pixel is black or not (depending on what a pixel is for you; if it is an RGB triplet, the predicate needs to be a bit more elaborate).
I would also flatten the pixels into a linear, iterable data structure first (but that depends on what your data actually is).
If you are interested in the histogram approach, what you can do is sort the pixels of the image (using any GPU-efficient algorithm or, why not, the Thrust implementation of sort, thrust::sort(...)) in order to group equal elements together, and then perform a reduction by key with thrust::reduce_by_key.
Take a look at this example: https://github.com/thrust/thrust/blob/master/examples/histogram.cu
Note that the histogram approach is a bit more costly because it solves a bigger problem (it counts the number of occurrences of all the unique elements).
I am trying to do some computation on the raw PCM samples of an MP3 file I'm playing with an AVAudioEngine graph. I have a closure that is called every 44100 samples and provides an AVAudioPCMBuffer. It has a property floatChannelData of type UnsafePointer<UnsafeMutablePointer<Float>>?. I have not worked with pointers in Swift 3, so I'm unclear how to access these Float values.
I have the following code but there are many issues:
audioPlayerNode.installTap(onBus: 0,
                           bufferSize: 1024,
                           format: audioPlayerNode.outputFormat(forBus: 0)) { (pcmBuffer, time) in
    let numChans = Int(pcmBuffer.format.channelCount)
    let frameLength = Int(pcmBuffer.frameLength)
    if let chans = pcmBuffer.floatChannelData?.pointee {
        for a in 0..<numChans {
            let samples = chans[a] // samples is of type Float; it should be a pointer to Floats
            for b in 0..<frameLength {
                print("sample: \(b)") // should be samples[b], but that gives an error since samples is a Float
            }
        }
    }
}
For instance, how do I iterate through the UnsafeMutablePointer<Float> values, which are N float pointers where N is the number of channels in the buffer? I could not find any discussion of accessing buffer samples in the Apple docs for this class.
I think the main problem is let samples = chans[a]. Xcode says chans is of type UnsafeMutablePointer<Float>, but that should be numChans worth of those pointers, which is why I use a in 0..<numChans to subscript it. Yet I get just a Float when I do.
EDIT:
Hm, it seems using chans.advanced(by: a) instead of subscripting fixed things.
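For reference, a sketch of the same tap that avoids the issue by not taking .pointee at all, so that subscripting yields one UnsafeMutablePointer<Float> per channel (names match the code above):

audioPlayerNode.installTap(onBus: 0,
                           bufferSize: 1024,
                           format: audioPlayerNode.outputFormat(forBus: 0)) { (pcmBuffer, time) in
    guard let chans = pcmBuffer.floatChannelData else { return }
    let frameLength = Int(pcmBuffer.frameLength)
    for a in 0..<Int(pcmBuffer.format.channelCount) {
        let samples = chans[a] // now an UnsafeMutablePointer<Float> for channel a
        for b in 0..<frameLength {
            print("sample \(b): \(samples[b])")
        }
    }
}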
Here is what I've found:
let arraySize = Int(buffer.frameLength)
let samples = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: arraySize))
This is assuming buffer is the name of your AVAudioPCMBuffer.
This way you can avoid pointers, which is likely much simpler. Now you can actually search through the data using a for loop.
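For example, once samples is a plain [Float], a simple scan works (peak detection here is just an arbitrary illustration):

// Find the largest absolute sample value in the channel.
var peak: Float = 0
for sample in samples {
    peak = max(peak, abs(sample))
}
print("peak amplitude: \(peak)")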