I have problems reading out an MTLTexture that has a pixel format of .rgba16Float; the main reason is that Swift does not seem to have a corresponding SIMD4 element type.
For .rgba32Float I can simply use SIMD4<Float>, like so:
if let texture = texture {
let region = MTLRegionMake2D(x, y, 1, 1)
let texArray = Array<SIMD4<Float>>(repeating: SIMD4<Float>(repeating: 0), count: 1)
texture.getBytes(UnsafeMutableRawPointer(mutating: texArray), bytesPerRow: (MemoryLayout<SIMD4<Float>>.size * texture.width), from: region, mipmapLevel: 0)
let value = texArray[0]
}
This works fine since the Swift Float type is 32-bit. How can I do the same for a 16-bit .rgba16Float texture?
You can use vImage to convert the buffer from 16-bit to 32-bit float first. Check out vImageConvert_Planar16FtoPlanarF. But note that the documentation on the site is wrong (it's from another function...). I found this utility that demonstrates the process.
It would be more efficient, however, if you could use Metal to convert the texture into 32-bit float (or directly render into a 32-bit texture in the first place).
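A minimal sketch of that vImage route, assuming a hypothetical helper readRGBA16Float: the four half-float components of a single texel are treated as a 1-row, 4-pixel planar buffer so vImageConvert_Planar16FtoPlanarF can widen them to Float. (On newer toolchains that have Float16, reading into SIMD4<Float16> and converting per component is another option.)

import Accelerate
import Metal

// Hypothetical helper: read one .rgba16Float texel and widen it to SIMD4<Float>.
func readRGBA16Float(from texture: MTLTexture, x: Int, y: Int) -> SIMD4<Float> {
    let region = MTLRegionMake2D(x, y, 1, 1)

    // 4 half-float components = 8 bytes for the single texel.
    var halfData = [UInt16](repeating: 0, count: 4)
    halfData.withUnsafeMutableBytes { ptr in
        texture.getBytes(ptr.baseAddress!, bytesPerRow: 8, from: region, mipmapLevel: 0)
    }

    // Treat the 4 interleaved components as a 1-row, 4-pixel planar buffer and convert.
    var result = SIMD4<Float>(repeating: 0)
    halfData.withUnsafeMutableBytes { srcPtr in
        withUnsafeMutableBytes(of: &result) { dstPtr in
            var src = vImage_Buffer(data: srcPtr.baseAddress, height: 1, width: 4, rowBytes: 8)
            var dst = vImage_Buffer(data: dstPtr.baseAddress, height: 1, width: 4, rowBytes: 16)
            let error = vImageConvert_Planar16FtoPlanarF(&src, &dst, vImage_Flags(kvImageNoFlags))
            assert(error == kvImageNoError)
        }
    }
    return result
}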
I am creating an app to create and print patterns at real size. It works great on screen, since I can get the screen width and PPI from GeometryReader and CGDisplayScreenSize. However, you can't cut a pattern out of the screen. I need to do it on normal printers, and I need precision down to 0.004" or 0.1 mm.
My only real options for formatting are PDF and EPS. My first read on PDF indicated that 72 dpi is a reliable resolution for PDF. Deeper reading and experimentation have shown me that is not the case; experimentation has shown me that 75 dpi produces better results. It's no coincidence that 300, 600 and 1200 are multiples of 75.
I can't find a way in PDFKit to set a resolution. Can anybody provide guidance or alternative approaches? The way I create the PDF right now is:
let output = URL(fileURLWithPath: "PaperPattern.pdf", relativeTo: FileManager.default.temporaryDirectory)
let page = PDFPage()
if let gc = CGContext(output as CFURL, mediaBox: nil, nil)
{
gc.beginPDFPage(nil)
drawPattern(path: gc, hp: hp) //hp contains a series of points based on a fixed dpi resolution
page.draw(with: .mediaBox, to: gc)
gc.endPDFPage()
gc.closePDF()
}
let doc: PDFDocument = PDFDocument(url: output)!
doc.insert(page, at: 0)
doc.removePage(at: 0)
return doc
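Regarding the 72 dpi figure above: PDF user space is defined in points, where 1 point = 1/72 inch, so real-world sizes can be expressed directly in the mediaBox and in the drawing coordinates, and the print pipeline rasterizes that geometry at whatever resolution the device supports. A minimal sketch along the lines of the code above (makeDocument, widthInches and heightInches are illustrative names; the drawPattern/hp drawing from the question is replaced by a 1-inch line):

import PDFKit

func makeDocument(widthInches: CGFloat, heightInches: CGFloat) -> PDFDocument? {
    let output = URL(fileURLWithPath: "PaperPattern.pdf",
                     relativeTo: FileManager.default.temporaryDirectory)
    // 1 inch = 72 pt in PDF user space, so the page gets its exact physical size.
    var mediaBox = CGRect(x: 0, y: 0, width: widthInches * 72, height: heightInches * 72)
    guard let gc = CGContext(output as CFURL, mediaBox: &mediaBox, nil) else { return nil }
    gc.beginPDFPage(nil)
    // Draw in point coordinates; e.g. a horizontal line exactly 1 inch (25.4 mm) long:
    gc.setLineWidth(0.5)
    gc.move(to: CGPoint(x: 72, y: 72))
    gc.addLine(to: CGPoint(x: 144, y: 72))
    gc.strokePath()
    gc.endPDFPage()
    gc.closePDF()
    return PDFDocument(url: output)
}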
I have a 128 by 128 pixel image.
It's broken down into an 8 by 8 grid.
Each grid block contains 16 by 16 pixels.
Requirement
I want to count how many black pixels my image contains.
The straightforward way:
I could do this by going row by row, column by column, over the whole image and checking if the pixel was black or not.
The GPU way
...but I'd like to know if, using the GPU, I could break the image down into chunks/blocks, count all the pixels in each block, and then sum the results.
For example:
If you look at the top left of the image:
First block, 'A1' (Row A, Column 1), contains a grid of 16 by 16 pixels; I know by counting them manually that there are 16 black pixels.
Second block, 'A2' (Row A, Column 2), contains a grid of 16 by 16 pixels; I know by counting them manually that there are 62 black pixels.
All other blocks for this example are blank/empty.
If I ran my image through my program, I should get the answer: 16 + 62 = 78 black pixels.
Reasoning
It's my understanding that the GPU can operate on a lot of data in parallel, effectively running a small program on a chunk of data spread across multiple GPU threads.
I'm not worried about speed/performance, I'd just like to know if this is something the GPU can/could do?
Indeed, general-purpose GPUs (such as those in Apple devices from the A8 on) are not only capable of solving such data-parallel processing problems but are intended for them.
Apple introduced data-parallel processing with Metal across its platforms, and with some simple code you can solve problems like yours on the GPU. Even though this can also be done with other frameworks, I am including some code for the Metal + Swift case as a proof of concept.
The following runs as a Swift command-line tool on macOS Sierra, and was built using Xcode 9 (yup, I know it's beta). You can get the full project from my GitHub repo.
As main.swift:
import Foundation
import Metal
import CoreGraphics
import AppKit
guard FileManager.default.fileExists(atPath: "./testImage.png") else {
print("./testImage.png does not exist")
exit(1)
}
let url = URL(fileURLWithPath: "./testImage.png")
let imageData = try Data(contentsOf: url)
guard let image = NSImage(data: imageData),
let imageRef = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
print("Failed to load image data")
exit(1)
}
let bytesPerPixel = 4
let bytesPerRow = bytesPerPixel * imageRef.width
var rawData = [UInt8](repeating: 0, count: Int(bytesPerRow * imageRef.height))
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedFirst.rawValue).union(.byteOrder32Big)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGContext(data: &rawData,
width: imageRef.width,
height: imageRef.height,
bitsPerComponent: 8,
bytesPerRow: bytesPerRow,
space: colorSpace,
bitmapInfo: bitmapInfo.rawValue)
let fullRect = CGRect(x: 0, y: 0, width: CGFloat(imageRef.width), height: CGFloat(imageRef.height))
context?.draw(imageRef, in: fullRect, byTiling: false)
// Get access to the system default GPU (this runs as a macOS command-line tool)
guard let device = MTLCreateSystemDefaultDevice() else {
exit(1)
}
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: .rgba8Unorm,
width: Int(imageRef.width),
height: Int(imageRef.height),
mipmapped: true)
guard let texture = device.makeTexture(descriptor: textureDescriptor) else {
print("Failed to create texture")
exit(1)
}
let region = MTLRegionMake2D(0, 0, Int(imageRef.width), Int(imageRef.height))
texture.replace(region: region, mipmapLevel: 0, withBytes: &rawData, bytesPerRow: Int(bytesPerRow))
// Queue to handle an ordered list of command buffers
let commandQueue = device.makeCommandQueue()!
// Buffer for storing encoded commands that are sent to the GPU
let commandBuffer = commandQueue.makeCommandBuffer()!
// Access to the Metal functions that are stored in the Shaders.metal file, e.g. countBlack()
guard let defaultLibrary = device.makeDefaultLibrary() else {
print("Failed to create default metal shader library")
exit(1)
}
// Encoder for GPU commands
let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder()!
// hardcoded to 16 for now (recommendation: read about threadExecutionWidth)
let threadsPerGroup = MTLSize(width: 16, height: 16, depth: 1)
let numThreadgroups = MTLSizeMake(texture.width / threadsPerGroup.width,
texture.height / threadsPerGroup.height,
1)
// b. set up a compute pipeline with the countBlack() function and add it to the encoder
let countBlackProgram = defaultLibrary.makeFunction(name: "countBlack")
let computePipelineState = try device.makeComputePipelineState(function: countBlackProgram!)
computeCommandEncoder.setComputePipelineState(computePipelineState)
// set the input texture for the countBlack() function, e.g. inArray
// index: 0 here corresponds to texture(0) in the countBlack() function
computeCommandEncoder.setTexture(texture, index: 0)
// create the output counter for the countBlack() function
// index: 0 here corresponds to buffer(0) in countBlack()
let counterBuffer = device.makeBuffer(length: MemoryLayout<UInt32>.size,
options: .storageModeShared)!
computeCommandEncoder.setBuffer(counterBuffer, offset: 0, index: 0)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)
computeCommandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
// a. Get GPU data
// counterBuffer.contents() returns an UnsafeMutableRawPointer, roughly equivalent to void* in C
let data = NSData(bytesNoCopy: counterBuffer.contents(),
length: MemoryLayout<UInt32>.size,
freeWhenDone: false)
// b. prepare a Swift array large enough to receive the data from the GPU
var finalResultArray = [UInt32](repeating: 0, count: 1)
// c. copy the data from the GPU into the Swift array
data.getBytes(&finalResultArray, length: MemoryLayout<UInt32>.size)
print("Found \(finalResultArray[0]) non-white pixels")
// d. YOU'RE ALL SET!
Also, in Shaders.metal:
#include <metal_stdlib>
using namespace metal;
kernel void
countBlack(texture2d<float, access::read> inArray [[texture(0)]],
volatile device uint *counter [[buffer(0)]],
uint2 gid [[thread_position_in_grid]]) {
// Atomic as we need to sync between threadgroups
device atomic_uint *atomicBuffer = (device atomic_uint *)counter;
float3 inColor = inArray.read(gid).rgb;
if(inColor.r != 1.0 || inColor.g != 1.0 || inColor.b != 1.0) {
atomic_fetch_add_explicit(atomicBuffer, 1, memory_order_relaxed);
}
}
I used the question to learn a bit about Metal and data-parallel computing, so most of the code was adapted from the articles linked below and then edited. Please take the time to visit those sources for more examples. Also, the code is pretty much hardcoded for this particular problem, but you shouldn't have much trouble adapting it.
Sources:
http://flexmonkey.blogspot.com.ar/2016/05/histogram-equalisation-with-metal.html
http://metalbyexample.com/introduction-to-compute/
http://memkite.com/blog/2014/12/15/data-parallel-programming-with-metal-and-swift-for-iphoneipad-gpu/
Your Question: I'd just like to know if this is something the GPU can/could do?
The Answer: Yes, the GPU can handle your computation. All the numbers look very GPU-friendly:
warp size: 32 (16x2)
Maximum number of threads per block: 1024 (8x128) (8x8x16)
Maximum number of threads per multiprocessor: 2048, etc.
You could try many block/thread configurations to get an optimum performance.
The Procedure: Generally, using the GPU means that you copy data from CPU memory to GPU memory, then perform calculations on the GPU, and finally copy the result back to the CPU for further processing. An important thing to consider is that all this data transfer goes through the PCI-e link between the CPU and GPU, which is very slow compared to either memory.
My Opinion: In this case, in the time it takes to copy the image to GPU memory you would already have the result, even if you used a single CPU thread. This is because your process is not math/computationally intensive: you are just reading the data, comparing it to a black color, and incrementing a counter to get a total (which itself raises a race condition that you'd have to resolve).
My Advice: If after analyzing (profiling) your whole program you think that this routine of getting the black pixel count is a real bottleneck, try:
a divide-and-conquer recursive algorithm, or
parallelizing your calculations across multiple CPU cores (see the sketch below).
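A minimal sketch of that second option, assuming pixels is a row-major RGBA byte buffer (countBlackPixels, pixels, width and height are illustrative names):

import Foundation

// Count black pixels by splitting the work per row across the available CPU cores.
func countBlackPixels(pixels: [UInt8], width: Int, height: Int) -> Int {
    var rowCounts = [Int](repeating: 0, count: height)
    rowCounts.withUnsafeMutableBufferPointer { counts in
        // Each iteration owns exactly one row, so no two iterations write the same slot.
        DispatchQueue.concurrentPerform(iterations: height) { row in
            var count = 0
            for x in 0..<width {
                let i = (row * width + x) * 4
                if pixels[i] == 0 && pixels[i + 1] == 0 && pixels[i + 2] == 0 {
                    count += 1
                }
            }
            counts[row] = count
        }
    }
    return rowCounts.reduce(0, +)
}

For a 128 by 128 image even this is overkill, but it avoids the CPU-to-GPU copy entirely.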
There is a lot a GPU can do here.
I am not sure if you are looking for an algorithm here, but I can point you to a widely used GPU library which implements an efficient counting procedure. Take a look at the count function within the Thrust library: https://thrust.github.io/doc/group__counting.html
It takes a predicate function as input and counts the number of elements of the input data which satisfy the predicate.
The following counts the number of elements in data which are equal to zero.
template <typename T>
struct zero_pixel{
__host__ __device__ bool operator()(const T &x) const {return x == 0;}
};
thrust::count_if(data.begin(), data.end(), zero_pixel<T>())
A working example here: https://github.com/thrust/thrust/blob/master/testing/count.cu
You should write a predicate which tests whether a pixel is black or not (depending on how a pixel is represented for you; if it is an RGB triplet, the predicate needs to be a bit more elaborate).
I would also flatten the pixels into a linear, iterable data structure first (but that depends on what your data actually is).
If you are interested in the histogram approach, you can sort the pixels of the image (using any efficient GPU algorithm or, why not, the Thrust implementation of sort, thrust::sort(...)) in order to group equal elements together, and then perform a reduction by key with thrust::reduce_by_key.
Take a look at this example: https://github.com/thrust/thrust/blob/master/examples/histogram.cu
Note that the histogram approach is somewhat more costly because it solves a bigger problem (it counts the number of occurrences of every unique element).
I am trying to do some computation on the raw PCM samples of an MP3 file I'm playing with an AVAudioEngine graph. I have a closure that is called every 44100 samples and provides an AVAudioPCMBuffer. It has a property floatChannelData of type UnsafePointer<UnsafeMutablePointer<Float>>?. I have not worked with pointers in Swift 3, so I'm unclear how to access these Float values.
I have the following code but there are many issues:
audioPlayerNode.installTap(onBus: 0,
bufferSize: 1024,
format: audioPlayerNode.outputFormat(forBus: 0)) { (pcmBuffer, time) in
let numChans = Int(pcmBuffer.format.channelCount)
let frameLength = pcmBuffer.frameLength
if let chans = pcmBuffer.floatChannelData?.pointee {
for a in 0..<numChans {
let samples = chans[a]// samples is type Float. should be pointer to Floats.
for b in 0..<Int(frameLength) {
print("sample: \(b)") // should be samples[b], but that gives an error because "samples" is a Float
}
}
}
}
For instance, how do I iterate through the UnsafeMutablePointer<Float>s, which are N float pointers where N is the number of channels in the buffer? I could not find any discussion of accessing buffer samples in the Apple docs for this class.
I think the main problem is let samples = chans[a]. Xcode says chans is of type UnsafeMutablePointer<Float>, but that should be numChans worth of those pointers, which is why I use a in 0..<numChans to subscript it. Yet I get just a Float when I do.
EDIT:
Hm, it seems that using chans.advanced(by: a) instead of subscripting fixed things.
Here is what I've found:
let arraySize = Int(buffer.frameLength)
let samples = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: arraySize))
This is assuming buffer is the name of your AVAudioPCMBuffer.
This way you can avoid pointers, which is likely much simpler. Now you can actually search through the data using a for loop.
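As a minimal sketch of the multi-channel case (again assuming buffer is your AVAudioPCMBuffer and that its common format is float): floatChannelData points at one Float pointer per channel, so subscripting it, rather than taking its pointee, gives you each channel's sample pointer.

import AVFoundation

// Iterate every channel of the buffer and print its first few samples.
if let channelData = buffer.floatChannelData {
    let frameLength = Int(buffer.frameLength)
    let channelCount = Int(buffer.format.channelCount)
    for channel in 0..<channelCount {
        // channelData[channel] is the UnsafeMutablePointer<Float> for this channel.
        let samples = UnsafeBufferPointer(start: channelData[channel], count: frameLength)
        for (index, sample) in samples.prefix(5).enumerated() {
            print("channel \(channel), sample \(index): \(sample)")
        }
    }
}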
I'm aware of AVFoundation and its capture support (not too familiar though). However, I don't see any readily-accessible API to get pixel-by-pixel data (RGB-per-pixel or similar). I do recall reading in the docs that this is possible, but I don't really see how. So:
Can this be done? If so, how?
Would I be getting raw image data, or data that's been JPEG-compressed?
AV Foundation can give you back the raw bytes for an image captured by either the video or still camera. You need to set up an AVCaptureSession with an appropriate AVCaptureDevice and a corresponding AVCaptureDeviceInput and AVCaptureOutput (AVCaptureVideoDataOutput or AVCaptureStillImageOutput). Apple has some examples of this process in their documentation, and it requires some boilerplate code to configure.
Once you have your capture session configured and you are capturing data from the camera, you will set up a -captureOutput:didOutputSampleBuffer:fromConnection: delegate method, where one of the parameters will be a CMSampleBufferRef. That will have a CVImageBufferRef within it that you access via CMSampleBufferGetImageBuffer(). Using CVPixelBufferGetBaseAddress() on that pixel buffer will return the base address of the byte array for the raw pixel data representing your camera frame. This can be in a few different formats, but the most common are BGRA and planar YUV.
I have an example application that uses this here, but I'd recommend that you also take a look at my open source framework which wraps the standard AV Foundation boilerplate and makes it easy to perform image processing on the GPU. Depending on what you want to do with these raw camera bytes, I may already have something you can use there or a means of doing it much faster than with on-CPU processing.
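A minimal sketch of the delegate side in Swift (where the selector above becomes captureOutput(_:didOutput:from:)), assuming an AVCaptureVideoDataOutput configured for kCVPixelFormatType_32BGRA; the session setup is omitted and the class name FrameGrabber is just illustrative:

import AVFoundation

final class FrameGrabber: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Lock the buffer before touching its base address, and unlock when done.
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

        guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return }
        let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)

        // BGRA layout: 4 bytes per pixel, each row padded out to bytesPerRow.
        let pixels = base.assumingMemoryBound(to: UInt8.self)
        let offset = (height / 2) * bytesPerRow + (width / 2) * 4
        let b = pixels[offset], g = pixels[offset + 1], r = pixels[offset + 2]
        print("center pixel RGB: \(r), \(g), \(b)")
    }
}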
I have an Objective-C class (although I don't believe this is anything Obj-C specific) that I am using to write a video out to disk from a series of CGImages. (The code I am using at the top to get the pixel data comes right from Apple: http://developer.apple.com/mac/library/qa/qa2007/qa1509.html). I successfully create the codec and context - everything is going fine until it gets to avcodec_encode_video, when I get EXC_BAD_ACCESS. I think this should be a simple fix, but I just can't figure out where I am going wrong.
I took out some error checking for succinctness. 'c' is an AVCodecContext*, which is created successfully.
-(void)addFrame:(CGImageRef)img
{
CFDataRef bitmapData = CGDataProviderCopyData(CGImageGetDataProvider(img));
long dataLength = CFDataGetLength(bitmapData);
uint8_t* picture_buff = (uint8_t*)malloc(dataLength);
CFDataGetBytes(bitmapData, CFRangeMake(0, dataLength), picture_buff);
AVFrame *picture = avcodec_alloc_frame();
avpicture_fill((AVPicture*)picture, picture_buff, c->pix_fmt, c->width, c->height);
int outbuf_size = avpicture_get_size(c->pix_fmt, c->width, c->height);
uint8_t *outbuf = (uint8_t*)av_malloc(outbuf_size);
out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture); // ERROR occurs here
printf("encoding frame %3d (size=%5d)\n", i, out_size);
fwrite(outbuf, 1, out_size, f);
CFRelease(bitmapData);
free(picture_buff);
free(outbuf);
av_free(picture);
i++;
}
I have stepped through it dozens of times. Here are some numbers...
dataLength = 408960
picture_buff = 0x5c85000
picture->data[0] = 0x5c85000 -- which I take to mean that avpicture_fill worked...
outbuf_size = 408960
and then I get EXC_BAD_ACCESS at avcodec_encode_video. Not sure if it's relevant, but most of this code comes from api-example.c. I am using XCode, compiling for armv6/armv7 on Snow Leopard.
Thanks so much in advance for help!
I don't have enough information here to point to the exact error, but I think the problem is that the input picture contains less data than avcodec_encode_video() expects:
avpicture_fill() only sets some pointers and numeric values in the AVFrame structure. It does not copy anything, and it does not check whether the buffer is large enough (it cannot, since the buffer size is not passed to it). It does something like this (copied from the ffmpeg source):
size = picture->linesize[0] * height;
picture->data[0] = ptr;
picture->data[1] = picture->data[0] + size;
picture->data[2] = picture->data[1] + size2;
picture->data[3] = picture->data[1] + size2 + size2;
Note that the width and height are passed from the variable "c" (the AVCodecContext, I assume), so they may be larger than the actual size of the input frame.
It is also possible that the width/height is right but the pixel format of the input frame differs from what is passed to avpicture_fill() (note that the pixel format also comes from the AVCodecContext, which may differ from the input). For example, if c->pix_fmt is RGBA and the input buffer is in YUV420 format (or, more likely for iPhone, a biplanar YCbCr), then the size of the input buffer is width*height*1.5 bytes, but avpicture_fill() expects width*height*4 bytes.
So checking the input/output geometry and pixel formats should lead you to the cause of the error. If that does not help, I suggest you try compiling for i386 first; it is tricky to compile FFmpeg properly for the iPhone.
Does the codec you are encoding support the RGB color space? You may need to use libswscale to convert to I420 before encoding. What codec are you using? Can you post the code where you initialize your codec context?
The function RGBtoYUV420P may help you.
http://www.mail-archive.com/libav-user@mplayerhq.hu/msg03956.html