I'm trying to create an NSBitmapImageRep object with 32 bits per sample and an alpha channel (128 bits per pixel in total).
My code looks like this:
let renderSize = NSSize(width: 640, height: 360)
let bitmapRep = NSBitmapImageRep(bitmapDataPlanes: nil, pixelsWide: Int(renderSize.width), pixelsHigh: Int(renderSize.height), bitsPerSample: 32, samplesPerPixel: 4, hasAlpha: true, isPlanar: false, colorSpaceName: NSCalibratedRGBColorSpace, bytesPerRow: 16 * Int(renderSize.width), bitsPerPixel: 128)
println(bitmapRep) //prints "nil"
println(16 * Int(renderSize.width)) //prints 10240
println()
//So what does a 'valid' 32bpc TIFF file with an alpha channel look like?
let imgFile = NSImage(named: "32grad") //http://i.peterwunder.de/32grad.tif
let imgRep = imgFile?.representations[0] as NSBitmapImageRep
println(imgRep.bitsPerSample) //prints 32
println(imgRep.bitsPerPixel) //prints 128
println(imgRep.samplesPerPixel) //prints 4
println(imgRep.bytesPerRow) //prints 10240
println(imgRep.bytesPerRow / Int(imgFile!.size.width)) //prints 16
This appears in the console after executing line 2:
2014-12-31 04:49:16.639 PixelTestBed[4413:2775703] Inconsistent set of values to create NSBitmapImageRep
What's going on here? Why can't I manually create an NSBitmapImageRep with the exact same values that TIFF image has?
By the way, I can't upload the image here because imgur would butcher the image's quality. It's a 3.7 MB 32bpc TIFF with an alpha channel, after all.
You are trying to assign 32 bitsPerSample which is out of the range specified in the docs:
bps
The number of bits used to specify one pixel in a single component of
the data. All components are assumed to have the same bits per sample.
bps should be one of these values: 1, 2, 4, 8, 12, or 16.
NSBitmapImageRep Class Reference
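As a rough illustration of the rule quoted above, here is a plain-Swift sketch that checks a parameter set against the documented bits-per-sample values before it is handed to NSBitmapImageRep. The `BitmapParameters` type and its members are hypothetical names made up for this example; only the list 1, 2, 4, 8, 12, 16 comes from the documentation quote, and the real initializer may apply further checks that are not modelled here.

```swift
// Hypothetical helper, not part of AppKit: it mirrors the documented rule only.
struct BitmapParameters {
    var pixelsWide: Int
    var bitsPerSample: Int
    var samplesPerPixel: Int

    // Values listed for `bps` in the NSBitmapImageRep class reference.
    static let documentedBitsPerSample: Set<Int> = [1, 2, 4, 8, 12, 16]

    var hasDocumentedBitsPerSample: Bool {
        return BitmapParameters.documentedBitsPerSample.contains(bitsPerSample)
    }

    // Bits per pixel for a meshed (non-planar) layout.
    var bitsPerPixel: Int {
        return bitsPerSample * samplesPerPixel
    }

    // Minimum bytes per row without padding, rounded up to whole bytes.
    var minimumBytesPerRow: Int {
        return (pixelsWide * bitsPerPixel + 7) / 8
    }
}

let requested = BitmapParameters(pixelsWide: 640, bitsPerSample: 32, samplesPerPixel: 4)
let fallback = BitmapParameters(pixelsWide: 640, bitsPerSample: 16, samplesPerPixel: 4)

print(requested.hasDocumentedBitsPerSample, requested.minimumBytesPerRow)
print(fallback.hasDocumentedBitsPerSample, fallback.minimumBytesPerRow)
```

With the values from the question, the 32-bit request fails the check even though its row size (10240 bytes) is self-consistent, while a 16-bit request passes with half the row size.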
I'm currently working on an iOS application that should be able to detect the localization. I've created a tflite model that comprises some CNN layers. To use the tflite model in Xcode/Swift, I've created a helper class in which the model calculates the output. When I run the predict function once, it works. But apparently the predict function doesn't work in the real-time camera thread.
After about 7 seconds, Xcode throws the following error:
Thread 1: EXC_BAD_ACCESS (code=1, address=0x123424001). This error must be caused by looping through the image. Since I need each pixel value, I'm using the solution suggested by Firebase. But apparently this solution is not watertight. Can anybody help to resolve this memory issue?
func creatInputForCNN(resizedImage: UIImage?) -> Data? {
// In this section of the code I loop through the image (150,200,3)
// in order to fetch each pixel value (RGB).
guard let image = resizedImage?.cgImage else { return nil }
guard let context = CGContext(
data: nil,
width: image.width, height: image.height,
bitsPerComponent: 8, bytesPerRow: image.width * 4,
space: CGColorSpaceCreateDeviceRGB(),
bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
) else {return nil}
context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
guard let imageData = context.data else {return nil}
let size_w = 150
let size_H = 200
var inputData:Data?
inputData = Data()
for row in 0 ..< size_H{
for col in 0 ..< size_w {
let offset = 4 * (row * context.width + col)
// (Ignore offset 0, the unused alpha channel)
let red = imageData.load(fromByteOffset: offset+1, as: UInt8.self)
let green = imageData.load(fromByteOffset: offset+2, as: UInt8.self)
let blue = imageData.load(fromByteOffset: offset+3, as: UInt8.self)
// Normalize channel values to [0.0, 1.0]. This requirement varies
// by model. For example, some models might require values to be
// normalized to the range [-1.0, 1.0] instead, and others might
// require fixed-point values or the original bytes.
var normalizedRed:Float32 = Float32(red) / 255
var normalizedGreen:Float32 = Float32(green) / 255
var normalizedBlue:Float32 = Float32(blue) / 255
// Append normalized values to Data object in RGB order.
let elementSize = MemoryLayout.size(ofValue: normalizedRed)
var bytes = [UInt8](repeating: 0, count: elementSize)
memcpy(&bytes, &normalizedRed, elementSize)
inputData!.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedGreen, elementSize)
inputData!.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedBlue, elementSize)
inputData!.append(&bytes, count: elementSize)
}
}
return inputData
}
This is the code
I converted my mlmodel from tf.keras. The goal is to recognize handwritten text from the image.
When I run it using this code:
func performCoreMLImageRecognition(_ image: UIImage) {
let model = try! HTRModel()
// process input image
let scale = image.scaledImage(200)
let sized = scale?.resize(size: CGSize(width: 200, height: 50))
let gray = sized?.rgb2GrayScale()
guard let pixelBuffer = sized?.pixelBufferGray(width: 200, height: 50) else { fatalError("Cannot convert image to pixelBufferGray")}
UIImageWriteToSavedPhotosAlbum(gray! ,
self,
#selector(self.didFinishSavingImage(_:didFinishSavingWithError:contextInfo:)),
nil)
let mlArray = try! MLMultiArray(shape: [1, 1], dataType: MLMultiArrayDataType.float32)
let htrinput = HTRInput(image: pixelBuffer, label: mlArray)
if let prediction = try? model.prediction(input: htrinput) {
print(prediction)
}
}
I get the following error:
[espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid argument": generic_reshape_kernel: Invalid bottom shape (64 12 1 1 1) for reshape to (768 50 -1 1 1) status=-6
2021-01-21 20:23:50.712585+0900 Guided Camera[7575:1794819] [coreml] Error computing NN outputs -6
2021-01-21 20:23:50.712611+0900 Guided Camera[7575:1794819]
[coreml] Failure in -executePlan:error:.
Here is the model configuration
The model ran perfectly fine. Where am I going wrong here? I am not well versed in Swift and need help.
What does this error mean, and how do I resolve it?
Sometimes during the conversion from Keras (or whatever) to Core ML, the converter doesn't understand how to handle certain operations, which results in a model that doesn't work.
In your case, there is a layer that outputs a tensor with shape (64, 12, 1, 1, 1) while there is a reshape layer that expects something that can be reshaped to (768, 50, -1, 1, 1).
You'll need to find out which layer does this reshape and then examine the Core ML model to see why it gets an input tensor that is not the correct size. Just because it works OK in Keras does not mean the conversion to Core ML was flawless.
You can examine the Core ML model with Netron, an open source model viewer.
(Note that 64x12 = 768, so the issue appears to be with the 50 in that tensor.)
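To make the arithmetic behind that note concrete, here is a small plain-Swift sketch that tests whether one shape can be reshaped into another. The function `canReshape(from:to:)` is a hypothetical helper written for this example, and it assumes that -1 means "infer this dimension", as in Keras and NumPy; it does not call into Core ML.

```swift
// Returns true when `source` can be reshaped to `target`.
// At most one target dimension may be -1, meaning "infer this size".
func canReshape(from source: [Int], to target: [Int]) -> Bool {
    let sourceCount = source.reduce(1, *)
    let wildcards = target.filter { $0 == -1 }.count
    let fixedCount = target.filter { $0 != -1 }.reduce(1, *)
    guard wildcards <= 1, fixedCount > 0 else { return false }
    if wildcards == 0 {
        return sourceCount == fixedCount
    }
    // The inferred dimension must be a whole number.
    return sourceCount % fixedCount == 0
}

let bottom = [64, 12, 1, 1, 1]
let elementCount = bottom.reduce(1, *)

print(elementCount)
print(canReshape(from: bottom, to: [768, 50, -1, 1, 1]))
print(canReshape(from: bottom, to: [768, 1, -1, 1, 1]))
```

The bottom tensor holds 768 elements, but the requested shape needs a multiple of 768 × 50 = 38400, so the first check fails; with a 1 in place of the 50 it would succeed.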
I need to perform a simple math operation on Data that contains RGB pixel data. Currently I'm doing it like so:
let imageMean: Float = 127.5
let imageStd: Float = 127.5
let rgbData: Data // Some data containing RGB pixels
let floats = (0..<rgbData.count).map {
(Float(rgbData[$0]) - imageMean) / imageStd
}
return Data(bytes: floats, count: floats.count * MemoryLayout<Float>.size)
This works, but it's too slow. I was hoping I could use the Accelerate framework to calculate this faster, but I have no idea how to do it. I reserved some space so that it isn't allocated every time this function runs, like so:
inputBufferDataNormalized = malloc(width * height * 3) // 3 channels RGB
I tried a few functions, like vDSP_vasm, but I couldn't make them work. Can someone show me how to use them? Basically I need to replace this map call, because it takes too long. It would probably also be good to reuse the pre-allocated space every time.
Following up on my comment on your other related question. You can use SIMD to parallelize the operation, but you'd need to split the original array into chunks.
This is a simplified example that assumes the array's length is exactly divisible by 64, for example, an array of 1024 elements:
let arr: [Float] = (0 ..< 1024).map { _ in Float.random(in: 0...1) }
let imageMean: Float = 127.5
let imageStd: Float = 127.5
var chunks = [SIMD64<Float>]()
chunks.reserveCapacity(arr.count / 64)
for i in stride(from: 0, to: arr.count, by: 64) {
let v = SIMD64.init(arr[i ..< i+64])
chunks.append((v - imageMean) / imageStd) // same calculation using SIMD
}
You can now access each chunk with a subscript:
var results: [Float] = []
results.reserveCapacity(arr.count)
for chunk in chunks {
for i in chunk.indices {
results.append(chunk[i])
}
}
Of course, you'd need to deal with a remainder if the array isn't exactly divisible by 64.
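One way to handle that remainder, sketched under the same assumptions as the example above: process the largest prefix whose length is a multiple of 64 with SIMD, then finish the leftover elements with ordinary scalar arithmetic. The function name `normalize(_:mean:std:)` is made up for this illustration.

```swift
// Normalizes every element as (x - mean) / std, using SIMD64 for full
// chunks and a scalar loop for whatever is left over at the end.
func normalize(_ values: [Float], mean: Float, std: Float) -> [Float] {
    let width = 64
    var results: [Float] = []
    results.reserveCapacity(values.count)

    // Largest prefix whose length is a multiple of the SIMD width.
    let vectorEnd = values.count - values.count % width
    for start in stride(from: 0, to: vectorEnd, by: width) {
        let v = SIMD64<Float>(values[start ..< start + width])
        let normalized = (v - mean) / std
        for lane in normalized.indices {
            results.append(normalized[lane])
        }
    }

    // Scalar fallback for the remaining (fewer than 64) elements.
    for value in values[vectorEnd...] {
        results.append((value - mean) / std)
    }
    return results
}

// 200 elements: three full chunks of 64 plus a remainder of 8.
let sample: [Float] = (0 ..< 200).map { Float($0) }
let normalizedSample = normalize(sample, mean: 127.5, std: 127.5)
print(normalizedSample.count, normalizedSample[0])
```

Whether this beats a plain loop depends on the compiler and the hardware, so it is worth measuring before committing to it.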
I have found a way to do this using Accelerate. First I reserve space for the converted buffer like so:
var inputBufferDataRawFloat = [Float](repeating: 0, count: width * height * 3)
Then I can use it like so:
let rawBytes = [UInt8](rgbData)
vDSP_vfltu8(rawBytes, 1, &inputBufferDataRawFloat, 1, vDSP_Length(rawBytes.count))
vDSP.add(inputBufferDataRawScalars.mean, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
vDSP.multiply(inputBufferDataRawScalars.std, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
return Data(bytes: inputBufferDataRawFloat, count: inputBufferDataRawFloat.count * MemoryLayout<Float>.size)
This works very fast. Maybe there is a better function in Accelerate; if anyone knows of one, please let me know. It needs to compute (A[n] + B) * C (or, to be exact, (A[n] - B) / C, which can be rewritten in the first form).
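The rewrite from (A[n] - B) / C to (A[n] + B') * C' can be checked in plain Swift, without Accelerate. The sketch below assumes the mean and standard deviation of 127.5 from the question; the names `addend` and `factor` are made up here and stand for the two scalars that an add-then-multiply sequence would need.

```swift
let imageMean: Float = 127.5
let imageStd: Float = 127.5

// (a - mean) / std  ==  (a + addend) * factor
// with addend = -mean and factor = 1 / std.
let addend: Float = -imageMean
let factor: Float = 1 / imageStd

let pixels: [UInt8] = [0, 64, 128, 255]
let direct = pixels.map { (Float($0) - imageMean) / imageStd }
let rewritten = pixels.map { (Float($0) + addend) * factor }

// The two forms agree up to floating-point rounding.
for (a, b) in zip(direct, rewritten) {
    print(a, b, abs(a - b) < 1e-6)
}
```

So the scalar passed to the add step has to be the negated mean, and the scalar passed to the multiply step has to be the reciprocal of the standard deviation.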
I'm trying to understand the AudioStreamBasicDescription results. Practically none of what I get makes sense to me. For example:
AudioStreamBasicDescription(mSampleRate: 44100.0, mFormatID: 1819304813, mFormatFlags: 41, mBytesPerPacket: 4, mFramesPerPacket: 1, mBytesPerFrame: 4, mChannelsPerFrame: 2, mBitsPerChannel: 32, mReserved: 0)
What I would expect:
"Bytes per packet" and "bytes per frame" should be 8 not 4:
4 (size of 32 bit Float) x 2 (two channels per frame) x 1 (1 frame per packet) = 8 bytes
Why is it 4?
import CoreAudio
import AudioUnit
var inputUnitDescription = AudioComponentDescription(componentType: kAudioUnitType_Output,
componentSubType: kAudioUnitSubType_HALOutput,
componentManufacturer: kAudioUnitManufacturer_Apple,
componentFlags: 0,
componentFlagsMask: 0)
let defaultInput = AudioComponentFindNext(nil, &inputUnitDescription)
var inputUnit: AudioUnit?
AudioComponentInstanceNew(defaultInput!, &inputUnit)
var asbd = AudioStreamBasicDescription()
var propertySize = UInt32(MemoryLayout<AudioStreamBasicDescription>.size)
AudioUnitGetProperty(inputUnit!,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
1,
&asbd,
&propertySize)
dump(asbd)
Your ASBD has mFormatFlags == 41.
If (mFormatFlags & 32) != 0, the format includes the kAudioFormatFlagIsNonInterleaved bit.
A non-interleaved format only returns one channel of data per frame, not 2.
Instead you get multiple buffers, each buffer with only one channel per frame, or 4 bytes (for Float32 format), not 8.
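The numbers from the dump can be decoded in a few lines of plain Swift. This is only a sketch: the `FormatFlags` type is a hypothetical stand-in whose bit values mirror the kAudioFormatFlag constants from CoreAudioTypes, re-declared here so the example runs without importing CoreAudio.

```swift
// Bit values mirror the kAudioFormatFlag… constants in CoreAudioTypes;
// they are re-declared so this sketch needs no framework import.
struct FormatFlags: OptionSet {
    let rawValue: UInt32
    static let isFloat          = FormatFlags(rawValue: 1 << 0)
    static let isBigEndian      = FormatFlags(rawValue: 1 << 1)
    static let isSignedInteger  = FormatFlags(rawValue: 1 << 2)
    static let isPacked         = FormatFlags(rawValue: 1 << 3)
    static let isAlignedHigh    = FormatFlags(rawValue: 1 << 4)
    static let isNonInterleaved = FormatFlags(rawValue: 1 << 5)
}

// Values taken from the dumped AudioStreamBasicDescription.
let flags = FormatFlags(rawValue: 41) // 1 + 8 + 32
let channelsPerFrame = 2
let bytesPerSample = 32 / 8

// Non-interleaved: each buffer carries one channel, so a frame within a
// buffer is a single sample. Interleaved: a frame holds every channel.
let channelsPerBuffer = flags.contains(.isNonInterleaved) ? 1 : channelsPerFrame
let bytesPerFrame = bytesPerSample * channelsPerBuffer

// mFormatID is a four-character code packed into a UInt32.
let formatID: UInt32 = 1819304813
let shifts: [UInt32] = [24, 16, 8, 0]
let codeBytes = shifts.map { UInt8(truncatingIfNeeded: formatID >> $0) }
let fourCC = String(decoding: codeBytes, as: UTF8.self)

print(flags.contains(.isFloat), flags.contains(.isPacked), flags.contains(.isNonInterleaved))
print(bytesPerFrame, fourCC)
```

Read this way, 41 is float + packed + non-interleaved, the format ID spells "lpcm", and 4 bytes per frame is what one Float32 sample per buffer comes to.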
I understand bitmap layouts and pixel formats pretty well, but I'm running into an issue when working with png / jpeg images loaded through NSImage – I can't figure out whether what I get is the intended behaviour or a bug.
let nsImage:NSImage = NSImage(byReferencingURL: …)
let cgImage:CGImage = nsImage.CGImageForProposedRect(nil, context: nil, hints: nil)!
let bitmapInfo:CGBitmapInfo = CGImageGetBitmapInfo(cgImage)
Swift.print(bitmapInfo.contains(CGBitmapInfo.ByteOrderDefault)) // True
My kCGBitmapByteOrder32Host is little endian, which implies that the pixel format is also little endian – BGRA in this case. But… png format is big endian by specification, and that's how the bytes are actually arranged in the data – opposite from what bitmap info tells me.
Does anybody know what's going on? Surely the system somehow knows how to deal with this, since pngs are displayed correctly. Is there a bullet-proof way of detecting the pixel format of a CGImage? A complete demo project is available on GitHub.
P. S. I'm copying raw pixel data via the CFDataGetBytePtr buffer into another library's buffer, which then gets processed and saved. To do so, I need to explicitly specify the pixel format. The actual images I'm dealing with (any png / jpeg files that I've checked) display correctly, for example:
But the bitmap info of the same images gives me incorrect endianness information, so the bitmap is handled as BGRA instead of the actual RGBA. When I process it, the result looks like this:
The resulting image shows red and blue swapped. If the RGBA pixel format is specified explicitly, everything works out perfectly, but I need this detection to be automated.
P. P. S. The documentation briefly mentions that CGColorSpace is another important variable that defines pixel format / byte order, but I found no mention of how to get it out of there.
Some years later, and after testing my findings in production, I can share them with good confidence, though I'm hoping someone with knowledge of the theory will explain things better here. Good places to refresh your memory:
Wikipedia: RGBA color space – Representation
Apple Lists: Byte Order in CGBitmapContextCreate
Apple Lists: kCGImageAlphaPremultiplied First/Last
Based on that, you can use the following extensions:
public enum PixelFormat
{
case abgr
case argb
case bgra
case rgba
}
extension CGBitmapInfo
{
public static var byteOrder16Host: CGBitmapInfo {
return CFByteOrderGetCurrent() == Int(CFByteOrderLittleEndian.rawValue) ? .byteOrder16Little : .byteOrder16Big
}
public static var byteOrder32Host: CGBitmapInfo {
return CFByteOrderGetCurrent() == Int(CFByteOrderLittleEndian.rawValue) ? .byteOrder32Little : .byteOrder32Big
}
}
extension CGBitmapInfo
{
public var pixelFormat: PixelFormat? {
// AlphaFirst – the alpha channel is next to the red channel, argb and bgra are both alpha first formats.
// AlphaLast – the alpha channel is next to the blue channel, rgba and abgr are both alpha last formats.
// LittleEndian – blue comes before red, bgra and abgr are little endian formats.
// Little endian ordered pixels are BGR (BGRX, XBGR, BGRA, ABGR, BGR).
// BigEndian – red comes before blue, argb and rgba are big endian formats.
// Big endian ordered pixels are RGB (XRGB, RGBX, ARGB, RGBA, RGB).
let alphaInfo: CGImageAlphaInfo? = CGImageAlphaInfo(rawValue: self.rawValue & type(of: self).alphaInfoMask.rawValue)
let alphaFirst: Bool = alphaInfo == .premultipliedFirst || alphaInfo == .first || alphaInfo == .noneSkipFirst
let alphaLast: Bool = alphaInfo == .premultipliedLast || alphaInfo == .last || alphaInfo == .noneSkipLast
let endianLittle: Bool = self.contains(.byteOrder32Little)
// This is slippery… while byte order host returns little endian, default bytes are stored in big endian
// format. Here we just assume if no byte order is given, then simple RGB is used, aka big endian, though…
if alphaFirst && endianLittle {
return .bgra
} else if alphaFirst {
return .argb
} else if alphaLast && endianLittle {
return .abgr
} else if alphaLast {
return .rgba
} else {
return nil
}
}
}
Note that you should always pay attention to the colour space, because it directly affects how raw pixel data is stored. CGColorSpace(name: CGColorSpace.sRGB) is probably the safest one: it stores colours in plain format. For example, pure red will be stored as (255, 0, 0), while a device colour space will give you something like (235, 73, 53).
To see this in practice, drop the above and the following into a playground. You'll need two one-pixel red images, with alpha and without; this and this should work.
import AppKit
import CoreGraphics
extension CFData
{
public var pixelComponents: [UInt8] {
// Copy at most the first four bytes (one pixel) so the buffer cannot overflow
// when the data holds more than a single pixel.
let count = min(4, CFDataGetLength(self))
var bytes = [UInt8](repeating: 0, count: count)
CFDataGetBytes(self, CFRange(location: 0, length: count), &bytes)
return bytes
}
}
let color: NSColor = .red
Thread.sleep(forTimeInterval: 2)
// Must flip coordinates to capture what we want…
let screen: NSScreen = NSScreen.screens.first(where: { $0.frame.contains(NSEvent.mouseLocation) })!
let rect: CGRect = CGRect(origin: CGPoint(x: NSEvent.mouseLocation.x - 10, y: screen.frame.height - NSEvent.mouseLocation.y), size: CGSize(width: 1, height: 1))
Swift.print("Will capture image with \(rect) frame.")
let screenImage: CGImage = CGWindowListCreateImage(rect, [], kCGNullWindowID, [])!
let urlImageWithAlpha: CGImage = NSImage(byReferencing: URL(fileURLWithPath: "/Users/ianbytchek/Downloads/red-pixel-with-alpha.png")).cgImage(forProposedRect: nil, context: nil, hints: nil)!
let urlImageNoAlpha: CGImage = NSImage(byReferencing: URL(fileURLWithPath: "/Users/ianbytchek/Downloads/red-pixel-no-alpha.png")).cgImage(forProposedRect: nil, context: nil, hints: nil)!
Swift.print(screenImage.colorSpace!, screenImage.bitmapInfo, screenImage.bitmapInfo.pixelFormat!, screenImage.dataProvider!.data!.pixelComponents)
Swift.print(urlImageWithAlpha.colorSpace!, urlImageWithAlpha.bitmapInfo, urlImageWithAlpha.bitmapInfo.pixelFormat!, urlImageWithAlpha.dataProvider!.data!.pixelComponents)
Swift.print(urlImageNoAlpha.colorSpace!, urlImageNoAlpha.bitmapInfo, urlImageNoAlpha.bitmapInfo.pixelFormat!, urlImageNoAlpha.dataProvider!.data!.pixelComponents)
let formats: [CGBitmapInfo.RawValue] = [
CGImageAlphaInfo.premultipliedFirst.rawValue,
CGImageAlphaInfo.noneSkipFirst.rawValue,
CGImageAlphaInfo.premultipliedLast.rawValue,
CGImageAlphaInfo.noneSkipLast.rawValue,
]
for format in formats {
// This "paints" and prints out components in the order they are stored in data.
let context: CGContext = CGContext(data: nil, width: 1, height: 1, bitsPerComponent: 8, bytesPerRow: 32, space: CGColorSpace(name: CGColorSpace.sRGB)!, bitmapInfo: format)!
let components: UnsafeBufferPointer<UInt8> = UnsafeBufferPointer(start: context.data!.assumingMemoryBound(to: UInt8.self), count: 4)
context.setFillColor(red: 1 / 0xFF, green: 2 / 0xFF, blue: 3 / 0xFF, alpha: 1)
context.fill(CGRect(x: 0, y: 0, width: 1, height: 1))
Swift.print(context.colorSpace!, context.bitmapInfo, context.bitmapInfo.pixelFormat!, Array(components))
}
This will output the following. Pay attention to how the screen-captured image differs from the ones loaded from disk.
Will capture image with (285.7734375, 294.5, 1.0, 1.0) frame.
<CGColorSpace 0x7fde4e9103e0> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; iMac) CGBitmapInfo(rawValue: 8194) bgra [27, 13, 252, 255]
<CGColorSpace 0x7fde4d703b20> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; Color LCD) CGBitmapInfo(rawValue: 3) rgba [235, 73, 53, 255]
<CGColorSpace 0x7fde4e915dc0> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; Color LCD) CGBitmapInfo(rawValue: 5) rgba [235, 73, 53, 255]
<CGColorSpace 0x7fde4d60d390> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1) CGBitmapInfo(rawValue: 2) argb [255, 1, 2, 3]
<CGColorSpace 0x7fde4d60d390> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1) CGBitmapInfo(rawValue: 6) argb [255, 1, 2, 3]
<CGColorSpace 0x7fde4d60d390> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1) CGBitmapInfo(rawValue: 1) rgba [1, 2, 3, 255]
<CGColorSpace 0x7fde4d60d390> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1) CGBitmapInfo(rawValue: 5) rgba [1, 2, 3, 255]
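The raw values in that output can be decoded by hand. The following plain-Swift sketch splits a CGBitmapInfo raw value into its alpha and byte-order parts; the masks and constants mirror the CGBitmapInfo and CGImageAlphaInfo definitions in CoreGraphics and are re-declared here so the example runs without the framework. The `describe(_:)` function is a hypothetical helper for illustration.

```swift
// Masks and values mirror CGBitmapInfo / CGImageAlphaInfo in CoreGraphics;
// re-declared so the sketch needs no framework import.
let alphaInfoMask: UInt32 = 0x1F
let byteOrderMask: UInt32 = 0x7000

let alphaNames: [UInt32: String] = [
    0: "none", 1: "premultipliedLast", 2: "premultipliedFirst", 3: "last",
    4: "first", 5: "noneSkipLast", 6: "noneSkipFirst", 7: "alphaOnly",
]
let byteOrderNames: [UInt32: String] = [
    0: "default", 1 << 12: "16Little", 2 << 12: "32Little",
    3 << 12: "16Big", 4 << 12: "32Big",
]

// Splits a raw bitmap-info value into its alpha and byte-order components.
func describe(_ rawValue: UInt32) -> String {
    let alpha = alphaNames[rawValue & alphaInfoMask] ?? "unknown"
    let order = byteOrderNames[rawValue & byteOrderMask] ?? "unknown"
    return "alpha: \(alpha), byteOrder: \(order)"
}

// Raw values that appear in the output above.
for raw: UInt32 in [8194, 3, 5, 2, 6, 1] {
    print(raw, describe(raw))
}
```

Only the screen capture (8194) carries an explicit byte order, 32-bit little endian with alpha first, which the extension above maps to bgra; the images loaded from disk report the default byte order.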
Could you use NSBitmapFormat?
I wrote a class to source color schemes from images, and that's what I used to determine the bitmap format. Here's a snippet of how I used it:
var averageColorImage: CIImage?
var averageColorImageBitmap: NSBitmapImageRep
//... core image filter code
averageColorImage = filter?.outputImage
averageColorImageBitmap = NSBitmapImageRep(CIImage: averageColorImage!)
let red, green, blue: Int
switch averageColorImageBitmap.bitmapFormat {
case NSBitmapFormat.NSAlphaFirstBitmapFormat:
red = Int(averageColorImageBitmap.bitmapData.advancedBy(1).memory)
green = Int(averageColorImageBitmap.bitmapData.advancedBy(2).memory)
blue = Int(averageColorImageBitmap.bitmapData.advancedBy(3).memory)
default:
red = Int(averageColorImageBitmap.bitmapData.memory)
green = Int(averageColorImageBitmap.bitmapData.advancedBy(1).memory)
blue = Int(averageColorImageBitmap.bitmapData.advancedBy(2).memory)
}
Check out the answer to How to keep NSBitmapImageRep from creating lots of intermediate CGImages?.
The gist is that the NSImage/NSBitmapImageRep implementation automatically handles the input format.
Apple's docs fail to note that the format parameter (for example in CIRenderDestination) specifies the desired output space.
If you want it in a particular format, the docs recommend drawing into that format (example in linked answer).
If you just need particular information, NSBitmapImageRep provides easy access to individual parameters. I could not find a clear and direct route to a CIFormat without setting up cascading manual tests. I assume a way exists somewhere.