How do I use my own occlusion ML model with ARKit? - swift

ARKit has built-in people occlusion, which you can enable with something like
guard let config = arView.session.configuration as? ARWorldTrackingConfiguration else {
    fatalError("Unexpectedly failed to get the configuration.")
}
guard ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) else {
    fatalError("People occlusion is not supported on this device.")
}
config.frameSemantics.insert(.personSegmentationWithDepth)
arView.session.run(config)
I would like to provide my own binary segmentation model ("binary" as in person/not-person for each pixel), presumably using CoreML, rather than whatever Apple uses to segment people, because I want to occlude something other than people. How do I do this? Is there a straightforward way, or will I have to re-implement parts of the rendering pipeline? They show some code on using people segmentation with a custom Metal renderer in this WWDC 2019 video (starting around 9:30), but it's not clear to me how to use my own model based on that, and I would prefer to use ARKit/RealityKit instead of implementing my own rendering (I am a mere mortal).
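For what it's worth, the Core ML inference half of this looks straightforward; a rough sketch of running a hypothetical binary-segmentation model on each ARFrame via Vision is below (MySegmentationModel is a placeholder class name). The part I can't figure out is how to hand the resulting mask back to ARKit/RealityKit for occlusion.
import ARKit
import CoreML
import Vision
// "MySegmentationModel" is a placeholder for a custom binary-segmentation Core ML model.
let visionModel = try! VNCoreMLModel(for: MySegmentationModel(configuration: MLModelConfiguration()).model)
func segmentationMask(for frame: ARFrame) -> CVPixelBuffer? {
    var mask: CVPixelBuffer?
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // For an image-to-image model, Vision returns a pixel buffer observation.
        mask = (request.results?.first as? VNPixelBufferObservation)?.pixelBuffer
    }
    request.imageCropAndScaleOption = .scaleFill
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
    try? handler.perform([request])
    return mask
}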

Related

Why does AVCaptureDevice.default return nil in SwiftUI?

I've been trying to create my own in-built camera but it's crashing when I try to set up the device.
func setUp() {
    do {
        self.session.beginConfiguration()
        let device = AVCaptureDevice.default(.builtInDualCamera, for: .video, position: .front)
        let input = try AVCaptureDeviceInput(device: device!)
        if self.session.canAddInput(input) {
            self.session.addInput(input)
        }
        if self.session.canAddOutput(self.output) {
            self.session.addOutput(self.output)
        }
        self.session.commitConfiguration()
    } catch {
        print(error.localizedDescription)
    }
}
When I execute the program, it crashes at the input setup because I force-unwrap a nil value, namely device.
I have set the required authorization so that the app can use the camera, and it still ends up with a nil value.
If anyone has any clue how to solve the problem, it would be very much appreciated.
You're asking for a builtInDualCamera, i.e. one that supports:
Automatic switching from one camera to the other when the zoom factor, light level, and focus position allow.
Higher-quality zoom for still captures by fusing images from both cameras.
Depth data delivery by measuring the disparity of matched features between the wide and telephoto cameras.
Delivery of photos from constituent devices (wide and telephoto cameras) from a single photo capture request.
And you're requiring it to be on the front of the phone. I don't know of any iPhone that has such a camera on the front (particularly one offering the last capability in that list). You likely meant to request position: .back as in the example code. But keep in mind that not all phones have a dual camera on the back either.
You might want to use default(for:) to request the default "video" camera rather than requiring a specific type of camera. Alternately, you can use an AVCaptureDevice.DiscoverySession to find a camera based on specific characteristics.
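A rough sketch of both approaches (the helper function and the particular device types listed here are just illustrative):
import AVFoundation
func findCamera() -> AVCaptureDevice? {
    // Option 1: ask for the system default video device (typically the back wide-angle camera).
    if let device = AVCaptureDevice.default(for: .video) {
        return device
    }
    // Option 2: discover cameras matching specific characteristics and pick one.
    let discovery = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInDualCamera, .builtInWideAngleCamera],
        mediaType: .video,
        position: .front)
    return discovery.devices.first
}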

Using CoreML to classify NSImages

I'm trying to work with Xcode CoreML to classify images that are simply single digits or letters. To start out with I'm just using .png images of digits. Using the Create ML tool, I built an image classifier (NOT including any Vision support stuff) and provided a set of about 300 training images and a separate set of 50 testing images. When I run this model, it trains and tests successfully and generates a model. Still within the tool, I access the model and feed it another set of 100 images to classify. It works properly, identifying 98 of them correctly.
Then I created a Swift sample program to access the model (from the Mac OS X single view template); it's set up to accept a dropped image file, then access the model's prediction method and print the result. The problem is that the model expects an object of type CVPixelBuffer and I'm not sure how to properly create this from NSImage. I found some reference code and incorporated it, but when I actually drag my classification images to the app it's only about 50% accurate. So I'm wondering if anyone has any experience with this type of model. It would be nice if there were a way to look at the "Create ML" source code to see how it processes a dropped image when predicting from the model.
The code for processing the image and invoking the model's prediction method is:
// initialize the model
mlModel2 = MLSample() // MLSample is the model generated by the Create ML tool and imported into the project
// prediction logic for the image (included in a func)
let fimage = NSImage(contentsOfFile: fname) // fname is obtained from the dropped file
do {
    let fcgImage = fimage?.cgImage(forProposedRect: nil, context: nil, hints: nil)
    let imageConstraint = mlModel2?.model.modelDescription.inputDescriptionsByName["image"]?.imageConstraint
    let featureValue = try MLFeatureValue(cgImage: fcgImage!, constraint: imageConstraint!, options: nil)
    let pxbuf = featureValue.imageBufferValue
    let mro = try mlModel2?.prediction(image: pxbuf!)
    if mro != nil {
        let mroLbl = mro!.classLabel
        let mroProb = mro!.classLabelProbs[mroLbl] ?? 0.0
        print(String(format: "M2 MLFeature: %@ %5.2f", mroLbl, mroProb))
        return
    }
}
catch {
    print(error.localizedDescription)
}
return
There are several ways to do this.
The easiest is what you're already doing: create an MLFeatureValue from the CGImage object.
My repo CoreMLHelpers has a different way to convert CGImage to CVPixelBuffer.
A third way is to get Xcode 12 (currently in beta). The automatically-generated class now accepts images instead of just CVPixelBuffer.
In cases like this it's useful to look at the image that Core ML actually sees. You can use the CheckInputImage project from https://github.com/hollance/coreml-survival-guide to verify this (it's an iOS project but easy enough to port to the Mac).
If the input image is correct, and you still get the wrong predictions, then probably the image preprocessing options on the model are wrong. For more info: https://machinethink.net/blog/help-core-ml-gives-wrong-output/
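For example, one preprocessing detail worth checking is the crop-and-scale behavior applied when the CGImage is resized to the model's input size. A sketch reusing fcgImage and imageConstraint from the question's code (the particular option chosen here is just an example; .centerCrop and .scaleFit also exist):
import CoreML
import Vision
// Make the resizing behavior explicit so the image Core ML sees matches what
// Create ML used during training.
let imageOptions: [MLFeatureValue.ImageOption: Any] = [
    .cropAndScale: VNImageCropAndScaleOption.scaleFill.rawValue
]
let featureValue = try MLFeatureValue(cgImage: fcgImage!,
                                      constraint: imageConstraint!,
                                      options: imageOptions)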

Apple Vision Framework: LCD/LED digit recognition

I was developing an iOS app and everything seemed to work pretty well until I tried capturing images of digital clocks, calculators, blood pressure monitors, electronic thermometers, etc.
For some reason the Apple Vision framework and VNRecognizeTextRequest fail to recognize text on primitive LCD screens like the ones on these devices.
You can try capturing numbers with Apple's sample project and it will fail. Or you can try any other sample project for the Vision Framework and it will fail to recognize digits as text.
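For reference, the standard recognition path that fails here looks roughly like this (a sketch; the helper name is mine):
import Vision
// Typical VNRecognizeTextRequest usage; on these seven-segment style digits the
// results come back empty or wrong.
func recognizeText(in cgImage: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return [] }
    return observations.compactMap { $0.topCandidates(1).first?.string }
}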
What can I do as an end framework user? Is there a workaround?
Train a model...
Train your own .mlmodel using up to 10K images containing screens of digital clocks, calculators, blood pressure monitors, etc. For that you can use an Xcode playground or Apple's Create ML app.
Here's code you can copy and paste into a macOS playground:
import Foundation
import CreateML
let trainDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Digits")
// let testDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Test")
var model = try MLImageClassifier(trainingData: .labeledDirectories(at: trainDir),
                                  parameters: .init(featureExtractor: .scenePrint(revision: nil),
                                                    validation: .none,
                                                    maxIterations: 25,
                                                    augmentationOptions: [.blur, .noise, .exposure]))
let evaluation = model.evaluation(on: .labeledDirectories(at: trainDir))
let url = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Screens.mlmodel")
try model.write(to: url)
Extracting text from an image...
If you want to know how to extract text from an image using the Vision framework, look at this post.
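Once Screens.mlmodel is added to an app target, running it through Vision might look roughly like this (a sketch; the generated class name Screens and its configuration initializer follow current Xcode conventions and could differ):
import CoreML
import Vision
// Classify a CGImage with the model trained above and return the top label.
func classifyScreen(_ cgImage: CGImage) throws -> String? {
    let model = try VNCoreMLModel(for: Screens(configuration: MLModelConfiguration()).model)
    var label: String?
    let request = VNCoreMLRequest(model: model) { request, _ in
        label = (request.results?.first as? VNClassificationObservation)?.identifier
    }
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    return label
}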
You can train your own model, for example https://developer.apple.com/documentation/vision/training_a_create_ml_model_to_classify_flowers

Using multiple audio devices simultaneously on osx

My aim is to write an audio app for low latency realtime audio analysis on OSX. This will involve connecting to one or more USB interfaces and taking specific channels from these devices.
I started with the Learning Core Audio book and wrote this using C. As I went down this path it came to light that a lot of the old frameworks have been deprecated. It appears that the majority of what I would like to achieve can be written using AVAudioEngine and connecting AVAudioUnits, digging down to the Core Audio level only for the lower-level things like configuring the hardware devices.
I am confused here as to how to access two devices simultaneously. I do not want to create an aggregate device as I would like to treat the devices individually.
Using Core Audio I can list the audio device IDs for all devices and change the default system output device here (and can do the input device using similar methods). However, this only gives me one physical device, and it will always track the device selected in System Preferences.
static func setOutputDevice(newDeviceID: AudioDeviceID) {
    let propertySize = UInt32(MemoryLayout<UInt32>.size)
    var deviceID = newDeviceID
    var propertyAddress = AudioObjectPropertyAddress(
        mSelector: AudioObjectPropertySelector(kAudioHardwarePropertyDefaultOutputDevice),
        mScope: AudioObjectPropertyScope(kAudioObjectPropertyScopeGlobal),
        mElement: AudioObjectPropertyElement(kAudioObjectPropertyElementMaster))
    AudioObjectSetPropertyData(AudioObjectID(kAudioObjectSystemObject), &propertyAddress, 0, nil, propertySize, &deviceID)
}
I then found that kAudioUnitSubType_HALOutput is the way to go for specifying a fixed device, which is only accessible through this subtype. I can create a component of this type using:
var outputHAL = AudioComponentDescription(componentType: kAudioUnitType_Output,
                                          componentSubType: kAudioUnitSubType_HALOutput,
                                          componentManufacturer: kAudioUnitManufacturer_Apple,
                                          componentFlags: 0,
                                          componentFlagsMask: 0)
let component = AudioComponentFindNext(nil, &outputHAL)
guard component != nil else {
    print("Can't get input unit")
    exit(-1)
}
However I am confused about how you create a description of this component and then find the next device that matches the description. Is there a property where I can select the audio device ID and link the AUHAL to this?
I also cannot figure out how to assign an AUHAL to an AVAudioEngine. I can create a node for the HAL but cannot attach this to the engine. Finally is it possible to create multiple kAudioUnitSubType_HALOutput components and feed these into the mixer?
I have been trying to research this for the last week, but I am nowhere closer to an answer. I have read up on channel mapping and everything I need to know further down the line, but getting at the audio at this lower level seems pretty undocumented, especially when using Swift.
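For reference, the property usually used to pin an output unit created from that AUHAL description to one specific device (rather than following the system default) is kAudioOutputUnitProperty_CurrentDevice. A sketch, assuming you already have an AudioUnit instance and a device ID:
import AudioToolbox
import CoreAudio
// Bind an AUHAL-based output unit to a specific hardware device.
func bind(_ audioUnit: AudioUnit, to deviceID: AudioDeviceID) -> OSStatus {
    var device = deviceID
    return AudioUnitSetProperty(audioUnit,
                                AudioUnitPropertyID(kAudioOutputUnitProperty_CurrentDevice),
                                AudioUnitScope(kAudioUnitScope_Global),
                                0,
                                &device,
                                UInt32(MemoryLayout<AudioDeviceID>.size))
}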

How to use Depth Testing with CAMetalLayer using Metal and Swift?

Recently I decided to learn how to use the Metal framework with Swift. I read some tutorials, watched videos, did a few things and finally got to the part where I have to use depth testing to make things look good.
I haven't done such low-level graphics programming before, so I looked all over the Internet for how depth testing works and how to implement it using CAMetalLayer and Metal.
However, all the examples of depth testing I found were done using OpenGL, and I couldn't find such functions in Metal.
How do I implement Depth Testing with CAMetalLayer using Metal and Swift?
Thank you in advance!
This is a good example.
http://metalbyexample.com/up-and-running-3/
The key is that CAMetalLayer does not maintain a depth buffer for you. You need to explicitly create and manage the depth texture yourself, and attach it to the depth attachment of the render pass descriptor you use to create the render encoder.
The question in this Stack Overflow post contains your answer, although it's written in Obj-C. But basically, as Dong Feng has pointed out, you need to create and manage the depth texture yourself.
Here's a Swift 4 snippet for how to create a depth texture:
func buildDepthTexture(_ device: MTLDevice, _ size: CGSize) -> MTLTexture {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .depth32Float_stencil8,
        width: Int(size.width), height: Int(size.height), mipmapped: false)
    desc.storageMode = .private
    desc.usage = .renderTarget
    return device.makeTexture(descriptor: desc)!
}
And here's how you attach it to an MTLRenderPassDescriptor:
let renderPassDesc = MTLRenderPassDescriptor()
let depthAttachment = renderPassDesc.depthAttachment!
// depthTexture is created using the above function
depthAttachment.texture = depthTexture
depthAttachment.clearDepth = 1.0
depthAttachment.storeAction = .dontCare
// Maybe set up color attachment, etc.
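To complete the picture, the depth test itself is configured through an MTLDepthStencilState, and the render pipeline needs a matching depth pixel format. A sketch, assuming the usual device, pipelineDescriptor and commandBuffer objects that aren't shown above:
// The pipeline must declare the same pixel format as the depth texture.
pipelineDescriptor.depthAttachmentPixelFormat = .depth32Float_stencil8
pipelineDescriptor.stencilAttachmentPixelFormat = .depth32Float_stencil8
// The depth/stencil state describes the actual depth test.
let depthStencilDesc = MTLDepthStencilDescriptor()
depthStencilDesc.depthCompareFunction = .less   // keep fragments closer to the camera
depthStencilDesc.isDepthWriteEnabled = true
let depthStencilState = device.makeDepthStencilState(descriptor: depthStencilDesc)!
// Bind it on the encoder created from the render pass descriptor above.
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDesc)!
encoder.setDepthStencilState(depthStencilState)
// ... encode draw calls, then encoder.endEncoding()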