I'm trying to work with Xcode CoreML to classify images that are simply single digits or letters. To start out with, I'm just using .png images of digits. Using the Create ML tool, I built an image classifier (NOT including any Vision support stuff) and provided a set of about 300 training images and a separate set of 50 testing images. When I run this model, it trains and tests successfully and generates a model. Still within the tool, I access the model and feed it another set of 100 images to classify. It works properly, identifying 98 of them correctly.
Then I created a Swift sample program to access the model (from the Mac OS X single view template); it's set up to accept a dropped image file, call the model's prediction method, and print the result. The problem is that the model expects an object of type CVPixelBuffer and I'm not sure how to properly create one from NSImage. I found some reference code and incorporated it, but when I actually drag my classification images to the app it's only about 50% accurate. So I'm wondering if anyone has experience with this type of model. It would be nice if there were a way to look at the Create ML source code to see how it processes a dropped image when predicting from the model.
The code for processing the image and invoking the model's prediction method is:
// initialize the model
mlModel2 = MLSample() // MLSample is the model generated by the Create ML tool and imported into the project

// prediction logic for the image (included in a func)
let fimage = NSImage(contentsOfFile: fname) // fname is obtained from the dropped file
do {
    let fcgImage = fimage?.cgImage(forProposedRect: nil, context: nil, hints: nil)
    let imageConstraint = mlModel2?.model.modelDescription.inputDescriptionsByName["image"]?.imageConstraint
    let featureValue = try MLFeatureValue(cgImage: fcgImage!, constraint: imageConstraint!, options: nil)
    let pxbuf = featureValue.imageBufferValue
    let mro = try mlModel2?.prediction(image: pxbuf!)
    if mro != nil {
        let mroLbl = mro!.classLabel
        let mroProb = mro!.classLabelProbs[mroLbl] ?? 0.0
        print(String(format: "M2 MLFeature: %@ %5.2f", mroLbl, mroProb))
        return
    }
}
catch {
    print(error.localizedDescription)
}
return
There are several ways to do this.
The easiest is what you're already doing: create an MLFeatureValue from the CGImage object.
My repo CoreMLHelpers has a different way to convert CGImage to CVPixelBuffer.
A third way is to get Xcode 12 (currently in beta). The automatically-generated class now accepts images instead of just CVPixelBuffer.
In cases like this it's useful to look at the image that Core ML actually sees. You can use the CheckInputImage project from https://github.com/hollance/coreml-survival-guide to verify this (it's an iOS project but easy enough to port to the Mac).
If the input image is correct, and you still get the wrong predictions, then probably the image preprocessing options on the model are wrong. For more info: https://machinethink.net/blog/help-core-ml-gives-wrong-output/
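A fourth option, if the preprocessing turns out to be the issue, is to run the model through Vision and let it handle the scaling and cropping for you. This is only a minimal sketch (the helper name is made up; MLSample is the generated class from the question):

import CoreML
import Vision

// Hypothetical helper: classify a CGImage by letting Vision resize/crop it
// to whatever the model's image constraint requires.
func classifyWithVision(_ cgImage: CGImage) throws {
    let vnModel = try VNCoreMLModel(for: MLSample().model)
    let request = VNCoreMLRequest(model: vnModel)
    request.imageCropAndScaleOption = .scaleFill // also try .centerCrop or .scaleFit

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    if let top = (request.results as? [VNClassificationObservation])?.first {
        print(String(format: "Vision: %@ %5.2f", top.identifier, Double(top.confidence)))
    }
}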
ARKit has built-in people occlusion, which you can enable with something like
guard let config = arView.session.configuration as? ARWorldTrackingConfiguration else {
    fatalError("Unexpectedly failed to get the configuration.")
}
guard ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) else {
    fatalError("People occlusion is not supported on this device.")
}
config.frameSemantics.insert(.personSegmentationWithDepth)
arView.session.run(config)
I would like to provide my own binary segmentation model ("binary" as in person/not-person for each pixel), presumably using CoreML, instead of whatever Apple is using to segment people, because I want to occlude something else instead of people. How do I do this? Is there a straightforward way, or will I have to re-implement parts of the rendering pipeline? They show some code on how to use people segmentation with a custom Metal renderer in this WWDC 2019 video (starts around 9:30), but it's not clear to me how to use my own model based on that, and I would prefer to use ARKit/RealityKit instead of implementing my own rendering (I am a mere mortal).
Need help deploying a Core ML model generated from GCP so it can be built and deployed from Xcode.
The app on my iPhone opens up and I can take a picture, but the model gets stuck at 'classifying...'
This was initially due to the input image size (I changed it to 224*224), which I was able to fix using coremltools, but it looks like the output needs to be a dictionary, whereas the .mlmodel I have has a multiarray (float32) output. Also, GCP provided two files: a label.txt file and the .mlmodel.
So, I have two questions:
How do I leverage the label.txt file during the classification/Xcode build process?
My error happens at:
guard let results = request.results as? [VNClassificationObservation] else {
    fatalError("Model failed to load image")
}
Can I change my mlmodel output from multiarray to a dictionary with labels to suit VNClassificationObservation, or can VNCoreMLFeatureValueObservation be used in some way with the multiarray output? I tried it, but the app on the iPhone gets stuck.
Not sure how to use the label file in Xcode. Any help is much appreciated. I have spent a day researching online.
You will only get VNClassificationObservation when the model is a classifier. If you're getting an MLMultiArray as output, then your model is NOT a classifier according to Core ML.
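As a quick sanity check, you can inspect which observation type Vision actually hands back. This is only a sketch (the helper name is mine, and it assumes you already have a VNCoreMLModel and a CGImage):

import Vision

// A classifier model yields VNClassificationObservation; a model with a plain
// multi-array output yields VNCoreMLFeatureValueObservation instead.
func inspectResults(of visionModel: VNCoreMLModel, cgImage: CGImage) throws {
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        if let classifications = request.results as? [VNClassificationObservation] {
            print("classifier output:", classifications.first?.identifier ?? "none")
        } else if let features = request.results as? [VNCoreMLFeatureValueObservation] {
            print("raw (multi-array) output:", features.first?.featureValue.multiArrayValue ?? "none")
        }
    }
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}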
It's possible to change your model into a classifier using coremltools. You need to write a Python script that:
loads the mlmodel
assigns the layers from model._spec.neuralNetwork to model._spec.neuralNetworkClassifier
adds two outputs, one for the winning class label and one for a dictionary with the probabilities for all class labels
fills in the class labels
saves the mlmodel
I was developing an iOS app and everything seemed to work pretty well until I tried capturing images of digital clocks, calculators, blood pressure monitors, electronic thermometers, etc.
For some reason the Apple Vision framework and VNRecognizeTextRequest fail to recognize text on primitive LCD screens like this one:
You can try capturing numbers with Apple's sample project and it will fail. Or you can try any other sample project for the Vision Framework and it will fail to recognize digits as text.
What can I do as an end framework user? Is there a workaround?
Train a model...
Train your own .mlmodel using up to 10K images containing screens of digital clocks, calculators, blood pressure monitors, etc. For that you can use an Xcode Playground or Apple's Create ML app.
Here's some code you can copy and paste into a macOS Playground:
import Foundation
import CreateML

let trainDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Digits")
// let testDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Test")

var model = try MLImageClassifier(trainingData: .labeledDirectories(at: trainDir),
                                  parameters: .init(featureExtractor: .scenePrint(revision: nil),
                                                    validation: .none,
                                                    maxIterations: 25,
                                                    augmentationOptions: [.blur, .noise, .exposure]))

let evaluation = model.evaluation(on: .labeledDirectories(at: trainDir))
let url = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Screens.mlmodel")
try model.write(to: url)
Extracting text from an image...
If you want to know how to extract text from an image using the Vision framework, look at this post.
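For reference, here is a minimal sketch of that approach (the helper name is mine, and it assumes you already have a CGImage in hand):

import Vision

// Hypothetical helper: run Vision's text recognizer on a CGImage and print
// the best candidate string for each detected text region.
func recognizeText(in cgImage: CGImage) throws {
    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            if let best = observation.topCandidates(1).first {
                print(best.string, best.confidence)
            }
        }
    }
    request.recognitionLevel = .accurate
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}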
You can train your own model, for example https://developer.apple.com/documentation/vision/training_a_create_ml_model_to_classify_flowers
I am creating an app based on a neural network, and the Core ML model is around 150 MB, so obviously I can't ship it within the app.
To overcome this issue, I came across this article, which mentions that you can download and compile the Core ML model on device.
I did, and I downloaded the model on my device, but the problem is that I cannot make predictions the way I could with the original model. The original model takes a UIImage as input, but the compiled MLModel wants an MLFeatureProvider. Can anyone explain how to wrap or cast my input so I can use the model as before?
do {
    let compiledUrl = try MLModel.compileModel(at: modelUrl)
    let model = try MLModel(contentsOf: compiledUrl)
    debugPrint("Model compiled \(model.modelDescription)")
    //model.prediction(from: MLFeatureProvider) //Problem
    //It should be like this
    //guard let prediction = try? model.prediction(image: pixelBuffer!) else {
    //    return
    //}
} catch {
    debugPrint("Error while compiling \(error.localizedDescription)")
}
When you add an mlmodel file to your project, Xcode automatically generates a source file for you. That's why you were able to write model.prediction(image: ...) before.
If you compile your mlmodel at runtime then you don't have that special source file and you need to call the MLModel API yourself.
The easiest solution here is to add the mlmodel file to your project, copy-paste the automatically generated source file into a new source file, and use that with the mlmodel you compile at runtime. (After you've copied the generated source, you can remove the mlmodel again from your Xcode project.)
Also, if your model is 150MB, you may want to consider making a small version of it by choosing an architecture that is more suitable for mobile. (Not VGG16, which it seems you're currently using.)
// Using the copied, generated feature-provider types with the raw prediction(from:) API:
guard let raterOutput = try? regressionModel.prediction(from: RegressorFeatureProviderInput(
    feature1: 3.4,
    feature2: 4.5))
else { return 0 }
return Double(truncating: NSNumber(value: RegressorFeatureProviderOutput(features: raterOutput).isSaved))
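Alternatively, if you would rather skip the generated types altogether and call the raw MLModel API, here is a minimal sketch. It assumes an image classifier whose input is named "image" and whose outputs are named "classLabel" and "classLabelProbs"; check model.modelDescription for the real names in your model:

import CoreML
import CoreVideo

// Hypothetical example: run a compiled MLModel directly via MLFeatureProvider.
func classify(pixelBuffer: CVPixelBuffer, with model: MLModel) throws {
    // Wrap the pixel buffer in a feature provider keyed by the model's input name.
    let input = try MLDictionaryFeatureProvider(
        dictionary: ["image": MLFeatureValue(pixelBuffer: pixelBuffer)])

    let output = try model.prediction(from: input)

    // Read the outputs back by name (adjust to your model's description).
    let label = output.featureValue(for: "classLabel")?.stringValue ?? "?"
    let probs = output.featureValue(for: "classLabelProbs")?.dictionaryValue ?? [:]
    print(label, probs[label] ?? 0)
}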
Adding to what @Matthijs Hollemans said:
let url = try! MLModel.compileModel(at: URL(fileURLWithPath: model))
visionModel = try! VNCoreMLModel(for: MLModel(contentsOf: url))
I'm running an mlmodel that is coming from Keras on an iPhone 6. The predictions often fail with the error Error computing NN outputs. Does anyone know what could be the cause and if there is anything I can do about it?
do {
    return try model.prediction(input1: input)
} catch let err {
    fatalError(err.localizedDescription) // Error computing NN outputs error
}
EDIT: I tried apple's sample project and that one works in the background so it seems it's specific to either our project or model type.
I got the same error myself at similar "seemingly random" times. A bit of debug tracing established that it was caused by the app sometimes trying to load its coreml model when it was sent to background, then crashing or freezing when reloaded into foreground.
The message Error computing NN outputs error was preceded by:
Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background) (IOAF code 6)
I didn't need (or want) the model to be used when the app was in background, so I detected when the app was going in / out of background, set a flag and used a guard statement before attempting to call the model.
Detect when going into background using applicationWillResignActive within the AppDelegate.swift file and set a Bool flag e.g. appInBackground = true. See this for more info: Detect iOS app entering background
Detect when app re-enters foreground using applicationDidBecomeActive in the same AppDelegate.swift file, and reset flag appInBackground = false
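For example, here is a minimal sketch of the AppDelegate side (the flag is stored as a global here purely to keep the example short; use whatever state management your app already has):

import UIKit

// Hypothetical global flag, as described above.
var appInBackground = false

class AppDelegate: UIResponder, UIApplicationDelegate {
    func applicationWillResignActive(_ application: UIApplication) {
        appInBackground = true // the app is leaving the foreground
    }

    func applicationDidBecomeActive(_ application: UIApplication) {
        appInBackground = false // the app is active again
    }
}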
Then in the function where you call the model, just before calling model, use a statement such as:
guard appInBackground == false else { return } // new line to add
guard let model = try? VNCoreMLModel(for: modelName.model) else { fatalError("could not load model") } // original line to load model
I doubt this is the most elegant solution, but it worked for me.
I haven't established why the attempt to load the model in background only happens sometimes.
In the Apple example you link to, it looks like their app only ever calls the model in response to a user input, so it will never try to load the model when in background. Hence the difference in my case ... and possibly yours as well?
In the end it was enough for us to set the usesCPUOnly flag. Using the GPU in the background seems to be prohibited in iOS; Apple actually wrote about this in their documentation as well. To specify this flag we couldn't use the generated model class anymore and had to call the raw Core ML classes instead, though I can imagine this changing in a future version. The snippet below is taken from the generated model class, but with the added MLPredictionOptions specified.
let options = MLPredictionOptions()
options.usesCPUOnly = true // Can't use GPU in the background
// Copied from the generated model class
let input = model_input(input: mlMultiArray)
let output = try generatedModel.model.prediction(from: input, options: options)
let result = model_output(output: output.featureValue(for: "output")!.multiArrayValue!).output