Apple Vision Framework: LCD/LED digit recognition - swift

I was developing an iOS app and everything seemed to work pretty well until I tried capturing images of digital clocks, calculators, blood pressure monitors, electronic thermometers, etc.
For some reason, Apple's Vision framework and VNRecognizeTextRequest fail to recognize text on primitive LCD screens.
You can try capturing numbers with Apple's sample project and it will fail, or try any other Vision framework sample project and it will likewise fail to recognize the digits as text.
What can I do as an end framework user? Is there a workaround?

Train a model...
Train your own .mlmodel using up to 10K images containing screens of digital clocks, calculators, blood pressure monitors, etc. For that you can use an Xcode Playground or Apple's Create ML app.
Here's code you can copy and paste into a macOS Playground:
import Foundation
import CreateML

let trainDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Digits")
// let testDir = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Test")

// Train an image classifier on labeled subdirectories (one folder per label).
var model = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainDir),
    parameters: .init(featureExtractor: .scenePrint(revision: nil),
                      validation: .none,
                      maxIterations: 25,
                      augmentationOptions: [.blur, .noise, .exposure]))

// Evaluate the trained model and write it to disk.
let evaluation = model.evaluation(on: .labeledDirectories(at: trainDir))

let url = URL(fileURLWithPath: "/Users/swift/Desktop/Screens/Screens.mlmodel")
try model.write(to: url)
Extracting text from an image...
If you want to know how to extract text from an image using the Vision framework, look at this post.
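For completeness, here is a minimal sketch of running the trained classifier through Vision once Screens.mlmodel has been compiled into an app target. The class name Screens (generated by Xcode from the model file) and the CGImage input are assumptions; error handling is illustrative only.

import CoreML
import Vision

// A sketch of classifying a captured frame with the custom digit-screen classifier.
// `Screens` is the class Xcode generates from Screens.mlmodel (assumed name).
func classifyDigits(in cgImage: CGImage) throws {
    let coreMLModel = try Screens(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let best = results.first else { return }
        print("Predicted label: \(best.identifier), confidence: \(best.confidence)")
    }
    request.imageCropAndScaleOption = .centerCrop

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
}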

You can train your own model; see, for example, https://developer.apple.com/documentation/vision/training_a_create_ml_model_to_classify_flowers

Related

How do I use my own occlusion ML model with ARKit?

ARKit has built-in people occlusion, which you can enable with something like
guard let config = arView.session.configuration as? ARWorldTrackingConfiguration else {
    fatalError("Unexpectedly failed to get the configuration.")
}
guard ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) else {
    fatalError("People occlusion is not supported on this device.")
}
config.frameSemantics.insert(.personSegmentationWithDepth)
arView.session.run(config)
I would like to provide my own binary segmentation model ("binary" as in person/not-person for each pixel), presumably using Core ML, instead of whatever Apple is using to segment people, because I want to occlude something else instead of people. How do I do this? Is there a straightforward way, or will I have to re-implement parts of the rendering pipeline? They show some code on using people segmentation with a custom Metal renderer in this WWDC 2019 video (starts around 9:30), but it's not clear to me how to plug in my own model based on that, and I would prefer to use ARKit/RealityKit instead of implementing my own rendering (I am a mere mortal).

Output shape of mlmodel NNClassifier is Multiarray, VNClassificationObservation not working?

I need help deploying a Core ML model generated from GCP so it can be built and deployed in Xcode.
The app on my iPhone opens and I can take a picture, but the model gets stuck at 'classifying...'.
This was initially due to the input image size (I changed it to 224x224), which I was able to fix using coremltools, but it looks like the output needs to be a dictionary, while the .mlmodel I have produces a multiarray (float32) output. Also, GCP provided two files: a label.txt file and the .mlmodel.
So, I have two questions:
How do I leverage the label.txt file during the classification/Xcode build process?
My error happens at:
guard let results = request.results as? [VNClassificationObservation] else {
    fatalError("Model failed to load image")
}
Can I change my mlmodel output from a multiarray to a dictionary with labels to suit VNClassificationObservation, or can VNCoreMLFeatureValueObservation be used in some way with a multiarray output? I tried that, but the app on the iPhone gets stuck.
I'm not sure how to use the label file in Xcode. Any help is much appreciated; I have spent a day researching online.
You will only get VNClassificationObservation when the model is a classifier. If you're getting an MLMultiArray as output, then your model is NOT a classifier according to Core ML.
It's possible to change your model into a classifier using coremltools. You need to write a Python script that:
loads the mlmodel
assigns the layers from model._spec.neuralNetwork to model._spec.neuralNetworkClassifier
adds two outputs, one for the winning class label and one for a dictionary with the probabilities for all class labels
fills in the class labels
saves the mlmodel
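If you would rather not modify the model, another option (a rough sketch only, assuming the label.txt file is added to the app bundle with one label per line, in the same order as the model's output) is to consume the multiarray directly via VNCoreMLFeatureValueObservation and look up the winning index yourself:

import Foundation
import CoreML
import Vision

// Sketch: read the raw multiarray output and map the argmax index
// to a label loaded from label.txt in the app bundle.
func handleClassification(for request: VNRequest) {
    guard let observation = request.results?.first as? VNCoreMLFeatureValueObservation,
          let scores = observation.featureValue.multiArrayValue else {
        print("Unexpected result type")
        return
    }

    // Find the index with the highest score.
    var bestIndex = 0
    var bestScore = -Double.infinity
    for i in 0..<scores.count {
        let value = scores[i].doubleValue
        if value > bestScore {
            bestScore = value
            bestIndex = i
        }
    }

    // Load the labels (cache this in production code).
    if let url = Bundle.main.url(forResource: "label", withExtension: "txt"),
       let text = try? String(contentsOf: url, encoding: .utf8) {
        let labels = text.split(separator: "\n").map(String.init)
        if bestIndex < labels.count {
            print("Predicted: \(labels[bestIndex]) (score \(bestScore))")
        }
    }
}

Note that VNCoreMLRequest still runs with a non-classifier model; you simply get feature-value observations instead of classification observations.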

How to generate QRCode image with parameters in swift?

I need to create a QR code image from four parameters of a registered app user, so that when I scan that QR code from another device I can display that user's details. That is my requirement.
Here is how I get the registered user's details; I need to generate the QR code from these parameters:
var userId = userModel?.userId
var userType = userModel?.userType
var addressId = userModel?.userAddrId
var addressType = userModel?.userAddrType
Following [this answer][1], I have created a QR code from a string, but I need to generate one from my registered user's parameters.
Sample code with a string:
private func createQRFromString(str: String) -> CIImage? {
    let stringData = str.data(using: .utf8)
    let filter = CIFilter(name: "CIQRCodeGenerator")
    filter?.setValue(stringData, forKey: "inputMessage")
    filter?.setValue("H", forKey: "inputCorrectionLevel")
    return filter?.outputImage
}

var qrCode: UIImage? {
    if let img = createQRFromString(str: "Hello world program created by someone") {
        let someImage = UIImage(
            ciImage: img,
            scale: 1.0,
            orientation: UIImage.Orientation.down
        )
        return someImage
    }
    return nil
}

@IBAction func qrcodeBtnAct(_ sender: Any) {
    qrImag.image = qrCode
}
Please suggest how I can do this.
[1]: Is there a way to generate QR code image on iOS
You say you need a QR reader, but here you are solely talking about QR generation. Those are two different topics.
In terms of QR generation, you just need to put your four values in the QR payload. Right now you’re just passing a string literal, but you can just update that string to include your four properties in whatever easily decoded format you want.
That having been said, when writing apps like this, you often want to be able to scan your QR code not only from within the app, but also from any QR scanning app, such as the built-in Camera app, and have it open your app. That influences how you might want to encode your payload.
The typical answer would be to make your QR code payload be a URL, using, for example, a universal link. See Supporting Universal Links in Your App. So, first focus on enabling universal links.
Once you’ve got the universal links working (not using QR codes at all, initially), the question then becomes how one would programmatically create the universal link that you’d supply to your QR generator routine, above. For that URLComponents is a great tool for encoding URLs. For example, see Swift GET request with parameters. Just use your universal link for the host used in the URL.
FWIW, while I suggest just encoding a universal link URL into your QR code, above, another option would be some other deep linking pattern, such as branch.io.
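As a rough sketch of that approach (the domain example.com and path /user are placeholders for your own universal link), the four user fields can be encoded as query items with URLComponents and the resulting string handed to the existing createQRFromString(str:) helper:

import Foundation

// Build a universal-link style URL carrying the four user parameters.
// Replace the host and path with your own universal link.
func makeUserURL(userId: String, userType: String,
                 addressId: String, addressType: String) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "example.com"
    components.path = "/user"
    components.queryItems = [
        URLQueryItem(name: "userId", value: userId),
        URLQueryItem(name: "userType", value: userType),
        URLQueryItem(name: "addressId", value: addressId),
        URLQueryItem(name: "addressType", value: addressType)
    ]
    return components.url
}

// Usage, feeding the payload into the generator shown in the question:
// if let url = makeUserURL(userId: "42", userType: "member",
//                          addressId: "7", addressType: "home"),
//    let qr = createQRFromString(str: url.absoluteString) { /* make the UIImage */ }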

Using CoreML to classify NSImages

I'm trying to work with Xcode and Core ML to classify images that are simply single digits or letters. To start out with, I'm just using .png images of digits. Using the Create ML tool, I built an image classifier (NOT including any Vision support) and provided a set of about 300 training images and a separate set of 50 testing images. When I run this model, it trains and tests successfully and generates a model. Still within the tool, I access the model and feed it another set of 100 images to classify. It works properly, identifying 98 of them correctly.
Then I created a Swift sample program to access the model (from the macOS single view template); it's set up to accept a dropped image file, call the model's prediction method, and print the result. The problem is that the model expects an object of type CVPixelBuffer and I'm not sure how to properly create one from an NSImage. I found some reference code and incorporated it, but when I actually drag my classification images to the app it's only about 50% accurate. So I'm wondering if anyone has experience with this type of model. It would be nice if there were a way to look at the Create ML source code to see how it processes a dropped image when predicting from the model.
The code for processing the image and invoking the model's prediction method is:
// initialize the model
mlModel2 = MLSample() // MLSample is the model generated by the Create ML tool and imported into the project

// prediction logic for the image
// (included in a func)
let fimage = NSImage(contentsOfFile: fname)! // fname is obtained from the dropped file
do {
    let fcgImage = fimage.cgImage(forProposedRect: nil, context: nil, hints: nil)
    let imageConstraint = mlModel2?.model.modelDescription.inputDescriptionsByName["image"]?.imageConstraint
    let featureValue = try MLFeatureValue(cgImage: fcgImage!, constraint: imageConstraint!, options: nil)
    let pxbuf = featureValue.imageBufferValue
    let mro = try mlModel2?.prediction(image: pxbuf!)
    if mro != nil {
        let mroLbl = mro!.classLabel
        let mroProb = mro!.classLabelProbs[mroLbl] ?? 0.0
        print(String(format: "M2 MLFeature: %@ %5.2f", mroLbl, mroProb))
        return
    }
}
catch {
    print(error.localizedDescription)
}
return
There are several ways to do this.
The easiest is what you're already doing: create an MLFeatureValue from the CGImage object.
My repo CoreMLHelpers has a different way to convert CGImage to CVPixelBuffer.
A third way is to get Xcode 12 (currently in beta). The automatically-generated class now accepts images instead of just CVPixelBuffer.
In cases like this it's useful to look at the image that Core ML actually sees. You can use the CheckInputImage project from https://github.com/hollance/coreml-survival-guide to verify this (it's an iOS project but easy enough to port to the Mac).
If the input image is correct, and you still get the wrong predictions, then probably the image preprocessing options on the model are wrong. For more info: https://machinethink.net/blog/help-core-ml-gives-wrong-output/
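One Swift-side preprocessing detail that is easy to overlook is how the dropped image is cropped and scaled down to the model's input size. As a hedged sketch (assuming the model's input is named "image", as in the code above), you can pass an explicit crop-and-scale option when building the MLFeatureValue and compare the results of the different options:

import CoreGraphics
import CoreML
import Vision

// Sketch: build an MLFeatureValue from a CGImage with an explicit
// crop-and-scale behavior, so resizing matches what was used in training.
func featureValue(for cgImage: CGImage, model: MLModel) throws -> MLFeatureValue? {
    guard let constraint = model.modelDescription
        .inputDescriptionsByName["image"]?.imageConstraint else { return nil }

    // .scaleFill stretches the image to the input size; .centerCrop and
    // .scaleFit are the other options worth comparing for accuracy.
    let options: [MLFeatureValue.ImageOption: Any] = [
        .cropAndScale: VNImageCropAndScaleOption.scaleFill.rawValue
    ]
    return try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: options)
}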

How to get frames from a local video file in Swift?

I need to get the frames from a local video file so I can process them before the video is played. I already tried using AVAssetReader and VideoOutput.
[EDIT] Here is the code I used, from Accessing Individual Frames using AV Player:
let asset = AVAsset(url: inputUrl)
let reader = try! AVAssetReader(asset: asset)
let videoTrack = asset.tracks(withMediaType: .video)[0]

// read video frames as BGRA
let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack,
    outputSettings: [String(kCVPixelBufferPixelFormatTypeKey): NSNumber(value: kCVPixelFormatType_32BGRA)])

reader.add(trackReaderOutput)
reader.startReading()

while let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
    print("sample at time \(CMSampleBufferGetPresentationTimeStamp(sampleBuffer))")
    if let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
        // process each CVPixelBuffer here
        // see CVPixelBufferGetWidth, CVPixelBufferLockBaseAddress, CVPixelBufferGetBaseAddress, etc.
    }
}
I believe AVAssetReader should work. What did you try? Have you seen this sample code from Apple? https://developer.apple.com/library/content/samplecode/ReaderWriter/Introduction/Intro.html
I found out what the problem was! It was with my implementation. The code I posted is correct. Thank you all.
You can have a look at VideoToolbox: https://developer.apple.com/documentation/videotoolbox
But beware: this is close to the hardware decompressor and sparsely documented terrain.
Depending on what processing you want to do, OpenCV may be an option - in particular if you are detecting or tracking objects in your frames. If your needs are simpler, then the effort to use OpenCV with Swift may be a little too much - see below.
You can open a video, read it frame by frame, do your work on the frames and then display them - bearing in mind the need to be efficient to avoid delaying the display.
The basic code structure is quite simple - this is a Python example, but the same principles apply across supported languages:
import numpy as np
import cv2

cap = cv2.VideoCapture('vtest.avi')

while cap.isOpened():
    ret, frame = cap.read()

    # Do whatever work you want on the frame here - in this example
    # from the tutorial the image is being converted from one colour
    # space to another
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # This displays the resulting frame
    cv2.imshow('frame', gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
More info here: http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html
The one caveat is that using OpenCV with Swift requires some additional effort - this is a good example, but it evolves constantly, so it is worth searching for something current if you decide to go this way: https://medium.com/@yiweini/opencv-with-swift-step-by-step-c3cc1d1ee5f1