Coordinates from object detection Core ML model - swift

I created an ML Model for simple object detection. When I used it in the Xcode "Preview" tab, it perfectly identified and put a bounding box around the object. However, when I try to do it programmatically, I end up with an MLMultiArray (or a similar type depending on what I try) which I cannot use to create a bounding box. Here is the relevant part of my code:
func performAnalysis(frame: CVImageBuffer) {
    guard let input = try? IdentifyBoomInput(imagePath: frame) else { return }
    guard let result = try? identifyBoom.prediction(input: input) else { return }
    if result.coordinates.count == 0 { return }
    var coords = result.***????????????????***
    print(coords)
}
I've tried every member of result (see here), but I'm unable to get anything useful for creating a bounding box. Any help would be greatly appreciated.
Update:
I had originally created the model using Create ML. I thought that maybe Create ML was the issue, so I used PyTorch with YOLOv8. When that didn't work, I tried YOLOv5, which also didn't work. My annotations and training data are clearly fine (as is my original model), because the testing interface Xcode provides lets me use it just fine. Thoughts?

Did you take a look at the .mlmodel specification in Xcode? Screenshot of my .mlmodel details. I trained the model with YOLOv5 (PyTorch) and converted the .pt file into the .mlmodel format.
Step 3 in the image is how I get the elements of my MLMultiArray of shape (1 x 25200 x (5 + C)); in my case there is only one class, so it is (1 x 25200 x 6), as described in this thread: https://github.com/ultralytics/yolov5/issues/7011
private func getOutput(image: UIImage) {
    let image = image.resizeImageTo(size: CGSize(width: 640, height: 640))
    let buffer = image!.convertToBuffer()
    let output = try? model.prediction(image: buffer!)
    let prediction = output!.var_903

    // The first four values of a row are the box center and size.
    x = Double(truncating: prediction[0])
    y = Double(truncating: prediction[1])
    width = Double(truncating: prediction[2])
    height = Double(truncating: prediction[3])
}
Hopefully, this could help
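To turn that raw (1 x 25200 x 6) output into boxes, here is a minimal sketch of how one might walk the array and keep rows above a confidence threshold. This is my own illustration, assuming each row is laid out as [centerX, centerY, width, height, objectness, classScore] in the 640 x 640 input space, that the scores are already in 0...1, and that the multiarray is contiguous; non-maximum suppression would still be needed afterwards.

import CoreML
import CoreGraphics

// Hypothetical decoder for a YOLOv5-style (1, 25200, 5 + C) MLMultiArray.
func decodeBoxes(from output: MLMultiArray, confidenceThreshold: Float = 0.25) -> [CGRect] {
    var boxes: [CGRect] = []
    let rowCount = output.shape[1].intValue   // 25200
    let rowStride = output.shape[2].intValue  // 5 + C (6 with a single class)

    for row in 0..<rowCount {
        let base = row * rowStride
        let objectness = output[base + 4].floatValue
        guard objectness > confidenceThreshold else { continue }

        // Center/size in model-input pixels, converted to an origin-based rect.
        let cx = output[base].floatValue
        let cy = output[base + 1].floatValue
        let w  = output[base + 2].floatValue
        let h  = output[base + 3].floatValue
        boxes.append(CGRect(x: CGFloat(cx - w / 2),
                            y: CGFloat(cy - h / 2),
                            width: CGFloat(w),
                            height: CGFloat(h)))
    }
    return boxes
}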


How to generate multiple stars using CIStarShineGenerator?

I'm looking for an efficient way to generate multiple stars at random places (for example 10-15 of them). With the code below I can easily create one star, but my question is how to generate more than one.
let filter = StarShineGenerator()
filter.center = CIVector(values: [CGFloat.random(in: 50.0...finalImage.size.width), CGFloat.random(in: 50.0...finalImage.size.height)], count: 2)
let ciFilter = filter.filter()
guard let ciFilter = ciFilter else { return }
let result = ciFilter.outputImage
guard let result = result else { return }
finalImage = UIImage(cgImage: context.createCGImage(result, from: CIImage(image: finalImage)!.extent)!)
The CIStarShineGenerator creates a single starburst. If you want to generate lots of them, you'd have to call it repeatedly and composite the resulting images together.
Core Image also has compositing filters that will combine images; you could use one of those to combine your different starbursts. I don't know if it would be fast enough.
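As a minimal sketch of that approach (my own illustration, using the iOS 13+ CIFilterBuiltins API rather than the wrapper type in the question), you can generate each star, crop it to the image's extent, and source-over composite it onto the running result:

import CoreImage
import CoreImage.CIFilterBuiltins

func addStars(to image: CIImage, count: Int) -> CIImage {
    var result = image
    for _ in 0..<count {
        let star = CIFilter.starShineGenerator()
        star.center = CGPoint(x: .random(in: 0...image.extent.width),
                              y: .random(in: 0...image.extent.height))
        star.radius = 20

        guard let starImage = star.outputImage else { continue }

        // Generators produce infinite-extent output, so crop before compositing.
        result = starImage
            .cropped(to: image.extent)
            .composited(over: result)
    }
    return result
}

The compositing stays on the GPU, but as noted above, whether it is fast enough for your case would need measuring.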
You might also install your image into multiple CALayers, apply rotation and shift transforms to those layers, add them to the backing layer of a UIView, and then capture the combined view's contents into an image.
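A sketch of that layer-based idea (again my own illustration, with made-up parameter names): add several copies of a star image as sublayers with random positions and rotations, then snapshot the container view's layer.

import UIKit

// Hypothetical sketch: scatter copies of a star image as CALayers, then render the result.
func starOverlayImage(star: UIImage, canvasSize: CGSize, count: Int) -> UIImage {
    let container = UIView(frame: CGRect(origin: .zero, size: canvasSize))
    for _ in 0..<count {
        let layer = CALayer()
        layer.contents = star.cgImage
        layer.frame = CGRect(x: .random(in: 0...canvasSize.width),
                             y: .random(in: 0...canvasSize.height),
                             width: star.size.width,
                             height: star.size.height)
        layer.transform = CATransform3DMakeRotation(.random(in: 0...(2 * .pi)), 0, 0, 1)
        container.layer.addSublayer(layer)
    }
    let renderer = UIGraphicsImageRenderer(size: canvasSize)
    return renderer.image { context in
        container.layer.render(in: context.cgContext)
    }
}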

VNRecognizeTextRequest digital / seven-segment numbers

I basically followed this great tutorial on VNRecognizeTextRequest and modified some things:
https://bendodson.com/weblog/2019/06/11/detecting-text-with-vnrecognizetextrequest-in-ios-13/
I am trying to recognise text from devices with seven-segment-style displays, which seems to get a bit tricky for this framework. Often it works, but numbers with a comma are hard, and even more so if there's a gap as well. I'm wondering whether there is a possibility to "train" this recognition engine. Another possibility might be to somehow tell it to look specifically for numbers; maybe then it could focus more processing power on that instead of generically looking for text?
I use this modified code for the request:
ocrRequest = VNRecognizeTextRequest { (request, error) in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        guard let topCandidate = observation.topCandidates(1).first else { continue }
        let topCandidateText = topCandidate.string
        if let float = Float(topCandidateText), topCandidate.confidence > self.bestConfidence {
            self.bestCandidate = float
            self.bestConfidence = topCandidate.confidence
        }
    }
    if self.bestConfidence >= 0.5 {
        self.captureSession?.stopRunning()
        DispatchQueue.main.async {
            self.found(measurement: self.bestCandidate!)
        }
    }
}
ocrRequest.recognitionLevel = .accurate
ocrRequest.minimumTextHeight = 1/10
ocrRequest.recognitionLanguages = ["en-US", "en-GB"]
ocrRequest.usesLanguageCorrection = true
There are 3 global variables in this class regarding the text recognition:
private var ocrRequest = VNRecognizeTextRequest(completionHandler: nil)
private var bestConfidence: Float = 0
private var bestCandidate: Float?
Thanks in advance for your answers; this is not directly code-related but more concept-related (i.e. "am I doing something wrong / did I overlook an important feature?").
Example image that works:
Example that half works:
(recognises 58)
Example that does not work:
(it has a very low confidence for "91" and often thinks it's just 9 or 9!)
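One thing worth trying for numeric-only readouts (my own guess, not something confirmed for seven-segment displays): turn off language correction, since digits are not dictionary words, and strip each candidate down to digits and separators before converting. A hypothetical helper:

import Foundation

// Hypothetical helper: keep only characters that can appear in a measurement,
// then normalize a decimal comma to a dot so Float(_:) can parse it.
func numericValue(from candidate: String) -> Float? {
    let cleaned = candidate
        .filter { "0123456789.,".contains($0) }
        .replacingOccurrences(of: ",", with: ".")
    return cleaned.isEmpty ? nil : Float(cleaned)
}

// And in the request setup above:
// ocrRequest.usesLanguageCorrection = false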

How to apply a 3D Model on detected face by Apple Vision "NO AR"

With the iPhone X TrueDepth camera it's possible to get the 3D coordinates of any object and use that information to position and scale it, but with older iPhones we don't have access to AR on the front-facing camera. What I've done so far is detect the face using the Apple Vision framework and draw some 2D paths around the face and landmarks.
I've made an SCNView and applied it as the front layer of my view with a clear background, and beneath it is an AVCaptureVideoPreviewLayer. After detecting the face, my 3D object appears on the screen, but positioning and scaling it correctly according to the face's boundingBox requires unprojecting and other steps where I got stuck. I've also tried converting the 2D bounding box to 3D using CATransform3D, but I failed. I'm wondering whether what I want to achieve is even possible. If I'm not wrong, I remember Snapchat was doing this before ARKit was available on iPhone.
override func viewDidLoad() {
    super.viewDidLoad()
    self.view.addSubview(self.sceneView)
    self.sceneView.frame = self.view.bounds
    self.sceneView.backgroundColor = .clear
    self.node = self.scene.rootNode.childNode(withName: "face",
                                              recursively: true)!
}
fileprivate func updateFaceView(for result: VNFaceObservation, twoDFace: Face2D) {
    let box = convert(rect: result.boundingBox)
    defer {
        DispatchQueue.main.async {
            self.faceView.setNeedsDisplay()
        }
    }
    faceView.boundingBox = box
    self.sceneView.scene?.rootNode.addChildNode(self.node)
    let unprojectedBox = SCNVector3(box.origin.x, box.origin.y, 0.8)
    let worldPoint = sceneView.unprojectPoint(unprojectedBox)
    self.node.position = worldPoint
    /* Here I have to do the unprojecting to convert the value
       from a 2D point to a 3D point; this is also where the issue is. */
}
The only way to achieve this is to use SceneKit with an orthographic camera and use SCNGeometrySource to match the landmarks from Vision to the vertices of the mesh.
First, you need a mesh with the same number of vertices as Vision provides (66-77, depending on which Vision revision you're on). You can create one using a tool like Blender.
The mesh on Blender
Then, in code, each time you process your landmarks you follow these steps:
1- Get the mesh vertices:
func getVertices() -> [SCNVector3] {
    var result = [SCNVector3]()
    let planeSources = shape!.geometry?.sources(for: SCNGeometrySource.Semantic.vertex)
    if let planeSource = planeSources?.first {
        let stride = planeSource.dataStride
        let offset = planeSource.dataOffset
        let componentsPerVector = planeSource.componentsPerVector
        let bytesPerVector = componentsPerVector * planeSource.bytesPerComponent
        let vectors = [SCNVector3](repeating: SCNVector3Zero, count: planeSource.vectorCount)
        let vertices = vectors.enumerated().map({ (index: Int, element: SCNVector3) -> SCNVector3 in
            var vectorData = [Float](repeating: 0, count: componentsPerVector)
            let byteRange = NSMakeRange(index * stride + offset, bytesPerVector)
            let data = planeSource.data
            (data as NSData).getBytes(&vectorData, range: byteRange)
            return SCNVector3(x: vectorData[0], y: vectorData[1], z: vectorData[2])
        })
        result = vertices
    }
    return result
}
2- Unproject each landmark captured by Vision and keep them in an SCNVector3 array:
let unprojectedLandmark = sceneView.unprojectPoint(SCNVector3(landmarks[i].x, landmarks[i].y, 0))
3- Modify the geometry using the new vertices:
func reshapeGeometry(_ vertices: [SCNVector3]) {
    let source = SCNGeometrySource(vertices: vertices)
    var newSources = [SCNGeometrySource]()
    newSources.append(source)
    for source in shape!.geometry!.sources {
        if source.semantic != SCNGeometrySource.Semantic.vertex {
            newSources.append(source)
        }
    }
    let geometry = SCNGeometry(sources: newSources, elements: shape!.geometry?.elements)
    let material = shape!.geometry?.firstMaterial
    shape!.geometry = geometry
    shape!.geometry?.firstMaterial = material
}
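Putting steps 2 and 3 together, a per-frame update might look like the following. This is my own sketch, and it assumes the Vision landmarks have already been converted to points in the scene view's coordinate space and are ordered to match the mesh's vertices:

// Hypothetical per-frame update wiring the unprojection into the reshape.
func updateMesh(with landmarks: [CGPoint]) {
    var newVertices = [SCNVector3]()
    for point in landmarks {
        newVertices.append(sceneView.unprojectPoint(SCNVector3(point.x, point.y, 0)))
    }
    reshapeGeometry(newVertices)
}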
I was able to do that and that was my method.
Hope this helps!
I would suggest looking at Google's ARCore products, which support an AR scene on Apple devices with the back or front facing camera and add some functionality beyond Apple's when it comes to devices without a face-depth camera.
Apple's Vision framework is roughly comparable to Google's face-detection offering, which returns 2D points representing the eyes/mouth/nose etc. and a face-tilt component.
However, if you want a way to simply apply 2D textures to a responsive 3D face, or alternatively attach 3D models to points on the face, then take a look at Google's Augmented Faces framework. It has great sample code for iOS and Android.

How to get the advantages of SceneKit's level editor programmatically

I've just run a couple of tests comparing the performance of different ways of loading/creating a scene to see the performance impact. The test was simply rendering a 32x32 grid of cubes and eyeballing the CPU usage, memory, energy and rendering times. Not very scientific, but there were some clear results. The four tests consisted of:
Loading a .dae, e.g. SCNScene(named: "grid.dae")
Converting a .dae to a .scn file in Xcode and loading that
Building a grid in the SceneKit editor manually using a reference node
Building a grid programmatically using an SCNReferenceNode (see code at the bottom of the question)
I expected 1 & 2 to be broadly the same, and they were.
I expected test 3 to have much better performance than tests 1 & 2, and it did. The CPU load and energy usage were very low, it had half the memory footprint, and the rendering time was a fraction of the rendering times for tests 1 & 2.
I was hoping test 4 would match test 3, but it didn't. It appeared to be the same as or worse than tests 1 & 2.
// Code for test 4
let boxPath = Bundle.main.path(forResource: "box", ofType: "scn")
let boxUrl = URL(fileURLWithPath: boxPath!)
let offset: Int = 16
for xIndex: Int in 0...32 {
    for yIndex: Int in 0...32 {
        let boxReference = SCNReferenceNode(url: boxUrl)
        scene.rootNode.addChildNode(boxReference!)
        boxReference?.position.x = Float(xIndex - offset)
        boxReference?.position.y = Float(yIndex - offset)
        boxReference?.load()
    }
}
Is the performance advantage that SceneKit's level editor provides available to developers and I'm just going about it wrong, or is SceneKit/Xcode doing something bespoke under the hood?
UPDATE
In response to Confused's comment, I tried using the flattenedClone method on SCNNode. Here is a variation on the original code using that technique...
let boxPath = Bundle.main.path(forResource: "box", ofType: "scn")
let boxUrl = URL(fileURLWithPath: boxPath!)
let offset: Int = 16
let testNode = SCNNode()
for xIndex: Int in 0...32 {
    for yIndex: Int in 0...32 {
        let boxReference = SCNReferenceNode(url: boxUrl)
        testNode.addChildNode(boxReference!)
        boxReference?.position.x = Float(xIndex - offset)
        boxReference?.position.y = Float(yIndex - offset)
        boxReference?.load()
    }
}
let optimizedNode = testNode.flattenedClone()
scene.rootNode.addChildNode(optimizedNode)
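For comparison, here is a variation that may also be worth trying (my own sketch, not something from the question): load the box scene once, clone the template node for each cell so every copy shares geometry and materials, and flatten the container at the end. Whether it closes the gap to the editor-built scene would need measuring.

import SceneKit

// Hypothetical variation: load "box.scn" once and clone it per cell,
// so every copy shares the same geometry and materials.
// `scene` is the same scene used in the code above.
let boxScene = SCNScene(named: "box.scn")!
let template = boxScene.rootNode.childNodes.first!
let offset = 16
let container = SCNNode()

for xIndex in 0...32 {
    for yIndex in 0...32 {
        let box = template.clone()
        box.position.x = Float(xIndex - offset)
        box.position.y = Float(yIndex - offset)
        container.addChildNode(box)
    }
}

// Flatten once, after the whole grid is built.
scene.rootNode.addChildNode(container.flattenedClone())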

Setting up a 3D model on a Marker using Kudan and Swift

This might be the result of a beginner trying to do something too complex, but I'm trying to use Kudan to model a 3D object on a marker. However, I'm getting the EXC_BAD_ACCESS error. Here is my code as it stands:
func setupModel() {
    let trackerMan = ARImageTrackerManager.getInstance()
    trackerMan.initialise()
    let trackable = trackerMan.findTrackableByName("image1")

    let importer = ARModelImporter(bundled: "Horse.armodel") // ERROR IS HERE
    let modelNode: ARModelNode = importer.getNode()

    let mTexture = ARTexture(UIImage: UIImage(named: "map.jpg"))
    let tMaterial = ARTextureMaterial(texture: mTexture)

    for i in 0..<modelNode.meshNodes.count {
        let meshNode: ARMeshNode = modelNode.meshNodes[i] as! ARMeshNode
        meshNode.material = tMaterial
    }

    modelNode.rotateByDegrees(90, axisX: 1, y: 0, z: 0)
    modelNode.scaleByUniform(10)
    trackable.world.addChild(modelNode)
}
Here is what the console tells me: "(lldb)" (as far as I understand, nothing), and I see "tMaterial ARTextureMaterial! nil" in the variables pane next to the console.
Can somebody shed some light?
Thanks!
Check 「Build Phases」 → 「Copy Bundle Resources」.
Is Horse.armodel there?
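As a quick runtime sanity check (my own sketch; it assumes the file is meant to ship as Horse.armodel in the main bundle), you can verify the resource was actually copied before handing it to the importer:

// Hypothetical check: confirm the model file made it into the app bundle
// before constructing the ARModelImporter.
if let modelPath = Bundle.main.path(forResource: "Horse", ofType: "armodel") {
    print("Found model at \(modelPath)")
} else {
    print("Horse.armodel is missing - check Copy Bundle Resources")
}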