Bounding box realignment from CoreML object detection - swift

I am trying to render bounding boxes inside a UIView, but the boxes are misaligned along the X axis, as can be seen in the screenshot below.
When the object is on the left of the view, the box is offset to the right, as in the image. When the object is on the right, the box is offset to the left. The misalignment increases the closer the object gets to the edge of the screen.
I currently use ARKit to capture the current frame as a pixel buffer:
let pixelBuffer = sceneView.session.currentFrame?.capturedImage
// Capture current device orientation
let orientation = CGImagePropertyOrientation(rawValue: UIDevice.current.exifOrientation)
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: orientation)
Additionally, my Core ML Vision request looks as follows:
findObjectRequest = VNCoreMLRequest(model: visionModel, completionHandler: visionRequestDidComplete)
findObjectRequest?.imageCropAndScaleOption = .scaleFit
I then try to rescale the normalised bounding box to image space like this:
public func scaleImageForCameraOutput(predictionRect finderrItem: FinderrItem, viewRect: CGRect) -> FinderrItem {
let scale = CGAffineTransform.identity.scaledBy(x: viewRect.width, y: viewRect.height)
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
let bgRect = = bgRect
return finderrItem
I also tried to follow the Apple developer documentation and use the API to rescale the bounding boxes as follows:
let newBox = VNImageRectForNormalizedRect(
However, this has the same issue, plus the additional problem that the y-axis is now inverted.
Does anyone know why I'm having this problem? I've been stuck on it for quite a while now and can't seem to figure it out.
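For reference, the conversion from a normalized Vision rectangle to view coordinates can be sketched as below. This is a minimal sketch and not a diagnosis of the misalignment: it assumes the image fills the view exactly (no aspect-fill cropping or letterboxing), and the function name `viewRect(forNormalizedRect:in:)` is made up for illustration. Vision reports bounding boxes with the origin at the bottom-left, while UIKit views place it at the top-left, which is why the Y coordinate is flipped.

```swift
import Foundation

/// Converts a Vision-style normalized rect (origin bottom-left, values in 0...1)
/// into a rect in a top-left-origin coordinate space of the given size.
/// Hypothetical helper; assumes the image is stretched to fill `size` exactly.
func viewRect(forNormalizedRect normalized: CGRect, in size: CGSize) -> CGRect {
    let width = normalized.width * size.width
    let height = normalized.height * size.height
    let x = normalized.minX * size.width
    // Flip Y: the top edge in view space is measured from the top of the view.
    let y = (1 - normalized.maxY) * size.height
    return CGRect(x: x, y: y, width: width, height: height)
}

let box = viewRect(forNormalizedRect: CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25),
                   in: CGSize(width: 400, height: 800))
print(box)
```

If the camera image and the view have different aspect ratios, an extra scale and offset would be needed on top of this, which may be related to an error that grows toward the screen edges.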


Apple Vision API, Cropping from Original Image Based on Landmark Position

I have a Swift-based iPhone application (based on this tutorial). It takes the camera feed and processes each frame using the Vision API to find the landmarks on a face, and then draws an overlay of the landmarks on the video. All I was trying to do was take the position of a landmark and crop a rectangle around that position from the original image (after that I was going to run it through an ML model to determine some things). However, I have an issue translating the Vision API landmark position back to the original image location to do the cropping. Below are hopefully the relevant portions of the code that show how I attempted to do this and failed (I pulled code from a number of functions/classes just to focus on the problem).
Capture video frame
let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let ciimage = CIImage(cvImageBuffer: imageBuffer)
Find the Face Landmarks
let detectFaceRequest = VNDetectFaceLandmarksRequest(completionHandler: detectedFace)
sequenceHandler.perform([detectFaceRequest],on: imageBuffer,orientation: .leftMirrored)
Get the left Eye Pupil Location
let point = result.landmarks?.leftPupil?.pointsInImage(imageSize: ciimage.extent.size)
Draw the cropped image around the leftPupil
let cropped = ciimage?.cropped(to: CGRect(x: point.x-100, y: point.y-100, width: 200, height: 200))
let uicropped = UIImage(ciImage: cropped!)
uicropped.draw(at: CGPoint(x:100,y:100))
The issue is the cropped image is not positioned over the left pupil.
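The rectangle arithmetic used for the crop can be checked in isolation. The sketch below is an assumption-laden illustration, not the asker's code: `cropRect(centeredAt:side:within:)` is a hypothetical helper that builds the 200x200 rect around a point and keeps it inside the image bounds. It does not address orientation (the request above is performed with `.leftMirrored` while the CIImage is used as captured), which is a separate possible source of the offset.

```swift
import Foundation

/// Returns a square crop rect of the given side length centered on `center`,
/// shifted as needed so that it stays inside `bounds`.
/// Hypothetical helper for illustration only.
func cropRect(centeredAt center: CGPoint, side: CGFloat, within bounds: CGRect) -> CGRect {
    // Never ask for a square larger than the image itself.
    let clampedSide = min(side, bounds.width, bounds.height)
    var x = center.x - clampedSide / 2
    var y = center.y - clampedSide / 2
    // Slide the rect back inside the bounds if it pokes out on any side.
    x = min(max(x, bounds.minX), bounds.maxX - clampedSide)
    y = min(max(y, bounds.minY), bounds.maxY - clampedSide)
    return CGRect(x: x, y: y, width: clampedSide, height: clampedSide)
}

let imageBounds = CGRect(x: 0, y: 0, width: 640, height: 480)
print(cropRect(centeredAt: CGPoint(x: 50, y: 300), side: 200, within: imageBounds))
```

Logging the rect produced for a known pupil position and comparing it with where the pupil appears in the image is one way to tell a coordinate-space problem from an orientation problem.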

Blur face in face detection in vision kit

I'm using Apple's tutorial on face detection with Vision on a live camera feed, not a still image.
It detects the face and uses CAShapeLayer to draw lines between different parts of the face.
fileprivate func setupVisionDrawingLayers() {
let captureDeviceResolution = self.captureDeviceResolution
let captureDeviceBounds = CGRect(x: 0,
y: 0,
width: captureDeviceResolution.width,
height: captureDeviceResolution.height)
let captureDeviceBoundsCenterPoint = CGPoint(x: captureDeviceBounds.midX,
y: captureDeviceBounds.midY)
let normalizedCenterPoint = CGPoint(x: 0.5, y: 0.5)
guard let rootLayer = self.rootLayer else {
self.presentErrorAlert(message: "view was not properly initialized")
return
}
let overlayLayer = CALayer()
overlayLayer.name = "DetectionOverlay"
overlayLayer.masksToBounds = true
overlayLayer.anchorPoint = normalizedCenterPoint
overlayLayer.bounds = captureDeviceBounds
overlayLayer.position = CGPoint(x: rootLayer.bounds.midX, y: rootLayer.bounds.midY)
let faceRectangleShapeLayer = CAShapeLayer()
faceRectangleShapeLayer.name = "RectangleOutlineLayer"
faceRectangleShapeLayer.bounds = captureDeviceBounds
faceRectangleShapeLayer.anchorPoint = normalizedCenterPoint
faceRectangleShapeLayer.position = captureDeviceBoundsCenterPoint
faceRectangleShapeLayer.fillColor = nil
faceRectangleShapeLayer.strokeColor =
faceRectangleShapeLayer.lineWidth = 5
faceRectangleShapeLayer.shadowOpacity = 0.7
faceRectangleShapeLayer.shadowRadius = 5
let faceLandmarksShapeLayer = CAShapeLayer()
faceLandmarksShapeLayer.name = "FaceLandmarksLayer"
faceLandmarksShapeLayer.bounds = captureDeviceBounds
faceLandmarksShapeLayer.anchorPoint = normalizedCenterPoint
faceLandmarksShapeLayer.position = captureDeviceBoundsCenterPoint
faceLandmarksShapeLayer.fillColor = nil
faceLandmarksShapeLayer.strokeColor = UIColor.yellow.withAlphaComponent(0.7).cgColor
faceLandmarksShapeLayer.lineWidth = 3
faceLandmarksShapeLayer.shadowOpacity = 0.7
faceLandmarksShapeLayer.shadowRadius = 5
self.detectionOverlayLayer = overlayLayer
self.detectedFaceRectangleShapeLayer = faceRectangleShapeLayer
self.detectedFaceLandmarksShapeLayer = faceLandmarksShapeLayer
How can I fill the area inside the lines (the different parts of the face) with a blurred view? I need to blur the face.
You could try placing a UIVisualEffectView on top of your video feed, and then adding a masking CAShapeLayer to that UIVisualEffectView. I don't know if that would work or not.
The docs on UIVisualEffectView say:
When using the UIVisualEffectView class, avoid alpha values that are less than 1. Creating views that are partially transparent causes the system to combine the view and all the associated subviews during an offscreen render pass. UIVisualEffectView objects need to be combined as part of the content they are layered on top of in order to look correct. Setting the alpha to less than 1 on the visual effect view or any of its superviews causes many effects to look incorrect or not show up at all.
I don't know if using a mask layer on a visual effect view would cause the same rendering problems or not. You'd have to try it. (And be sure to try it on a range of different hardware, since the rendering performance varies quite a bit between different versions of Apple's chipsets.)
You could also try using a shape layer filled with visual hash or a "pixellated" pattern instead of blurring. That would be faster and probably render more reliably.
Note that face detection tends to be a little jumpy. It might drop out for a few frames, or lag on quick pans or change of scene. If you're trying to hide people's faces in a live feed for privacy, it might not be reliable. It would only take a few un-blurred frames for somebody's identity to be revealed.

Turning a UIBezierPath into a mask?

Not sure if I am asking this question correctly, but I have two components; a CIImage and a UIBezierPath. Ideally, I want to create a CGRect that encapsulates my UIBezierPath; everything inside of the path would be white, everything outside of the path would be black. This way, I can then render this CGRect to some sort of an image, which I could then use as a mask for other purposes.
I am struggling to figure out how to do this with a focus on performance. My tests, as shown below, use UIGraphicsImageRenderer, which is far too slow for my needs (I will be doing this on sample buffers from a camera). Therefore, I would like to stay within Core Image. This is my attempt:
// Path
let path = UIBezierPath()
// ... define the path's shape and close it
// My source image
let image = CIImage(cgImage: UIImage(named: "test.jpg")!.cgImage!)
// Renderer
let renderer = UIGraphicsImageRenderer(size: image.extent.size)
// Render path as mask
let img = renderer.image { ctx in
ctx.cgContext.fill(CGRect(x: 0, y: 0, width: image.extent.size.width, height: image.extent.size.height))
ctx.cgContext.drawPath(using: .fill)
// Put a filter on the image
let imageFiltered = image.applyingFilter("CIPhotoEffectNoir")
// Blend with mask
let maskFilter = CIFilter.blendWithMask()
maskFilter.inputImage = imageFiltered
maskFilter.backgroundImage = image
maskFilter.maskImage = CIImage(cgImage: img.cgImage!)
// Output
if let output = maskFilter.outputImage {
... use CIContext() to render back to CVPixelBuffer for preview on MTKView.
Overall, the goal is to have a defined portion of an image (which will not conform to a traditional shape like a square or circle) filtered with a CIFilter, then composited back over the original. If there is a better approach (such as somehow taking the original image, filtering it, cropping it to the path so that everything outside the path is transparent, and compositing), that would likely perform better.
To note, the above sample code results in a crash as the UIGraphicsImageRenderer cannot render the mask fast enough.
Your approach looks good so far. I assume the slow part is generating the mask image with Core Graphics. Unfortunately, there is no direct way to do the same within Core Image (on the GPU). However, you can try the following:
(Assuming from your previous question that the path always has a certain shape,) you can generate a mask image containing the path once for a certain reference size of your choice. Make sure that the path doesn't "touch" the border.
Then, when you want to use it as a mask, move and scale the shape image to the correct place using transformations and let its edges extend infinitely (to cover the whole underlying image; that's why the shape shouldn't touch the edges). Something like this:
let pathImage = CIImage(cgImage: img.cgImage!)
// scale path to the size of the area you want to mask
var mask = pathImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
// move path to the place you want to cover
mask = mask.transformed(by: CGAffineTransform(translationX: offsetX, y: offsetY))
// let mask fill the rest of the area
mask = mask.clampedToExtent()
// use mask as maskImage...
You should be able to reuse the pathImage for every frame, thereby avoiding Core Graphics and CPU-GPU synchronization.
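The snippet above leaves `scaleX`, `scaleY`, `offsetX` and `offsetY` undefined. One way they could be derived is sketched below, assuming the pre-rendered mask has its origin at zero and a known reference size, and that the area to cover is given as a rect in the image's coordinate space. `MaskPlacement` and `maskPlacement(referenceSize:target:)` are hypothetical names introduced only for this illustration.

```swift
import Foundation

/// Scale and offset that map a mask of `referenceSize` (origin at zero) onto a target rect.
/// Hypothetical type for illustration only.
struct MaskPlacement {
    let scaleX: CGFloat
    let scaleY: CGFloat
    let offsetX: CGFloat
    let offsetY: CGFloat
}

func maskPlacement(referenceSize: CGSize, target: CGRect) -> MaskPlacement {
    // The scale is applied first (about the origin), so after scaling the mask
    // still starts at (0, 0) and the translation is simply the target's origin.
    return MaskPlacement(
        scaleX: target.width / referenceSize.width,
        scaleY: target.height / referenceSize.height,
        offsetX: target.minX,
        offsetY: target.minY
    )
}

let placement = maskPlacement(referenceSize: CGSize(width: 100, height: 100),
                              target: CGRect(x: 50, y: 20, width: 200, height: 300))
print(placement.scaleX, placement.scaleY, placement.offsetX, placement.offsetY)
```

The order matters: if the translation were applied before the scale, the offset itself would be scaled as well.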

I'm having some trouble using x and y coordinates from touchesBegan as the center key in a CI filter

I'm trying to set things up so that users tap a location in an image view and the X,Y of the tap becomes the center point (kCIInputCenterKey) of the image filter currently in use.
These are my global variables:
var x: CGFloat = 0
var y: CGFloat = 0
var imgChecker = 0
This is my touchesBegan function that checks if the user is touching inside the image view or not, if not then sets the filter center key to the center of the image view:
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    if let touch = touches.first {
        let position = touch.location(in: self.imageView)
        if touch.view == imageView {
            print("touchesBegan | This is an ImageView")
            x = position.x * 4
            y = position.y * 4
            imgChecker = 1
        } else {
            print("touchesBegan | This is not an ImageView")
            x = 0
            y = 0
            imgChecker = 0
        }
        print("x: \(x)")
        print("y: \(y)")
    }
}
As you can see, I have the checker there to make the filter center appear in the middle of the image if the tap was not inside the image view. I'm also printing the tapped coordinates to Xcode's console, and they appear without issue.
This is the part where I apply my filter:
currentFilter = CIFilter(name: "CIBumpDistortion")
currentFilter.setValue(200, forKey: kCIInputRadiusKey)
currentFilter.setValue(1, forKey: kCIInputScaleKey)
if imgChecker == 1 {
    self.currentFilter.setValue(CIVector(x: self.x, y: self.y), forKey: kCIInputCenterKey)
} else {
    self.currentFilter.setValue(CIVector(x: currentImage.size.width / 2, y: currentImage.size.height / 2), forKey: kCIInputCenterKey)
}
x = 0
y = 0
let beginImage = CIImage(image: currentImage)
currentFilter.setValue(beginImage, forKey: kCIInputImageKey)
let cgimg = context.createCGImage(currentFilter.outputImage!, from: currentFilter.outputImage!.extent)
currentImage = UIImage(cgImage: cgimg!)
self.imageView.image = currentImage
This is the CGRect I'm using. Ignore the "frame" in there; it's just an image view in front of the first one that allows me to save a "frame" over the current filtered image:
func drawImagesAndText() {
let renderer = UIGraphicsImageRenderer(size: CGSize(width: imageView.bounds.size.width, height: imageView.bounds.size.height))
img = renderer.image { ctx in
let bgImage = currentImage
bgImage?.draw(in: CGRect(x: 0, y: 0, width: imageView.bounds.size.width, height: imageView.bounds.size.height))
frames = UIImage(named: framesAr)
frames?.draw(in: CGRect(x: 0, y: 0, width: imageView.bounds.size.width, height: imageView.bounds.size.height))
When I do set the x,y by tapping inside the image view, the center of the filter keeps appearing in the lower left-hand side of the image view regardless of where I tapped. If I keep tapping around the image view, the center does seem to move around a bit, but it's nowhere near where I'm actually tapping.
Any insight would be greatly appreciated, thank you.
Keep two things in mind.
First (and I think you probably know this), the CI origin (0,0) is lower left, not top left.
Second (and I think this is the issue) UIKit (meaning UIImage and potentially CGPoint coordinates) are not the same as CIVector coordinates. You need to take the UIKit touchesBegan coordinate and turn it into the CIImage.extent coordinate.
All coordinates that follow are X then Y, and Width then Height.
After posting my comment I thought I'd give an example of what I mean by scaling. Let's say you have a UIImageView sized at 250x250, using a content mode of AspectFit, displaying an image whose size is 1000x500.
Now, let's say the touchesBegan is CGPoint(200,100). (NOTE: If your UIImageView is part of a larger superview, it could be something more like 250,400 - I'm working on the point within the UIImageView.)
Scaling down the image size (remember, AspectFit) means the image is actually centered vertically (landscape appearing) within the UIImageView at CGRect(0, 62.5, 250, 125). So first off, good! The touch point not only began within the image view, it also began within the image. (You'll probably want to consider the not-so-edge case of touches beginning outside of the image.)
Dividing by 4 gives you the scaled down image view coordinates, and as you'd expect, multiplying up will give you the needed vector coordinates. So a touchesBegan CGPoint(200,100) turns into a CIVector(800,400).
I have some code written - not much in the way of comments, done in Swift 2 (I think) and very poorly written - that is part of a subclass (probably should have been an extension) of UIImageView that computes all this. Using the UIImageView's bounds and its image's size is what you need. Keep in mind - images in AspectFit can also be scaled up!
One last note on CIImage - extent. Many times it's a UIImage's size. But many masks and generated output may have an infinite extent.
I made a stupid mistake in my scaling example. Remember, the CIImage origin is bottom left, not upper left. So in my example a CGPoint(200,100), scaled to CGPoint(800,400), would be CIVector(800,100).
Apologies for the multiple/running edits, but it seems important. (Besides, only the last was due to my stupidity! Worthwhile to note, but still.)
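The scaling described above can be written out as a small, UIKit-free sketch. The function names `aspectFitRect(imageSize:viewSize:)` and `imagePoint(forTouch:imageSize:viewSize:)` are made up for illustration, and the sketch assumes the image view uses AspectFit and that the touch is already expressed in the image view's own coordinates. Note that it also subtracts the letterbox offset (the 62.5 in the example), which the simplified figures above leave out, so for the 250x250 view and 1000x500 image it yields (800, 350) rather than (800, 100).

```swift
import Foundation

/// Rect occupied by an image of `imageSize` shown aspect-fit inside a view of `viewSize`.
func aspectFitRect(imageSize: CGSize, viewSize: CGSize) -> CGRect {
    let scale = min(viewSize.width / imageSize.width, viewSize.height / imageSize.height)
    let size = CGSize(width: imageSize.width * scale, height: imageSize.height * scale)
    return CGRect(x: (viewSize.width - size.width) / 2,
                  y: (viewSize.height - size.height) / 2,
                  width: size.width,
                  height: size.height)
}

/// Maps a touch in view coordinates (top-left origin) to image coordinates with a
/// bottom-left origin, or returns nil if the touch falls outside the displayed image.
func imagePoint(forTouch touch: CGPoint, imageSize: CGSize, viewSize: CGSize) -> CGPoint? {
    let fitted = aspectFitRect(imageSize: imageSize, viewSize: viewSize)
    guard touch.x >= fitted.minX, touch.x <= fitted.maxX,
          touch.y >= fitted.minY, touch.y <= fitted.maxY else { return nil }
    // View points -> image pixels.
    let scale = imageSize.width / fitted.width
    let x = (touch.x - fitted.minX) * scale
    let yFromTop = (touch.y - fitted.minY) * scale
    // Flip Y because Core Image measures from the bottom edge.
    return CGPoint(x: x, y: imageSize.height - yFromTop)
}

let mapped = imagePoint(forTouch: CGPoint(x: 200, y: 100),
                        imageSize: CGSize(width: 1000, height: 500),
                        viewSize: CGSize(width: 250, height: 250))
print(mapped as Any)
```

The resulting point could then be wrapped in a CIVector and passed as the filter's kCIInputCenterKey value.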
Now we're talking "near real time" updating using a Core Image filter. I'm planning to eventually have some blog posts on this, but the real source you want is Simon Gladman (he's moved on, look back to his posts in 2015-16), and his eBook Core Image for Swift (uses Swift 2 but most is automatically upgraded to Swift 3). Just giving credit where it is due.
If you want "near real time" usage of Core Image, you need to use the GPU. UIView and all its subclasses (meaning UIKit) use the CPU. That's okay; using the GPU here means using OpenGL ES through GLKit, and specifically a GLKView. It's the GPU-backed equivalent of a UIImageView.
Here's my subclass of it:
open class GLKViewDFD: GLKView {
var renderContext: CIContext
var myClearColor:UIColor!
var rgb:(Int?,Int?,Int?)!
open var image: CIImage! {
didSet {
public var clearColor: UIColor! {
didSet {
myClearColor = clearColor
public init() {
let eaglContext = EAGLContext(api: .openGLES2)
renderContext = CIContext(eaglContext: eaglContext!)
context = eaglContext!
override public init(frame: CGRect, context: EAGLContext) {
renderContext = CIContext(eaglContext: context)
super.init(frame: frame, context: context)
enableSetNeedsDisplay = true
public required init?(coder aDecoder: NSCoder) {
let eaglContext = EAGLContext(api: .openGLES2)
renderContext = CIContext(eaglContext: eaglContext!)
super.init(coder: aDecoder)
context = eaglContext!
enableSetNeedsDisplay = true
override open func draw(_ rect: CGRect) {
if let image = image {
let imageSize = image.extent.size
var drawFrame = CGRect(x: 0, y: 0, width: CGFloat(drawableWidth), height: CGFloat(drawableHeight))
let imageAR = imageSize.width / imageSize.height
let viewAR = drawFrame.width / drawFrame.height
if imageAR > viewAR {
drawFrame.origin.y += (drawFrame.height - drawFrame.width / imageAR) / 2.0
drawFrame.size.height = drawFrame.width / imageAR
} else {
drawFrame.origin.x += (drawFrame.width - drawFrame.height * imageAR) / 2.0
drawFrame.size.width = drawFrame.height * imageAR
rgb = (0,0,0)
rgb = myClearColor.rgb()
glClearColor(Float(rgb.0!)/256.0, Float(rgb.1!)/256.0, Float(rgb.2!)/256.0, 0.0);
// set the blend mode to "source over" so that CI will use that
glBlendFunc(1, 0x0303);
renderContext.draw(image, in: drawFrame, from: image.extent)
A few notes.
I absolutely need to credit for much of this. This is also a great resource for Swift and UIKit coding.
I wanted AspectFit content mode with the potential to change the "backgroundColor" of the GLKView, which is why I subclassed it and called the property clearColor.
Between the two resources I linked to, you should have what you need to have a good performing, near real time use of Core Image, using the GPU. One reason my afore-mentioned code to use scaling after getting the output of a filter was never updated? It didn't need it.
Lots here to process, I know. But I've found this side of things (Core Image effects) to be the most fun side (and pretty cool too) of iOS.

SKCropNode Strange Behaviour

When using SKCropNode, I wanted the image I add to the crop node to adjust each pixel's alpha value in accordance with the corresponding mask pixel's alpha value.
After a lot of research, I concluded that the image pixel alpha values were not going to adjust to the mask. However, while continuing with my project, I noticed that one specific crop node image's pixels were in fact fading to the mask pixel alpha values, which was great! But after reproducing this, I don't know why it happens.
import SpriteKit
var textureArray: [SKTexture] = []
var display: SKSpriteNode!
class GameScene: SKScene {
override func didMoveToView(view: SKView) {
anchorPoint = CGPointMake(0.5, 0.5)
backgroundColor = UIColor.greenColor()
display = SKSpriteNode()
let image = SKSpriteNode(texture: textureArray[0])
let randomCropNode = SKCropNode()
let cropNode = SKCropNode()
cropNode.maskNode = display
let fill = SKSpriteNode(color: UIColor.whiteColor(), size: frame.size)
cropNode.zPosition = 10
func fetchTexures() {
var x: Int = 0
while x < 1 {
let texture: SKTexture = SKTextureAtlas(named: "texture").textureNamed("\(x)")
x += 1
The above code gives me my desired effect. However, if you remove the line below, the image pixel alpha values no longer adjust in accordance with the mask. The line below is not actually used in my project, but it's the only way I can make the pixel alpha values adjust.
let randomCropNode = SKCropNode()
Can anybody see what is causing this behaviour, or whether there is a better way of getting my desired effect?
If you remove:
let randomCropNode = SKCropNode()
Crop node will only turn on and off pixels if the alpha varies between <.5 (off) and >=.5(on)
However to apply a fade, if your alpha mask is just black(with various alpha levels) and transparent, you apply the mask as a regular texture to your crop node, and you let alpha blending take care of the fade effect.
As for your issues with the code, are you sure your crop node is cropping, and not just rendering the texture? I do not know what the texture looks like to try and reproduce this.
The node supplied to the crop node must not be a child of another
node; however, it may have children of its own.
When the crop node’s contents are rendered, the crop node first draws
its mask into a private buffer. Then, it renders its children. When
rendering its children, each pixel is verified against the
corresponding pixel in the mask. If the pixel in the mask has an alpha
value of less than 0.05, the image pixel is masked out. Any pixel not
rendered by the mask node is automatically masked out.
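The difference between hard masking and a fade can be illustrated with a purely conceptual sketch. This is not how SpriteKit is implemented; the function names are hypothetical, and the default threshold of 0.05 is taken from the documentation quoted above (the answer text mentions .5, so it is left as a parameter).

```swift
import Foundation

/// Hard masking as described in the quoted documentation:
/// a pixel is either kept unchanged or dropped entirely.
func thresholdMaskedAlpha(imageAlpha: Double, maskAlpha: Double, threshold: Double = 0.05) -> Double {
    return maskAlpha < threshold ? 0 : imageAlpha
}

/// Soft fade: the mask alpha scales the image alpha, as ordinary alpha blending would.
func blendedAlpha(imageAlpha: Double, maskAlpha: Double) -> Double {
    return imageAlpha * maskAlpha
}

// A mask pixel at 30% alpha keeps the image pixel fully opaque under hard masking,
// but fades it under blending.
print(thresholdMaskedAlpha(imageAlpha: 1.0, maskAlpha: 0.3))
print(blendedAlpha(imageAlpha: 1.0, maskAlpha: 0.3))
```

Under this model a crop node alone cannot produce a gradual fade, which is why the answer suggests relying on alpha blending for that effect.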