Cropping/Compositing An Image With Vision/CoreImage - swift

I am working with the Vision framework in iOS 13 and am trying to achieve the following tasks:
1. Take an image (in this case, a CIImage) and locate all faces in the image using Vision.
2. Crop each face into its own CIImage (I'll call this a "face image").
3. Filter each face image using a Core Image filter, such as a blur or comic book effect.
4. Composite the face image back over the original image, thereby creating effects that only apply to the face.
A concrete example of the end goal: take a live camera feed from an AVCaptureSession, blur every face in each video frame, and composite the blurred faces back over the original image for saving.
I almost have this working, save for the fact that there seems to be a coordinate/translation issue. For example, when I test this code and move my face, the "blurred" section goes the wrong direction (if I turn my face right, the box goes left; if I look up, the box goes down). While I think this may have something to do with mirroring on the front-facing camera, I can't seem to figure out what I should try next:
func drawFaceBox(bufferImage: CIImage, observations: [VNFaceObservation]) -> CVPixelBuffer? {
    // The filter
    let blur = CIFilter(name: "CICrystallize")
    // The unfiltered image, prepared for filtering
    var filteredImage = bufferImage
    // Find and crop each face
    if !observations.isEmpty {
        for face in observations {
            let faceRect = VNImageRectForNormalizedRect(face.boundingBox,
                                                        Int(bufferImage.extent.size.width),
                                                        Int(bufferImage.extent.size.height))
            let croppedFace = bufferImage.cropped(to: faceRect)
            blur?.setValue(croppedFace, forKey: kCIInputImageKey)
            blur?.setValue(10.0, forKey: kCIInputRadiusKey)
            if let blurred = blur?.value(forKey: kCIOutputImageKey) as? CIImage {
                compositorCIFilter?.setValue(blurred, forKey: kCIInputImageKey)
                compositorCIFilter?.setValue(filteredImage, forKey: kCIInputBackgroundImageKey)
                if let output = compositorCIFilter?.value(forKey: kCIOutputImageKey) as? CIImage {
                    filteredImage = output
                }
            }
        }
    }
    // Convert image to CVPixelBuffer and return. This part works fine.
}
Any thoughts on how I can composite the blurred face image(s) back to their original position with accuracy? Or any other approach to only filter part of the original CIImage to avoid this issue altogether/save processing? Thanks!
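If the front-camera mirroring theory is right, one possible compensation is to flip the normalized bounding box horizontally before converting it to image coordinates. This is only a rough, untested sketch; the isVideoMirrored check is one way to detect mirroring and assumes the AVCaptureConnection (or a flag derived from it) is available wherever the faces are drawn:

// Sketch: flip the normalized Vision rect horizontally when the buffer is
// mirrored (e.g. front camera), then convert to image coordinates as before.
var box = face.boundingBox
if connection.isVideoMirrored {   // assumption: the capture connection is accessible here
    box.origin.x = 1.0 - box.origin.x - box.size.width
}
let faceRect = VNImageRectForNormalizedRect(box,
                                            Int(bufferImage.extent.size.width),
                                            Int(bufferImage.extent.size.height))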

I believe this issue stems from an orientation problem earlier in the pipeline (specifically, during the output of the sample buffers from the camera, which is where the Vision task was instantiated). I have updated my didOutputSampleBuffer code like so:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    ...
    // Setup the current device orientation
    let curDeviceOrientation = UIDevice.current.orientation
    // Handle the image property orientation
    //let orientation = self.exifOrientation(from: curDeviceOrientation)
    // Setup the image request handler
    //let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation(rawValue: UInt32(1))!)
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    // Setup the completion handler
    let completion: VNRequestCompletionHandler = { request, error in
        let observations = request.results as! [VNFaceObservation]
        // Draw faces
        DispatchQueue.main.async {
            // HANDLE FACES
            self.drawFaceBoxes(for: observations)
        }
    }
    // Setup the image request
    let request = VNDetectFaceRectanglesRequest(completionHandler: completion)
    // Handle the request
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
As noted, I have commented out the let orientation = ... and the first let handler = ..., which was using the orientation. By removing the reference to the orientation, I seem to have removed any issue with orientation in the Vision calculations.
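Dropping the orientation works here, but if the commented-out exifOrientation(from:) helper ever needs to come back (for example, to support device rotation), it is usually a small mapping along the lines below. This is only a sketch of a mapping commonly used for a front-facing camera; the exact cases depend on your camera position and connection settings, so verify it against your own setup:

import UIKit
import ImageIO

// Sketch of an exifOrientation(from:) helper for a front-facing camera.
// Treat the mapping as a starting point, not a definitive table.
func exifOrientation(from deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portraitUpsideDown:
        return .rightMirrored
    case .landscapeLeft:
        return .downMirrored
    case .landscapeRight:
        return .upMirrored
    default:
        return .leftMirrored   // portrait and unknown orientations
    }
}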

Related

CIGaussianBlur shrinks UIImageView

Using CIGaussianBlur causes UIImageView to apply the blur from the border in, making the image appear to shrink (right image). Using .blur on a SwiftUI view does the opposite; the blur is applied from the border outwards (left image). This is the effect I’m trying to achieve in UIKit. How can I go about this?
I've seen a few posts about using CIAffineClamp, but that causes the blur to stop at the image border, which is not what I want.
private let context = CIContext()
private let filter = CIFilter(name: "CIGaussianBlur")!

private func createBluredImage(using image: UIImage, value: CGFloat) -> UIImage? {
    let beginImage = CIImage(image: image)
    filter.setValue(beginImage, forKey: kCIInputImageKey)
    filter.setValue(value, forKey: kCIInputRadiusKey)
    guard
        let outputImage = filter.outputImage,
        let cgImage = context.createCGImage(outputImage, from: outputImage.extent)
    else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}
When I used CIGaussianBlur I wanted my output image to be contained inside the image frame, so I used CIAffineClamp on the image before applying the blur, as you describe.
You might need to render your source image into a larger frame, clamp to that larger frame using CIAffineClamp, apply your blur filter, then load the resulting blurred output image. Core Image is a bit of a pain to set up and figure out, so I don’t have a full solution ready for you, but that’s what I would suggest.
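To make that a little more concrete, here is a rough sketch of the padding idea, assuming a transparent border of a few blur radii gives enough head-room for the effect. The padding factor is a guess to tune, and since Core Image already treats the area outside an image's extent as transparent, the clamp step may even be unnecessary here:

import CoreImage
import UIKit

private let context = CIContext()
private let blurFilter = CIFilter(name: "CIGaussianBlur")!

// Sketch: pad the source into a larger, transparent frame, blur, then crop to
// the padded frame so the blur spreads outward past the original border
// instead of appearing to shrink the image.
private func outwardBlurredImage(from image: UIImage, radius: CGFloat) -> UIImage? {
    guard let source = CIImage(image: image) else { return nil }

    // 1. Build a larger frame: the original extent plus padding on every edge.
    let padding = radius * 3   // assumption: 3x the radius is enough room
    let paddedExtent = source.extent.insetBy(dx: -padding, dy: -padding)
    let padded = source.composited(over: CIImage(color: .clear).cropped(to: paddedExtent))

    // 2. Clamp the padded frame so sampling beyond it stays defined, then blur.
    blurFilter.setValue(padded.clampedToExtent(), forKey: kCIInputImageKey)
    blurFilter.setValue(radius, forKey: kCIInputRadiusKey)

    // 3. Crop the blurred output back to the padded frame and render it.
    guard let blurred = blurFilter.outputImage?.cropped(to: paddedExtent),
          let cgImage = context.createCGImage(blurred, from: paddedExtent) else { return nil }
    return UIImage(cgImage: cgImage)
}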

What is the best way to utilize memory inside of "session(_:didUpdate:)" method?

My use case is that I want to recognize various gestures of a hand (the first hand) seen by the camera. I am able to find body anchors, hand anchors, and poses. See my video here.
I am trying to use previous SIMD3 position information to work out what kind of gesture was performed. I did see the example posted by Apple that shows pinching to write virtually, but I am not sure that a buffer is the right solution for something like this.
A specific example of what I am trying to do is detect a swipe, long-press, or tap, as if the user were wearing a pair of AR glasses (made by Apple one day). To clarify, I want to raycast from my hand and perform a gesture on an Entity or Anchor.
Here is a snippet for those of you that want to know how to get body anchors:
public func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let capturedImage = frame.capturedImage
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: capturedImage,
                                                    orientation: .right,
                                                    options: [:])
    let handPoseRequest = VNDetectHumanHandPoseRequest()
    //let bodyPoseRequest = VNDetectHumanBodyPoseRequest()
    do {
        try imageRequestHandler.perform([handPoseRequest])
        guard let observation = handPoseRequest.results?.first else {
            return
        }
        // Get points for thumb and index finger.
        let thumbPoints = try observation.recognizedPoints(.thumb)
        let indexFingerPoints = try observation.recognizedPoints(.indexFinger)
        let pinkyFingerPoints = try observation.recognizedPoints(.littleFinger)
        let ringFingerPoints = try observation.recognizedPoints(.ringFinger)
        let middleFingerPoints = try observation.recognizedPoints(.middleFinger)
        self.detectHandPose(handObservations: observation)
    } catch {
        print("Failed to perform image request.")
    }
}
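As for the buffer question itself, no answer is posted here, but the kind of structure being described is often just a small sliding window of recent joint positions. A minimal sketch (the capacity and the displacement heuristic are arbitrary illustrative choices, not anything from Apple's sample):

import simd

// Sketch: keep only the last N positions of a joint and use the oldest and
// newest samples to estimate motion for gesture classification.
struct PositionHistory {
    private(set) var samples: [SIMD3<Float>] = []
    let capacity: Int

    init(capacity: Int = 30) {   // assumption: ~1 second of history at 30 fps
        self.capacity = capacity
    }

    mutating func append(_ position: SIMD3<Float>) {
        samples.append(position)
        if samples.count > capacity {
            samples.removeFirst(samples.count - capacity)
        }
    }

    /// Net displacement across the stored window; a large horizontal
    /// component could be treated as a swipe candidate, for example.
    var displacement: SIMD3<Float>? {
        guard let first = samples.first, let last = samples.last else { return nil }
        return last - first
    }
}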

captureOutput stops being called after switching CIFilters

I use this function to change the CIFilter on my camera preview. It works as it should, but somehow, after switching several filters, captureOutput stops being called and the preview is stuck on the last image captured. It is not bailing out at my "guard let filter" statement. The app does not crash; when I close the camera and reopen it, it works again.
How can I prevent that behaviour?
func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
    guard let filter = Filters[FilterNames[currentFilter]] else {
        return
    }
    let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    let cameraImage = CIImage(CVPixelBuffer: pixelBuffer!)
    filter!.setValue(cameraImage, forKey: kCIInputImageKey)
    let filteredImage = UIImage(CIImage: filter!.valueForKey(kCIOutputImageKey) as! CIImage!)
    dispatch_async(dispatch_get_main_queue()) {
        self.imageView.image = filteredImage
    }
}
I guess the system can't keep up with the rendering of the images. UIImageView is not meant to display new images at 30 frames per second while also adding the filtering on top of that.
A much more efficient way would be to render directly into an MTKView. I encourage you to check out the AVCamFilter example project to see how this can be done.
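For a rough idea of what that looks like in practice (a simplified sketch, not the actual AVCamFilter code; aspect-ratio handling, pixel formats, and thread safety are glossed over), you render each filtered CIImage straight into the MTKView's drawable with a Metal-backed CIContext instead of building a UIImage per frame:

import MetalKit
import CoreImage

// Sketch: a delegate that draws the latest filtered CIImage into an MTKView.
final class FilteredPreviewRenderer: NSObject, MTKViewDelegate {
    private let device = MTLCreateSystemDefaultDevice()!
    private lazy var commandQueue = device.makeCommandQueue()!
    private lazy var ciContext = CIContext(mtlDevice: device)
    private let colorSpace = CGColorSpaceCreateDeviceRGB()

    /// Set this from the capture callback; the next draw picks it up.
    var currentImage: CIImage?

    func configure(_ view: MTKView) {
        view.device = device
        view.delegate = self
        view.framebufferOnly = false   // Core Image needs to write into the drawable
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let image = currentImage,
              let drawable = view.currentDrawable,
              let commandBuffer = commandQueue.makeCommandBuffer() else { return }
        // Scale the image to the drawable width; full aspect handling is omitted here.
        let scale = view.drawableSize.width / image.extent.width
        let scaled = image.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
        ciContext.render(scaled,
                         to: drawable.texture,
                         commandBuffer: commandBuffer,
                         bounds: CGRect(origin: .zero, size: view.drawableSize),
                         colorSpace: colorSpace)
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}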

How do you add an overlay while recording a video in Swift?

I am trying to record, and then save, a video in Swift using AVFoundation. This works. I am also trying to add an overlay, such as a text label containing the date, to the video.
For example: the video saved is not only what the camera sees, but the timestamp as well.
Here is how I am saving the video:
func fileOutput(_ output: AVCaptureFileOutput, didFinishRecordingTo outputFileURL: URL, from connections: [AVCaptureConnection], error: Error?) {
    saveVideo(toURL: movieURL!)
}

private func saveVideo(toURL url: URL) {
    PHPhotoLibrary.shared().performChanges({
        PHAssetChangeRequest.creationRequestForAssetFromVideo(atFileURL: url)
    }) { (success, error) in
        if success {
            print("Video saved to Camera Roll.")
        } else {
            print("Video failed to save.")
        }
    }
}
I have a movieOuput that is an AVCaptureMovieFileOutput. My preview layer does not contain any sublayers. I tried adding the timestamp label's layer to the previewLayer, but this did not succeed.
I have tried Ray Wenderlich's example as well as this Stack Overflow question. Lastly, I also tried this tutorial, all to no avail.
How can I add an overlay to my video that is in the saved video in the camera roll?
Without more information, it sounds like what you are asking for is a WATERMARK, not an overlay.
A watermark is a markup on the video that will be saved with the video.
An overlay is generally shown as subviews on the preview layer and will not be saved with the video.
Check this out here: https://stackoverflow.com/a/47742108/8272698
func addWatermark(inputURL: URL, outputURL: URL, handler: @escaping (_ exportSession: AVAssetExportSession?) -> Void) {
    let mixComposition = AVMutableComposition()
    let asset = AVAsset(url: inputURL)
    let videoTrack = asset.tracks(withMediaType: AVMediaType.video)[0]
    let timerange = CMTimeRangeMake(kCMTimeZero, asset.duration)
    let compositionVideoTrack: AVMutableCompositionTrack = mixComposition.addMutableTrack(withMediaType: AVMediaType.video, preferredTrackID: CMPersistentTrackID(kCMPersistentTrackID_Invalid))!
    do {
        try compositionVideoTrack.insertTimeRange(timerange, of: videoTrack, at: kCMTimeZero)
        compositionVideoTrack.preferredTransform = videoTrack.preferredTransform
    } catch {
        print(error)
    }
    let watermarkFilter = CIFilter(name: "CISourceOverCompositing")!
    let watermarkImage = CIImage(image: UIImage(named: "waterMark")!)
    let videoComposition = AVVideoComposition(asset: asset) { (filteringRequest) in
        let source = filteringRequest.sourceImage.clampedToExtent()
        watermarkFilter.setValue(source, forKey: "inputBackgroundImage")
        let transform = CGAffineTransform(translationX: filteringRequest.sourceImage.extent.width - (watermarkImage?.extent.width)! - 2, y: 0)
        watermarkFilter.setValue(watermarkImage?.transformed(by: transform), forKey: "inputImage")
        filteringRequest.finish(with: watermarkFilter.outputImage!, context: nil)
    }
    guard let exportSession = AVAssetExportSession(asset: asset, presetName: AVAssetExportPreset640x480) else {
        handler(nil)
        return
    }
    exportSession.outputURL = outputURL
    exportSession.outputFileType = AVFileType.mp4
    exportSession.shouldOptimizeForNetworkUse = true
    exportSession.videoComposition = videoComposition
    exportSession.exportAsynchronously { () -> Void in
        handler(exportSession)
    }
}
And here's how to call the function.
let outputURL = NSURL.fileURL(withPath: "TempPath")
let inputURL = NSURL.fileURL(withPath: "VideoWithWatermarkPath")
addWatermark(inputURL: inputURL, outputURL: outputURL, handler: { (exportSession) in
    guard let session = exportSession else {
        // Error
        return
    }
    switch session.status {
    case .completed:
        guard NSData(contentsOf: outputURL) != nil else {
            // Error
            return
        }
        // Now you can find the video with the watermark in the location outputURL
    default:
        // Error
        break
    }
})
Let me know if this code works for you.
It is in Swift 3, so some changes will be needed.
I am currently using this code in an app of mine; I have not updated it to Swift 5 yet.
I do not have an actual development environment for Swift that can utilize AVFoundation. Thus, I can't provide you with any example code.
For adding metadata (date, location, timestamp, watermark, frame rate, etc.) as an overlay to the video while recording, you would have to process the video feed frame by frame, live, while recording. Most likely you would have to store the frames in a buffer and process them before actually recording them.
Now, when it comes to the metadata, there are two types: static and dynamic. For the static type, such as a watermark, it should be easy enough, as all the frames get the same thing.
However, for dynamic metadata such as a timestamp or GPS location, there are a few things that need to be taken into consideration. It takes computational power and time to process the video frames. Thus, depending on the type of dynamic data and how you obtain it, the processed value may sometimes not be correct. For example, suppose you get a frame at 1:00:01, process it, and it takes 2 seconds to add the timestamp. The next frame arrives at 1:00:02, but you can't process it until 1:00:03 because processing the previous frame took 2 seconds. Depending on how you obtain the timestamp for that new frame, the timestamp value may not be the value you wanted.
For processing dynamic metadata, you should also take hardware lag into consideration. For example, say the software is supposed to add live GPS location data to each frame, and there weren't any lags in development or in testing. However, in real life, a user uses the software in an area with a bad connection, and their phone lags while obtaining the GPS location; some of the lags last as long as 5 seconds. What do you do in that situation? Do you set a timeout for the GPS location and use the last good position? Do you report the error? Do you defer the frame to be processed later when the GPS data becomes available (this may ruin live recording), or use an expensive algorithm to try to predict the user's location for that frame?
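One way around the timestamp drift described above is to read the time from the frame itself when it arrives, rather than from the clock when processing finishes. A small sketch of that idea (the TimedFrame type and FrameTimestamper class are just illustrations, not part of any framework):

import AVFoundation

// Sketch: stamp each frame with its own presentation time as it arrives, so a
// slow processing step later cannot skew the time shown in the overlay.
struct TimedFrame {
    let pixelBuffer: CVPixelBuffer
    let presentationTime: CMTime   // taken from the sample buffer itself
}

final class FrameTimestamper: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    var handleFrame: ((TimedFrame) -> Void)?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Read the timestamp now, not after processing/drawing has finished.
        let frame = TimedFrame(pixelBuffer: pixelBuffer,
                               presentationTime: CMSampleBufferGetPresentationTimeStamp(sampleBuffer))
        handleFrame?(frame)
    }
}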
Besides those considerations, I have some references here that I think may help you. The one from medium.com looked pretty good to me.
https://medium.com/ios-os-x-development/ios-camera-frames-extraction-d2c0f80ed05a
Adding watermark to currently recording video and save with watermark
Render dynamic text onto CVPixelBufferRef while recording video
Adding on to @Kevin Ng, you can do an overlay on video frames with a UIViewController and a UIView.
UIViewController will have:
A property to work with the video stream:
private var videoSession: AVCaptureSession?
A property to work with the overlay (the UIView class):
private var myOverlay: MyUIView { view as! MyUIView }
A property to work with the video output queue:
private let videoOutputQueue = DispatchQueue(label: "outputQueue", qos: .userInteractive)
A method to create the video session
A method to process and display the overlay
UIView will have the task-specific helper methods needed to act as the overlay. For example, if you are doing hand detection, this overlay class can have helper methods to draw points at given coordinates (the ViewController class detects the coordinates of hand features, does the necessary coordinate conversions, then passes the coordinates to the UIView class to display them as an overlay).
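A bare-bones sketch of such an overlay view (the class name, point size, and color are arbitrary; the view controller is assumed to have already converted detected coordinates into this view's coordinate space):

import UIKit

// Sketch: a transparent view that draws whatever points it is handed on top
// of the camera preview.
final class PointsOverlayView: UIView {
    var points: [CGPoint] = [] {
        didSet { setNeedsDisplay() }
    }

    override init(frame: CGRect) {
        super.init(frame: frame)
        backgroundColor = .clear          // let the camera preview show through
        isUserInteractionEnabled = false
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        backgroundColor = .clear
        isUserInteractionEnabled = false
    }

    override func draw(_ rect: CGRect) {
        guard let context = UIGraphicsGetCurrentContext() else { return }
        context.setFillColor(UIColor.systemGreen.cgColor)
        for point in points {
            context.fillEllipse(in: CGRect(x: point.x - 4, y: point.y - 4, width: 8, height: 8))
        }
    }
}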

Combining CoreML and ARKit

I am trying to combine CoreML and ARKit in my project, using the InceptionV3 model provided on Apple's website.
I am starting from the standard template for ARKit (Xcode 9 beta 3)
Instead of instantiating a new camera session, I reuse the session that has been started by the ARSCNView.
At the end of my viewDelegate, I write:
sceneView.session.delegate = self
I then extend my viewController to conform to the ARSessionDelegate protocol (optional protocol)
// MARK: ARSessionDelegate
extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        do {
            let prediction = try self.model.prediction(image: frame.capturedImage)
            DispatchQueue.main.async {
                if let prob = prediction.classLabelProbs[prediction.classLabel] {
                    self.textLabel.text = "\(prediction.classLabel) \(String(describing: prob))"
                }
            }
        } catch let error as NSError {
            print("Unexpected error ocurred: \(error.localizedDescription).")
        }
    }
}
At first I tried that code, but then noticed that Inception requires a pixel buffer of type Image <RGB, 299, 299>.
Although not recommended, I thought I would just resize my frame and then try to get a prediction out of it. I am resizing using this function (taken from https://github.com/yulingtianxia/Core-ML-Sample):
func resize(pixelBuffer: CVPixelBuffer) -> CVPixelBuffer? {
    let imageSide = 299
    var ciImage = CIImage(cvPixelBuffer: pixelBuffer, options: nil)
    let transform = CGAffineTransform(scaleX: CGFloat(imageSide) / CGFloat(CVPixelBufferGetWidth(pixelBuffer)),
                                      y: CGFloat(imageSide) / CGFloat(CVPixelBufferGetHeight(pixelBuffer)))
    ciImage = ciImage.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: imageSide, height: imageSide))
    let ciContext = CIContext()
    var resizeBuffer: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, imageSide, imageSide, CVPixelBufferGetPixelFormatType(pixelBuffer), nil, &resizeBuffer)
    ciContext.render(ciImage, to: resizeBuffer!)
    return resizeBuffer
}
Unfortunately, this is not enough to make it work. This is the error that is caught:
Unexpected error ocurred: Input image feature image does not match model description.
2017-07-20 AR+MLPhotoDuplicatePrediction[928:298214] [core]
Error Domain=com.apple.CoreML Code=1
"Input image feature image does not match model description"
UserInfo={NSLocalizedDescription=Input image feature image does not match model description,
NSUnderlyingError=0x1c4a49fc0 {Error Domain=com.apple.CoreML Code=1
"Image is not expected type 32-BGRA or 32-ARGB, instead is Unsupported (875704422)"
UserInfo={NSLocalizedDescription=Image is not expected type 32-BGRA or 32-ARGB, instead is Unsupported (875704422)}}}
Not sure what I can do from here.
If there is any better suggestion to combine both, I'm all ears.
Edit: I also tried the resizePixelBuffer method from YOLO-CoreML-MPSNNGraph suggested by @dfd; the error is exactly the same.
Edit2: So I changed the pixel format to be kCVPixelFormatType_32BGRA (not the same format as the pixelBuffer passed in the resizePixelBuffer).
let pixelFormat = kCVPixelFormatType_32BGRA // line 48
I do not have the error anymore. But as soon as I try to make a prediction, the AVCaptureSession stops. It seems I am running into the same issue Enric_SA describes on the Apple developer forums.
Edit3: So I tried implementing rickster's solution. It works well with InceptionV3. I wanted to try a feature observation (VNClassificationObservation). At this time, it is not working with TinyYOLO; the bounding boxes are wrong. Trying to figure it out.
Don't process images yourself to feed them to Core ML. Use Vision. (No, not that one. This one.) Vision takes an ML model and any of several image types (including CVPixelBuffer) and automatically gets the image to the right size and aspect ratio and pixel format for the model to evaluate, then gives you the model's results.
Here's a rough skeleton of the code you'd need:
var request: VNRequest

func setup() {
    let model = try VNCoreMLModel(for: MyCoreMLGeneratedModelClass().model)
    request = VNCoreMLRequest(model: model, completionHandler: myResultsMethod)
}

func classifyARFrame() {
    let handler = VNImageRequestHandler(cvPixelBuffer: session.currentFrame.capturedImage,
                                        orientation: .up) // fix based on your UI orientation
    handler.perform([request])
}

func myResultsMethod(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation]
        else { fatalError("huh") }
    for classification in results {
        print(classification.identifier, // the scene label
              classification.confidence)
    }
}
See this answer to another question for some more pointers.
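One small addition to the skeleton above: VNCoreMLRequest also exposes imageCropAndScaleOption, which controls how Vision fits the incoming frame to the model's input size, so it is worth setting explicitly if the crop behavior surprises you:

// In setup(), right after creating the VNCoreMLRequest:
let coreMLRequest = VNCoreMLRequest(model: model, completionHandler: myResultsMethod)
coreMLRequest.imageCropAndScaleOption = .centerCrop   // or .scaleFit / .scaleFill
request = coreMLRequest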