AVCaptureVideoPreviewLayer does not detect objects in two ranges of the screen - swift

I downloaded Apple's sample project on recognizing objects in live capture.
When I tried the app, I noticed that if I put the object to recognize at the top or the bottom of the camera view, the app doesn't recognize the object:
In this first image the banana is in the center of the camera view and the app is able to recognize it.
[image: object in center]
In these two images the banana is near the camera view's border and the app is not able to recognize the object.
[image: object on top]
[image: object on bottom]
This is how the session and previewLayer are set up:
func setupAVCapture() {
    var deviceInput: AVCaptureDeviceInput!

    // Select a video device, make an input
    let videoDevice = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: .back).devices.first
    do {
        deviceInput = try AVCaptureDeviceInput(device: videoDevice!)
    } catch {
        print("Could not create video device input: \(error)")
        return
    }

    session.beginConfiguration()
    session.sessionPreset = .vga640x480 // Model image size is smaller.

    // Add a video input
    guard session.canAddInput(deviceInput) else {
        print("Could not add video device input to the session")
        session.commitConfiguration()
        return
    }
    session.addInput(deviceInput)

    if session.canAddOutput(videoDataOutput) {
        session.addOutput(videoDataOutput)
        // Add a video data output
        videoDataOutput.alwaysDiscardsLateVideoFrames = true
        videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
        videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
    } else {
        print("Could not add video data output to the session")
        session.commitConfiguration()
        return
    }

    let captureConnection = videoDataOutput.connection(with: .video)
    // Always process the frames
    captureConnection?.isEnabled = true
    do {
        try videoDevice!.lockForConfiguration()
        let dimensions = CMVideoFormatDescriptionGetDimensions((videoDevice?.activeFormat.formatDescription)!)
        bufferSize.width = CGFloat(dimensions.width)
        bufferSize.height = CGFloat(dimensions.height)
        videoDevice!.unlockForConfiguration()
    } catch {
        print(error)
    }

    session.commitConfiguration()

    previewLayer = AVCaptureVideoPreviewLayer(session: session)
    previewLayer.videoGravity = AVLayerVideoGravity.resizeAspectFill
    rootLayer = previewView.layer
    previewLayer.frame = rootLayer.bounds
    rootLayer.addSublayer(previewLayer)
}
You can download the project here.
I am wondering whether this is normal or not.
Is there any solution to fix it?
Does it take square photos to process with Core ML, so that the two ranges are not included?
Any hints? Thanks

That's probably because the imageCropAndScaleOption is set to .centerCrop.
The Core ML model expects a square image, but the video frames are not square, so with center crop only the middle square of each frame is passed to the model and the top and bottom strips are never seen by it. This can be fixed by changing the imageCropAndScaleOption on the VNCoreMLRequest (for example to .scaleFill). However, the results may not be as good as with center crop (it depends on how the model was originally trained).
See also VNImageCropAndScaleOption in the Apple docs.
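For reference, the change goes where the sample builds its Vision request. In this sketch, visionModel stands for the VNCoreMLModel the sample already creates; only the last line is the actual fix, and .scaleFill is just one possible choice (it feeds the whole frame to the model, at the cost of squashing its aspect ratio):

import Vision

// visionModel is assumed to be the VNCoreMLModel the sample builds
// from its .mlmodel file; the completion handler stays as it was.
let objectRecognition = VNCoreMLRequest(model: visionModel) { request, error in
    // ...handle the VNRecognizedObjectObservation results as before...
}
// Default is .centerCrop, which discards the top and bottom of the
// 640x480 frame; .scaleFill passes the whole frame to the model.
objectRecognition.imageCropAndScaleOption = .scaleFill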

Related

ARVideoNode in RealityKit

I was experimenting with OpenAI's ChatGPT, and when I asked it for code to play a video in a RealityKit AR scene when a reference image is tracked, it used ARVideoNode instead of my expected AVPlayer and VideoMaterial solution. It even gave me an answer as to why ARVideoNode is better than an AVPlayer on a VideoMaterial when I asked, but I have never heard of ARVideoNode in RealityKit.
Am I missing something, or is it just a flaw in the AI?
import RealityKit

// Set up image tracking
let configuration = ARImageTrackingConfiguration()
configuration.detectionImages = ["reference-image-1", "reference-image-2"]
arView.session.run(configuration)

// Create a dictionary to map reference image names to video file names
let videoFileNames = ["reference-image-1": "video-1.mp4",
                      "reference-image-2": "video-2.mp4"]

// Track the reference images and display the corresponding
// videos on top of them
var videoNodes = [String: ARVideoNode]()
arView.scene.subscribe(to: ARImageAnchor.self) { (anchor: ARImageAnchor) in
    // Get the video file name for the tracked image
    guard let videoFileName = videoFileNames[anchor.name] else { return }

    // Load the video file
    let videoURL = URL(fileURLWithPath: "path/to/\(videoFileName)")
    let videoAsset = VideoAsset(url: videoURL)

    // Create an ARVideoNode and add it to the scene
    let videoNode = ARVideoNode(asset: videoAsset)
    arView.scene.anchors.append(videoNode)
    videoNodes[anchor.name] = videoNode

    // Position the video node on top of the image
    videoNode.transform = anchor.transform

    // Play the video
    videoNode.play()
}

// Monitor the tracking status of the reference images
// and pause/resume the videos as needed
arView.scene.subscribe(to: ARImageAnchor.self) { (anchor: ARImageAnchor) in
    // Get the video node for the tracked image
    guard let videoNode = videoNodes[anchor.name] else { return }

    if anchor.isTracked {
        // Resume playing the video if the image is being tracked
        videoNode.play()
    } else {
        // Pause the video if the image is not being tracked
        videoNode.pause()
    }
}
It's not a flaw; it's simply a wrong answer from ChatGPT. As far as I know, Kudan's ARVideoNode is a subclass of an ARNode parent class that is used to render video content. It has nothing to do with RealityKit, even in terms of programming language - the KudanAR SDK natively uses Objective-C for iOS and Java for Android.
Here's SO's temporary policy regarding ChatGPT:
Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking and looking for correct answers.
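For contrast, here is a minimal sketch of the approach the asker expected, using RealityKit's real API: an AVPlayer driving a VideoMaterial on a plane anchored to the tracked image. The video file naming and the ARSessionDelegate wiring are assumptions for illustration, not part of the original post.

import ARKit
import AVFoundation
import RealityKit

// Called, for example, from the ARSessionDelegate's session(_:didAdd:)
// when an ARImageAnchor for a tracked reference image appears.
func makeVideoAnchor(for imageAnchor: ARImageAnchor) -> AnchorEntity? {
    // Assumes a bundled video named after the reference image.
    guard let name = imageAnchor.referenceImage.name,
          let url = Bundle.main.url(forResource: name, withExtension: "mp4") else { return nil }

    let player = AVPlayer(url: url)
    let material = VideoMaterial(avPlayer: player)

    // Size the plane to the physical size of the reference image.
    let size = imageAnchor.referenceImage.physicalSize
    let plane = ModelEntity(mesh: .generatePlane(width: Float(size.width),
                                                 depth: Float(size.height)),
                            materials: [material])

    let anchor = AnchorEntity(anchor: imageAnchor)
    anchor.addChild(plane)
    player.play()
    return anchor
}

The returned anchor is then appended to arView.scene.anchors; pausing and resuming on tracking loss can be handled with the same AVPlayer from session(_:didUpdate:).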

AVFoundation Camera in view

I'm having a lot of problems trying to get a view to show my back camera feed. I looked through Apple's docs and came up with this, but all it seems to do is show a black screen. I have also added the permissions to my plist and am running on a real device. I don't need it to take a photo or save anything, just to show the camera live in a view.
import UIKit
import AVFoundation

class ViewController: UIViewController {
    @IBOutlet weak var cameraView: UIView!

    var captureSession = AVCaptureSession()
    var previewLayer = AVCaptureVideoPreviewLayer()

    override func viewDidLoad() {
        super.viewDidLoad()
        loadCamera()
    }

    func loadCamera() {
        let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: AVMediaType.video, position: .back)
        do {
            let input = try AVCaptureDeviceInput(device: device!)
            if captureSession.canAddInput(input) {
                captureSession.addInput(input)
                previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
                previewLayer.videoGravity = AVLayerVideoGravity.resizeAspectFill
                previewLayer.connection?.videoOrientation = AVCaptureVideoOrientation.portrait
                cameraView.layer.addSublayer(previewLayer)
            }
        } catch {
            print(error)
        }
    }
}
Welcome!
The problem is that there is actually no data flow (of video frames) happening with your current setup. You need to at least attach one input and one output to your capture session. The preview layer doesn't count as an output itself since it will only attach to an existing connection between input and output.
So to fix it, you can just add an AVCapturePhotoOutput to the session (probably before you add the layer) but never use it. The preview layer should start displaying the frames then.
You probably also want to set the session's sessionPreset to .photo before you add the inputs and outputs. This will cause the session to produce video frames that have an ideal size for displaying on your device's screen.
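A minimal sketch of loadCamera() with those suggestions applied, assuming the cameraView, captureSession, and previewLayer properties from the question. It also gives the layer a frame and starts the session, neither of which appears in the posted snippet but both of which are needed before anything shows up:

func loadCamera() {
    guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                               for: .video,
                                               position: .back),
          let input = try? AVCaptureDeviceInput(device: device) else { return }

    captureSession.beginConfiguration()
    // Produces frames at a size suited to on-screen display.
    captureSession.sessionPreset = .photo

    if captureSession.canAddInput(input) {
        captureSession.addInput(input)
    }

    // Never used directly; it only gives the session an input-to-output
    // connection, which is what the preview layer attaches to.
    let photoOutput = AVCapturePhotoOutput()
    if captureSession.canAddOutput(photoOutput) {
        captureSession.addOutput(photoOutput)
    }
    captureSession.commitConfiguration()

    previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    previewLayer.videoGravity = .resizeAspectFill
    previewLayer.frame = cameraView.bounds
    cameraView.layer.addSublayer(previewLayer)

    // startRunning() blocks, so keep it off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        self.captureSession.startRunning()
    }
}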

Cropping/Compositing An Image With Vision/CoreImage

I am working with the Vision framework in iOS 13 and am trying to achieve the following tasks:
1. Take an image (in this case, a CIImage) and locate all faces in the image using Vision.
2. Crop each face into its own CIImage (I'll call this a "face image").
3. Filter each face image using a Core Image filter, such as a blur or comic book effect.
4. Composite the face image back over the original image, thereby creating effects that only apply to the face.
A better example of this would be the end goal of taking a live camera feed from an AVCaptureSession and blurring every face in the video frame, compositing the blurred faces back over the original image for saving.
I almost have this working, save for the fact that there seems to be a coordinates/translation issue. For example, when I test this code and move my face, the "blurred" section goes in the wrong direction (if I turn my face right, the box goes left; if I look up, the box goes down). While I think this may have something to do with mirroring on the front-facing camera, I can't seem to figure out what I should try next:
func drawFaceBox(bufferImage: CIImage, observations: [VNFaceObservation]) -> CVPixelBuffer? {
    // The filter
    let blur = CIFilter(name: "CICrystallize")

    // The unfiltered image, prepared for filtering
    var filteredImage = bufferImage

    // Find and crop each face
    if !observations.isEmpty {
        for face in observations {
            let faceRect = VNImageRectForNormalizedRect(face.boundingBox, Int(bufferImage.extent.size.width), Int(bufferImage.extent.size.height))
            let croppedFace = bufferImage.cropped(to: faceRect)
            blur?.setValue(croppedFace, forKey: kCIInputImageKey)
            blur?.setValue(10.0, forKey: kCIInputRadiusKey)
            if let blurred = blur?.value(forKey: kCIOutputImageKey) as? CIImage {
                compositorCIFilter?.setValue(blurred, forKey: kCIInputImageKey)
                compositorCIFilter?.setValue(filteredImage, forKey: kCIInputBackgroundImageKey)
                if let output = compositorCIFilter?.value(forKey: kCIOutputImageKey) as? CIImage {
                    filteredImage = output
                }
            }
        }
    }

    // Convert image to CVPixelBuffer and return. This part works fine.
}
Any thoughts on how I can composite the blurred face image(s) back to their original position with accuracy? Or any other approach to only filter part of the original CIImage to avoid this issue altogether/save processing? Thanks!
I believe this issue stems from an orientation problem earlier in the pipeline (specifically, in the output of the sample buffers from the camera, which is where the Vision task was instantiated). I have updated my didOutputSampleBuffer code like so:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    ...

    // Setup the current device orientation
    let curDeviceOrientation = UIDevice.current.orientation

    // Handle the image property orientation
    //let orientation = self.exifOrientation(from: curDeviceOrientation)

    // Setup the image request handler
    //let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation(rawValue: UInt32(1))!)
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])

    // Setup the completion handler
    let completion: VNRequestCompletionHandler = { request, error in
        let observations = request.results as! [VNFaceObservation]

        // Draw faces
        DispatchQueue.main.async {
            // HANDLE FACES
            self.drawFaceBoxes(for: observations)
        }
    }

    // Setup the image request
    let request = VNDetectFaceRectanglesRequest(completionHandler: completion)

    // Handle the request
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
As noted, I have commented out the let orientation = ... line and the first let handler = ..., which used that orientation. By removing the reference to the orientation, I seem to have removed any orientation issue from the Vision calculations.
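For reference, the commented-out helper typically maps the device orientation to an EXIF orientation before frames are handed to Vision. A sketch of one such mapping follows; the exact cases depend on which camera is used and whether its frames are mirrored, which is exactly the kind of mismatch that sends the boxes in the wrong direction:

import ImageIO
import UIKit

// Hypothetical mapping from device orientation to the EXIF orientation
// passed to VNImageRequestHandler(cvPixelBuffer:orientation:options:).
func exifOrientation(from deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portraitUpsideDown: return .left
    case .landscapeLeft:      return .upMirrored
    case .landscapeRight:     return .down
    case .portrait:           return .up
    default:                  return .up
    }
}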

How do you add an overlay while recording a video in Swift?

I am trying to record, and then save, a video in Swift using AVFoundation. This works. I am also trying to add an overlay, such as a text label containing the date, to the video.
For example: the saved video should show not only what the camera sees, but the timestamp as well.
Here is how I am saving the video:
func fileOutput(_ output: AVCaptureFileOutput, didFinishRecordingTo outputFileURL: URL, from connections: [AVCaptureConnection], error: Error?) {
    saveVideo(toURL: movieURL!)
}

private func saveVideo(toURL url: URL) {
    PHPhotoLibrary.shared().performChanges({
        PHAssetChangeRequest.creationRequestForAssetFromVideo(atFileURL: url)
    }) { (success, error) in
        if success {
            print("Video saved to Camera Roll.")
        } else {
            print("Video failed to save.")
        }
    }
}
I have a movieOuput that is an AVCaptureMovieFileOutput. My preview layer does not contain any sublayers. I tried adding the timestamp label's layer to the previewLayer, but this did not succeed.
I have tried Ray Wenderlich's example as well as this Stack Overflow question. Lastly, I also tried this tutorial, all to no avail.
How can I add an overlay to my video so that it appears in the saved video in the camera roll?
Without more information, it sounds like what you are asking for is a WATERMARK, not an overlay.
A watermark is a markup on the video that will be saved with the video.
An overlay is generally shown as subviews on the preview layer and will not be saved with the video.
Check this out here: https://stackoverflow.com/a/47742108/8272698
func addWatermark(inputURL: URL, outputURL: URL, handler: @escaping (_ exportSession: AVAssetExportSession?) -> Void) {
    let mixComposition = AVMutableComposition()
    let asset = AVAsset(url: inputURL)
    let videoTrack = asset.tracks(withMediaType: AVMediaType.video)[0]
    let timerange = CMTimeRangeMake(kCMTimeZero, asset.duration)

    let compositionVideoTrack: AVMutableCompositionTrack = mixComposition.addMutableTrack(withMediaType: AVMediaType.video, preferredTrackID: CMPersistentTrackID(kCMPersistentTrackID_Invalid))!

    do {
        try compositionVideoTrack.insertTimeRange(timerange, of: videoTrack, at: kCMTimeZero)
        compositionVideoTrack.preferredTransform = videoTrack.preferredTransform
    } catch {
        print(error)
    }

    let watermarkFilter = CIFilter(name: "CISourceOverCompositing")!
    let watermarkImage = CIImage(image: UIImage(named: "waterMark")!)
    let videoComposition = AVVideoComposition(asset: asset) { (filteringRequest) in
        let source = filteringRequest.sourceImage.clampedToExtent()
        watermarkFilter.setValue(source, forKey: "inputBackgroundImage")
        let transform = CGAffineTransform(translationX: filteringRequest.sourceImage.extent.width - (watermarkImage?.extent.width)! - 2, y: 0)
        watermarkFilter.setValue(watermarkImage?.transformed(by: transform), forKey: "inputImage")
        filteringRequest.finish(with: watermarkFilter.outputImage!, context: nil)
    }

    guard let exportSession = AVAssetExportSession(asset: asset, presetName: AVAssetExportPreset640x480) else {
        handler(nil)
        return
    }

    exportSession.outputURL = outputURL
    exportSession.outputFileType = AVFileType.mp4
    exportSession.shouldOptimizeForNetworkUse = true
    exportSession.videoComposition = videoComposition
    exportSession.exportAsynchronously { () -> Void in
        handler(exportSession)
    }
}
And here's how to call the function:
let outputURL = NSURL.fileURL(withPath: "TempPath")
let inputURL = NSURL.fileURL(withPath: "VideoWithWatermarkPath")

addWatermark(inputURL: inputURL, outputURL: outputURL, handler: { (exportSession) in
    guard let session = exportSession else {
        // Error
        return
    }
    switch session.status {
    case .completed:
        guard NSData(contentsOf: outputURL) != nil else {
            // Error
            return
        }
        // Now you can find the video with the watermark in the location outputURL
    default:
        // Error
        break
    }
})
Let me know if this code works for you.
It is in Swift 3, so some changes will be needed.
I am currently using this code in an app of mine and have not updated it to Swift 5 yet.
I do not have an actual development environment for Swift that can utilize AVFoundation. Thus, I can't provide you with any example code.
To add metadata (date, location, timestamp, watermark, frame rate, etc.) as an overlay to the video while recording, you would have to process the video feed frame by frame, live, while recording. Most likely you would have to store the frames in a buffer and process them before actually recording them.
Now, when it comes to the metadata, there are two types: static and dynamic. A static type such as a watermark should be easy enough, as all the frames get the same thing.
However, for dynamic metadata types such as a timestamp or GPS location, there are a few things to take into consideration. It takes computational power and time to process video frames, so depending on the type of dynamic data and how you obtain it, the processed value may sometimes not be the correct one. For example, say you get a frame at 1:00:01, then process it and add a timestamp to it, and pretend that processing the timestamp took 2 seconds. The next frame arrives at 1:00:02, but you can't process it until 1:00:03 because processing the previous frame took 2 seconds. Thus, depending on how you obtain the timestamp for that new frame, the timestamp value may not be the one you wanted.
When processing dynamic metadata, you should also take hardware lag into consideration. For example, suppose the software is supposed to add live GPS location data to each frame, and there weren't any lags in development or testing. In real life, however, a user runs the software in an area with a bad connection, and their phone lags while obtaining the GPS location, with some lags lasting as long as 5 seconds. What do you do in that situation? Do you set a timeout for the GPS location and use the last good position? Do you report the error? Do you defer that frame to be processed later when the GPS data becomes available (which may ruin live recording) and use an expensive algorithm to try to predict the user's location for that frame?
Besides those considerations, I have some references here that I think may help you. I thought the one from medium.com looked pretty good.
https://medium.com/ios-os-x-development/ios-camera-frames-extraction-d2c0f80ed05a
Adding watermark to currently recording video and save with watermark
Render dynamic text onto CVPixelBufferRef while recording video
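As a rough illustration of the per-frame compositing idea described above (a sketch, not the answerer's code): stamp text onto a single CIImage frame with Core Image before it is written out. The filter names are real Core Image filters, while the font, size, and position are arbitrary, and the timing caveats above still apply to dynamic values.

import CoreGraphics
import CoreImage

// Rough sketch: generate a small text image for the given string and
// composite it over one frame of the video feed.
func stamped(frame: CIImage, text: String) -> CIImage {
    let textFilter = CIFilter(name: "CITextImageGenerator")
    textFilter?.setValue(text, forKey: "inputText")
    textFilter?.setValue("Helvetica", forKey: "inputFontName")
    textFilter?.setValue(24, forKey: "inputFontSize")
    textFilter?.setValue(1.0, forKey: "inputScaleFactor")
    guard let textImage = textFilter?.outputImage else { return frame }

    // Nudge the text away from the lower-left corner, then source-over
    // composite it on top of the frame.
    let positioned = textImage.transformed(by: CGAffineTransform(translationX: 16, y: 16))
    return positioned.composited(over: frame)
}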
Adding on to @Kevin Ng, you can do an overlay on video frames with a UIViewController and a UIView.
The UIViewController will have:
a property to work with the video stream

private var videoSession: AVCaptureSession?

a property to work with the overlay (the UIView class)

private var myOverlay: MyUIView { view as! MyUIView }

a property to work with the video output queue

private let videoOutputQueue = DispatchQueue(label: "outputQueue", qos: .userInteractive)

a method to create the video session
a method to process and display the overlay
The UIView will have the task-specific helper methods needed to act as an overlay. For example, if you are doing hand detection, this overlay class can have helper methods to draw points at coordinates (the view controller detects the coordinates of the hand features, does the necessary coordinate conversions, then passes the coordinates to the UIView class to display as an overlay), as in the sketch below.
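A sketch of what such an overlay view might look like for the hand-detection example; the class name matches the myOverlay property above, but the drawing details are placeholders:

import UIKit

final class MyUIView: UIView {
    private var points: [CGPoint] = []

    // The view controller converts detected coordinates (e.g. Vision's
    // normalized points) into this view's coordinate space, then calls this.
    func show(points: [CGPoint]) {
        self.points = points
        setNeedsDisplay()
    }

    override func draw(_ rect: CGRect) {
        guard let context = UIGraphicsGetCurrentContext() else { return }
        context.setFillColor(UIColor.systemGreen.cgColor)
        for point in points {
            context.fillEllipse(in: CGRect(x: point.x - 4, y: point.y - 4, width: 8, height: 8))
        }
    }
}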

AVCaptureDevicePosition.front is not working

This is my first question here, so if it's not explained well please let me know.
So basically I'm trying to access the front camera with this code:
captureSession = AVCaptureSession()
captureSession?.sessionPreset = AVCaptureSessionPreset1920x1080

let cameraDevice = AVCaptureDevice.defaultDevice(withDeviceType: AVCaptureDeviceType.builtInWideAngleCamera, mediaType: AVMediaTypeVideo, position: AVCaptureDevicePosition.front)
print(cameraDevice!)

let error: NSError? = nil
do {
    let input = try AVCaptureDeviceInput(device: cameraDevice)
    print(captureSession.canAddInput(input))
    if error == nil && captureSession.canAddInput(input) {
        captureSession?.addInput(input)
        stillImageOutput = AVCaptureStillImageOutput()
        if captureSession.canAddOutput(stillImageOutput) {
            captureSession.addOutput(stillImageOutput)
            previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
            previewLayer?.videoGravity = AVLayerVideoGravityResizeAspect
            previewLayer?.connection.videoOrientation = AVCaptureVideoOrientation.portrait
            cameraView.layer.addSublayer(previewLayer!)
            stillImageOutput?.outputSettings = [AVVideoCodecKey: AVVideoCodecJPEG]
            captureSession?.startRunning()
        }
    }
Using that code, print(captureSession.canAddInput(input)) returns false, but when I change the position to the back camera everything works like a charm. Am I missing something?
I'm not sure what device you're using, but on an iPhone 6 / 6 Plus, that preset resolution is too high for the front camera.
https://developer.apple.com/library/content/documentation/DeviceInformation/Reference/iOSDeviceCompatibility/Cameras/Cameras.html
As per the Apple documentation, on those devices the front camera's highest still-image capture resolution is 1280 x 960. Try using a lower preset and see if it works on the front camera.
This could explain why it works fine on the back camera, because that DOES support the higher resolution.
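One way to guard against this is to ask the device which presets it supports before configuring the session. The sketch below uses the current API names (the Swift 3 constants in the question map onto them directly), and the fallback list is only an example:

// Pick the highest preset the front camera actually supports.
let candidates: [AVCaptureSession.Preset] = [.hd1920x1080, .hd1280x720, .vga640x480, .photo]

if let frontCamera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                             for: .video,
                                             position: .front),
   let preset = candidates.first(where: { frontCamera.supportsSessionPreset($0) }) {
    captureSession?.sessionPreset = preset
}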