Implementing AVVideoCompositing causes video rotation problems - swift

I'm using Apple's example https://developer.apple.com/library/ios/samplecode/AVCustomEdit/Introduction/Intro.html and have some issues with video transformation.
If the source assets have a preferredTransform other than identity, the output video has incorrectly rotated frames. The problem can be fixed if AVMutableVideoComposition has no value in its customVideoCompositorClass property and the AVMutableVideoCompositionLayerInstruction's transform is set from asset.preferredTransform. But because I use a custom video compositor that adopts the AVVideoCompositing protocol, I can't use the standard video compositing instructions.
How can I pre-transform the input asset tracks before their CVPixelBuffers are passed into Metal shaders? Or is there any other way to fix this?
Fragment of original code:
func buildCompositionObjectsForPlayback(_ forPlayback: Bool, overwriteExistingObjects: Bool) {
    // Proceed only if the composition objects have not already been created.
    if self.composition != nil && !overwriteExistingObjects { return }
    if self.videoComposition != nil && !overwriteExistingObjects { return }
    guard !clips.isEmpty else { return }
    // Use the naturalSize of the first video track.
    let videoTracks = clips[0].tracks(withMediaType: AVMediaType.video)
    let videoSize = videoTracks[0].naturalSize
    let composition = AVMutableComposition()
    composition.naturalSize = videoSize
    /*
     With transitions:
     Place clips into alternating video & audio tracks in composition, overlapped by transitionDuration.
     Set up the video composition to cycle between "pass through A", "transition from A to B", "pass through B".
     */
    let videoComposition = AVMutableVideoComposition()
    if self.transitionType == TransitionType.diagonalWipe.rawValue {
        videoComposition.customVideoCompositorClass = APLDiagonalWipeCompositor.self
    } else {
        videoComposition.customVideoCompositorClass = APLCrossDissolveCompositor.self
    }
    // Every videoComposition needs these properties to be set:
    videoComposition.frameDuration = CMTimeMake(1, 30) // 30 fps.
    videoComposition.renderSize = videoSize
    buildTransitionComposition(composition, andVideoComposition: videoComposition)
    self.composition = composition
    self.videoComposition = videoComposition
}
UPDATE:
I worked around the transformation like this:
private func makeTransformedPixelBuffer(fromBuffer buffer: CVPixelBuffer, withTransform transform: CGAffineTransform) -> CVPixelBuffer? {
    guard let newBuffer = renderContext?.newPixelBuffer() else {
        return nil
    }
    // Correct transformation example I took from https://stackoverflow.com/questions/29967700/coreimage-coordinate-system
    var preferredTransform = transform
    preferredTransform.b *= -1
    preferredTransform.c *= -1
    var transformedImage = CIImage(cvPixelBuffer: buffer).transformed(by: preferredTransform)
    preferredTransform = CGAffineTransform(translationX: -transformedImage.extent.origin.x, y: -transformedImage.extent.origin.y)
    transformedImage = transformedImage.transformed(by: preferredTransform)
    let filterContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)
    filterContext.render(transformedImage, to: newBuffer)
    return newBuffer
}
But I'm wondering whether there is a more memory-efficient way that avoids creating new pixel buffers.
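One saving that jumps out of the snippet above: the CIContext is recreated on every call, and each CIContext(mtlDevice:) call builds its own internal caches. Keeping a single context for the lifetime of the compositor avoids that; a minimal sketch:

private lazy var filterContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)

This doesn't remove the extra pixel buffer, but it does cut the per-frame allocations.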

How can I pre-transform the input asset tracks before their CVPixelBuffers are passed into Metal shaders?
The best way to achieve maximum performance is to transform the video frame directly in the shader: you just need to apply a rotation matrix in your vertex shader.
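As an illustration of that idea, here is a sketch of the CPU side: build a 3x3 matrix from the track's preferredTransform and hand it to the vertex shader as a uniform, instead of rotating pixel buffers with Core Image. The encoder call and buffer index are assumptions about your pipeline, not part of Apple's AVCustomEdit sample.

import AVFoundation
import CoreGraphics
import simd

// Sketch: turn a track's preferredTransform into a matrix the vertex shader can apply.
// Row layout follows CGAffineTransform: x' = a*x + c*y + tx, y' = b*x + d*y + ty.
func rotationMatrix(for track: AVAssetTrack) -> simd_float3x3 {
    let t = track.preferredTransform
    return simd_float3x3(rows: [
        SIMD3<Float>(Float(t.a), Float(t.c), Float(t.tx)),
        SIMD3<Float>(Float(t.b), Float(t.d), Float(t.ty)),
        SIMD3<Float>(0, 0, 1)
    ])
}

// Hypothetical usage when encoding the quad that samples the source frame:
// var matrix = rotationMatrix(for: videoTrack)
// renderEncoder.setVertexBytes(&matrix, length: MemoryLayout<simd_float3x3>.stride, index: 1)

Note that the transform is expressed in pixel coordinates, so the matrix (or the shader) still has to account for the render size and the conversion to normalized device coordinates.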

Related

usdz object is not moving while loaded with SCNReferenceNode

I was following the Apple documentation and example project to load a 3D object from a .scn file with the VirtualObject class (a subclass of SCNReferenceNode), but I suddenly needed to change the model from .scn to .usdz. Now my .usdz object loads successfully, but it is not on the surface (it floats in mid-air) and I can't interact with it (tap, pan, rotate). Is there another way to get interaction with a .usdz object, and how can I place it on the surface like I was doing before with the .scn file?
Getting the model URL (downloaded from the server):
static let loadDownloadedModel: VirtualObject = {
    let downloadedScenePath = getDocumentsDirectory().appendingPathComponent("\(Api.Params.inputModelName).usdz")
    return VirtualObject(url: downloadedScenePath)!
}()
Loading it from URL
func loadVirtualObject(_ object: VirtualObject, loadedHandler: @escaping (VirtualObject) -> Void) {
    isLoading = true
    loadedObjects.append(object)
    // Load the content asynchronously.
    DispatchQueue.global(qos: .userInitiated).async {
        object.reset()
        object.load()
        self.isLoading = false
        loadedHandler(object)
    }
}
Placing in the scene
func placeObjectOnFocusSquare() {
    virtualObjectLoader.loadVirtualObject(VirtualObject.loadDownloadedModel) { (loadedObject) in
        DispatchQueue.main.async {
            self.placeVirtualObject(loadedObject)
            self.setupBottomButtons(isSelected: true)
        }
    }
}

func placeVirtualObject(_ virtualObject: VirtualObject) {
    guard let cameraTransform = session.currentFrame?.camera.transform,
          let focusSquarePosition = focusSquare.lastPosition else {
        statusViewController.showMessage("CANNOT PLACE OBJECT\nTry moving left or right.")
        return
    }
    Api.Params.selectedModel = virtualObject
    virtualObject.name = String(Api.Params.inputPreviewId)
    virtualObject.scale = SCNVector3(Api.Params.modelXAxis, Api.Params.modelYAxis, Api.Params.modelZAxis)
    virtualObject.setPosition(focusSquarePosition, relativeTo: cameraTransform, smoothMovement: false)
    updateQueue.async {
        self.sceneView.scene.rootNode.addChildNode(virtualObject)
    }
}
.usdz object in sceneview
After many tries, I finally found out that dynamic scaling of the model was causing the problem. Reference:
iOS ARKit: Large size object always appears to move with the change in the position of the device camera
I scaled the object to 0.01 on all axes (x, y and z):
virtualObject.scale = SCNVector3Make(0.01, 0.01, 0.01)

AVCaptureVideoPreviewLayer is not visible on the screenshot

I have an application that adds some live animations and images on top of the preview view of an AV Foundation camera. I can take a "hardware screenshot" (holding the Side button and Volume Up button) and it looks fine. However, I need a button that takes a screenshot.
All the methods of taking a screenshot, such as UIGraphicsGetImageFromCurrentImageContext (or view.drawHierarchy()), result in a black rectangle where the video preview is. All other elements and images are visible on the screenshot except the AVCaptureVideoPreviewLayer.
Please help me. Can I trigger a "hardware screenshot" programmatically? Is there another solution to this problem?
I was in the same position and researched two separate solutions to this problem:
1) Set up the ViewController as an AVCaptureVideoDataOutputSampleBufferDelegate and sample the video output to take the screenshot.
2) Set up the ViewController as an AVCapturePhotoCaptureDelegate and capture the photo.
The mechanism for setting up the former is described, for example, in this question: How to take UIImage of AVCaptureVideoPreviewLayer instead of AVCapturePhotoOutput capture
I implemented both to check whether there was any difference in the quality of the image (there wasn't).
If all you need is the camera snapshot, then that's it. But it sounds like you need to draw an additional animation on top. For this, I created a container UIView of the same size as the snapshot, added a UIImageView with the snapshot to it, and then drew the animation on top. After that you can use UIGraphicsGetImageFromCurrentImageContext on the container.
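A rough sketch of that compositing step, assuming you already have the snapshot UIImage and the overlay view (I'm using UIGraphicsImageRenderer here rather than UIGraphicsGetImageFromCurrentImageContext, but either works):

import UIKit

// snapshotImage and animationOverlayView are placeholders for your own objects.
func composite(snapshot snapshotImage: UIImage, overlay animationOverlayView: UIView) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: snapshotImage.size)
    return renderer.image { _ in
        let rect = CGRect(origin: .zero, size: snapshotImage.size)
        // Camera snapshot first, then the animation layer on top.
        snapshotImage.draw(in: rect)
        animationOverlayView.drawHierarchy(in: rect, afterScreenUpdates: true)
    }
}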
As for which of solutions 1) and 2) to use: if you don't need to support different camera orientations in the app, it probably doesn't matter. However, if you need to switch between the front and back camera and support different camera orientations, then you need to know the snapshot orientation to apply the animation in the right place, and getting that right turned out to be a total bear with method 1).
The solution I used:
Make the UIViewController adopt AVCapturePhotoCaptureDelegate.
Add a photo output to the AVCaptureSession:
private let session = AVCaptureSession()
private let photoOutput = AVCapturePhotoOutput()
// ...
// When configuring the session
if self.session.canAddOutput(self.photoOutput) {
    self.session.addOutput(self.photoOutput)
    self.photoOutput.isHighResolutionCaptureEnabled = true
}
Capture snapshot
let settings = AVCapturePhotoSettings()
let previewPixelType = settings.availablePreviewPhotoPixelFormatTypes.first!
let previewFormat = [
    kCVPixelBufferPixelFormatTypeKey as String: previewPixelType,
    kCVPixelBufferWidthKey as String: 160,
    kCVPixelBufferHeightKey as String: 160
]
settings.previewPhotoFormat = previewFormat
photoOutput.capturePhoto(with: settings, delegate: self)
Rotate or flip the snapshot before doing the rest
func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
    guard error == nil else {
        // Handle the error
        return
    }
    if let dataImage = photo.fileDataRepresentation() {
        print(UIImage(data: dataImage)?.size as Any)
        let dataProvider = CGDataProvider(data: dataImage as CFData)
        let cgImageRef: CGImage! = CGImage(jpegDataProviderSource: dataProvider!, decode: nil, shouldInterpolate: true, intent: .defaultIntent)
        // https://developer.apple.com/documentation/uikit/uiimageorientation?language=objc
        let orientation = UIApplication.shared.statusBarOrientation
        var imageOrientation = UIImage.Orientation.right
        switch orientation {
        case .portrait:
            imageOrientation = self.cameraPosition == .back ? UIImage.Orientation.right : UIImage.Orientation.leftMirrored
        case .landscapeRight:
            imageOrientation = self.cameraPosition == .back ? UIImage.Orientation.up : UIImage.Orientation.downMirrored
        case .portraitUpsideDown:
            imageOrientation = self.cameraPosition == .back ? UIImage.Orientation.left : UIImage.Orientation.rightMirrored
        case .landscapeLeft:
            imageOrientation = self.cameraPosition == .back ? UIImage.Orientation.down : UIImage.Orientation.upMirrored
        case .unknown:
            imageOrientation = self.cameraPosition == .back ? UIImage.Orientation.right : UIImage.Orientation.leftMirrored
        @unknown default:
            imageOrientation = self.cameraPosition == .back ? UIImage.Orientation.right : UIImage.Orientation.leftMirrored
        }
        let image = UIImage.init(cgImage: cgImageRef, scale: 1.0, orientation: imageOrientation)
        // Do whatever you need to do with the image
    } else {
        // Handle error
    }
}
If you need to know the size of the image to position the animations you can use the AVCaptureVideoDataOutputSampleBufferDelegate strategy to detect the size of the buffer once.
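That one-off size check can look roughly like this (same delegate protocol; you'd typically grab the size once and then ignore or remove the output):

import AVFoundation

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    // Dimensions of the video buffer, useful for positioning the overlay animation.
    print("Video buffer size: \(CVPixelBufferGetWidth(pixelBuffer)) x \(CVPixelBufferGetHeight(pixelBuffer))")
}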

Why is an iPhone XS getting worse CPU performance when using the camera live than an iPhone 6S Plus?

I'm using live camera output to update a CIImage on a MTKView. My main issue is that I have a large, unexpected performance difference: an older iPhone gets better CPU performance than a newer one, even though all the settings I've come across are the same.
This is a lengthy post, but I decided to include these details since they could be important to the cause of this problem. Please let me know what else I can include.
Below is my captureOutput function, with two debug bools that I can turn on and off while running. I used these to try to determine the cause of my issue.
applyLiveFilter - bool controlling whether or not to manipulate the CIImage with a CIFilter.
updateMetalView - bool controlling whether or not to update the MTKView's CIImage.
// live output from camera
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    /*
     Create CIImage from camera.
     Here I save a few percent of CPU by using a function
     to convert a sampleBuffer to a Metal texture, but
     whether I use this or the commented out code
     (without captureOutputMTLOptions) does not have
     significant impact.
     */
    guard let texture: MTLTexture = convertToMTLTexture(sampleBuffer: sampleBuffer) else {
        return
    }
    var cameraImage: CIImage = CIImage(mtlTexture: texture, options: captureOutputMTLOptions)!
    var transform: CGAffineTransform = .identity
    transform = transform.scaledBy(x: 1, y: -1)
    transform = transform.translatedBy(x: 0, y: -cameraImage.extent.height)
    cameraImage = cameraImage.transformed(by: transform)
    /*
     // old non-Metal way of getting the ciimage from the cvPixelBuffer
     guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
         return
     }
     var cameraImage: CIImage = CIImage(cvPixelBuffer: pixelBuffer)
     */
    var orientation = UIImage.Orientation.right
    if isFrontCamera {
        orientation = UIImage.Orientation.leftMirrored
    }
    // apply filter to camera image
    if debug_applyLiveFilter {
        cameraImage = self.applyFilterAndReturnImage(ciImage: cameraImage, orientation: orientation, currentCameraRes: currentCameraRes!)
    }
    DispatchQueue.main.async {
        if debug_updateMetalView {
            self.MTLCaptureView!.image = cameraImage
        }
    }
}
Below is a chart of results between both phones toggling the different combinations of bools discussed above:
Even with the Metal view's CIImage not updating and no filters being applied, the iPhone XS's CPU usage is 2% higher than the iPhone 6S Plus's. That isn't a significant overhead, but it makes me suspect that the camera is somehow capturing differently between the devices.
My AVCaptureSession preset is set identically on both phones (AVCaptureSession.Preset.hd1280x720), and the CIImage created in captureOutput is the same size (extent) on both phones.
Are there any settings I need to set manually between these two phones AVCaptureDevice's settings, including activeFormat properties, to make them the same between devices?
The settings I have now are:
if let captureDevice = AVCaptureDevice.default(for: AVMediaType.video) {
    do {
        try captureDevice.lockForConfiguration()
        captureDevice.isSubjectAreaChangeMonitoringEnabled = true
        captureDevice.focusMode = AVCaptureDevice.FocusMode.continuousAutoFocus
        captureDevice.exposureMode = AVCaptureDevice.ExposureMode.continuousAutoExposure
        captureDevice.unlockForConfiguration()
    } catch {
        // Handle errors here
        print("There was an error focusing the device's camera")
    }
}
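For comparing the two devices, it can also help to log the active format and pin the frame rate so neither device changes it opportunistically. This is only a diagnostic sketch (the fixed 30 fps target is an assumption), not part of my current code:

import AVFoundation

if let captureDevice = AVCaptureDevice.default(for: AVMediaType.video) {
    // activeFormat describes resolution, supported frame rate ranges, binning, etc.
    print(captureDevice.activeFormat)
    do {
        try captureDevice.lockForConfiguration()
        // Pin capture to a fixed 30 fps on both devices.
        captureDevice.activeVideoMinFrameDuration = CMTimeMake(1, 30)
        captureDevice.activeVideoMaxFrameDuration = CMTimeMake(1, 30)
        captureDevice.unlockForConfiguration()
    } catch {
        print("Could not lock the capture device: \(error)")
    }
}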
My MTKView is based on code written by Simon Gladman, with some edits for performance and to scale the render before it is scaled up to the width of the screen using Core Animation, as suggested by Apple.
class MetalImageView: MTKView {
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    var textureCache: CVMetalTextureCache?
    var sourceTexture: MTLTexture!

    lazy var commandQueue: MTLCommandQueue = { [unowned self] in
        return self.device!.makeCommandQueue()
    }()!

    lazy var ciContext: CIContext = { [unowned self] in
        return CIContext(mtlDevice: self.device!)
    }()

    override init(frame frameRect: CGRect, device: MTLDevice?) {
        super.init(frame: frameRect,
                   device: device ?? MTLCreateSystemDefaultDevice())
        if super.device == nil {
            fatalError("Device doesn't support Metal")
        }
        CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, self.device!, nil, &textureCache)
        framebufferOnly = false
        enableSetNeedsDisplay = true
        isPaused = true
        preferredFramesPerSecond = 30
    }

    required init(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    // The image to display
    var image: CIImage? {
        didSet {
            setNeedsDisplay()
        }
    }

    override func draw(_ rect: CGRect) {
        guard var image = image,
              let targetTexture: MTLTexture = currentDrawable?.texture else {
            return
        }
        let commandBuffer = commandQueue.makeCommandBuffer()
        let customDrawableSize: CGSize = drawableSize
        let bounds = CGRect(origin: CGPoint.zero, size: customDrawableSize)
        let originX = image.extent.origin.x
        let originY = image.extent.origin.y
        let scaleX = customDrawableSize.width / image.extent.width
        let scaleY = customDrawableSize.height / image.extent.height
        let scale = min(scaleX * IVScaleFactor, scaleY * IVScaleFactor)
        image = image
            .transformed(by: CGAffineTransform(translationX: -originX, y: -originY))
            .transformed(by: CGAffineTransform(scaleX: scale, y: scale))
        ciContext.render(image,
                         to: targetTexture,
                         commandBuffer: commandBuffer,
                         bounds: bounds,
                         colorSpace: colorSpace)
        commandBuffer?.present(currentDrawable!)
        commandBuffer?.commit()
    }
}
My AVCaptureSession (captureSession) and AVCaptureVideoDataOutput (videoOutput) are setup below:
func setupCameraAndMic() {
    let backCamera = AVCaptureDevice.default(for: AVMediaType.video)
    var error: NSError?
    var videoInput: AVCaptureDeviceInput!
    do {
        videoInput = try AVCaptureDeviceInput(device: backCamera!)
    } catch let error1 as NSError {
        error = error1
        videoInput = nil
        print(error!.localizedDescription)
    }
    if error == nil &&
        captureSession!.canAddInput(videoInput) {
        guard CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, MetalDevice, nil, &textureCache) == kCVReturnSuccess else {
            print("Error: could not create a texture cache")
            return
        }
        captureSession!.addInput(videoInput)
        setDeviceFrameRateForCurrentFilter(device: backCamera)
        stillImageOutput = AVCapturePhotoOutput()
        if captureSession!.canAddOutput(stillImageOutput!) {
            captureSession!.addOutput(stillImageOutput!)
            let q = DispatchQueue(label: "sample buffer delegate", qos: .default)
            videoOutput.setSampleBufferDelegate(self, queue: q)
            videoOutput.videoSettings = [
                kCVPixelBufferPixelFormatTypeKey as AnyHashable as! String: NSNumber(value: kCVPixelFormatType_32BGRA),
                kCVPixelBufferMetalCompatibilityKey as String: true
            ]
            videoOutput.alwaysDiscardsLateVideoFrames = true
            if captureSession!.canAddOutput(videoOutput) {
                captureSession!.addOutput(videoOutput)
            }
            captureSession!.startRunning()
        }
    }
    setDefaultFocusAndExposure()
}
The video and mic are recorded on two separate streams. Details on the microphone and recording video have been left out since my focus is performance of live camera output.
UPDATE - I have a simplified test project on GitHub that makes it a lot easier to test the problem I'm having: https://github.com/PunchyBass/Live-Filter-test-project
Off the top of my head: you are not comparing apples with apples. Even setting aside the A12's 2.49 GHz versus the A9's 1.85 GHz, the differences between the cameras are huge; even when you use them with the same parameters, several features of the XS camera require more CPU resources (dual camera, stabilization, Smart HDR, etc.).
Sorry for the lack of sources; I tried to find metrics for the CPU cost of those features but couldn't. Unfortunately for your needs, that information is not relevant for marketing when they are selling it as the best camera ever in a smartphone.
They are selling it as the best processor as well; we don't know what would happen using the XS camera with an A9 processor, it would probably crash, and we will never know...
P.S. Are your metrics for the whole processor or for the core in use? For the whole processor you also need to consider other tasks the device may be executing; for a single core it is 21% of 200% against 39% of 600% (roughly 10.5% of total capacity versus 6.5%).

ARKit dragging object changes scale

I am trying to move an SCNNode object that I placed on a surface. It moves, but the scale changes and it becomes smaller when I first start to move it.
This is what I did:
@IBAction func dragBanana(_ sender: UIPanGestureRecognizer) {
    guard let _ = self.sceneView.session.currentFrame else { return }
    if sender.state == .began {
        let location = sender.location(in: self.sceneView)
        let hitTestResult = sceneView.hitTest(location, options: nil)
        if !hitTestResult.isEmpty {
            guard let hitResult = hitTestResult.first else { return }
            movedObject = hitResult.node
        }
    }
    if sender.state == .changed {
        if movedObject != nil {
            let location = sender.location(in: self.sceneView)
            let hitTestResult = sceneView.hitTest(location, types: .existingPlaneUsingExtent)
            guard let hitResult = hitTestResult.first else { return }
            let matrix = SCNMatrix4(hitResult.worldTransform)
            let vector = SCNVector3Make(matrix.m41, matrix.m42, matrix.m43)
            movedObject?.position = vector
        }
    }
    if sender.state == .ended {
        movedObject = nil
    }
}
My answer is probably very late, but I faced this issue myself and it took me a while to figure out why it might happen. I'll share my experience and maybe you can relate to it.
My problem was that I was trying to change the position of the node after changing its scale at runtime (most of my 3D assets were very large when added, so I scale them down with a pinch gesture). I noticed that changing the scale was the reason the position change didn't work as expected.
I found a very simple solution to this. You simply need to change this line:
movedObject?.position = vector
to this:
movedObject?.worldPosition = vector
According to the SCNNode documentation, the position property determines the position of the node relative to its parent, while worldPosition is the position of the node relative to the scene's root node (i.e. the world coordinate space of the ARSCNView).
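A minimal illustration of the difference (the helper and its names are hypothetical):

import SceneKit

// position is expressed in the parent's coordinate space, worldPosition in the
// root node's space, so assigning a world-space hit-test result to `position`
// only matches expectations when the node's parent is the root node.
func move(_ node: SCNNode, toWorldPosition worldPosition: SCNVector3) {
    if let parent = node.parent {
        // Equivalent to node.worldPosition = worldPosition
        node.position = parent.convertPosition(worldPosition, from: nil)
    } else {
        node.position = worldPosition
    }
}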
I hope this answers your question.
It's because you're moving the object on all 3 axes, and Z changes too; that's why it feels like it scales, but it's only getting closer to you.

Only First Track Playing of AVMutableComposition()

New Edit Below
I have already referenced
AVMutableComposition - Only Playing First Track (Swift)
but it does not provide the answer to what I am looking for.
I have an AVMutableComposition(). I am trying to apply MULTIPLE AVCompositionTracks, of the single type AVMediaTypeVideo, in this single composition, because I am using 2 different AVMediaTypeVideo sources with different CGSizes and preferredTransforms of the AVAssets they come from.
So the only way to apply their specified preferredTransforms is to provide them in 2 different tracks. But, for whatever reason, only the first track actually provides any video, almost as if the second track is never there.
So, I have tried
1) Using AVMutableVideoCompositionLayerInstructions and applying an AVVideoComposition along with an AVAssetExportSession. This works okay (I am still working on the transforms, but it is doable), but the processing times of the videos are WELL OVER 1 minute, which is just unacceptable in my situation.
2) Using multiple tracks without an AVAssetExportSession, where the 2nd track of the same type never appears. Now, I could put it all on 1 track, but then all the videos would take the size and preferredTransform of the first video, which I absolutely do not want, as it stretches them on all sides.
So my question is: is it possible to
1) Apply instructions to just a track WITHOUT using AVAssetExportSession? // Preferred way BY FAR.
2) Decrease the time of the export? (I have tried using PresetPassthrough, but you cannot use that if you have an exporter.videoComposition, which is where my instructions are. This is the only place I know I can put instructions; I'm not sure if I can place them somewhere else.)
Here is some of my code (without the exporter, as I don't need to export anything anywhere, just do stuff after the AVMutableComposition combines the items):
func merge() {
    if let firstAsset = controller.firstAsset, secondAsset = self.asset {
        let mixComposition = AVMutableComposition()
        let firstTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        do {
            // Don't need now according to not being able to edit first 14 seconds.
            if CMTimeGetSeconds(startTime) == 0 {
                self.startTime = CMTime(seconds: 1/600, preferredTimescale: Int32(600))
            }
            try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600)),
                                           ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                           atTime: kCMTimeZero)
        } catch _ {
            print("Failed to load first track")
        }
        // This secondTrack never appears; no matter what is inside of here, it is like blank space in the video from startTime to endTime (the time range of secondTrack).
        let secondTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
                                                                      preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        // secondTrack.preferredTransform = self.asset.preferredTransform
        do {
            try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, secondAsset.duration),
                                            ofTrack: secondAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                            atTime: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600))
        } catch _ {
            print("Failed to load second track")
        }
        // This part appears again, at endTime, which is right after the 2nd track is supposed to end.
        do {
            try firstTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600), firstAsset.duration - endTime),
                                           ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                           atTime: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600))
        } catch _ {
            print("failed")
        }
        if let loadedAudioAsset = controller.audioAsset {
            let audioTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeAudio, preferredTrackID: 0)
            do {
                try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, firstAsset.duration),
                                               ofTrack: loadedAudioAsset.tracksWithMediaType(AVMediaTypeAudio)[0],
                                               atTime: kCMTimeZero)
            } catch _ {
                print("Failed to load Audio track")
            }
        }
    }
}
Edit
Apple states that "Indicates instructions for video composition via an NSArray of instances of classes implementing the AVVideoCompositionInstruction protocol. For the first instruction in the array, timeRange.start must be less than or equal to the earliest time for which playback or other processing will be attempted (note that this will typically be kCMTimeZero). For subsequent instructions, timeRange.start must be equal to the prior instruction's end time. The end time of the last instruction must be greater than or equal to the latest time for which playback or other processing will be attempted (note that this will often be the duration of the asset with which the instance of AVVideoComposition is associated)."
As I understand it, this says that the entire composition must be covered by instructions if you decide to use ANY instructions. Why is this? How would I apply instructions to just, say, track 2 in this example without changing track 1 or 3 at all:
Track 1 from 0-10 sec, Track 2 from 10-20 sec, Track 3 from 20-30 sec.
Any explanation of that would probably answer my question (if it is doable).
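For concreteness, here is what that tiling requirement looks like for the 0-30 s layout above; this is only a sketch (the layer instructions and the videoComposition variable are assumed):

// Three instructions whose time ranges butt up against each other and cover the
// whole 0...30 s composition; gaps or overlaps make the composition invalid.
let first = AVMutableVideoCompositionInstruction()
first.timeRange = CMTimeRangeMake(kCMTimeZero, CMTimeMakeWithSeconds(10, 600))
let second = AVMutableVideoCompositionInstruction()
second.timeRange = CMTimeRangeMake(CMTimeMakeWithSeconds(10, 600), CMTimeMakeWithSeconds(10, 600))
let third = AVMutableVideoCompositionInstruction()
third.timeRange = CMTimeRangeMake(CMTimeMakeWithSeconds(20, 600), CMTimeMakeWithSeconds(10, 600))
// Each instruction would carry the layer instructions for whichever tracks are
// visible in that window (e.g. only track 2's layer instruction in 10...20 s).
videoComposition.instructions = [first, second, third]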
OK, so for my exact problem, I had to apply specific CGAffineTransforms in Swift to get the specific result we wanted. The version I am posting works with any picture taken/obtained as well as video.
// This method gets the orientation of the current transform. It is used below to determine the orientation.
func orientationFromTransform(_ transform: CGAffineTransform) -> (orientation: UIImageOrientation, isPortrait: Bool) {
    var assetOrientation = UIImageOrientation.up
    var isPortrait = false
    if transform.a == 0 && transform.b == 1.0 && transform.c == -1.0 && transform.d == 0 {
        assetOrientation = .right
        isPortrait = true
    } else if transform.a == 0 && transform.b == -1.0 && transform.c == 1.0 && transform.d == 0 {
        assetOrientation = .left
        isPortrait = true
    } else if transform.a == 1.0 && transform.b == 0 && transform.c == 0 && transform.d == 1.0 {
        assetOrientation = .up
    } else if transform.a == -1.0 && transform.b == 0 && transform.c == 0 && transform.d == -1.0 {
        assetOrientation = .down
    }
    // Returns the orientation as a tuple
    return (assetOrientation, isPortrait)
}

// Method that lays out the instructions for each track I am editing and does the transformation on each individual track to get it lined up properly
func videoCompositionInstructionForTrack(_ track: AVCompositionTrack, _ asset: AVAsset) -> AVMutableVideoCompositionLayerInstruction {
    // This method returns a set of instructions for the initial track.
    // Create initial instruction
    let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: track)
    // This is whatever asset you are about to apply instructions to.
    let assetTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]
    // Get the original transform of the asset
    var transform = assetTrack.preferredTransform
    // Get the orientation of the asset and determine if it is in portrait or landscape. Whether you take a picture or grab it from the camera roll, it is initially reported as landscape; this method accounts for that.
    let assetInfo = orientationFromTransform(transform)
    // You need a little background to understand this part.
    /* MyAsset is my original video. I need to combine a lot of other segments, according to the user, into this original video. So I have to make all the other videos fit this size.
       These are the width and height ratios from the original video divided by the new asset.
     */
    let width = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width / assetTrack.naturalSize.width
    var height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height / assetTrack.naturalSize.height
    // If it is in portrait
    if assetInfo.isPortrait {
        // We actually change the height variable to divide by the width of the old asset instead of the height. This is because of the flip, since we determined it is portrait and not landscape.
        height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height / assetTrack.naturalSize.width
        // We apply the transform and scale the image appropriately.
        transform = transform.scaledBy(x: height, y: height)
        // We also have to move the image or video appropriately. Since we scaled it, it could be way off to the side, outside the bounds of the viewing area.
        let movement = ((1 / height) * assetTrack.naturalSize.height) - assetTrack.naturalSize.height
        // This lines it up dead center on the left side of the screen perfectly. Now we want to center it.
        transform = transform.translatedBy(x: 0, y: movement)
        // This calculates how much black there is. Cut it in half and there you go!
        let totalBlackDistance = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width - transform.tx
        transform = transform.translatedBy(x: 0, y: -(totalBlackDistance / 2) * (1 / height))
    } else {
        // Landscape! We don't need to change the variables, it is all defaulted that way (iOS prefers landscape items), so we scale it appropriately.
        transform = transform.scaledBy(x: width, y: height)
        // This is a little complicated. Because it is in landscape, the asset fits the height correctly, for me anyway; it was just extra long. Think of this as a ratio. I forgot exactly how I thought this through, but the end product looked like: Answer = ((Original height / current asset height) * (current asset width)) / (Original width)
        let scale: CGFloat = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height / assetTrack.naturalSize.height) * (assetTrack.naturalSize.width)) / MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width
        transform = transform.scaledBy(x: scale, y: 1)
        // The asset can be way off the screen again, so we have to move it back. This time we can have it dead center in the middle, because it wasn't backwards because it wasn't flipped because it was landscape. Again, another long complicated algorithm I derived.
        let movement = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width - ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height / assetTrack.naturalSize.height) * (assetTrack.naturalSize.width))) / 2) * (1 / MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height / assetTrack.naturalSize.height)
        transform = transform.translatedBy(x: movement, y: 0)
    }
    // This creates the instruction and returns it so we can apply it to each individual track.
    instruction.setTransform(transform, at: kCMTimeZero)
    return instruction
}
Now that we have those methods, we can apply the correct and appropriate transformations to our assets and get everything fitting nicely and cleanly.
func merge() {
    if let firstAsset = MyAsset, let newAsset = newAsset {
        // This creates our overall composition, our new video framework
        let mixComposition = AVMutableComposition()
        // One by one you create tracks (could use a loop, but I just had 3 cases)
        let firstTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                        preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        // You have to use a try, so you need a do
        do {
            // Inserting a time range into a track. I already calculated my time, I call it startTime. This is where you would put your time. The preferredTimescale doesn't have to be 600000, I was playing with those numbers; it just allows precision. "at" is not where it begins within this individual track, but where it starts in the composition as a whole. As you notice below, my "at" times are different. You also need to give it the track to insert.
            try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000)),
                                           of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                           at: kCMTimeZero)
        } catch _ {
            print("Failed to load first track")
        }
        // Create the 2nd track
        let secondTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                         preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        do {
            // Apply the 2nd timeRange you have. Also apply the correct track you want
            try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.endTime - self.startTime),
                                            of: newAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                            at: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000))
            secondTrack.preferredTransform = newAsset.preferredTransform
        } catch _ {
            print("Failed to load second track")
        }
        // We are not sure we are going to use the third track in my case, because the user can edit to the end of the original video, causing us not to use a third track. But if we do, it is the same as the others!
        var thirdTrack: AVMutableCompositionTrack!
        if self.endTime != controller.realDuration {
            thirdTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                        preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
            // This part appears again, at endTime, which is right after the 2nd track is supposed to end.
            do {
                try thirdTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000), self.controller.realDuration - endTime),
                                               of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                               at: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000))
            } catch _ {
                print("failed")
            }
        }
        // Same thing with audio!
        if let loadedAudioAsset = controller.audioAsset {
            let audioTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeAudio, preferredTrackID: 0)
            do {
                try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.controller.realDuration),
                                               of: loadedAudioAsset.tracks(withMediaType: AVMediaTypeAudio)[0],
                                               at: kCMTimeZero)
            } catch _ {
                print("Failed to load Audio track")
            }
        }
        // So, now that we have all of these tracks, we need to apply those instructions! If we don't, then they could be different sizes. Say my newAsset is 720x1080 and MyAsset is 1440x900 (these are just examples); then it would look a tad funky and possibly not show our new asset at all.
        let mainInstruction = AVMutableVideoCompositionInstruction()
        // Make sure the overall time range matches that of the individual tracks; if not, it could cause errors.
        mainInstruction.timeRange = CMTimeRangeMake(kCMTimeZero, self.controller.realDuration)
        // For each track we made, we need an instruction. Could use a loop, or do them individually like this.
        let firstInstruction = videoCompositionInstructionForTrack(firstTrack, firstAsset)
        // You know, not 100% why this is here. This is one thing I did not look into well enough or understand enough to describe to you.
        firstInstruction.setOpacity(0.0, at: startTime)
        // Next instruction
        let secondInstruction = videoCompositionInstructionForTrack(secondTrack, self.asset)
        // Again, not sure we need the 3rd one, but if we do:
        var thirdInstruction: AVMutableVideoCompositionLayerInstruction!
        if self.endTime != self.controller.realDuration {
            secondInstruction.setOpacity(0.0, at: endTime)
            thirdInstruction = videoCompositionInstructionForTrack(thirdTrack, firstAsset)
        }
        // Okay, now that we have all these instructions, we tie them into the main instruction we created above.
        mainInstruction.layerInstructions = [firstInstruction, secondInstruction]
        if self.endTime != self.controller.realDuration {
            mainInstruction.layerInstructions += [thirdInstruction]
        }
        // We create a video framework now, slightly different than the one above.
        let mainComposition = AVMutableVideoComposition()
        // We apply these instructions to the framework
        mainComposition.instructions = [mainInstruction]
        // How long are our frames? You can change this as necessary
        mainComposition.frameDuration = CMTimeMake(1, 30)
        // This is the render size of the video. 720p, 1080p etc. You set it!
        mainComposition.renderSize = firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize
        // We create an export session (you can't use PresetPassthrough because we are manipulating the transforms of the videos and the quality, so I just set it to highest)
        guard let exporter = AVAssetExportSession(asset: mixComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
        // Provide the type of file, and provide the URL location you want it exported to (I don't have mine posted in this example).
        exporter.outputFileType = AVFileTypeMPEG4
        exporter.outputURL = url
        // Then we tell the exporter to export the video according to our video framework, and it does the work!
        exporter.videoComposition = mainComposition
        // Asynchronous methods FTW!
        exporter.exportAsynchronously(completionHandler: {
            // Do whatever when it finishes!
        })
    }
}
There is a lot going on here, but it has to be done, for my example anyway! Sorry it took so long to post, and let me know if you have questions.
Yes, you can totally apply an individual transform to each layer of an AVMutableComposition.
Here's an overview of the process. I've done this personally in Objective-C, so I can't give you the exact Swift code, but I know these same functions work just the same in Swift (a rough Swift sketch follows the steps below).
Create an AVMutableComposition.
Create an AVMutableVideoComposition.
Set the render size and frame duration of the video composition.
Now for each AVAsset:
Get the asset's video AVAssetTrack and its audio AVAssetTrack.
Create an AVMutableCompositionTrack for each of those (one for video, one for audio) by adding each to the mutable composition.
Here it gets more complicated... (sorry, AVFoundation is not easy!)
Create an AVMutableVideoCompositionLayerInstruction from the track that refers to each video. For each AVMutableVideoCompositionLayerInstruction you can set its transform. You can also do things like set a crop rectangle.
Add each AVMutableVideoCompositionLayerInstruction to an array of layer instructions. When they are all created, the array gets set on an AVMutableVideoCompositionInstruction, which in turn goes into the AVMutableVideoComposition's instructions.
And finally...
You will have an AVPlayerItem to play this back (on an AVPlayer). You create the AVPlayerItem using the AVMutableComposition, and then you set the AVMutableVideoComposition on the AVPlayerItem itself (videoComposition).
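Putting those steps together, a rough Swift sketch (using the same Swift 3-era AVFoundation API as the code earlier in this thread; the asset list, render size and 30 fps frame duration are placeholders):

import AVFoundation
import CoreGraphics

func makePlayerItem(from assets: [AVAsset]) -> AVPlayerItem {
    let composition = AVMutableComposition()
    let videoComposition = AVMutableVideoComposition()
    videoComposition.renderSize = CGSize(width: 1280, height: 720) // pick your own
    videoComposition.frameDuration = CMTimeMake(1, 30)

    var layerInstructions: [AVMutableVideoCompositionLayerInstruction] = []
    var cursor = kCMTimeZero

    for asset in assets {
        guard let assetVideoTrack = asset.tracks(withMediaType: AVMediaTypeVideo).first else { continue }
        let range = CMTimeRangeMake(kCMTimeZero, asset.duration)

        // One composition track per clip so each can keep its own transform.
        let videoTrack = composition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        try? videoTrack.insertTimeRange(range, of: assetVideoTrack, at: cursor)

        if let assetAudioTrack = asset.tracks(withMediaType: AVMediaTypeAudio).first {
            let audioTrack = composition.addMutableTrack(withMediaType: AVMediaTypeAudio,
                                                         preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
            try? audioTrack.insertTimeRange(range, of: assetAudioTrack, at: cursor)
        }

        // Layer instruction for this track, carrying the clip's own preferredTransform.
        let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)
        layerInstruction.setTransform(assetVideoTrack.preferredTransform, at: cursor)
        // Hide the layer once its clip is over, as in the answer above.
        layerInstruction.setOpacity(0.0, at: CMTimeAdd(cursor, asset.duration))
        layerInstructions.append(layerInstruction)

        cursor = CMTimeAdd(cursor, asset.duration)
    }

    // One instruction covering the whole timeline, holding every layer instruction.
    let instruction = AVMutableVideoCompositionInstruction()
    instruction.timeRange = CMTimeRangeMake(kCMTimeZero, cursor)
    instruction.layerInstructions = layerInstructions
    videoComposition.instructions = [instruction]

    // Playback path: no AVAssetExportSession needed.
    let item = AVPlayerItem(asset: composition)
    item.videoComposition = videoComposition
    return item
}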
Easy, eh?
It took me some weeks to get this stuff working well. It's totally unforgiving, and as you have mentioned, if you do something wrong it doesn't tell you what you did wrong - it just doesn't appear.
But when you crack it, it works quickly and well.
Finally, all the stuff I have outlined is available in the AVFoundation docs. It's a lengthy tome, but you need to know it to achieve what you are trying to do.
Best of luck!