Clips in AVMutableComposition have gap between them - swift

I'm adding three clips to an AVMutableComposition like this...
let asset = AVURLAsset(url: url, options: [ AVURLAssetPreferPreciseDurationAndTimingKey : true ])
let track = composition.addMutableTrack(withMediaType: .video,
preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
do {
try track?.insertTimeRange(CMTimeRangeMake(start: CMTime.zero, duration: asset.duration),
of: asset.tracks(withMediaType: .video)[0],
at: composition.duration)
} catch {
print("Failed to load track")
}
So, that's the full duration of each asset, added at the current end of the composition. I've tried different orders and there's always one flash of the background in between two clips.
I've tried setting at: to the sum of the previously added clips' durations, but it doesn't change the final result.
I'm also setting the tracks to zero opacity when they've finished so that they don't cover up subsequent tracks.
instruction.setOpacity(0.0, at: composition.duration)
Possibly, the opacity is switching on too early. But this instruction also uses the asset's duration - it's the same data.
I've looked in the debugger, and when there's a gap between two clips, the second clip's start value is exactly the same as the previous clip's duration (where the first clip starts at 0), and the opacity is also switched on at the same time value. So it looks like I'm at least feeding the composition correct data.
How do I remove the gap?

Related

Adding many ramps is slow with AVMutableAudioMixInputParameters setVolumeRamp

I'm creating an AVComposition which cuts up a video into many small chunks, to remove the silent parts of the video. To keep the audio pop-free, I'm adding a short fade at the edges of these clips, using the setVolumeRamp method on AVMutableAudioMixInputParameters.
The problem I'm noticing is that adding audio ramps is slow when there are a lot of them.
Here's the relevant code:
let mixParams = AVMutableAudioMixInputParameters(track: compositionAudioTrack)
var timeOffset = CMTime(value: 0, timescale: asset.frameDuration.timescale)
for clip in clips {
// Add a fade-in at the start and a fade-out at the end, if the clip is long enough
let doubleFadeDuration = CMTimeMultiply(fadeDuration, multiplier: 2)
if clip.duration > doubleFadeDuration {
// Figure out the sections to fade
let fadeInRange = CMTimeRange(start: timeOffset, duration: fadeDuration)
let fadeOutRange = CMTimeRange(start: timeOffset + clip.duration - fadeDuration, duration: fadeDuration)
// Set the volume ramps
mixParams.setVolumeRamp(fromStartVolume: 0.0, toEndVolume: 1.0, timeRange: fadeInRange)
mixParams.setVolumeRamp(fromStartVolume: 1.0, toEndVolume: 0.0, timeRange: fadeOutRange)
}
timeOffset = timeOffset + clip.duration
}
// Create the mix. This is later set as the `audioMix` property
// on an AVPlayerItem or AVAssetExportSession
let mix = AVMutableAudioMix()
mix.inputParameters = [mixParams]
I measured this using Instruments on a 1hr 21min video that has 1421 clips, which would end up being 2842 audio ramps.
Granted this is an edge case for my app, and I think most videos will be shorter, but I don't love the fact that it just appears to hang for 2 solid seconds as it builds this composition.
From profiling, it looks like all the time is spent in AVRampsIncludesRampThatOverlapsTimeRange, which I'm guessing is using a O(n²) way of checking each new ramp against every previous one.
How can I speed this up? Is there any way to set all the ramps in a batch, or skip this check?
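To put the O(n²) guess into numbers: if each new ramp really were compared against every ramp already added (an assumption based only on the profile above, not on documented AVFoundation behavior), n ramps would cost n(n-1)/2 comparisons in total. A minimal sketch of that arithmetic, where `pairwiseChecks` is a hypothetical helper and not an AVFoundation call:

```swift
// Hypothetical helper: counts comparisons if the k-th ramp is checked
// against all k ramps that were added before it.
func pairwiseChecks(rampCount n: Int) -> Int {
    var checks = 0
    for alreadyAdded in 0..<n {
        checks += alreadyAdded
    }
    return checks
}

let rampCount = 1421 * 2   // two fades per clip
print(rampCount, pairwiseChecks(rampCount: rampCount)) // 2842 4037061
```

Roughly four million overlap checks would be at least consistent with a delay on the order of seconds, though the profile alone does not prove that this is what the framework does.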

Recording of metal view is slow due to texture.getbytes function - Swift

I am following this post to record a custom Metal view, but I am running into some issues. When I start recording, the frame rate drops from 60 fps to ~20 fps on an iPhone 12 Pro Max. Profiling shows that the function slowing everything down is texture.getBytes, since it copies the texture contents from the GPU to the CPU.
Another issue, which may or may not be a consequence of this, is that the video and audio are out of sync. I am not sure whether I should go down the semaphore route to solve this or whether there is another workaround.
In my case, the texture is as big as the screen, as I create it from the camera stream and then process it through a couple of CIFilters. I am not sure whether the texture is simply too big for getBytes to handle in real time.
If I need to define priorities, my #1 priority would be to solve the out-of-sync between the audio and video. Any thoughts would be super helpful.
Here is the code:
import AVFoundation
class MetalVideoRecorder {
var isRecording = false
var recordingStartTime = TimeInterval(0)
private var assetWriter: AVAssetWriter
private var assetWriterVideoInput: AVAssetWriterInput
private var assetWriterPixelBufferInput: AVAssetWriterInputPixelBufferAdaptor
init?(outputURL url: URL, size: CGSize) {
do {
assetWriter = try AVAssetWriter(outputURL: url, fileType: AVFileType.m4v)
} catch {
return nil
}
let outputSettings: [String: Any] = [ AVVideoCodecKey : AVVideoCodecType.h264,
AVVideoWidthKey : size.width,
AVVideoHeightKey : size.height ]
assetWriterVideoInput = AVAssetWriterInput(mediaType: AVMediaType.video, outputSettings: outputSettings)
assetWriterVideoInput.expectsMediaDataInRealTime = true
let sourcePixelBufferAttributes: [String: Any] = [
kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA,
kCVPixelBufferWidthKey as String : size.width,
kCVPixelBufferHeightKey as String : size.height ]
assetWriterPixelBufferInput = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: assetWriterVideoInput,
sourcePixelBufferAttributes: sourcePixelBufferAttributes)
assetWriter.add(assetWriterVideoInput)
}
func startRecording() {
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: CMTime.zero)
recordingStartTime = CACurrentMediaTime()
isRecording = true
}
func endRecording(_ completionHandler: @escaping () -> ()) {
isRecording = false
assetWriterVideoInput.markAsFinished()
assetWriter.finishWriting(completionHandler: completionHandler)
}
func writeFrame(forTexture texture: MTLTexture) {
if !isRecording {
return
}
while !assetWriterVideoInput.isReadyForMoreMediaData {}
guard let pixelBufferPool = assetWriterPixelBufferInput.pixelBufferPool else {
print("Pixel buffer asset writer input did not have a pixel buffer pool available; cannot retrieve frame")
return
}
var maybePixelBuffer: CVPixelBuffer? = nil
let status = CVPixelBufferPoolCreatePixelBuffer(nil, pixelBufferPool, &maybePixelBuffer)
if status != kCVReturnSuccess {
print("Could not get pixel buffer from asset writer input; dropping frame...")
return
}
guard let pixelBuffer = maybePixelBuffer else { return }
CVPixelBufferLockBaseAddress(pixelBuffer, [])
let pixelBufferBytes = CVPixelBufferGetBaseAddress(pixelBuffer)!
// Use the bytes per row value from the pixel buffer since its stride may be rounded up to be 16-byte aligned
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
texture.getBytes(pixelBufferBytes, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
let frameTime = CACurrentMediaTime() - recordingStartTime
let presentationTime = CMTimeMakeWithSeconds(frameTime, preferredTimescale: 240)
assetWriterPixelBufferInput.append(pixelBuffer, withPresentationTime: presentationTime)
CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
}
}
Unlike OpenGL, Metal doesn't have the concept of a default framebuffer. Instead, it uses a swap chain: a collection of buffers used for displaying frames to the user. Each time an application presents a new frame for display, the first buffer in the swap chain takes the place of the displayed buffer.
When a command queue schedules a command buffer for execution, the drawable tracks all render or write requests on itself in that command buffer. The operating system doesn't present the drawable onscreen until the commands have finished executing. By asking the command buffer to present the drawable, you guarantee that presentation happens after the command queue has scheduled this command buffer.
Don’t wait for the command buffer to finish executing before registering the drawable’s presentation.
The layer reuses a drawable only if it isn’t onscreen and there are no strong references to it. They exist within a limited and reusable resource pool, and a drawable may or may not be available when you request one. If none are available, Core Animation blocks your calling thread until a new drawable becomes available — usually at the next display refresh interval.
In your case, the frame recorder holds a reference to the drawable for too long, which is what causes the frame drops. To avoid this, you should implement a triple-buffering model.
Adding a third dynamic data buffer is the ideal solution when considering processor idle time, memory overhead, and frame latency.
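The idea behind triple buffering can be sketched without any Metal code. In the minimal model below, all names (`FrameSlot`, `FramePipeline`, `maxFramesInFlight`) are hypothetical: a semaphore with a count of three lets the producer run up to three frames ahead of the consumer, and a slot is reused only after the consumer has signaled that it is done with it. In a real renderer the signal would typically come from the command buffer's completion handler instead of a dispatch queue.

```swift
import Foundation
import Dispatch

// Assumption: three frames may be in flight at once.
let maxFramesInFlight = 3

// Hypothetical stand-in for a per-frame resource (for example, a buffer the GPU reads).
final class FrameSlot: @unchecked Sendable {
    var payload = 0
}

final class FramePipeline: @unchecked Sendable {
    private let slots = (0..<maxFramesInFlight).map { _ in FrameSlot() }
    private let freeSlots = DispatchSemaphore(value: maxFramesInFlight)
    private let consumerQueue = DispatchQueue(label: "frame.consumer") // serial
    private var nextSlot = 0
    private(set) var consumed: [Int] = []

    // Blocks only when all three slots are still waiting to be consumed.
    func submit(_ frame: Int) {
        freeSlots.wait()
        let slot = slots[nextSlot]
        nextSlot = (nextSlot + 1) % maxFramesInFlight
        slot.payload = frame                    // producer fills the slot
        consumerQueue.async {
            self.consumed.append(slot.payload)  // consumer reads it later
            self.freeSlots.signal()             // the slot may now be reused
        }
    }

    // Waits until every queued frame has been consumed.
    func drain() {
        consumerQueue.sync {}
    }
}

let pipeline = FramePipeline()
for frame in 0..<10 {
    pipeline.submit(frame)
}
pipeline.drain()
print(pipeline.consumed)
```

Because a slot is never rewritten before its consumer has signaled, the producer does not have to wait for each frame to finish, yet no frame's data is overwritten while it is still in use.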
I have encountered the same problem and would like to know whether you have solved it.
Here is what I know so far.
Everything is being done on the main thread. You can create another serial queue to do the writing and finishWriting asynchronously.
My iPhone Xs Max can record screen-size video at 60 FPS.
You can check this repo; it is a Swift version of Apple's sample that uses AVAssetWriter, and it shows how to keep your video and audio in sync.
RosyWriter
getBytes might have a performance issue on A14 devices. With the same code running on an iPhone 12 Pro Max, the output video is laggy and unusable.
You can check this.
Developer Forums
I did not fully understand how to implement the solution @HamidYusifli proposed, so I focused on:
Optimizing the rest of the Metal code (I am doing some real-time image processing)
Fixing the out-of-sync video and audio via AVCaptureSynchronizedData
With this new implementation, my code still consumes quite a lot of CPU (106% on iPhone 12 plus) and runs at ~20 fps, but it feels pretty smooth to the user (the audio and video are no longer out of sync).

How do you add an overlay while recording a video in Swift?

I am trying to record, and then save, a video in Swift using AVFoundation. This works. I am also trying to add an overlay, such as a text label containing the date, to the video.
For example: the video saved is not only what the camera sees, but the timestamp as well.
Here is how I am saving the video:
func fileOutput(_ output: AVCaptureFileOutput, didFinishRecordingTo outputFileURL: URL, from connections: [AVCaptureConnection], error: Error?) {
saveVideo(toURL: movieURL!)
}
private func saveVideo(toURL url: URL) {
PHPhotoLibrary.shared().performChanges({
PHAssetChangeRequest.creationRequestForAssetFromVideo(atFileURL: url)
}) { (success, error) in
if(success) {
print("Video saved to Camera Roll.")
} else {
print("Video failed to save.")
}
}
}
I have a movieOutput that is an AVCaptureMovieFileOutput. My preview layer does not contain any sublayers. I tried adding the timestamp label's layer to the previewLayer, but this did not succeed.
I have tried Ray Wenderlich's example as well as this Stack Overflow question. Lastly, I also tried this tutorial, all to no avail.
How can I add an overlay to my video that is in the saved video in the camera roll?
Without more information, it sounds like what you are asking for is a watermark, not an overlay.
A watermark is a mark on the video that is saved with the video.
An overlay is generally shown as subviews on the preview layer and is not saved with the video.
Check this out here: https://stackoverflow.com/a/47742108/8272698
func addWatermark(inputURL: URL, outputURL: URL, handler: @escaping (_ exportSession: AVAssetExportSession?) -> Void) {
let mixComposition = AVMutableComposition()
let asset = AVAsset(url: inputURL)
let videoTrack = asset.tracks(withMediaType: AVMediaType.video)[0]
let timerange = CMTimeRangeMake(kCMTimeZero, asset.duration)
let compositionVideoTrack:AVMutableCompositionTrack = mixComposition.addMutableTrack(withMediaType: AVMediaType.video, preferredTrackID: CMPersistentTrackID(kCMPersistentTrackID_Invalid))!
do {
try compositionVideoTrack.insertTimeRange(timerange, of: videoTrack, at: kCMTimeZero)
compositionVideoTrack.preferredTransform = videoTrack.preferredTransform
} catch {
print(error)
}
let watermarkFilter = CIFilter(name: "CISourceOverCompositing")!
let watermarkImage = CIImage(image: UIImage(named: "waterMark")!)
let videoComposition = AVVideoComposition(asset: asset) { (filteringRequest) in
let source = filteringRequest.sourceImage.clampedToExtent()
watermarkFilter.setValue(source, forKey: "inputBackgroundImage")
let transform = CGAffineTransform(translationX: filteringRequest.sourceImage.extent.width - (watermarkImage?.extent.width)! - 2, y: 0)
watermarkFilter.setValue(watermarkImage?.transformed(by: transform), forKey: "inputImage")
filteringRequest.finish(with: watermarkFilter.outputImage!, context: nil)
}
guard let exportSession = AVAssetExportSession(asset: asset, presetName: AVAssetExportPreset640x480) else {
handler(nil)
return
}
exportSession.outputURL = outputURL
exportSession.outputFileType = AVFileType.mp4
exportSession.shouldOptimizeForNetworkUse = true
exportSession.videoComposition = videoComposition
exportSession.exportAsynchronously { () -> Void in
handler(exportSession)
}
}
And here's how to call the function.
let outputURL = NSURL.fileURL(withPath: "TempPath")
let inputURL = NSURL.fileURL(withPath: "VideoWithWatermarkPath")
addWatermark(inputURL: inputURL, outputURL: outputURL, handler: { (exportSession) in
guard let session = exportSession else {
// Error
return
}
switch session.status {
case .completed:
guard NSData(contentsOf: outputURL) != nil else {
// Error
return
}
// Now you can find the video with the watermark in the location outputURL
default:
// Error
break
}
})
Let me know if this code works for you.
It is written in Swift 3, so some changes will be needed.
I am currently using this code in an app of mine and have not updated it to Swift 5 yet.
I do not have an actual development environment for Swift that can utilize AVFoundation. Thus, I can't provide you with any example code.
To add metadata (date, location, timestamp, watermark, frame rate, etc.) as an overlay to the video while recording, you would have to process the video feed frame by frame, live, as it is recorded. Most likely you would have to store the frames in a buffer and process them before actually recording them.
When it comes to the metadata, there are two types: static and dynamic. A static type such as a watermark should be easy enough, as every frame gets the same thing.
However, for a dynamic metadata type such as a timestamp or GPS location, there are a few things to take into consideration. It takes computational power and time to process the video frames. Thus, depending on the type of dynamic data and how you obtain it, the processed value may sometimes be incorrect. For example, say you get a frame at 1:00:01, process it, and add a timestamp to it, and pretend that processing the timestamp took 2 seconds. The next frame arrives at 1:00:02, but you can't process it until 1:00:03 because processing the previous frame took 2 seconds. Thus, depending on how you obtain the timestamp for the new frame, that value may not be the one you wanted.
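One way to sidestep the stale-timestamp problem described above is to record the time when the frame arrives and carry that value along with the frame, so the overlay text never depends on when processing happens to run. The sketch below uses hypothetical types (`CapturedFrame`, `overlayText`) and plain seconds; in a real capture pipeline, the sample buffer's presentation timestamp would play the role of `capturedAt`.

```swift
import Foundation

// Hypothetical frame type: the capture time is stored when the frame arrives.
struct CapturedFrame {
    let index: Int
    let capturedAt: TimeInterval   // seconds on the recording clock
}

// Builds the overlay text from the frame's own capture time,
// not from the clock at the moment the frame is processed.
func overlayText(for frame: CapturedFrame) -> String {
    let total = Int(frame.capturedAt)
    let hours = total / 3600
    let minutes = (total % 3600) / 60
    let seconds = total % 60
    return String(format: "%d:%02d:%02d", hours, minutes, seconds)
}

// A frame captured at 1:00:02 but processed a second late still reads 1:00:02.
let lateFrame = CapturedFrame(index: 1, capturedAt: 3602)
print(overlayText(for: lateFrame)) // 1:00:02
```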
For dynamic metadata, you should also take hardware lag into consideration. For example, suppose the software is supposed to add live GPS location data to each frame, and there was no lag in development or testing. In real life, however, a user runs the software in an area with a bad connection, and the phone lags while obtaining the GPS location. Some of the lags last as long as 5 seconds. What do you do in that situation? Do you set a timeout for the GPS location and use the last good position? Do you report the error? Do you defer that frame to be processed later, when the GPS data becomes available (this may ruin live recording), and use an expensive algorithm to try to predict the user's location for that frame?
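The "timeout plus last good position" option can be sketched as a small pure function. Everything here is hypothetical (`LocationFix` is not a Core Location type, and the 5-second `maxAge` default simply mirrors the lag mentioned above); the point is that the recording path never waits for the GPS, it only decides how to label whatever fix it already has.

```swift
import Foundation

// Hypothetical location fix; not a Core Location type.
struct LocationFix: Equatable {
    let latitude: Double
    let longitude: Double
    let timestamp: TimeInterval
}

enum OverlayLocation: Equatable {
    case fresh(LocationFix)
    case stale(LocationFix)   // last good position, older than maxAge
    case unavailable          // no fix has ever been received
}

// Picks what to draw for a frame captured at `frameTime` without blocking.
func locationForOverlay(lastFix: LocationFix?,
                        frameTime: TimeInterval,
                        maxAge: TimeInterval = 5) -> OverlayLocation {
    guard let fix = lastFix else { return .unavailable }
    return frameTime - fix.timestamp <= maxAge ? .fresh(fix) : .stale(fix)
}

let fix = LocationFix(latitude: 48.2, longitude: 16.4, timestamp: 100)
print(locationForOverlay(lastFix: fix, frameTime: 103) == .fresh(fix)) // true
print(locationForOverlay(lastFix: fix, frameTime: 110) == .stale(fix)) // true
```

Returning a distinct `stale` case leaves it to the renderer to decide whether to show the old position, mark it as outdated, or hide it.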
Besides those considerations, here are some references that I think may help you. I thought the one from medium.com looked pretty good.
https://medium.com/ios-os-x-development/ios-camera-frames-extraction-d2c0f80ed05a
Adding watermark to currently recording video and save with watermark
Render dynamic text onto CVPixelBufferRef while recording video
Adding on to @Kevin Ng's answer, you can draw an overlay on video frames with a UIViewController and a UIView.
The UIViewController will have:
a property to work with the video stream
private var videoSession: AVCaptureSession?
a property to work with the overlay (the UIView subclass)
private var myOverlay: MyUIView { view as! MyUIView }
a property to work with the video output queue
private let videoOutputQueue = DispatchQueue(label: "outputQueue", qos: .userInteractive)
a method to create the video session
a method to process and display the overlay
The UIView will have the task-specific helper methods needed to act as the overlay. For example, if you are doing hand detection, this overlay class can have helper methods to draw points at coordinates (the view controller detects the coordinates of hand features, does the necessary coordinate conversions, then passes the coordinates to the UIView class to display as an overlay).

Failed to get video thumbnail from AVPlayer using Fairplay HLS

I'm trying to build a custom progress bar for a video player app in tvOS, and would like to show thumbnails of the video while the user scans the video.
I'm using AVPlayer and Fairplay HLS to play remote video files. I've tried to do this using 2 methods. One with AVAssetImageGenerator's copyCGImage, and the other with AVPlayerItemVideoOutput's copyPixelBuffer method. Both return nil.
When I tried with a local video file, the first method worked.
Method 1:
let imageGenerator = AVAssetImageGenerator(asset: playerItem.asset)
let progressSeconds = playerItem.duration.seconds * Double(progress)
let time = CMTime(seconds: progressSeconds, preferredTimescale: 5)
if let imageRef = try? imageGenerator.copyCGImage(at: time, actualTime: nil) {
image = UIImage(cgImage:imageRef)
}
Method 2:
let videoThumbnailsOutput = AVPlayerItemVideoOutput(pixelBufferAttributes: [String(kCVPixelBufferPixelFormatTypeKey): NSNumber(value: kCVPixelFormatType_32BGRA)])
player?.currentItem?.add(videoThumbnailsOutput)
if let pixelBuffer = videoThumbnailsOutput.copyPixelBuffer(forItemTime: time, itemTimeForDisplay: nil) {
let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
Any ideas what I'm doing wrong or is there any other way?
Thanks!
This is usually done by making use of the trick play stream associated with your actual stream.
https://en.wikipedia.org/wiki/Trick_mode
You can find it declared with the EXT-X-I-FRAME-STREAM-INF tag in the manifest of your HLS stream. A regex might be needed in order to parse its value.
"#EXT-X-I-FRAME-STREAM-INF[^#]*URI=[^#]*"
Once you have the URL of the trick play stream, you can use a paused instance of AVPlayer as the thumbnail. When the user swipes left or right, seek that thumbnail player to show the right frame.
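As a rough sketch of the parsing step, the function below pulls the quoted URI out of every EXT-X-I-FRAME-STREAM-INF line using NSRegularExpression. The function name, the slightly tighter pattern (it captures only the quoted URI value), and the sample manifest are all assumptions made for illustration; a real manifest has to be downloaded first.

```swift
import Foundation

// Hypothetical helper: returns the URI attribute of each I-frame stream entry.
func iFrameStreamURIs(inManifest manifest: String) -> [String] {
    // Stay on the tag's own line, then capture the quoted URI value.
    let pattern = "#EXT-X-I-FRAME-STREAM-INF:[^\\n]*URI=\"([^\"]+)\""
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(manifest.startIndex..., in: manifest)
    return regex.matches(in: manifest, range: fullRange).compactMap { match in
        Range(match.range(at: 1), in: manifest).map { String(manifest[$0]) }
    }
}

// Made-up manifest, for illustration only.
let manifest = """
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
video/720p.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=150000,RESOLUTION=1280x720,URI="iframes/720p.m3u8"
"""
print(iFrameStreamURIs(inManifest: manifest)) // ["iframes/720p.m3u8"]
```

The URI is often relative, so it usually has to be resolved against the manifest's own URL before it is handed to the thumbnail player.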

Only First Track Playing of AVMutableComposition()

New Edit Below
I have already referenced
AVMutableComposition - Only Playing First Track (Swift)
but it is not providing the answer to what I am looking for.
I have an AVMutableComposition(). I am trying to use MULTIPLE AVCompositionTracks of a single type, AVMediaTypeVideo, in this one composition. This is because I am using 2 different AVMediaTypeVideo sources whose AVAssets have different CGSizes and preferredTransforms.
So, the only way to apply their respective preferredTransforms is to put them in 2 different tracks. But, for whatever reason, only the first track actually provides any video, almost as if the second track were never there.
So, I have tried
1) Using AVMutableVideoCompositionLayerInstructions and applying an AVVideoComposition along with an AVAssetExportSession. This works okay (I am still working on the transforms, but it is doable), but the processing times of the videos are WELL OVER 1 minute, which is unacceptable in my situation.
2) Using multiple tracks without AVAssetExportSession, in which case the 2nd track of the same type never appears. Now, I could put it all on 1 track, but all the videos would then have the same size and preferredTransform as the first video, which I absolutely do not want, as it stretches them on all sides.
So my question is, is it possible to
1) Apply instructions to just a track WITHOUT using AVAssetExportSession? //Preferred way BY FAR.
2) Decrease the export time? (I have tried using PresetPassthrough, but you cannot use that if you have an exporter.videoComposition, which is where my instructions are. This is the only place I know I can put instructions; I am not sure if I can place them somewhere else.)
Here is some of my code (without the exporter, as I don't need to export anything anywhere, just do stuff after the AVMutableComposition combines the items):
func merge() {
if let firstAsset = controller.firstAsset, secondAsset = self.asset {
let mixComposition = AVMutableComposition()
let firstTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
do {
//Don't need now according to not being able to edit first 14seconds.
if(CMTimeGetSeconds(startTime) == 0) {
self.startTime = CMTime(seconds: 1/600, preferredTimescale: Int32(600))
}
try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600)),
ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
atTime: kCMTimeZero)
} catch _ {
print("Failed to load first track")
}
//This secondTrack never appears, doesn't matter what is inside of here, like it is blank space in the video from startTime to endTime (rangeTime of secondTrack)
let secondTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
// secondTrack.preferredTransform = self.asset.preferredTransform
do {
try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, secondAsset.duration),
ofTrack: secondAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
atTime: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600))
} catch _ {
print("Failed to load second track")
}
//This part appears again, at endTime, which is right after the 2nd track is supposed to end.
do {
try firstTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600), firstAsset.duration-endTime),
ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0] ,
atTime: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600))
} catch _ {
print("failed")
}
if let loadedAudioAsset = controller.audioAsset {
let audioTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeAudio, preferredTrackID: 0)
do {
try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, firstAsset.duration),
ofTrack: loadedAudioAsset.tracksWithMediaType(AVMediaTypeAudio)[0] ,
atTime: kCMTimeZero)
} catch _ {
print("Failed to load Audio track")
}
}
}
}
Edit
Apple states that "Indicates instructions for video composition via an NSArray of instances of classes implementing the AVVideoCompositionInstruction protocol. For the first instruction in the array, timeRange.start must be less than or equal to the earliest time for which playback or other processing will be attempted (note that this will typically be kCMTimeZero). For subsequent instructions, timeRange.start must be equal to the prior instruction's end time. The end time of the last instruction must be greater than or equal to the latest time for which playback or other processing will be attempted (note that this will often be the duration of the asset with which the instance of AVVideoComposition is associated)."
As I understand it, this just states that the entire composition must be covered by instructions if you decide to use ANY instructions. Why is this? How would I apply instructions to, say, just track 2 in this example without changing track 1 or 3 at all:
Track 1 from 0 - 10sec, Track 2 from 10 - 20sec, Track 3 from 20 - 30sec.
Any explanation on that would probably answer my question (if it is doable).
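The time-range rule quoted above can be modeled with plain integers to see why an instruction for track 2 alone is not enough. This is only a sketch of the rule as quoted, with hypothetical names (`InstructionRange`, `coversTimeline`) and whole seconds in place of CMTime, and it assumes processing starts at time 0; it says nothing about what each instruction does to its track.

```swift
// Hypothetical stand-in for an instruction's time range, in whole seconds.
struct InstructionRange {
    let start: Int
    let end: Int
}

// Checks the quoted rule: the first range starts at or before time 0,
// each later range starts where the previous one ends,
// and the last range ends at or after the total duration.
func coversTimeline(_ ranges: [InstructionRange], duration: Int) -> Bool {
    guard let first = ranges.first, let last = ranges.last else { return false }
    guard first.start <= 0, last.end >= duration else { return false }
    for (previous, next) in zip(ranges, ranges.dropFirst()) {
        if next.start != previous.end { return false }
    }
    return true
}

let perTrack = [InstructionRange(start: 0, end: 10),
                InstructionRange(start: 10, end: 20),
                InstructionRange(start: 20, end: 30)]
let onlyTrack2 = [InstructionRange(start: 10, end: 20)]

print(coversTimeline(perTrack, duration: 30))   // true
print(coversTimeline(onlyTrack2, duration: 30)) // false
```

Under this reading, tracks 1 and 3 would still need instructions covering 0 to 10 and 20 to 30, even if those instructions leave their tracks unchanged.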
Ok, so for my exact problem, I had to apply specific CGAffineTransforms in Swift to get the result we wanted. The version I am posting works with any picture taken or obtained, as well as with video.
//This method gets the orientation of the current transform. This method is used below to determine the orientation
func orientationFromTransform(_ transform: CGAffineTransform) -> (orientation: UIImageOrientation, isPortrait: Bool) {
var assetOrientation = UIImageOrientation.up
var isPortrait = false
if transform.a == 0 && transform.b == 1.0 && transform.c == -1.0 && transform.d == 0 {
assetOrientation = .right
isPortrait = true
} else if transform.a == 0 && transform.b == -1.0 && transform.c == 1.0 && transform.d == 0 {
assetOrientation = .left
isPortrait = true
} else if transform.a == 1.0 && transform.b == 0 && transform.c == 0 && transform.d == 1.0 {
assetOrientation = .up
} else if transform.a == -1.0 && transform.b == 0 && transform.c == 0 && transform.d == -1.0 {
assetOrientation = .down
}
//Returns the orientation as a variable
return (assetOrientation, isPortrait)
}
//Method that lays out the instructions for each track I am editing and does the transformation on each individual track to get it lined up properly
func videoCompositionInstructionForTrack(_ track: AVCompositionTrack, _ asset: AVAsset) -> AVMutableVideoCompositionLayerInstruction {
//This method Returns set of instructions from the initial track
//Create inital instruction
let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: track)
//This is whatever asset you are about to apply instructions to.
let assetTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]
//Get the original transform of the asset
var transform = assetTrack.preferredTransform
//Get the orientation of the asset and determine if it is in portrait or landscape - I forget which, but either if you take a picture or get in the camera roll it is ALWAYS determined as landscape at first, I don't recall which one. This method accounts for it.
let assetInfo = orientationFromTransform(transform)
//You need a little background to understand this part.
/* MyAsset is my original video. I need to combine a lot of other segments, according to the user, into this original video. So I have to make all the other videos fit this size.
This is the width and height ratios from the original video divided by the new asset
*/
let width = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width/assetTrack.naturalSize.width
var height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height
//If it is in portrait
if assetInfo.isPortrait {
//We actually change the height variable to divide by the width of the old asset instead of the height. This is because of the flip since we determined it is portrait and not landscape.
height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.width
//We apply the transform and scale the image appropriately.
transform = transform.scaledBy(x: height, y: height)
//We also have to move the image or video appropriately. Since we scaled it, it could be wayy off on the side, outside the bounds of the viewing.
let movement = ((1/height)*assetTrack.naturalSize.height)-assetTrack.naturalSize.height
//This lines it up dead center on the left side of the screen perfectly. Now we want to center it.
transform = transform.translatedBy(x: 0, y: movement)
//This calculates how much black there is. Cut it in half and there you go!
let totalBlackDistance = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width-transform.tx
transform = transform.translatedBy(x: 0, y: -(totalBlackDistance/2)*(1/height))
} else {
//Landscape! We don't need to change the variables, it is all defaulted that way (iOS prefers landscape items), so we scale it appropriately.
transform = transform.scaledBy(x: width, y: height)
//This is a little complicated haha. So because it is in landscape, the asset fits the height correctly, for me anyway; It was just extra long. Think of this as a ratio. I forgot exactly how I thought this through, but the end product looked like: Answer = ((Original height/current asset height)*(current asset width))/(Original width)
let scale:CGFloat = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)*(assetTrack.naturalSize.width))/MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width
transform = transform.scaledBy(x: scale, y: 1)
//The asset can be way off the screen again, so we have to move it back. This time we can have it dead center in the middle, because it wasn't backwards because it wasn't flipped because it was landscape. Again, another long complicated algorithm I derived.
let movement = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width-((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)*(assetTrack.naturalSize.width)))/2)*(1/MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)
transform = transform.translatedBy(x: movement, y: 0)
}
//This creates the instruction and returns it so we can apply it to each individual track.
instruction.setTransform(transform, at: kCMTimeZero)
return instruction
}
Now that we have those methods, we can apply the appropriate transformations to our assets and get everything fitting nice and clean.
func merge() {
if let firstAsset = MyAsset, let newAsset = newAsset {
//This creates our overall composition, our new video framework
let mixComposition = AVMutableComposition()
//One by one you create tracks (could use loop, but I just had 3 cases)
let firstTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
//You have to use a try, so need a do
do {
//Inserting a time range into a track. I already calculated my time; I call it startTime. This is where you would put your time. The preferredTimescale doesn't have to be 600000 haha, I was playing with those numbers. It just allows precision. The at: time is not where the clip begins within this individual track, but where it starts in the composition as a whole. As you will notice below, my at: times are different. You also need to give it the source track.
try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000)),
of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
at: kCMTimeZero)
} catch _ {
print("Failed to load first track")
}
//Create the 2nd track
let secondTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
do {
//Apply the 2nd timeRange you have. Also apply the correct track you want
try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.endTime-self.startTime),
of: newAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
at: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000))
secondTrack.preferredTransform = newAsset.preferredTransform
} catch _ {
print("Failed to load second track")
}
//We are not sure we are going to use the third track in my case, because they can edit to the end of the original video, causing us not to use a third track. But if we do, it is the same as the others!
var thirdTrack:AVMutableCompositionTrack!
if(self.endTime != controller.realDuration) {
thirdTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
//This part appears again, at endTime, which is right after the 2nd track is supposed to end.
do {
try thirdTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000), self.controller.realDuration-endTime),
of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0] ,
at: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000))
} catch _ {
print("failed")
}
}
//Same thing with audio!
if let loadedAudioAsset = controller.audioAsset {
let audioTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeAudio, preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
do {
try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.controller.realDuration),
of: loadedAudioAsset.tracks(withMediaType: AVMediaTypeAudio)[0] ,
at: kCMTimeZero)
} catch _ {
print("Failed to load Audio track")
}
}
//So, now that we have all of these tracks we need to apply those instructions! If we don't, then they could be different sizes. Say my newAsset is 720x1080 and MyAsset is 1440x900 (These are just examples haha), then it would look a tad funky and possibly not show our new asset at all.
let mainInstruction = AVMutableVideoCompositionInstruction()
//Make sure the overall time range matches that of the individual tracks, if not, it could cause errors.
mainInstruction.timeRange = CMTimeRangeMake(kCMTimeZero, self.controller.realDuration)
//For each track we made, we need an instruction. Could set loop or do individually as such.
let firstInstruction = videoCompositionInstructionForTrack(firstTrack, firstAsset)
//I did not look into this one deeply, but it hides the first track from startTime onward. Layer instructions are stacked with the first one on top, so this keeps the first track from covering the tracks that follow.
firstInstruction.setOpacity(0.0, at: startTime)
//Next Instruction
let secondInstruction = videoCompositionInstructionForTrack(secondTrack, self.asset)
//Again, not sure we need 3rd one, but if we do.
var thirdInstruction:AVMutableVideoCompositionLayerInstruction!
if(self.endTime != self.controller.realDuration) {
secondInstruction.setOpacity(0.0, at: endTime)
thirdInstruction = videoCompositionInstructionForTrack(thirdTrack, firstAsset)
}
//Okay, now that we have all these instructions, we tie them into the main instruction we created above.
mainInstruction.layerInstructions = [firstInstruction, secondInstruction]
if(self.endTime != self.controller.realDuration) {
mainInstruction.layerInstructions += [thirdInstruction]
}
//Now we create a video composition. It is different from the AVMutableComposition above: it describes how the tracks are rendered.
let mainComposition = AVMutableVideoComposition()
//We apply the instructions to the video composition
mainComposition.instructions = [mainInstruction]
//How long each frame lasts (1/30 of a second here, i.e. 30 fps). You can change this as necessary.
mainComposition.frameDuration = CMTimeMake(1, 30)
//This is your render size of the video. 720p, 1080p etc. You set it!
mainComposition.renderSize = firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize
//We create an export session (you can't use PresetPassthrough because we are manipulating the transforms of the videos and the quality, so I just set it to highest)
guard let exporter = AVAssetExportSession(asset: mixComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
//Provide type of file, provide the url location you want exported to (I don't have mine posted in this example).
exporter.outputFileType = AVFileTypeMPEG4
exporter.outputURL = url
//Then we tell the exporter to export the video according to our video framework, and it does the work!
exporter.videoComposition = mainComposition
//Asynchronous methods FTW!
exporter.exportAsynchronously(completionHandler: {
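//This handler also runs when the export fails, so check exporter.status (and exporter.error) first.
//Do whatever when it finishes!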
})
}
}
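The comment on the first insertTimeRange call says the timescale "just allows precision". A small sketch of what that means in practice, assuming nothing beyond CoreMedia (the 1/7 second value is an arbitrary example, not taken from the code above): converting a CMTime to seconds and back rounds to the chosen timescale, while CMTime arithmetic keeps the exact value.

```swift
import CoreMedia

// An arbitrary example time: 1/7 of a second, stored exactly as value 1, timescale 7.
let original = CMTime(value: 1, timescale: 7)

// Converting to seconds and back rounds to the nearest 1/600000 of a second,
// because 600000 is not divisible by 7.
let roundTripped = CMTime(seconds: CMTimeGetSeconds(original), preferredTimescale: 600000)

// CMTimeCompare returns 0 only when the two times are equal.
let isExact = CMTimeCompare(original, roundTripped) == 0

// Adding CMTime values directly keeps the rational representation intact.
let doubled = CMTimeAdd(original, original)
let doubledIsExact = CMTimeCompare(doubled, CMTime(value: 2, timescale: 7)) == 0

print("round trip exact: \(isExact), direct arithmetic exact: \(doubledIsExact)")
```

If a value such as startTime is already a CMTime, passing it through unchanged avoids the conversion altogether. Whether the rounding matters depends on the clips involved.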
There is a lot going on here, but it has to be done, for my example anyways! Sorry it took so long to post and let me know if you have questions.
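The code above calls videoCompositionInstructionForTrack but does not show it. Here is a minimal sketch of what such a helper might look like, assuming it only has to build a layer instruction and copy the source video's preferredTransform. The body is an assumption, not the original implementation, and it uses current Swift API names (.video rather than AVMediaTypeVideo).

```swift
import AVFoundation

// Hypothetical helper: the post above calls a function with this name
// but does not include its body, so everything here is an assumption.
func videoCompositionInstructionForTrack(_ track: AVCompositionTrack,
                                         _ asset: AVAsset) -> AVMutableVideoCompositionLayerInstruction {
    // A layer instruction describes how one composition track is drawn.
    let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: track)

    // Carry over the source video's orientation, if the asset has a video track.
    if let sourceTrack = asset.tracks(withMediaType: .video).first {
        instruction.setTransform(sourceTrack.preferredTransform, at: .zero)
    }
    return instruction
}
```

A real version would probably also scale each clip to the render size, which is the situation the comment about differently sized assets describes.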
Yes, you can totally just apply an individual transform to each layer of an AVMutableComposition.
Here's an overview of the process. I've done this personally in Objective-C, so I can't give you the exact Swift code, but I know these same functions work just the same in Swift.
Create an AVMutableComposition.
Create an AVMutableVideoComposition.
Set the render size and frame duration of the Video Composition.
Now for each AVAsset :
Get the asset's video AVAssetTrack and its audio AVAssetTrack.
Create an AVMutableCompositionTrack for each of those (one for video, one for audio) by adding a track to the AVMutableComposition, then insert the asset track's time range into it.
Here it gets more complicated (sorry, AVFoundation is not easy!)
Create an AVMutableVideoCompositionLayerInstruction from the video composition track for each video. On each layer instruction you can set the transform. You can also do things like set a crop rectangle.
Add each layer instruction to an array. When all of them are created, set the array as the layerInstructions of an AVMutableVideoCompositionInstruction, and set that instruction on the AVMutableVideoComposition.
And finally, you will have an AVPlayerItem that you use to play this back (on an AVPlayer). You create the AVPlayerItem from the AVMutableComposition, and then you set the AVMutableVideoComposition on the AVPlayerItem itself (its videoComposition property).
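The steps above can be sketched in Swift roughly as follows. This is a minimal sketch under stated assumptions, not tested against real footage: buildComposition, assets and renderSize are illustrative names, the clips are placed one after another, and each layer simply gets its source track's preferredTransform.

```swift
import AVFoundation

// Hypothetical helper: builds a composition plus a matching video composition
// from a list of assets, giving every clip its own layer instruction.
func buildComposition(assets: [AVAsset],
                      renderSize: CGSize) throws -> (AVMutableComposition, AVMutableVideoComposition) {
    let composition = AVMutableComposition()
    let videoComposition = AVMutableVideoComposition()
    videoComposition.renderSize = renderSize
    videoComposition.frameDuration = CMTime(value: 1, timescale: 30) // 30 fps

    var layerInstructions: [AVMutableVideoCompositionLayerInstruction] = []
    var cursor = CMTime.zero // where the next clip starts in the composition

    for asset in assets {
        guard let sourceVideo = asset.tracks(withMediaType: .video).first,
              let videoTrack = composition.addMutableTrack(withMediaType: .video,
                                                           preferredTrackID: kCMPersistentTrackID_Invalid)
        else { continue }

        let range = CMTimeRange(start: .zero, duration: asset.duration)
        try videoTrack.insertTimeRange(range, of: sourceVideo, at: cursor)

        // Audio is optional: not every asset has an audio track.
        if let sourceAudio = asset.tracks(withMediaType: .audio).first,
           let audioTrack = composition.addMutableTrack(withMediaType: .audio,
                                                        preferredTrackID: kCMPersistentTrackID_Invalid) {
            try audioTrack.insertTimeRange(range, of: sourceAudio, at: cursor)
        }

        // One layer instruction per video track; the transform is set per layer.
        let layer = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)
        layer.setTransform(sourceVideo.preferredTransform, at: .zero)
        cursor = CMTimeAdd(cursor, asset.duration)
        layer.setOpacity(0.0, at: cursor) // hide the layer once its clip has ended
        layerInstructions.append(layer)
    }

    // The layer instructions go on an instruction, which goes on the video composition.
    let instruction = AVMutableVideoCompositionInstruction()
    instruction.timeRange = CMTimeRange(start: .zero, duration: composition.duration)
    instruction.layerInstructions = layerInstructions
    videoComposition.instructions = [instruction]

    return (composition, videoComposition)
}

// Usage (clips is a placeholder for your own [AVAsset]):
// let (composition, videoComposition) = try buildComposition(assets: clips,
//                                                            renderSize: CGSize(width: 1280, height: 720))
// let item = AVPlayerItem(asset: composition)
// item.videoComposition = videoComposition
// let player = AVPlayer(playerItem: item)
```

Returning the two objects instead of a ready-made AVPlayerItem keeps the same pair usable for export with AVAssetExportSession as well.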
Easy eh?
It took me some weeks to get this stuff working well. It's totally unforgiving and, as you have mentioned, if you do something wrong, it doesn't tell you what you did wrong. The video just doesn't appear.
But when you crack it, it totally works quickly and well.
Finally, all the stuff I have outlined is available in the AVFoundation docs. It's a lengthy tome, but you need to know it to achieve what you are trying to do.
Best of luck!