Export a video with dynamic text per frame in Swift AVFoundation

I fetch the timestamps from every frame and store them in an array using the showTimestamps function. I now want to "draw" each timestamp on each frame of the video, and export it.
func showTimestamps(videoFile: URL) -> [String] {
    let asset = AVAsset(url: videoFile)
    let track = asset.tracks(withMediaType: AVMediaType.video)[0]
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: nil)
    guard let reader = try? AVAssetReader(asset: asset) else { exit(1) }

    output.alwaysCopiesSampleData = false
    reader.add(output)
    reader.startReading()

    var times: [String] = []
    while reader.status == .reading {
        if let sampleBuffer = output.copyNextSampleBuffer(),
           CMSampleBufferIsValid(sampleBuffer) && CMSampleBufferGetTotalSampleSize(sampleBuffer) != 0 {
            let frameTime = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer)
            if frameTime.isValid {
                times.append(String(format: "%.3f", frameTime.seconds))
            }
        }
    }
    return times.sorted()
}
However, I cannot figure out how to export a new video with each frame containing its respective timestamp. i.e., how can I implement this code:
func generateNewVideoWithTimestamps(videoFile: URL, timestampsForFrames: [String]) {
// TODO
}
I want to keep the framerate, video quality, etc., the same. The only thing that should differ is to add some text on the bottom.
To get this far, I used these guides and failed: Frames, Static Text, Watermark
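The general shape I have pieced together from those guides is an AVAssetReader feeding an AVAssetWriter, drawing the text into each frame's pixel buffer with Core Text along the way. To make the question concrete, here is a rough, untested sketch of that idea; it assumes BGRA frames and a zero start time, hard-codes the codec and font, busy-waits on the writer, and drops the audio track entirely:
import AVFoundation
import CoreGraphics
import CoreText

func generateNewVideoWithTimestamps(videoFile: URL, outputFile: URL, timestampsForFrames: [String]) throws {
    let asset = AVAsset(url: videoFile)
    let track = asset.tracks(withMediaType: .video)[0]

    // Read decoded BGRA frames so they can be drawn into with Core Graphics.
    let reader = try AVAssetReader(asset: asset)
    let readerOutput = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA])
    reader.add(readerOutput)

    // Re-encode at the source dimensions; the codec settings here are guesses.
    let writer = try AVAssetWriter(outputURL: outputFile, fileType: .mp4)
    let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: Int(track.naturalSize.width),
        AVVideoHeightKey: Int(track.naturalSize.height)])
    writerInput.expectsMediaDataInRealTime = false
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput,
                                                       sourcePixelBufferAttributes: nil)
    writer.add(writerInput)

    reader.startReading()
    writer.startWriting()
    writer.startSession(atSourceTime: .zero) // assumes the track starts at zero

    var frameIndex = 0
    while reader.status == .reading {
        guard let sample = readerOutput.copyNextSampleBuffer(),
              let pixelBuffer = CMSampleBufferGetImageBuffer(sample) else { continue }

        // Crude back-pressure handling; real code would use requestMediaDataWhenReady.
        while !writerInput.isReadyForMoreMediaData {
            Thread.sleep(forTimeInterval: 0.01)
        }

        // Draw the timestamp near the bottom-left corner of the frame.
        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        if let context = CGContext(
            data: CVPixelBufferGetBaseAddress(pixelBuffer),
            width: CVPixelBufferGetWidth(pixelBuffer),
            height: CVPixelBufferGetHeight(pixelBuffer),
            bitsPerComponent: 8,
            bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
            space: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue) {
            let text = timestampsForFrames[min(frameIndex, timestampsForFrames.count - 1)]
            let attributes: [NSAttributedString.Key: Any] = [
                NSAttributedString.Key(kCTFontAttributeName as String): CTFontCreateWithName("Helvetica" as CFString, 36, nil),
                NSAttributedString.Key(kCTForegroundColorAttributeName as String): CGColor(red: 1, green: 1, blue: 1, alpha: 1)
            ]
            let line = CTLineCreateWithAttributedString(NSAttributedString(string: text, attributes: attributes))
            context.textPosition = CGPoint(x: 20, y: 20)
            CTLineDraw(line, context)
        }
        CVPixelBufferUnlockBaseAddress(pixelBuffer, [])

        // Keep the original presentation timestamps so the frame rate is unchanged.
        adaptor.append(pixelBuffer,
                       withPresentationTime: CMSampleBufferGetOutputPresentationTimeStamp(sample))
        frameIndex += 1
    }

    writerInput.markAsFinished()
    writer.finishWriting { /* completion is asynchronous */ }
}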

Related

How to route audio to default speakers in Swift for macOS?

I have a function playing audio for a macOS SwiftUI app, but I want it to play the sound through the default built-in speakers every single time. Does anyone know of a reliable method for this?
I've researched a lot but haven't found a solid method for macOS. This is what I've tried:
AVRoutePickerView
This was only available for iOS and Mac Catalyst, but not macOS.
Getting the device ID in AVAudioEngine
I found this code snippet, but it assumes that the built-in speaker device ID stays the same, which it doesn't, so that doesn't help.
engine = AVAudioEngine()
let output = engine.outputNode

// get the low level output audio unit from the engine:
let outputUnit = output.audioUnit!

// use a Core Audio low level call to set the output device:
var outputDeviceID: AudioDeviceID = 51 // replace with actual, dynamic value
AudioUnitSetProperty(outputUnit,
                     kAudioOutputUnitProperty_CurrentDevice,
                     kAudioUnitScope_Global,
                     0,
                     &outputDeviceID,
                     UInt32(MemoryLayout<AudioDeviceID>.size))
Disabling Bluetooth so the audio only goes through the main speakers and not a Bluetooth speaker. This didn't seem like the best approach, so I haven't tested it.
The following is the code I have for playing sound:
func playTheSound() {
    let url = Bundle.main.url(forResource: "Blow", withExtension: "mp3")
    player = try! AVAudioPlayer(contentsOf: url!)
    player?.play()
    print("Sound was played")
}
So, any recommendations on how to route the audio to the main speakers on macOS?
By "default built-in" I assume you actually just mean "built-in." The default speakers are the ones the audio will route to already.
The simplest solution to this that will probably always work is to route to the UID "BuiltInSpeakerDevice". For example, this does what you want:
let player = AVPlayer()

func playTheSound() {
    let url = URL(filePath: "/System/Library/Sounds/Blow.aiff")
    let item = AVPlayerItem(url: url)
    player.replaceCurrentItem(with: item)
    player.audioOutputDeviceUniqueID = "BuiltInSpeakerDevice"
    player.play()
}
Note the use of AVPlayer and audioOutputDeviceUniqueID here. I'm betting this will work in approximately 100% of cases. It should even "work" if there were no built-in speakers, in that this silently fails (without crashing) if the UID doesn't exist.
But...sigh...I can't find anywhere that this is documented or any system constant for this string. And I really hate magic, undocumented strings. So, let's do it right. Besides, if we do it right, it'll work with AVAudioEngine, too. So let's get there.
First, you should always take a look at the invaluable CoreAudio output device useful methods in Swift 4. I don't know if anyone has turned this into a real framework, but this is a treasure trove of examples. The following code is a modernized version of that.
struct AudioDevice {
    let id: AudioDeviceID

    static func getAll() -> [AudioDevice] {
        var propertyAddress = AudioObjectPropertyAddress(
            mSelector: kAudioHardwarePropertyDevices,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)

        // Get size of buffer for list
        var devicesBufferSize: UInt32 = 0
        AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject), &propertyAddress,
                                       0, nil,
                                       &devicesBufferSize)
        let devicesCount = Int(devicesBufferSize) / MemoryLayout<AudioDeviceID>.stride

        // Get list
        let devices = Array<AudioDeviceID>(unsafeUninitializedCapacity: devicesCount) { buffer, initializedCount in
            AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &propertyAddress,
                                       0, nil,
                                       &devicesBufferSize, buffer.baseAddress!)
            initializedCount = devicesCount
        }

        return devices.map(Self.init)
    }

    var hasOutputStreams: Bool {
        var propertySize: UInt32 = 256
        var propertyAddress = AudioObjectPropertyAddress(
            mSelector: kAudioDevicePropertyStreams,
            mScope: kAudioDevicePropertyScopeOutput,
            mElement: kAudioObjectPropertyElementMain)
        AudioObjectGetPropertyDataSize(id, &propertyAddress, 0, nil, &propertySize)
        return propertySize > 0
    }

    var isBuiltIn: Bool {
        transportType == kAudioDeviceTransportTypeBuiltIn
    }

    var transportType: AudioDevicePropertyID {
        var deviceTransportType = AudioDevicePropertyID()
        var propertySize = UInt32(MemoryLayout<AudioDevicePropertyID>.size)
        var propertyAddress = AudioObjectPropertyAddress(
            mSelector: kAudioDevicePropertyTransportType,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)
        AudioObjectGetPropertyData(id, &propertyAddress,
                                   0, nil, &propertySize,
                                   &deviceTransportType)
        return deviceTransportType
    }

    var uid: String {
        var propertySize = UInt32(MemoryLayout<CFString>.size)
        var propertyAddress = AudioObjectPropertyAddress(
            mSelector: kAudioDevicePropertyDeviceUID,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)
        var result: CFString = "" as CFString
        AudioObjectGetPropertyData(id, &propertyAddress, 0, nil, &propertySize, &result)
        return result as String
    }
}
And with that in place, you can fetch the first built-in output device:
player.audioOutputDeviceUniqueID = AudioDevice.getAll()
    .first(where: { $0.hasOutputStreams && $0.isBuiltIn })?
    .uid
Or you can use your AVAudioEngine approach if you want more control (note the difference between uid and id here):
let player = AVAudioPlayerNode()
let engine = AVAudioEngine()

func playTheSound() {
    let output = engine.outputNode
    let outputUnit = output.audioUnit!
    var outputDeviceID = AudioDevice.getAll()
        .first(where: { $0.hasOutputStreams && $0.isBuiltIn })!
        .id
    AudioUnitSetProperty(outputUnit,
                         kAudioOutputUnitProperty_CurrentDevice,
                         kAudioUnitScope_Global,
                         0,
                         &outputDeviceID,
                         UInt32(MemoryLayout<AudioDeviceID>.size))

    engine.attach(player)
    engine.connect(player, to: engine.outputNode, format: nil)
    try! engine.start()

    let url = URL(filePath: "/System/Library/Sounds/Blow.aiff")
    let file = try! AVAudioFile(forReading: url)
    player.scheduleFile(file, at: nil)
    player.play()
}

How can I create a spectrogram from an audio file?

I have tried to create a spectrogram using this Apple tutorial, but it uses live audio input from the microphone. I want to create one from an existing file. I have tried to convert Apple's example from live input to existing files with no luck, so I am wondering if there are any better resources out there.
Here is how I am getting the samples:
let samples: (naturalTimeScale: Int32, data: [Float]) = {
    guard let samples = AudioUtilities.getAudioSamples(
        forResource: resource,
        withExtension: wExtension) else {
            fatalError("Unable to parse the audio resource.")
    }
    return samples
}()

// Returns an array of single-precision values for the specified audio resource.
static func getAudioSamples(forResource: String,
                            withExtension: String) -> (naturalTimeScale: CMTimeScale,
                                                       data: [Float])? {
    guard let path = Bundle.main.url(forResource: forResource,
                                     withExtension: withExtension) else {
        return nil
    }

    let asset = AVAsset(url: path.absoluteURL)

    guard
        let reader = try? AVAssetReader(asset: asset),
        let track = asset.tracks.first else {
            return nil
    }

    let outputSettings: [String: Int] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVNumberOfChannelsKey: 1,
        AVLinearPCMIsBigEndianKey: 0,
        AVLinearPCMIsFloatKey: 1,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsNonInterleaved: 1
    ]

    let output = AVAssetReaderTrackOutput(track: track,
                                          outputSettings: outputSettings)

    reader.add(output)
    reader.startReading()

    var samplesData = [Float]()

    while reader.status == .reading {
        if
            let sampleBuffer = output.copyNextSampleBuffer(),
            let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
                let bufferLength = CMBlockBufferGetDataLength(dataBuffer)

                var data = [Float](repeating: 0,
                                   count: bufferLength / 4)

                CMBlockBufferCopyDataBytes(dataBuffer,
                                           atOffset: 0,
                                           dataLength: bufferLength,
                                           destination: &data)

                samplesData.append(contentsOf: data)
        }
    }

    return (naturalTimeScale: track.naturalTimeScale, data: samplesData)
}
And here is how I am performing the "FFT", or DCT in this case:
static var sampleCount = 1024

let forwardDCT = vDSP.DCT(count: sampleCount,
                          transformType: .II)

guard let freqs = forwardDCT?.transform(samples.data) else { return }
This is the part where I begin to get lost/stuck in the Apple tutorial. How can I create the spectrogram from here?
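For reference, the direction I think the tutorial is pointing at is something like the sketch below (my own untested adaptation, not Apple's code): slide a sampleCount-wide window over samples.data, DCT each window, convert the magnitudes to decibels, and stack the resulting columns. I'm not sure the windowing and scaling details are right:
import Accelerate

// Sketch only: slice the file into consecutive windows, transform each one,
// and collect one column of decibel values per window.
let hannWindow = vDSP.window(ofType: Float.self,
                             usingSequence: .hanningDenormalized,
                             count: sampleCount,
                             isHalfWindow: false)

var spectrogram: [[Float]] = []
var position = 0
while position + sampleCount <= samples.data.count {
    // Hann-window the slice to reduce spectral leakage at the cut points.
    let slice = vDSP.multiply(Array(samples.data[position ..< position + sampleCount]),
                              hannWindow)
    guard let dctOutput = forwardDCT?.transform(slice) else { break }

    // Magnitude -> decibels, mirroring the scaling used in Apple's sample.
    let magnitudes = dctOutput.map { abs($0) }
    var decibels = [Float](repeating: 0, count: sampleCount)
    vDSP.convert(amplitude: magnitudes,
                 toDecibels: &decibels,
                 zeroReference: Float(sampleCount))

    spectrogram.append(decibels)
    position += sampleCount   // or sampleCount / 2 for 50% overlap
}

// Each element of `spectrogram` is one vertical slice of the image; mapping the
// dB values to colors and drawing the slices left to right gives the spectrogram.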

My videos have a naturalSize of only (4.0, 3.0) pixels, which is also the extracted frame size

Context
I'm dealing with video files that are 1280x920; that's their actual pixel size when displayed in QuickTime, or even when played in my AVPlayer.
I have a bunch of videos in a folder and I need to stick them together in an AVMutableComposition and play it.
I also need, for each video, to extract the last frame.
What I did so far was use AVAssetImageGenerator on each of my individual AVAssets, and it worked, whether I was using generateCGImagesAsynchronously or copyCGImage.
But I thought it would be more efficient to run generateCGImagesAsynchronously on my composition asset, so I have only one call instead of looping over each original track.
Instead of:
v-Get Frame
AVAsset1 |---------|
AVAsset2 |---------|
AVAsset3 |---------|
I want to do:
v----------v----------v- Get Frames
AVMutableComposition: |---------||---------||---------|
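Concretely, here is a sketch of what I have in mind (the function shape and the small back-off from each cut point are just illustrative):
import AVFoundation

// Build one composition from all the clips and ask a single AVAssetImageGenerator
// for every clip's last frame in one asynchronous call.
func lastFrames(of urls: [URL], completion: @escaping (CGImage?, CMTime) -> Void) {
    let composition = AVMutableComposition()
    guard let track = composition.addMutableTrack(withMediaType: .video,
                                                  preferredTrackID: kCMPersistentTrackID_Invalid) else { return }

    var cursor = CMTime.zero
    var lastFrameTimes: [NSValue] = []
    for url in urls {
        let asset = AVAsset(url: url)
        guard let source = asset.tracks(withMediaType: .video).first else { continue }
        try? track.insertTimeRange(CMTimeRange(start: .zero, duration: asset.duration),
                                   of: source, at: cursor)
        cursor = cursor + asset.duration
        // Step back slightly from the cut so we land inside the clip we just added.
        lastFrameTimes.append(NSValue(time: cursor - CMTime(value: 1, timescale: 600)))
    }

    let generator = AVAssetImageGenerator(asset: composition)
    generator.requestedTimeToleranceBefore = .zero
    generator.requestedTimeToleranceAfter = .zero
    generator.generateCGImagesAsynchronously(forTimes: lastFrameTimes) { _, image, actualTime, _, _ in
        completion(image, actualTime)
    }
}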
Problem
Here is the actual issue:
import AVKit
var video1URL = URL(fileReferenceLiteralResourceName: "video_bad.mp4") // One of my video files
let asset1 = AVAsset(url: video1URL)
let track1 = asset1.tracks(withMediaType: .video).first!
_ = track1.naturalSize // {w 4 h 3}
var video2URL = URL(fileReferenceLiteralResourceName: "video_ok.mp4") // Some mp4 I got from the internet
let asset2 = AVAsset(url: video2URL)
let track2 = asset2.tracks(withMediaType: .video).first!
_ = track2.naturalSize // {w 1920 h 1080}
Here is the actual screenshot of the playground (that you can download here):
And here is something else:
Look at the "Current Scale" information in the QuickTime inspector. The video displays just fine, but it's shown as being really magnified (note that no pixel is blurry or anything; it has to do with some metadata).
The video file I'm working with in QuickTime:
The video file from the internet:
Question
What is that metadata and how do I deal with it?
Why is it different on the original track than when it's put in a different composition?
How can I extract a frame from such videos?
So if you stumble across this post, it's probably because you are trying to figure out Tesla's way of writing videos.
There is no easy solution to this issue, which is caused by Tesla software incorrectly setting metadata in .mov video files. I opened an incident with Apple and they were able to confirm this.
So I wrote some code to actually go and fix the video file by rewriting the bytes where it indicates the video track size.
Here we go. It's ugly, but for the sake of completeness I wanted to post a solution here, even if it's not the best.
import Foundation

struct VideoFixer {
    var url: URL
    private var fh: FileHandle?

    static func fix(_ url: URL) {
        var fixer = VideoFixer(url)
        fixer.fix()
    }

    init(_ url: URL) {
        self.url = url
    }

    mutating func fix() {
        guard let fh = try? FileHandle(forUpdating: url) else {
            return
        }
        var atom = Atom(fh)
        atom.seekTo(AtomType.moov)
        atom.enter()
        if atom.atom_type != AtomType.trak {
            atom.seekTo(AtomType.trak)
        }
        atom.enter()
        if atom.atom_type != AtomType.tkhd {
            atom.seekTo(AtomType.tkhd)
        }
        atom.seekTo(AtomType.tkhd)
        let data = atom.data()
        let width = data?.withUnsafeBytes { $0.load(fromByteOffset: 76, as: UInt16.self).bigEndian }
        let height = data?.withUnsafeBytes { $0.load(fromByteOffset: 80, as: UInt16.self).bigEndian }
        if width == 4 && height == 3 {
            guard let offset = try? fh.offset() else {
                return
            }
            try? fh.seek(toOffset: offset + 76)
            // 1280x960
            var newWidth = UInt16(1280).byteSwapped
            var newHeight = UInt16(960).byteSwapped
            let dataWidth = Data(bytes: &newWidth, count: 2)
            let dataHeight = Data(bytes: &newHeight, count: 2)
            fh.write(dataWidth)
            try? fh.seek(toOffset: offset + 80)
            fh.write(dataHeight)
        }
        try? fh.close()
    }
}

typealias AtomType = UInt32

extension UInt32 {
    static var ftyp = UInt32(1718909296)
    static var mdat = UInt32(1835295092)
    static var free = UInt32(1718773093)
    static var moov = UInt32(1836019574)
    static var trak = UInt32(1953653099)
    static var tkhd = UInt32(1953196132)
}

struct Atom {
    var fh: FileHandle

    var atom_size: UInt32 = 0
    var atom_type: UInt32 = 0

    init(_ fh: FileHandle) {
        self.fh = fh
        self.read()
    }

    mutating func seekTo(_ type: AtomType) {
        while self.atom_type != type {
            self.next()
        }
    }

    mutating func next() {
        guard var offset = try? fh.offset() else {
            return
        }
        offset = offset - 8 + UInt64(atom_size)
        if (try? self.fh.seek(toOffset: UInt64(offset))) == nil {
            return
        }
        self.read()
    }

    mutating func read() {
        self.atom_size = fh.nextUInt32().bigEndian
        self.atom_type = fh.nextUInt32().bigEndian
    }

    mutating func enter() {
        self.atom_size = fh.nextUInt32().bigEndian
        self.atom_type = fh.nextUInt32().bigEndian
    }

    func data() -> Data? {
        guard let offset = try? fh.offset() else {
            return nil
        }
        let data = fh.readData(ofLength: Int(self.atom_size))
        try? fh.seek(toOffset: offset)
        return data
    }
}

extension FileHandle {
    func nextUInt32() -> UInt32 {
        let data = self.readData(ofLength: 4)
        let i32array = data.withUnsafeBytes { $0.load(as: UInt32.self) }
        //print(i32array)
        return i32array
    }
}
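Calling it is then a one-liner; note that the file is patched in place, so work on a copy if you want to keep the original (the path below is just a placeholder):
// Point the fixer at one of the affected .mov files; the tkhd dimensions are rewritten in place.
VideoFixer.fix(URL(fileURLWithPath: "/path/to/clip.mov"))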

How to use Spotify iOS SDK attemptToDeliverAudioFrames:ofCount:streamDescription function?

I am trying to perform some magic on Spotify's audio stream based on this. I have subclassed SPTCoreAudioController.
It seems the Spotify pointer, which is passed into the overridden function, points to 16-bit integers. I have tried to create an AVAudioPCMBuffer based on audioFrames and audioDescription and pass it to playerNode. The player node, which is a node in my audio engine, works properly if I use an audio file.
override func attempt(toDeliverAudioFrames audioFrames: UnsafeRawPointer!, ofCount frameCount: Int, streamDescription audioDescription: AudioStreamBasicDescription) -> Int {
    let ptr = audioFrames.bindMemory(to: Int16.self, capacity: frameCount)
    let framePtr = UnsafeBufferPointer(start: ptr, count: frameCount)
    let frames = Array(framePtr)

    var newAudioDescription = audioDescription
    let audioFormat = AVAudioFormat(streamDescription: &newAudioDescription)!
    let audioPCMBuffer = AVAudioPCMBuffer(pcmFormat: audioFormat, frameCapacity: AVAudioFrameCount(frameCount))!
    audioPCMBuffer.frameLength = audioPCMBuffer.frameCapacity

    let channelCount = Int(audioDescription.mChannelsPerFrame)
    if let int16ChannelData = audioPCMBuffer.int16ChannelData {
        for channel in 0..<channelCount {
            for sampleIndex in 0..<frameCount {
                int16ChannelData[channel][sampleIndex] = frames[sampleIndex]
            }
        }
    }

    didReceive(pcmBuffer: audioPCMBuffer)

    return super.attempt(toDeliverAudioFrames: audioFrames, ofCount: frameCount, streamDescription: audioDescription)
}

func didReceive(pcmBuffer: AVAudioPCMBuffer) {
    playerNode.scheduleBuffer(pcmBuffer) {
    }
}
I get this error: AURemoteIO::IOThread (19): EXC_BAD_ACCESS (code=1, address=0x92e370f25cc0), which I think means the data is moved before I can copy it to the PCM buffer.
I was wondering if someone knows the proper way of using the attemptToDeliverAudioFrames:ofCount:streamDescription: function?

Concatenating AVAssets seamlessly

I've got some simple AVFoundation code to concatenate a bunch of four-second-long mp4 files together that looks like this:
func
compose(parts inParts: [Part], progress inProgress: (CMTime) -> ())
    -> AVAsset?
{
    guard
        let composition = self.composition,
        let videoTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid),
        let audioTrack = composition.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid)
    else
    {
        debugLog("Unable to create tracks for composition")
        return nil
    }

    do
    {
        var time = CMTime.zero
        for p in inParts
        {
            let asset = AVURLAsset(url: p.path.url)
            if let track = asset.tracks(withMediaType: .video).first
            {
                try videoTrack.insertTimeRange(CMTimeRange(start: .zero, duration: asset.duration), of: track, at: time)
            }
            if let track = asset.tracks(withMediaType: .audio).first
            {
                try audioTrack.insertTimeRange(CMTimeRange(start: .zero, duration: asset.duration), of: track, at: time)
            }

            time = CMTimeAdd(time, asset.duration)
            inProgress(time)
        }
    }
    catch (let e)
    {
        debugLog("Error adding clips: \(e)")
        return nil
    }

    return composition
}
Unfortunately, every four seconds you can hear the audio cut out for a moment, indicating to me that this isn't an entirely seamless concatenation. Is there anything I can do to improve this?
Solution
Thanks to NoHalfBits’s excellent answer below, I’ve updated the above loop with the following, and it works very well:
for p in inParts
{
    let asset = AVURLAsset(url: p.path.url)

    // It’s possible (and turns out, it’s often the case with UniFi NVR recordings)
    // for the audio and video tracks to be of slightly different start time
    // and duration. Find the intersection of the two tracks’ time ranges and
    // use that range when inserting both tracks into the composition…

    // Calculate the common time range between the video and audio tracks…
    let sourceVideo = asset.tracks(withMediaType: .video).first
    let sourceAudio = asset.tracks(withMediaType: .audio).first
    var commonTimeRange = CMTimeRange.zero
    if sourceVideo != nil && sourceAudio != nil
    {
        commonTimeRange = CMTimeRangeGetIntersection(sourceVideo!.timeRange, otherRange: sourceAudio!.timeRange)
    }
    else if sourceVideo != nil
    {
        commonTimeRange = sourceVideo!.timeRange
    }
    else if sourceAudio != nil
    {
        commonTimeRange = sourceAudio!.timeRange
    }
    else
    {
        // There’s neither a video nor an audio track, bail…
        continue
    }

    debugLog("Asset duration: \(asset.duration.seconds), common time range duration: \(commonTimeRange.duration.seconds)")

    // Insert the video and audio tracks…
    if sourceVideo != nil
    {
        try videoTrack.insertTimeRange(commonTimeRange, of: sourceVideo!, at: time)
    }
    if sourceAudio != nil
    {
        try audioTrack.insertTimeRange(commonTimeRange, of: sourceAudio!, at: time)
    }

    time = time + commonTimeRange.duration
    inProgress(time)
}
In an mp4 container, every track can have its own start time and duration. Especially in recorded material, it is not uncommon to have audio and video tracks with slightly different time ranges (insert some CMTimeRangeShow(track.timeRange) calls near the insertTimeRange to have a look at this).
To overcome this, instead of blindly inserting from CMTime.zero and the duration of the whole asset (the max end time of all tracks):
get the timeRange of the source's audio and video tracks
calculate the common time range from these (CMTimeRangeGetIntersection does this for you)
use the common time range when inserting the segments from the source tracks into the destination tracks
increment your time by the duration of the common time range