How to get frames from a local video file in Swift?

I need to get the frames from a local video file so I can process them before the video is played. I already tried using AVAssetReader and VideoOutput.
[EDIT] Here is the code I used, from Accessing Individual Frames using AV Player:
import AVFoundation

let asset = AVAsset(URL: inputUrl)
let reader = try! AVAssetReader(asset: asset)
let videoTrack = asset.tracksWithMediaType(AVMediaTypeVideo)[0]

// read video frames as BGRA
let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack,
    outputSettings: [String(kCVPixelBufferPixelFormatTypeKey): NSNumber(unsignedInt: kCVPixelFormatType_32BGRA)])

reader.addOutput(trackReaderOutput)
reader.startReading()

while let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
    print("sample at time \(CMSampleBufferGetPresentationTimeStamp(sampleBuffer))")
    if let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
        // process each CVPixelBufferRef here
        // see CVPixelBufferGetWidth, CVPixelBufferLockBaseAddress, CVPixelBufferGetBaseAddress, etc.
    }
}

I believe AVAssetReader should work. What did you try? Have you seen this sample code from Apple? https://developer.apple.com/library/content/samplecode/ReaderWriter/Introduction/Intro.html

I found out what the problem was! It was with my implementation. The code I posted is correct. Thank you all.

You can have a look at VideoToolbox : https://developer.apple.com/documentation/videotoolbox
But beware: this is close to the hardware decompressor and sparsely documented terrain.

Depending on what processing you want to do, OpenCV may be an option, in particular if you are detecting or tracking objects in your frames. If your needs are simpler, the effort of using OpenCV with Swift may be a little too much; see below.
You can open a video, read it frame by frame, do your work on each frame and then display it, bearing in mind the need to be efficient so you don't delay the display.
The basic code structure is quite simple. This is a Python example, but the same principles apply across the supported languages:
import numpy as np
import cv2

cap = cv2.VideoCapture('vtest.avi')

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Do whatever work you want on the frame here - in this example
    # from the tutorial the image is being converted from one colour
    # space to another
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # This displays the resulting frame
    cv2.imshow('frame', gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
More info here: http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html
The one caveat is that using OpenCV with Swift requires some additional effort. This is a good example, but the approach evolves constantly, so it is worth searching around if you decide to go this way: https://medium.com/@yiweini/opencv-with-swift-step-by-step-c3cc1d1ee5f1

Related

How can I parse/read, not play, an audio file in Swift based on its formatting?

My app already finds silence in audio as it is being recorded on an iOS device. I compare background and speaking volumes from samples, and so far it has been pretty accurate. This way I save the audio into multiple smaller files, without cutting words.
Now I would like to do the same with prerecorded audio files (m4a).
Open an m4a audio file on my iOS device
Read 0.5 seconds of the audio file based on its format, channels, etc.
Use my detect silence function to determine whether to read more or less data
While I have no problem opening the files and getting their format (below), I am unsure how to calculate how much data to read in to get a sample. If I detect only silence, I read more data in; if I detect a spoken word, I read less data so as not to clip the word.
I have tried googling for information on how to interpret the format of the file (sampling rate, channels, etc.) to figure out how much to read, but have not been able to find any.
// left out error checking in this sample
guard let input = try? AVAudioFile(forReading: url) else {
    return nil
}
guard let buffer = AVAudioPCMBuffer(pcmFormat: input.processingFormat,
                                    frameCapacity: AVAudioFrameCount(input.length)) else {
    return nil
}
do {
    // reads the entire audio file
    try input.read(into: buffer)
} catch {
    return nil
}
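One way to size the reads: the number of frames in a given duration is just the sample rate multiplied by the duration in seconds, so 0.5 seconds corresponds to processingFormat.sampleRate * 0.5 frames. A minimal sketch of chunked reading along those lines (the readChunk helper name is illustrative, not from the original post):

import AVFoundation

// Hypothetical helper: read the next `seconds` of audio from an already-open AVAudioFile.
// AVAudioFile keeps its own read position (framePosition), so repeated calls walk
// forward through the file until it is exhausted.
func readChunk(from file: AVAudioFile, seconds: Double) -> AVAudioPCMBuffer? {
    let format = file.processingFormat
    let framesToRead = AVAudioFrameCount(format.sampleRate * seconds)
    guard framesToRead > 0,
          let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: framesToRead) else {
        return nil
    }
    do {
        try file.read(into: buffer, frameCount: framesToRead)
    } catch {
        return nil
    }
    return buffer.frameLength > 0 ? buffer : nil
}

Call readChunk(from: input, seconds: 0.5) in a loop, feed each buffer to the silence detector, and adjust the duration up or down based on the result.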

Increase volume of audio file recorded with Swift

I am developing an application with Swift. I would like to be able to increase the volume of a recorded file. Is there a way to do it directly inside the application?
I found AudioKit here and this question, but they didn't help me much.
Thanks!
With AudioKit
Option A:
Do you just want to import a file, then play it louder than you imported it? You can use an AKBooster for that.
import AudioKit

do {
    let file = try AKAudioFile(readFileName: "yourfile.wav")
    let player = try AKAudioPlayer(file: file)

    // Define your gain below. >1 means amplifying it to be louder
    let booster = AKBooster(player, gain: 1.3)

    AudioKit.output = booster
    try AudioKit.start()

    // And then to play your file:
    player.play()
} catch {
    // Log your error
}
Just set the gain value of booster to make it louder.
Option B: You could also try normalizing the audio file, which essentially applies a constant multiplier across the recording (relative to the highest signal level in the recording) so that it reaches a new target maximum that you define. Here, I set it to -4 dB.
if let url = Bundle.main.url(forResource: "sound", withExtension: "wav"),
   let file = try? AKAudioFile(forReading: url) {
    // Set the new max level (in dB) for the gain here.
    if let normalizedFile = try? file.normalized(newMaxLevel: -4) {
        print(normalizedFile.maxLevel)
        // Play your normalizedFile...
    }
}
This method raises the amplitude of everything to the new maximum level in dB, so it won't affect the dynamics (SNR) of your file, and it only increases by the amount needed to reach that new maximum (so you can safely apply it to ALL of your files to make them uniform).
With AVAudioPlayer
Option A: If you want to adjust/control the volume, AVAudioPlayer has a volume property, but the docs say:
The playback volume for the audio player, ranging from 0.0 through 1.0 on a linear scale.
Here 1.0 is the volume of the original file and also the default, so you can only make playback quieter with this. Here's the code for it, in case you're interested:
let soundFileURL = Bundle.main.url(forResource: "sound", withExtension: "mp3")!
let audioPlayer = try? AVAudioPlayer(contentsOf: soundFileURL, fileTypeHint: AVFileType.mp3.rawValue)

// Only play once
audioPlayer?.numberOfLoops = 0

// Set the volume of playback here.
audioPlayer?.volume = 1.0

audioPlayer?.play()
Option B: If your sound file is too quiet, it might be coming out of the phone's receiver (the earpiece). In that case, you could try overriding the output port to use the speaker instead:
do {
    try AVAudioSession.sharedInstance().overrideOutputAudioPort(AVAudioSession.PortOverride.speaker)
} catch let error {
    print("Override failed: \(error)")
}
You can also set that permanently with this code (but I can't guarantee your app will get into the App Store):
try? audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, with: AVAudioSessionCategoryOptions.defaultToSpeaker)
Option C: If Option B doesn't do it for you, you might be out of luck on making AVAudioPlayer play louder. You're best off editing the source file yourself with some external software; I can recommend Audacity as a good option for this.
Option D: One last option I've only heard of: you could also look into MPVolumeView, which has UI to control the system output and volume. I'm not too familiar with it, though, and it may be approaching legacy at this point.
I want to mention a few things here, because I was working on a similar problem.
Contrary to what's written in the Apple docs for the AVAudioPlayer.volume property (https://developer.apple.com/documentation/avfoundation/avaudioplayer/1389330-volume), the volume can go higher than 1.0, and this actually works. I bumped the volume up to 100.0 in my application and the recorded audio is way louder and easier to hear.
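For example, a minimal snippet (assuming audioPlayer is your existing AVAudioPlayer instance; values above 1.0 amplify playback at the risk of clipping):

// Assumption: audioPlayer is an existing AVAudioPlayer instance.
audioPlayer.volume = 100.0
audioPlayer.play()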
Another thing that helped me was setting the mode of AVAudioSession like so:
do {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, options: [.defaultToSpeaker, .allowBluetooth])
    try session.setMode(.videoRecording)
    try session.setActive(true)
} catch {
    debugPrint("Problem with AVAudioSession")
}
session.setMode(.videoRecording) is the key line here. It routes the audio through the phone's louder speakers rather than only the call speaker next to the front camera. I was having a problem with this and posted a question that helped me here:
AVAudioPlayer NOT playing through speaker after recording with AVAudioRecorder
There are several standard AudioKit DSP components that can increase the volume.
For example, you can use a simple method like AKBooster: http://audiokit.io/docs/Classes/AKBooster.html
OR
Use the following code:
AKSettings.defaultToSpeaker = true
See more details in these posts:
https://github.com/audiokit/AudioKit/issues/599
https://github.com/AudioKit/AudioKit/issues/586

SWIFT - Is it possible to save audio from AVAudioEngine, or from AudioPlayerNode? If yes, how?

I've been looking around the Swift documentation for a way to save the audio output of an AVAudioEngine, but I couldn't find any useful tips.
Any suggestions?
Solution
I found a way around it thanks to matt's answer.
Here is some sample code for saving audio after passing it through an AVAudioEngine (I think technically it's before):
newAudio = AVAudioFile(forWriting: newAudio.url, settings: nil, error: NSErrorPointer())
// Your new file, on which you want to save some changed audio, prepared to be buffered with some new data...

var audioPlayerNode = AVAudioPlayerNode() // or your time pitch unit if the pitch was changed

// Now install a tap on the output bus to "record" the transformed file into our newAudio file.
audioPlayerNode.installTapOnBus(0, bufferSize: AVAudioFrameCount(audioPlayer.duration), format: opffb) {
    (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) in

    if self.newAudio.length < self.audioFile.length { // lets us know when to stop saving the file, otherwise it saves infinitely
        self.newAudio.writeFromBuffer(buffer, error: NSErrorPointer()) // write the buffer result into our file
    } else {
        audioPlayerNode.removeTapOnBus(0) // if we don't remove it, it will keep tapping infinitely
        println("Did you like it? Please, vote up for my question")
    }
}
Hope this helps!
One issue to solve:
Sometimes your output is shorter than the input: if you accelerate the time rate by 2, your audio will be 2 times shorter. This is the issue I'm facing for now, since my condition for saving the file is:
if self.newAudio.length < self.audioFile.length // audioFile being the original (long) audio and newAudio being the new, changed (shorter) audio
Any help here?
Yes, it's quite easy. You simply put a tap on a node and save the buffer into a file.
Unfortunately this means you have to play through the node. I was hoping that AVAudioEngine would let me process one sound file into another directly, but apparently that's impossible: you have to play and process in real time.
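For reference, a minimal sketch of that idea with current API names (engine and outputURL are assumed to come from your own setup):

import AVFoundation

// Minimal sketch: tap the main mixer and append every tapped buffer to a file.
// Assumes `engine` is your configured AVAudioEngine and `outputURL` is the destination.
func startRecordingOutput(of engine: AVAudioEngine, to outputURL: URL) throws {
    let format = engine.mainMixerNode.outputFormat(forBus: 0)
    let outputFile = try AVAudioFile(forWriting: outputURL, settings: format.settings)

    engine.mainMixerNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        do {
            try outputFile.write(from: buffer) // append each buffer to the file
        } catch {
            print("Write failed: \(error)")
        }
    }
}

Play through the engine as usual; when playback finishes, stop recording with engine.mainMixerNode.removeTap(onBus: 0).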
Offline rendering worked for me, using the GenericOutput AudioUnit. Please check this link; I have mixed two or three audio files offline and combined them into a single file. It's not the same scenario, but it may give you some ideas: core audio offline rendering GenericOutput
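As a side note (not the GenericOutput approach above), newer AVAudioEngine versions also offer an offline manual rendering mode that produces a file faster than real time. A hedged sketch, assuming engine and playerNode are already connected and the source file is scheduled on the player node before this is called:

import AVFoundation

// Hedged sketch: AVAudioEngine manual rendering in offline mode (iOS 11+/macOS 10.13+).
// `engine` must not be running yet; `playerNode` should already have `sourceFile` scheduled.
func renderOffline(engine: AVAudioEngine,
                   playerNode: AVAudioPlayerNode,
                   sourceFile: AVAudioFile,
                   to outputURL: URL) throws {
    let format = sourceFile.processingFormat
    try engine.enableManualRenderingMode(.offline, format: format, maximumFrameCount: 4096)
    try engine.start()
    playerNode.play()

    let outputFile = try AVAudioFile(forWriting: outputURL, settings: format.settings)
    let buffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                  frameCapacity: engine.manualRenderingMaximumFrameCount)!

    // Render until we have produced as many frames as the source file contains.
    while engine.manualRenderingSampleTime < sourceFile.length {
        let framesLeft = AVAudioFrameCount(sourceFile.length - engine.manualRenderingSampleTime)
        let framesToRender = min(framesLeft, buffer.frameCapacity)
        if try engine.renderOffline(framesToRender, to: buffer) == .success {
            try outputFile.write(from: buffer)
        } else {
            break
        }
    }
    engine.stop()
}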

Using Apple's new AudioEngine to change Pitch of AudioPlayer sound

I am currently trying to get Apple's new audio engine working with my current audio setup. Specifically, I am trying to change the pitch with Audio Engine, which apparently is possible according to this post.
I have also looked into other pitch changing solutions including Dirac and ObjectAL, but unfortunately both seem to be pretty messed up in terms of working with Swift, which I am using.
My question is: how do I change the pitch of an audio file using Apple's new audio engine? I am able to play sounds using AVAudioPlayer, but I don't understand how the file is referenced in the audio engine. In the code on the linked page there is a 'format' that refers to the audio file, but I don't understand how to create a format, or what it does.
I am playing sounds with this simple code:
let path = NSBundle.mainBundle().pathForResource(String(randomNumber), ofType:"m4r")
let fileURL = NSURL(fileURLWithPath: path!)
player = AVAudioPlayer(contentsOfURL: fileURL, error: nil)
player.prepareToPlay()
player.play()
You use an AVAudioPlayerNode, not an AVAudioPlayer.
engine = AVAudioEngine()
playerNode = AVAudioPlayerNode()
engine.attachNode(playerNode)
Then you can attach an AVAudioUnitTimePitch.
var mixer = engine.mainMixerNode;
auTimePitch = AVAudioUnitTimePitch()
auTimePitch.pitch = 1200 // In cents. The default value is 1.0. The range of values is -2400 to 2400
auTimePitch.rate = 2 //The default value is 1.0. The range of supported values is 1/32 to 32.0.
engine.attachNode(auTimePitch)
engine.connect(playerNode, to: auTimePitch, format: mixer.outputFormatForBus(0))
engine.connect(auTimePitch, to: mixer, format: mixer.outputFormatForBus(0))
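To actually get sound out, the file still has to be scheduled on the player node and the engine started. A hedged sketch of those remaining steps, using modern API names and assuming fileURL points at your sound file as in the question's snippet:

// Hedged sketch of the remaining steps; `fileURL` is assumed to be the sound file's URL.
do {
    let audioFile = try AVAudioFile(forReading: fileURL)
    playerNode.scheduleFile(audioFile, at: nil, completionHandler: nil)
    try engine.start()
    playerNode.play()
} catch {
    print("Audio engine setup failed: \(error)")
}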

AVAudioPlayer - Metering - Want to build a waveform (graph)

I need to build a visual graph that represents voice levels (dB) in a recorded file. I tried to do it this way:
NSError *error = nil;
AVAudioPlayer *meterPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:[NSURL fileURLWithPath:self.recording.fileName] error:&error];

if (error) {
    _lcl_logger(lcl_cEditRecording, lcl_vError, @"Cannot initialize AVAudioPlayer with file %@ due to: %@ (%@)", self.recording.fileName, error, error.userInfo);
} else {
    [meterPlayer prepareToPlay];
    meterPlayer.meteringEnabled = YES;

    for (NSTimeInterval i = 0; i <= meterPlayer.duration; ++i) {
        meterPlayer.currentTime = i;
        [meterPlayer updateMeters];
        float averagePower = [meterPlayer averagePowerForChannel:0];
        _lcl_logger(lcl_cEditRecording, lcl_vTrace, @"Second: %f, Level: %f dB", i, averagePower);
    }
}
[meterPlayer release];
It would be cool if this worked out; however, it didn't. I always get -160 dB. Any other ideas on how to implement this?
UPD: Here is what I got in the end:
(waveform image: http://img22.imageshack.us/img22/5778/waveform.png)
I just want to help others who have run into this same question and spent a lot of time searching. To save you time, I'm posting my answer; I dislike that some people here treat this as some kind of secret...
After searching through articles about Extended Audio File Services, Audio Queues, and AVFoundation, I realised that I should use AVFoundation. The reason is simple: it is the most recent framework and it is Objective-C rather than C++ in style.
So the steps are not complicated:
Create an AVAsset from the audio file
Create an AVAssetReader from the AVAsset
Get the audio AVAssetTrack from the AVAsset
Create an AVAssetReaderTrackOutput from the AVAssetTrack
Add the AVAssetReaderTrackOutput to the AVAssetReader and start reading out the audio data
From the AVAssetReaderTrackOutput you can copyNextSampleBuffer one by one (it is a loop to read all the data out).
Each copyNextSampleBuffer gives you a CMSampleBufferRef, which can be used to get an AudioBufferList via CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer. An AudioBufferList is an array of AudioBuffers, and an AudioBuffer is a chunk of audio data stored in its mData member.
You could implement the above with Extended Audio File Services as well, but I think the AVFoundation approach is easier.
So, next question: what do you do with mData? Note that when you create the AVAssetReaderTrackOutput you can specify its output format, so we specify LPCM as the output.
Then the mData you finally get is actually an array of float amplitude values.
Easy, right? Though it took me a lot of time to piece this together from bits here and there.
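A hedged Swift sketch of the steps above (the function name, URL parameter, and the exact LPCM settings are illustrative, not taken from the original post):

import AVFoundation

// Hedged sketch of the AVAssetReader approach described above.
// Reads an audio file as 32-bit float LPCM and walks the resulting sample buffers.
func readAmplitudes(from audioURL: URL) throws {
    let asset = AVAsset(url: audioURL)                       // 1. asset from the audio file
    let reader = try AVAssetReader(asset: asset)             // 2. reader from the asset
    let track = asset.tracks(withMediaType: .audio)[0]       // 3. the audio track
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: [
        AVFormatIDKey: kAudioFormatLinearPCM,                // 4. ask for LPCM output
        AVLinearPCMIsFloatKey: true,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsNonInterleaved: false
    ])
    reader.add(output)                                       // 5. attach the output
    reader.startReading()

    while let sampleBuffer = output.copyNextSampleBuffer() { // 6. loop over sample buffers
        var blockBuffer: CMBlockBuffer?
        var bufferList = AudioBufferList()
        // 7. pull the AudioBufferList out of the sample buffer
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
            sampleBuffer,
            bufferListSizeNeededOut: nil,
            bufferListOut: &bufferList,
            bufferListSize: MemoryLayout<AudioBufferList>.size,
            blockBufferAllocator: nil,
            blockBufferMemoryAllocator: nil,
            flags: 0,
            blockBufferOut: &blockBuffer)
        // bufferList.mBuffers.mData now points at interleaved Float32 amplitude values
    }
}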
Two useful resources to share:
Read this article to learn the basic terms and concepts: https://www.mikeash.com/pyblog/friday-qa-2012-10-12-obtaining-and-interpreting-audio-data.html
Sample code: https://github.com/iluvcapra/JHWaveform
You can copy most of the code mentioned above directly from this sample and use it for your own purposes.
I haven't used it myself, but Apple's avTouch iPhone sample has bar graphs powered by AVAudioPlayer, and you can easily check to see how they do it.
I don't think you can use AVAudioPlayer given your constraints. Even if you could get it to "start" without actually playing the sound file, it would only help you build a graph as fast as the audio file streams. What you're talking about is doing static analysis of the sound, which requires a much different approach: you'll need to read the file in yourself and parse it manually. I don't think there's a quick solution using anything in the SDK.
OK guys, it seems I'm going to answer my own question again: http://www.supermegaultragroovy.com/blog/2009/10/06/drawing-waveforms/ Not a lot of specifics, but at least you will know which Apple docs to read.