I am trying to implement a speech-to-text application. I am able to record audio from the microphone using SFSpeechRecognizer. The use case is that as soon as the user stops speaking, a method should be invoked to stop the recording automatically. Could you help me with this use case?
Please find my code below:
func startRecording() {
    // Clear all previous session data and cancel task
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    // Create instance of audio session to record voice
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSession.Category.record, mode: AVAudioSession.Mode.measurement, options: AVAudioSession.CategoryOptions.defaultToSpeaker)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    recognitionRequest.shouldReportPartialResults = true

    self.recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if let result = result {
            self.lblText.text = result.bestTranscription.formattedString
            print(result.bestTranscription.formattedString)
            print(result.isFinal)
            isFinal = result.isFinal
        }
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.btnStart.isEnabled = true
        }
    })

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }

    self.audioEngine.prepare()
    do {
        try self.audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    self.lblText.text = "Say something, I'm listening!"
}
As far as I understand, you want to stop recognition when the user stops speaking. I suggest using a Timer to track time spent in silence.
Add var detectionTimer: Timer? outside your startRecording(), and inside the resultHandler of recognitionTask insert:
self.detectionTimer?.invalidate()
self.detectionTimer = Timer.scheduledTimer(withTimeInterval: 2, repeats: false, block: { (timer) in
    self.stopRecording()
})
This way, after every recognised word you restart the timer, which will stop recognition if nothing is captured for 2 seconds. stopRecording should look something like this:
func stopRecording() {
    audioEngine.stop()
    recognitionRequest?.endAudio()
    recognitionRequest = nil
    audioEngine.inputNode.removeTap(onBus: 0)
    // Cancel the previous task if it's running
    if let recognitionTask = recognitionTask {
        recognitionTask.cancel()
        self.recognitionTask = nil
    }
}
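One caveat: Timer.scheduledTimer registers the timer on the current thread's run loop, and the recognition task's resultHandler is not guaranteed to run on the main thread. A safe variant of the snippet above (same logic, just dispatched to the main queue) might look like this:
DispatchQueue.main.async {
    // Invalidate the previous silence timer and start a fresh 2-second one.
    self.detectionTimer?.invalidate()
    self.detectionTimer = Timer.scheduledTimer(withTimeInterval: 2, repeats: false) { _ in
        self.stopRecording()
    }
}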
You can use a timer to achieve this. Start the timer as soon as you start the audio engine to recognize speech.
As long as speech keeps being recognized, the timer is restarted continuously.
If there is silence for the fixed number of seconds, the selector method is called and stops the recognition.
Below is the code:
func timerReStart() {
    if timer != nil {
        timer?.invalidate()
        timer = nil
    }
    // Change the interval as per the requirement
    timer = Timer.scheduledTimer(timeInterval: 20, target: self, selector: #selector(self.handleTimerValue), userInfo: nil, repeats: false)
}

@objc func handleTimerValue() {
    cancelRecording()
}

func timerStop() {
    guard timer != nil else { return }
    timer?.invalidate()
    timer = nil
}
func startRecording() {
    // Clear all previous session data and cancel task
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    // Create instance of audio session to record voice
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSession.Category.record, mode: AVAudioSession.Mode.measurement, options: AVAudioSession.CategoryOptions.defaultToSpeaker)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    recognitionRequest.shouldReportPartialResults = true

    self.recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if let result = result {
            self.lblText.text = result.bestTranscription.formattedString
            print(result.bestTranscription.formattedString)
            print(result.isFinal)
            isFinal = result.isFinal
            // Restart the silence timer on every recognized result
            self.timerReStart()
        }
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.btnStart.isEnabled = true
            self.timerStop()
        }
    })

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }

    self.audioEngine.prepare()
    do {
        try self.audioEngine.start()
        // Start timer to check if there is silence
        self.timerReStart()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    self.lblText.text = "Say something, I'm listening!"
}
func cancelRecording() {
    if audioEngine.isRunning {
        let node = audioEngine.inputNode
        node.removeTap(onBus: 0)
        audioEngine.stop()
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    self.timerStop()
}
I am trying to use AVAudioEngine to play a button sound, but unfortunately the sound file is played only once.
The idea is that the user taps the button, a sound plays, and the recording starts. When the user taps the button again, a second sound should play, indicating that the recording session has ended.
So far the first sound plays and the recording starts.
Unfortunately, the second sound (the ending sound) won't play.
I have also found that when I use the same AVAudioEngine as the recording function, the sound won't play at all.
As I am completely new to the AVFoundation framework, I am not sure what the issue is.
Thanks in advance.
var StartSoundEngineScene1 = AVAudioEngine()
var StartSoundNodeScene1 = AVAudioPlayerNode()

func SetupAudio(AudioEngine: AVAudioEngine, SoundNode: AVAudioPlayerNode, FileURL: URL) {
    guard let AudioFile = try? AVAudioFile(forReading: FileURL) else { return }
    let AudioSession = AVAudioSession.sharedInstance()
    AudioEngine.attach(SoundNode)
    AudioEngine.connect(SoundNode, to: AudioEngine.mainMixerNode, format: AudioFile.processingFormat)
    AudioEngine.prepare()
}

override func viewDidLoad() {
    super.viewDidLoad()
    SetupAudio(AudioEngine: StartSoundEngineScene1, SoundNode: StartSoundNodeScene1, FileURL: StartRecSound)
}

func ButtonSound(AudioEngine: AVAudioEngine, SoundNode: AVAudioPlayerNode, FileURL: URL) {
    try? AudioEngine.start()
    guard let audioFile = try? AVAudioFile(forReading: FileURL) else { return }
    SoundNode.scheduleFile(audioFile, at: nil, completionHandler: nil)
    SoundNode.volume = 0.16
    SoundNode.play()
}
func StartRecording() {
    ButtonSound(AudioEngine: StartSoundEngineScene1, SoundNode: StartSoundNodeScene1, FileURL: StartRecSound)
    Timer.scheduledTimer(withTimeInterval: 0.7, repeats: false) { timer in
        if self.audioEngine.isRunning {
            self.audioEngine.stop()
            self.recognitionRequest?.endAudio()
        } else {
            print("Recording Started")
            if let recognitionTask = self.recognitionTask {
                recognitionTask.cancel()
                self.recognitionTask = nil
            }
            self.recordedMessage = ""
            let audioSession = AVAudioSession.sharedInstance()
            do {
                try audioSession.setCategory(AVAudioSession.Category.record)
                try audioSession.setMode(AVAudioSession.Mode.measurement)
            } catch {
                print(error)
            }
            self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
            guard let recognitionRequest = self.recognitionRequest else {
                fatalError("Unable to create a speech audio buffer")
            }
            recognitionRequest.shouldReportPartialResults = true
            recognitionRequest.requiresOnDeviceRecognition = true
            self.recognitionTask = self.speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
                var isFinal = false
                if let result = result {
                    let sentence = result.bestTranscription.formattedString
                    self.recordedMessage = sentence
                    print(self.recordedMessage)
                    isFinal = result.isFinal
                }
                if error != nil || isFinal {
                    self.audioEngine.stop()
                    self.audioEngine.inputNode.removeTap(onBus: 0)
                    self.recognitionRequest = nil
                    self.recognitionTask = nil
                    self.RecordBtn.isEnabled = true
                }
            })
            let recordingFormat = self.audioEngine.inputNode.outputFormat(forBus: 0)
            self.audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
                self.recognitionRequest?.append(buffer)
            }
            self.audioEngine.prepare()
            do {
                try self.audioEngine.start()
            } catch {
                print(error)
            }
        }
    }
}
func StopRecording() {
    if audioEngine.isRunning {
        audioEngine.stop()
        ButtonSound(AudioEngine: StartSoundEngineScene1, SoundNode: StartSoundNodeScene1, FileURL: StopRecSound)
        recognitionRequest?.endAudio()
        audioEngine.inputNode.removeTap(onBus: 0)
    }
}
You set the AVAudioSession category to record:
try audioSession.setCategory(AVAudioSession.Category.record)
If you want to play and record concurrently, you should set the category to playAndRecord instead.
And if you change the AVAudioSession while playing or recording, AVAudioEngine's configuration changes and it posts the AVAudioEngineConfigurationChange notification.
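A rough sketch of both pieces, assuming StartSoundEngineScene1 is a property on your view controller as in the question:
let audioSession = AVAudioSession.sharedInstance()
do {
    // .playAndRecord lets the engine record through the tap while the
    // player node plays the button sounds; .defaultToSpeaker routes
    // playback to the speaker instead of the receiver.
    try audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
} catch {
    print("audioSession setup failed: \(error)")
}

// If the session changes mid-flight, the engine stops and posts this
// notification; restart it so the next sound can still play.
NotificationCenter.default.addObserver(forName: .AVAudioEngineConfigurationChange, object: StartSoundEngineScene1, queue: .main) { [weak self] _ in
    guard let self = self, !self.StartSoundEngineScene1.isRunning else { return }
    self.StartSoundEngineScene1.prepare()
    try? self.StartSoundEngineScene1.start()
}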
I am using SpriteKit to build an app. The other functionality of the scene is all done through SpriteKit. I want to add speech recognition to the scene so that when the user touches the microphone button node, the words they say are matched against the correct word they needed to say to move on. Is this possible in SpriteKit? The code below is the function I am using, but it does not cause anything to happen.
func recordAndRecognizeSpeech() {
    let node = audioEngine.inputNode
    let recordingFormat = node.outputFormat(forBus: 0)
    node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        self.request.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        self.sendAlert(message: "There has been an audio engine error.")
        return print(error)
    }
    guard let myRecognizer = SFSpeechRecognizer() else {
        self.sendAlert(message: "Speech recognition is not supported for your current locale.")
        return
    }
    if !myRecognizer.isAvailable {
        // Recognizer is not available right now
        self.sendAlert(message: "Speech recognition is not currently available. Check back at a later time.")
        return
    }
    recognitionTask = speechRecognizer?.recognitionTask(with: request, resultHandler: { result, error in
        if let result = result {
            let bestString = result.bestTranscription.formattedString
            self.labelNode.text = bestString
            var lastString: String = ""
            for segment in result.bestTranscription.segments {
                let indexTo = bestString.index(bestString.startIndex, offsetBy: segment.substringRange.location)
                lastString = String(bestString[indexTo...])
            }
            self.checkPhrase(resultString: lastString)
        } else if let error = error {
            self.sendAlert(message: "There has been a speech recognition error.")
            print(error)
        }
    })
}
Below is the code I am using for when the mic node gets touched; nothing happens when I run it:
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let touch = touches.first else { return }
    let location = touch.location(in: self)
    if micButton.contains(location) {
        if isRecording == true {
            audioEngine.stop()
            recognitionTask?.cancel()
            isRecording = false
        } else {
            self.recordAndRecognizeSpeech()
            isRecording = true
        }
    }
}
You can use the instance method requestRecordPermission(_:) to request the user’s permission for audio recording.
Try something like:
let session = AVAudioSession.sharedInstance()
session.requestRecordPermission { (granted: Bool) in
    if granted {
        print("granted")
        try? session.setCategory(.playAndRecord, mode: .default, options: [])
        try? session.setActive(true)
        self.recorder()
    } else {
        print("not granted")
    }
}
When I have AirPods connected to my iPhone and I try to override the audio to the speaker, the audio defaults back to the AirPods. I do not get this problem with any other Bluetooth device or audio option. How can I make the speaker output stick when AirPods are connected?
Here is how I set up the audio session:
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(AVAudioSession.Category.playAndRecord, mode: .voiceChat, options: [.allowBluetooth, .allowBluetoothA2DP, .mixWithOthers])
} catch {
    NSLog("Unable to change audio category because : \(error.localizedDescription)")
}
do {
    try session.setMode(AVAudioSession.Mode.voiceChat)
} catch {
    NSLog("Unable to change audio mode because : \(error.localizedDescription)")
}
let sampleRate: Double = 44100.0
do {
    try session.setPreferredSampleRate(sampleRate)
} catch {
    NSLog("Unable to change preferred sample rate because : \(error.localizedDescription)")
}
do {
    try session.setPreferredIOBufferDuration(0.005)
} catch {
    NSLog("Unable to change preferred IO buffer duration because : \(error.localizedDescription)")
}
Speaker row on the action sheet:
let speakerOutput = UIAlertAction(title: "Speaker", style: .default, handler: { (alert: UIAlertAction!) -> Void in
    self.overrideSpeaker(override: true)
})
for description in currentRoute.outputs {
    if convertFromAVAudioSessionPort(description.portType) == convertFromAVAudioSessionPort(AVAudioSession.Port.builtInSpeaker) {
        speakerOutput.setValue(true, forKey: "checked")
        break
    }
}
speakerOutput.setValue(UIImage(named: "ActionSpeaker.png")?.withRenderingMode(.alwaysOriginal), forKey: "image")
optionMenu.addAction(speakerOutput)
I am changing to the speaker here, and the bool does come in as true:
func overrideSpeaker(override: Bool) {
    do {
        let port: AVAudioSession.PortOverride = override ? .speaker : .none
        try session.overrideOutputAudioPort(port)
    } catch {
        NSLog("audioSession error toggling speaker: \(error.localizedDescription)")
    }
}
Here is my route change handler; I get override first and then newDeviceAvailable:
@objc func handleRouteChange(_ notification: Notification) {
    guard let userInfo = notification.userInfo,
          let reasonValue = userInfo[AVAudioSessionRouteChangeReasonKey] as? UInt,
          let reason = AVAudioSession.RouteChangeReason(rawValue: reasonValue) else {
        return
    }
    switch reason {
    case .newDeviceAvailable,
         .categoryChange:
        var audioShown = false
        for output in session.currentRoute.outputs where output.portType != AVAudioSession.Port.builtInReceiver && output.portType != AVAudioSession.Port.builtInSpeaker {
            self.showAudio()
            audioShown = true
            break
        }
        if !audioShown {
            self.showSpeaker()
        }
    case .routeConfigurationChange:
        break
    case .override:
        break
    default: ()
    }
}
Although it's not the best answer, you could try calling setCategory() before your call to overrideOutputAudioPort(), and when you do so, omit the .allowBluetooth option. Then, if the user unchecks the speaker, you'll have to put the option back.
Using the metadata in AVAudioSession's currentRoute and availableInputs, you could limit this logic to only when the user has AirPods attached.
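A sketch of that idea follows; forceSpeaker and hasBluetoothInput are hypothetical helper names, and the assumption is that AirPods show up as a .bluetoothHFP input port:
// Hypothetical helper: true when a Bluetooth input such as AirPods is attached.
var hasBluetoothInput: Bool {
    let inputs = AVAudioSession.sharedInstance().availableInputs ?? []
    return inputs.contains { $0.portType == .bluetoothHFP }
}

// Hypothetical helper: drop the Bluetooth options, then force the speaker.
func forceSpeaker() {
    let session = AVAudioSession.sharedInstance()
    do {
        // Same category/mode as before, but WITHOUT .allowBluetooth and
        // .allowBluetoothA2DP, so the session stops preferring the AirPods.
        try session.setCategory(.playAndRecord, mode: .voiceChat, options: [.mixWithOthers])
        try session.overrideOutputAudioPort(.speaker)
    } catch {
        NSLog("Unable to force speaker: \(error.localizedDescription)")
    }
}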
I'm using SFSpeechRecognizer in my app, and it works fine for letting the end user dictate a comment into a UITextView via a dedicated button (Start Speech Recognition).
But if the user types some text manually first and then starts speech recognition, the previously typed text is erased. The same happens if the user performs speech recognition twice on the same UITextView (recording a first part of the text, stopping, then recording again): the previous text is erased.
Hence, I would like to know how I can append the text recognized by SFSpeechRecognizer to the existing text.
Here is my code:
func recordAndRecognizeSpeech() {
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    let recognitionRequest = self.recognitionRequest
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        self.decaration.text = (result?.bestTranscription.formattedString)!
        isFinal = (result?.isFinal)!
        let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
        self.decaration.scrollRangeToVisible(bottom)
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest.endAudio()
            self.oBtSpeech.isEnabled = true
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
}
I tried to update
self.decaration.text = (result?.bestTranscription.formattedString)!
to
self.decaration.text += (result?.bestTranscription.formattedString)!
but that duplicates each recognised sentence.
Any idea how I can do that?
Try saving the text before starting the recognition system.
func recordAndRecognizeSpeech() {
    // one change here
    let defaultText = self.decaration.text
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    let recognitionRequest = self.recognitionRequest
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        // one change here
        self.decaration.text = defaultText + " " + (result?.bestTranscription.formattedString)!
        isFinal = (result?.isFinal)!
        let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
        self.decaration.scrollRangeToVisible(bottom)
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest.endAudio()
            self.oBtSpeech.isEnabled = true
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
}
result?.bestTranscription.formattedString returns the entire phrase recognised so far; that's why you should reset self.decaration.text each time you get a response from SFSpeechRecognizer.
I have the following code:
let speechRecognizer = SFSpeechRecognizer()!
let audioEngine = AVAudioEngine()
var recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
var recognitionTask = SFSpeechRecognitionTask()
var audioPlayer: AVAudioPlayer!

override func viewDidLoad() {
    super.viewDidLoad()
    playSound(sound: "oops")
    speechRecognizer.delegate = self
    requestSpeechAuth()
}

func requestSpeechAuth() {
    SFSpeechRecognizer.requestAuthorization { (authStatus) in
        OperationQueue.main.addOperation({
            switch authStatus {
            case .authorized:
                print("authorized")
            case .denied:
                print("denied")
            case .restricted:
                print("restricted")
            case .notDetermined:
                print("not determined")
            }
        })
    }
}
// Function called when I press on my record button
func SpeechButtonDown() {
    print("Start recording")
    if audioEngine.isRunning {
        endRecording()
    } else {
        do {
            let audioSession = AVAudioSession.sharedInstance()
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
            if let inputNode = audioEngine.inputNode {
                recognitionRequest.shouldReportPartialResults = true
                recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
                    print("1")
                    if let result = result {
                        self.instructionLabel.text = result.bestTranscription.formattedString
                        print("2")
                        if result.isFinal {
                            self.audioEngine.stop()
                            inputNode.removeTap(onBus: 0)
                            if self.instructionLabel.text != "" {
                                self.compareWordwithVoice()
                            }
                        }
                    }
                })
                let recognitionFormat = inputNode.outputFormat(forBus: 0)
                inputNode.installTap(onBus: 0, bufferSize: 1024, format: recognitionFormat, block: { (buffer, when) in
                    self.recognitionRequest.append(buffer)
                })
                audioEngine.prepare()
                try audioEngine.start()
            }
        } catch {
        }
    }
}
// Function called when I release the record button
func EndRecording() {
    endRecording()
    print("Stop recording")
}

func endRecording() {
    audioEngine.stop()
    recognitionRequest.endAudio()
    audioEngine.inputNode?.removeTap(onBus: 0)
}
func playSound(sound: String) {
    if let url = Bundle.main.url(forResource: sound, withExtension: "wav") {
        do {
            audioPlayer = try AVAudioPlayer(contentsOf: url)
            guard let player = audioPlayer else { return }
            player.prepareToPlay()
            player.play()
            print("tutu")
        } catch let error {
            print(error.localizedDescription)
        }
    }
}
func compareWordwithVoice() {
    let StringToLearn = setWordToLearn()
    print("StringToLearn : \(StringToLearn)")
    if let StringRecordedFull = instructionLabel.text {
        let StringRecorded = (StringRecordedFull as NSString).replacingOccurrences(of: " ", with: "").lowercased()
        print("StringRecorded : \(StringRecorded)")
        if StringRecorded == "appuyezsurleboutonendessousetprenoncezl’expression" {
            print("not yet")
        } else {
            if StringToLearn == StringRecorded {
                playSound(sound: "success")
                print("success")
                // update UI
            } else {
                playSound(sound: "oops")
                print("oops")
                // update UI
            }
        }
    }
}

func setWordToLearn() -> String {
    if let wordToLearnFull = expr?.expression {
        print(wordToLearnFull)
        var wordToLearn = (wordToLearnFull as NSString).replacingOccurrences(of: " ", with: "").lowercased()
        wordToLearn = (wordToLearn as NSString).replacingOccurrences(of: ".", with: "")
        wordToLearn = (wordToLearn as NSString).replacingOccurrences(of: "!", with: "")
        wordToLearn = (wordToLearn as NSString).replacingOccurrences(of: "?", with: "")
        wordToLearn = (wordToLearn as NSString).replacingOccurrences(of: ",", with: "")
        wordToLearn = (wordToLearn as NSString).replacingOccurrences(of: "/", with: "")
        print(wordToLearn)
        return wordToLearn
    }
    print("no wordToLearn")
    return ""
}
The problem is that playSound works perfectly when it is called in viewDidLoad but doesn't work when it is called by the compareWordwithVoice() function, yet it prints "tutu" in both cases, so the playSound function does run every time.
Can the problem be that AVAudioPlayer and AVAudioEngine cannot work at the same time?
Thanks
I've experienced the same thing with my code, and from searching online it seems there is an unspoken bug when using AVAudioPlayer and AVAudioEngine separately.
I got the information from the following link; I did not find anything else online that states why this bug happens, though.
https://swiftios8dev.wordpress.com/2015/03/05/sound-effects-using-avaudioengine/
The suggestion was to use AVAudioEngine for everything.
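A minimal sketch of that suggestion, playing the UI sounds through a player node attached to the same engine used for recognition instead of a separate AVAudioPlayer (sfxNode, setupSoundEffects, and playEffect are placeholder names):
let sfxNode = AVAudioPlayerNode()

// Attach once, e.g. in viewDidLoad, to the same engine used for recognition.
func setupSoundEffects(on engine: AVAudioEngine) {
    engine.attach(sfxNode)
    engine.connect(sfxNode, to: engine.mainMixerNode, format: nil)
}

// Schedule the file on the shared engine; start the engine if needed.
func playEffect(named name: String, on engine: AVAudioEngine) {
    guard let url = Bundle.main.url(forResource: name, withExtension: "wav"),
          let file = try? AVAudioFile(forReading: url) else { return }
    if !engine.isRunning {
        engine.prepare()
        try? engine.start()
    }
    sfxNode.scheduleFile(file, at: nil, completionHandler: nil)
    sfxNode.play()
}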
I think "compareThings" always plays "oops" sound and this sound is not good (too quiet or broken).
Please try to play "oops" sound from "viewDidLoad" func to make sure sound is okay.
If it is okay (I don't think so) - set breakpoint in "playSound" func to see what is going on (sound name, does it exists etc).