Record HTML5 SpeechSynthesisUtterance generated speech to file - web-audio-api

I am able to generate speech from text using Chrome's Speech Synthesis API (in Version 33.0.1750.112 beta-m) in the following manner:
var transcript = document.getElementById("speechTxt").value;
var msg = new SpeechSynthesisUtterance(transcript);
speechSynthesis.speak(msg);
Now I want to save this speech to a file (maybe using the Web Audio API). Is this possible through some function call?
I have looked at the methods in the Speech Synthesis API and there is nothing to save this speech data. Using the Web Audio API I am able to capture the speech sound through the microphone, but that introduces a lot of unnecessary noise. Is it not possible to save this speech data inside the Chrome browser itself, as it is the one generating it in the first place?

Unfortunately, no. Apparently there was no major use case; see this answer.
But you can use a JavaScript TTS library like meSpeak.js. It outputs buffers that can be played back via Web Audio buffer nodes (although its engine does not sound as natural as Chrome's).
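For what it's worth, here is a rough, untested sketch of that route. The rawdata option and the config/voice file paths are taken from the meSpeak.js documentation as I recall it, so treat them as assumptions and check against the current API:

// Load engine data first (paths as shipped with the meSpeak.js distribution).
meSpeak.loadConfig("mespeak_config.json");
meSpeak.loadVoice("voices/en/en.json");

// Ask meSpeak for the raw WAV bytes instead of playing them directly.
var wav = meSpeak.speak("Hello world", { rawdata: "arraybuffer" });

// Play the bytes back through a Web Audio buffer node...
var ctx = new (window.AudioContext || window.webkitAudioContext)();
ctx.decodeAudioData(wav.slice(0), function (audioBuffer) {  // slice(0): decodeAudioData may detach its input
  var node = ctx.createBufferSource();
  node.buffer = audioBuffer;
  node.connect(ctx.destination);
  node.start(0);
});

// ...and/or offer the same bytes as a downloadable file, which is what the
// original question was after.
var downloadUrl = URL.createObjectURL(new Blob([wav], { type: "audio/wav" }));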

Related

How to stream audio in Gstreamer (Python)

I am making a Python application that takes text, converts it into audio using IBM Cloud Watson TTS, and then returns the audio using
content = watson_tts.synthesize(text, voice=voice, accept=format).get_result().content
Then I want to take this content and stream it using GStreamer, without saving it to a file.
I know how to play a file from a URI using this:
player = Gst.ElementFactory.make("playbin", "player")
player.set_property("uri", uri)
player.set_state(Gst.State.PLAYING)
but that's not what I want; what I want is to be able to stream the audio directly without downloading it.
After executing
content = watson_tts.synthesize(text, voice=voice, accept=format).get_result()
the synthesized audio is already "downloaded" from IBM's service, so instead of "what I want is being able to stream the audio directly without downloading" I suppose it's better to say "... without saving it to a file".
Anyway... to programmatically feed GStreamer's pipeline with the (audio) bytes from Python's content object, you can utilize the appsrc element.
For example, the pipeline can be implemented something like the sketch below, and it will produce an MPEG transport stream with AAC-encoded audio streamed via UDP.
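A minimal, untested sketch of that idea (it assumes content holds a complete clip GStreamer can parse, e.g. WAV; the avenc_aac element depends on which AAC encoder you have installed, and the UDP host/port are placeholders):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# content = watson_tts.synthesize(text, voice=voice, accept=format).get_result().content
pipeline = Gst.parse_launch(
    "appsrc name=src ! decodebin ! audioconvert ! audioresample "
    "! avenc_aac ! mpegtsmux ! udpsink host=127.0.0.1 port=5000"
)
src = pipeline.get_by_name("src")

pipeline.set_state(Gst.State.PLAYING)
src.emit("push-buffer", Gst.Buffer.new_wrapped(content))  # hand the raw bytes to appsrc
src.emit("end-of-stream")

# Wait until streaming finishes (or fails), then tear the pipeline down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)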

Timed metadata in MPEG DASH?

HLS has timed metadata, which can be used from within iOS / OS X (and some Flash-based players) to launch JavaScript events at a certain point in a live video stream by running a JavaScript event handler when the metadata arrives: HTTP Live Streaming: how to listen for timed metadata embedded as ID3 tags using Javascript in iOS8?
RTMP (Flash) has cue points, which can be used for the same effect.
Is there any way to do something like this with a live (not VOD) MPEG DASH stream?
With MPEG-DASH you can make use of inline and inband events. These events have a presentation time and a unique combination of schemeIdUri and value. In your DASH player you can usually register for those events and will get a callback when they occur.
Inline events are signalled directly in the manifest file, while inband events are multiplexed into specific segments. You can find a working demo of inband events here. In that example an event is used to trigger a reload of the manifest file. Nevertheless, you can also use that mechanism for your own custom events, as sketched below.
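As a rough illustration, assuming a dash.js-based player (the scheme URI below is a placeholder that has to match the schemeIdUri signalled in your MPD or emsg boxes, and the exact timing mode and payload shape depend on the dash.js version):

var player = dashjs.MediaPlayer().create();
player.initialize(document.querySelector("#videoPlayer"), "https://example.com/live.mpd", true);

// Subscribe to the custom event by its schemeIdUri; dash.js calls the handler
// when the event becomes due during playback.
player.on("urn:example:custom:event:2014", function (e) {
  // e.event typically carries the id, presentation time, duration and message data.
  console.log("timed metadata event", e.event);
});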

How does Bravia Engine get called on AOSP?

I found the AOSP source code from Google and also retrieved vendor's info from https://github.com/sonyxperiadev/device-sony-sgp321
Sony added its Bravia Engine library to AOSP to improve image and video quality. It can either be called in libstagefright's AwesomeLocalRenderer or at the decoding phase, when OMX addPlugin is called.
I searched both places; the code there is the same as in other native AOSP source code. I would like to know how Sony uses its BE library.
The Bravia Engine is mainly employed for video/image post-processing prior to rendering in the framework. There is an interesting link at http://developer.sonymobile.com/2012/06/21/mobile-bravia-engine-explained-video/.
In AOSP, I presume the user settings from the menu are read and the subsequent filtering is enabled/applied in the SurfaceFlinger or HwComposer parts of the framework. Another link of interest could be: http://blog.gsmarena.com/heres-what-sony-ericsson-mobile-bravia-engine-really-does-review/
EDIT: Interaction between Video Decoder - AwesomePlayer - HwComposer
The following is a summary of interactions between the different actors in the playback and composition pipeline.
1. AwesomePlayer acts as a sink to the OMX video decoder. Hence, it continuously polls for a new frame that could be available for rendering and processing.
2. When the OMX video decoder completes decoding a frame, the FillBufferDone callback of the codec unblocks a read invoked by AwesomePlayer.
3. Once the frame is available, it is subjected to the A/V synchronization logic by the AwesomePlayer module and pushed into SurfaceTexture via the render call. All the aforementioned steps are performed as part of the AwesomePlayer::onVideoEvent method.
4. The render call queues the buffer. This SurfaceTexture is one of the layers available for composition by SurfaceFlinger.
5. When a new layer is available, through a series of steps, SurfaceFlinger invokes the HwComposer to perform the composition of all the related layers.
6. AOSP only provides a template or an API for the HwComposer, the actual implementation of which is left to the vendor.
My guess is that all vendor-specific binaries just implement the standard interfaces defined by Android/OMX. The engine is compiled into shared objects which can be found in the /system/vendor directory. The Android system just has to look in that directory and load the necessary shared objects.

Audio Input and Output in same iOS App

Okay so I am creating simple objects to later be used for audio input and output. Both objects work independently just fine, but when I try to use them in the same application, they clash and the audio input object gets blocked out by the output object.
The output object is using AudioUnitSessions to pass samples into a buffer and play audio, while the input object is using AudioQueue to feed in samples from the microphone, which we can later process.
I think the solution is as simple as deactivating the AudioUnitSession, but this does not seem to be working. I am doing this in the following way:
AudioSessionSetActive(true) or AudioSessionSetActive(false)
The above depends on whether I am trying to activate it or not.
Apparently this does not work because whenever I try to recreate the input object, it fails to initialize the recording with OSStatus error number -50.
Does anyone know of a way around this, or a simple way to do audio input and output in the same application?

iPhone Remote IO Issues

I've been playing around with the SDK recently, and I had an idea to just build a personal autotuner (because I am just as awesome as T-Pain).
Intro aside, I wanted to attach a high-quality microphone into the headphone jack, and I wanted my audio to be processed in a callback, and then copied to the output buffer. This has several implications:
When my audio-in is being routed through the built-in microphone, I need to be able to process this input, and send it once my input has stopped (this works).
When my audio-in is being routed through the microphone-in input from the headset jack, I want the output to be sent immediately.
Routing, however, doesn't seem to work properly when using AudioSession modes and overrides, which technically should allow you to reroute output to the iPhone speakers, no matter where the input is coming from. This is documented to work, but in practice, doesn't really work.
Remote IO, however, is not documented at all. Anyone with experience using Remote IO audio units, can you give me a reasonable high-level overview on how to do this properly? I have been using the aurioTouch example code, but I am running into errors where I get error codes like -50 and -10863, none of which are documented.
Thanks in advance.
The aurioTouch example implements RemoteIO play-through. You could modify the samples before passing them on. It simply calls AudioUnitRender in the output render callback, roughly as in the sketch below.
NB: this trick does not seem to work if you port the code to OS X-style Core Audio. There, 99% of the time, you need to create two AUHALs (RemoteIO-a-likes) and pass the samples between them.
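To make that concrete, a stripped-down, untested version of such a play-through render callback might look like the following; it assumes the RemoteIO AudioUnit itself was passed as inRefCon when the callback was registered, with bus 1 being the input (microphone) side and bus 0 the output side:

#include <AudioUnit/AudioUnit.h>

static OSStatus renderCallback(void *inRefCon,
                               AudioUnitRenderActionFlags *ioActionFlags,
                               const AudioTimeStamp *inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList *ioData)
{
    AudioUnit rioUnit = (AudioUnit)inRefCon;

    // Pull the freshly captured microphone samples from the input bus (1)
    // straight into the output buffers we have been asked to fill.
    OSStatus err = AudioUnitRender(rioUnit, ioActionFlags, inTimeStamp,
                                   1, inNumberFrames, ioData);

    // ioData->mBuffers[i].mData now holds the input samples; this is the place
    // to process them (e.g. pitch correction) before they reach the speaker.

    return err;
}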