I am making a Python application that takes text, converts it into audio using IBM Cloud Watson TTS, and then returns the audio using
content = watson_tts.synthesize(text, voice=voice, accept=format).get_result().content
Then I want to take this content and stream it using GStreamer, without saving it to a file.
I know how to play files from a URI using this:
player = Gst.ElementFactory.make("playbin", "player")
player.set_property("uri", uri)
player.set_state(Gst.State.PLAYING)
but that's not what I want; what I want is to be able to stream the audio directly without downloading it.
After executing
content = watson_tts.synthesize(text, voice=voice, accept=format).get_result()
the synthesized audio is already "downloaded" from IBM's service, so instead of "being able to stream the audio directly without downloading", I suppose it's better to say "... without saving it to a file".
Anyways... to programmatically feed a GStreamer pipeline with the (audio) bytes from Python's content object, you can use the appsrc element.
For example, the pipeline can be implemented something like the sketch below, and it will produce an MPEG transport stream with AAC-encoded audio streamed via UDP.
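Here is a minimal sketch of that idea, assuming GStreamer 1.x with the PyGObject bindings and the gst-libav AAC encoder; the helper name stream_bytes, the element choices (decodebin, avenc_aac, mpegtsmux, udpsink) and the host/port are illustrative assumptions, not necessarily the exact original pipeline:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

def stream_bytes(content: bytes):
    # `content` is the bytes object returned by
    # watson_tts.synthesize(text, voice=voice, accept=format).get_result().content
    pipeline = Gst.parse_launch(
        "appsrc name=src ! decodebin ! audioconvert ! audioresample "
        "! avenc_aac ! mpegtsmux ! udpsink host=127.0.0.1 port=5000"
    )
    appsrc = pipeline.get_by_name("src")

    pipeline.set_state(Gst.State.PLAYING)
    # Wrap the raw bytes in a Gst.Buffer, push them into the pipeline,
    # then signal that no more data will follow.
    appsrc.emit("push-buffer", Gst.Buffer.new_wrapped(content))
    appsrc.emit("end-of-stream")

    # Run until EOS (or an error) so the whole buffer is streamed out.
    loop = GLib.MainLoop()
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message::eos", lambda *_: loop.quit())
    bus.connect("message::error", lambda *_: loop.quit())
    loop.run()
    pipeline.set_state(Gst.State.NULL)

If you only want local playback instead of UDP output, the same appsrc front end can feed something like decodebin ! audioconvert ! autoaudiosink.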
Does anybody know of a way I can extract timing data from any kind of live video stream?
I have tried with JWPlayer, but it is not capable of doing that.
My encoder is Streamcoders Mediasuite and I am happy to stream using whatever stream type is necessary, in order for me to get cue point info (or any kind of timing info) from the stream.
Caveat: Flash and Silverlight are not an option, as the viewer base is restricted by policy.
Thanks in advance.
Neil.
HLS has timed metadata, which can be used from within iOS / OS X (and some Flash-based players) to fire JavaScript events at a certain point in a live video stream by running an event handler when the metadata arrives: HTTP Live Streaming: how to listen for timed metadata embedded as ID3 tags using Javascript in iOS8?
RTMP (Flash) has cue points, which can be used for the same effect.
Is there any way to do something like this with a live (not VOD) MPEG DASH stream?
With MPEG-DASH you can make use of inline and inband events. Those events have a presentation time and a unique combination of schemeIdUri and value. In your DASH player you can usually register for those events and you will get a callback when they occur.
Inline events are signalled directly in the manifest file, while inband events are multiplexed into specific segments. You can find a working demo of inband events here. In that example an event is used to trigger a reload of the manifest file. Nevertheless, you can also use that mechanism for your own custom events.
I am able to generate speech from text using Chrome's Speech Synthesis API (in Version 33.0.1750.112 beta-m) in the following manner:
var transcript = document.getElementById("speechTxt").value;
var msg = new SpeechSynthesisUtterance(transcript);
speechSynthesis.speak(msg);
Now I want to save this speech in a file (maybe using WebAudio API). Is this possible through some function call?
I have looked at the methods in the Speech Synthesis API and there is nothing to save this speech data. Using the Web Audio API I am able to capture this speech sound via the microphone, but that introduces a lot of unnecessary noise. Is it not possible to save this speech data inside the Chrome browser itself, as it is the one generating it in the first place?
Unfortunately, no. Apparently there was no major use case; see this answer.
But you can use a JS TTS library like meSpeak. It outputs buffers that can be played back via Web Audio buffer nodes. (Although the engine does not sound as natural as Chrome's.)
Environment
iPhone
armv7 / SDK 6.0
Xcode 4.5
Use-case
Based on the AVCam sample
Capture A/V into a file using AVCaptureMovieFileOutput
Add an additional AVCaptureAudioDataOutput to intercept the audio being written to the file while recording
How-to
Add Video input to the Capture session
Add Audio input to the Capture session
Add File Output to the Capture session
Add Audio Output to the Capture session
Configure
Start recording
The problem
It seems the audio outputs are mutually exclusive: either I get data written to the disk, or I get the AVCaptureAudioDataOutput capture delegate called. When AVCaptureMovieFileOutput is added (order doesn't matter), the AVCaptureAudioDataOutput delegate is not called.
How can this be solved? How can I get AVCaptureAudioDataOutput to trigger its delegate/selector while, at the same time, AVCaptureMovieFileOutput is used to write data to the disk?
Can this be done in any other way than using a lower-level API such as, e.g., AVAssetWriter et al.?
Any help will be appreciated!
AVAssetWriter is to be used in conjunction with AVAssetWriterInputPixelBufferAdaptor; a good example of how this can be achieved can be found here.
Then, upon AVCaptureAudioDataOutputSampleBufferDelegate invocation, the raw audio buffer can be propagated out for further processing (in parallel to having the data written to the disk).
The Live555 lib has a nice example, testOnDemandRTSPServer.cpp. This example just streams "one" given file. I want to stream more than one file. Does Live555 have a playlist concept, or how can I stream more than one file in Live555?
Best Wishes
PS: I tried adding more than one subsession; in that case Live555 just streams the last session's file...
There is one more application that comes with the live555 code. The Live555 Media Server is inside the source code's mediaServer directory, and it does the job. It uses the DynamicRTSPServer class. You give it the folder with all your media files and access them as rtsp://ip/filename.
My 0.02 cents:
I'm not sure if that makes sense: how would you ensure that they are all encoded in the same format, which is a requirement if you want to stream them in the same session? The RTSP DESCRIBE gets a media session description of the file, and this is used to set up the streaming session, so it is crucial that all files are encoded similarly.
RTSP does not make any provision for playlists. Usually playlists are not transferred via RTSP but, say, via HTTP. IMO, if the playlist resides on the client, it would make more sense to await the RTCP BYE packet (at end of file) and then do a SETUP and PLAY for the next file/RTSP URI in the playlist (see the sketch below).
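As an illustration of that client-side approach (this is not Live555 code): a minimal sketch, assuming a GStreamer playbin as the RTSP client, that tears down the current session when the stream ends and starts the next URI from the playlist; the URIs are placeholders.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Placeholder playlist of RTSP URIs, kept on the client side.
playlist = ["rtsp://server/first.264", "rtsp://server/second.264"]

player = Gst.ElementFactory.make("playbin", "player")
loop = GLib.MainLoop()

def play_next():
    if not playlist:
        loop.quit()
        return
    player.set_state(Gst.State.NULL)             # tear down the previous session
    player.set_property("uri", playlist.pop(0))  # SETUP/PLAY the next URI
    player.set_state(Gst.State.PLAYING)

bus = player.get_bus()
bus.add_signal_watch()
# EOS arrives when the server ends the stream (i.e. after the RTCP BYE).
bus.connect("message::eos", lambda *_: play_next())
bus.connect("message::error", lambda *_: loop.quit())

play_next()
loop.run()
player.set_state(Gst.State.NULL)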
If you just want to stream a sequence of files (playlist is on the server) where the RTSP client just initiates one session, of course nothing prevents you from creating a custom file source in the live555 library that does what you want...
Recently I had to do a similar task with similar functionality:
Here is what you can do to play H.264 video stream files in a row, like a playlist (of course, only if they have the same resolution, encoding profile, etc.).
You would have to modify the ByteStreamFileSource::doGetNextFrame() method.
There is code in there that checks feof(fFid); change it to something like this:
if (feof(fFid))
{
    // End of the current file: instead of closing the source for good,
    // open the next file in your "playlist" and keep feeding frames from it.
    CloseInputFile(fFid);
    fFid = OpenInputFile(envir(), "test.264"); // fileName of the next file
}
else ....
Of course, if you still need LGPL compliance, there will be more work to do... You will have to copy/rename this class outside the library, do the same with H264VideoFileServerMediaSubsession, and modify its createNewStreamSource() method so that it uses your rewritten ByteStreamFileSource class.