Google Speech API v2 with Sockets - sockets

Does Google speech API v2 support audio streaming via web sockets?
I found a way to send a POST request with audio. However, it would be great if I could record audio and send it over a socket in real time.
Note: I use Firefox. I know that Google Chrome supports voice recognition out of the box, but I'm interested in Firefox and other browsers.

V2 of the API currently does not support WebSockets. The streaming API uses gRPC, which would need a translation layer to work with WebSockets.
https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1beta1#google.cloud.speech.v1beta1.Speech.StreamingRecognize
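To make the idea of a translation layer a bit more concrete, here is a minimal sketch of the part that reshapes incoming WebSocket messages into a StreamingRecognize-style request stream. It assumes the first request carries only the configuration and every later request carries only audio, which is how I read the reference linked above. Plain dicts stand in for the generated gRPC message classes, and `to_streaming_requests` and `DEFAULT_CONFIG` are hypothetical names, not part of any Google library.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in for the recognition config (assumed values).
DEFAULT_CONFIG: Dict[str, object] = {"encoding": "LINEAR16", "sample_rate": 16000}


def to_streaming_requests(
    ws_messages: Iterable[bytes],
    config: Dict[str, object] = DEFAULT_CONFIG,
) -> Iterator[Dict[str, object]]:
    """Turn binary WebSocket messages into a gRPC-style request stream.

    Dicts are used instead of real protobuf messages so the sketch runs
    without any gRPC dependency.
    """
    # First request: configuration only, no audio.
    yield {"streaming_config": dict(config)}
    # Every following request: one chunk of raw audio bytes.
    for message in ws_messages:
        if message:  # skip empty frames
            yield {"audio_content": message}


if __name__ == "__main__":
    frames = [b"\x00\x01" * 4, b"", b"\x02\x03" * 4]
    for request in to_streaming_requests(frames):
        print(sorted(request))
```

In a real bridge, the iterable would be fed by a WebSocket server and the generator would be handed to the gRPC client stub; both of those pieces are left out here.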
IBM Bluemix does support WebSockets; check out this sample project to see it in use: https://github.com/triceam/IBMWatson-QA-Speech/blob/master/config/socket.js

Related

Is it possible to push media to Azure from web browser?

1) I'm researching technologies I can use for a browser application that streams video. It should capture video from a webcam and push it to a service where it is stored and can be watched later. One of the (possible?) options is Azure Media Services. But after a quick look at the documentation, it seems that this is not possible from a plain modern browser without plugins. Am I correct? If not, can you please give some links to GitHub projects or example code to look at?
2) Another possible option is Amazon Kinesis Video Streams (it looks like the best solution I have come up with so far), but maybe you can recommend some other cloud services?
Thanks!
Currently the short answer is no.
WebRTC is the right solution for broadcasting from a browser. It is the only live streaming protocol that will be "somewhat" widely supported in modern browsers such as the latest Chrome.
AMS does not yet support receiving WebRTC. We only support RTMP and Smooth ingest right now (Chunked MP4)
As far as I'm aware, Kinesis also expects you to send chunked MKV (like chunked MP4, but a less popular container format), which would need a browser plugin or JavaScript library to support. I don't see any Producer library from them in JavaScript.
WebRTC is your answer - but to catch that in the cloud, you may need to look at other solutions that run in an Azure Container. There are a bunch of 3rd party solutions out there for WebRTC.

Does cobalt RC 11 support Web Audio API

From my understanding, the Youtube technical requirement for 2017 and 2018 requires support of W3C Web Audio API.
Cobalt is currently not able to run the qual-e web audio web page: http://qual-e.appspot.com/webaudio.html. The video player is not rendered. Also, clicking the navigation buttons does not play any sound, and the following message appears on the console:
[0917/153645:ERROR:console.cc(62)] [console.error()] Error in loading sound:
Does cobalt currently support Web Audio API?
Does cobalt currently support rendering the webpage http://qual-e.appspot.com/webaudio.html?
Will such support rely on the starboard audio_sink API or some other API?
Cobalt supports the WebAudio API. It works well with the linux-x64x11 reference implementation while navigating the YouTube app (https://youtube.com/tv).
The WebAudio test case does not work because it depends on features that Cobalt lacks and that are not part of YouTube's requirements. We will rewrite the test case once we modify ShakaPlayer to work on Cobalt.

What is the difference between WebRTC, Jingle and XMPP?

What's the difference between WebRTC and Jingle? I am going to build an Android-based voice calling app using an XMPP ejabberd server. Which of these would be the best choice for voice calling on Android?
XMPP is a messaging protocol. Jingle is the subprotocol that XMPP uses for establishing voice-over-IP calls or transferring files. WebRTC is a JavaScript API (there is also a library implementing that API).
You can use Jingle as a signaling protocol to establish a peer-to-peer connection between two XMPP clients using the WebRTC API. This shows an example in JavaScript that works in Chrome and Firefox (and Microsoft Edge if you only want audio).
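To illustrate what "Jingle as a signaling protocol" looks like on the wire, here is a rough sketch that builds the skeleton of a Jingle `session-initiate` stanza for an audio call. The namespaces are the ones I know from XEP-0166 and XEP-0167; the JIDs, the stanza id, the content name `voice`, and the helper `build_session_initiate` are made up for illustration. A real stanza would also carry payload types and a transport element (for example ICE candidates produced by WebRTC), which are omitted here.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"         # XEP-0166 (Jingle)
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"   # XEP-0167 (Jingle RTP sessions)


def build_session_initiate(initiator: str, responder: str, sid: str) -> ET.Element:
    """Build a bare-bones <iq/> that offers an audio session via Jingle."""
    # The offer travels inside an ordinary XMPP <iq type="set"> stanza.
    iq = ET.Element(
        "iq",
        {"from": initiator, "to": responder, "type": "set", "id": "jingle1"},
    )
    jingle = ET.SubElement(
        iq,
        f"{{{JINGLE_NS}}}jingle",
        {"action": "session-initiate", "initiator": initiator, "sid": sid},
    )
    # One <content/> per media stream; here a single audio stream.
    content = ET.SubElement(
        jingle,
        f"{{{JINGLE_NS}}}content",
        {"creator": "initiator", "name": "voice"},
    )
    ET.SubElement(content, f"{{{RTP_NS}}}description", {"media": "audio"})
    return iq


if __name__ == "__main__":
    stanza = build_session_initiate(
        "romeo@example.org/phone", "juliet@example.org/phone", "a73sjjvkla37jfea"
    )
    print(ET.tostring(stanza, encoding="unicode"))
```

The XMPP server (ejabberd in the question) only routes this stanza; the media itself then flows peer to peer using whatever the WebRTC stack negotiated.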
The WebRTC code on code.google.com only contains the audio and video codecs and the RTP stack. The libjingle project contains the WebRTC API; it looks strange, but it's true. libjingle also includes the XMPP stack and the STUN and ICE implementations. If you want a complete VoIP solution, you have to build both.

Difference between Google Speech API and Web Speech API

I am working on web speech recognition.
I found that Google provides developers with an API called "Google Speech API V2", but I noticed there is a daily usage limit.
After that, I found that there is also a native Web Speech API that can do speech recognition. It only works in Google Chrome and Opera:
http://caniuse.com/#feat=speech-recognition
So:
1. What is the difference between the Google Speech API and the Web Speech API? Are they related?
2. The speech recognition result JSON is returned from Google. Does that mean the Google Speech API is more accurate than the Web Speech API?
Thank you.
The Web Speech API is a W3C-supported specification that allows browser vendors to supply a speech recognition engine of their choosing (be it local or cloud-based) that backs an API you can use directly from the browser without having to worry about API limits and the like. You could imagine that Apple might power this with Siri and Microsoft might power this with Cortana. Again, browser vendors could opt to use the built-in dictation software in the operating system, but that doesn't currently seem to be the trend. If you're trying to perform simple speech recognition in a browser (e.g. voice commands), this is likely the best path to take, especially as adoption grows.
The Google Speech API is a cloud-based solution that allows you to use Google's speech software outside of a browser. It also provides broader language support and can transcribe longer audio files. If you have a 20min audio recording you want to transcribe, this would be the path to take. As of the time of this writing, Google charges $0.006 for every 15s recorded after the first hour for this service.
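As a rough worked example of that pricing, the sketch below estimates a bill from total audio length. It uses only the numbers quoted above ($0.006 per 15 s, first hour free) and additionally assumes that partial 15-second increments are rounded up; the function name `estimate_cost` and its parameters are illustrative, and current Google pricing may differ.

```python
import math

PRICE_PER_INCREMENT = 0.006   # USD per 15 s, as quoted in the answer
INCREMENT_SECONDS = 15
FREE_SECONDS = 60 * 60        # the free first hour


def estimate_cost(total_seconds: int, free_seconds: int = FREE_SECONDS) -> float:
    """Estimate the charge in USD for a given amount of audio.

    Assumption: partial 15-second increments are billed as full ones.
    """
    billable = max(0, total_seconds - free_seconds)
    increments = math.ceil(billable / INCREMENT_SECONDS)
    return round(increments * PRICE_PER_INCREMENT, 3)


if __name__ == "__main__":
    # A 20-minute recording with the free hour already used up.
    print(estimate_cost(20 * 60, free_seconds=0))
    # The same recording while still inside the free hour.
    print(estimate_cost(20 * 60))
```

Under these assumptions, the 20-minute recording mentioned above comes to 80 increments, or about $0.48, once the free hour has been used, and costs nothing while it still fits inside the free hour.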
The Web API is a REST-based API with API key authentication, aimed especially at web pages that need a simple feature set.
The Google Speech API, by contrast, is basically a gRPC API with various authentication methods. Many features are available when you use gRPC, such as authentication, faster calls, and streaming.

Windows Media Services Web Streaming

Which web player can I use to play a live stream from Windows Media Services? Are there any cross-platform solutions (Windows, iPad/iPhone)? Should I convert the stream to FLV on the fly, or use some other trick?
You could try using the H.264/AAC video format to target iOS systems, as explained in this article: Apple HTTP Live Streaming with IIS Media Services (this is more or less your only choice if you want to support iPhone, iPad, etc.). This format will also work on Windows Phone 7 devices.
For the remaining Windows-based systems you could use Silverlight as the streaming client, although you will need to use a different format based on Windows Media Video.