IBM Watson Speech to Text and webm - ibm-cloud

Currently the IBM Watson Speech to Text service supports only the Ogg compressed format. However, the new standard for the WebRTC platform is webm. As a result, we have to either use Firefox or send huge uncompressed wav files to Bluemix from the client browser. Is it possible to add support for webm?

The service added support for webm on April 10th, 2017. See the release notes. Additionally, here is a list of the audio formats supported by the service.
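As a minimal sketch, assuming an IAM API key and a regional endpoint URL (both placeholders below), a webm recording can now be posted to the service directly over HTTP; the key detail is the audio/webm content type:

```typescript
// Minimal sketch: POST a webm recording to Watson Speech to Text over HTTP.
// Endpoint URL and API key are placeholders; adjust to your service instance.
import { readFileSync } from "fs";

const apiKey = process.env.WATSON_STT_APIKEY ?? "";
const url =
  "https://api.us-south.speech-to-text.watson.cloud.ibm.com/v1/recognize";

async function transcribeWebm(path: string): Promise<void> {
  const audio = readFileSync(path);
  const res = await fetch(url, {
    method: "POST",
    headers: {
      // Declare the payload as webm so the service decodes it correctly.
      "Content-Type": "audio/webm",
      Authorization:
        "Basic " + Buffer.from(`apikey:${apiKey}`).toString("base64"),
    },
    body: audio,
  });
  console.log(JSON.stringify(await res.json(), null, 2));
}

transcribeWebm("recording.webm").catch(console.error);
```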

Related

Is it possible to upload a video file to IBM Cloud Functions / OpenWhisk function and encode it?

We are developing a video streaming platform, and we want to encode uploaded videos into H.264 format.
We decided to use IBM Cloud Functions / OpenWhisk to encode the video, but we have some doubts. Is it possible to upload a video file to IBM Cloud Functions / OpenWhisk and encode it? Is this supported, and how can it be done?
Yes, that should be possible.
I recommend checking out the "Dark Vision" app built on IBM Cloud Functions. You can upload videos, which are then split into frames, and the frames are processed with Visual Recognition. The source code for Dark Vision is available on GitHub.
In addition, you should go over the documented IBM Cloud Functions system limits to see whether they match your requirements.
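As a rough illustration of the shape such an action could take, here is a sketch of a transcoding action. It assumes a custom Docker runtime that bundles ffmpeg, and that the video travels through presigned object-storage URLs passed in as parameters; all of those are assumptions, not part of the original answer:

```typescript
// Sketch of an OpenWhisk action that transcodes a video with ffmpeg.
// Assumptions: the action runs in a custom Docker runtime that bundles
// ffmpeg, and the video travels through presigned object-storage URLs
// passed in as parameters (sourceUrl / targetUrl are hypothetical names).
import { execFileSync } from "child_process";
import { readFileSync, writeFileSync } from "fs";

interface Params {
  sourceUrl: string; // presigned GET URL for the uploaded video
  targetUrl: string; // presigned PUT URL for the encoded result
}

export async function main(params: Params): Promise<{ status: string }> {
  // Download the source video; keep files small, actions have time/memory limits.
  const src = await (await fetch(params.sourceUrl)).arrayBuffer();
  writeFileSync("/tmp/in.mp4", Buffer.from(src));

  // Transcode to H.264; a fast preset helps stay inside the action time limit.
  execFileSync("ffmpeg", [
    "-y", "-i", "/tmp/in.mp4",
    "-c:v", "libx264", "-preset", "veryfast",
    "/tmp/out.mp4",
  ]);

  // Upload the result through the presigned PUT URL.
  await fetch(params.targetUrl, {
    method: "PUT",
    body: readFileSync("/tmp/out.mp4"),
  });
  return { status: "encoded" };
}
```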

Difference between Google Speech API and Web Speech API

I am working on web speech recognition.
I found that Google provides an API called "Google Speech API v2" to developers, but I noticed there is a daily usage limit.
After that I found there is also a native Web Speech API that can do speech recognition, but it only works in Google Chrome and Opera:
http://caniuse.com/#feat=speech-recognition
So:
1. What is the difference between the Google Speech API and the Web Speech API? Are they related?
2. The speech recognition result JSON is returned from Google. Does that mean the Google Speech API will be more accurate than the Web Speech API?
Thank you.
The Web Speech API is a W3C-supported specification that allows browser vendors to supply a speech recognition engine of their choosing (be it local or cloud-based) that backs an API you can use directly from the browser, without having to worry about API limits and the like. You could imagine that Apple might power this with Siri and Microsoft with Cortana. Browser vendors could also opt to use the built-in dictation software in the operating system, but that doesn't seem to be the current trend. If you're trying to perform simple speech recognition in a browser (e.g. voice commands), this is likely the best path to take, especially as adoption grows.
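To make the distinction concrete, here is a minimal sketch of the browser-side Web Speech API; note that Chrome still exposes the constructor under a webkit prefix:

```typescript
// Minimal sketch of in-browser recognition with the Web Speech API.
// Chrome exposes the constructor under a webkit prefix, so fall back to it.
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionCtor();
recognition.lang = "en-US";
recognition.interimResults = false; // deliver only final results
recognition.maxAlternatives = 1;

recognition.onresult = (event: any) => {
  // Each result carries one or more alternatives; take the top transcript.
  console.log("Heard:", event.results[0][0].transcript);
};
recognition.onerror = (event: any) =>
  console.error("Recognition error:", event.error);

recognition.start(); // the browser prompts for microphone access
```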
The Google Speech API is a cloud-based solution that allows you to use Google's speech software outside of a browser. It also provides broader language support and can transcribe longer audio files. If you have a 20min audio recording you want to transcribe, this would be the path to take. As of the time of this writing, Google charges $0.006 for every 15s recorded after the first hour for this service.
The Web API is a REST-based API with API-key authentication, aimed especially at web pages that need a simple feature set.
The Google Speech API, on the other hand, is basically a gRPC API with various authentication methods. Many more features are available when you use gRPC, such as authentication, faster calls, and streaming.
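For illustration, here is a hedged sketch of calling the REST flavor directly, using the request shape of the Cloud Speech v1 speech:recognize method; the API key and the input file are placeholders:

```typescript
// Hedged sketch: calling Google's REST speech endpoint directly, using the
// request shape of the Cloud Speech v1 speech:recognize method. The API key
// and the input file are placeholders.
import { readFileSync } from "fs";

const apiKey = process.env.GOOGLE_API_KEY ?? "";

async function recognize(path: string): Promise<void> {
  const content = readFileSync(path).toString("base64"); // base64-encode the audio
  const res = await fetch(
    `https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        config: {
          encoding: "LINEAR16",
          sampleRateHertz: 16000,
          languageCode: "en-US",
        },
        audio: { content },
      }),
    }
  );
  console.log(await res.json());
}

recognize("clip.wav").catch(console.error);
```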

Google Speech API v2 with Sockets

Does Google Speech API v2 support audio streaming via WebSockets?
I found a way to send a POST request with audio. However, it would be great if I could capture audio and send it over a socket in real time.
Note: I use the Firefox browser. I know that Google Chrome supports voice recognition out of the box, but I'm interested in Firefox and other browsers.
V2 of the API currently does not support WebSockets. The streaming API uses gRPC, which would need a translation layer to work with WebSockets.
https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1beta1#google.cloud.speech.v1beta1.Speech.StreamingRecognize
IBM Bluemix does support WebSockets; check out this sample project to see them in use: https://github.com/triceam/IBMWatson-QA-Speech/blob/master/config/socket.js
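As a sketch of what that looks like in practice, here is a minimal browser-side connection to the Watson Speech to Text WebSocket interface; the endpoint host and token handling are assumptions, so check the current service docs for your region and auth scheme:

```typescript
// Sketch: streaming audio to Watson Speech to Text over its WebSocket
// interface from the browser. The endpoint host and token handling are
// assumptions; check the service docs for your region and auth scheme.
const token = "<access token>";
const ws = new WebSocket(
  `wss://api.us-south.speech-to-text.watson.cloud.ibm.com/v1/recognize?access_token=${token}`
);

ws.onopen = () => {
  // Announce the audio format, then stream raw chunks as binary frames.
  ws.send(JSON.stringify({ action: "start", "content-type": "audio/webm" }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data as string);
  if (msg.results) console.log(msg.results[0].alternatives[0].transcript);
};

// Elsewhere, feed chunks from a MediaRecorder as they arrive:
//   recorder.ondataavailable = async (e) => ws.send(await e.data.arrayBuffer());
// and when the user stops recording:
//   ws.send(JSON.stringify({ action: "stop" }));
```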

Encoding an audio file (mp3, mp4, m4a, ogg) for Smooth Streaming with Windows Azure Media Services

I want to encode an audio file (mp3, mp4, m4a, ogg) for streaming, and I want to play the encoded file smoothly using an HTML5 player.
Here is what I am doing now: I upload a file and encode it on Windows Azure Media Services using the preset "AAC Good Quality Audio". It encodes the file in the .mp4 format, and then I create a SAS locator to play the file. This works well, but the problem is that the user can also download the file, which I don't want to allow.
If I create an OnDemandOrigin locator for the same encoded asset, it gives me a 404 error, meaning the file cannot be played.
Below are the steps that I have used to upload the file to Azure Media Services:
1. Create an empty asset.
2. Upload the file into the asset.
3. Create a new job with a task to encode the audio file.
I have successfully encoded the file, but when I generate the origin URL and browse to it, I get a 404 error.
My queries:
1. Is the "AAC Good Quality Audio" preset the right one for my task?
2. How can I prevent the user from downloading the file if I use a SAS locator?
3. Is it possible to play the encoded file using an origin locator?
4. Can I encode audio files for Smooth Streaming? If so, which player should I use to play the encoded file on all browsers, iOS devices, and Android devices?
If you need further details, please feel free to ask.
Thanks
If your users are able to listen to the audio you're publishing, they will also be able to download the file. You cannot prevent this; at best you can make it difficult, but not impossible. More to the point, Media Services in its current incarnation has no way for you to do authorization of any kind, so the only tool you've got is time-bombed SAS locators.
The typical solution for this problem is to use DRM. Media Services supports PlayReady encryption, but you need to either have a PlayReady server or purchase it as a service (there is currently a service in the Azure Marketplace that provides PlayReady for a monthly price).
See the following article on how to protect assets with Microsoft PlayReady technology.
Origin locators are something you would use to publish a Smooth Streaming or HLS asset. They are not useful for regular media files, since an origin locator is internally something equivalent to an IIS Media Services endpoint. For regular media files, you can just as well host them in Blob Storage and refer to them via a SAS locator, as in the sketch below.
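For illustration, here is a sketch of minting such a time-bombed, read-only SAS URL for a blob with the modern @azure/storage-blob SDK (which postdates this answer); the account, container, and blob names are placeholders:

```typescript
// Sketch: minting a time-bombed, read-only SAS URL for a blob with the
// modern @azure/storage-blob SDK (which postdates this answer). Account,
// container, and blob names are placeholders.
import {
  BlobSASPermissions,
  generateBlobSASQueryParameters,
  StorageSharedKeyCredential,
} from "@azure/storage-blob";

const credential = new StorageSharedKeyCredential("myaccount", "<account key>");

const sas = generateBlobSASQueryParameters(
  {
    containerName: "encoded-audio",
    blobName: "track.mp4",
    permissions: BlobSASPermissions.parse("r"), // read-only
    startsOn: new Date(),
    expiresOn: new Date(Date.now() + 60 * 60 * 1000), // expires in one hour
  },
  credential
).toString();

console.log(
  `https://myaccount.blob.core.windows.net/encoded-audio/track.mp4?${sas}`
);
```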
There is currently no single format that will play across all devices and operating systems. You can get Smooth Streaming to work on most Windows and Mac computers (possibly Linux, too), either with Silverlight or with the Smooth Streaming Plugin for the Flash-based OSMF. For iOS devices you will need to encode to HLS and use the HTML5 video tag. Microsoft Media Platform will support MPEG-DASH, a recently ratified ISO/IEC standard for dynamic adaptive streaming over HTTP. More details on how to use the DASH preview feature can be found here.
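As a small sketch of the iOS path: Safari and iOS play HLS natively through the HTML5 video element; the manifest URL below is a placeholder:

```typescript
// Sketch of the iOS path: Safari/iOS play HLS natively through the HTML5
// video element. The manifest URL is a placeholder.
const video = document.createElement("video");

if (video.canPlayType("application/vnd.apple.mpegurl")) {
  // Native HLS support (Safari, iOS): point the element at the .m3u8 manifest.
  video.src = "https://example.com/stream/audio.ism/manifest(format=m3u8-aapl)";
  video.controls = true;
  document.body.appendChild(video);
  video.play();
} else {
  console.warn("No native HLS support; fall back to a Silverlight/Flash player.");
}
```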
If you want Smooth Streaming for audio only, it looks like you will have to create a video asset with an empty video stream, although there is a UserVoice request to add audio-only support in the future.

Windows Media Services Web Streaming

Which web player can I use to get a live stream from Windows Media Services? Are there any cross-platform solutions (Windows, iPad/iPhone)? Should I do a live conversion to FLV or use some other trick?
You could try using the H.264/AAC video format for targeting iOS systems, as explained in this article: Apple HTTP Live Streaming with IIS Media Services (this is more or less your only choice if you want to support iPhone/iPad, etc.). This format is also valid for Windows Phone 7 devices.
For the rest of the Windows-based systems you could use Silverlight as the streaming client, although you will need to use a different format based on Windows Media Video.