Translating a large file with small API rate limits on Google Speech API - google-api-nodejs-client

I have an audio file that is 24 hours long and 100 MB in size that I want to transcribe/translate using Node.js and the Google Speech API. The Google Speech API will not process more than one minute of audio or 10 MB of data per request.
How can I get this much data transcribed, please?
My repo:
https://github.com/nguyentienviet123456/api-google-speechtotext-nodejs
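One common workaround (a sketch, not taken from the thread) is to split the long recording into chunks that stay under the per-request limits, transcribe each chunk, and concatenate the results. The helper below is hypothetical and only computes the time ranges; the actual audio slicing would be done with a tool such as ffmpeg:

```javascript
// Sketch: compute chunk boundaries so each piece stays under the
// Google Speech API's ~1 minute per-request limit. A separate step
// (e.g. ffmpeg) would cut the audio file at these offsets.
function splitIntoChunks(totalSeconds, chunkSeconds) {
  const chunks = [];
  for (let start = 0; start < totalSeconds; start += chunkSeconds) {
    chunks.push({ start, end: Math.min(start + chunkSeconds, totalSeconds) });
  }
  return chunks;
}

// 24 hours of audio in 55-second chunks (staying under the 1-minute cap):
const chunks = splitIntoChunks(24 * 60 * 60, 55);
console.log(chunks.length); // 1571 requests needed
console.log(chunks[0]);     // { start: 0, end: 55 }
```

Note also that Google's asynchronous `longRunningRecognize` method, with the audio hosted in Google Cloud Storage rather than sent inline, accepts much longer files than the synchronous endpoint; whether it covers a full 24 hours depends on the current quotas.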

Related

Does using saved Google Text-to-Speech audio files violate Google Cloud Usage Terms?

My app has a list of fixed paragraphs that need to be converted into speech. I plan to use Google's Text-to-Speech API to convert them to speech and then download the audio files, so that I don't need to constantly call the API, considering that the paragraphs, once again, do not change.
Does this violate the Google Cloud Terms of Service restrictions?
Good news: it seems that caching synthesized audio files to avoid re-synthesis and save costs is allowed with Google Text-to-Speech, as suggested by one of their documented use cases.

Google Cloud Speech API and metadata industry naics code of audio

When using the Google Cloud Speech API, does adding metadata, e.g. the industry NAICS code of the audio (see https://www.naics.com/search), influence the speech recognition? That is, would adding the NAICS code improve recognition in line with the indicated vertical?
No, they do not have this feature.

Is it possible to upload a video file to IBM Cloud Functions / OpenWhisk function and encode it?

We are developing a video streaming platform, and we want to encode uploaded videos into H.264 format.
We decided to use IBM Cloud Functions / OpenWhisk for the encoding, but we have some doubts. Is it possible to upload a video file to IBM Cloud Functions / OpenWhisk and encode it? Is it supported, and how can it be done?
Yes, that should be possible.
I recommend checking out the "Dark Vision" app built on IBM Cloud Functions. You can upload videos, which are then split into frames, and the frames are processed with Visual Recognition. The source code for Dark Vision is available on GitHub.
In addition, you should go over the documented IBM Cloud Functions system limits to see if they match your requirements.
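A minimal sketch of what such an action might look like. The parameter names are hypothetical, and this version only builds the ffmpeg command line; actually running ffmpeg inside an action would require bundling the binary or using a Docker-based action, and large videos would be fetched from object storage rather than uploaded directly:

```javascript
// Sketch of an IBM Cloud Functions (OpenWhisk) action for encoding a
// video to H.264 with ffmpeg. Parameter names are hypothetical; a real
// action would download the input from object storage, spawn ffmpeg
// (e.g. via child_process.execFile), and upload the result back.
function buildFfmpegArgs(input, output) {
  return ['-i', input, '-c:v', 'libx264', '-preset', 'fast', output];
}

function main(params) {
  if (!params.input || !params.output) {
    return { error: 'input and output parameters are required' };
  }
  const args = buildFfmpegArgs(params.input, params.output);
  return { command: 'ffmpeg ' + args.join(' ') };
}

exports.main = main;
```

Keep the documented action limits (execution time, memory, payload size) in mind: they are the main constraint when deciding whether encoding belongs inside a function at all.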

Difference between Google Speech API and Web Speech API

I am working on web speech recognition.
I found that Google provides an API called "Google Speech API v2" to developers, but I noticed there is a daily usage limit.
After that, I found there is also a native Web Speech API that can implement speech recognition, but it only works in Google Chrome and Opera:
http://caniuse.com/#feat=speech-recognition
So:
1. What is the difference between the Google Speech API and the Web Speech API? Are they related?
2. The speech recognition result JSON is returned from Google. Does that mean the Google Speech API will be more accurate than the Web Speech API?
Thank you.
The Web Speech API is a W3C-supported specification that allows browser vendors to supply a speech recognition engine of their choosing (local or cloud-based) that backs an API you can use directly from the browser, without having to worry about API limits and the like. You could imagine that Apple might power this with Siri and Microsoft with Cortana. Browser vendors could also opt to use the operating system's built-in dictation software, but that doesn't seem to be the current trend. If you're trying to perform simple speech recognition in a browser (e.g. voice commands), this is likely the best path to take, especially as adoption grows.
The Google Speech API is a cloud-based solution that allows you to use Google's speech software outside of a browser. It also provides broader language support and can transcribe longer audio files. If you have a 20-minute audio recording you want to transcribe, this would be the path to take. As of the time of this writing, Google charges $0.006 for every 15 seconds recorded after the first hour of this service.
The Web API is a REST-based API with API-key authentication, intended especially for web pages that need a simple feature set.
The Google Speech API, by contrast, is basically a gRPC API with various authentication methods. Many more features are available when you use gRPC, such as richer authentication, faster calls, and streaming.
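To make the REST side of that comparison concrete, here is roughly what the body of a REST call to Google's recognition endpoint looks like. This is a sketch: the field names follow the public Cloud Speech-to-Text v1 REST API, and the endpoint and key in the comment are placeholders. The audio is sent inline as base64, which is one reason this path only suits short clips:

```javascript
// Sketch: build the JSON body for a REST call to Google's v1
// `speech:recognize` endpoint. The audio bytes are inlined as base64.
function buildRecognizeRequest(audioBuffer, languageCode) {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: languageCode,
    },
    audio: {
      content: audioBuffer.toString('base64'),
    },
  };
}

const body = buildRecognizeRequest(Buffer.from('fake-audio-bytes'), 'en-US');
// POST `body` as JSON to:
// https://speech.googleapis.com/v1/speech:recognize?key=YOUR_API_KEY
```

The gRPC client libraries (e.g. `@google-cloud/speech` for Node.js) wrap the same request shape but add streaming recognition and service-account authentication on top.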

Is it possible to convert live streaming audio to text in iOS

I have live streaming audio and I need to convert it to text. Is there any API or SDK available for creating an iOS app with this requirement?
In iOS 10, it is possible to convert speech into text using the Speech framework. You can follow this link.
But there are some limitations, which are as follows:
Apple limits recognition per device. The limit is not known, but you can contact Apple for more information.
Apple limits recognition per app.
If you routinely hit limits, make sure to contact Apple; they can probably resolve it.
Speech recognition uses a lot of power and data.
Speech recognition only lasts about a minute at a time.
You can also use OpenEars or the Google Cloud Speech API.