When using the Google Cloud Speech API, does adding metadata, e.g. the industry NAICS code of the audio (see https://www.naics.com/search), influence speech recognition? That is, would adding the NAICS code improve recognition in line with the indicated vertical?
No, they do not have this feature; the metadata does not influence the recognition results.
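If the goal is to nudge recognition toward a particular vertical's vocabulary, the mechanism the API does document is speechContexts phrase hints, not industry metadata. Below is a minimal sketch using the Node.js client; the file name and the phrase list are placeholders, not anything prescribed by the API:

```typescript
import {SpeechClient} from '@google-cloud/speech';
import {readFileSync} from 'fs';

// Sketch: bias recognition toward domain vocabulary with phrase hints.
async function transcribeWithHints(): Promise<void> {
  const client = new SpeechClient();
  const [response] = await client.recognize({
    audio: {
      // Placeholder file; a LINEAR16 WAV sampled at 16 kHz is assumed.
      content: readFileSync('call-sample.wav').toString('base64'),
    },
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
      // Hypothetical domain phrases; substitute terms from your vertical.
      speechContexts: [{phrases: ['prior authorization', 'copay', 'telehealth']}],
    },
  });
  for (const result of response.results ?? []) {
    console.log(result.alternatives?.[0]?.transcript);
  }
}

transcribeWithHints().catch(console.error);
```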
Related: Is there any way to get the actual recorded audio input from a Google Assistant or Amazon Alexa device to use in my own API backend?
This answer regarding the Android Speech Recognition API mentions that it's not really possible to get the audio recording.
While the platform provides a developer with the user transcription, it does not provide the underlying audio that generated the query.
Is there any company that provides APIs for the following services?
- Speech and audio analytics
- Automated speech recognition
- Multiple-speaker separation
- Emotion detection
- Overlapping speakers (detecting speakers who speak at the same time)
My project needs to detect the speakers in an audio recording, separate them, and also detect any collisions between speakers (overlapping, i.e. speaking together).
I currently use DeepAffect, but their support is poor, so I am searching for another company that deals with this problem.
Note: I have already checked the services listed below, and they are not useful for my goals.
- symbl.ai
- Cloud Speech-to-Text (Google Cloud)
- Azure Cognitive Services
- AI-Powered Speech Analytics for Amazon Connect
It's not clear which type of setup you expect or have.
Cloud service? On-premises? What sizing?
You can check Phonexia, a company that provides such a solution: https://www.phonexia.com/en/
Here is a list of the APIs and capabilities their solution may provide: https://download.phonexia.com/docs/spe/
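Whichever vendor you end up with, if its diarization output includes per-speaker time segments, you can detect overlapping speech yourself by intersecting those intervals as a post-processing step. A minimal sketch; the segment shape is hypothetical, not any particular vendor's schema:

```typescript
// Hypothetical diarization output: one time segment per speaker turn.
interface Segment {
  speaker: string;
  start: number; // seconds
  end: number;   // seconds
}

interface Overlap {
  speakers: [string, string];
  start: number;
  end: number;
}

// Sort by start time; any later segment that begins before the current one
// ends, and belongs to a different speaker, is an overlap (speaking together).
function findOverlaps(segments: Segment[]): Overlap[] {
  const sorted = [...segments].sort((a, b) => a.start - b.start);
  const overlaps: Overlap[] = [];
  for (let i = 0; i < sorted.length; i++) {
    for (let j = i + 1; j < sorted.length && sorted[j].start < sorted[i].end; j++) {
      if (sorted[j].speaker !== sorted[i].speaker) {
        overlaps.push({
          speakers: [sorted[i].speaker, sorted[j].speaker],
          start: sorted[j].start,
          end: Math.min(sorted[i].end, sorted[j].end),
        });
      }
    }
  }
  return overlaps;
}

// Example: speakers A and B talk over each other from 3.5 s to 4.0 s.
console.log(findOverlaps([
  {speaker: 'A', start: 0.0, end: 4.0},
  {speaker: 'B', start: 3.5, end: 7.2},
]));
```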
I am working on web speech recognition.
I found that Google provides an API called "Google Speech API V2" to developers, but I noticed there is a daily usage limit.
After that, I found that the native Web Speech API can also implement speech recognition, but it only works in Google Chrome and Opera:
http://caniuse.com/#feat=speech-recognition
So:
1. What is the difference between the Google Speech API and the Web Speech API? Do they have any relation?
2. The speech recognition result JSON is returned from Google. Does that mean the Google Speech API will be more accurate than the Web Speech API?
Thank you.
The Web Speech API is a W3C-supported specification that allows browser vendors to supply a speech recognition engine of their choosing (be it local or cloud-based) that backs an API you can use directly from the browser, without having to worry about API limits and the like. You could imagine that Apple might power this with Siri and Microsoft might power this with Cortana. Again, browser vendors could opt to use the built-in dictation software in the operating system, but that doesn't seem to be the current trend. If you're trying to perform simple speech recognition in a browser (e.g. voice commands), this is likely the best path to take, especially as adoption grows.
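For a sense of how little code the browser path takes, here is a minimal sketch of the recognition half of the Web Speech API; the #talk button is hypothetical, and Chrome still exposes the constructor under a webkit prefix:

```typescript
// Chrome/Opera expose the API under a webkit prefix, so fall back to it.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = 'en-US';
recognition.interimResults = false;

recognition.onresult = (event: any) => {
  // The first alternative of the first result is the top transcript.
  console.log('Heard:', event.results[0][0].transcript);
};
recognition.onerror = (event: any) => console.error(event.error);

// Recognition must usually be triggered by a user gesture (e.g. a click).
document.querySelector('#talk')?.addEventListener('click', () => recognition.start());
```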
The Google Speech API is a cloud-based solution that allows you to use Google's speech software outside of a browser. It also provides broader language support and can transcribe longer audio files. If you have a 20-minute audio recording you want to transcribe, this would be the path to take. As of the time of this writing, Google charges $0.006 for every 15 seconds recorded after the first hour of this service.
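As an illustration of that path, a minimal sketch using Google's Node.js client (@google-cloud/speech); the bucket path is a placeholder, and audio longer than about a minute has to be referenced from Cloud Storage rather than sent inline:

```typescript
import {SpeechClient} from '@google-cloud/speech';

// Sketch: transcribe a long recording via the asynchronous API.
async function transcribeLongAudio(): Promise<void> {
  const client = new SpeechClient();

  // Long audio must be read from Cloud Storage; the bucket is hypothetical.
  const [operation] = await client.longRunningRecognize({
    audio: {uri: 'gs://my-bucket/meeting-recording.flac'},
    config: {
      encoding: 'FLAC',
      sampleRateHertz: 44100,
      languageCode: 'en-US',
    },
  });

  // Wait for the server-side job to finish, then print the transcript.
  const [response] = await operation.promise();
  for (const result of response.results ?? []) {
    console.log(result.alternatives?.[0]?.transcript);
  }
}

transcribeLongAudio().catch(console.error);
```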
The Web API is a REST-based API with API-key authentication, intended especially for web pages that need a simple feature set.
The Google Speech API, meanwhile, is basically a gRPC API with various authentication methods. There are a lot of features available when you use gRPC, like authentication, faster calls, and streaming!
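To illustrate the streaming point, a short sketch using the same Node.js client, piping audio through the bidirectional gRPC stream; the raw PCM file is a placeholder standing in for a live source:

```typescript
import {SpeechClient} from '@google-cloud/speech';
import {createReadStream} from 'fs';

const client = new SpeechClient();

// Open a bidirectional gRPC stream; results arrive while audio is still being sent.
const recognizeStream = client
  .streamingRecognize({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
    interimResults: true, // emit partial hypotheses while speech is in flight
  })
  .on('error', console.error)
  .on('data', (data: any) => {
    console.log('Transcript:', data.results[0]?.alternatives[0]?.transcript);
  });

// Placeholder source; a live app would pipe microphone audio instead.
createReadStream('live-audio.raw').pipe(recognizeStream);
```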
I have live streaming audio and I need to convert it to text. Is there any API or SDK available to create an iOS app for this requirement?
In iOS 10, it is possible to convert speech to text using the Speech framework. You can follow this link.
But there are some limitations, which are as follows:
- Apple limits recognition per device. The limit is not known, but you can contact Apple for more information.
- Apple limits recognition per app.
- If you routinely hit limits, make sure to contact Apple; they can probably resolve it.
- Speech recognition uses a lot of power and data.
- Speech recognition only lasts about a minute at a time.
You can also use OpenEars or the Google Cloud Speech API.
I want to ask a question about iPhone applications. Does Apple provide any API for developers to record a phone call and convert it to a text message? Thank you.
In short, there are no APIs for recording phone calls or converting speech to text. You would need to create a speech recognition engine yourself, and I suspect the iPhone hardware will not be powerful enough to handle that type of processing.
FYI, see these related questions:
- APIs for converting Voice/Audio data in to text
- API for Voice recognition in among group
- iPhone speech recognition API?