Speech Analytics: automated speech recognition, multiple speaker separation, emotions, speaker overlapping

Is there any company that provides APIs for these services?
Speech and audio analytics,
automated speech recognition,
multiple speaker separation,
emotions,
speaker overlapping (detecting speakers that speak at the same time).
My project needs to detect the speakers in an audio recording and separate them, and also to detect any collision (overlap) between speakers, i.e. when they speak at the same time.
Currently I use DeepAffect, but their support is poor, so I am searching for another company that handles this.
Note: I have already checked the services listed below, and they are not useful for my goals.
- symbl.ai
- Cloud Speech-to-Text - Speech Recognition | Google Cloud
- Azure Cognitive Services
- AI-Powered Speech Analytics for Amazon Connect

It's not clear which type of setup you expect/have.
Cloud service? On-prem? What sizing?
You can check the company Phonexia, which provides such a solution: https://www.phonexia.com/en/
Here is a list of the APIs and capabilities their solution provides: https://download.phonexia.com/docs/spe/

Related

Can you retrieve the voice recording from speech recognition platforms like Amazon Alexa or Google Assistant?

Is there any way to get the actual recorded audio input from a Google Assistant or Amazon Alexa device to use in my own API backend?
This answer regarding the Android Speech Recognition API mentions that it's not really possible to get the audio recording.
While the platform provides a developer with the user transcription, it does not provide the underlying audio that generated the query.

Does using saved Google Text-to-Speech audio files violate Google Cloud Usage Terms?

My app has a list of fixed paragraphs that need to be converted into speech. I plan to use Google's Text-to-Speech API to convert them into speech and then download the audio files, so that I don't need to constantly communicate with the API, considering that the paragraphs, once again, do not change.
Does this violate the Google Cloud Terms of Service restrictions?
Good news. It seems that caching synthesized audio files to avoid re-synthesis and to save costs is allowed with Google Text-to-Speech, as promoted by one of their use cases.
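For illustration, here is a minimal sketch of that synthesize-once-and-cache pattern using the official @google-cloud/text-to-speech Node.js client; the paragraph list and output file names are placeholders:

```typescript
import * as fs from 'fs/promises';
import { TextToSpeechClient } from '@google-cloud/text-to-speech';

// Hypothetical fixed paragraphs; in the real app these never change.
const paragraphs = ['Welcome to the app.', 'Here is how it works.'];

const client = new TextToSpeechClient();

async function cacheAll(): Promise<void> {
  for (const [i, text] of paragraphs.entries()) {
    // Synthesize each paragraph exactly once and save the MP3;
    // afterwards the app serves the cached file instead of calling the API.
    const [response] = await client.synthesizeSpeech({
      input: { text },
      voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
      audioConfig: { audioEncoding: 'MP3' },
    });
    const outFile = `paragraph-${i}.mp3`;
    await fs.writeFile(outFile, response.audioContent as Uint8Array);
    console.log(`Saved ${outFile}`);
  }
}

cacheAll().catch(console.error);
```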

Watson 'Speech to text' not recognizing microphone input properly

I am using the Unity SDK provided for IBM Watson services. I am trying the 'ExampleStreaming.cs' sample provided for speech-to-text recognition, and I test the app in the Unity editor.
This sample uses the microphone as audio input and returns results for the user's voice input. However, when I use the microphone as input, the transcribed results are far from correct. When I say "Create a black box", the word results are completely irrelevant to the input.
When I use pre-recorded voice clips, the output is perfect.
Does the service perform poorly with an Indian accent?
What is the reason for the poor recognition of microphone input?
The docs say:
"In general, the service is sensitive to background noise. For instance, engine noise, working devices, street noise, and talking can significantly reduce accuracy. In addition, the microphones that are typically installed on mobile devices and tablets are often inadequate. The service performs best when professional microphones are used to capture audio with better quality."
I use a Logitech headset mic as the input source.
Satish,
Try to "clean up" the audio as best you can by limiting background noise. Also be aware that you can use one of two different processing models: one for broadband and one for narrowband. Try them both, and see which is most appropriate for your input device.
In addition, you may find that the underlying speech model does not handle all of the domain-specific terms that you are looking for. In these cases you can customize and expand the speech model, as explained in the documentation on Using Custom Language Models (https://console.bluemix.net/docs/services/speech-to-text/custom.html#custom). While this is a bit more involved, it can often make a huge difference in accuracy and overall usability.
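As a side note, here is a minimal sketch of trying both processing models with the ibm-watson Node.js SDK rather than the Unity SDK; the API key, service URL, and audio file name are placeholders:

```typescript
import * as fs from 'fs';
import SpeechToTextV1 from 'ibm-watson/speech-to-text/v1';
import { IamAuthenticator } from 'ibm-watson/auth';

// Placeholder credentials; substitute your own service credentials.
const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: '<your-api-key>' }),
  serviceUrl: '<your-service-url>',
});

async function transcribe(file: string, model: string): Promise<void> {
  const { result } = await speechToText.recognize({
    audio: fs.createReadStream(file),
    contentType: 'audio/wav',
    model, // 'en-US_BroadbandModel' or 'en-US_NarrowbandModel'
  });
  for (const r of result.results ?? []) {
    console.log(`${model}:`, r.alternatives?.[0]?.transcript);
  }
}

// Run the same clip through both models and compare which is more accurate.
transcribe('clip.wav', 'en-US_BroadbandModel').catch(console.error);
transcribe('clip.wav', 'en-US_NarrowbandModel').catch(console.error);
```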

Do APIs like Vision, Speech & Video Intelligence require Google Cloud infrastructure?

To use APIs like Vision, Speech & Video Intelligence in any web application, is it mandatory to enrol in Google Cloud infrastructure?
"Enrol in Google Cloud infrastructure" - yes, you need to create a project in the Google Cloud console with your Gmail ID.
Google gives USD 300 of credit with your first project.
However, you don't need to start a Compute Engine instance to use the speech recognition services.
You will have to create buckets in Google Cloud Storage to upload long audio sequences in FLAC/PCM format.
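For example, here is a minimal sketch of transcribing a long FLAC file from a Cloud Storage bucket with the @google-cloud/speech Node.js client; the bucket path and sample rate are assumptions:

```typescript
import { SpeechClient } from '@google-cloud/speech';

const client = new SpeechClient();

async function transcribeLongAudio(): Promise<void> {
  const request = {
    config: {
      encoding: 'FLAC' as const,
      sampleRateHertz: 16000, // must match the uploaded file
      languageCode: 'en-US',
    },
    // Hypothetical bucket and object; upload your FLAC file there first.
    audio: { uri: 'gs://my-audio-bucket/long-recording.flac' },
  };

  // Long audio requires the asynchronous (long-running) API.
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();

  for (const result of response.results ?? []) {
    console.log(result.alternatives?.[0]?.transcript);
  }
}

transcribeLongAudio().catch(console.error);
```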

Difference between Google Speech API and Web Speech API

I am working on web speech recognition.
I found that Google provides an API called the "Google Speech API V2" to developers, but I noticed there is a daily limit on its use.
After that, I found there is also a native Web Speech API that can implement speech recognition, and it only works in Google Chrome and Opera:
http://caniuse.com/#feat=speech-recognition
So:
1. What is the difference between the Google Speech API and the Web Speech API? Are they related?
2. The speech recognition result JSON is returned from Google. Does that mean the Google Speech API is more accurate than the Web Speech API?
Thank you.
The Web Speech API is a W3C-supported specification that allows browser vendors to supply a speech recognition engine of their choosing (be it local or cloud-based) that backs an API you can use directly from the browser, without having to worry about API limits and the like. You could imagine that Apple might power this with Siri and Microsoft might power this with Cortana. Again, browser vendors could opt to use the built-in dictation software in the operating system, but that doesn't seem to currently be the trend. If you're trying to perform simple speech recognition in a browser (e.g. voice commands), this is likely the best path to take, especially as adoption grows.
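As a quick illustration, here is a minimal in-browser sketch of the Web Speech API; Chrome currently exposes the engine under a webkit prefix, so the code feature-detects both names:

```typescript
// Feature-detect the standard and the prefixed constructor.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

if (SpeechRecognitionImpl) {
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = 'en-US';
  recognition.interimResults = false;

  recognition.onresult = (event: any) => {
    // The browser talks to whatever engine the vendor chose
    // and hands your page back the transcript.
    const transcript = event.results[0][0].transcript;
    console.log('Heard:', transcript);
  };
  recognition.onerror = (event: any) => console.error(event.error);

  recognition.start(); // prompts the user for microphone access
} else {
  console.warn('Speech recognition is not supported in this browser.');
}
```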
The Google Speech API is a cloud-based solution that allows you to use Google's speech software outside of a browser. It also provides broader language support and can transcribe longer audio files. If you have a 20min audio recording you want to transcribe, this would be the path to take. As of the time of this writing, Google charges $0.006 for every 15s recorded after the first hour for this service.
The Web API is a REST-based API with API-key authentication, especially for web pages that need a simple feature set.
The Google Speech API, meanwhile, is basically a gRPC API with various authentication methods. Many features become available when you use gRPC, such as authentication, faster calls, and streaming.
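To make the streaming point concrete, here is a minimal sketch of the gRPC streaming path using the @google-cloud/speech Node.js client; the raw LINEAR16 file is a stand-in for any 16 kHz audio source, such as a live microphone stream:

```typescript
import * as fs from 'fs';
import { SpeechClient } from '@google-cloud/speech';

const client = new SpeechClient();

// Open a bidirectional gRPC stream: audio chunks go up, and interim
// and final transcripts come back while audio is still being sent.
const recognizeStream = client
  .streamingRecognize({
    config: {
      encoding: 'LINEAR16' as const,
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
    interimResults: true, // receive partial results as you speak
  })
  .on('data', (data) => {
    const transcript = data.results?.[0]?.alternatives?.[0]?.transcript;
    if (transcript) console.log('Transcript:', transcript);
  })
  .on('error', console.error);

// Pipe audio into the stream; a microphone source works the same way.
fs.createReadStream('audio.raw').pipe(recognizeStream);
```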