Bluemix, Watson, speech to text - tone analysis - ibm-cloud

I would like to know whether Bluemix, and potentially a Watson capability, could do the following: if multiple people are having a conversation via one or many microphones as a streamed audio source, could it also identify each person's tone / spectrum, i.e. which of them is producing the sound? Thanks, Markku

There's no Watson API to work on an audio stream at the moment.
As you pointed out, you could use Speech to Text to get a transcript of the conversation and potentially Tone Analyzer to get a sentiment analysis, but this won't be enough to determine who the speaker is.
If you want to know more about how those two services work, please check their pages on the Watson Developer Cloud.

Here is an example application and the code for combining STT and Tone Analyzer:
Application
https://realtime-tone.mybluemix.net/
Get the code here to use for your own applications:
https://github.com/IBM-Bluemix/real-time-tone-analysis
Julia
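
If it helps to see the two services wired together outside the sample app, here is a minimal sketch, assuming the Watson Java SDK (3.x-style calls, where requests are built with options objects and run with execute()); the credentials and the audio file name are placeholders:

import java.io.File;
import com.ibm.watson.developer_cloud.http.HttpMediaType;
import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;
import com.ibm.watson.developer_cloud.tone_analyzer.v3.ToneAnalyzer;
import com.ibm.watson.developer_cloud.tone_analyzer.v3.model.ToneAnalysis;

public class TranscriptToneSketch {
    public static void main(String[] args) {
        // Speech to Text: turn the recorded conversation into a transcript
        SpeechToText stt = new SpeechToText();
        stt.setUsernameAndPassword("stt-username", "stt-password"); // placeholder credentials

        RecognizeOptions options = new RecognizeOptions.Builder()
                .contentType(HttpMediaType.AUDIO_WAV)
                .build();
        SpeechResults results = stt.recognize(new File("conversation.wav"), options).execute();

        // Concatenate the best alternative of each result into one block of text
        StringBuilder text = new StringBuilder();
        results.getResults().forEach(r ->
                text.append(r.getAlternatives().get(0).getTranscript()).append(' '));

        // Tone Analyzer: analyze the tone of the transcript as a whole
        ToneAnalyzer toneAnalyzer = new ToneAnalyzer(ToneAnalyzer.VERSION_DATE_2016_05_19);
        toneAnalyzer.setUsernameAndPassword("ta-username", "ta-password");
        ToneAnalysis tone = toneAnalyzer.getTone(text.toString(), null).execute();
        System.out.println(tone);
    }
}

As the answer above notes, neither service tells you which speaker produced which part of the audio; the tone analysis applies to the transcript as a whole.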

Related

Realtime metadata/captioning for live-streamed audio

How might I add a track of accurately aligned, real-time "additional" data to live-streamed audio? I'm primarily interested in the browser here, but ideally the solution would work on any platform.
The idea is, if I have a live recording from my computer being sent into Icecast via something like DarkIce, I want a listener (who could join the stream at any time) to be able to place some kind of annotation over a few of the samples and send only the annotation back (for example, using a regular HTTP request). However, this needs a mechanism to align the annotation with the dumped streamed audio on the server side, and in a live stream, AFAIK the user can't actually get the timestamp within the "whole" stream, only relative to when they joined. But if there were some kind of simultaneously aligned metadata, then perhaps this would be possible.
The problem is, most systems seem to assume you "pre-caption" or multiplex your data streams beforehand. However, this wouldn't make sense for something being recorded and live-streamed in real time. Google's examples seem to be mostly about their ability to do "live captioning", which is more about processing audio in real time and then adding slightly delayed captions using speech recognition. This isn't what I'm after. I've looked into the various ways data is put into OGG containers, as well as current captioning formats like WebVTT, and I am struggling to find examples of this.
I found maybe a hint here: https://github.com/w3c/webvtt/issues/320 and I've been recommended to look for examples by Apple and Google using WebVTT for something along these lines, but cannot find these demos. There's older tech as well (Kate, CMML, Annodex, etc.), but none of these are in use any more, having been completely replaced by WebVTT. Perhaps I could achieve something like this with WebRTC, but I'm not sure it gives any guarantees on alignment, and it's a slightly different technology stack from the one I am looking at in this scenario.
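
One way to make the alignment idea described above concrete: if the server records the wall-clock instant at which it starts dumping the stream, an annotation stamped with wall-clock time can be mapped to an offset (and hence a sample position) in the dump, regardless of when the listener joined. A minimal sketch, with hypothetical names, ignoring encoder and transport latency:

import java.time.Duration;
import java.time.Instant;

public class AnnotationAligner {
    private final Instant dumpStart;   // recorded when the server-side dump began
    private final int sampleRate;      // e.g. 44100 Hz

    public AnnotationAligner(Instant dumpStart, int sampleRate) {
        this.dumpStart = dumpStart;
        this.sampleRate = sampleRate;
    }

    // Map an annotation's wall-clock timestamp to a sample offset in the dumped audio
    public long toSampleOffset(Instant annotationTime) {
        long millis = Duration.between(dumpStart, annotationTime).toMillis();
        return millis * sampleRate / 1000L;
    }

    public static void main(String[] args) {
        AnnotationAligner aligner =
                new AnnotationAligner(Instant.parse("2024-01-01T12:00:00Z"), 44100);
        // A listener who joined late annotates something at 12:03:05.250 UTC
        System.out.println(aligner.toSampleOffset(Instant.parse("2024-01-01T12:03:05.250Z")));
    }
}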

ThingSpeak MATLAB analysis

I am using the ThingSpeak platform with my Pi 3 for home automation. I have successfully sent data from my board to the channel and read it back. However, I am not able to properly understand the MATLAB analysis tutorial provided on the site.
https://in.mathworks.com/help/thingspeak/analyze-your-data.html
I am unable to understand what readChId is, why it should be given here, and what the job of the MATLAB analysis is.
If the job of the MATLAB analysis is to write my received data to a channel, and MATLAB visualization then displays it using readChId, what purpose does the readChId in the MATLAB analysis part serve?
ThingSpeak allows you to send data from your IoT device to a ThingSpeak channel, and then to apply various ThingSpeak "apps" to those channels: these can perform actions based on the channel data (like tweeting, or sending a message to some other web service), or they can perform analytics on, or create visualisations of, the channel data. These analytics and visualisation apps are implemented in MATLAB code, and run on ThingSpeak.
The tutorial you're looking at reads in data from one channel (ThingSpeak 12397, which receives weather data), does some analysis on it to calculate the dew point from the temperature and humidity, and then writes it out to another channel and visualizes it.
readChId in the tutorial is the ID of the channel you are reading from (12397), and writeChId is the ID of the channel you are writing to (677 as an example, but replace with your own channel number).
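
The tutorial itself uses MATLAB's thingSpeakRead and thingSpeakWrite functions inside the MATLAB Analysis app, but purely to illustrate what the two channel IDs are for, here is the same read-analyse-write flow sketched against the ThingSpeak REST API in Java (the write API key is a placeholder, and the JSON parsing of the read response is omitted, so the temperature and humidity values are hard-coded stand-ins):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ThingSpeakFlowSketch {
    // Mirrors readChId / writeChId in the tutorial: read from the public weather
    // channel, write to a channel you own (identified here by its write API key).
    static final int READ_CHANNEL_ID = 12397;
    static final String WRITE_API_KEY = "YOUR_WRITE_API_KEY"; // placeholder

    public static void main(String[] args) throws Exception {
        // 1. Read the most recent entry from the read channel
        String latestEntry = httpGet("https://api.thingspeak.com/channels/"
                + READ_CHANNEL_ID + "/feeds/last.json");
        System.out.println("Latest entry: " + latestEntry);

        // Real code would parse the temperature and humidity out of the JSON;
        // hard-coded stand-ins keep the sketch short.
        double tempF = 72.0;
        double relHumidity = 55.0;

        // 2. Analysis step: compute the dew point (Magnus formula, as in the tutorial)
        double tempC = (tempF - 32.0) * 5.0 / 9.0;
        double gamma = Math.log(relHumidity / 100.0) + (17.62 * tempC) / (243.5 + tempC);
        double dewPointC = 243.5 * gamma / (17.62 - gamma);

        // 3. Write the result to the write channel
        httpGet("https://api.thingspeak.com/update?api_key=" + WRITE_API_KEY
                + "&field1=" + dewPointC);
        System.out.println("Dew point written: " + dewPointC + " C");
    }

    static String httpGet(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) body.append(line);
            return body.toString();
        }
    }
}

In short, readChId tells the analysis code which channel to pull raw data from, and writeChId tells it which channel to push the computed result to.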

Terminology: "live-dvr" in MPEG-DASH streaming

I'm working with live MPEG-DASH streaming, and I would like to know whether there is a standard term for a particular piece of functionality.
It's the "live-dvr" functionality: a mix between a live stream and VOD features, i.e. a live stream whose seek bar in the player lets the viewer watch past stream time. This involves a series of infrastructure tweaks.
The term "live-dvr" for this setup is somewhat informal, and different parties each call it their own way: "live catch-up", "live-vod", "cached live"; some vendors name it after their product lines, and so on. I would like to know whether there's a standard term for this kind of setup, especially because interpreting the standard in order to understand the setup parameters for the manifests can be confusing or even misleading without proper terminology.
The MPEG-DASH standard only mentions a timeShiftBufferDepth, which specifies how long after the availability of a segment it is still available on the server.
From the spec:
@timeShiftBufferDepth specifies the duration of the time shifting buffer for this Representation that is guaranteed to be available for a Media Presentation with type 'dynamic'.
There is no mention of DVR at all in the spec, so time shift seems to be the term used by MPEG-DASH. HLS, for example, does not mention DVR or time shift at all.
DVR (Digital Video Recording, also known as nDVR, network DVR) is functionality that allows recording a live stream and playing it back from any moment of the recorded period. The live stream can keep running while the end user rewinds it to any particular moment in the past.
Typically media servers (like our Nimble Streamer) also provide time shift and time-range selection - see our links for details.

(Bluemix) Conversion of audio file formats

I've created an Android application and I've connected different Watson services, available on Bluemix, to it: Natural Language Classifier, Visual Recognition and Speech to Text.
1) The first and the second work well; I have a little problem with the third one, regarding the format of the audio. The app should record 30 seconds of audio, save it to memory and send it to the service to obtain the corresponding text.
I've used an instance of the MediaRecorder class to record the file. It works, but the available output formats are AAC_ADTS, AMR_WB, AMR_NB, MPEG_4, THREE_GPP, RAW_AMR and WEBM.
The service, on the other hand, accepts these input formats: FLAC, WAV, PCM.
What is the best way to convert the audio file from the first set of formats to the second? Is there a simple method to do that? For example, from THREE_GPP or MPEG_4 to WAV or PCM.
I've googled for information and ideas, but I've only found a few lengthy approaches that I don't fully understand.
I'm looking for a fast method, because I want to keep the latency of the conversion and of the processing by the service as short as possible.
Is there an available library that does this? Or a simple code snippet?
2) One last thing:
SpeechResults transcript = service.recognize(audio, HttpMediaType.AUDIO_WAV);
System.out.println(transcript);
"transcript" is a json response. Is there a method to directly extract only the text, or should I parse the json?
Any suggestion will be appreciated!
Thanks!
To convert the audio recordings to different formats/encodings you could:
- find an audio-encoder library to include in your app that supports the required formats, but it could be very heavy to run on a mobile device (if you find the right lib)
- develop an external web application that receives your recording, encodes it and returns it as a file or a stream
- develop a simple web application that works like a live proxy: it receives the recorded file, converts it on the fly and sends it on to Watson
Both the 2nd and the 3rd option expect you to use an encoding tool like ffmpeg.
The 3rd one is the lighter one to develop, although a little more complex, and it could save you 2 HTTP requests from your Android device (a rough sketch of this proxy approach follows below).
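
For illustration only, here is a minimal sketch of the conversion-plus-recognition step behind option 3, assuming ffmpeg is available on the proxy host and mirroring the recognize() call from the question above (i.e. the same SDK version, where recognize() returns SpeechResults directly); file names and credentials are placeholders, and a real proxy would receive the upload over HTTP rather than read it from disk:

import java.io.File;
import com.ibm.watson.developer_cloud.http.HttpMediaType;
import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;

public class TranscodeAndRecognize {
    public static void main(String[] args) throws Exception {
        File input = new File("recording.3gp"); // as uploaded by the Android app
        File wav = new File("recording.wav");

        // 1. Convert to 16 kHz mono WAV, one of the formats Speech to Text accepts
        Process ffmpeg = new ProcessBuilder(
                "ffmpeg", "-y", "-i", input.getAbsolutePath(),
                "-ar", "16000", "-ac", "1", wav.getAbsolutePath())
                .inheritIO()
                .start();
        if (ffmpeg.waitFor() != 0) {
            throw new RuntimeException("ffmpeg conversion failed");
        }

        // 2. Send the converted file to Speech to Text (same call as in the question)
        SpeechToText service = new SpeechToText();
        service.setUsernameAndPassword("stt-username", "stt-password"); // placeholders
        SpeechResults transcript = service.recognize(wav, HttpMediaType.AUDIO_WAV);
        System.out.println(transcript);
    }
}

Running ffmpeg server-side keeps the heavy encoding work off the phone, which is the main point of options 2 and 3.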

Live Stream midroll ad injection in Wowza Streaming Engine

I haven't found any way to automate inserting an ad spot into an existing live stream without stopping the streams and/or using a Flash client to interact with Wowza.
The idea is that these ads can be randomly chosen and inserted into the stream programmatically, in an automated way.
Can someone please point me in the right direction of how to properly change sources on the fly?
Thanks!
The following articles may be of interest to you:
https://www.wowza.com/docs/how-to-switch-streams-using-stream-class-streams
https://www.wowza.com/docs/how-to-control-stream-class-streams-dynamically-modulestreamcontrol
https://www.wowza.com/docs/how-to-use-ipublishingprovider-api-to-publish-server-side-live-streams
I've previously created a custom module for Wowza that allows you to create an output stream from a live input stream, then control the output and switch between the live input stream and other live or on demand streams.
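
This is not that custom module, but a rough sketch of the Stream-class approach described in the first two articles, assuming the Wowza server-side Java API; the stream name, the ad asset name and the methods that trigger the switch are hypothetical and would be driven by your own ad-selection and scheduling logic:

import com.wowza.wms.application.IApplicationInstance;
import com.wowza.wms.module.ModuleBase;
import com.wowza.wms.stream.publish.Stream;

public class ModuleAdInjection extends ModuleBase {
    private Stream outputStream;

    public void onAppStart(IApplicationInstance appInstance) {
        // Create a server-side Stream-class stream that players subscribe to
        outputStream = Stream.createInstance(appInstance, "outputStream");
        // Start on the live input (-2 = start at the live point, -1 = no fixed duration)
        outputStream.play("myLiveStream", -2, -1, true);
    }

    // Called by your own automation when an ad break should start
    public void startAdBreak(String adAssetName) {
        // Switch the output to a VOD ad asset, e.g. "sample-ad.mp4"
        outputStream.play("mp4:" + adAssetName, 0, -1, true);
    }

    // Called when the ad finishes, to return to the live input
    public void endAdBreak() {
        outputStream.play("myLiveStream", -2, -1, true);
    }
}

Players subscribe to "outputStream", so the source can be switched underneath them without stopping the published stream.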