(Bluemix) Conversion of audio file formats

I've created an Android application and connected different Watson services available on Bluemix to it: Natural Language Classifier, Visual Recognition and Speech to Text.
1) The first and the second work well; I have a little problem with the third one, concerning the audio format. The app should record 30 seconds of audio, save it to storage and send it to the service to obtain the corresponding text.
I've used an instance of the MediaRecorder class to record the file. It works, but the available output formats are AAC_ADTS, AMR_WB, AMR_NB, MPEG_4, THREE_GPP, RAW_AMR and WEBM.
The service, however, accepts these input formats: FLAC, WAV, PCM.
What is the best way to convert the audio file from the first set of formats to the second? Is there a simple method to do that? For example, from THREE_GPP or MPEG_4 to WAV or PCM.
I've googled for information and ideas, but I've only found a few lengthy approaches that I didn't fully understand.
I'm looking for a fast method, because I want to keep the latency of conversion and processing by the service as short as possible.
Is there an available library that does this? Or a simple code snippet?
2) One last thing:
    SpeechResults transcript = service.recognize(audio, HttpMediaType.AUDIO_WAV);
    System.out.println(transcript);
"transcript" is a JSON response. Is there a method to extract only the text directly, or do I have to parse the JSON?
Any suggestion will be appreciated!
Thanks!
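On question 2: the Watson Java SDK already deserializes that JSON response, so you can walk the result objects instead of parsing raw JSON yourself. A minimal sketch, assuming the watson-developer-cloud Java SDK that the snippet above uses (method names can differ between SDK versions):

    import java.io.File;
    import com.ibm.watson.developer_cloud.http.HttpMediaType;
    import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
    import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;
    import com.ibm.watson.developer_cloud.speech_to_text.v1.model.Transcript;

    // Collect the best alternative of each result into plain text.
    static String extractText(SpeechToText service, File audio) {
        SpeechResults results = service.recognize(audio, HttpMediaType.AUDIO_WAV);
        StringBuilder text = new StringBuilder();
        for (Transcript t : results.getResults()) {
            // alternatives are ordered by confidence; take the first
            text.append(t.getAlternatives().get(0).getTranscript());
        }
        return text.toString();
    }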

To convert the audio recordings to different formats/encodings you could:
- find an audio encoder lib to include in your app which supports the required formats, but it could be very heavy to run on a mobile device (if you find the right lib)
- develop an external web application to which you send your recording, which encodes it and returns it as a file or a stream
- develop a simple web application working like a live proxy that gets the recorded file, converts it on the fly and sends it to Watson
Both the 2nd and the 3rd option expect the use of an encoding tool like ffmpeg.
The 3rd one is a little bit more complex, but it could allow you to save 2 HTTP requests from your Android device.
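As a rough sketch of that server-side conversion step (the file names and the 16 kHz/mono settings are assumptions, not requirements), the web application could shell out to ffmpeg like this:

    import java.io.File;

    // Hypothetical server-side helper: transcode an uploaded 3GP/MP4 recording
    // to WAV with ffmpeg before forwarding it to the Speech to Text service.
    public class AudioConverter {
        public static File toWav(File input) throws Exception {
            File output = new File(input.getParentFile(), "converted.wav");
            Process p = new ProcessBuilder(
                    "ffmpeg", "-y",                // overwrite output if present
                    "-i", input.getAbsolutePath(), // e.g. record.3gp or record.mp4
                    "-ar", "16000",                // 16 kHz is plenty for speech
                    "-ac", "1",                    // mono
                    output.getAbsolutePath())
                .inheritIO()
                .start();
            if (p.waitFor() != 0) {
                throw new RuntimeException("ffmpeg exited with an error");
            }
            return output;
        }
    }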

Related

Flex mobile project for IOS, server side proxy

I am trying to write an iPhone app that loads a video from an inbuilt web server running on a camera (connected to the iPhone via WiFi).
I am using Flash Builder / a Flex mobile project - not particularly familiar with it, but finding it easier to understand than Xcode!
The files from the camera have the wrong file extension, so they will not play in the iOS video app. Can I set up a server side proxy in Flex mobile and use it to alter the file extension, then pass the link to the iOS video app?
If so, any help anybody could give me (examples etc.) would be gratefully received; I have been trying to get around this problem for a couple of weeks.
Cheers
Toby
I can explain, conceptually, what a server side proxy would do in this case. Let's say you are retrieving a URL, like this:
http://myserver.com/somethingSomething/DarkSide/
to retrieve a video stream from the server. You say it won't play because there is no file extension, so you have to, in essence, use a different URL with the extension. Set up 'search engine friendly' URLs on the server, and do something like this:
http://myserver.com/myProxy.cfm/streamURL/somethingSomething%5CDarkSide/Name/myProxyVid.mp4
Here is some information on how to deal with search engine friendly URLs in ColdFusion, and here is some information on how to deal with them in PHP. I'm sure other technologies will come up in a Google search.
In the URL above, this is what you have:
http://myserver.com/: This is your server
myProxy.cfm: This is your server side file; that is a proxy
streamURL/somethingSomething%5CDarkSide/Name/myProxyVid.mp4: This is the query string. It consists of two name/value pairs. The first is the streamURL: the URL you want to retrieve with your proxy. The second is just filler, but as long as it ends with the file extension .mp4, the URL should be seen as an 'mp4 file'.
The code behind your myProxy.cfm should be something like this, in pseudo-code:
Parse the URL query string.
Retrieve the stream.
Set the mimeType on the return value.
Return the stream data.
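To make that concrete, here is a minimal sketch of such a proxy as a Java servlet (the original answer used ColdFusion; to keep it short this version takes the stream URL as a plain query parameter rather than a search-engine-friendly path, and every name here is hypothetical):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // e.g. GET /myProxy?streamURL=http://camera.local/somethingSomething/DarkSide/
    public class VideoProxyServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws java.io.IOException {
            String streamUrl = req.getParameter("streamURL"); // URL to retrieve
            resp.setContentType("video/mp4"); // the MIME type iOS expects

            try (InputStream in = new URL(streamUrl).openStream();
                 OutputStream out = resp.getOutputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n); // relay the stream bytes unchanged
                }
            }
        }
    }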
I used a similar approach on TheFlexShow.com to track the number of people who watch our screencast on-line vs downloading it first. I also used the same approach to keep track of impressions of advertiser's banner ads. For example, the browser can't tell that this is not a JPG image:
http://www.theflexshow.com/blog/mediaDisplay.cfm?mediaid=51
Based on this and one of your previous questions, I am not convinced this is the best solution, though. I am making a lot of assumptions here: I assume that the problem with playing the file really does relate to the extension and not to the file data, and I assume that you are not actually streaming video with an open connection on both client and server to send data back and forth.

How do I extract streamed "now playing" data embedded in an Icecast audio (radio) stream on Samsung Smart-TV

I am creating a Samsung TV app for a radio station and they provide the "Now Playing" info within the Icecast stream. Is it possible to (and how do I) extract this information?
Shoutcast supports "Icy-MetaData" - an additional field in the request header. When set, it's a request to the Shoutcast server to embed metadata about the stream at periodic intervals (once every "icy-metaint" bytes) in the encoded audio stream itself. The value of "icy-metaint" is decided by the Shoutcast server configuration and is sent to the client as part of the initial reply.
Check out this post on the Shoutcast Internet Radio Protocol for details on icy metadata and sample code in C.
A somewhat more technical discussion is also available at
http://forums.radiotoolbox.com/viewtopic.php?t=74
Yes, this is possible. The metadata is interleaved into the stream data at a specified interval. Basically, you read 8192 bytes (or whatever is specified by the Icy-MetaInt response header), and then you read the metadata block.
The first byte of that metadata block tells you the length of the metadata: multiply that byte's value by 16 to get the size in bytes. A length byte of 0 means there is no updated metadata.
Once you read the meta block, then you go back to reading stream data.
I have all of this in more detail in my answer here: https://stackoverflow.com/a/4914538/362536 While I know you're not writing PHP, the principle is identical no matter what language you use.
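To illustrate that loop, here is a minimal Java sketch (the stream URL is hypothetical, and real code should check that the icy-metaint header is actually present before parsing it):

    import java.io.DataInputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class IcyMetadataReader {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com:8000/stream"); // hypothetical
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Icy-MetaData", "1"); // ask for interleaved metadata
            int metaInt = Integer.parseInt(conn.getHeaderField("icy-metaint"));

            DataInputStream in = new DataInputStream(conn.getInputStream());
            byte[] audio = new byte[metaInt];
            while (true) {
                in.readFully(audio);             // metaInt bytes of raw audio data
                int metaLength = in.read() * 16; // length byte * 16 = metadata size
                if (metaLength > 0) {
                    byte[] meta = new byte[metaLength];
                    in.readFully(meta);
                    // looks like: StreamTitle='Artist - Title';StreamUrl='';
                    System.out.println(new String(meta, "UTF-8").trim());
                }
            }
        }
    }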
There is no option to get this metadata from the native player.
You could probably use the jQuery.stream plugin to fetch the metadata directly - but you need to set up Access-Control-Allow-Origin on your Icecast server - and I have no idea if it will work.
The best solution here would be to use this script:
http://code.google.com/p/icecast-now-playing-script/
You install this script on your web server, and the SmartTV application polls it via AJAX every so often while your stream is playing.
I just created a radio player for Icecast and Centova; it uses the Last.fm API to extract the song metadata. https://github.com/johndavedecano/Icecast-Centova-LastFM-API
If you are doing this for a radio station, then they can provide this data through the XSLT feature of Icecast. I put together some random old XSLT examples for offering stream metadata at some point.
The other option is to run Icecast 2.4.1, or to add the two files (xml2json.xsl, status-json.xsl) to an older version.
Note that only Icecast 2.4.1 or newer supports adding CORS/ACAO headers that might be necessary to access data from a web app / web site.
If you are not directly cooperating with the radio station and can't ask them to do this, then disregard this answer. Someone else might find it useful though.

Text to Speech conversion in iPhone using Nuance SDK: Generate .wav file

I am converting text to speech using Nuance SDK and it works fine.
I want to mail the output to the user as a file, "voice.wav" for example.
Being new to this field, I'm not sure whether this text-to-speech process creates an output file.
I don't see an output file; does it exist?
Can I make it generate one?
Thanks in advance.
At this time, the SDKs/libraries don't expose access to the raw audio data. This is done in an effort to guarantee an optimal audio subsystem, as well as to simplify the process of speech-enabling apps.
Depending on the plan you're enrolled in, you may be able to use the HTTP service, which means you will have to construct your own audio layer. That said, this is your best bet for getting access to the audio data if you need it.

Streaming short sound files

I have a script that generates wave files, based on user input.
I want to be able to stream those wave files online (not necessarily as wave files; they can be converted on the fly to MP3 or whatever), preferably through an embedded Flash streamer, but an HTML5 version would be good too.
The files are generally small, around 5 seconds long, and I'd like to be able to stream multiple files in one session.
Does anyone know how I should go about implementing this?
With such short audio clips I would not bother with a 'real' streaming technology; just serve them up via HTTP as static files as quickly as the network connection will allow. A quick look at my iTunes library indicates that a 5s 128kbps 44kHz stereo file is between 120-250KB. Fairly small. If you are talking about 32kbps mono, then the sizes may be a mere 15-30KB.
Encoding on the fly may result in undesirable issues, like scaling problems (CPU load from all those encoding jobs, some of which will be duplicates), latency (setting up the encoding, plus the actual encoding), and not knowing the final file size in advance, which can cause problems. So setting up a caching system may make more sense.
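A sketch of that caching idea (the cache path and the use of the lame encoder are assumptions, just to show the shape of it): encode each clip on the first request, then serve the cached file from then on.

    import java.io.File;

    // Hypothetical cache-on-first-request encoder: each generated WAV is
    // encoded to MP3 once, and later requests reuse the cached result.
    public class ClipCache {
        private final File cacheDir = new File("/var/cache/clips"); // hypothetical

        public File getMp3(File wav) throws Exception {
            File mp3 = new File(cacheDir, wav.getName().replace(".wav", ".mp3"));
            if (!mp3.exists()) { // cache miss: pay the encoding cost only once
                Process p = new ProcessBuilder("lame", "-b", "64",
                        wav.getAbsolutePath(), mp3.getAbsolutePath())
                    .inheritIO().start();
                if (p.waitFor() != 0) throw new RuntimeException("encoding failed");
            }
            return mp3;
        }
    }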
I use wpaudioplayer to stream MP3s from my website (example). It was originally made as a WordPress plugin but can be used as standalone JavaScript.
I believe that it can play wave files as well as MP3s. If you do end up converting them before serving them, I would suggest caching the converted files so the conversion cost is only paid once.

How do I seamlessly concatenate MP3 streams?

I'm working on a streaming server that will be capable of broadcasting targeted ads. Basically, listeners hear the same music, but every, say, 30 minutes comes a block of ads, and every listener gets his/her own block. Implementing such a streaming server poses various problems, and this question is about one of them.
The server will work in a manner similar to Icecast, i.e. it will read the stream over the network from some stream generator and relay it to every listener. When it's time to broadcast ads, the server stops fetching the stream from the generator, reads the ads from files, inserts them into each listener's buffer, transmits them, and then resumes relaying the stream from the generator.
When the server switches from relaying the stream to broadcasting ads, it has to concatenate two MP3 streams (we broadcast in MP3). My concern is that simply appending one piece of data after another may produce audible artifacts. Can it be done seamlessly?
I've already figured out this:
- I can make the server be aware of MP3 frames to avoid sync errors.
- I'm thinking about appending MP3 frames from the ad file after MP3 frames from the stream.
- Since the ad is loaded from a properly encoded MP3 file, I circumvent the problem of the bit reservoir, because the first frame from the file can't use it.
But my concern is how the MDCT works. Listeners have no idea what my server will do, so their MP3 decoders may produce artifacts when unrelated MDCT data is placed back to back in the stream they download. Will zero-padding at the beginning of the ad file compensate for this?
Do you know any libraries/tools (open source if possible) that can seamlessly join two MP3 files without decompressing them?
Can you point me to any good resources describing the MP3 format? I've searched the Internet a lot and found plenty of information, but I still lack the overall picture.
Or maybe you know whether this would be easier with another codec like Ogg Vorbis or AAC?
PS. This question is not a duplicate of What is the best way to merge mp3 files?. mp3wrap and similar tools are not an option for me.
I believe MP3s can be merged by simply concatenating the files. In some quick testing (cat file1.mp3 file2.mp3 > merged.mp3; mplayer merged.mp3) it seemed to work as expected. Streaming from a web server will probably work just as well.
How are you going to handle switching the current input file? You could simply treat the advertisements as short tracks to play.
You should be able to concatenate mp3 files of both CBR and VBR formats.
MP3 files do not have a main header (disregarding ID3 and Xing). The audio data is stored as frames, where every frame includes its own header. The header contains the information necessary (bitrate, sample frequency, stereo, etc.) to decode the audio data in that frame.
This is one of the reasons why it is difficult to determine the duration of an MP3 file.
Another way of looking at it: if you concatenate a CBR MP3 file with a VBR file, the end result is the same as one long VBR file whose first section of audio is at a constant bitrate.
The issue is that some MP3 players are strict and expect a Xing header for a VBR MP3 file. That was never part of the MP3 specification, but it is now commonly assumed to be present.
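To make "every frame includes its own header" concrete, here is a minimal sketch of decoding an MPEG-1 Layer III frame header and computing the frame length; a real implementation also has to handle MPEG-2/2.5, other layers, free-format bitrates and ID3 tags:

    public class Mp3Frames {
        static final int[] BITRATES = {0, 32, 40, 48, 56, 64, 80, 96,
                                       112, 128, 160, 192, 224, 256, 320, 0}; // kbps
        static final int[] SAMPLE_RATES = {44100, 48000, 32000, 0}; // Hz

        // Returns the frame length in bytes, or -1 if data[off] does not
        // start a valid MPEG-1 Layer III frame header.
        static int frameLength(byte[] data, int off) {
            // 11 sync bits, version MPEG-1 (binary 11), Layer III (binary 01)
            if ((data[off] & 0xFF) != 0xFF || (data[off + 1] & 0xFE) != 0xFA) return -1;
            int bitrate = BITRATES[(data[off + 2] & 0xF0) >> 4] * 1000;
            int sampleRate = SAMPLE_RATES[(data[off + 2] & 0x0C) >> 2];
            int padding = (data[off + 2] & 0x02) >> 1;
            if (bitrate == 0 || sampleRate == 0) return -1; // free format / reserved
            // 1152 samples per frame / 8 bits per byte = the factor 144
            return 144 * bitrate / sampleRate + padding;
        }
    }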
If you're on Windows, the Microsoft DirectShow API may be the way to go. You should find that it is capable of doing things with audio and video, both statically and streaming, in a variety of formats (you only need the necessary codecs, and the interface is virtually the same for all).
That said, DirectShow is unfortunately designed in a horribly intricate way and has a steep learning curve, but the power it offers is unparalleled if you're going to be doing audio/video manipulation on Windows. There are, however, a great number of samples and tutorials on how to use it, so it may not be so painful in the end. Also, if you're using the .NET Framework, there is a managed wrapper by the name of DirectShow.NET. It's not going to be an easy task whatever you do, unless there's something out there that I'm not aware of. Good luck with it anyway!
I approached a very similar problem, and after asking the right questions at various sources came up with the following...
Any worthy decoder will skip "bad" data until it hits a valid frame header. This is what ID3v2 relies upon to inject additional information into MP3 data. At the server, I'd analyse the source MP3 files so that only valid MP3 frames are served. If you serve a few silent frames (about 7 should do it), the decoder should have time to settle before ramping up for the next load of (unrelated) MP3 data, avoiding the artefacts you (correctly) assume will appear when concatenating frames from different encoding sessions.
More problematic is the possible switching of MP3 attributes (1/2 channels, output sample rate, etc.) from one frame to the next. Some decoders get quite upset when confronted with such a stream, resulting in half-speed playback and the like. So you need to ensure that all your source material is encoded with the same output attributes, otherwise you may come unstuck.
You may have seen this already, but if not:
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=79&printer=t
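Putting the frame-header sketch above together with this answer's advice to serve only valid frames, frame-aware concatenation reduces to copying whole frames and resyncing past anything else; a rough sketch:

    import java.io.IOException;
    import java.io.OutputStream;

    public class Mp3Concat {
        // Copies whole valid frames from 'data' to 'out', skipping anything
        // that is not a recognizable frame (ID3 tags, junk between frames,
        // a truncated final frame). Uses the frameLength() sketch above.
        static void copyFrames(byte[] data, OutputStream out) throws IOException {
            int i = 0;
            while (i + 4 <= data.length) {
                int len = Mp3Frames.frameLength(data, i);
                if (len > 0 && i + len <= data.length) {
                    out.write(data, i, len); // a complete, valid frame
                    i += len;
                } else {
                    i++; // no frame header here; advance one byte and resync
                }
            }
        }
    }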
I don't see why you would want to concatenate the files. Why don't you use some sort of playlist system and just change which file you're sending? I would think this would allow more flexibility in the long run, and you wouldn't end up with large MP3 files.