I want to extract the song name from an Icecast streaming radio station. I am getting icy-genre, icy-name and so on, but not the song name. Can it be extracted from the stream?
From your question I presume you have already added Icy-Metadata: 1 to your request.
You'll have to read the "icy-metaint" response header, which tells you how many bytes to read between each metadata update in the stream.
The following is pseudocode:
byteinterval = int(response.getHeader("icy-metaint"))
stream = response.getBodyStream()
stream.read(byteinterval)                    # skip the audio data
metadata_len = byte(stream.read(1)) * 16     # the length byte counts 16-byte blocks
metadata = stream.read(metadata_len)         # may be padded with NUL bytes
The metadata will look something like this:
StreamTitle='Some Song Name Stream Client Sent';StreamUrl='http://someurl.com/';
Unfortunately there's no absolute standard for either encoding of the complete metadata buffer, or the contents of the StreamTitle.
Song name may or may not contain album name, artist name and complete song name or other fields.
The metadata buffer itself may or may not be UTF-8. It's up to the streaming client to decide what to inject. Most decent clients will use UTF-8 when forced to send non-ASCII data, but not all (some may send a native encoding like Windows-1252 or UTF-16).
If you want to keep getting metadata updates, you can keep consuming blocks of "byteinterval" bytes, each followed by a metadata block, or disconnect and poll the stream later.
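The pseudocode above could be fleshed out roughly as follows in Python. This is a sketch, assuming `stream` is any file-like object that yields the response body as bytes and `metaint` is the integer value of the icy-metaint header. The helper names (`read_exact`, `next_metadata`, `stream_title`) are made up for illustration, and the regular expression is a simplification that stops at the first `';` it finds.

```python
import io
import re


def read_exact(stream, n):
    """Read exactly n bytes, looping because read() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended before %d bytes were read" % n)
        buf += chunk
    return buf


def next_metadata(stream, metaint):
    """Skip one block of audio and return the raw metadata that follows it."""
    read_exact(stream, metaint)                # audio data, discarded here
    length = read_exact(stream, 1)[0] * 16     # length byte counts 16-byte blocks
    if length == 0:
        return b""                             # no metadata update this time
    return read_exact(stream, length).rstrip(b"\x00")  # drop NUL padding


def stream_title(metadata, encoding="utf-8"):
    """Pull StreamTitle out of the metadata; the encoding is only a guess."""
    text = metadata.decode(encoding, errors="replace")
    match = re.search(r"StreamTitle='(.*?)';", text)
    return match.group(1) if match else None
```

Calling `next_metadata` in a loop keeps the stream position aligned, so each call returns the next update (or an empty value when nothing changed).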
I am developing a small application to test the change stream functionality in MongoDB.
I have found that if one uses a client session, that information is included in the change stream output (change event)
For instance, here is the output when I insert a document:
{"txnNumber"=>1, "lsid"=>{"id"=><BSON::Binary:0x70310118878160 type=uuid data=0x05d30a0fa4db4f24...>, "uid"=><BSON::Binary:0x70310118878040 type=generic data=0x08e97261f57b1617...>}, "_id"=>{"_data"=>"8262D407C4000000022B022C0100296E5A100483BECD0AF46146E4A271EDAC0922356946645F6964006462D407C48C187B092534BD050004"}, "operationType"=>"insert", "clusterTime"=>#<BSON::Timestamp:0x00007fe4b351d4f8 #seconds=1658062788, #increment=2>, "fullDocument"=>{"_id"=>BSON::ObjectId('62d407c48c187b092534bd05'), "one"=>"one"}, "ns"=>{"db"=>"change_stream_testing", "coll"=>"testing"}, "documentKey"=>{"_id"=>BSON::ObjectId('62d407c48c187b092534bd05')}}
The "lsid" field contains information about the session from which the write originated. After taking a closer look at this I found that it contains base64-encoded data (from just doing a json.parse() on the id and uid fields):
ID IS
{"$binary":{"base64":"BdMKD6TbTySICrHNHE6GBA==","subType":"04"}}
UID IS
{"$binary":{"base64":"COlyYfV7FhdDV8hhDrSY7+10/NVCs/fLwkGrKMztex4=","subType":"00"}}
Now, the problem is that I can't decode that base64 string into something readable. Using an online decoder I get
У
¤ЫO$€
±НN†**
and **éraõ{CWÈa´˜ïítüÕB³÷ËÂA«(Ìí{
respectively - when using the "auto detect" feature or UTF-8 which MongoDB uses internally (according to a quick google search)
The reason I ask is that I have a use case where, in some cases, I would like to be able to identify where an event in the change stream originated, i.e. which client issued the write. The only way I've been able to more or less accomplish that, without using the mutateFields operator to change the documents themselves and add some kind of marker I could inspect in the change stream code (which I ideally don't want to do), is to use explicit client sessions. That at least lets me know that whatever wrote the document was using an explicit session. But I would like to go further and actually decipher this session information to get, if possible, some kind of unique identifier.
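The base64 payloads above are raw binary values rather than encoded text, which would explain why a text decoder shows noise. A minimal Python sketch, assuming subType "04" marks a UUID and subType "00" marks generic binary as in the BSON specification. The function names are made up for illustration, and what the uid bytes represent is not established here, so they are only shown as hex.

```python
import base64
import uuid


def decode_lsid_id(b64):
    """Interpret a subType 04 payload as a 16-byte UUID."""
    return uuid.UUID(bytes=base64.b64decode(b64))


def decode_uid(b64):
    """Show a subType 00 (generic) payload as a hex string."""
    return base64.b64decode(b64).hex()


session_id = decode_lsid_id("BdMKD6TbTySICrHNHE6GBA==")
uid_hex = decode_uid("COlyYfV7FhdDV8hhDrSY7+10/NVCs/fLwkGrKMztex4=")

print(session_id)   # 05d30a0f-a4db-4f24-880a-b1cd1c4e8604
print(len(uid_hex) // 2, "bytes:", uid_hex)
```

The UUID printed here starts with the same bytes as the `data=0x05d30a0fa4db4f24...` shown in the change event, so the session id can be compared as a UUID string.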
I'm currently using the PlayWS HTTP client, which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream and I can't use it anymore. Is there any way around this?
I think there are two different aspects related to the question.
You want to know the size of the server response in advance to prepare a buffer. Unfortunately there is no guaranteed way to do this. The HTTP 1.1 spec explicitly allows a transfer mode in which the server does not know the size of the response in advance: chunked transfer encoding. See also this quote from 3.3.1. Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance.
Section 3.3.3. Message Body Length specifies how the length of a message body is determined. Besides the aforementioned chunked transfer encoding, it also contains this rather unhelpful case:
Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.
This is kept for backward compatibility and its use is discouraged, but it is still allowed.
Still, in many real-world scenarios you can use the Content-Length header field that the server may return. However, there is a catch here as well: if gzip Content-Encoding is used, then Content-Length will contain the size of the compressed body.
To sum up: in the general case you can't get the size of the message body before you have fully received the server response, i.e. in terms of code, performed a blocking call on the response. You may try to use Content-Length, and it might or might not help in your specific case.
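The header check described above can be sketched like this, in Python rather than Scala for brevity. The function name `declared_body_length` and the plain-dict representation of the headers are assumptions for illustration; a real client library will expose headers through its own types.

```python
def declared_body_length(headers):
    """Return the body length the server declared, or None if it is unknown.

    None covers chunked transfer encoding, a missing Content-Length, and a
    malformed one. With Content-Encoding: gzip the number returned is the
    size of the compressed body, not of the decoded data.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    if "chunked" in normalized.get("transfer-encoding", "").lower():
        return None
    value = normalized.get("content-length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value)
```

A caller would treat `None` as "size unknown until the body has been read" and fall back to buffering or streaming.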
You already have a fully downloaded response (or you are OK with blocking on your StreamedResponse) and you want to process it by first getting the size and only then processing the actual data. In that case you can first use the getBodyAsBytes method, which returns IndexedSeq[Byte] and thus has a size, and then convert it into a new Source using Source.single, which is exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.
I want some level of real-time speech-to-text conversion. I am using the WebSocket interface with interim_results=true. However, I am receiving results for the first audio chunk only. The second, third, and later audio chunks that I send are not getting transcribed. I know that my receiver is not blocked, since I do receive the inactivity message.
{"error": "Session timed out due to inactivity after 30 seconds."}
Please let me know if I am missing something or if I need to provide more context.
Just for reference this is my init json.
{
"action": "start",
"content-type":"audio/wav",
"interim_results": true,
"continuous": true,
"inactivity_timeout": 10
}
In the result that I get for the first audio chunk, the final json field is always received as false.
Also, I am using golang but that should not really matter.
EDIT:
Consider the following pseudo log
localhost-server receives first 4 seconds of binary data #lets say Binary 1
Binary 1 is sent to Watson
{interim_result_1 for first chunk}
{interim_result_2 for first chunk}
localhost-server receives last 4 seconds of binary data #lets say Binary 2
Binary 2 is sent to Watson
Send {"action": "stop"} to Watson
{interim_result_3 for first chunk}
final result for the first chunk
I am not receiving any transcription for the second chunk
You are getting the time-out message because the service waits for you to either send more audio or send a message signalling the end of an audio submission. Are you sending that message? It's very easy:
By sending a JSON text message with the action key set to the value stop: {"action": "stop"}
By sending an empty binary message
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/websockets.shtml
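As a rough illustration of the message order described above, the following Python sketch only builds the frames and does not open a connection. The function name `build_frames` and the (kind, payload) tuples are made up for illustration; the JSON keys are the ones shown in the question and in this answer.

```python
import json


def build_frames(audio_chunks, content_type="audio/wav"):
    """Return (kind, payload) pairs in the order they would be sent."""
    start = json.dumps({
        "action": "start",
        "content-type": content_type,
        "interim_results": True,
    })
    frames = [("text", start)]
    # Every audio chunk goes out as its own binary message.
    frames += [("binary", chunk) for chunk in audio_chunks]
    # The stop message tells the service that no more audio is coming.
    frames.append(("text", json.dumps({"action": "stop"})))
    return frames
```

The stop frame comes only after the last chunk; sending it earlier would end the submission before the remaining audio arrives.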
Please let me know if this does not resolve your problem
This is a bit late, but I've open-sourced a Go SDK for Watson services here:
https://github.com/liviosoares/go-watson-sdk
There is some documentation about speech-to-text binding here:
https://godoc.org/github.com/liviosoares/go-watson-sdk/watson/speech_to_text
There is also an example of streaming data to the API in the _test.go file:
https://github.com/liviosoares/go-watson-sdk/blob/master/watson/speech_to_text/speech_to_text_test.go
Perhaps this can help you.
The solution to this question was to set the size header of the wav file to 0.
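One way this might look in Python, as a sketch under assumptions: the answer does not say which size field was zeroed, so this version clears both the RIFF chunk size (bytes 4 to 7) and the size of the first "data" chunk it finds. The function name `zero_wav_sizes` is made up for illustration.

```python
import struct


def zero_wav_sizes(wav):
    """Return a copy of the WAV bytes with its declared sizes set to 0."""
    if wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    out = bytearray(wav)
    out[4:8] = struct.pack("<I", 0)            # RIFF chunk size
    data_pos = wav.find(b"data", 12)           # first 'data' chunk marker
    if data_pos == -1:
        raise ValueError("no data chunk found")
    out[data_pos + 4:data_pos + 8] = struct.pack("<I", 0)  # data chunk size
    return bytes(out)
```

Only the header fields change; the audio samples after the header are left untouched.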
I am creating a tool that is required to parse incoming MIME streams and return the email body and email attachments as separate file streams.
I am using mime4j for this purpose.
Following are the problems that I am stuck on:
How can I test whether the email body file or email attachment file that I parsed out via mime4j from MIME stream is correct?
I have a large corpus of emails available in raw mime form that I want to run my tests on and need some automated way to determine which ones might be breaking the mime parsing by mime4j and tweak the code for that.
You could decode the attachments and then re-encode them. If the re-encoded stream matches (byte-for-byte) the original, then that's a good sign that mime4j is properly handling them.
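The round-trip idea can be sketched as follows, in Python rather than with mime4j and only for base64-encoded parts. The function name `roundtrip_matches` is made up for illustration. Line breaks are stripped before comparing, on the assumption that differences in line wrapping should not count as a mismatch.

```python
import base64
import binascii


def roundtrip_matches(encoded_body):
    """Decode a base64 body, re-encode it, and compare with the original."""
    compact = b"".join(encoded_body.split())   # ignore line wrapping
    try:
        decoded = base64.b64decode(compact, validate=True)
    except binascii.Error:
        return False                           # body is not valid base64
    return base64.b64encode(decoded) == compact
```

Running this over every attachment in the corpus would flag the messages worth inspecting by hand; a match is a good sign, not proof that the parse was correct.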
I initially parsed a sample corpus of *.eml files using mime4j. I had to check them manually for parsing errors, as I had no other good option.
Now I am using those earlier parsed emails as a test bed against which I check my parsed results iteratively.
The task is to send an XML object from Channel-A to Channel-B
<MyMessage>
<ID>42</ID>
<hl7v2>
MSH|^~\&|LAB|....
PID|1|....
</hl7v2>
</MyMessage>
The steps of the channel communication:
in the Channel-B's source transformer, extract the HL7v2 contents
OVERWRITE the current msg object in Channel-B with the extracted contents
continue in the other Channel-B source transformers and expecting to reference msg['PID']['PID.5'] as normal.
The good news is that I can extract the HL7v2 'payload' into a variable. The problem or difficulty is resetting the msg object, or any other object to be able to reference the HL7 properties as expected.
When I create a new variable with the SerializerFactory.getHL7Serializer, it wraps with the tags <HL7Message>.
channelMap.put('MessageID', msg['ID']); //successful
channelMap.put('v2payload',msg['HL7v2']); //also looks good
var v2Msg = SerializerFactory.getHL7Serializer(false,false,true).toXML(msg['HL7v2']);
channelMap.put('v2Msg', v2Msg );
Question: Do you have any suggestions on how to overwrite the msg object?
How can I start referencing the msg as such:
msg['PID']['PID.5']
Current Conditions
the receiving channel's input type is XML
the need is to extract all the properties from that XML object; ID is a database PK to be used later in the destination.
I'm sorry my original answer was bogged down with the peculiarities of my own scenario. I have reworked and tested to ensure that this works in your scenario.
Sending Channel - wraps the raw hl7 into your xml structure, and forwards to a channel called ReceiveXML. I have coded this in the Source Transformer, but you should code it where it works for you.
var wrappedHL7 = <MyMessage><ID>123</ID>
<hl7v2>{messageObject.getRawData()}</hl7v2>
</MyMessage>;
router.routeMessage("ReceiveXML", wrappedHL7);
Receiving Channel - extracts the hl7 from the xml, converts it to xml, and assigns back to the msg object. I have coded this in the source Filter - hence "return true;"
msg = new XML(SerializerFactory.getHL7Serializer(false,false,true).toXML(msg['hl7v2'].toString()));
return true;
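To check the unwrap step outside Mirth, the same idea can be sketched in Python. This is not Mirth code: the function name `unwrap_hl7` is made up, the segments are split naively on "|" instead of going through an HL7 serializer, and any sample message passed in must have "&" escaped as "&amp;" to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def unwrap_hl7(xml_text):
    """Pull the ID and the HL7v2 payload out of the MyMessage wrapper."""
    root = ET.fromstring(xml_text)
    payload = root.findtext("hl7v2") or ""
    segments = {}
    # HL7v2 segments end with a carriage return; accept newlines as well.
    for line in payload.replace("\r", "\n").splitlines():
        line = line.strip()
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], fields)  # keep the first of each type
    return {"id": root.findtext("ID"), "segments": segments}
```

For segments other than MSH, the list index lines up with the HL7 field number, so `segments["PID"][5]` corresponds to PID.5.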
All you have to do is put your incoming XML message into the inbound template area in Mirth and then use the message tree to drag and drop the info you need from the XML into the JavaScript section of the connector.