Don't receive results other than those from first audio chunk - ibm-cloud

I want some level of real-time speech-to-text conversion. I am using the WebSocket interface with interim_results=true. However, I am receiving results for the first audio chunk only. The second, third, and later audio chunks that I send are not transcribed. I know that my receiver is not blocked, since I do receive the inactivity message.
{"error": "Session timed out due to inactivity after 30 seconds."}
Please let me know if I am missing something or if I need to provide more contextual information.
Just for reference this is my init json.
{
  "action": "start",
  "content-type": "audio/wav",
  "interim_results": true,
  "continuous": true,
  "inactivity_timeout": 10
}
In the result that I get for the first audio chunk, the final json field is always received as false.
Also, I am using golang but that should not really matter.
EDIT:
Consider the following pseudo log
localhost-server receives first 4 seconds of binary data #lets say Binary 1
Binary 1 is sent to Watson
{interim_result_1 for first chunk}
{interim_result_2 for first chunk}
localhost-server receives last 4 seconds of binary data #lets say Binary 2
Binary 2 is sent to Watson
Send {"action": "stop"} to Watson
{interim_result_3 for first chunk}
final result for the first chunk
I am not receiving any transcription for the second chunk
Link to code

You are getting the time-out message because the service waits for you to either send more audio or send a message signalling the end of an audio submission. Are you sending that message? It's very easy:
By sending a JSON text message with the action key set to the value stop: {"action": "stop"}
By sending an empty binary message
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/websockets.shtml
Please let me know if this does not resolve your problem
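As a rough illustration of the message order described above, here is a minimal Python sketch. It only builds the sequence of messages; the function name watson_message_sequence is hypothetical, and actually sending the messages over a WebSocket client is left out because that depends on the library you use.

```python
import json


def watson_message_sequence(audio_chunks, content_type="audio/wav"):
    """Yield (kind, payload) pairs in the order they would go over the socket.

    Hypothetical helper: it does not talk to the service, it only shows
    that the JSON stop message comes after the last binary audio chunk.
    """
    start = {
        "action": "start",
        "content-type": content_type,
        "interim_results": True,
    }
    yield ("text", json.dumps(start))
    for chunk in audio_chunks:
        # Each audio chunk is sent as a binary message.
        yield ("binary", chunk)
    # Signal the end of the audio submission.
    yield ("text", json.dumps({"action": "stop"}))


if __name__ == "__main__":
    for kind, payload in watson_message_sequence([b"chunk-1", b"chunk-2"]):
        print(kind, payload)
```

The point is only the ordering: start message, every binary chunk, then the stop message (or an empty binary message instead).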

This is a bit late, but I've open-sourced a Go SDK for Watson services here:
https://github.com/liviosoares/go-watson-sdk
There is some documentation about speech-to-text binding here:
https://godoc.org/github.com/liviosoares/go-watson-sdk/watson/speech_to_text
There is also an example of streaming data to the API in the _test.go file:
https://github.com/liviosoares/go-watson-sdk/blob/master/watson/speech_to_text/speech_to_text_test.go
Perhaps this can help you.

The solution to this question was to set the size header of the wav file to 0.
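The answer does not say which size field is meant, so the following Python sketch is only one possible reading: it assumes a RIFF/WAVE header and zeroes both the RIFF chunk size and the data chunk size before the bytes are streamed. The function name zero_wav_sizes is hypothetical.

```python
import struct


def zero_wav_sizes(wav: bytes) -> bytes:
    """Return a copy of a RIFF/WAVE byte string with its size fields set to 0.

    Assumption: "size header" refers to the RIFF chunk size (offset 4)
    and/or the size of the "data" chunk; this sketch zeroes both.
    """
    if wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    out = bytearray(wav)
    struct.pack_into("<I", out, 4, 0)  # RIFF chunk size
    pos = 12
    while pos + 8 <= len(out):
        chunk_id = bytes(out[pos:pos + 4])
        (size,) = struct.unpack_from("<I", out, pos + 4)
        if chunk_id == b"data":
            struct.pack_into("<I", out, pos + 4, 0)  # data chunk size
            break
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return bytes(out)
```

The audio samples themselves are left untouched; only the declared sizes change.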

Related

How do I translate the following POST request into ESP8266 AT-command format?

I've got a working local website that takes in HTML form data.
The fields are:
Temperature
Humidity
The server successfully receives the data and spits out a graph updated with the new entries.
Using a browser tool, I was able to capture the actual POST request as follows:
http://127.0.0.1:5000/add_data
Temperature=25.4&Humidity=52.2
Content-Length:30
Now, I want to migrate from using the human interface browser with manual entries to an ESP01 device using AT commands.
According to the ESP AT-commands documentation, a POST request is performed using the following command:
AT+HTTPCPOST=
Find the link below for the full description of the command.
I cannot seem to get this POST request working. The ESP01 device immediately returns an "ERROR" message without any delay, as though it did not even try to send the request, which suggests that the syntax might be wrong.
Among many variations, the following is my best attempt:
AT+HTTPCPOST="http://MYIPADDR:5000/add_data",30,2,"Temperature: 25.4","Humidity: 52.2"
With MYIPADDR above replaced with my IP address.
How do I translate a post request into ESP01 AT command format, and are there any prerequisites needed to be in place to perform such a request?
I did connect the ESP01 device to the WiFi network.
Here's the link to the POST AT command description:
https://docs.espressif.com/projects/esp-at/en/release-v2.2.0.0_esp8266/AT_Command_Set/HTTP_AT_Commands.html#cmd-httpcpost
The documentation says:
AT+HTTPCPOST=<url>,<length>[,<http_req_header_cnt>][,<http_req_header>..<http_req_header>]
Response:
OK
The symbol > indicates that AT is ready for receiving serial data, and you can enter the data now. When the requirement of message length
determined by the parameter <length> is met, the transmission starts.
...
Parameters
<url>: HTTP URL.
<length>: HTTP data length to POST. The maximum length is equal to the system allocable heap size.
<http_req_header_cnt>: the number of <http_req_header> parameters.
[<http_req_header>]: you can send more than one request header to the server.
You're sending:
AT+HTTPCPOST="http://MYIPADDR:5000/add_data",30,2,"Temperature: 25.4","Humidity: 52.2"
The length is 30. The problem is that everything after the length is HTTP header fields; you need to send the variables in the body. So the command is:
AT+HTTPCPOST="http://MYIPADDR:5000/add_data",30
followed, once the ESP-01 has sent the > character, by the body:
Temperature=25.4&Humidity=52.2
Because you passed 30 as the body length, the ESP-01 will read exactly 30 characters after the end of the AT command and send that data as the POST body. If the size of that data changes (for instance, if the temperature is 2.2, which is one character shorter), you'll need to send the new length rather than 30.
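To avoid hard-coding the length, the command can be generated from the form fields. This is a small Python sketch of that idea; build_httpcpost is a hypothetical helper, and writing the result to the ESP-01 over the serial port is not shown.

```python
from urllib.parse import urlencode


def build_httpcpost(url, fields):
    """Return (command, body) for an AT+HTTPCPOST request without extra headers.

    Hypothetical helper: the length parameter is computed from the
    URL-encoded body, so it stays correct when the values change.
    """
    body = urlencode(fields)
    length = len(body.encode("ascii"))
    command = f'AT+HTTPCPOST="{url}",{length}'
    return command, body


if __name__ == "__main__":
    command, body = build_httpcpost(
        "http://MYIPADDR:5000/add_data",
        {"Temperature": "25.4", "Humidity": "52.2"},
    )
    print(command)  # send this line first
    print(body)     # send this once the module has answered with ">"
```

With the values from the question, the body is Temperature=25.4&Humidity=52.2 and the computed length is 30, matching the captured request.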

PlayWS calculate the size of a http call without consuming the stream

I'm currently using the PlayWS HTTP client, which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream, and I can't use it anymore. Any way around this?
I think there are two different aspects related to the question.
1. You want to know the size of the server response in advance in order to prepare a buffer. Unfortunately, there is no guaranteed way to do this. The HTTP 1.1 spec explicitly allows a transfer mode in which the server does not know the size of the response in advance: chunked transfer encoding. See this quote from 3.3.1. Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance.
Section 3.3.3. Message Body Length specifies how the length of a message body is determined. Besides the aforementioned chunked transfer encoding, it also contains this rather unhelpful case:
Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.
This case exists for backward compatibility and its use is discouraged, but it is still allowed.
Still, in many real-world scenarios you can use the Content-Length header field that the server may return. However, there is a catch here as well: if gzip Content-Encoding is used, Content-Length contains the size of the compressed body.
To sum up: in the general case you can't get the size of the message body before you have fully received the server response, i.e. in terms of code, before you perform a blocking call on the response. You may try to use Content-Length, and it might or might not help in your specific case.
2. You already have a fully downloaded response (or you are OK with blocking on your StreamedResponse) and you want to process it by first getting the size and only then processing the actual data. In that case you may first use the getBodyAsBytes method, which returns IndexedSeq[Byte] and thus has a size, and then convert the result into a new Source using Source.single, which is exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.
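The header reasoning from the first aspect can be sketched independently of PlayWS. The Python function below is a hypothetical helper operating on a plain mapping of header names to values, not a PlayWS API; it returns a length only when the headers actually declare one.

```python
from typing import Mapping, Optional


def declared_body_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the body length declared by the headers, or None if unknown.

    Hypothetical helper. A Transfer-Encoding header (e.g. chunked) means
    the length is not known in advance, even if Content-Length is present.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if "transfer-encoding" in lowered:
        return None
    value = lowered.get("content-length")
    if value is None or not value.strip().isdigit():
        return None
    # Note: with Content-Encoding: gzip this is the compressed size.
    return int(value.strip())
```

A None result means the only way to learn the size is to read the body to the end.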

How to read the whole message with Chilkat socket?

I need to get the whole message (response), but socket.ReceiveBytes(); returns just part of the message. I tried to loop it, but it fails on timeout when there are no bytes to receive.
List<byte> lb = new List<byte>();
byte[] receivedMsg = socket.ReceiveBytes();
while (receivedMsg.Length > 0)
{
    lb.AddRange(receivedMsg);
    receivedMsg = socket.ReceiveBytes();
}
So, how can I check whether there are bytes to read? How can I read the whole message?
Since it's a Chilkat implementation, you should probably contact the developer. But I found this, which could help: http://www.cknotes.com/?p=302
Ultimately, you need to know how much to read from the socket to constitute a whole message. For example, if the overlying protocol is a portmapper, then you know that you are expecting messages in the format that the RFC specifies (http://tools.ietf.org/html/rfc1833).
If you are rolling your own protocol over a socket connection, then use the method in the Chilkat blog post about putting the size of the total message in the first 4 bytes.
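The length-prefix idea can be sketched with Python's standard socket module instead of Chilkat. The helper names are hypothetical, and the big-endian byte order of the 4-byte prefix is an assumption; both ends simply have to agree on it.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(part)
    return bytes(buf)


def send_message(sock, payload):
    """Send a 4-byte big-endian length prefix followed by the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_message(sock):
    """Read the 4-byte length prefix, then exactly that many payload bytes."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_message(a, b"hello world")
    print(recv_message(b))
    a.close()
    b.close()
```

Because the receiver always knows how many bytes are still outstanding, it never has to wait for a timeout to detect the end of a message.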

How much data to receive from server in SSL handshake before calling InitializeSecurityContext?

In our Windows C++ application I am using InitializeSecurityContext() client side to open an schannel connection to a server which is running stunnel SSL proxy. My code now works, but only with a hack I would like to eliminate.
I started with this sample code: http://msdn.microsoft.com/en-us/library/aa380536%28v=VS.85%29.aspx
In the sample code, look at SendMsg and ReceiveMsg. The first 4 bytes of any message sent or received indicates the message length. This is fine for the sample, where the server portion of the sample conforms to the same convention.
stunnel does not seem to use this convention. When the client is receiving data during the handshake, how does it know when to stop receiving and make another call to InitializeSecurityContext()?
This is how I structured my code, based on what I could glean from the documentation:
1. call InitializeSecurityContext which returns an output buffer
2. Send output buffer to server
3. Receive response from server
4. call InitializeSecurityContext(server_response) which returns an output buffer
5. if SEC_E_INCOMPLETE_MESSAGE, go back to step 3,
if SEC_I_CONTINUE_NEEDED go back to step 2
I expected InitializeSecurityContext in step 4 to return SEC_E_INCOMPLETE_MESSAGE if not enough data was read from the server in step 3. Instead, I get SEC_I_CONTINUE_NEEDED but an empty output buffer. I have experimented with a few ways to handle this case (e.g. go back to step 3), but none seemed to work and more importantly, I do not see this behavior documented.
In step 3 if I add a loop that receives data until a timeout expires, everything works fine in my test environment. But there must be a more reliable way.
What is the right way to know how much data to receive in step 3?
SChannel is different than the Negotiate security package. You need to receive at least 5 bytes, which is the SSL/TLS record header size:
struct {
    ContentType type;
    ProtocolVersion version;
    uint16 length;
    opaque fragment[TLSPlaintext.length];
} TLSPlaintext;
ContentType is 1 byte, ProtocolVersion is 2 bytes, and the record length is 2 bytes. Once you read those 5 bytes, SChannel will return SEC_E_INCOMPLETE_MESSAGE and will tell you exactly how many more bytes to expect:
SEC_E_INCOMPLETE_MESSAGE
Data for the whole message was not read from the wire.
When this value is returned, the pInput buffer contains a SecBuffer structure with a BufferType member of SECBUFFER_MISSING. The cbBuffer member of SecBuffer contains a value that indicates the number of additional bytes that the function must read from the client before this function succeeds.
Once you get this output, you know exactly how much to read from the network.
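Purely to illustrate the 5-byte record header layout, here is a small Python sketch that decodes it by hand. It is not part of the SChannel API, the function name is hypothetical, and the sample header bytes in the demo are made up for illustration.

```python
import struct

TLS_HEADER_LEN = 5


def parse_tls_record_header(header: bytes):
    """Decode a TLS record header into (content_type, (major, minor), length).

    Layout: 1 byte ContentType, 2 bytes ProtocolVersion, 2 bytes length
    (big-endian). The length counts the bytes that follow the header.
    """
    if len(header) < TLS_HEADER_LEN:
        raise ValueError("need at least 5 bytes for a TLS record header")
    content_type, major, minor, length = struct.unpack(
        "!BBBH", header[:TLS_HEADER_LEN]
    )
    return content_type, (major, minor), length


if __name__ == "__main__":
    # Hypothetical header: application data (0x17), version 3.2, 264 bytes.
    ctype, version, length = parse_tls_record_header(bytes.fromhex("1703020108"))
    print(ctype, version, length, "total record size:", TLS_HEADER_LEN + length)
```

With SChannel you would normally not parse this yourself; the sketch just shows why 5 bytes are enough to know the size of the rest of the record.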
I found the problem.
I found this sample:
http://www.codeproject.com/KB/IP/sslsocket.aspx
I was missing the handling of SECBUFFER_EXTRA (line 987 SslSocket.cpp)
The SChannel SSP returns SEC_E_INCOMPLETE_MESSAGE from both InitializeSecurityContext and DecryptMessage when not enough data is read.
A SECBUFFER_MISSING message type is returned from DecryptMessage with a cbBuffer value of the amount of desired bytes.
But in practice, I did not use the "missing data" value. The documentation indicates that the value is not guaranteed to be correct and is only a hint that developers can use to reduce calls.
InitializeSecurityContext MSDN doc:
While this number is not always accurate, using it can help improve performance by avoiding multiple calls to this function.
So whenever SEC_E_INCOMPLETE_MESSAGE was returned, I unconditionally read more data into the same buffer, reading multiple bytes at a time from the socket.
Some extra input buffer management was required to append more read data and keep the lengths right. DecryptMessage will modify the input buffers' cbBuffer properties when it fails, which surprised me.
Printing out the buffers and return result after calling InitializeSecurityContext shows the following:
read socket:bytes(5).
InitializeSecurityContext:result(80090318). // SEC_E_INCOMPLETE_MESSAGE
inBuffers[0]:type(2),bytes(5).
inBuffers[1]:type(0),bytes(0). // no indication of missing data
outBuffer[0]:type(2),bytes(0).
read socket:bytes(74).
InitializeSecurityContext:result(00090312). // SEC_I_CONTINUE_NEEDED
inBuffers[0]:type(2),bytes(79). // notice 74 + 5 from before
inBuffers[1]:type(0),bytes(0).
outBuffer[0]:type(2),bytes(0).
And for the DecryptMessage Function, input is always in dataBuf[0], with the rest zeroed.
read socket:bytes(5).
DecryptMessage:len 5, bytes(17030201). // SEC_E_INCOMPLETE_MESSAGE
DecryptMessage:dataBuf[0].BufferType 4, 8 // notice input buffer modified
DecryptMessage:dataBuf[1].BufferType 4, 8
DecryptMessage:dataBuf[2].BufferType 0, 0
DecryptMessage:dataBuf[3].BufferType 0, 0
read socket:bytes(8).
DecryptMessage:len 13, bytes(17030201). // SEC_E_INCOMPLETE_MESSAGE
DecryptMessage:dataBuf[0].BufferType 4, 256
DecryptMessage:dataBuf[1].BufferType 4, 256
DecryptMessage:dataBuf[2].BufferType 0, 0
DecryptMessage:dataBuf[3].BufferType 0, 0
read socket:bytes(256).
DecryptMessage:len 269, bytes(17030201). // SEC_E_OK
We can see my TLS Server peer is sending TLS headers (5 bytes) in one packet, and then the TLS message (8 for Application Data), then the Application Data payload in a third.
You must read some arbitrary amount the first time, and when you receive SEC_E_INCOMPLETE_MESSAGE, you must look in the pInput SecBufferDesc for a SECBUFFER_MISSING and read its cbBuffer to find out how many bytes you are missing.
This problem was doing my head in today, as I was attempting to modify my handshake myself, and having the same problem the other commenters were having, i.e. not finding a SECBUFFER_MISSING. I do not want to interpret the tls packet myself, and I do not want to unconditionally read some unspecified number of bytes. I found the solution to that, so I'm going to address their comments, too.
The confusion here is because the API is confusing. Ordinarily, to read the output of InitializeSecurityContext, you look at the content of the pOutput parameter (as defined in the signature). It's that SecBufferDesc that contains the SECBUFFER_TOKEN etc to pass to AcceptSecurityContext.
However, in the case where InitializeSecurityContext returns SEC_E_INCOMPLETE_MESSAGE, the SECBUFFER_MISSING is returned in the pInput SecBufferDesc, in place of the SECBUFFER_ALERT SecBuffer that was passed in.
The documentation does say this, but not in a way that clearly contrasts this case against the SEC_I_CONTINUE_NEEDED and SEC_E_OK cases.
This answer also applies to AcceptSecurityContext.
From MSDN, I'd presume SEC_E_INCOMPLETE_MESSAGE is returned when not enough data has been received from the server so far. Instead, SEC_I_CONTINUE_NEEDED is returned, with InBuffers[1] indicating the amount of unread data (note that some data has been processed and must be skipped) and OutBuffers containing nothing.
So the algorithm is:
If SEC_I_CONTINUE_NEEDED is returned, check the type of InBuffers[1]
If it is SECBUFFER_EXTRA, handle it (move InBuffers[1].cbBuffer bytes to the beginning of the input buffer) and jump to the next recv & InitializeSecurityContext iteration
If OutBuffers is not empty, send its contents to the server
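The "move the extra bytes to the beginning" step is plain buffer bookkeeping, so it can be sketched outside of C++. The Python below is a simulation only, with no SChannel calls; keep_extra is a hypothetical helper, and it assumes the unprocessed bytes are the last `extra` bytes of the data that was passed in.

```python
def keep_extra(io_buffer: bytearray, used: int, extra: int) -> int:
    """Move the trailing `extra` unprocessed bytes to the front of the buffer.

    `used` is how many bytes of io_buffer were handed to the security
    call, and `extra` stands in for InBuffers[1].cbBuffer. Returns the new
    fill level, i.e. the offset at which the next recv should append.
    """
    if not 0 <= extra <= used <= len(io_buffer):
        raise ValueError("inconsistent buffer sizes")
    # The right-hand slice is copied before assignment, so overlap is safe.
    io_buffer[0:extra] = io_buffer[used - extra:used]
    return extra


if __name__ == "__main__":
    buf = bytearray(b"HANDSHAKEextra") + bytearray(2)
    fill = keep_extra(buf, used=14, extra=5)
    print(fill, bytes(buf[:fill]))
```

The next receive then appends at the returned offset, so the leftover bytes are processed first on the following iteration.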

NSURLRequest with HTTPBody input stream: Stream sends event before being opened

I want to send a large amount of data to a server using NSURLConnection (and NSURLRequest). For this I create a bound pair of NSStreams (using CFStreamCreateBoundPair(...)). Then I pass the input stream to the NSURLRequest (-setHTTPBodyStream:) and schedule the output stream on the current run loop. When the run loop continues, I get the events to send data and the input stream sends this data to the server.
My problem is that this only works when the data fits into the buffer between the paired streams. If the data is bigger, then somehow the input stream gets an event (I assume "bytes available") before the NSURLConnection has opened the input stream. This results in an error message being printed, and the data is not sent.
I tried to catch this in my -stream:handleEvent: method by simply returning if the input stream is not yet opened, but then my output stream gets a stream closed event (probably because I never sent data when I could).
So my question is: How to use a bound pair of streams with NSURLConnection correctly?
(If this matters: I'm developing on the iOS platform)
Any help is appreciated!
Cheers, Markus
OK, I kind of fixed this by delaying the start of the upload, so that it begins after the NSURLConnection has had time to set up its input stream.
It's not what I call a clean solution though, since relying on -performSelector:withObject:afterDelay: seems a bit hacky.
So if anyone else has a solution to this, I'm still open for any suggestions.