I have developed an application for streaming speech recognition in C++ using another API and the IBM Watson Speech to Text service API.
In both programs I use the same file, which contains this audio:
several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday
This file is 641,680 bytes in size, and I send it to the Speech to Text servers in chunks of at most 100,000 bytes.
With the other API, everything is recognized as a whole. With the IBM Watson API, it is not. Here is what I do:
Connect to IBM Watson web server (Speech to text API)
Send start frame {"action":"start","content-type":"audio/mulaw;rate=8000"}
Send binary 100,000 bytes
Send stop frame {"action":"stop"}
...Repeat binary and stop until the last byte.
The IBM Watson Speech API recognizes the chunks only individually,
e.g.
several tornadoes touch down
a line of severe thunder
swept through Colorado
Sunday
This appears to be the output of the individual chunks. Words that straddle a chunk boundary (here, for example, "thunderstorms" lies partly at the end of one chunk and partly at the start of the next) are therefore recognized incorrectly or dropped.
What am I doing wrong?
EDIT (I am using C++ with the Boost library for the WebSocket interface)
//Do the websocket handshake
void IbmWebsocketSession::on_ssl_handshake(beast::error_code ec) {
auto mToken = mSttServiceObject->GetToken(); // Get the authentication token
//Complete the websocket handshake and call back the "send_start" function
mWebSocket.async_handshake_ex(mHost, mUrlEndpoint, [mToken](request_type& reqHead) {reqHead.insert(http::field::authorization,mToken);},
bind(&IbmWebsocketSession::send_start, shared_from_this(), placeholders::_1));
}
//Send the start frame
void IbmWebsocketSession::send_start(beast::error_code ec) {
//Send the START_FRAME and call back the "read_resp" function to receive the "state: listening" message
mWebSocket.async_write(net::buffer(START_FRAME),
bind(&IbmWebsocketSession::read_resp, shared_from_this(), placeholders::_1, placeholders::_2));
}
//Send the binary data
void IbmWebsocketSession::send_binary(beast::error_code ec) {
streamsize bytes_read = mFilestream.rdbuf()->sgetn(&chunk[0], chunk.size()); //gets the binary data chunks from a file (which is being written at run time)
// Send binary data
if (bytes_read > mcMinsize) { //Minimum size defined by IBM is 100 bytes.
// If chunk size is greater than 100 bytes, then send the data and then callback "send_stop" function
mWebSocket.binary(true);
/**********************************************************************
* Wait a second before writing the next chunk.
**********************************************************************/
this_thread::sleep_for(chrono::seconds(1));
mWebSocket.async_write(net::buffer(&chunk[0], bytes_read),
bind(&IbmWebsocketSession::send_stop, shared_from_this(), placeholders::_1));
} else { //If chunk size is less than 100 bytes, then DO NOT send the data only call "send_stop" function
shared_from_this()->send_stop(ec);
}
}
void IbmWebsocketSession::send_stop(beast::error_code ec) {
mWebSocket.binary(false);
/*****************************************************************
* Send the Stop message
*****************************************************************/
mWebSocket.async_write(net::buffer(mTextStop),
bind(&IbmWebsocketSession::read_resp, shared_from_this(), placeholders::_1, placeholders::_2));
}
void IbmWebsocketSession::read_resp(beast::error_code ec, size_t bytes_transferred) {
boost::ignore_unused(bytes_transferred);
if(mWebSocket.is_open())
{
// Read the websocket response and call back the "display_buffer" function
mWebSocket.async_read(mBuffer, bind(&IbmWebsocketSession::display_buffer, shared_from_this(),placeholders::_1));
}
else
cerr << "Error: WebSocket is not open: " << ec.message() << endl; // report the error_code passed to this handler
}
void IbmWebsocketSession::display_buffer(beast::error_code ec) {
/*****************************************************************
* Get the buffer into stringstream
*****************************************************************/
msWebsocketResponse << beast::buffers(mBuffer.data());
mResponseTranscriptIBM = ParseTranscript(); //Parse the response transcript
mBuffer.consume(mBuffer.size()); //Clear the websocket buffer
if ("Listening" == mResponseTranscriptIBM && true != mSttServiceObject->IsGstFileWriteDone()) { // IsGstFileWriteDone -> checks if the user has stopped speaking
shared_from_this()->send_binary(ec);
} else {
shared_from_this()->close_websocket(ec, 0);
}
}
IBM Watson Speech to Text has several APIs to transmit audio and receive transcribed text. Based on your description you seem to use the WebSocket Interface.
For the WebSocket Interface, you would open the connection (start), then send individual chunks of data, and - once everything has been transmitted - stop the recognition request.
You have not shared code, but it seems you are starting and stopping a request for each chunk. Only stop after the last chunk.
I recommend taking a look at the API docs, which contain samples in different languages. The Node.js sample shows how to register for events. There are also examples on GitHub, such as one that uses the WebSocket API with Python and another that shows the chunking.
@data_henrik is correct: the flow is wrong. It should be: START FRAME >> binary data >> binary data >> binary data >> ... >> STOP FRAME
You only need to send the {"action":"stop"} message when there are no more audio chunks to send.
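The flow described above can be sketched without any network code. This is only an illustration in Python (the question itself uses C++ and Boost), and `watson_frames` is a made-up helper name, not part of any IBM SDK. It yields the messages in the order they should go over the WebSocket: one start frame, every audio chunk as a binary message, and a single stop frame at the very end.

```python
import json


def watson_frames(audio, chunk_size=100_000,
                  content_type="audio/mulaw;rate=8000"):
    """Yield (is_binary, payload) pairs in the order they should be sent.

    Hypothetical helper: it only builds the message sequence and does not
    talk to the service.
    """
    # One start frame opens the recognition request.
    yield False, json.dumps({"action": "start", "content-type": content_type})
    # Every audio chunk follows as a binary message, with no stop in between.
    for offset in range(0, len(audio), chunk_size):
        yield True, audio[offset:offset + chunk_size]
    # A single stop frame after the last chunk ends the request.
    yield False, json.dumps({"action": "stop"})


# 641,680 bytes of dummy audio, the file size from the question.
frames = list(watson_frames(bytes(641_680)))
print(len(frames))  # 1 start + 7 audio chunks + 1 stop = 9
```

In the C++ code from the question, this would correspond to having send_binary call itself again after each chunk and calling send_stop only once the file is exhausted.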
I will explain the setup first.
Setup: I have a microcontroller board running a CoAP REST server (using Contiki OS) with an observable resource, and a client (using CoAPthon, a Python library for CoAP) running on a Linux SOM that observes that resource. I can successfully observe a small amount of data (64 bytes) sent from the server (microcontroller) to the client (Linux SOM). The code follows at the end, after the description.
Question: How can I send a large chunk of data (say, 1024 bytes) from the CoAP server to the observing client? Thanks in advance for any help.
Below are the Contiki observable resource code and the CoAPthon client code (the version that does not yet send large data).
Contiki Code:
char * temp_payload = "Behold empty data";
PERIODIC_RESOURCE(res_periodic_ext_temp_data,
"title=\"Temperature\";rt=\"Temperature\";obs",
res_get_handler_of_periodic_ext_temp_data,
NULL,
NULL,
res_delete_handler_ext_temp_data,
(15*CLOCK_SECOND),
res_periodic_handler_of_ext_temp_data);
static void
res_get_handler_of_periodic_ext_temp_data(void *request, void *response, uint8_t *buffer, uint16_t preferred_size, int32_t *offset)
{
/*
* For minimal complexity, request query and options should be ignored for GET on observable resources.
* Otherwise the requests must be stored with the observer list and passed by REST.notify_subscribers().
* This would be a TODO in the corresponding files in contiki/apps/erbium/!
*/
/* Check the offset for boundaries of the resource data. */
if(*offset >= 1024) {
REST.set_response_status(response, REST.status.BAD_OPTION);
/* A block error message should not exceed the minimum block size (16). */
const char *error_msg = "BlockOutOfScope";
REST.set_response_payload(response, error_msg, strlen(error_msg));
return;
}
REST.set_header_content_type(response, REST.type.TEXT_PLAIN);
REST.set_response_payload(response,(temp_payload + *offset), MIN( (int32_t)strlen(temp_payload) - *offset, preferred_size));
REST.set_response_status(response, REST.status.OK);
/* IMPORTANT for chunk-wise resources: Signal chunk awareness to REST engine. */
*offset += preferred_size;
/* Signal end of resource representation. */
if(*offset >= (int32_t)strlen( temp_payload) + 1) {
*offset = -1;
}
REST.set_header_max_age(response, MAX_AGE);
}
I am not adding the code for the periodic handler; it periodically notifies the observers, which triggers the GET handler above.
Coapthon code:
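# Python 2, as used by CoAPthon. The imports below are assumed; the original snippet omitted them.
from Queue import Empty

from coapthon import defines
from coapthon.client.helperclient import HelperClient
from coapthon.messages.request import Request

# port is not defined in this snippet; it must be set elsewhere (5683 is the default CoAP port).

def ext_temp_data_callback_observe(response):
    print response.pretty_print()

def observe_ext_temp_data(host, callback):
    client = HelperClient(server=(host, port))
    request = Request()
    request.code = defines.Codes.GET.number
    request.type = defines.Types["CON"]
    request.destination = (host, port)
    request.uri_path = "data/res_periodic_ext_temp_data"
    request.content_type = defines.Content_types["text/plain"]
    request.observe = 0
    request.block2 = (0, 0, 64)
    try:
        response = client.send_request(request, callback)
        print response.pretty_print()
    except Empty as e:
        print("listener_post_observer_rate_of_change({0}) timed out".format(host))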
Again, I need help implementing an observer with CoAP block-wise transfer (https://www.rfc-editor.org/rfc/rfc7959#page-26).
I can't tell much about the particular systems you use, but in general the combination of block-wise transfer and observes works in that the server only sends the first block of the updated resource. It is then up to the client to ask for the remaining blocks, and to verify that their ETag options match.
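The client-side part of that can be sketched as follows. This is a library-independent illustration, not CoAPthon code: `reassemble` and `fetch_block` are hypothetical names, and each block is modelled as an `(etag, payload, more)` tuple. The notification delivers block 0, the client requests the following blocks one by one, and it gives up if the ETag changes partway through.

```python
def reassemble(first_block, fetch_block):
    """Collect the remaining blocks of one notification.

    first_block: (etag, payload, more) taken from the notification itself.
    fetch_block(num): hypothetical callable that requests block `num` and
        returns (etag, payload, more).
    Returns the full payload, or None if the ETag changed mid-transfer.
    """
    etag, payload, more = first_block
    parts = [payload]
    num = 1
    while more:
        block_etag, payload, more = fetch_block(num)
        if block_etag != etag:
            # The resource changed while we were fetching; the blocks
            # collected so far belong to an outdated representation.
            return None
        parts.append(payload)
        num += 1
    return b"".join(parts)


# In-memory stand-in for the server: 1024 bytes served in 64-byte blocks.
DATA = bytes(range(256)) * 4
BLOCK = 64


def fake_fetch(num):
    start = num * BLOCK
    return b"v1", DATA[start:start + BLOCK], start + BLOCK < len(DATA)


print(reassemble(fake_fetch(0), fake_fetch) == DATA)  # True
```

When `reassemble` returns None, the usual reaction is to start over from block 0 of the new representation.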
The Contiki code looks like it should be sufficient: it advances the offset for each block and sets it to -1 at the end of the resource, which is probably how the engine decides whether to set the "more data" bit in the block option.
On the CoAPthon side, you may need to do the reassembly manually, or ask CoAPthon to do the reassembly automatically (its code does not indicate that it supports the combination of block-wise transfer and observe, at least not at a short glance).
To "bootstrap" your development, you may consider using Eclipse Californium.
The simple client in demo-apps/cf-helloworld-client requires some changes for observe. If you need help, just open an issue on GitHub.
With two years of experience with that feature, let me mention that if your data changes faster than your "bandwidth" can transfer it (including the RTT for the blocks), you may send a lot of blocks in vain.
If the data changes just before your last block could be sent, that invalidates the complete transfer so far. Some people then start to develop their own workaround, but from there on you are on very thin ice :-).
Can somebody help me?
I have problems sending data from a SIM800C to a website.
First, I uploaded the following code to the Arduino (I use the Serial Monitor in the Arduino IDE to send AT commands to the SIM800 and read the responses).
#include <SoftwareSerial.h>
#define TX 10
#define RX 11
#define t 2000
SoftwareSerial mySerial(RX, TX);
int k=0, aS=0, amS=0;
void setup() {
Serial.begin(9600);
while(!Serial); // Wait for Serial ready
Serial.println("Initializing...");
mySerial.begin(9600);
delay(5000);
mySerial.println("AT"); // Send the first AT command to auto set baud rate
delay(1000);
Serial.println("You can type AT command and send to SIM800 by using Serial Monitor");
}
void loop() {
k=0;
aS=Serial.available(); // aS: The number of bytes available to read from the buff of Serial
amS=mySerial.available(); // amS: The number of bytes available to read from the buff of mySerial
while(aS>0) {
mySerial.write(Serial.read());
k=1; aS--;
}
if (k==1) {
mySerial.println();
}
while (amS>0) {
Serial.write(mySerial.read());
k=2; amS--;
}
delay(1000);
}
Next, I sent the AT commands below one by one and viewed the responses. All the AT commands and responses can be seen on the Serial Monitor.
AT+SAPBR=3,1,"Contype","GPRS"
AT+SAPBR=3,1,"APN","m3-world"
AT+SAPBR=3,1,"USER","mms"
AT+SAPBR=3,1,"PWD","mms"
AT+CSTT="m3-world","mms","mms"
AT+SAPBR=1,1
AT+HTTPINIT
AT+HTTPPARA="CID",1
AT+HTTPPARA="URL","http://weatherstation.byethost3.com/"
AT+HTTPDATA=9,10000
value=YOU
AT+HTTPACTION=1
The last response below shows that the data (value=YOU) have been sent successfully.
OK
+HTTPACTION:1,200,839
I have created a website that reads data with the GET method. My problem is that nothing changes on the website, which means it has not received the data sent from the SIM800.
Your example is sending a POST request. If you would like to send a GET request with a name/value pair, it should be structured a little differently. Changing your example from above:
AT+SAPBR=3,1,"CONTYPE","GPRS"
AT+SAPBR=3,1,"APN","m3-world"
AT+SAPBR=3,1,"USER","mms"
AT+SAPBR=3,1,"PWD","mms"
AT+SAPBR=1,1
AT+HTTPINIT
AT+HTTPPARA="CID",1
AT+HTTPPARA="URL","http://weatherstation.byethost3.com/?value=YOU"
AT+HTTPACTION=0
Note the name/value pair is in the URL now and AT+HTTPACTION=1 is changed to AT+HTTPACTION=0 to indicate GET instead of POST
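If you generate the commands from a script, the GET variant can be put together like this. It is a small Python sketch that only builds the command strings shown above; `http_get_commands` is a made-up helper name, and sending the strings to the modem and reading the replies is left out.

```python
from urllib.parse import urlencode


def http_get_commands(base_url, params, apn, user, password):
    """Return the AT command sequence for an HTTP GET with query parameters.

    Hypothetical helper: it only formats the commands and does not talk
    to the modem.
    """
    # For GET, the name/value pairs travel in the URL's query string.
    url = base_url + "?" + urlencode(params)
    return [
        'AT+SAPBR=3,1,"CONTYPE","GPRS"',
        'AT+SAPBR=3,1,"APN","{}"'.format(apn),
        'AT+SAPBR=3,1,"USER","{}"'.format(user),
        'AT+SAPBR=3,1,"PWD","{}"'.format(password),
        'AT+SAPBR=1,1',
        'AT+HTTPINIT',
        'AT+HTTPPARA="CID",1',
        'AT+HTTPPARA="URL","{}"'.format(url),
        'AT+HTTPACTION=0',  # 0 = GET, 1 = POST
    ]


commands = http_get_commands(
    "http://weatherstation.byethost3.com/", {"value": "YOU"},
    apn="m3-world", user="mms", password="mms")
for command in commands:
    print(command)
```

Using `urlencode` also takes care of escaping values that contain spaces or other characters that are not allowed in a URL.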
I want some level of real-time speech to text conversion. I am using the WebSocket interface with interim_results=true. However, I am receiving results for the first audio chunk only. The second, third, ... audio chunks that I am sending are not getting transcribed. I do know that my receiver is not blocked, since I do receive the inactivity message:
{"error": "Session timed out due to inactivity after 30 seconds."}
Please let me know if I am missing something or if I need to provide more contextual information.
Just for reference this is my init json.
{
"action": "start",
"content-type":"audio/wav",
"interim_results": true,
"continuous": true,
"inactivity_timeout": 10
}
In the result that I get for the first audio chunk, the final json field is always received as false.
Also, I am using golang but that should not really matter.
EDIT:
Consider the following pseudo log
localhost-server receives first 4 seconds of binary data #lets say Binary 1
Binary 1 is sent to Watson
{interim_result_1 for first chunk}
{interim_result_2 for first chunk}
localhost-server receives last 4 seconds of binary data #lets say Binary 2
Binary 2 is sent to Watson
Send {"action": "stop"} to Watson
{interim_result_3 for first chunk}
final result for the first chunk
I am not receiving any transcription for the second chunk
Link to code
You are getting the time-out message because the service waits for you to either send more audio or send a message signalling the end of an audio submission. Are you sending that message? There are two ways to do it:
By sending a JSON text message with the action key set to the value stop: {"action": "stop"}
By sending an empty binary message
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/websockets.shtml
Please let me know if this does not resolve your problem
This is a bit late, but I've open-sourced a Go SDK for Watson services here:
https://github.com/liviosoares/go-watson-sdk
There is some documentation about speech-to-text binding here:
https://godoc.org/github.com/liviosoares/go-watson-sdk/watson/speech_to_text
There is also an example of streaming data to the API in the _test.go file:
https://github.com/liviosoares/go-watson-sdk/blob/master/watson/speech_to_text/speech_to_text_test.go
Perhaps this can help you.
The solution to this question was to set the size header of the wav file to 0.
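The answer above does not say which size field was meant. As a rough sketch only, assuming a canonical 44-byte PCM WAV header (RIFF size at byte offset 4, data size at byte offset 40) and assuming both fields are zeroed, the patch could look like this in Python. `zero_wav_sizes` is a made-up name, and files with extra chunks before `data` are rejected rather than handled.

```python
import io
import struct
import wave


def zero_wav_sizes(header):
    """Return a copy of a canonical 44-byte WAV header with both size fields set to 0.

    Assumption: plain PCM layout, with 'data' starting at byte offset 36.
    """
    if (len(header) < 44 or header[0:4] != b"RIFF"
            or header[8:12] != b"WAVE" or header[36:40] != b"data"):
        raise ValueError("not a canonical 44-byte PCM WAV header")
    patched = bytearray(header[:44])
    struct.pack_into("<I", patched, 4, 0)   # RIFF chunk size
    struct.pack_into("<I", patched, 40, 0)  # data chunk size
    return bytes(patched)


# Build a tiny WAV in memory just to have a header to patch.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 100)

original = buffer.getvalue()[:44]
patched = zero_wav_sizes(original)
print(patched[4:8], patched[40:44])
```

The idea would be to send the patched header once, followed by the raw audio bytes as they arrive, so that the declared length no longer covers only the first chunk.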
As far as I can see, XMLSocket reads the data completely, up to the end of a message, whereas the Socket class reads data in parts, so a long string arrives in pieces that have to be concatenated. I wonder whether it is possible to use the Socket class and still read the full data up to the end of the package.
private function readResponse():void {
var str:String = readUTFBytes(bytesAvailable);
response += str;
trace2(response);
}
private function socketDataHandler(event:ProgressEvent):void {
trace2("socketDataHandler: " + event);
readResponse();
}
As far as I saw in the docs, the only data handler is the ProgressEvent. How can I handle the data so that I get the full string rather than parts of it? I don't want to use XMLSocket. Is there a way?
XMLSocket reads data into an internal buffer, and when a terminating null byte is received it parses all of the XML received since the previous null byte or, if that is the first message received, since the connection was established.
You need to wrap the Socket object, read incoming data into an internal buffer, and fire an event whenever a complete message has arrived.
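The buffering logic of such a wrapper is small. Here is a sketch in Python rather than ActionScript, just to show the idea. `NullFramer` and `on_message` are made-up names, and the null byte as terminator is borrowed from XMLSocket; your own protocol may use a different delimiter or a length prefix.

```python
class NullFramer:
    """Buffer incoming bytes and emit one message per terminating null byte."""

    def __init__(self, on_message):
        self._buffer = bytearray()
        self._on_message = on_message  # called once per complete message

    def feed(self, data):
        """Call this from the socket's data handler with whatever arrived."""
        self._buffer.extend(data)
        while True:
            end = self._buffer.find(b"\x00")
            if end < 0:
                break  # no complete message yet; wait for more data
            message = bytes(self._buffer[:end])
            del self._buffer[:end + 1]  # drop the message and its terminator
            # Decode only complete messages, so a multi-byte character split
            # across two reads cannot be garbled.
            self._on_message(message.decode("utf-8"))


received = []
framer = NullFramer(received.append)
framer.feed(b"<a>he")              # partial message: nothing is emitted
framer.feed(b"llo</a>\x00<b/>")    # completes the first, starts the second
framer.feed(b"\x00")               # completes the second
print(received)  # ['<a>hello</a>', '<b/>']
```

In the ActionScript code from the question, `feed` would correspond to the body of readResponse, and the callback to dispatching a custom event.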
I have a similar problem, maybe the same as the one here.
On the server (a Java TCP server) I am doing this:
public void sendMsg(String msg) {
out.println(msg); // msg is: "MSG Hello" without quotes
out.flush();
}
When I call it twice or more, I receive only the first message in the client code, which is Unity3D C# socket code:
void Update() {
if(connected) {
try {
if(theStream.DataAvailable) {
String data = sr.ReadLine();
// bla bla
Get rid of the if(theStream.DataAvailable). You cannot check if data is available that way since if you have already received it, it is not available. While the ReadLine function only returns one line to you, it may read much more than one line.
So here's what happens:
All data is sent.
Data is available, you call ReadLine. It reads all the data and returns one line to you.
No data is available now, since it has already been read from the connection.
There are other problems with that check too. If it's trying to avoid calling ReadLine if a line isn't available, it won't do that. Some data being available doesn't mean a whole line is. (Imagine if the other end maliciously sends just a single X byte.)
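The same effect can be reproduced outside Unity. The following Python sketch is an analogy, not the C# API: it sends two lines over a local socket pair, reads one line through a buffered reader, and then asks the operating system whether the socket still has data. On a typical run the socket reports nothing left to read, because the reader already pulled both lines into its own buffer, yet the second line is still returned by the next readline call.

```python
import select
import socket


def demo():
    """Show that a buffered line reader may consume more than one line."""
    server, client = socket.socketpair()
    with server, client:
        # Two lines arrive at once, like two println() calls on the server.
        server.sendall(b"MSG Hello\nMSG World\n")
        with client.makefile("rb") as reader:
            first = reader.readline()
            # Ask the OS whether the socket has unread data (timeout 0).
            readable, _, _ = select.select([client], [], [], 0)
            # The second line now lives in the reader's buffer, not the socket.
            second = reader.readline()
    return first, bool(readable), second


first, socket_has_data, second = demo()
print(first, socket_has_data, second)
```

This is why the availability check has to go: the reader, not the socket, knows whether another line is pending.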