From a client we receive an UPDATE message containing the following media line:
m=video 0 RTP/SAVP 0
Their aim is to close the video media, but it is causing us problems.
Is their message correct?
The usual format for closing a video stream, which we handle correctly, is:
m=video 0 RTP/AVP 96 97 98
This is correct. According to RFC 3264:
Existing media streams are removed by creating a new SDP with the
port number for that stream set to zero. The stream description MAY
omit all attributes present previously, and MAY list just a single
media format.
This makes sense, because when a participant doesn't want to have a video stream, it doesn't matter which formats it would be willing to use. m=video 0 RTP/SAVP 0 simply means "no video stream" or "terminate the video stream".
And there is no obligation that the media format in this case make sense. In the message you received, format 0 stands for PCMU, which is not even a video format.
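A handler only needs to look at the port to detect stream removal. Below is a minimal Go sketch of that check (my own illustration, not from the original exchange): any "m=" line whose port is zero describes a removed stream, whatever profile and formats follow.

package main

import (
    "fmt"
    "strings"
)

// streamRemoved reports whether an SDP media line describes a removed
// stream, i.e. its port is zero (RFC 3264, section 8.2). The media formats
// after the profile are irrelevant to this check. (The optional
// "/<number of ports>" suffix on the port is not handled in this sketch.)
func streamRemoved(mLine string) bool {
    // An m= line looks like: m=<media> <port> <proto> <fmt> ...
    fields := strings.Fields(strings.TrimPrefix(mLine, "m="))
    return len(fields) >= 2 && fields[1] == "0"
}

func main() {
    fmt.Println(streamRemoved("m=video 0 RTP/SAVP 0"))        // true: stream removed
    fmt.Println(streamRemoved("m=audio 35904 RTP/AVP 8 101")) // false: active stream
}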
If a call failing with SIP 480 after SIP 180 is a user behavior, is it the same for a call failing with SIP 480 after SIP 183?
Yes.
The difference between the 180 and 183 responses is that the 183 response typically includes an SDP payload which makes an offer to provide an audio progress indication (a fancy ringtone). As for the meaning of the 480 Temporarily Unavailable response, it is the same no matter which informational responses preceded it.
Imagine an offer SDP that has one "m=" line with codecs 8 and 101 for DTMF, marked as sendrecv:
m=audio 35904 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
The offer is answered by an SDP with one "m=" line containing codecs 8 and 120 for DTMF, similarly marked as sendrecv:
m=audio 1235 RTP/AVP 8 120
a=rtpmap:8 PCMA/8000
a=rtpmap:120 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
From RFC 3264:
For streams marked as sendrecv in the answer, the "m=" line MUST
contain at least one codec the answerer is willing to both send and
receive, from amongst those listed in the offer. The stream MAY
indicate additional media formats, not listed in the corresponding
stream in the offer, that the answerer is willing to send or
receive (of course, it will not be able to send them at this time,
since it was not listed in the offer).
This part of RFC 3264 shows that sending a different dynamic payload type for DTMF in the answer SDP (120 instead of 101) complies with RFC 3264, since codec 8 (G.711 A-law) matches the offer SDP.
Is it okay to say that the codec exchange completed successfully and DTMF will work, or is DTMF not expected to work at this point?
In general:
RTP payload type numbers 0-95 identify a static media encoding; e.g., payload type 8 means PCMA audio with a clock rate of 8000 Hz (RFC 3551). As such, this description doesn't have to (but should) be included in the media format description of the SDP offer/answer, using the "a=rtpmap:" and "a=fmtp:" attributes (RFC 4566).
Payload type numbers 96-127 are dynamic. These can be used to negotiate encodings that aren't included in the static list. When using one of these numbers, an encoding specification has to be included in the media format description to specify the exact encoding parameters.
Both negotiating parties can choose their own dynamic payload type number to represent the same media encoding; it doesn't have to be the same number. This can be useful when a party has already assigned a particular dynamic payload type number to another encoding. In your example one party uses 101 in the m-line and the other uses 120, but these numbers represent the same media encoding (see the "a=rtpmap:" lines). Each party tells the other: 'when you send RTP using encoding X, you must include payload type number Y in the RTP packet headers.'
The payload type number is carried in the PT field of the RTP packet header (RFC 3550).
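To make that concrete, here is a minimal Go sketch (my illustration; the byte layout is from RFC 3550) that reads the PT field out of a raw RTP packet. It is the low 7 bits of the second header byte, after the marker bit.

package main

import (
    "errors"
    "fmt"
)

// payloadType extracts the 7-bit PT field from an RTP packet. Per RFC 3550,
// byte 1 of the fixed header holds the marker bit (most significant bit)
// followed by the payload type.
func payloadType(packet []byte) (uint8, error) {
    if len(packet) < 12 { // the fixed RTP header is 12 bytes
        return 0, errors.New("packet too short for an RTP header")
    }
    return packet[1] & 0x7F, nil
}

func main() {
    // Hypothetical packet: version 2 (0x80), marker bit set, payload type 120.
    header := []byte{0x80, 0x80 | 120, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}
    pt, _ := payloadType(header)
    fmt.Println(pt) // prints 120, the answerer's telephone-event mapping
}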
In this case:
The "a=fmtp:" attribute in the answer specifies 101 as payload type number instead of 120. That means it doesn't apply to the telephone-events payload and no information is available as to which DTMF events are supported (RFC 4733). I think this is an implementation error and the fmtp attribute is meant to apply to the telephone-events payload.
It is an indication that you should expect DTMF issues. But it could also all work fine. Give it a try...
I want some level of real-time speech-to-text conversion. I am using the WebSocket interface with interim_results=true. However, I am receiving results for the first audio chunk only; the second, third, ... audio chunks that I am sending are not getting transcribed. I do know that my receiver is not blocked, since I do receive the inactivity message:
json {"error": "Session timed out due to inactivity after 30 seconds."}
Please let me know if I am missing something or if I need to provide more contextual information.
Just for reference this is my init json.
{
    "action": "start",
    "content-type": "audio/wav",
    "interim_results": true,
    "continuous": true,
    "inactivity_timeout": 10
}
In the result that I get for the first audio chunk, the "final" JSON field is always false.
Also, I am using Go, but that should not really matter.
EDIT:
Consider the following pseudo log
localhost-server receives first 4 seconds of binary data # let's say Binary 1
Binary 1 is sent to Watson
{interim_result_1 for first chunk}
{interim_result_2 for first chunk}
localhost-server receives last 4 seconds of binary data # let's say Binary 2
Binary 2 is sent to Watson
Send {"action": "stop"} to Watson
{interim_result_3 for first chunk}
final result for the first chunk
I am not receiving any transcription for the second chunk.
Link to code
You are getting the time-out message because the service waits for you to either send more audio or send a message signalling the end of the audio submission. Are you sending that message? It's very easy, in either of two ways (see the sketch below):
By sending a JSON text message with the action key set to the value stop: {"action": "stop"}
By sending an empty binary message
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/websockets.shtml
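For reference, a minimal Go sketch of that sequence, assuming the gorilla/websocket package and a placeholder URL (the real endpoint and authentication are described in the docs linked above):

package main

import (
    "log"

    "github.com/gorilla/websocket"
)

// sendAudio streams the audio chunks and then signals the end of the
// submission so the service returns final results instead of timing out.
func sendAudio(conn *websocket.Conn, chunks [][]byte) error {
    for _, chunk := range chunks {
        if err := conn.WriteMessage(websocket.BinaryMessage, chunk); err != nil {
            return err
        }
    }
    // Either of the two signals above works; here we use the JSON text
    // message. An empty binary message would do the same:
    // conn.WriteMessage(websocket.BinaryMessage, []byte{})
    return conn.WriteMessage(websocket.TextMessage, []byte(`{"action": "stop"}`))
}

func main() {
    // Placeholder URL; substitute the real recognize endpoint and credentials.
    conn, _, err := websocket.DefaultDialer.Dial("wss://example.invalid/v1/recognize", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    if err := sendAudio(conn, [][]byte{ /* audio chunks */ }); err != nil {
        log.Fatal(err)
    }
}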
Please let me know if this does not resolve your problem.
This is a bit late, but I've open-sourced a Go SDK for Watson services here:
https://github.com/liviosoares/go-watson-sdk
There is some documentation about speech-to-text binding here:
https://godoc.org/github.com/liviosoares/go-watson-sdk/watson/speech_to_text
There is also an example of streaming data to the API in the _test.go file:
https://github.com/liviosoares/go-watson-sdk/blob/master/watson/speech_to_text/speech_to_text_test.go
Perhaps this can help you.
The solution to this question was to set the size header of the WAV file to 0.
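In other words, the WAV length fields must not claim a fixed size when the audio is streamed. A minimal Go sketch of that fix, assuming a canonical 44-byte WAV header (the offsets are from the RIFF/WAV layout):

package main

import (
    "encoding/binary"
    "fmt"
)

// zeroWavSizes zeroes the size fields of a WAV header so a streaming
// consumer doesn't stop after a stale byte count: the RIFF chunk size lives
// at bytes 4-7 and, in a canonical 44-byte header, the data chunk size at
// bytes 40-43 (both little-endian).
func zeroWavSizes(header []byte) {
    if len(header) >= 8 {
        binary.LittleEndian.PutUint32(header[4:8], 0) // RIFF chunk size
    }
    if len(header) >= 44 {
        binary.LittleEndian.PutUint32(header[40:44], 0) // data chunk size
    }
}

func main() {
    header := make([]byte, 44)                        // hypothetical header, contents elided
    binary.LittleEndian.PutUint32(header[4:8], 12345) // stale total size
    zeroWavSizes(header)
    fmt.Println(binary.LittleEndian.Uint32(header[4:8])) // prints 0
}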
I'm working on an implementation of TCP for a class and I'm wondering what the Window Size field actually mean.
I understand that the window size is the number of bytes, but does that number of bytes apply to:
the payload of the TCP Segment, not including the header or to
the entire TCP Segment, including the header?
Thus far, I've looked at Wikipedia and the RFCs:
RFC 793 states that:
The window indicates an allowed number of octets that the sender may
transmit before receiving further permission.
RFC 2581 states that:
receiver's advertised window (rwnd) is a receiver-side limit on the
amount of outstanding data
Neither of these makes it particularly clear. Anyone?
It applies to the payload only. The sender can always transmit ACKs, FINs, RSTs, etc., with no payload.
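A worked example of that rule (the numbers are mine): with a 1000-byte advertised window and 600 unacknowledged payload bytes in flight, the sender may transmit 400 more payload bytes, regardless of how large the headers are. In RFC 793 terms the usable window is SND.UNA + SND.WND - SND.NXT; a minimal Go sketch:

package main

import "fmt"

// usableWindow returns how many more payload bytes the sender may transmit:
// SND.UNA + SND.WND - SND.NXT (RFC 793). Sequence numbers count payload
// octets (SYN and FIN also consume one each), never TCP or IP header bytes.
func usableWindow(sndUna, sndNxt, sndWnd uint32) uint32 {
    return sndUna + sndWnd - sndNxt
}

func main() {
    // 600 unacknowledged payload bytes in flight, 1000-byte advertised window:
    fmt.Println(usableWindow(1000, 1600, 1000)) // prints 400
}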
While testing a SIP video call, I am getting the media line information below in an answer to an offered media stream. Is this a valid media line, given that the media format number differs from the rtpmap number?
m=video 49218 RTP/AVP 109
b=TIAS:322000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42801f; max-mbps=216000; max-fs=3600; sar=13
a=sendonly
It's not a valid session description, but for a more subtle reason than Ralf's answer gives. A PT (payload type) of 109 falls in the dynamic range of the RTP/AVP profile defined in RFC 3551, which applies because of the RTP/AVP in the m-line. "Dynamic" means what it says: RTP/AVP defines a whole bunch of standard codecs (PCM mu-law, G.729, and so on) and also allows you to define your own PTs.
Here, the description says "we're going to use a custom PT of 109, and define another at 96, and forget to define what 109 means".
It's perfectly valid to define a bunch of rtpmap attributes and not use them; it's not valid to use a PT and then not define it!
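For illustration only (an assumption about what was intended), a self-consistent version of that answer would make the rtpmap and fmtp attributes reference the PT actually used in the m-line:
m=video 49218 RTP/AVP 109
b=TIAS:322000
a=rtpmap:109 H264/90000
a=fmtp:109 profile-level-id=42801f; max-mbps=216000; max-fs=3600; sar=13
a=sendonly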
I would say that it's an implementation bug since the rtpmap attribute is not referencing a payload format that has been specified in the media line, which effectively renders the attribute useless.
From RFC 4566:
a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
This attribute maps from an RTP payload type number (as used in
an "m=" line) to an encoding name denoting the payload format
to be used.