Limiting gstreamer pipeline throughput to simulate live source

I'm developing an RTSP server that should emulate a live source, while streaming the data from a file.
What I currently have is mostly based on gst-rtsp-server example test-readme.c, only with the following pipeline:
gst_rtsp_media_factory_set_launch(factory, "( "
"filesrc location=stream.mkv ! matroskademux name=demuxer "
"demuxer. ! queue ! rtph264pay name=pay0 pt=96 "
"demuxer. ! queue ! rtpmp4gpay name=pay1 pt=97 "
")");
This works very well, except for one problem: when the RTSP client (which uses RTSP/TCP interleave transport) is not able to receive data, the whole pipeline locks up until the client is ready again, and then resumes at the original position without any jump.
Since I want to emulate a live source, which cannot buffer its video indefinitely, the desired behavior in this case is to keep playing the file: if the client blocks for 5 seconds, it loses 5 seconds of the recording.
I've attempted to achieve this by limiting the queue sizes and making the queues leaky (queue max-size-bytes=1000000 max-size-time=1000000000 leaky=upstream, which should buffer up to ~1 second of video, but no more). This did not work as I hoped: the source and demuxer filled the queue and then completely emptied themselves in 0.1 sec.
I figured I need some way to throttle pipeline throughput before the queue, either by limiting the demuxer to real-time demuxing, or finding/making a gstreamer filter that will let through 1 second of data per 1 second of real time.
Do you have any hints on how to do this?

So it seems that while a leaky queue and a limiter can be done, they don't help much here, because the GStreamer RTSP implementation has its own queue for outgoing TCP data. What appears to work is keeping the pipeline unchanged and patching the gst-rtsp-server module to limit its queue length (to 1 MB in this case; recent versions also limit the message count to 100):
--- gst-rtsp-server-1.4.5/gst/rtsp-server/rtsp-client.c 2014-11-06 11:20:28.000000000 +0100
+++ gst-rtsp-server-1.4.5-r1/gst/rtsp-server/rtsp-client.c 2015-04-28 14:25:14.207888281 +0200
@@ -3435,11 +3435,11 @@
   gst_rtsp_client_set_send_func (client, do_send_message, priv->watch,
       (GDestroyNotify) gst_rtsp_watch_unref);
   /* FIXME make this configurable. We don't want to do this yet because it will
    * be superceeded by a cache object later */
-  gst_rtsp_watch_set_send_backlog (priv->watch, 0, 100);
+  gst_rtsp_watch_set_send_backlog (priv->watch, 1000000, 100);
   GST_INFO ("client %p: attaching to context %p", client, context);
   res = gst_rtsp_watch_attach (priv->watch, context);
   return res;
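
To make the effect of the two backlog limits easier to reason about, here is a small standalone model in C. It is only a sketch of the behaviour as I understand it from the gst_rtsp_watch_set_send_backlog documentation (a limit of 0 means "no limit", and once a limit is reached further writes are refused). The SendBacklog type and the backlog_* functions are hypothetical names made up for this illustration; they are not part of gst-rtsp-server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the watch's outgoing queue (not a GStreamer type). */
typedef struct {
    size_t max_bytes;       /* 0 means no byte limit */
    size_t max_messages;    /* 0 means no message limit */
    size_t queued_bytes;
    size_t queued_messages;
} SendBacklog;

/* The backlog is full as soon as either configured limit is reached. */
static bool backlog_is_full(const SendBacklog *b)
{
    return (b->max_bytes != 0 && b->queued_bytes >= b->max_bytes) ||
           (b->max_messages != 0 && b->queued_messages >= b->max_messages);
}

/* Try to queue one message of `size` bytes.
 * Returns false when the backlog is full, in which case the data is dropped
 * by the caller instead of stalling the pipeline. */
static bool backlog_try_push(SendBacklog *b, size_t size)
{
    assert(b != NULL);
    if (backlog_is_full(b))
        return false;
    b->queued_bytes += size;
    b->queued_messages += 1;
    return true;
}

/* Account for one message of `size` bytes that the socket has accepted. */
static void backlog_pop(SendBacklog *b, size_t size)
{
    assert(b != NULL && b->queued_messages > 0 && b->queued_bytes >= size);
    b->queued_bytes -= size;
    b->queued_messages -= 1;
}
```

With the patched values (1000000 bytes, 100 messages) and packets of about 1400 bytes, the message limit is reached first in this model; the byte limit only matters for larger messages.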

How to minimize latency when reading audio with ALSA?

When trying to acquire some signals in the frequency domain, I've encountered the issue of having snd_pcm_readi() take a wildly variable amount of time. This causes problems in the logic section of my code, which is time dependent.
Most of the time, snd_pcm_readi() returns after approximately 0.00003 to 0.00006 seconds. However, every 4th or 5th call to snd_pcm_readi() takes approximately 0.028 seconds. This is a huge difference, and it causes the logic part of my code to fail.
How can I get a consistent time for each call to snd_pcm_readi()?
I've tried to experiment with the period size, but it is unclear to me what exactly it does even after re-reading the documentation multiple times. I don't use an interrupt driven design, I simply call snd_pcm_readi() and it blocks until it returns -- with data.
I can only assume that the reason it blocks for a variable amount of time, is that snd_pcm_readi() pulls data from the hardware buffer, which happens to already have data readily available for transfer to the "application buffer" (which I'm maintaining). However, sometimes, there is additional work to do in kernel space or on the hardware side, hence the function call takes longer to return in these cases.
What purpose does the "period size" serve when I'm not using an interrupt driven design? Can my problem be fixed at all by manipulation of the period size, or should I do something else?
I want each call to snd_pcm_readi() to take approximately the same amount of time. I'm not asking for a real-time compliant API, which I don't imagine ALSA even attempts to be. However, a call that takes on the order of 500 times longer than usual (which is what I'm seeing!) is a real problem.
What can be done about it, and what should I do about it?
I would present a minimal reproducible example, but this isn't easy in my case.

Typically when reading and writing audio, the period size specifies how much data ALSA has reserved in DMA silicon. Normally the period size specifies your latency. So for example while you are filling a buffer for writing through DMA to the I2S silicon, one DMA buffer is already being written out.
If you have your period size too small, then the CPU doesn't have time to write audio out in the scheduled execution slot provided. Typically people aim for a minimum of 500 us or 1 ms in latency. If you are doing heavy forms of computation, then you may want to choose 5 ms or 10 ms of latency. You may choose even more latency if you are on a non-powerful embedded system.
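
Since ALSA expresses the period size in frames, the latency figures above translate into a period size through the sample rate. The following is a minimal sketch of that arithmetic; the function names are made up for illustration and the sample rates in the usage note are assumptions, since the question does not state one.

```c
#include <assert.h>
#include <stdint.h>

/* Number of frames in one period for a target latency, rounded up so the
 * period is never shorter than requested. */
static uint64_t period_frames_for_latency(uint32_t rate_hz, uint64_t latency_us)
{
    assert(rate_hz > 0);
    return ((uint64_t)rate_hz * latency_us + 999999u) / 1000000u;
}

/* Duration of one period in microseconds, rounded down. */
static uint64_t period_latency_us(uint32_t rate_hz, uint64_t frames)
{
    assert(rate_hz > 0);
    return frames * 1000000u / rate_hz;
}
```

For example, at an assumed 48 kHz a 1 ms period is 48 frames, and at an assumed 44.1 kHz a 10 ms period is 441 frames.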
If you want to push the limit of the system, then you can request the priority of the audio processing thread be increased. By increasing the priority of your thread, you ask the scheduler to process your audio thread before all other threads with lower priority.
One method for increasing priority, taken from the changeThreadPriority method of the gtkIOStream ALSA C++ OO classes, is:
/** Set the current thread's priority
\param priority <0 implies maximum priority, otherwise must be between sched_get_priority_max and sched_get_priority_min
\return 0 on success, error code otherwise
*/
static int changeThreadPriority(int priority){
    int ret;
    pthread_t thisThread = pthread_self(); // get the current thread
    struct sched_param origParams, params;
    int origPolicy, policy = SCHED_FIFO, newPolicy=0;

    if ((ret = pthread_getschedparam(thisThread, &origPolicy, &origParams))!=0)
        return ALSA::ALSADebug().evaluateError(ret, "when trying to pthread_getschedparam\n");
    printf("ALSA::Stream::changeThreadPriority : Current thread policy %d and priority %d\n", origPolicy, origParams.sched_priority);

    if (priority<0) //maximum priority
        params.sched_priority = sched_get_priority_max(policy);
    else
        params.sched_priority = priority;

    if (params.sched_priority>sched_get_priority_max(policy))
        return ALSA::ALSADebug().evaluateError(ALSA_SCHED_PRIORITY_ERROR, "requested priority is too high\n");
    if (params.sched_priority<sched_get_priority_min(policy))
        return ALSA::ALSADebug().evaluateError(ALSA_SCHED_PRIORITY_ERROR, "requested priority is too low\n");

    if ((ret = pthread_setschedparam(thisThread, policy, &params))!=0)
        return ALSA::ALSADebug().evaluateError(ret, "when trying to pthread_setschedparam - are you su or do you have permission to set this priority?\n");

    if ((ret = pthread_getschedparam(thisThread, &newPolicy, &params))!=0)
        return ALSA::ALSADebug().evaluateError(ret, "when trying to pthread_getschedparam\n");

    if(policy != newPolicy)
        return ALSA::ALSADebug().evaluateError(ALSA_SCHED_POLICY_ERROR, "requested scheduler policy is not correctly set\n");

    printf("ALSA::Stream::changeThreadPriority : New thread priority changed to %d\n", params.sched_priority);
    return 0;
}

What is the meaning of the CAN bus function mode initialization settings for STM32?

I want to understand the meaning of the following function mode definitions. There are explanations in the library, but they are very short and I don't understand them. I searched the net and couldn't find any information about them.
CAN_InitStructure.CAN_TTCM = DISABLE;
CAN_InitStructure.CAN_ABOM = DISABLE;
CAN_InitStructure.CAN_AWUM = DISABLE;
CAN_InitStructure.CAN_NART = ENABLE;
CAN_InitStructure.CAN_RFLM = DISABLE;
CAN_InitStructure.CAN_TXFP = ENABLE;

These are the names of the bits located in the CAN master control register (CAN_MCR). So, the proper source for their meaning is the reference manual. My following answer will be somewhat copy & paste from the reference manual, but I will try to explain these bits in detail.
TTCM (Time triggered communication mode): This bit activates the Time Triggered Communication (TTCAN) mode, which is an extension to the CAN standard. I don't know much about TTCAN, but as I understand, it assigns time windows to messages to satisfy some real-time requirements. So, normally this bit should remain 0.
ABOM (Automatic bus-off management): If the transmit error counter (TEC) becomes greater than 255, the CAN hardware switches to bus-off state. To recover, it must wait for the recovery sequence, 128 occurrences of 11 consecutive recessive bits. Only after that, the CAN hardware may return to the normal operating state. This bit controls the returning behavior. If it's 1, returning to normal state is automatic. Otherwise, software should make the request, provided that the recovery sequence has been observed.
AWUM (Automatic wakeup mode): The CAN module can be in one of 3 modes: Initialization mode, normal mode or sleep (low power) mode. Sleep mode is requested by the software. However, you have 2 options to exit sleep mode. If this bit is 0, then you have to exit sleep mode manually. You may enable CAN wakeup interrupt to inform you about bus activity, then exit the sleep mode in ISR. But if this bit is 1, the hardware returns to normal mode automatically when it detects bus activity.
NART (No automatic retransmission): Normally, the CAN hardware retries transmitting a message if previous attempts fail, for example because arbitration was lost. But if you set this bit to 1, the transmitter does not retry. This is required when you use Time Triggered Communication (TTCAN). Otherwise, you should keep this bit 0.
RFLM (Receive FIFO locked mode): Each receive FIFO is 3 messages deep, meaning that it can store a maximum of 3 messages before it overruns. This bit controls what happens on overrun. The default behavior is to keep the oldest 2 messages and the newest one. For example, if you received 5 messages, the buffer keeps messages 1, 2 & 5. However, if you set this bit to 1, the FIFO keeps messages 1, 2 & 3 and discards the new arrivals.
TXFP (Transmit FIFO priority): You have 3 transmit mailboxes. When you fill more than one, the hardware must decide which one to transmit first. Normally, one can assume that a message with a lower ID number is more important and should be transmitted first. But if you want to transfer them in a first-comes-first-served fashion for some reason, you need to make this bit 1. Of course, this is just a local priority. On the physical bus, the messages with lower ID always have priority.
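
The RFLM overrun behaviour described above is easy to get backwards, so here is a small software model of it in C. This is only a sketch of the behaviour as described in this answer, not driver code: the RxFifo type and fifo_receive function are hypothetical names, and messages are represented by plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 3

/* Software model of one 3-deep receive FIFO (hypothetical, for illustration). */
typedef struct {
    int  slots[FIFO_DEPTH];
    int  count;
    bool locked;   /* models RFLM = 1 */
    bool overrun;  /* set once a message arrives while the FIFO is full */
} RxFifo;

/* Store an incoming message, applying the overrun policy when full. */
static void fifo_receive(RxFifo *f, int msg)
{
    assert(f != NULL);
    if (f->count < FIFO_DEPTH) {
        f->slots[f->count++] = msg;
        return;
    }
    f->overrun = true;
    if (!f->locked) {
        /* RFLM = 0: the most recently stored message is overwritten. */
        f->slots[FIFO_DEPTH - 1] = msg;
    }
    /* RFLM = 1: the FIFO is locked, so the incoming message is discarded. */
}
```

Feeding messages 1 to 5 into this model leaves 1, 2, 5 in the unlocked case and 1, 2, 3 in the locked case, matching the example in the text.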

Triggering Interrupt for any byte received

I'm trying to get code working that triggers an interrupt for variable-size data arriving at the RX input of an STM32 board (not a Discovery) with DMA in circular mode, e.g. CONNECTED\r\nDATAREQUEST\r\n
So far so good: I'm able to receive data and also trigger the DMA interrupt.
I will then create an RX message processing sub-buffer, splitting at each \r\n into a separate char array pointer.
msgProcessingBuffer[0] = "COM_OK"
msgProcessingBuffer[1] = "DATAREQUEST"
msgProcessingBuffer[n] = "BlahBlahBlah"
My problem is with the triggering of the interrupt. I would like the interrupt to fire for any amount of data, so that I can process whatever was received.
If I use the interrupt request below:
HAL_UART_Receive_DMA(&huart1,uart1RxMsgBuffer, 30);
The input buffer must receive 30 bytes before the interrupt triggers, but that's too long to wait, because I would like to process the RX data as soon as a \r\n is found in the string. So I cannot wait for the full buffer to fill before I begin processing it.
If I use the interrupt request below:
HAL_UART_Receive_DMA(&huart1,uart1RxMsgBuffer, 1);
It will trigger as I want, but there is no point in using DMA in this case, because it will trigger the interrupt for every byte and will create a buffer of just 1 byte (duh), just like in "polling mode".
So my question is: how do I trigger the DMA interrupt on the first byte received but still receive and process all the data that might come after it in a single interrupt? I believe I might be missing some basic concept here.
Best regards,
Blukrr

In short: the HAL/SPL libraries don't provide such features.
Some MCUs, for example the STM32F091VCT6, have hardware support for Modbus and byte-flow analysis (an interrupt on reception of a given control byte). If you use such an MCU in your project, you can configure reception with circular DMA and get an interrupt when a '\r' or '\n' byte is received.
I repeat: HAL and SPL don't support these features; you can use them only by working with the registers directly (see the reference manuals).

I was looking at some other forums and found a workaround for this problem.
I'm using DMA in circular mode and I monitor the NDTR register, which updates its value every time a byte is received through the UART interface. Then I cyclically call a function (in the while(1) loop or in a cyclic interrupt handler) that breaks down each message part, always looking for the \r\n chars. This function also saves the current NDTR value so it can check whether it has changed since the last while(1) cycle. If the NDTR has changed since the last cycle, I wait a couple of milliseconds to receive the remaining message (the UART is slow to transmit) and then save the received messages in a char buffer array for post-processing.
If you create a circular DMA buffer of about 50 bytes (HAL_UART_Receive_DMA(&huart1,uart1RxMsgBuffer, 50)), I think that's enough to compensate for any fluctuations in the program cycle.
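
The NDTR polling idea can be sketched in plain C without any HAL calls. This is a hedged model, not tested on hardware: it assumes NDTR counts down from the buffer size and reloads in circular mode, so the DMA write position is the buffer size minus NDTR. The names dma_write_pos, dma_drain, LineBuf and line_feed are hypothetical helpers invented for this illustration, and the 64-byte line limit is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position the DMA will write next in a circular buffer of `size` bytes,
 * derived from the remaining-transfer counter (NDTR). */
static size_t dma_write_pos(size_t size, size_t ndtr)
{
    assert(size > 0 && ndtr <= size);
    return (size - ndtr) % size;
}

/* Copy the bytes received since *read_pos into `out`, following wraparound.
 * Advances *read_pos and returns the number of bytes copied. If the DMA has
 * lapped the reader between two polls, the overwritten data is lost. */
static size_t dma_drain(const uint8_t *ring, size_t size, size_t ndtr,
                        size_t *read_pos, uint8_t *out, size_t out_cap)
{
    size_t write_pos = dma_write_pos(size, ndtr);
    size_t n = 0;
    while (*read_pos != write_pos && n < out_cap) {
        out[n++] = ring[*read_pos];
        *read_pos = (*read_pos + 1) % size;
    }
    return n;
}

/* Accumulates bytes until a full line terminated by \r\n is seen. */
typedef struct {
    char   data[64];
    size_t len;
} LineBuf;

/* Feed one byte. Returns 1 when a complete line is available in lb->data
 * (NUL-terminated, terminator stripped); the caller must use it before
 * feeding the next byte. Overlong lines are silently restarted. */
static int line_feed(LineBuf *lb, uint8_t byte)
{
    if (lb->len >= sizeof lb->data - 1)
        lb->len = 0;
    lb->data[lb->len++] = (char)byte;
    if (lb->len >= 2 && lb->data[lb->len - 2] == '\r' && lb->data[lb->len - 1] == '\n') {
        lb->data[lb->len - 2] = '\0';
        lb->len = 0;
        return 1;
    }
    return 0;
}
```

On the target, the main loop would read the current NDTR value from the DMA stream, call dma_drain with it, and pass each drained byte to line_feed. Because the read position is tracked separately, a message that is split across two polls or across the end of the ring is still reassembled.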

In the meantime I opened a ticket with ST. They confirmed what you just said and also added:
SOLUTION PROPOSED BY SUPPORTER - 14/4/2016 16:45:22 :
Hi Gilberto,
The DMA interrupt requests available are listed on Table 50 of the Reference Manual, RM0090, http://www.st.com/web/en/resource/technical/document/reference_manual/DM00031020.pdf. Therefore, basically, the DMA interrupt can only trigger at the end of one of these events.
• Half-transfer reached
• Transfer complete
• Transfer error
• Fifo error (overrun, underrun or FIFO level error)
• Direct mode error
Getting a DMA interrupt to trigger upon reception of a specific character in your receive data stream is not possible. You may want to trigger the interrupt when you receive packets of say 30 bytes each and then process the datastring to check if your \r\n chars have arrived so you can process the data block.
Regards,
MCU Tech Support

libspotify C sending zeros at the end of track

I'm using libspotify SDK, C library for win32.
I think my setup is right: every session callback is registered. I don't understand why I never receive the end_of_track call, while music_delivery keeps being called with zero-padded deliveries 22050 frames long.
To start playing, I first load the track with sp_session_load; as long as it returns SP_ERROR_IS_LOADING, I post a message on my message queue (the synchronization method I use, via the PostMessage win32 API) in order to retry the same sp_session_load call. As soon as it returns SP_ERROR_OK I call sp_session_play, and music_delivery starts immediately, with correct frames.
I don't know why, at the end of the track, the libspotify runtime then starts sending zero-padded frames instead of calling the end_of_track callback.
In other conditions it works perfectly: when I use an sp_track obtained from an album browse, the track is already fully loaded at the moment I load it into the current session for playing, and with this track end_of_track is called correctly. In the case with the padding error, I search for the track using its Spotify URI and get the results; in this case the track metadata is not yet ready at the play attempt, so I use that kind of "polling" on sp_session_load with PostMessage.
Can anybody help me?

I ran into the same problem, and I think the issue was that I was consuming the data too fast without giving other threads time to do any work, since I was spending all of my time in the music_delivery callback. I found that if I add some throttling and notify the main thread that it can wake up to do some processing, the extra zeros at the end of the track are reduced to one delivery of 22,050 frames (or 500ms at 44.1kHz).
Here is an example of what I added to my callback, heavily borrowed from the jukebox.c example provided with the SDK:
/* Buffer 1 second of data, then notify the main thread to do some processing */
if (g_throttle > format->sample_rate) {
    pthread_mutex_lock(&g_notify_mutex);
    g_notify_do = 1;
    pthread_cond_signal(&g_notify_cond);
    pthread_mutex_unlock(&g_notify_mutex);

    // Reset the throttle counter
    g_throttle = 0;
    return 0;
}
As I said, there was still 22,050 frames of zeros delivered before the track stopped, but I believe libspotify may purposely do this to ensure that the duration calculated by the number of frames received (song_duration_ms = total_frames_delivered / sample_rate * 1000) is greater than or equal to the duration reported by sp_track_duration. In my case, the track I was trying to stream was 172,000ms in duration, without the extra padding the duration calculated is 171,796ms, but with the padding it was 172,296ms.
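
A small sketch of the duration arithmetic, in case it helps when checking these numbers. The function name frames_to_ms is made up for this example. It multiplies before dividing, because the formula as written above would truncate to whole seconds if evaluated left to right in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a delivered frame count to milliseconds, rounded down.
 * Multiplying first keeps sub-second precision in integer arithmetic. */
static uint64_t frames_to_ms(uint64_t total_frames, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return total_frames * 1000u / sample_rate;
}
```

At 44.1 kHz, one padding delivery of 22,050 frames comes out as 500 ms, which is the difference between the two calculated durations quoted above (171,796 ms and 172,296 ms).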
Hope this helps.

Weird Winsock recv() slowdown

I'm writing a little VOIP app like Skype, which works quite well right now, but I've run into a very strange problem.
In one thread, I call the winsock recv() function twice per iteration of a while(true) loop to get data from a socket.
The first call gets 2 bytes, which are cast to a (short), while the second call gets the rest of the message, which looks like:
Complete Message: [2 Byte Header | Message, length determined by the 2Byte Header]
These packets arrive at roughly 49/sec, which is roughly 3000 bytes/sec.
The content of these packets is audio data that gets converted into wave.
With ioctlsocket() I determine whether there is data left on the socket after each "message" I receive (2 bytes + data). If there is something on the socket right after I receive a message within the while(true) loop of the thread, the message is received but thrown away, to counteract accumulating latency.
This concept works very well, but here's the problem:
While my VOIP program is running and I download a file in parallel (e.g. via a browser), too much data always piles up on the socket, because the recv() loop actually seems to slow down during the download. This happens in every download/upload situation besides the actual VOIP up/download.
I don't know where this behaviour comes from, but when I cancel every up/download besides the VOIP traffic of my application, my app works perfectly again.
When the program runs perfectly, the ioctlsocket() function writes 0 into the bytesLeft var, which is defined in the class that the receive function belongs to.
Does somebody know where this comes from? I'll attach my receive function below:
std::string D_SOCKETS::receive_message(){
    recv(ClientSocket,(char*)&val,sizeof(val),MSG_WAITALL);
    receivedBytes = recv(ClientSocket,buffer,val,MSG_WAITALL);
    if (receivedBytes != val){
        printf("SHORT: %d PAKET: %d ERROR: %d",val,receivedBytes,WSAGetLastError());
        exit(128);
    }
    ioctlsocket(ClientSocket,FIONREAD,&bytesLeft);
    cout<<"Bytes left on the Socket:"<<bytesLeft<<endl;
    if(bytesLeft>20)
    {
        // message gets received, but ignored/thrown away to throw away
        return std::string();
    }
    else
        return std::string(buffer,receivedBytes);
}

There is no need to use ioctlsocket() to discard data. That would indicate a bug in your protocol design. Assuming you are using TCP (you did not say), there should not be any left over data if your 2byte header is always accurate. After reading the 2byte header and then reading the specified number of bytes, the next bytes you receive after that constitute your next message and should not be discarded simply because it exists.
The fact that ioctlsocket() reports more bytes available means that you are receiving messages faster than you are reading them from the socket. Make your reading code run faster, don't throw away good data due to your slowness.
Your reading model is not efficient. Instead of reading 2 bytes, then X bytes, then 2 bytes, and so on, you should use a larger buffer to read more raw data from the socket at one time (use ioctlsocket() to find out how many bytes are available, then read at least that many bytes at once and append them to the end of your buffer), and then parse as many complete messages as are in the buffer before reading more raw data from the socket. The more data you can read at a time, the faster you can receive data.
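
The parsing half of that approach can be sketched independently of Winsock. The following is a minimal illustration in C, under stated assumptions: the 2-byte header is read in host byte order (matching the cast in the question), and parse_messages, message_cb, Stats and count_message are hypothetical names invented for this example. The socket reads themselves are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Called once per complete message; `user` is passed through unchanged. */
typedef void (*message_cb)(const uint8_t *payload, uint16_t len, void *user);

/* Parse every complete [2-byte length | payload] message in buf[0..len).
 * Returns the number of bytes consumed. The caller moves the unconsumed
 * tail (a partial message) to the front of its buffer and appends the
 * result of the next recv() behind it. */
static size_t parse_messages(const uint8_t *buf, size_t len,
                             message_cb cb, void *user)
{
    size_t pos = 0;
    assert(buf != NULL && cb != NULL);
    while (len - pos >= 2) {
        uint16_t msg_len;
        /* Assumption: header is in host byte order, as in the question. */
        memcpy(&msg_len, buf + pos, sizeof msg_len);
        if (len - pos - 2 < msg_len)
            break;  /* payload not fully received yet */
        cb(buf + pos + 2, msg_len, user);
        pos += 2 + (size_t)msg_len;
    }
    return pos;
}

/* Example callback: counts messages and payload bytes. */
typedef struct {
    size_t messages;
    size_t payload_bytes;
} Stats;

static void count_message(const uint8_t *payload, uint16_t len, void *user)
{
    Stats *s = user;
    (void)payload;
    s->messages += 1;
    s->payload_bytes += len;
}
```

A receive loop built on this would call recv() once with as much free buffer space as it has, run parse_messages over the accumulated bytes, and then memmove the unconsumed remainder to the start of the buffer before reading again.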
To help speed up the code even more, don't process the messages inside the loop directly, either. Do the processing in another thread instead. Have the reading loop put complete messages in a queue and go back to reading, and then have a processing thread pull from the queue whenever messages are available for processing.
