Play sounds synchronously using snd_pcm_writei

I need to play sounds upon certain events, and I want to minimize processor load, because some image processing is being done too and processor performance is limited.
For the present, I play only one sound at a time, and I do it as follows:
At program startup, the sounds are read from .wav files and the raw PCM data are loaded into memory.
A sound device is opened (snd_pcm_open() in mode SND_PCM_NONBLOCK).
A worker thread is started which continuously calls snd_pcm_writei() as long as it is fed with data (data->remaining > 0).
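In code, the device setup is roughly the following (a condensed sketch, not my exact code; the format and rate values are application-specific assumptions):

#include <alsa/asoundlib.h>

snd_pcm_t *pcm;
int err = snd_pcm_open (&pcm, "default", SND_PCM_STREAM_PLAYBACK, SND_PCM_NONBLOCK);
if (err < 0)
    fprintf (stderr, "snd_pcm_open: %s\n", snd_strerror (err));
/* 2 channels, 44.1 kHz, S16_LE, resampling allowed, 500 ms maximum latency */
err = snd_pcm_set_params (pcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                          2, 44100, 1, 500000);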
Somewhat condensed, the worker thread function is:
static void *Thread_Func (void *arg)
{
    thrdata_t *data = (thrdata_t *)arg;
    snd_pcm_sframes_t res;

    while (1)
    {
        pthread_mutex_lock (&lock);
        if (data->shall_stop)
        {
            data->shall_stop = false;
            snd_pcm_drop (data->pcm_device);
            snd_pcm_prepare (data->pcm_device);
            data->remaining = 0;
        }
        if (data->remaining > 0)
        {
            res = snd_pcm_writei (data->pcm_device, data->bufptr, data->remaining);
            if (res == -EAGAIN)
            {   /* device not ready yet; release the lock before retrying,
                   otherwise the next iteration would deadlock on it */
                pthread_mutex_unlock (&lock);
                usleep (sleep_us);
                continue;
            }
            if (res < 0) // error
            {
                fprintf (stderr, "snd_pcm_writei() error: %s\n", snd_strerror (res));
                snd_pcm_recover (data->pcm_device, res, 0);
            }
            else // another chunk has been handed over to the sound hardware
            {
                data->bufptr += res * bytes_per_frame;
                data->remaining -= res;
            }
            if (data->remaining == 0)
                snd_pcm_prepare (data->pcm_device);
        }
        pthread_mutex_unlock (&lock);
        usleep (sleep_us); // processor relief
    }
} // Thread_Func
OK, so this works well for one sound at a time. How do I play several simultaneously?
I found dmix, but it seems to be a user-level tool for mixing streams that come from separate programs.
Furthermore, I found the Simple Mixer Interface in the ALSA Project C Library Interface, but without any hint, example, or tutorial about how to use all these functions, each of which is described by a single line of text.
As a last resort I could mix the sounds myself, e.g. by calculating the mean value of all the buffers to be played simultaneously. So far I've been avoiding that, hoping that an ALSA solution might use sound hardware resources and thus relieve the main processor.
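For reference, such a software mix is only a few lines. A minimal sketch, assuming interleaved S16_LE data and using saturating addition instead of the mean (averaging halves the volume whenever only one sound is active):

#include <stddef.h>
#include <stdint.h>

/* Mix two signed 16-bit PCM buffers into 'out' with saturation.
   Purely illustrative; the names are not part of any ALSA API. */
static void mix_s16 (const int16_t *a, const int16_t *b, int16_t *out, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++)
    {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];  /* widen to avoid overflow */
        if (s > INT16_MAX) s = INT16_MAX;           /* clip instead of wrapping */
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
}

The mixed buffer could then be fed to the existing snd_pcm_writei() loop unchanged.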
I'd be thankful for any hint about how to continue.

Related

UDP server consuming high CPU

I am observing high CPU usage in my UDP server implementation, which runs an infinite loop expecting fifteen 1.5 KB packets every millisecond. It looks like this:
struct RecvContext
{
    enum { BufferSize = 1600 };

    RecvContext()
    {
        senderSockAddrLen = sizeof(sockaddr_storage);
        memset(&overlapped, 0, sizeof(OVERLAPPED));
        overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        memset(&sendersSockAddr, 0, sizeof(sockaddr_storage));
        buffer.clear();
        buffer.resize(BufferSize);
        wsabuf.buf = (char*)buffer.data();
        wsabuf.len = ULONG(buffer.size());
    }

    void CloseEventHandle()
    {
        if (overlapped.hEvent != INVALID_HANDLE_VALUE)
        {
            CloseHandle(overlapped.hEvent);
            overlapped.hEvent = INVALID_HANDLE_VALUE;
        }
    }

    OVERLAPPED overlapped;
    int senderSockAddrLen;
    sockaddr_storage sendersSockAddr;
    std::vector<uint8_t> buffer;
    WSABUF wsabuf;
};
void Receive()
{
    DWORD flags = 0, bytesRecv = 0;
    SOCKET sockHandle = ...;

    while (/* stopping condition */)
    {
        std::shared_ptr<RecvContext> _recvContext = std::make_shared<IO::RecvContext>();
        if (SOCKET_ERROR == WSARecvFrom(sockHandle, &_recvContext->wsabuf, 1, nullptr, &flags,
                                        (sockaddr*)&_recvContext->sendersSockAddr,
                                        (LPINT)&_recvContext->senderSockAddrLen,
                                        &_recvContext->overlapped, nullptr))
        {
            if (WSAGetLastError() != WSA_IO_PENDING)
            {
                // error
            }
            else
            {
                if (WSA_WAIT_FAILED == WSAWaitForMultipleEvents(1, &_recvContext->overlapped.hEvent,
                                                                FALSE, INFINITE, FALSE))
                {
                    // error
                }
                if (!WSAGetOverlappedResult(sockHandle, &_recvContext->overlapped, &bytesRecv, FALSE, &flags))
                {
                    // error
                }
            }
        }
        _recvContext->CloseEventHandle();
        // async task to process _recvContext->buffer
    }
}
The CPU consumption of this UDP server is very high, even when the packets are not being processed after receipt. How can the CPU consumption be reduced?
You've chosen about the most inefficient combination of mechanisms imaginable.
Why use overlapped I/O if you're only going to pend one operation and then wait for it to complete?
Why use an event, which is about the slowest notification scheme Windows has?
Why do you only pend one operation at a time? You're forcing the implementation to stash datagrams in its own buffers and then copy them into yours.
Why do you post the receive operation right before you're going to wait for it to complete rather than right after the previous one completes?
Why do you create a new receive context each time instead of re-using the existing buffer, event, and so on?
Use IOCP. Windows events are very slow and heavy. (A minimal sketch of this approach follows after these suggestions.)
Post lots of operations. You want the operating system to be able to put the datagram right in your buffer rather than having to allocate another buffer that it copies data into and out of.
Re-use your buffers and allocate all your receive buffers from a contiguous pool rather than fragmenting them throughout process memory. The memory used for your buffers has to be pinned and you want to minimize the amount of pinning needed.
Re-post operations as soon as they complete. Don't process them and then re-post. There's no reason to delay starting the operation. You can probably ignore this if you followed all the other suggestions because you wouldn't have a "spare" buffer to post anyway.
Alternatively, you can probably get away with having a thread that spins on a blocking receive operation. Just make sure your code has a loop that is as tight as possible, posting a different (already-allocated) buffer as soon as it returns after dispatching another thread to process the buffer it just filled with the receive operation.
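Here is a minimal sketch of the combination described above: an I/O completion port, many pending receives, and one contiguous pool of reused buffers. It is illustrative only; WSAStartup, socket creation, binding, shutdown, and real error handling are omitted, and the buffer count is arbitrary.

#include <winsock2.h>
#include <ws2tcpip.h>

#define NUM_PENDING 32
#define BUF_SIZE    1600

struct RecvOp
{
    OVERLAPPED overlapped;   // kept as the first member so an OVERLAPPED*
                             // from the completion port casts back to RecvOp*
    WSABUF wsabuf;
    sockaddr_storage from;
    INT fromLen;
    char buffer[BUF_SIZE];
};

static void PostRecv(SOCKET s, RecvOp* op)
{
    DWORD flags = 0;
    memset(&op->overlapped, 0, sizeof(op->overlapped));
    op->wsabuf.buf = op->buffer;
    op->wsabuf.len = BUF_SIZE;
    op->fromLen = sizeof(op->from);
    // No event and no completion routine: the completion is delivered
    // to the IOCP associated with the socket.
    if (SOCKET_ERROR == WSARecvFrom(s, &op->wsabuf, 1, nullptr, &flags,
                                    (sockaddr*)&op->from, &op->fromLen,
                                    &op->overlapped, nullptr)
        && WSAGetLastError() != WSA_IO_PENDING)
    {
        // error
    }
}

void ReceiveLoop(SOCKET s)
{
    // Create a completion port and associate the socket with it.
    HANDLE iocp = CreateIoCompletionPort((HANDLE)s, NULL, 0, 0);
    static RecvOp ops[NUM_PENDING];  // contiguous pool, reused for the whole run
    for (int i = 0; i < NUM_PENDING; ++i)
        PostRecv(s, &ops[i]);

    for (;;)
    {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* pov = nullptr;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &pov, INFINITE))
            continue;                // failed or timed-out I/O; real code inspects pov
        RecvOp* op = (RecvOp*)pov;
        // Process (or copy out) op->buffer / bytes here, then return the slot:
        PostRecv(s, op);
    }
}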

STM32 HAL UART receive by interrupt cleaning buffer

I'm working on an application where I process fixed-length commands received via UART.
I'm also using FreeRTOS, and the task that handles the incoming commands is suspended until the UART interrupt handler is called, so my code looks like this:
void USART1_IRQHandler()
{
    HAL_UART_IRQHandler(&huart1);
}

void HAL_UART_ErrorCallback(UART_HandleTypeDef *huart)
{
    HAL_UART_Receive_IT(&huart1, uart_rx_buf, CMD_LEN);
}

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    BaseType_t higherTaskReady = pdFALSE;
    HAL_UART_Receive_IT(&huart1, uart_rx_buf, CMD_LEN);  // re-arm the receive interrupt
    xSemaphoreGiveFromISR(uart_mutex, &higherTaskReady); // release the semaphore
    portYIELD_FROM_ISR(higherTaskReady); // switch context if a higher-priority task was woken
}
I am using the ErrorCallback in case an overflow occurs. Now I successfully catch every correct command, even if it is issued char by char.
However, I'm trying to make the system more error-proof by considering the case where more characters are received than expected.
The command length is 4, but if I receive, for example, 5 chars, the first 4 are processed normally, while the next command starts from the last unprocessed char, so another 3 chars are needed before I can correctly process commands again.
Luckily, the ErrorCallback is called whenever I receive more than 4 chars, so I know when it happens, but I need a robust way of cleaning the UART buffer so the stale chars are gone.
One solution I can think of is receiving 1 char at a time until nothing more can be received, but is there a better way to simply flush the buffer?
Yes, the problem is the lack of a delimiter, because every byte can carry a value to be processed, from 0 to 255. So how can you detect the inconsistency?
My solution is a checksum byte in the protocol. If the checksum fails, a blocking-mode HAL_UART_Receive call is used to move the rest of the data from the "system buffer" into a "disposable buffer". In my example the fixed size of the protocol is 6, I use UART6, and I have a global variable RxBuffer. Here is the code:
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *UartHandle)
{
    if (UartHandle->Instance == USART6) {
        if (your_checksum_is_ok) {
            // You can process the incoming data
        } else {
            char TempBuffer;
            HAL_StatusTypeDef hal_status;
            do {
                // Drain stale bytes into a throwaway buffer until the line is idle
                hal_status = HAL_UART_Receive(&huart6, (uint8_t*)&TempBuffer, 1, 10);
            } while (hal_status != HAL_TIMEOUT);
        }
        HAL_UART_Receive_IT(&huart6, (uint8_t*)RxBuffer, 6);
    }
}
void HAL_UART_ErrorCallback(UART_HandleTypeDef *UartHandle)
{
    if (UartHandle->Instance == USART6) {
        HAL_UART_Receive_IT(&huart6, (uint8_t*)RxBuffer, 6);
    }
}
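For completeness, your_checksum_is_ok could be implemented along these lines, written here as a function over RxBuffer. The XOR scheme and the frame layout (5 payload bytes plus 1 checksum byte) are assumptions, since the actual protocol is not specified:

/* Hypothetical check for the 6-byte frame above: the last byte holds
   the XOR of the first five. Call as your_checksum_is_ok(RxBuffer). */
static int your_checksum_is_ok(const uint8_t *frame)
{
    uint8_t x = 0;
    for (int i = 0; i < 5; i++)
        x ^= frame[i];
    return x == frame[5];
}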

Socket read often returns -1 while the buffer is not empty

I am trying to test WiFi data transfer between a cell phone and an ESP32 (Arduino). When the ESP32 reads file data via WiFi, client.read() often returns -1 even though there is still data to come, so I have to add extra conditions to check whether reading has finished.
My question is why there are so many failed reads. Any ideas are highly appreciated.
void setup()
{
    i = 0;
    Serial.begin(115200);
    Serial.println("begin...");
    // You can remove the password parameter if you want the AP to be open.
    WiFi.softAP(ssid, password);
    IPAddress myIP = WiFi.softAPIP();
    Serial.print("AP IP address: ");
    Serial.println(myIP);
    server.begin();
    Serial.println("Server started");
}

// the loop function runs over and over again until power down or reset
void loop()
{
    WiFiClient client = server.available();    // listen for incoming clients
    if (client)                                // if you get a client,
    {
        Serial.println("New Client.");         // print a message out the serial port
        Serial.println(client.remoteIP().toString());
        while (client.connected())             // loop while the client's connected
        {
            while (client.available() > 0)     // if there are bytes to read from the client,
            {
                char c = client.read();        // read a byte, then
                if (DOWNLOADFILE == c) {
                    pretime = millis();
                    uint8_t filename[32] = {0};
                    uint8_t bFilesize[8];
                    long filesize;
                    int segment = 0;
                    int remainder = 0;
                    uint8_t data[512];
                    int len = 0;
                    int totallen = 0;
                    delay(50);
                    len = client.read(filename, 32);
                    delay(50);
                    len = client.read(bFilesize, 8);
                    filesize = BytesToLong(bFilesize);
                    segment = (int)filesize / 512;
                    delay(50);
                    i = 0; // succeed times
                    j = 0; // fail times
                    ////////////////////////////////////////////////////////////////
                    // The problem occurs here: too many "-1" return values.
                    // Total read 24941639 bytes, succeeded 49725 times, failed 278348 times.
                    // If there were no read problems, it should only read 48,715 times and finish.
                    // But it read 328,073 times in total, including 278,348 failed times,
                    // wasting too much time.
                    while (((len = client.read(data, 512)) != -1) || (totallen < filesize))
                    {
                        if (len > -1) {
                            totallen += len;
                            i++;
                        }
                        else {
                            j++;
                        }
                    }
                    // loop read ends: too many failed reads ////////////////////////
                    sprintf(toClient, "\nfile name %s, size %ld, total read %d, segment %d, succeed %d times, failed %d times\n",
                            filename, filesize, totallen, segment, i, j);
                    Serial.write(toClient);
                    curtime = millis();
                    sprintf(toClient, "time elapsed %d ms, speed %ld Bps\n",
                            curtime - pretime, filesize * 1000 / (curtime - pretime));
                    Serial.write(toClient);
                    client.write(RETSUCCESS);
                }
                else
                {
                    Serial.write("Unknown command\n");
                }
            }
        }
        // close the connection:
        client.stop();
        Serial.println("Client Disconnected.");
    }
}
When you call available() and check for > 0, you are checking whether one or more characters are available to read. It will be true if just one character has arrived. You read one character, which is fine, but then you start reading more without stopping to see if more are available.
TCP doesn't guarantee that if you write 100 characters to a socket they all arrive at once. They can arrive in arbitrary "chunks" with arbitrary delays. All that's guaranteed is that they will eventually arrive in order (or, if that's not possible because of networking issues, the connection will fail).
In the absence of a blocking read function (I don't know if those exist) you have to do something like what you are doing. You have to read one character at a time and append it to a buffer, gracefully handling the possibility of getting a -1 (the next character isn't here yet, or the connection broke). In general you never want to read multiple characters in a single read(buf, len) unless you've just used available() to make sure len characters are actually available. And even that can fail if your buffers are really large. Stick to one character at a time.
It's a reasonable idea to call delay(1) when available() returns 0. In the places where you try to guess at something like delay(20) before reading a buffer you are rolling the dice - there's no promise that any amount of delay will guarantee bytes get delivered. Example: Maybe a drop of water fell on the chip's antenna and it won't work until the drop evaporates. Data could be delayed for minutes.
I don't know how available() behaves if the connection fails. You might have to do a read() and get back a -1 to diagnose a failed connection. The Arduino documentation is absolutely horrible, so you'll have to experiment.
TCP is much simpler to handle on platforms that have threads, blocking read, select() and other tools to manage data. Having only non-blocking read makes things harder, but there it is.
In some situations UDP is actually a lot simpler - there are more guarantees about getting messages of certain sizes in a single chunk. But of course whole messages can go missing or show up out of order. It's a trade-off.
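To illustrate the one-character-at-a-time pattern described above, here is a rough sketch (untested; the helper name and timeout are illustrative) that reads an exact number of bytes while tolerating -1 returns:

// Read exactly 'want' bytes into 'buf', one byte at a time.
// Returns the number of bytes actually read (less than 'want' on
// timeout or disconnect), so the caller can check for success.
size_t readFully(WiFiClient &client, uint8_t *buf, size_t want, unsigned long timeoutMs)
{
    size_t got = 0;
    unsigned long start = millis();
    while (got < want && client.connected() && (millis() - start) < timeoutMs)
    {
        if (client.available() > 0)
        {
            int c = client.read();      // one byte; -1 means "not here yet"
            if (c >= 0)
                buf[got++] = (uint8_t)c;
        }
        else
        {
            delay(1);                   // yield instead of hammering the WiFi stack
        }
    }
    return got;
}

The fixed delay(50) guesses before client.read(filename, 32) and client.read(bFilesize, 8) in the question could then be replaced by calls like readFully(client, filename, 32, 5000).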

How to perform I/O on a block device from a block device driver under Linux

I have a task to write a block device driver (/dev/dua, for example) that must look to the OS like a disk device such as /dev/sda. This driver must process data blocks and write them to another block device.
I am looking for the right way to perform I/O operations on the backend device, e.g. "/dev/sdb".
I have played with the vfs_read/write routines, and at first glance they work for disk-sector-sized transfers. But is there perhaps a more efficient way to perform I/O on the backend device?
TIA.
The following piece of code (originally found here: https://github.com/asimkadav/block-filter) implements a "filtering" feature, so it can be used as a way to perform I/O on a backend block device:
void misc_request_fn(struct request_queue *q, struct bio *bio)
{
    printk("we are passing bios.\n");
    // here is where we trace requests...
    original_request_fn(q, bio);
    return;
}

void register_block_device(char *path)
{
    struct request_queue *blkdev_queue = NULL;
    if (path == NULL) {
        printk("Block device empty.\n");
        return;
    }
    printk("Will open %s.\n", path);
    blkdev = lookup_bdev(path);
    if (IS_ERR(blkdev)) {
        printk("No such block device.\n");
        return;
    }
    printk("Found block device %p with bs %d.\n", blkdev, blkdev->bd_block_size);
    blkdev_queue = bdev_get_queue(blkdev);
    original_request_fn = blkdev_queue->request_fn;
    blkdev_queue->request_fn = misc_request_fn;
}
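For context, register_block_device() would be called once, e.g. from the module's init function. A hypothetical sketch:

#include <linux/module.h>

static int __init misc_filter_init(void)
{
    register_block_device("/dev/sdb");  /* attach the filter to the backend device */
    return 0;
}
module_init(misc_filter_init);
MODULE_LICENSE("GPL");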

Why would alSourceUnqueueBuffers fail with INVALID_OPERATION

Here's the code:
ALint cProcessedBuffers = 0;
ALenum alError = AL_NO_ERROR;
alGetSourcei(m_OpenALSourceId, AL_BUFFERS_PROCESSED, &cProcessedBuffers);
if ((alError = alGetError()) != AL_NO_ERROR)
{
    throw "AudioClip::ProcessPlayedBuffers - error returned from alGetSourcei()";
}
alError = AL_NO_ERROR;
if (cProcessedBuffers > 0)
{
    alSourceUnqueueBuffers(m_OpenALSourceId, cProcessedBuffers, arrBuffers);
    if ((alError = alGetError()) != AL_NO_ERROR)
    {
        throw "AudioClip::ProcessPlayedBuffers - error returned from alSourceUnqueueBuffers()";
    }
}
The call to alGetSourcei returns with cProcessedBuffers > 0, but the following call to alSourceUnqueueBuffers fails with an INVALID_OPERATION. This is an erratic error that does not always occur. The program containing this sample code is a single-threaded app running in a tight loop (it would typically be synced with a display loop, but in this case I'm not using a timed callback of any sort).
Try alSourceStop(m_OpenALSourceId) first.
Then alSourceUnqueueBuffers(), and after that, restart playing with alSourcePlay(m_OpenALSourceId).
I solved the same problem this way, but I don't know why it has to be done like that.
As mentioned in this SO thread:
If you have AL_LOOPING enabled on a streaming source, the unqueue operation will fail.
The looping flag holds some sort of lock on the buffers while enabled. The answer by @MyMiracle hints at this as well: stopping the sound releases that hold, but it's not necessary.
AL_LOOPING is not meant to be set on a streaming source, as you manage the source data in the queue yourself. Keep queuing and it will keep playing. Queue from the beginning of the data and it will loop.
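A minimal sketch of that approach, with AL_LOOPING left at its default AL_FALSE; fill_from_clip_start() stands in for whatever refills a buffer, wrapping to the start of the clip data:

ALint processed = 0;
alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
while (processed-- > 0)
{
    ALuint buf;
    alSourceUnqueueBuffers(source, 1, &buf);  // succeeds: no AL_LOOPING hold on the queue
    fill_from_clip_start(buf);                // hypothetical refill helper
    alSourceQueueBuffers(source, 1, &buf);    // re-queue so playback loops seamlessly
}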