PyAudio/PortAudio issue with synchronizing read/write buffers - latency

My sound card (Sound Blaster) samples at 96000 Hz.
I need to synchronize the output signal with the input signal, so the time_info is critical for us: we need to know the exact time difference between the first sample of the output buffer and the first sample of the input buffer.
However, the problem is that the time_info gap between two consecutive write or read buffers (callbacks) is not constant. For example:
My buffer size is 19200 frames for both read and write, and when I print the time gap in each callback I get:
Time gap between READ buffers: 0.19993333332968177274
Time gap between WRITE buffers: 0.20000299999810522422
Time gap between READ buffers: 0.20001300000149058178
Time gap between WRITE buffers: 0.20000299999810522422
Time gap between READ buffers: 0.20000900000013643876
Time gap between WRITE buffers: 0.20000099999742815271
Time gap between READ buffers: 0.19996774999162880704
Time gap between WRITE buffers: 0.19999800000368850306
19200 / 96000 should always be 0.2 seconds, but the time_info values vary, which defeats my ability to sync the output with the input.
I am working with a 40 kHz sound wave, so in order to sync the phase I need the times to be accurate to within 1 microsecond at most.
Is this a problem in PortAudio? Is this a problem in my sound card? Do these time_info numbers come from the sound card (the hardware) or from PortAudio?
I am using PyAudio (a PortAudio binding for Python) on Ubuntu.
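For reference, a minimal sketch of where those numbers come from on the Python side: the time_info dict is filled in by PortAudio from the host API's timestamps, which on ALSA typically come from a system clock rather than directly from the converters' sample clock, so a few hundred microseconds of jitter is not unusual. The pass-through return value and the 19200-frame buffer below are illustrative assumptions, not a recommended setup:

import time
import pyaudio

RATE = 96000
FRAMES = 19200  # 0.2 s per buffer, as in the question

pa = pyaudio.PyAudio()
last_adc = None

def callback(in_data, frame_count, time_info, status):
    # time_info is PortAudio's PaStreamCallbackTimeInfo; on Linux/ALSA these
    # timestamps are usually derived from a host clock, not the ADC/DAC clock.
    global last_adc
    adc = time_info['input_buffer_adc_time']   # time of the first input sample
    dac = time_info['output_buffer_dac_time']  # time of the first output sample
    if last_adc is not None:
        print("Time gap between READ buffers:", adc - last_adc)
    last_adc = adc
    print("Output-to-input offset:", dac - adc)
    return (in_data, pyaudio.paContinue)       # illustrative pass-through

stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, output=True,
                 frames_per_buffer=FRAMES, stream_callback=callback)

while stream.is_active():
    time.sleep(0.5)

If sample-accurate alignment is the real goal, counting the frames delivered to the callback is often more robust than the time_info clocks, since the stream advances by exactly frame_count samples per callback unless an overflow/underflow is flagged in status.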

Related

STM32 - Delay between UART TX frames at high speed

I am sending a fixed-size buffer (512 bytes) at 8 Mbit/s via UART out of an STM32F3, and I am experiencing what seems to be a fixed delay (~2-3 bit periods) between consecutive frames.
In the screenshot below, after sending the dummy value 01010101, you can see the line idling high for quite a long time between the stop bit "1" of one frame and the start bit "0" of the next. The bit period of ~125 ns is as expected and the data is received successfully by another STM32, but such a cumulative delay between frames (at least 128 µs over the entire buffer) is a problem for my application.
Scope screenshot
I have tried sending the buffer using HAL_Transmit, HAL_Transmit_DMA (single call), and LL_USART_TransmitData8 (one byte at a time), with similar results.
Any idea what could be causing it? Thanks!

PostgreSQL index-only scan speed

So I've got an index-only scan returning 750k rows; pulling it into a CTE and doing count(*) is taking 0.5 seconds. It's barely using any IOPS, and maxing out the instance to 16xlarge isn't moving the needle. Switching to a bitmap heap scan still gives me 0.5 seconds. What are some alternatives (other than using a materialized view) I can try to speed it up? Or is this just Postgres v10 at its finest?

Time delay of audio signal

Here is the scenario:
I'm generating a signal which is: 200 ms @ 2 kHz, 1000 ms of zeros, 200 ms @ 2 kHz,
and I want to calculate the time delay between the two bursts, not in the synthetic signal itself, but after playing the signal through a speaker and recording it with a microphone (which adds noise).
Fs = 44100
I tried: 1. cross-correlation; 2. calculating the difference between the two maxima of an RMS window of 8820 samples (the maxima occur when the window sits on a sound burst).
The distance between the speaker and the mic is around 30 cm. I can't get a steady result. Why?
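For reference, a minimal sketch of the second approach described in the question (the difference between the two maxima of an 8820-sample RMS window). The recorded mono array rec and the split-at-the-midpoint peak picking are illustrative assumptions:

import numpy as np

FS = 44100
WIN = 8820  # 200 ms RMS window, as in the question

def rms_envelope(x, win=WIN):
    # Sliding RMS = square root of a moving average of the squared signal
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(x.astype(float) ** 2, kernel, mode='same'))

def delay_from_rms(rec, fs=FS, win=WIN):
    env = rms_envelope(rec, win)
    mid = len(env) // 2
    first_peak = int(np.argmax(env[:mid]))         # window centred on the first burst
    second_peak = mid + int(np.argmax(env[mid:]))  # window centred on the second burst
    return (second_peak - first_peak) / fs         # delay in seconds (~1.2 s expected)

The RMS envelope peaks are broad and easily shifted by noise, which may be one reason the estimate is not steady from run to run.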
If you want to do this accurately and consistently then one method I have used in the past is to loop back one channel (e.g. the left channel) from the output to the input and then use the other (i.e. right) channel for the timing test. You can then cross-correlate between the left (loopback) and right (actual audio) channels. This eliminates many potential sources of error (buffer delays, hardware latency, software issues, etc.), since the left and right channels will always be "in sync" and you should be able to make measurements accurate to +/- 1 sample period (about +/- 23 µs at 44.1 kHz).
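A rough sketch of that loopback measurement, assuming a stereo capture stereo with the electrical loopback in column 0 and the microphone in column 1 (the array name and channel layout are assumptions):

import numpy as np

FS = 44100

def loopback_delay(stereo, fs=FS):
    # Column 0: loopback (electrical copy of the output), column 1: microphone.
    ref = stereo[:, 0].astype(float) - np.mean(stereo[:, 0])
    mic = stereo[:, 1].astype(float) - np.mean(stereo[:, 1])
    corr = np.correlate(mic, ref, mode='full')    # full cross-correlation
    lag = int(np.argmax(corr)) - (len(ref) - 1)   # samples by which the mic lags the loopback
    return lag / fs                               # acoustic + transducer delay in seconds

Because both channels pass through the same input buffers, any driver or buffer latency is common to both and cancels out of the lag estimate.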

How to run multi-queue code using OpenCL?

For example, I'm doing some image-processing work on every frame of a video.
Processing each frame takes 200 ms, including writing, processing, and reading.
The video is 25 fps, so consecutive frames are 40 ms apart, which means the processing is too slow to show a continuous result.
So here is my idea: I use multiple queues for this work.
In the CPU part:
while (video is not over)
{
    1. read frame0;
       process frame0 using queue0;
       wait 40 ms;
    2. read frame1;
       process frame1 using queue1;
       wait 40 ms;
    3. 4. 5. ...
       (after 5 frames, i.e. roughly the 200 ms processing time)
    6. download frame0's result;
    7. read frame5;
       process frame5 using queue0;
       wait 40 ms;
    ...
}
The code means that I use the different queues to read and process successive frames of the video concurrently.
However, in my experiments this is only about 2x faster, not the speed I imagined.
Can anyone tell me how to deal with it? Thanks!
Assuming you have one Device, here are some thoughts on this point:
The main reason to have multiple Command Queues (CQs) per single OpenCL Device is the ability to execute kernels and do IO operations simultaneously.
Usually one CQ is enough to load a single Device at ~100%. Still, your multi-CQ idea is good (in my opinion), as you're constantly feeding the GPU with work.
Look at the kernel execution time. It may be big enough that your Device is constantly executing kernels and simply can't go any faster.
I don't think you need to wait 40 ms. A good solution is to process frames from the queue into which they are put, to eliminate the difference between bitstream and display order.
If you have too many CQs, your OpenCL driver thread will be busy maintaining them, and performance may decrease.
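As an illustration of the multi-CQ idea (the question's host language isn't stated, so PyOpenCL is used here purely as a sketch; the kernel body, the stand-in frames list, and the round-robin over two queues are assumptions), the host can keep both queues fed without any fixed 40 ms waits:

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queues = [cl.CommandQueue(ctx) for _ in range(2)]   # two command queues, one device

src = """
__kernel void process(__global const uchar *in, __global uchar *out) {
    int i = get_global_id(0);
    out[i] = 255 - in[i];   /* placeholder per-pixel work */
}
"""
prg = cl.Program(ctx, src).build()
mf = cl.mem_flags

# Stand-in for decoded video frames (assumption)
frames = [np.random.randint(0, 256, 1920 * 1080, dtype=np.uint8) for _ in range(10)]
results = []

for idx, frame in enumerate(frames):
    q = queues[idx % len(queues)]                   # round-robin frames over the queues
    d_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=frame)
    d_out = cl.Buffer(ctx, mf.WRITE_ONLY, frame.nbytes)
    prg.process(q, (frame.size,), None, d_in, d_out)          # enqueue kernel, no host-side wait
    out = np.empty_like(frame)
    evt = cl.enqueue_copy(q, out, d_out, is_blocking=False)   # asynchronous read-back
    results.append((out, evt))

for out, evt in results:
    evt.wait()                                      # collect results in submission order

Whether this actually overlaps transfers and kernels depends on the device and driver; profiling the kernel execution time, as suggested above, will show whether the device is already saturated.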

LabVIEW Real-Time Timed Loop resolution

We are using LabVIEW Real-Time with the PXI-8110 Controller.
I am facing the following problem:
I have a timed loop with a 500 µs period and no other task. I write the time of each loop iteration into RAM and then save the data afterwards.
It is necessary that the period be exact, but I see that it is 500 µs +/- 25 µs.
The clock for the timed loop is 1 MHz.
How is it possible to get 500 µs - 25 µs? I would understand getting 500 µs + xx µs if my computation were too heavy, but so far I just do an addition, nothing more.
So does anyone have a clue what is going wrong?
I thought it would be possible to get a resolution of 1 µs, as NI advertises (if the computation isn't too heavy).
Thanks.
You may need to check which thread the code is running in. An easier way to work is to use the Timed Loop, as this will try to correct for overruns. Also pre-allocate the array that you are storing the data into and then use Replace Array Subset with each new value. You should see a massive improvement this way.
If you display that value while running in development mode, you will see jitter of +/- some amount of time, because you are reporting everything back to the host. Build the executable and the jitter will shrink.