I've implemented a PIC32 as a USB sound card, using USB Audio Class 1. I'm sending a sawtooth signal from the microcontroller to the PC (Windows 7, 64-bit) as 16-bit samples:
in decimal:
000
800
1600
2400
.. so on
Then I record the received audio using Audacity with the MME driver, saving it as .wav or .raw.
I use MATLAB to open and inspect the data, and there I see values like:
000
799
1599
2400
..
The distortion varies from -1 to +1 bit per sample.
Does anyone have an idea where the problem might be?
The Windows audio drivers?
Since you receive the audio signal on the PC, play it back, and record it in software, the signal is converted from digital to analog and back to digital again. These conversions introduce quantization error and noise, which is why you see the small difference between the two signals.
I solved my problem.
The problem was caused by the application I used to record the data and the method I used. I used Audacity, which supports the old Windows MME audio API and the DirectSound API. These are relatively high-level APIs, apparently, and they are the cause of the distortion.
About the Windows Core Audio APIs
Instead I used another program, REAPER, which has an option to record using ASIO or WASAPI. This solved my problem. I've checked every sample in a two-hour .wav file using MATLAB, and it is completely bit-perfect.
It was probably some quantization error, but it was caused by the API.
ASIO and WASAPI gave me bit-perfect sound; MME and DirectSound gave me a distorted signal.
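For reference, here is a minimal C sketch of such a check, assuming the capture is exported as raw 16-bit little-endian mono PCM and the sawtooth increments by 800 per sample ("capture.raw" is a hypothetical file name):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("capture.raw", "rb");   /* raw 16-bit LE mono export */
    if (!f) return 1;

    int16_t prev, cur;
    long errors = 0, total = 1;
    if (fread(&prev, sizeof prev, 1, f) != 1) { fclose(f); return 1; }

    while (fread(&cur, sizeof cur, 1, f) == 1) {
        /* The sawtooth wraps naturally via 16-bit two's-complement overflow. */
        if (cur != (int16_t)(prev + 800))
            errors++;
        prev = cur;
        total++;
    }
    fclose(f);
    printf("%ld of %ld samples deviate from the expected ramp\n", errors, total);
    return 0;
}

A bit-perfect capture should report zero deviations; the MME/DirectSound recordings should show the ±1 errors described above.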
I am new to GNU Radio and I'm trying to transmit a value using it and the USRP B210 board.
I used MATLAB to convert the value 0.121 to WAV format, then converted the WAV file to a .dat file using the audio_to_file example in GNU Radio.
When I transmit the .dat file using the B210 and GNU Radio, I receive a WAV file, but when I read it with the MATLAB function audioread() I get a different value.
P.S.
The sample rate of the converted .dat file was 44100 Hz with 16 bits per sample.
The receiver and transmitter sampling rate is 400 kHz.
I used fm_tx4.py example from the GNU Radio package for my transmitter.
I used uhd_nbfm_receiver.grc for the receiver.
If you're wondering why your received signal doesn't have the same amplitude as your sent signal, you're missing the very basics of radio communications: there is no digital line between your transmitter and your receiver, so power can go anywhere, and how much reaches the receiver depends on a lot of factors, including gain, antennas, distance, matching...
There will be a lot more things that are different on the RX side than they were on the TX side. Your reception has not been time-synchronized, so you might see a phase shift. You don't mention whether the receiver is the same B210, a clock-synchronized one, or a clock-independent one, which means you have the general case: no two physical clocks can be identical (yes, perfect synchronization is impossible, but you can reduce the error), so you'll generally see some frequency offset, too.
I recommend reading up a bit on basic radio communication theory; I often recommend GNU Radio's pictured introduction and GNU Radio's suggested Reading Page. Michael Ossmann gets some recognition for his courses, too, so you should definitely have a look at them.
Also, the whole data -> WAV -> transmit conversion is totally unnecessary. MATLAB's fread/fwrite functions can read and store the native machine float format that GNU Radio's file_sink/file_source blocks write and read. See the FAQ entry.
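To illustrate the point, GNU Radio's file_source/file_sink blocks work on raw interleaved 32-bit floats in host byte order, so a value like 0.121 can be written directly with no WAV step. A rough C sketch of the same idea ("tx_samples.dat" is a hypothetical file name):

#include <stdio.h>

int main(void)
{
    /* Write one second of the constant value 0.121 as raw 32-bit floats,
     * the format GNU Radio's file_source block reads directly. */
    FILE *f = fopen("tx_samples.dat", "wb");
    if (!f) return 1;

    const float sample = 0.121f;
    for (int i = 0; i < 44100; i++)          /* 44.1 kHz worth of samples */
        fwrite(&sample, sizeof sample, 1, f);

    fclose(f);
    return 0;
}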
I'm running a test to measure the basic latency of my iPhone app, and the result was disappointing: 50 ms for a play-through test app. The app just picks up mic input and plays it out using the same render callback; no other audio units or processing are involved. The result therefore seemed too bad for such a basic scenario. I need some pointers to see whether the result makes sense or whether I had design flaws in my test.
The basic idea of the test was to have three roles:
1. My finger snap as the reference sound source.
2. A simple iOS play-thru app (using the built-in mic) as the first listener to #1.
3. A Mac (with a USB mic and Audacity) as the second listener to #1 and the only listener to the iOS output (through a speaker connected via the iOS headphone jack).
Then, with Audacity in recording mode, the Mac would pick up both the sound from my fingers and its "clone" from the iOS speaker in close range. Finally I simply visually observe the waveform in Audacity's recorded track and measure the time interval between the peaks of the two recorded snaps.
This was by no means a super accurate measurement, but at least the innate latency of the Mac recording pipeline is cancelled out this way, so the error should come mainly from the peak-distance measurement, which I assume is much smaller than the audio pipeline latency and can be ignored.
I was expecting 20 ms or lower latency, but the result was clearly 50-60 ms.
My ASBD uses kAudioFormatFlagsCanonical and kAudioFormatLinearPCM as format.
50 ms is about 4 ms more than the duration of two audio buffers (one output, one input) of size 1024 at a sample rate of 44.1 kHz.
17 ms is around 5 ms more than the duration of two buffers of length 256.
So it looks like the iOS audio latency is around 5 ms plus the duration of the two buffers (the audio output buffer duration plus the time it takes to fill the input buffer) ... on your particular iOS device.
A few iOS devices may support even shorter audio buffer sizes of 128 samples.
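For reference, a short C sketch of the arithmetic behind those numbers (duration of one input plus one output buffer at 44.1 kHz):

#include <stdio.h>

int main(void)
{
    const double fs = 44100.0;               /* sample rate in Hz */
    const int sizes[] = { 1024, 256, 128 };  /* buffer sizes in samples */

    for (int i = 0; i < 3; i++) {
        double two_buffers_ms = 2.0 * sizes[i] / fs * 1000.0;
        /* Prints roughly 46.4 ms, 11.6 ms and 5.8 ms respectively. */
        printf("%4d samples: two buffers = %.1f ms\n", sizes[i], two_buffers_ms);
    }
    return 0;
}

Subtracting these durations from the measured 50 ms and 17 ms leaves the roughly 4-5 ms of fixed latency mentioned above.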
You can use Core Audio and set up the audio session to have very low latency.
You can set the buffer size to be smaller using AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,...
Using smaller buffers causes the audio callback to happen more often while grabbing smaller chunks of audio. Keep in mind that this is merely a suggestion to the audio system; iOS will pick a suitable callback interval based on your sample rate and integer powers of 2.
Once you set the buffer duration, you can query the actual buffer duration the system will use with AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,...
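Putting those two calls together, a rough sketch (this uses the old C AudioSession API named above, which is deprecated in later iOS versions in favour of AVAudioSession, and assumes the session has already been initialized and activated):

#include <AudioToolbox/AudioToolbox.h>

static void request_small_io_buffers(void)
{
    /* Ask for roughly 256 samples at 44.1 kHz (~5.8 ms). iOS treats this
     * as a hint and rounds to a duration it actually supports. */
    Float32 preferred = 256.0f / 44100.0f;
    AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,
                            sizeof(preferred), &preferred);

    /* Read back the buffer duration the system actually granted. */
    Float32 actual = 0.0f;
    UInt32 size = sizeof(actual);
    AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
                            &size, &actual);
}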
I'll summarize Paul R's comments as the answer, which has solved my problem:
50 ms corresponds to a total buffer size of around 2048 at a 44.1 kHz sample rate, which doesn't seem unreasonable given that you have both a record and a playback path.
I don't know that the buffer size is 2048, and there may be more than one buffer in your record-playback loopback test, but it seems that the effective total buffer size in your test is probably on the order of 2048 samples, which doesn't seem unreasonable. Of course, if you're only interested in record latency, as the title of your question suggests, then you'll need to find a way to tease that out separately from the playback latency.
I'm looking to output four channels of audio simultaneously from MATLAB using an external soundcard (Creative Soundblaster X-Fi Surround 5.1 Pro USB) and haven't yet found a working solution.
As far as I understand it, MATLAB's audioplayer object can only output a stereo signal, so I've tried two alternatives: playrec and pa_wavplay. Both appear to do precisely what I need, but seem to recognize the soundcard as a two-channel device only.
Any advice would be terrific. Thanks for reading.
(The MATLAB version is R2007b and the only available toolbox is the Signal Processing Toolbox.)
I've got a bit of experience with pa_wavplay and found it dealt with large numbers of inputs and outputs without any problems. I suspect the problem is with your audio interface.
While it can output 5.1, it's quite possibly producing those "additional" channels itself by decoding a Dolby Digital stream inside the device. This suggests the interface won't let you output six channels of PCM audio as such.
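One quick sanity check, since pa_wavplay is a PortAudio-based tool: ask PortAudio (outside MATLAB) how many output channels the operating system actually exposes for the device. A rough C sketch:

#include <stdio.h>
#include <portaudio.h>

int main(void)
{
    if (Pa_Initialize() != paNoError) return 1;

    int count = Pa_GetDeviceCount();
    for (int i = 0; i < count; i++) {
        const PaDeviceInfo *info = Pa_GetDeviceInfo(i);
        /* If the X-Fi only ever reports 2 here, the driver is not
         * exposing it as a multichannel PCM device. */
        printf("%2d: %-40s max output channels: %d\n",
               i, info->name, info->maxOutputChannels);
    }

    Pa_Terminate();
    return 0;
}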
If you're determined to use this device and prepared to get your hands dirty, you could always try encoding your audio as AC-3 yourself, but I suspect you'd have to do that outside MATLAB.
I'm looking for some C/C++ code for VAD (Voice Activity Detection).
Basically, my application is reading PCM frames from the device. I would like to know when the user is talking. I'm not looking for any speech recognition algorithm but only for voice detection.
I would like to know when the user is talking and when he finishes:
bool isVAD(short* pcm,size_t count);
Google's open-source WebRTC code has a VAD module written in C. It uses a Gaussian Mixture Model (GMM), which is typically much more effective than a simple energy-threshold detector, especially in a situation with dynamic levels and types of background noise. In my experience it's also much more effective than the Moattar-Homayounpour VAD that Gilad mentions in their comment.
The VAD code is part of the much, much larger WebRTC repository, but it's very easy to pull it out and compile it on its own. E.g. the webrtcvad Python wrapper includes just the VAD C source.
The WebRTC VAD API is very easy to use. First, the audio must be mono 16-bit PCM with an 8 kHz, 16 kHz or 32 kHz sample rate. Each frame of audio that you send to the VAD must be 10, 20 or 30 milliseconds long.
Here's an outline of an example that assumes audio_frame is 10 ms (320 bytes) of audio at 16000 Hz:
#include "webrtc/common_audio/vad/include/webrtc_vad.h"
// ...
VadInst *vad;
WebRtcVad_Create(&vad);
WebRtcVad_Init(vad);
int is_voiced = WebRtcVad_Process(vad, 16000, audio_frame, 160);
There are open-source implementations in the Sphinx and FreeSWITCH projects. I think they are all energy-based detectors, so they won't need any kind of model (a minimal sketch of the idea follows the list below).
Sphinx 4 (Java but it should be easy to port to C/C++)
PocketSphinx
Freeswitch
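As a rough illustration of what such an energy-based detector does (this is not the Sphinx or FreeSWITCH code), here is a sketch matching the isVAD() signature from the question:

#include <math.h>
#include <stdbool.h>
#include <stddef.h>

bool isVAD(short* pcm, size_t count)
{
    /* Fixed RMS threshold; real detectors adapt this to the noise floor. */
    const double rms_threshold = 1000.0;
    double energy = 0.0;

    for (size_t i = 0; i < count; i++)
        energy += (double)pcm[i] * (double)pcm[i];

    double rms = sqrt(energy / (double)(count ? count : 1));
    return rms > rms_threshold;
}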
How about LibVAD? www.libvad.com
Seems like that does exactly what you're describing.
Disclosure: I'm the developer behind LibVAD
Here is a question for all you iPhone experts:
If you remember the sounds that modems used to make, or the sound of loading a program from a cassette tape: I am trying to replicate this on an iPhone for a ham radio application. I have a stream of data (ASCII) and I need to encode it as AFSK at 1200 baud, so basically everything in the stream is converted to a series of 1200 Hz and 2200 Hz tones. It needs to sound something like this: http://upload.wikimedia.org/wikipedia/commons/2/27/AFSK_1200_baud.ogg
I successfully built a bit array out of the string, but when I try to assign tones to each bit I get gaps in the sound, so it doesn't demodulate correctly.
Any thought of how one should tackle this problem? Thank you.
The mobilesynth project is open-source. You might be able to scan that for code that generates the tones you need.
How are you assigning tones to the bits? Remember, a digital audio signal is just a stream of samples with values between -1 and 1. Perhaps there is a clipping issue between tone assignments; this can happen if the signal goes below -1 or above 1. If it stays outside this range at a constant value, there will be no sound. Maybe you could dump your stream of samples to check whether this is the case, or plug the output into an oscilloscope...
Also note that clicking can occur on "uneven" transitions between signals. For example, if you output a sample with value 1 followed immediately by a sample with value -1, a click or pop will be produced.
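One way to avoid both the gaps and the clicks is to keep a single running phase accumulator and only change the frequency per bit, so the waveform never jumps. A rough C sketch of continuous-phase AFSK generation (an illustration under these assumptions, not the poster's code; buffer handling is simplified):

#include <math.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_RATE 44100.0
#define BAUD        1200.0
#define MARK_HZ     1200.0   /* bit value 1 */
#define SPACE_HZ    2200.0   /* bit value 0 */
#define TWO_PI      6.283185307179586

/* Fill 'out' with float samples in [-1, 1]; returns the number written. */
size_t afsk_encode(const uint8_t *bits, size_t nbits, float *out, size_t max_out)
{
    double phase = 0.0;                         /* shared by both tones */
    double samples_per_bit = SAMPLE_RATE / BAUD;
    size_t n = 0;

    for (size_t b = 0; b < nbits; b++) {
        double freq = bits[b] ? MARK_HZ : SPACE_HZ;
        /* Round each bit boundary to the nearest sample so timing doesn't drift. */
        size_t bit_end = (size_t)((b + 1) * samples_per_bit + 0.5);
        while (n < bit_end && n < max_out) {
            out[n++] = (float)(0.8 * sin(phase));      /* stays inside [-1, 1] */
            phase += TWO_PI * freq / SAMPLE_RATE;      /* advance, never reset */
        }
    }
    return n;
}

Because the phase is never reset between bits, each tone picks up exactly where the previous one left off, which removes the pops at the transitions.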