How to implement non-interleaved mmap (direct) access mode in ALSA for live streaming from SRAM?

I have a buffer in SRAM of size 4096 bytes which gets updated with new raw audio data periodically:
+---------------------+---------------------+
| 2048 bytes of left  | 2048 bytes of right |
+---------------------+---------------------+
^                     ^
A                     B
NOTE: A and B are pointers to the start addresses of each half.
As shown, the data is non-interleaved stereo (16-bit samples, 44100 Hz sampling rate). Since it is already in memory, I prefer MMAP access over RW (and, as far as my understanding of ALSA goes, MMAP access should not need a separate buffer to copy the data into).
The starting address of this buffer is fixed (say physical address 0x3f000000), and I am MMAPing it to get a virtual address pointer.
Now, how do I send data to alsa for playback and what should be my configuration?
My current (unsuccessful) approach is:
Resample ON
Rate 44100
SND_PCM_ACCESS_MMAP_NONINTERLEAVED
channels 2
format SND_PCM_FORMAT_S16_LE
period near 1024 frames
buffer near 2*1024 frames
void *ptr[2];
ptr[0] = A; /* mmap'ed virtual address of A */
ptr[1] = B; /* mmap'ed virtual address of B */
while (1)
{
    wait_for_new_data_in_buffer();
    /* snd_pcm_mmap_writen() takes the void* array itself, not its address */
    snd_pcm_mmap_writen(handle, ptr, period_size);
}
Extra info:
1. I am using an embedded board with ARM cores, running basic Linux.
2. This is a proprietary, work-related project, hence the vagueness of this question.
3. I already know that directly MMAPing a physical address is not recommended, so please do not spend time commenting on that.
Thanks in advance.


What is the benefit of having the registers as a part of memory in AVR microcontrollers?

Larger memories have higher decoding delay; why is the register file a part of the memory then?
Does it only mean that the registers are "mapped" SRAM registers that are stored inside the microprocessor?
If not, what would be the benefit of using registers as they won't be any faster than accessing RAM? Furthermore, what would be the use of them at all? I mean these are just a part of the memory so I don't see the point of having them anymore. Having them would be just as costly as referencing memory.
The picture is taken from The AVR Microcontroller and Embedded Systems: Using Assembly and C by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi.
AVR has some instructions with indirect addressing, for example LD (LDD) – Load Indirect From Data Space to Register using Z:
Loads one byte indirect with or without displacement from the data space to a register. [...]
The data location is pointed to by the Z (16-bit) Pointer Register in the Register File.
So you can address a register indirectly by loading its data-space address into Z, allowing indirect or indexed register-to-register moves. One can certainly think of usages where such indirect access saves the odd instruction.
what would be the benefit of using registers as they won't be any faster than accessing RAM?
Accessing the general-purpose registers is faster than accessing RAM.
First, let us define what "fast" means in a microcontroller: it is how many cycles an instruction takes to execute. Look at the AVR architecture:
The general-purpose registers (GPRs) are the inputs to the ALU, and the GPRs are controlled by the instruction register (2 bytes wide), which holds the next instruction fetched from code memory.
Let us examine a simple instruction, ADD Rd, Rr, where Rd and Rr are any two registers in the GPRs, so 0 <= r, d <= 31 and each of r and d can be represented in 5 bits. Open the "AVR Instruction Set Manual" at page 32: the opcode for this simple ADD instruction is 0000 11rd dddd rrrr, and because this opcode is two bytes (the code-memory width), it is fetched, decoded and executed in one cycle (under the concept of pipelining, of course). Only one cycle, which seems cool to me.
I mean these are just a part of the memory so I don't see the point of having them anymore. Having them would be just as costly as referencing memory
You suggest making the whole RAM an input to the ALU; this is a very bad idea, because a memory address takes 2 bytes.
With two operands per instruction, as in the ADD instruction, you would need 4 bytes just to encode the operands, plus one more byte for the opcode of the operator itself: 5 bytes in total, which is a waste of code memory!
Furthermore, this architecture can only fetch 2 bytes at a time (the instruction-register width), so you would need extra cycles to fetch the code from code memory: a waste of cycles, hence a slower system.
Register numbers are only 4 or 5 bits wide, depending on the instruction, allowing 2 per instruction with room to spare in a 16-bit instruction word.
Conclusion: the GPRs are crucial for saving both code memory and program execution time.
Larger memories have higher decoding delay; why is the register file a part of the memory then?
When the CPU deals with the GPRs it only accesses the first 32 locations of the data space, not all of it, so the address decoder stays small and fast.
Final comment
Do not disturb yourself with the timing diagrams of the different RAM technologies, because you have no control over them. Who does? The architecture designers: they set the maximum crystal frequency you can use with their architecture, and then everything will be fine. You only need to be concerned with how many cycles your application consumes.

ADXL345 Accelerometer data use on I2C (Beaglebone Black)

Background Information
I am trying to make sure I will be able to run two ADXL345 Accelerometers on the same I2C Bus.
To my understanding, the bus can transmit up to 400k bits/s on fast mode.
In order to send 1 byte of data, there are 20 extra bits of overhead.
There are 6 bytes per accelerometer reading (XLow, XHigh, YLow, YHigh, ZLow, ZHigh)
I need to do 1000 readings per second with both accelerometers
Thus,
My total data used per second is 336k bits/s which is within my limit of 400k bits/s.
I am not sure if I am doing these calculations correctly.
Question:
How much data am I transmitting per second with two accelerometers reading 1000 times per second on i2c?
Your math seems to be a bit off. For this accelerometer (from the datasheet: https://www.sparkfun.com/datasheets/Sensors/Accelerometer/ADXL345.pdf), in order to read the 6 bytes of XYZ sample data, you need to perform a 6-byte burst read of the registers. In terms of data transfer, this means a write of the register address (0x32, DATAX0) to the accelerometer, then a continuous burst read of 6 bytes. Each of these two transfers requires first sending the I2C device address and the R/W bit, plus an ACK/NAK per byte (including the address bytes), plus the START/REPEATED START/STOP conditions. So, overall, an individual transfer to get a single sample (i.e., a single XYZ acceleration vector) is as follows:
Start (*) | Device Address: 0x1D (7) | Write: 0 (1) | ACK (1) | Register Address: 0x32 (8) | ACK (1) | Repeat Start (*) | Device Address: 0x1D (7) | Read: 1 (1) | ACK (1) | DATA0 (8) | ACK (1) | DATA1 (8) | ACK (1) | ... | DATA5 (8) | NAK (1) | Stop (*)
If we add all that up, we get 81 + 3 bits that need to be transmitted. Note first that the START, REPEATED START and STOP might not actually take a bit's worth of time each, but for simplicity we can assume they do. Note also that while the device address is only 7 bits, you always need to append the READ/WRITE bit, so an I2C byte transfer is always 8 bits + ACK/NAK, 9 bits in total. Note also that the I2C maximum transfer rate really defines the maximum SCK speed the device can handle, so in fast mode SCK is at most 400 kHz (thus 400 kbps at most, but because of the protocol overhead you get less real data). Thus, at 84 bits per sample and 400 kHz, we can transfer a sample in 0.21 ms, or ~4700 samples/sec, assuming no gaps or breaks in transmission.
Since you need to read 2 samples every 1 ms (2 accelerometers, so 84 bits * 2 = 168 bits per sampling interval, or 168 kbps at a 1 kHz sampling rate), this should at least be possible in fast-mode I2C. However, you will need to be careful to make full use of the I2C controller. Depending on the software layer you are working on, it might be difficult to issue I2C burst reads fast enough (i.e., 2 burst-read transactions within 1 ms). Using the FIFO on the accelerometer would significantly relax the latency requirement: instead of having 1 ms to issue two burst reads, you can delay up to 32 ms to issue 64 burst reads (since you have 2 accelerometers). But since you need to issue a new burst read for each sample, you will have to be careful about the delay introduced by software between calls to whatever API you're using to perform the I2C transactions.

How to determine the size of internal flash for a target?

I want to upload the device firmware to a file using dfu-util. How can I determine the correct size of flash memory?
After booting the device into DFU it can be found using:
dfu-util -l
For which I receive the following information:
Found DFU: [0483:df11] ver=2200, devnum=8, cfg=1, intf=0, alt=1, name="#Option Bytes /0x1FFFF800/01*016 e", serial="FFFFFFFEFFFF"
Found DFU: [0483:df11] ver=2200, devnum=8, cfg=1, intf=0, alt=0, name="#Internal Flash /0x08000000/064*0002Kg", serial="FFFFFFFEFFFF"
To upload the flash configuration to a file I need to determine the size of flash memory. Based on this article the size would be 64 x 1kB of flash memory.
What is the meaning of 'Kg' in 0002Kg?
The instructions I am following (written elsewhere, for a different device, see above) use 128 x 1 kB instead, which I believe is incorrect. How can I calculate the size of the flash memory, and what will happen if I set the memory size too large when downloading an image?
The command is:
dfu-util -a 0 -s 0x08000000:131072 -U ./original.bin
I think it should be
dfu-util -a 0 -s 0x08000000:65536 -U ./original.bin
Please see UM0290 in which we find:
Each Alternate setting string descriptor must follow this memory mapping, else the PC Host Software would [not] be able to decode the right mapping for the selected device:
#: To detect that this is a special mapping descriptor (to avoid decoding standard
descriptor)
/: for separator between zones
Maximum 8 digits per address starting by “0x”
/: for separator between zones
Maximum of 2 digits for the number of sectors
* : For separator between number of sectors and sector size
Maximum 3 digits for sector size between 0 and 999
1 digit for the sector size multiplier. Valid entries are: B (byte), K (Kilo), M (Mega)
1 digit for the sector type as follows:
a (0x41): Readable
b (0x42): Erasable
c (0x43): Readable and Erasable
d (0x44): Writeable
e (0x45): Readable and Writeable
f (0x46): Erasable and Writeable
g (0x47): Readable, Erasable and Writeable
So your string really does mean that the internal flash is 64 sectors of 2 KB (128 KB in total), and that they are "readable, erasable and writable" (i.e. flash). Are you sure about your expectations of the device's flash layout?

Advantages of segmentation in the 8086 microprocessor

What are the advantages of segmentation in the 8086 microprocessor?
I am not getting the importance of segmentation. Is it for managing more memory?
The instruction set used in 8086 is a 16-bit instruction set. This means that a register can only store values in the range 0x0000 to 0xFFFF, and instructions mostly only did 16-bit operations (16-bit addition, 16-bit subtraction, etc). If a register contains an address/pointer, then it would've worked out to a maximum of 64 KiB of address space (some for ROMs, some for RAM) and this wasn't enough for the market at the time.
Segmentation was a way to allow the 16-bit CPU to support a larger address space. Essentially, combining two 16-bit registers together, so that addresses/pointers could be much larger. Unfortunately (likely, to avoid "unnecessary at the time" costs of having more address lines on the CPU's bus), instead of using two 16-bit registers as a 32-bit address, Intel did an "address = segment * 16 + offset" thing to end up with a 20-bit address, giving the 8086 a 1 MiB address space.
Later (early 1980s) there was a push towards "protected objects" where "objects" (in object oriented programming) could be given access controls and limits that are enforced/checked by hardware, and around the same time there were "virtual memory" ideas floating around. These ideas led to the ill-fated iAPX 432 CPU; but also led to the idea of associating protection (attributes and limits) to the segments that 8086 already had, which resulted in the "protected mode" introduced with 80286 (and extended in 80386).
Essentially; the original reason for (advantage of) segments was to increase the address space (without the cost of a 32-bit instruction set, etc); and things like protection and memory management were retro-fitted afterwards (and then barely used by software before being abandoned in favour of paging).
Answer
Memory is divided into segments of various sizes. A segment is just an area in memory, and the process of dividing memory in this way is called segmentation. Data is stored as bytes, and each byte has a specific address.
The 8086 has a 20-line address bus, so it can address 2^20 bytes = 1 MB.
There are 4 types of segments:
Code Segment
Data Segment
Stack Segment
Extra Segment
Each of these segments is addressed by an address stored in the corresponding segment register. These registers are 16 bits in size and store the upper 16 bits of the 20-bit base address of the corresponding segment (the low 4 bits of the base are implicitly zero).

understanding double buffers

I am using the C8051F320 and basing my firmware on the HID example firmware (for example, BlinkyExample).
IN and OUT reports are each 64B long (a single 64B packet).
I enabled the ADC and set it for 10kSps. Every ADC interrupt, a sample is stored in an array. When enough samples are taken to fill a packet, an IN packet is sent.
Software sends a report telling the firmware how many reports to return.
1) The example firmware uses EP1, which has 128B. It splits the EP into IN and OUT, 64B each.
The firmware drops the first sample of each IN report at 10kSps. At 5kSps it runs fine.
2) I modified EP1 to be double buffered, but it is only 32B long now. Regardless, streaming 1000s of IN reports with 10kSps data works great (confirmed by FFT of the sampled sine wave in software).
3) I modified the firmware to use EP2, since that has 256B total, giving 64B if splitting and double buffering.
a) Again, at 10kSps, the first sample of each packet is dropped. Why? It runs fine at 5kSps.
Actually, I cannot seem to visualize how double buffering works. If the sample rate is faster than the HID transfer rate, the FIFOs will overflow regardless, right? How does double buffering help? It also seems that for double buffering to be effective, the packet size must be cut in half.
b) While switching references from EP1 to EP2, I came across this code in F3xx_USB0_Standard_Requests.c: DATAPTR = (unsigned char*)&ONES_PACKET;. Setting a char* to the address of a char* does not seem correct to me, so I modified it to DATAPTR = (unsigned char*)ONES_PACKET;. Regardless, there seems to be no difference. What do the zeros and ones arrays do?
HID example firmware
HID uses interrupt-type endpoints, which transfer data at most once per frame, i.e. once per 1 ms; depending on your HID descriptor, it can be much slower. This yields a net data rate of at most about 64000 bytes/sec.
If you need to transfer more data than that, use bulk or isochronous endpoints.