BPF verifier rejects when try to access __sk_buff member - bpf

I'm trying to write a sample eBPF program which can access __sk_buff member and dump it into
/sys/kernel/debug/tracing/trace.
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
SEC("dump_skb_member")
int test_prog(struct __sk_buff *skb)
{
char fmt[] = "packet: local %u remote %u\n";
__u32 local_ip4 = bpf_htonl(skb->local_ip4);
__u32 remote_ip4 = bpf_htonl(skb->remote_ip4);
bpf_trace_printk(fmt, sizeof(fmt), local_ip4, remote_ip4);
return BPF_OK;
}
char _license[] SEC("license") = "GPL";
When i compile this code, and load this program
ip route add 192.168.56.104 encap bpf out obj sample.o section dump_skb_member dev enp0s8
An error is thrown.
Prog section 'dump_skb_member' rejected: Permission denied (13)!
- Type: 11
- Instructions: 21 (0 over limit)
- License: GPL
Verifier analysis:
0: (b7) r2 = 685349
1: (63) *(u32 *)(r10 -8) = r2
2: (18) r2 = 0x2065746f6d657220
4: (7b) *(u64 *)(r10 -16) = r2
5: (18) r2 = 0x7525206c61636f6c
7: (7b) *(u64 *)(r10 -24) = r2
8: (18) r2 = 0x203a74656b636170
10: (7b) *(u64 *)(r10 -32) = r2
11: (61) r4 = *(u32 *)(r1 +92)
invalid bpf_context access off=92 size=4
processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Error fetching program/map!
But if i don't call bpf_trace_printk to dump member, it can be loaded.
My question is why the error is caused by calling bpf_trace_printk?

The error is not caused by bpf_trace_prink(), but by the skb accesses that are present in your bytecode only when you call bpf_trace_printk().
Accessing skb->local_ip4 and skb->remote_ip4 is not allowed for programs of types BPF_PROG_TYPE_LWT_OUT, that you use.
See kernel code: The function that checks for valid access for this type returns false for certain offsets or range in skb:
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
[...]
return false;
This corresponds to the range where local_ip4 and remote_ip4 are defined:
struct __sk_buff {
[...]
/* Accessed by BPF_PROG_TYPE_sk_skb types from here to ... */
__u32 family;
__u32 remote_ip4; /* Stored in network byte order */
__u32 local_ip4; /* Stored in network byte order */
__u32 remote_ip6[4]; /* Stored in network byte order */
__u32 local_ip6[4]; /* Stored in network byte order */
__u32 remote_port; /* Stored in network byte order */
__u32 local_port; /* stored in host byte order */
/* ... here. */
When you remove your call to the bpf_trace_printk() helper, your local variables are no longer needed and clang compiles your code out of the program. The attempt to read at forbidden offsets is no longer part of your bytecode, so the program loads successfully.

Related

STM32F407ze throws hard fault on executing code from RAM

I am trying to execute a simple code from RAM, but for some reason the program halts/throws hard fault. I am using CCMRAM for my data, heap and stack while SRAM1 for executing code. Here is my linker script and startup file.
LinkerScript.ld
/*
******************************************************************************
**
** File : LinkerScript.ld
**
** Author : Auto-generated by Ac6 System Workbench
**
** Abstract : Linker script for STM32F407ZETx Device from STM32F4 series
** 112Kbytes SRAM1
** 16Kbytes SRAM2
** 64Kbytes CCMRAM
** 512Kbytes ROM
**
** Set heap size, stack size and stack location according
** to application requirements.
**
** Set memory bank area and size if external memory is used.
**
** Target : STMicroelectronics STM32
**
** Distribution: The file is distributed �as is,� without any warranty
** of any kind.
**
*****************************************************************************
** #attention
**
** <h2><center>© COPYRIGHT(c) 2019 Ac6</center></h2>
**
** Redistribution and use in source and binary forms, with or without modification,
** are permitted provided that the following conditions are met:
** 1. Redistributions of source code must retain the above copyright notice,
** this list of conditions and the following disclaimer.
** 2. Redistributions in binary form must reproduce the above copyright notice,
** this list of conditions and the following disclaimer in the documentation
** and/or other materials provided with the distribution.
** 3. Neither the name of Ac6 nor the names of its contributors
** may be used to endorse or promote products derived from this software
** without specific prior written permission.
**
** THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
** AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
** IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
** DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
** FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
** DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
** SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
** CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
** OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
** OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**
*****************************************************************************
*/
/* Entry Point */
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
/*_estack = 0x2001C000; /* start of SRAM2 */
_estack = 0x10010000; /* end of CCMRAM */
_Min_Heap_Size = 0; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Memories definition */
MEMORY
{
SRAM1 (xrw) : ORIGIN = 0x20000000, LENGTH = 112K
SRAM2 (xrw) : ORIGIN = 0x2001C000, LENGTH = 16K
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
}
/* Sections */
SECTIONS
{
/* The startup code into ROM memory */
.isr_vector :
{
. = ALIGN(4);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(4);
} >FLASH
/*sram1 section for code and data*/
_sisram1 = LOADADDR(.sram1);
.sram1 :
{
. = ALIGN(4);
_s_sram1 = .;
*(.sram1);
*(.sram1*);
. = ALIGN(4);
_e_sram1 = .;
} > SRAM1 AT >FLASH
/*sram2 section for code and data*/
_sisram2 = LOADADDR(.sram2);
.sram2 :
{
. = ALIGN(4);
_s_sram2 = .;
*(.sram2);
*(.sram2*);
. = ALIGN(4);
_e_sram2 = .;
} > SRAM2 AT >FLASH
/* The program code and other data into FLASH memory */
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
} >FLASH
/* Constant data into FLASH memory*/
.rodata :
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
.ARM.extab : {
. = ALIGN(4);
*(.ARM.extab* .gnu.linkonce.armextab.*)
. = ALIGN(4);
} > FLASH
.ARM : {
. = ALIGN(4);
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
. = ALIGN(4);
} >FLASH
.preinit_array :
{
. = ALIGN(4);
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
. = ALIGN(4);
} >FLASH
.init_array :
{
. = ALIGN(4);
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
. = ALIGN(4);
} >FLASH
.fini_array :
{
. = ALIGN(4);
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
. = ALIGN(4);
} >FLASH
/* Used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections into RAM memory */
.data :
{
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
} >CCMRAM AT> FLASH
/* Uninitialized data section into RAM memory */
. = ALIGN(4);
.bss :
{
/* This is used by the startup in order to initialize the .bss secion */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >CCMRAM
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >CCMRAM
/* Remove information from the compiler libraries */
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
}
And here is part of my startup file
/**
******************************************************************************
* #file startup_stm32.s dedicated to STM32F407ZETx device
* #author Ac6
* #version V1.0.0
* #date 2019-03-30
******************************************************************************
*/
.syntax unified
.cpu cortex-m4
.fpu softvfp
.thumb
.global g_pfnVectors
.global Default_Handler
/* start address for the initialization values of the .data section.
defined in linker script */
.word _sidata
/* start address for the .data section. defined in linker script */
.word _sdata
/* end address for the .data section. defined in linker script */
.word _edata
/* start address for the .bss section. defined in linker script */
.word _sbss
/* end address for the .bss section. defined in linker script */
.word _ebss
/**
* #brief This is the code that gets called when the processor first
* starts execution following a reset event. Only the absolutely
* necessary set is performed, after which the application
* supplied main() routine is called.
* #param None
* #retval : None
*/
.section .text.Reset_Handler
.weak Reset_Handler
.type Reset_Handler, %function
Reset_Handler:
ldr r0, =_estack
mov sp, r0 /* set stack pointer */
/* Copy the data segment initializers from flash to SRAM */
ldr r0, =_sdata
ldr r1, =_edata
ldr r2, =_sidata
movs r3, #0
b LoopCopyDataInit
CopyDataInit:
ldr r4, [r2, r3]
str r4, [r0, r3]
adds r3, r3, #4
LoopCopyDataInit:
adds r4, r0, r3
cmp r4, r1
bcc CopyDataInit
/* Zero fill the bss segment. */
ldr r2, =_sbss
ldr r4, =_ebss
movs r3, #0
b LoopFillZerobss
FillZerobss:
str r3, [r2]
adds r2, r2, #4
LoopFillZerobss:
cmp r2, r4
bcc FillZerobss
///////////// Following blocks are for SRAM1 /////////////////////
// Copy from flash to SRAM1
ldr r0, =_s_sram1
ldr r1, =_e_sram1
ldr r2, =_sisram1
movs r3, #0
b LoopCopySRAM1Init
CopySRAM1Init:
ldr r4, [r2, r3]
str r4, [r0, r3]
adds r3, r3, #4
LoopCopySRAM1Init:
adds r4, r0, r3
cmp r4, r1
bcc CopySRAM1Init
// End of data copy from Flash to SRAM1
// Zero fill the SRAM1 segment.
ldr r2, =_s_sram1
b LoopFillZeroSRAM1
FillZeroSRAM1:
movs r3, #0
str r3, [r2]
adds r2, r2, #4
LoopFillZeroSRAM1:
ldr r3, = _e_sram1
cmp r2, r3
bcc FillZeroSRAM1
//////////////// End of SRAM1 Blocks /////////////////
///////////// Following blocks are for SRAM2 /////////////////////
// Copy from flash to SRAM2
ldr r0, =_s_sram2
ldr r1, =_e_sram2
ldr r2, =_sisram2
movs r3, #0
b LoopCopySRAM2Init
CopySRAM2Init:
ldr r4, [r2, r3]
str r4, [r0, r3]
adds r3, r3, #4
LoopCopySRAM2Init:
adds r4, r0, r3
cmp r4, r1
bcc CopySRAM2Init
// End of data copy from Flash to SRAM2
// Zero fill the SRAM2 segment.
ldr r2, =_s_sram2
b LoopFillZeroSRAM2
FillZeroSRAM2:
movs r3, #0
str r3, [r2]
adds r2, r2, #4
LoopFillZeroSRAM2:
ldr r3, = _e_sram2
cmp r2, r3
bcc FillZeroSRAM2
//////////////// End of SRAM2 Blocks /////////////////
/* Call the clock system intitialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* Call the application's entry point.*/
bl main
LoopForever:
b LoopForever
Can anybody please help me if there is any bug here. Lets say, for now we are only trying to run code from SRAM1 since that is the one which is connected to I-bus and D-bus ports of the Cortex M4 on this MCU. With respect to the picture below, I would expect bl to be jumping to beginning of SRAM1.. Can you please shed some light on it:
According to the screen captures, your code is jumping to SRAM (as expected) but there is no code in SRAM (MOV R0 R0 indicates the memory is filled with 0s) hence the hardFault.
You have to copy from Flash to RAM the code that you want to execute in RAM.

GNU ASM .section directive not working/linker issue

I'm trying to move my _start function to 0x0, as it is the bootloader.
Flash ROM exists from 0x0 to the first 128MB (=1Gb), other memory is DDR3 RAM but we will map RAM to 0x80000000 to 0xFFFFFFFF.
The issue is with the directive .section ".vect", the _start address is not going into the .vect section, it is going into the .text section.
_start:
.section ".vect" /* I've tried .section .vect and I've tried moving above _start */
b ResetHandler
b UndefHandler
b SVC_Handler
b PrefetchAbortHandler
b DataAbortHandler
b NotUsedHandler
b IRQ_Handler
b FIQ_Handler
1:
b 1b /* Hang and don't return */
In my linker script, the MEMORY command and then the start of SECTIONS is:
MEMORY
{
vect (rx) : o = 0x0, l = 1M
rom (rx) : o = 1M, l = 127M
ram (wx) : o = 0x80000000, l = 0x80000000
}
SECTIONS
{
.vect : ALIGN(64)
{
. = 0x0;
*(.vect*)
VectTableEnd = .;
VectTableSize = SIZEOF(.vect);
} > vect
.... (.text, .bss, .data, .stack, etc are other SECTIONS entries)
}
But no matter what, the _start assembly function code gets shoved into the standard .text section, not at address 0x0. Does anyone know what I'm going wrong?
The code is targetted at bare-metal ARMv7A machines, compiled/linked with arm-none-eabi-as/ld
Cheers.
No surprises here if your _start label ends up in .text:
/* implicitly default section .text */
_start:
/* still the same section .text */
.section .vect
/* explicitly the section .vect */
b ResetHandler
You probably want to switch the section before defining the _start label to make that label end up in the intended section.
/* implicitly default section .text */
.section .vect
/* explicitly the section .vect */
_start:
b ResetHandler
Using the name _start as the symbol for a vector table is a bit weird (I would have expected a symbol like __vectors or vector_table), but that is beyond the scope of this question.

eBPF tools - skb_network_header crashes in a BPF Kernel trace function

I am looking to trace ip_forward_finish. The intent is to trace latency of all TCP connections going through a linux based gateway router. Hence thought of tracing ip_forward_finish kernel function. And capture the time-stamp of SYN, SYN-ACK and ACK messages at the router.
The issue is accessing iphdr inside the trace function makes the verifier complain with the following error:
bpf: Failed to load program: Permission denied
0: (79) r6 = *(u64 *)(r1 +96)
1: (b7) r1 = 0
2: (6b) *(u16 *)(r10 -24) = r1
3: (bf) r3 = r6
4: (07) r3 += 192
5: (bf) r1 = r10
6: (07) r1 += -24
7: (b7) r2 = 2
8: (85) call bpf_probe_read#4
9: (69) r1 = *(u16 *)(r10 -24)
10: (55) if r1 != 0x8 goto pc+7
R0=inv(id=0) R1=inv8 R6=inv(id=0) R10=fp0
11: (69) r1 = *(u16 *)(r6 +196)
R6 invalid mem access 'inv'
HINT: The invalid mem access 'inv' error can happen if you try to dereference
memory without first using bpf_probe_read() to copy it to the BPF stack.
Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times
you'll need to be explicit.
The code fragment I originally had was as below and the crash occurs when an access to ip_Hdr->protocol is made. And I also checked that ip_Hdr is not null.
int trace_forward_finish(struct pt_regs *ctx,struct net *net,
struct sock *sk, struct sk_buff *skb)
{
if (skb->protocol != htons(ETH_P_IP))
return 0;
struct iphdr* ip_Hdr = (struct iphdr *) skb_network_header(skb);
if (ip_Hdr->protocol != IPPROTO_TCP)
return 0;
/// More code
}
Per the HINT in the message, I did try to change to bpf_probe_read but still the same outcome
int trace_forward_finish(struct pt_regs *ctx,struct net *net,
struct sock *sk, struct sk_buff *skb)
{
if (skb->protocol != htons(ETH_P_IP))
return 0;
struct iphdr ip_Hdr;
bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb));
if (ip_Hdr.protocol != IPPROTO_TCP)
return 0;
return 0;
}
Any help would be appreciated.
bcc will try to transform your dereferences of kernel pointers into calls to bpf_probe_read. You can see that happening by passing debug=4 to the BPF() call.
In your case, I suspect that you would need to include function skb_network_header in your code so that bcc is able to rewrite it.
If that's not sufficient, then you might need a manual call to bpf_probe_read to retrieve structure struct iphdr from skb_network_header's pointer.

Analog read from all 7 inputs using PRU and a host C program for beaglebone black

I'm kinda new to the beaglebone black world running on a AM335X Cortex A8 processor and I would like to use the PRU for fast analog read with the maximum sampling rate possible.
I would like to read all 7 inputs in a loop form like:
while( n*7 < sampling_rate){ //initial value for n = 0
read(AIN0); //and store it in shared memory(7*n + 0)
read(AIN1); //and store it in shared memory(7*n + 1)
read(AIN2); //and store it in shared memory(7*n + 2)
read(AIN3); //and store it in shared memory(7*n + 3)
read(AIN4); //and store it in shared memory(7*n + 4)
read(AIN5); //and store it in shared memory(7*n + 5)
read(AIN6); //and store it in shared memory(7*n + 6)
n++;
}
so that I can read them from a host program running on the main processor. Any idea how to do so? I tried using a ready code called ADCCollector.c from a package named AM335x_pru_package but I can't figure out how to get all the addresses and values of the registers used.
This is the code I was trying to modify (ADCCollector.p):
.origin 0 // offset of the start of the code in PRU memory
.entrypoint START // program entry point, used by debugger only
#include "ADCCollector.hp"
#define BUFF_SIZE 0x00000fa0 //Total buff size: 4kbyte(Each buffer has 2kbyte: 500 piece of data
#define HALF_SIZE BUFF_SIZE / 2
#define SAMPLING_RATE 1 //Sampling rate(16khz) //***//16000
#define DELAY_MICRO_SECONDS (1000000 / SAMPLING_RATE) //Delay by sampling rate
#define CLOCK 200000000 // PRU is always clocked at 200MHz
#define CLOCKS_PER_LOOP 2 // loop contains two instructions, one clock each
#define DELAYCOUNT DELAY_MICRO_SECONDS * CLOCK / CLOCKS_PER_LOOP / 1000 / 1000 * 3 //if sampling rate = 98000 --> = 3061.224
.macro DELAY
MOV r10, DELAYCOUNT
DELAY:
SUB r10, r10, 1
QBNE DELAY, r10, 0
.endm
.macro READADC
//Initialize buffer status (0: empty, 1: first buffer is ready, 2: second buffer is ready)
MOV r2, 0
SBCO r2, CONST_PRUSHAREDRAM, 0, 4
INITV:
MOV r5, 0 //Shared RAM address of ADC Saving position
MOV r6, BUFF_SIZE //Counting variable
READ:
//Read ADC from FIFO0DATA
MOV r2, 0x44E0D100
LBBO r3, r2, 0, 4
//Add address counting
ADD r5, r5, 4
//Write ADC to PRU Shared RAM
SBCO r3, CONST_PRUSHAREDRAM, r5, 4
DELAY
SUB r6, r6, 4
MOV r2, HALF_SIZE
QBEQ CHBUFFSTATUS1, r6, r2 //If first buffer is ready
QBEQ CHBUFFSTATUS2, r6, 0 //If second buffer is ready
QBA READ
//Change buffer status to 1
CHBUFFSTATUS1:
MOV r2, 1
SBCO r2, CONST_PRUSHAREDRAM, 0, 4
QBA READ
//Change buffer status to 2
CHBUFFSTATUS2:
MOV r2, 2
SBCO r2, CONST_PRUSHAREDRAM, 0, 4
QBA INITV
//Send event to host program
MOV r31.b0, PRU0_ARM_INTERRUPT+16
HALT
.endm
// Starting point
START:
// Enable OCP master port
LBCO r0, CONST_PRUCFG, 4, 4 //#define CONST_PRUCFG C4 taken from ADCCollector.hp
CLR r0, r0, 4
SBCO r0, CONST_PRUCFG, 4, 4
//C28 will point to 0x00012000 (PRU shared RAM)
MOV r0, 0x00000120
MOV r1, CTPPR_0
ST32 r0, r1
//Init ADC CTRL register
MOV r2, 0x44E0D040
MOV r3, 0x00000005
SBBO r3, r2, 0, 4
//Enable ADC STEPCONFIG 1
MOV r2, 0x44E0D054
MOV r3, 0x00000002
SBBO r3, r2, 0, 4
//Init ADC STEPCONFIG 1
MOV r2, 0x44E0D064
MOV r3, 0x00000001 //continuous mode
SBBO r3, r2, 0, 4
//Read ADC and FIFOCOUNT
READADC
Another question is: if I simply changed the #define Sampling_rate from 16000 to any other number below or equal to 200000 in the (.p) file, I will get that sampling rate? or should I change other things?
Thanks in advance.
I used the c wrappers from libpruio: http://www.freebasic.net/forum/viewtopic.php?f=14&t=22501
and then use this code to get all my ADC values:
#include "stdio.h"
#include "c_wrapper/pruio.h" // include header
#include "sys/time.h"
//! The main function.
int main(int argc, char **argv) {
struct timeval start, now;
long mtime, seconds, useconds;
gettimeofday(&start, NULL);
int i,x;
pruIo *io = pruio_new(PRUIO_DEF_ACTIVE, 0x98, 0, 1); //! create new driver structure
if (pruio_config(io, 1, 0x1FE, 0, 4)){ // upload (default) settings, start IO mode
printf("config failed (%s)\n", io->Errr);}
else {
do {
gettimeofday(&now, NULL);
seconds = now.tv_sec - start.tv_sec;
useconds = now.tv_usec - start.tv_usec;
mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
printf("%lu",mtime);
for(i = 1; i < 9; i++) {
printf(",%d", io->Adc->Value[i]); //0-66504 for 0-1.8v
}
printf("\n");
x++;
}while (mtime < 100);
printf("count: %d \n", x);
pruio_destroy(io); /* destroy driver structure */
}
return 0;
}
In your example you use libpruio in IO mode (synchronous) and therefore you have no control over the sampling rate, since the host CPU doesn't work in real-time.
To get the maximum sampling rate (as mentioned in the OP) you have to use either RB or MM mode. In those modes libpruio buffers the samples in memory and the host can access them asynchronously. See example rb_file.c (or triggers.bas) in the libpruio package.

Maximum number of writable data containers

OS: Windows 8.1 64 Bit - fully updated
IDE: Visual Studio Professional 2013 - Version 12.0.30110.00 Update 1 - fully updated
I have a situation, where I get the following exception not during compile-, but run-time.
The number of writable data containers referenced in the entry function of the parallel_for_each call (17) exceeds the selected accelerator's limit (8).
The function where this happens looks like the following
void run_epoch(
accelerator_view mainAccelView,
ActivatorState activatorState,
TrainingState trainingState,
array_view<double, 2> avLayer1,
array_view<double, 2> avLayer2,
array_view<double, 2> avLayer3,
array_view<const double, 2> avPredictors,
array_view<const double, 2> avTargets,
array_view<double> avErrors,
int epoch
){
accelerator_view mainAccelView = accelerator::accelerator().create_view(queuing_mode::queuing_mode_immediate);
int noOfColumnsPredictors = AmpUtils::get_no_of_columns(avPredictors);
int noOfRowsPredictors = AmpUtils::get_no_of_rows(avPredictors, noOfColumnsPredictors);
int noOfColumnsLayer1 = AmpUtils::get_no_of_columns(avLayer1);
int noOfColumnsLayer2 = AmpUtils::get_no_of_columns(avLayer2);
int noOfColumnsLayer3 = AmpUtils::get_no_of_columns(avLayer3);
int noOfRowsLayer1 = AmpUtils::get_no_of_rows(avLayer1, noOfColumnsLayer1);
int noOfRowsLayer2 = AmpUtils::get_no_of_rows(avLayer2, noOfColumnsLayer2);
int noOfRowsLayer3 = AmpUtils::get_no_of_rows(avLayer3, noOfColumnsLayer3);
array_view<double, 2> avOutputLayer1(noOfRowsPredictors, noOfRowsLayer1);
array_view<double, 2> avOutputLayer2(noOfRowsPredictors, noOfRowsLayer2);
array_view<double, 2> avOutputLayer3(noOfRowsPredictors, noOfRowsLayer3);
array_view<double, 2> avErrorsLayer1(noOfRowsPredictors, noOfRowsLayer1);
array_view<double, 2> avErrorsLayer2(noOfRowsPredictors, noOfRowsLayer2);
array_view<double, 2> avErrorsLayer3(noOfRowsPredictors, noOfRowsLayer3);
array_view<double, 2> avThresholdLayer1(noOfRowsPredictors, noOfRowsLayer1);
array_view<double, 2> avThresholdLayer2(noOfRowsPredictors, noOfRowsLayer2);
array_view<double, 2> avThresholdLayer3(noOfRowsPredictors, noOfRowsLayer3);
array_view<double, 3> avWeightsLayer1(noOfRowsPredictors, noOfRowsLayer1, (noOfColumnsLayer1 - 1));
array_view<double, 3> avWeightsLayer2(noOfRowsPredictors, noOfRowsLayer2, (noOfColumnsLayer2 - 1));
array_view<double, 3> avWeightsLayer3(noOfRowsPredictors, noOfRowsLayer3, (noOfColumnsLayer3 - 1));
array_view<double, 2> avErrorsTempBuffer(noOfRowsPredictors, noOfRowsLayer3);
int errorTempBufferSize = avErrorsTempBuffer.extent.size();
array_view<double> avEpochErrors(noOfRowsPredictors);
try{
parallel_for_each(extent<1>(AmpUtils::get_no_of_rows(avPredictors)), [=](index<1> idx) restrict(cpu, amp){
int predictorRow = idx[0];
// step 1: compute
// step 11: compute layer 1
compute_layer(activatorState, avPredictors[predictorRow], avLayer1, avOutputLayer1, noOfColumnsLayer1, predictorRow);
// step 12: compute layer 2
compute_layer(activatorState, avPredictors[predictorRow], avLayer2, avOutputLayer2, noOfColumnsLayer2, predictorRow);
// step 13: compute layer 3
compute_layer(activatorState, avPredictors[predictorRow], avLayer3, avOutputLayer3, noOfColumnsLayer3, predictorRow);
// step 2: calculate_error
// step 21: calculate_error layer 3
for (int column = 0; column < noOfRowsLayer3; column++){
double neuronError = avTargets[predictorRow][column] - avOutputLayer3[predictorRow][column];
avErrorsTempBuffer[predictorRow][column] = neuronError * neuronError;
avErrorsLayer3[predictorRow][column] = neuronError * AmpActivator::derivative2(activatorState, avOutputLayer3[predictorRow][column]);
}
double errorSum = 0.0;
for (int column = 0; column < errorTempBufferSize; column++){
errorSum += avErrorsTempBuffer[predictorRow][column];
}
avEpochErrors[predictorRow] = errorSum;
// step 22: calculate_error layer 2
calculate_error_layer(activatorState, avErrorsLayer2[predictorRow], avErrorsLayer3, avLayer3, avOutputLayer2[predictorRow], noOfRowsLayer3, noOfRowsLayer3);
// step 23: calculate_error layer 1
calculate_error_layer(activatorState, avErrorsLayer1[predictorRow], avErrorsLayer2, avLayer2, avOutputLayer1[predictorRow], noOfRowsLayer2, noOfRowsLayer2);
// step 3: calculate_updates
// step 31: calculate_updates layer 1
calculate_updates_layer(trainingState, avErrorsLayer1[predictorRow], avPredictors[predictorRow], avThresholdLayer1[predictorRow], avWeightsLayer1[predictorRow], (noOfColumnsLayer1 - 1), noOfRowsLayer1);
// step 31: calculate_updates layer 2
calculate_updates_layer(trainingState, avErrorsLayer2[predictorRow], avPredictors[predictorRow], avThresholdLayer2[predictorRow], avWeightsLayer2[predictorRow], (noOfColumnsLayer2 - 1), noOfRowsLayer2);
// step 31: calculate_updates layer 3
calculate_updates_layer(trainingState, avErrorsLayer3[predictorRow], avPredictors[predictorRow], avThresholdLayer3[predictorRow], avWeightsLayer3[predictorRow], (noOfColumnsLayer3 - 1), noOfRowsLayer3);
// step 4: update_network
// step 41: update_network layer 1
update_layer(avLayer1, avWeightsLayer1[predictorRow], avThresholdLayer1[predictorRow], noOfColumnsLayer1, noOfRowsLayer1);
// step 42: update_network layer 2
update_layer(avLayer2, avWeightsLayer2[predictorRow], avThresholdLayer2[predictorRow], noOfColumnsLayer2, noOfRowsLayer2);
// step 43: update_network layer 3
update_layer(avLayer3, avWeightsLayer3[predictorRow], avThresholdLayer3[predictorRow], noOfColumnsLayer3, noOfRowsLayer3);
});
avEpochErrors.synchronize();
double epochErrorsSum = 0.0;
for (int i = 0; i < (int)avEpochErrors.extent.size(); i++){
epochErrorsSum += avEpochErrors[i];
}
avErrors[epoch] = epochErrorsSum;
}
catch (std::exception e){
std::wcout << "Exception Project::run_epoch: " << e.what() << std::endl;
}
}
According to this MSDN-post here and also here, the maximum number of writeable containers should have been increased to 64 since Windows 8.
My question is now, are there different types of writeable containers whereas I still only might use a maximum of 8 of a certain type?
Strictly speaking the limitation is on the number of UAVs. This is coupled to the DX version not Windows.
Limited number of writable array_view/array/texture/writeonly_texture_view objects allowed per kernel
C++ AMP supports a limited number of writable array_view/array/texture/writeonly_texture_view objects per kernel. Specifically, the total number of writable array_view + array + texture + writeonly_texture_view per kernel should not exceed 8 on DirectX 11 and 64 on DirectX11.1. The total number of allowed read-only array_view/array/texture objects per kernel is 128 and specifying the read-only restriction can help you avoid hitting the limit on maximum number of allowed writable array_view/array/texture/writeonly_texture_view objects per kernel.
From Parallel Programming in Native Code.
DX11.1 is supported on Win8 and has been back ported in some limited form to Win7. Looking at my machine it is running Windows 8.1 but seems to be using DX 11 not 11.1 drivers. DXDIAG.EXE will tell you what you are using. You need to make sure that your card supports DX11.1 and that you have the latest drivers installed.