Viewing P4 packet counter - counter

I'm new to P4, and I'm trying to understand how packet counters are implemented.
In the documentation, it is quite clear how counters are defined and then incremented. But I cannot see how one would be able to view the counter using the control plane.
I'm using Mininet and the BMV2 software switch - how would I be able to view my counter values?


stm32 NVIC_EnableIRQ() bare metal equivalent?

I'm using the blue pill, and trying to figure out interrupts. I have an interrupt handler:
void __attribute__ ((interrupt ("TIM4_IRQHandler"))) myhandler()
{
    TIM4->EGR |= TIM_EGR_UG; // send an update event to reset timer and apply settings
    TIM4->SR &= ~0x01; // clear UIF
    TIM4->DIER |= 0x01; // UIE
}
I set up the timer:
TIM4->EGR |= TIM_EGR_UG; // send an update event to reset timer and apply settings
TIM4->DIER |= 0x01; // UIE enable interrupt
My timer doesn't seem to activate. I don't think I've actually enabled it though. Have I??
I see in lots of example code commands like:
What is actually going on in NVIC_EnableIRQ()?
I've googled around, but I can't find actual bare-metal code that's doing something similar to mine.
I seem to be missing a crucial step.
Update 2020-09-23 Thanks to the respondents to this question. The trick is to set the bit for the interrupt number in an NVIC_ISER register. As I pointed out below, this doesn't seem to be mentioned in the STM32F101xx reference manual, so I probably would never have been able to figure this out on my own; not that I have any real skill in reading datasheets.
Anyway, oh joy, I managed to get interrupts working! You can see the code here:
Even if you go bare metal, you might still want to use the CMSIS header files, which provide declarations and inline versions of very basic ARM Cortex elements such as NVIC_EnableIRQ.
You can find NVIC_EnableIRQ at
It's defined as:
#define NVIC_EnableIRQ __NVIC_EnableIRQ
if ((int32_t)(IRQn) >= 0)
NVIC->ISER[(((uint32_t)IRQn) >> 5UL)] = (uint32_t)(1UL << (((uint32_t)IRQn) & 0x1FUL));
If you want to, you can ignore __COMPILER_BARRIER(). Previous versions didn't use it.
The definition is applicable to Cortex-M3 chips. It's different for other Cortex versions.
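The index and mask arithmetic in that expression can be checked off-target. What follows is only a sketch in Python, not code for the microcontroller: the function name `iser_index_and_mask` is made up for illustration, 25 and 30 are the TIM1_UP and TIM4 interrupt numbers mentioned in this thread, and 37 is just an arbitrary number above 31 to show the second register being selected.

```python
NVIC_ISER_BASE = 0xE000E100  # address of NVIC_ISER0 on ARMv7-M


def iser_index_and_mask(irqn):
    """Mirror the CMSIS expression: ISER[irqn >> 5] = 1 << (irqn & 0x1F)."""
    if irqn < 0:
        # CMSIS silently ignores negative IRQn values (core exceptions);
        # this sketch raises instead so that mistakes are visible.
        raise ValueError("negative numbers are core exceptions, not NVIC interrupts")
    return irqn >> 5, 1 << (irqn & 0x1F)


for irqn in (25, 30, 37):
    index, mask = iser_index_and_mask(irqn)
    address = NVIC_ISER_BASE + 4 * index  # each ISER register is 32 bits wide
    print(f"IRQ {irqn}: write 0x{mask:08X} to NVIC_ISER{index} at 0x{address:08X}")
```

The ISER registers are write-one-to-set, so writing only the mask (without a read-modify-write) leaves the other interrupts' enable bits untouched.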
Using the libraries is still considered bare metal, as long as there is no operating system. Anyway, it is good that you have a desire to learn at this level. Someone has to write the libraries for others.
I was going to do a full example here (it really takes very little code to do this), but I will instead take from my code for this board, which uses timer1.
You obviously need the ARM documentation (technical reference manual for the cortex-m3 and the architectural reference manual for armv7-m) and the data sheet and reference manual for this st part (no need for programmers manual from either company).
You have provided next to no information related to making the part work. You should never dive right into an interrupt; interrupts are an advanced topic, and you should poll your way as far as possible before finally enabling the interrupt into the core.
I prefer to get a UART working, then use that to watch the timer registers as they count, roll over, etc. Then see/confirm that the status register fired, and learn/confirm how to clear it (sometimes it is just a clear on read).
Then enable it into the NVIC and, by polling, see that the NVIC sees it and that you can clear it.
You didn't show your vector table; this is key to getting your interrupt handler working, let alone getting the core to boot.
08000000 <_start>:
8000000: 20005000
8000004: 080000b9
8000008: 080000bf
800000c: 080000bf
80000a0: 080000bf
80000a4: 080000d1
80000a8: 080000bf
080000b8 <reset>:
80000b8: f000 f818 bl 80000ec <notmain>
80000bc: e7ff b.n 80000be <hang>
080000be <hang>:
80000be: e7fe b.n 80000be <hang>
080000d0 <tim1_handler>:
The first word loads the stack pointer; the rest are vectors, each holding the address of the handler ORed with one (I'll let you look that up).
In this case the ST reference manual shows that interrupt 25 is TIM1_UP at address 0x000000A4, which mirrors to 0x080000A4, and that is where the handler's vector is in my binary. If yours is not there, you can use VTOR to relocate the table to an aligned space (sometimes SRAM, or some other flash space that you build for this) and point there. Either way, the vector table entry must hold the proper pointer or your interrupt handler won't run.
volatile unsigned int counter;
void tim1_handler ( void )
volatile isn't necessarily the right way to share a variable between an interrupt handler and the foreground task; it happens to work for me with this compiler and code. You can do the research and, even better, examine the compiler output (disassemble the binary) to confirm this isn't a problem.
ra|=1<<11; //TIM1
A minimal init and runtime for tim1.
Notice that bit 25 of NVIC_ISER0 is set to let interrupt 25 through.
Well before trying this code, I polled the timer status register to see how it works, compared that with the docs, and cleared the interrupt per the docs. Then, with that knowledge, I confirmed with the NVIC_ICPR0,1,2 registers that it was interrupt 25, and that there were no other gates between the peripheral and the NVIC, as some chips from some vendors may have.
Then I released it through to the core with NVIC_ISER0.
If you don't take these baby steps (and perhaps you already have), it only makes the task much worse and take longer (yes, sometimes you get lucky).
TIM4 looks to be interrupt 30, offset/address 0x000000B8, in the vector table. NVIC_ISER0 (0xE000E100) covers the first 32 interrupts, so 30 would be in that register. If you disassemble the code you are generating with the library, we can see what is going on, and/or you can look it up in the library source code (as someone already did for you).
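The vector table addresses quoted in this answer follow from simple arithmetic, which can be sanity-checked with a short sketch. This is Python run on a PC, not target code; the helper names `vector_offset` and `vector_entry` are invented here, and the layout assumed is the standard Cortex-M one (initial stack pointer plus 15 core exception slots before the first external interrupt).

```python
FLASH_BASE = 0x08000000  # flash alias where the vector table sits in the listing above
SYSTEM_ENTRIES = 16      # initial SP value + 15 core exception vectors on Cortex-M


def vector_offset(irqn):
    """Byte offset of an external interrupt's vector from the table start."""
    return (SYSTEM_ENTRIES + irqn) * 4


def vector_entry(handler_address):
    """Value stored in the table: handler address with bit 0 (Thumb bit) set."""
    return handler_address | 1


for name, irqn in (("TIM1_UP", 25), ("TIM4", 30)):
    offset = vector_offset(irqn)
    print(f"{name}: offset 0x{offset:08X}, flash address 0x{FLASH_BASE + offset:08X}")

# tim1_handler sits at 0x080000D0 in the disassembly, so its table entry reads:
print(f"0x{vector_entry(0x080000D0):08X}")
```

The last line reproduces the 080000d1 word seen at 80000a4 in the listing, which is a quick way to confirm that a handler landed in the right slot of your own binary.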
And then of course your timer 4 code needs to properly init the timer and cause the interrupt to fire, which I didn't check.
There are examples, you need to just keep looking.
The minimum is
vector in the table
set the bit in the interrupt set enable register
enable the interrupt to leave the peripheral
fire the interrupt
Not necessarily in that order.

SPI bit banging; MCP3208; Raspberry;error

I am using a Raspberry Pi 2 board with Raspbian loaded. I need to do SPI by bit banging and interface an MCP3208.
I have taken code from GitHub. It is written for the MCP3008 (10-bit ADC).
Only change I made in code is that instead of calling:
adcValue = recvBits(12, clkPin, misoPin)
I called adcValue = recvBits(14, clkPin, misoPin), since I have to receive 14 bits of data.
Problem: It keeps sending random data ranging from 0 to 10700, even though the maximum should be 4095. This means I am not reading the data correctly.
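One way to see why readings above 4095 point to a framing problem: 14 clocked bits can encode values up to 16383, so any set bit above the lowest 12 ends up in the result unless it is discarded. Below is a minimal sketch of that arithmetic, assuming (this is an assumption, not something confirmed for the GitHub code) that the 12 data bits are the last ones clocked in, MSB first. `assemble_bits` and `keep_low_12` are illustrative helpers, not part of that code, and the sampled bit pattern is made up.

```python
def assemble_bits(bits):
    """Shift a sequence of sampled MISO bits (MSB first) into one integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def keep_low_12(raw):
    """Discard everything above the 12 ADC data bits."""
    return raw & 0xFFF


# Hypothetical capture: two leading bits that are not ADC data, then 12 data bits.
sampled = [1, 0] + [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
raw = assemble_bits(sampled)
print("raw 14-bit value:", raw)
print("12-bit result:   ", keep_low_12(raw))
print("largest 14-bit value:", assemble_bits([1] * 14))
```

If masking alone produces stable readings, the extra bits were framing bits; if the values still jump around, the cause is more likely timing or wiring.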
I think the problem is that the MCP3208 has a maximum frequency of 2 MHz, but in the code there is no delay between two consecutive data reads or writes. I think I need to add a delay of 0.5 µs at every clock transition, since I am operating at 1 MHz.
For small delays I am currently reading Accurate Delays on the Raspberry Pi:
...when we need accurate short delays in the order of microseconds, it’s
not always the best way, so to combat this, after studying the BCM2835
ARM Peripherals manual and chatting to others, I’ve come up with a
hybrid solution for wiringPi. What I do now is for delays of under
100μS I use the hardware timer (which appears to be otherwise unused),
and poll it in a busy-loop, but for delays of 100μS or more, then I
resort to the standard nanosleep(2) call.
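The hybrid strategy in the quote can be sketched as follows. This is not wiringPi's implementation: it is a Python illustration that polls `time.perf_counter_ns()` instead of the BCM2835 hardware timer, the 100 µs threshold is taken from the quote, and the name `delay_microseconds` is chosen here for illustration. Python on Linux cannot guarantee microsecond accuracy, so treat the short-delay path as best effort.

```python
import time

BUSY_WAIT_LIMIT_US = 100  # threshold taken from the quoted description


def delay_microseconds(us):
    """Busy-wait for short delays, fall back to sleeping for longer ones."""
    if us <= 0:
        return
    if us < BUSY_WAIT_LIMIT_US:
        # Short delay: spin on a monotonic clock until the deadline passes.
        deadline = time.perf_counter_ns() + int(us * 1000)
        while time.perf_counter_ns() < deadline:
            pass
    else:
        # Longer delay: let the OS scheduler handle it.
        time.sleep(us / 1_000_000)


start = time.perf_counter_ns()
delay_microseconds(50)
elapsed_us = (time.perf_counter_ns() - start) / 1000
print(f"requested 50 us, measured {elapsed_us:.1f} us")
```

The busy-wait path never returns early, but it can overshoot whenever the process is preempted, which is the main limit on bit-banging timing from user space.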
I finally found some Python code that simplifies reading from the 3208, thanks to RaresPlescan.
I had a data logger built on the Pi that was using a 3008. The COTS data logger I was trying to replicate had better resolution, so I started looking for a 12-bit part and found the 3208. I literally swapped the 3008 out for the 3208, and with this guy's code I have achieved better resolution than the COTS data logger.

problems with implementation of 0000-9999 counter on fpga(seven segment)

Okay, I couldn't post a long comment (I am new to the website, so please accept my apologies), so I am editing my earlier question. I have tried to implement multiplexing in 2 attempts:
-2nd attempt
-3rd attempt
In the 2nd attempt I tried to send the seven-segment variables of each module to the module one step ahead of it, and once they all reach the final top module I multiplex them. There is also a clock module which generates a clock for the units module (which makes the units place change 2 times per second) and a clock for multiplexing (switching between the displays 500 times per second). Of course, I read that my board has a clock frequency of 50 MHz, so these clock calculations are based on that figure.
In the 3rd attempt I have done the same thing in one single module. See the 2nd attempt first and then the 3rd one.
Both give errors right after synthesis and lots of unfamiliar warnings.
I have been able to synthesize and implement the program in attempt 4 (which I am not allowed to post since my reputation is low), using the save flag for variables, variables1, variables2 and variables3 (which were giving warnings of unused pins), but the program doesn't run; it simply shows the number 3777. There are also still warnings of "combinatorial loops" for some things related to some variables (I am sorry, I am new to all this Verilog), but you can see all of them in attempt 3 as well.
You cannot implement counters with loops. Neither can you implement cascaded counters with nested loops.
Writing HDL is not writing software! Please read a book or tutorial on VHDL or Verilog on how to design basic hardware circuits. There is also the Synthesis and Simulation Guide 14.4 - UG626 from Xilinx. Have a look at page 88.
Now it's possible to access your zip file without any dropbox credentials and I have looked into your project. Here are my comments on your code.
I'll number my bullets for better reference:
1. Your project has 4 mostly identical ucf files. The only difference is that they assign different anode control signals to the same pin location. This will cause errors in post-synthesis steps (multiple nets assigned to one pin). Normally, simple projects have only one ucf file.
2. The Nexys 2 board has a 4-digit 7-segment display with common cathodes and switchable common anodes. In total these are 8+4 wires to control. A time-multiplexing circuit is needed to switch at 25 Hz < f < 1 kHz through every digit of your 4-digit output vector.
3. Choosing a nested hierarchy is not so good. One major drawback is the passing of many signals from every level to the topmost level for connecting them to the FPGA pins. I would suggest a top-level module and 4 counters on level one. The top-level module can also provide the time-multiplexing circuit and the binary to 7-seg encoding.
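The numbers and behaviour behind that structure can be checked with a small behavioural model before any HDL is written. The sketch below is Python, not synthesizable code, and is only meant to pin down the arithmetic: the 50 MHz clock and the 2 Hz / 500 Hz rates come from the question, while the function names and the digit ordering (units first) are assumptions made here.

```python
CLOCK_HZ = 50_000_000  # board clock frequency given in the question


def cycles_per_event(event_hz):
    """Clock cycles a hardware divider must count between two events."""
    return CLOCK_HZ // event_hz


def bcd_increment(digits):
    """Advance [units, tens, hundreds, thousands] by one, rippling the carry."""
    nxt = list(digits)
    for i in range(len(nxt)):
        if nxt[i] < 9:
            nxt[i] += 1
            return nxt
        nxt[i] = 0  # this digit wraps, so carry into the next one
    return nxt      # 9999 wraps around to 0000


def active_digit(mux_tick):
    """Index of the digit whose anode is driven during a given multiplex tick."""
    return mux_tick % 4


print("cycles between unit increments (2 Hz):", cycles_per_event(2))
print("cycles between digit switches (500 Hz):", cycles_per_event(500))
print("after 0099 comes:", bcd_increment([9, 9, 0, 0]))
```

In hardware, each of these functions corresponds to a clocked process with an enable or carry signal rather than a loop: the divider produces a one-cycle enable pulse, and each digit counter increments only when the digit below it wraps. Switching digits at 500 Hz refreshes each of the four digits at 125 Hz, which sits inside the 25 Hz to 1 kHz window mentioned above.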

Make signal names coming from library links unique?

OK, I've been struggling with this for a while. What is the best way to accomplish the following:
where Reaction Wheel 1-4 are links to the same block in a library. When the Speed Counter, Speed Direction and Current signals are added to the final bus output as shown, MATLAB (rightfully) complains:
Warning: Signals 9, 10, 11, 12 entering Bus Creator
'myAwesomeModel' have duplicated names 'Current'. These are being made unique
by appending "(signal #)" to the signals within the resulting bus. Please
update the labels of the signals such that they are all unique.
Until now I've been using a "solution" like this:
that is, place a size-1-mux/gain-of-1/other-dummy block in the middle, so the signals can be renamed into something unique. However, I really like to believe that The MathWorks has thought of a better way to do this...
What is the "proper" way to construct bus signals like this? It feels rather like I'm being pushed to adopt a particular design/architecture, but what that is precisely, eludes me for the moment...
It was quite a challenge for me but looks like I kinda sorted it out. Matlab R2007a here. I'll do the example with an already done subsystem, with its inputs, outputs, ...
1- In Block Properties, add a tag to the block. This will be done to identify the block and its "siblings" among the system. MY_SUBSYSTEM for this example.
2- Block Properties again. Add the following snippet in CopyFcn callback:
%Find total amount of copies of the block in system
len = length(find_system(gcs,'Tag','MY_SUBSYSTEM'));
%Get handle of the block copied/added and name the desired signal accordingly
v = get_param(gcb,'PortHandles');
set(v.Outport(_INDEX_OF_PORT_TO_BE_RENAMED_),'SignalNameFromLabel',['BASENAME_HERE' num2str(len)]);
3- In _INDEX_OF_PORT_TO_BE_RENAMED_ you should put the port signal index (starting from 1) that you want to have renamed for each copy of the block. For a single output block this should be 1. BASENAME_HERE should be the port basename, in this case "Current" for you.
4- Add the block to the desired library, and delete the instance you used to create this example. From there on, as you add from the library or copy an existing block, the outport signal should be named Current1, Current2, Current3, and so on. Notice that you could apply any convention or formatting.
Hope this helps. It worked for me, don't hesitate to ask/criticize!
Note: Obviously, as the model grows, this method may become computationally demanding, as find_system will have to loop through the entire model; however, it looks like a good workaround to me for small to medium-sized systems.
Connect a Bus Selector to each Data Output. Select the signals you want and set "Output as bus". Then connect all Bus Selectors to a Bus Creator.

VHDL simulation in real time?

I've written some code that has an RTC component in it. It's a bit difficult to do proper emulation of the code because the clock speed is set to 50MHz so to see any 'real time' events take place would take forever. I did try to do simulation for 2 seconds in modelsim but it ended up crashing.
What would be a better way to do it if I don't have an evaluation board to burn and test using scope?
If you could provide a little more specific example of exactly what you're trying to test and what is chewing up your simulation cycles that would be helpful.
In general, if you have a lot of code that you need to test in simulation, it's helpful if you can create testbenches of the sub-modules and test them first. Often, if you simulate at the top (chip) level and try to stimulate sub-modules that are buried deep in the hierarchy of a design, it takes many clock ticks just to get data into and out of the sub-module. If you simulate the sub-module directly, you have direct access to the module's I/O and can test the things you want to test in that module in fewer cycles than if you try to get to it from the top level.
If you are trying to test logic that has very deep fifos that you are trying to fill or a specific count of a large counter you're trying to hit, you can either add logic to your code to help create those conditions in fewer cycles (like a load instruction on the counter) or you can force the values of internal signals of your design from the testbench itself.
These are just a couple of general ideas. Again, if you provide more detail about what it is you're simulating there are probably people on this forum that can provide help that is more specific to your problem.
As already mentioned by Ciano, if you provided more information about your design we would be able to give a more accurate answer. However, there are several tips that hardware designers should follow, especially for complex system simulation. Some of them (the ones I use most) are listed below:
Hierarchical simulation (as Ciano already posted): instead of simulating the entire system, try to simulate a smaller set of modules.
Selective configuration: most systems require some initialization processes such as reset initialization time, external chip register initialization, etc. Usually, for simulation purposes, a few of them are not required, and you may use a global constant to skip these stages when simulating, like:
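-- in reset condition (SIMULATION_MODE is a placeholder name for the global boolean constant):
if SIMULATION_MODE then
    currentState <= state_executeSystem; -- jump the initialization procedures
else
    currentState <= state_initializeSystem;
end if;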
Be careful: do not modify your code directly (hard-coded changes). As the system grows, it becomes impossible to remember which parts of it you modified for simulation. Use constants instead, as in the above example, to configure modules for a simulation profile.
Scaled time/size constants: instead of always using the real values for times and sizes (such as event times, memory sizes, register file size, etc.), use scaled values whenever possible. For example, if you are building an RTC that generates an interrupt to the main system every 60 seconds, scale your constants (if possible) to generate interrupts every 6 ms or 60 µs or so. Of course, the scale choice depends on your system. In my designs, I use two global configuration files: one for simulation and the other for synthesis. Most constant values are scaled down to reduce simulation time.
Increase the abstraction: for bigger modules it might be useful to create a simplified and more abstract module, acting as a model of your module. For example, if you have a processor that has this RTC (the one you mentioned) as a peripheral, you may create a simplified module of this RTC. Pretending that you only need its interrupt, you may create a simplified model such as:
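-- time_array is an illustrative type name; declare it in the architecture or a package
type time_array is array (natural range <>) of time;
constant INTERRUPT_EVENTS : time_array(1 to 2) := (
    32 ns,
    100 ms
);
-- in the architecture body:
process
begin
    for i in 1 to INTERRUPT_EVENTS'length loop
        rtcInterrupt <= '0';
        wait for INTERRUPT_EVENTS(i);       -- idle until the next scheduled event
        rtcInterrupt <= '1';
        wait until clk = '1' and clk'event; -- hold the interrupt until a rising clock edge
    end loop;
end process;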
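To put numbers on the scaled time constants tip above, the following sketch counts how many 50 MHz clock cycles the simulator has to step through for a few durations. It is plain arithmetic in Python; the 50 MHz figure and the 2 s, 60 s, 6 ms and 60 µs durations come from this thread, while the function name `clock_cycles` is made up for the illustration.

```python
CLOCK_HZ = 50_000_000  # clock frequency stated in the question


def clock_cycles(duration_us):
    """Number of clock cycles that fit into a duration given in microseconds."""
    return duration_us * CLOCK_HZ // 1_000_000


durations_us = [
    ("2 s (the attempted run)", 2_000_000),
    ("60 s (real RTC period)", 60_000_000),
    ("6 ms (scaled)", 6_000),
    ("60 us (scaled)", 60),
]

for label, duration_us in durations_us:
    print(f"{label:>24}: {clock_cycles(duration_us):>13,} cycles")
```

Scaling 60 s down to 60 µs shrinks the simulated cycle count by a factor of one million while still exercising the same interrupt logic, provided the rest of the design does not depend on the absolute period.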