How does a SystemVerilog program block avoid timing issues? - system-verilog

Why exactly did the program block concept come into the picture? I read in one book that it is to avoid timing violations. How?
Any suggestions or help is highly appreciated.
Thank You
Sam

Normally, a question like this is considered too broad and opinionated for SO. But since I was directly involved in the development and standardization of SystemVerilog, I can present a few facts from an article I wrote about it.
Program blocks came directly from a donation of the Vera language to SystemVerilog by Synopsys, and they try to mimic the scheduling semantics that a PLI application has when interacting with a Verilog simulator.
A program block's original purpose in SystemVerilog was to avoid race conditions (not timing violations) between sampling and driving signals between the DUT and the testbench. It also controlled the starting and termination of the "test".
Since its introduction, a number of other features within SystemVerilog have subsumed the need for program blocks as I explain in my article.
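To make the race concrete, here is a minimal sketch, assuming a trivial DUT (module and signal names are mine, purely illustrative). Statements in a program block are scheduled in the Reactive region, after the design's Active-region events have settled, so the testbench samples and drives without racing the flip-flop at the same clock edge:

module dut (input logic clk, din, output logic dout);
  always_ff @(posedge clk)
    dout <= din;                  // design executes in the Active region
endmodule

program test (input logic clk, dout, output logic din);
  initial begin
    din = 0;
    @(posedge clk);
    din = 1;                      // runs in the Reactive region: the DUT has already seen the old din
    @(posedge clk);
    $display("sampled dout = %b", dout);
  end                             // when the program's initial blocks finish, the "test" terminates
endprogram

module top;
  logic clk = 0, din, dout;
  always #5 clk = ~clk;           // free-running clock
  dut  d (.clk, .din, .dout);
  test t (.clk, .dout, .din);
endmodule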

What does the gate-level circuitry of a Program Counter in a processor look like? Or the time step counter?

Computer architecture seems to be a very difficult area of computer engineering, and one that I think needs more learning material. There are great conceptual explanations of how the different units work with each other, particularly concerning the registers in an MOS 6502 processor.
Here is a basic program counter:
https://www.clear.rice.edu/elec422/1996/bomb/finalmw.html
https://www.clear.rice.edu/elec422/1996/bomb/IMG00003.GIF
I've been perusing a variety of textbooks, particularly the following on computer systems:
David Patterson, John Hennessy: Computer Organization and Design, revised 4th edition, Morgan Kaufmann, 2011.
Randal Bryant, David O'Hallaron: Computer Systems, Prentice Hall, 2011.
But I have not found any gate-level circuit drawings of program counters or other registers inside a processor.
It would be awesome for personal enrichment if anyone knows where I can find these schematics; it would be interesting to see what they look like in basic AND, OR, XOR, etc. gates!
EDIT: I'm not particularly looking for a book or resource; rather, I'm hoping someone in our community has experience drawing circuit diagrams for something like a program counter. I would be interested in seeing what they look like.
Here's a counter made from logic gates: http://www.northdownfarm.co.uk/rory/tim/logic.htm
Learn more: https://www.youtube.com/watch?v=ZiAbLltaz4A
A program counter is essentially a bunch of T flip-flops: https://en.wikipedia.org/wiki/Flip-flop_(electronics)
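If code is easier to read than schematics, here is a hedged sketch of that idea in plain Verilog (module names are mine): a 4-bit ripple counter built from toggle flip-flops, where each stage clocks off the inverted output of the bit below it. A real program counter adds parallel-load logic for jumps and branches, but the counting core is just this chain:

module tff (input wire clk, rst, output reg q);
  // Toggle flip-flop (T tied high); internally a handful of cross-coupled NAND gates
  always @(posedge clk or posedge rst)
    if (rst) q <= 1'b0;
    else     q <= ~q;
endmodule

module ripple_counter4 (input wire clk, rst, output wire [3:0] q);
  // Each stage toggles when the bit below it falls from 1 to 0,
  // i.e. on the rising edge of that bit's inverse: 0000, 0001, 0010, ...
  tff t0 (.clk(clk),   .rst(rst), .q(q[0]));
  tff t1 (.clk(~q[0]), .rst(rst), .q(q[1]));
  tff t2 (.clk(~q[1]), .rst(rst), .q(q[2]));
  tff t3 (.clk(~q[2]), .rst(rst), .q(q[3]));
endmodule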

SystemVerilog simulation versus execution

Much ado is made about SystemVerilog (SV) being used for both programming chips and simulating SV code. This economy of language constructs has caused a bit of confusion for me: Section 9.2.2 of the SV Reference states
"There are four forms of always procedures: always, always_comb, always_latch, and always_ff. All forms of always procedures repeat continuously throughout the duration of the simulation."
Certainly, though, these constructs also specify the creation of combinatorial and latched logic. So is the SV standard aimed mainly at simulation, leaving it up to the chip OEMs to advise customers which SV constructs will result in actual hardware, as Altera has done here?
Altera makes CPLDs and FPGAs, some of which are not too expensive (hence my drive to learn SV). That subset of SV constructs blessed by Altera as synthesisable would compile in Quartus into a form suitable for downloading to a chip. Altera labels other constructs, such as many assertions (section 16 of the above reference), as "Supported. Ignored for synthesis." with concurrent assertions as an example.
So my conclusion, pending new information gained here, is that I may use, for instance, concurrent assertions for a test bench module only, but immediate assertions can be used anywhere.
Basically I am trying to get a picture of how SV works, and how I may best interpret the SV standard, quoted above. Thanks.
The Verilog languages are quite low-level: when designing hardware for an FPGA or ASIC, we have combinatorial logic and sequential logic. Assertions in any tool are really for verification; the concept is too high-level to map to the hardware you want.
SystemVerilog is not just for simulation; using the correct subset for design will allow the RTL and a post-synthesis gates file to match in simulation. The way you write your SystemVerilog design determines what the synthesis tools generate. Flip-flops and latches will only be created if you have implied them. Different tools may optimise the combinatorial sections differently, but if the design is written using best practices then the results should all be functionally equivalent.
Verilog in a Day gives a guide on design. The SystemVerilog LRM does not split the spec between synthesisable components and verification, but the unofficial guide to synthesising SystemVerilog is a good reference.
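As a hedged illustration of the two assertion styles mentioned above (signal names are mine): an immediate assertion executes like a procedural statement wherever it is placed, while a concurrent assertion describes behaviour across clock cycles; both are checked in simulation, and synthesis tools typically just ignore them:

module assert_example (input logic clk, valid, ready);
  // Immediate assertion: evaluated whenever the enclosing block executes
  always_comb
    assert (!(valid && !ready)) else $warning("valid asserted while not ready");

  // Concurrent assertion: a temporal property sampled on each clock edge
  assert property (@(posedge clk) valid |-> ##[1:3] ready)
    else $error("ready did not follow valid within 3 cycles");
endmodule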
Regarding the part of the question about the usage of the different always blocks: from Verilog we have:
always @*             // For combinatorial logic
always @(posedge clk) // For flip-flops (sequential logic)
Implying a latch involved an incomplete if/else branch, and it was quite difficult to tell whether the latch was an accident or actually intended.
// Latch from a bug, or actually intended?
always @* begin
  if (enable) begin
    q = d;  // illustrative body; with no else branch, q holds its value and a latch is implied
  end
end
SystemVerilog has kept the simple always for backwards compatibility with Verilog code, but added three types so that the designer can be explicit about their design intent.
always_comb  // For combinatorial logic
always_latch // For implying latches
always_ff    // For implying flip-flops (sequential logic)
always_comb has stricter rules than always @* for triggering in simulation, to further minimise RTL-to-gate-level simulation mismatch.
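As a hedged illustration of all three explicit forms in one place (module and signal names are mine), each block states the designer's intent, and the tool can warn when the code does not match it:

module intent_example (
  input  logic clk, rst_n, enable, d,
  output logic q_ff, q_comb, q_latch
);
  // Flip-flop: state changes only on the clock edge
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) q_ff <= 1'b0;
    else        q_ff <= d;

  // Combinatorial: assigned on every path, so no storage is implied
  always_comb
    q_comb = enable ? d : 1'b0;

  // Latch: the missing else branch is now clearly deliberate
  always_latch
    if (enable) q_latch <= d;
endmodule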

Where does the code for dealing with critical sections originate?

While learning about operating systems, I came across the topic of critical sections. To solve this problem, certain methods are provided, like semaphores, certain software solutions, etc. But I have a question: where does the code implementing these solutions originate? Programmers are never found writing such code for their programs. Suppose I write a simple program executing printf in C; I never write any code for the critical-section problem. And the code is converted into low-level instructions and executed by the OS, which behaves as our obedient servant. So, where does the code dealing with critical sections originate and fit in? Take a resource like the frame buffer as the critical section.
The OS kernel supplies such inter-thread synchronization mechanisms: mutexes, semaphores, events, critical sections, condition variables, etc. It has to, because the kernel needs to block threads that cannot proceed. Many languages provide convenient wrappers around such calls.
Your app accesses them, directly or indirectly, via system calls, i.e. interrupts that enter kernel state and ask for such services.
In some cases, a short-term user-space spinlock may get plastered on top, but such code should defer to a system call if the spinner is not quickly satisfied.
In the case of C's printf, the relevant library (usually stdio) will make the calls to lock/unlock the I/O stream (assuming you have linked in a multithreaded version of the library).
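Since the rest of this page is SystemVerilog-heavy, here is the same lock/unlock pattern sketched with SystemVerilog's built-in semaphore class rather than an OS mutex (names are mine, purely illustrative); the kernel primitives described above guard shared state in exactly this get/modify/put shape:

module tb;
  semaphore lock = new(1);        // a single key makes the semaphore behave as a mutex
  int shared_count = 0;

  task automatic worker(int n);
    int tmp;
    repeat (n) begin
      lock.get(1);                // enter the critical section (blocks while the key is taken)
      tmp = shared_count;         // read...
      #1;                         // ...pause, so another process could interleave here...
      shared_count = tmp + 1;     // ...write back; without the lock, updates would be lost
      lock.put(1);                // leave the critical section
    end
  endtask

  initial begin
    fork                          // two concurrent processes contend for the resource
      worker(100);
      worker(100);
    join
    $display("count = %0d", shared_count);  // 200: the semaphore serialized the updates
  end
endmodule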

In Simulink, are Goto and From blocks generally considered bad style?

I was working on a Simulink model recently and was using Goto and From blocks to keep a very busy system from becoming a twisted mess of wires. I was informed that I was not to use Goto and From blocks as they are considered bad style (at least, according to my employer).
While I hold that wires should be kept connected whenever possible, I believe that Goto and From blocks can significantly improve the readability of a system/subsystem if the model would result in lots of crossed wires otherwise; especially if the blocks can be color-coded (e.g. purple Goto block goes to all the purple From blocks).
I'd supply an image of the subsystem I'm working with, but I'm not sure I can put it on here. The subsystem itself has about 12 subsystem blocks within it (and possibly more later), each with two bus-type outputs. The first output of each subsystem goes to one Bus Creator block, and the second output of each goes to a second Bus Creator block. Since the subsystems are aligned vertically and the Bus Creators are to the right, this results in many crossed wires. I was using Goto and From blocks to clean up the system.
I can supply an image of a smaller, but similar model that I put together for this question.
For a system with on the order of 12 subsystems, this becomes very busy. I was using Goto and From blocks to connect the subsystems and the Bus Creators without a plethora of crossed wires.
I believe my employer may be carrying the stigma of using goto statements from text-based languages and applying it to Goto/From blocks in Simulink. Generally speaking, is using Goto and From blocks in this way (or any way) considered to be bad style?
The Mathworks Automotive Advisory Board has published some modeling guidelines (PDF) that include usage of Goto/From. The rules they list are:
Do not have subsystems that are floating, i.e. with all input/output ports connected via Gotos. One of the great things about Simulink is the ability to determine signal flow with only a cursory visual inspection; do not destroy this by linking everything with Gotos. Have at least one feed-forward and one feedback loop between subsystems connected by signal lines.
My personal opinion on feedback signals is that they should all be connected with signal lines, but I'm sure you can come up with cases where drawing all of them clutters the model.
The second guideline is about the scope of the Goto tag; keep the visibility local as much as possible.
I feel setting visibility to scoped is also acceptable, as long as you're not using the matching From more than a couple of levels downstream from the Goto. I've yet to come across a legitimate need for a global Goto tag.
So, not all Goto usage is bad, and you're right that it can improve readability in some cases. That being said, I don't think Gotos are justified for the picture above. I realize it is just an example, but I should point out that if the buses being created are virtual, the order of the inputs at the creator doesn't matter, and rearranging Bus Creator and Mux block inputs can work wonders for readability.
The problem with the guidelines above is that there's room for bending them, and developers on your team might do just that. Even if everyone is diligent about following them at first, you may run afoul of these guidelines one day, a long time from now, when you redraw that section of the model to refine or add functionality. Rearranging inputs and outputs can be especially irritating in the middle of implementing some cool new feature. That may be the reason your employer chose to impose a blanket ban: it is inconvenient in some cases, but easier to enforce.

Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)?
Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution!
There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down.
Taking a look at the source code and the challenge, how can the speed be improved?
I ran the source code through the Devel::SmallProf profiler. The profile output is a little too verbose to post here, but you can see the results yourself using $ perl -d:SmallProf chameneos.pl 10000 (no need to run it for 6000000 meetings unless you really want to!) See perlperf for more details on some profiling tools in Perl.
It turns out that using semaphores is the major bottleneck. The lion's share of total CPU time is spent on checking whether a semaphore is locked or not. Although I haven't had enough time to look at why the source code uses semaphores, it may be that you can work around having to use semaphores altogether. That's probably your best shot at improving the code's performance.
As Zaid posted, Thread::Semaphore is rather slow. One optimization could be to use the implicit locks on shared variables instead of semaphores. It should be faster, though I suspect it won't be faster by much.
In general, Perl's threading implementation sucks for any kind of usage that requires a lot of inter-thread communication. It's very suitable for tasks with little communication (unlike CPython's and CRuby's threads, they are actually preemptive). It may be possible to improve that situation; we need better primitives.
I have a version based on another version from Jesse Millikian, which I think was never published.
I think it may run ~ 7x faster than the current entry, and uses standard modules all around. I'm not sure if it actually complies with all the rules though.
I've tried the forks module on it, but I think it slows it down a bit.
Anyone tried s/threads/forks/ on the Perl entry? Or Coro / Coro::MP, though the latter would probably trigger the 'interesting alternative implementations' clause.