trying to know more about verilog language, vhdl,and assembly language - system-verilog

I would like to know what is the difference between verilog and assembly language.
Next semester we will be working with micro-controllers, but I would like to learn a little bit about it before the semester begins. I've been doing a lot of research about low-level programming, and so far I have gained a good understanding in assembly language, but I get confused trying to understand Verilog and VHDL?

Verilog and VHDL are completely different languages for describing hardware, for purposes of programming FPGAs.
FPGAs are devices that can be on-the-fly programmed to implement any sort of digital logic (and sometimes analog too).
So using verilog or VHDL, I can design a circuit that creates a couple latches, some twos-complement adders, a mux, and a clock source, and suddenly you've just designed a circuit that can calculate. You could then take the output from the VHDL compiler (or whatever its called), "download" it to the FPGA, and now you actually have some hardware that can be used to do calculation.
Of course, you can use FPGAs to implement all sorts of complicated stuff - even a full custom CPU. One uses verilog and VHDL to design the circuits that are programmed to FPGAs. Those circuits could implement something simple like a ripple counter, or something more complex like a LCD driver, or something even more complex like a USB transceiver. You can go from as simple as a few latches to as complicated as a fully operating CPU; as long as its digital hardware, you can make whatever you want with VHDL and some FPGAs.
To clarify further -
"Assembly language" typically refers to raw instructions given to some sort of CPU. Of course, there are many different types of CPUs (x86, ARM, SPARC, MIPS) and further many different variants of those types of CPUs. Each CPU has its own instruction set.
Machine code is complete, fully specified, ready to be executed instructions. Assembly languages allow you type instructions from your CPU's instruction set in plain text, use labels and such, and describe the memory layout structure of the program. Put the assembly through an assembler and out comes machine code in your CPUs machine instruction set.
You could design your own CPU from scratch using VHDL. As you're designing the CPU, you would have it implement your own custom instruction set. From there, you could take the VHDL for your CPU, compile it, write it to an FPGA and have your own custom CPU. Then you could start writing programs for your made-up CPU using your custom instruction set by writing a custom assembler. Some friends of mine in college did this for giggles.
For example, you know how most CPUs are load-store, register based CPUs? Instructions tend to go something like this:
Load the value '1' into register A
Load the value '2' into register B
Add register A and register B, storing result in register A
(You just added 1 + 2! Heh)
That sort of model of computation happens to be the most popular, but it's not the only way you could do computation. What if you had a stack based CPU, where you push values onto a hardware stack, and then computations work with the values on the top of the stack, pushing results back onto the stack.
For instance:
Push 1 onto the stack (stack current contains: 1)
Push 2 onto the stack (stack current contains: 2 1)
Push 3 onto the stack (stack currently contains: 3 2 1 )
Add
'Add' takes the top two elements on the stack, adds them together, and pushes the result on the top of the stack.
Stack now contains: 5 1
Add
Stack now contains: 6
Neat isn't it? As far as a computation model goes, it has its advantages - operands tend to be short, and need fewer bits. Smaller instructions means that the CPU can be faster.
The problem is that no such processor like this exists anymore.
But if you knew what you were doing, you could design one in VHDL, program it to an FGPA, and suddenly you have one of the only operating stack-based processors in existence.
Say, if you were doing a masters thesis, for instance, you might dig around and find out that virtual-machine-based programming languages like C# and Java compile down to a bytecode for a CPU that doesn't really exist, but the model for that CPU proves useful for making code portable. You might find out that the imaginary machines used by these languages are based on stack-based processor models. If you were looking for something interesting to do, perhaps you write in VHDL a processor that natively implements the Java bytecode language. Now you'd be the only person that has a computer that can directly run Java.

Verilog and VHDL are both HDLs (Hardware description languages) used mainly for describing digital electronics. Their targets may be FPGA or ASIC (custom silicon).
Assembly level on the other hand is using an processors instruction set to perform a series of calculations. Every thing executed on a computer eventually ends up as an assembly level instruction. One example of an instruction set would be the x86 ISA.
Summary: Verilog, VHDL describe hardware. Assembly is the low level program being executed on a processor.

Related

Specman beginner's questions

I am new to Specman.
I have a couple of questions:
I am trying to use the agent methodology. After writing the env,agent,bfm etc - what is the recommended way to create clock and reset? by writing a tb.v (calling the top verilog module) or is there a better way?
How do I link the specman env file to the tb (or maybe its just enough to link the ports of the different specman files with a signals_map to the verilog files?
Most important how do I run the environment with irun?
I was thinking of creating a file listing all the verilog files, e.g. - veri.lst
the specman top shall import all the specman files, e.g - spec_top.e
irun -access +wrc veri.lst spec_top.e
should be ok?
should I mention the top level module in the command?
Should I put the test name in a special way in the command?
Thanks alot for all the help!!
Cadence recommends driving clocks from inside an HDL testbench (i.e. written in Verilog in your case). This is because every time the simulator yields control to Specman to execute it wastes processor time. You want to minimize the number of switches as much as possible.
Linking the env to the TB is done by connecting the Verilog signals of interest to the corresponding Specman ports (using hdl_path()).
W.r.t. running it, there are 2 things to keep in mind. e code can be executed in compiled or in interpreted mode. Also, compiled code is faster, but can't be debugged. You have to tell irun what you want compiled and what you want interpreted:
irun -f veri.lst \
compiled_top.e \
-snload interpreted_top.e
What you typically compile are files which you don't expect to change (verification components that you buy or reuse from other projects, for example). The rest of your files you'd load interpreted to be able to easily debug.
Adding to Tudor's great answer -
First - yes, connecting The e TB to the DUT is done using hdl_path(), and connecting the ports to external. You usually would have one unit designated for the interface, so configuring it would look something like this:
extend signal_map {
// name of the instance of the verilog module you interface
keep hdl_path() == "sub_system_a";
keep bind (sig_clock, external);
// name of the clock signal
keep sig_clock.hdl_path == "clk";
};
Please take a look in the IES release, at the UVM Examples.
They are in
specman/uvm/uvm_examples
For example, check out the specman/uvm/uvm_examples/xserial/e/xserial_collector_h.e:
And about the clock -
Connecting a clock in the e TB to the design is very simple. Something like this -
unit synch {
sig_clock : in simple_port of bit is instance;
keep bind(sig_clock, external);
event clock is rise(sig_clock$) #sim;
// can define also on fall or change
};
Now the clock event can be used as sampling event for TCMs and Temporals. This is a simple fast way for using the clock in the TB.
Another way to use the clock, is more "acceleration ready". In this methodology, you would implement a clock agent in verilog, and it will provide "clock services" to the TB. According to this methodology, the TB will not have any "wait cycles" in it. instead - it will call the Clock Agent task "wait_cycles()" - and wait for indication that required number of clock cycles passed.
This is a rather new methodology, oriented to be Acceleration Ready.
It will be demonstrated in the UVM Examples in next IES release, 15.1.
/efrat

From where is the code for dealing with critical section originated?

While learning the subject of operating systems, Critical Section is a topic which I've come across. To solve this problem, certain methods are provided like semaphores, certain software solutions, etc...etc..etc. But I've a question that from where is the code for implementing these solutions originated? As programmers never are found writing such codes for their program. Suppose I write a simple program executing printf in 'C', I never write any code for critical section problem. And the code is converted into low level instructions and is executed by OS, which behaves as our obedient servant. So, where does code dealing with critical section originate and fit in? Let resources like frame buffer be the critical section.
The OS kernel supplies such inter-thread comms synchronization mechanisms, mutex, semaphore, event, critical section, conditional variables etc. It has to because the kernel needs to block threads that cannot proceed. Many languages provide convenient wrappers around such calls.
Your app accesses them, directly or indirectly, via system calls, ie intrrupts that enter kernel state and ask for such services.
In some cases, a short-term user-space spinlock may get plastered on top, but such code should defer to a system call if the spinner is not quickly satisfied.
In the case of C printf, the relevant library, (stdio usually), will make the calls to lock/unlock the I/O stream, (assuming you have linked in a multithreaded version of the library).

In Simulink, are Goto and From blocks generally considered bad style?

I was working on a Simulink model recently and was using Goto and From blocks to keep a very busy system from becoming a twisted mess of wires. I was informed that I was not to use Goto and From blocks as they are considered bad style (at least, according to my employer).
While I hold that wires should be kept connected whenever possible, I believe that Goto and From blocks can significantly improve the readability of a system/subsystem if the model would result in lots of crossed wires otherwise; especially if the blocks can be color-coded (e.g. purple Goto block goes to all the purple From blocks).
I'd supply an image of the subsystem I'm working with, but I'm not sure I can put it on here. The subsystem itself has about 12 subsystem blocks (and possibly more later) within it, each with two bus-type outputs. The first output of each subsystem goes to a Bus Creator block, and the second output of each goes to a second Bus Creator block. Since the subsystem are aligned vertically and the Bus Creators are to the right, this results in many crossed wires. I was using Goto and From blocks to clean up the system.
I can supply an image of a smaller, but similar model that I put together for this question.
For a system with on the order of 12 subsystems, this becomes very busy. I was using Goto and From blocks to connect the subsystems and the Bus Creators without a plethora of crossed wires.
I believe my employer may be carrying the stigma of using goto statements from text-based languages and applying it to Goto/From blocks in Simulink. Generally speaking, is using Goto and From blocks in this way (or any way) considered to be bad style?
The Mathworks Automotive Advisory Board has published some modeling guidelines (PDF) that include usage of Goto/From. The rules they list are:
Do not have subsystems that are floating, i.e. all inputs / output ports are connected via Gotos. One of the great things about Simulink is the ability to determine signal flow with only a cursory visual inspection, do not destroy this by linking everything with Gotos. At least have one feed-forward and one feedback loop between subsystems connected by signal lines.
My personal opinion on feedback signals is that they should all be connected with signal lines, but I'm sure you can come up with cases where drawing all of them clutters the model.
The second guideline is about the scope of the Goto tag; keep the visibility local as much as possible.
I feel setting visibility to scoped is acceptable also as long as you're not using the matching From more than a couple of levels downstream from the Goto. I've yet to come across a legitimate need for a global Goto tag.
So, all Goto usage isn't bad, and you're right that it can improve readability in some cases. That being said, I don't think Gotos are justified for the picture above. I realize it is just an example, but I should point out that if the buses being created are virtual that order of the inputs at the creator doesn't matter, and rearranging Bus Create and Mux block inputs can work wonders for readability.
The problem with the guidelines above are that there's room for bending them, and developers on your team might do just that. Even if everyone is diligent about following them at first, you may run afoul of these guidelines one day, a long time from now, when you redraw that section of the model for refining / adding functionality. Rearranging inputs and outputs can be especially irritating in middle of implementing some cool new feature. That may be the reason your employer chose to impose a blanket ban. It is inconvenient in some cases, but is easier to enforce.

VHDL Bus Functional Modelling - Can't put groups of procedures into a package to clean up the code

I want to organize a working bus functional model and push commonly used procedures (which look like CPU subroutines) out into a package and get them out of the main cpu model, but I'm stuck.
The procedures don't have access to the hardware bits when they're pushed out in a package.
In Verilog, I would put commonly used procedures out into an include file and link them into the CPU model as required for a given test suite.
More details:
I have a working bus functional model of a CPU, for simulation test benching.
At the "user interface" level I have a process called "main" running inside the CPU model which calls my predefined "instruction set" like this:
cpu_read(address, read_result);
cpu_write(address, write_data);
etc.
I bundle groups of those calls up into higher level procedures like
configure_communication_bus;
clear_all_packet_counters;
etc.
At the next layer these generic functions call a more hardware specific version which knows the interface timing for the design,
and those procedures then use an input record and output record to connect to the hardware module ports and waggle the cpu bus signals as required.
cpu_read calls hardware_cpu_read(cpu_input_record, cpu_output_record, address);
Something like this:
procedure cpu_read (address : in std_logic_vector(15 downto 0);
read_result : out std_logic_vector(31 downto 0));
begin
hardware_cpu_read(cpu_input_record, cpu_output_record, address, read_result);
end procedure;
The cpu_input_record and cpu_output_record are declared as signals of type nnn_record in the cpu model vhdl file.
So this is all working, but every single one of these procedures is all stored in the cpu VHDL module file, and all in the procedure declaration section so that they are all in the same scope.
If I share the model with team members they will need to add their own testing subroutines, and those also are all in the same location in the file, as well, their simulation test code has to go into the "main" process along with mine.
I'd rather link in various tests from outside the model, and only keep model specific procedures in the model file..
Ironically I can push the lowest level hardware procedure out to a package, and call those procedures from within the "main" process, but the higher level processes can't be put out into that package or any other packages because they don't have access to the cpu_read_record and cpu_write_record.
I feel like there must be a simple way to clean up this code and make it modular, and I'm just missing something obvious.
I don't really think making a command interpreter and loading my test code into a behavioral ROM is the right way to go by the way. Nor is fighting with the simulator interface to connect up a C program, but I may break down and try this..
Quick sketch of an answer (to the question I think you are asking! :-) though I may be off-beam...
To move the BFM subprograms into a reusable package, they need to be independent of the execution scope - that usually means a long parameter list for each of them. So using them in a testbench quickly gets tedious compared with the parameterless (or parameter-lite) versions you have now..
The usual workaround is to implement the BFM in a package, with long parameter lists.
Then write parameter-lite local equivalents (wrappers) in the execution scope, which simply call the package versions supplying all the parameters explicitly.
This is just boilerplate - not pretty but it does allow you to move the BFM into a package. These wrappers can be local to the testbench, to a process within it, or even to a subprogram within that process.
(The parameter types can be records for tidiness : these are probably declared in a third package, shared between BFM. TB, and synthesisable device under test...)
Thanks to overloading, there is no ambiguity between the local and BFM package versions, so the actual testbench remains as simple as possible.
Example wrapper function :
function cpu_read(address : unsigned) return slv_32 is
begin
return BFM_pack.cpu_read (
address => address,
rd_data_bus => tb_rd_data_bus,
wait => tb_wait_signal,
oe => tb_mem_oe,
-- ditto for all the signals constants variables it needs from the tb_ scope
);
end cpu_read;
Currently your test procedures require two extra signals on them, cpu_input_record and cpu_output_record. This is not so bad. It is not uncommon to just have these on all procedures that interact with the cpu and be done with it. So use hardware_cpu_read and not cpu_read. Add cpu_input_record, cpu_output_record to your configure_communication_bus and clear_all_packet_counters procedures and be done. Perhaps choose shorter names.
I do a similar approach, except I use only one record with resolved elements. To make this work, you need to initialize the record so that all elements are non-driving (ie: 'Z' for std_logic). To make this more flexible, I have created resolution functions for integer, time, and real. However, this only saves you one signal. Not a real huge win. Perhaps half way to where you think you want to be. But it is more work than what you are doing.
For VHDL-201X, we are working on syntax to allow parameters/ports automatically map to a identically named signal. This will get you to where you want to be with any of the approaches (yours, mine, or Brian's without the extra wrapper subprogram). It is posted here: http://www.eda.org/twiki/bin/view.cgi/P1076/ImplicitConnections. Given this, I would add the two records to your procedures and call it good enough for now.
Once you get by this problem, you seem to also be asking is how do I write separate tests using the same testbench. For this I use multiple architectures - I like to think of these as a Factory Class for concurrent code. To make this feasible, I separate the stimulus generation code from the rest of the testbench (typically: netlist connections and clock). My presentation, "VHDL Testbench Techniques that Leapfrog SystemVerilog", has an overview of this architecture along with a number of other goodies. It is available at: http://www.synthworks.com/papers/index.htm
You're definitely on the right track, in fact I have a variant like this (what you describe).
The catch is, now I build up a whole subroutine using the "parameter light" procedures, and those are what I want to put in a package to share and reuse. The problem is that any procedure pushed out to a package can't call to the parameter light procedures in the main vhdl file..
So what happens is we have one main vhdl file with all the common CPU hardware setup routines, and every designer's test code all in the same vhdl file..
Long story short, putting our test subroutines into separate files is really what I was hoping for..

Looking for the best equivalents of prefetch instructions for ia32, ia64, amd64, and powerpc

I'm looking at some slightly confused code that's attempted a platform abstraction of prefetch instructions, using various compiler builtins. It appears to be based on powerpc semantics initially, with Read and Write prefetch variations using dcbt and dcbtst respectively (both of these passing TH=0 in the new optional stream opcode).
On ia64 platforms we've got for read:
__lfetch(__lfhint_nt1, pTouch)
wherease for write:
__lfetch_excl(__lfhint_nt1, pTouch)
This (read vs. write prefetching) appears to match the powerpc semantics fairly well (with the exception that ia64 allows for a temporal hint).
Somewhat curiously the ia32/amd64 code in question is using
prefetchnta
Not
prefetchnt1
as it would if that code were to be consistent with the ia64 implementations (#ifdef variations of that in our code for our (still live) hpipf port and our now dead windows and linux ia64 ports).
Since we are building with the intel compiler I should be able to many of our ia32/amd64 platforms consistent by switching to the xmmintrin.h builtins:
_mm_prefetch( (char *)pTouch, _MM_HINT_NTA )
_mm_prefetch( (char *)pTouch, _MM_HINT_T1 )
... provided I can figure out what temporal hint should be used.
Questions:
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Some systems support the prefetchw instructions for writes
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
If the line is exclusively used by the calling thread, it shouldn't matter how you bring the line, both reads and writes would be able to use it. The benefit for prefetchw mentioned above is that it will bring the line and give you ownership on it, which may take a while if the line was also used by another core. The hint level on the other hand is orthogonal with the MESI states, and only affects how long would the prefetched line survive. This matters if you prefetch long ahead of the actual access and don't want to prefetch to get lost in that duration, or alternatively - prefetch right before the access, and don't want the prefetches to thrash your cache too much.
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Just speculating - perhaps the larger caches and aggressive memory BW are more vulnerable to bad prefetching and you'd want to reduce the impact through the non-temporal hint. Consider that your prefetcher is suddenly set loose to fetch anything it can, you'd end up swamped in junk prefetches that would through away lots of useful cachelines. The NTA hint makes them overrun each other, leaving the rest undamaged.
Of course this may also be just a bug, I can't tell for sure, only whoever developed the compiler, but it might make sense for the reason above.
The best resource I could find on x86 prefetching hint types was the good ol' article What Every Programmer Should Know About Memory.
For the most part on x86 there aren't different instructions for read and write prefetches. The exceptions seem to be those that are non-temporal aligned, where a write can bypass the cache but as far as I can tell, a read will always get cached.
It's going to be hard to backtrack through why the earlier code owners used one hint and not the other on a certain architecture. They could be making assumptions about how much cache is available on processors in that family, typical working set sizes for binaries there, long term control flow patterns, etc... and there's no telling how much any of those assumptions were backed up with good reasoning or data. From the limited background here I think you'd be justified in taking the approach that makes the most sense for the platform you're developing on now, regardless what was done on other platforms. This is especially true when you consider articles like this one, which is not the only context where I've heard that it's really, really hard to get any performance gain at all with software prefetches.
Are there any more details known up front, like typical cache miss ratios when using this code, or how much prefetches are expected to help?