UVM: Split sequences onto different sub sequencers - system-verilog

On the DUT I have two channels, each consisting of a data interface and a sideband interface. The transactions sent down these channels must be in order, but one channel can stall while the other channel catches up. For example:
I send transaction A down channel 0 and transaction C down channel 1, but channel 1 will not accept transaction C until channel 0 has received transaction B.
Furthermore, the data interface can be slower than the sideband interface on each channel, and certain sideband transactions do not require data to be sent with them.
Currently the tests are set up to create the individual data and sideband sequences, place them into queues, then split the queues into the number of channels and send them. However, this is becoming difficult to maintain with interface changes on the channels and a varying number of channels per configuration. So ideally I'd like to write the test sequence so that it has no knowledge of how many channels there are or which interface needs data for the abstract transaction.
The top sequence should just generate sequences like this:
`uvm_do(open_data_stream_sequence);
`uvm_do_with(send_data_sequence, {send_data_sequence.packet_number == 0;});
`uvm_do_with(send_data_sequence, {send_data_sequence.packet_number == 1;});
`uvm_do_with(send_data_sequence, {send_data_sequence.packet_number == 2;});
`uvm_do_with(send_data_sequence, {send_data_sequence.packet_number == 3;});
`uvm_do(close_data_stream_sequence);
The problem with this approach is that I do not want one channel to block the other, or one interface to block the other, unless both are stalled. With a virtual sequence like the one above, open_data_stream_sequence may stall on its channel while I want to pipeline send_data_sequence into the other channel. Or it may stall on the sideband interface while I want to pipeline the data transaction of send_data_sequence onto the same channel's data interface.
However, I'm struggling to figure out how to implement the arbitration between the sub-sequencers. I thought about sequence layering, and about using FIFOs in a middle layer so that it only stalls when all interfaces are saturated. Are there any UVM tricks I'm missing?

I don't think there's a way around writing some code that understands the current channel state and schedules things in an optimal way (or an intentionally suboptimal way, if the test calls for it). You will need some amount of queuing in order to, for example, allow sideband requests with no data to pass data requests when the data channels are stalled.
That can still be encapsulated in a base-class virtual sequence, let's call it 'scheduler', in such a way that the stimulus is oblivious to its implementation. The scheduler will have a 'start_sequence' API that starts the given sequence on a channel's sequencer, or queues it up to start as soon as a channel is not stalled. The test writer can subclass 'scheduler' for every top-level sequence he wants to write and put in the "start_sequence(data0); start_sequence(data1); start_sequence(sideband0);" calls, where each of the dataN/sidebandM virtual sequences looks like the one you described in your question.
'start_sequence' should return immediately to allow full saturation of the channels, or could block when all channels are saturated to reduce unnecessary queuing.
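The scheduling idea can be made concrete with a small Python model of the policy described above. This is a sketch under simplifying assumptions, not UVM code. `Scheduler`, `start_sequence` and `set_stalled` are hypothetical names. A channel is modelled simply as stalled or not stalled. Queued items go, in order, to the least-loaded channel that is not stalled. A real scheduler would also need the cross-channel ordering and data/sideband rules from the question.

```python
from collections import deque


class Scheduler:
    """Hypothetical model: dispatch to a non-stalled channel or queue the item."""

    def __init__(self, num_channels):
        self.stalled = [False] * num_channels
        self.pending = deque()                          # items waiting for a free channel
        self.sent = [[] for _ in range(num_channels)]   # what each channel received

    def start_sequence(self, item):
        # Returns immediately: the item is dispatched now if possible, else queued.
        self.pending.append(item)
        self._drain()

    def set_stalled(self, channel, stalled):
        self.stalled[channel] = stalled
        self._drain()                                   # an unstalled channel may take queued work

    def _drain(self):
        while self.pending:
            free = [c for c in range(len(self.stalled)) if not self.stalled[c]]
            if not free:
                return                                  # everything saturated: keep queuing
            target = min(free, key=lambda c: len(self.sent[c]))  # least-loaded free channel
            self.sent[target].append(self.pending.popleft())
```

The stimulus calls `start_sequence` without ever naming a channel. The channel count appears only as a constructor argument, which is the decoupling the answer is after.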

I don't fully understand what your conditions for stalling are, but what I can tell you is that it's going to be complicated in any case.
The code you wrote will execute in a linear fashion, but what you're describing is parallel behavior. What you can do is start all sequences in parallel and block or release driving them based on events. These events would be highly specific to your application:
fork
  `uvm_do(open_data_stream_sequence);
  begin
    @(unblock_channel_0);
    `uvm_do_with(send_data_sequence, {send_data_sequence.packet_number == 0;});
  end
  // ...
  begin
    @(done_e);
    `uvm_do(close_data_stream_sequence);
  end
join
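The fork/join-on-events pattern can be sketched with Python's asyncio, as an analogue rather than SystemVerilog. `asyncio.Event` plays the role of the named events `unblock_channel_0` and `done_e`, and `asyncio.gather` plays the role of `fork ... join`. The rule that opening the stream is what releases channel 0 is an illustrative assumption. In a real testbench that condition would come from the channel state.

```python
import asyncio


async def run_stream():
    log = []
    unblock_channel_0 = asyncio.Event()
    done_e = asyncio.Event()

    async def open_stream():
        log.append("open")
        unblock_channel_0.set()        # assumption: channel 0 is free once the stream is open

    async def send_packet(n):
        await unblock_channel_0.wait() # like @(unblock_channel_0)
        log.append(f"send {n}")
        done_e.set()

    async def close_stream():
        await done_e.wait()            # like @(done_e)
        log.append("close")

    # All branches start together, like fork ... join; the listing order does
    # not decide the completion order, the events do.
    await asyncio.gather(close_stream(), send_packet(0), open_stream())
    return log
```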

Related

Need for multi-threading in SystemVerilog using fork-join

In most textbooks advocating layered testbench designs, it is recommended that the different layers/blocks run in parallel. I'm currently unable to figure out why this is so. Why can't we follow the following sequence?
repeat for 1000 tests
generate a transaction
drive the transaction on the DUT
monitor the transaction on the DUT
compare output with a reference
Instead, what is recommended is that all four blocks (generator, driver, monitor and scoreboard/checker) run in parallel. My confusion is why we avoid the sequential behavior mentioned above, in which we go through one test case at a time, and prefer different blocks running in parallel.
Some texts say that this is because that is how things are done in hardware, i.e. everything runs in parallel. However, the layered testbench does not need to model any synthesizable hardware. So why do we have to restrict our verification environment/testbench to this hardware-like behavior?
A sample block diagram that I'm referring to is given below:
Suppose that you have a fifo which you want to test. Your driver pushes data into it, and the monitor checks the other end. Data gets pushed whenever it is available, until the fifo is full, and the consumer on the other end reads data when it can. So the pipe is sometimes full and sometimes empty.
When the fifo is full, the driver must stop. The monitor always runs, but its values do not change at the same rate as the stimulus, and they are delayed by the fifo depth.
In your example, when the fifo is full, the stopped driver will block the whole loop, so the monitor will not work either. Of course, you can come up with some conditional statements that bypass the stopped driver. But then you will need to run the monitor and the scoreboard on every iteration, even if the data is not changing.
With more complicated designs with multiple fifos, pipelines, delays, clock frequencies, etc., your loop will become so complicated that it would be difficult if not impossible to manage.
The problem is that in simple sequential code it is not possible to express a block/wait condition for one statement without blocking the whole loop. This is much easier to do with parallel threads.
The general approach is to run driver and monitor in separate simulation threads. Monitor in this case waits for the data to appear and does not block the driver. The driver pushes data when it is available and can be blocked by fifo full or if there is nothing to drive. It does not block the monitor.
With a single monitor, you can probably pack the scoreboard in the same thread with the monitor, but with multiple monitors it will be problematic, in particular when all monitors run in separate threads. So, the scoreboard should run as a separate thread as well.
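Here is a minimal Python sketch of that structure. It assumes the DUT is just a fifo, modelled by a bounded `queue.Queue`. The driver, monitor and scoreboard each run in their own thread. A full fifo therefore blocks only the driver, and an empty one blocks only the monitor. `run_fifo_test` and its `depth` parameter are hypothetical names.

```python
import queue
import threading


def run_fifo_test(stimulus, depth=2):
    stimulus = list(stimulus)
    fifo = queue.Queue(maxsize=depth)   # stands in for the DUT's fifo
    observed = queue.Queue()            # monitor -> scoreboard channel
    DONE = object()                     # end-of-test marker
    received = []

    def driver():
        for item in stimulus:
            fifo.put(item)              # blocks only the driver while the fifo is full
        fifo.put(DONE)

    def monitor():
        while True:
            item = fifo.get()           # blocks only the monitor while the fifo is empty
            observed.put(item)
            if item is DONE:
                return

    def scoreboard():
        while (item := observed.get()) is not DONE:
            received.append(item)

    threads = [threading.Thread(target=f) for f in (driver, monitor, scoreboard)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reference model for a fifo: output order equals input order.
    return received == stimulus, received
```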
You are mixing two different concepts. The layered approach is a software concept that helps manage different abstraction levels, from software transactions (a frame of data) down to the individual pin wiggles. These layers are very similar to the OSI Network Model. Layering also helps with maintenance and reusability by defining clear interfaces that enable you to build up a larger system. It's hard to see the benefits of this on a testbench for a small combinational block.
Parallelism comes into play for other reasons. There are relatively few complete designs out there that can be tested as a single stream of inputs with the output compared to a reference model. You might be able to test one small block of a design this way, but not a complete chip, as it typically has many interfaces that need to be driven in parallel.
But let's take the case of two simple blocks that you tested individually with the approach above. Now you want to connect them together so that the output of the first DUT becomes the driver of the second DUT:
Driver1 -> DUT1 -> DUT2 -> Monitor2
This works best if I originally write the drivers and monitors as separate objects running in parallel.

How Axon framework's sequencing policy works in terms of statefulness

In Axon's reference guide it is written that
Besides these provided policies, you can define your own. All policies must implement the SequencingPolicy interface. This interface defines a single method, getSequenceIdentifierFor, that returns the sequence identifier for a given event. Events for which an equal sequence identifier is returned must be processed sequentially. Events that produce a different sequence identifier may be processed concurrently.
Even more, in this thread's last message it says that
with the sequencing policy, you indicate which events need to be processed sequentially. It doesn't matter whether the threads are in the same JVM, or in different ones. If the sequencing policy returns the same value for 2 messages, they will be guaranteed to be processed sequentially, even if you have tracking processor threads across multiple JVMs.
So does this mean that event processors are actually stateless? If yes, then how do they manage to synchronise? Is the token store used for this purpose?
I think this depends on what you count as state, but from the point of view you're looking at it, yes, the EventProcessor implementations in Axon are indeed stateless.
The SubscribingEventProcessor receives its events from a SubscribableMessageSource (the EventBus implements this interface) when they occur.
The TrackingEventProcessor retrieves its events from a StreamableMessageSource (the EventStore implements this interface) at its own leisure.
The latter therefore needs to keep track of where it is in the event stream. This information is stored in a TrackingToken, which is saved by the TokenStore.
A given TrackingEventProcessor thread can only handle events if it has laid a claim on the TrackingToken for the processing group it is part of. This ensures that the same event isn't handled by two distinct threads, which could accidentally update the same query model.
The TrackingToken also allows this process to be multithreaded, which is done by segmenting the token. The number of segments (adjustable through the initialSegmentCount) determines the number of pieces the TrackingToken for a given processing group is partitioned into. From the point of view of the TokenStore, this means you'll have as many TrackingToken instances stored as the number of segments you've configured.
The SequencingPolicy's job is to determine which events in a stream belong to which segment. For example, you could use the SequentialPerAggregate SequencingPolicy to ensure all the events with a given aggregate identifier are handled by one segment.
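The segment assignment can be sketched in Python. This is not Axon's API or its exact hashing scheme. Assuming a per-aggregate policy, the sequence identifier is the aggregate id, and a deterministic hash of that id modulo the segment count picks the segment. All events of one aggregate therefore stay in one segment, in order, while different aggregates may be processed concurrently. `sequence_identifier`, `segment_for` and `assign_segments` are hypothetical names.

```python
import zlib
from collections import defaultdict


def sequence_identifier(event):
    # Per-aggregate policy (assumption): events of one aggregate share an id.
    return event["aggregate_id"]


def segment_for(seq_id, segment_count):
    # Deterministic hash; Python's built-in hash() of str is salted per process.
    return zlib.crc32(seq_id.encode()) % segment_count


def assign_segments(events, segment_count):
    segments = defaultdict(list)
    for event in events:                 # stream order is preserved within a segment
        segments[segment_for(sequence_identifier(event), segment_count)].append(event)
    return segments
```

In Axon terms, each segment would then be claimed by one processor thread through its own TrackingToken.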

nondeterminism.njoin: maxQueued and prefetching

Why does the njoin prefetch the data before processing? It seems like an unnecessary complication, unless it has something to do with how Processes of Processes are merged?
I have a stream that runs effects whenever a new element is generated. I'd like to keep the effects to a minimum, so with an njoin with, say, maxOpen = 4, at most 4 elements should be generated at the same time (no element should be generated unless it can be processed immediately).
Is there a way to solve this gracefully with njoin? Right now I'm using a bounded queue of "tickets" (an element is generated only after it got a ticket).
See https://github.com/scalaz/scalaz-stream/issues/274, specifically the comment below from djspiewak.
"From a conceptual level, the problem here is the interface point between the "pull" model of Process and the "push" model that is required for any concurrent stream merging. Both wye and njoin sit at this boundary point and "cheat" by actively pulling on their source processes to fill an inbound queue, pushing results into an outbound queue pending the pull on the output process. (obviously, both wye and njoin make their inbound queues implicit via Actor) For the most part, this works extremely well and it preserves most of the properties that users care about (e.g. propagation of termination, back pressure, etc)."
The second parameter to njoin, maxQueued, bounds the amount of prefetching. If that parameter is 0, there is no limit on the queue size, and thus no limit on the prefetching. The docs for mergeN, which calls njoin, explain the reasoning for this prefetching behavior in a bit more detail: "Internally mergeN keeps small buffer that reads ahead up to n values of A where n equals to number of active source streams. That does not mean that every source process is consulted in this read-ahead cache, it just tries to be as much fair as possible when processes provide their A on almost the same speed." So it seems that njoin is dealing with the problem of what happens when all the sources provide a value at nearly the same time, while trying to prevent any one of those joined streams from crowding out slower ones.
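For comparison, the "tickets" approach from the question can be sketched with Python's asyncio. This is an analogue, not scalaz-stream. An `asyncio.Semaphore` with `max_open` permits must be acquired before an element is generated. At most `max_open` elements therefore exist at once, and nothing is prefetched. `process_all` and the squaring stand-in for the effect are assumptions for illustration.

```python
import asyncio


async def process_all(n_elements, max_open=4):
    tickets = asyncio.Semaphore(max_open)   # one ticket per element allowed in flight
    in_flight = 0
    peak = 0
    results = []

    async def generate_and_process(i):
        nonlocal in_flight, peak
        async with tickets:                  # no ticket -> the element is not generated yet
            in_flight += 1
            peak = max(peak, in_flight)
            value = i * i                    # stand-in for the effectful generation step
            await asyncio.sleep(0.01)        # stand-in for processing the element
            results.append(value)
            in_flight -= 1

    await asyncio.gather(*(generate_and_process(i) for i in range(n_elements)))
    return peak, sorted(results)
```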

What is Event Driven Concurrency?

I am starting to learn Scala and functional programming. I was reading the book "Programming Scala: Tackle Multi-Core Complexity on the Java Virtual Machine". In the first chapter I came across the terms Event-Driven concurrency and Actor model. Before I continue reading this book, I want to get an idea of what Event-Driven concurrency and the Actor Model are.
What is Event-Driven concurrency, and how is it related to Actor Model?
An Event Driven programming model involves registering code to be run when a given event fires. An example is, instead of calling a method that returns some data from a database:
val user = db.getUser(1)
println(user.name)
You could instead register a callback to be run when the data is ready:
db.getUser(1, u => println(u.name))
In the first example, no concurrency was happening; the current thread would block until db.getUser(1) returned data from the database. In the second example, db.getUser would return immediately and execution would carry on with the next code in the program. In parallel with this, the callback u => println(u.name) will be executed at some point in the future.
Some people prefer the second approach, as it means memory-hungry threads aren't left sitting idle waiting for slow I/O to return.
The Actor Model is an example of how Event-Driven concepts can be used to help the programmer easily write concurrent programs.
From a super high level, Actors are objects that define a series of event-driven message handlers that fire when the Actor receives messages. In Akka, each Actor instance processes its messages one at a time (it is effectively single-threaded); however, when many of these Actors are put together, they form a system with concurrency.
For example, Actor A could send messages to Actor B and C in parallel. Actor B and C could fire messages back to Actor A. Actor A would have message handlers to receive these messages and behave as desired.
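As a rough illustration in plain Python threads (not Akka), an actor can be modelled as a mailbox plus one thread that handles messages one at a time. Sending only enqueues, so it never blocks the sender. The `Actor` class and the ping/reply exchange between A, B and C are hypothetical.

```python
import queue
import threading


class Actor:
    """Hypothetical minimal actor: a mailbox plus one thread that handles
    one message at a time."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()        # unbounded, so send() never blocks
        self._handler = handler
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)

    def stop(self):
        # Messages already in the mailbox are handled before the thread exits.
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        while (message := self._mailbox.get()) is not None:
            self._handler(message)


def actor_demo():
    replies = []
    actor_a = Actor(replies.append)                          # A records the replies
    actor_b = Actor(lambda m: actor_a.send(f"B got {m}"))    # B replies to A
    actor_c = Actor(lambda m: actor_a.send(f"C got {m}"))    # C replies to A
    actor_b.send("ping")                                     # requests to B and C,
    actor_c.send("ping")                                     # handled concurrently
    actor_b.stop()
    actor_c.stop()
    actor_a.stop()
    return sorted(replies)
```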
To learn more about the Actor model I would recommend reading the Akka documentation. It is really well written: http://doc.akka.io/docs/akka/2.1.4/
There is also lots of good documentation around the web about Event Driven Concurrency that is much more detailed than what I've written here: http://berb.github.io/diploma-thesis/original/055_events.html
Theon's answer provides a good modern overview. I'd like to add some historical perspective.
Tony Hoare and Robin Milner both developed process algebras for analysing concurrent systems (Communicating Sequential Processes, CSP, and the Calculus of Communicating Systems, CCS). Both of these look like heavy mathematics to most of us, but the practical application is relatively straightforward. CSP led directly to the Occam programming language amongst others, with Go being the newest example. CCS led to the pi-calculus and the mobility of communicating channel ends, a feature that is part of Go and was added to Occam in the last decade or so.
CSP models concurrency purely by considering autonomous entities ('processes', very lightweight things like green threads) interacting solely by event exchange. Events are passed along channels. Processes may have to deal with several inputs or outputs, and they do this by selecting the event that is ready first. The events usually carry data from the sender to the receiver.
A principal feature of the CSP model is that a pair of processes engages in communication only when both are ready; in practical terms this leads to what is usually called 'synchronous' communication. However, the actual implementations (Go, Occam, Akka) allow channels to be buffered (the normal state in Akka), so the lock-step exchange of events is often decoupled in practice.
So in summary, an event-driven CSP-based system is really a data-flow network of processes connected by channels.
Besides the CSP interpretation of event-driven, there have been others. An important example is the 'event-wheel' approach, once popular for modelling concurrent systems while actually having a single processing thread. Such systems handle events by putting them into a processing queue and dealing with them in due course, usually via a callback. Java Swing's event processing engine is a good example. There were others, e.g. time-based simulation engines. One might think of the JavaScript / NodeJS model as fitting into this category as well.
So in summary, an event-wheel was a way to express concurrency but without parallelism.
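A minimal Python sketch of an event-wheel style loop follows. Callbacks are queued and run one at a time on a single thread, so two logical tasks interleave without running in parallel. `EventLoop` and the `step` continuation are illustrative names, not a particular library's API.

```python
from collections import deque


class EventLoop:
    """Single-threaded event wheel: queued callbacks run one at a time."""

    def __init__(self):
        self._queue = deque()

    def post(self, callback, *args):
        self._queue.append((callback, args))

    def run(self):
        while self._queue:
            callback, args = self._queue.popleft()
            callback(*args)


def event_loop_demo():
    loop = EventLoop()
    trace = []

    def step(name, remaining):
        trace.append(f"{name}{remaining}")
        if remaining > 1:
            # The continuation's state must be passed along explicitly.
            loop.post(step, name, remaining - 1)

    loop.post(step, "a", 2)
    loop.post(step, "b", 2)
    loop.run()
    return trace
```

Each continuation has to be handed its state (`name`, `remaining`) explicitly. With many related events, that hand-off grows into the "baton" passing discussed further down.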
The irony of this is that the two approaches I've described above are both described as event driven but what they mean by event driven is different in each case. In one case, hardware-like entities are wired together; in the other, almost all actions are executed by callbacks. The CSP approach claims to be scalable because it's fully composable; it's naturally adept at parallel execution also. If there are any reasons to favour one over the other, these are probably it.
To understand the answer to this, you have to look at event concurrency from the OS layer up. First you start with threads, which are the smallest units of code that the OS can schedule, and which eventually deal with I/O, timing and other kinds of events.
The OS groups threads into a process in which they share the same memory, protection and security permissions. Above that layer you have user programs which typically make I/O requests that are handled by user libraries.
The I/O libraries handle these requests in one of two ways. Unix-like systems use a "reactor" model in which the library registers I/O handlers for all the different types of I/O and events in the system. These handlers are activated when I/O is ready on a specific device. Windows-like systems use an I/O completion model in which I/O requests are made and a callback is triggered when the request is complete.
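Python's standard `selectors` module exposes this readiness ("reactor") style directly. You register a handler per socket, and the loop invokes it only when the OS reports that socket as readable. The minimal sketch below uses a local `socket.socketpair()`. It does not show the completion model, and `reactor_demo` is a hypothetical name.

```python
import selectors
import socket


def reactor_demo():
    sel = selectors.DefaultSelector()
    received = []
    reader, writer = socket.socketpair()
    reader.setblocking(False)

    def on_readable(sock):
        data = sock.recv(1024)
        if data:
            received.append(data)
        else:                              # peer closed: stop watching this socket
            sel.unregister(sock)
            sock.close()

    # Register the handler; it runs only when the socket is ready to read.
    sel.register(reader, selectors.EVENT_READ, on_readable)

    writer.sendall(b"hello")
    writer.close()

    while len(sel.get_map()) > 0:          # loop until no sockets remain registered
        for key, _mask in sel.select(timeout=1):
            key.data(key.fileobj)          # dispatch to the registered handler

    sel.close()
    return b"".join(received)
```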
Both of these models require a significant amount of overhead to manage overall program state if you use them directly. Some programming tasks (web apps / services) lend themselves to a seemingly more direct implementation using an event model, but you still need to manage all of that program state. To track program logic across the dispatches of several related events, you have to track state manually and pass it around to the callbacks. This tracking structure is usually called a state context or baton. As you might imagine, passing batons all over the place to numerous seemingly unrelated handlers makes for extremely hard-to-read, spaghetti-like code. It's also a pain to write and debug, especially when you're trying to synchronize various concurrent paths of execution. You start getting into Futures, and then the code becomes really difficult to read.
One well-known event processing library is libuv. It's a portable event loop that integrates Unix's reactor model with Windows' completion model into a single model, usually called a "proactor". It's the event loop that drives NodeJS.
Which brings us to communicating sequential processes.
https://en.wikipedia.org/wiki/Communicating_sequential_processes
Rather than writing asynchronous I/O dispatch and synchronization code using one or more concurrency models (and their often competing conventions), we flip the problem on its head. We use a "coroutine" which looks like normal sequential code.
A simple example is a coroutine that receives a single byte over an event channel from another coroutine that sends a single byte. This effectively synchronizes I/O producer and consumer because the writer/sender has to wait for a reader/receiver and vice-versa. While either process is waiting they explicitly yield execution to other processes. When a coroutine yields, its scoped program state is saved on a stack frame thus saving you from the confusion of managing multi-layered baton state in an event loop.
Using these event channels, we can construct arbitrary, reusable, concurrent logic, and the algorithms no longer look like spaghetti code. In pure CSP systems, if you write to a channel and there is no reader, you will be blocked. The channel endpoints are known only inside the program, via handles.
Actor systems are different in a couple of ways. First, the endpoints are the actor threads, and they are named and known outside the mainline program. Second, sends and receives on these channels are buffered. In other words, if you send a message to an actor and it isn't listening or it's busy, you aren't blocked waiting until it reads from its input channel. Other differences exist; for example, one actor can publish to two different actors concurrently.
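Below is a minimal sketch of an unbuffered, CSP-style channel using Python's asyncio. Python has no built-in rendezvous channel, so `RendezvousChannel` is a hypothetical helper built from two `asyncio.Queue`s. `send` does not complete until a receiver has taken the value. While either side waits, it yields to other coroutines rather than blocking a thread.

```python
import asyncio


class RendezvousChannel:
    """Hypothetical unbuffered channel: send() finishes only after a recv()."""

    def __init__(self):
        self._slot = asyncio.Queue(maxsize=1)   # the value being handed over
        self._taken = asyncio.Queue()           # acknowledgements back to senders

    async def send(self, value):
        await self._slot.put(value)
        await self._taken.get()                 # wait until a receiver took it

    async def recv(self):
        value = await self._slot.get()
        self._taken.put_nowait(None)            # release the waiting sender
        return value


async def channel_demo():
    chan = RendezvousChannel()
    log = []

    async def producer():
        log.append("send start")
        await chan.send(b"x")
        log.append("send done")

    async def consumer():
        await asyncio.sleep(0.01)               # the reader shows up late
        log.append("recv start")
        value = await chan.recv()
        log.append(f"recv {value!r}")

    await asyncio.gather(producer(), consumer())
    return log
```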
As you might guess Actor systems can easily be built from CSP systems. There are other details like waiting for specific event patterns and selecting from them, but that's the basics.
I hope that clarifies things a bit.
Other constructs can be built from these ideas. Various programming systems (Go, Erlang, etc) include CSP implementations within them. Operating systems like Inferno and Node9 use CSPs and Channels as the basis of their distributed computing model.
Go: https://en.wikipedia.org/wiki/Go_(programming_language)
Erlang: https://en.wikipedia.org/wiki/Erlang_(programming_language)
Inferno: https://en.wikipedia.org/wiki/Inferno_(operating_system)
Node9: https://github.com/jvburnes/node9

Message bus integration and resync of Bounded Contexts after downtime - Service Bus 1.0

I have just downloaded joliver eventstore and looking to wire up a service bus with Windows Service Bus 1.0 for an application separated across more than one Bounded Context process.
If a bounded context has been offline whilst events in other bounded contexts have been created (or may even be a new context that has been deployed), I can see the following sequence of events.
For example, ContextA, ContextB and ContextC are all connected using Service Bus 1.0, each context has its own event store, and they all share the same bus messaging backplane.
ContextC goes offline.
When ContextC comes back up, the other bounded contexts need to be notified of the events that must be resent to the context that has just come back online. These events are replayed from each of the event stores.
My questions are:
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
The notion of a Projection mechanism tied to the events is certainly common. Unfortunately, there are many many ways of handling how that might be done, depending on your stack, performance requirements and scale and many other factors.
As a result I'm not aware of a commoditized facility of this nature.
The GetEventStore store has an integrated Projection facility which looks extremely powerful and takes the need to build all this off the table. Before it existed, I'd have argued that one shouldn't even consider looking past the SRPness of the JOES.
You haven't said much about your actual stack other than mentioning Azure.
With Windows Service Bus, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
You can use the stream id + the commit sequence number as the MessageId (and use that to ensure duplicates are removed by the bus). You will probably also include properties in the Message metadata.
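Here is a small Python sketch of that MessageId scheme. The id is derived deterministically from the stream id and commit sequence number. Replaying a commit after a context comes back online therefore produces the same id, and a bus with duplicate detection can drop the replay. `DuplicateDetectingBus` is an in-memory stand-in, not the Service Bus API. The real broker only remembers ids for a configured detection window.

```python
def message_id(stream_id, commit_sequence):
    # Deterministic: replaying the same commit yields the same MessageId.
    return f"{stream_id}:{commit_sequence}"


class DuplicateDetectingBus:
    """In-memory stand-in for a bus with MessageId-based duplicate detection.
    Not the Azure Service Bus API; it also ignores the detection time window."""

    def __init__(self):
        self._seen = set()
        self.delivered = []

    def publish(self, msg_id, body):
        if msg_id in self._seen:
            return False                     # duplicate: dropped, not delivered
        self._seen.add(msg_id)
        self.delivered.append((msg_id, body))
        return True
```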
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
If you're on Azure and considering Service Bus, then Topics can be used to ensure at-least-once delivery (and you'll use the sessioning facility). Go watch the two-hour deep-dive ClemensV Subscribe video plus a few other episodes, or you'll spend the same amount of time making mistakes.
To keep broadcast traffic down, if ContextC requests replays from ContextA and ContextB, is there any way for these replay messages to be sent only to ContextC? Or should I not worry about this?
Mu. You started off asking whether this stuff was a good idea but now seem to have baked in an assumption that it's the way to go.
Firstly, this infrastructure is a massive wheel to reinvent. Have you considered simply setting up a topic per BC and having anyone who needs to listen subscribe to it?
A key thing to bear in mind here is that just because you can think of cases where BCs need to consume each other's events, it does not follow that this central magic bus that's everywhere should deliver everything everywhere.
EDIT: Answers to your edited versions of questions 2+
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
Your event store doesn't have a sequence number. It has a commit sequence number per aggregate. You'd typically use a sessioned topic and subscription. Then you need to choose whether you want a global ordering (use a single session id) or per-aggregate ordering (use the stream id as the session id).
Once events are on a topic, they have a MessageSequenceNumber, and the subscription (when sessioned) delivers them in sequence (strictly speaking, the subscriber receives them in sequence).
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
This is built into the Service Bus (or any queueing mechanism). You don't mark the Message completed until it has been successfully processed. Any failure leads to Abandonment (which puts it back on the queue for reprocessing).
The subscriber taking a break, becoming disconnected or work backing up is naturally dealt with by the Topic.
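The complete/abandon cycle can be modelled in a few lines of Python. `PeekLockQueue`, `consume` and the failing handler are hypothetical in-memory stand-ins, not a Service Bus client. A message is completed only after the handler succeeds, and abandoned (redelivered) if the handler raises.

```python
from collections import deque


class PeekLockQueue:
    """Hypothetical in-memory model of peek-lock delivery, not a real SDK."""

    def __init__(self, messages):
        self._ready = deque(messages)

    def receive(self):
        # The returned message is "locked": no longer visible to other receivers.
        return self._ready.popleft() if self._ready else None

    def complete(self, message):
        # Already removed from _ready on receive; completing just drops the lock.
        pass

    def abandon(self, message):
        # Release the lock so the message is delivered again (first, in this model).
        self._ready.appendleft(message)


def consume(q, handler):
    while (message := q.receive()) is not None:
        try:
            handler(message)
        except Exception:
            q.abandon(message)               # failure: back on the queue
        else:
            q.complete(message)              # only after successful processing


def peek_lock_demo():
    attempts = {}
    applied = []

    def handler(message):
        attempts[message] = attempts.get(message, 0) + 1
        if message == "m2" and attempts[message] == 1:
            raise RuntimeError("transient failure")   # first attempt at m2 fails
        applied.append(message)

    consume(PeekLockQueue(["m1", "m2", "m3"]), handler)
    return applied, attempts
```

In this model, a message that always fails would be redelivered forever. Real brokers cap the delivery count and move such messages to a dead-letter queue. Redelivery can also re-run a handler that partially succeeded, so handlers should still be idempotent.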