ZeroMQ: choosing the correct client-worker model for a call center - Perl

I have a project that needs to be written in Perl, so I've chosen ZeroMQ.
There is a single client program, generating work for a variable number of workers. The workers are real human operators who will complete a task and then request a new one. The job of the client program is to keep all available workers busy all day. It's a call center.
So each worker can only process one task at a time, and there may be some delay before a worker requests a new task. The number of workers may also vary during the day.
The client needs to keep a queue of tasks ready to give to workers as and when they request them. Whenever the client queue gets low the client can generate more tasks to top-up the queue.
What design pattern (i.e. what ZeroMQ Socket combination) should I use for this? I've skimmed through all the patterns in the 0MQ Guide and can't find anything that matches this.
Thanks

Sure. There is not a single, solo archetype that matches the requirement list; use several ZeroMQ Scalable Formal Communication Patterns together.
A typical software project uses many ZeroMQ sockets (with various archetypes) as a form of node-to-node signalling and message-passing platform.
It is fair to note that automated load-balancers may work fine for automated processes, but not always so for processes executed by humans or interacting with humans.
Humans (both the call-centre agents and their line supervisors) introduce another layer of requirements: sometimes a need for workload-distribution logic that is not just round-robin, sometimes a need to switch a call from Agent A to another Agent B. A trivial archetype will simply not be capable of that, and might get into trouble if its hardwired logic runs into a collision (a mutually blocked REQ-REP stalemate being one such example).
So simply forget waiting for one super-powered archetype; rather, create a smart network of behaviours that covers the event handling your distributed-computing problem requires.
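As one concrete building block, the Guide's load-balancing pattern (ROUTER on the task source, REQ on each worker) maps quite naturally onto "each worker holds exactly one task and asks for the next when ready". A minimal sketch follows, in Python with pyzmq purely for illustration; Perl bindings such as ZMQ::FFI expose the same socket archetypes, and the endpoint and message framing here are my assumptions, not a fixed design.

import threading
import zmq

ENDPOINT = "tcp://127.0.0.1:5555"

def worker(ident: str) -> None:
    # One REQ socket per human operator: ask for a task, handle it, repeat.
    sock = zmq.Context.instance().socket(zmq.REQ)
    sock.connect(ENDPOINT)
    for _ in range(2):                    # a real operator loops all day
        sock.send(b"READY")               # request the next task
        task = sock.recv()
        print(ident, "handling", task)

for i in range(3):
    threading.Thread(target=worker, args=(f"agent-{i}",)).start()

# The "client" binds a ROUTER socket and keeps the task queue topped up.
router = zmq.Context.instance().socket(zmq.ROUTER)
router.bind(ENDPOINT)
tasks = [b"call-%d" % n for n in range(6)]

while tasks:
    # A REQ request arrives as [worker identity, empty delimiter, body].
    ident, empty, _ready = router.recv_multipart()
    router.send_multipart([ident, b"", tasks.pop(0)])

Because ROUTER addresses each reply by worker identity, a smarter dispatcher (priorities, transferring a call from Agent A to Agent B) can later be layered on top of the same two sockets.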
There are many other aspects one ought to learn before taking the first ZeroMQ socket into service:
- failure resilience
- performance scaling
- latency profiling (high-priority voice traffic vs. low-priority logging)
- watchdog acknowledgements and the handling of timeout situations
- cross-compatibility issues (version 2.1.x vs. 3.x vs. 4.x API)
- processing robustness against a malfunctioning agent, a malicious attack, or deadly spurious traffic storms ... to name just a few of the problems
All of these have some built-ins in the ZeroMQ toolbox; some may need advanced thinking so as to handle known constraints.
The Best Next Step?
I would advocate the fabulous Pieter Hintjens book "Code Connected, Volume 1". For everyone who is serious about distributed processing, this is a must-read; do not hesitate to check my other posts for a direct URL to a PDF version of this ZeroMQ bible.
Worth the time and one's tears and sweat.

Related

Is it possible to combine REST and messaging for microservices?

We have the first version of an application based on a microservice architecture. We used REST for external and internal communication.
Now we want to switch to AP from CP (CAP theorem)* and use a message bus for communication between microservices.
There is a lot of information about how to create an event bus based on Kafka, RabbitMQ, etc.
But I can't find any best practices for a combination of REST and messaging.
For example, you create a car service and you need to add different car components. It would make more sense, for this purpose, to use REST with POST requests. On the other hand, a service for booking a car would be a good task for an event-based approach.
Do you take a similar approach when you have different vocabularies and business-logic capabilities? How do you combine the two: just support both approaches separately, or unify them into one approach?
* for the first version, we agreed to choose consistency and partition tolerance. But now availability becomes more important for us.
Bottom line up front: you're looking for Command Query Responsibility Segregation (CQRS), an architectural pattern that separates querying for data from asking for a process to be run. The short answer is that you do not want to mix the two in either a query or a process in a blocking fashion. The rest of this answer goes into detail as to why, and the three different ways you can do what you're trying to do.
This answer is a short form of the experience I have with microservices. My bona fides: I've created microservices topologies from scratch (and nearly zero knowledge) and, as they say, hit every branch on the way down.
One of the benefits of starting from zero-knowledge is that the first topology I created used a mixture of intra-service synchronous and blocking (HTTP) communication (to retrieve data needed for an operation from the service that held it), and message queues + asynchronous events to run operations (for Commands).
I'll define both terms:
Commands: Telling a service to do something. For instance, "Run ETL Batch job". You expect there to be an output from this; but it is necessarily a process that you're not going to be able to reliably wait on. A command has side-effects. Something will change because of this action (If nothing happens and nothing changes, then you haven't done anything).
Query: Asking a service for data that it holds. This data may have been there because of a Command given, but asking for data should not have side effects. No Command operations should need to be run because of a Query received.
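To make the split concrete, here is a deliberately tiny sketch in Python, with invented names and an in-memory queue standing in for a real bus (none of this is a framework API). A query returns data and changes nothing; a command returns nothing you can reliably wait on and exists only for its side effect.

import queue

event_bus = queue.Queue()     # stand-in for a real message bus
users = {1: {"name": "Ada"}}  # stand-in for the service's data store

def get_user(user_id: int) -> dict:
    # Query: returns data, has no side effects.
    return users[user_id]

def run_etl_batch(job_id: str) -> None:
    # Command: nothing to wait on; emits an event and causes change.
    event_bus.put({"type": "EtlBatchRequested", "job_id": job_id})

get_user(1)                  # safe to repeat any number of times
run_etl_batch("nightly-7")   # fire and forget; observe progress via events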
Anyway, back to the topology.
Level 1: Mixed HTTP and Events
For this first topology, we mixed Synchronous Queries with Asynchronous Events being emitted. This was... problematic.
Message Buses are by their nature observable. One setting in RabbitMQ, or an Event Source, and you can observe all events in the system. This has some good side-effects, in that when something happens in the process you can typically figure out what events led to that state (if you follow an event-driven paradigm + state machines).
HTTP Calls are not observable without inspecting network traffic or logging those requests (which itself has problems, so we're going to start with "not feasible" in normal operations). Therefore if you mix a message based process and HTTP calls, you're going to have holes where you can't tell what's going on. You'll have spots where due to a network error your HTTP call didn't return data, and your services didn't continue the process because of that. You'll also need to hook up Retry/Circuit Breaker patterns for your HTTP calls to ensure they at least try a few times, but then you have to differentiate between "Not up because it's down", and "Not up because it's momentarily busy".
In short, mixing the two methods for a Command Driven process is not very resilient.
Level 2: Events define RPC/Internal Request/Response for data; Queries are External
In step two of this maturity model, you separate out Commands and Queries. Commands should use an event driven system, and queries should happen through HTTP. If you need the results of a query for a Command, then you issue a message and use a Request/Response pattern over your message bus.
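For the bus-based request/response itself, the usual shape is the one from RabbitMQ's RPC tutorial: publish the query with a reply_to queue and a correlation_id, then wait for the matching response. A sketch with pika follows; the queue name and payload are my assumptions.

import json
import uuid
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
reply_queue = ch.queue_declare(queue="", exclusive=True).method.queue
corr_id = str(uuid.uuid4())

ch.basic_publish(
    exchange="",
    routing_key="user.queries",          # the data-owning service listens here
    properties=pika.BasicProperties(reply_to=reply_queue, correlation_id=corr_id),
    body=json.dumps({"query": "user_by_id", "id": 42}),
)

for _method, props, body in ch.consume(reply_queue, auto_ack=True):
    if props.correlation_id == corr_id:  # skip replies meant for other requests
        print(json.loads(body))
        break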
This has benefits and problems too.
Benefits-wise your entire Command is now observable, even as it hops through multiple services. You can also replay processes in the system by rerunning events, which can be useful in tracking down problems.
Problems-wise, now some of your events look a lot like queries, and you're recreating for messages the beautiful request/response semantics that HTTP and REST already give you; and that's not terribly fun or useful. As an example, a 404 tells you there's no data in REST. For a message-based event, you have to recreate those semantics (there's a good YouTube conference talk on the subject I can't find, but a team tried to do just that, with great pain).
However, your events are now asynchronous and non-blocking, and every service can be refactored into a state machine that will respond to a given event. One caveat is that those events should contain all the data needed for the operation (which leads to messages growing over the course of a process).
Your queries can still use HTTP for external communication; but for internal command/processes, you'd use the message bus.
I don't recommend this approach either (though it's a step up from the first approach). I don't recommend it because of the impurity your events start to take on, and in a microservices system having contracts be the same throughout the system is important.
Level 3: Producers of Data emit data as events. Consumers Record data for their use.
The third step in the maturity model (and we were on our way to that paradigm when I departed from the project) is for services that produce data to issue events when that data is produced. That data is then jotted down by services listening for those events, and those services will use that (possibly stale) data to conduct their operations. External customers still use HTTP; but internally you emit events when new data is produced, and each service that cares about that data will store it to use when it needs to. This is the crux of Michael Bryzek's talk Designing Microservices Architecture the Right Way. Michael Bryzek is the CTO of Flow.io, a white-label e-commerce company.
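In miniature (all names invented, and a synchronous in-process "bus" standing in for a real one), the level-3 flow looks like this:

from collections import defaultdict

subscribers = defaultdict(list)          # topic -> handlers; stand-in for a bus

def publish(topic: str, event: dict) -> None:
    for handler in subscribers[topic]:
        handler(event)                   # a real bus would deliver asynchronously

# Billing service: keeps its own local projection of customer data.
local_customers = {}
subscribers["customer.updated"].append(
    lambda e: local_customers.update({e["id"]: e})
)

# Customer service produces data and announces it as an event.
publish("customer.updated", {"id": 7, "email": "a@example.com"})

# Later, billing reads its local (possibly stale) copy -- no HTTP call needed.
print(local_customers[7]["email"])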
If you want a deeper answer along with other issues at play, I'll point you to my blog post on the subject.

What is Event Driven Concurrency?

I am starting to learn Scala and functional programming. I was reading the book "Programming Scala: Tackle Multi-Core Complexity on the Java Virtual Machine". In the first chapter I came across the terms event-driven concurrency and the actor model. Before I continue reading this book, I want to have an idea about event-driven concurrency and the Actor Model.
What is Event-Driven concurrency, and how is it related to Actor Model?
An Event Driven programming model involves registering code to be run when a given event fires. An example is, instead of calling a method that returns some data from a database:
val user = db.getUser(1)
println(user.name)
You could instead register a callback to be run when the data is ready:
db.getUser(1, u => println(u.name))
In the first example, no concurrency was happening: the current thread would block until db.getUser(1) returned data from the database. In the second example, db.getUser would return immediately and carry on executing the next code in the program. In parallel to this, the callback u => println(u.name) will be executed at some point in the future.
Some people prefer the second approach as it doesn't mean memory-hungry threads needlessly sit around waiting for slow I/O to return.
The Actor Model is an example of how Event-Driven concepts can be used to help the programmer easily write concurrent programs.
From a super high level, Actors are objects that define a series of Event Driven message handlers that get fired when the Actor receives messages. In Akka, each instance of an Actor is single Threaded, however when many of these Actors are put together they create a system with concurrency.
For example, Actor A could send messages to Actor B and C in parallel. Actor B and C could fire messages back to Actor A. Actor A would have message handlers to receive these messages and behave as desired.
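To make that shape concrete outside of Akka, here is a minimal thread-per-actor sketch in Python (my own toy illustration, not Akka's API): each actor owns a mailbox, handles one message at a time, and talks to other actors only by posting messages.

import queue
import threading
import time

class Actor:
    def __init__(self, name: str, handler) -> None:
        self.name, self.mailbox, self.handler = name, queue.Queue(), handler
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:                       # one thread per actor: no locks needed
            msg, sender = self.mailbox.get()
            self.handler(self, msg, sender)

    def tell(self, msg, sender=None) -> None:
        self.mailbox.put((msg, sender))   # asynchronous: never blocks the caller

a = Actor("A", lambda self, msg, _s: print("A got:", msg))
b = Actor("B", lambda self, msg, sender: sender.tell("pong from B", self))
c = Actor("C", lambda self, msg, sender: sender.tell("pong from C", self))

b.tell("ping", a)    # A messages B and C in parallel...
c.tell("ping", a)    # ...and handles their replies in its own mailbox
time.sleep(0.2)      # let the daemon threads drain their mailboxes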
To learn more about the Actor model I would recommend reading the Akka documentation. It is really well written: http://doc.akka.io/docs/akka/2.1.4/
There is also lots of good documentation around the web about event-driven concurrency that is much more detailed than what I've written here: http://berb.github.io/diploma-thesis/original/055_events.html
Theon's answer provides a good modern overview. I'd like to add some historical perspective.
Tony Hoare and Robin Milner both developed mathematical algebras for analysing concurrent systems: Communicating Sequential Processes (CSP) and the Calculus of Communicating Systems (CCS). Both of these look like heavy mathematics to most of us, but the practical application is relatively straightforward. CSP led directly to the Occam programming language amongst others, with Go being the newest example. CCS led to the pi calculus and the mobility of communicating channel ends, a feature that is part of Go and was added to Occam in the last decade or so.
CSP models concurrency purely by considering autonomous entities ('processes', very lightweight things like green threads) interacting simply by event exchange. The medium for passing events is channels. Processes may have to deal with several inputs or outputs, and they do this by selecting the event that is ready first. The events usually carry data from the sender to the receiver.
A principal feature of the CSP model is that a pair of processes engage in communication only when both are ready; in practical terms this leads to what is usually called 'synchronous' communication. However, the actual implementations (Go, Occam, Akka) allow channels to be buffered (the normal state in Akka), so that the lock-step exchange of events is often actually decoupled instead.
So in summary, an event-driven CSP-based system is really a data-flow network of processes connected by channels.
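Here is the data-flow view in miniature, with Python threads standing in for CSP processes and small bounded queues standing in for channels (Go or Occam would express this natively); the three-stage pipeline is just an illustrative example.

import queue
import threading

def producer(out: queue.Queue) -> None:
    for n in range(5):
        out.put(n)                # the event carries data from sender to receiver
    out.put(None)                 # end-of-stream marker

def doubler(inp: queue.Queue, out: queue.Queue) -> None:
    while (n := inp.get()) is not None:
        out.put(n * 2)
    out.put(None)

def consumer(inp: queue.Queue) -> None:
    while (n := inp.get()) is not None:
        print("got", n)

# Small buffers keep the stages in near-lockstep, approximating CSP rendezvous.
a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
for proc, args in [(producer, (a,)), (doubler, (a, b)), (consumer, (b,))]:
    threading.Thread(target=proc, args=args).start()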
Besides the CSP interpretation of event-driven, there have been others. An important example is the 'event-wheel' approach, once popular for modelling concurrent systems whilst actually having a single processing thread. Such systems handle events by putting them into a processing queue and dealing with them in due course, usually via a callback. Java Swing's event processing engine is a good example. There were others, e.g. for time-based simulation engines. One might think of the JavaScript / NodeJS model as fitting into this category as well.
So in summary, an event-wheel was a way to express concurrency but without parallelism.
The irony of this is that the two approaches I've described above are both described as event driven but what they mean by event driven is different in each case. In one case, hardware-like entities are wired together; in the other, almost all actions are executed by callbacks. The CSP approach claims to be scalable because it's fully composable; it's naturally adept at parallel execution also. If there are any reasons to favour one over the other, these are probably it.
To understand the answer to this you have to look at event concurrency from the OS layer up. You start with threads, the smallest units of execution that the OS can schedule, which ultimately deal with I/O, timing and other kinds of events.
The OS groups threads into a process in which they share the same memory, protection and security permissions. Above that layer you have user programs which typically make I/O requests that are handled by user libraries.
The I/O libraries handle these requests in one of two ways. Unix-like systems use a "reactor" model in which the library registers I/O handlers for all the different types of I/O and events in the system. These handlers are activated when I/O is ready on a specific device. Windows-like systems use an I/O completion model in which I/O requests are made and a callback is triggered when the request is complete.
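A reactor in the Unix style can be sketched in a few lines with Python's selectors module: register a handler per file descriptor, then loop dispatching whichever descriptor is ready (the echo behaviour and port number are arbitrary choices for the example).

import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server: socket.socket) -> None:
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)   # handler for this fd

def read(conn: socket.socket) -> None:
    data = conn.recv(1024)
    if data:
        conn.sendall(data)       # trivial echo "application logic"
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 9000))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                      # the event loop: dispatch whatever is ready
    for key, _mask in sel.select():
        key.data(key.fileobj)    # call the handler registered for this fd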
Both of these models require a significant amount of overhead to manage overall program state if you were to use them directly. However some programming tasks (web apps / services) lend themselves to a seemingly more direct implementation if you use an event model directly, but you still need to manage all of that program state. In order to track program logic across dispatches of several related events you have to manually track state and pass it around to the callbacks. This tracking structure is usually called a state context or baton. As you might imagine passing batons around all over the place to numerous seemingly unrelated handlers makes for some extremely hard to read and spaghetti-like code. It's also a pain to write and debug -- especially when you're trying to handle the synchronization of various concurrent paths of execution. You start getting into Futures and then the code becomes really difficult to read.
One well-known event processing library is called libuv. It's a portable event loop that integrates Unix's reactor model with Windows' completion model into a single model usually called a "proactor". It's the event loop that drives NodeJS.
Which brings us to communicating sequential processes.
https://en.wikipedia.org/wiki/Communicating_sequential_processes
Rather than writing asynchronous I/O dispatch and synchronization code using one or more concurrency models (and their often competing conventions), we flip the problem on its head. We use a "coroutine" which looks like normal sequential code.
A simple example is a coroutine that receives a single byte over an event channel from another coroutine that sends a single byte. This effectively synchronizes the I/O producer and consumer, because the writer/sender has to wait for a reader/receiver and vice versa. While either process is waiting, it explicitly yields execution to other processes. When a coroutine yields, its scoped program state is saved on its own stack frame, saving you from the confusion of managing multi-layered baton state in an event loop.
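That single-byte example, sketched with Python coroutines (asyncio standing in for a CSP runtime, with a one-slot queue approximating the rendezvous): every await is an explicit yield point, and the in-between state lives on each coroutine's own frame rather than in a hand-managed baton.

import asyncio

async def sender(chan: asyncio.Queue) -> None:
    for byte in b"hi":
        await chan.put(byte)     # waits here if the receiver hasn't caught up

async def receiver(chan: asyncio.Queue) -> None:
    for _ in range(2):
        byte = await chan.get()  # waits here until the sender provides a byte
        print("received", byte)

async def main() -> None:
    chan = asyncio.Queue(maxsize=1)
    await asyncio.gather(sender(chan), receiver(chan))

asyncio.run(main())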
Using these event channels, we can construct arbitrary, reusable, concurrent logic, and the algorithms no longer look like spaghetti code. In pure CSP systems, if you write to a channel and there is no reader, you will be blocked. The channel endpoints are known via handles internally to the program.
Actor systems are different in a couple of ways. First, the endpoints are the actor threads, and they are named and known externally to the mainline program. The second difference is that sends and receives on these channels are buffered. In other words, if you send a message to an actor and it isn't listening or it's busy, you aren't blocked until it reads from its input channel. Other differences exist, like one actor being able to publish to two different actors concurrently.
As you might guess Actor systems can easily be built from CSP systems. There are other details like waiting for specific event patterns and selecting from them, but that's the basics.
I hope that clarifies things a bit.
Other constructs can be built from these ideas. Various programming systems (Go, Erlang, etc) include CSP implementations within them. Operating systems like Inferno and Node9 use CSPs and Channels as the basis of their distributed computing model.
Go: https://en.wikipedia.org/wiki/Go_(programming_language)
Erlang: https://en.wikipedia.org/wiki/Erlang_(programming_language)
Inferno: https://en.wikipedia.org/wiki/Inferno_(operating_system)
Node9: https://github.com/jvburnes/node9

Event- or poll-based embedded MCU system architecture?

I have prior experience in writing both event and poll based embedded systems (for tiny MCU's with no preemptive OS).
In an event-based system, tasks usually receive events (messages) on a queue and handle them in turn.
In a poll-based system, tasks poll status at a certain interval and respond to changes.
Which architecture do you prefer? Can both co-exist?
UPDATE: POINTS MADE
POLL BASED
- Tight coupling related to timing aspects (#Lundin)
* Can co-exist alongside event system using queues (#embedded.kyle)
* Fine for smaller programs (#Lundin)
EVENT BASED
+ More flexible system in the long run (#embedded.kyle)
- RTOS edition adds complexity (#Lundin)
* Small programs = state-machine controlled (#Lundin)
* Can be implemented using queues and a "super-loop" (inside controller/main) (#embedded.kyle)
* Only true "events" are hardware interrupts (#Lundin)
RELATED QUESTIONS
* Looking for a comparison of different scheduling algorithms for a Finite State Machine (#embedded.kyle)
RELATED INFO
* "Prefer Using Active Objects Instead of Naked Threads" (#Miro)
http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095
* "Use Threads Correctly = Isolation + Asynchronous Messages" (#Miro)
http://www.drdobbs.com/parallel/use-threads-correctly-isolation-asynch/215900465
There is really no such thing as "event-driven" on a bare-metal MCU platform, despite what the buzzword-spitters are trying to tell you. The only kind of true events you can receive are hardware interrupts.
Depending on the nature of the application and its real time requirements, interrupts may or may not be suitable. Generally, it is far easier to achieve deterministic real time with a polling system. However, systems relying solely on polling are very hard to maintain, because you get tight coupling between all timing aspects.
Suppose you try to start up an LCD, which is slow. Instead of polling some timer repeatedly while burning CPU cycles in an empty loop, you would perhaps decide to receive some data over a bus in the meantime. And then you want to print the received data on the LCD. Such a design has created a tight coupling between the LCD startup time and the serial bus, and another tight coupling between the serial bus and the printing of data. From an object-oriented point of view these things are not related to each other at all. If you were to speed up the serial bus at some point in the future, then suddenly you could encounter LCD printing bugs, because it has not finished starting up when you try to print on it.
In a small program, it is perfectly fine to use polling like in the above example. But if the program has potential of growing, polling will make it very complex and the tight coupling will ultimately lead to many strange and fatal bugs.
On the other hand, multi-threading and RTOS adds quite a lot of extra complexity which in turn can lead to bugs as well. Where to draw the line isn't simple to determine.
From personal experience I'd say that any program smaller than 20-30k LOC will not benefit from scheduling and multitasking beyond simple state machines. If the program gets larger than that, I'd consider a multitasking RTOS.
Also, low-end MCUs (8- and 16-bitters) are far from suitable for running an OS. If you find that you need an OS to handle complexity on an 8- or 16-bit platform, you probably picked the wrong MCU to begin with. I'd be sceptical of any attempt to introduce an OS on anything smaller than a 32-bitter.
Actually, event-driven programming and threads can be combined and the resulting pattern is widely known as "active objects" or "actors".
Active objects (actors) are encapsulated, event-driven state machines, which communicate with one another asynchronously by posting events to each other. Active objects process all events in their own thread of execution (at least conceptually, if a cooperative scheduler is used), so they avoid by design most concurrency hazards.
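The shape of the pattern fits in a few lines; here is a toy Python sketch of an active object (my own illustration, not the QP API): an encapsulated state machine with a private event queue, processing one event at a time on its own thread, so handlers never race each other.

import queue
import threading
import time

class BlinkyActiveObject:
    def __init__(self) -> None:
        self.state = "OFF"                 # private state machine, never shared
        self.events = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def post(self, event: str) -> None:
        self.events.put(event)             # asynchronous: callers never block

    def _run(self) -> None:
        while True:
            event = self.events.get()      # run-to-completion, one event at a time
            if event == "TOGGLE":
                self.state = "ON" if self.state == "OFF" else "OFF"
            print("state:", self.state)

blinky = BlinkyActiveObject()
blinky.post("TOGGLE")    # an ISR or another active object would post like this
blinky.post("TOGGLE")
time.sleep(0.1)          # let the event thread drain the queue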
Actors and active objects are all the rage (again) in general-purpose computing (you can search for Erlang, Scala, Akka). Herb Sutter has written a couple of good articles that explain the "active object" pattern: "Prefer Using Active Objects Instead of Naked Threads" (http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095) and "Use Threads Correctly = Isolation + Asynchronous Messages" (http://www.drdobbs.com/parallel/use-threads-correctly-isolation-asynch/215900465).
Here is what Herb says in the first of these articles:
"Using raw threads directly is trouble for a number of reasons ...
Active objects dramatically improve our ability to reason about our thread's code and operation by giving us higher-level abstractions and idioms that raise the semantic level of our program and let us express our intent more directly. As with all good patterns, we also get better vocabulary to talk about our design. Note that active objects aren't a novelty: UML and various libraries have provided support for active classes"
So, all this is really not new. But what's perhaps less known, especially in the embedded systems community, is that active objects are not only fully applicable to embedded systems, but are actually a perfect match for embedded and are lighter than a traditional RTOS.
I've been using the event-driven active objects for over a decade now and have created the QP family of active object frameworks for embedded systems (see http://www.state-machine.com/). I would never go back to the polling "superloop" or the raw RTOS.
I prefer whichever architecture is best suited to the application at hand.
Both can co-exist in a multilevel queue architecture: one queue works on a poll basis running in the main loop, while another, most likely tasked with higher-priority events, works by using interrupt-based preemption.
See my answer to this SO question for a more detailed explanation and comparison of the different scheduling algorithms.

NEventStore 3.0 - Throughput / Performance

I have been experimenting with JOliver's Event Store 3.0 as a potential component in a project and have been trying to measure the throughput of events through the Event Store.
I started using a simple harness which essentially iterated through a for loop, creating a new stream and committing a very simple event comprising a GUID id and a string property to an MSSQL2K8 R2 DB. The dispatcher was essentially a no-op.
This approach managed to achieve ~3K operations/second running on an 8-way HP G6 DL380 with the DB on a separate 32-way G7 DL580. The test machines were not resource-bound; blocking looks to be the limit in my case.
Has anyone got any experience of measuring the throughput of the Event Store and what sort of figures have been achieved? I was hoping to get at least 1 order of magnitude more throughput in order to make it a viable option.
I would agree that blocking IO is going to be the biggest bottleneck. One of the issues that I can see with the benchmark is that you're operating against a single stream. How many aggregate roots do you have in your domain with 3K+ events per second? The primary design of the EventStore is for multithreaded operations against multiple aggregates, which reduces contention and locks for real-world applications.
Also, what serialization mechanism are you using? JSON.NET? I don't have a Protocol Buffers implementation (yet), but every benchmark shows that PB is significantly faster. It would be interesting to run a profiler against your application to see where the biggest bottlenecks are.
Another thing I noticed was that you're introducing a network hop into the equation, which increases latency (and blocking time) against any single stream. If you were writing to a local SQL instance which uses solid state drives, I could see the numbers being much higher as compared to a remote SQL instance running magnetic drives which has the data and log files on the same platter.
Lastly, did your benchmark application use System.Transactions or did it default to no transactions? (The EventStore is safe without use of System.Transactions or any kind of SQL transaction.)
Now, with all of that being said, I have no doubt that there are areas in the EventStore that could be dramatically optimized with a little bit of attention. As a matter of fact, I'm kicking around a few backward-compatible schema revisions for the 3.1 release to reduce the number of writes performed within SQL Server (and RDBMS engines in general) during a single commit operation.
One of the biggest design questions I faced when starting on the 2.x rewrite that serves as the foundation for 3.x is the idea of async, non-blocking IO. We all know that node.js and other non-blocking web servers beat threaded web servers by an order of magnitude. However, the potential for complexity introduced on the caller is increased and is something that must be strongly considered because it is a fundamental shift in the way most programs and libraries operate. If and when we do move to an evented, non-blocking model, it would be more in a 4.x time frame.
Bottom line: publish your benchmarks so that we can see where the bottlenecks are.
Excellent question Matt (+1), and I see Mr Oliver himself replied as the answer (+1)!
I wanted to throw in a slightly different approach that I myself am playing with to help with the 3,000 commits-per-second bottleneck you are seeing.
The CQRS pattern, which most people who use JOliver's EventStore seem to be attempting to follow, allows for a number of "scale out" sub-patterns. The first thing people usually queue off is the Event commits themselves, which is where you are seeing a bottleneck. "Queue off" meaning offloading them from the actual commits and inserting them into some write-optimized, non-blocking I/O process, or "queue".
My loose interpretation is:
Command broadcast -> Command Handlers -> Event broadcast -> Event Handlers -> Event Store
There are actually two scale-out points here in these patterns: the Command Handlers and the Event Handlers. As noted above, most start with scaling out the Event Handler portions, or, in your case, the Commits to the EventStore library, because this is usually the biggest bottleneck due to the need to persist it somewhere (e.g. a Microsoft SQL Server database).
I myself am using a few different providers to test for the best performance to "queue up" these commits: CouchDB and .NET's AppFabric Cache (which has a great GetAndLock() feature). [OT]I really like AppFabric's durable-cache features that let you create redundant cache servers that back up your regions across multiple machines - therefore, your cache stays alive as long as there is at least 1 server up and running.[/OT]
So, imagine your Event Handlers do not write the commits to the EventStore directly. Instead, you have a handler insert them into a "queue" system, such as Windows Azure Queue, CouchDB, Memcache, AppFabric Cache, etc. The point is to pick a system with little to no blocking to queue up the events, but something that is durable with redundancy built in (Memcache being my least favorite for redundancy options). You must have that redundancy so that, if a server drops, you still have the event queued up.
To finally commit from this "Queued Event", there are several options. I like Windows Azure's Queue pattern for this, because of the many "workers" you can have constantly looking for work in the queue. But it doesn't have to be Windows Azure - I've mimicked Azure's Queue pattern in local code using a "Queue" and "Worker Roles" running in background threads. It scales really nicely.
Say you have 10 workers constantly looking into this "queue" for any User Updated events (I usually write a single worker role per Event type; it makes scaling out easier as you get to monitor the stats of each type). Two events get inserted into the queue, the first two workers instantly pick up a message each, and insert them (Commit them) directly into your EventStore at the same time - multithreading, as Jonathan mentioned in his answer. Your bottleneck with that pattern would be whatever database/eventstore backing you select. Say your EventStore is using MSSQL and the bottleneck is still 3,000 RPS. That is fine, because the system is built to 'catch up': after a 20,000-event burst, the backlog drains once the incoming rate drops down to, say, 50 RPS. This is the natural pattern CQRS allows for: "Eventual Consistency."
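The mechanics in miniature (a plain in-process Queue standing in for Azure Queue or CouchDB, threads standing in for worker roles, and commit_to_event_store a hypothetical stand-in for the EventStore commit):

import queue
import threading

pending_events = queue.Queue()             # stand-in for the durable queue

def commit_to_event_store(event: dict) -> None:
    pass                                   # the real (slower) durable commit

def worker_role(worker_id: int) -> None:
    while True:
        event = pending_events.get()       # workers race to pick up messages
        commit_to_event_store(event)       # limited by the store's RPS...
        pending_events.task_done()         # ...but the burst is already absorbed

for i in range(10):                        # ten workers, as in the example
    threading.Thread(target=worker_role, args=(i,), daemon=True).start()

# A 20,000-event burst enqueues almost instantly; the workers 'catch up' later.
for n in range(20_000):
    pending_events.put({"type": "UserUpdated", "seq": n})
pending_events.join()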
I said there were other scale-out patterns native to the CQRS patterns. Another, as I mentioned above, is the Command Handlers (or Command Events). This is one I have done as well, especially if you have a very rich domain as one of my clients does (dozens of processor-intensive validation checks on every Command). In that case, I'll actually queue off the Commands themselves, to be processed in the background by some worker roles. This gives you a nice scale-out pattern as well, because now your entire backend, including the EventStore commits of the Events, can be threaded.
Obviously, the downside to that is that you lose some real-time validation checks. I solve that by usually segmenting validation into two categories when structuring my domain. One is Ajax or real-time "lightweight" validations in the domain (kind of like a Pre-Command check). And the others are hard-failure validation checks, that are only done in the domain but not available for realtime checking. You would then need to code for failure in the Domain model. Meaning, always code for a way out if something fails, usually in the form of a notification email back to the user that something went wrong. Because the user is no longer blocked by this queued Command, they need to be notified if the command fails.
And your validation checks that need to go to the 'backend' go against your Query or "read-only" database, riiiight? Don't go into the EventStore to check for, say, a unique Email address. You'd be doing your validation against your highly-available read-only datastore for the Queries of your front end. Heck, have a single CouchDB document be dedicated to only a list of all email addresses in the system as your Query portion of CQRS.
CQRS is just suggestions... If you really need realtime checking of a heavy validation method, then you can build a Query (read-only) store around that, and speed up the validation - on the PreCommand stage, before it gets inserted into the queue. Lots of flexibility. And I would even argue that validating things like empty Usernames and empty Emails is not even a domain concern, but a UI responsibility (off-loading the need to do real-time validation in the domain). I've architected a few projects where I had very rich UI validation on my MVC/MVVM ViewModels. Of course my Domain had very strict validation, to ensure it is valid before processing. But moving the mediocre input-validation checks, or what I call "lightweight" validation, up into the ViewModel layers gives that near-instant feedback to the end-user, without reaching into my domain. (There are tricks to keep that in sync with your domain as well.)
So in summary, possibly look into queuing off those Events before they are committed. This fits nicely with EventStore's multi-threading features as Jonathan mentions in his answer.
We built a small boilerplate for massive concurrency using Erlang/Elixir, https://github.com/work-capital/elixir-cqrs-eventsourcing, using EventStore. We still have to optimize db connections, pooling, etc., but the idea of having one process per aggregate with multiple db connections is aligned with your needs.

How to design a clock-driven multi-agent simulation

I want to create a multi-agent simulation model for a real word manufacturing process to evaluate some dispatching rules. The simulation needs to produce event logs to evaluate time effect of the dispatching rules compared to the real manufacturing event logs.
How can I incorporate the 'current simulation time' into this kind of multi-agent, message passing intensive simulation?
Background:
The classical discrete event simulation (which handles time advancement nicely) cannot be applied here, as the agents in the system represent relatively complex behavior and routing requirements, and the dispatching rules require them to communicate frequently. This and other process complexities rule out a centralized scheduling approach as well.
In manufacturing science, there are thousands of papers using a multi-agent simulation to solve some manufacturing-related problem. However, I haven't found a paper yet which describes the inner workings or implementation details of these simulations in the required detail.
Unfortunately, using the shortest process time for discrete time-stepping in such a system might be infeasible, as process times range between 0.1 s and 24 hours. There is a possibility my simulation will be used for what-if evaluations in a project later on, so the simulation needs to run as fast as possible - no option for overnight simulation runs.
The problem size is about 500 resources and 1000-10000 product agents, most of which are finished and do not participate in any further communication or resource occupation.
Consequently, as a result of the communication, new events can trigger an agent to do something before its original 'next time' event would arrive. For example, an agent is currently blocked on a resource for an hour. However, another, higher-priority agent needs that resource right away and asks the first agent to release it.
In some sense, I need a way to create a hybrid of the classical message-passing agent simulation and discrete event simulation.
I considered a mediator agent that is involved in every message - a message router and time enforcer which sends around the messages and the timer tick events. The mediator agent also keeps a list of next event times for the various agents. However, I feel there should be a better way to solve my problem, as this concept puts enormous pressure on the mediator agent.
Update
It took a while, but it seems I managed to create a mini-framework that combines the DES and agent concepts into one. I'm sure it's nothing new, but at least it's unique: http://code.google.com/p/tidra-framework/ if you are interested.
This problem sounds as if it should be tackled by using parallel discrete-event simulation - the mediator agent you are planning to implement ('is involved in every message', 'sends around messages and timer tick events') seems to be doing the job of a discrete-event simulator right now. You can make this scale to the desired problem size by using more of such simulators in parallel and then use a synchronization algorithm to maintain causality etc. (see, e.g., this book for details). Of course, this requires some considerable effort, and you might be better off by really trying out the sequential algorithms first.
A nice way of augmenting the classical DES-view of logical processes (= agents) that communicate with each other via events could be to blend in some ideas from other formalisms used to describe discrete-event systems, such as DEVS. In DEVS, each entity can specify the duration it will be in a certain state (e.g., the agent blocking a resource), and will only be interrupted by incoming messages (and then change its state accordingly, e.g. the agent freeing the resource).
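The DEVS-style interruptible wait can be layered on an ordinary event calendar; here is a small Python sketch (all names are mine): the agent schedules its 'release resource' event an hour ahead, and an incoming higher-priority message cancels it and reschedules sooner.

import heapq
import itertools

calendar = []                       # heap of (time, event_id, agent, action)
counter = itertools.count()
cancelled = set()

def schedule(time: float, agent: str, action: str) -> int:
    event_id = next(counter)        # unique id doubles as a tie-breaker
    heapq.heappush(calendar, (time, event_id, agent, action))
    return event_id

def cancel(event_id: int) -> None:
    cancelled.add(event_id)         # lazy deletion: skip it when it surfaces

blocked = schedule(3600.0, "agent-1", "release resource")  # planned: one hour

# A higher-priority agent's message preempts the wait at t=300:
cancel(blocked)
schedule(300.0, "agent-1", "release resource early")

while calendar:
    now, event_id, agent, action = heapq.heappop(calendar)
    if event_id in cancelled:
        continue
    print(f"t={now:7.1f}  {agent}: {action}")  # the agent reacts, may schedule more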
BTW, in which sense do you think the agents are too complex to be handled with discrete-event simulation? If you regard each agent as a logical process, it doesn't really matter how complex it is from a simulation point of view - or am I getting something wrong here?