ZMQ socket - disconnect when all requests are served

I am trying to implement the ZMQ REQ/REP model in Java.
I have a Server role, running on port 5564, which acts as the replier:
ZMQ.Socket repSock = context.socket(ZMQ.REP);
I have a Client role, running on port 5563:
ZMQ.Socket syncclient = context.socket(ZMQ.REQ);
I have a proxy server in the middle, which passes requests and responses:
ZMQ.proxy(reqSocket, repSocket, null);
A good thing about having a proxy is that I can add multiple Servers:
repSocket.connect("tcp://" + addr.getHostAddress() + ":" + port);
This works fine.
Now, when I remove a Server node from the proxy:
repSocket.disconnect("tcp://" + addr.getHostAddress() + ":" + port);
the Client gets stuck, since a request has been made and the REQ socket waits for a response.
So the process is stuck at syncclient.recvStr():
// effectively an endless request/reply loop
while (true) {
    syncclient.send(str.getBytes(), 0);
    System.out.println("Sent data...");
    String data = syncclient.recvStr(Charset.defaultCharset()); // blocks forever once the server is gone
    System.out.println("Received: " + data);
}
I searched and couldn't find a way to track the state of the REQ socket.
I need one of these 2 things:
A way to keep track of a Socket instance which I am about to disconnect, and to wait until all its messages are processed, so that syncclient.recvStr() will not block
A way to reset the syncclient socket, so that I can keep getting REQ/REP responses without interruption

In real-world scenarios, rather avoid using the blocking mode of the ZeroMQ .send() / .recv() methods and better use .poll().
While this may require a few more SLOCs of code, the result leaves you in control, whereas a blocking call takes all control away from your code and you cannot do much about it until ( if at all ) a next message gets delivered. That is a very poor design practice; except for the most simplistic schoolbook examples, blocking calls are actually a sort of anti-pattern for the real world.
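For illustration, a minimal sketch of such a .poll()-based receive in Java ( JeroMQ-style API, reusing the question's context and syncclient; the 5000 ms timeout is an arbitrary choice ):
// a sketch only: poll with a timeout instead of blocking in .recvStr() forever
ZMQ.Poller poller = context.poller(1);
poller.register(syncclient, ZMQ.Poller.POLLIN);

syncclient.send(str.getBytes(), 0);

if (poller.poll(5000) > 0 && poller.pollin(0)) {   // wait at most 5000 ms
    String data = syncclient.recvStr(Charset.defaultCharset());
    System.out.println("Received: " + data);
} else {
    // nothing arrived in time: your code, not the socket, decides what happens next
    System.out.println("No reply within 5 s -- recover, retry or re-route");
}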
So do not expect Question 2 to become somehow magically solved; this is not a part of the ZeroMQ API ( for many rather loudly evangelised reasons ). Better decide between .setsockopt( ZMQ.REQ_RELAXED, 1 ), if the API version and context of use permit, or do not use the trivial REQ/REP pattern at all, due to its known risk of falling into an unsalvageable mutual deadlock ( ref. my other posts on this very subject, where this phenomenon was both illustrated and explained countless times ).
In a similar manner, asking Question 1 seems reasonable only in case you have never read the ZeroMQ specifications, documentation and "Best Practices". Having spent some time on these, your options would be crystal-clear: there are no such tools built-in. One can create an add-on, if in need of such non-core logic for one's own use. The only setting that indirectly influences the behaviour of aSocket.close() is .setsockopt( ZMQ.LINGER, 0 ), which may help prevent the system from transitioning into an effective hangup-state, where aSocket waits infinitely for a state that will never happen because the message-queue is still non-empty ( messages still waiting to get delivered ).
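In JeroMQ-style calls, the two options above could be sketched like this ( assuming your binding's version exposes these setter wrappers ):
syncclient.setReqRelaxed(true);   // ZMQ.REQ_RELAXED: permits a new .send() without a reply received
syncclient.setReqCorrelate(true); // usually set together, so late replies get matched to requests
syncclient.setLinger(0);          // ZMQ.LINGER = 0: .close() will not hang on undelivered messages
syncclient.close();               // returns immediately, pending messages are dropped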
Going into distributed-systems design is like entering a new world. No sequences are guaranteed ( non-serial code-execution paths happen ). There are no means of local control over remote entities, their states, their failures, their presence at all, or their actual ZeroMQ API version.
Indeed a challenging world to enter into.
N.b.:
You might already know that one can .connect() aSocket-instance ( better: an Access Point to aSocket-instance ) to more than one remote end without using the proxy. Some additional .setsockopt() tuning ( e.g. setting ZMQ.IMMEDIATE to a value of 1 ) will help better manage the round-robin distribution policy, irrespective of the transport-classes used for the actual message delivery ( { tcp:// | ipc:// | vmci:// | pgm:// | epgm:// | inproc:// } ). All that at your fingertips.
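A minimal sketch of that proxy-less wiring ( addresses are illustrative ):
ZMQ.Socket syncclient = context.socket(ZMQ.REQ);
syncclient.setImmediate(true);                  // ZMQ.IMMEDIATE = 1: queue only on live connections
syncclient.connect("tcp://192.168.0.10:5564");  // Server A
syncclient.connect("tcp://192.168.0.11:5564");  // Server B
syncclient.connect("tcp://192.168.0.12:5564");  // Server C
// requests are now distributed round-robin over all connected servers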

Related

gRPC repeated field vs stream

Hi, I'm currently looking into gRPC and I'm curious about the usage of a repeated field vs a stream.
For example, let's say I want to implement a reservation service for movie seats. The issue I'm facing is that I want to inform the service which movie I want to reserve the seats for.
I can think of 2 solutions. First:
I send the ID of the movie with every seat I want to reserve, or via a oneof at the beginning of the stream,
like this:
rpc ReserveSeatsForShowing(stream SeatReservationRequest) returns (Reservation);

message SeatReservationRequest {
    oneof reservationOneOf {
        int32 showingId = 1;
        SeatReservation seatReservation = 2;
    }
}
Or using a repeated field, like this:
rpc ReserveSeatsForShowing(SeatReservationRequest) returns (Reservation);

message SeatReservationRequest {
    int32 showingId = 1;
    repeated SeatReservation seatReservation = 2;
}
Since I haven't really worked with gRPC before, I'm not quite sure which option to choose, or whether other options are available.
Looking forward to your recommendations.
For the seat reservation, I think it would make sense to use a repeated field. Just like the real-world scenario, the request is like "I want seats A, B, C for movie X", which is more of a repeated manner than a streaming one, and thus the payload is very small. Also, this way should use fewer server resources, since it is handled as one batch.
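For illustration, a call with the repeated-field message could look roughly like this in Java; the generated builder names follow from the proto above, while the blockingStub variable and SeatReservation's row/number fields are made-up assumptions:
// hypothetical generated classes, derived from the proto sketch above
SeatReservationRequest request = SeatReservationRequest.newBuilder()
        .setShowingId(42)
        .addSeatReservation(SeatReservation.newBuilder().setRow(1).setNumber(5))
        .addSeatReservation(SeatReservation.newBuilder().setRow(1).setNumber(6))
        .build();

// one unary call carries the whole batch of seats
Reservation reservation = blockingStub.reserveSeatsForShowing(request);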

Moving from file-based tracing session to real time session

I need to log trace events during boot, so I configure an AutoLogger with all the required providers. But when my service/process starts, I want to switch to real-time mode so that the file doesn't explode.
I'm using TraceEvent and I can't figure out how to make this move correctly and atomically.
The first thing I tried:
const int timeToWait = 5000;
using (var tes = new TraceEventSession("TEMPSESSIONNAME", @"c:\temp\TEMPSESSIONNAME.etl") { StopOnDispose = false })
{
    tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
    Thread.Sleep(timeToWait);
}
using (var tes = new TraceEventSession("TEMPSESSIONNAME", TraceEventSessionOptions.Attach))
{
    Thread.Sleep(timeToWait);
    tes.SetFileName(null);
    Thread.Sleep(timeToWait);
    Console.WriteLine("Done");
}
Here I wanted to make sure that I can transfer the session to real-time mode. But instead, the file I got contained events from a 15 s period instead of just 10 s.
The same happens if I use new TraceEventSession("TEMPSESSIONNAME", @"c:\temp\TEMPSESSIONNAME.etl", TraceEventSessionOptions.Create) instead.
It seems that the following will cause the file to stop being written to:
using (var tes = new TraceEventSession("TEMPSESSIONNAME"))
{
    tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
    Thread.Sleep(timeToWait);
}
But here I must re-enable all the providers, and according to the documentation, "if the session already existed it is closed and reopened (thus orphans are cleaned up on next use)". I don't understand the last part about orphans. Obviously some events might occur in the time between closing, opening, and subscribing to the events. Does this mean I will lose these events, or will I get them later?
I also found the following in the documentation of the library:
In real time mode, events are buffered and there is at least a second or so delay (typically 3 sec) between the firing of the event and the reception by the session (to allow events to be delivered in efficient clumps of many events)
Does this make the above code all right (well, unless the improbable happens and for some reason my thread is delayed for more than a second between creating the real-time session and starting to process the events)?
I could close the session and create a new, different one, but then I think I'd miss some events. Or I could open a new session and then close the file-based one, but then I might get duplicate events.
I couldn't find online any examples of moving from a file-based trace to a real-time trace.
I managed to contact the author of TraceEvent and this is the answer I got:
Re the 'auto-closing and restarting' feature: it is really a question about the OS (TraceEvent simply calls the underlying OS API). Just FYI, the deal about orphans is that it is EASY for your process to exit but leave a session going. This MAY be what you want, but often it is not, and so to make the common case 'just work' if you do Create (which is the default), it will close a session if it already existed (since you asked for a new one).
Experimentation of course is the touchstone of 'truth', but frankly, expecting unusual combinations to just work is generally NOT true.
My recommendation is to keep it simple. You need to open a new session and close the original one. Yes, you will end up with duplicates, but you CAN filter them out (after all, they have IDENTICAL timestamps).
The other possibility is to use SetFileName in its intended way (from one file to another). This certainly solves your problem of file-size growth, and is often a good way to deal with other scenarios (after all, you can start up your processing and start deleting files even as new files are being generated).

zmq pipeline pattern cannot work in multiprocess?

I think I am not using zmq in the right pattern. What I want to do is:
send messages via zmq from multiple processes
accept messages in multiple clients, but each message should be accepted only once
According to the second requirement I thought a pipeline ( PUSH/PULL ) should be OK, but this mode does not seem to work across multiple processes:
import concurrent.futures
import zmq

def foo(i):
    return i

def producer():
    context = zmq.Context()
    zmq_socket = context.socket(zmq.PUSH)
    zmq_socket.bind("tcp://127.0.0.1:5559")
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        futs = [executor.submit(foo, i) for i in range(10)]
        for fut in concurrent.futures.as_completed(futs):
            work_message = {'num': fut.result()}
            zmq_socket.send_json(work_message)  # send() needs bytes; send_json() serialises the dict

producer()
So maybe I should use the PUB/SUB pattern, but that cannot meet the second requirement either.
In fact, what I want is something like this:
PUSH |-----|                  | PULL
PUSH |-----|                  | PULL
PUSH |-----|----- DEVICE -----| PULL
PUSH |-----|                  | PULL
PUSH |-----|                  | PULL
Not exactly, roger. The ZeroMQ PUSH/PULL pattern works;
it just does something else than you would like it to do.
ZeroMQ is a wonderful toolbox of pre-baked behaviours, with an immense potential to assemble more complex behavioural models where needed.
Start with understanding the primitive actors, then design your functional requirements.
The PUSH/PULL Formal Communication Scenario gets two players:
1st: PUSH picks up the phone and calls. Whom? The connected PULL-side. PUSH leaves a voice-mail message to be listened to by PULL, once the latter decides to pick it up from the voicemail.
2nd: the PULL side ( at some point in time, not necessarily right upon 1. ) hears the bell ringing and picks up the phone.
3rd: PULL, if instructed to, processes the message received from PUSH.
Nothing more, per se.
Assemblies of ZeroMQ primitive components:
Yes, there is a way to step further towards your goal:
Just complement your functional requirements with appropriate ZeroMQ primitives, inter-connected so as to meet your additional requirements.
The most trivial one, from Chapter 1 of "Code Connected, Volume I", is a round-robin-based forwarding of messages towards a pool of "worker" processes.
More specialised assemblies may create additional functionalities for smart, distributed, complex behavioural models:
Typically, a "control + signalling" SIG_PLANE is implemented in parallel to the primary, functional processing.
Self-diagnostics services, heartbeat / self-healing signalling for non-stop processing scenarios, a remote non-blocking logging service, and a front-end MVC/GUI-plane are the most typical layered design goals.
Your task?
If not interested in more features, just .connect() your Futures-calculating PUSH-ers towards a PULL-ing endpoint.
This middle step ( a performance / failure-analysis singularity ) can collect on its PULL-ing entry side and immediately PUSH on its other endpoint, here on a round-robin basis, towards the pool of actual workers ( who crowd-wait for a task on their .connect()-ed PULL-er entry endpoints, expecting an incoming task from this "sink"-collector entity ). A task from the collected FIFO ( beware the buffering capacity and the performance overheads you pay for this ) simply goes to the "next" worker down the line.
process(0) |....[PUSH].connect(A)|-                                           -|.connect(B)[PULL]| process(-1)
process(1) |....[PUSH].connect(A)|--    _________________________________    --|.connect(B)[PULL]| process(-2)
process(2) |....[PUSH].connect(A)|---  |.bind(A)[PULL]:NOP:[PUSH].bind(B)|  ---|.connect(B)[PULL]| process(-3)
process(3) |....[PUSH].connect(A)|--   |_________________________________|   --|.connect(B)[PULL]| process(-4)
...                                                                           ...
process(n) |....[PUSH].connect(A)|-                                           -|.connect(B)[PULL]| process(-m)
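For illustration, a minimal sketch of that middle device, here in Java / JeroMQ ( in pyzmq, the built-in zmq.proxy(frontend, backend) plays the same role ):
// the middle device: PULL collects from all PUSH-ers, PUSH fans out round-robin
ZMQ.Context context = ZMQ.context(1);

ZMQ.Socket frontend = context.socket(ZMQ.PULL); // entry side: .bind(A)
frontend.bind("tcp://127.0.0.1:5559");

ZMQ.Socket backend = context.socket(ZMQ.PUSH);  // exit side: .bind(B)
backend.bind("tcp://127.0.0.1:5560");

ZMQ.proxy(frontend, backend, null);             // blocks, shuffling messages from A to B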
It's that simple.
Enjoy the cool ZeroMQ toolkit.

Forwarding AnyEvent::Log messages to a callback if certain requirements are met

I am working on a project that uses AnyEvent::Log in the main program, as well as in several dependent modules/packages. I currently have each module writing to its own context, and all contexts are added to the main program's context as slaves. This project is part of a much larger project, and in addition to writing out a local log file, there are certain messages that I would like to send to a remote program, which will then be responsible for presenting the messages to users.
The problem is that in order to send to the remote program, I have to have a piece of information that is only available in the main program, so it's not feasible to just implement a method at the package level to send messages. The piece of information I need is more or less a transaction ID, and the log messages are interesting events from a particular transaction.
The main program has 2 contexts ( main, secondary ). The messages I am interested in will either come from the secondary ctx OR one of the package/module contexts. I am interested in only sending info to crit level messages to users, but ONLY WHEN the txID exists in the main program. I ALWAYS want messages to be written to my local log file, regardless of whether or not a deployment is running. I would like this to be something that I set up in the main program rather than in a module, because the modules are tasked to do certain things and shouldn't even be aware of the fact that there is an ID associated with the task at hand.
Here is a quick breakdown of the log-configuration-specific code in the main program.
# Immediately after Proc::Daemon::Init
my $logger = AnyEvent::Log::ctx "desman";

# configure is done before daemonization to allow for --nodaemon
sub configure {
    my ( $level, $file ) = @_;
    $AnyEvent::Log::FILTER->level($level);
    $AnyEvent::Log::LOG->log_to_file($file);
}

sub log_event {
    ... logic to send messages as tx event ...
}

sub worker_init {
    threads->create(sub {
        $logger->attach( my $worklog = AnyEvent::Log::ctx "worker" );
        ... more stuff for worker specifics ...
    });
}
Ideally, I would be able to use one or both of log_cb and fmt_cb to handle the formatting and sending of messages to the remote program using the log_event sub. I have tried a few different things, and so far I'm stuck.
# doesn't seem to do anything
$logger->fmt_cb( sub { ... } );
$logger->log_cb( sub { ... } );

# broke everything
$AnyEvent::Log::COLLECT->attach( my $evtlog = new AnyEvent::Log::Ctx
    fmt_cb => \&event_formatter,
    log_cb => \&log_event
);
$evtlog->levels('crit','warning','notice','info');
I've been searching around for more examples beyond what's in the docs, but haven't found much yet. Not much of a surprise there, since AE::log is pretty much awesome as it is, but anything to help will be greatly appreciated.

Debug missing messages in akka

I have the following architecture at the moment:
Load(Play app with basic interface for load tests) -> Gateway(Spray application with REST interface for incoming messages) -> Processor(akka app that works with MongoDB) -> MongoDB
Everything works fine as long as the number of messages I am pushing through is low. However, when I try to push 10000 events, which should eventually end up in MongoDB as documents, it stops at a random place, for example at message 742 or message 982, and does nothing afterwards.
What would be the best way to debug such situations? On the load side I am just pushing hard into the REST service:
for (i ← 0 until users) workerRouter ! Load(server, i)
and then in the workerRouter
WS.url(server + "/track/message").post(Json.toJson(newUser)).map { response =>
    println(response.body)
    true
}
On the spray side:
pathPrefix("track") {
path("message") {
post {
entity(as[TrackObj]) { msg =>
processors ! msg
complete("")
}
}
}
}
On the processor side it's basically just an insert into a collection. Any suggestions on where to start?
Update:
I tried to move the logic of creating messages to the Gateway and looped from 1 to 10000, and it works just fine. However, if Spray and Play are involved in the pipeline, it stops at random places. Any suggestions on how to debug this case?
In a distributed and parallel environment it is next to impossible to create a system that works reliably. Whatever debugging method you use, it will only allow you to find a few of the bugs that happen during the debug session.
Once our team spent 3 months(!) tuning an application for robust 24/7 operation. And still there were bugs. Then we applied the method of model checking (Spin). Within a couple of weeks we implemented a model that allowed us to get a robust application. However, model checking requires a somewhat different way of thinking, and it can be difficult to start with.
I moved the load-test app to the Spray framework and now it works like a charm. So I suppose the problem was somewhere in the way I used the WS API in the Play framework:
WS.url(server + "/track/message").post(Json.toJson(newUser)).map { response =>
    println(response.body)
    true
}
The problem is resolved but not solved; I won't work on a solution based on Play.