Why does code trying to read messages from two ZeroMQ sockets fail? - python-3.7

I have issues reading messages from two zmq servers (one set up as REQ/REP and one as PUB/SUB).
The two servers are running on another computer. When I read just the REQ/REP connection everything works perfectly, but as soon as I also try to read the PUB/SUB connection the program freezes (I guess it waits forever for a message).
from PyQt5 import QtCore, QtGui, QtWidgets
import zmq
import ui_mainwindow

class MainWindow(QtWidgets.QMainWindow, ui_mainwindow.Ui_MainWindow):
    def __init__(self, parent=None):
        super(MainWindow, self).__init__(parent)
        self.context = zmq.Context()
        try:
            self.stateSocket = self.context.socket(zmq.REQ)
            self.stateSocket.connect("tcp://134.105.89.197:5555")
        except zmq.ZMQError as e:
            print('States setup failed: ', e)
        try:
            self.context = zmq.Context()
            self.anglesSocket = self.context.socket(zmq.SUB)
            self.anglesSocket.connect("tcp://134.105.89.197:5556")
        except zmq.ZMQError as e:
            print('angles setup failed: ', e)
        self.timer = QtCore.QTimer()
        self.timer.timeout.connect(self.publishState)
        self.timer.setInterval(500)
        self.timer.start()
        self.timer2 = QtCore.QTimer()
        self.timer2.timeout.connect(self.publishAngles)
        self.timer2.setInterval(500)
        self.timer2.start()
        # + more variables unrelated to problem

    def publishState(self):
        request = "a string"
        try:
            self.stateSocket.send_string(request)
            self.reset = 0
            message = self.stateSocket.recv()  # flags=zmq.NOBLOCK
            values = [float(i) for i in message.decode("UTF-8").split(',')]
            print("Status: ", message)
        except zmq.ZMQError as e:
            print('State communication: ', e)
            values = [0] * 100

    def publishAngles(self):
        try:
            message = self.anglesSocket.recv_string()  # flags=zmq.NOBLOCK
            #values = [float(i) for i in message.decode("UTF-8").split(',')]
            print("Angles: ", message)
        except zmq.ZMQError as e:
            print('Angles communication: ', e)
            values = [0] * 100
Edit: added the full relevant code.
What I observe is that the deadlock does not come from the REQ/REP part; that alone works perfectly fine. But it seems that the PUB/SUB part does not work in the timer function. When I make a minimal example with a while loop inside publishAngles() it works.
So is there an elegant way to use a PUB/SUB socket in a Qt-timer-connected function?

In case one has never worked with ZeroMQ, one may enjoy first looking at "ZeroMQ Principles in less than Five Seconds" before diving into further details.
Q: "Is there any stupid mistake I am overlooking?"
Yes, there are a few, and all are easy to fix.
1) The ZeroMQ part shown so far is incomplete, and it leaves a principal uncertainty about what type of subscription (and other safeguarding settings) was ever applied, when and where, to the SUB-socket-archetype access point. This matters: a SUB socket that never sets a subscription (e.g. via setsockopt( zmq.SUBSCRIBE, ... )) receives nothing at all, so a blocking recv on it will wait forever. The same review applies to the REQ-socket-archetype access point, except for the subscription-management settings, for obvious reasons.
2) The code ignores the documented rules of the distributed Finite State Automaton (dFSA) logic hardwired into the REQ/REP Scalable Formal Communication Archetype. Use logic that does not violate the mandatory dFSA stepping of REQ-REP-REQ-REP-REQ-REP, and make the REQ and SUB handling mutually independent, and you have it. In other words, a naive, dFSA-rules-ignoring use of the zmq.NOBLOCK flag does not solve the deadlock either.
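A minimal pyzmq sketch of both fixes, reusing the endpoint from the question: the subscription is set on the SUB socket, and a Poller guards every read, so a Qt timer callback returns immediately instead of blocking. This is an illustrative sketch, not the full application:

import zmq

context = zmq.Context()

# SUB socket: without a subscription it receives nothing at all.
anglesSocket = context.socket(zmq.SUB)
anglesSocket.setsockopt_string(zmq.SUBSCRIBE, "")   # subscribe to every topic
anglesSocket.connect("tcp://134.105.89.197:5556")

poller = zmq.Poller()
poller.register(anglesSocket, zmq.POLLIN)

def publishAngles():
    # Qt-timer-safe read: wait at most 10 ms instead of blocking forever.
    events = dict(poller.poll(timeout=10))
    if anglesSocket in events:
        message = anglesSocket.recv_string(flags=zmq.NOBLOCK)
        print("Angles:", message)
    # No message yet? Just return and keep the GUI responsive.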
If you are serious about becoming a distributed-computing professional, a must-read is Pieter Hintjens' fabulous book "Code Connected, Volume 1".


How to connect a PCIe device to a chipyard design

I'm trying to connect a PCIe device to a chipyard design using the existing edge overlay for the VCU118 (slightly modified because I'm using a different board but this should not matter).
#michael-etzkorn already posted an issue on GitHub about this, in which they explain how they only got this working using two different clocks.
I'd appreciate some pointers as to how this is done (the issue leaves out some implementation details of the configs), and also whether it would be possible to do this without adding an extra clock (#michael-etzkorn points out that an extra clock could cause some issues).
Based on the work in your gist, it looks like you've answered most of your original question, but since I already typed this out I'll include it as an answer here.
To hook up any port, you'll essentially need to do three things.
1. Create an IOBinder
2. Create a HarnessBinder
3. Hook up the diplomatic nodes in the TestHarness
The IOBinder takes the bundles from within the system and punches them through to ChipTop. The HarnessBinder connects the IO in ChipTop to the harness. Diplomacy negotiates the parameters for the diplomatic nodes; this last step may be optional, but many modules, like the XDMA wrapper in the PCIe overlay, are diplomatic, so it is usually required.
IOBinder
The IOBinder can take your CanHaveMasterTLMMIOPort and punch out pins for it:
class WithXDMASlaveIOPassthrough extends OverrideIOBinder({
  (system: CanHaveMasterTLMMIOPort) => {
    val io_xdma_slave_pins_temp = IO(DataMirror.internal.chiselTypeClone[HeterogeneousBag[TLBundle]](system.mmio_tl)).suggestName("tl_slave_mmio")
    io_xdma_slave_pins_temp <> system.mmio_tl
    (Seq(io_xdma_slave_pins_temp), Nil)
  }
})
This can look much the same for each port, although I found experimentally that I had to flip the temp pin connection <> for CanHaveSlaveTLPort.
HarnessBinder
The HarnessBinder retrieves that port and connects it to the outer bundle. The pcieClient bundle is retrieved from the harness and connected to ports.head, which is returned from the IOBinder. This is essentially a fancy functional-programming way to clone the IO and connect it to the bundle in ChipTop:
class WithPCIeClient extends OverrideHarnessBinder({
  (system: CanHaveMasterTLMMIOPort, th: BaseModule with HasHarnessSignalReferences, ports: Seq[HeterogeneousBag[TLBundle]]) => {
    require(ports.size == 1)
    th match { case vcu118th: XDMAVCU118FPGATestHarnessImp => {
      val bundles = vcu118th.xdmavcu118Outer.pcieClient.out.map(_._1)
      val pcieClientBundle = Wire(new HeterogeneousBag(bundles.map(_.cloneType)))
      // pcieClientBundle <> DontCare
      bundles.zip(pcieClientBundle).foreach { case (bundle, io) => bundle <> io }
      pcieClientBundle <> ports.head
    } }
  }
})
Also, I should note: this isn't the ideal way to connect to the harness, as it's possible that BundleMap user fields are generated, and they won't be driven unless you have that pcieClientBundle <> DontCare there. I found I had to expose AXI ports instead and modify the overlay to output AXI nodes to get Diplomacy to work between the TestHarness and ChipTop regions.
A write-up of that problem, along with some more info, is at:
What are these `a.bits.user.amba_prot` signals and why are they only uninitialized conditionally in my HarnessBinder?
All this code is at that question.
TestHarness Diplomatic Connections
val overlayOutput = dp(PCIeOverlayKey).last.place(PCIeDesignInput(wrangler = pcieWrangler.node, corePLL = harnessSysPLL)).overlayOutput
val (pcieNode: TLNode, pcieIntNode: IntOutwardNode) = (overlayOutput.pcieNode, overlayOutput.intNode)
val (pcieSlaveTLNode: TLIdentityNode, pcieMasterTLNode: TLAsyncSinkNode) = (pcieNode.inward, pcieNode.outward)

val inParamsMMIOPeriph = topDesign match { case td: ChipTop =>
  td.lazySystem match { case lsys: CanHaveMasterTLMMIOPort =>
    lsys.mmioTLNode.edges.in(0)
  }
}
val inParamsControl = topDesign match { case td: ChipTop =>
  td.lazySystem match { case lsys: CanHaveMasterTLCtrlPort =>
    lsys.ctrlTLNode.edges.in(0)
  }
}
val pcieClient = TLClientNode(Seq(inParamsMMIOPeriph.master))
val pcieCtrlClient = TLClientNode(Seq(inParamsControl.master))
val connectorNode = TLIdentityNode()

// pcieSlaveTLNode should be driven for both the control slave and the axi bridge slave
connectorNode := pcieClient
connectorNode := pcieCtrlClient
pcieSlaveTLNode :=* connectorNode
Clock Groups... (unsolved)
pcieWrangler was my attempt at hooking up the axi_aclk. It's not correct: it just creates a second clock with the same 250 MHz frequency as the axi_aclk, so it mostly works, but using a second clock isn't right.
val sysClk2Node = dp(ClockInputOverlayKey).last.place(ClockInputDesignInput()).overlayOutput.node
val pciePLL = dp(PLLFactoryKey)()
pciePLL := sysClk2Node
val pcieClock = ClockSinkNode(freqMHz = 250) // Is this the reference clock?
val pcieWrangler = LazyModule(new ResetWrangler)
val pcieGroup = ClockGroup()
pcieClock := pcieWrangler.node := pcieGroup := pciePLL
Perhaps you can experiment and find out how we can hook up the axi_aclk as the driver for the axi async logic :)
I'd be happy to open a question about that since I don't know the answer yet myself.
To answer some follow up questions
How do I know how big of an address range I should reserve for PCIe (master and control)?
For control, you can match the size used in the overlay, 0x4000000. For the master port, you'd ideally just hook up a DMA engine that has access to the full address range on the host. Otherwise, you have to add AXI2PCIE BAR translation logic to access different regions of host memory, which isn't fun.
How do I connect the interrupt node to the system?
I believe this is only needed if you're connecting a PCIe root complex. If you're connecting a device, you shouldn't need to worry about the interrupt node. If you do wish to add it, you'll have to add NExtInterrupts(3) ++ to your config. I only got as far as that and my uncommented code in the TestHarness before I realized I didn't need it. If you feel you do, we can open a new question and try answering this more fully.

How to capture command line input from Vert.x

Env: Mac OS 12.1, JDK 17, Vert.x 4.2.4
Question: how can one capture command-line input from a verticle? Tried the following so far in the public void start(Promise<Void> startPromise) throws Exception method:
getVertx().createSharedWorkerExecutor("sys-in").executeBlocking(promise -> {
    try (final BufferedReader br = new BufferedReader(new InputStreamReader(System.in))) {
        String line;
        int count = 0;
        do {
            System.out.print("message to MC: ");
            line = br.readLine();
            count++;
            //doSth(line); // e.g. send line over multicast
        } while (count < 3);
    } catch (Throwable t) {
        // log.info("<start> ", t);
    } finally {
        // bye(); // send a final message and close vertx
        promise.complete();
    }
});
This will start, get 3 nulls from br, and exit. Also tried a separate ExecutorService, in vain. Couldn't find any help in the Vert.x docs either. Any hints are appreciated. Notes:
- I am aware of the warnings Vert.x gives about doing blocking work.
- Vert.x might not be meant to be used this way, but it would be cool if reading from the command line could be done with the same toolkit.
I understand what you are trying to accomplish, but the problem is that it goes against the fundamentals of the verticle concept. Waiting for user input is a potentially infinitely blocking operation, i.e. there is no guarantee the user will ever input the values. In that case you are left with a verticle that hangs forever, wasting resources and stuck in one spot. Multiply this if you are using worker verticles and you might have serious problems with the app. This issue is also emphasized here: https://vertx.io/docs/vertx-core/java/#blocking_code (under Warning).
In the link provided you can also find the suggested solution: a separate thread. A non-Vert.x thread won't mind being blocked, and when the user input is provided it can inform the Vert.x part of the application via the event bus that the input-dependent code can now be executed.
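The shape of that suggestion is language-agnostic, so here is a minimal sketch in Python for brevity; the queue stands in for the Vert.x event bus, and all names are made up:

import queue
import threading

events = queue.Queue()  # stands in for the event bus

def stdin_reader():
    # An ordinary thread outside the event loop may block on input() freely.
    while True:
        events.put(input("message to MC: "))

threading.Thread(target=stdin_reader, daemon=True).start()

# The event-driven side never blocks on stdin; it only consumes ready input.
for _ in range(3):
    print("received:", events.get())

In Vert.x, the same division of labour would be a plain Java thread blocking on the BufferedReader and then calling vertx.eventBus().send(...) with the line it read.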
This might not be the solution you had in mind since it's not pure Vert.x, but keep in mind that Vert.x is just another tool, and that tool is not a good fit for what you are trying to accomplish here. However, it pairs well with plain Java, which won't mind.

ZMQ socket - disconnect when all requests are served

I am trying to implement the ZMQ REQ/REP model in Java.
I have a Server role, running on port 5564, which acts as the replier:
ZMQ.Socket repSock = context.socket(ZMQ.REP);
I have a Client role, running on port 5563:
ZMQ.Socket syncclient = context.socket(ZMQ.REQ);
I have a proxy server in the middle, which passes requests and responses:
ZMQ.proxy(reqSocket, repSocket, null);
A good thing about having a proxy is that I can add multiple Servers:
repSocket.connect("tcp://" + addr.getHostAddress() + ":" + port);
This is working fine.
Now, when I remove a Server node from the proxy:
repSocket.disconnect("tcp://" + addr.getHostAddress() + ":" + port);
the Client gets stuck, as a request has been made and the REQ socket waits for a response.
So the process is stuck at syncclient.recvStr():
for (int request_nbr = 0; request_nbr < (request_nbr + 1); request_nbr++) {
    syncclient.send(str.getBytes(), 0);
    System.out.println("Send Dataaaa....... ");
    String data = syncclient.recvStr(Charset.defaultCharset());
    System.out.println(" here.. " + data);
    request_nbr++;
}
I searched and couldn't find a way to track the REQ socket.
I need either of two things:
1. A way to keep track of a Socket instance which I am about to disconnect, and to wait until all messages are processed, so that syncclient.recvStr() will not block.
2. A way to reset the syncclient socket, so that I can keep getting REQ/REP responses without interruption.
In real-world scenarios, avoid using the blocking mode of the ZeroMQ .send() / .recv() methods and use .poll() instead.
While this may require a few more SLOCs of code, the result leaves you in control, whereas a blocking call takes all control away from your code, and you cannot do much about it until (if at all) the next message gets delivered. That is a very poor design practice; except for the most simplistic schoolbook examples, such calls are actually anti-patterns for the real world.
So, do not expect Question 2 to become somehow magically solved; this is not part of the ZeroMQ API (for many loudly evangelized reasons). Better decide between .setsockopt( ZMQ.REQ_RELAXED, 1 ), if the API version and context of use permit, or do not use the trivial REQ/REP pattern at all, due to its known risk of falling into an unsalvageable mutual deadlock (ref. my other posts on this very subject, where this phenomenon was both illustrated and explained countless times).
In a similar manner, Question 1 seems reasonable only if you have never read the ZeroMQ specifications, documentation, and "Best Practices". Having spent some time on these, your options would be crystal clear: there are no such built-in tools. One can write an add-on, if in need of such non-core logic for one's own purposes. The only setting that indirectly influences the behaviour of aSocket.close() is .setsockopt( ZMQ.LINGER, 0 ), which may help prevent the system from transitioning into an effective hang-up state, where aSocket would otherwise wait infinitely for a state that will never happen while the message queue is still non-empty (messages still waiting to be delivered).
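A minimal sketch of the polling idea; the question's code is Java, but pyzmq is shown here for brevity and the JeroMQ API maps almost one-to-one. The endpoint and timeout are placeholders:

import zmq

ctx = zmq.Context()
syncclient = ctx.socket(zmq.REQ)
syncclient.setsockopt(zmq.LINGER, 0)       # don't hang in close() on undelivered messages
syncclient.setsockopt(zmq.REQ_RELAXED, 1)  # permit a fresh send() after a lost reply
syncclient.connect("tcp://127.0.0.1:5563")

poller = zmq.Poller()
poller.register(syncclient, zmq.POLLIN)

syncclient.send_string("a request")
# Instead of blocking forever in recvStr(), wait at most 1000 ms and stay in control:
if poller.poll(timeout=1000):
    print("reply:", syncclient.recv_string())
else:
    print("no reply -- the server is gone; recover here instead of deadlocking")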
Going into distributed-systems design is like entering a new world. No sequencing is guaranteed (non-serial code-execution paths happen). There are no means of local control over remote entities: their states, their failures, their presence, or even their actual ZeroMQ API version.
Indeed a challenging world to enter into.
N.b.:
You might already know that one can .connect() aSocket instance (better: an access point to aSocket instance) to more than one remote end without using the proxy. Some additional .setsockopt() tuning, setting ZMQ.IMMEDIATE to a value of 1, will help better manage the round-robin distribution policy, irrespective of the transport classes used for the actual message delivery ( { tcp:// | ipc:// | vmci:// | pgm:// | epgm:// | inproc:// } ). All that at your fingertips.
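For illustration, a pyzmq sketch of one socket fanning out to several servers without a proxy; the endpoints are placeholders:

import zmq

ctx = zmq.Context()
req = ctx.socket(zmq.REQ)
req.setsockopt(zmq.IMMEDIATE, 1)  # queue only to fully established connections
# One REQ socket, several servers; requests are round-robined among them:
req.connect("tcp://127.0.0.1:5563")
req.connect("tcp://127.0.0.1:5564")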

General pattern for failing over from one database to another using Entity Framework?

We have an enterprise DB that is replicated through many sites throughout the world. We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations.
We are using Entity Framework, C#, and SQL Server.
At first I hoped I could just specify a "Failover Partner" in the connection string, but that only works in a mirrored DB environment, which this is not. I also looked into writing a custom IDbExecutionStrategy, but these strategies only let you specify the pattern for retrying a failed DB operation; they do not allow you to change the operation in any way, such as directing it to a new connection.
So, do you know of any good pattern for dealing with this type of operation, other than duplicating retry logic around each of our many DB operations?
Update on 2014-05-14:
I'll elaborate in response to some of the suggestions already made.
I have many places where the code looks like this:
try
{
    using (var db = new MyDBContext(ConnectionString))
    {
        // Database operations here.
        // var myList = db.MyTable.Select(...), etc.
    }
}
catch (Exception ex)
{
    // Log exception here, perhaps rethrow.
}
It was suggested that I have a routine that first checks each of the connection strings and returns the first one that successfully connects. This is reasonable as far as it goes, but some of the errors I'm seeing are timeouts on the operations, where the connection works but the DB has issues that keep it from completing the operation.
What I'm looking for is a pattern I can use to encapsulate the unit of work and say, "Try this on the first database. If it fails for any reason, rollback and try it on the second DB. If that fails, try it on the third, etc. until the operation succeeds or you have no more DBs." I'm pretty sure I can roll my own (and I'll post the result if I do), but I was hoping there might be a known way to approach this.
How about using a Dependency Injection system like Autofac and registering a factory for new context objects: it would execute logic that tries to connect first to the local DB and, in case of failure, connects to the enterprise DB, then returns a ready DbContext object. This factory would be provided, via the Dependency Injection system, to all objects that require it; they would use it to create contexts and dispose of them when they are no longer needed.
" We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations."
If your app is strictly read-only on the DB and data consistency is not absolutely vital to your app/users, then it's just a matter of trying to CONNECT until an operational site has been found, as M.Ali suggested in his remark.
Otherwise, I suggest you stop thinking along these lines immediately, because you're just running 90 mph down a dead-end street, as Viktor Zychla suggested in his remark.
Here is what I ended up implementing, in broad brush-strokes:
Define delegates called UnitOfWorkMethod that will execute a single unit of work on the database, in a single transaction. Each takes a connection string, and one variant also returns a value:
delegate T UnitOfWorkMethod<out T>(string connectionString);
delegate void UnitOfWorkMethod(string connectionString);
Define a method called ExecuteUOW that takes a unit-of-work method and tries to execute it using the preferred connection string. If that fails, it tries to execute it with the next connection string:
protected T ExecuteUOW<T>(UnitOfWorkMethod<T> method)
{
    // GET THE LIST OF CONNECTION STRINGS
    IEnumerable<string> connectionStringList = ConnectionStringProvider.GetConnectionStringList();

    // WHILE THERE ARE STILL DATABASES TO TRY, AND WE HAVEN'T DEFINITIVELY SUCCEEDED OR FAILED
    var uowState = UOWStateEnum.InProcess;
    IEnumerator<string> stringIterator = connectionStringList.GetEnumerator();
    T returnVal = default(T);
    Exception lastException = null;
    string connectionString = null;
    while ((uowState == UOWStateEnum.InProcess) && stringIterator.MoveNext())
    {
        try
        {
            // TRY TO EXECUTE THE UNIT OF WORK AGAINST THE DB.
            connectionString = stringIterator.Current;
            returnVal = method(connectionString);
            uowState = UOWStateEnum.Success;
        }
        catch (Exception ex)
        {
            lastException = ex;
            // IF IT FAILED BECAUSE OF A TRANSIENT EXCEPTION,
            if (TransientChecker.IsTransient(ex))
            {
                // LOG THE EXCEPTION AND TRY AGAINST ANOTHER DB.
                Log.TransientDBException(ex, connectionString);
            }
            // ELSE
            else
            {
                // CONSIDER THE UOW FAILED.
                uowState = UOWStateEnum.Failed;
            }
        }
    }

    // LOG THE FAILURE IF WE HAVE NOT SUCCEEDED.
    if (uowState != UOWStateEnum.Success)
    {
        Log.ExceptionDuringDataAccess(lastException);
        returnVal = default(T);
    }
    return returnVal;
}
Finally, for each operation we define our unit-of-work delegate method. Here is an example:
UnitOfWorkMethod uowMethod =
    (providerConnectionString =>
    {
        using (var db = new MyContext(providerConnectionString))
        {
            // Do my DB commands here. They will roll back if an exception is thrown.
        }
    });

ExecuteUOW(uowMethod);
When ExecuteUOW is called, it tries the delegate on each database until it either succeeds or fails on all of them.
I'm going to accept this answer since it fully addresses all of the concerns raised in the original question. However, if anyone provides an answer that is more elegant, more understandable, or corrects flaws in this one, I'll happily accept it instead.
Thanks to all who have responded.

Python-Multithreading Time Sensitive Task

from random import randrange
from time import sleep
#import thread
from threading import Thread
from Queue import Queue

'''The idea is that there is a Seeker method that would search a location
for tasks. I have no idea how many tasks there will be; could be 1, could be 100.
Each task needs to be put into a thread, do its thing and finish. I have
stripped down a lot of what this is really supposed to do, just to focus on the
correct queuing and threading aspect of the program. The locking was just
me experimenting with locking.'''

class Runner(Thread):
    current_queue_size = 0

    def __init__(self, queue):
        self.queue = queue
        data = queue.get()
        self.ID = data[0]
        self.timer = data[1]
        #self.lock = data[2]
        Runner.current_queue_size += 1
        Thread.__init__(self)

    def run(self):
        #self.lock.acquire()
        print "running {ID}, will run for: {t} seconds.".format(ID = self.ID,
                                                                t = self.timer)
        print "Queue size: {s}".format(s = Runner.current_queue_size)
        sleep(self.timer)
        Runner.current_queue_size -= 1
        print "{ID} done, terminating, ran for {t}".format(ID = self.ID,
                                                           t = self.timer)
        print "Queue size: {s}".format(s = Runner.current_queue_size)
        #self.lock.release()
        sleep(1)
        self.queue.task_done()

def seeker():
    '''Gathers data that would need to enter its own thread.
    For now it just uses a count and random numbers to assign
    both a task ID and a time for each task.'''
    queue = Queue()
    queue_item = {}
    count = 1
    #lock = thread.allocate_lock()
    while (count <= 40):
        random_number = randrange(1, 350)
        queue_item[count] = random_number
        print "{count} dict ID {key}: value {val}".format(count = count, key = random_number,
                                                          val = random_number)
        count += 1
    for n in queue_item:
        #queue.put((n, queue_item[n], lock))
        queue.put((n, queue_item[n]))
        '''I assume it is OK to put a tuple in and pull it out later'''
        worker = Runner(queue)
        worker.setDaemon(True)
        worker.start()
    worker.join()
    '''Which one of these is necessary and why? The queue object
    joining or the thread object?'''
    #queue.join()

if __name__ == '__main__':
    seeker()
I have put most of my questions in the code itself, but to go over the main points (Python 2.7):
- I want to make sure I am not creating some massive memory leak for myself later.
- I have noticed that when I run it with a count of 40 in PuTTY or VNC on my Linux box, I don't always get all of the output, but when I use IDLE and Aptana on Windows, I do.
- Yes, I understand that the point of a Queue is to stagger out your threads so you are not flooding your system's memory, but the tasks at hand are time-sensitive, so they need to be processed as soon as they are detected, regardless of how many or how few there are; I have found that with a Queue I can clearly dictate when a task has finished, as opposed to letting the garbage collector guess.
- I still don't know why I am able to get away with using the .join() on either the thread or the queue object.
- Tips, tricks, general help.
Thanks for reading.
Thanks for reading.
If I understand you correctly, you need a thread to monitor something to see if there are tasks that need to be done, and if a task is found you want it to run in parallel with the seeker and other currently running tasks.
If this is the case, then I think you might be going about this wrong. Take a look at how the GIL works in Python; what you might really want here is multiprocessing.
Take a look at this from the pydocs:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
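To make that suggestion concrete, here is a minimal sketch of the seeker/Runner idea rebuilt on multiprocessing; the task count and durations are toy values mirroring the question's randrange setup:

from multiprocessing import Pool
from random import randrange
from time import sleep

def run_task(args):
    task_id, timer = args
    print("running %s, will run for %s seconds" % (task_id, timer))
    sleep(timer)
    return task_id

if __name__ == '__main__':
    # Worker processes sidestep the GIL, so CPU-bound tasks truly run in
    # parallel; each task is dispatched as soon as a worker is free.
    tasks = [(n, randrange(1, 5)) for n in range(1, 41)]
    pool = Pool(processes=4)
    for done in pool.imap_unordered(run_task, tasks):
        print("%s done, terminating" % done)
    pool.close()
    pool.join()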