Manually check requests on a port in kdb

From what I understand, the main q thread monitors its socket descriptors for requests and responds to them.
I want to use a while loop in my main thread that will go on for an indefinite period of time. This would mean that I will not be able to use hopen on the process's port and perform queries.
Is there any way to manually check for requests within the while loop?
Thanks.

Are you sure you need to use a while loop? Is there any chance you could, for instance, instead use the timer functionality of KDB+?
This would let you run a piece of code periodically instead of looping over it continually. Depending on your use case, this may be more appropriate, since it allows you to repeatedly run a piece of code (e.g. something that polls periodically) without occupying the main thread constantly.
KDB+ is by default single-threaded, which makes it tricky to do what you want to do. There might be something you can do with slave threads.
If you're interested in using timer functionality, but the built-in timer is too limited for your needs, there is a more advanced set of timer functionality available free from AquaQ Analytics (disclaimer: I work for AquaQ). It is distributed as part of the TorQ KDB framework; the specific script you'd be interested in is timer.q, which is documented here. You may be able to use this code without the full TorQ if you like, though you may need some of the other "common" code from TorQ to provide functions used within timer.q.

Related

In drools is there a way to detect endless loops and halt a session programmatically?

In short, my questions are:
Is there anything built into Drools that allows/facilitates detection of endless loops?
Is there a way to programmatically halt sessions (e.g. for the case of a detected endless loop)?
More details:
I'm planning to have Drools (6.2 or higher) run within a server/platform where users will create and execute their own rules. One of the issues I'm facing is that careless or faulty rule design can easily result in endless loops, whether it's just a forgotten "no-loop true" attribute or a more complex cycle where rule1 triggers rule2, which triggers rule3, which (re)triggers rule1.
If this happens, Drools basically grinds my server/platform to a halt.
I'm currently looking into how to detect and/or terminate sessions that run in an endless loop.
Now, since a (seemingly) endless loop is not per se invalid, and in certain cases may even be desired, I can imagine there is not much of a built-in detection mechanism for this case (if any). But as I am not an expert, I'd be happy to know whether there is anything built in to detect endless loops.
In my use case I would be OK with classifying a session as "endlessly looped" based on a threshold for how often any rule has been activated.
As I understand it, I could perhaps use AgendaEventListeners that keep track of how often each rule has been fired and, if a threshold is met, either insert a control fact or somehow trigger a rule that calls drools.halt() for this session.
I wonder (and couldn't find a lot of details) if it is possible to programmatically halt/terminate sessions.
I've only come across a fireUntilHalt() method, but that didn't seem like the way to go (or I didn't really understand it).
Also, at this point I was only planning to use stateless sessions (but if it's well encapsulated I could also work with stateful sessions if that makes my goal easier to achieve).
Any answers/ideas/feedback to my initial approach is highly welcome :)
Thanks!
A fundamental breaking point of any RBS implementation is created wherever the design lets "users create and design their own rules". I don't know why marketing hype keeps opening the door for non-programmers to write what is, in effect, program code, without any safeguards.
Detecting whether a session halts is theoretically impossible. Google "Halting problem".
For certain contexts you might come up with an upper limit on the number of rules that may be executed, or something similar, and you can use listeners to count rule firings and raise an exception when the limit is exceeded.
Basically you have very bad cards once you agree to execute untested code written by amateurs.

Celery - running a set of tasks with complex dependencies

In the application I'm working on, a user can perform a "transition" which consists of "steps". A step can have an arbitrary number of dependencies on other steps. I'd like to be able to call a transition and have the steps execute in parallel as separate Celery tasks.
Ideally, I'd like something along the lines of celery-tasktree, except for directed acyclic graphs in general, rather than only trees, but it doesn't appear that such a library exists as yet.
The first solution that comes to mind is a parallel adaptation of a standard topological sort - rather than determining a linear ordering of steps which satisfy the dependency relation, we determine the entire set of steps that can be executed in parallel at the beginning, followed by the entire set of steps that can be executed in round 2, and so on.
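For illustration, here is a minimal sketch of that round-by-round grouping in plain Python (the step names are invented; the real dependency data would come from the application):

```python
# Level-by-level variant of Kahn's algorithm: each "round" contains every step
# whose dependencies were all satisfied by previous rounds.
def parallel_rounds(deps):
    # deps maps each step to the set of steps it depends on.
    remaining = {step: set(d) for step, d in deps.items()}
    rounds = []
    while remaining:
        ready = {s for s, d in remaining.items() if not d}
        if not ready:
            raise ValueError("dependency cycle detected")
        rounds.append(sorted(ready))
        remaining = {s: d - ready for s, d in remaining.items() if s not in ready}
    return rounds

# b and c both depend on a; d depends on b and c.
print(parallel_rounds({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}))
# -> [['a'], ['b', 'c'], ['d']]
```

Each round could then be submitted as a celery.group, with the next round dispatched only once the previous group has finished.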
However, this is not optimal when tasks take a variable amount of time and workers have to idle waiting for a longer running task while there are tasks that are now ready to run. (For my specific application, this solution is probably fine for now, but I'd still like to figure out how to optimise this.)
As noted in https://cs.stackexchange.com/questions/2524/getting-parallel-items-in-dependency-resolution, a better way is to operate directly off the DAG: after each task finishes, check whether any of its dependent tasks are now able to run, and if so, run them.
What would be the best way to go about implementing something like this? It's not clear to me that there's an easy way to do this.
From what I can tell, Celery's group/chain/chord primitives aren't flexible enough to allow me to express a full DAG - though I might be wrong here?
I'm thinking I could create a wrapper for tasks which notifies dependent tasks once the current task finishes - I'm not sure what the best way to handle such a notification would be though. Accessing the application's Django database isn't particularly neat, and would make it hard to spin this out into a generic library, but Celery itself doesn't provide obvious mechanisms for this.
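One way such a wrapper could look, sketched here with Redis as the shared store instead of the Django database (the run_step task, the deps:<step> keys and the example DAG are all invented for illustration):

```python
import redis
from celery import Celery

app = Celery("transitions", broker="redis://localhost:6379/0")
store = redis.Redis()

# Each step maps to the steps that depend on it (its children in the DAG).
DAG = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

# Number of unfinished dependencies per step, derived from the DAG above.
INDEGREE = {step: 0 for step in DAG}
for children in DAG.values():
    for child in children:
        INDEGREE[child] += 1

@app.task
def run_step(name):
    print(f"running step {name}")            # stand-in for the real step logic
    for child in DAG[name]:
        # Atomic decrement: whichever parent brings the counter to zero is the
        # one (and only one) worker that dispatches the child task.
        if store.decr(f"deps:{child}") == 0:
            run_step.delay(child)

def start_transition():
    for step, count in INDEGREE.items():
        store.set(f"deps:{step}", count)
    for step, count in INDEGREE.items():
        if count == 0:
            run_step.delay(step)              # kick off the roots of the DAG
```

Because DECR in Redis is atomic, exactly one finishing parent dispatches each child, so no step runs twice and no polling loop is needed; any other shared store with an atomic counter would work just as well.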
I also faced this problem, but I couldn't really find a better solution or library, except for one. For anyone still interested, you can check out
https://github.com/selinon/selinon. Although it's only for Python 3, it seems to be the only thing that does exactly what you want.
Airflow is another option, but Airflow is aimed at a more static environment, just like other DAG libraries.

Make a step in a SAS macro time out after a set interval

I'm on SAS 9.1.3 (on a server) and have a macro looping over an array to feed a computationally intensive set of modelling steps which are appended out to a table. I'm wondering if it is possible to set a maximum time to run for each element of the array. This is so that any element which takes longer than 3 minutes to run is skipped and the next item fed in.
Say for example I'm using a proc nlin with a by statement to build separate models per class on a large data set, and one class is failing to converge; how do I skip over that class?
Bit of a niche requirement, hope someone can assist!
The only approach I can think of here would be to rewrite your code so that it runs each by group separately from the rest, in one or more SAS/CONNECT sessions, have the parent session kill each one after a set timeout, and then recombine the surviving output.
As Dom and Joe have pointed out, this is not a trivial task, but it's possible if you're sufficiently keen on learning about that aspect of SAS. A good place to get started for this sort of thing would be this page:
http://support.sas.com/rnd/scalability/tricks/connect.html
I was able to use the examples there and elsewhere as the basis of a simple parallel processing framework (in SAS 9.1.3, coincidentally!), but there are many details you will need to consider. To give you an idea of the sorts of adventures in store if you go down this route:
Learning how to sign on to your server via SAS/CONNECT within whatever infrastructure you're using (will the usual autoexec file work? What invocation options do you need to use?)
Explaining to your sysadmin/colleagues why you need to run multiple processes in parallel
Managing asynchronous sessions
Syncing macro variables, macro definitions, libraries and formats between sessions
Obscure bugs (I wasn't able to use the usual option for syncing libraries and had to roll my own via call execute...)
One could write a (lengthy) SUGI paper on this topic, and I'm sure there are plenty of them out there if you look around.
In general, SAS runs in a linear manner, so you cannot write a step to monitor another step in the same program. What you could do is run your code in a SAS/CONNECT session and monitor it with the process that started the session. That's not trivial, and the how-to is beyond the scope of Stack Overflow.
For a data step, use the datetime() function to get the current system date and time. This is measured in seconds. You can check the time inside your data step. Stop a data step with the stop; statement.
Now you specifically asked about breaking a specific step inside a PROC. That must be implemented in the PROC by the SAS developer. If it is possible, it will be documented in the procedure's documentation. View SAS documentation at http://support.sas.com/documentation/.
For PROC NLIN, I do not think there is a "break after X" parameter. You can use the trace parameters to track model execution and see where it is hanging up. You can then work on changing the convergence parameters to attempt to speed up slow, badly converging models.

Importance of knowing if a standard library function is executing a system call

Is it actually important for a programmer to know if the standard library function he/she is using is actually executing a system call? If so, why?
Intuitively, I'm guessing the only thing that matters is knowing whether the standard function in question is a library function or a system call itself. Beyond that, I'm guessing there isn't much need to know whether a library function internally uses a system call?
It is not always possible to know (for sure) whether a library function wraps a system call. But in one way or another, this knowledge can help improve the portability and/or efficiency of your program. In at least the following two cases, knowing the syscall-level behaviour of your program is helpful.
When your program is time critical. Some system calls are expensive, and the library functions that wrap them are even more expensive. Thus time-critical tasks may need to switch to equivalent functions that do not enter kernel space at all.
It is also worth noting the vsyscall (or vDSO) mechanism of Linux, which accelerates some system calls (e.g. gettimeofday) by mapping their implementations into user-space memory. See this for more details.
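A rough way to see the difference from user space (Python on Linux; the absolute numbers are entirely machine-dependent): time.time() normally resolves through the vDSO without entering the kernel, while each os.read() is a genuine read(2) system call.

```python
import os
import time
import timeit

fd = os.open("/dev/urandom", os.O_RDONLY)
print("time.time()   :", timeit.timeit(time.time, number=1_000_000))      # usually vDSO, no kernel entry
print("os.read(fd, 1):", timeit.timeit(lambda: os.read(fd, 1), number=1_000_000))  # real syscall each time
os.close(fd)
```

On a typical Linux box the second line comes out noticeably slower per call; that gap is the overhead a time-critical loop would want to avoid.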
When your program needs to be deployed to a restricted environment with system-call auditing. For your program to survive such environments, it could be necessary to profile it for any potential policy violations, a task that is less daunting if you were already aware of the restrictions when you wrote the program.
Sometimes it might be important, and sometimes it isn't; I don't think there's any universal answer to this question. Reasons I can think of that it might matter in some contexts: the system call may require permissions the user doesn't have; in performance-critical code a system call might be too heavyweight; in a signal handler only async-signal-safe functions may be called; and the function might consume some system resource (e.g. reading from /dev/random for every random number could drain the entropy pool, and you'd want to know whether that happens every time you call rand()).

How can I poll web requests without blocking?

I have two web requests which I need to poll to find out when they return. Ideally I don't want to keep testing them in a tight loop. I would like to free up the CPU so other processes can execute.
I'm currently using Perl's Time::HiRes::sleep(0.100) function to release the CPU before testing whether or not the web requests have returned.
During testing under load I can see that the sleep duration 'stretches'. Ideally I want to make sure that the sleep duration is adhered to while the CPU is still freed up. Should I be calling a different function to achieve this?
I'm coding Perl on Linux 2.6.
Rather than polling, see if you can't get file-descriptors and do a select call.
Then you'll get control back as soon as anything happens, without occupying the CPU at all.
Somewhere in the web-request will be some sockets, and attached to the sockets will be file-descriptors that you can use in select.
In any case your program can be interrupted at any point for any amount of time; if this is a real problem you need a real-time operating system, but since you're dealing with web-requests I doubt you need that level of responsiveness.
In fact what you want is a high level interface that does the select call for you. As suggested in the comments: http://search.cpan.org/dist/HTTP-Async/ looks like it'll do precisely what you need.
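To make the select idea concrete, here is a minimal sketch (in Python rather than Perl, purely to show the shape of the approach; HTTP::Async and friends take care of all of this for you): the process blocks in the kernel until one of the sockets has data, instead of sleeping and re-checking.

```python
import select
import socket

def start_request(host, path="/"):
    # Open a plain HTTP connection and send the request without waiting for a reply.
    s = socket.create_connection((host, 80))
    s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
    return s

pending = [start_request("example.com"), start_request("example.org")]
while pending:
    # select sleeps inside the kernel and returns as soon as any socket is readable.
    readable, _, _ = select.select(pending, [], [])
    for s in readable:
        data = s.recv(4096)
        if data:
            print(f"{s.getpeername()[0]}: {len(data)} bytes")
        else:                                # empty read: the server closed the connection
            s.close()
            pending.remove(s)
```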
It sounds like you really want an event loop. There are POE, EV, and abstraction layers over both. Either way, don't implement this yourself. This wheel has already been invented.
I don't think sleep duration can be guaranteed on regular Linux. That's pretty much the point of a "Real Time" operating system, and regular Linux is not "Real Time."
I agree with Douglas Leeder: use a select call to have the kernel notify you when something changes. You can also emulate sub-second sleeps with a select call, but Time::HiRes is a cleaner interface (and you're still not going to avoid the wait stretching).