Drools Rule execution delays the first time after updating rules

In Drools v7.32.0.Final, the first time a KieSession of a KieBase fires the rules after the program starts, or after a new jar is deployed, rule execution takes approximately 5 times longer than the average time later on.
I have seen posts referring to v6.x with the same problem, suggesting a dummy execution of the rules because the problem is indexing. I have done dummy executions, passing into Drools all the object types I currently use with empty fields, and the only improvement I got was going from 5 times slower to 4 times slower.
Can anyone suggest a solution? If the problem is indexing as mentioned, is there a way to force KieBases to index before executing any rules?
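
For reference, a minimal sketch of the dummy-execution warm-up described above. This is only an assumption about how such a warm-up could be wired up, not a confirmed fix; Order and Customer are hypothetical placeholders for your own fact classes.

    // Hypothetical warm-up: fire the rules once against empty dummy facts in a
    // throwaway session right after the KieContainer is (re)built, so first-fire
    // initialization costs are paid before real requests arrive.
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    public class KieBaseWarmUp {
        // Placeholders for your real fact classes.
        public static class Order {}
        public static class Customer {}

        public static void warmUp(KieContainer container) {
            KieSession session = container.newKieSession();
            try {
                // One empty instance of every fact type the rules match on.
                session.insert(new Order());
                session.insert(new Customer());
                session.fireAllRules();
            } finally {
                session.dispose(); // nothing from the warm-up leaks into real sessions
            }
        }
    }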

Related

Benchmarking Redshift Queries

I want to know how long my queries take to execute, so that I can see whether my changes improve the runtime or not.
Simply timing the execution of the whole query is unsuitable, since that also includes the (highly variable) time spent waiting in an execution queue.
Redshift provides the STL_WLM_QUERY table that contains separate columns for queue wait time and execution time. However, my queries do not reliably show up in this table. For example if I execute the same query multiple times the number of corresponding rows in STL_WLM_QUERY is often much smaller than the number of repetitions. Sometimes, but not always, only one row is generated no matter how often I run the query. I suspect some caching is going on.
Is there a better way to find the actual execution time of a Redshift query, or can someone at least explain under what circumstances exactly a row in STL_WLM_QUERY is generated?
My tips:
- If possible, ensure that your query has not waited at all (if it has waited, there should be a row in stl_wlm_query). If it did wait, rerun it.
- Run the query once to compile it, then a second time to benchmark it; compile time can be significant.
- Disable the new query result caching feature, if your cluster has it yet (you probably don't): https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-redshift-introduces-result-caching-for-sub-second-response-for-repeat-queries/
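
As a concrete way to read those timings, here is a sketch assuming a JDBC connection to the cluster. The endpoint, credentials and my_big_table are placeholders, and note that the stl_wlm_query row may appear only after a short delay, if at all.

    // Sketch: run the query under test, then look up its queue and execution
    // times in stl_wlm_query (both columns are in microseconds).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RedshiftTiming {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://my-cluster:5439/mydb"; // placeholder endpoint
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement st = conn.createStatement()) {
                // Placeholder for the query you want to benchmark; run it twice,
                // since the first run pays the compile cost.
                st.execute("SELECT COUNT(*) FROM my_big_table");

                ResultSet id = st.executeQuery("SELECT PG_LAST_QUERY_ID()");
                id.next();
                long queryId = id.getLong(1);

                ResultSet rs = st.executeQuery(
                    "SELECT total_queue_time, total_exec_time "
                    + "FROM stl_wlm_query WHERE query = " + queryId);
                if (rs.next()) {
                    System.out.printf("queued %.3f s, executed %.3f s%n",
                        rs.getLong(1) / 1e6, rs.getLong(2) / 1e6);
                } else {
                    System.out.println("no stl_wlm_query row (yet) for query " + queryId);
                }
            }
        }
    }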

JBPM 6 Performance concern

I am inserting numerous facts into JBPM for rule matching, and once the rules are matched I perform the appropriate actions. The number of facts is large; with 20k facts in memory, JBPM takes the following times:
Start Process - 3-4 seconds
Insert Fact - 4+ seconds
FireAllRules - 3-4 seconds.
Can someone please help me understand what could cause these delays?
Starting the process may include rule compilation, which depends on the number of rules or lines of code in the DRL file(s).
The 4+ seconds are due to the evaluation of all rule conditions for all inserted facts. Of course, this depends not only on the number of facts, but also on the complexity of the left-hand-side code (conditions). Whether 4+ seconds is adequate cannot be said without inspecting the rules.
You can easily track what firing the rules means in terms of right-hand-side code (consequences). You have coded this, and you can monitor it very easily using an event listener, as sketched below. If this is too slow, check what that code does, and how.
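
A minimal sketch of such a listener, using the KIE event API:

    // Times each rule's consequence by pairing beforeMatchFired/afterMatchFired.
    import org.kie.api.event.rule.AfterMatchFiredEvent;
    import org.kie.api.event.rule.BeforeMatchFiredEvent;
    import org.kie.api.event.rule.DefaultAgendaEventListener;

    public class RuleTimingListener extends DefaultAgendaEventListener {
        private long start;

        @Override
        public void beforeMatchFired(BeforeMatchFiredEvent event) {
            start = System.nanoTime();
        }

        @Override
        public void afterMatchFired(AfterMatchFiredEvent event) {
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(event.getMatch().getRule().getName()
                    + " consequence took " + micros + " us");
        }
    }

Register it with session.addEventListener(new RuleTimingListener()) before calling fireAllRules(); firings within a single fireAllRules() call run sequentially, so a single start field is enough.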

Drools is very slow when we integrate with Talend ETL and process millions of records

We have used around 30 rules with multiple conditions each. We are under the assumption that Drools takes one record, compares it against the other records, and then gives the output for each one. The time taken for processing 1 million records is around 4 hours. Can't we process the records in batches, i.e. in large chunks, to reduce the processing time? Please help me with this issue. Thanks for the response.
Inserting 1M facts in one batch is a very bad strategy (unless you need to find combinations out of the lot). The documentation makes it clear that all work (at least in 5.x) is done during inserts and modifications. (6.x is reportedly different, but it's still bad practice to needlessly fill your memory up with objects galore.)
Simply insert, and after some suitable number, call fireAllRules() and process (transmit,...) the results. Make sure that no "dead stock" remains in Working Memory from such a batch - this would also slow you down.
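
A sketch of that batching loop, under the assumption that each record can be evaluated independently of the others, and using the 6.x API (where retract() became delete()); CHUNK_SIZE and the result handling are placeholders to tune for your case.

    // Insert a chunk of facts, fire the rules, then delete the handles so no
    // "dead stock" accumulates in working memory between batches.
    import java.util.ArrayList;
    import java.util.List;
    import org.kie.api.runtime.KieSession;
    import org.kie.api.runtime.rule.FactHandle;

    public class BatchedFiring {
        private static final int CHUNK_SIZE = 10_000; // tune empirically

        public static void process(KieSession session, Iterable<Object> records) {
            List<FactHandle> handles = new ArrayList<>(CHUNK_SIZE);
            for (Object record : records) {
                handles.add(session.insert(record));
                if (handles.size() == CHUNK_SIZE) {
                    fireAndClear(session, handles);
                }
            }
            fireAndClear(session, handles); // last partial chunk
        }

        private static void fireAndClear(KieSession session, List<FactHandle> handles) {
            session.fireAllRules(); // process/transmit results here
            for (FactHandle handle : handles) {
                session.delete(handle);
            }
            handles.clear();
        }
    }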

Drools is very slow in processing big data

We have integrated Drools with Talend ETL. Drools takes a lot of time to process record counts of half a million or more. How can we increase the processing speed of Drools? I am familiar with Drools coding, but I am not aware of how Drools works internally. Please help me with this issue; I would be really grateful. I am not sure whether I have given the right tags, i.e. whether they have the right answer, but please do help me with this as it is needed.
The typical problems involve:
- Not using == constraints, which allow for indexing. Make sure you have the field on the left and the variable on the right (see the sketch below).
- Not having your most restrictive patterns and constraints first.
- Not ensuring your rules are written to avoid large cross products.
- Use of multiple accumulates per rule, or subnetworks.
The last issue is improved in Drools 6.0.
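
To illustrate the first point, here is a self-contained sketch using the KieHelper utility (an internal convenience API); the demo package, the Person type and the rule are made up for the example:

    // "name == $n" puts the field on the left and the bound variable on the
    // right, which lets Drools hash-index the join; writing "$n == name" instead
    // would defeat that indexing.
    import org.kie.api.KieBase;
    import org.kie.api.definition.type.FactType;
    import org.kie.api.io.ResourceType;
    import org.kie.api.runtime.KieSession;
    import org.kie.internal.utils.KieHelper;

    public class IndexedConstraintExample {
        public static void main(String[] args) throws Exception {
            String drl =
                "package demo\n"
                + "declare Person\n"
                + "    name : String\n"
                + "end\n"
                + "rule \"same name\"\n"
                + "when\n"
                + "    $p1 : Person( $n : name )\n"
                + "    $p2 : Person( name == $n, this != $p1 )\n"
                + "then\n"
                + "    System.out.println(\"pair with name \" + $n);\n"
                + "end\n";

            KieBase kieBase = new KieHelper().addContent(drl, ResourceType.DRL).build();
            KieSession session = kieBase.newKieSession();

            FactType person = kieBase.getFactType("demo", "Person");
            for (String name : new String[] { "ann", "bob", "ann" }) {
                Object fact = person.newInstance();
                person.set(fact, "name", name);
                session.insert(fact);
            }
            session.fireAllRules(); // prints the "ann" pair (twice, once per order)
            session.dispose();
        }
    }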

Sphinx UPDATE performance

Sphinx 2.0.1 brings with it the ability to call UPDATE and update an individual item in an index.
Does anyone know what kind of performance this brings to Sphinx when UPDATE is called very frequently (as often as several hundred times a second)? The reason for this would be to keep a real-time index of trending item scores, which get updated every time a user performs an action. Obviously, when there are lots of users this value can be updated quite frequently.
EDIT:
I should mention that I am not using SphinxSE.
You are talking about Sphinx RT indices... Updates are fast, but remember that this type of index does not support enable_star, which means you can't perform searches like appl*.
Attributes are stored in memory, so updates should be really fast.
But I've never benchmarked it, so try benchmarking it!
... although, to be honest, I would still be tempted to batch-process it: write the actions to a log "file", then process that log in batches, maybe every 10 seconds. All actions on the same record can be run as one update statement.
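
A sketch of that batching idea, assuming a MySQL JDBC driver pointed at searchd's SphinxQL port (9306 by default); the trends index, the score attribute and the flush interval are placeholders, and the application is assumed to keep the authoritative counters, since a plain attribute UPDATE sets an absolute value.

    // Buffer the latest score per item in memory and flush one UPDATE per item
    // every 10 seconds instead of issuing hundreds of UPDATEs per second.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class TrendingScoreBatcher {
        private final Map<Long, Long> pending = new ConcurrentHashMap<>();

        // Called on every user action; cheap, no network I/O.
        public void recordScore(long itemId, long newScore) {
            pending.put(itemId, newScore); // last write in a window wins
        }

        public void start() {
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
        }

        private void flush() {
            if (pending.isEmpty()) return;
            try (Connection conn =
                     DriverManager.getConnection("jdbc:mysql://localhost:9306", "", "");
                 Statement st = conn.createStatement()) {
                for (Long id : pending.keySet()) {
                    Long score = pending.remove(id);
                    if (score == null) continue;
                    st.executeUpdate("UPDATE trends SET score = " + score
                            + " WHERE id = " + id);
                }
            } catch (Exception ex) {
                ex.printStackTrace(); // real code would retry or log properly
            }
        }
    }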