Stateful session memory management in Drools

Stateful session memory management in Drools - drools

Just have a general question regarding memory management when using "Stateful sessions" in Drools. For context, I'm specifically looking to use a ksession in "Stream" mode, together with fireUntilHalt() to process an infinite stream of events. Each event is timestamped, however I'm mainly writing rules using length-based windows notation (i.e. window:length()) and the from accumulate syntax for decision making.
The docs are a little vague though about how memory management works in this case. The docs suggest that using temporal operators the engine can automatically remove any facts/events that can no longer match. However, would this also apply to rules that only use the window:length()? Or would my system need to manually delete events that are no longer applicable, in order to prevent running OOM?

window:time() calculates expiration, so works for automatic removal. However, window:length() doesn't calculate expiration, so events would be retained.
You can confirm the behaviour with my example:
https://github.com/tkobayas/kiegroup-examples/tree/master/Ex-cep-window-length-8.32
FYI)
https://github.com/kiegroup/drools/blob/8.32.0.Final/drools-core/src/main/java/org/drools/core/rule/SlidingLengthWindow.java#L145
You would need to explicitly delete them or set #expires (if it's possible for your application to specify expiration with time) to avoid OOME.
Thank you for pointing that the document is not clear about it. I have filed a doc JIRA to explain it.
https://issues.redhat.com/browse/DROOLS-7282

Related

What is the difference between reg_defaults and reg_defaults_raw in regmap facility?

When configuring regmap, it is possible to include a list of power on defaults for the registers. As I understand it, the purpose is to pre-populate the cache in order to avoid an initial read after power on or after waking up. I'm confused by the fact that there is both a reg_defaults and a reg_defaults_raw field. Only one or the other is ever used. The vast majority of drivers use reg_defaults however there's a small handful that use reg_defaults_raw. I did look through the git history and found the commit that introduced reg_defaults and the later commit that introduced reg_defaults_raw. Unfortunately I wasn't able to divine the reason for that new field.
Does anyone know the difference between those fields?

Drools Engine Inferred Expiration

I'm working on a project which needs to process a large number of events per second. The project uses Drools 6.5 running in stream mode. The data is fed to the engine as "event" objects.
Due to the large number of events that need to be processed, automatic memory management provided by Drools simplifies the development process significantly. However, drools has a somewhat vague documentation in this category. I need to count the number of events with certain conditions in the past T seconds and fire a rule if the number surpasses a threshold. I currently use sliding windows to achieve this. The problem is, Drools discards events before T seconds passes from their insertion (using #expires value) or does not discard them at all (if #expires tag is removed); thus either making inference impossible, or causing a heap memory overflow in the long run.
Is there a better approach to the problem? Can anyone clarify how inferred expiration works? Am I doing something wrong?
Any help would be greatly appreciated.

After a few hours of exploring the documentation for Drools 6.5, I finally found out what was happening. I will leave the information here to help anyone else who might have the same problem.
Important
An explicit expiration policy for a given event type overrides any inferred expiration offset for that same type.
As the documentation says(9.8.1), explicit #expires tag overrides any inferred expiration offset, so in order to let the engine handle the events' life cycle, do not use this tag.
7.5.1. Passive Mode
With Passive mode not only is the user responsible for working memory operations, such as insert(), but also for when the rules are to evaluate the data and fire the resulting rule instantiations - using fireAllRules()
Apparently, in order to use the inferred expiration feature, one can not use the passive execution mode. Running kSession.fireUntilHalt() runs the engine in active mode and enables the use of inferred expiration offsets.
TL;DR:
1. Remove any #expires tag
2. run the engine using fireUntilHalt() in a dedicated thread.

EventStore basics - what's the difference between Event Meta Data/MetaData and Event Data?

I'm very much at the beginning of using / understanding EventStore or get-event-store as it may be known here.
I've consumed the documentation regarding clients, projections and subscriptions and feel ready to start using on some internal projects.
One thing I can't quite get past - is there a guide / set of recommendations to describe the difference between event metadata and data ? I'm aware of the notional differences; Event data is 'Core' to the domain, Meta data for describing, but it is becoming quite philisophical.
I wonder if there are hard rules regarding implementation (querying etc).
Any guidance at all gratefully received!

Shamelessly copying (and paraphrasing) parts from Szymon Kulec's blog post "Enriching your events with important metadata" (emphases mine):
But what information can be useful to store in the metadata, which info is worth to store despite the fact that it was not captured in
the creation of the model?
1. Audit data
who? – simply store the user id of the action invoker
when? – the timestamp of the action and the event(s)
why? – the serialized intent/action of the actor
2. Event versioning
The event sourcing deals with the effect of the actions. An action
executed on a state results in an action according to the current
implementation. Wait. The current implementation? Yes, the
implementation of your aggregate can change and it will either because
of bug fixing or introducing new features. Wouldn’t it be nice if
the version, like a commit id (SHA1 for gitters) or a semantic version
could be stored with the event as well? Imagine that you published a
broken version and your business sold 100 tickets before fixing a bug.
It’d be nice to be able which events were created on the basis of the
broken implementation. Having this knowledge you can easily compensate
transactions performed by the broken implementation.
3. Document implementation details
It’s quite common to introduce canary releases, feature toggling and
A/B tests for users. With automated deployment and small code
enhancement all of the mentioned approaches are feasible to have on a
project board. If you consider the toggles or different implementation
coexisting in the very same moment, storing the version only may be
not enough. How about adding information which features were applied
for the action? Just create a simple set of features enabled, or map
feature-status and add it to the event as well. Having this and the
command, it’s easy to repeat the process. Additionally, it’s easy to
result in your A/B experiments. Just run the scan for events with A
enabled and another for the B ones.
4. Optimized combination of 2. and 3.
If you think that this is too much, create a lookup for sets of
versions x features. It’s not that big and is repeatable across many
users, hence you can easily optimize storing the set elsewhere, under
a reference key. You can serialize this map and calculate SHA1, put
the values in a map (a table will do as well) and use identifiers to
put them in the event. There’s plenty of options to shift the load
either to the query (lookups) or to the storage (store everything as
named metadata).
Summing up
If you create an event sourced architecture, consider adding the
temporal dimension (version) and a bit of configuration to the
metadata. Once you have it, it’s much easier to reason about the
sources of your events and introduce tooling like compensation.
There’s no such thing like too much data, is there?

I will share my experiences with you which may help. I have been playing with akka-persistence, akka-persistence-eventstore and eventstore. akka-persistence stores it's event wrapper, a PersistentRepr, in binary format. I wanted this data in JSON so that I could:
use projections
make these events easily available to any other technologies
You can implement your own serialization for akka-persistence-eventstore to do this, but it still ended up just storing the wrapper which had my event embedded in a payload attribute. The other attributes were all akka-persistence specific. The author of akka-persistence-eventstore gave me some good advice, get the serializer to store the payload as the Data, and the rest as MetaData. That way my event is now just the business data, and the metadata aids the technology that put it there in the first place. My projections now don't need to parse out the metadata to get at the payload.

Can I use claims to secure EF fields using PostSharp?

It it possible to use claims based permissions to secure EF fields using post sharp. We have a multi-tenanted app that we are moving to claims and also have issues of who can read/write to what fields. I saw this but it seems role based http://www.postsharp.net/aspects/examples/security.
As far as I can see it would just be a case of rewriting the ISecurable part.
We were hoping to be able to decorate a field with a permission and silently ignore any write to if if the user did not have permission. We were also hopping that if they read it we could swap in another value e.g. Read salary and get back 0 if you don't have a claim ReadSalary.
Are these standard sort of things to do I've never done any serious AOP. So just wanted a quick confirmation before I mention this as an option.

Yes, it is possible to use PostSharp in this case and it should be pretty easy to convert given example from RBAC to claims based.
One thing that has to be considered is performance. When decorated fields are accessed frequently during processing an use-case (e.g. are read inside a loop) then a lot of time is wasted in redundant security checks. Decorating a method that represent an end-user's use-case would be more appropriate.
I would be afraid to silently swapping values of fields when user has insufficient permission. It could lead to some very surprising results when an algorithm is fed by artificial not-expected data.

SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read)

I've just discovered an assumption in my use of SimpleDB. I suspect it's safe but would like other opinions since the docs don't seem to cover it.
So say Process 1 stores an item with x attributes. When Process 2 tries to access said item (without consistent read) & finds it, is it guaranteed to have all the attributes stored by Process 1?
I'm excluding the possibility that another process could have changed the data.
I also know that Process 2 has no guarantee of seeing the item unless consistent read is used, I'm just talking about the point when it does eventually see it.
I guess the question is, once I can get an item & am not changing it anywhere else can I assume it has an ad-hoc fixed schema and access all my expected attributes without checking they actually exist?
I don't want to be in a situation where I need to keep requesting items until they have all the attributes I need to use them.
Thanks.

Although Amazon makes no such guarantees in the documentation the current implementation of their eventual consistency guarantees that you'll see all the properties stored by Process 1 or none of them.
See this thread over at the AWS forums and more specifically this answer by an Amazon employee confirming the behavior (emphasis mine).
I don't think we make that guarantee in the documentation, but the
current implementation treats each Put request as a bundle. It won't
split the request up and apply the operations piecemeal. You'll get
either step-1 responses or step-2 responses until eventual consistency
shakes out and leaves you with step-2 responses.
While this is undocumented behavior I suspect quite a few SimpleDB users are relying on it now and as such Amazon won't be likely to change it anytime soon, but that's ju my guess.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse