Can optaplanner facts be arrays? - drools

I'm looking to load a large number of facts - 3 million distances between points - and I want to know whether OptaPlanner facts can be provided using a searchable data structure such as an array. My sense is that using a List to hold that many values would lead to long search times. I'm using Drools scoring.

Yes, problem facts can be arrays, and problem facts can also have fields that are arrays. To use them in the Drools score calculation, you need to return them from the getProblemFacts() method as a Collection; just wrap them in an ArrayList and you will not suffer any noticeable performance slowdown there: it's a one-time hit at initialization, and as soon as the facts are in the Drools working memory, that getProblemFacts() Collection is forgotten.
Note that ArrayList (which implements the interface List, which in turn extends Collection) has the same memory and performance scalability characteristics as an array. What I mean by that is that it does have some overhead, but nothing that affects the big-O behaviour.
Also note that it's highly recommended to use new ArrayList(int initialCapacity) over the no-arg constructor if you have any estimation of the size you'll need.
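For illustration, here's a minimal sketch (not the actual OptaPlanner example code) of a solution class keeping the 3 million distances in a plain array and wrapping them for getProblemFacts(); RoadDistance and VehicleRoutingSolution are hypothetical names:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Sketch of the relevant part of a planning solution class.
// RoadDistance is a hypothetical problem-fact type holding one pre-computed distance.
public class VehicleRoutingSolution {

    private RoadDistance[] distances; // e.g. ~3 million entries, kept as a plain array

    public Collection<? extends Object> getProblemFacts() {
        // One-time cost at initialization: pre-size the list and copy the array once.
        List<Object> facts = new ArrayList<Object>(distances.length);
        Collections.addAll(facts, distances);
        // ... add the other problem facts here ...
        return facts;
    }
}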
Planning entities currently need to be in a Collection because @PlanningEntityCollectionProperty currently only supports a return type of Collection. Feel free to open a JIRA issue requesting that we also support an array - or even contribute a GitHub PR that implements it :)
Planning value ranges currently need to be a List or numeric bounds. Feel free to open a JIRA issue or a PR requesting that we also support an array there; it's just a matter of copy-pasting ListValueRange into an ArrayValueRange.

Related

Drools 6 Fusion Notification

We are working in a very complex solution using Drools 6 (Fusion) and I would like your opinion about the best way to read objects created as correlation results over time.
My first basic approach was to read the working memory at fixed intervals, looking for new objects and reporting them to an external service (REST).
AgendaEventListener does not seem to be the "best" approach because I don't care about most of the objects being inserted into working memory, so maybe the best approach would be to inject the particular "object" into some sort of service inside the DRL. Is this a good approach?
You have quite a lot of options. In decreasing order of my preference:
AgendaEventListener is probably the solution requiring the smallest amount of LOC. It might be useful for other tasks as well; all you have on the negative side is one additional method call and a class test per inserted fact. Peanuts. (A sketch of this listener approach follows after this list.)
You can wrap the insert macro in a DRL function and collect inserted facts of class X in a global List. The problem here is that you'll have to pass the KieContext as a second parameter to the function call.
If the creation of a class X object is inevitably linked with its insertion into WM, you could register new objects in a static List inside class X, in a factory method (or the constructor).
I'm putting your "basic approach" last because it requires many more cycles than the listener (#1) and tons of overhead for maintaining the set of X objects that have already been pushed to REST.
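As a sketch of option 1 (not production code): in the Drools 6 KIE API the per-insertion callback is objectInserted() on a RuleRuntimeEventListener, the insertion-side counterpart of the AgendaEventListener mentioned above; X and RestNotifier are placeholders for your fact class and REST client.

import org.kie.api.event.rule.DefaultRuleRuntimeEventListener;
import org.kie.api.event.rule.ObjectInsertedEvent;

// Reports every inserted fact of class X to an external REST service.
public class XInsertListener extends DefaultRuleRuntimeEventListener {

    private final RestNotifier notifier; // hypothetical REST client

    public XInsertListener(RestNotifier notifier) {
        this.notifier = notifier;
    }

    @Override
    public void objectInserted(ObjectInsertedEvent event) {
        Object fact = event.getObject();
        if (fact instanceof X) {          // the "class test per inserted fact"
            notifier.report((X) fact);    // push to the external service
        }
    }
}

// Registration, once, when the session is created:
// kieSession.addEventListener(new XInsertListener(notifier));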

Best practice to represent Object tree in Drools

We are working on a monitoring application in which we follow the processing of a task in a set of applications.
Each application's task processing is modeled as a ChainStep; the whole task process is modeled as a Chain.
A Chain contains a tree of ChainSteps; each ChainStep may be the parent of others.
We have a set of Drools rules matching our needs, but we have some performance issues (we may easily have up to 50k objects in a session).
We are looking for best practices to improve Drools performance.
Currently we represent Chains and ChainSteps as flat objects; each object has an Id (GUID), and we frequently have rules with conditions such as:
rule "Chain_App1_LinkToParent"
when $app1Step:App1ChainStep(!HasParent )
$app2Step:App2ChainStep($app2Step.ChainId == $app1Step.ChainId)
then
modify($app1Step) {
setParent($app2Step.getId()),
setHasParent(true)
}
end
(App1ChainStep and App2ChainStep both extend the ChainStep type.)
We tried to use unification, but rule processing seems slower:
when
    $app1Step : App1ChainStep( !HasParent, $Id := ChainId )
    $app2Step : App2ChainStep( $Id := ChainId )
We are now working on a non-flat representation, but we are encountering problems with rules that should trigger on object modifications.
For example:
rule "SetChainCollectable"
when
$chain:Chain(!Collectable )
not ( exists( $chainStep:ChainStep( !Collectable) from $chain.Steps))
then
modify($chain){
setCollectable(true)
}
end
does not seem to be triggered when the Collectable flag of a ChainStep is modified.
We would like to be sure we will obtain a better result before we finish migrating our rules.
What would be the most efficient way to represent an object tree in Drools?
For designing an efficient fact representation, all use cases and their frequencies must be taken into account. (What you've disclosed in your question falls well short of that.) So I can only point out a few oddities I've observed:
Maintaining both a parent and a hasParent flag could be simplified: most of the time an unset parent is represented by a null value, NULL or 0, etc. Perhaps you should use object references rather than ID values.
Unification isn't useful for the rule pattern you have there.
not( exists() ) is redundant - not() is the negative existence quantifier all by itself. (Drools discards the exists in this situation.)
from <expression> iterating over some POJOs (not facts) in a collection creates what I call "temporary facts", which, as facts, are limited to the context of the condition where the from clause is written. Thus, the Engine is not aware of any modification of a ChainStep object. ChainStep objects must be facts if you expect rules to react to their modification.
Basically, representing graphs by nodes as facts and with references to neighbours - up or down, or up and down, perhaps even including siblings, if it is a tree - is the way to go. But I'm not going to say more - see the initial paragraph.
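As a rough illustration of that node-as-fact layout (hypothetical classes, not the poster's actual model): each ChainStep is inserted as a fact and holds direct references to its parent and children, so a null parent replaces the hasParent flag, and modify() on any step is visible to the engine.

import java.util.ArrayList;
import java.util.List;

// Every ChainStep is a fact in its own right; relationships are object references,
// not GUID lookups, so no join on ChainId and no separate hasParent flag are needed.
public class ChainStep {

    private ChainStep parent;                        // null means "no parent yet"
    private boolean collectable;
    private final List<ChainStep> children = new ArrayList<ChainStep>();

    public ChainStep getParent() { return parent; }
    public void setParent(ChainStep parent) { this.parent = parent; }

    public boolean isCollectable() { return collectable; }
    public void setCollectable(boolean collectable) { this.collectable = collectable; }

    public List<ChainStep> getChildren() { return children; }
}

Rules would then match ChainStep facts directly instead of reaching into $chain.Steps with from, so modify() on a step re-evaluates them.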

G-Counters in Riak: Don't the underlying vclocks provide the same data?

I've been reading into CvRDTs and I'm aware that Riak has already added a few to Riak 2.
My question is: why would Riak implement a gcounter when it sounds like the underlying vclock that is associated with every object records the same information? Wouldn't the result be a gcounter stored with a vclock, each containing the same essential information?
My only guess right now would be that Riak may garbage-collect the vclocks, trimming information that would actually be important for the purpose of a gcounter (i.e. the number of increments).
I cannot read Erlang particularly well, so maybe I've wrongly assumed that Riak stores vclocks with these special-case data types. However, the question still applies to the homegrown solutions that are written on top of standard Riak (and hence inherit vclocks with each object persisted).
EDIT:
I have since written the following article to help explain CvRDTs in a more practical manner. This article also touches on the redundancy I have highlighted above:
Conflict-free Replicated Data Types (CRDT) - A digestible explanation with less math.
Riak prunes version vectors; that's no big deal for causality (it causes false concurrency, hence more siblings, which is safe) but a disaster for counters.
Riak's CRDT support is general. We "hide" CRDTs inside the regular riak object.
Riak's CRDT support is in its first wave; we'll be optimising further as we make further releases.
We have a great mailing list for questions like this, btw. Stack Overflow has its uses, but if you want to talk to the authors of an open source DB, why not use their list? Since Riak is open source, you can submit a pull request; we'd love to incorporate your ideas into the code base.
Quick answer: Riak's counters are actually PN-Counters, i.e. they allow both increments and decrements, so they can't be implemented like a vclock, as they require tracking increments and decrements separately.
Long Answer:
This question suggests you have completely misunderstood the difference between a g-counter and a vector clock (or version vector).
A vector clock (vclock) is a system for tracking the causality of concurrent updates to a piece of data. They are a map of {actor => logical clock}. Actors only increment their logical clocks when the data they're associated with changes, and try to increment it as little as possible (so at most once per update). Two vclocks can either be concurrent, or one can dominate the other.
A g-counter is a CvRDT with what looks like the same structure as a vclock, but with important differences. They are implemented as a map of {actor => counter}. Actors can increment their own counter as much as they want. A g-counter has the concept of a "counter value", and the concept of a "merge", so that when concurrent operations are executed by different actors, they can work out what the actual "counter value" should be.
Importantly, g-counters can't track causality, and vclocks have no idea what their "counter value" is.
To conflate the two in a codebase would not only be confusing, but could also bring in errors.
Add to this the fact that Riak actually implements pn-counters. The difference is that a g-counter can only be incremented, whereas pn-counters can be both incremented and decremented. Pn-counters work by being a map of {actor => (increment count, decrement count)}, which more obviously has a different structure from a vclock. You can only increment both of those counts, hence why there are two and not just one.
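To make that shape concrete, here is a toy sketch (in Java, not Riak's Erlang) of a pn-counter as a map of {actor => (increments, decrements)}, with the value and merge operations that a vclock simply doesn't have:

import java.util.HashMap;
import java.util.Map;

// Toy PN-Counter: per-actor increment and decrement totals.
// Merge takes the pointwise max of each actor's totals, so replicas converge.
public class PNCounter {

    private final Map<String, long[]> entries = new HashMap<String, long[]>(); // actor -> {incs, decs}

    public void increment(String actor) { entry(actor)[0]++; }
    public void decrement(String actor) { entry(actor)[1]++; }

    public long value() {
        long total = 0;
        for (long[] e : entries.values()) {
            total += e[0] - e[1];
        }
        return total;
    }

    public void merge(PNCounter other) {
        for (Map.Entry<String, long[]> e : other.entries.entrySet()) {
            long[] mine = entry(e.getKey());
            mine[0] = Math.max(mine[0], e.getValue()[0]);
            mine[1] = Math.max(mine[1], e.getValue()[1]);
        }
    }

    private long[] entry(String actor) {
        long[] e = entries.get(actor);
        if (e == null) {
            e = new long[2];
            entries.put(actor, e);
        }
        return e;
    }
}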

Is there a rule of thumb for how granular an object should be in OO programming? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm in school learning OO programming, and for the next few months every assignment involves dice games and word games like Jumble and Hangman. Each assignment has us creating a new class for these variables: HangmanWordArray, JumbleWordArray, etc. In the interest of reusability, I want to create a class (or series of classes) that can be reused across my assignments. I'm having a hard time visualizing what that would look like, and hope my question makes sense...
Let's assume that the class(es) will contain properties with accessors and mutators, and methods to return the various objects...a word, a letter, a die roll. Is there a rule of thumb for how to organize these classes?
Is it best to keep one object per class? Or group all the objects in a single class because they're all related as "stuff I need for assignments?" Or group by data type, so all the numeric objects are in one class and all the strings in another?
I guess I'm grappling with how a programmer, in the real world, makes decisions about how to group objects within a class or series of classes, and some of the questions and thought processes I should be using to frame this type of design scenario.
Realistically, it varies from project to project. There is no guaranteed 'right way' to group anything; it all depends on your needs. What it comes down to is manageability, meaning how easily you can read and update old code. If you can contain all your games in a single 'games' class, then there's nothing wrong with doing it. However, if your games are very complicated with many subs and variables, perhaps moving them all to their own class would be easier to manage.
That being said, there are ways to logically group items. For instance if you have a lot of solo functions that are used for manipulation (char to string, string to int, html encode/decode, etc.), you may decide to create a 'helper functions' class to hold them all. Similarly, if your application uses a database connection, you may create a class to hold and manage a shared connection as well as have methods for getting query results and executing non-queries.
Some people try to break things down too much. For example, instead of having the database core mentioned above, they might create one class to create and manage the database connection, and then another class that uses the connection class to handle queries. Not that this approach won't work, but it may become very difficult to manage when items are split up too finely.
Without knowing exactly what you are doing, there's no way to tell you how to do it. If you reuse the same methods in each project, then perhaps you can place them somewhere they can be shared. The best way I've found to figure out what works best is just to try it out and see how it responds!
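Purely as an illustration of that kind of grouping (all names and the connection URL are made up), the "helper functions" and shared-connection ideas might look like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// A handful of solo manipulation functions grouped in one place.
public final class Helpers {
    private Helpers() { }                              // no instances: just grouped functions

    public static int toInt(String s) { return Integer.parseInt(s.trim()); }
    public static String charToString(char c) { return String.valueOf(c); }
}

// One shared database connection for the whole application.
class Database {
    private static Connection shared;

    static synchronized Connection get() throws SQLException {
        if (shared == null || shared.isClosed()) {
            shared = DriverManager.getConnection("jdbc:h2:mem:assignments"); // made-up URL
        }
        return shared;
    }
}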
What I see people doing is breaking down their objects and methods until each method is just a handful of code; if any method exceeds a page of code, they will try to break down the object structure further in order to shorten things up.
I personally have no objection to long methods, as long as they are readable. I think a "one-page limit" tends to create too much granularity, and risks more confusion rather than less. But this seems to be the current fashion.
Just reporting what I'm seeing in the wild.

Lazy and Deferred TreeViewer questions

I have actually two questions but they are kind of related so here they go as one...
How do you ensure garbage collection of tree nodes that are not currently displayed when using TreeViewer(SWT.VIRTUAL) and ILazyTreeContentProvider?
If a node has 5000 children, once they are displayed by the viewer they are never let go; hence an OutOfMemoryError if your tree has a great number of nodes and leaves and the heap size is not big enough.
Is there some kind of best practice for avoiding memory leaks caused by a never-closed view holding a TreeViewer with great amounts of data (hundreds of thousands of objects, or even millions)?
Perhaps there is some callback interface that allows greater flexibility with the viewer/content provider elements?
Is it possible to combine deferred (DeferredTreeContentManager) AND lazy (ILazyTreeContentProvider) loading for a single TreeViewer(SWT.VIRTUAL)?
As far as I understand from looking at the examples and APIs, it is only possible to use one or the other at a given time, not both in conjunction, e.g. fetch ONLY the visible children of a given node AND fetch them in a separate thread using the Job API. What bothers me is that the deferred approach loads ALL children. Although it does so in a different thread, it still loads all elements even though only a minimal subset is displayed at once.
I can provide code examples for my questions if required.
I am currently struggling with these myself, so if I manage to come up with something in the meantime I will gladly share it here.
Thanks!
Regards,
Svilen
I find the Eclipse framework sometimes schizophrenic. I suspect that the relationship between DeferredTreeContentManager and ILazyTreeContentProvider is one of these cases.
In another example, at EclipseCon this past year they recommended using adapter factories (IAdapterFactory) to adapt your models to whatever binding context is needed at the time. For example, if you want your model to show up in a tree, do it this way:
treeViewer = new TreeViewer(parent, SWT.BORDER);
IAdapterFactory adapterFactory = new AdapterFactory();
Platform.getAdapterManager().registerAdapters(adapterFactory, SomePojo.class);
treeViewer.setLabelProvider(new WorkbenchLabelProvider());
treeViewer.setContentProvider(new BaseWorkbenchContentProvider());
Register your adapter and the BaseWorkbenchContentProvider will find the adaption in the factory. Wonderful. Sounds like a plan.
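For completeness, a sketch of what that AdapterFactory might look like (using the Eclipse 3.x-era IAdapterFactory signatures; SomePojo and its getters are hypothetical), so that BaseWorkbenchContentProvider can find an IWorkbenchAdapter for each element:

import org.eclipse.core.runtime.IAdapterFactory;
import org.eclipse.ui.model.IWorkbenchAdapter;
import org.eclipse.ui.model.WorkbenchAdapter;

public class AdapterFactory implements IAdapterFactory {

    // One stateless adapter instance is enough; it is handed out for every SomePojo.
    private final IWorkbenchAdapter pojoAdapter = new WorkbenchAdapter() {
        public String getLabel(Object o) {
            return ((SomePojo) o).getName();                 // hypothetical getter
        }
        public Object[] getChildren(Object o) {
            return ((SomePojo) o).getChildren().toArray();   // hypothetical children list
        }
    };

    public Object getAdapter(Object adaptableObject, Class adapterType) {
        if (adapterType == IWorkbenchAdapter.class && adaptableObject instanceof SomePojo) {
            return pojoAdapter;
        }
        return null;
    }

    public Class[] getAdapterList() {
        return new Class[] { IWorkbenchAdapter.class };
    }
}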
"Oh by-the-way, when you have large datasets, please do it this way", they say:
TableViewer tableViewer = new TableViewer(parent, SWT.VIRTUAL);
// skipping the noise
tableViewer.setItemCount(100000);
tableViewer.setContentProvider(new LazyContentProvider());
tableViewer.setLabelProvider(new TableLabelProvider());
tableViewer.setUseHashlookup(true);
tableViewer.setInput(null);
It turns out that the first and second examples are not only incompatible, they're mutually exclusive. These two approaches were probably implemented by different teams that didn't have a common plan, or maybe the API is in the middle of a transition to a common framework. Nevertheless, you're on your own.
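For reference, here's a minimal sketch of the tree-side lazy provider the question asks about (ILazyTreeContentProvider on a TreeViewer(SWT.VIRTUAL)), using a hypothetical Node model. Combining it with DeferredTreeContentManager isn't supported, as noted above, but you can hand-roll background fetching inside updateElement() with a Job that calls viewer.replace() back on the UI thread.

import org.eclipse.jface.viewers.ILazyTreeContentProvider;
import org.eclipse.jface.viewers.TreeViewer;
import org.eclipse.jface.viewers.Viewer;

// Only materialises the children the tree actually asks for.
// Node is a hypothetical model type with getChild(int), getChildCount() and getParent().
public class LazyNodeContentProvider implements ILazyTreeContentProvider {

    private TreeViewer viewer;

    public void inputChanged(Viewer viewer, Object oldInput, Object newInput) {
        this.viewer = (TreeViewer) viewer;
    }

    public void updateElement(Object parent, int index) {
        Node child = ((Node) parent).getChild(index);       // fetch just this one child
        viewer.replace(parent, index, child);
        viewer.setChildCount(child, child.getChildCount());
    }

    public void updateChildCount(Object element, int currentChildCount) {
        viewer.setChildCount(element, ((Node) element).getChildCount());
    }

    public Object getParent(Object element) {
        return ((Node) element).getParent();
    }

    public void dispose() {
    }
}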