we are using drools 5.5 final version.we have thousands of objects and two rules so we are getting objects in chunk(100 size) wise and creating knowledge base for every chunk and firing rules.since creation of Knowledge Base is expensive we are getting performance issue.So we are creating Knowledge Base once and using that knowledge base for every chunk in this case after 4 to 5 chunks got executed from 6th chunk on wards rules are not getting fired though match is there .please suggest what can be done.
sample code
public static KnowledgeBase getPackageKnowledgeBase(PackageDescr pkg){
KnowledgeBuilderConfiguration builderConf = KnowledgeBuilderFactory.newKnowledgeBuilderConfiguration();
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder(builderConf);
kbuilder.add(ResourceFactory.newDescrResource(pkg), ResourceType.DESCR);
Collection<KnowledgePackage> kpkgs = kbuilder.getKnowledgePackages();
if(kbuilder.hasErrors()){
LOGGER.error(kbuilder.getErrors());
}
KnowledgePackage knowledgePackage = kpkgs.iterator().next();
KnowledgeBase kbase= KnowledgeBaseFactory.newKnowledgeBase();
kbase.addKnowledgePackages(Collections.singletonList(knowledgePackage));
return kbase;
}
using method
chunkSize=100;
int start = 0;
Count = -1;
KnowledgeBase kbase=getPackageKnowledgeBase(pkgdscr)//pkgdscr contails all rules got from db
while(Count!=0 && Count <= chunkSize ){
LOGGER.debug("Deduction not getting "+mappedCustomerId);
Objects inputObjects = handler.getPaginatedInputObjects(start);
Count = inputObjects.size();
start=start+chunkSize;
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
for(Object object:inputObjects){
ksession.insert(object);
}
ksession.fireAllRules();
ksession.dispose();
}
Below is the essential part of your loop. Looks to me that this loop terminates as soon as Count exceeds chunkSize (100). You sure this never happens?
while(Count!=0 && Count <= chunkSize ){
Objects inputObjects = ...;
Count = inputObjects.size();
...
StatefulKnowledgeSession ksession = ...;
for(Object object:inputObjects){
ksession.insert(object);
}
ksession.fireAllRules();
...
}
Related
In the current project I am working, we are using spark as computation engine for one of workflows.
Workflow is as follows
We have product catalog being served from several pincodes. User logged in from any particular pin code should be able to see least available cost from all available serving pincodes.
Least cost is calculated as follows
product price+dist(pincode1,pincode2) -
pincode2 being user pincode and pincode1 being source pincode. Apply the above formula for all source pincodes and identify the least available one.
My Core spark logic looks like this
pincodes.javaRDD().cartesian(pincodePrices.javaRDD()).mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Row,Row>>, Row, Row>() {
#Override
public Iterator<Tuple2<Row, Row>> call(Iterator<Tuple2<Row, Row>> t)
throws Exception {
MongoClient mongoclient = MongoClients.create("mongodb://localhost");
MongoDatabase database = mongoclient.getDatabase("catalogue");
MongoCollection<Document>pincodeCollection = database.getCollection("pincodedistances");
List<Tuple2<Row,Row>> list =new LinkedList<>();
while (t.hasNext()) {
Tuple2<Row, Row>tuple2 = t.next();
Row pinRow = tuple2._1;
Integer srcPincode = pinRow.getAs("pincode");
Row pricesRow = tuple2._2;
Row pricesRow1 = (Row)pricesRow.getAs("leastPrice");
Integer buyingPrice = pricesRow1.getAs("buyingPrice");
Integer quantity = pricesRow1.getAs("quantity");
Integer destPincode = pricesRow1.getAs("pincodeNum");
if(buyingPrice!=null && quantity>0) {
BasicDBObject dbObject = new BasicDBObject();
dbObject.append("sourcePincode", srcPincode);
dbObject.append("destPincode", destPincode);
//System.out.println(srcPincode+","+destPincode);
Number distance;
if(srcPincode.intValue()==destPincode.intValue()) {
distance = 0;
}else {
Document document = pincodeCollection.find(dbObject).first();
distance = document.get("distance", Number.class);
}
double margin = 0.02;
Long finalPrice = Math.round(buyingPrice+(margin*buyingPrice)+distance.doubleValue());
//Row finalPriceRow = RowFactory.create(finalPrice,quantity);
StructType structType = new StructType();
structType = structType.add("finalPrice", DataTypes.LongType, false);
structType = structType.add("quantity", DataTypes.LongType, false);
Object values[] = {finalPrice,quantity};
Row finalPriceRow = new GenericRowWithSchema(values, structType);
list.add(new Tuple2<Row, Row>(pinRow, finalPriceRow));
}
}
mongoclient.close();
return list.iterator();
}
}).reduceByKey((priceRow1,priceRow2)->{
Long finalPrice1 = priceRow1.getAs("finalPrice");
Long finalPrice2 = priceRow2.getAs("finalPrice");
if(finalPrice1.longValue()<finalPrice2.longValue())return priceRow1;
return priceRow2;
}).collect().forEach(tuple2->{
// Business logic to push computed price to mongodb
}
I am able to get the answer correctly, however mapPartitionsToPair is taking a bit of time(~22 secs for just 12k records).
After browsing internet I found that mapPartitions performs better than mapPartitionsToPair, but I am not sure how to emit (key,value) from mapPartitions and then sort it.
Is there any alternative for above transformations or any better approach is highly appreciated.
Spark Cluster: Standalone(1 executor, 6 cores)
I am quite new to drools.
I am working on an application where my drools engine will get a series of event every second. I need to see if all the events in last 10 seconds has attribute value below 10, if the condition is true, I have to do some processing. Here is the example code which I tried, Please help me understand and resolve the issue.
My Rule file.....
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
declare Employee
#role (event)
#expires(10s)
end
// Using timer to ensure rule processing starts only after 10 secs,
//else processing starts as soon as first event comes in
rule "Test Timer"
no-loop true
10timer(int: 5s)
when
$E : Employee()
$total : Number( doubleValue < 1 )
from accumulate( Employee( Age > 10 ), count() )
then
System.out.println( $E.getName() + " is crossing the threshold of 20");
retract($E);
end
And Main class
// import classes removed from here...
public class MainClass {
/**
* #param args
*/
public static void main(String[] args){
//Create KnowledgeBase...
KnowledgeBase knowledgeBase = createKnowledgeBase();
//Create a stateful session
StatefulKnowledgeSession session = knowledgeBase.newStatefulKnowledgeSession();
// KnowledgeRuntimeLogger logger = KnowledgeRuntimeLoggerFactory.newConsoleLogger(session);
try {
// Using random generator to simulate the data.
int randomInt=0;
Random randomGenerator = new Random();
DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
Date date = null;
while (true) {
Thread.sleep(1000);
date = new Date();
randomInt = randomGenerator.nextInt(12);
//Create Facts and insert them
Employee emp = new Employee();
emp.setName("Anurag" + randomInt);
emp.setAge(randomInt);
//LOAD THE FACT AND FIREEEEEEEEEEEEEEEEEEE............
System.out.println(dateFormat.format(date)+ " => Random no " + randomInt);
session.insert(emp);
session.fireAllRules();
}
} catch (Exception e) {
e.printStackTrace();
}finally {
session.dispose();
}
}
/**
* Create new knowledge base
*/
private static KnowledgeBase createKnowledgeBase() {
KnowledgeBuilder builder = KnowledgeBuilderFactory.newKnowledgeBuilder();
//Add drl file into builder
File drl = new File("D:\\eclipse\\worspace\\Research\\misc\\testforall.drl");
builder.add(ResourceFactory.newFileResource(drl), ResourceType.DRL);
if (builder.hasErrors()) {
System.out.println(builder.getErrors().toString());
//throw new RuntimeException(builder.getErrors().toString());
}
KnowledgeBase knowledgeBase = KnowledgeBaseFactory.newKnowledgeBase();
//Add to Knowledge Base packages from the builder which are actually the rules from the drl file.
knowledgeBase.addKnowledgePackages(builder.getKnowledgePackages());
KnowledgeBaseConfiguration config = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
config.setOption( EventProcessingOption.STREAM );
return knowledgeBase;
}
}
public class Employee {
private String Name;
private int Age;
// getter - setters
}
did you check the Drools Fusion documentation?
First of all, Employee doesn't sounds as a good idea for an Event. Events are meaningful changes of states of something related with your domain. So, an event could be the time of arrival of an Employee to the office, or the time of departure, but the Employee itself is a domain entity (or a fact for the rule engine) more than an event.
If you are interested in using Drools fusion temporal operators I recommend you to read about sliding windows (temporal ones) which will allow you to see what happen in the last ten seconds all the time. You don't need to use timers for that.
Cheers
You forgot telling what happened when you ran it, if you did.
If your entities set is not very large, I think this problem can be solved very easily with the base Drools distribution.
I have a similar app to yours and works for me:
rule "Clear only auxiliar fact"
salience 1
when
af: AuxFact()
then
DroolsRepository.retractFact(af);
end
rule "Clear auxiliar fact and an old meaningful fact"
salience 1000
when
af: AuxFact()
mf: MeaningfulFact()
then
if(DroolsRepository.getCurrentTimeMillis() - tmf.getCreationDate().getTime() > 5000){
DroolsRepository.retractFact(af);
DroolsRepository.retractFact(mf);
// YOUR MEANINGFUL CODE
}
else{
DroolsRepository.retractFact(af);}
end
query "getAllFacts"
$result: Fact()
end
and
// Boot rules re-executing thread.
new Thread(new Runnable(){
public void run(){
do{
kSession.insert(new AuxFact());
try{
Thread.sleep(99);}
catch(InterruptedException e){
e.printStackTrace();}}
while(true);}
}).start();
A similar approach could be simpler and effective.
I want to know why Code fragment 1 is faster than Code 2 using POCO's with Devart DotConnect for Oracle.
I tried it over 100000 records and Code 1 is way faster than 2. Why? I thought "SaveChanges" would clear the buffer making it faster as there is only 1 connection. Am I wrong?
Code 1:
for (var i = 0; i < 100000; i++)
{
using (var ctx = new MyDbContext())
{
MyObj obj = new MyObj();
obj.Id = i;
obj.Name = "Foo " + i;
ctx.MyObjects.Add(obj);
ctx.SaveChanges();
}
}
Code 2:
using (var ctx = new MyDbContext())
{
for (var i = 0; i < 100000; i++)
{
MyObj obj = new MyObj();
obj.Id = i;
obj.Name = "Foo " + i;
ctx.MyObjects.Add(obj);
ctx.SaveChanges();
}
}
The first code snippet works faster as the same connection is taken from the pool every time, so there are no performance losses on its re-opening.
In the second case 100000 objects gradually are added to the context. A slow snapshot-based tracking is used (if no dynamic proxy). This leads to the detection if any changes in any of cached objects occured on each SaveChanges(). More and more time is spent by each subsequent iteration.
We recommend you to try the following approach. It should have a better performance than the mentioned ones:
using (var ctx = new MyDbContext())
{
for (var i = 0; i < 100000; i++)
{
MyObj obj = new MyObj();
obj.Id = i;
obj.Name = "Foo " + i;
ctx.MyObjects.Add(obj);
}
ctx.SaveChanges();
}
EDIT
If you use an approach with executing large number of operations within one SaveChanges(), it will be useful to configure additionally the Entity Framework behaviour of Devart dotConnect for Oracle provider:
// Turn on the Batch Updates mode:
var config = OracleEntityProviderConfig.Instance;
config.DmlOptions.BatchUpdates.Enabled = true;
// If necessary, enable the mode of re-using parameters with the same values:
config.DmlOptions.ReuseParameters = true;
// If object has a lot of nullable properties, and significant part of them are not set (i.e., nulls), omitting explicit insert of NULL-values will decrease greatly the size of generated SQL:
config.DmlOptions.InsertNullBehaviour = InsertNullBehaviour.Omit;
Only some options are mentioned here. The full list of them is available in our article:
http://www.devart.com/blogs/dotconnect/index.php/new-features-of-entity-framework-support-in-dotconnect-providers.html
Am I wrong to assume that when SaveChanges() is called, all the
objects in cache are stored to DB and the cache is cleared, so each
loop is independent?
SaveChanges() sends and commits all changes to database, but change tracking is continued for all entities which are attached to the context. And new SaveChanges, if snapshot-based change tracking is used, will start a long process of checking (changed or not?) the values of each property for each object.
I have an issue implementing CCR with SQL. It seems that when I step through my code the updates and inserts I am trying to execute work great. But when I run through my interface without any breakpoints, it seems to be working and it shows the inserts, updates, but at the end of the run, nothing got updated to the database.
I proceeded to add a pause to my code every time I pull anew thread from my pool and it works... but that defeats the purpose of async coding right? I want my interface to be faster, not slow it down...
Any suggestions... here is part of my code:
I use two helper classes to set my ports and get a response back...
/// <summary>
/// Gets the Reader, requires connection to be managed
/// </summary>
public static PortSet<Int32, Exception> GetReader(SqlCommand sqlCommand)
{
Port<Int32> portResponse = null;
Port<Exception> portException = null;
GetReaderResponse(sqlCommand, ref portResponse, ref portException);
return new PortSet<Int32, Exception>(portResponse, portException);
}
// Wrapper for SqlCommand's GetResponse
public static void GetReaderResponse(SqlCommand sqlCom,
ref Port<Int32> portResponse, ref Port<Exception> portException)
{
EnsurePortsExist(ref portResponse, ref portException);
sqlCom.BeginExecuteNonQuery(ApmResultToCcrResultFactory.Create(
portResponse, portException,
delegate(IAsyncResult ar) { return sqlCom.EndExecuteNonQuery(ar); }), null);
}
then I do something like this to queue up my calls...
DispatcherQueue queue = CreateDispatcher();
String[] commands = new String[2];
Int32 result = 0;
commands[0] = "exec someupdateStoredProcedure";
commands[1] = "exec someInsertStoredProcedure '" + Settings.Default.RunDate.ToString() + "'";
for (Int32 i = 0; i < commands.Length; i++)
{
using (SqlConnection connSP = new SqlConnection(Settings.Default.nbfConn + ";MultipleActiveResultSets=true;Async=true"))
using (SqlCommand cmdSP = new SqlCommand())
{
connSP.Open();
cmdSP.Connection = connSP;
cmdSP.CommandTimeout = 150;
cmdSP.CommandText = "set arithabort on; " + commands[i];
Arbiter.Activate(queue, Arbiter.Choice(ApmToCcrAdapters.GetReader(cmdSP),
delegate(Int32 reader) { result = reader; },
delegate(Exception e) { result = 0; throw new Exception(e.Message); }));
}
}
where ApmToCcrAdapters is the class name where my helper methods are...
The problem is when I pause my code right after the call to Arbiter.Activate and I check my database, everything looks fine... if I get rid of the pause ad run my code through, nothing happens to the database, and no exceptions are thrown either...
The problem here is that you are calling Arbiter.Activate in the scope of your two using blocks. Don't forget that the CCR task you create is queued and the current thread continues... right past the scope of the using blocks. You've created a race condition, because the Choice must execute before connSP and cmdSP are disposed and that's only going to happen when you're interfering with the thread timings, as you have observed when debugging.
If instead you were to deal with disposal manually in the handler delegates for the Choice, this problem would no longer occur, however this makes for brittle code where it's easy to overlook disposal.
I'd recommend implementing the CCR iterator pattern and collecting results with a MulitpleItemReceive so that you can keep your using statements. It makes for cleaner code. Off the top of my head it would look something like this:
private IEnumerator<ITask> QueryIterator(
string command,
PortSet<Int32,Exception> resultPort)
{
using (SqlConnection connSP =
new SqlConnection(Settings.Default.nbfConn
+ ";MultipleActiveResultSets=true;Async=true"))
using (SqlCommand cmdSP = new SqlCommand())
{
Int32 result = 0;
connSP.Open();
cmdSP.Connection = connSP;
cmdSP.CommandTimeout = 150;
cmdSP.CommandText = "set arithabort on; " + commands[i];
yield return Arbiter.Choice(ApmToCcrAdapters.GetReader(cmdSP),
delegate(Int32 reader) { resultPort.Post(reader); },
delegate(Exception e) { resultPort.Post(e); });
}
}
and you could use it something like this:
var resultPort=new PortSet<Int32,Exception>();
foreach(var command in commands)
{
Arbiter.Activate(queue,
Arbiter.FromIteratorHandler(()=>QueryIterator(command,resultPort))
);
}
Arbiter.Activate(queue,
Arbiter.MultipleItemReceive(
resultPort,
commands.Count(),
(results,exceptions)=>{
//everything is done and you've got 2
//collections here, results and exceptions
//to process as you want
}
)
);
I am running an import that will have 1000's of records on each run. Just looking for some confirmation on my assumptions:
Which of these makes the most sense:
Run SaveChanges() every AddToClassName() call.
Run SaveChanges() every n number of AddToClassName() calls.
Run SaveChanges() after all of the AddToClassName() calls.
The first option is probably slow right? Since it will need to analyze the EF objects in memory, generate SQL, etc.
I assume that the second option is the best of both worlds, since we can wrap a try catch around that SaveChanges() call, and only lose n number of records at a time, if one of them fails. Maybe store each batch in an List<>. If the SaveChanges() call succeeds, get rid of the list. If it fails, log the items.
The last option would probably end up being very slow as well, since every single EF object would have to be in memory until SaveChanges() is called. And if the save failed nothing would be committed, right?
I would test it first to be sure. Performance doesn't have to be that bad.
If you need to enter all rows in one transaction, call it after all of AddToClassName class. If rows can be entered independently, save changes after every row. Database consistence is important.
Second option I don't like. It would be confusing for me (from final user perspective) if I made import to system and it would decline 10 rows out of 1000, just because 1 is bad. You can try to import 10 and if it fails, try one by one and then log.
Test if it takes long time. Don't write 'propably'. You don't know it yet. Only when it is actually a problem, think about other solution (marc_s).
EDIT
I've done some tests (time in miliseconds):
10000 rows:
SaveChanges() after 1 row:18510,534SaveChanges() after 100 rows:4350,3075SaveChanges() after 10000 rows:5233,0635
50000 rows:
SaveChanges() after 1 row:78496,929
SaveChanges() after 500 rows:22302,2835
SaveChanges() after 50000 rows:24022,8765
So it is actually faster to commit after n rows than after all.
My recommendation is to:
SaveChanges() after n rows.
If one commit fails, try it one by one to find faulty row.
Test classes:
TABLE:
CREATE TABLE [dbo].[TestTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NOT NULL,
[SomeVarchar] [varchar](100) NOT NULL,
[SomeOtherVarchar] [varchar](50) NOT NULL,
[SomeOtherInt] [int] NULL,
CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Class:
public class TestController : Controller
{
//
// GET: /Test/
private readonly Random _rng = new Random();
private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private string RandomString(int size)
{
var randomSize = _rng.Next(size);
char[] buffer = new char[randomSize];
for (int i = 0; i < randomSize; i++)
{
buffer[i] = _chars[_rng.Next(_chars.Length)];
}
return new string(buffer);
}
public ActionResult EFPerformance()
{
string result = "";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
TruncateTable();
return Content(result);
}
private void TruncateTable()
{
using (var context = new CamelTrapEntities())
{
var connection = ((EntityConnection)context.Connection).StoreConnection;
connection.Open();
var command = connection.CreateCommand();
command.CommandText = #"TRUNCATE TABLE TestTable";
command.ExecuteNonQuery();
}
}
private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
{
var startDate = DateTime.Now;
using (var context = new CamelTrapEntities())
{
for (int i = 1; i <= noOfRows; ++i)
{
var testItem = new TestTable();
testItem.SomeVarchar = RandomString(100);
testItem.SomeOtherVarchar = RandomString(50);
testItem.SomeInt = _rng.Next(10000);
testItem.SomeOtherInt = _rng.Next(200000);
context.AddToTestTable(testItem);
if (i % commitAfterRows == 0) context.SaveChanges();
}
}
var endDate = DateTime.Now;
return endDate.Subtract(startDate);
}
}
I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of CPU cores and got from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13 % (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system and CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.
I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.
This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
MSDN docs on SqlBulkCopy
Use SqlBulkCopy to Quickly Load Data from your Client to SQL Server
Transferring Data Using SqlBulkCopy
Use a stored procedure.
Create a User-Defined Data Type in Sql Server.
Create and populate an array of this type in your code (very fast).
Pass the array to your stored procedure with one call (very fast).
I believe this would be the easiest and fastest way to do this.
Sorry, I know this thread is old, but I think this could help other people with this problem.
I had the same problem, but there is a possibility to validate the changes before you commit them. My code looks like this and it is working fine. With the chUser.LastUpdated I check if it is a new entry or only a change. Because it is not possible to reload an Entry that is not in the database yet.
// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
// Delete invalid User or Change
var chUser = (db_User) ch.Entry.Entity;
if (chUser.LastUpdated == null)
{
// Invalid, new User
_userDatabase.db_User.Remove(chUser);
Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
}
else
{
// Invalid Change of an Entry
_userDatabase.Entry(chUser).Reload();
Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
}
}
_userDatabase.SaveChanges();