Drools rules not firing from Akka actor system - Scala

We've built a Drools module in Scala which runs just fine when called separately; however, we're now integrating it into an Akka actor system we've built so that we can fire rules via REST calls.
For some reason no rules are firing whatsoever, even blank rules such as:
rule "sample 1"
salience 1000
auto-focus true
when
then
System.out.println("Well, that finally worked!");
end
The KieContainer, session, etc. seem to be fine, and the objects (facts) are being inserted correctly (verified by checking the fact count). (The KieServices and KieContainer are initialised at boot level, i.e. before the actors are created, and used at a later stage.) The strange thing is that when running kieSession.fireAllRules() the total number of rules fired is always 0 and the facts aren't updated.
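For what it's worth, the check looks roughly like this (a sketch; variable names are illustrative):
    // fireAllRules() returns the number of rules fired;
    // getFactCount reports how many facts are in working memory.
    println(s"Fact count: ${kieSession.getFactCount}") // looks correct
    val fired = kieSession.fireAllRules()
    println(s"Rules fired: $fired")                    // always 0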
Using Akka, we're sending an object (of type MyObject) in JSON format via REST. An actor is created per REST request and calls the Drools module as below:
A new actor is created to call the Drools Engine.
A new KieSession is created using the KieServices set at Boot level. [For those who've seen my previous posts, yes the following is Scala code]
val kieSession = DroolsMgt.getKieSession(List("myFile.drl"), Boot.kieServices)
where getKieSession is calling the following:
val kfs = kieServices.newKieFileSystem()
for (filename <- drlFiles) {
  val fis = new FileInputStream(filename)
  kfs.write(filename, kieServices.getResources.newInputStreamResource(fis))
}
val kieBuilder = kieServices.newKieBuilder(kfs).buildAll()
val kieContainer = kieServices.newKieContainer(kieServices.getRepository.getDefaultReleaseId)
kieContainer.newKieSession()
The object received via REST (extracted from the JSON) is then loaded into Drools working memory via ksession.insert(testObject), and the object's FactHandle is saved.
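In code that step looks roughly like this (a sketch; testObject and myObjectFH match the names used below):
    import org.kie.api.runtime.rule.FactHandle

    // Keep the FactHandle so the (possibly updated) fact can be
    // retrieved after the rules have fired.
    val myObjectFH: FactHandle = ksession.insert(testObject)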
The rules are then fired and the updated object is returned using its FactHandle as follows:
ksession.fireAllRules()
val getObject = ksession.getObject(myObjectFH)
ksession.dispose()
getObject.asInstanceOf[MyObject]
As before, this works when running the Drools module on its own, but not when using the actor system as above. I've even tried firing empty rules and printing text to the screen for debugging purposes, but literally no rules are being fired. I'm sure I'm calling the right DRL file and the right KieSession, but I can't figure out what's going wrong here. (Is there any way to check the number of rules in a KieSession?)
Any ideas?
EDIT:
After looking into laune's suggestion I found that there weren't any KiePackages being loaded into the KieBase. I've narrowed this down to the files not being loaded as KieResources at kfs.write("src/main/resources/testFile.drl", kieServices.getResources().newInputStreamResource(fis))
Any idea what might be causing this?
For reference, I'm loading DRL files into the KieContainer and creating the KieSession (successfully) as follows:
val kieServices = KieServices.Factory.get()
val kfs = kieServices.newKieFileSystem()
val fis = new FileInputStream("src/main/resources/testFile.drl")
kfs.write("src/main/resources/testFile.drl", kieServices.getResources().newInputStreamResource(fis))
val kieBuilder = kieServices.newKieBuilder(kfs).buildAll()
val results = kieBuilder.getResults()
if (results.hasMessages(Message.Level.ERROR)) {
  throw new RuntimeException(results.getMessages().toString())
}
val kieContainer = kieServices.newKieContainer(kieServices.getRepository().getDefaultReleaseId())
kieContainer.newKieSession()
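One thing that might be worth trying (a sketch, not a confirmed fix; it mirrors the setResourceType tip given in one of the answers further down): give the resource an explicit type in place of the plain kfs.write call above, so that buildAll() treats the file as DRL:
    import org.kie.api.io.ResourceType

    // Sketch: set an explicit resource type before writing into the KieFileSystem.
    val resource = kieServices.getResources().newInputStreamResource(fis)
    resource.setResourceType(ResourceType.DRL)
    kfs.write("src/main/resources/testFile.drl", resource)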

The following code will not fix your problem, but it should help you diagnose whether you are really running the rules and the session you think you are. I'm using Java notation.
KieSession kieSession = ...
KieBase kieBase = kieSession.getKieBase();
Collection<KiePackage> kiePackages = kieBase.getKiePackages();
for( KiePackage kiePackage: kiePackages ){
    for( Rule rule: kiePackage.getRules() ){
        System.out.println( rule.getName() );
    }
}
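Since the question itself is in Scala, the same diagnostic can be written roughly like this (a sketch; it assumes kieSession is the session created by getKieSession, and uses scala.jdk.CollectionConverters, or scala.collection.JavaConverters on pre-2.13 Scala):
    import scala.jdk.CollectionConverters._

    // Print every rule the KieBase actually contains; empty output means
    // no KiePackages were built from the DRL files.
    val kieBase = kieSession.getKieBase
    for {
      pkg  <- kieBase.getKiePackages.asScala
      rule <- pkg.getRules.asScala
    } println(rule.getName)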

Related

How do I ensure that my Apache Spark setup code runs only once?

I'm writing a Spark job in Scala that reads in parquet files on S3, does some simple transforms, and then saves them to a DynamoDB instance. Each time it runs we need to create a new table in Dynamo so I've written a Lambda function which is responsible for table creation. The first thing my Spark job does is generates a table name, invokes my Lambda function (passing the new table name to it), waits for the table to be created, and then proceeds normally with the ETL steps.
However it looks as though my Lambda function is consistently being invoked twice. I cannot explain that. Here's a sample of the code:
def main(spark: SparkSession, pathToParquet: String) {
  // generate a unique table name
  val tableName = generateTableName()
  // call the lambda function
  val result = callLambdaFunction(tableName)
  // wait for the table to be created
  waitForTableCreation(tableName)
  // normal ETL pipeline
  var parquetRDD = spark.read.parquet(pathToParquet)
  val transformedRDD = parquetRDD.map((row: Row) => transformData(row), encoder=kryo[(Text, DynamoDBItemWritable)])
  transformedRDD.saveAsHadoopDataset(getConfiguration(tableName))
  spark.sparkContext.stop()
}
The code to wait for table creation is pretty straightforward, as you can see:
def waitForTableCreation(tableName: String) {
  val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.defaultClient()
  val waiter: Waiter[DescribeTableRequest] = client.waiters().tableExists()
  try {
    waiter.run(new WaiterParameters[DescribeTableRequest](new DescribeTableRequest(tableName)))
  } catch {
    case ex: WaiterTimedOutException =>
      LOGGER.error("Timed out waiting to create table: " + tableName)
      throw ex
    case t: Throwable => throw t
  }
}
And the lambda invocation is equally simple:
def callLambdaFunction(tableName: String) {
  val myLambda = LambdaInvokerFactory.builder()
    .lambdaClient(AWSLambdaClientBuilder.defaultClient)
    .lambdaFunctionNameResolver(new LambdaByName(LAMBDA_FUNCTION_NAME))
    .build(classOf[MyLambdaContract])
  myLambda.invoke(new MyLambdaInput(tableName))
}
Like I said, when I run spark-submit on this code, it definitely does hit the Lambda function. But I can't explain why it hits it twice. The result is that I get two tables provisioned in DynamoDB.
The waiting step also seems to fail within the context of running this as a Spark job. But when I unit-test my waiting code it seems to work fine on its own. It successfully blocks until the table is ready.
At first I theorized that perhaps spark-submit was sending this code to all of the worker nodes and they were independently running the whole thing. Initially I had a Spark cluster with 1 master and 2 workers. However I tested this out on another cluster with 1 master and 5 workers, and there again it hit the Lambda function exactly twice, and then apparently failed to wait for table creation because it dies shortly after invoking the Lambdas.
Does anyone have any clues as to what Spark might be doing? Am I missing something obvious?
UPDATE: Here are my spark-submit args, which are visible on the Steps tab of EMR.
spark-submit --deploy-mode cluster --class com.mypackage.spark.MyMainClass s3://my-bucket/my-spark-job.jar
And here's the code for my getConfiguration function:
def getConfiguration(tableName: String): JobConf = {
  val conf = new Configuration()
  conf.set("dynamodb.servicename", "dynamodb")
  conf.set("dynamodb.input.tableName", tableName)
  conf.set("dynamodb.output.tableName", tableName)
  conf.set("dynamodb.endpoint", "https://dynamodb.us-east-1.amazonaws.com")
  conf.set("dynamodb.regionid", "us-east-1")
  conf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
  conf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
  new JobConf(conf)
}
Also here is a Gist containing some of the exception logs I see when I try to run this.
Thanks @soapergem for adding logging and options. I'm adding an answer (a tentative one) since it may be a little longer than a comment :)
To wrap up:
nothing strange with your spark-submit command or configuration options
in https://gist.github.com/soapergem/6b379b5a9092dcd43777bdec8dee65a8#file-stderr-log you can see that the application is executed twice: it passes from the ACCEPTED to the RUNNING state twice. That's consistent with EMR defaults (see "How to prevent EMR Spark step from retrying?"). To confirm it, check whether you have 2 tables created after executing the step (I suppose here that you're generating tables with dynamic names, i.e. a different name per execution, which in the case of a retry should give 2 different names). A sketch of how to turn that retry off follows below.
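If the retry is indeed the culprit, a minimal sketch of disabling the second attempt (an assumption on my part, not taken from your gist; spark.yarn.maxAppAttempts is the standard Spark-on-YARN property for this):
    spark-submit --deploy-mode cluster \
      --conf spark.yarn.maxAppAttempts=1 \
      --class com.mypackage.spark.MyMainClass \
      s3://my-bucket/my-spark-job.jar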
For your last question:
It looks like my code might work if I run it in "client" deploy mode, instead of "cluster" deploy mode? Does that offer any hints to anyone here?
For more information about the difference, please check https://community.hortonworks.com/questions/89263/difference-between-local-vs-yarn-cluster-vs-yarn-c.html In your case, it looks like the machine executing spark-submit in client mode has different IAM policies than the EMR jobflow. My supposition here is that your jobflow role is not allowed to perform dynamodb:Describe*, and that's why you're getting the exception with the 400 code (from your gist):
Caused by: com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Requested resource not found: Table: EmrTest_20190708143902 not found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: V0M91J7KEUVR4VM78MF5TKHLEBVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:4243)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:4210)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeDescribeTable(AmazonDynamoDBClient.java:1890)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1857)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:129)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:126)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:80)
To confirm this hypothesis, you can execute the part that creates the table and waits for its creation locally (no Spark code here, just a plain run of your main function; see the sketch after this list) and:
for the first execution, ensure that you have all permissions. IMO it will be dynamodb:Describe* on Resources: * (if that's the reason, AFAIK you should use something like Resources: Test_Emr* in production, following the principle of least privilege)
for the 2nd execution, remove dynamodb:Describe* and check whether you get the same stack trace as in the gist
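Something along these lines for the local run (a sketch; it assumes the generateTableName, callLambdaFunction and waitForTableCreation functions from the question are callable outside the Spark job):
    // Run only the Lambda + waiter path locally, with no Spark involved,
    // once with full permissions and once without dynamodb:Describe*.
    object LocalTableCheck {
      def main(args: Array[String]): Unit = {
        val tableName = generateTableName()
        callLambdaFunction(tableName)
        waitForTableCreation(tableName)
        println(s"Table $tableName is ready")
      }
    }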
I encountered the same problem in cluster mode too (v2.4.0). I worked around it by launching my apps programmatically using SparkLauncher instead of spark-submit.sh. You could move your Lambda logic into the main method that starts your Spark app, like this:
def main(args: Array[String]) = {
  // generate a unique table name
  val tableName = generateTableName()
  // call the lambda function
  val result = callLambdaFunction(tableName)
  // wait for the table to be created
  waitForTableCreation(tableName)

  val latch = new CountDownLatch(1)
  val handle = new SparkLauncher(env)
    .setAppResource("/path/to/spark-app.jar")
    .setMainClass("com.company.SparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf("spark.executor.instances", "2")
    .setConf("spark.executor.cores", "2")
    // other conf ...
    .setVerbose(true)
    .startApplication(new SparkAppHandle.Listener {
      override def stateChanged(sparkAppHandle: SparkAppHandle): Unit = {
        latch.countDown()
      }
      override def infoChanged(sparkAppHandle: SparkAppHandle): Unit = {
      }
    })

  println("app is launching...")
  latch.await()
  println("app exited")
}
Your Spark job starts before the table is actually created, because writing the operations one after another does not mean each one waits for the previous one to finish.
You need to change the code so that the Spark-related block only starts after the table is created. To achieve that, either use a for-comprehension that ensures every step has finished, or put your Spark pipeline into the callback of the waiter that runs after the table is created (if you have one; hard to tell).
You can also use andThen or a simple map.
The main point is that all the lines of code in your main are kicked off one after another without waiting for the previous one to finish. A minimal sketch of the for-comprehension approach follows below.
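A minimal sketch of the for-comprehension idea, assuming the Lambda call and the waiter are wrapped in Futures (the async wrappers and runEtl are hypothetical names, not part of the question's code):
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._
    import ExecutionContext.Implicits.global

    // Hypothetical async wrappers around the question's blocking calls.
    def callLambdaFunctionAsync(tableName: String): Future[Unit] =
      Future(callLambdaFunction(tableName))
    def waitForTableCreationAsync(tableName: String): Future[Unit] =
      Future(waitForTableCreation(tableName))

    val tableName = generateTableName()

    // The for-comprehension only runs the ETL step once the Lambda call
    // and the waiter have both completed.
    val pipeline = for {
      _ <- callLambdaFunctionAsync(tableName)
      _ <- waitForTableCreationAsync(tableName)
    } yield runEtl(tableName) // hypothetical: the Spark part of the original main

    Await.result(pipeline, 30.minutes)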

How to ensure if drl file is loaded by KIE

Is there any quick way to check programmatically whether a DRL file has been loaded successfully by the Drools library within our web application? BTW, I am developing a SOAP-based web service using Drools. For example: listing all the rule names present in the knowledge base at a certain time, etc.
Please help.
This is what I am doing to load the drl file from centOS filesystem:
String drlFile = "/tmp/conf/object.drl";
ks = KieServices.Factory.get();
KieFileSystem kfs = ks.newKieFileSystem();
FileInputStream fis = new FileInputStream( drlFile );
kfs.write("/Drools/Object.drl",ks.getResources().newInputStreamResource( fis ));
KieBuilder kieBuilder = ks.newKieBuilder( kfs ).buildAll();
The simplest way to detect errors after a buildAll() is to check the messages on its results:
kieBuilder.getResults().hasMessages(Message.Level.ERROR)
or
kieBuilder.getResults().getMessages()
As a side note: I had a similar issue where buildAll() was not reading the file I had written into the kfs. The symptoms were the same: no packages in the collection.
To fix that issue I had to set a specific resource type on the input stream, e.g.:
kfs.write("/Drools/Object.drl",ks.getResources().newInputStreamResource( fis ).setResourceType(ResourceType.DRL));
For determining the packages inside the ksession I have always used:
ksession.getKieBase().getKiePackages()
For detecting the files my builder knew about and loaded, I used this (kb being the KieBuilder):
((MemoryKieModule)kb.getKieModule()).getFileNames()

JBoss Drools - how to get data (facts) from java to DRL

How can I get a fact defined by the user in the GUI and insert it into the DRL?
For example: the user has chosen a black car in the GUI (JavaFX), and now I want to use that fact in the DRL code. How do I send the info about the black car to the DRL? Should I use a POJO?
If you want to execute rules that you have written in a DRL file, you have to create a POJO, and using a KieSession you can execute your rules. For example,
val pojo = new POJO("POJO arguments")
val kieServices = KieServices.Factory.get()
val kieContainer = kieServices.newKieClasspathContainer()
val kieSession = kieContainer.newKieSession()
kieSession.insert(pojo)
kieSession.fireAllRules()
Read this documentation. You can find all the Drools API examples here.
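A minimal sketch of what such a POJO could look like in Scala (the Car class and its fields are hypothetical; @BeanProperty generates the JavaBean getters/setters that DRL constraints such as Car(color == "black") match against):
    import scala.beans.BeanProperty

    // Hypothetical fact class for the "black car" example from the question.
    class Car(@BeanProperty var color: String,
              @BeanProperty var model: String)

    val car = new Car("black", "sedan") // e.g. built from the GUI selection
    kieSession.insert(car)
    kieSession.fireAllRules()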

Getting a handle of POJOs inside my kjar

I just set up a kie-workbench (6.1.0 Final) on tomcat and created an example demo-project which contains a drl file and a big flat POJO created with the data modeller.
I built and deployed the demo-project and managed to fire the rules from a client application using the code below:
String url = "http://yytomcat7kie.domain.com:8080/kie/maven2/gro/up/demoproject/0.0.3/demoproject-0.0.3.jar";
ReleaseIdImpl releaseId = new ReleaseIdImpl("gro.up", "demoproject", "0.0.3");
KieServices ks = KieServices.Factory.get();
KieFileSystem kfs = ks.newKieFileSystem();
UrlResource urlResource = (UrlResource) ResourceFactory.newUrlResource(url);
kfs.write(urlResource);
KieBuilder kieBuilder = ks.newKieBuilder(kfs).buildAll();
KieContainer kContainer = ks.newKieContainer(releaseId);
KieSession kSession = kContainer.newKieSession();
SessionConfiguration sConf = (SessionConfiguration)kSession.getSessionConfiguration();
MyKiePojo kiePojo = new MyKiePojo();
kiePojo.setField01("blah");
kiePojo.setField02("blahblah");
kiePojo.setField03("blahblahblah");
kSession.insert(kiePojo);
kSession.fireAllRules();
System.out.println(" ALL RULES FIRED ");
System.out.println(kiePojo.getField04());
System.out.println(kiePojo.getField05());
It works fine but the question I have now is:
Is it possible to get a handle of the MyKiePojo class which is in the demoproject.jar without having it in the client app's classpath? Ideally I would like to keep all my models in the workbench without having to mirror them in the client app and be able to instantiate them and populate them with values received from rest requests. Is this possible?
A KieContainer, when used with dynamic modules, keeps all the jars it loads in an isolated ClassLoader. So you can put your models into their own jar and specify it as a Maven dependency of the project being deployed. If you are using kie-ci, it will resolve the transitive dependencies and build a ClassLoader from them.
Externally you can use reflection to access the POJOs in that ClassLoader, or you can have an initialisation rule that calls out to a static initialisation method, where that static initialiser method lives on any class in the jar or one of the dependent jars.
What we don't have yet is a lifecycle for KieContainers and KieSessions to automate certain things via callbacks. This is definitely something we need to look into, and I expect it to be in the next (after 6.2) release.
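A rough sketch of the reflection approach in Scala, mirroring the variables from the question's Java snippet (the fully qualified class name is an assumption based on the question's groupId, not confirmed):
    // Instantiate and populate the kjar's POJO without having it on the
    // client classpath, via the KieContainer's isolated ClassLoader.
    val pojoClass = kContainer.getClassLoader.loadClass("gro.up.demoproject.MyKiePojo")
    val kiePojo   = pojoClass.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]

    // Populate fields via their setters, e.g. from values in a REST request.
    pojoClass.getMethod("setField01", classOf[String]).invoke(kiePojo, "blah")

    kSession.insert(kiePojo)
    kSession.fireAllRules()

    // Read an updated value back the same way.
    val field04 = pojoClass.getMethod("getField04").invoke(kiePojo)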
See the documentation chapter "Rule Language Reference", section "Type Declaration". A quick example taken from there:
declare Address
    number : int
    streetName : String
    city : String
end
You can create objects using new and use getters and setters etc.
You'll have to code the transformation from the request to this object.
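If you need to create instances of such a declared type from application code rather than from inside a rule, the FactType API can be used; a sketch in Scala, assuming the type above is declared in a hypothetical package com.example.rules:
    // Instantiate the DRL-declared Address type from application code.
    val addressType = kSession.getKieBase.getFactType("com.example.rules", "Address")
    val address     = addressType.newInstance()
    addressType.set(address, "number", 10)
    addressType.set(address, "streetName", "Main Street")
    addressType.set(address, "city", "Springfield")
    kSession.insert(address)
    kSession.fireAllRules()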

Inserting a new object to StatelessKnowledgeSession

I've a simple rule like this:
rule "First Rule" //You do not talk about Fight Club
when
MyInp(id=="1")
then
insert(new MyOut(true));
end
What I want is to get the created MyOut object from a Java class.
Is there a way to do this, or do I have to pass a global variable and update it inside the rule?
With a stateless session you can modify a fact which you insert, or register a global which you update.
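For the global variant, a minimal sketch in Scala (the MyInp constructor call is illustrative, and the DRL needs a matching global declaration):
    import java.util.{ArrayList => JArrayList}

    // The DRL would declare:   global java.util.List results;
    // and the rule's RHS would call results.add(new MyOut(true)) instead of insert().
    val results  = new JArrayList[MyOut]()
    val ksession = kbase.newStatelessKnowledgeSession()
    ksession.setGlobal("results", results)
    ksession.execute(new MyInp("1"))
    // After execute() returns, the objects created by the rules are in `results`.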
If you have a stateful session then there are a few more options:
Fetching fact from stateful session of drools
If you're looking for a general means of examining what existed in working memory during your stateless session,
StatelessKnowledgeSession ksession = kbase.newStatelessKnowledgeSession();
TrackingWorkingMemoryEventListener listener = new TrackingWorkingMemoryEventListener();
ksession.addEventListener(listener);

List<Object> facts = new ArrayList<Object>();
facts.add(myRequestFact);
ksession.execute(facts);

List<ObjectInsertedEvent> insertions = listener.getInsertions();
It's handy for debugging and audit purposes, but I wouldn't recommend it as a means of getting the actual results out of a request. Example code (by me) for a tracking WorkingMemoryEventListener can be found here:
https://github.com/gratiartis/sctrcd-payment-validation-web/blob/master/src/main/java/com/sctrcd/drools/util/TrackingWorkingMemoryEventListener.java