Why does Esper's ext_timed window not filter out old entries? - complex-event-processing

I need help understanding the win:ext_timed window in Esper (CEP). I'm wondering why the older (first two) events still pop up in the update method even though they have been "expired".
import java.util.ArrayList;
import java.util.Date;
import java.util.Map;

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPAdministrator;
import com.espertech.esper.client.EPRuntime;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class MyCepTest {
    public static void main(String... args) throws Exception {
        System.out.println("starting");
        MyCepTest ceptest = new MyCepTest();
        ceptest.execute();
        System.out.println("end");
    }

    public void execute() throws Exception {
        Configuration config = new Configuration();
        config.addEventType(MyPojo.class);
        EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider(config);
        EPAdministrator admin = epService.getEPAdministrator();
        EPStatement x1 = admin.createEPL(win);
        EPStatement x2 = admin.createEPL(win2);
        x1.setSubscriber(this);
        x2.setSubscriber(this);
        EPRuntime runtime = epService.getEPRuntime();
        ArrayList<MyPojo> staffToSendToCep = new ArrayList<MyPojo>();
        staffToSendToCep.add(new MyPojo(1, new Date(1490615719497L)));
        staffToSendToCep.add(new MyPojo(2, new Date(1490615929497L)));
        for (MyPojo pojo : staffToSendToCep) {
            runtime.sendEvent(pojo);
        }
        Thread.sleep(500);
        System.out.println("round 2..."); // why are the first two pojos still found? Shouldn't ext_timed(pojoTime.time, 300 seconds) rule them out?
        staffToSendToCep.add(new MyPojo(3, new Date(1490616949497L)));
        for (MyPojo pojo : staffToSendToCep) {
            runtime.sendEvent(pojo);
        }
    }

    public void update(Map<String, Object> map) {
        System.out.println(map);
    }

    public static String win = "create window fiveMinuteStuff.win:ext_timed(pojoTime.time, 300 seconds)(pojoId int, pojoTime java.util.Date)";
    public static String win2 = "insert into fiveMinuteStuff select pojoId, pojoTime from MyPojo";
}
class MyPojo {
    int pojoId;
    Date pojoTime;

    MyPojo(int pojoId, Date date) {
        this.pojoId = pojoId;
        this.pojoTime = date;
    }

    public int getPojoId() {
        return pojoId;
    }

    public Date getPojoTime() {
        return pojoTime;
    }

    public String toString() {
        return pojoId + "#" + pojoTime;
    }
}
I've been puzzled by this for a while and help would be greatly appreciated.

See the processing model in the docs: http://espertech.com/esper/release-6.0.1/esper-reference/html/processingmodel.html
All incoming insert-stream events are delivered to listeners and subscribers, regardless of your window. A window, if one is in the query at all, defines the subset of events to consider and therefore defines what gets aggregated, pattern-matched, or is available for iteration. Try "select * from MyPojo" for reference. My advice is to read up on external time; see http://espertech.com/esper/release-6.0.1/esper-reference/html/api.html#api-controlling-time
Usually when you want an "external time window" you want event time to drive engine time.
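A minimal sketch of what that change could look like for the code above (my assumption about the fix, not code from the question): disable the internal timer so engine time only advances when you send a CurrentTimeEvent, then advance it from each event's own timestamp before sending the event.
import com.espertech.esper.client.time.CurrentTimeEvent;

// In execute(), before obtaining the runtime: stop the wall-clock timer
// so engine time is driven externally.
config.getEngineDefaults().getThreading().setInternalTimerEnabled(false);

// When sending events: move engine time to the event's timestamp first,
// then send the event itself; ext_timed can then expire rows older than 300 seconds.
for (MyPojo pojo : staffToSendToCep) {
    runtime.sendEvent(new CurrentTimeEvent(pojo.getPojoTime().getTime()));
    runtime.sendEvent(pojo);
}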

Related

How to optimize SQL query in Anylogic

I am generating agents with parameter values coming from an SQL table in AnyLogic. When an agent is generated at the source, I do a lookup in the table and extract the corresponding values. For now it works perfectly, but it is slowing down performance.
The structure of the table looks like this:
I am querying the data from this table with the code below:
double value_1 = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.avg_value)).get(0);

double value_min = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.min_value)).get(0);

double value_max = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.max_value)).get(0);

// Fetch the cluster number from account table
int cluster_num = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.cluster)).get(0);

int act_no = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.actno)).get(0);

String pay_term = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.pay_term)).get(0);

String pay_term_prob = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.pay_term_prob)).get(0);
But this is very slow and I want to improve the performance. Someone mentioned that we can create a Java class and then load the table into a collection. Is there any example I can refer to? I am finding it difficult to put the entire code together.
I have created a class using the code below:
public class Customer {
    private String act_code;
    private int actno;
    private double avg_value;
    private String pay_term;
    private String pay_term_prob;
    private int cluster;
    private double min_value;
    private double max_value;

    public String getact_code() {
        return act_code;
    }
    public void setact_code(String act_code) {
        this.act_code = act_code;
    }
    public int getactno() {
        return actno;
    }
    public void setactno(int actno) {
        this.actno = actno;
    }
    public double getavg_value() {
        return avg_value;
    }
    public void setavg_value(double avg_value) {
        this.avg_value = avg_value;
    }
    public String getpay_term() {
        return pay_term;
    }
    public void setpay_term(String pay_term) {
        this.pay_term = pay_term;
    }
    public String getpay_term_prob() {
        return pay_term_prob;
    }
    public void setpay_term_prob(String pay_term_prob) {
        this.pay_term_prob = pay_term_prob;
    }
    public int getcluster() {
        return cluster;
    }
    public void setcluster(int cluster) {
        this.cluster = cluster;
    }
    public double getmin_value() {
        return min_value;
    }
    public void setmin_value(double min_value) {
        this.min_value = min_value;
    }
    public double getmax_value() {
        return max_value;
    }
    public void setmax_value(double max_value) {
        this.max_value = max_value;
    }
}
Created collection object like this:
Please provide a reference for adding this database table into a collection as a next step. Then I want to query the collection based on a condition.
You are on the right track here!
Every time you access the database to read data there is a computational overhead. So the best option is to access the database only once, at the start of the model. Create all the objects you need, store other data you will need later into Java classes, and then use the Java classes.
My suggestion is to create a Java class for each row in your table, as you have done, and then create a map object with the key as a String (the account code) and the value as this new object.
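If you declare the map in code rather than as an AnyLogic collection element, the declaration would presumably look like this (a sketch; the name mapOfCustomerData matches the code below):
// Assumed declaration: a LinkedHashMap keyed by account code.
LinkedHashMap<String, Customer> mapOfCustomerData = new LinkedHashMap<String, Customer>();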
Then on model start you can populate this map as follows:
List<Tuple> rows = selectFrom(customer).list();
for (Tuple row : rows) {
    // Note: this assumes Customer has a matching constructor;
    // alternatively, call the setters for each field.
    Customer customerData = new Customer(
        row.get( customer.act_code ),
        row.get( customer.actno ),
        row.get( customer.avg_value )
    );
    mapOfCustomerData.put(customerData.getact_code(), customerData);
}
where mapOfCustomerData is a LinkedHashMap and customer is the name of the table.
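With the map populated, the per-agent lookup becomes a single map access instead of seven queries (a sketch using the getters from the Customer class above; z is the account code from the original query):
// One map lookup replaces the repeated selectFrom(...) round trips:
Customer c = mapOfCustomerData.get(z);
double value_1 = c.getavg_value();
double value_min = c.getmin_value();
double value_max = c.getmax_value();
int cluster_num = c.getcluster();
int act_no = c.getactno();
String pay_term = c.getpay_term();
String pay_term_prob = c.getpay_term_prob();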
See the model created in this blog post for more details and an example of using a scenario object to store all the data from the database in a separate object.
Note: the code above is just an example; read this blog post for more details on using the AnyLogic Internal Database.
Before using Java classes, try this first: click the "index" tickbox for all columns that you query with a WHERE clause.

Apache beam stream processing event time

I'm trying to build an event-processing stream using Apache Beam.
The steps in my stream:
Read from Kafka topics in Avro format and deserialize Avro using a schema registry.
Create a fixed-size window (1 hour) with triggering every 10 min (processing time).
Write Avro files in GCP, dividing directories by topic name (filename = schema + start-end-window-pane).
Now let's dive into the code.
This code shows how I read from Kafka. I use a custom deserializer and coder to deserialize properly using the schema registry (in my case it's Hortonworks).
KafkaIO.<String, AvroGenericRecord>read()
    .withBootstrapServers(bootstrapServers)
    .withConsumerConfigUpdates(configUpdates)
    .withTopics(inputTopics)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
    .commitOffsetsInFinalize()
    .withoutMetadata();
In the pipeline, after the records are read by KafkaIO, the windowing is created.
records.apply(Window.<AvroGenericRecord>into(FixedWindows.of(Duration.standardHours(1)))
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(10)))
        .withLateFirings(AfterPane.elementCountAtLeast(1)))
    .withAllowedLateness(Duration.standardMinutes(5))
    .discardingFiredPanes()
)
What I want to achieve with this window is to group data by event time into 1-hour windows and trigger every 10 min.
After grouping by window, it starts writing into Google Cloud Storage (GCS).
public class WriteAvroFilesTr extends PTransform<PCollection<AvroGenericRecord>, WriteFilesResult<AvroDestination>> {
    private String baseDir;
    private int numberOfShards;

    public WriteAvroFilesTr(String baseDir, int numberOfShards) {
        this.baseDir = baseDir;
        this.numberOfShards = numberOfShards;
    }

    @Override
    public WriteFilesResult<AvroDestination> expand(PCollection<AvroGenericRecord> input) {
        ResourceId tempDir = getTempDir(baseDir);
        return input.apply(AvroIO.<AvroGenericRecord>writeCustomTypeToGenericRecords()
            .withTempDirectory(tempDir)
            .withWindowedWrites()
            .withNumShards(numberOfShards)
            .to(new DynamicAvroGenericRecordDestinations(baseDir, Constants.FILE_EXTENSION))
        );
    }

    private ResourceId getTempDir(String baseDir) {
        return FileSystems.matchNewResource(baseDir + "/temp", true);
    }
}
And
public class DynamicAvroGenericRecordDestinations extends DynamicAvroDestinations<AvroGenericRecord, AvroDestination, GenericRecord> {
    private static final DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss");
    private final String baseDir;
    private final String fileExtension;

    public DynamicAvroGenericRecordDestinations(String baseDir, String fileExtension) {
        this.baseDir = baseDir;
        this.fileExtension = fileExtension;
    }

    @Override
    public Schema getSchema(AvroDestination destination) {
        return new Schema.Parser().parse(destination.jsonSchema);
    }

    @Override
    public GenericRecord formatRecord(AvroGenericRecord record) {
        return record.getRecord();
    }

    @Override
    public AvroDestination getDestination(AvroGenericRecord record) {
        Schema schema = record.getRecord().getSchema();
        return AvroDestination.of(record.getName(), record.getDate(), record.getVersionId(), schema.toString());
    }

    @Override
    public AvroDestination getDefaultDestination() {
        return new AvroDestination();
    }

    @Override
    public FileBasedSink.FilenamePolicy getFilenamePolicy(AvroDestination destination) {
        String pathStr = baseDir + "/" + destination.name + "/" + destination.date + "/" + destination.name;
        return new WindowedFilenamePolicy(FileBasedSink.convertToFileResourceIfPossible(pathStr), destination.version, fileExtension);
    }

    private static class WindowedFilenamePolicy extends FileBasedSink.FilenamePolicy {
        final ResourceId outputFilePrefix;
        final String fileExtension;
        final Integer version;

        WindowedFilenamePolicy(ResourceId outputFilePrefix, Integer version, String fileExtension) {
            this.outputFilePrefix = outputFilePrefix;
            this.version = version;
            this.fileExtension = fileExtension;
        }

        @Override
        public ResourceId windowedFilename(
                int shardNumber,
                int numShards,
                BoundedWindow window,
                PaneInfo paneInfo,
                FileBasedSink.OutputFileHints outputFileHints) {
            IntervalWindow intervalWindow = (IntervalWindow) window;
            String filenamePrefix =
                outputFilePrefix.isDirectory() ? "" : firstNonNull(outputFilePrefix.getFilename(), "");
            String filename =
                String.format("%s-%s(%s-%s)-(%s-of-%s)%s", filenamePrefix,
                    version,
                    formatter.print(intervalWindow.start()),
                    formatter.print(intervalWindow.end()),
                    shardNumber,
                    numShards - 1,
                    fileExtension);
            ResourceId result = outputFilePrefix.getCurrentDirectory();
            return result.resolve(filename, RESOLVE_FILE);
        }

        @Override
        public ResourceId unwindowedFilename(
                int shardNumber, int numShards, FileBasedSink.OutputFileHints outputFileHints) {
            throw new UnsupportedOperationException("Expecting windowed outputs only");
        }

        @Override
        public void populateDisplayData(DisplayData.Builder builder) {
            builder.add(
                DisplayData.item("fileNamePrefix", outputFilePrefix.toString())
                    .withLabel("File Name Prefix"));
        }
    }
}
I've written out my whole pipeline. It mostly works, but I'm not sure whether I am actually handling events by event time.
Could someone review my code (especially steps 1 & 2, where I read and group by windows) and tell me whether it windows by event time or not?
P.S. Every record in Kafka has a timestamp field inside.
UPDATE
Thanks jjayadeep. I have included a custom TimestampPolicy in KafkaIO:
static class CustomTimestampPolicy extends TimestampPolicy<String, AvroGenericRecord> {
    protected Instant currentWatermark;

    CustomTimestampPolicy(Optional<Instant> previousWatermark) {
        this.currentWatermark = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
    }

    @Override
    public Instant getTimestampForRecord(PartitionContext ctx, KafkaRecord<String, AvroGenericRecord> record) {
        currentWatermark = Instant.ofEpochMilli(record.getKV().getValue().getTimestamp());
        return currentWatermark;
    }

    @Override
    public Instant getWatermark(PartitionContext ctx) {
        return currentWatermark;
    }
}
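For reference, a policy like this is attached to the read via withTimestampPolicyFactory; a sketch, assuming the KafkaIO read shown earlier (the factory lambda receives the topic partition and the previous watermark, per the KafkaIO Javadoc):
KafkaIO.<String, AvroGenericRecord>read()
    // ... bootstrap servers, topics, and deserializers configured as above ...
    .withTimestampPolicyFactory(
        (topicPartition, previousWatermark) -> new CustomTimestampPolicy(previousWatermark));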
From the documentation here [1], the record timestamp (event time) is set to processing time by default in KafkaIO:
By default, record timestamp (event time) is set to processing time in KafkaIO reader and source watermark is current wall time. If a topic has Kafka server-side ingestion timestamp enabled ('LogAppendTime'), it can be enabled with KafkaIO.Read.withLogAppendTime(). A custom timestamp policy can be provided by implementing TimestampPolicyFactory. See KafkaIO.Read.withTimestampPolicyFactory(TimestampPolicyFactory) for more information.
Processing time is also the default timestamp method, as documented below:
// set event times and watermark based on LogAppendTime. To provide a custom
// policy see withTimestampPolicyFactory(). withProcessingTime() is the default.
1 - https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/kafka/KafkaIO.html
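And if the topic has broker-side ingestion timestamps (LogAppendTime) enabled, the switch is a single call, per the Javadoc quoted above (a sketch):
KafkaIO.<String, AvroGenericRecord>read()
    .withLogAppendTime(); // event time = broker append time instead of processing time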

Update facts in Decision Table : Drools

I have a Drools decision table in an Excel spreadsheet with two rules. (This example has been greatly simplified; the one I am working with has a lot more rules.)
The first rule checks if amount is more than or equal to 500. If it is, then it sets status to 400.
The second rule checks if status is 400. If it is, then it sets the message variable.
The problem is, I am unable to get the second rule to fire, even though sequential is set. I also have to use no-loop and lock-on-active to prevent infinite looping.
My goal is to get the rules to fire top down, and the rules that come after might depend on changes made to the fact/object by earlier rules.
Is there a solution to this problem?
Any help would be appreciated, thanks!
package com.example;

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class SalaryTest {
    public static final void main(String[] args) {
        try {
            // load up the knowledge base
            KieServices ks = KieServices.Factory.get();
            KieContainer kContainer = ks.getKieClasspathContainer();
            KieSession kSession = kContainer.newKieSession("ksession-dtables");

            Salary a = new Salary();
            a.setAmount(600);

            kSession.insert(a);
            kSession.fireAllRules();
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    public static class Salary {
        private String message;
        private int amount;
        private int status;

        public String getMessage() {
            return message;
        }
        public void setMessage(String message) {
            this.message = message;
        }
        public int getAmount() {
            return amount;
        }
        public void setAmount(int amount) {
            this.amount = amount;
        }
        public int getStatus() {
            return status;
        }
        public void setStatus(int status) {
            this.status = status;
        }
    }
}
The attribute lock-on-active countermands any firings after the first from the group of rules with the same agenda group. Remove this column.
Don't plan to have rules firing in a certain order. Write logic that describes exactly the state of a fact as it should trigger the rule. Possibly you'll have to write
rule "set status"
when
$s: Salary( amount >= 500.0 && < 600.0, status == 0 )
then
modify( $s ){ setStatus( 400 ) }
end
to ensure that only one status setting happens, or just the right one. But you'll find that your rules will be more explicit and easier to read.
Think of rule attributes as a last resort.
Please replace the action in column H in the following way:
Current solution:
a.setStatus($param);update(a);
New solution:
modify(a) {
    setStatus($param)
}

Create observables using straight methods

I need to gather some data by calling methods that connect to a web service.
Problem: imagine I need to update the text of a label control according to this remotely gathered information. Until all this data is gathered, I'm not able to show the label.
Desired: I'd like to first show the label with a default text, and as I receive this information, update the label content (please don't take this description too literally; I'm trying to summarize my real situation).
I'd like to create an observable sequence of these methods. However, these methods do not have the same signature. For example:
int GetInt() {
    return service.GetInt();
}

string GetString() {
    return service.GetString();
}

string GetString2() {
    return service.GetString2();
}
These methods are not async.
Is it possible to create an observable sequence of these methods?
How could I create it?
And which is the best alternative to achieve my goal?
Creating custom observable sequences can be achieved with Observable.Create. An example using your requirements is shown below:
private int GetInt()
{
    Thread.Sleep(1000);
    return 1;
}

private string GetString()
{
    Thread.Sleep(1000);
    return "Hello";
}

private string GetString2()
{
    Thread.Sleep(2000);
    return "World!";
}

private IObservable<string> RetrieveContent()
{
    return Observable.Create<string>(
        observer =>
        {
            observer.OnNext("Default Text");

            int value = GetInt();
            observer.OnNext($"Got value {value}. Getting string...");

            string string1 = GetString();
            observer.OnNext($"Got string {string1}. Getting second string...");

            string string2 = GetString2();
            observer.OnNext(string2);
            observer.OnCompleted();

            return Disposable.Empty;
        }
    );
}
Note how I have emulated network delay by introducing a Thread.Sleep call into each of the GetXXX methods. In order to ensure your UI doesn't hang when subscribing to this observable, you should subscribe as follows:
IDisposable subscription = RetrieveContent()
    .SubscribeOn(TaskPoolScheduler.Default)
    .ObserveOn(DispatcherScheduler.Current)
    .Subscribe(text => Label = text);
This code uses the .SubscribeOn(TaskPoolScheduler.Default) extension method to start the observable sequence on a TaskPool thread, which will be blocked by the Thread.Sleep calls but, as this is not the UI thread, your UI will remain responsive. Then, to ensure we update the UI on the UI thread, we use .ObserveOn(DispatcherScheduler.Current) to marshal the updates onto the UI thread before setting the (data-bound) Label property.
Hope this is what you were looking for, but leave a comment if not and I'll try to help further.
I would look at creating a wrapper class for your service to expose the values as separate observables.
So, start with a service interface:
public interface IService
{
    int GetInt();
    string GetString();
    string GetString2();
}
...and then you write ServiceWrapper:
public class ServiceWrapper : IService
{
    private IService service;
    private Subject<int> subjectGetInt = new Subject<int>();
    private Subject<string> subjectGetString = new Subject<string>();
    private Subject<string> subjectGetString2 = new Subject<string>();

    public ServiceWrapper(IService service)
    {
        this.service = service;
    }

    public int GetInt()
    {
        var value = service.GetInt();
        this.subjectGetInt.OnNext(value);
        return value;
    }

    public IObservable<int> GetInts()
    {
        return this.subjectGetInt.AsObservable();
    }

    public string GetString()
    {
        var value = service.GetString();
        this.subjectGetString.OnNext(value);
        return value;
    }

    public IObservable<string> GetStrings()
    {
        return this.subjectGetString.AsObservable();
    }

    public string GetString2()
    {
        var value = service.GetString2();
        this.subjectGetString2.OnNext(value);
        return value;
    }

    public IObservable<string> GetString2s()
    {
        return this.subjectGetString2.AsObservable();
    }
}
Now, assuming that your current service is called Service, you would write this code to set things up:
IService service = new Service();
ServiceWrapper wrapped = new ServiceWrapper(service); // Still an `IService`

var subscription =
    Observable
        .Merge(
            wrapped.GetInts().Select(x => x.ToString()),
            wrapped.GetStrings(),
            wrapped.GetString2s())
        .Subscribe(x => label.Text = x);

IService wrappedService = wrapped;
Now pass wrappedService instead of service to your code. It's still calling the underlying service code, so there is no need for a rewrite, yet you still get the observables that you want.
This is effectively a Gang of Four decorator pattern.

Dynamic DataGrid in GWT

I am trying to construct a DataGrid in GWT that will show an arbitrary dataset taken from an RPC method.
I have made some progress: I get the fields from one method and the data from another.
I have managed to construct the DataGrid, add the columns from the getFields() RPC method, and fill the table using an AsyncDataProvider.
The problem is that when I refresh the browser, all the columns in the DataGrid are duplicated. I cannot figure out what to do. I tried removing all the columns first, but no luck.
I attach the code in case anyone has an idea.
public class MyCallBack implements AsyncCallback<List<Field>> {
    DataGrid<Record> dg;

    public MyCallBack(DataGrid<Record> dgrid) {
        this.dg = dgrid;
    }

    public void onFailure(Throwable caught) {
        Window.alert(caught.getMessage());
    }

    public void onSuccess(List<Field> result) {
        for (int i = 0; i < result.size(); i++) { // note "<", not "<=", to stay inside the list bounds
            IndexedColumn ic = new IndexedColumn(i);
            dg.addColumn(ic, result.get(i).getLabel());
        }
    }

    public AsyncCallback<List<Field>> getCb() {
        return this;
    }
}

public void onModuleLoad() {
    final DataGrid<Record> dg = new DataGrid<Record>();
    MyCallBack mcb = new MyCallBack(dg);
    DataProvider dp = new DataProvider();
    DBConnectionAsync rpcService = (DBConnectionAsync) GWT.create(DBConnection.class);
    ServiceDefTarget target = (ServiceDefTarget) rpcService;
    String moduleRelativeURL = GWT.getModuleBaseURL() + "MySQLConnection";
    target.setServiceEntryPoint(moduleRelativeURL);
    rpcService.getFields(mcb.getCb());
    dp.addDataDisplay(dg);
    dg.setVisibleRange(0, 200);

    SplitLayoutPanel slp = new SplitLayoutPanel();
    slp.setHeight("700px");
    slp.setWidth("1500px");
    slp.addWest(dg, 770);
    RootPanel.get().add(slp);
}
When you refresh a browser, all UI is lost. There is no difference between (a) showing the UI for the first time and (b) showing the UI after a browser refresh.
Your comment "Only if I restart Tomcat it works" suggests that the problem is on the server side. Most likely, you return twice the number of data points on the second call.
Try clearing the table before filling it like this:
public void onSuccess(List<Field> result) {
    clearTable();
    for (int i = 0; i < result.size(); i++) { // note "<", not "<=", to stay inside the list bounds
        IndexedColumn ic = new IndexedColumn(i);
        dg.addColumn(ic, result.get(i).getLabel());
    }
}

private void clearTable() {
    while (dg.getColumnCount() > 0) {
        dg.removeColumn(0); // "dg", the DataGrid field ("db" was a typo)
    }
}