Provide general query statement for all particular events - complex-event-processing

The following is a real use case that has already been implemented by our company:
We've built an application with a REST API as a wrapper around the Esper Java API.
The user can define their map schema and query statement.
Each event (an instance of MapEventBean) sent by the user (as JSON) has one common parameter (though with a different value every time), which is added in the background to the user's map event as an additional key-value pair.
Problem: in order for the UpdateListener to retrieve this additional parameter, both the user-defined schema and the query statement have to be extended programmatically with this attribute.
E.g.:
User defined schema: create map schema Name as (...)
Prog. modified schema: create map schema Name as (additionalAttribute Map, ...)
User defined query stmt: select foo, bar from Name
Prog. modified query stmt: select additionalAttribute, foo, bar from Name
This approach does work, but it is error-prone and not as decoupled as we'd like it to be.
So the question: is there any way to define a common query statement (e.g. select additionalAttribute on each event ...) or to tell the update listener to retrieve the particular attribute on every successful query, independently of whether it has been defined in the user-defined statement or not?
Thanks in advance!
Update:
I've already considered some possibilities such as named windows, but the problem is that these additional attributes belong to each particular event - that is, the attributes should be fetched by the update listener together with the event itself.

There is a statement object model API that you can use to modify the queries without doing string manipulation. The documentation link for the API is http://esper.espertech.com/release-8.0.0/reference-esper/html_single/index.html#apicompiler-soda

How to execute graphql query for a specific schema in hasura?

As can be seen in the following screenshot, the current project database (PostgreSQL)
named default has these 4 schemas: public, appcompany1, appcompany2 and appcompany3.
They share some common tables. Right now, when I want to fetch data for customers, I write a query like this:
query getCustomerList {
  customer {
    customer_id
    ...
    ...
  }
}
And it fetches the required data from public schema.
But according to the requirements, depending on user interactions in the front-end, that query will be executed for appcompanyN (N = 1, 2, 3, ..., any positive integer). How do I achieve this goal?
NOTE: Whenever the user creates a new company, a new schema is created for that company. So the total number of schemas is not limited to 4.
I suspect that you see a problem where none actually exists.
Everything is much simpler than it may seem.
A. Where are all those tables?
There are a lot of schemas with identical (or almost identical) objects inside them.
All tables are registered in Hasura.
Hasura can't register different tables with the same name, so by default names will be [schema_name]_[table_name] (except for public).
So the table customer will be registered as:
customer (from public)
appcompany1_customer
appcompany2_customer
appcompany3_customer
It's possible to customize the entity name in the GraphQL schema with "Custom GraphQL Root Fields".
B. The problem
But according to the requirements, depending on user interactions in the front-end, that query will be executed for appcompanyN (N = 1, 2, 3, ..., any positive integer). How do I achieve this goal?
There are identical objects that differ only in their schema-name prefix.
So the solutions are trivial.
1. Dynamic GraphQL query
The application stores GraphQL query templates and replaces the prefix with the real schema name before making the request.
E.g.
query getCustomerList {
  [schema]_customer {
  }
}
Substitute [schema] with appcompany1, appcompany2, appcompanyZ and execute.
2. SQL view for all data
If the tables are 100% identical, then it's possible to create an SQL view:
CREATE VIEW all_customers
AS
SELECT 'public' AS schema, * FROM public.customer
UNION ALL
SELECT 'appcompany1' AS schema, * FROM appcompany1.customer
UNION ALL
SELECT 'appcompany2' AS schema, * FROM appcompany2.customer
UNION ALL
....
SELECT 'appcompanyZ' AS schema, * FROM appcompanyZ.customer
This way: no need for dynamic query, no need to register all objects in all schemas.
You only need to register the view with the combined data and use one query:
query getCustomerList($schema: String) {
  all_customers(where: {schema: {_eq: $schema}}) {
    customer_id
  }
}
About both solutions: it's hard to call them elegant.
I dislike them both myself ;)
So decide for yourself which is more suitable in your case.

GET a resource filtering by a composite key as query parameter?

I'm thinking about the best way to create an endpoint where one of the filters is a composite key.
For example, we have a REST service to search for orders:
/orders/
We can filter the orders by start and final date:
/orders?dt-start=2017-05-11T17:12Z&dt-final=2017-05-11T17:12Z
So far, so good. But I would like to filter the orders by customer. The customer is identified by a document type and the number of that document.
So, something like this could be possible:
/orders?type=ID&number=123456789
But type and number are query parameters that only work together; they form a composite key. Using separate query parameters, as in the last example, it seems the API user could also do:
/orders?number=123456789
/orders?type=ID
But that doesn't make sense. Yes, I could return an error response (bad request) if only one of these parameters were passed, but this is not natural for anyone reading the API endpoint.
Another strategy is to combine type and number in the same parameter, but I've never seen this in any API:
/orders?document=ID-12345678
That looks odd to me too. I'd prefer to use separate parameters instead.
So, is there a way to use query parameters and solve this problem in a more "elegant" way?
Thanks!
Don't make up a composite key; instead, conditionally require the two params. This isn't bad and IMO is much cleaner than creating a composite key which isn't represented by the data (or resource).
I've done this before, so to help illustrate, I'll point you to it. This resource is for querying CyberFacts. The query is bounded by a date range. To get data, you can do one of two things.
You can say ?today=true and get the data for today (equivalent to saying ?startDate=2017-05-13&endDate=2017-05-13).
You can use the startDate and endDate query parameters; however, if you use one and not the other (e.g. ?startDate=2017-05-13), you will receive a 400 Bad Request status response on the query and an error message in the response body.
So in this case I've done a few things to make this work
Make a higher priority parameter (today overrides startDate and endDate)
Document the valid behaviors
Provide appropriate error responses
For you, only #2 and #3 would be needed, I think. Not knowing all of your use cases, I would suggest using /orders?type=ID&number=123456789 and documenting that number is a required query param when type=ID, and also including the appropriate error (eg: "You queried for an Order by Type 'ID', however you did not provide a 'number' query parameter").
How about providing a default value for type (such as 'ID') as a fallback if the type parameter is absent? I'd probably go for the most common document type, depending on your situation.
As for the number parameter, I would enforce it, i.e. by specifying that it is a required parameter (somewhere in the docs). If absent, return a bad request.

Using childByAutoId On Single Value?

I am pretty new to both Swift and Firebase, and I am attempting to make a simple app using Firebase as the backend. As far as I know, there is no memory-efficient way to use the numChildren() function without loading every single child into memory for counting, so I am implementing my own simple counter for the number of "Events" that have been created in my app.
The documentation for Firebase states that the childByAutoId() method should be used for updating lists in multi-user applications. I am assuming it adds a timestamp to the requested update and performs the updates in order.
My question is whether it is necessary to use childByAutoId() when only updating a SINGLE field in a multi-user application. That is, will there be conflicts on my numEvents field if I do:
dbRef = FIRDatabase.database().reference()
dbRef.child("numEvents").setValue(num)
Or must I do:
dbRef = FIRDatabase.database().reference()
dbRef.child("numEvents").childByAutoId().setValue(num)
In order to avoid write conflicts? My only real confusion is that the documentation for childByAutoId stresses that it is useful when the children are a list of items, but mine is only a single item.
If you are only updating a single field, you should not be using childByAutoId. To update a child value for an object, you need to obtain a reference to that object somehow, perhaps by a query of some sort (in many cases you will naturally already have a reference to the object if it needs to be changed). Then you can change the value like this:
dbRef.child("events").child(objectToUpdateId).child(fieldToUpdateKey).setValue(newValue)
childByAutoId in this context would be used to create a new field like:
dbRef.child("events").childByAutoId().setValue(newObject)
I'm not exactly sure how this applies to your situation, but those are some descriptions of how to update a field and how to use childByAutoId.
What childByAutoId does is create a unique key for a node, to avoid using the same key multiple times and creating data consistency conflicts (not write conflicts). To avoid write conflicts, you use transaction blocks.
The best way to learn is to try it out.
If num == 1, in the first example the result will be:
dbRef: {
  numEvents: 1
}
While the second will be:
dbRef: {
  numEvents: {
    // the auto-generated key
    KLBHJBjhbjJBJHB: 1
  }
}
childByAutoId would be useful if you want to save multiple children of the same type in a node; that way each child will have its own unique identifier.
For example
pet: {
  KJHBJJHB: {
    name: fluffy,
    owner: John Smith,
  },
  KhBHJBJjJ: {
    name: fluffy,
    owner: Jane Foster,
  }
}
This way you have a unique identifier for cases where the item's data gives no clear way to guarantee uniqueness (in this case, the pet's name).
A few things here:
childByAutoId is not a timestamp, but it is used to create unique child nodes under any given node.
Use case of childByAutoId:
You have a messages node which stores messages from multiple users who are involved in a group chat. Each user can add messages to the group chat, so you would do something like this each time a user sends a message:
dbRef = FIRDatabase.database().reference()
dbRef.child("messages").childByAutoId().setValue(messageText)
So this will create a unique message id for each message from different users. This will kind of act like the primary key of a message in conventional databases.
The structure of the database will be something like this:
messages: {
  "randomIdGenerated-12asd12": "hello",
  "randomIdGenerated-12323D123": "Hi, how are you",
}
So in your case your first approach is good enough, since you don't need a unique node for counting the number of events added.

Changing a DB View dynamically according to the current user-group

We are currently digging into Amazon Redshift and testing different functionalities.
One of our basic requirements is that we will define different user groups which in turn will be granted access to different views.
One way to go about this would be to implement one view separately for each user-group. However, since we have a lot of user-groups that share almost exactly the same information needs, I'm looking for a way to implement this more dynamically in Redshift.
For instance, let's say I have a user group called users_london and another one called users_berlin. Both will have access to a view called v_employee_master_data which contains the columns employee_name, employee_job_title and employee_city.
Both groups share the same scope of information with one exception - the column employee_city.
In essence, the view should be pre-filtered for a certain value in the column employee_city according to the currently logged-in user-group.
In SQL - something like this:
For the usergroup users_london:
SELECT * FROM v_employee_master_data WHERE employee_city = 'London';
For the usergroup users_berlin:
SELECT * FROM v_employee_master_data WHERE employee_city = 'Berlin';
Now to make the connection back to Amazon Redshift: does the underlying DB runtime provide out-of-the-box functionality to somehow catch the currently logged-in user-group as a kind of global variable and alter the SQL statement according to the value of that variable?
It is possible to do this:
Get the current user:
select current_user
Find what group it belongs to:
select groname from pg_group where current_user_id = any(grolist);
Extract the city and capitalize it:
select initcap(substring(groname from 'users_(.*)')) from pg_group where current_user_id = any(grolist);
Now you have your city based on the "user". So just inject it into the view:
... WHERE employee_city = initcap(substring(groname from 'users_(.*)')) ...
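Putting those pieces together, the view itself could embed the lookup. This is only a sketch: the base table employees is a hypothetical name, and the group-to-city expression is simply reused from the steps above:

-- Sketch: "employees" is a hypothetical base table name.
-- The scalar subquery resolves the current user's group (users_london,
-- users_berlin, ...) to a city, so each group sees only "its" rows.
CREATE VIEW v_employee_master_data AS
SELECT employee_name, employee_job_title, employee_city
FROM employees
WHERE employee_city = (
    SELECT initcap(substring(groname from 'users_(.*)'))
    FROM pg_group
    WHERE current_user_id = ANY(grolist)
);

Note that this assumes each user belongs to exactly one users_* group; if a user can belong to several, the scalar subquery returns multiple rows and errors out, so a WHERE employee_city IN (...) form would be the safer choice.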

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider a contrived example schema with users, products, stores and orders.
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit: It seems what I'm talking about is an activity stream. The suggestion in this answer to filter by time first sounds pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for matching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
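For concreteness, the tables behind this design might look something like the sketch below. This is an assumption pieced together from the description above and the queries that follow; only created, object_ref, object_id, user_id and subscribed are taken from those queries, the rest of the names are hypothetical:

-- Sketch only: names not used in the queries below are hypothetical.
CREATE TABLE activity (
    id          uuid PRIMARY KEY,
    created     timestamptz NOT NULL,
    object_id   uuid NOT NULL,    -- generic reference to the target, no foreign key
    object_type text NOT NULL,    -- stands in for the enum column described above
    object_ref  uuid[] NOT NULL,  -- the target object and its parents
    title       text,             -- denormalised title (see the note at the end)
    message     text
);

CREATE TABLE subscription (
    user_id    uuid NOT NULL,
    object_id  uuid NOT NULL,     -- matched against entries in activity.object_ref
    subscribed boolean NOT NULL   -- false records an unsubscribe override
);

-- Filter by time first, then (optionally) by array containment.
CREATE INDEX activity_created_idx ON activity USING btree (created);
CREATE INDEX activity_object_ref_idx ON activity USING gin (object_ref);

The GIN index relies on the built-in operator class for arrays, which supports the containment operator used in the second query below.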
Update: this is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
    SELECT DISTINCT ON (activity.created, activity.id)
        *
    FROM activity
    LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
        ON true
    LEFT OUTER JOIN subscription
        ON subscription.object_id = act_ref.act_ref
    WHERE activity.created BETWEEN :lower_date AND :upper_date
        AND subscription.user_id = :user_id
    ORDER BY activity.created DESC,
        activity.id,
        act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment operator @>, with no subquery:
SELECT *
FROM activity
JOIN subscription
    ON activity.object_ref @> ARRAY[subscription.object_id]
WHERE subscription.user_id = :user_id
    AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).
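To round this out, here is a purely illustrative INSERT showing how such an activity row might be written, with the denormalised title and the object_ref array holding the target and its parents. Table and column names follow the hypothetical sketch above, and gen_random_uuid() assumes PostgreSQL 13+ or the pgcrypto extension:

-- Hypothetical example following the sketch above.
INSERT INTO activity (id, created, object_id, object_type, object_ref, title, message)
VALUES (
    gen_random_uuid(),
    now(),
    :product_id,
    'product',
    ARRAY[:product_id, :category_id, :store_id, :organisation_id]::uuid[],
    :product_title,    -- denormalised copy; survives deletion of the product
    'User A modified product Foo'
);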