Is it possible to Group By a field extracted from JSON input in a Siddhi Query? - complex-event-processing

I currently have a stream with one string attribute that contains a JSON event.
This stream receives different events, to which I want to apply JSON path expressions so that I can use the extracted attributes in filters and functions.
JsonPath extractors work like a charm in filters and selectors; unfortunately, I have not been able to use them in the 'Group By' clause.
I am actually doing this in an embedded Siddhi app with the siddhi-execution-json extension added manually, but for the discussion, so that everybody can easily check and test it, I will paste an example app that works on WSO2 Stream Processor.
The objective looks like the following app:
@App:name("Group_by_json_attribute")
define stream JsonStream(json string);
@sink(type='log')
define stream LogStream(myField string, count long);
@info(name='query1')
from JsonStream#window.time(10 sec)
select json:getString(json, '$.myField') as myField, count() as count
group by myField having count > 1
insert into LogStream;
and it can accept the following events:
{"myField": "my_value"}
However, this query will raise the error:
Cannot find attribute type as 'myField' does not exist in 'JsonStream'; define stream JsonStream(json string)
I have also tried to use the JSON extractor directly in the 'Group by' clause:
group by json:getString(json, '$.myField') as myField having count > 1
However, the error now is:
mismatched input ':' expecting {',', ORDER, LIMIT, OFFSET, HAVING, INSERT, DELETE, UPDATE, RETURN, OUTPUT}
which suggests that extension functions are not expected here.
I am just wondering if it is possible to group by attributes not directly defined in the input stream. In this case it is a field extracted from a JSON object, but it could be any other function that generates a new attribute.
I am using the following versions from the Maven Central repository:
Siddhi: io.siddhi:siddhi-core:5.0.1
siddhi-execution-json: io.siddhi.extension.execution.json:siddhi-execution-json:2.0.1
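For reference, those coordinates correspond to the following Maven dependencies (pom.xml):
<dependency>
    <groupId>io.siddhi</groupId>
    <artifactId>siddhi-core</artifactId>
    <version>5.0.1</version>
</dependency>
<dependency>
    <groupId>io.siddhi.extension.execution.json</groupId>
    <artifactId>siddhi-execution-json</artifactId>
    <version>2.0.1</version>
</dependency>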
(Edit) Clarification
The objective is to use attributes not directly defined in the stream in the Group By clause.
The reason is that I currently have an embedded app which defines the whole set of input streams coming from external sources formatted as JSON, plus a set of output streams to inform external components when a query matches.
This app allows users to create custom queries on this set of predefined streams, but they are not able to create streams of their own.
Many thanks!

It seems the group by fields are expected to come from the query's input stream, in this case JsonStream. Use another query before this one to do the extraction, and perform the aggregation and filtering in the following query:
@App:name("Group_by_json_attribute")
define stream JsonStream(json string);
@sink(type='log')
define stream LogStream(myField string, count long);
@info(name='extract_stream')
from JsonStream
select json:getString(json, '$.myField') as myField
insert into ExtractedStream;
@info(name='query1')
from ExtractedStream#window.time(10 sec)
select myField, count() as count
group by myField
having count > 1
insert into LogStream;
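Since the question mentions an embedded Siddhi app, here is a minimal runner for the two-query version; a sketch assuming siddhi-core 5.0.1 with siddhi-execution-json 2.0.1 on the classpath (Siddhi auto-loads classpath extensions, so json:getString should resolve), and GroupByJsonRunner is just an illustrative name:
import io.siddhi.core.SiddhiAppRuntime;
import io.siddhi.core.SiddhiManager;
import io.siddhi.core.stream.input.InputHandler;

public class GroupByJsonRunner {
    public static void main(String[] args) throws InterruptedException {
        String siddhiApp =
            "define stream JsonStream(json string); " +
            "@sink(type='log') define stream LogStream(myField string, count long); " +
            "from JsonStream select json:getString(json, '$.myField') as myField " +
            "insert into ExtractedStream; " +
            "from ExtractedStream#window.time(10 sec) " +
            "select myField, count() as count group by myField having count > 1 " +
            "insert into LogStream;";

        SiddhiManager siddhiManager = new SiddhiManager();
        SiddhiAppRuntime runtime = siddhiManager.createSiddhiAppRuntime(siddhiApp);
        runtime.start();

        // send two matching events so the 'having count > 1' condition fires
        InputHandler handler = runtime.getInputHandler("JsonStream");
        handler.send(new Object[]{"{\"myField\": \"my_value\"}"});
        handler.send(new Object[]{"{\"myField\": \"my_value\"}"});

        Thread.sleep(500); // give the log sink a moment before shutting down
        siddhiManager.shutdown();
    }
}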

Related

ADF Copy function comparing watermark against isnull(date1,date2)

Forum Newbie...
I want to use the ADF Copy activity to carry out incremental table extracts from one Azure DB to another. Every table in the database that I need has the same two relevant fields, i.e. date1 and date2. For watermark comparison purposes, I need to use isnull(date1,date2), but I am unsure how to do this, i.e. how I can add this consistent derived value to the source as an additional field, perhaps via the Query or Stored Procedure option on the source, using the @item().source.schema and @item().source.table values that have already been generated as parameters..?
You can use the Query option in the Copy data activity source and add a new column in the query itself to get the result of isnull(date1,date2), using the parameter values for the schema and table name instead of hardcoding them, as shown below.
In the source, select the Query option under Use query and add dynamic content that concat()s the select statement with the parameter values.
@concat('select *, isnull(date1,date2) as final_dt from ',pipeline().parameters.schema,'.',pipeline().parameters.table)
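For example, with hypothetical parameter values schema = 'dbo' and table = 'SalesOrders', that expression resolves to:
select *, isnull(date1,date2) as final_dt from dbo.SalesOrders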

How to convert a response from KSQL - UDF returning JSON array to columns

I have a custom UDF called getCityStats(string city, double distance) which takes two arguments and returns an array of JSON strings (objects) as follows:
{"zipCode":"90921","mode":3.54}
{"zipCode":"91029","mode":7.23}
{"zipCode":"96928","mode":4.56}
{"zipCode":"90921","mode":6.54}
{"zipCode":"91029","mode":4.43}
{"zipCode":"96928","mode":3.96}
I would like to process them in a KSQL table creation query as follows:
create table city_stats
as
select
zipCode,
avg(mode) as mode
from
(select
getCityStats(city,distance) as (zipCode,mode)
from
city_data_stream
) t
group by zipCode;
In other words, can KSQL handle a tuple type, so that an array of JSON strings can be processed and returned as indicated above in a table creation query?
No, KSQL doesn't currently support the syntax you're suggesting. Whilst KSQL can work with arrays, it doesn't yet support any kind of explode function, so you can only reference specific index points in the array.
Feel free to view, and upvote if appropriate, these issues: #527 and #1830, or indeed raise your own if they don't cover what you want to do.
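If a fixed-position workaround is acceptable, here is a hedged sketch under that limitation: it assumes the UDF returns ARRAY<VARCHAR>, uses the built-in EXTRACTJSONFIELD function to parse each JSON string, and addresses one index point per column (note that the array index base differs between KSQL versions, so check yours):
CREATE STREAM city_stats_flat AS
  SELECT
    EXTRACTJSONFIELD(getCityStats(city, distance)[0], '$.zipCode') AS zipCode,
    CAST(EXTRACTJSONFIELD(getCityStats(city, distance)[0], '$.mode') AS DOUBLE) AS mode
  FROM city_data_stream;
This only yields one array element per input row, which is exactly the limitation the explode-style issues above track.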

Querying on multiple LINKMAP items with OrientDB SQL

I have a class that contains a LINKMAP field called links. This class is used recursively to create arbitrary hierarchical groupings (something like the time-series example, but not with the fixed year/month/day structure).
A query like this:
select expand(links['2017'].links['07'].links['15'].links['10'].links) from data where key='AAA'
Returns the actual records contained in the last layer of "links". This works exactly as expected.
But a query like this (note the 10,11 in the second to last layer of "links"):
select expand(links['2017'].links['07'].links['15'].links['10','11'].links) from data where key='AAA'
Returns two rows of the last layer of "links" instead:
{"1000":"#23:0","1001":"#24:0","1002":"#23:1"}
{"1003":"#24:1","1004":"#23:2"}
Using unionAll or intersect (with or without UNWIND) results in this single record:
[{"1000":"#23:0","1001":"#24:0","1002":"#23:1"},{"1003":"#24:1","1004":"#23:2"}]
But nothing I've tried (including various attempts at "compound" SELECTs) will get the expand to work as it does with the original example (i.e. return the actual records represented in the last LINKMAP).
Is there a SQL syntax that will achieve this?
Note: Even this (slightly modified) example from the ODB docs does not result in a list of linked records:
select expand(records) from
(select unionAll(years['2017'].links['07'].links['15'].links['10'].links, years['2017'].links['07'].links['15'].links['11'].links) as records from data where key='AAA')
Ref: https://orientdb.com/docs/2.2/Time-series-use-case.html
I'm not sure what you want to achieve, but I think it's worth trying values():
select expand(links['2017'].links['07'].links['15'].links['10','11'].links.values()) from data where key='AAA'
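If that works, it will be because links['10','11'] yields a collection of maps, and values() should unwrap each map into its link values, e.g. {"1000":"#23:0","1001":"#24:0","1002":"#23:1"} becomes [#23:0, #24:0, #23:1], so that expand() has actual record links to traverse instead of the maps themselves. (That is my reading of the traversal, not something tested against your schema.)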

ormlite select count(*) as typeCount group by type

I want to do something like this in ORMLite:
SELECT *, COUNT(title) as titleCount from table1 group by title;
Is there any way to do this via QueryBuilder without the need for queryRaw?
The documentation states that the use of COUNT() and the like necessitates the use of selectRaw(). I hoped for a way around this, since not having to write my SQL as strings is the main reason I chose to use ORMLite.
http://ormlite.com/docs/query-builder
selectRaw(String... columns): Add raw columns or aggregate functions (COUNT, MAX, ...) to the query. This will turn the query into something only suitable for using as a raw query. This can be called multiple times to add more columns to select. See section Issuing Raw Queries.
Further information on the use of selectRaw(), as I was attempting much the same thing:
The documentation states that if you use selectRaw() it will "turn the query into" one that is supposed to be called by queryRaw().
What it does not explain is that, while multiple calls to selectColumns() or selectRaw() are normally valid (if you exclusively use one or the other), a call to selectRaw() after selectColumns() has the 'hidden' side effect of wiping out any columns you previously selected with selectColumns().
I believe the ORMLite documentation for selectRaw() would be improved by a note that it is not intended to be mixed with selectColumns().
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectColumns("emailAddress"); // This column is not selected due to later use of selectRaw()!
qb.selectRaw("COUNT (emailAddress)");
ORMLite examples are not as plentiful as I'd like, so here is a complete example of something that works:
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectRaw("emailAddress"); // This can also be done with a single call to selectRaw()
qb.selectRaw("COUNT (emailAddress)");
qb.groupBy("emailAddress");
GenericRawResults<String[]> rawResults = qb.queryRaw(); // Returns results with two columns
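The raw results can then be consumed row by row; GenericRawResults is iterable over String[] rows (a sketch, with the values parsed by hand):
for (String[] row : rawResults) {
    String emailAddress = row[0];        // first selected column
    long count = Long.parseLong(row[1]); // result of COUNT (emailAddress)
    // ... use the aggregated values ...
}
rawResults.close(); // always close raw results to release the underlying cursor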
Is there any way to do this via QueryBuilder without the need for queryRaw(...)?
The short answer is no, because ORMLite wouldn't know what to do with the extra count value. If you had a Table1 entity with a DAO definition, what field would COUNT(title) go into? Raw queries give you the power to select various fields, but then you need to process the results yourself.
With the code right now (v5.1), you can define a custom RawRowMapper and then use the dao.getRawRowMapper() method to process the results for Table1 and tack on the titleCount field by hand.
I've got an idea how to accomplish this in a better way in ORMLite. I'll look into it.
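Until then, here is a hedged sketch of that RawRowMapper approach (Table1WithCount is a hypothetical holder class, not part of ORMLite, and whether the DAO's own row mapper tolerates the extra column may depend on your entity):
GenericRawResults<Table1WithCount> results = table1Dao.queryRaw(
    "SELECT *, COUNT(title) AS titleCount FROM table1 GROUP BY title",
    new RawRowMapper<Table1WithCount>() {
        public Table1WithCount mapRow(String[] columnNames, String[] resultColumns) throws SQLException {
            Table1WithCount holder = new Table1WithCount();
            // let the DAO map the entity columns...
            holder.table1 = table1Dao.getRawRowMapper().mapRow(columnNames, resultColumns);
            // ...then tack on the aggregate by hand (assumed to be the last column)
            holder.titleCount = Long.parseLong(resultColumns[resultColumns.length - 1]);
            return holder;
        }
    });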

How do I prevent sql injection if I want to build a query in parts within the fatfree framework?

I am using the Fat-Free framework, and on the front end I am using the jQuery DataTables plugin with server-side processing. My server-side controller may therefore receive a variable amount of information, for example a variable number of columns to sort on, a variable number of filtering options, and so forth. If I don't receive any request for sorting, I don't need an ORDER BY portion in my query. So I want to generate the query string in parts according to these conditions and join them at the end to get the final query for execution. But if I do it this way, I won't have any data sanitization, which is really bad.
Is there a way I can use the framework's internal sanitization methods to build the query string in parts? And is there a better/safer way to do this than how I am approaching it?
Just use parameterized queries. They are here to prevent SQL injection.
Two possible syntaxes are allowed:
with question mark placeholders:
$db->exec('SELECT * FROM mytable WHERE username=? AND category=?',
array(1=>'John',2=>34));
with named placeholders:
$db->exec('SELECT * FROM mytable WHERE username=:name AND category=:cat',
array(':name'=>'John',':cat'=>34));
EDIT:
The parameters are there to filter the field values, not the column names, so to answer your question more specifically:
you must pass filtering values through parameters to avoid SQL injection
you can check that column names are valid by testing them against a whitelist array
Here's a quick example:
$columns=array('category','age','weight');//columns available for filtering/sorting
$sql='SELECT * FROM mytable';
$params=array();
//filtering
$ctr=0;
if (isset($_GET['filter']) && is_array($_GET['filter']))
foreach($_GET['filter'] as $col=>$val)
if (in_array($col,$columns,TRUE)) {//test for column name validity
$sql.=($ctr?' AND ':' WHERE ')."$col=?";
$params[$ctr+1]=$val;
$ctr++;
}
//sorting
$ctr=0;
if (isset($_GET['sort']) && is_array($_GET['sort']))
foreach($_GET['sort'] as $col=>$asc)
if (in_array($col,$columns,TRUE)) {//test for column name validity
$sql.=($ctr?',':' ORDER BY ')."$col ".($asc?'ASC':'DESC');
$ctr++;
}
//execution
$db->exec($sql,$params);
NB: if column names contain weird characters or spaces, they must be quoted: $db->quotekey($col)