Serilog FilterExpression to check if all string Properties of a LogEvent meet a length constraint?

In my appsettings.json, I want to filter Serilog log events to include only events where all scalar properties with string values meet a certain length constraint. In C#, the predicate would be
logEvent => logEvent.Properties.Values
    .OfType<ScalarValue>()
    .Select(x => x.Value)
    .OfType<string>()
    .All(x => x.Length <= 128);
In a JSON approach, based on the docs, I sort of think there may be a hack with regular expressions like
Contains(#Properties[*], /^.{0,128}$/)
or maybe
Length(#Properties[*]) <= 128
but apparently neither of these works.
Any ideas how to check whether all string properties are within the length limit?

The above filter expressions do not work because the Serilog filter expression compiler treats Properties specially: at some point in the internal compilation pipeline, Properties['somekey'] is replaced with somekey.
This is logical, because Properties['somekey'] is in effect an access to a property named somekey. The wildcards ? and * are not exempt from this rule.
This explains why the examples in the question do compile internally to some kind of Func<FilterExpression, object>, but fail to produce the results I expected.
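Since the JSON route is blocked by that substitution, one fallback is to apply the predicate from the question in code. A minimal sketch using Serilog's Filter.ByIncludingOnly (the console sink and the wiring are my assumptions, not part of the original question):

using Serilog;
using Serilog.Events;
using System.Linq;

Log.Logger = new LoggerConfiguration()
    // Keep only events whose scalar string properties are all <= 128 chars.
    .Filter.ByIncludingOnly(logEvent => logEvent.Properties.Values
        .OfType<ScalarValue>()
        .Select(sv => sv.Value)
        .OfType<string>()
        .All(s => s.Length <= 128))
    .WriteTo.Console() // assumed sink; any sink works
    .CreateLogger();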

Related

What is the exact meaning of `pl.col("")` expression with empty string argument

The example in the section about 'list context' in the polars-book uses the pl.col("") expression with an empty string "" as the argument.
# the percentage rank expression
rank_pct = pl.col("").rank(reverse=True) / pl.col("").count()
From the context and the output I can guess what the pl.col("") expression does. But the API documentation does not seem to cover the case of an empty string as the argument to pl.col, and I would like to know its precise meaning in this use case. Any helpful answer is greatly appreciated!
The precise meaning is to act as a 'root' Expression to start a chain of Expressions inside a List context, i.e., inside arr.eval(....). I'll need to take a step back to explain...
'Root' Expressions
In general, only certain types of Expressions are allowed to start (or be the 'root' of) an Expression. These 'root' Expressions work with a particular context (select, filter, with_column, etc.) to identify what data is being addressed.
Some examples of root Expressions are polars.col, polars.apply, polars.map, polars.first, polars.last, polars.all, and polars.any. (There are others.)
Once we declare a "root" Expression, we can then chain other, more-generic Expressions to perform work. For example, polars.col("my_col").sum().over('other_col').alias('name').
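For instance, a minimal illustration of a root Expression plus a chain in a select context (the frame and column names are invented for this sketch):

import polars as pl

# Toy data; "my_col" and "other_col" are made-up names.
df = pl.DataFrame({"my_col": [1, 2, 3], "other_col": ["a", "a", "b"]})

# pl.col is the root; sum, over and alias are chained onto it.
print(df.select(pl.col("my_col").sum().over("other_col").alias("name")))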
The List context
A List context is slightly different from most contexts. In a List context, there is no ambiguity as to what data is being addressed. There is only a list of data. As such, polars.col and polars.first were chosen as "root" Expressions to use within a List context.
Normally, a polars.col root Expression contains information such as a string to denote a column name or a wildcard expression to denote multiple columns. However, this is not needed in a List context. There is only one option - the single list itself.
As such, any string provided to polars.col is ignored in a List context. For example, adapting the code from the Polars Guide, this also works:
# Notice that I'm referring to columns that do not exist...
rank_pct = pl.col("foo").rank(reverse=True) / pl.col("bar").count()
Since any string provided to a polars.col Expression will be ignored in a List context, a single empty string "" is often supplied, just to prevent unnecessary clutter.
Edit: New polars.element expression
Polars now has a polars.element expression designed for use in list evaluation contexts. Using polars.element is now considered idiomatic for list contexts, as it avoids the confusion associated with using col("").
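As a hedged sketch against the polars API of the same era as the guide (arr.eval, with_column and reverse=; later versions renamed these), the percentage-rank example could be written with pl.element() like this; the grades column is invented:

import polars as pl

# "grades" is a made-up list column for illustration.
df = pl.DataFrame({"grades": [[66, 79, 54], [100, 81]]})

out = df.with_column(
    pl.col("grades")
    .arr.eval(pl.element().rank(reverse=True) / pl.element().count())
    .alias("rank_pct")
)
print(out)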

Drools : applying same rules on all attributes

I am new to Drools. We are trying to create basic validation rules, like a null check, using the Drools and Scala frameworks.
I have a source file with 200 attributes and need to apply the null-check rule to all of them.
Is there an easy way to do this, or do I need to create 200 rules, one per attribute?
Thanks in advance.
Assuming you have a POJO ("plain old Java object": getters/setters and some private variables to hold values) or a modern Java record (effectively the same thing), then the answer is no: you need separate rules. For this scenario, the only way to check that the field "name" is null is to assert against that field directly, like this:
rule "example - name is null"
when
ExampleObject( name == null )
then
System.out.println("Name is null.");
end
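For completeness, firing this rule against a session might look like the following (standard Drools API; the fact class and session variable are assumed):

// ExampleObject is the assumed fact type from the rule above.
ExampleObject obj = new ExampleObject(); // name left null
kieSession.insert(obj);
kieSession.fireAllRules(); // prints "Name is null."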
However there exist other data structures -- for example, Map and its sibling types -- where you can reference the fields by name. In this case you could theoretically iterate through all of the field names and find the one whose value is empty.
So, for example, Map has a keySet() method which returns a set of fields -- you could iterate through this keyset and for each key check that there is a non-null value present in the map.
rule "example with map"
when
$map: Map()
$keys: Set() from $map.keySet()
$key: String() from $keys
String( this == null ) from $map.get($key)
// or this might work, not sure if the "this" keyword allows this syntax:
// Map( this[$key] == null ) from $map
then
System.out.println($key + " is missing/null");
end
This would require converting your Java object into a Map before passing it into the rules.
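For illustration, one common way to do that conversion uses Jackson (my assumption; any POJO-to-Map conversion works):

// Assumes com.fasterxml.jackson.databind.ObjectMapper on the classpath.
Map<String, Object> asMap = new ObjectMapper().convertValue(pojo, Map.class);
kieSession.insert(asMap);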
However, I DO NOT RECOMMEND this approach. Maps are extremely unperformant in rules because of how they serialize/deserialize; you will use a ton of unnecessary heap when firing them. If you peek at the source code of HashMap, for example, you'll see that it actually contains a bunch of "child" data structures, like the entry set and key set. When using "new", those child structures are only initialized if and when you need them; but when serializing/deserializing, they're created immediately even if you don't need them.
Another solution would be to use Java reflection to get the list of declared field names, and then iterate through those names using reflection to get the value out for that field. In your place I'd do this in Java (reflection is problematic enough without trying to do it in Drools) and then if necessary invoke such a utility function from Drools.
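A minimal sketch of such a utility (the class and method names are mine, and this is untested):

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class NullFieldFinder {
    // Collects the names of all declared fields whose value is null.
    public static List<String> findNullFields(Object obj) throws IllegalAccessException {
        List<String> nullFields = new ArrayList<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            if (field.get(obj) == null) {
                nullFields.add(field.getName());
            }
        }
        return nullFields;
    }
}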

Zend\db\sql - prepareStatementForSqlObject - still need to bind or worry about sql injection?

I'm using ZF 2.4, and this example uses Zend\Db\Sql. Do I need to worry about SQL injection, or do I still need to quote() or escape anything, if I already use prepareStatementForSqlObject()? Does the example below already use bind variables?
https://framework.zend.com/manual/2.4/en/modules/zend.db.sql.html
use Zend\Db\Sql\Sql;
$sql = new Sql($adapter);
$select = $sql->select();
$select->from('foo');
$select->where(array('id' => $id));
$statement = $sql->prepareStatementForSqlObject($select);
$results = $statement->execute();
The Select class will cleverly check your predicate(s) and add them to the query in a safe manner to prevent SQL injection. I'd recommend taking a look at the source for yourself, so I'll point you to the process and the classes responsible for this in the latest ZF version.
Predicate Processing
Take a look at the class PredicateSet. The method \Zend\Db\Sql\Predicate\PredicateSet::addPredicates determines the best way to handle your predicate based on its type. In your case you are using an associative array. Every item in that array will be checked and processed based on type:
If an abstraction replacement character (question mark) is found, it will be turned into an Expression.
If the value is NULL, an IS NULL check will be performed on the column found in the key: WHERE key IS NULL.
If the value is an array, an IN check will be performed on the column found in the key: WHERE key IN (arrayVal1, arrayVal2, ...).
Otherwise, the predicate will be a new Operator of the type 'equals': WHERE key = value.
In each case the final predicate to be added to the Select will implement PredicateInterface.
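To make those cases concrete, a hedged sketch (column names invented; the exact placeholder names in the generated SQL vary by driver):

$select->where(array(
    'id'     => $id,                  // equals operator: WHERE id = ?
    'parent' => null,                 // WHERE parent IS NULL
    'status' => array('new', 'open')  // WHERE status IN (?, ?)
));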
Preparing the statement
The method \Zend\Db\Sql\Sql::prepareStatementForSqlObject instructs its adapter (e.g. PDO) to create a statement that will be prepared. From here it gets a little more complicated.
\Zend\Db\Sql\Sql is where the real magic happens: in the method \Zend\Db\Sql\Sql::createSqlFromSpecificationAndParameters, the function vsprintf is used to build the query strings, as you can see here.
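Roughly, vsprintf fills a format string from an array of already-prepared fragments; a toy illustration (not ZF's actual specification strings):

// Illustrative only; ZF builds its format strings internally.
echo vsprintf('SELECT %s FROM %s WHERE %s', array('"id"', '"foo"', '"id" = ?'));
// prints: SELECT "id" FROM "foo" WHERE "id" = ?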
Note: please consider using the new docs.framework.zend.com website from now on. That website is the leading source of documentation for the latest version.

Spring Data Neo4j - ORDER BY {order} fails

I have a query where the result should be ordered depending on the passed parameter:
#Query("""MATCH (u:User {userId:{uid}})-[:KNOWS]-(:User)-[h:HAS_STUFF]->(s:Stuff)
WITH s, count(h) as count ORDER BY count {order}
RETURN o, count SKIP {skip} LIMIT {limit}""")
fun findFromOthersByUserIdAndSortByAmountOfStuff(
#Param("uid") userId: String,
#Param("skip") skip: Int,
#Param("limit") limit: Int,
#Param("order) order: String): List<StuffWithCountResult>
For the order parameter I use the following enum and its sole method:
enum class SortOrder {
    ASC,
    DESC;

    fun toNeo4JSortOrder(): String =
        when (this) {
            ASC -> ""
            DESC -> "DESC"
        }
}
It seems that SDN does not handle the {order} parameter properly. On execution, I get an exception saying:
Caused by: org.neo4j.kernel.impl.query.QueryExecutionKernelException: Invalid input 'R': expected whitespace, comment or a relationship pattern (line 3, column 5 (offset: 244))
" RETURN o, count SKIP {skip} LIMIT {limit}"
^
If I remove the parameter from the Cypher statement or replace it with a hardcoded DESC, the method succeeds. I believe it's not because of the enum, since I use (other) enums in other repository methods and all those methods succeed. I already tried different parameter naming, like sortOrder, but that did not help.
What am I missing here?
This is the wrong model for changing sorting and paging information. You can skip to the answer below for using those options, or continue reading for an explanation of what is wrong in your code as it stands.
You cannot bind where things aren't allowed to be bound:
You cannot bind a parameter into a syntax element of the query that is not set up for parameter binding. Parameter binding doesn't do simple string substitution (because you would be open to injection attacks) but rather uses binding APIs to bind parameters. You are treating the query annotation as if it performed string substitution, and that is not what is happening.
The parameter binding docs for Neo4j and the Java manual for Query Parameters show exactly where you can bind; the only places allowed are:
in place of String Literals
in place of Regular Expressions
String Pattern Matching
Create node with properties, as the properties
Create multiple nodes with properties, as the properties
Setting all properties of a node
numeric values for SKIP and LIMIT
as the Node ID
as multiple Node IDs
Index Value
Index Query
Nothing there says that what you are trying, binding in the ORDER BY clause, is allowed.
That isn't to say that the authors of Spring Data couldn't work around this and allow binding in other places, but it doesn't appear they have done more than what the Neo4j Java API allows.
You can instead use the Sort class:
(the fix to allow this is marked for version 4.2.0.M1 which is a pre-release as of Sept 8, 2016, see below for using milestone builds)
Spring Data has a Sort class; if your @Query-annotated method has a parameter of this type, it should apply sorting and allow it to dynamically modify the query.
I assume the code would look something like (untested):
#Query("MATCH (movie:Movie {title={0}})<-[:ACTS_IN]-(actor) RETURN actor")
List<Actor> getActorsThatActInMovieFromTitle(String movieTitle, Sort sort);
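A hypothetical call site (SDN 4.2-era Spring Data API, where Sort had public constructors; the repository variable and sort property name are my guesses):

// Untested; "actor.name" as a sort property is an assumption.
List<Actor> actors = repository.getActorsThatActInMovieFromTitle(
        "The Matrix", new Sort(Sort.Direction.DESC, "actor.name"));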
Or you can use the PageRequest class / Pageable interface:
(the fix to allow this is marked for version 4.2.0.M1 which is a pre-release as of Sept 8, 2016, see below for using milestone builds)
In current Spring Data + Neo4j docs you see examples using paging:
#Query("MATCH (movie:Movie {title={0}})<-[:ACTS_IN]-(actor) RETURN actor")
Page<Actor> getActorsThatActInMovieFromTitle(String movieTitle, PageRequest page);
(sample from Cypher Examples in the Spring Data + Neo4j docs)
And this PageRequest class also allows sorting parameterization. Anything that implements Pageable will do the same. Using Pageable instead is probably more proper:
#Query("MATCH (movie:Movie {title={0}})<-[:ACTS_IN]-(actor) RETURN actor")
Page<Actor> getActorsThatActInMovieFromTitle(String movieTitle, Pageable page);
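Untested, but adapting your own repository method to this model might look like the sketch below; it assumes SDN 4.2+ actually applies the Pageable to @Query methods and that the sort property (count) resolves against the query:

// Hedged sketch; SKIP/LIMIT and ORDER BY are supplied by the Pageable.
@Query("""MATCH (u:User {userId:{uid}})-[:KNOWS]-(:User)-[h:HAS_STUFF]->(s:Stuff)
    WITH s, count(h) as count
    RETURN s, count""")
fun findFromOthersByUserIdAndSortByAmountOfStuff(
    @Param("uid") userId: String,
    pageable: Pageable): Page<StuffWithCountResult>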
You might be able to use SpEL in earlier versions:
As an alternative, you can look at using SpEL expressions to do substitutions in other areas of the query. I am not familiar with it, but the docs say:
Since this mechanism exposes special parameter types like Sort or Pageable as well, we’re now able to use pagination in native queries.
But the official docs seem to say it is more limited.
And you should know this other information:
Here is someone reporting your exact problem in a GitHub issue, which leads to issue DATAGRAPH-653, marked as fixed in version 4.2.0.M1. It references other SO questions, such as Paging and sorting in Spring Data Neo4j 4, which are outdated and no longer correct, so you should ignore those.
Finding Spring Data Neo4j Milestone Builds:
You can view the dependencies information for any release on the project page. And for the 4.2.0.M1 build the information for Gradle (you can infer Maven) is:
dependencies {
    compile 'org.springframework.data:spring-data-neo4j:4.2.0.M1'
}

repositories {
    maven {
        url 'https://repo.spring.io/libs-milestone'
    }
}
Any newer final release should be used instead.

Anorm: WHERE condition, conditionally

Consider a repository/DAO method like this, which works great:
def countReports(customerId: Long, createdSince: ZonedDateTime) =
  DB.withConnection { implicit c =>
    SQL"""SELECT COUNT(*)
          FROM report
          WHERE customer_id = $customerId
          AND created >= $createdSince
       """.as(scalar[Int].single)
  }
But what if the method is defined with optional parameters:
def countReports(customerId: Option[Long], createdSince: Option[ZonedDateTime])
Point being, if either optional argument is present, use it in filtering the results (as shown above), and otherwise (in case it is None) simply leave out the corresponding WHERE condition.
What's the simplest way to write this method with optional WHERE conditions? As an Anorm newbie I struggled to find an example of this, but I suppose there must be some sensible way to do it (that is, without duplicating the SQL for each combination of present/missing arguments).
Note that the java.time.ZonedDateTime instance maps perfectly and automatically into a Postgres timestamptz when used inside the Anorm SQL call. (Trying to extract the WHERE condition as a string created with normal string interpolation outside the SQL call did not work: toString produces a representation the database does not understand.)
Play 2.4.4
One approach is to set up filter clauses such as
val customerClause =
  if (customerId.isEmpty) ""
  else " and customer_id={customerId}"
then substitute these into your SQL:
SQL(s"""
select count(*)
from report
where true
$customerClause
$createdClause
""")
.on('customerId -> customerId,
'createdSince -> createdSince)
.as(scalar[Int].singleOpt).getOrElse(0)
Using {variable} as opposed to $variable is, I think, preferable, as it reduces the risk of SQL injection attacks if someone calls your method with a malicious string. Anorm doesn't mind if you pass additional symbols that aren't referenced in the SQL (i.e. if a clause string is empty). Lastly, depending on the database(?), a count might return no rows, so I use singleOpt rather than single.
I'm curious as to what other answers you receive.
Edit: Anorm interpolation (i.e. SQL"...", an interpolation implementation beyond Scala's s"...", f"..." and raw"...") was introduced to allow the use of $variable as equivalent to {variable} with .on. And from Play 2.4, Scala and Anorm interpolation can be mixed, using $ for Anorm (SQL parameter/variable) and #$ for Scala (plain string). And indeed this works well, as long as the Scala interpolated string does not contain references to an SQL parameter. The only way, in 2.4.4, I could find to use a variable in a Scala interpolated string when using Anorm interpolation was:
val limitClause = if (nameFilter == "") "" else s"where name>'$nameFilter'"
SQL"select * from tab #$limitClause order by name"
But this is vulnerable to SQL injection (e.g. a string like it's will cause a runtime syntax exception). So, in the case of variables inside interpolated strings, it seems necessary to use the "traditional" .on approach with only Scala interpolation:
val limitClause = if (nameFilter == "") "" else "where name>{nameFilter}"
SQL(s"select * from tab $limitClause order by name").on('nameFilter -> nameFilter)
Perhaps in the future Anorm interpolation could be extended to parse the interpolated string for variables?
Edit 2: I'm finding there are some tables where the number of attributes that might or might not be included in the query changes from time to time. For these cases I'm defining a context class, e.g. CustomerContext. In this case class there are lazy vals for the different clauses that affect the SQL. Callers of the sql method must supply a CustomerContext, and the SQL will then have inclusions such as ${context.createdClause} and so on. This helps give consistency, as I end up using the context in other places (such as the total record count for paging, etc.).
Finally got this simpler approach posted by Joel Arnold to work in my example case, also with ZonedDateTime!
def countReports(customerId: Option[Long], createdSince: Option[ZonedDateTime]) =
  DB.withConnection { implicit c =>
    SQL("""
      SELECT count(*) FROM report
      WHERE ({customerId} is null or customer_id = {customerId})
      AND ({created}::timestamptz is null or created >= {created})
    """)
      .on('customerId -> customerId, 'created -> createdSince)
      .as(scalar[Int].singleOpt).getOrElse(0)
  }
The tricky part is having to use {created}::timestamptz in the null check. As Joel commented, this is needed to work around a PostgreSQL driver issue.
Apparently the cast is needed only for timestamp types; the simpler form ({customerId} is null) works with everything else. Also, comment if you know whether other databases require something like this, or if this is a Postgres-only peculiarity.
(While wwkudu's approach also works fine, this definitely is cleaner, as you can see comparing them side to side in a full example.)
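For reference, a call with only one filter present would then look like this (hypothetical usage):

// Count every report for customer 42, regardless of creation date.
val total = countReports(customerId = Some(42L), createdSince = None)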