Static vs dynamic queries in OpenEdge - progress-4gl

This question is very common, so let's look at the pros and cons of each in OpenEdge in terms of code readability, flexibility and, of course, performance.
Static queries:
+ readability: the convenient `buffer.field` notation
+ performance: generally higher (supposedly - comments welcome)
-/+ "global scope": all previously used buffers are in scope, which is handy but can
lead to ambiguity, so you may have to qualify fields with the table
name (table.field instead of just field)
- flexibility: you cannot vary the predicate expression much;
even the IF function is discouraged, since it can hurt performance
Dynamic queries:
+ flexibility: you can build the predicate expression entirely at runtime
+ flexibility: you can work with fields without specifying their names,
e.g. you can process every field of a record in a loop (see the sketch below)
+ flexibility: they are reusable (comments welcome on this point)
+/- "local scope": only the buffers passed to the `SET-BUFFERS` method are in scope
- readability: a little more code to write
- performance: slightly slower (not sure)
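A minimal sketch of the dynamic side, assuming the Customer table and its CustNum field from the sports demo database as stand-ins: the predicate is built at runtime and every field of each record is processed by handle, without naming any of them.
DEFINE VARIABLE hQuery  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hBuffer AS HANDLE  NO-UNDO.
DEFINE VARIABLE i       AS INTEGER NO-UNDO.

CREATE BUFFER hBuffer FOR TABLE "Customer".
CREATE QUERY hQuery.
hQuery:SET-BUFFERS(hBuffer).
hQuery:QUERY-PREPARE("FOR EACH Customer WHERE Customer.CustNum < 10"). /* predicate built at runtime */
hQuery:QUERY-OPEN().

REPEAT:
    hQuery:GET-NEXT().
    IF hQuery:QUERY-OFF-END THEN LEAVE.
    /* process every field of the current record without naming it */
    DO i = 1 TO hBuffer:NUM-FIELDS:
        MESSAGE hBuffer:BUFFER-FIELD(i):NAME
                STRING(hBuffer:BUFFER-FIELD(i):BUFFER-VALUE).
    END.
END.

hQuery:QUERY-CLOSE().
DELETE OBJECT hQuery.
DELETE OBJECT hBuffer.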
Additions and corrections are welcome, as are links to any related reading.

A static query's filter condition can be changed "on the fly" like so:
DEFINE QUERY q-name FOR table-name.
DEFINE VARIABLE h-qry AS HANDLE NO-UNDO.

/* grab the static query's handle and re-prepare it with a new predicate at runtime */
h-qry = QUERY q-name:HANDLE.
h-qry:QUERY-PREPARE("for each table-name where table-name.field-name = 1").
h-qry:QUERY-OPEN(). /* open it before fetching, as usual */
From here on, the query is treated the same as any normal static query.
readability: the "buffer-handle:buffer-field("field-name"):buffer-value" construct only applies to dynamic buffers. It's perfectly acceptable to use static buffers in dynamic queries (via BUFFER table-name:HANDLE), so a dynamic query can drive static buffers and it isn't always necessary to dereference a field through its handle.
performance: the last time I did a comparison, dynamic queries were slower than static queries for the same query condition. The upside is that they're more flexible than static queries.
reusability: once a dynamic query's buffer structure has been set it can't, AFAIK, be changed. It can, however, be re-opened with a new filter condition, just like a static query.
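To illustrate the readability point, here is a minimal sketch of a dynamic query driven by a static buffer (Customer is again just a stand-in table): after each fetch the usual buffer.field notation works, with no field handles needed.
DEFINE BUFFER bCustomer FOR Customer.          /* static buffer */
DEFINE VARIABLE hQuery AS HANDLE NO-UNDO.

CREATE QUERY hQuery.
hQuery:SET-BUFFERS(BUFFER bCustomer:HANDLE).   /* hand the static buffer to the dynamic query */
hQuery:QUERY-PREPARE("FOR EACH bCustomer WHERE bCustomer.CustNum = 1").
hQuery:QUERY-OPEN().
hQuery:GET-FIRST().
IF AVAILABLE bCustomer THEN
    MESSAGE bCustomer.Name.                    /* plain static buffer.field access */
hQuery:QUERY-CLOSE().
DELETE OBJECT hQuery.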

Related

SOQL Query in loop

The following Apex code:
for (List<MyObject__c> records : [SELECT Id, some_fields FROM MyObject__c]) {
    for (MyObject__c item : records) {
        // do stuff
    }
}
is a standard pattern for processing large quantities of data: it automatically breaks a large result set into chunks of 200 records, so each List will contain up to 200 objects.
Is there any difference between the approach above and the one below,
for (List<MyObject__c> records : Database.query('select...')) {
    for (MyObject__c item : records) {
        // do stuff
    }
}
used when the SOQL needs to be dynamic?
I have been told that the 2nd approach loses data, but I'm not sure of the details, and the challenge is to work out why and to prevent the data loss.
Where did you read that using Dynamic SOQL will result in data loss in this scenario? This is untrue, AFAIK. The Database.query() method behaves almost exactly the same as static SOQL, apart from a few small differences.
To answer your first question, the main difference between your approaches is that the first is static (the query is fixed), and the second allows you to dynamically define the query, as the name suggests. Using your second approach with Dynamic SOQL will still chunk the result records for you.
This difference does lead to some other subtle considerations - Dynamic SOQL doesn't support variable bindings other than basic operations. See Daniel Ballinger's idea for this feature here. I also need to add the necessary security caveat - if you're using Dynamic SOQL, do not construct the query based on input, as that would render your application vulnerable to SOQL injection.
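For example, a hedged sketch of building such a query from user input (userInput and the Name filter are invented for illustration), escaping the value before it is concatenated into the SOQL string:
// the filter value comes from untrusted input, so escape it before concatenation
String userInput = 'Acme';
String soql = 'SELECT Id FROM MyObject__c WHERE Name = \''
            + String.escapeSingleQuotes(userInput) + '\'';

for (List<MyObject__c> records : Database.query(soql)) {
    for (MyObject__c item : records) {
        // do stuff - the results are still chunked for you
    }
}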
Also, I'm just curious: what is your use case, and how large a quantity of data are you talking about here? It might be better to use Batch Apex, depending on your requirements.

How to speed up typed builders in 10gen official MongoDB C# driver?

Profiling my application, I discovered an unpleasant fact: the typed Update<> (and Query<>) builders evaluate lambda expressions on each request, consuming a lot of CPU. You can gain several percent of CPU by switching from the nice, modern typed Update<> builder, as in:
var u = Update<MyPerson>.Inc(e => e.Age, Age);
to good old static strings:
var u = Update.Inc("Age", Age);
The same problem exists with the Query<> builder: it also evaluates expressions on each request and wastes valuable CPU for nothing.
The official LINQ driver's page states:
The C# compiler translates all queries written using query syntax into
lambda syntax internally anyway, so there is no performance advantage
or penalty to choosing either style.
Yes, that's true if you are choosing between the two LINQ syntaxes. But there is a huge performance difference between using and not using LINQ at all. The overhead depends on how often your client performs queries; in my case it's a difference of more than 30%!
Is there any way to get the nice typed syntax and the performance together, I wonder?
Simple caching of builder results is not an answer, since we obviously need to pass dynamic parameters with each request. We need a way to "pre-compile" the CPU-expensive part (the lambda expression evaluation) while preserving the dynamic parameters (e.g. array indexes).
The paragraph you quoted:
The C# compiler translates all queries written using query syntax into
lambda syntax internally anyway, so there is no performance advantage or
penalty to choosing either style.
Refers to the two syntaxes available for writing LINQ queries.
var query =
from e in collection.AsQueryable<Employee>()
where e.FirstName == "John"
select e;
or
var query =
collection.AsQueryable<Employee>()
.Where(e => e.FirstName == "John");
It is not meant to imply that LINQ queries in general have no overhead. All LINQ queries must be translated at run time into equivalent MongoDB queries and that process requires a certain amount of CPU time.
We do hope to reduce that overhead if possible in the future, but keep in mind that this overhead occurs only at the client and does not affect the server.
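One workaround to consider - a sketch only, not part of the official driver - is to keep the typed lambda at the call site but reduce it to a plain element name yourself, then use the cheap string-based builders. It assumes the BSON element name matches the C# property name (i.e. no [BsonElement] remapping):
using System;
using System.Linq.Expressions;
using MongoDB.Driver.Builders;

static class Typed
{
    // e => e.Age  ->  "Age"; a simple cast, no expression compilation or LINQ translation
    public static string Name<T, TMember>(Expression<Func<T, TMember>> member)
    {
        return ((MemberExpression)member.Body).Member.Name;
    }
}

// Usage - reads almost like the typed builder, but hits the string overload:
// var u = Update.Inc(Typed.Name<MyPerson, int>(p => p.Age), Age);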

Lex reserved word rules versus lookup table

This web page suggests that if your lex program "has a large number of reserved words, it is more efficient to let lex simply match a string and determine in your own code whether it is a variable or reserved word."
My question is: More efficient where, and why? If it means compiling the lexer is faster, I don't really care about that because it is one step removed from the program which uses the lexer to parse input.
It seems that lex just uses your description to build a state machine that processes one character at a time. It does not seem logical that increasing the size of the state machine would necessarily make it any slower than using one rule for identifiers and then doing several string comparisons.
Additionally, if it turns out that there is some logical reason for this to make sense as an optimization, what would be considered a large number of reserved words? I have approximately 20, as compared to about 30 other rules for various things. Would that be considered a large number of reserved words? Should I attempt to use the same strategy for some of the other symbols?
I have attempted to google for a result, but the only relevant articles I found stated this strategy as though it were well-known without giving any reason.
In case it is relevant, I am using flex 2.5.35.
Edit: Here is another reference which claims that lex produces an inefficient scanner when asked to match several long literal strings. It also does not give a reason.
According to the flex manual, "[t]he speed of the scanner is independent of the number of rules or ... how complicated the rules are with regard to operators such as '*' and '|'."
The main performance losses are due to backtracking. This can be avoided by (among other things) using catch-all rules that also match tokens which merely "start with" an offending token. For example, if you have a list of reserved words made up of the characters [a-zA-Z_], followed by a rule matching identifiers of the form [a-zA-Z_][a-zA-Z_0-9]*, the identifier rule will catch any identifier that starts with the name of a reserved word without the scanner having to back up and try to match again.
According to the faq, flex generates a deterministic finite automaton which "does all the matching simultaneously, in parallel." The result of this is, as was said above, that the speed of the scanner is independent of the number of rules. On the other hand, string comparison is linear in the number of rules.
As a result, the reserved word rules should, in fact, be considerably faster than a lookup table.
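A minimal flex sketch of the reserved-word-rules approach (rules section only; the token names IF, WHILE, RETURN and IDENTIFIER are assumed to come from your parser):
%%
"if"                        { return IF; }
"while"                     { return WHILE; }
"return"                    { return RETURN; }

    /* Longest-match wins, so anything longer than a bare keyword (e.g. "ifdef")
       falls through to this rule and the scanner never has to back up. */
[a-zA-Z_][a-zA-Z_0-9]*      { return IDENTIFIER; }
%%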

How to get total number of potential results in Lucene

I'm using Lucene on a site of mine and I want to show the total result count for a query, for example:
Showing results x to y of z
But I can't find any method that will return the total number of potential results. I can only seem to find methods where you have to specify the number of results you want, and since I only want 10 per page it seems logical to pass in 10 as the number of results.
Or am I doing this wrong, should I be passing in say 1000 and then just taking the 10 in the range that I require?
BTW, since I know you personally I should point out for others I already knew you were referring to Lucene.net and not Lucene :) although the API would be the same
In versions prior to 2.9.x you could call IndexSearcher.Search(Query query, Filter filter), which returned a Hits object, one of whose properties (methods, technically, due to the Java port) was Length().
This is now marked Obsolete, since it will be removed in 3.0; the remaining Search overloads return TopDocs or TopFieldDocs objects.
Your alternatives are
a) Use IndexSearcher.Search(Query query, int count), which returns a TopDocs object; TopDocs.TotalHits will show you the total number of potential hits, but at the expense of actually creating <count> results
b) A faster way is to implement your own Collector object (inherit from Lucene.Net.Search.Collector) and call IndexSearcher.Search(Query query, Collector collector). The search method will call Collect(int docId) on your collector on every match, so if internally you keep track of that you have a way of garnering all the results.
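As a rough sketch of option (b) for Lucene.Net 2.9.x (exact member names vary slightly between versions), a collector that only counts matches and never touches stored documents:
using Lucene.Net.Index;
using Lucene.Net.Search;

public class CountingCollector : Collector
{
    public int TotalHits { get; private set; }

    public override void SetScorer(Scorer scorer) { }                    // scores are not needed for counting
    public override void Collect(int doc) { TotalHits++; }               // called once per matching doc id
    public override void SetNextReader(IndexReader reader, int docBase) { }
    public override bool AcceptsDocsOutOfOrder() { return true; }        // match order doesn't matter here
}

// Usage:
// var counter = new CountingCollector();
// searcher.Search(query, counter);
// int total = counter.TotalHits;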
It should be noted Lucene is not a total-resultset query environment and is designed to stream the most relevant results to you (the developer) as fast as possible. Any method which gives you a "total results" count is just a wrapper enumerating over all the matches (as with the Collector method).
The trick is to keep this enumeration as fast as possible. The most expensive part is deserialising Documents from the index and populating each field, etc. At least the newer API design, which requires you to write your own Collector, makes this principle clear: only matching document ids and a score are provided by default, which steers the developer away from deserialising each result from the index.
The top docs collector does this for you, for example:
TopDocs topDocs = searcher.search(qry, 10);
int totalHits = topDocs.totalHits;
The above query will count all hits, but return only 10.
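To tie this back to the "results x to y of z" requirement, a hedged sketch along the same lines (Java-flavoured like the snippet above; the Lucene.Net names differ mainly in casing, and page/pageSize are made-up variables) that requests only as many hits as the current page needs:
int page = 2, pageSize = 10;
TopDocs topDocs = searcher.search(query, page * pageSize);  // enough hits to cover this page
int totalHits = topDocs.totalHits;                          // the "z" in "x to y of z"

// deserialise only the documents actually shown on this page
int start = (page - 1) * pageSize;
int end = Math.min(page * pageSize, topDocs.scoreDocs.length);
for (int i = start; i < end; i++) {
    Document doc = searcher.doc(topDocs.scoreDocs[i].doc);
    // render doc ...
}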

Most helpful NDepend CQL queries

A client I work for has begun using NDepend as a replacement for FxCop, and the "architect" has compiled a list of practically unusable CQL queries, which I gather he has taken from advice on the NDepend website.
An example of what I think is an unhelpful query:
WARN IF Count > 0 IN
SELECT METHODS WHERE PercentageComment < 20
AND NbLinesOfCode > 10
ie: must have at least 2 lines of comment for each 10 lines of code
So what I'm trying to gather is a useful set of queries that we can use as developers.
Please only provide a single query per response (with description) so that it can be voted accordingly.
Xian, now that CQLinq (Code Rule over LINQ Query) is released, dozens of new default rules are available and most existing ones have been enhanced.
Here are ten of my preferred ones:
Avoid namespaces dependency cycles
UI layer shouldn't use directly DB types
Types with disposable instance fields must be disposable
Types that used to be 100% covered by test but not anymore
Avoid transforming an immutable type into a mutable one
Avoid making complex methods even more complex
API Breaking Changes: Types
Potentially dead Methods
Namespace name should correspond to file location
Methods should be declared static if possible
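For anyone who hasn't seen CQLinq yet, here is a rough sketch in the spirit of the "Potentially dead Methods" rule above (simplified from memory, not the exact default rule that ships with NDepend):
// <Name>Potentially dead Methods (simplified sketch)</Name>
warnif count > 0
from m in JustMyCode.Methods
where !m.MethodsCallingMe.Any()   // nothing in the analysed code base calls this method
   && !m.IsPublic                 // public methods may still be called from outside the analysed code
select new { m, m.NbLinesOfCode }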