SQL Server Execution Plans and Expanded Views

I'm attempting to use the execution plan XML to analyse dynamic SQL statements, specifically to determine whether or not they reference a non-indexed view. The hope is to navigate the XML systematically via XPath within a separate C# project.
However, it seems that by default the view gets expanded to its constituent tables before the QEP is generated, so the plan doesn't include a reference to the view. (I'd like to avoid relying on a string-based search of the statement text.)
Is there an option by which I can expand the XML plan to include the view expansion, or is there an alternative approach I might want to consider?
Thanks.

When the query execution plan is constructed, it goes through three phases: parsing, binding and optimization. In the binding phase, non-indexed views are replaced with their definitions, and in the optimization phase unused columns of the view are removed. So in the final execution plan there is nothing left of the view itself. This isn't true when the NOEXPAND hint is specified in the query, which is the only case in which you can find the view in the execution plan. But this requires changes to the queries, which may not be an option in your case.
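For illustration, a minimal T-SQL sketch (assuming dbo.MyView is an indexed view, since the hint applies to indexed views; the column name is a placeholder):

-- With NOEXPAND the optimizer keeps the view reference in the plan
-- instead of expanding it into its base tables.
SELECT v.SomeColumn
FROM dbo.MyView AS v WITH (NOEXPAND)
WHERE v.SomeColumn = 42;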
If you want to search for view usage in your queries, you should probably get the query text from sys.dm_exec_sql_text and search there (which isn't trivial, though, because a search for VIEW$SOMETHING will also match a query referencing VIEW$SOMETHING_ELSE).
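For example, a sketch against the plan cache ('MyView' is a placeholder name):

-- Search the cached query texts for the view name. Note the false-positive
-- problem described above: LIKE '%MyView%' also matches MyView_Else.
SELECT st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%MyView%';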

ObjectBox dynamic queries

e.g. the end user makes selections from two of five possible filters, the last three filters being left as ‘all’.
Rather than me creating queries for every possible combination of the 5 filters (2^5 = 32 different queries in total), what is the most efficient syntax for handling this?
Should I use .and to chain the queries together, and then can I specify ‘all’ for any which are not required?
A query builder can be used to build the query according to the selected filters. That is, add query criteria only inside if conditions that check the filters.
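(For comparison, in plain SQL terms the "only add active filters" idea corresponds to optional predicates; the items table and the two parameters here are hypothetical:)

-- Each parameter either constrains the result or, when NULL, means 'all'.
SELECT * FROM items
WHERE (:filter1 IS NULL OR category = :filter1)
  AND (:filter2 IS NULL OR status = :filter2);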
I solved this as follows: use a ternary operator (?:) and, in the second branch, query one of the values with .notNull().
This gives the result of 'all', effectively ignoring this part of the query.
This is a hack, but it works. It is obviously an expensive solution; the ideal would be to skip over unwanted filters completely.
Note to developer: 'if' cannot be used within the query structure in Dart. Thanks for finding time to respond; hopefully my additional info helps.

meta: why do I have to specify a group by clause

Just curious why I really have to specify a GROUP BY clause when I use a function that requires one (an aggregate function), e.g. SUM().
Because if I use one of those, I have to list every column that isn't aggregated in the GROUP BY clause.
Why doesn't SQL just automatically group on all the columns that aren't passed to an aggregate function? It seems redundant: as soon as I use an aggregate, I'm grouping on all the other columns anyway.
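For example (hypothetical employees table), this is the repetition I mean:

SELECT department, job_title, SUM(salary)
FROM employees
GROUP BY department, job_title;  -- both non-aggregated columns must be repeated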
Probably for the same reason a C compiler would not automatically assume and insert a variable declaration if you use one that has not been previously declared. There are programming languages which do that sort of thing; SQL is not one of them.
Editors, on the other hand, may be aware of this and at least auto-complete functionally dependent parts of the syntax for you. Oracle SQL Developer will by default automatically append a GROUP BY clause as soon as it detects that you're writing a select list that needs one. IMO this is a pain, and I usually keep it turned off, but that is as far as you will get, at the IDE/editor level.
Edit: Based on your last comment, there is an option in MySQL (not Microsoft's T-SQL) meant to relax the rule by implementing optional feature T301 of the SQL99 standard. I think this is exactly what you're after:
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.
Source: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
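For example (hypothetical customers/orders tables), MySQL 5.7+ accepts the following even with ONLY_FULL_GROUP_BY enabled, because name is functionally dependent on the grouped primary key:

SELECT c.id, c.name, SUM(o.amount)
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id;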
Could not find much information on the status of this feature in future versions of T-SQL, though. The only reference is this, with the very cryptic remark that T-SQL would "partially support this feature".

How can I use ormlite to escape my insert?

I have ormlite integrated into an application I'm working on. Right now I'm trying to build in functionality to easily switch from automatically inserting data to the database to outputting the equivalent collection of insert statements to a file for later use. The data isn't user input but still requires proper escaping to handle basic gotchas like apostrophes.
Ideas I've burned through:
Dao.create() writes to the database directly, so that's a no-go.
QueryBuilder can't handle inserts.
JdbcDatabaseConnection.compileStatement() might work, but the amount of setup required is prohibitive.
A java.sql.PreparedStatement has a reasonable enough interface (if toString() returns the SQL as I would hope), but it's not compatible with ormlite's connection types.
This should be very easy, and if it is, I can't find the right combination of method calls to make it happen.
Right now I'm trying to build in functionality to easily switch from automatically inserting data to the database to outputting the equivalent collection of insert statements to a file for later use.
Interesting. So one hack would be to use the MappedCreate class. The MappedCreate.build(...) method takes a DatabaseType and a TableInfo, which is available from dao.getTableInfo().
The mappedCreate.toString() exposes the generated INSERT statement (with a prefix), which might help, but you would still need to replace the ? arguments with the actual values, with quotes escaped. That you would have to do in your own code.
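For what it's worth, the escaping itself mostly amounts to doubling single quotes, e.g. for a hypothetical account table:

INSERT INTO account (name) VALUES ('O''Brien');  -- the apostrophe is escaped as ''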
Hope this helps somewhat.

Optimizing Sling queries

I have a CQ5 component that needs to query a given path for a couple of other component types, like this:
String query = "select * from nt:unstructured where jcr:path like '/content/some/path/%' and (contains(sling:resourceType, 'resourceType1') or contains(sling:resourceType, 'resourceType2'))";
Iterator<Resource> resources = resourceResolver.findResources(query, "sql");
Unfortunately, if it is working through a path with a lot of content, the page times out. Is there any way to optimize a function like this, or any tips for improving performance?
1. Use a more specific JCR type than nt:unstructured.
I guess you are looking for page nodes, so try cq:Page or (even better) cq:PageContent.
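For instance, the query from the question restricted to page content nodes (same path and resource types):

select * from cq:PageContent where jcr:path like '/content/some/path/%' and (contains(sling:resourceType, 'resourceType1') or contains(sling:resourceType, 'resourceType2'))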
2. Denormalize your data.
If I understand your query correctly, it should return pages containing resource1 or resource2. Rather than using the contains() predicate, which is very costly and prevents JCR from using an index, mark pages containing these resources with an additional attribute. E.g., set the jcr:content/containsResource1 and jcr:content/containsResource2 properties appropriately and then use them in your query:
select * from cq:PageContent where (containsResource1 is not null or containsResource2 is not null) and jcr:path like '/content/some/path/%'
You could use an EventHandler or a SlingPostProcessor to set the properties automatically when resource1 or resource2 is added.
I have added "jackrabbit" and "jcr" tags to your question - I'm not an expert in JCR queries, but one of those experts might want to comment on the query statement that you are using and whether and how it can be optimized.
That being said, your "page times out" statement seems to imply that it's the client browser that is timing out because it receives no data for too long. I would first check (with a debugger or log statements) whether it's really the findResources call that takes too long, or whether the code that runs after it is the culprit.
If findResources is slow, you'll need to optimize the query or redesign your code to make it asynchronous, for example by having the client code first get the HTML page and then fetch the query results via asynchronous calls.
If the code that runs after findResources causes the timeout, you might redesign it to start sending data to the browser as soon as possible and flush the output regularly to avoid timeouts. But if you're finding lots of results, that might take too long for the user anyway, and more asynchronous behavior would then be needed as well.

When are (SELECT) queries planned?

In PostgreSQL, when are (SELECT) queries planned?
Is it:
at statement-prepare time, or
at the start of processing the SELECT, or
something else?
The reason I ask is that there is a Stack Overflow question: same query, two different ways, vastly different performance
A lot of people seem to be thinking that the query is planned differently because in one case the query contains a string literal ('foo') and in another case it's a placeholder (?).
Now my thinking is that this is a red herring, because the query isn't planned at statement-prepare time, but is actually planned at SELECT time.
So, say, I could prepare a statement with a placeholder, then run the query multiple times with different bound values, and the query planner will be run for each different bound value.
I suspect that the question linked above boils down to the PostgreSQL data type of the value. In the case of a 'foo' literal the type is known to be a string, but in the case of a placeholder the type can't be divined, so it comes through to the query planner as some strange type which it can't create an efficient plan for. In that case, the issue is not that the query is planned differently because the value is a placeholder (at statement-preparation time) per se, but that the value comes through to the query as a different PostgreSQL type, and that is what influences the query planner. The fix would then simply be to bind the placeholder with an appropriate explicit type declaration.
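In PostgreSQL terms that would look like this (hypothetical table t with a text column col); the parameter type is pinned down either in the PREPARE itself or by a cast on the placeholder:

prepare stmt1 (text) as select * from t where col = $1;
prepare stmt2 as select * from t where col = $1::text;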
I cannot talk about the client-side Perl interface itself, but I can shed some light on the PostgreSQL server side.
PostgreSQL has prepared statements and unprepared statements. Unprepared statements are parsed, planned and executed immediately. They also do not support parameter substitution. In a plain psql shell you can show their query plan like this:
tmpdb> explain select * from sometable where flag = true;
On the other hand there are prepared statements: they are usually (see the "exception" below) parsed and planned in one step and executed in a second step. They can be re-executed several times with different parameters, because they do support parameter substitution. The equivalent in psql is this:
tmpdb> prepare foo as select * from sometable where flag = $1;
tmpdb> explain execute foo(true);
You may see that the plan differs from the plan for the unprepared statement, because planning already took place in the prepare phase, as described in the doc for PREPARE:
When the PREPARE statement is executed, the specified statement is parsed, rewritten, and planned. When an EXECUTE command is subsequently issued, the prepared statement need only be executed. Thus, the parsing, rewriting, and planning stages are only performed once, instead of every time the statement is executed.
This also means that the plan is NOT optimized for the substituted parameters: the first example might use an index on flag because PostgreSQL knows that among a million entries only ten have the value true. This reasoning is impossible when PostgreSQL uses a prepared statement. In that case a plan is created which will work as well as possible for all possible parameter values. This might exclude the mentioned index, because fetching the better part of the complete table via random access (due to the index) is slower than a plain sequential scan. The PREPARE doc confirms this:
In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal.
BTW, regarding plan caching, the PREPARE doc also has something to say:
Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again.
Also there is no automatic plan caching and no caching/reuse over multiple connections.
EXCEPTION: I have mentioned "usually". The psql examples shown are not what a client adapter like Perl DBI really uses; it uses the frontend/backend protocol. There, the term "simple query" corresponds to an "unprepared query" in psql, and the term "extended query" corresponds to a "prepared query", with one exception: there is a distinction between (one) "unnamed statement" and (possibly multiple) "named statements". Regarding named statements the doc says:
Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE.
and also:
Query planning for named prepared-statement objects occurs when the Parse message is processed.
So in this case planning is done without parameters as described above for PREPARE - nothing new.
The mentioned exception is the "unnamed statement". The doc says:
The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates.
And here is the benefit: although the unnamed statement is "prepared" (i.e. it can have parameter substitution), it can also adapt the query plan to the actual parameters.
BTW: The exact handling of the unnamed statement has changed several times in past releases of the PostgreSQL server. You can look up the old docs for details if you really want.
Rationale - Perl / any client:
How a client like Perl uses the protocol is a completely different question. Some clients, like the JDBC driver for Java, basically say: even if the programmer uses a prepared statement, the first five (or so) executions are internally mapped to a "simple query" (i.e. effectively unprepared); after that, the driver switches to a "named statement".
So a client has these choices:
Force (re)planning each time by using the "simple query" protocol.
Plan once, execute multiple times by using the "extended query" protocol and the "named statement" (plan might be bad because planning is done without parameters).
Parse once, plan for each execution (with current PostgreSQL versions) by using the "extended query" protocol and the "unnamed statement", obeying some additional requirements (e.g. providing some parameters with the "parse" message).
Play completely different tricks, like the JDBC driver does.
What Perl does currently, I don't know. But the "red herring" you mention is not unlikely.