How to optimize generic SQL to retrieve DDL information - Firebird

I have generic code that is used to retrieve DDL information from a Firebird database (FB 2.1). It generates SQL like
SELECT * FROM MyTable where 'c' <> 'c'
I cannot change this code. Actually, if that matters, it is inside Report Builder 10.
The fact is that some tables in my database are becoming a little too populated (>1M records), and that query is starting to take too long to execute.
If I try to execute
SELECT * FROM MyTable where SomeIndexedField = SomeImpossibleValue
it will obviously use that index and run very quickly.
Well, it shouldn't be that hard for the database to figure out that this is an impossible match and optimize the query to avoid testing the condition against each row.
Is there any way to make my Firebird database optimize that query?

As the filter condition is a negative proposition (and also doesn't reference a column to search on, but only a value to compare to another value), Firebird needs to do a full table scan (without using any index) to confirm that no records meet your criteria.
If you can't change the generated code, you will need to wait for the upcoming 3.0 version, which will implement the BOOLEAN data type and therefore should start evaluating such "constant" fake comparisons in advance (maybe the client library will even do this evaluation before sending the statement to the server?).
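If you ever do get a chance to intercept or rewrite the generated statement, a minimal sketch of a cheaper metadata-only query (assuming your Firebird version accepts a zero row count for FIRST; MyTable is the table from the question):

-- Describes all columns of MyTable but fetches no rows,
-- so the server never has to scan the table.
SELECT FIRST 0 * FROM MyTable;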

Related

Oracle DB link - where clause evaluation

I have a DB2 data source and an Oracle 12c target.
The Oracle side has a DB link to the DB2 defined, which works in general.
Now I have a huge table in the DB2 which has a timestamp column (let's call it ROW_CHANGED) for row changes. I want to retrieve rows which have changed after a particular time.
Running
SELECT * FROM lib.tbl WHERE ROW_CHANGED >'2016-08-01 10:00:00'
on the DB2 side returns exactly 1 row after about 90 seconds, which is fine.
Now I try the same query from Oracle via the DB link:
SELECT * FROM lib.tbl@dblink_name WHERE ROW_CHANGED > TO_TIMESTAMP('2016-08-01 10:00:00')
This runs for hours and ends up in a timeout.
I read some Oracle docs and found distributed query optimization tips, but most of them refer to joining a local table to a remote one, which is not my case.
In my desperation, I tried the DRIVING_SITE hint, without effect.
Now I wonder when the WHERE part of the query is evaluated. Since I have to use Oracle syntax rather than DB2 syntax for the query, is it possible that Oracle first copies the full table and applies the WHERE clause afterwards? I did some research but did not find anything that would help me in this direction.
The ROW_CHANGED column is a hidden column in the DB2, if that matters.
Thanks for any hint in advance.
Update
Thanks @all for the help. I'll share what did the trick for me.
First of all, I used TO_TIMESTAMP since the DB2 column is also TIMESTAMP (not DATE), and I expected to circumvent implicit conversions this way.
Without the explicit conversion I ran into ORA-28534 (Heterogeneous Services preprocessing error), and I have no hope of touching the DB config within a reasonable time.
The explain plan, by the way, did not reveal much. It showed a FULL hint and no conversion on the predicates. Oddly, it showed the ROW_CHANGED column as DATE; I wonder why.
I tried Justin's suggestion to use a bind variable, but I got ORA-28534 again. The next thing I did was wrap it in a PL/SQL block (it will run in a stored procedure later anyway).
declare
v_tmstmp TIMESTAMP := '01.08.16 10:00:00';
begin
INSERT INTO ORAUSER.TMP_TBL (SRC_PK,ROW_CHANGED)
SELECT SRC_PK,ROW_CHANGED
FROM lib.tbl@dblink_name
WHERE ROW_CHANGED > v_tmstmp;
end;
This executed in the same time as in DB2 itself. The date format is DD.MM.YY here since that is unfortunately the default.
When changing the variable assignment to
v_tmstmp TIMESTAMP := TO_TIMESTAMP('01.08.16 10:00:00','DD.MM.YY HH24:MI:SS');
I got the same problem as before.
Meanwhile the DB2 operators created an index on the ROW_CHANGED column, which I had requested earlier that day. This seems to have solved the problem in general. Even my original query now finishes in no time.
If you are actually using an Oracle-specific conversion function like to_timestamp, that forces the predicate to be evaluated on the Oracle side. Oracle isn't going to know how to convert a built-in function like to_timestamp into an exactly equivalent function call in DB2.
If you used a bind variable, that would be more likely to get evaluated on the DB2 side. But that may be complicated by the data type mapping between the two databases: there may not be a perfect mapping between one engine's DATE and another engine's TIMESTAMP data type. If this were a numeric column, a bind variable would be almost certain to get pushed. In this case, it probably involves playing around a bit to figure out exactly which data type to use for your variable so that it works for your framework, Oracle, and DB2.
If using a bind variable doesn't work, you can force the predicate to be evaluated on the remote server using the dbms_hs_passthrough package. That lets you send a query verbatim to the remote server which allows you to do things like use functions defined in your DB2 database. That's a bit of overkill in this situation, hopefully, but it's nice to have the hammer as your backup if the simpler solution doesn't work quickly enough.
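To illustrate, here is a minimal sketch of the passthrough route, reusing dblink_name, lib.tbl, and SRC_PK from the question (the NUMBER type for v_pk is an assumption; the statement is sent verbatim, so it uses the DB2-side literal format that already worked at the top of the question):

declare
  c    integer;
  nr   integer;
  v_pk number;
begin
  -- Open a cursor on the remote DB2 side and hand it the statement verbatim,
  -- so DB2 parses and filters; Oracle never evaluates the predicate.
  c := dbms_hs_passthrough.open_cursor@dblink_name;
  dbms_hs_passthrough.parse@dblink_name(c,
    'SELECT SRC_PK FROM lib.tbl WHERE ROW_CHANGED > ''2016-08-01 10:00:00''');
  loop
    nr := dbms_hs_passthrough.fetch_row@dblink_name(c);
    exit when nr = 0;  -- 0 rows fetched means the result set is exhausted
    dbms_hs_passthrough.get_value@dblink_name(c, 1, v_pk);
    -- v_pk now holds SRC_PK for one remote row; e.g. insert it into TMP_TBL
  end loop;
  dbms_hs_passthrough.close_cursor@dblink_name(c);
end;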

How to call 'like any' PostgreSQL function in JPQL

I have the following issue:
I have a list of names to filter on. The problem is that I don't have the full names (because I'm receiving them from the UI); I have, for example, this array = ['Joh', 'Michae'].
So, I want to filter based on this array.
I wrote this query in PostgreSQL:
select * from q_ob_person where name like any (array['%Хомяченко%', '%Вартопуз%']);
And I want to ask how to write a JPQL query for this.
Is there an option to call the PostgreSQL like any construct from JPQL?
JPA 2.1 allows invocation of any SQL function using
FUNCTION(sqlFuncName, sqlArgs)
So you could likely do something like this (note: I've never tried the LIKE ANY you refer to, so just play around with it):
FUNCTION("LIKE", FUNCTION("ANY", arrayField))
Obviously by invoking SQL functions specific to a particular RDBMS you lose database independence (in case that's of importance).
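If the FUNCTION route doesn't pan out, a portable fallback is to expand the incoming array into an OR chain of LIKE predicates in plain JPQL. A sketch, with Person and name standing in for your actual entity and field names:

select p from Person p
 where p.name like :name0
    or p.name like :name1

Bind :name0 to '%Joh%', :name1 to '%Michae%', and so on, generating one parameter per element of the array.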

Is there any logical reason to use CFQUERYPARAM in Query of Queries?

I primarily use CFQUERYPARAM to prevent SQL injection. Since Query-of-Queries (QoQ) does not touch the database, is there any logical reason to use CFQUERYPARAM in them? I know that values that do not match the cfsqltype and maxlength will throw an exception, but these values should already have been validated before that point, with friendly messages displayed (from a UX viewpoint).
Since Query-of-Queries (QoQ) does not touch the database, is there any logical reason to use CFQUERYPARAM in them? Actually, it does touch a database: the one you currently have stored in memory. The data in that in-memory database could still, in theory, be tampered with via some sort of injection from the user. Does that affect your physical database? No. Does it affect the use of the data within your application? Yes.
You did not give any specific details, but I would err on the side of caution. If ANY of the data you are using to build your query comes from the client, then use cfqueryparam. If you can guarantee that none of the elements in your query come from the client, then I think it would be okay to leave cfqueryparam out.
As an aside, using cfqueryparam also helps the database optimize the query, although I'm not sure whether that is true for query of queries. It also escapes characters, such as apostrophes, for you.
Here is a situation where it's simpler, in my opinion.
<cfquery name="NoVisit" dbtype="query">
select chart_no, patient_name, treatment_date, pr, BillingCompareField
from BillingData
where BillingCompareField not in
(<cfqueryparam cfsqltype="cf_sql_varchar"
value="#ValueList(FinalData.FinalCompareField)#" list="yes">)
</cfquery>
The alternative would be to use QuotedValueList. However, if anything in that value list contains an apostrophe, cfqueryparam will escape it; otherwise I would have to do so myself.
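For comparison, the QuotedValueList version of the same where clause would look something like this (an untested sketch; any apostrophes in the data would be my problem to escape):

where BillingCompareField not in (#QuotedValueList(FinalData.FinalCompareField)#)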
Edit starts here
Here is another example where not using query parameters causes an error.
<cfscript>
// Reconstructed setup: a one-column query of type date (the original
// snippet was missing the opening tag and the QueryNew() call).
x = QueryNew("dt", "date");
QueryAddRow(x, 2);
QuerySetCell(x, "dt", CreateDate(2001,1,1), 1);
QuerySetCell(x, "dt", CreateDate(2001,1,11), 2);
</cfscript>
<cfquery name="y" dbtype="query">
select * from x
<!---
where dt in (<cfqueryparam cfsqltype="cf_sql_date" value="#ValueList(x.dt)#" list="yes">)
--->
where dt in (#ValueList(x.dt)#)
</cfquery>
The code as written throws this error:
Query Of Queries runtime error.
Comparison exception while executing IN.
Unsupported Type Comparison Exception:
The IN operator does not support comparison between the following types:
Left hand side expression type = "DATE".
Right hand side expression type = "LONG".
With the query parameter, commented out above, the code executes successfully.

Is it correct to scan a table in MySQL using "SELECT * .. LIMIT start, count" without an ORDER BY clause?

Suppose table X has 100 tuples.
Will the following approach to scanning X generate all the tuples in TABLE X, in MySQL?
for start in [0, 10, 20, ..., 90]:
    print results of "select * from X LIMIT start, 10;"
I ask because I've been using PostgreSQL, whose documentation clearly says that this approach need not work, but there seems to be no such information for MySQL. If it won't work, is there a way to return results in a fixed ordering without knowing any other info about the table (like what the primary key fields are)?
I need to scan each tuple in a table in an application, and I want a way to do it without using too much memory in the application (so simply doing a "select * from X" is out).
No, that isn't a safe assumption. Without an ORDER BY clause, there is no guarantee that your query will return the rows in the same order each time. If this table is properly indexed, adding an ORDER BY (on the index) shouldn't be too expensive.
Edit: Non-ORDER BYed results will sometimes be in the order of the clustered index, but I wouldn't put any money on that!
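As a sketch, assuming X has an indexed integer primary key id (an assumption; substitute your own key), the paging loop becomes deterministic:

-- Each page is now well-defined because the ordering is total:
SELECT * FROM X ORDER BY id LIMIT 0, 10;
SELECT * FROM X ORDER BY id LIMIT 10, 10;
-- ... and so on up to LIMIT 90, 10.

A keyset variant, WHERE id > last_seen_id ORDER BY id LIMIT 10, additionally avoids re-scanning the rows that LIMIT skips over.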
If you are using InnoDB or MyISAM table types, a better approach is to use the HANDLER interface. Only MySQL supports this, but it does what you want:
http://dev.mysql.com/doc/refman/5.0/en/handler.html
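A rough sketch of what that looks like (see the manual linked above for the full set of read options):

HANDLER X OPEN;
HANDLER X READ FIRST LIMIT 10;  -- first batch, in natural row order
HANDLER X READ NEXT LIMIT 10;   -- repeat until an empty result comes back
HANDLER X CLOSE;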
Also, the MySQL API supports two modes of retrieving data from the server:
store result: in this mode, as soon as a query is executed, the API retrieves the entire result set before returning to the user code. This can use up a lot of client memory buffering results, but minimises the use of resources on the server.
use result: in this mode, the API pulls results row-by-row and returns control to the user code more frequently. This minimises the use of memory on the client, but can hold locks on the server for longer.
Most of the MySQL APIs for various languages support this in one form or another. It is usually an argument that can be supplied when creating the connection, and/or a separate call that can be used against an existing connection to switch it to that mode.
So, in answer to your question - I would do the following:
set the connection to "use result" mode;
select * from X

sqlalchemy group_by error

The following works
s = select([tsr.c.kod]).where(tsr.c.rr=='10').group_by(tsr.c.kod)
and this does not:
s = select([tsr.c.kod, tsr.c.rr, any fields]).where(tsr.c.rr=='10').group_by(tsr.c.kod)
Why?
Thanks.
It doesn't work because the query isn't valid like that.
Every column needs to be in the group_by or needs an aggregate (e.g. max(), min(), whatever) according to the SQL standard. Most databases have always complied with this, but there are a few exceptions.
MySQL has always been the odd one in this regard, within MySQL this behaviour depends on the ONLY_FULL_GROUP_BY setting: https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html
I would personally recommend setting sql_mode to ANSI. That way you're largely compliant with the SQL standard, which will help you in the future if you ever need to use (or migrate to) a standards-compliant database such as PostgreSQL.
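For example, at session scope (it can also be set globally or in the server configuration):

SET SESSION sql_mode = 'ANSI';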
What you are trying to do is somehow valid in MySQL, but invalid in standard SQL, PostgreSQL, and common sense. When you group rows by 'kod', each row in a group has the same 'kod' value but, for example, different values of 'rr'. With aggregate functions you can get some aspect of the values in such a column for each group; for example
select kod, max(rr) from table group by kod
will give you the list of 'kod's and the max of 'rr' in each group (grouped by 'kod').
That being said, in the select clause you can only put columns from the group by clause and/or aggregate functions of other columns. You can put whatever you like in the where clause: it filters rows before grouping. You can also add a 'having' clause after the group by; it contains aggregate function expressions and acts as a post-group filter.
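For instance, a post-group filter with 'having' looks like this (same placeholder table name as above):

select kod, max(rr)
from table
group by kod
having count(*) > 1

This returns, for each 'kod' that appears more than once, that 'kod' together with its largest 'rr'.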