same query, two different ways, vastly different performance - perl

I have a Postgres table with more than 8 million rows. Given the following two ways of doing the same query via DBD::Pg, I get wildly different results.
$q .= '%';
## query 1
my $sql = qq{
SELECT a, b, c
FROM t
WHERE Lower( a ) LIKE '$q'
};
my $sth1 = $dbh->prepare($sql);
$sth1->execute();
## query 2
my $sth2 = $dbh->prepare(qq{
SELECT a, b, c
FROM t
WHERE Lower( a ) LIKE ?
});
$sth2->execute($q);
Query 2 is at least an order of magnitude slower than query 1. It seems that query 2 is not using the index, while query 1 is.
Would love to hear why.

With LIKE expressions, b-tree indexes can only be used if the search pattern is left-anchored, i.e. it starts with a constant string and any % wildcard comes only after that. More details in the manual.
Thanks to @evil otto for the link. This link points to the current version.
Your first query provides this essential information at prepare time, so the query planner can use a matching index.
Your second query does not provide any information about the pattern at prepare time, so the query planner cannot use any indexes.
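To illustrate why a pattern known at plan time matters, here is a minimal Python sketch (not Postgres code) of the idea that a simple left-anchored pattern can be turned into a range lookup over sorted data, which is the kind of access a b-tree offers. The names prefix_range and indexed_lookup are made up for this example, escape characters are ignored, and strings are compared by code point; whether a real Postgres index qualifies also depends on locale and operator class.

```python
from bisect import bisect_left

def prefix_range(pattern):
    """Return (low, high) bounds for a simple 'prefix%' pattern, else None."""
    if not pattern.endswith("%"):
        return None
    prefix = pattern[:-1]
    # An empty prefix or a wildcard inside the prefix gives no usable range.
    if not prefix or "%" in prefix or "_" in prefix:
        return None
    last = ord(prefix[-1])
    if last >= 0x10FFFF:
        return None  # cannot increment the last character
    # Upper bound: the prefix with its last character incremented.
    return prefix, prefix[:-1] + chr(last + 1)

# A sorted list stands in for the ordered entries of a b-tree index.
values = sorted(["apple", "banana", "band", "bandit", "bar", "cherry"])

def indexed_lookup(pattern):
    """Use the sorted 'index' when the pattern is left-anchored, else give up."""
    bounds = prefix_range(pattern)
    if bounds is None:
        return None  # a planner would have to fall back to a full scan here
    low, high = bounds
    return values[bisect_left(values, low):bisect_left(values, high)]

print(indexed_lookup("ban%"))   # range lookup is possible
print(indexed_lookup("%an%"))   # no range can be derived
```

With a bare ? placeholder the pattern is unknown when the plan is built, so a planner that plans once at prepare time cannot assume a range like this exists.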

I suspect that in the first case the query compiler/optimizer detects that the clause is a constant, and can build an optimal query plan. In the second it has to compile a more generic query because the bound variable can be anything at run-time.

Are you running both test cases from the same file using the same $dbh object?
I think the reason for the increased speed in the second case is that you are using a prepared statement, which is already parsed (but maybe I'm wrong :)).

Ahh, I see - I will drop out after this comment since I don't know Perl. But I would trust that the editor is correct in highlighting the $q as a constant. I'm guessing that you need to concatenate the value into the string, rather than just directly referencing the variable. Perl concatenates strings with . (not +), so use something like:
my $sql = qq{
SELECT a, b, c
FROM t
WHERE Lower( a ) LIKE '} . $q . qq{'};
(Note: unless the language is tightly integrated with the database, such as Oracle/PLSQL, you usually have to create a completely valid SQL string before submitting to the database, instead of expecting the compiler to 'interpolate'/'Substitute' the value of the variable.)
I would again suggest that you get the COUNT() of the statements, to make sure that you are comparing apples to apples.

I don't know Postgres at all, but I think in Line 7 (WHERE Lower( a ) LIKE '$q'), $q is actually a constant. It looks like your editor thinks so too, since it is highlighted in red. You probably still need to use the ? for the variable.
To test, do a COUNT(*), and make sure they match - I could be way off base.

Related

where column in (single value) performance

I am writing dynamic SQL code and it would be easier to use a generic where column in (<comma-separated values>) clause, even when the clause might have 1 term (it will never have 0).
So, does this query:
select * from table where column in (value1)
have any different performance than
select * from table where column=value1
?
All my tests result in the same execution plans, but if there is some knowledge/documentation that sets it in stone, it would be helpful.
This might not hold true for every RDBMS, nor for every query with its specific circumstances.
The engine will translate WHERE id IN(1,2,3) to WHERE id=1 OR id=2 OR id=3.
So your two ways to articulate the predicate will (probably) lead to exactly the same interpretation.
As always: We should not really bother about the way the engine "thinks". This was done pretty well by the developers :-) We tell - through a statement - what we want to get and not how we want to get this.
Some more details here, especially the first part.
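If you want to check this yourself rather than take it on trust, comparing the plans is straightforward. Below is a small sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption: it is not the asker's RDBMS, and the items table and plan helper are made up for the example). Other engines have their own EXPLAIN syntax and may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, label) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)

def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

eq_sql = "SELECT * FROM items WHERE id = ?"
in_sql = "SELECT * FROM items WHERE id IN (?)"

# Same rows from both forms of the predicate.
eq_rows = conn.execute(eq_sql, (42,)).fetchall()
in_rows = conn.execute(in_sql, (42,)).fetchall()

# Print both plans side by side for comparison.
eq_plan = plan(eq_sql, (42,))
in_plan = plan(in_sql, (42,))
print(eq_plan)
print(in_plan)
```

If both plans show a keyed search rather than a full scan, the two spellings are equivalent for practical purposes on that engine.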
I think this will depend on the platform you are using (the optimizer of the given SQL engine).
I did a little test using MySQL Server:
When I query select * from table where id = 1; I get 1 row in total; the query took 0.0043 seconds.
When I query select * from table where id IN (1); I get 1 row in total; the query took 0.0039 seconds.
I know this depends on the server, the PC and so on, but the results are very close.
Note that IN with a list of constants is sargable (search-argument-able), just like =, so both forms can use an index on the column.
If you want to know which is best, test both in your environment, because both perform well here.

Sort data within a subquery with another subquery?

I am trying to sort the OUN.note column by using the OUN.outcomeKey, since the way it is working right now is putting the notes in the wrong order (sorting alphabetically). Any idea on how to go about this? I've been trying to sort the data using another sub-query within, but I haven't had much luck (I don't have a plethora of experience).
Here's my current query:
SELECT DISTINCT OC.outcomeKey [Outcome Key], OC.outcome [Result],
STUFF((SELECT ','+' '+ OUN.note
FROM
Outcome AS OUT
JOIN OutcomeNote AS OUN
ON OUT.outcomeKey = OUN.outcomeKey
WHERE OUN.outcomeKey = OC.outcomeKey
GROUP BY OUN.note
FOR XML PATH ('')), 1, 1, '') [Outcome Note]
FROM Outcome AS OC
Any help or tips would be greatly appreciated! Also, please let me know if any more info is needed.
You may replace the line
GROUP BY OUN.note
with the line
ORDER BY OUN.outcomeKey
Also, because the concatenation starts with ', ', you may want to use 1, 2, '' as the additional arguments of the STUFF function. Otherwise, the values in your [Outcome note] column always start with a space.
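To make the STUFF arguments concrete, here is a tiny Python imitation of what STUFF does for valid inputs (delete a number of characters at a 1-based position, then insert a replacement). The stuff function and the sample notes are made up for illustration; the real T-SQL function also has NULL-handling rules that this sketch ignores.

```python
def stuff(s, start, length, replacement):
    """Imitate T-SQL STUFF for in-range arguments (start is 1-based)."""
    i = start - 1
    return s[:i] + replacement + s[i + length:]

notes = ["first", "second", "third"]

# The subquery prefixes every note with ', ' (comma plus space).
concatenated = "".join(", " + n for n in notes)

print(repr(stuff(concatenated, 1, 1, "")))  # removes only the comma
print(repr(stuff(concatenated, 1, 2, "")))  # removes the comma and the space
```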
Edit:
By the way, sorting the notes by outcomeKey in the subquery that generates the values for the [Outcome note] column has no effect... since all the notes in each subquery result will have the same outcomeKey value...
But you may sort on any column you want, of course. Perhaps there are other columns in your OutcomeNotes table that can serve as a useful sorting column of your outcome notes.
If I misunderstood your question, please provide definitions of the Outcome and OutcomeNote tables, together with a demo population of those tables and the desired/expected query result.
Edit 2:
Starting with SQL Server 2017, Transact-SQL contains a function called STRING_AGG, which seems to be functionally equivalent (more or less) to MySQL's GROUP_CONCAT function. Using this function, your query would become something like this:
SELECT
OUN.outcomeKey [Outcome Key],
OC.outcome [Result],
STRING_AGG(OUN.[Note], ', ') WITHIN GROUP (ORDER BY OUN.outcomeKey) [Outcome Note]
FROM
Outcome AS OC
JOIN OutcomeNote AS OUN ON OUN.outcomeKey = OC.outcomeKey
GROUP BY
OUN.outcomeKey,
OC.outcome;
When using SQL Server 2017 or SQL Azure, this might be a more fitting choice, since it not only makes the query more readable, but also eliminates the use of (way less efficient) XML functions in your query.
I too have used the XML-functionality for field concatenation (the way you use it) intensively in the past, but I noticed a considerable drop in performance of my queries (which sometimes contained up to 10 columns with concatenated data). Since then, I tend to go for recursive common table expressions or scalar UDF with recursion approaches in pre SQL Server 2017 environments.

SphinxQL Variables Deprecated, Alternate Query?

I had what I thought was a fairly straightforward SphinxQL query, but it turns out @ variables are deprecated (see example below)
SELECT *,@weight AS m FROM test1 WHERE MATCH('tennis') ORDER BY m DESC LIMIT 0,1000 OPTION ranker=bm25, max_matches=3000, field_weights=(title=10, content=5);
I feel like there must be a way to sort the results by strength of match. What is the replacement?
On another note, what if I want to include in it a devaluation if certain other words appear. For example, let's say I wanted to devalue results that had the word "apparel" in them. Could that be executed in the same query?
Thanks!
Well, results are sorted by weight descending by default, so just do...
SELECT * FROM test1 WHERE MATCH('tennis') LIMIT 0,1000 OPTION ...
Otherwise, the @ variables are simply replaced by functions, mainly because that is more SQL-like. So @weight is WEIGHT():
SELECT * FROM test1 WHERE MATCH('tennis') ORDER BY WEIGHT() DESC ...
or
SELECT *,WEIGHT() AS m FROM test1 WHERE MATCH('tennis') ORDER BY m DESC ...
For reference, @group is now GROUPBY(), @count is COUNT(*), @distinct is COUNT(DISTINCT ...), @geodist is GEODIST(...), and @expr doesn't really have an equivalent: either use the expression directly, or use your own custom named alias.
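If you have many old queries to update, the argument-free replacements listed above (the old variables are written with a leading @, such as @weight) can be applied mechanically. This is only a rough Python sketch: modernize_query and SIMPLE_REPLACEMENTS are names invented for the example, it does not parse SQL, and it would also rewrite a matching token inside a MATCH() string, so review the output by hand. The @distinct and @geodist cases need arguments and are deliberately left out.

```python
import re

# Old variable name (without the @) -> replacement function call.
SIMPLE_REPLACEMENTS = {
    "weight": "WEIGHT()",
    "group": "GROUPBY()",
    "count": "COUNT(*)",
}

_PATTERN = re.compile(r"@(weight|group|count)\b", re.IGNORECASE)

def modernize_query(query):
    """Replace deprecated @weight/@group/@count with their function forms."""
    return _PATTERN.sub(lambda m: SIMPLE_REPLACEMENTS[m.group(1).lower()], query)

old = "SELECT *,@weight AS m FROM test1 WHERE MATCH('tennis') ORDER BY m DESC"
print(modernize_query(old))
```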
As for the second question: it's kind of tricky, since there isn't really a 'negative' weighter. There is a keyword boost operator, but as far as I know you can't use it to specifically devalue a word.
The only way I can think of that might work is if the negative match were against a specific field; then you could build a complex ranking expression. Basically, to apply a negative weight you would need a dedicated field, so that the ranking expression can single out that column:
... MATCH('@!(negative) tennis @negative apparel')
... OPTION ranker=expr('SUM(word_count*IF(user_weight=99,-1,1))'), field_weights=(negative=99)
That's a very basic demo expression for illustrative purposes; a real one would probably be a lot more complex. It just shows using 99 as a placeholder for 'negative' multiplication.
You would need to create the new negative field, which could just be a duplicate of the other field(s).

Define Result Fields if blank then another field

What I'm trying to do is, if a field is blank, use another field within WRKQRY(Query/400) in define result fields. Is this possible?
You can create an SQL view using the RUNSQLSTM command and then run a query over the view.
CREATE VIEW QTEMP/MYVIEW AS
SELECT F1, CASE WHEN F2 <> ' ' THEN F2 ELSE F3 END AS FX FROM MYLIB/MYFILE
Then tie it all together with a CL program.
PGM
DLTF FILE(QTEMP/MYVIEW)
MONMSG MSGID(CPF0000)
RUNSQLSTM SRCFILE(MYLIB/MYSRC) MBR(MYMBR)
RUNQRY QRY(MYLIB/MYQRY)
ENDPGM
Query/400 is obsolete, and should be considered deprecated. It was replaced about 2 decades ago by Query Management. Query/400 queries run under the old database optimizer (CQE) and cannot benefit from newer faster optimization techniques employed by the new optimizer (SQE). It is recommended to migrate Query/400 queries to QM Query or to DB2 Web Query.
Fortunately, Query Management Queries can be created in a prompted mode which should be very familiar to Query/400 users. Prompted-mode queries can be converted to the more powerful SQL-mode.
You can use the RTVQMQRY command to generate SQL source from the Query/400 query you have asked about. Once you have the source, you can then use the CASE ... END expression given by @Mike. Create the QM query with the CRTQMQRY command, and run it with STRQMQRY.
If you still need to do this, I can show you how to do it in 3 passes of Query/400.
Yeah, I know that's not efficient but it can be done.
Take a look at CASE; that should work for you.
CASE field
WHEN ' ' THEN newfield
ELSE field
END as myfield

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, rather than the usual paging by number of results.
And I'm now faced with a challenge: whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for a speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not the LEFT function.
"Left" vs "Like": one should always use "Like" when possible where indexes are implemented, because "Like" is not a function and can therefore utilize any indexes you may have on the data.
"Left", on the other hand, is a function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is that SQL Server has to evaluate the function for every row it examines.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
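Besides timing, it helps to look at the query plans for the two forms. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption: it is not SQL Server, the people table is made up, and substr replaces LEFT because SQLite has no LEFT function). It only shows the general pattern of a prefix LIKE being able to seek on an index while a function wrapped around the column forces a scan; check the actual plans on your own server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE collation lets SQLite's case-insensitive LIKE use the index.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE)"
)
conn.execute("CREATE INDEX idx_people_name ON people (name)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("alice",), ("adam",), ("bob",), ("carol",)],
)

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

like_sql = "SELECT name FROM people WHERE name LIKE 'a%'"
func_sql = "SELECT name FROM people WHERE substr(name, 1, 1) = 'a'"

like_plan = plan(like_sql)   # expected to be an index SEARCH with range bounds
func_plan = plan(func_sql)   # expected to be a SCAN over every row

like_rows = sorted(r[0] for r in conn.execute(like_sql))
func_rows = sorted(r[0] for r in conn.execute(func_sql))

print(like_plan)
print(func_plan)
```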
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = @firstletter
That's because most databases are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
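A rough sketch of the trigger-maintained column, again using Python's built-in sqlite3 as a stand-in (an assumption: SQL Server trigger syntax differs, and the people table and trigger names are invented for the example; only name_first_char_lower comes from the suggestion above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    name_first_char_lower TEXT
);
CREATE INDEX idx_people_first ON people (name_first_char_lower);

-- Fill in the derived column whenever a row is inserted.
CREATE TRIGGER people_first_ins AFTER INSERT ON people
BEGIN
    UPDATE people
    SET name_first_char_lower = lower(substr(NEW.name, 1, 1))
    WHERE id = NEW.id;
END;

-- Keep the derived column in step when the name changes.
CREATE TRIGGER people_first_upd AFTER UPDATE OF name ON people
BEGIN
    UPDATE people
    SET name_first_char_lower = lower(substr(NEW.name, 1, 1))
    WHERE id = NEW.id;
END;
""")

conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("Alice",), ("adam",), ("Bob",)])
conn.execute("UPDATE people SET name = 'Zed' WHERE name = 'Bob'")

def names_starting_with(letter):
    """Plain equality lookup on the indexed, precomputed column."""
    cur = conn.execute(
        "SELECT name FROM people WHERE name_first_char_lower = ?", (letter,)
    )
    return sorted(r[0] for r in cur)

rows_a = names_starting_with("a")
rows_z = names_starting_with("z")
rows_b = names_starting_with("b")
print(rows_a, rows_z, rows_b)
```

The read path is now a simple indexed equality test, and the cost of computing the first letter is paid only on insert and update.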
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
My data is faster with the left 5. As an aside my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column is indexed. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)= 'ABA' OR ... up to 9 OR clauses. The count came to 7301477 records, taking 4 seconds with LEFT and 1 second with LIKE, i.e. where column_name like 'AAA%' OR Column_Name like 'ABA%' or ... up to 9 LIKE clauses.
Calling a function in the WHERE clause is not a best practice. See http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non core) EntityFunctions so I'm not sure how to do it for EF6.