Adding a CASE statement has dramatically increased my query execution time - tsql

I have added the following CASE statement in my query:
CASE starlingYear
WHEN CONVERT(VARCHAR, 3000) THEN 'Not yet active'
ELSE CONVERT(VARCHAR, starlingYear)
END AS starBirth
Adding this CASE has increased the query execution time from less than a second to about 8 or 10 seconds.
The query returns only about 1000 rows.
Is there a way to increase performance?

The syntax should be as follows: if starlingYear is 3000 (an int), return 'Not yet active'; otherwise convert starlingYear to varchar.
SELECT CASE starlingYear
WHEN 3000
THEN 'Not yet active'
ELSE
CONVERT(VARCHAR(30), starlingYear)
END AS starBirth
FROM table1

If you're using SQL Server 2012 or above, you could use the following as well. There's no performance advantage, since it resolves to a CASE expression much like what jose_bacoy posted, but some prefer the condensed format of IIF.
SELECT starBirth = IIF(starlingYear = 3000, 'Not yet active', CONVERT(VARCHAR(30), starlingYear))
FROM dbo.table1
;
Note that IIF is only useful for a single condition unless you nest it, but that can get a bit ugly.
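For illustration only, here's what a nested IIF might look like with a hypothetical second sentinel value (the 9999/'Unknown' pair is made up for the example):
SELECT starBirth = IIF(starlingYear = 3000, 'Not yet active',
                       IIF(starlingYear = 9999, 'Unknown', CONVERT(VARCHAR(30), starlingYear)))
FROM dbo.table1;
Beyond two or three levels, a plain CASE expression is usually easier to read.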

Related

Understanding case statement in sas proc sql

I am having difficulty understanding this SAS code.
select
case
when DM_TURNOVER_TMP_STOCK."LIITM"n then
DM_TURNOVER_TMP_STOCK."LIITM"n
else
DM_TURNOVER_TMP_SALES."SDITM"n
end as "LIITM"n,
case
when DM_TURNOVER_TMP_STOCK."LIMCU"n then
DM_TURNOVER_TMP_STOCK."LIMCU"n
Normally in SQL we put a condition on a column in the CASE statement, but here it seems to be different. Please help me understand this in PostgreSQL terms.
Assuming this is a query joining the tables DM_TURNOVER_TMP_STOCK and DM_TURNOVER_TMP_SALES: in SAS, a bare numeric expression used as a condition is treated as true when it is not missing and non-zero. So when DM_TURNOVER_TMP_STOCK.LIITM is not missing and non-zero, LIITM will get the value of DM_TURNOVER_TMP_STOCK.LIITM.
Otherwise, it will get the value of DM_TURNOVER_TMP_SALES.SDITM.
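In PostgreSQL, where a bare numeric expression is not a valid boolean, that truth test has to be spelled out explicitly. A sketch of the equivalent (the join condition is not shown in the question, so the one below is purely hypothetical):
SELECT CASE
         WHEN stock."LIITM" IS NOT NULL AND stock."LIITM" <> 0
           THEN stock."LIITM"
         ELSE sales."SDITM"
       END AS "LIITM"
FROM dm_turnover_tmp_stock stock
JOIN dm_turnover_tmp_sales sales
  ON stock."LIITM" = sales."SDITM"; -- hypothetical join; use whatever the real query joins on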

postgresql case inside function doesn't trigger

So, I have a couple of functions in my db.
One function needs to run when the data in a specific table is older than 5 minutes.
I've tried doing it with:
PERFORM case when now() - '5 minutes'::interval > (select end_time from x order by end_route desc limit 1) then update_x() else null end;
When I run the command as a regular SELECT query, it runs OK. But when I put it inside another function (the one being called, which returns an updated table that is no older than 5 minutes), it never runs. Also, if I just put update_x() on its own, it runs OK (but then it runs every time the function is called).
Does anyone have any idea how I could fix this?
One idea is to just set up a cron job to run the function every 5 minutes independently, but I'd rather the server stay idle, since the function is quite resource-intensive and it doesn't get called that often.
I'm on version 8.4 (due to my ISP, so I can't change it, though I am considering moving to a VPS, so if this is something that would only work on 9.5 and newer, I can wait).
The function now() gives the start time of the current transaction and does not change for the duration of that transaction. Use clock_timestamp() instead; for example:
do $$
begin
for i in 1..3 loop
perform pg_sleep(1);
raise notice 'now(): % clock_timestamp(): %', now(), clock_timestamp();
end loop;
end $$;
NOTICE: now(): 2017-12-06 10:22:40.422683+01 clock_timestamp(): 2017-12-06 10:22:41.437099+01
NOTICE: now(): 2017-12-06 10:22:40.422683+01 clock_timestamp(): 2017-12-06 10:22:42.452456+01
NOTICE: now(): 2017-12-06 10:22:40.422683+01 clock_timestamp(): 2017-12-06 10:22:43.468124+01
Per the documentation:
clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command (...)
now() is a traditional PostgreSQL equivalent to transaction_timestamp().
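Applied to the check in the question, the fix is simply to swap the timestamp function:
PERFORM case when clock_timestamp() - '5 minutes'::interval > (select end_time from x order by end_route desc limit 1) then update_x() else null end;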
I'm not sure why, but once I moved the boolean up one level, it started working.
So now, instead of having the PERFORM CASE query inside the function, I'm passing a boolean to the function and performing the check in a view above this function.
CREATE VIEW x_view AS select * from get_x((clock_timestamp() - '5 minutes'::interval)::timestamp > (select end_route from gps_skole2 order by end_route desc limit 1));
And inside the function:
PERFORM case when $1 then update_x() else null end;
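Callers then just query the view, so the staleness check runs on every access:
SELECT * FROM x_view;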

Postgres using functions inside queries

I have a table with common word values to match against brands - so when someone types in "coke" I want to match any possible brand names associated with it as well as the original term.
CREATE TABLE word_association ( commonterm TEXT, assocterm TEXT);
INSERT INTO word_association VALUES ('coke', 'coca-cola'), ('coke', 'cocacola'), ('coke', 'coca cola');
I have a function to create a list of these values in a pipe-delim string for pattern matching:
CREATE OR REPLACE FUNCTION usp_get_search_terms(userterm text)
RETURNS text AS
$BODY$DECLARE
returnstr TEXT DEFAULT '';
BEGIN
SET DATESTYLE TO DMY;
returnstr := userterm;
IF EXISTS (SELECT 1 FROM word_association WHERE LOWER(commonterm) = LOWER(userterm)) THEN
SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
FROM word_association
WHERE commonterm = userterm;
END IF;
RETURN returnstr;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION usp_get_search_terms(text)
OWNER TO customer_role;
If you call SELECT * FROM usp_get_search_terms('coke') you end up with
coke|coca-cola|cocacola|coca cola
EDIT: this function runs in <100ms, so it works fine.
I want to run a query with this text inserted e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE LOWER(X.online_description) % usp_get_search_terms ('coke');
This takes approx 56s to run against my table of ~500K records.
If I get the raw text and use it in the query it takes ~300ms e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE X.online_description % '(coke|coca-cola|cocacola|coca cola)';
The result sets are identical.
I've tried modifying the output string from the function, e.g. enclosing it in quotes and parentheses, but it doesn't seem to make a difference.
Can someone please advise why there is a difference here? Is it the data type or something about calling functions inside queries? Thanks.
Your function might take 100ms, but the query isn't calling your function once; it's calling it 500,000 times.
It's because your function is declared VOLATILE. This tells Postgres that either the function returns different values when called multiple times within a query (like clock_timestamp() or random()), or that it alters the state of the database in some way (for example, by inserting records).
If your function contains only SELECTs, with no INSERTs, calls to other VOLATILE functions, or other side-effects, then you can declare it STABLE instead. This tells the planner that it can call the function just once and reuse the result without affecting the outcome of the query.
But your function does have side-effects, due to the SET DATESTYLE statement, which takes effect for the rest of the session. I doubt this was the intention, however. You may be able to remove it, as it doesn't look like date formatting is relevant to anything in there. But if it is necessary, the correct approach is to use the SET clause of the CREATE FUNCTION statement to change it only for the duration of the function call:
...
$BODY$
LANGUAGE plpgsql STABLE
SET DATESTYLE TO DMY
COST 100;
The other issue with the slow version of the query is the call to LOWER(X.online_description), which will prevent the query from utilising the index (since online_description is indexed, but LOWER(online_description) is not).
With these changes, the performance of both queries is the same; see this SQLFiddle.
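As a side note: if the % operator here comes from the pg_trgm extension (the usual source of the similarity operator), a trigram expression index over LOWER(online_description) can support the fast path directly. A sketch, assuming pg_trgm is available (CREATE EXTENSION requires PostgreSQL 9.1+):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_online_description_trgm
    ON articles
    USING gin (LOWER(online_description) gin_trgm_ops);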
So the answer came to me about dawn this morning - CTEs to the rescue!
Particularly as this is the "simple" version of a very large query, it helps to define this once in isolation and then do the matching against it. The alternative (given that I'm calling this from a NodeJS platform) is to have one request retrieve the string of terms and then a second request to pass the string back. Not elegant.
WITH matches AS
( SELECT * FROM usp_get_search_terms('coke') )
, main AS
( SELECT X.article_number, X.online_description
FROM articles X
JOIN matches M ON X.online_description % M.usp_get_search_terms )
SELECT * FROM main
Execution time is somewhere around 300-500ms depending on term searched and articles returned.
Thanks for all your input, guys - I've learned a few things about Postgres that my MS SQL background didn't necessarily prepare me for :)
Have you tried removing the IF EXISTS() and simply using:
SELECT COALESCE(returnstr || '|' || string_agg(assocterm, '|'), returnstr) -- COALESCE keeps the original term when there are no matching rows
INTO returnstr
FROM word_association
WHERE LOWER(commonterm) = LOWER(userterm)
Instead of calling the function for each row, call it once by moving it into the FROM clause:
select x.article_number, x.online_description
from
woolworths.articles x
cross join
woolworths.usp_get_search_terms ('coke') c (s)
where lower(x.online_description) % s

Why do I get a DATETIME conversion error in TSQL?

I know there are numerous questions about this topic, even one I asked myself a while ago (here). Now I ran into a different problem, and neither myself nor my colleagues know what the reason for the strange behaviour is.
We've got a relatively simple SQL statement quite like this:
SELECT
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
SomeOtherColumn,
...
FROM
MyTable
INNER JOIN MyOtherTable
ON MyTable.ID = MyOtherTable.MyTableID
WHERE
MyTable.ID > SomeValue AND
MyText LIKE 'Date: %'
This is not my database and also not my SQL statement, and I didn't create the great schema to store datetime values in varchar columns, so please ignore that bit.
The problem we are facing right now is a SQL conversion error 241 ("Conversion failed when converting date and/or time from character string.").
Now I know that the query optimiser may change the execution plan so that the WHERE clause filter is applied only after the conversion is attempted, but the really strange thing is that I don't get any errors when I delete the WHERE clause entirely.
I also don't get any errors when I add a single line to the statement above as follows:
SELECT
MyText, -- This is the added line
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
...
As soon as I remove it, I get the conversion error again. Manually checking the values in the MyText column (without trying to convert them) doesn't reveal any records that might cause a problem.
What is the reason for the conversion error? Why do I not run into it when I also select the column as part of the SELECT statement?
Update
Here is the execution plan, although I don't think it's going to help.
Sometimes, SQL Server aggressively optimizes by pushing conversion operations earlier in the process than they would otherwise need to be. (It shouldn't. See SQL Server should not raise illogical errors on Connect, as an example).
When you just select:
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16))
Then the optimizer decides it can perform this conversion as part of the table/index scan or seek, right at the point at which it's reading the data from the table (and, importantly, before or at the same time as the WHERE clause filter). The rest of the query can then just use the converted value.
When you select:
MyText, -- This is the added line
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16))
It decides to let the conversion happen later. Importantly, the conversion now (by happenstance) happens after the WHERE clause filter, which should, by rights, filter the rows before any conversion is attempted.
The only safe way to deal with this is to force the filtering to definitely occur before the conversion is attempted. If you're not dealing with aggregates, a CASE expression may be safe enough:
SELECT CASE WHEN MyText LIKE 'Date: %' THEN CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) END
Otherwise, the even safer option is to split the query into two separate queries, and store the intermediate results in a temp table or table variable (views, CTEs and subqueries don't count, because the optimizer can "see through" such constructs)
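For illustration, a minimal sketch of the temp-table approach, reusing the column names from the query above (SomeValue is the question's placeholder):
-- Step 1: filter first, materialising only the rows that pass the LIKE test
SELECT MyTable.ID, MyText, SomeOtherColumn
INTO #Filtered
FROM MyTable
INNER JOIN MyOtherTable ON MyTable.ID = MyOtherTable.MyTableID
WHERE MyTable.ID > SomeValue
  AND MyText LIKE 'Date: %';

-- Step 2: convert only rows already known to match the pattern
SELECT CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
       SomeOtherColumn
FROM #Filtered;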

Avoiding execution of some PL/PGSQL functions by query rewriting

I would like to allow the execution of a PL/PGSQL function (my_function) only if its argument (my_table.x) belongs to a predefined interval (e.g. [100,1000]).
Let's take the following query example :
(q) SELECT * FROM my_table WHERE my_function(my_table.x);
I would like this query to be automatically rewritten so that it checks whether my_table.x belongs to the interval [100,1000]:
(q') SELECT * FROM my_table WHERE (my_table.x BETWEEN 100 AND 1000) AND my_function(my_table.x);
The command EXPLAIN ANALYSE shows that the second query is considerably faster than the first one.
How can I change the query execution plan in order to automate the process of query rewriting (q into q') ?
Where can I suitably store the metadata about the interval [100,1000] associated with my_function?
Thanks in advance,
Thomas Girault
The answer would help a project on integrating fuzzy logic into PostgreSQL: https://github.com/postgresqlf/PostgreSQL_f/
The fastest way to catch it is something like this at the top of the function body:
IF $1 BETWEEN 100 AND 1000 THEN
-- proceed
ELSE
RETURN NULL; -- Or what ever you want to return in this case
END IF;
This should be very fast.
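A minimal sketch of the whole function built this way (expensive_check is a purely hypothetical stand-in for whatever my_function actually computes; the numeric argument type is also an assumption):
CREATE OR REPLACE FUNCTION my_function(val numeric)
RETURNS boolean AS $$
BEGIN
    IF val BETWEEN 100 AND 1000 THEN
        RETURN expensive_check(val); -- hypothetical: the function's real work goes here
    ELSE
        RETURN NULL; -- or whatever you want to return in this case
    END IF;
END;
$$ LANGUAGE plpgsql;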
Actual query rewriting is done with the RULE system in PostgreSQL. But rules apply to tables and views, not to functions. You could wrap your query in a view - but then you can add the additional condition explicitly, which is cheaper.
CREATE VIEW v_tbl_only_valid_x AS
SELECT *
FROM tbl
WHERE x BETWEEN 100 AND 1000;
Call:
SELECT * FROM v_tbl_only_valid_x WHERE my_function(x);
This way the query planner gets the information about the selectivity of the query on the column x explicitly, which may result in a different query plan.
But wouldn't it be simpler to just add the second WHERE condition to your query, as you have it in q'?