This code works fine and does exactly what I want, which is to sum the Qty * Price for each instance of the dynamic query.
But when I add an IIF statement it breaks. What I am trying to do is the same thing as above, but when the transaction type is 'CO', make the summed amount negative.
The problem turned out to be the NVARCHAR(4000) type of @sql, which limits its length to 4,000 characters: the query was silently truncated at some arbitrary point after another long chunk was appended to it.
DECLARE @sql NVARCHAR(MAX) solves the problem, allowing a dynamic query of any size up to 2 GB.
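As a hedged sketch of the pattern (the table and column names dbo.OrderLines, Qty, Price and TranType are made-up stand-ins for the original query), this shows both the NVARCHAR(MAX) declaration and an IIF that negates the amount for 'CO' transactions:

DECLARE @sql NVARCHAR(MAX) = N'' -- MAX, not NVARCHAR(4000), so long concatenations are not truncated

SET @sql = @sql + N'SELECT SUM(IIF(TranType = ''CO'', -(Qty * Price), Qty * Price)) '
SET @sql = @sql + N'FROM dbo.OrderLines'
-- ...further chunks can be appended without hitting the 4,000-character cap...

EXEC sp_executesql @sql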
Good Afternoon All,
Can anyone advise if I can dynamically declare and assign values to variables in the scenario described below?
I've written a stored procedure (sproc) that calculates the % of members in subgroups of an organization.
I know there are 7 subgroups. I store the results of the percentage calculation in 7 variables which I use later in the sproc. Each variable is named according to the subgroup name.
Naturally this means if there are changes in the name or number of subgroups, I have to rewrite parts of the sproc.
I believe dynamic SQL could be used to allow the sproc to adjust to changes in the subgroups, but I'm not sure how to set up dynamic SQL for this scenario. Can anyone offer an example or guidance?
What you're proposing goes against the grain in SQL Server. Your concern about having to rewrite later kinda speaks to this...so you're on the right track to be concerned.
Generally, you'd want to make your results into some kind of set-oriented thing...table-like...where one column has the name of the subgroup and the other column has the calculated value.
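For instance, here is a minimal sketch of that shape, assuming a hypothetical members table with a subgroup_name column (the names are not from the original post):

SELECT m.subgroup_name,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_members
FROM dbo.members AS m
GROUP BY m.subgroup_name;

Because the subgroups arrive as rows rather than as seven named variables, adding, removing, or renaming a subgroup requires no code change.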
You might find table-valued functions more appropriate for your problem...but it's hard to say...we're not deep on specifics in this question.
Dynamic SQL is almost always the last resort. It seems fun, but has all sorts of issues...not the least of which is addressing the results in a programmatically safe and consistent way.
You can follow this simple example to see how to do that:
declare @sql nvarchar(max) = ''
declare @outerX int = 0 -- this is the variable you want to set from dynamic SQL
declare @i int = 0 -- loop counter
while @i <= 6
begin
    set @sql = 'select @x = ' + CAST(@i as varchar(10))
    exec sp_executesql @sql, N'@x int OUTPUT', @x = @outerX output
    set @i = @i + 1
    print @outerX
end
Output will be
0
1
2
3
4
5
6
I have a table with common word values to match against brands - so when someone types in "coke" I want to match any possible brand names associated with it as well as the original term.
CREATE TABLE word_association ( commonterm TEXT, assocterm TEXT);
INSERT INTO word_association (commonterm, assocterm) VALUES ('coke', 'coca-cola'), ('coke', 'cocacola'), ('coke', 'coca cola');
I have a function to create a list of these values in a pipe-delimited string for pattern matching:
CREATE OR REPLACE FUNCTION usp_get_search_terms(userterm text)
RETURNS text AS
$BODY$DECLARE
returnstr TEXT DEFAULT '';
BEGIN
SET DATESTYLE TO DMY;
returnstr := userterm;
IF EXISTS (SELECT 1 FROM word_association WHERE LOWER(commonterm) = LOWER(userterm)) THEN
SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
FROM word_association
WHERE commonterm = userterm;
END IF;
RETURN returnstr;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION usp_get_search_terms(text)
OWNER TO customer_role;
If you call SELECT * FROM usp_get_search_terms('coke') you end up with
coke|coca-cola|cocacola|coca cola
EDIT: this function runs <100ms so it works fine.
I want to run a query with this text inserted e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE LOWER(X.online_description) % usp_get_search_terms ('coke');
This takes approx 56s to run against my table of ~500K records.
If I get the raw text and use it in the query it takes ~300ms e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE X.online_description % '(coke|coca-cola|cocacola|coca cola)';
The result sets are identical.
I've tried modifying the output string from the function, e.g. enclosing it in quotes and parentheses, but it doesn't seem to make a difference.
Can someone please advise why there is a difference here? Is it the data type or something about calling functions inside queries? Thanks.
Your function might take 100ms, but it's not calling your function once; it's calling it 500,000 times.
It's because your function is declared VOLATILE. This tells Postgres that either the function returns different values when called multiple times within a query (like clock_timestamp() or random()), or that it alters the state of the database in some way (for example, by inserting records).
If your function contains only SELECTs, with no INSERTs, calls to other VOLATILE functions, or other side-effects, then you can declare it STABLE instead. This tells the planner that it can call the function just once and reuse the result without affecting the outcome of the query.
But your function does have side-effects, due to the SET DATESTYLE statement, which takes effect for the rest of the session. I doubt this was the intention, however. You may be able to remove it, as it doesn't look like date formatting is relevant to anything in there. But if it is necessary, the correct approach is to use the SET clause of the CREATE FUNCTION statement to change it only for the duration of the function call:
...
$BODY$
LANGUAGE plpgsql STABLE
SET DATESTYLE TO DMY
COST 100;
The other issue with the slow version of the query is the call to LOWER(X.online_description), which will prevent the query from utilising the index (since online_description is indexed, but LOWER(online_description) is not).
With these changes, the performance of both queries is the same; see this SQLFiddle.
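Alternatively, the lowered expression itself can be indexed so the original predicate stays as written. A hedged sketch, assuming the % operator comes from the pg_trgm extension (the index name is made up):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_desc_lower_trgm
    ON articles USING gin (LOWER(online_description) gin_trgm_ops);

With a trigram index on LOWER(online_description), the planner can use it for the similarity match instead of scanning all ~500K rows.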
So the answer came to me about dawn this morning - CTEs to the rescue!
Particularly as this is the "simple" version of a very large query, it helps to get this defined once in isolation, then do the matching against it. The alternative (given I'm calling this from a NodeJS platform) is to have one request retrieve the string of terms, then make another request to pass the string back. Not elegant.
WITH matches AS
( SELECT * FROM usp_get_search_terms('coke') )
, main AS
( SELECT X.article_number, X.online_description
FROM articles X
JOIN matches M ON X.online_description % M.usp_get_search_terms )
SELECT * FROM main
Execution time is somewhere around 300-500ms depending on term searched and articles returned.
Thanks for all your input guys - I've learned a few things about Postgres that my MS-SQL background didn't necessarily prepare me for :)
Have you tried removing the IF EXISTS() and simply using:
SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
FROM word_association
WHERE LOWER(commonterm) = LOWER(userterm)
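One caveat worth noting if the IF EXISTS goes away: when no rows match, string_agg() returns NULL, and concatenating NULL would wipe out returnstr. A COALESCE (sketched here) keeps the original term as a fallback:

SELECT COALESCE(returnstr || '|' || string_agg(assocterm, '|'), returnstr)
  INTO returnstr
FROM word_association
WHERE LOWER(commonterm) = LOWER(userterm);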
Instead of calling the function for each row, call it once:
select x.article_number, x.online_description
from woolworths.articles x
cross join woolworths.usp_get_search_terms('coke') c(s)
where lower(x.online_description) % s
I have a function in PL/pgSQL that is trying to back out some data for a date range. The problem I have is that I cannot seem to store the double precision value inside a variable. No matter what I do, the value is always null when running inside a function. When I run the query from the psql command line it returns the correct data. I can also run the query on another column that isn't of type double precision and it works fine. For example, if I change the column to "total_impressions_for_date_range" it will return the correct data.
I am using PostgreSQL 8.4
CREATE OR REPLACE FUNCTION rollback_date_range_revenue(campaign_id int,
begin_date timestamp, end_date timestamp, autocommit boolean)
RETURNS void AS $BODY$
DECLARE
total_impressions_for_date_range bigint;
total_clicks_for_date_range bigint;
total_revenue_for_date_range double precision;
total_cost_for_date_range double precision;
BEGIN
SELECT sum(revenue) INTO total_revenue_for_date_range
FROM ad_block_summary_hourly
WHERE ad_run_id IN (
SELECT ad_run_id FROM ad_run WHERE ad_campaign_id = campaign_id)
AND ad_summary_time >= begin_date
AND ad_summary_time < end_date
AND (revenue IS NOT NULL);
RAISE NOTICE 'Total revenue for given date range and campaign % was %',
campaign_id, total_revenue_for_date_range;
When I run this I always get a null value for the revenue
SELECT rollback_date_range_revenue(8818, '2015-07-20 18:00:00'::timestamp,
'2015-07-20 20:00:00'::timestamp, false);
NOTICE: Total revenue for given date range and campaign 8818 was <NULL>
When I run it from the command line, outside of the function, it works completely fine:
select sum(revenue)
from ad_block_summary_hourly
where ad_run_id in (select ad_run_id from ad_run where ad_campaign_id = 8818)
  and ad_summary_time >= '2015-07-20 18:00:00'::timestamp
  and ad_summary_time < '2015-07-20 20:00:00'::timestamp;
sum
----------
3122.533
(1 row)
EDIT
Huge thanks to a_horse_with_no_name and Patrick. This was indeed a problem with a placeholder I had called revenue, which overlapped with a column name in my query. I was thrown off by the fact that the two queries that were not working were both double precision. It just happened that those two were also the placeholders whose names overlapped with column names.
Two things to take away from this:
1. I adopted the p_ naming scheme for placeholders suggested by a_horse_with_no_name, so as to not run into this issue again.
2. Post a full code example - it could have been identified much quicker by the experts.
First of all, PostgreSQL 8.4 is no longer supported so you should upgrade to 9.4 as soon as you can. Second, your function is obviously abbreviated because some declared variables are not used and there is no END clause. These two points together make it somewhat guesswork to give you an answer, but here goes.
Try casting the double precision to text, or convert it with to_char(). RAISE NOTICE expects a string for the expressions to be inserted; possibly in 8.4 this is not automatic.
You could also improve upon your query:
...
SELECT sum(sh.revenue) INTO total_revenue_for_date_range
FROM ad_block_summary_hourly sh
JOIN ad_run r USING (ad_run_id)
WHERE r.ad_campaign_id = campaign_id
AND sh.ad_summary_time >= begin_date
AND sh.ad_summary_time < end_date;
RAISE NOTICE 'Total revenue for given date range and campaign % was %',
campaign_id, to_char(total_revenue_for_date_range, '9D999');
...
Another potential cause of the problem (guessing again due to lack of information) is a name collision between a function parameter or variable with a column name from either of the two tables.
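As a hypothetical illustration of that kind of collision (the table name comes from the question; the function itself is made up): in 8.4 a PL/pgSQL variable silently takes precedence over a same-named column, so the aggregate is computed over the variable's NULL value; later versions raise an ambiguity error by default.

CREATE OR REPLACE FUNCTION collision_demo() RETURNS double precision AS $BODY$
DECLARE
    revenue double precision;       -- shadows the table's revenue column
    total double precision;
BEGIN
    SELECT sum(revenue) INTO total  -- sums the NULL variable, not the column
    FROM ad_block_summary_hourly;
    RETURN total;                   -- always NULL
END;
$BODY$ LANGUAGE plpgsql;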
When creating a user-defined function, is it "bad" to just automatically use the largest string possible?
For example, in the following UDF I've used nvarchar(max) for my input string, even though I know perfectly well that the function currently doesn't need to accept nvarchar(max). Maybe I'm thinking too far ahead, but I figured there would always be the possibility that an nvarchar(max) would be passed to this function.
What I'm wondering is: by declaring that this function could receive an actual nvarchar(max), am I doing anything that could cripple performance?
CREATE FUNCTION [dbo].[IsMasNull] (@value nvarchar(max))
RETURNS BIT
AS
BEGIN
    RETURN
        CASE
            WHEN @value IS NULL THEN 1
            WHEN CHARINDEX(char(0), @value) > 0 THEN 1
            ELSE 0
        END
END
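As a quick, hypothetical sanity check of the function (the inputs are made up):

SELECT dbo.IsMasNull(NULL) AS null_input,                    -- 1
       dbo.IsMasNull(N'ab' + NCHAR(0) + N'cd') AS embedded,  -- 1 (contains char(0))
       dbo.IsMasNull(N'abcd') AS plain;                      -- 0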
NVARCHAR(MAX) will affect performance if it's a database column. As a parameter to a stored procedure or function it should make no difference. If there is any performance degradation, it's because of the sheer size of the data and not the datatype.
I think this is best asked in the form of a simple example. The following chunk of SQL causes a "DB-Library Error:20049 Severity:4 Message:Data-conversion resulted in overflow" message, but how come?
declare @a numeric(18,6), @b numeric(18,6), @c numeric(18,6)
select @a = 1.000000, @b = 1.000000, @c = 1.000000
select @a/(@b/@c)
go
How is this any different to:
select 1.000000/(1.000000/1.000000)
go
which works fine?
I ran into the same problem the last time I tried to use Sybase (many years ago). Coming from a SQL Server mindset, I didn't realize that Sybase would attempt to coerce the decimals out -- which, mathematically, is what it should do. :)
From the Sybase manual:
Arithmetic overflow errors occur when the new type has too few decimal places to accommodate the results.
And further down:
During implicit conversions to numeric or decimal types, loss of scale generates a scale error. Use the arithabort numeric_truncation option to determine how serious such an error is considered. The default setting, arithabort numeric_truncation on, aborts the statement that causes the error but continues to process other statements in the transaction or batch. If you set arithabort numeric_truncation off, Adaptive Server truncates the query results and continues processing.
So assuming that the loss of precision is acceptable in your scenario, you probably want the following at the beginning of your transaction:
SET ARITHABORT NUMERIC_TRUNCATION OFF
And then at the end of your transaction:
SET ARITHABORT NUMERIC_TRUNCATION ON
This is what solved it for me those many years ago ...
This is just speculation, but could it be that the DBMS doesn't look at the dynamic values of your variables, only at their potential values? Thus, a six-decimal numeric divided by a six-decimal numeric could result in a twelve-decimal numeric; with the literal division, the DBMS knows there is no overflow. Still not sure why the DBMS would care, though: shouldn't it return the result of two six-decimal divisions as up to an 18-decimal numeric?
Because you have declared the variables in the first example, the result is expected to be of the same declaration (i.e. numeric(18,6)), but it is not.
I have to say that the first one worked in SQL Server 2005, though: it returned 1.000000 (the same declared type), while the second one returned 1.00000000000000000000000 (a totally different declaration).
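For reference, a hedged sketch of how the result type can be derived under SQL Server's rules (Sybase ASE's exact rules may differ): for e1 / e2 the result scale is max(6, s1 + p2 + 1) and the precision is p1 - s1 + s2 + scale, capped at 38 with the scale reduced accordingly. The resulting type can be inspected directly:

-- numeric(18,6) / numeric(18,6):
--   scale     = max(6, 6 + 18 + 1) = 25
--   precision = 18 - 6 + 6 + 25    = 43 -> capped at 38, scale reduced to 20
DECLARE @q sql_variant = CAST(1 AS numeric(18,6)) / CAST(1 AS numeric(18,6));
SELECT SQL_VARIANT_PROPERTY(@q, 'Precision') AS result_precision,
       SQL_VARIANT_PROPERTY(@q, 'Scale') AS result_scale;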
Not directly related, but could possibly save someone some time with the Arithmetic overflow errors using Sybase ASE (12.5.0.3).
I was setting a few default values in a temporary table which I intended to update later on, and stumbled on to an Arithmetic overflow error.
declare @a numeric(6,3)
select 0.000 as thenumber into #test --indirect declare
select @a = ( select thenumber + 100 from #test )
update #test set thenumber = @a
select * from #test
Shows the error:
Arithmetic overflow during implicit conversion of NUMERIC value '100.000' to a NUMERIC field.
Which in my head should work, but doesn't, because the 'thenumber' column wasn't declared explicitly (it was indirectly declared as decimal(4,3)). So you have to indirectly declare the temp table column with the precision and scale of the format you want, which in my case was 000.000.
select 000.000 as thenumber into #test --this solved it
Hopefully that saves someone some time :)