Postgres STDDEV aggregate behavior when n<2

My Postgres query calculates statistical aggregates from a set of sensor readings:
SELECT to_char(ipstimestamp, 'YYYYMMDDHH24') As row_name,
to_char(ipstimestamp, 'FMDD mon FMHH24h') As hour_row_name,
varid As category,
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(STDDEV(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As StatisticsValue
FROM loggingdb_ips_integer As log
JOIN ipsobjects_with_parent ips ON log.varid = ips.objectid
AND (ipstimestamp > (now()- '2 days'::interval))
GROUP BY row_name, hour_row_name, category;
This works fine as long as I have >1 ipsvalue/hour. If the hourly COUNT(ipsvalue)<2, however, StatisticsValue returns NULL without any Postgres errors.
If I comment out STDDEV, as in the following:
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As value
then all three stats are calculated correctly. I therefore conclude that an illegitimate STDDEV brings down the whole query. I would rather have illegitimate STDDEVs return 0. I tried to COALESCE the STDDEV line, to no avail. What can be done?

COALESCE should work.
You could also use (if that fits you) the "population standard deviation" stddev_pop instead of the "sample standard deviation" stddev_samp; the latter divides by n-1 and is aliased to STDDEV. stddev_pop, by contrast, divides by n, and it returns zero (instead of NULL) when given one sample.
If you don't know the difference between these estimators, it's explained in every statistics textbook, e.g. http://en.wikipedia.org/wiki/Standard_deviation#Estimation
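A minimal sketch of both suggestions, using an inline `readings` list with made-up values as a stand-in for the real table (the names `readings` and `hour_bucket` are assumptions for illustration). The point to notice is that `NULL || text` is NULL in PostgreSQL, so COALESCE has to wrap the STDDEV result itself, before the cast and the concatenation:

```sql
-- Hypothetical sample data: two readings in the first hour, one in the second.
WITH readings(hour_bucket, ipsvalue) AS (
    VALUES ('2024010100', 20.5),
           ('2024010100', 21.5),
           ('2024010101', 19.0)
)
SELECT hour_bucket,
       STDDEV(ipsvalue)              AS stddev_raw,        -- NULL when the group has one row
       COALESCE(STDDEV(ipsvalue), 0) AS stddev_coalesced,  -- 0 instead of NULL
       STDDEV_POP(ipsvalue)          AS stddev_population, -- 0 when the group has one row
       -- COALESCE is applied before the cast and the concatenation,
       -- so a NULL STDDEV can no longer turn the whole string into NULL.
       AVG(ipsvalue)::NUMERIC(5,2) || ', ' ||
       COALESCE(STDDEV(ipsvalue), 0)::NUMERIC(5,2)::TEXT AS statistics_value
FROM readings
GROUP BY hour_bucket
ORDER BY hour_bucket;
```

If COALESCE is instead wrapped around a different part of the expression, the NULL still propagates through `||`, which may be why the earlier attempt appeared not to work.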

I found a workaround which is an alternative to COALESCE. In my specific instance, COALESCE is likely to perform better, but the workaround is potentially more flexible.
I have taken advantage of the IIF simulation described by Emanuel Calvo Franco and Hector de los Santos. IIF works pretty much like its homologue in MS Access. In my instance, the IIF function tests the result of STDDEV for NULL, and returns a "0" if true. The good thing about IIF is that it can test all sorts of conditions, not only NULL.
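The IIF function itself is not reproduced here. As a rough equivalent, the built-in CASE expression can test the same condition; the sketch below is not the IIF simulation mentioned above, only the standard SQL construct that such a function would typically wrap, run against made-up inline data:

```sql
-- Hypothetical sample data: the second hour has a single reading.
WITH readings(hour_bucket, ipsvalue) AS (
    VALUES ('2024010100', 20.5),
           ('2024010100', 21.5),
           ('2024010101', 19.0)
)
SELECT hour_bucket,
       -- Same test an IIF-style helper would perform: NULL becomes 0.
       CASE WHEN STDDEV(ipsvalue) IS NULL THEN 0
            ELSE STDDEV(ipsvalue)
       END AS stddev_or_zero
FROM readings
GROUP BY hour_bucket
ORDER BY hour_bucket;
```

Like IIF, the WHEN condition can be any boolean test, for example `COUNT(ipsvalue) < 2`, not only a NULL check.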


PostgreSQL: how to concatenate two string values without duplicates

I have two strings as below:
_var_1 text := '815 PAADLEY ROAD PL';
_var_2 text := 'PAADLEY ROAD PL';
_var_3 text;
I want to merge these two strings into one string and to remove duplicates:
_var_3 := _var_1 || _var_2;
As a result, the variable (_var_3) should contain only 815 PAADLEY ROAD PL, without duplicates.
Can you advise or recommend any PostgreSQL feature?
I read the documentation and could not find a string function that solves this problem. I am trying to use regexp_split_to_table, but nothing is working.
I tried the following method, but it's not what I need, and the words in the output are mixed up:
WITH ts AS (
SELECT
unnest(
string_to_array('815 PAADLEY ROAD PL PAADLEY ROAD PL', ' ')
) f
)
SELECT
f
FROM ts
GROUP BY f
-- f
-- 815
-- ROAD
-- PL
-- PAADLEY
I assume you want to treat the strings as word lists and then concatenate them as if they were sets to be unioned, while retaining order. This is basically done by the following SQL:
with splitted (val, input_number, word_number) as (
select v, 1, i
from unnest(regexp_split_to_array('815 PAADLEY 2 ROAD 3 PL',' ')) with ordinality as t(v,i)
union
select v, 2, i
from unnest(regexp_split_to_array('PAADLEY ROAD 4 PL',' ')) with ordinality as t(v,i)
), numbered as (
select val, input_number, word_number, row_number() over (partition by val order by input_number, word_number) as rn
from splitted
)
select string_agg(val,' ' order by input_number, word_number)
from numbered
where rn = 1
string_agg
815 PAADLEY 2 ROAD 3 PL 4
However, this is not the kind of task that SQL solves in a smart and elegant way. Moreover, it is not clear from your specification what to do with duplicate words or whether you want to process multiple input pairs (both requirements would be possible, though SQL is probably not the right tool). Please at least provide more sample inputs with expected outputs.
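For the two strings from the question, the same idea can be wrapped in a function. This is only a sketch: the name `merge_words` and its parameters are hypothetical, it keeps the first occurrence of every word (so a word repeated within one input is also dropped), and it assumes PostgreSQL 9.4 or later for `WITH ORDINALITY`:

```sql
-- Hypothetical helper: merge two space-separated word lists,
-- keeping only the first occurrence of each word, in input order.
CREATE OR REPLACE FUNCTION merge_words(a text, b text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    WITH splitted(val, input_number, word_number) AS (
        SELECT v, 1, i
        FROM unnest(string_to_array(a, ' ')) WITH ORDINALITY AS t(v, i)
        UNION ALL
        SELECT v, 2, i
        FROM unnest(string_to_array(b, ' ')) WITH ORDINALITY AS t(v, i)
    ), numbered AS (
        SELECT val, input_number, word_number,
               row_number() OVER (PARTITION BY val
                                  ORDER BY input_number, word_number) AS rn
        FROM splitted
    )
    SELECT string_agg(val, ' ' ORDER BY input_number, word_number)
    FROM numbered
    WHERE rn = 1
$$;

-- Usage with the values from the question.
SELECT merge_words('815 PAADLEY ROAD PL', 'PAADLEY ROAD PL');
-- returns: 815 PAADLEY ROAD PL
```

Inside a PL/pgSQL block this could then be used as `_var_3 := merge_words(_var_1, _var_2);`.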

In DB2 SQL, is it possible to set a variable in the SELECT statement to use multiple times..?

In DB2 SQL, is it possible to SET a variable with the contents of a returned field in the SELECT statement, to use multiple times for calculated fields and criteria further along in the same SELECT statement?
The purpose is to shrink and streamline the code, by doing a calculation once at the beginning and using it multiple times later on...including the HAVING, WHERE, and ORDER BY.
To be honest, I'm not sure this is possible in any version of SQL, much less DB2.
This is on an IBM iSeries 8202 with DB2 SQL v6, which unfortunately is not a candidate for upgrade at this time. This is a very old & messy database, which I have no control over. I must regularly include "cleanup functions" in my SQL.
To clarify the question, note the following pseudocode. Actual working code follows further below.
DECLARE smnum INTEGER --Not sure if this is correct.
SELECT
-- This is where I'm not sure what to do.
SET CAST((CASE WHEN %smnum%='' THEN '0' ELSE %smnum% END) AS INTEGER) INTO smnum,
%smnum% AS sm,
invdat,
invno,
daqty,
dapric,
dacost,
(dapric-dacost)*daqty AS profit
FROM
saleshistory
WHERE
%smNum% = 30
ORDER BY
%smnum%
Below is my actual working SQL. When adjusted for 2017 or 2016, it can return >10K rows, depending on the salesperson. The complete table has >22M rows.
That buttload of CAST((CASE... expressions is what I wish to replace with a variable. This is not the only example of this. If I can make it work, I have many other queries that could benefit from the technique.
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
WHERE
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER)=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(TRIM(DACOMP) AS INTEGER)=1
ORDER BY
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER),
DAIDAT,
DAINV#,
DALIN#
Just use a subquery or CTE. I can't figure out the actual logic you want, but the structure looks like this:
select . . .
from (select d.*,
(CASE . . . END) as calc_field
from VIPDTAB.DAILYV d
) d
No variable declaration is needed.
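The CTE form of the same structure might look like the sketch below. It uses the table and column names from the question with a shortened column list, the CTE name `cleaned` is an assumption, and it presumes the installed DB2 release accepts common table expressions:

```sql
-- Compute the cleaned-up salesperson number once, then reuse it by name
-- in the select list, WHERE and ORDER BY.
WITH cleaned AS (
    SELECT v.*,
           CAST(CASE WHEN TRIM(v.DASM#) = '' THEN '0'
                     ELSE TRIM(v.DASM#) END AS INTEGER) AS DASM
    FROM VIPDTAB.DAILYV v
)
SELECT DASM,
       DAIDAT,
       DAQTY,
       DAPRIC,
       DAICOS,
       (DAPRIC - DAICOS + DADPAL) * DAQTY AS PROFIT
FROM cleaned
WHERE DASM = 30
ORDER BY DASM, DAIDAT;
```

The calculated column is written once and referenced three times, which is the effect the variable in the pseudocode was meant to achieve.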
Here is what your SQL would look like with the sub-query that Gordon suggested:
SELECT
DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(DAITEM AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
(SELECT
D.*,
CAST((CASE WHEN D.DASM#='' THEN '0' ELSE D.DASM# END) AS INTEGER) AS DASM
FROM VIPDTAB.DAILYV D
) D
WHERE
DASM=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(DACOMP AS INTEGER)=1
ORDER BY
DASM,
DAIDAT,
DAINV#,
DALIN#
Notice that I removed a lot of the trim() functions, and you could likely remove the rest. The way IBM resolves the Varchar vs. Char comparison thing is by ignoring trailing blanks. So trim(anything) = '' is the same as anything = ''. And since cast(' 123 ' as integer) = 123, I have removed trims from within the cast functions as well. In addition trim(dabsw) = 'B' is the same as dabsw = 'B' as long as the 'B' is the first character in dabsw. So you could even remove that trim if all you are concerned with is trailing blanks.
Here are some additional notes based on comments. The above paragraph is not talking about auto-trim. Fixed length fields will always return as fixed length fields, the trailing blanks will remain. But in comparisons and expressions where trailing blanks are unimportant, or even a hindrance, they are ignored. In expressions where trailing blanks are important, like concatenation, the trailing blanks are not ignored. Another thing, trim() removes both leading and trailing blanks. If you are using trim() to read a fixed length character field into a Varchar, then rtrim() is likely the better choice as it only removes the trailing blanks.
Also, I didn't go through your fields to make sure I got everything you need, I just used * in the sub-query. For performance, it would be best to only return the fields you need. So if you replace D.* with an actual field list, you can remove the correlation name in the from clause of the sub-query. But, the sub-query itself still needs a correlation clause.
My verification was done using IBM i v7.1.
You can encapsulate the case statement in a view. I even have the fancy profit calc in there so you can order by profit. Now the biggest issue you have is the CCSID on the view for calculated columns, but that's another question.
create or replace view VIPDTAB.DAILYVQ as
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
Now you can run
select dasm, count(*) from vipdtab.dailyvq where dasm = 0 group by dasm order by dasm
or
select * from vipdtab.dailyvq order by profit desc

PostgreSQL Division by ZERO

I'd like to perform division in a SELECT clause. When I join some tables and use aggregate functions, I often have either NULL or zero values as the divisors. So far I have only come up with this method of avoiding division by zero and NULL values.
select
date_part('week', startmeasurement::date) AS week,
(COUNT(CASE WHEN new_spm.status IN ('Closed','Resolved')THEN 1 ELSE NULL END)
*100/count(case when new_spm.status !='Cancelled' THEN 1 ELSE NULL END)::double precision) AS percentage_closed_and_resolved
from new_spm
WHERE new_spm.divisi='CNOS-HQ'
GROUP BY week;
Take a look at the COALESCE expression available in postgresql. It should significantly simplify your current approach.
https://www.postgresql.org/docs/current/static/functions-conditional.html
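As a sketch of how that could look for the query above: NULLIF, which is documented on the same page, turns a zero divisor into NULL so the division yields NULL instead of raising an error, and COALESCE then picks a fallback. Returning 0 for an empty week is an assumption here; leaving the result NULL may be more honest in a report:

```sql
SELECT
    date_part('week', startmeasurement::date) AS week,
    COALESCE(
        COUNT(CASE WHEN new_spm.status IN ('Closed', 'Resolved') THEN 1 END) * 100
        -- NULLIF(x, 0) returns NULL when x = 0, so no division-by-zero error
        / NULLIF(COUNT(CASE WHEN new_spm.status != 'Cancelled' THEN 1 END), 0)::double precision,
        0  -- assumed fallback when there is nothing to divide by
    ) AS percentage_closed_and_resolved
FROM new_spm
WHERE new_spm.divisi = 'CNOS-HQ'
GROUP BY week;
```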

Unexpected SQL results: string vs. direct SQL

Working SQL
The following code works as expected, returning two columns of data (a row number and a valid value):
sql_amounts := '
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( '|| id || ', 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken';
FOR r, amount IN EXECUTE sql_amounts LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Non-Working SQL
The following code does not work as expected; the first column is a row number, the second column is NULL.
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Question
Why does the non-working code return a NULL value for the second column when the query itself returns two valid columns? (This question is mostly academic; if there is a way to express the query without resorting to wrapping it in a text string, that would be great to know.)
Full Code
http://pastebin.com/hgV8f8gL
Software
PostgreSQL 8.4
Thank you.
The two statements aren't strictly equivalent.
Assuming id = 4, the first one gets planned/prepared on each pass, and behaves like:
prepare dyn_stmt as '... x_function( 4, 25 ) ...'; execute dyn_stmt;
The other gets planned/prepared on the first pass only, and behaves more like:
prepare stc_stmt as '... x_function( $1, 25 ) ...'; execute stc_stmt(4);
(The loop will actually make it prepare a cursor for the above, but that's beside the point for our sake.)
A number of factors can make the two yield different results.
Search path changes before calling the procedure will be ignored by the second call. In particular if this makes x_table point to something different.
Constants of all kinds and calls to immutable functions are "hard-wired" in the second call's plan.
Consider this as an illustration of these side-effects:
deallocate all;
begin;
prepare good as select now();
prepare bad as select 'now'::timestamp;
execute good; -- yields the current timestamp
execute bad; -- yields the current timestamp
commit;
execute good; -- yields the current timestamp
execute bad; -- yields the timestamp at which it was prepared
Why the two aren't returning the same results in your case would depend on the context (you only posted part of your pl/pgsql function, so it's hard to tell), but my guess is you're running into a variation of the above kind of problem.
From Tom Lane:
I think the problem is that you're assuming "amount" will refer to a table column of the query, when actually it's a local variable of the plpgsql function. The second interpretation will take precedence unless you qualify the column reference with the table's name/alias.
Note: PG 9.0 will throw an error by default when there is an ambiguity of this type.
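A small self-contained sketch of the fix described in the quote, qualifying the column with the table alias so it cannot be taken for the local variable. The table `demo_readings` and its data are hypothetical, and the `DO` block assumes PostgreSQL 9.0 or later:

```sql
-- Hypothetical table with a column named like the loop variable below.
CREATE TEMP TABLE demo_readings (taken date, amount double precision);
INSERT INTO demo_readings VALUES ('2010-01-01', 1.5), ('2010-01-01', 2.5);

DO $$
DECLARE
    amount double precision;  -- same name as the table column
BEGIN
    FOR amount IN
        SELECT avg(m.amount)  -- m.amount is unambiguously the column
        FROM demo_readings m
        GROUP BY m.taken
        ORDER BY m.taken
    LOOP
        RAISE NOTICE 'average amount: %', amount;
    END LOOP;
END
$$;
```

Applied to the non-working loop in the question, the corresponding change would be `avg( m.amount )`, assuming amount is a column of x_table. Renaming the local variable so it no longer collides with the column is another option.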

SUBSTR does not work with datatype "timestamp" in Postgres 8.3

I have a problem with the query below in postgres
SELECT u.username,l.description,l.ip,SUBSTRING(l.createdate,0,11) as createdate,l.action
FROM n_logs AS l LEFT JOIN n_users AS u ON u.id = l.userid
WHERE SUBSTRING(l.createdate,0,11) >= '2009-06-07'
AND SUBSTRING(l.createdate,0,11) <= '2009-07-07';
I always used the above query in an older version of Postgres and it worked 100%. Now with the new version of Postgres it gives me errors like the one below:
ERROR: function pg_catalog.substring(timestamp without time zone, integer, integer) does not exist
LINE 1: SELECT u.username,l.description,l.ip,SUBSTRING(l.createdate,...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I assume it has something to do with data types: the column is a timestamp and SUBSTRING only supports string data types. My question is, what can I do about my query so that my results come up?
The explicit solution to your problem is to cast the datetime to string.
...,SUBSTRING(l.createdate::varchar,...
That said, comparing dates through the resulting strings is not good practice at all.
So the better solution is to change your query to use the explicit datetime manipulation, comparison and formatting functions, like extract() and to_char().
You'd have to change your query to have a clause like
l.createdate::DATE >= '2009-06-07'::DATE
AND l.createdate::DATE < '2009-07-08'::DATE;
or one of the alternatives below (which you should really accept instead of this.)
SELECT u.username, l.description, l.ip,
CAST(l.createdate AS DATE) as createdate,
l.action
FROM n_logs AS l
LEFT JOIN
n_users AS u
ON u.id = l.userid
WHERE l.createdate >= '2009-06-07'::TIMESTAMP
AND l.createdate < '2009-07-07'::TIMESTAMP + '1 DAY'::INTERVAL
I'm not sure what you want to achieve, but basically "substring" on date datatypes is not really well defined, as it depends on external format of said data.
In most of the cases you should use extract() or to_char() functions.
Generally, for returning data you want to_char(), and for operations on it (including comparison), extract(). There are some cases where this general rule does not apply, but these are usually signs of a poorly thought-out data structure.
Example:
# select to_char( now(), 'YYYY-MM-DD');
to_char
------------
2009-07-07
(1 row)
For extract let's write a simple query that will list all objects created after 8pm:
select * from objects where extract(hour from created) >= 20;
A variation on Quassnoi's answer:
SELECT
u.username,
l.description,
l.ip,
CAST(l.createdate AS DATE) as createdate,
l.action
FROM
n_logs AS l
LEFT JOIN
n_users AS u
ON
(u.id = l.userid)
WHERE
l.createdate::DATE BETWEEN '2009-06-07'::DATE AND '2009-07-07'::DATE
If you use PostgreSQL, a call like this:
select('SUBSTRING(offer.date_closed, 0, 11)')
fails with:
function substr(timestamp without time zone integer integer) does not exist
Use:
select('SUBSTRING(CONCAT(offer.date_closed, \'\'), 0, 11)')
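In plain SQL, the same date-only text can be produced either by casting the timestamp to text before SUBSTRING or by formatting it with to_char(). The sketch below uses a made-up inline timestamp rather than the real offer table; note that the text produced by the cast depends on the session's DateStyle setting, while to_char() always follows the given pattern:

```sql
-- Hypothetical single-row input standing in for a timestamp column.
SELECT substring(ts::text, 0, 11)  AS via_cast,    -- first 10 characters of the text form
       to_char(ts, 'YYYY-MM-DD')   AS via_to_char  -- explicit format, independent of DateStyle
FROM (VALUES (timestamp '2009-07-07 15:30:00')) AS t(ts);
```

With the default ISO DateStyle both columns come out as 2009-07-07; under other DateStyle settings only the to_char() column is guaranteed to.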