I'd like to perform a division in a SELECT clause. When I join some tables and use an aggregate function, I often get either NULL or zero values as the divisor. So far, the only way I have come up with to avoid division by zero and NULL values is this:
select
    date_part('week', startmeasurement::date) AS week,
    (COUNT(CASE WHEN new_spm.status IN ('Closed','Resolved') THEN 1 ELSE NULL END)
     * 100 / count(CASE WHEN new_spm.status != 'Cancelled' THEN 1 ELSE NULL END)::double precision) AS percentage_closed_and_resolved
from new_spm
WHERE new_spm.divisi = 'CNOS-HQ'
GROUP BY week;
Take a look at the COALESCE expression available in PostgreSQL (it is documented together with NULLIF on the conditional-expressions page linked below). It should significantly simplify your current approach.
https://www.postgresql.org/docs/current/static/functions-conditional.html
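The sketch below shows the common NULLIF/COALESCE pattern for this kind of percentage. It runs the query through Python's built-in sqlite3 module against an in-memory SQLite database, which stands in for PostgreSQL. The rows are made up, and the date_part/divisi parts are left out. NULLIF(divisor, 0) turns a zero divisor into NULL, so the division yields NULL instead of failing. PostgreSQL raises a division-by-zero error here, while SQLite happens to return NULL either way. COALESCE then maps that NULL to 0. The literal 100.0 replaces the ::double precision cast to force non-integer division.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_spm (week INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO new_spm VALUES (?, ?)",
    [
        (1, "Closed"), (1, "Resolved"), (1, "Open"), (1, "Cancelled"),
        (2, "Cancelled"),  # week 2: every ticket cancelled, so the divisor is 0
    ],
)

rows = conn.execute(
    """
    SELECT week,
           -- NULLIF(x, 0) turns a zero divisor into NULL, making the quotient NULL;
           -- COALESCE then replaces that NULL with 0.
           COALESCE(
               COUNT(CASE WHEN status IN ('Closed', 'Resolved') THEN 1 END) * 100.0
               / NULLIF(COUNT(CASE WHEN status <> 'Cancelled' THEN 1 END), 0),
               0
           ) AS percentage_closed_and_resolved
    FROM new_spm
    GROUP BY week
    ORDER BY week
    """
).fetchall()
print(rows)
```

Whether an all-cancelled week should report 0 or stay NULL is a design choice. Dropping the outer COALESCE keeps it NULL.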
Suppose I have a lot of NULL (missing) values in a column named 'score'. I want to replace each of them with an average. The average should not be taken over all values of 'score'. It should be taken per group, where the groups come from a crosscategory built by concatenating two categories.
This kind of query works for getting averages by groups:
SELECT
category1 || ' > ' || category2 AS crosscategory,
ROUND(CAST(AVG(score) AS FLOAT), 2) AS score_avg
FROM DatabaseName.TableName
GROUP BY crosscategory
ORDER BY score_avg;
This one works to replace NULL values by a constant:
SELECT
NVL(score, 0) AS score_without_missing_values
FROM DatabaseName.TableName
What I cannot work out is how to combine the two: replace the NULL values not with a constant, but with the group averages computed by AVG and GROUP BY.
Thank you very much for your help!
It seems you want a group average:
SELECT
t.*,
coalesce(score, AVG(score) OVER (PARTITION BY category1, category2)) AS score_avg
FROM DatabaseName.TableName AS t
I removed the ROUND/CAST because AVG already returns FLOAT, and ROUND is probably not needed. If you do need it, you would be better off casting to DECIMAL.
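The group-average fill can be tried out with the small sketch below. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for the original system, and window functions need SQLite 3.25 or newer. The table name t and the rows are made up. Because AVG ignores NULLs, each missing score is replaced by the mean of the non-NULL scores in its own (category1, category2) partition.

```python
import sqlite3

# In-memory SQLite stand-in; table t and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category1 TEXT, category2 TEXT, score REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "x", 10.0), ("a", "x", 20.0), ("a", "x", None),
        ("b", "y", 4.0), ("b", "y", None),
    ],
)

rows = conn.execute(
    """
    SELECT category1 || ' > ' || category2 AS crosscategory,
           -- AVG skips NULLs, so a NULL score gets the mean of its own group
           COALESCE(score, AVG(score) OVER (PARTITION BY category1, category2)) AS score_filled
    FROM t
    ORDER BY crosscategory, score_filled
    """
).fetchall()
print(rows)
```

The missing value in group "a > x" becomes 15.0 (the mean of 10 and 20). The one in "b > y" becomes 4.0.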
What is the argument type for the ORDER BY clause in PostgreSQL?
I came across some very strange behaviour (using PostgreSQL 9.5). The query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. The same happens whenever I replace 1::int with a function call (e.g. greatest(0,1)) or even a CASE expression: the results come back unordered, contrary to what I would expect.
So what type should the argument of ORDER BY have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard).
However the expression:
order by 1::int
sorts by the constant value 1; it's essentially the same as:
order by 'foo'
With a constant in the ORDER BY, every row has the same sort value. The rows are therefore not actually sorted; they come back in whatever order the executor happens to produce them.
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
This sorts the rows by the value the CASE expression computes for each row.
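The difference between a positional ORDER BY and an expression ORDER BY can be checked with the sketch below. It uses Python's sqlite3 module with an in-memory SQLite database rather than PostgreSQL, although SQLite also treats a bare integer literal in ORDER BY as a column position. The table items and its rows are made up. The extra some_column tie-breaker is an addition, so that rows in the ELSE bucket come out in a fixed order.

```python
import sqlite3

# In-memory SQLite stand-in; table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (some_column TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("baz",), ("bar",), ("foo",), ("qux",)])

# ORDER BY 1: a bare integer literal is a column position, so rows sort by some_column.
by_position = [r[0] for r in conn.execute("SELECT some_column FROM items ORDER BY 1")]

# ORDER BY <expression>: rows sort by the value the CASE computes for each row.
by_case = [
    r[0]
    for r in conn.execute(
        """
        SELECT some_column FROM items
        ORDER BY CASE
                   WHEN some_column = 'foo' THEN 1
                   WHEN some_column = 'bar' THEN 2
                   ELSE 3
                 END,
                 some_column  -- tie-breaker for rows in the ELSE bucket
        """
    )
]
print(by_position, by_case)
```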
Actually, I have a function with an integer argument that indicates which column to use in the ORDER BY clause.
If all the columns are of the same type, this can work:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If the columns are of different types, you can try casting them all to a common type:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. The varchar version also compares numbers as text, so '10' sorts before '9'. You need a different approach than a function returning a column number.
Maybe dynamic SQL?
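The sketch below shows the column-number CASE workaround and its text-sorting pitfall. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in. The table stock and its rows are made up. The named parameter :col stands in for the hypothetical function_to_get_a_column_number().

```python
import sqlite3

# In-memory SQLite stand-in; table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("apple", 2), ("fig", 10), ("pear", 9)])

QUERY = """
    SELECT name, qty FROM stock
    ORDER BY CASE :col
               WHEN 1 THEN name
               WHEN 2 THEN CAST(qty AS TEXT)  -- the cast-everything-to-text workaround
             END
"""

def names_sorted_by(col: int) -> list[str]:
    # :col plays the role of the hypothetical function_to_get_a_column_number()
    return [row[0] for row in conn.execute(QUERY, {"col": col})]

print(names_sorted_by(1))  # alphabetical by name
print(names_sorted_by(2))  # by qty, but compared as text: '10' < '2' < '9'
```

Sorting by column 2 gives fig, apple, pear, not the numeric order apple, pear, fig. This illustrates why casting everything to varchar is a poor general solution.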
I would say that dynamic SQL is the best solution to the problem I originally had in mind (thanks @kordirko and the others for the hints):
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
At first I thought this could be done with a CASE expression as the argument of the ORDER BY clause (the sort_expression). Here is the tricky part that confused me. When sort_expression is an identifier, meaning an output column name or a bare column number, the corresponding column is used to order the results. But when sort_expression is any other expression, the results are ordered by the value of that expression itself, computed for each row. This is @a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, I was in effect assigning the value 1 to every row and then sorting a list of ones, which is clearly useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.
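Below is a rough client-side analogue of fetch_my_data. It uses Python's sqlite3 module and an in-memory SQLite database rather than PL/pgSQL, and the quote_ident helper is a hand-written approximation of PostgreSQL's quote_ident(), not a library call. Identifiers cannot be bound as query parameters, so the column name is checked against the table's real columns and then quoted before being spliced into the SQL. The whitelist also matters because SQLite can silently treat an unknown double-quoted name as a string literal, which would quietly sort by a constant.

```python
import sqlite3

# In-memory SQLite stand-in for the temp table from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO my_data (val) VALUES (?)", [("a",), ("c",), ("d",), ("b",)])

def quote_ident(name: str) -> str:
    # Hand-written approximation of PostgreSQL's quote_ident(): wrap in double
    # quotes and double any embedded quote so the name stays one identifier.
    return '"' + name.replace('"', '""') + '"'

def fetch_my_data(col: str) -> list:
    # Only accept names that are real columns of my_data (PRAGMA row[1] is the name).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(my_data)")}
    if col not in columns:
        raise ValueError(f"unknown column: {col!r}")
    return conn.execute(f"SELECT * FROM my_data ORDER BY {quote_ident(col)}").fetchall()

print(fetch_my_data("val"))  # ordered by val
print(fetch_my_data("id"))   # ordered by id
```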
My Postgres query calculates statistical aggregates from a set of sensor readings:
SELECT to_char(ipstimestamp, 'YYYYMMDDHH24') As row_name,
to_char(ipstimestamp, 'FMDD mon FMHH24h') As hour_row_name,
varid As category,
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(STDDEV(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As StatisticsValue
FROM loggingdb_ips_integer As log
JOIN ipsobjects_with_parent ips ON log.varid = ips.objectid
AND (ipstimestamp > (now()- '2 days'::interval))
GROUP BY row_name, hour_row_name, category;
This works fine as long as I have >1 ipsvalue/hour. If the hourly COUNT(ipsvalue)<2, however, StatisticsValue returns NULL without any Postgres errors.
If I comment out STDDEV, as in the following:
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As value
then all three stats are calculated correctly. I therefore conclude that an undefined STDDEV (one computed over a single reading) brings down the whole StatisticsValue. I would rather have such STDDEVs return 0. I tried to COALESCE the STDDEV line, to no avail. What can be done?
COALESCE should work, as long as it wraps the STDDEV term itself, before the concatenation. In PostgreSQL, text || NULL yields NULL, so a single NULL STDDEV turns the whole concatenated StatisticsValue into NULL. For example: COALESCE(STDDEV(ipsvalue), 0)::NUMERIC(5,2)::TEXT
You could also use (if it fits your needs) the "population standard deviation" stddev_pop instead of the "sample standard deviation" stddev_samp. The latter divides by n-1, and STDDEV is an alias for it. stddev_pop divides by n and returns zero (instead of NULL) when given a single sample.
If you are unsure about the difference between these estimators, it is explained in any statistics textbook, e.g. http://en.wikipedia.org/wiki/Standard_deviation#Estimation
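The behaviour described above can be reproduced with the sketch below. It uses Python's sqlite3 module and an in-memory SQLite database. SQLite has no built-in standard deviation, so stddev_samp and stddev_pop are registered here as hand-written Python aggregates that mimic the PostgreSQL functions of the same names. The readings table and its rows are made up. SQLite's || also yields NULL when either side is NULL, which shows why one NULL term wipes out a concatenated string.

```python
import math
import sqlite3

class _StdDev:
    """Hand-written stand-in for a SQL standard-deviation aggregate."""
    ddof = 0  # subclasses choose the divisor n - ddof

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # like SQL aggregates, ignore NULL inputs
            self.values.append(float(value))

    def finalize(self):
        n = len(self.values)
        if n - self.ddof <= 0:
            return None  # too few rows: SQL NULL
        mean = sum(self.values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - self.ddof))

class StddevSamp(_StdDev):
    ddof = 1  # sample estimator: divides by n - 1, NULL for a single row

class StddevPop(_StdDev):
    ddof = 0  # population estimator: divides by n, 0 for a single row

conn = sqlite3.connect(":memory:")
conn.create_aggregate("stddev_samp", 1, StddevSamp)
conn.create_aggregate("stddev_pop", 1, StddevPop)
conn.execute("CREATE TABLE readings (hour INTEGER, ipsvalue REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 2.0), (1, 4.0), (2, 5.0)])

rows = conn.execute(
    """
    SELECT hour,
           stddev_samp(ipsvalue),
           stddev_pop(ipsvalue),
           'sd=' || stddev_samp(ipsvalue),               -- NULL whenever the stddev is NULL
           'sd=' || COALESCE(stddev_samp(ipsvalue), 0)   -- COALESCE inside the concatenation
    FROM readings
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()
print(rows)
```

For the hour with a single reading, the sample estimator is NULL and so is the plain concatenation. The population estimator is 0.0, and the COALESCE-wrapped version yields 'sd=0'.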
I found a workaround which is an alternative to COALESCE. In my specific instance, COALESCE is likely to perform better, but the workaround is potentially more flexible.
I have taken advantage of the IIF simulation described by Emanuel Calvo Franco and Hector de los Santos. IIF works much like its counterpart in MS Access. In my case, the IIF function tests the result of STDDEV for NULL and returns "0" if it is. The good thing about IIF is that it can test all sorts of conditions, not only NULL.
I have a table with a varchar column, and I want to find values that match a certain number. Let's say the column contains the following entries (but with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
Say I want all the rows whose value is numerically 123456789012, so I write a statement like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?
SQL Server does NOT guarantee boolean operator short-circuiting; see On SQL Server boolean operator short-circuit. So all solutions using ISNUMERIC(...) AND CAST(...) are fundamentally flawed: they may work, but they can fail arbitrarily later, depending on the generated plan. A better solution is to use CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky about what 'numeric' means, and in many cases where one would expect it to return 0, it returns 1. So combine the CASE with LIKE instead:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that with millions of rows, searching them like this will always mean scanning end to end, because the expression is not SARGable (no matter how we rewrite it). The real issue here is data purity, and it should be addressed at the appropriate level, where the data is populated. Another option is to create a persisted computed column with this expression, plus a filtered index on it that excludes NULLs (i.e. non-numeric values). That would speed things up a little.
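The CASE-guarded cast can be tried with the sketch below. It uses Python's sqlite3 module and an in-memory SQLite database rather than SQL Server. SQLite's LIKE has no [...] character classes, so its GLOB operator with '*[^0-9]*' stands in for LIKE '%[^0-9]%'. One difference worth knowing: SQLite's CAST does not fail on "23 45"; it converts the longest integer prefix. Here the guard protects correctness rather than preventing an error.

```python
import sqlite3

# In-memory SQLite stand-in; MyColumn must be TEXT so '00123456789012' stays a string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyColumn TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?)",
    [("123456789012",), ("2345678",), ("3456",), ("23 45",), ("713?2",), ("00123456789012",)],
)

matches = [
    r[0]
    for r in conn.execute(
        """
        SELECT MyColumn FROM MyTable
        WHERE CASE WHEN MyColumn NOT GLOB '*[^0-9]*'   -- digits only
                   THEN CAST(MyColumn AS INTEGER)
              END = 123456789012                      -- non-digit rows give NULL, never match
        ORDER BY rowid
        """
    )
]
# Without the guard, SQLite quietly converts the integer prefix instead of failing.
lenient = conn.execute("SELECT CAST('23 45' AS INTEGER)").fetchone()[0]
print(matches, lenient)
```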
If you are using SQL Server 2012 or later, you can use two new functions:
TRY_CAST()
TRY_CONVERT()
Both behave the same way: they return the value cast to the specified data type if the cast succeeds, and NULL otherwise. The main difference is that CONVERT is SQL Server specific while CAST is ANSI, so using TRY_CAST makes your code more portable (although I am not sure whether any other database vendor implements TRY_CAST).
ISNUMERIC will accept empty string and values like 1.23 or 5E-04 so could be unreliable.
And you don't know in what order things will be evaluated, so it could still fail. SQL is declarative, not procedural, so the WHERE clause is not guaranteed to be evaluated left to right.
So:
you want to accept only values that consist solely of the characters 0-9
you need to materialise the "number" filter so it's applied before CAST
Something like:
SELECT
*
FROM
(
SELECT TOP 2000000000 *
FROM MyTable
WHERE MyColumn NOT LIKE '%[^0-9]%' --double negative rejects anything except 0-9
ORDER BY MyColumn
) foo
WHERE
CAST(MyColumn as bigint) = 123456789012 --applied after number check
Edit: a quick example that fails:
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring) VALUES ('1.23')
INSERT #foo (bigintstring) VALUES ('1 23')
INSERT #foo (bigintstring) VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123
SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012
The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And with a CASE expression, as Thomas suggested:
SELECT * FROM MyTable
WHERE CASE ISNUMERIC(MyRow)
WHEN 1 THEN CAST(MyRow as bigint)
ELSE NULL
END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx
SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally, you can use a CASE expression to get NULL for values that aren't numeric:
SELECT
CASE
WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
ELSE NULL
END AS 'MyColumnAsBigInt'
FROM tableName
If you need stricter filtering, to reject values that ISNUMERIC accepts but that cannot be cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%', MyColumn) = 0
If you need decimal values instead of integers, cast to float and change the pattern to '%[^0-9.]%'. Note that this is a LIKE/PATINDEX pattern, not a regex, and it still accepts strings such as '1.2.3'.
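A digits-only check is not quite enough on its own: bigint tops out at 9223372036854775807 (19 digits), so a longer all-digit string still cannot be cast. The sketch below is a plain-Python illustration of that rule, for example to clean values before they are loaded, in line with the data-purity point above. The function name try_cast_bigint is hypothetical and only imitates the spirit of TRY_CAST. It does not reproduce SQL Server's exact conversion rules (for instance, it rejects signs and surrounding spaces).

```python
import re

BIGINT_MAX = 2**63 - 1            # upper bound of a signed 64-bit bigint
_DIGITS_ONLY = re.compile(r"[0-9]+")

def try_cast_bigint(text):
    """Hypothetical helper: int(text) if text is ASCII digits that fit a bigint, else None."""
    if text is None or not _DIGITS_ONLY.fullmatch(text):
        return None  # empty, spaces, '?', '.', signs, exponents: all rejected
    value = int(text)
    return value if value <= BIGINT_MAX else None  # digits only, but may still overflow

rows = ["123456789012", "2345678", "3456", "23 45", "713?2", "00123456789012"]
matches = [r for r in rows if try_cast_bigint(r) == 123456789012]
print(matches)
```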
I have a problem with the query below in Postgres:
SELECT u.username,l.description,l.ip,SUBSTRING(l.createdate,0,11) as createdate,l.action
FROM n_logs AS l LEFT JOIN n_users AS u ON u.id = l.userid
WHERE SUBSTRING(l.createdate,0,11) >= '2009-06-07'
AND SUBSTRING(l.createdate,0,11) <= '2009-07-07';
I have always used this query with an older version of Postgres, and it worked perfectly. With the new version of Postgres, however, it gives me the following error:
ERROR: function pg_catalog.substring(timestamp without time zone, integer, integer) does not exist
LINE 1: SELECT u.username,l.description,l.ip,SUBSTRING(l.createdate,...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I assume it has something to do with data types: the column is a timestamp, and substring only supports string types. So my question is: what can I change in my query so that it returns results again?
The direct fix for your problem is to cast the timestamp to a string:
...,SUBSTRING(l.createdate::varchar,...
However, comparing dates through the resulting strings is not good practice at all.
A better solution is to rewrite your query using the dedicated date/time manipulation, comparison and formatting functions, such as extract() and to_char().
You would change your query to use a clause like
l.createdate::DATE >= '2009-06-07'::DATE
AND l.createdate::DATE < '2009-07-08'::DATE;
or one of the alternatives below (which you should really accept instead of this.)
SELECT u.username, l.description, l.ip,
CAST(l.createdate AS DATE) as createdate,
l.action
FROM n_logs AS l
LEFT JOIN
n_users AS u
ON u.id = l.userid
WHERE l.createdate >= '2009-06-07'::TIMESTAMP
AND l.createdate < '2009-07-07'::TIMESTAMP + '1 DAY'::INTERVAL
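The "+ 1 day" with a strict < is what keeps the whole last day in range. The plain-Python sketch below uses datetime objects to mirror the two timestamp comparisons. It does not run any SQL; the closed range is the naive translation createdate <= '2009-07-07'::TIMESTAMP, which compares against midnight and so drops most of July 7. The timestamps are made up.

```python
from datetime import datetime, timedelta

start = datetime(2009, 6, 7)
last_day = datetime(2009, 7, 7)

def in_closed_range(ts):
    # mirrors: createdate >= '2009-06-07'::TIMESTAMP AND createdate <= '2009-07-07'::TIMESTAMP
    return start <= ts <= last_day

def in_half_open_range(ts):
    # mirrors: createdate >= '2009-06-07'::TIMESTAMP
    #      AND createdate <  '2009-07-07'::TIMESTAMP + '1 DAY'::INTERVAL
    return start <= ts < last_day + timedelta(days=1)

afternoon_on_last_day = datetime(2009, 7, 7, 15, 30)
print(in_closed_range(afternoon_on_last_day), in_half_open_range(afternoon_on_last_day))
```

The half-open form also avoids any question about fractional seconds just before midnight, since everything strictly before 2009-07-08 00:00 is included.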
I'm not sure what you want to achieve, but "substring" on date/time types is not really well defined, as it depends on the external (text) format of the data.
In most cases you should use the extract() or to_char() functions.
Generally, for returning data you want to_char(), and for operations on it (including comparison) you want extract(). There are some cases where this rule does not apply, but they are usually signs of a poorly thought-out data structure.
Example:
# select to_char( now(), 'YYYY-MM-DD');
to_char
------------
2009-07-07
(1 row)
For extract(), here is a simple query that lists all objects created at or after 8 pm:
select * from objects where extract(hour from created) >= 20;
A variation on Quassnoi's answer:
SELECT
u.username,
l.description,
l.ip,
CAST(l.createdate AS DATE) as createdate,
l.action
FROM
n_logs AS l
LEFT JOIN
n_users AS u
ON
(u.id = l.userid)
WHERE
l.createdate::DATE BETWEEN '2009-06-07'::DATE AND '2009-07-07'::DATE
If you use PostgreSQL, a select like this:
select('SUBSTRING(offer.date_closed, 0, 11)')
fails with:
function substr(timestamp without time zone, integer, integer) does not exist
Convert the timestamp to text first, for example with CONCAT:
select('SUBSTRING(CONCAT(offer.date_closed, \'\'), 0, 11)')