PostgreSQL doesn't update/copy boolean from subselect? - postgresql

Here's a short code sample that behaves unexpectedly for me in PostgreSQL v9.5:
create table test (a boolean, b boolean);
insert into test (a) values ('true'),('true'),('true'),('true'),('true');
update test set b = rand.val from (select random() > 0.5 as val from test) as rand;
I expect column b to take on random true/false values, but for some reason it's always false. If I run the subselect by itself:
select random() > 0.5 as val from test;
It returns random true/false combinations, just as I want. But once I try to update the actual table, it fails. I've tried several casting combinations, but nothing seems to help.
What am I missing here?

In your version, the subselect is joined to test with no join condition, so each updated row takes its value from one arbitrary row of the subselect, and every row can end up with the same value. Since random() can be evaluated directly in the SET clause, how about:
update test set b = (random() > 0.5);
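For a sanity check of the per-row behavior, count the values after the update (run against the test table from the question; the split will vary from run to run):
select b, count(*) from test group by b;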

Related

Postgresql update column of numeric type with NULL value fails if all values of this column are NULL

I have a database table like this:
idx[PK] | a[numeric] | b[numeric]
--------+------------+-----------
1       | 1          | 1
2       | 2          | 2
3       | 3          | 3
4       | 4          | 4
...     | ...        | ...
In pgAdmin 4, I tried to update this table with some NULL values, and I noticed the following queries failed:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,NULL),(2,NULL,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx;

UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,NULL,2),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx;
The error message is like this:
ERROR: column "a" is of type numeric but expression is of type text
LINE 2: a = e.a,b = e.b
^
HINT: You will need to rewrite or cast the expression.
SQL state: 42804
Character: 43
However, this will be successful:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,2,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
It seems like if the new values for one of the columns I am updating are all NULL, then the query fails. However, as long as at least one value in that column is a non-NULL number, the query succeeds. Why is this?
I did simplify my real-world case here, as my actual table has millions of rows and more than 10 columns. Using Python and psycopg2, when I tried to update 50,000 rows in one query, the previous error could still show up even though some values in the column were NOT NULL. I guess that is because the system scans a certain number of rows to decide whether the type is correct, instead of all 50,000 rows.
Therefore, how to avoid this failure in my real world situation? Is there a better query to use instead of UPDATE?
Thank you very much!
UPDATE
Per comments from @Marth and @Gordon Linoff, and as I am using psycopg2, I did the following in my code:
from psycopg2.extras import execute_values
sql = """UPDATE test as t SET
a = (e.a::numeric),
b = (e.b::numeric)
FROM (VALUES %s)
AS e(idx, a, b)
WHERE t.idx = e.idx"""
execute_values(cursor, sql, data)
cursor is from the database connection. data is a list of tuples in the form (idx, a, b) of my values.
This is due to the default behavior of how NULL works in these situations. A bare NULL is of unknown type, and it is then treated as whatever type is necessary.
In a VALUES list, Postgres tries to decipher the types, treating the individual records as it would with a UNION. But if all the values are NULL, there is no information to go on, and Postgres falls back on text as the universal default.
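You can see this type resolution directly with pg_typeof (column1 is the default column name Postgres assigns in a bare VALUES list):
SELECT pg_typeof(column1) FROM (VALUES (NULL), (NULL)) AS v; -- text
SELECT pg_typeof(column1) FROM (VALUES (NULL), (1)) AS v;    -- integer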
It is also important to understand that this fails with the same error:
UPDATE test t
SET a = ''
WHERE t.idx = 1;
The issue is that Postgres does not convert empty strings to numbers (unlike some other databases).
In any case, this is easily fixed by casting the NULL to an appropriate type:
UPDATE test t
SET a = e.a,b = e.b
FROM (VALUES (1, NULL::numeric, NULL::numeric),
(2, NULL, NULL),
(3, NULL, NULL)
) e(idx, a, b)
WHERE t.idx = e.idx ;
You can be explicit for all occurrences of NULL, but that is not necessary.
Here is a db<>fiddle that illustrates some of this.

Postgres WHERE clause with infinity

I have the following WHERE clause on a table tablex, with column1 of type integer.
The SELECT is called in a function with parameters (a integer, b integer)
Call the function:
SELECT * FROM function(0, 10)
Query in the function:
SELECT * FROM tablex x WHERE x.column1 between a and b
Now I miss the results where column1 is NULL, but in this case it is important to get them (it depends on who calls the function). What should parameter "a" be to also get the NULL values? Or is there a way to disable the WHERE clause depending on the parameters that are passed in?
I found a nice solution for this (coalesce treats NULL as 0, which works here because the caller can pass a = 0):
SELECT * FROM tablex x WHERE coalesce(x.column1, 0) between a and b
Add logic to admit null column1 values:
SELECT *
FROM tablex x
WHERE x.column1 BETWEEN a AND b OR x.column1 IS NULL;
To make the NULLs behave as actual values, invert the logic. A plain NOT is not enough here, because a comparison against a NULL bound still yields NULL, which the WHERE clause treats as false; IS NOT TRUE maps that NULL to true instead
[and add a comment to the code, because the negation might confuse future readers/maintainers]:
SELECT * FROM tablex x WHERE (x.column1 < a) IS NOT TRUE AND (x.column1 > b) IS NOT TRUE;
Note the above treats NULLs in a and b as -inf and +inf, respectively, and also keeps the rows where column1 itself is NULL.
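A minimal illustration of why IS NOT TRUE is needed (standalone expressions, runnable as-is):
SELECT NOT (5 < NULL);          -- null, so a WHERE clause would filter the row out
SELECT (5 < NULL) IS NOT TRUE;  -- true, so a NULL bound lets every row through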

PostgreSQL - Create randomized percentages?

For testing purposes I need to create a JSONB object with n random percentages, so that they sum up to 1.0.
Obviously, the very simple
SELECT jsonb_object(array_agg(ARRAY['prefix_' || n::TEXT, round(random()::NUMERIC, 2)::TEXT])) AS e
FROM generate_series(1,8) AS n
;
will produce numbers that won't sum up to 1.0.
I expect the solution to be much more complicated, and I can't seem to wrap my head around it.
How to do this in PostgreSQL 10?
You just have to normalize the values.
You could base your generator on this query:
WITH r AS (
SELECT random()
FROM generate_series(1, 8)
)
SELECT random/sum(random) OVER ()
FROM r;
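Putting that together with the JSONB construction from the question, a sketch might look like this (jsonb_object_agg builds the object from key/value pairs; note that if you round each share, as the original query does, the rounded values may no longer sum to exactly 1.0):
WITH r AS (
    SELECT n, random() AS val
    FROM generate_series(1, 8) AS n
),
norm AS (
    SELECT n, val / sum(val) OVER () AS share
    FROM r
)
SELECT jsonb_object_agg('prefix_' || n, share) AS e
FROM norm;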

SQL invalid conversion return null instead of throwing error

I have a table with a varchar column, and I want to find values that match a certain number. So let's say that column contains the following entries (except with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
So I decide I want all the rows which are numerically 123456789012, and I write a statement that looks something like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?
SQL Server does NOT guarantee boolean operator short-circuiting; see On SQL Server boolean operator short-circuit. So all solutions using ISNUMERIC(...) AND CAST(...) are fundamentally flawed (they may work, but they can arbitrarily fail later depending on the generated plan). A better solution is using CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky in identifying what 'numeric' means, and in many cases where one would expect it to return 0 it returns 1. So mix the CASE with LIKE:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that if you have millions of rows and you have to search them like this, you'll always end up scanning end-to-end, since the expression is not SARG-able (no matter how we rewrite it). The real issue here is data purity, and it should be addressed at the appropriate level, where the data is populated. Another thing to consider is whether it is possible to create a persisted computed column with this expression and create a filtered index on it which eliminates NULLs (i.e. non-numeric values). That would speed things up a little.
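A minimal sketch of that computed-column idea, assuming MyColumn is varchar (the column and index names here are made up for illustration):
-- persisted computed column: the numeric interpretation, or NULL for non-numeric strings
ALTER TABLE MyTable ADD MyColumnAsBigint AS
    (CASE WHEN MyColumn NOT LIKE '%[^0-9]%' THEN CAST(MyColumn AS bigint) END) PERSISTED;
CREATE INDEX IX_MyTable_MyColumnAsBigint ON MyTable (MyColumnAsBigint);
-- the search can now use the index instead of scanning and casting every row
SELECT * FROM MyTable WHERE MyColumnAsBigint = 123456789012;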
If you are using SQL Server 2012, you can use the two new functions:
TRY_CAST()
TRY_CONVERT()
Both behave the same way: they return the value cast to the specified data type if the cast succeeds, and NULL otherwise. The only difference is that CONVERT is SQL Server specific while CAST is ANSI; using the CAST form will make your code look more portable (although I'm not sure whether any other database provider implements TRY_CAST).
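For this question, that would be something like (a sketch against the question's table):
SELECT * FROM MyTable
WHERE TRY_CAST(MyColumn AS bigint) = 123456789012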
ISNUMERIC will accept values like 1.23 or 5E-04 (and even lone signs or currency symbols), so it can be unreliable here.
And you don't know what order things will be evaluated in, so it could still fail (SQL is declarative, not procedural, so the WHERE clause probably won't be evaluated left-to-right).
So:
you want to accept values that consist only of the characters 0-9
you need to materialise the "number" filter so it's applied before CAST
Something like:
SELECT
*
FROM
(
SELECT TOP 2000000000 *
FROM MyTable
WHERE MyColumn NOT LIKE '%[^0-9]%' --double negative rejects anything except 0-9
ORDER BY MyColumn
) foo
WHERE
CAST(MyColumn as bigint) = 123456789012 --applied after number check
Edit: quick example that fails.
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring )VALUES ('1.23')
INSERT #foo (bigintstring )VALUES ('1 23')
INSERT #foo (bigintstring )VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123
-- can fail: ISNUMERIC('1.23') = 1, but '1.23' cannot be CAST to bigint
SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012
The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And to add a case statement like Thomas suggested:
SELECT * FROM MyTable
WHERE CASE ISNUMERIC(MyRow)
WHEN 1 THEN CAST(MyRow as bigint)
ELSE NULL
END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx
SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally you can use a CASE statement in order to get null values.
SELECT
CASE
WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
ELSE NULL
END AS 'MyColumnAsBigInt'
FROM tableName
If you require additional filtering, for values which are not valid to cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%', MyColumn) = 0
If you need decimal values instead of integers, cast to float instead and change the pattern to '%[^0-9.]%'

How to get min/max of two integers in Postgres/SQL?

How do I find the maximum (or minimum) of two integers in Postgres/SQL? One of the integers is not a column value.
I will give an example scenario:
I would like to subtract an integer from a column (in all rows), but the result should not be less than zero. So, to begin with, I have:
UPDATE my_table
SET my_column = my_column - 10;
But this can make some of the values negative. What I would like (in pseudo code) is:
UPDATE my_table
SET my_column = MAXIMUM(my_column - 10, 0);
Have a look at GREATEST and LEAST.
UPDATE my_table
SET my_column = GREATEST(my_column - 10, 0);
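GREATEST and LEAST can also be combined to clamp a value to a range; for instance, a hypothetical variant that additionally caps my_column at 100:
UPDATE my_table
SET my_column = LEAST(GREATEST(my_column - 10, 0), 100);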
You want the inline SQL CASE expression:
set my_column = case when my_column - 10 > 0 then my_column - 10 else 0 end
max() is an aggregate function; it computes the maximum of an expression across the rows of a result set.
Edit: oops, didn't know about greatest and least in postgres. Use that instead.