PostgreSQL: updating a numeric column with NULL values fails if all values for that column are NULL

I have a database table like this:
idx[PK]  a[numeric]  b[numeric]
1        1           1
2        2           2
3        3           3
4        4           4
...      ...         ...
In pgadmin4, I tried to update this table with some null values, and I noticed the following queries failed:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,NULL),(2,NULL,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx

UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,NULL,2),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
The error message is like this:
ERROR: column "a" is of type numeric but expression is of type text
LINE 2: a = e.a,b = e.b
^
HINT: You will need to rewrite or cast the expression.
SQL state: 42804
Character: 43
However, this will be successful:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,2,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
It seems like if the new values for one of the columns I am updating are all NULL, then the query fails. However, as long as at least one of the new values for that column is a non-NULL number, the query succeeds. Why is this?
I simplified my real-world case here; my actual table has millions of rows and more than 10 columns. Using Python and psycopg2, when I tried to update 50,000 rows in one query, the error above could still show up even though some values in a column were NOT NULL. I guess that is because the system scans only a certain number of rows to decide on the type instead of all 50,000 rows.
So, how can I avoid this failure in my real-world situation? Is there a better query to use instead of UPDATE?
Thank you very much!
UPDATE:
Per the comments from @Marth and @Gordon Linoff, and as I am using psycopg2, I did the following in my code:
from psycopg2.extras import execute_values
sql = """UPDATE test as t SET
a = (e.a::numeric),
b = (e.b::numeric)
FROM (VALUES %s)
AS e(idx, a, b)
WHERE t.idx = e.idx"""
execute_values(cursor, sql, data)
cursor is from the database connection. data is a list of tuples in the form (idx, a, b) of my values.

This is due to the default behavior of how NULL works in these situations. A bare NULL has an unknown type, which is then resolved to whatever type the context requires.
In a values() statement, Postgres tries to decipher the types. It treats the individual records as it would with a union. But if all are NULL . . . well, then there is no information. And Postgres decides on using text as the universal default.
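A quick way to see this type resolution (a sketch added for illustration, assuming a reasonably recent PostgreSQL where untyped columns default to text):
-- With no non-NULL value to infer from, the VALUES column resolves to text:
SELECT pg_typeof(a) FROM (VALUES (NULL), (NULL)) AS v(a);   -- text
-- A single typed value is enough to pin the column down:
SELECT pg_typeof(a) FROM (VALUES (NULL), (1.5)) AS v(a);    -- numeric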
It is also important to understand that this fails with the same error:
UPDATE test t
SET a = ''
WHERE t.idx = 1;
The issue is that Postgres does not convert empty strings to numbers (unlike some other databases).
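A one-line illustration of that point (my addition):
SELECT ''::numeric;   -- fails: invalid input syntax for type numeric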
In any case, this is easily fixed by casting the NULL to an appropriate type:
UPDATE test t
SET a = e.a,b = e.b
FROM (VALUES (1, NULL::numeric, NULL::numeric),
(2, NULL, NULL),
(3, NULL, NULL)
) e(idx, a, b)
WHERE t.idx = e.idx ;
You can be explicit for all occurrences of NULL, but that is not necessary.
Here is a db<>fiddle that illustrates some of this.

Related

How to retrieve CHAR and VARCHAR field definitions depending on Firebird database encoding

In Firebird (4.0), I get the field length with the following query:
SELECT T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, RF.RDB$NULL_FLAG, F.RDB$FIELD_TYPE, F.RDB$FIELD_LENGTH
FROM RDB$RELATIONS T, RDB$RELATION_FIELDS RF, RDB$FIELDS F
WHERE T.RDB$VIEW_BLR IS NULL AND (T.RDB$SYSTEM_FLAG IS NULL OR T.RDB$SYSTEM_FLAG = 0)
AND (RF.RDB$SYSTEM_FLAG IS NULL OR RF.RDB$SYSTEM_FLAG = 0)
AND (T.RDB$RELATION_NAME = RF.RDB$RELATION_NAME)
AND (RF.RDB$FIELD_SOURCE = F.RDB$FIELD_NAME)
ORDER BY T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME
F.RDB$FIELD_LENGTH is the allocated size in bytes, i.e. if I have a VARCHAR(256) with UTF8 encoding, then the corresponding F.RDB$FIELD_LENGTH value is 1024, but if it uses the default NONE encoding, the allocated value would be 256.
Is there a way to find the actual value X of VARCHAR(X) directly within the query itself, depending on the database encoding?
In my case, I only have two possible encodings, NONE or UTF8, and I can check which one is in use with the following query:
SELECT A.RDB$CHARACTER_SET_NAME containing 'UTF8' FROM RDB$DATABASE A;
If it returns true, the field lengths should be divided by 4; otherwise the value can be used as-is.
Should I build a query based on an IF statement on the above queries? If so, what would it be? Or is there a better solution?
There are two ways:
Use RDB$FIELDS.RDB$CHARACTER_LENGTH, which contains the length in characters
Join RDB$CHARACTER_SETS and use RDB$CHARACTER_SETS.RDB$BYTES_PER_CHARACTER to calculate based on RDB$FIELDS.RDB$FIELD_LENGTH
An example including both:
SELECT T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, RF.RDB$NULL_FLAG, F.RDB$FIELD_TYPE,
F.RDB$FIELD_LENGTH,
F.RDB$CHARACTER_LENGTH,
F.RDB$FIELD_LENGTH / RCS.RDB$BYTES_PER_CHARACTER as calculated
FROM RDB$RELATIONS T
inner join RDB$RELATION_FIELDS RF
on RF.RDB$RELATION_NAME = T.RDB$RELATION_NAME
inner join RDB$FIELDS F
on F.RDB$FIELD_NAME = RF.RDB$FIELD_SOURCE
left join RDB$CHARACTER_SETS RCS
on RCS.RDB$CHARACTER_SET_ID = F.RDB$CHARACTER_SET_ID
WHERE T.RDB$VIEW_BLR IS NULL
AND (T.RDB$SYSTEM_FLAG IS NULL OR T.RDB$SYSTEM_FLAG = 0)
AND (RF.RDB$SYSTEM_FLAG IS NULL OR RF.RDB$SYSTEM_FLAG = 0)
ORDER BY T.RDB$RELATION_NAME, RF.RDB$FIELD_NAME
You should not rely on the default character set of the database for this, because that only tells you the character set of newly created columns without an explicit character set. Each column has a character set, which was either specified explicitly or derived from the default character set at the time the column was created.
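To see this per column, a small sketch (my addition, using the same system tables joined above) that pulls each column's own character set name and byte width:
SELECT RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME,
       RCS.RDB$CHARACTER_SET_NAME,       -- the column's own character set
       RCS.RDB$BYTES_PER_CHARACTER
FROM RDB$RELATION_FIELDS RF
INNER JOIN RDB$FIELDS F ON F.RDB$FIELD_NAME = RF.RDB$FIELD_SOURCE
LEFT JOIN RDB$CHARACTER_SETS RCS ON RCS.RDB$CHARACTER_SET_ID = F.RDB$CHARACTER_SET_ID
WHERE RF.RDB$SYSTEM_FLAG IS NULL OR RF.RDB$SYSTEM_FLAG = 0
ORDER BY RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME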

How to update multiple rows by keeping some column values the same and updating others?

I am trying to bulk update some rows in Postgres. Not all of the rows need to update the same column values: for example, row 1 needs to update columns 1 and 3, whereas row 2 needs to update columns 2 and 4. So row 1's columns 2 and 4 should not change, and row 2's columns 1 and 3 should not change.
I have tried using CASEs to conditionally SET the correct column values but it doesn't work with multiple rows. It DOES work if I only try to update 1 row at a time.
update topic as tmp set
"from" = (CASE WHEN tmp2."from2"::text = 'OLD_VALUE' THEN tmp."from"::int2 ELSE tmp2."from2"::int2 end),
"text_search" = (CASE WHEN tmp2."text_search2"::text = 'OLD_VALUE' THEN tmp."text_search"::text ELSE tmp2."text_search2"::text end),
"weight" = (CASE WHEN tmp2."weight2"::text = 'OLD_VALUE' THEN tmp."weight"::numeric ELSE tmp2."weight2"::numeric end)
from (values
(1051,1,'Electronic Devices',3),
(1052,'OLD_VALUE','OLD_VALUE',100)
) as tmp2("id2","from2","text_search2","weight2")
where tmp2."id2" = tmp."id"
This is the error message I get:
SQL Error [22P02]: ERROR: invalid input syntax for type integer: "OLD_VALUE"
When I try with only one row of FROM values
from (values (1051,1,'Electronic Devices',3))
or
from (values (1052,'OLD_VALUE','OLD_VALUE',100))
it works correctly.
It even works correctly if the same columns need to be updated, e.g.
from (values
(1051,1,'Electronic Devices',3),
(1052,2,'Topic 2',100)
)
Why is it not working correctly when I need to update different columns for each row?
When you provide the list of values as values (1051,1,'Electronic Devices',3),(1052,'OLD_VALUE','OLD_VALUE',100), the first set of values is interpreted as the "template" for the data types; in this case (1051,1,'Electronic Devices',3) gives int, int, text, int. Any subsequent values provided are then expected to have the same data type signature. When it parses (1052,'OLD_VALUE','OLD_VALUE',100), it sees int, text, text, int, which doesn't match the expected signature, so it reports an error.
When you omit the first value and provide only (1052,'OLD_VALUE','OLD_VALUE',100), then it identifies int,text,text,int as the "template" data type signature, and it proceeds without complaint.
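One way around the mismatch (my own sketch, reusing the question's table and column names, not part of the original answer) is to make every row of the VALUES list type-consistent by passing the variable columns as text and casting back inside the CASE expressions:
update topic as tmp set
  -- variable columns come in as text; cast back only when a real value is supplied
  "from"        = (CASE WHEN tmp2."from2" = 'OLD_VALUE' THEN tmp."from" ELSE tmp2."from2"::int2 END),
  "text_search" = (CASE WHEN tmp2."text_search2" = 'OLD_VALUE' THEN tmp."text_search" ELSE tmp2."text_search2" END),
  "weight"      = (CASE WHEN tmp2."weight2" = 'OLD_VALUE' THEN tmp."weight" ELSE tmp2."weight2"::numeric END)
from (values
  -- every row now has the same signature: (int, text, text, text)
  (1051, '1', 'Electronic Devices', '3'),
  (1052, 'OLD_VALUE', 'OLD_VALUE', '100')
) as tmp2("id2", "from2", "text_search2", "weight2")
where tmp2."id2" = tmp."id";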

Update with ISNULL and operation

The original query looks like this:
UPDATE reponse_question_finale t1, reponse_question_finale t2 SET
t1.nb_question_repondu = (9-(ISNULL(t1.valeur_question_4)+ISNULL(t1.valeur_question_6)+ISNULL(t1.valeur_question_7)+ISNULL(t1.valeur_question_9)))
WHERE t1.APPLICATION = t2.APPLICATION;
I know you cannot update two tables in a single query, so I tried this:
UPDATE reponse_question_finale t1
SET nb_question_repondu = (9-(COALESCE(t1.valeur_question_4,'')::int+COALESCE(t1.valeur_question_6,'')::int+COALESCE(t1.valeur_question_7)::int+COALESCE(t1.valeur_question_9,'')::int))
WHERE t1.APPLICATION = t1.APPLICATION;
But this query gave me an error: invalid input syntax for integer: ""
I saw that the Postgres equivalent of MySQL's ISNULL() is COALESCE(), so I think I'm on the right track here.
I also know you cannot add varchar to varchar, so I tried to cast the values to integer. I'm not sure if I cast them correctly, with the parentheses in the right places, and given the error, maybe I cannot cast to int with COALESCE.
Lastly, I could probably do a correlated sub-select to update my two tables, but I'm a little lost at this point.
The output must be an integer matching the number of questions answered in a backup survey.
Any thoughts?
Thanks.
coalesce() returns the first non-null value from the list supplied. So, if the column value is null, the expression COALESCE(t1.valeur_question_4,'') returns an empty string, and casting that empty string to an integer is what raises the error.
But it seems you want something completely different: you want to check whether the column is null (or empty), and subtract from a total if it is, in order to count the number of non-null columns.
To return 1 if a value is null (or empty) and 0 if it isn't, you can use:
(nullif(valeur_question_4, '') is null)::int
nullif returns null if the first value equals the second. The IS NULL condition returns a boolean (something that MySQL doesn't have) and that can be cast to an integer (where false will be cast to 0 and true to 1)
So the whole expression should be:
nb_question_repondu = 9 - (
(nullif(t1.valeur_question_4,'') is null)::int
+ (nullif(t1.valeur_question_6,'') is null)::int
+ (nullif(t1.valeur_question_7,'') is null)::int
+ (nullif(t1.valeur_question_9,'') is null)::int
)
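A tiny standalone check of that building block (my addition):
-- (nullif(x, '') IS NULL)::int is 1 for NULL or empty strings, 0 otherwise
SELECT (nullif('',   '') IS NULL)::int,   -- 1: empty counts as unanswered
       (nullif('42', '') IS NULL)::int;   -- 0: a real value counts as answered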
Another option is to unpivot the columns and do a select on them in a sub-select:
update reponse_question_finale
set nb_question_repondu = (select count(*)
from (
values
(valeur_question_4),
(valeur_question_6),
(valeur_question_7),
(valeur_question_9)
) as t(q)
where nullif(trim(q),'') is not null);
Adding more columns to be considered is quite easy then, as you just need to add a single line to the values() clause.

Updating a table in T-SQL with multiple conditions

I have a table MyTable with columns a, b and c, which are ints. I want to update all the 'a' values based on the values of b and c.
Update MyTable set a = 2 where b = 1 and c = 1
It's far too late, and I cannot for the life of me see why this statement doesn't work. Am I missing something silly?
Edit: whoops, forgot the error.
"Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression."
Edit2: That was the exact query I was using (different column names). Turns out there was a trigger on the table which was broken. I feel a little silly now, but thanks for the help anyway :)
There's nothing wrong with the statement you posted. The error is elsewhere.
Could you have posted the wrong query? Or perhaps you over-simplified it? A subquery looks something like this:
UPDATE MyTable
SET a = 2
WHERE b = 1 AND c = (SELECT c FROM MyTable2 WHERE id = 5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <--- subquery
An invalid query that could give the error message you get could look like this:
UPDATE MyTable
SET a = 2
WHERE b = 1 AND c = (SELECT c, d FROM MyTable2 WHERE id = 5)
The second query is invalid because it returns two values but the = operator only allows comparison to a single value.
The solution is to ensure that all subqueries used in equality comparisons only return a single row consisting of a single column.
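For completeness, a sketch (my addition, using the hypothetical MyTable2 from the examples above) of two common ways to force a single value:
-- Either limit the subquery to one row...
UPDATE MyTable
SET a = 2
WHERE b = 1 AND c = (SELECT TOP 1 c FROM MyTable2 WHERE id = 5 ORDER BY c);

-- ...or collapse it with an aggregate:
UPDATE MyTable
SET a = 2
WHERE b = 1 AND c = (SELECT MAX(c) FROM MyTable2 WHERE id = 5);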

SQL invalid conversion return null instead of throwing error

I have a table with a varchar column, and I want to find values that match a certain number. So let's say that column contains the following entries (except with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
So I decide I want all the rows which are numerically 123456789012 and write a statement that looks something like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?
SQL Server does NOT guarantee boolean operator short-circuit, see On SQL Server boolean operator short-circuit. So all solutions using ISNUMERIC(...) AND CAST(...) are fundamentally flawed (they may work, but they can arbitrarily fail later depending on the generated plan). A better solution is using CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky in identifying what 'numeric' means, and in many cases where one would expect it to return 0, it returns 1. So mixing the CASE with the LIKE:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that if you have millions of rows and you have to search them like this, you'll always end up scanning end-to-end since the expression is not SARG-able (no matter how we rewrite it). The real issue here is data purity, and it should be addressed at the appropriate level, where the data is populated. Another thing to consider is whether it is possible to create a persisted computed column with this expression and create a filtered index on it which eliminates NULL (i.e. non-numeric) values. That would speed things up a little.
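A rough sketch of that computed-column idea (my addition; the column and index names are made up, and it settles for a plain index on the persisted column rather than committing to a filtered one):
-- Persisted computed column: NULL for anything that is not purely digits
-- (note: digit strings too long for bigint would still error; guard with LEN() if that can happen)
ALTER TABLE MyTable ADD MyColumnAsBigint AS
    CASE WHEN MyColumn NOT LIKE '%[^0-9]%' THEN CAST(MyColumn AS bigint) END
    PERSISTED;

-- Index it so equality searches can seek instead of scanning the table
CREATE INDEX IX_MyTable_MyColumnAsBigint ON MyTable (MyColumnAsBigint);

SELECT * FROM MyTable WHERE MyColumnAsBigint = 123456789012;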
If you are using SQL Server 2012 or later, you can use the two new functions:
TRY_CAST()
TRY_CONVERT()
Both functions are equivalent: they return the value cast to the specified data type if the cast succeeds; otherwise, they return NULL. The only difference is that CONVERT is SQL Server specific while CAST is ANSI; using CAST will make your code more portable (although I am not sure whether any other database provider implements TRY_CAST).
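Applied to the question's table (my addition):
-- Rows that cannot be cast produce NULL; NULL = 123456789012 is not true, so they are filtered out
SELECT *
FROM MyTable
WHERE TRY_CAST(MyColumn AS bigint) = 123456789012;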
ISNUMERIC will accept an empty string and values like 1.23 or 5E-04, so it can be unreliable.
And you don't know what order things will be evaluated in, so it could still fail (SQL is declarative, not procedural, so the WHERE clause probably won't be evaluated left to right).
So:
you want to accept values that consist only of the characters 0-9
you need to materialise the "number" filter so it's applied before CAST
Something like:
SELECT
*
FROM
(
SELECT TOP 2000000000 *
FROM MyTable
WHERE MyColumn NOT LIKE '%[^0-9]%' --double negative rejects anything except 0-9
ORDER BY MyColumn
) foo
WHERE
CAST(MyColumn as bigint) = 123456789012 --applied after number check
Edit: quick example that fails.
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring )VALUES ('1.23')
INSERT #foo (bigintstring )VALUES ('1 23')
INSERT #foo (bigintstring )VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123
SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012
The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And to add a case statement like Thomas suggested:
SELECT * FROM MyTable
WHERE CASE ISNUMERIC(MyRow)
WHEN 1 THEN CAST(MyRow as bigint)
ELSE NULL
END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx
SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally you can use a CASE statement in order to get null values.
SELECT
CASE
WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
ELSE NULL
END AS 'MyColumnAsBigInt'
FROM tableName
If you require additional filtering for numerics which are not valid to be cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%', MyColumn) = 0
If you need decimal values instead of integers, cast to float instead and change the pattern to '%[^0-9.]%'.
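Put together (my addition), the PATINDEX variant of the query would look something like this, wrapped in CASE so it does not rely on evaluation order:
SELECT *
FROM MyTable
WHERE CASE WHEN PATINDEX('%[^0-9]%', MyColumn) = 0
           THEN CAST(MyColumn AS bigint)   -- only attempted for all-digit strings
      END = 123456789012;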