How to update multiple rows by keeping some column values the same and updating others? - postgresql

I am trying to bulk update some rows in postgres. Now not all of the rows need to update the same column values. For example, row 1 needs to update column 1 and 3 whereas row 2 needs to update column 2 and 4. so row 1 column 2 and 4 should not change and row 2 column 1 and 3 should not change.
I have tried using CASEs to conditionally SET the correct column values but it doesn't work with multiple rows. It DOES work if I only try to update 1 row at a time.
update topic as tmp set
"from" = (CASE WHEN tmp2."from2"::text = 'OLD_VALUE' THEN tmp."from"::int2 ELSE tmp2."from2"::int2 end),
"text_search" = (CASE WHEN tmp2."text_search2"::text = 'OLD_VALUE' THEN tmp."text_search"::text ELSE tmp2."text_search2"::text end),
"weight" = (CASE WHEN tmp2."weight2"::text = 'OLD_VALUE' THEN tmp."weight"::numeric ELSE tmp2."weight2"::numeric end)
from (values
(1051,1,'Electronic Devices',3),
(1052,'OLD_VALUE','OLD_VALUE',100)
) as tmp2("id2","from2","text_search2","weight2")
where tmp2."id2" = tmp."id"
This is the error message i get
SQL Error [22P02]: ERROR: invalid input syntax for type integer: "OLD_VALUE"
When I try with only 1 FROM value
from (values (1051,1,'Electronic Devices',3))
or
from (values (1052,'OLD_VALUE','OLD_VALUE',100))
it works correctly.
It even works correctly if the same columns need to be updated eg.
from (values
(1051,1,'Electronic Devices',3),
(1052,2,'Topic 2',100)
)
Why is it not working correctly when I need to update different columns for each row?

When you provide the list of values as values (1051,1,'Electronic Devices',3),(1052,'OLD_VALUE','OLD_VALUE',100)), the first set of values is interpreted as the "template" of data types, and in this case (1051,1,'Electronic Devices',3), it's int, int, text, int. Then, any subsequent values provided, will be expected to have the same data type signature. When it parses (1052,'OLD_VALUE','OLD_VALUE',100), it sees int,text,text,int, which doesn't match the data type signature it expects, so it reports an error.
When you omit the first value and provide only (1052,'OLD_VALUE','OLD_VALUE',100), then it identifies int,text,text,int as the "template" data type signature, and it proceeds without complaint.

Related

Divide a value in JSON using postgreSQL

Im relatively new and would like to redenominate some values in my current database. This means going into my jasonb column in my database, selecting a key value and dividing it by a 1000. I know how to select values but update after I have performed a calculation has failed me. My table name is property_calculation and has two columns as follows: * dynamic_fields is my jasonb column
ID
dynamic_fields
1
{"totalBaseValue": 4198571.230720645844841865113039602874778211116790771484375,"surfaceAreaValue": 18.108285497586717127660449477843940258026123046875,"assessedAnnualValue": 1801819.534798908603936834409607936611000776616631213755681528709828853607177734375}
2
{"totalBaseValue": 7406547.28939837918763178237213651300407946109771728515625,"surfaceAreaValue": 31.94416993248973568597648409195244312286376953125,"assessedAnnualValue": 9121964.022681592442116216621222042691512210677018401838722638785839080810546875}
I would like to update the dynamic_fields.totalBaseValue by dividing it by 1000 and committing it back as the new value. I have tried the following with no success:
update property_calculation
set dynamic_fields = (
select jsonb_agg(case
when jsonb_typeof(elem -> 'totalBaseValue') = 'number'
then jsonb_set(elem, array['totalBaseValue'], to_jsonb((elem ->> 'totalBaseValue')::numeric / 1000))
else elem
end)
from jsonb_array_elements(dynamic_fields::jsonb) elem)::json;
I get the following error:
ERROR: cannot extract elements from an object
SQL state: 22023
My json column has no zero string or null values.
Move the jsonb_typeof() check into the where clause:
update property_calculation
set dynamic_fields =
jsonb_set(
dynamic_fields,
'{totalBaseValue}',
to_jsonb((dynamic_fields->>'totalBaseValue')::numeric / 1000)
)
where jsonb_typeof(dynamic_fields->'totalBaseValue') = 'number';
db<>fiddle here

Postgresql update column of numeric type with NULL value fails if all value of this column is NULL

I have a database table like this:
idx[PK]
a[numeric]
b[numeric]
1
1
1
2
2
2
3
3
3
4
4
4
...
...
...
In pgadmin4, I tried to update this table with some null values, and I noticed the following queries failed:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,NULL),(2,NULL,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,NULL,2),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
The error message is like this:
ERROR: column "a" is of type numeric but expression is of type text
LINE 2: a = e.a,b = e.b
^
HINT: You will need to rewrite or cast the expression.
SQL state: 42804
Character: 43
However, this will be successful:
UPDATE test as t SET
a = e.a,b = e.b
FROM (VALUES (1,NULL,1),(2,2,NULL),(3,NULL,NULL))
AS e(idx, a, b)
WHERE t.idx = e.idx
It seems like if the new values for one of the columns I am updating are all NULL, then the query fails. However, as long as there is at least one value is numeric but NOT NULL, the query would be successful. Why is this?
I did simplify my real world case here as my actual table has millions of rows and more than 10 columns. Using Python and psycopg2, when I tried to update 50,000 rows in one query, even though there is a value in a column is NOT NULL, the previous error could still show up. I guess that is because the system scans a certain number of rows to decide if the type is correct or not instead of all 50,000 rows.
Therefore, how to avoid this failure in my real world situation? Is there a better query to use instead of UPDATE?
Thank you very much!
UPDATE
Per comments from #Marth and #Gordon Linoff, and as I am using psycopg2, I did the following in my code:
from psycopg2.extras import execute_values
sql = """UPDATE test as t SET
a = (e.a::numeric),
b = (e.b::numeric)
FROM (VALUES %s)
AS e(idx, a, b)
WHERE t.idx = e.idx"""
execute_values(cursor, sql, data)
cursor is from the database connection. data is a list of tuples in the form (idx, a, b) of my values.
This is due to the default behavior of how NULL works in these situations. NULL is generally an unknown type, which is then treated as whatever type is necessary.
In a values() statement, Postgres tries to decipher the types. It treats the individual records as it would with a union. But if all are NULL . . . well, then there is no information. And Postgres decides on using text as the universal default.
It is also important to understand that this fails with the same error:
UPDATE test t
SET a = ''
WHERE t.id = 1;
The issue is that Postgres does not convert empty strings to numbers (unlike some other databases).
In any case, this is easily fixed by casting the NULL to an appropriate type:
UPDATE test t
SET a = e.a,b = e.b
FROM (VALUES (1, NULL::numeric, NULL::numeric),
(2, NULL, NULL),
(3, NULL, NULL)
) e(idx, a, b)
WHERE t.idx = e.idx ;
You can be explicit for all occurrences of NULL, but that is not necessary.
Here is a db<>fiddle that illustrates some of this.

Why am I getting an error that I cannot concat two different datatypes even after casting the fields datatype

I have a query in postgresql where I want to append a minus sign to the transactions.amount field when the transaction.type = 2 (which refers to withdrawals). I am trying to concat a minus sign and the transactions.amount field which is an int. I casted the transactions.amount field to a text/varchar but no matter what I still get the error, "PostgreSql Error: case types numeric and text cannot be matched"
Here is the query I am running,
SELECT CAST(CASE WHEN "IsVoided" IS TRUE THEN 0
WHEN "Transactions"."TransactionType" = 2
THEN CONCAT('-', CAST("Transactions"."Amount" AS TEXT))
ELSE "Transactions"."Amount" END AS Text) AS "TransAmount"
FROM "Transactions"
LEFT JOIN "DepositSources"
ON "Transactions"."DepositSourceId" =
"DepositSources"."DepositSourceId"
LEFT JOIN "WithdrawalSources"
ON "Transactions"."WithdrawalSourceId" =
"DepositSources"."DepositSourceId"
WHERE "Transactions"."FundId" = 4
AND "Transactions"."ReconciliationId" = 24
What's very perplexing is when i run the below query it works as expected,
SELECT CONCAT('-', CAST("Transactions"."Amount" AS TEXT)) FROM
"Transactions"
All branches of a CASE expression need to have the same type. In this case, you're stuck with making all branches text, because what follows THEN can only be text. Try this version:
CASE WHEN IsVoided IS TRUE
THEN '0'
WHEN Transactions.TransactionType = 2
THEN CONCAT('-', Transactions.Amount::text)
ELSE Transactions.Amount::text END AS TransAmount
Note that it is unusual to be using the logic you have in a CASE expression. Typically, you would just be checking the values of a single column, not multiple different columns.
Edit:
It appears that your call to CONCAT mainly serves to negative a value. Here is one more simple way to do this:
CASE WHEN IsVoided IS TRUE
THEN 0
WHEN Transactions.TransactionType = 2
THEN -1.0 * Transactions.Amount
ELSE Transactions.Amount END AS TransAmount
In this case, we can make the CASE expression just generate numeric output, which might be really what you are after.

Update with ISNULL and operation

original query looks like this :
UPDATE reponse_question_finale t1, reponse_question_finale t2 SET
t1.nb_question_repondu = (9-(ISNULL(t1.valeur_question_4)+ISNULL(t1.valeur_question_6)+ISNULL(t1.valeur_question_7)+ISNULL(t1.valeur_question_9))) WHERE t1.APPLICATION = t2.APPLICATION;
I know you cannot update 2 tables in a single query so i tried this :
UPDATE reponse_question_finale t1
SET nb_question_repondu = (9-(COALESCE(t1.valeur_question_4,'')::int+COALESCE(t1.valeur_question_6,'')::int+COALESCE(t1.valeur_question_7)::int+COALESCE(t1.valeur_question_9,'')::int))
WHERE t1.APPLICATION = t1.APPLICATION;
But this query gaves me an error : invalid input syntax for integer: ""
I saw that the Postgres equivalent to MySQL is COALESCE() so i think i'm on the good way here.
I also know you cannot add varchar to varchar so i tried to cast it to integer to do that. I'm not sure if i casted it correctly with parenthesis at the good place and regarding to error maybe i cannot cast to int with coalesce.
Last thing, i can certainly do a co-related sub-select to update my two tables but i'm a little lost at this point.
The output must be an integer matching the number of questions answered to a backup survey.
Any thoughts?
Thanks.
coalesce() returns the first non-null value from the list supplied. So, if the column value is null the expression COALESCE(t1.valeur_question_4,'') returns an empty string and that's why you get the error.
But it seems you want something completely different: you want check if the column is null (or empty) and then subtract a value if it is to count the number of non-null columns.
To return 1 if a value is not null or 0 if it isn't you can use:
(nullif(valeur_question_4, '') is null)::int
nullif returns null if the first value equals the second. The IS NULL condition returns a boolean (something that MySQL doesn't have) and that can be cast to an integer (where false will be cast to 0 and true to 1)
So the whole expression should be:
nb_question_repondu = 9 - (
(nullif(t1.valeur_question_4,'') is null)::int
+ (nullif(t1.valeur_question_6,'') is null)::int
+ (nullif(t1.valeur_question_7,'') is null)::int
+ (nullif(t1.valeur_question_9,'') is null)::int
)
Another option is to unpivot the columns and do a select on them in a sub-select:
update reponse_question_finale
set nb_question_repondu = (select count(*)
from (
values
(valeur_question_4),
(valeur_question_6),
(valeur_question_7),
(valeur_question_9)
) as t(q)
where nullif(trim(q),'') is not null);
Adding more columns to be considered is quite easy then, as you just need to add a single line to the values() clause

filtering on a range of values in a db column with sqlalchemy orm

I have a postgresql database and in one particular table, with many rows. One column in this table, called data, is a float array, REAL[], and gets filled with an array of ~4500 elements. I want to access this table through some query via SQLAlchemy and the ORM.
How do I select all rows in the table where a subset of this column satisfies some condition, e.g.contains a range of values? Like I want to select all rows where the data contains values >= 10, or values between >=10 and <=20.
Can I do this with a straight session query like
rows = session.query(Table).filter(Table.data.(some conditional)).all()
where my conditional is something like "VALUES >= 10 and VALUES <= 20"?
Or do I need to define some special methods, or setup, when I'm defining my SQLAlchemy table class. For example, I have my table set up as
class Table(Base):
__tablename__ = 'table'
__table_args__ = {'autoload' : True, 'schema' : 'testdb', 'extend_existing':True}
data = deferred(Column(ARRAY(Float)))
def __repr__(self):
return '<Table (pk={0})>'.format(self.pk)
Ideally I'd like to set it up so I can just do simple filtering in my session.query calls. Is this possible? I'm not super familiar with the ORM, so maybe it is?
I've had a look at the ARRAY Comparator sqlalchemy docs but those only seem to work on exact values. My data is precise to 6 sigfigs, and I don't know the exact values ahead of time.
What's the best way to do this? Thanks.
EDIT:
Based on the below comment, here is the code I'm using in attempting to select all rows (out of 1000) that have data (from 1 column) >= 1.0. There should be 537 rows.
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).all()
This gives the correct subset number. len(rows) = 537. However, I don't understand the logic of with this operator, where to select data >=1.0 , I use the le operator? Also, along those same lines, there should be 234 rows that have data between the values >=1.0 and <1.0, but this statement fails to give the correct subset..
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).filter(datadb.Table.data.any(1.2,operator=operators.ge)).all()
* EDIT 2 *
Here's an example of my database Table with a few rows. pk is an integer, and data is a real[].
db datadb
schema Table
pk data
0 [0.0,0.0,0.5,0.3,1.3,1.9,0.3,0.0,0.0]
1 [0.1,0.0,1.0,0.7,1.1,1.5,1.2,0.3,1.4]
2 [0.0,0.6,0.4,0.3,1.6,1.7,0.4,1.3,0.0]
3 [0.0,0.1,0.2,0.4,1.0,1.1,1.2,0.9,0.0]
4 [0.0,0.0,0.5,0.3,0.2,0.1,0.7,0.3,0.1]
I have 5 rows, 4 of them have data with values >= 1.0, while just 2 have values in the range >= 1.0 and <= 1.2. The query I would do to grab the rows is in the first case
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).all()
This should return the 4 rows, at pk=0,1,2,3. This query does what I expect. The second case
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).filter(datadb.Table.data.any(1.2,operator=operators.ge)).all()
and should return the 2 rows at pk=1,3. However this query just returns the 4 rows from the first query. For the second query, I also tried
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le),datadb.Table.data.any(1.2,operator=operators.ge)).all()
which also didn't work.
Please read documentation on ARRAY.Comparator, according to which you should be able to do the following:
rows = (session.query(Table)
.filter(Table.data.any(10, operator=operators.le))
.filter(Table.data.any(20, operator=operators.ge)
).all()
EDIT:
# combined filter does not work,
# but applying one or the other is still useful as it reduces the result set
q = (session.query(MyTable)
.filter(MyTable.data.any(1.0, operator=operators.le))
# .filter(MyTable.data.any(1.2, operator=operators.ge))
)
# filter in memory
items = [_row for _row in q.all()
if any(1.0 <= item <= 1.2 for item in _row.data)]
for item in items:
print(item)