How can I either insert or update multiple rows with different values using Ecto with Postgres?
If I have a schema/struct: %Counter{key: String.t(), count: integer()}
How can I insert or update multiple entries? If the record does not exist I want to insert it, but if it does exist I want to increment the value.
[
%{key: "questions:asked", count: 1},
%{key: "questions:answered", count: 1},
%{key: "ads:viewed", count: 3}
]
Ecto.Repo.insert_all with :on_conflict looks like it should work, but I need a different value for each row.
You can use Ecto.Repo.insert_all, but you must provide a query as the :on_conflict option and take advantage of PostgreSQL's EXCLUDED table, which is available in the conflict action.
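import Ecto.Query

# On conflict, add the incoming count (EXCLUDED.count) to the stored one.
upsert_query =
  Counter
  |> where([o], o.key == fragment("EXCLUDED.key"))
  |> update(inc: [count: fragment("EXCLUDED.count")])

# records is the list of maps shown in the question.
Repo.insert_all(Counter, records,
  on_conflict: upsert_query,
  conflict_target: [:key],
  returning: false
)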
EXCLUDED holds the values you passed in, exposed as a temporary table that is only available inside the ON CONFLICT action.
Note that this also works with set instead of inc if you want to overwrite the stored value rather than increment it.
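To see the increment-on-conflict behaviour in isolation, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for Postgres, on the assumption that SQLite's ON CONFLICT ... DO UPDATE with its excluded pseudo-table behaves the same way for this case (it needs SQLite 3.24 or later). The counters table and the upsert_counters helper are hypothetical names, and unlike insert_all this runs one statement per record.

```python
import sqlite3


def upsert_counters(conn, records):
    """Insert each counter, or add its count to the row that already exists."""
    conn.executemany(
        """
        INSERT INTO counters (key, count) VALUES (:key, :count)
        ON CONFLICT (key) DO UPDATE SET count = count + excluded.count
        """,
        records,
    )
    conn.commit()


# Hypothetical table mirroring the Counter schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, count INTEGER NOT NULL)")

records = [
    {"key": "questions:asked", "count": 1},
    {"key": "questions:answered", "count": 1},
    {"key": "ads:viewed", "count": 3},
]

upsert_counters(conn, records)  # first call inserts the rows
upsert_counters(conn, records)  # second call increments them
totals = dict(conn.execute("SELECT key, count FROM counters"))
```

After the second call every counter holds twice its original value, which is the effect the inc: update has in the Ecto query.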
Is there a better solution?
I have a composite type:
CREATE TYPE mydata_t AS
(
user_id integer,
value character(4)
);
I also have a table that uses this composite type as an array of mydata_t.
CREATE TABLE tbl
(
id serial NOT NULL,
data_list mydata_t[],
PRIMARY KEY (id)
);
Here I want to update the mydata_t element in data_list whose user_id is 100000.
But I don't know which array element has user_id equal to 100000.
So I first have to search for the element whose user_id is 100000, and that's my problem: I don't know how to write the query. In short, I want to update the value of the array element whose user_id is 100000 (and where the id of tbl is, for example, 1). What would my query be?
Something like this (I know it's wrong !!!)
UPDATE "tbl" SET "data_list"[i]."value"='YYYY'
WHERE "id"=1 AND EXISTS (SELECT ROW_NUMBER() OVER() AS i
FROM unnest("data_list") "d" WHERE "d"."user_id"=10000 LIMIT 1)
For example, this is my tbl data:
Row1 => id = 1, data = ARRAY[ROW(5,'YYYY'),ROW(6,'YYYY')]
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'YYYY')]
Now I want to update tbl where id is 2 and set the value of the tbl.data element whose user_id is 11 to 'XXXX'.
In fact, the final result of Row2 will be this:
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'XXXX')]
If you know the current value of the value field, you can use the array_replace() function to make the change:
UPDATE tbl
SET data_list = array_replace(data_list, (11, 'YYYY')::mydata_t, (11, 'XXXX')::mydata_t)
WHERE id = 2
If you do not know the current value of the value field, the situation becomes more complex:
UPDATE tbl SET data_list = data_arr
FROM (
    -- UPDATE doesn't allow aggregate functions so aggregate here
    SELECT array_agg(new_data) AS data_arr
    FROM (
        -- For the id value, get the data_list values that are NOT modified
        SELECT (user_id, value)::mydata_t AS new_data
        FROM tbl, unnest(data_list)
        WHERE id = 2 AND user_id != 11
        UNION
        -- Add the values to update
        VALUES ((11, 'XXXX')::mydata_t)
    ) x
) y
WHERE id = 2
You should keep in mind, though, that there is an awful lot of work going on in the background that cannot be optimised. The array of mydata_t values has to be examined from start to finish and you cannot use an index on this. Furthermore, an update actually inserts a new row in the underlying file on disk, and if your array has more than a few entries this involves substantial work. It gets even more problematic when your arrays are larger than the page size of your PostgreSQL server, typically 8kB. This all happens behind the scenes, so it will work, but at a performance penalty.
Even though array_replace sounds like the change is made in place (and in memory it is), the UPDATE command writes a completely new tuple to disk. So if you have 4,000 array elements, at least 40kB of data has to be read (8 bytes for the mydata_t type on a typical system x 4,000 = 32kB in a TOAST file, plus the main page of the table, 8kB) and then written to disk after the update. A real performance killer.
As @klin pointed out, this design may be more trouble than it is worth. If you make data_list a table (as I would do), the update query becomes:
UPDATE data_list SET value = 'XXXX'
WHERE id = 2 AND user_id = 11
This will have MUCH better performance, especially if you add the appropriate indexes. You could then still create a view to publish the data in an aggregated form with a custom type if your business logic so requires.
Using PostgreSQL 14.0 PL/pgSQL (inside a DO block). Attempting to:
(1) query a certain key ('county') in a jsonb array of objects (which in turn has object + nested arrays) based on a dynamic variable value (named cty.cty_name)
(2) retrieve the value and change it
(3) update said jsonb to reflect the updated value from (2)
(4) after executing (3) on multiple values, create a new table with the above jsonb as a row with one column
steps (1) and (2) execute properly. But, for the life of me, I can't figure (3) out.
jsonb object(res) -- may have 100s of array items at index root:
[ {"county": "x", "dpa": ["a", "b", "c"]},
{"county": "y", "dpa": ["d", "e", "f"]},
{"county": "z", "dpa": ["h", "i", "j"]},
...
]
code for (1) and (2) above:
execute format('select jsonb_path_query_array(''%s'', ''$[*]?(@.%s=="%s")'')',
res,'county',cty.cty_name) into s1;
execute format('select jsonb_array_elements(''%s'')->''%s''', s1,'dpa') into s2;
s2 := s2 || jsonb_build_array(r1.name);
where say:
cty.cty_name is y (which is created from a select in for loop)
r1.name is m
s2 holds the new value e.g. ["d", "e", "f", "m"]
Now, to execute (3) I need path to dpa for which key county matches value y in some index in res. Having tried (and failed miserably) at various permutations of jsonb_query_path with SQL/JSON Path, dollar-quoted strings, jsonb_path_to_array with double-quoted hell for format queries, other SO solutions which use idx or idx-1 (but I don't have JSON in table), I had to resort to soliciting the Borg collective's wisdom. Help please.
The problem you're running into with the current approach is twofold:
there's no way to "delete" the matching county object in place (via jsonb_set(), etc.)
there's no way to force uniqueness (to utilize the ON CONFLICT ... DO UPDATE mechanism) within the json document itself to accomplish the same
When we get to
Now, to execute (3) I need path to dpa for which key county matches value y in some index in res.
instead of updating the existing record in-place, why not just remove the matching record (with now-stale value for "dpa"), re-aggregate what remains (i.e. the non-matching objects), and then append the updated matching object to the aggregated jsonb array, similar to:
SELECT jsonb_agg(a)
       || jsonb_build_object(
              'county', 'y',
              'dpa', (jsonb_path_query_array(res, '$[*] ? (@.county=="y")') #> '{0,dpa}')
                     || jsonb_build_array('m'))
FROM jsonb_array_elements(res) a
WHERE NOT a @> (jsonb_path_query_array(res, '$[*] ? (@.county=="y")')->0);
This gives a single jsonb value back per your specification in (4); you should be able to parameterize this into your EXECUTE invocation as necessary.
One note on output order: if you're looping over the initial "res" array, the objects in the resulting array will end up in the order in which the driving cursor visits the "county" values, not in their original order.
This is because a full cycle through the cursor removes each old object and appends its replacement at the end of the resulting array, so define an ORDER BY clause on the cursor if the order matters.
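The remove-and-append idea, and its effect on ordering, can be sketched outside the database. The Python below is only a model of the query's logic on plain lists and dicts: replace_county is a hypothetical helper, it matches on the county key rather than on jsonb containment, and it assumes exactly one object matches.

```python
def replace_county(res, county, new_item):
    """Drop the object for `county`, then append its updated copy at the end."""
    # Objects that do not match are kept in their original order.
    kept = [obj for obj in res if obj["county"] != county]
    # The stale object whose "dpa" list needs the extra entry.
    old = next(obj for obj in res if obj["county"] == county)
    updated = {"county": county, "dpa": old["dpa"] + [new_item]}
    return kept + [updated]


res = [
    {"county": "x", "dpa": ["a", "b", "c"]},
    {"county": "y", "dpa": ["d", "e", "f"]},
    {"county": "z", "dpa": ["h", "i", "j"]},
]

out = replace_county(res, "y", "m")
order = [obj["county"] for obj in out]
```

Here the updated "y" object moves from the middle of the array to the end, which is why the order of the driving cursor decides the final order.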
I am using the jsonb type for a column in PostgreSQL 11. I'd like to update one field in the JSON data, and I see there is a function jsonb_set which can be used (https://www.postgresql.org/docs/current/functions-json.html).
However, based on the document,
jsonb_set ( target jsonb, path text[], new_value jsonb [, create_if_missing boolean ] ) → jsonb
Returns target with the item designated by path replaced by new_value, or with new_value added
if create_if_missing is true (which is the default) and the item designated by path does not
exist. All earlier steps in the path must exist, or the target is returned unchanged. As with
the path oriented operators, negative integers that appear in the path count from the end of
JSON arrays. If the last path step is an array index that is out of range, and create_if_missing
is true, the new value is added at the beginning of the array if the index is negative, or at
the end of the array if it is positive.
The first argument is target. What does target mean here? Do I need to do a query to get existing value and put it as target?
I have tried below update statement:
my current data is:
# select "taxes" from "Sites" where "id" = '6daa9b5d-d5b2-4b0d-a8ee-5ad2cb141594';
taxes
--------------------------------------------------------------------------------------------------------------
{"feePercent": 0, "percent": 0}
And I tried below update:
# update "Sites" set "feePercent" = jsonb_set('{"feePercent": 0, "percent": 0}', '{feePercent}', 1) where "siteUuid"='6daa9b5d-d5b2-4b0d-a8ee-5ad2cb141594';
but I got below error:
ERROR: function jsonb_set(unknown, unknown, integer) does not exist
LINE 1: update "Sites" set "feePercent" = jsonb_set('{"feePerce...
jsonb_set() modifies a specific JSON object. So, your target is the JSON object (or JSON column) which you want to modify.
jsonb_set(my_jsonb_to_be_modified, ...)
So, if you had this JSON object;
{"my":"old", "json":"object"}
With the function you can turn it into:
{"my":"new", "json":"object"}
The code is:
SELECT jsonb_set('{"my":"old", "json":"object"}', '{my}', '"new"')
The target is the original JSON object, the path points to the element you want to modify, and new_value is the new value for the element you specified in the path.
In that case my had the value old, which turns into new now.
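As a rough mental model of the three arguments, here is a small Python sketch. It is not how PostgreSQL implements jsonb_set: jsonb_set_sketch is a hypothetical helper that handles only object keys (no array indexes) and assumes a non-empty path. It does show that the target is passed in, left untouched, and a modified copy is returned.

```python
import copy


def jsonb_set_sketch(target, path, new_value, create_if_missing=True):
    """Return a copy of `target` with the item at `path` replaced by `new_value`."""
    result = copy.deepcopy(target)  # the original target is never modified
    node = result
    # Walk every step except the last; all of them must already exist.
    for key in path[:-1]:
        if not isinstance(node, dict) or key not in node:
            return result  # an earlier step is missing: return target unchanged
        node = node[key]
    last = path[-1]
    if isinstance(node, dict) and (last in node or create_if_missing):
        node[last] = new_value
    return result


taxes = {"feePercent": 0, "percent": 0}
updated = jsonb_set_sketch(taxes, ["feePercent"], 1)
```

In an UPDATE the same thing happens per row: the column is passed as the target and the returned value is assigned back to the column.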
From PostgreSQL v14 on, you can use subscripts to make this UPDATE statement look natural:
UPDATE "Sites"
SET taxes['feePercent'] = to_jsonb(1)
WHERE id = '6daa9b5d-d5b2-4b0d-a8ee-5ad2cb141594';
For earlier versions, you will have to use jsonb_set like this:
UPDATE "Sites"
SET taxes = jsonb_set(taxes, ARRAY['feePercent'], to_jsonb(1))
WHERE id = '6daa9b5d-d5b2-4b0d-a8ee-5ad2cb141594';
The effect is the same: the whole JSON is read, a new JSON is created and stored in a new version of the row.
All this becomes much simpler if you don't use JSON, but regular table columns.
Suppose I want to do a bulk update, setting a=b for a collection of a values. This can easily be done with a sequence of UPDATE queries:
UPDATE foo SET value='foo' WHERE id=1
UPDATE foo SET value='bar' WHERE id=2
UPDATE foo SET value='baz' WHERE id=3
But now I suppose I want to do this in bulk. I have a two dimensional array containing the ids and new values:
[ [ 1, 'foo' ],
[ 2, 'bar' ],
[ 3, 'baz' ] ]
Is there an efficient way to do these three UPDATEs in a single SQL query?
Some solutions I have considered:
A temporary table
CREATE TABLE temp ...;
INSERT INTO temp (id,value) VALUES (....);
UPDATE foo USING temp ...
But this really just moves the problem. Although it may be easier (or at least less ugly) to do a bulk INSERT, there are still a minimum of three queries.
Denormalize the input by passing the data pairs as SQL arrays. This makes the query incredibly ugly, though
UPDATE foo
SET value = x.value
FROM (
    SELECT
        split_part(s, ',', 1)::INT AS id,
        split_part(s, ',', 2)::VARCHAR AS value
    FROM (
        SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS s
    ) AS raw
) AS x
WHERE foo.id = x.id
This makes it possible to use a single query, but makes that query ugly, and inefficient (especially for mixed and/or complex data types).
Is there a better solution? Or should I resort to multiple UPDATE queries?
Normally you want to batch-update from a table with sufficient index to make the merge easy:
CREATE TEMP TABLE updates_table
( id integer not null primary key
, val varchar
);
INSERT into updates_table(id, val) VALUES
( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
;
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
;
So you should probably populate your updates_table with something like:
INSERT INTO updates_table(id, val)
SELECT
    split_part(x, ',', 1)::INT AS id,
    split_part(x, ',', 2)::VARCHAR AS val
FROM (
    SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS s
;
Remember: an index (or the primary key) on the id field in the updates_table is important (though for small sets like this one, a hash join will probably be chosen by the optimiser).
In addition: for updates, it is important to avoid updating a row to the value it already has. Such updates create extra row versions, plus the resulting VACUUM activity after the update is committed:
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
AND (t.value IS NULL OR t.value <> u.val)
;
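You can use a CASE conditional expression. Restrict the update with a WHERE clause, because rows whose id matches no WHEN branch would otherwise have value set to NULL:
UPDATE foo
SET "value" = CASE id
    WHEN 1 THEN 'foo'
    WHEN 2 THEN 'bar'
    WHEN 3 THEN 'baz'
    END
WHERE id IN (1, 2, 3)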
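If the pairs come from application code, the CASE statement can be generated with bound parameters instead of being written by hand. The following is a sketch under assumptions: it uses Python's sqlite3 as a stand-in database, bulk_update is a hypothetical helper, and it expects a non-empty list of pairs. It adds a WHERE id IN (...) clause, since a CASE with no matching branch and no ELSE yields NULL for every other row.

```python
import sqlite3


def bulk_update(conn, pairs):
    """Update foo.value for each (id, value) pair in a single statement."""
    whens = " ".join("WHEN ? THEN ?" for _ in pairs)
    marks = ", ".join("?" for _ in pairs)
    sql = f'UPDATE foo SET "value" = CASE id {whens} END WHERE id IN ({marks})'
    # Parameters: id/value pairs for the CASE, then the ids again for the IN list.
    params = [p for pair in pairs for p in pair] + [pid for pid, _ in pairs]
    conn.execute(sql, params)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE foo (id INTEGER PRIMARY KEY, "value" TEXT)')
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(i, "old") for i in range(1, 5)])

bulk_update(conn, [(1, "foo"), (2, "bar"), (3, "baz")])
rows = dict(conn.execute('SELECT id, "value" FROM foo'))
```

Row 4 is not in the list of pairs and keeps its old value because of the IN filter.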
I have the following document schema:
{
date: dateValue
items:
[
{ name: 'a', counter: 4},
{ name: 'b', counter: 17},
{ name: 'aabbb', counter: 15},
...
]
}
I would like to have an update query with upsert that creates the entire record if the record does not exist.
In addition, I want to check whether a certain item exists in the list (by its name).
If the item doesn't exist, I want to add a new one to the list with counter = 1.
If the item exists, raise its counter by 1.
Is there any way to do this with one update statement?
You need to do two things:
Use the {upsert:1} flag on update to insert particular date document if it doesn't already exist.
Use {$inc} operator to increment your item values. It turns out that if you increment a field that doesn't exist by 1 it will be created with value 1 (it's as if it existed with value 0).
You may not be able to accomplish the above with the schema you currently have. In order to increment a counter, the field has to be named after the item, i.e. "a":1, "b":17, etc. You currently store each item as { name: 'a', counter: 4 }, which means you can only update it with the positional operator. But the positional operator requires that you match an existing element in order to update it, so there goes the strategy of using $inc to create missing items.
So it looks like if you want to do this in a single update statement you would need to change your schema - only you can decide if that's the way to go since it may affect how your other reads and writes interact with the data.
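To make the schema change concrete, here is a small Python model of what a single upsert with $inc would do if the items were stored as a map keyed by name. It does not talk to MongoDB: build_update and apply_upsert are hypothetical helpers, the "items.<name>" layout is an assumption about the restructured schema, and the model handles only this one dotted $inc path.

```python
def build_update(date, name):
    """Return the (filter, update) pair one might pass to an upserting update."""
    return {"date": date}, {"$inc": {f"items.{name}": 1}}


def apply_upsert(collection, flt, update):
    """Toy model: find the document by date, create it if missing, apply $inc."""
    doc = collection.setdefault(flt["date"], {"date": flt["date"]})
    for dotted, amount in update["$inc"].items():
        parent, field = dotted.split(".", 1)
        sub = doc.setdefault(parent, {})
        # A missing field is treated as 0, so the first increment stores 1.
        sub[field] = sub.get(field, 0) + amount
    return doc


collection = {}
for item in ["a", "b", "a"]:
    flt, upd = build_update("2013-01-01", item)
    apply_upsert(collection, flt, upd)

doc = collection["2013-01-01"]
```

One update covers all three cases in this model: a new date document, a new item, and an existing item whose counter goes up by 1.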