How do I efficiently update multiple rows with particular values for a_id 84, and then all other rows set to 0 for that same a_id?
products
p_id a_id best
111 81 99
222 81 99
666 82 99
222 83 99
111 84 99
222 84 99
333 84 99
111 85 99
222 85 99
Right now I'm doing this:
update products as u set
best = u2.best
from (values
(111, 84, 1),
(222, 84, 2)
) as u2(p_id, a_id, best)
where u2.p_id = u.p_id AND u2.a_id = u.a_id
RETURNING u2.p_id, u2.a_id, u2.best
But this only updates the rows within values as expected. How do I also update rows not in values to be 0 with a_id = 84?
Meaning the p_id of 333 should have best = 0. I could explicitly include every single p_id but the table is huge.
The values set into best will always be in order from 1 to n, defined by the order of values.
The products table has 1 million rows
Assuming (p_id, a_id) is the PK - or at least UNIQUE and NOT NULL, this is one way:
UPDATE products AS u
SET best = COALESCE(u2.best, 0)
FROM products AS p
LEFT JOIN ( VALUES
(111, 84, 1),
(222, 84, 2)
) AS u2(p_id, a_id, best) USING (p_id, a_id)
WHERE u.a_id = 84
AND u.a_id = p.a_id
AND u.p_id = p.p_id
RETURNING u.p_id, u.a_id, u.best;
The difficulty is that the FROM list of an UPDATE is the equivalent of an INNER JOIN, while you need an OUTER JOIN. This workaround adds the table products to the FROM list (which is normally redundant), to act as left table for the LEFT OUTER JOIN. Then the INNER JOIN from products to products works.
To restrict the update to a_id = 84, add a WHERE condition saying so. That makes a_id = 84 redundant in the VALUES expression, but keep it there so the join does not match rows on p_id alone that would only be filtered out later. Cheaper.
If you don't have a PK or any other (combination of) UNIQUE NOT NULL columns, you can fall back to the system column ctid for joining products rows. Example:
Numbering rows consecutively for a number of tables
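As a rough check of the intended outcome, here is a minimal sketch that runs against SQLite through Python's sqlite3 module rather than PostgreSQL. It does not reproduce the LEFT JOIN workaround above; it swaps in a correlated subquery wrapped in COALESCE, which yields the same result on the question's sample data. The CTE name u2 mirrors the answer; everything else about the setup is an assumption for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        p_id INTEGER NOT NULL,
        a_id INTEGER NOT NULL,
        best INTEGER NOT NULL,
        PRIMARY KEY (p_id, a_id)
    );
    INSERT INTO products (p_id, a_id, best) VALUES
        (111, 81, 99), (222, 81, 99), (666, 82, 99), (222, 83, 99),
        (111, 84, 99), (222, 84, 99), (333, 84, 99),
        (111, 85, 99), (222, 85, 99);
""")

# Rows listed in u2 get their new value; every other row with a_id = 84
# falls through to COALESCE's default of 0.
conn.execute("""
    WITH u2(p_id, a_id, best) AS (
        VALUES (111, 84, 1), (222, 84, 2)
    )
    UPDATE products
    SET best = COALESCE(
        (SELECT u2.best
         FROM u2
         WHERE u2.p_id = products.p_id
           AND u2.a_id = products.a_id),
        0)
    WHERE a_id = 84
""")
conn.commit()

rows_84 = conn.execute(
    "SELECT p_id, best FROM products WHERE a_id = 84 ORDER BY p_id"
).fetchall()
untouched = conn.execute(
    "SELECT COUNT(*) FROM products WHERE a_id <> 84 AND best = 99"
).fetchone()[0]
print(rows_84, untouched)
```

The WHERE a_id = 84 condition is what keeps the other a_id groups out of the update; without it, every row not listed in u2 would be set to 0.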
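Remove the condition u2.a_id = u.a_id from the WHERE clause and put it in the assignment with a CASE expression. Note that this only touches rows whose p_id appears in the VALUES list, and it sets best to 0 for those p_ids under every other a_id as well: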
update products as u set
best = case when u2.a_id = u.a_id then u2.best else 0 end
from (values
(111, 84, 1),
(222, 84, 2)
) as u2(p_id, a_id, best)
where u2.p_id = u.p_id
How to change "65→67→69" to "J7,G2,P9" in SQL/PostgreSQL/MySQL? Or use split fields/value mapper in Pentaho Data Integration (Spoon) to realize it?
I use KETTLE(Pentaho Data Integration/Spoon) to insert data to PostgreSQL from other databases, I have a field with below data
value
-----------
65→67→69
15→19→17
25→23→45
19→28→98
ID value
--------
65 J7
67 G2
69 P9
15 A8
19 b9
17 C1
25 b12
23 e12
45 A23
28 C17
98 F18
How can I change the values above into the values below? Is there a SQL way or a Kettle way to do it?
new_value
-----------
J7,G2,P9
A8,b9,C1
b12,e12,A23
b9,C17,F18
Thanks so much for any advice.
Assuming these tables:
create table table1 (value text);
insert into table1 (value)
values
('65→67→69'),
('15→19→17'),
('25→23→45'),
('19→28→98')
;
create table table2 (id int, value text);
insert into table2 (id, value)
values
(65, 'J7'),
(67, 'G2'),
(69, 'P9'),
(15, 'A8'),
(19, 'b9'),
(17, 'C1'),
(25, 'b12'),
(23, 'e12'),
(45, 'A23'),
(28, 'C17'),
(98, 'F18')
;
In Postgres you can use a scalar subselect:
select t1.value,
(select string_agg(t2.value, ',' order by t.idx)
from table2 t2
join lateral unnest(string_to_array(t1.value,'→')) with ordinality as t(val,idx) on t2.id::text = t.val
) as new_value
from table1 t1;
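To make the logic of the query above easier to follow, here is a hedged restatement in plain Python, not a replacement for the SQL. It splits each value on the arrow, looks up each ID, and joins the results in their original order, which is what unnest ... with ordinality plus string_agg ... order by do. The names lookup and map_value are made up for this illustration.

```python
# Lookup data from table2 in the answer: id -> value.
lookup = {
    65: "J7", 67: "G2", 69: "P9",
    15: "A8", 19: "b9", 17: "C1",
    25: "b12", 23: "e12", 45: "A23",
    28: "C17", 98: "F18",
}

def map_value(value: str, sep: str = "→") -> str:
    """Replace each ID in an arrow-separated string with its mapped value.

    IDs without a match are dropped, mirroring the inner join in the SQL.
    """
    ids = [int(part) for part in value.split(sep)]
    return ",".join(lookup[i] for i in ids if i in lookup)

# Rows from table1 in the answer.
rows = ["65→67→69", "15→19→17", "25→23→45", "19→28→98"]
new_values = [map_value(v) for v in rows]
print(new_values)
```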
I need to duplicate records in a table that have a parent child relationship. Since the new records will have a new record id, the parent/child relationship needs to be preserved and updated to reflect the new record IDs (primary keys).
Let's say I have the following table:
asset_id parent_asset_id
1 NULL
5 1
23 1
25 23
When I copy the records into the table, they look like this:
asset_id parent_asset_id
42 NULL
43 1
44 1
45 23
I need the new asset-to-parent relationship to be as follows:
asset_id parent_asset_id
42 NULL
43 42
44 42
45 44
I am trying to use CTE (but maybe this is not the best approach). I update the current asset.asset table with the new records and then want to update the parent child relationship, but can't figure out how to join tables to get the relationship correct.
parent_asset_id is the column that needs to have the correct relationship; all other fields stay the same.
CREATE FUNCTION asset.copy_asset_store(
IN _username VARCHAR(100),
IN _src_asset_store_id VARCHAR,
IN _dest_asset_store_id VARCHAR
)
DECLARE
v_user_id INT;
BEGIN
WITH copy_into_study AS(
UPDATE
asset.asset
SET
origin_asset_id = original.origin_asset_id,
asset_type_id = original.asset_type_id,
asset_store_id = _dest_asset_store_id,
parent_asset_id = original.parent_asset_id,
asset_name = original.asset_name,
asset_desc = original.asset_desc,
is_signature_required = original.is_signature_required,
s3_bucket_id = original.s3_bucket_id,
color_id = original.color_id,
asset_expiration = original.asset_expiration,
is_deleted = original.is_deleted,
rec_created_by_user_id = v_user_id,
rec_updated_by_user_id = v_user_id
FROM
asset.asset AS original
WHERE
original.asset_store_id = _src_asset_store_id::INT
RETURNING *
),update_parent AS (
UPDATE
asset.asset
SET
parent_asset_id = copy_into_study.parent_asset_id
FROM
copy_into_study
WHERE
)
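One possible way to approach the remapping described above is to copy the rows first while recording which new ID each old ID received, then repoint the parents through that mapping in a second pass. The sketch below is only an illustration of that idea, run against SQLite through Python's sqlite3 module rather than as a PL/pgSQL function. The reduced asset table, the store IDs 10 and 20, the filler row 41 (there only so the copies land on 42 to 45), and the function name copy_store are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (
        asset_id        INTEGER PRIMARY KEY,  -- auto-assigned when omitted
        parent_asset_id INTEGER,
        asset_store_id  INTEGER NOT NULL
    );
    INSERT INTO asset (asset_id, parent_asset_id, asset_store_id) VALUES
        (1, NULL, 10), (5, 1, 10), (23, 1, 10), (25, 23, 10),
        (41, NULL, 99);  -- hypothetical filler row so new IDs start at 42
""")

def copy_store(conn, src_store, dest_store):
    """Copy all assets of src_store into dest_store, preserving the tree."""
    src_rows = conn.execute(
        "SELECT asset_id, parent_asset_id FROM asset "
        "WHERE asset_store_id = ? ORDER BY asset_id",
        (src_store,),
    ).fetchall()

    # Pass 1: insert the copies without a parent and remember old -> new.
    id_map = {}
    for old_id, _old_parent in src_rows:
        cur = conn.execute(
            "INSERT INTO asset (parent_asset_id, asset_store_id) "
            "VALUES (NULL, ?)",
            (dest_store,),
        )
        id_map[old_id] = cur.lastrowid

    # Pass 2: point each copy at the copy of its original parent.
    for old_id, old_parent in src_rows:
        if old_parent is not None:
            new_parent = id_map.get(old_parent, old_parent)
            conn.execute(
                "UPDATE asset SET parent_asset_id = ? WHERE asset_id = ?",
                (new_parent, id_map[old_id]),
            )
    conn.commit()
    return id_map

id_map = copy_store(conn, 10, 20)
copies = conn.execute(
    "SELECT asset_id, parent_asset_id FROM asset "
    "WHERE asset_store_id = 20 ORDER BY asset_id"
).fetchall()
print(copies)
```

The key piece is the old-to-new ID mapping; once it exists, the parent update is a plain lookup. A parent that lies outside the copied store keeps its original ID in this sketch.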
I have two tables: one holds information about a sampleid (sampleid is the primary key), and the other holds the conditions a sampleid has (sampleid is not the primary key in this table, as a sample may have multiple conditions). I would like to know whether my sampleid has a specific condition (Y/N), but I'm not sure how to join the tables without getting a query that returns multiple rows per sampleid.
eg
sampleid colour
-----------------------
1 blue
2 red
3 green
sampleid condition
-----------------------
1 23
1 81
1 94
2 81
2 94
3 23
I want to ask if the sampleid has condition 23 and return:
sampleid colour condition23
----------------------------------------------
1 blue Y
2 red N
3 green Y
Hope this is clear. Every time I join them I end up with multiple rows per sampleid. I am a newbie and trying to find my way!
Thanks in advance
F
This can be done using a LEFT JOIN and a CASE expression, something like this:
SELECT
s.sampleId,
s.colour,
case when c.condition is null
then 'N'
else 'Y'
end condition23
FROM
samples s
LEFT JOIN conditions c
ON s.sampleId = c.sampleId
AND c.condition = 23
Try this query:
select s.*, case when c.condition is null then 'N' else 'Y' end condition23
from samples s
left join
(select * from conditions where condition = 23) c on s.sampleid = c.sampleid
With EXISTS:
select
s.*,
case
when exists (
select 1 from conditions where sampleid = s.sampleid and condition = 23
) then 'Y'
else 'N'
end condition23
from samples s
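The EXISTS variant can be tried out with the sketch below, which uses SQLite through Python's sqlite3 module as a stand-in for the asker's database. The table and column names follow the question. The duplicate (1, 23) row is not in the question's data; it is added here as an assumption, to show that EXISTS still returns one row per sample while a plain LEFT JOIN on the same data would return an extra row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sampleid INTEGER PRIMARY KEY, colour TEXT);
    CREATE TABLE conditions (sampleid INTEGER, condition INTEGER);
    INSERT INTO samples VALUES (1, 'blue'), (2, 'red'), (3, 'green');
    INSERT INTO conditions VALUES
        (1, 23), (1, 81), (1, 94), (2, 81), (2, 94), (3, 23),
        (1, 23);  -- hypothetical duplicate, not in the question's data
""")

# EXISTS only asks whether a matching row is there, so it cannot
# multiply the rows of samples.
flags = conn.execute("""
    SELECT s.sampleid,
           s.colour,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM conditions c
                    WHERE c.sampleid = s.sampleid
                      AND c.condition = 23
                ) THEN 'Y' ELSE 'N'
           END AS condition23
    FROM samples s
    ORDER BY s.sampleid
""").fetchall()

# For comparison: the LEFT JOIN form yields one row per matching condition.
left_join_count = conn.execute("""
    SELECT COUNT(*)
    FROM samples s
    LEFT JOIN conditions c
           ON c.sampleid = s.sampleid
          AND c.condition = 23
""").fetchone()[0]
print(flags, left_join_count)
```

If (sampleid, condition) is guaranteed unique in the conditions table, the LEFT JOIN answers return the same rows as EXISTS.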
I'm trying to select data in a table for companies and dates that don't exist for a different type/id of data.
Put another way, I want company_id, dates_id, daily_val where wh_calc_id = 344 if the same company_id/dates_id combination doesn't exist where wh_calc_id = 368.
I'm loosely following this example:
Select rows which are not present in other table
These are my two attempts at it:
attempt 1:
SELECT distinct on (company_id, dates_id) company_id, dates_id, daily_val
FROM daily_data d1
WHERE NOT EXISTS (
SELECT 1
FROM daily_data d2
WHERE d1.company_id = d2.company_id
and d1.dates_id = d2.dates_id
and d1.wh_calc_id = 368
and d2.wh_calc_id = 368
)
and d1.wh_calc_id = 344
The problem:
It's super slow: 27 minutes
attempt 2: [removed]
All in one (giant) table:
company_id int (indexed),
dates_id int (indexed),
wh_calc_id int (indexed),
daily_val numeric
I'm open to adding an index that would help speed things up, but what index?
Postgres 10
PS - I've had to kill both queries before they completed, so I don't really know if they are written correctly. Hopefully my description helps.
I would do it with a left join this way:
SELECT DISTINCT ON (d1.company_id, d1.dates_id) d1.company_id, d1.dates_id, d1.daily_val
FROM daily_data d1
LEFT JOIN daily_data d2
ON d1.company_id = d2.company_id
AND d1.dates_id = d2.dates_id
AND d2.wh_calc_id = 368
WHERE d1.wh_calc_id = 344
AND d2.company_id IS NULL;
and create the index over the columns to use:
CREATE INDEX ON daily_data (company_id, dates_id, wh_calc_id);
This does what I want I think:
SELECT
d1.*
from
daily_data d1
LEFT JOIN
daily_data d2
ON
d1.company_id = d2.company_id
AND d1.dates_id = d2.dates_id
AND d2.wh_calc_id = 368
AND d1.wh_calc_id = 344
where
d1.wh_calc_id = 344
and d2.wh_calc_id is null
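To check that the NOT EXISTS form and the LEFT JOIN ... IS NULL form agree, here is a small sketch using SQLite through Python's sqlite3 module. The five rows are invented for the test, and the index name daily_data_lookup is an assumption. Timings on SQLite say nothing about Postgres 10, so this only verifies the logic. Note that the filter on wh_calc_id = 368 applies to d2 only; d1 is restricted to 344.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_data (
        company_id INTEGER,
        dates_id   INTEGER,
        wh_calc_id INTEGER,
        daily_val  NUMERIC
    );
    CREATE INDEX daily_data_lookup
        ON daily_data (company_id, dates_id, wh_calc_id);
    -- Invented sample rows (assumption):
    INSERT INTO daily_data VALUES
        (1, 100, 344, 1.5),
        (1, 100, 368, 9.0),  -- blocks (1, 100)
        (1, 101, 344, 2.5),
        (2, 100, 344, 3.5),
        (2, 101, 368, 7.0);  -- no 344 row for (2, 101)
""")

not_exists_rows = conn.execute("""
    SELECT d1.company_id, d1.dates_id, d1.daily_val
    FROM daily_data d1
    WHERE d1.wh_calc_id = 344
      AND NOT EXISTS (
          SELECT 1
          FROM daily_data d2
          WHERE d2.company_id = d1.company_id
            AND d2.dates_id   = d1.dates_id
            AND d2.wh_calc_id = 368
      )
    ORDER BY d1.company_id, d1.dates_id
""").fetchall()

left_join_rows = conn.execute("""
    SELECT d1.company_id, d1.dates_id, d1.daily_val
    FROM daily_data d1
    LEFT JOIN daily_data d2
           ON d2.company_id = d1.company_id
          AND d2.dates_id   = d1.dates_id
          AND d2.wh_calc_id = 368
    WHERE d1.wh_calc_id = 344
      AND d2.company_id IS NULL
    ORDER BY d1.company_id, d1.dates_id
""").fetchall()
print(not_exists_rows, left_join_rows)
```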
DB structure:
CREATE TABLE page
(
id serial primary key,
title VARCHAR(40) not null
);
CREATE TABLE page_rating
(
id serial primary key,
page_id INTEGER,
rating_type INTEGER,
rating INTEGER
);
CREATE TABLE user_history
(
id serial primary key,
page_id INTEGER
);
Data:
INSERT INTO page (id,title) VALUES(1,'Page #1');
INSERT INTO page (id,title) VALUES(2,'Page #2');
INSERT INTO page (id,title) VALUES(3,'Page #3');
INSERT INTO page (id,title) VALUES(4,'Page #4');
INSERT INTO page (id,title) VALUES(5,'Page #5');
INSERT INTO page_rating VALUES (1,1,60,100);
INSERT INTO page_rating VALUES (2,1,99,140);
INSERT INTO page_rating VALUES (3,1,58,120);
INSERT INTO page_rating VALUES (4,1,70,110);
INSERT INTO page_rating VALUES (5,2,60,50);
INSERT INTO page_rating VALUES (6,2,99,60);
INSERT INTO page_rating VALUES (7,2,58,90);
INSERT INTO page_rating VALUES (8,2,70,140);
Purpose: select unique values for rating_type in a table "page" sorted by "rating_page.rating", and exclude pages that appear in user_history from the result.
My query:
SELECT DISTINCT ON(pr.rating_type) p.*,pr.rating,pr.rating_type FROM page as p
LEFT JOIN page_rating as pr ON p.id = pr.page_id
LEFT JOIN user_history uh ON uh.page_id = p.id
WHERE
pr.rating_type IN (60, 99, 58, 45, 73, 97, 55, 59, 70, 43, 74, 97, 64, 71, 46)
AND uh.page_id IS NULL
ORDER BY pr.rating_type,pr.rating DESC
Result:
ID TITLE RATING RATING_TYPE
1 "Page #1" 120 58
1 "Page #1" 100 60
2 "Page #2" 140 70
1 "Page #1" 140 99
The result contains duplicate pages. Ideal result:
ID TITLE RATING RATING_TYPE
1 "Page #1" 120 58
2 "Page #2" 50 60
Thx for help!
You almost certainly need a UNIQUE constraint on {page_id, rating_type} in the table "page_rating". You're also missing every necessary foreign key constraint. The primary key on "user_history" is suspicious, too.
Purpose - select unique values for rating_type in a table "page"
sorted by "rating_page.rating".
You can select distinct values for rating_type without referring to any other tables. And you should, at first. Let's look at the data.
select page_id, rating_type, rating
from page_rating
order by page_id, rating_type;
page_id rating_type rating
--
1 58 120 *
1 60 100
1 70 110
1 99 140
2 58 90
2 60 50 *
2 70 140
2 99 60
You seem to want one row per page_id. Those rows are marked with an asterisk in the table above. How can we get those two rows?
Those rows have different values for rating_type, so we can't just use rating_type in the WHERE clause. The values in rating are neither the max nor the min for both values of rating_type, so we can't use GROUP BY with max() or min(). And we can't use GROUP BY with an aggregate function, because you want the unaggregated value of "rating" for an arbitrary value of "rating_type".
So, based on what you've told us, the only way to get the result set you want is to specify rating_type and page_id in the WHERE clause.
select page_id, rating_type, rating
from page_rating
where (page_id = 1 and rating_type = 58)
or (page_id = 2 and rating_type = 60)
order by page_id, rating_type;
page_id rating_type rating
--
1 58 120
2 60 50
I'm not going to follow through with the joins, because I'm 100% confident that you don't really want to do this.