Need cleaner update method in PostgreSQL 9.1.3 - postgresql

Using PostgreSQL 9.1.3 I have a points table like so (What's the right way to show tables here??)
| Column | Type | Table Modifiers | Storage |
|--------|-------------------|-----------------------------------------------------|----------|
| id | integer | not null default nextval('points_id_seq'::regclass) | plain |
| name | character varying | not null | extended |
| abbrev | character varying | not null | extended |
| amount | real | not null | plain |
In another table, orders, I have a bunch of columns whose names match values in the abbrev column of the points table, as well as a total_points column:
| Column | Type | Table Modifiers |
|--------------|--------|--------------------|
| ud | real | not null default 0 |
| sw | real | not null default 0 |
| prrv | real | not null default 0 |
| total_points | real | default 0 |
So in orders I have the sw column, and in points I'll now have an amount that relates to that column in the row where abbrev = 'sw'.
I have about 15 columns like that in the orders table, and now I want to set a trigger so that when I create/update an entry in the orders table, I calculate a total score. Basically, with just those three shown I could do it long-hand like this:
UPDATE orders
SET total_points =
ud * (SELECT amount FROM points WHERE abbrev = 'ud') +
sw * (SELECT amount FROM points WHERE abbrev = 'sw') +
prrv * (SELECT amount FROM points WHERE abbrev = 'prrv')
WHERE ....
But that's just plain ugly and repetitive, and like I said there are really 15 of them (right now...). I'm hoping there's a more sophisticated way to handle this.
In general, each of those silly names on the orders table represents a type of work associated with the order, and each of those types has a 'cost' to it, which is stored in the points table. I'm not married to this structure if there's a cleaner setup.

"Serialize" the costs for orders:
CREATE TABLE order_cost (
order_cost_id serial PRIMARY KEY
, order_id int NOT NULL REFERENCES orders
, cost_type_id int NOT NULL REFERENCES points
, cost int NOT NULL DEFAULT 0 -- in cents
);
For a single row:
UPDATE orders o
SET total_points = COALESCE((
SELECT sum(oc.cost * p.amount) AS order_cost
FROM order_cost oc
JOIN points p ON oc.cost_type_id = p.id
WHERE oc.order_id = o.order_id
), 0)
WHERE o.order_id = $<order_id>; -- your order_id here ...
Never use the lossy type real for currency data. Use exact types like money, numeric or just integer, where the integer stores the amount in cents.
More advice in this closely related example:
How to implement a many-to-many relationship in PostgreSQL?
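As a rough illustration of the normalized layout above, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL. The sample rows, the integer amounts, and the helper name recalc_total are assumptions made up for the example; the UPDATE mirrors the one in the answer, written without a table alias so it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (
    id     INTEGER PRIMARY KEY,
    abbrev TEXT NOT NULL UNIQUE,
    amount INTEGER NOT NULL            -- points per unit of work (sample values)
);
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    total_points INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE order_cost (
    order_cost_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES orders,
    cost_type_id  INTEGER NOT NULL REFERENCES points,
    cost          INTEGER NOT NULL DEFAULT 0
);
INSERT INTO points (id, abbrev, amount) VALUES (1, 'ud', 2), (2, 'sw', 3), (3, 'prrv', 5);
INSERT INTO orders (order_id) VALUES (1), (2);
INSERT INTO order_cost (order_id, cost_type_id, cost) VALUES (1, 1, 4), (1, 2, 1), (1, 3, 2);
""")


def recalc_total(conn, order_id):
    """Recompute total_points for one order (hypothetical helper)."""
    conn.execute(
        """
        UPDATE orders
        SET total_points = COALESCE((
            SELECT SUM(oc.cost * p.amount)
            FROM order_cost oc
            JOIN points p ON oc.cost_type_id = p.id
            WHERE oc.order_id = orders.order_id
        ), 0)
        WHERE order_id = ?
        """,
        (order_id,),
    )


recalc_total(conn, 1)  # order with three cost rows
recalc_total(conn, 2)  # order without cost rows falls back to 0 via COALESCE
totals = dict(conn.execute("SELECT order_id, total_points FROM orders"))
print(totals)
```

Adding a sixteenth type of work then only needs a new row in points and rows in order_cost; the UPDATE itself stays unchanged.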

Related

How can I ensure that a join table references two tables with composite FKs, where one of the two columns is shared by both tables?

I have 3 tables: employee, event, and, because those two are in an N-N relationship, a third table, employee_event.
The trick is that an employee and an event can only be linked when they belong to the same group.
employee
+----+-------+
| id | group |
+----+-------+
| 1  | A     |
| 2  | B     |
+----+-------+
event
+----+-------+
| id | group |
+----+-------+
| 43 | A     |
| 44 | B     |
+----+-------+
employee_event
+-------------+----------+
| employee_id | event_id |
+-------------+----------+
| 1           | 43       |
| 2           | 44       |
+-------------+----------+
So the combination employee_id=1, event_id=44 should not be possible, because an employee from group A cannot attend an event from group B. How can I enforce this in my DB?
My first idea is to add the column employee_event.group so that I can make my two FK (composite) with employee_id + group and event_id + group respectively to the table employee and event. But is there a way to avoid adding a column in the join table for the only purpose of FKs?
Thx!
You may create a function and use it as a check constraint on table employee_event.
create or replace function groups_match (employee_id integer, event_id integer)
returns boolean language sql as
$$
select
(select "group" from employee where id = employee_id) =
(select "group" from event where id = event_id);
$$;
and then add a check constraint on table employee_event.
ALTER TABLE employee_event
ADD CONSTRAINT groups_match_check
CHECK (groups_match(employee_id, event_id));
Still, bear in mind that the check only runs when a row in employee_event is inserted or updated. Rows that used to be valid may become invalid, yet remain in place, if the group of an employee or an event is changed later.
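For comparison, the composite-FK idea from the question can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module rather than PostgreSQL; the column name grp (chosen to avoid the reserved word group) and the helper try_link are assumptions for the example. Unlike the check constraint, the foreign keys also reject a later change of an employee's group while a link still refers to the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE employee (
    id  INTEGER PRIMARY KEY,
    grp TEXT NOT NULL,
    UNIQUE (id, grp)                 -- target of the composite FK
);
CREATE TABLE event (
    id  INTEGER PRIMARY KEY,
    grp TEXT NOT NULL,
    UNIQUE (id, grp)
);
CREATE TABLE employee_event (
    employee_id INTEGER NOT NULL,
    event_id    INTEGER NOT NULL,
    grp         TEXT NOT NULL,       -- the extra column shared by both FKs
    PRIMARY KEY (employee_id, event_id),
    FOREIGN KEY (employee_id, grp) REFERENCES employee (id, grp),
    FOREIGN KEY (event_id, grp)    REFERENCES event (id, grp)
);
INSERT INTO employee VALUES (1, 'A'), (2, 'B');
INSERT INTO event VALUES (43, 'A'), (44, 'B');
INSERT INTO employee_event VALUES (1, 43, 'A'), (2, 44, 'B');
""")


def try_link(employee_id, event_id, grp):
    """Return True if the link row is accepted, False if an FK rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee_event VALUES (?, ?, ?)",
                (employee_id, event_id, grp),
            )
        return True
    except sqlite3.IntegrityError:
        return False


cross_a = try_link(1, 44, "A")  # event 44 is not in group A
cross_b = try_link(1, 44, "B")  # employee 1 is not in group B

# Moving employee 1 to another group while a link still exists is rejected too.
try:
    with conn:
        conn.execute("UPDATE employee SET grp = 'B' WHERE id = 1")
    regroup_blocked = False
except sqlite3.IntegrityError:
    regroup_blocked = True

print(cross_a, cross_b, regroup_blocked)
```

The price is the redundant grp column in the join table, which is exactly what the question hoped to avoid.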

Fetch records with distinct value of one column while replacing another col's value when multiple records

I have 2 tables that I need to join on rid, returning one row per rid. When a user has rows with different code values, the code should be replaced with 'B'. The example data set below explains it better.
CREATE TABLE usr (rid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(12) NOT NULL,
email VARCHAR(20) NOT NULL);
CREATE TABLE usr_loc
(rid INT NOT NULL,
code CHAR NOT NULL,
loc_id INT NOT NULL,
PRIMARY KEY (rid, code, loc_id));
INSERT INTO usr VALUES
(1,'John','john#product'),
(2,'Linda','linda#product'),
(3,'Greg','greg#product'),
(4,'Kate','kate#product'),
(5,'Johny','johny#product'),
(6,'Mary','mary#test');
INSERT INTO usr_loc VALUES
(1,'A',4532),
(1,'I',4538),
(1,'I',4545),
(2,'I',3123),
(3,'A',4512),
(3,'A',4527),
(4,'I',4567),
(4,'A',4565),
(5,'I',4512),
(6,'I',4567),
(6,'I',4569);
Required Result Set
+-----+-------+------+-----------------+
| rid | name | Code | email |
+-----+-------+------+-----------------+
| 1 | John | B | 'john#product' |
| 2 | Linda | I | 'linda#product' |
| 3 | Greg | A | 'greg#product' |
| 4 | Kate | B | 'kate#product' |
| 5 | Johny | I | 'johny#product' |
| 6 | Mary | I | 'mary#test' |
+-----+-------+------+-----------------+
I have tried some queries that join and some that count, but I can't find one that satisfies the whole scenario.
The query I came up with is
SELECT DISTINCT a.rid AS rid, a.name, a.email, 'B' AS code
FROM usr a
JOIN usr_loc b ON a.rid=b.rid
WHERE a.rid IN (SELECT rid FROM usr_loc GROUP BY rid HAVING COUNT(*) > 1);
You need to group by the users and count how many distinct codes each one has in usr_loc. If there is more than one, replace the code with B. See below:
select
rid,
name,
case when cnt > 1 then 'B' else min_code end as code,
email
from (
select u.rid, u.name, u.email, min(l.code) as min_code, count(distinct l.code) as cnt
from usr u
join usr_loc l on l.rid = u.rid
group by u.rid, u.name, u.email
) x;
Seems to me that you are using MySQL, rather than IBM DB2. Is that so?
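The grouping approach can be checked against the sample data with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the composite primary key on usr_loc is an assumption. Note that the sketch counts distinct codes rather than all rows, since that is what keeps Greg at A and Mary at I in the required result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usr (rid INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL);
CREATE TABLE usr_loc (
    rid INTEGER NOT NULL, code TEXT NOT NULL, loc_id INTEGER NOT NULL,
    PRIMARY KEY (rid, code, loc_id)   -- assumed key; rid alone repeats
);
INSERT INTO usr VALUES
    (1,'John','john#product'), (2,'Linda','linda#product'), (3,'Greg','greg#product'),
    (4,'Kate','kate#product'), (5,'Johny','johny#product'), (6,'Mary','mary#test');
INSERT INTO usr_loc VALUES
    (1,'A',4532), (1,'I',4538), (1,'I',4545), (2,'I',3123),
    (3,'A',4512), (3,'A',4527), (4,'I',4567), (4,'A',4565),
    (5,'I',4512), (6,'I',4567), (6,'I',4569);
""")

rows = conn.execute("""
    SELECT u.rid, u.name,
           -- 'B' only when the user has more than one distinct code
           CASE WHEN COUNT(DISTINCT l.code) > 1 THEN 'B' ELSE MIN(l.code) END AS code,
           u.email
    FROM usr u
    JOIN usr_loc l ON l.rid = u.rid
    GROUP BY u.rid, u.name, u.email
    ORDER BY u.rid
""").fetchall()

for row in rows:
    print(row)
```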

Postgres - updates with join gives wrong results

I'm having a hard time understanding what I'm doing wrong.
The query below writes the same value to every row instead of updating each row with its own result.
My DATA
I'm trying to update a table of stats over a set of businesses:
CREATE TABLE business_stats ( id SERIAL,
pk integer not null,
b_total integer,
PRIMARY KEY(pk)
);
The details of each business are stored here:
CREATE TABLE business_details (id SERIAL,
category CHARACTER VARYING,
feature_a CHARACTER VARYING,
feature_b CHARACTER VARYING,
feature_c CHARACTER VARYING
);
And here is a table that associates the pk with the category:
CREATE TABLE datasets (id SERIAL,
pk integer not null,
category CHARACTER VARYING,
PRIMARY KEY(pk)
);
WHAT I DID (wrong)
UPDATE business_stats
SET b_total = agg.total
FROM business_stats b,
( SELECT d.pk, count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
GROUP BY d.pk
) agg
WHERE b.pk = agg.pk;
The result of this query is
| id | pk | b_total |
+----+----+-----------+
| 1 | 14 | 273611 |
| 2 | 15 | 273611 |
| 3 | 16 | 273611 |
| 4 | 17 | 273611 |
but if I run just the SELECT the results of each pk are completely different
| pk | agg.total |
+----+-------------+
| 14 | 273611 |
| 15 | 407802 |
| 16 | 179996 |
| 17 | 815580 |
THE QUESTION
why is this happening?
why is the WHERE clause not working?
Before writing this question I've used as reference these posts: a, b, c
Do the following (I always recommend against joins in Updates)
UPDATE business_stats bs
SET b_total =
( SELECT count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk
)
/*optional*/
where exists (SELECT *
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk);
The issue is your FROM clause. The repeated reference to business_stats means you aren't restricting the join like you expect to. You're joining agg against the second unrelated mention of business_stats rather than the row you want to update.
Something like this is what you are after (warning not tested):
UPDATE business_stats AS b
SET b_total = agg.total
FROM
(...) agg
WHERE b.pk = agg.pk;
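To see the per-row behaviour on a small scale, here is a minimal sketch of the correlated-subquery form of the update. It runs on Python's built-in sqlite3 module rather than PostgreSQL, and the category names and row counts are invented sample data. Each row of business_stats gets the count for its own pk because the subquery refers to the row being updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business_stats (id INTEGER PRIMARY KEY, pk INTEGER NOT NULL UNIQUE, b_total INTEGER);
CREATE TABLE business_details (id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE datasets (id INTEGER PRIMARY KEY, pk INTEGER NOT NULL UNIQUE, category TEXT);
INSERT INTO business_stats (pk) VALUES (14), (15), (16);
INSERT INTO datasets (pk, category) VALUES (14, 'cafe'), (15, 'bar'), (16, 'gym');
INSERT INTO business_details (category) VALUES ('cafe'), ('cafe'), ('cafe'), ('bar');
""")

# The subquery is correlated on business_stats.pk, i.e. on the row being updated,
# so there is no second, unrelated copy of business_stats in the statement.
conn.execute("""
    UPDATE business_stats
    SET b_total = (
        SELECT COUNT(bd.id)
        FROM business_details AS bd
        JOIN datasets AS d ON bd.category = d.category
        WHERE d.pk = business_stats.pk
    )
""")

totals = dict(conn.execute("SELECT pk, b_total FROM business_stats"))
print(totals)
```

Without the optional EXISTS filter, a pk with no matching details (16 in this sample) is set to 0 rather than left untouched.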

Insert output of query into another table in postgres?

I'm working in Postgres 9.4. I have two tables:
Table "public.parcel"
Column | Type | Modifiers
ogc_fid | integer | not null default
wkb_geometry | geometry(Polygon,4326) |
county | character varying |
parcel_area | double precision |
Table "public.county"
Column | Type | Modifiers
--------+------------------------+-----------
name | character(1) |
chull | geometry(Polygon,4326) |
area | double precision |
I would like to find all the unique values of county in parcel, and the total areas of the attached parcels, and then insert them into the county table as name and area respectively.
I know how to do the first half of this:
SELECT county,
SUM(parcel_area) AS area
FROM inspire_parcel
GROUP BY county;
But what I don't know is how to insert these values into county. Can anyone advise?
I think it's something like:
UPDATE county SET name, area = (SELECT county, SUM(parcel_area) AS area
FROM inspire_parcel GROUP BY county)
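You use INSERT INTO ... SELECT. List the target columns explicitly, because county also has a chull column between name and area. So, something like this:
INSERT INTO county (name, area)
SELECT county, SUM(parcel_area) AS area
FROM inspire_parcel GROUP BY county;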
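A minimal sketch of the same pattern, using Python's built-in sqlite3 module in place of PostGIS-enabled Postgres. The source table is called parcel here, the geometry column is a plain placeholder, and the sample areas are invented. Columns that are not listed in the INSERT (chull) are simply left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel (ogc_fid INTEGER PRIMARY KEY, county TEXT, parcel_area REAL);
CREATE TABLE county (name TEXT, chull BLOB, area REAL);  -- chull stands in for the geometry
INSERT INTO parcel (county, parcel_area) VALUES ('A', 1.5), ('A', 2.5), ('B', 4.0);

-- One row per distinct county, with the summed parcel area.
INSERT INTO county (name, area)
SELECT county, SUM(parcel_area)
FROM parcel
GROUP BY county;
""")

rows = conn.execute("SELECT name, chull, area FROM county ORDER BY name").fetchall()
print(rows)
```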

Postgres chooses a bad query plan

The following query finishes in 1.5s (which is ok, the table contains about 500M rows):
explain (analyze, buffers)
select sales.*
from sales
join product on (product.id = sales.productid)
join date on (date.id = sales.dateid)
where product.id = 24
order by date.timestamp
limit 200;
Query plan: http://explain.depesz.com/s/8Ix
Searching for product.name instead increases the runtime to a totally unacceptable 200s:
explain (analyze, buffers)
select sales.*
from sales
join product on (product.id = sales.productid)
join date on (date.id = sales.dateid)
where product.name = 'new00000006'
order by date.timestamp
limit 200;
Query plan: http://explain.depesz.com/s/0RfQ
Note that the product named 'new00000006' has id 24 (same id as in the fast query above). Proof:
select name from product where id = 24;
name
-------------
new00000006
Why does that query take more than 100 times longer than the first query?
Another interesting modification of this query: instead of product.id = 24 (as in the first query), I use product.id = (select 24). This also takes 200s to run (it actually results in the same bad query plan as when searching for product.name):
explain (analyze, buffers)
select sales.*
from sales
join product on (product.id = sales.productid)
join date on (date.id = sales.dateid)
where product.id = (select 24)
order by date.timestamp
limit 200;
Query plan: http://explain.depesz.com/s/K3VO
The statistics table shows that product id 24 is "rare":
select most_common_vals from pg_stats where tablename='sales' and attname='productid';
{19,2,7,39,40,14,33,18,8,37,16,48,6,23,49,29,46,41,20,53,47,26,38,1,32,42,56,57,10,15,27,50,30,45,51,58,17,36,4,25,44,43,5,22,11,35,52,9,21,12,24,31,28,54,34,3,55,13}
select most_common_freqs from pg_stats where tablename='sales' and attname='productid';
{0.020225,0.020119,0.0201133,0.0201087,0.0201,0.0200903,0.0200843,0.020069,0.0200557,0.0200477,0.0200427,0.0200303,0.0200197,0.020019,0.020012,0.0200107,0.0200067,0.020006,0.019995,0.0199947,0.0199917,0.019986,0.019986,0.0199777,0.0199747,0.0199713,0.0199693,0.019969,0.019967,0.019962,0.0199607,0.0199603,0.01996,0.0199567,0.0199567,0.0199533,0.019952,0.019951,0.0199467,0.019944,0.019944,0.01993,0.0199297,0.0199257,0.0199223,0.0199143,0.01989,0.0198887,0.019883,0.0198747,6.7e-005,6e-005,5.9e-005,5.6e-005,5.46667e-005,5.43333e-005,5.13333e-005,4.96667e-005}
Product id 24 has a frequency of 6.7e-005 (it's a "new product"), while older products have frequencies of about 0.02.
Statistics show that the first query plan (the one that runs in 1.5s) makes perfect sense. It uses the sales_productid_index to quickly find the sales of this product. Why isn't the same query plan used in the other two cases? It seems like statistics are ignored.
Table definitions (slightly obfuscated / renamed):
Table "public.sales"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+---------+-----------+-------------+---------------+--------------
id | uuid | not null | plain | |
dateid | integer | | plain | 10000 |
productid | integer | | plain | 10000 |
a | text | | extended | 10000 |
b | integer | | plain | 10000 |
x1 | boolean | | plain | |
x2 | boolean | | plain | |
x3 | boolean | | plain | |
x4 | boolean | | plain | |
x5 | boolean | | plain | |
Indexes:
"sales_pkey" PRIMARY KEY, btree (id)
"sales_a_index" btree (a)
"sales_b_index" btree (b)
"sales_dateid_index" btree (dateid)
"sales_productid_index" btree (productid)
Foreign-key constraints:
"sales_dateid_fkey" FOREIGN KEY (dateid) REFERENCES date(id)
"sales_productid_fkey" FOREIGN KEY (productid) REFERENCES product(id)
Has OIDs: no
Table "public.product"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+----------------------------------------------------------+-------------+---------------+--------------
id | integer | not null default nextval('product_id_seq'::regclass) | plain | |
name | text | | extended | |
Indexes:
"product_pkey" PRIMARY KEY, btree (id)
"product_name_index" UNIQUE, btree (name)
Referenced by:
TABLE "sales" CONSTRAINT "sales_productid_fkey" FOREIGN KEY (productid) REFERENCES product(id)
TABLE "salesaggr" CONSTRAINT "salesaggr_productid_fkey" FOREIGN KEY (productid) REFERENCES product(id)
Has OIDs: no
Version: PostgreSQL 9.3.1, compiled by Visual C++ build 1600, 64-bit
Config: default configuration except for maintenance_work_mem, which has been increased to 1GB.
Operating system: Microsoft Windows [Version 6.2.9200]
Amount and size of RAM installed: 32GB
Storage: a single 1TB SSD
In your first query, the planner takes a shortcut and uses the sales_productid_index available on sales.productid, since it is told that sales.productid = product.id. The only thing the join with product actually does in this query is ensure that a row with id = 24 exists in the table.
In the second query, this shortcut isn't available. The planner could choose to go to product, get the id and then scan sales using the index on productid, probably getting similar performance, but because it doesn't know that name='new00000006' will lead to id=24, it can't guess how many rows in sales this path would return*. For all it knows, it would be index scanning a significant part of the 300M-row sales table.
*Note that in the first query, it estimates that productid=24 will return 42393 rows, while the actual count is 34560 rows. That is quite close considering the table has 300M rows.