Postgres - Trigger with matched key

I have several tables. One table, cexp, has attributes cid and total: rows are grouped by cid, and total is the sum of quantity * price for that cid (matched on cid).
The cexp table was populated with the results of the following code:
SELECT c.cid, sum(ol.quantity*b.price) as total
FROM customers c join orders o on c.cid=o.cid
join orderlist ol on o.ordernum=ol.ordernum
join books b on b.isbn=ol.isbn
GROUP BY c.cid
My task is to create a trigger that, when rows are inserted into orders and orderlist, finds the matching cid in cexp and increments the existing total by the product of the new quantity (from orderlist) and the price (from books). If there is no match, it should insert a row into cexp.
Tables are as follows:
Customers-cid,name pk-cid
Books - isbn,title,price pk-isbn
Orders - ordernum,cid pk-ordernum
Orderlist - ordernum,isbn, quantity - pk-(ordernum,isbn)
cexp - cid,total - pk-cid
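In DDL form, the schema looks roughly like this (a sketch; the column types are my assumptions):
CREATE TABLE customers (cid int PRIMARY KEY, name text);
CREATE TABLE books (isbn text PRIMARY KEY, title text, price numeric);
CREATE TABLE orders (ordernum int PRIMARY KEY, cid int REFERENCES customers);
CREATE TABLE orderlist (ordernum int REFERENCES orders,
                        isbn text REFERENCES books,
                        quantity int,
                        PRIMARY KEY (ordernum, isbn));
CREATE TABLE cexp (cid int PRIMARY KEY, total numeric);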
I am getting syntax errors. Can anyone correct this code?
CREATE OR REPLACE FUNCTION cexpupd()
RETURNS trigger as
$cexpupd$
BEGIN
UPDATE cexp
SET new.total=total+Select (b.price*new.quantity) FROM customers c
join orders o on c.cid=o.cid
join orderlist ol on o.ordernum=ol.ordernum
join books b on b.isbn=ol.isbn
where b.isbn=new.isbn;
--INSERT CODE WHEN ABOVE LINE DOES NOT OCCUR -INSERTS NEW ROW INTO CEXP
END;
$cexpupd$
LANGUAGE plpgsql

I would say that your UPDATE statement is an unsalvageable bird's nest. Fortunately, there's an easier way to achieve the same result.
Keep in mind that a trigger function is procedural code, so there's no need to concurrently load the gun, pull the trigger, scream, and shoot yourself in the foot in a single statement. You have access to all the procedural goodies like local variables and flow control statements.
The following should give you a good headstart on what you want:
CREATE OR REPLACE FUNCTION cexpupd()
RETURNS trigger AS
$cexpupd$
DECLARE
    bookprice books.price%TYPE;
    ext_total cexp.total%TYPE;
    custid    orders.cid%TYPE;
BEGIN
    -- Find the customer the new orderlist row belongs to
    SELECT cid
    INTO custid
    FROM orders
    WHERE orders.ordernum = NEW.ordernum;

    -- Look up the unit price of the book
    SELECT price
    INTO bookprice
    FROM books
    WHERE books.isbn = NEW.isbn;

    ext_total := bookprice * NEW.quantity;

    -- Increment the customer's existing total
    UPDATE cexp
    SET total = total + ext_total
    WHERE cid = custid;

    IF NOT FOUND THEN
        -- No existing row for this customer: insert one
        INSERT INTO cexp
            (cid, total)
        VALUES
            (custid, ext_total);
    END IF;

    RETURN NEW;
END;
$cexpupd$
LANGUAGE plpgsql;
Note the RETURN NEW; statement near the end. This line is crucial, as it is how a PostgreSQL trigger function tells the database to go ahead and finish executing the statement that fired the trigger.
If you need any clarification, please don't hesitate to ask. Please note I have not tried executing this function, but I did create the necessary tables to compile it successfully.
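One more piece is needed: the function has to be attached to a table with a CREATE TRIGGER statement. A minimal sketch, assuming the trigger belongs on orderlist (that is where NEW.isbn and NEW.quantity come from; the trigger name is arbitrary):
-- Fire the function for every row inserted into orderlist
CREATE TRIGGER cexpupd_trg
AFTER INSERT ON orderlist
FOR EACH ROW
EXECUTE PROCEDURE cexpupd();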

Related

Optimizing an insert/update loop in a stored procedure

I have two tables wholesaler_catalog and wholesaler_catalog_prices. The latter has a foreign key reference to the former.
wholesaler_catalog_prices has a column called cost_type which can be either RETAIL or DISCOUNT.
Consider row Foo in wholesaler_catalog. Foo has two entries in wholesaler_catalog_prices - one for RETAIL and one for DISCOUNT. I want to split up Foo into Foo1 and Foo2, such that Foo1 points to RETAIL and Foo2 points to DISCOUNT. (The reasons for doing this are complex and I won't go into them - it's part of a larger migration.)
I have made a stored procedure that looks like this:
do
$$
declare
f record;
new_id int;
begin
for f in select catalog_id from
(select catalog_id, cost_type, row_number() over (partition by catalog_id) from wholesaler_catalog_prices
group by catalog_id, cost_type
order by catalog_id) as x
where row_number > 1
loop
insert into wholesaler_catalog
(item_number, name, catalog_log_id)
select item_number, name, catalog_log_id from wholesaler_catalog
where id = f.catalog_id
returning id into new_id;
-- RAISE NOTICE '% copied to %', f.catalog_id, new_id;
update wholesaler_catalog_prices set catalog_id = new_id where catalog_id = f.catalog_id and cost_type = 'RETAIL';
end loop;
end;
$$
The problem is that there are about 100k such records and it takes a very long time to run (I cancelled the run after 30 minutes). Is there any way I can optimize the procedure to run faster?
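For what it's worth, a set-based approach usually avoids the per-row loop entirely. A rough sketch of that idea, untested, assuming wholesaler_catalog.id is backed by a sequence named wholesaler_catalog_id_seq (adjust to your real sequence name):
-- Build an old-id -> new-id mapping by pre-allocating ids from the sequence
create temporary table id_map as
select old_id, nextval('wholesaler_catalog_id_seq') as new_id  -- assumed sequence name
from (
    select catalog_id as old_id
    from wholesaler_catalog_prices
    group by catalog_id
    having count(distinct cost_type) > 1
) t;

-- Copy each catalog row once, using its pre-allocated id
insert into wholesaler_catalog (id, item_number, name, catalog_log_id)
select m.new_id, c.item_number, c.name, c.catalog_log_id
from id_map m
join wholesaler_catalog c on c.id = m.old_id;

-- Repoint the RETAIL price rows at the copies in a single update
update wholesaler_catalog_prices p
set catalog_id = m.new_id
from id_map m
where p.catalog_id = m.old_id
  and p.cost_type = 'RETAIL';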

String replacement in PostgreSQL producing an array of additional strings

Suppose you have two tables with substitutions which MUST be kept as they are and another table containing a body of names. How could I get all the possible substitutions?
Substitution Table
--------------------------------------
word subs_list
MOUNTAIN MOUNTAIN, MOUNT, MT, MTN
HOUSE HAUS, HOUSE
VIEW VU, VIEW
Synonyms table
-------------------------------------------------
word syn_list
EDUCATION SCHOOL, UNIVERSITY, COLLEGE, TRAINING
FOOD STORE, FOOD, CAFE
STORE FOOD, STORE, MARKET
REFRIGERATION FOODLOCKER, FREEZE, FRIDGE
names table
------------------------------------------------
MOUNT VU FOOD USA
MOUNTAIN VU STORE CA
Note: I know that it would be desirable to have just one substitution table, but both substitution tables must remain because they serve additional purposes beyond the one explained above; those tables are already in use. In addition, the list of replacements in both tables is just a varchar holding a comma-separated string.
Considering the above, the problem is to generate the possible names derived by substitution. For instance, the name MOUNT VU FOOD USA should be decomposed into MOUNTAIN VIEW FOOD USA and MOUNTAIN VIEW STORE USA, and the same would apply to the second name.
I have been able to get the replacements, but in the wrong order and all together in one string. Is there a way to get an array as output with the different names generated after replacement? So far I have created this function for the replacement:
create or replace function replace_companies_array(i_sentence IN VARCHAR) returns VARCHAR[] AS $p_replaced$
DECLARE
p_replaced VARCHAR[];
subs RECORD;
flag boolean:= True;
cur_s CURSOR(i_sentence VARCHAR)
FOR SELECT w.input, coalesce(x.word, w.input) as word, count(*) OVER (PARTITION BY w.input) as counter
FROM regexp_split_to_table(trim(i_sentence), '\s') as w(input)
LEFT JOIN (
select s.word, trim(s1.token) as token
from subs01 s
cross join unnest(string_to_array(s.subs_list, ',')) s1(token)
union
select sy.word, trim(s2.token) as token
from syns01 sy
cross join unnest(string_to_array(sy.syn_list, ',')) s2(token)
) as x on lower(trim(w.input)) = lower(x.token)
order by counter;
BEGIN
OPEN cur_s(i_sentence);
LOOP
--fetch row into the substitutions
FETCH cur_s INTO subs;
--Exit when no more rows to fetch
EXIT WHEN NOT FOUND;
SELECT REGEXP_REPLACE(i_sentence,'(^|[^a-z0-9])' || subs.input || '($|[^a-z0-9])','\1' || UPPER(subs.word) || '\2','g')
INTO i_sentence;
END LOOP;
p_replaced:=array_append(p_replaced, i_sentence);
RETURN p_replaced;
END;
$p_replaced$ LANGUAGE plpgsql;
Thank you so much for your contributions.
I didn't manage to get the final result at first, but I'm quite close to it!
From the sentence MOUNT VU FOOD USA, I obtain {"MOUNTAIN VIEW MARKET USA","MOUNTAIN VIEW STORE USA","MOUNTAIN VIEW CAFE USA","MOUNTAIN VIEW FOOD USA"}
Here is the full script to recreate the synonyms & substitution tables:
DROP TABLE IF EXISTS subs01;
DROP TABLE IF EXISTS syns01;
CREATE TABLE subs01 (word VARCHAR(20), subs_list VARCHAR(200));
CREATE TABLE syns01 (word VARCHAR(20), syn_list VARCHAR(200));
INSERT INTO subs01 (word, subs_list) VALUES ('MOUNTAIN', 'MOUNTAIN, MOUNT, MT, MTN'),('HOUSE', 'HAUS, HOUSE'),('VIEW', 'VU, VIEW');
INSERT INTO syns01 (word, syn_list) VALUES ('EDUCATION', 'SCHOOL, UNIVERSITY, COLLEGE, TRAINING'),('FOOD', 'STORE, FOOD, CAFE'),('STORE', 'FOOD, STORE, MARKET'),('REFRIGERATION', 'FOODLOCKER, FREEZE, FRIDGE');
I decided to split the job into 2 phases:
Substitute the words:
CREATE OR REPLACE function substitute_words (i_sentence IN VARCHAR) returns VARCHAR AS $p_substituted$
DECLARE
--p_substituted VARCHAR;
subs_cursor CURSOR FOR select su.word, trim(s2.token) as token from subs01 su cross join unnest(string_to_array(su.subs_list, ',')) s2(token);
subs_record record;
BEGIN
OPEN subs_cursor;
LOOP
FETCH subs_cursor INTO subs_record;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'INFO : TOKEN (%) ',subs_record.token ;
IF i_sentence LIKE '%'|| subs_record.token || '%' THEN
RAISE NOTICE '-- FOUND : TOKEN (%) ',subs_record.token ;
SELECT replace (i_sentence, subs_record.token, subs_record.word) INTO i_sentence;
END IF;
END LOOP;
CLOSE subs_cursor;
RETURN i_sentence;
END
$p_substituted$ LANGUAGE plpgsql;
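Given the sample data above, this first phase normalizes the substitutable words, e.g.:
SELECT substitute_words('MOUNT VU FOOD USA');
-- MOUNTAIN VIEW FOOD USA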
Replace known words by their synonyms:
CREATE OR REPLACE function synonymize_sentence (i_sentence IN VARCHAR) returns TABLE (sentence_result VARCHAR) AS $p_syn$
DECLARE
syn_cursor CURSOR FOR select su.word, trim(s2.token) as token from syns01 su cross join unnest(string_to_array(su.syn_list, ',')) s2(token);
syn_record record;
BEGIN
CREATE TEMPORARY TABLE record_syn (result VARCHAR(200)) ON COMMIT DROP;
INSERT INTO record_syn (result) SELECT i_sentence;
OPEN syn_cursor;
LOOP
FETCH syn_cursor INTO syn_record;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'INFO : WORD (%) ',syn_record.word ;
INSERT INTO record_syn (result) SELECT replace (result, syn_record.word, syn_record.token) FROM record_syn where result LIKE '%'|| syn_record.word || '%';
END LOOP;
CLOSE syn_cursor;
RETURN QUERY SELECT distinct result FROM record_syn;
END;
$p_syn$ LANGUAGE plpgsql;
Then, to generate the result array, I perform this statement:
SELECT ARRAY(SELECT synonymize_sentence (substitute_words ('MOUNT VU FOOD USA')));
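With the sample data above, this yields the same set shown earlier: {"MOUNTAIN VIEW MARKET USA","MOUNTAIN VIEW STORE USA","MOUNTAIN VIEW CAFE USA","MOUNTAIN VIEW FOOD USA"}.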

stored procedure, accessing data that has just been inserted

I am trying to insert values into a table based upon a person being inserted into another table. I have a trigger for this: when someone is assigned to employee, they are automatically assigned to employeepark with the first spot that is available. I cannot figure out how to access the id that is being inserted into the employee table. I would appreciate any tips or ideas, thank you!
This is the error I am receiving.
ERROR: record "new" is not assigned yet
create or replace function new_employeeAssign() returns trigger as $new_employeeAssign$
declare
open_spotID int := (select parkingspot.spotid
from employeepark e full outer join parkingspot on e.spotid = parkingspot.spotid
where e.spotid isNull limit 1);
begin
insert into employeepark(employeeid, spotid)
values(new.employeeid ,open_spotID);
End;
$new_employeeAssign$ language plpgsql;
create trigger new_employeeAssign after insert on employee
execute procedure new_employeeAssign();
insert into people(peopleid, fname, lname)
values(686, 'random', 'person');
insert into employee(employeeid)
values(686);
Patrick figured this out for me. Now I am running into this problem:
I want to select the first value out of all of these ranges that is null, but I keep getting back one; the query is just bypassing the ranges and going straight to the isNull.
(select parkingspot.spotid
from employeepark e full outer join parkingspot on e.spotid = parkingspot.spotid
where (e.spotid = 301)
or (e.spotid = 1601)
or (e.spotid = 2001)
or (e.spotid = 2011)
or (e.spotid = 2121)
or (e.spotid = 2021)
or (e.spotid = 2771)
or (e.spotid = 2921)
or (e.spotid = 3021)
or (e.spotid = 3823) isNull
limit 1)
Your trigger definition is incorrect. By default, a trigger applies FOR EACH STATEMENT, in which case the NEW parameter does not exist (the trigger does not apply to a row, after all). Instead you should have:
CREATE TRIGGER new_employeeAssign AFTER INSERT ON employee
FOR EACH ROW EXECUTE PROCEDURE new_employeeAssign();
There are also some issues with your trigger function, in particular the query that assigns to the variable open_spotID. This query will always select NULL, because e.spotid IS NULL and you join on e.spotid = parkingspot.spotid. The logic you are looking for is probably that you want to assign a parking spot to a new employee by creating a row in the employeepark table with a spotid that is not already assigned to some other employee. See the code below.
You also have to RETURN NEW from the function.
Other than that, your trigger function could be much optimized like so:
CREATE FUNCTION new_employeeAssign() RETURNS trigger AS $new_employeeAssign$
BEGIN
INSERT INTO employeepark(employeeid, spotid)
SELECT NEW.employeeid, spotid
FROM parkingspot p
LEFT JOIN employeepark e USING (spotid)
WHERE e.employeeid IS NULL
LIMIT 1;
RETURN NEW;
END;
$new_employeeAssign$ LANGUAGE plpgsql;
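As for the follow-up query in the question: IS NULL binds only to the last comparison in that WHERE clause, so the ranges are effectively ignored. A sketch of what it probably intends, reusing the join above (assuming you want the first unassigned spot among those specific ids):
SELECT p.spotid
FROM parkingspot p
LEFT JOIN employeepark e USING (spotid)
WHERE e.employeeid IS NULL
  AND p.spotid IN (301, 1601, 2001, 2011, 2121, 2021, 2771, 2921, 3021, 3823)
ORDER BY p.spotid
LIMIT 1;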

PostgreSQL: How to figure out missing numbers in a column using generate_series()?

SELECT commandid
FROM results
WHERE NOT EXISTS (
SELECT *
FROM generate_series(0,119999)
WHERE generate_series = results.commandid
);
I have a column in results of type int, but various tests failed and were not added to the table. I would like to create a query that returns a list of commandid values that are not found in results. I thought the above query would do what I wanted. However, it does not even work if I use a range that is outside the expected possible range of commandid (like negative numbers).
Given sample data:
create table results ( commandid integer primary key);
insert into results (commandid) select * from generate_series(1,1000);
delete from results where random() < 0.20;
This works:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);
as does this alternative formulation:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
LEFT OUTER JOIN results ON (results.commandid = s.i)
WHERE results.commandid IS NULL;
Both of the above appear to result in identical query plans in my tests, but you should compare with your data on your database using EXPLAIN ANALYZE to see which is best.
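For example, to compare them on your own data:
EXPLAIN ANALYZE
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);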
Explanation
Note that instead of NOT IN I've used NOT EXISTS with a subquery in one formulation, and an ordinary OUTER JOIN in the other. It's much easier for the DB server to optimise these and it avoids the confusing issues that can arise with NULLs in NOT IN.
I initially favoured the OUTER JOIN formulation, but at least in 9.1 with my test data the NOT EXISTS form optimizes to the same plan.
Both will perform better than the NOT IN formulation below when the series is large, as in your case. NOT IN used to require Pg to do a linear search of the IN list for every tuple being tested, but examination of the query plan suggests Pg may be smart enough to hash it now. The NOT EXISTS (transformed into a JOIN by the query planner) and the JOIN work better.
The NOT IN formulation is both confusing in the presence of NULL commandids and can be inefficient:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);
so I'd avoid it. With 1,000,000 rows the other two completed in 1.2 seconds and the NOT IN formulation ran CPU-bound until I got bored and cancelled it.
As I mentioned in the comment, you need to do the reverse of the above query.
SELECT
generate_series
FROM
generate_series(0, 119999)
WHERE
NOT generate_series IN (SELECT commandid FROM results);
At that point, you should find values that do not exist within the commandid column within the selected range.
I am not an experienced SQL guru, but I like finding other ways to solve problems.
Just today I had a similar problem - finding unused numbers in a character column.
I solved my problem using PL/pgSQL and was very interested in how fast my procedure would be.
I used @Craig Ringer's approach to generate a table with a serial column, added one million records, and then deleted every 99th record. The procedure takes about 3 seconds to search for the missing numbers:
-- creating table
create table results (commandid character(7) primary key);
-- populating table with serial numbers formatted as characters
insert into results (commandid) select cast(num_id as character(7)) from generate_series(1,1000000) as num_id;
-- delete some records
delete from results where cast(commandid as integer) % 99 = 0;
create or replace function unused_numbers()
returns setof integer as
$body$
declare
i integer;
r record;
begin
-- looping through the table with a synchronized counter:
i := 1;
for r in
(select distinct cast(commandid as integer) as num_value
from results
order by num_value asc)
loop
if not (i = r.num_value) then
while true loop
return next i;
i = i + 1;
if (i = r.num_value) then
i = i + 1;
exit;
else
continue;
end if;
end loop;
else
i := i + 1;
end if;
end loop;
return;
end;
$body$
language plpgsql volatile
cost 100
rows 1000;
select * from unused_numbers();
Maybe it will be usable for someone.
If you're on AWS Redshift, you might need to stray from the question a bit, since it doesn't support generate_series. You'll end up with something like this:
select
startpoints.id gapstart,
min(endpoints.id) resume
from (
select id+1 id
from yourtable outer_series
where not exists
(select null
from yourtable inner_series
where inner_series.id = outer_series.id + 1
)
order by id
) startpoints,
yourtable endpoints
where
endpoints.id > startpoints.id
group by
startpoints.id;
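To illustrate the output: if yourtable holds ids 1, 2, 5, 6 and 10, this returns (gapstart 3, resume 5) and (gapstart 7, resume 10). Note that a gap after the highest id is not reported, since there is no later row to resume at.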

Massive insertions from one big table to other related tables

Intro:
Currently I have scraped all the data into one PostgreSQL 'Bigtable' table (there are about 1.2M rows). Now I need to split the design into separate tables which all depend on the Bigtable. Some of the tables might have subtables. The model looks pretty much like a snowflake.
Problem:
What would be the best option for inserting the data into those tables? I thought of making the insertions with functions written in SQL or PL/pgSQL. But the problem remains the auto-generated IDs.
Also, if you know of tools that might make solving this problem easier, then post!
//Edit: I have added an example; this is not the real case, just for illustration.
1.2M rows is not too much. The best tool is a SQL script executed from the psql console. If you have a newer version of Pg, you can use inline functions (the DO statement) where necessary. But probably the most useful command is the INSERT INTO ... SELECT statement.
-- file conversion.sql
DROP TABLE IF EXISTS f1 CASCADE;
CREATE TABLE f1(a int, b int);
INSERT INTO f1
SELECT x1, y1
FROM data
WHERE x1 = 10;
...
-- end file
psql mydb -f conversion.sql
If I understand your question, you can use a PL/pgSQL function like this:
CREATE OR REPLACE FUNCTION migration() RETURNS integer AS
$BODY$
DECLARE
currentProductId INTEGER;
currentUserId INTEGER;
currentReg RECORD;
BEGIN
FOR currentReg IN
SELECT * FROM bigtable
LOOP
-- Product
SELECT productid INTO currentProductId
FROM product
WHERE name = currentReg.product_name;
IF currentProductId IS NULL THEN
-- A plain INSERT ... RETURNING INTO avoids the quoting pitfalls
-- of assembling the statement with string concatenation
INSERT INTO product (name) VALUES (currentReg.product_name)
RETURNING productid INTO currentProductId;
END IF;
-- User
SELECT userid INTO currentUserId
FROM "user" -- "user" is a reserved word, so the table name must be quoted
WHERE first_name = currentReg.first_name and last_name = currentReg.last_name;
IF currentUserId IS NULL THEN
INSERT INTO "user" (first_name, last_name)
VALUES (currentReg.first_name, currentReg.last_name)
RETURNING userid INTO currentUserId;
-- Insert into userAdded too with: currentUserId and currentProductId
[...]
END IF;
-- Rest of tables
[...]
END LOOP;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;
select * from migration();
In this case it is assumed that each table runs its own primary key sequence, and I have reduced the number of fields in the tables to simplify.
I hope this has been helpful.
No need to use a function for this (unless I misunderstood your problem)
If your id columns are all defined as serial columns (i.e. they automatically generate their values), then this can be done with simple INSERT statements. This assumes that the target tables are all empty.
INSERT INTO users (firstname, lastname)
SELECT DISTINCT firstname, lastname
FROM bigtable;
INSERT INTO category (name)
SELECT DISTINCT category_name
FROM bigtable;
-- the following assumes a column categoryid in the product table
-- which is not visible from your screenshot
INSERT INTO product (product_name, description, categoryid)
SELECT DISTINCT b.product_name, b.description, c.categoryid
FROM bigtable b
JOIN category c ON c.category_name = b.category_name;
INSERT INTO product_added (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable b
JOIN product p ON p.product_name = b.product_name
JOIN users u ON u.firstname = b.firstname AND u.lastname = b.lastname;
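As a quick sanity check after the load (a sketch; it assumes every bigtable row corresponds to exactly one product_added row):
-- the source and migrated row counts should match
SELECT (SELECT count(*) FROM bigtable)      AS source_rows,
       (SELECT count(*) FROM product_added) AS migrated_rows;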