Transaction Outbox Pattern with AWS Aurora Postgres and aws_lambda.invoke - postgresql

I am working on a project that is made up of a couple of microservices. I am planning to implement the Transaction Outbox Pattern by invoking a Lambda function from a Postgres AFTER INSERT trigger.
I am thinking of something like this:
CREATE OR REPLACE FUNCTION tx_msg_func() RETURNS trigger AS
$$
DECLARE
    newRecord JSON;
BEGIN
    newRecord := row_to_json(NEW.*);
    -- The third positional parameter of aws_lambda.invoke is the region,
    -- so the invocation type must be passed by name.
    PERFORM * FROM aws_lambda.invoke(
        aws_commons.create_lambda_function_arn('my_lambda_function'),
        newRecord,
        invocation_type := 'Event'
    );
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER tx_msg_insert AFTER INSERT ON tx_outbox_table
FOR EACH ROW EXECUTE PROCEDURE tx_msg_func();
Here, the Lambda function will receive the new record as JSON and send an SQS message. After sending the message successfully, it will delete the record from tx_outbox_table.
I am wondering if there is any downside here that I am missing. Do you think this is a production-ready solution? Is there anything I should be aware of?

Well, what about the transaction? It should be as short as possible. An AFTER INSERT trigger executes inside the transaction, so the TCP call to Lambda runs inside the transaction too: the insert cannot commit until the network call returns, and a slow or failing invocation delays or aborts the write. What can be done about it? Here is an idea: move the network call out of the inserting transaction and let it run on its own. A transaction that writes nothing is lightweight, so maybe that is the way to go?
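One common way to keep the inserting transaction short (a general outbox-relay sketch, not something proposed in this thread) is to let the trigger write only the outbox row and have a separate worker poll the table and do the SQS delivery outside the transaction. Assuming hypothetical id and payload columns on tx_outbox_table, the claim-and-delete step could look like this:
-- Run periodically by a relay process, outside the inserting transaction.
BEGIN;
SELECT id, payload
FROM tx_outbox_table
ORDER BY id
LIMIT 100
FOR UPDATE SKIP LOCKED;  -- lets several relay workers poll concurrently
-- ...the worker sends each claimed payload to SQS here...
DELETE FROM tx_outbox_table WHERE id = ANY (ARRAY[1, 2, 3]);  -- the ids actually delivered
COMMIT;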

Related

How do you handle error handling and commits in Postgres

I am using Postgres 13.5 and I am unsure how to combine commit and error handling in a stored procedure or DO block. I know that if I include the EXCEPTION clause in my block, then I cannot include a commit.
I am new to Postgres. It has also been over 15 years since I have written SQL that was working with transactions. When I was working with transactions I was using Oracle and recall using AUTONOMOUS_TRANSACTION to resolve some of these issues. I am just not sure how to do something like that in Postgres.
Here is a very simplified DO block. As I said above, I know that the COMMITs will cause the procedure to throw an exception. But if I remove the EXCEPTION clause, then how will I trap an error if it happens? After reading many things, I still have not found a solution, so I must be missing something.
DO
$$
DECLARE
    v_Start timestamptz;
    v_id integer;
    v_message_type varchar(500);
BEGIN
    select current_timestamp into v_Start;
    select q.id, q.message_type into v_id, v_message_type from message_queue q;
    call Load_data(v_id, v_message_type);
    commit; -- if Load_data completes successfully, I want to commit the data
    insert into log (id, message_type, status, start, "end")
    values (v_id, v_message_type, 'Success', v_Start, current_timestamp);
    commit; -- commit the log insert for success
EXCEPTION
    WHEN others THEN
        insert into log (id, message_type, status, start, "end", error_message)
        values (v_id, v_message_type, 'Failure', v_Start, current_timestamp,
                SQLERRM || ', ' || SQLSTATE);
        commit; -- commit the log insert for failure
END;
$$;
Thanks!
Since this is a pattern that I will have to do tens of times, I want to understand the right way to do this.
Since you cannot use transaction management statements in a subtransaction, you will have to move part of the processing to the client side.
But your sample code doesn't need any transaction management at all! Simply remove all the COMMIT statements, and the procedure will work just as you want it to. Remember that PostgreSQL runs in autocommit mode, so your procedure call from the client will automatically run in its own transaction and commit when it is done.
But perhaps your sample code is simplified, and you would like more complicated processing (looping etc.) in your actual use cases. So let's discuss your options:
One option is to remove the EXCEPTION handler and move only that part to the client side: if the procedure causes an error, roll back and insert a log message. Another, perhaps cleaner, method is to move the whole transaction management to the client side. In that case, you would replace the complete procedure with client code and call load_data directly from client code.
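For the looping case there is also a server-side shape worth knowing. Here is a minimal sketch (the procedure name process_message_queue is hypothetical, and the log table is simplified to a few columns; table and procedure names are taken from the question): each item gets its own inner block with an EXCEPTION handler, and the COMMIT goes after that block, because a transaction cannot be ended inside a block with exception handlers but may be ended right after it.
CREATE PROCEDURE process_message_queue()
LANGUAGE plpgsql
AS $$
DECLARE
    rec record;
BEGIN
    FOR rec IN SELECT q.id, q.message_type FROM message_queue q LOOP
        BEGIN  -- inner block: its EXCEPTION clause makes it a subtransaction
            CALL Load_data(rec.id, rec.message_type);
            INSERT INTO log (id, message_type, status)
            VALUES (rec.id, rec.message_type, 'Success');
        EXCEPTION
            WHEN others THEN
                INSERT INTO log (id, message_type, status, error_message)
                VALUES (rec.id, rec.message_type, 'Failure',
                        SQLERRM || ', ' || SQLSTATE);
        END;
        COMMIT;  -- legal here: outside the exception-handling block
    END LOOP;
END;
$$;
This way each queue item is committed (or logged as failed) independently, which is exactly what the COMMITs in the question were trying to achieve.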

select statement in postgres function called inside a trigger

I'm trying to develop a notification system for the backend of a social media application/website. For now I'm focusing on status updates. What I'm going to do is put a trigger on the Postgres table related to status updates, so that every time a new status update is posted, a notification is sent to my code.
So far I have been able to do that. But an extra feature I'd like to implement is extracting all of the people who follow the user who posted the status update, so that I can also send them a notification that the person they're following has posted a new status update.
Of course it can be implemented by first receiving the notification for a new status update from Postgres, extracting the user id of the person who posted it, and making a query to the database to find out which users follow them.
But I figured it would be more efficient if I don't make that query myself and instead, each time Postgres sends me a notification about a new status update, it also makes a query to find out which users are following the poster and sends that information along with the notification for the new status update.
But I can't figure out how I can make a query in a postgres function that depends on the argument of that function, and then send the result of that query along with the argument as a notification.
this is what I've tried:
create table example (c1 text, c2 text);

create function notif()
returns trigger as
$$
begin
    perform pg_notify('event', row_to_json(new)::text);
    return new;
end;
$$ language plpgsql;
create trigger trig after insert
on example
for each row execute procedure notif();
And then I listen to the event channel from my code and receive the row that was inserted. But I want to do a select statement based on the new row in my notif() function and send the result with the new row to the listening code.
I'd appreciate any clarification
Thanks
Something like this?
CREATE FUNCTION notif()
RETURNS TRIGGER AS $$
DECLARE
    data JSONB;
    result JSONB;
BEGIN
    SELECT json_agg(tmp)  -- requires Postgres 9.3+
    INTO data
    FROM (
        -- your subquery goes here, for example:
        SELECT followers.following_user_id
        FROM followers
        WHERE followers.followed_user_id = NEW.user_id
    ) tmp;
    result := json_build_object('data', data, 'row', row_to_json(NEW));
    PERFORM pg_notify('event', result::TEXT);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
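To try it end to end, here is a usage sketch; the followers table and the user_id column on example are assumptions carried over from the snippet above, not part of the original question:
-- Hypothetical schema matching the names the function assumes.
CREATE TABLE followers (followed_user_id int, following_user_id int);
ALTER TABLE example ADD COLUMN user_id int;
INSERT INTO followers VALUES (42, 7), (42, 8);

LISTEN event;
INSERT INTO example (user_id, c1, c2) VALUES (42, 'hello', 'world');
-- The listening session now receives one notification on channel "event"
-- whose payload carries both the follower ids and the inserted row.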
Also from comments:
But somehow magically using return new the row is returned within the notification.
You misunderstand things. Return and notification are two different things.
First of all, let's deal with RETURN. For AFTER INSERT triggers the return value is totally ignored:
The return value of a row-level trigger fired AFTER or a statement-level trigger fired BEFORE or AFTER is always ignored; it might as well be null.
The return value only matters for BEFORE triggers, in which case you can modify (or even prevent) the row before it is written to the table. See this: https://www.postgresql.org/docs/9.2/plpgsql-trigger.html This has nothing to do with notifications.
So what about notifications? Whatever you receive from a notification is what you pass as second argument to pg_notify. All of that is quite well documented: https://www.postgresql.org/docs/9.0/sql-notify.html

Insert values in a loop and see the progress postgresql [duplicate]

I have a PostgreSQL function which has to INSERT about 1.5 million rows into a table. What I want is to see the table getting populated as each record is inserted. Currently, when I try with, say, about 1000 records, the table gets populated only after the complete function has executed. If I stop the function halfway through, no data gets populated. How can I make the records stay committed even if I stop after a certain number of records have been inserted?
This can be done using dblink. I show an example with one insert being committed; you will need to add your loop logic and commit every iteration. See http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html
CREATE OR REPLACE FUNCTION log_the_dancing(ip_dance_entry text)
RETURNS INT AS
$BODY$
BEGIN
    PERFORM dblink_connect('dblink_trans', 'dbname=sandbox port=5433 user=postgres');
    -- quote_literal() guards against quoting errors and SQL injection
    PERFORM dblink('dblink_trans',
                   'INSERT INTO dance_log(dance_entry) VALUES ('
                   || quote_literal(ip_dance_entry) || ')');
    PERFORM dblink('dblink_trans', 'COMMIT;');
    PERFORM dblink_disconnect('dblink_trans');
    RETURN 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION log_the_dancing(ip_dance_entry text)
OWNER TO postgres;
BEGIN TRANSACTION;
select log_the_dancing('The Flamingo');
select log_the_dancing('Break Dance');
select log_the_dancing('Cha Cha');
ROLLBACK TRANSACTION;
--Show records committed even though we rolled back outer transaction
select *
from dance_log;
What you're asking for is generally called an autonomous transaction.
PostgreSQL does not support autonomous transactions at this time (9.4).
To properly support them it really needs stored procedures, not just the user-defined functions it currently supports. It's also very complicated to implement autonomous transactions in PostgreSQL for a variety of internal reasons related to its session and process model.
For now, use dblink as suggested by Bob.
If you have the flexibility to change from a function to a procedure: from PostgreSQL 11 onwards you can do internal commits if you use procedures instead of functions, invoked with the CALL command, e.g.:
CREATE PROCEDURE transaction_test2()
LANGUAGE plpgsql
AS $$
DECLARE
r RECORD;
BEGIN
FOR r IN SELECT * FROM test2 ORDER BY x LOOP
INSERT INTO test1 (a) VALUES (r.x);
COMMIT;
END LOOP;
END;
$$;
CALL transaction_test2();
More details about transaction management regarding Postgres are available here: https://www.postgresql.org/docs/12/plpgsql-transactions.html
For PostgreSQL 9.5 or newer you can use dynamic background workers provided by the pg_background extension; it creates an autonomous transaction. Please refer to the GitHub page of the extension. This solution is better than dblink. There is a complete guide on autonomous transaction support in PostgreSQL. There is also a third way to start an autonomous transaction in Postgres, but it requires some patching; see Peter Eisentraut's patch proposal for Oracle-style autonomous transactions.
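For illustration, a call through pg_background might look like this (a sketch based on the extension's README, reusing dance_log from the dblink example above; pg_background_launch starts the statement in a background worker session and pg_background_result collects its output):
-- The launched statement commits independently of the calling transaction.
SELECT * FROM pg_background_result(
    pg_background_launch('INSERT INTO dance_log(dance_entry) VALUES (''The Robot'')')
) AS (result text);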

How can I send column values as the payload in a postgresql NOTIFY message?

If an entry in a table satisfies certain conditions, a NOTIFY is sent out. I want the payload to include the ID number and several other columns of information. Is there a postgres method to convert variables (OLD.ColumnID, etc) to strings?
using postgres 9.3
#klin is correct that NOTIFY doesn't support anything other than string literals. However there is a function pg_notify() which takes normal arguments to deal with exactly this situation. It's been around since at least 9.0 and that link is to the official documentation - always worth reading it carefully, there is a wealth of information there.
My guess is that the notify has to be done within a trigger function. Use a dynamic query, e.g.
execute format('notify channel, ''id: %s''', old.id);
The solution was to upgrade Postgres to a version that supported JSON.
Even PostgreSQL 9.3 supports JSON. You could have just used row_to_json(payload)::text
Sorry for the long answer, I just can't walk away without reacting to the other answers too.
The format version fails in many ways. Before a SQL-level EXECUTE, you should prepare the plan, and the "pseudo command" does not fit the syntax of EXECUTE, which is
EXECUTE somepreparedplanname (parameter1, ...)
The %s in format is also bad: that way you can invite SQL injection attacks. When constructing a query with format(), you need to use %L for literals and %I for column/table/function/etc. identifiers, and should almost never use %s.
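For example, a hypothetical snippet inside a PL/pgSQL trigger body; %L renders the payload as a properly quoted string literal:
-- Builds and runs, e.g.:  NOTIFY channel, 'id: 42'
EXECUTE format('NOTIFY channel, %L', 'id: ' || old.id);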
The other solution with the pg_notify function is correct. Try
LISTEN channel;
SELECT pg_notify('channel', 'Id: ' || pg_backend_pid());
in psql command line.
So back to the original question, sdemurjian:
It's not clarified in the question whether you want to use this notification mechanism in some trigger function. So here is an example (maybe not) for you (because I'm a little late; sorry for that too):
CREATE TABLE columns("columnID" oid, "columnData" text);

CREATE FUNCTION column_trigger_func() RETURNS TRIGGER AS
$$
BEGIN
    PERFORM pg_notify('columnchannel', 'Id: ' || OLD."columnID");
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER column_notify BEFORE UPDATE ON columns
FOR EACH ROW EXECUTE PROCEDURE column_trigger_func();

LISTEN columnchannel;
INSERT INTO columns VALUES (1, 'testdata');
BEGIN; UPDATE columns SET "columnData" = 'success'; END;    -- notification delivered on commit
BEGIN; UPDATE columns SET "columnData" = 'fail'; ROLLBACK;  -- rolled back, no notification
Please note that in early Postgres versions (anything before 9.0), the NOTIFY command does not accept any payload and there is no pg_notify function.
In 8.1 the trigger function still works if you define it like:
CREATE FUNCTION column_trigger_func() RETURNS TRIGGER AS
$$ BEGIN NOTIFY columnchannel; RETURN NEW; END; $$ LANGUAGE plpgsql;

plpython get all rows on INSERT TRIGGER

I'm trying to implement something similar to replication with python trigger procedures.
procedure
CREATE OR REPLACE FUNCTION foo.send_payload()
RETURNS trigger AS
$$
# TD is the trigger-data dictionary that PL/Python provides.
import json, zmq
try:
    payload = json.dumps(TD)
    ctx = zmq.Context()
    socket = ctx.socket(zmq.PUSH)
    socket.connect("ipc:///tmp/feeds/0")
    socket.send(payload)  # with plpython3u, use socket.send_string(payload)
    socket.close()
except Exception:
    pass  # swallow delivery errors so the insert itself never fails
$$
LANGUAGE plpythonu VOLATILE;
trigger
-- Trigger names cannot be schema-qualified; the trigger lives with its table.
CREATE TRIGGER my_trigger
    AFTER INSERT
    ON foo.my_table
    FOR EACH ROW
    EXECUTE PROCEDURE foo.send_payload();
This does work, but it's not very efficient.
Rows are inserted in bulk and I want to reuse the socket to send all of them.
However, when I do a statement level trigger I don't have access to the rows.
I was thinking about defining a sequence which would be the last row id processed.
Then use that to grab all the data in the procedure with a SELECT inside the statement level trigger.
The problem is that there doesn't seem to be a way of getting a sequence value without incrementing it.
Any suggestions on how to approach this problem?
Use two triggers: the FOR EACH ROW trigger would stash the rows in some temporary place (maybe SD), and the FOR EACH STATEMENT trigger would take the data from the shared place, send it, and clear it; see the sketch below.
Alternatively (and I think it's a better idea), you can use LISTEN/NOTIFY, as I once described in my blog.
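A minimal sketch of the two-trigger idea (an illustration, not code from the answer). Note that SD is private to each function, so with two separate trigger functions the rows are passed through GD, PL/Python's per-session dictionary shared across functions; the table and socket address are taken from the question.
CREATE OR REPLACE FUNCTION foo.stash_row()
RETURNS trigger AS
$$
# Row-level AFTER INSERT trigger: accumulate each new row in GD.
GD.setdefault("pending_rows", []).append(TD["new"])
$$
LANGUAGE plpythonu VOLATILE;

CREATE OR REPLACE FUNCTION foo.flush_rows()
RETURNS trigger AS
$$
# Statement-level AFTER INSERT trigger: send all rows over one socket.
import json, zmq
rows = GD.pop("pending_rows", [])
if rows:
    ctx = zmq.Context()
    socket = ctx.socket(zmq.PUSH)
    socket.connect("ipc:///tmp/feeds/0")
    socket.send(json.dumps(rows))  # one message per statement, not per row;
                                   # assumes JSON-serializable column values
    socket.close()
$$
LANGUAGE plpythonu VOLATILE;

CREATE TRIGGER my_table_stash AFTER INSERT ON foo.my_table
    FOR EACH ROW EXECUTE PROCEDURE foo.stash_row();

CREATE TRIGGER my_table_flush AFTER INSERT ON foo.my_table
    FOR EACH STATEMENT EXECUTE PROCEDURE foo.flush_rows();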