Aggregate data while inserting into raw table - postgresql

I'm currently building a forum-like application. Users will be able to see recent posts with the total like count. If a post interests a user, they can like it and contribute to the total like count.
The normalized approach would be to have two tables: user_post (containing id, metadata, ...) and liked_post (containing the user id + post id). When posts are queried, the like count would be determined with a COUNT() on the liked_post table, grouped by post id.
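For illustration, a query of that shape might look like this (a sketch; the table and column names follow the description above):

SELECT p.id, COUNT(lp.post_id) AS like_count
FROM user_post p
LEFT JOIN liked_post lp ON lp.post_id = p.id
GROUP BY p.id;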
I'm thinking of another approach that requires no GROUP BY on a potentially huge table: add a like_count column to the user_post table and break normalization. This column would be updated every time a liked_post entry is inserted or deleted. That means: every time a user likes a post, there is an update on the user_post table (increment the like_count column) plus an insert/delete on the liked_post table (via a trigger or code in the app layer).
Would this approach have any disadvantages besides the consistency concerns? It would enable very simple and fast select queries, but I'm not sure whether the additional update would be an issue.
What are your thoughts?
I'm really interested in the performance impact, not in whether you should do this from the start of a project or not.

Your idea is correct and widely used. The problem you will face:
how do you make sure that like_count is valid? Can this number be delayed or approximated somehow?
In general you can do this in the following ways:
update like_count within application code
update like_count with triggers
If you want the values to be exact, you can accumulate those sums with triggers, or do it programmatically, ensuring that the like_count update always happens within the same transaction as the insert into liked_posts.
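In the application-code variant, that just means running both statements in one transaction; a minimal sketch (the $1/$2 placeholders for the user and post ids are illustrative):

BEGIN;
-- record who liked what
INSERT INTO liked_posts (user_id, post_id) VALUES ($1, $2);
-- keep the denormalized counter in sync, in the same transaction
UPDATE user_post SET like_count = like_count + 1 WHERE id = $2;
COMMIT;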
Using triggers, it could look something like this:

CREATE FUNCTION public.update_like_count() RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    -- NEW is the liked_posts row that was just inserted
    UPDATE user_post SET like_count = like_count + 1
    WHERE user_post.id = NEW.post_id;
    RETURN NEW;
END;
$$;

CREATE TRIGGER update_like_counts
AFTER INSERT ON public.liked_posts
FOR EACH ROW EXECUTE PROCEDURE public.update_like_count();
You should also handle deletes (un-likes) with a separate AFTER DELETE trigger, as sketched below.
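A sketch of that delete-side trigger, mirroring the insert trigger above:

CREATE FUNCTION public.decrement_like_count() RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    -- OLD is the liked_posts row that was just deleted
    UPDATE user_post SET like_count = like_count - 1
    WHERE user_post.id = OLD.post_id;
    RETURN OLD;
END;
$$;

CREATE TRIGGER decrement_like_counts
AFTER DELETE ON public.liked_posts
FOR EACH ROW EXECUTE PROCEDURE public.decrement_like_count();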
Be aware that, depending on the transaction isolation level, you might run into concurrency problems here (if two likes are processed at the same time, both transactions may compute the same like_count value) and end up with an invalid total.

I've had a problem similar to this in the past, and the solution I went with is close to what you've described: an aggregated stored value, like_count. As you mentioned, the only downside is the consistency concern, but that problem exists even in the normalized approach.
The solution to something like this lies more on the application side, e.g. using something like WebSockets to keep posts up to date without too much fluff.
When a user's browser/client loads a post, it joins a room keyed by the post id, and when a user interacts with a post (like, dislike, etc.) that interaction is broadcast to all users in that room (post id).
Finally, when it comes to finding out which users liked a post, you can query/load that at the point when the user clicks to find out. ~ cheers

Related

How to design security policies for a following system including counters in postgres/supabase if postgres functions are used?

I am unsure how to design security policies for a following system including counters in postgres/supabase. My database includes two tables:
Users:
uuid | name | follower_counter
-----|------|-----------------
xyz  | tobi | 1

Following-Relationship:
follower | following
---------|----------
uuid_1   | uuid_2
Once a user follows another user, I would like to use a Postgres function/transaction to:
insert a new following-follower relationship
update the followed user's counter
BEGIN;
INSERT INTO following_relationship (follower, following) VALUES (:follower_id, :following_id);
UPDATE users SET follower_counter = follower_counter + 1 WHERE uuid = :following_id;
COMMIT;
The constraint should be that the users table (e.g. the name column) can only be altered by the user owning the row. However, the follower_counter should be open to changes from users who start following that user.
What is the best security-policy design here? Should I add column-level security, or should I move the counters out to a different table?
Do I have to pass parameters to the "block transaction" to ensure that the update and insert functions are called with the needed rights? With which rights should I call the block function?
It might be better to take a different approach to this problem. Instead of having a column dedicated to counting the followers, I would recommend actually counting the number of followers when you query the users. Since you already have the Following-Relationship table, you just need to count the rows in that table where following or follower is the user being queried.
When you have a counter, it can be hard to keep it accurate. You have to make sure the number gets decremented when someone unfollows. What if someone blocks a user? What if a user is deleted? There are a lot of situations that could throw off the counter.
If you count the number of followings/followers on the fly, you don't need to worry about those situations at all.
Now, the obvious concern you might have with this approach is performance, but you should not worry too much about it. Postgres is a powerful database that has been battle-tested for decades, and with a proper index in place it can easily perform these queries on the fly.
The easiest way of doing this in Supabase would be to create a view like the following. Once you create a view, you can query it from your Supabase client just like a typical table!
create or replace view profiles as
select
  uuid,
  name,
  (select count(*) from following_relationship where following = users.uuid) as follower_count,
  (select count(*) from following_relationship where follower = users.uuid) as following_count
from users;
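"A proper index" here means indexing both columns of the relationship table, so each count is a cheap index scan; a sketch (the index names are illustrative):

create index if not exists following_relationship_following_idx
  on following_relationship (following);
create index if not exists following_relationship_follower_idx
  on following_relationship (follower);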

Postgres count(*) optimization idea

I'm currently working on a project that involves keeping track of users and their actions in my database (PostgreSQL as the RDBMS), and I have run into an issue when trying to perform COUNT(*) over occurrences of each user. What I want is to be able to count, efficiently, the number of times each user appears across all records, and also to get those counts for a particular date range.
So the problem is: how do we count the total number of times a user appears in the table's contents, and how do we count the total for a date range?
What I've tried
As you might know, Postgres doesn't handle COUNT(*) very well using indexes, so we have to consider other ways to reduce the number of records it looks at in order to speed up the query. My first approach is to create a table that keeps track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, except that I don't want to continually refresh a materialized view with my count query). Here is what I've come up with:
-- "user" must be quoted; it is a reserved word in PostgreSQL
CREATE TABLE users_counts("user" varchar(65536), counter int default 0, day date);

CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE "user" = NEW."user" AND day = DATE(NEW.date_);
What this does: every time a new record is inserted into main_table, we update users_counts, incrementing the counter of the row whose day equals the new record's date and whose user name matches.
NOTE: the date_ column in main_table is a timestamp, so I must cast the new record's date_ to a DATE.
The problem is: if the user column value doesn't already exist in users_counts for the current day, then nothing is updated.
Here is my question:
How do I write the rule so that it checks whether a row exists for the user and the current day; if so, increments that counter, and otherwise inserts a new row with the user, the day, and a counter of 1?
I would also like to know whether my approach makes sense, or whether there are ideas I'm missing that I just haven't thought about. As my database grows, counting becomes increasingly inefficient, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to figure this out by creating a separate RULE, but I'm not sure whether it is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts("user", counter, day)
SELECT NEW."user", 1, DATE(NEW.date_)
WHERE NOT EXISTS (SELECT 1 FROM users_counts
                  WHERE "user" = NEW."user" AND day = DATE(NEW.date_));
Basically, the insert happens only if the user doesn't already have a row for that day in my cached table users_counts, and the first rule above updates the count.
What I'm unsure of is how I know which rule is called first, the update rule or the insert rule. And there must be a better way: how do I combine the two rules? Can this be done with a function?
It is true that PostgreSQL is notoriously slow when it comes to count(*) queries. However, if you have a WHERE clause that limits the number of entries, the query will be much faster. If you are using PostgreSQL 9.2 or newer, such a query will be just as fast as in MySQL because of index-only scans, which were added in 9.2, but it's best to EXPLAIN ANALYZE your query to make sure.
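For example, with a composite index covering the filter columns you can check whether the planner actually uses an index-only scan (a sketch; the index name and the literal values are illustrative):

CREATE INDEX main_table_user_date_idx ON main_table ("user", date_);

EXPLAIN ANALYZE
SELECT count(*)
FROM main_table
WHERE "user" = 'some_user'
  AND date_ >= '2014-01-01'
  AND date_ <  '2014-02-01';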
Does my solution make sense?
Very much so, provided that your EXPLAIN ANALYZE shows that index-only scans are not being used. Trigger-based solutions like the one you have adopted are in wide use. But as you have realized, the problem of the initial state arises (whether to do an update or an insert).
which rule is called first
Multiple rules on the same table and same event type are applied in
alphabetical name order.
from http://www.postgresql.org/docs/9.1/static/sql-createrule.html
The same applies to triggers. If you want a particular rule to be executed first, change its name so that it comes up higher in alphabetical order.
how do I combine the two rules?
One solution is to modify your rule to perform an upsert (look right at the bottom of that page for a sample upsert). The other is to populate the counter table with initial values. The trick is to create the trigger at the same time, to avoid errors. This blog post explains it really well.
While the initial setup will be slow, each individual insert will probably be faster. The two opposing factors are the slowness of a WHERE NOT EXISTS query versus the overhead of catching an exception.
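Populating the counter table with initial values is a single aggregate over the existing data; a sketch against the tables above:

INSERT INTO users_counts("user", counter, day)
SELECT "user", count(*), DATE(date_)
FROM main_table
GROUP BY "user", DATE(date_);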
Tip: A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
Source: the PostgreSQL documentation page linked above.
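For what it's worth, on PostgreSQL 9.5 and later (newer than the versions discussed above) the two rules can be collapsed into a single AFTER INSERT trigger that performs an atomic upsert via INSERT ... ON CONFLICT; a sketch, assuming a unique constraint on ("user", day):

ALTER TABLE users_counts
    ADD CONSTRAINT users_counts_user_day_key UNIQUE ("user", day);

CREATE FUNCTION upsert_user_date_count() RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    -- one row per (user, day); insert it, or bump the existing counter
    INSERT INTO users_counts("user", counter, day)
    VALUES (NEW."user", 1, DATE(NEW.date_))
    ON CONFLICT ("user", day) DO UPDATE
        SET counter = users_counts.counter + 1;
    RETURN NEW;
END;
$$;

CREATE TRIGGER upsert_user_date_count
AFTER INSERT ON main_table
FOR EACH ROW EXECUTE PROCEDURE upsert_user_date_count();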

How do I do conditional check, return error, or continue?

A user wants to invite a friend but I want to do a check first. For example:
SELECT friends_email from invites where friends_email = $1 limit 1;
If that finds one, I want to return a message such as "This friend already invited."
If that does not find one, I want to do an insert:
INSERT INTO invites etc...
but then I need to return the primary user's region_id
SELECT region_id from users where user_id = $2
What's the best way to do this?
Thanks.
EDIT --------------------------------------------------------------
After many hours, below is what I ended up with in plpgsql.
IF EXISTS (SELECT * FROM invitations WHERE email = friends_email) THEN
    RETURN 'Already Invited';
END IF;
INSERT INTO invitations (email) VALUES (friends_email);
RETURN 'Invited';
I understand that there are probably dozens of better ways, but this worked for me.
Without writing the exact code snippet for you...
Consider solving this problem by shaping your data to conform to your business rules. If you can only invite someone once, then you should have an "invites" table that reflects this with a UNIQUE constraint across whatever columns define a unique invite. If it is just an email address, then declare invites.email as a unique column.
Then do an INSERT. Write the insert so that it takes advantage of Postgres's RETURNING clause to give an answer on success. If the INSERT fails (because you already have that email address, which was the point of the check you wanted to do), then catch the failure in your application code and return the appropriate response.
Pseudocode:
Application:
try
invite.insert(NewGuy)
catch error.UniqueFail
return "He's already been invited"
# ...do other stuff
Postgres:
INSERT INTO invites
(data fields + SELECT region thingy)
VALUES
(some arrangement of data that includes "region_id")
RETURNING region_id
If that's hard to make work the first time you try it, phrasing the insert target as a CTE may be helpful. If all else fails, write it procedurally in plpgsql for the time being, making sure the external interface accepts a normal INSERT (so you don't have to change application code later), and sort it out once you know whether or not performance is an issue.
The basic idea here is to let the relational shape of your data obviate the need for procedural checking wherever you can. That's at the heart of relational data modeling... somewhat of a lost art these days.
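A concrete sketch of that shape, assuming an invites table with a unique email column and the users table from the question (all names are illustrative):

ALTER TABLE invites ADD CONSTRAINT invites_email_key UNIQUE (email);

-- $1 = the friend's email, $2 = the primary user's id
INSERT INTO invites (email, region_id)
SELECT $1, u.region_id
FROM users u
WHERE u.user_id = $2
RETURNING region_id;

A unique_violation (SQLSTATE 23505) raised here means "already invited"; catch it in the application layer, as in the pseudocode above.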
You can create an SQL stored procedure to implement functionality like that described above.
But it is wrong from an architectural point of view. See: Direct database manipulation an anti-pattern?
The DB's scope of responsibility is storing data.
You have to put business logic into your business layer.

APEX - Creating a page with multiple forms linked to multiple related tables... that all submit with one button?

I have two tables in APEX that are linked by their primary key. One table (APEX_MAIN) holds the basic metadata of a document in our system and the other (APEX_DATES) holds important dates related to that document's processing.
For my team I have created a control panel where they can interact with all of this data. The issue is that right now they alter the information in APEX_MAIN on one page and then alter APEX_DATES on another. I would really like to have these forms on the same page and submit updates to their respective tables & rows with a single submit button. I have currently set this up using two different regions on the same page, but I am getting errors both with the initial fetching of the rows (whichever row is fetched second seems to work, but then the page items in the form that was fetched first are empty?) and with submitting (it gives an error about information in the DB having been altered since the update request was sent). Can anyone help me?
It is a limitation of the built-in Apex forms that you can only have one automated row fetch process per page, unfortunately. You can have more than one form region per page, but you have to code all the fetch and submit processing yourself if you do (not that difficult really, but you need to take care of optimistic locking etc. yourself too).
Splitting one table's form over several regions is perfectly possible, even using the built-in form functionality, because the region itself is just a layout object, it has no functionality associated with it.
Building forms manually is quite straightforward, but a bit more work.
Items
These should have the source set to "Static Text" rather than a database column.
Buttons
You will need buttons like Create, Apply Changes, and Delete that submit the page. These need unique request values so that you know which table is being processed, e.g. CREATE_EMP. You can make the buttons display conditionally, e.g. show Create only when the PK item is null.
Row Fetch Process
This will be a simple PL/SQL process like:
select ename, job, sal
into :p1_ename, :p1_job, :p1_sal
from emp
where empno = :p1_empno;
It will need to be conditional so that it only fires on entry to the form and not after every page load - otherwise if there are validation errors any edits will be lost. This can be controlled by a hidden item that is initially null but set to a non-null value on page load. Only fetch the row if the hidden item is null.
Submit Process(es)
You could have 3 separate processes for insert, update, delete associated with the buttons, or a single process that looks at the :request value to see what needs doing. Either way the processes will contain simple DML like:
insert into emp (empno, ename, job, sal)
values (:p1_empno, :p1_ename, :p1_job, :p1_sal);
Optimistic Locking
I omitted this above for simplicity, but one thing the built-in forms do for you is handle "optimistic locking" to prevent 2 users updating the same record simultaneously, with one's update overwriting the other's. There are various methods you can use to do this. A common one is to use OWA_OPT_LOCK.CHECKSUM to compare the record as it was when selected with as it is at the point of committing the update.
In fetch process:
select ename, job, sal, owa_opt_lock.checksum('SCOTT','EMP',ROWID)
into :p1_ename, :p1_job, :p1_sal, :p1_checksum
from emp
where empno = :p1_empno;
In submit process for update:
update emp
set job = :p1_job, sal = :p1_sal
where empno = :p1_empno
and owa_opt_lock.checksum('SCOTT','EMP',ROWID) = :p1_checksum;
if sql%rowcount = 0 then
    -- handle the fact that the update failed, e.g. raise_application_error
end if;
Another, easier solution for the fetching part is to create a view with all the fields that you need.
The weak point is that you later need to alter the "submit" code to insert into the tables that are the source of the view's data.

How do you store and display if a user has voted or not on something?

I'm working on a voting site and I'm wondering how I should handle votes.
For example, on SO when you vote on a question (or answer) your vote is stored, and each time I go back to the page I can see that I already voted on the question because the up/down buttons are colored.
How do you do that? I mean, I have several ideas, but I'm wondering if it would be a heavy load for the database.
Here are my ideas:
Write a helper which checks, for every question, whether a vote has been cast.
That means the number of queries will depend on the number of items displayed on the page (usually ~20).
Loop over my items, collect the ids, and for each page write one query which returns whether a vote has been cast, or NULL.
Looks OK because it's only one query no matter how many items are on the page, but it may break some MVC/Domain Model design, dunno.
When a user logs in (or a guest, for whom an anonymous user is created), retrieve all their votes and store them in the session; if a new vote is cast, just add it to the session.
Looks nice because no queries are needed at all except the first one; however, depending on the number of votes cast (maybe a bunch for each user), this can increase the size of the session for each user and potentially make authentication slow.
How do you do it? Any other ideas?
For example, let's assume you have a table to store votes and the user who cast them.
Let's say you keep votes in user_votes, written when a vote is cast, with a table structure something like this:
id: int, autoincrement
user_id: int, foreign key referencing the users table
question_id: int, foreign key referencing the questions table
Now, since the user will be logged in, when you fetch the questions do a LEFT JOIN on user_id against the user_votes table.
Something like
SELECT q.id, q.question, uv.id
FROM questions AS q
LEFT JOIN user_votes AS uv ON
uv.question_id = q.id AND
uv.user_id = <logged_in_user_id>
WHERE <Your criteria>
In the view you can check whether uv.id is present. If so, mark the question as voted; otherwise not.
You may need to adjust the fields of your questions table and so on. I am assuming you store questions in a questions table and users in a users table, each having the primary key id.
Thanks
You could use a combination of your suggested strategies.
Retrieve all the votes made by the logged-in user for recent/active questions only, and store them in the session.
You then have the ones that are more likely to be needed while still reducing the amount you need to store in the session.
In the less likely event that you need other results, query for just those as and when you need to.
This strategy will reduce the amount you need to store in the session and also reduce the number of calls you make to your database.
Just based on the information that you've given so far, I would take the second approach: get the IDs of all the items on the page, and then do a single query to get all the user's votes for that list of item IDs. Then pass the collection of the user's item votes to your view, so it can render items differently when the user has voted for that item.
The other two approaches seem like they would tend to be less efficient, if I understood you correctly. Using a view helper to initiate an individual query for each item to check if the user has voted on it could lead to a lot of unnecessary queries. And preloading all the user's voting history at login seems to add unnecessary overhead, getting data that isn't always needed and adding the burden of keeping it up to date for the duration of the session.