Is one of those solutions objectively better, or it all depends on the data? Explain option shows me that indeed, the optimizer does the query differently. This is just an example, I'm going to have lot's of such queries in my application, and I want to know the best way to perform such filtering.
SELECT *
FROM
(SELECT expand(in('hasPermission'))
FROM Permission
WHERE type IN ['USER'])
WHERE
login >="admin"
ORDER BY
login ASC
LIMIT 3
SELECT *
FROM User
WHERE login >= "admin"
AND out("hasPermission").type IN ["USER"]
ORDER BY login ASC
LIMIT 3
It depends a lot on the domain. Consider that the query optimizer in v 2.2 just starts scanning (or querying an index on) the target class (Permission in the first query, User in the second one) and then proceeds with traversing.
The first query is more efficient if you have few Permission records that have "USER" in type attribute and if the number of incoming edges per permission is low. Even better if you have an index on type
The second query is more efficient if you have few users with login > "admin" (again, with an index likely) and if the number of outgoing edges per single user is low.
Related
I am unsure how to design security policies for a following system including counters in postgres/supabase. My database includes two tables:
Users:
uuid|name|follower_counter
------------------------------
xyz |tobi| 1
Following-Relationship
follower| following
---------------------------
uuid_1 | uuid_2
Once a user follows a different user, I would like to use a postgres function/transaction to
Insert a new following-follower relationship
Update the followed users' counter
BEGIN
create follower_relationship(follower_id, following_id);
update increment_counter_of_followed_person(following_id);
END;
The constraint should be that the users table (e.g. the name column) can only be altered by the user owning the row. However, the follower_counter should open to changes from users who start following that user.
What is the best security policy design here? Should I add column security or should exclude the counters to a different table?
Do I have to pass parameters to the "block transaction" to ensure that the update and insert functions are called with the needed rights? With which rights should I call the block function?
It might be better to take a different approach to solve this problem. Instead of having a column dedicated to counting the followers, I would recommend actually counting the number of followers when you query the users. Since you already have Following-Relationship table, we just need to count the rows within the table where following or follower is the querying user.
When you have a counter, it might be hard to keep the counter accurate. You have to make sure the number gets decremented when someone unfollows. What if someone blocks a user? What if a user was deleted? There could be a lot of situations that could throw off the counter.
If you count the number of followings/followers on the fly, you don't need to worry about those situations at all.
Now obvious concern with this approach that you might have is performance, but you should not worry too much about it. Postgres is a powerful database that has been battle tested for decades, and with a proper index in place, it can easily perform these query on the fly.
The easiest way of doing this in Supabase would be to create a view like this the following. Once you create a view, you can query it from your Supabase client just like a typical table!
create or replace view profiles as
select
id,
name,
(select count(*) from following_relationship where followed_user_id = id) as follower_count,
(select count(*) from following_relationship where following_user_id = id) as following_count
from users;
I am trying to implement Relay-Style pagination on a postgres. I have gone through the relay specification, and it recommends having a globally unique cursor for each entity. The specification for pagination is that the client provides first and after for Forward pagination and last and before for Backward pagination.
Assume I have a table called Products, with an auto incrementing primary key integer ID. In the most basic case, pagination is very straight forward. For forward pagination it would be something like SELECT * FROM products WHERE id > cursor LIMIT 10 for forward pagination, and SELECT * FROM products WHERE id < cursor ORDER BY id DESC LIMIT 10
The problem arises when we have additional sorting requirements. Let us say, our product table has a name field on which we want to sort.
Name Id (cursor field)
Trendy 1
Autumn 2
Winter 3
Crazy 4
Antman 5
Marvel 6
And we want to paginate, with page size of 3, sorted on name ASC. The query would be:
SELECT * FROM products ORDER BY name ASC LIMIT 3, And we would get the following output:
(Antman, 5), (Autumn 1), (Crazy 4)
But now, how will I get the next three? This query SELECT * FROM products WHERE id >= 4 ORDER BY name ASC LIMIT 3; would not work, as Trendy has ID < 1. Do I create a different cursor for Name and use that for paginating with Name? Okay, what if we wanted to sort on multiple fields together (let us say we have a price field, we want to sort from decreasing to increasing price, and then sort based on created time, and then on name. The price might not be unique and any other additional constraints), how would I create a cursor then?
What is the standard approach to solve this problem? I have tried to research as much as I can but I couldn't find any information on how to accomplish this in fastapi and sqlalchemy. Django has FilterableConnectionField, but I couldn't understand the implementation. Any help is very appreciated, thank you.
edit: I figured out how to do this. It is called queryset pagination, and sqlakeyset repo on github had a good implementation of this pattern. https://github.com/djrobstep/sqlakeyset
Assume a table that has many pages and I only want some of it, for example
SELECT * FROM t WHERE...
From pg_class.relpages I know how many pages are there, how can I access some page like
SELECT * FROM t WHERE PAGE_NUMBER=1
Since you mention relpages I assume you are talking about "physical" pages where the data is stored on disk. In general you should mind your own business and let the database server mind its own business. But if you don't want to do that, the you can reference tuples by the hidden system column "ctid", which is a composite consisting of the page number and the slot within the page.
select * from pgbench_accounts where ctid between '(18,0)' and '(18,65535)';
This is the 19th page, as page numbers start at 0. Slot numbers start at 1, but for convenience you are allowed to reference slot 0 as if it were a valid (but empty) slot.
Note that until very recent versions, this will not be fast as it would scan the whole table and filter out the disqualified rows one by one.
For that, you need a sort order. For example
SELECT * FROM t WHERE ...
ORDER BY customer, created_ts;
Make sure that the sort condition is unique. If it isn't, add the primary key at the end.
Then you create an index for the ORDER BY condition:
CREATE INDEX ON t (customer, created_ts);
Then you select the first page of 50 items like this:
SELECT * FROM t WHERE ...
ORDER BY customer, created_ts
LIMIT 50;
You display the page and remember customer and created_ts from the last result row in <latest_cust> and <latest_ts>.
The next page is fetched with
SELECT * FROM t WHERE ...
AND (customer, created_ts) > (<latest_cust>, <latest_ts>)
ORDER BY customer, created_ts
LIMIT 50;
and so on.
This method (it is called “keyset pagination”) is efficient and quite stable in the face of concurrent data modifications.
Scenario:
I am displaying a table of records. It initially displays the first 500 with "show more" at the bottom, which returns the next 500.
Issue:
If between initial display and clicking "show more" 1 record is added, that will cause "order by date, offset 500, limit 500" to overlap by 1 row.
I'd like to "order by date, offset until 'id of last row shown', limit 500"
My row IDs are UUIDs. I am open to alternative approaches that achieve the same result.
If you can order by ID, you can paginate using
where id > $last_seen_id limit 500
but that's not going to be useful where you're sorting by date.
Sort stability!
I really hope that "date" actually means "timestamp" though, otherwise your ordering will be unstable and you can miss rows in pagination; you'll have to order by date, id to get stable ordering if it's really a date, and should probably do so even for timestamp.
State on client
One option is to push the state out to the client. Have the client remember the last-seen (date,id) tuple, and use:
where date > $last_seen_date and id > $last_seen_id limit 500
Cursors
Do you care about scalability? If not, you can use a server-side cursor. Declare the cursor for the full query, without the LIMIT. Then FETCH chunks of rows as requested. To do this your app must have a way to consistently bind a connection to a specific user's requests, though, and not to reset that connection or return it to the pool between requests. This might not be practical with your pool/framework, but is probably the best solution if you can do it.
Temp tables
Another even less scalable option is to CREATE TABLE sessiondata.myuser_myrequest_blah AS SELECT .... then paginate that table. It's guaranteed not to change. This avoids the difficulty of needing to keep a consistent connection across requests, but will have a very slow first-request response time and is completely impractical for large user counts or large amounts of data.
Related questions
Handling paging with changing sort orders
Using "Cursors" for paging in PostgreSQL
How to provide an API client with 1,000,000 database results?
i think you can use a subquery in the where to accomplish this.
e.g. given you're paginating through a users table, and you want the records after a given user:
SELECT *
FROM users
WHERE created_at > (
SELECT created_at
FROM users
WHERE users.id = '00000000-1111-2222-3333-444444444444'
LIMIT 1
)
ORDER BY created_at DESC limit 5;
Here is my requirement.
Front (Client) end will do a search based on predefined conditions (for instance: customer id, account number, first name, last name, etc). I need to get the data corresponding to this request from a db2 database and send it back to them (Server). We use CICS channels and containers to pass requests and responses between the Client and Server.
Front end needs the data ordered by: Receive date descending, Customer id Ascending, Account number Ascending. Data are fetched in pages of 500 records. For example, if for a search request from front end would retrieve 50,000 records from the db2 database, we need to return this data in 500 record "pages". For pagination concept, we use the field security deposit number which is primary key to our database but the sorting order is not based on this field.
I would like to know whether we can use scrollable cursor logic in CICS to implement pagination.
Please note that I do not prefer to go for internal array bubble sort to send the data in response as it would degrade performance. I like to do it via query logic.any thoughts?
Example (Initial Front end input request):
Customer id : A
First time request (To identify whether it is first time or next or previous request for pagination)
First security deposit number : 0
Last security deposit number : 0
Since this is first time request, both this field will be having zero from front end and we need to retrieve records from database based on condition of security deposit > 0
Db2 database:
There are 700 records for this criteria
Mainframe response for first time:
We will send the first 500 records
Front end will then send request for getting next set of records which will contain:
Customer id: A
Next request
First security deposit numbr: 0
Last security deposit number : 17980
So for this detail, if I query my datbase based on security deposit number > 17980, it may result in duplicate records listing in the screen once again since our sorting order in database is not based on security deposit number
How to impelement this logic??
Many Client/Server applications in an IBM Mainframe environment involve psuedo conversational CICS transactions.
If you are using CICS in psueudo conversational mode it
is not possible for the Server to hold cusors when it RETURNs to the Client. Therefore scrollable cusors
are of little use in this environment. So to answer your basic question: No scrollable cursors cannot be used here.
The "trick" here is to create an SQL predicate in the Server that is restartable. It will then pick up rows in the correct order from any given
stating point. When the Client calls your Server it must pass all of the positioning information to your Server.
Typically, on a first call from a Client all of the positioning values are set to cause the cursor to
position itself starting with what must be the the first row. The Server then pulls in a "page" worth of data
and returns it to the Client. On the next page forward request the Client sets these positioning values to
the last row it displayed and calls the Server for the next "page" of data.
In your situation I would assume that the page forward cursor would look something like this, all the
variables prefixed with RESTART... are what the Client must provide to the Server to start the cursor
in the correct position.
DECLARE CURSOR Page-forward FOR
SELECT Receive_Date, Customer_id, Account_Nbr, Security_Dep_Id
FROM Table_Name
WHERE ( (Receive_Date < :RESTART-RCV-DT)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id > :RESTART-CUSTOMER-ID)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr > :RESTART-ACCT-NBR)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr = :RESTART-ACCT-NBR AND
Security_Dep_Id > :RESTART-SEC-DEP-ID))
ORDER BY 1 DESC, 2 ASC , 3 ASC, 4 ASC
For the initial call the Client would have passed something like '9999-12-31' as the RESTART-RCV-DT, zero
for the RESTART-CUSTOMER-ID, RESTART-ACCT-NBR and SEC-DEP-ID (assuming these are all numeric). If you look at
the cursor predicate carefully you can verify that there cannot be any rows prior to these values - therefore this
will return the first page of data. If the Client needs to page forward after this, it must tell the Server to start
with the next row after the last one it received. To do this it would populate the RESTART... variables with
the values from the last row on the page it just
displayed. This process will drive the cursor selects forward one page at a time.
When paging up, the process is reversed (you will need a second cursor to support this, and the Client needs to tell you which direction to page: Forward or Back). The Client
will need to populate the RESTART variables with the first row it recieved from the Server. The trick
for the Server on a page up request is to return the data
to the Client in reverse order. You may have to populate the data page passed back
to the Client in reverse order (ie. put the first row retrieved into the last row of the paging area shared between
the Client and the Server). The page backward cursor would look something like:
DECLARE CURSOR Page-backward FOR
SELECT Receive_Date, Customer_id, Account_Nbr, Security_Dep_Id
FROM Table_Name
WHERE ( (Receive_Date > :RESTART-RCV-DT)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id < :RESTART-CUSTOMER-ID)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr < :RESTART-ACCT-NBR)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr = :RESTART-ACCT-NBR AND
Security_Dep_Id < :RESTART-SEC-DEP-ID))
ORDER BY 1 ASC, 2 DESC , 3 DESC, 4 DESC
As has been pointed out in other answers, this type of paging process does not manage or detect concurrent
updates to the database that may occur duing paging transactions. That is another topic for another day...
Developing Restartable Cursors
The the key to building a paging Server is to develop a cursor that is restartable from a set of values received
from a Client transaction. This leaves control of cursor positioning and direction with the Client.
It also means the Client must receive all critical positioning data from the Server even though
the Client might not actually
use these data for any other purpose (e.g. From your question I got the impression that the Client may not require
the Security Deposit Id except to supply as a positioning parameter for your Server)
To build a paging Server you need to know
what the required sorting order of the data are (e.g. Receive Date Descending then Customer Id Ascending then
Account Number Ascending).
You also need know the set of data that uniquely identify a row
returned by the cursor. In your case that would be the Security Deposit Id (this is the primary key for the
table you are selecting from so it must be unique for each and every row in that table). Knowing this you then build a
cursor predicate (the stuff in the WHERE clause) that will return data needed by the Client in the required sort order that
also includes
the full positioning key (i.e. Security Deposit Id). In the event that two or more returned rows may contain identical data if
the final positioning key were elimiminated makes it important that the positioning key be included as a sort condition.
It doesn't matter if it is ascending or descending, but it needs to be included on the sort to ensure consistent
order of data retrieval.
A fairly simple formula may be followed to build the predicate for a restartable cusor needed to
support paging Servers. Basically this is a cascade of "OR" clauses connecting a series of "AND" clauses
that become progressively more selective following the sort order required by the Client and end up with the positioning
key.
To see how this works consider how the query for your Server might be developed...
Start with the column from the sort order that changes least often...
SELECT ...
FROM ...
WHERE Receive_Date < restart value
This will retrieve all rows prior to the specified restart Receieve date regardless of what the other
column restart values are (e.g. Customer ID's can range from minimum to maximum values, as long as the Receive Date
is less than any Receive Date "seen" so far). Since this column only changes value after all subortinate sort columns values
have been exausted you can be sure that this does not pick up any rows prior to the full restart key.
But what about those rows that occur on the same date as the restart request but have a
larger Customer Id? These can be picked up with....
SELECT ...
FROM ...
WHERE Receive_Date = restart value AND
Customer_id > restart value
What about those where the Receive Date and Customer Id are the same as the restart key but have
a larger Account Number? These can be picked up with...
SELECT ...
FROM ...
WHERE Receive_Date = restart value AND
Customer_Id = restart value AND
Account_Nbr > restart value
Continue this pattern until the full restart key has been processed. Notice that the inequality
signs are determined by the sort order. Use < when the column is sorted Descending and > when Ascending.
Also notice that the SELECT and FROM clauses
are exactly the same for each query - which means you can put them all together using OR conjuctions...
SELECT Receive_Date, Customer_id, Account_Nbr, Security_Dep_Id
FROM Table_Name
WHERE ( (Receive_Date < :RESTART-RCV-DT)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id > :RESTART-CUSTOMER-ID)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr > :RESTART-ACCT-NBR)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr = :RESTART-ACCT-NBR AND
Security_Dep_Id > :RESTART-SEC-DEP-ID))
ORDER BY 1 DESC, 2 ASC , 3 ASC, 4 ASC
There you go... a restartable cursor for forward paging. Construction of the cursor for backward paging follows a similar pattern, just flip the
sort orders and repeat.
A simplistic approach: Write your SQL to retrieve data according to your criteria, in the sort order you specify. Then only retrieve the keys to the rows you want. Save the keys somewhere you will have access to upon subsequent invocations of your transaction. Look into multi-row select in DB2. Also understand pseudo-conversational programming techniques in CICS.
And now we get to the design implications Bill Woodger mentions, that you do not specify in your question, and which are the reason I'm just hitting the high points of a simplistic approach.
If changes to your result set occur between one invocation and the next, your results will not reflect those changes. You must decide if this is important.
You mention a "front end" but do not specify what it is. If it is a BMS application, you may be able to save the keys in your commarea or in a container. If your front end is a distributed application invoking your transactions via CICS Web Services or CICS Web Support or MQ or raw sockets or whatever, you must design a mechanism to store those keys such that you can uniquely retrieve them — perhaps by sending a contrived key back to the distributed application which it must supply upon subsequent invocations. Then you must have some process to clean up your key store.
Creating a solution to your problem that is unique in your IT shop is not something to be done in isolation. You must involve others who will be tasked with maintaining your application, there may be a group external to your project tasked with making such decisions, there may be infrastructure issues with your solution.
So this isn't so much as an answer to your question as it is an elaboration upon why you may not get an answer, or at least the answer you seem to desire.