I have an RDS Postgres DB. When I try to retrieve recently inserted data, it comes back on some occasions and not on others. I'm calling my Node API from a React front end. For example, after an insertion,
this query retrieves the data immediately (here id is the PK of the table):
SELECT * from glacier_restore_progress where id in (1,2,3);
But
select *
from glacier_restore_progress
where email='ahk#gmail.com' and restore_expire >= CURRENT_TIMESTAMP
order by restore_start;
The above query does not return the data immediately. I'm calling the endpoint again and again to fetch the data (i.e. polling); after a certain number of calls, it returns the rows.
But when I look at the DB via the DBeaver client, the records are there as soon as I insert them.
Table schema:
create table glacier_restore_progress(
    id SERIAL NOT NULL,
    file_path VARCHAR(100) NOT NULL,
    email VARCHAR(50),
    restore_start timestamp,
    restore_end timestamp,
    restore_expire timestamp,
    status VARCHAR(10),
    file_data_obj jsonb,
    field_mapping jsonb,
    primary key (id)
);
The library I'm using is "pg": "8.5.1".
What am i missing here?
I have only worked with Node.js and Postgres for a little while, so my answer might not be good. But based on what I know, one of the issues might be caching: sometimes your browser stores cached data and only updates it after a certain time interval.
If you want a real-time application, you might need an external library that supports real-time updates, or find a workaround with WebSockets.
I have spent lots of hours trying to understand this and still cannot figure out why it is happening.
I have created two tables (and added a foreign key with ALTER):
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
    -- add more fields if needed
);
CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);
ALTER TABLE products
ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
REFERENCES stores (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE RESTRICT;
and every time I insert a row into products with
INSERT INTO public.products (store_id, title, image, url)
VALUES ((SELECT id FROM stores WHERE store_name = 'footish'),
        'Teva Flatform Universal Pride',
        'https://www.footish.se/sneakers/teva-flatform-universal-pride-t1116376',
        'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840');
I can see that the id column increases by two on every insert instead of one, and I would like to know the reason behind that.
I have not been able to figure out why and it would be nice to know! :)
There could be 3 reasons:
You've tried to insert data and the insert failed. Even on a failed insert and a transaction rollback, the sequence still advances; a used number is never put back.
You're using a shared (global) sequence and other data was created in the meantime. A shared sequence increases whenever rows are added to any table that uses it, even when only other tables are modified.
The sequence is configured with a step size / allocation size of 2. It can be configured however you want.
Overall it is not important. What matters is that it increases automatically and that, even after an error or a delete, an already used ID is never handed out again.
If you want concrete information, you need to provide the definition of the sequence. You can check it using a SQL CLI or show it via DBeaver.
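For example, assuming the default sequence created by SERIAL is named products_id_seq (and, for the second part, that a store with id 1 exists), you could check both the step size and the rollback behaviour like this:
-- step size and current state of the sequence (pg_sequences exists since PostgreSQL 10)
SELECT sequencename, increment_by, last_value
FROM pg_sequences
WHERE sequencename = 'products_id_seq';

-- a rolled-back insert still consumes a number
BEGIN;
INSERT INTO products (store_id, title) VALUES (1, 'test product');
ROLLBACK;
SELECT last_value FROM products_id_seq;  -- has advanced anyway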
I am writing an application backed by a Postgres DB.
The application is like a logging system; the main table is like this:
create table if not exists logs
(
    user_id bigint not null,
    log bytea not null,
    timestamp timestamptz not null default clock_timestamp() at time zone 'UTC'
);
One of the main queries is to fetch all logs for a certain user_id, ordered by timestamp desc. It would be nice if, under the hood, Postgres stored all rows for the same user_id in one page or in sequential pages, instead of scattering them here and there on the disk.
As I recall from textbooks, this is the so-called "index-sequential file" organization. How can I guide Postgres to do that?
The simple thing to do is to create a B-tree index to speed up the search:
CREATE INDEX logs_user_time_idx ON logs (user_id, timestamp);
That would speed up the query, but take extra space on the disk and slow down all INSERT operations on the table (the index has to be maintained). There is no free lunch!
I assume that you were talking about that when you mentioned "index-sequential files". But perhaps you meant what is called a clustered index or index-organized table, which essentially keeps the table itself in a certain order. That can speed up searches like that even more. However, PostgreSQL does not have that feature.
The best you can do to make disk access more efficient in PostgreSQL is to run the CLUSTER command, which rewrites the table in index order:
CLUSTER logs USING logs_user_time_idx;
But be warned:
That statement rewrites the whole table, so it could take a long time. During that time, the table is inaccessible.
Subsequent INSERTs won't maintain the order in the table, so it “rots” over time, and after a while you will have to CLUSTER the table again.
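For reference, the query shape that benefits from the index (and from the CLUSTERed layout) is the one from the question, e.g. for user_id 42 (an example value):
SELECT log, timestamp
FROM logs
WHERE user_id = 42
ORDER BY timestamp DESC;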
In a database project, I have a users table which has a computed value avg_service_rating, and another table called services with all the services associated with a user and the rating for each service. Is there a computationally light way to maintain avg_service_rating without updating it every time an INSERT is done on the services table? Perhaps something like a generated column, but with a function call instead? Any direct advice or link to resources will be greatly appreciated as well!
CREATE TABLE users (
    username VARCHAR PRIMARY KEY,
    avg_service_ratings NUMERIC, -- is it possible to store some function call for this column?
    ...
);
CREATE TABLE service (
    username VARCHAR NOT NULL REFERENCES users (username),
    service_date DATE NOT NULL,
    rating INTEGER,
    PRIMARY KEY (username, service_date)
);
If the values should be consistent, a generated column won't fit the bill, since it is only recomputed if the row itself is modified.
I see two solutions:
have a trigger on the services table that updates the users table whenever a rating is added or modified. That slows down data modifications, but not your queries.
Turn users into a view. The original users table would be renamed, and it loses the avg_service_rating column, which is computed on the fly by the view.
To make the illusion perfect, create an INSTEAD OF INSERT OR UPDATE OR DELETE trigger on the view that modifies the underlying table. Then your application does not need to be changed.
With this solution you pay a certain price both on SELECT and on data modifications, but the latter price will be lower, since you don't have to modify two tables (and users might receive fewer modifications than services). An added advantage is that you avoid data duplication.
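A minimal sketch of the view approach, assuming the original table is renamed to users_base (the name is just an example):
ALTER TABLE users RENAME TO users_base;
ALTER TABLE users_base DROP COLUMN avg_service_ratings;

CREATE VIEW users AS
SELECT b.username,
       (SELECT avg(s.rating)
          FROM service s
         WHERE s.username = b.username) AS avg_service_ratings
FROM users_base b;
The INSTEAD OF trigger mentioned above would then redirect INSERTs, UPDATEs and DELETEs on users to users_base.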
A generated column would only be useful if the source data is in the same table row.
Otherwise your options are a view (where you could call a function or calculate the value via a subquery), or an AFTER INSERT OR UPDATE trigger on the service table that updates users.avg_service_ratings. With a trigger, if you get a lot of updates on the service table you'd need to consider possible concurrency issues, but it means the figure doesn't have to be calculated every time a row in the users table is accessed.
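A rough sketch of the trigger variant (function and trigger names are made up; EXECUTE FUNCTION needs PostgreSQL 11+, use EXECUTE PROCEDURE on older versions):
CREATE OR REPLACE FUNCTION refresh_avg_service_rating() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- recompute the average for the affected user
    UPDATE users
       SET avg_service_ratings = (SELECT avg(rating)
                                    FROM service
                                   WHERE username = NEW.username)
     WHERE username = NEW.username;
    RETURN NEW;
END;
$$;

CREATE TRIGGER service_refresh_avg_rating
AFTER INSERT OR UPDATE ON service
FOR EACH ROW EXECUTE FUNCTION refresh_avg_service_rating();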
I'm trying to determine whether or not postgresql keeps internal (but accessible via a query) sequential record ids and / or record creation dates.
In the past I have created a serial id field and a record creation date field, but I have been asked to see if Postgres already does that. I have not found any indication that it does, but I might be overlooking something.
I'm currently using Postgresql 9.5, but I would be interested in knowing if that data is kept in any version.
Any help is appreciated.
Thanks.
No is the short answer.
There is no automatic timestamp for rows in PostgreSQL.
You could create the table with a timestamp with a default.
create table foo (
    foo_id serial not null unique
    , created_timestamp timestamp not null default current_timestamp
) without oids;
So
insert into foo values (1);
gives us a row with foo_id = 1 and created_timestamp set to the time of the insert.
You could also have a modified_timestamp column, which you could set with a before update trigger.
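A minimal sketch of that, using a BEFORE UPDATE trigger so the row can be stamped before it is written (the column, function and trigger names are just examples):
alter table foo add column modified_timestamp timestamp;

create function foo_set_modified() returns trigger
language plpgsql as $$
begin
    -- stamp the row being written
    new.modified_timestamp := current_timestamp;
    return new;
end;
$$;

create trigger foo_modified
before update on foo
for each row execute procedure foo_set_modified();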
Hope this helps
I'd like to create a web service that allows a client to fetch all rows in a table, and then later allows the client to only fetch new or updated rows.
The simplest implementation seems to be to send the current timestamp to the client, and then have the client ask for rows that are newer than the timestamp in the following request.
It seems that this is doable by keeping an "updated_at" column with a timestamp set to NOW() in update and insert triggers, and then querying newer rows, and also passing down the value of NOW().
The problem is that if there are uncommitted transactions, these transactions will set updated_at to the start time of the transaction, not the commit time.
As a result, this simple implementation doesn't work, because rows can be lost since they can appear with a timestamp in the past.
I have been unable to find any simple solution to this problem, despite the fact that it seems to be a very common need: any ideas?
Possible solutions:
Keep a monotonic timestamp in a table, update it at the start of every transaction to MAX(NOW(), last_timestamp + 1) and use it as a row timestamp. Problem: this effectively means that all write transactions are fully serialized and lock the whole database since they conflict on the update time table.
At the end of the transaction, add a mapping from NOW() to the time in an update table like the above solution. This seems to require to take an explicit lock and use a sequence to generate non-temporal "timestamps" because just using an UPDATE on a single row would cause rollbacks in SERIALIZABLE mode.
Somehow have PostgreSQL, at commit time, iterate over all updated rows and set updated_at to a monotonic timestamp
Somehow have PostgreSQL itself maintain a table of transaction commit times, which it doesn't seem to do at the moment
Using the built-in xmin column also seems impossible, because VACUUM can trash it.
It would be nice to be able to do this in the database without modifications to all updates in the application.
What is the usual way this is done?
The problem with the naive solution
In case it's not obvious, this is the problem with using NOW() or CLOCK_TIMESTAMP():
At time 1, we run NOW() or CLOCK_TIMESTAMP() in a transaction, which gives 1 and we update a row setting time 1 as the update time
At time 2, a client fetches all rows, and we tell it that it now has all rows up to time 2
At time 3, the transaction commits with "time 1" in the updated_at field
The client asks for updated rows since time 2 (the time it got from the previous full fetch request); we query for updated_at >= 2 and return nothing, instead of returning the row that was just added
That row is lost and will never be seen by the client
Your whole proposition goes against some of the underlying fundamentals of an ACID-compliant RDBMS like PostgreSQL. Time of transaction start (e.g. current_timestamp()) and other time-based metrics are meaningless as a measure of what a particular client has received or not. Abandon the whole idea.
Assuming that your clients connect through a persistent session to the database you can follow this procedure:
When the session starts, create a TEMP table for the session user (temporary tables are not WAL-logged anyway). This table contains nothing but the PK and the last update time of the table you want to fetch the data from.
The client polls for new data and receives only those records that have a PK not yet in the temp table or an existing PK but a newer last update time. Currently uncommitted transactions are invisible but will be retrieved at the next poll for new or updated records. The update time is required because there is no way to delete records from the temp tables of all concurrent clients.
The PK and last update time of each retrieved record are stored in the temp table.
When the user closes the session, the temp table is deleted.
If you want to persist the retrieved records over multiple sessions for each client, or if the client disconnects after every query, then you need a regular table instead. In that case I would suggest also adding the id (oid) of the user, so that all users can share a single table for keeping track of the retrieved records. You can then create an AFTER UPDATE trigger on the table with your data which deletes the PK from the table of fetched records for all users in one sweep; on their next poll the clients will then get the updated record.
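A rough sketch of the per-session bookkeeping; data_table, updated_at and fetched_rows are made-up names for illustration:
CREATE TEMP TABLE fetched_rows (
    id          bigint PRIMARY KEY,   -- PK of the data table
    last_update timestamptz NOT NULL
);

-- one poll: rows that are new, or that changed since they were last sent
SELECT d.*
FROM data_table d
LEFT JOIN fetched_rows f ON f.id = d.id
WHERE f.id IS NULL OR d.updated_at > f.last_update;

-- remember what has been sent
INSERT INTO fetched_rows (id, last_update)
SELECT id, updated_at FROM data_table
ON CONFLICT (id) DO UPDATE SET last_update = EXCLUDED.last_update;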
Add a column, which will be used to track what record has been sent to a client:
alter table table_under_view
add column access_order int null;
create sequence table_under_view_access_order_seq
owned by table_under_view.access_order;
create function table_under_view_reset_access_order()
returns trigger
language plpgsql
as $func$
begin
  new.access_order := null;
  return new;
end
$func$;
create trigger table_under_view_reset_access_order_before_update
before update on table_under_view
for each row execute procedure table_under_view_reset_access_order();
create index table_under_view_access_order_idx
on table_under_view (access_order);
create index table_under_view_access_order_where_null_idx
on table_under_view (access_order)
where (access_order is null);
(You could use a before insert on table_under_view trigger too, to ensure only NULL values are inserted into access_order).
You need to update this column after transactions with INSERTs & UPDATEs on this table have finished, but before any client queries your data. You cannot do anything just after a transaction finishes, so let's do it right before a query happens. You can do this with a function, f.ex:
create function table_under_access(from_access int)
returns setof table_under_view
language sql
as $func$
update table_under_view
set access_order = nextval('table_under_view_access_order_seq'::regclass)
where access_order is null;
select *
from table_under_view
where access_order > from_access;
$func$;
Now, fetching your first "chunk" of data (which will fetch all rows in the table) looks like:
select *
from table_under_access(0);
The key element after this is that your client needs to process every "chunk" of data to determine the greatest access_order it got (unless you include it in the result with, f.ex., window functions; but if you're going to process the results anyway, which seems highly likely, you don't need that). Always use that value for the subsequent calls.
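For example, if the greatest access_order in the previous chunk was 42 (a made-up value), the next call would be:
select *
from table_under_access(42);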
You can add an updated_at column too for ordering your results, if you want to.
You can also use a view + rule(s) for the last part (instead of the function), to make it more transparent.