Query rows by time of creation? - postgresql

I have a table that contains no date or time related fields. Still I want to query that table based on when records/rows were created. Is there a way to do this in PostgreSQL?
I prefer an answer about doing it in PostgreSQL directly. But if that's not possible, can hibernate do it for PostgreSQL?

Basically: no. There is no automatic timestamp for rows in PostgreSQL.
I usually add a column like this to my tables (ignoring time zones):
ALTER TABLE tbl ADD COLUMN log_in timestamp DEFAULT localtimestamp NOT NULL;
As long as you don't manipulate the values in that column, you have your creation timestamp. You can add a trigger and / or restrict write privileges to avoid tampering with the values.
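A minimal sketch of the trigger idea, assuming the tbl table and log_in column from the statement above. The function and trigger names are hypothetical. It only covers UPDATE; an INSERT that supplies its own log_in value would still get through unless you also restrict privileges or add a similar BEFORE INSERT trigger.

```sql
-- Hypothetical helper: keep the original creation timestamp on every UPDATE.
CREATE OR REPLACE FUNCTION tbl_keep_log_in() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.log_in := OLD.log_in;  -- discard any attempted change to log_in
    RETURN NEW;
END;
$$;

CREATE TRIGGER tbl_keep_log_in
BEFORE UPDATE ON tbl
FOR EACH ROW
EXECUTE PROCEDURE tbl_keep_log_in();

-- Query by creation time, e.g. rows created during the last day:
SELECT *
FROM   tbl
WHERE  log_in >= localtimestamp - interval '1 day'
ORDER  BY log_in;
```

Rows that existed before the column was added all carry the time of the ALTER TABLE statement, so the query is only meaningful for rows inserted afterwards.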
Second class options
If you have a serial column, you could at least tell with some probability in what order rows were entered. That's not 100% reliable, because the values can be changed by hand, and applications can get values from the sequence and INSERT out of order.
If you created your table WITH (OIDS=TRUE), then the OID column could be some indication, unless your database is heavily written and / or very old. In that case you may have gone through OID wrap-around, and later rows can have a smaller OID. That's one of the reasons why this feature is hardly used any more.
The default depends on the setting of default_with_oids. I quote the manual:
The parameter is off by default; in PostgreSQL 8.0 and earlier, it was
on by default.
If you have not updated your rows, gone through a dump / restore cycle, or run VACUUM FULL or CLUSTER or similar, a plain SELECT * FROM tbl typically returns all rows in the order they were entered. But this is very unreliable and implementation-dependent. PostgreSQL (like any RDBMS) does not guarantee any order without an ORDER BY clause.

Related

How to add a column to a table on production PostgreSQL with zero downtime?

Here
https://stackoverflow.com/a/53016193/10894456
is an answer provided for Oracle 11g,
My question is the same:
What is the best approach to add a not null column with default value
in production oracle database when that table contain one million
records and it is live. Does it create any locks if we do the column
creation , adding default value and making it as not null in a single
statement?
but for PostgreSQL ?
This prior answer essentially answers your query.
Cross-referencing the relevant PostgreSQL doc with the PostgreSQL source code for AlterTableGetLockLevel mentioned in the above answer shows that ALTER TABLE ... ADD COLUMN will always obtain an ACCESS EXCLUSIVE table lock, precluding any other transaction from accessing the table for the duration of the ADD COLUMN operation.
This same exclusive lock is obtained for any ADD COLUMN variation; i.e. it doesn't matter whether you add a NULL column (with or without DEFAULT) or a NOT NULL column with a default.
However, as mentioned in the linked answer above, adding a NULL column with no DEFAULT should be very quick, as this operation simply updates the catalog.
In contrast, adding a column with a DEFAULT specifier necessitates a rewrite of the entire table in PostgreSQL 10 or earlier.
This operation is likely to take a considerable time on your 1M record table.
According to the linked answer, PostgreSQL >= 11 does not require such a rewrite for adding such a column, so should perform more similarly to the no-DEFAULT case.
I should add that for PostgreSQL 11 and above, the ALTER TABLE docs note that table rewrites are only avoided for non-volatile DEFAULT specifiers:
When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.
Adding a column with a volatile DEFAULT [...] will require the entire table and its indexes to be rewritten. [...] Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space.
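To make the distinction concrete, here is a rough sketch for PostgreSQL 11 or later. The table name big_table and the column names are hypothetical; the lock_timeout value is only an example of limiting how long the statement waits for the ACCESS EXCLUSIVE lock.

```sql
-- Give up instead of queueing behind long-running transactions
-- while waiting for the ACCESS EXCLUSIVE lock (example value).
SET lock_timeout = '5s';

-- Constant default: non-volatile, stored in the table's metadata, no rewrite.
ALTER TABLE big_table
    ADD COLUMN some_flag integer NOT NULL DEFAULT 0;

-- now() is STABLE rather than volatile: it is evaluated once at ALTER time
-- and every existing row reports that same value, again without a rewrite.
ALTER TABLE big_table
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();

-- clock_timestamp() is volatile: each existing row needs its own value,
-- so the whole table and its indexes are rewritten.
ALTER TABLE big_table
    ADD COLUMN touched_at timestamptz DEFAULT clock_timestamp();
```

All three statements still take the same ACCESS EXCLUSIVE lock; what differs is how long they hold it.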

PostgreSQL Latest Record w/o id nor date

I have a foreign table without id nor date.
If for example other users insert a number of records, is it possible in PostgreSQL to select the last record inserted?
*Note: My only access to that table is select only
SQL tables represent unordered sets, and so do result sets. You cannot guarantee any row order without specifying ORDER BY.
And:
I have a foreign table without id nor date
Without such a column there is no workaround to identify the rows you need.
My only access to that table is select only
If you only have the SELECT privilege, you should tell your DBA that you cannot guarantee with 100% certainty that a given row is the last one inserted by that user.
Based on my knowledge PostgreSQL does not guarantee to preserve insertion order. Without a timestamp field or sequential primary key I do not think guaranteed fetching of the last row is possible.
You can try this
SELECT * FROM YOUR_TABLE WHERE CTID = (SELECT MAX(CTID) FROM YOUR_TABLE)
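provided that no UPDATE operations run against the target table. Even then, this is a guess rather than a guarantee, because space freed by deleted rows can be reused by later inserts.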

Can we setup a table in postgres to always view the latest inserts first?

Right now when I create a table and do a
select * from table
I always see the first insert rows first. I'd like to have my latest inserts displayed first. Is it possible to achieve with minimal performance impact?
I believe that Postgres uses an internal field called OID that you can sort by. Try the following.
select *,OID from table order by OID desc;
There are some limitations to this approach as described in SQL, Postgres OIDs, What are they and why are they useful?
Apparently the OID sequence "does" wrap if it exceeds 4B. So in essence it's a global counter that can wrap. If it does wrap, some slowdown may start occurring when it's used and "searched" for unique values, etc.
See also https://wiki.postgresql.org/wiki/FAQ#What_is_an_OID.3F
NB - in more recent versions of Postgres tables are no longer created with OIDs by default ( https://www.postgresql.org/docs/8.4/static/runtime-config-compatible.html#GUC-DEFAULT-WITH-OIDS )
You can still create tables with OIDs explicitly at table creation up to the 9.5 docs linked here, https://www.postgresql.org/docs/9.5/static/sql-createtable.html , but PostgreSQL 12 and later removed WITH OIDS entirely.
Although the behaviour you are observing in the CLI appears consistent, it isn't a standard and cannot be depended on. If you are regularly needing to manually see the most recently added rows on a specific table you could add a timestamp field or some other sortable field and perhaps even wrap the query into a stored function .. I guess the approach depends on your particular use case.
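A sketch of the timestamp-column suggestion, wrapped into a stored function as mentioned above. The names my_table, created_at, my_table_created_at_idx and latest_rows are all hypothetical. The index is there so that "latest first" does not need to sort the whole table.

```sql
-- Hypothetical sortable column; existing rows all receive the time of this statement.
ALTER TABLE my_table
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();

-- Lets ORDER BY created_at DESC ... LIMIT read just the newest entries.
CREATE INDEX my_table_created_at_idx ON my_table (created_at DESC);

-- Hypothetical convenience function returning the n most recently inserted rows.
CREATE FUNCTION latest_rows(n integer) RETURNS SETOF my_table
LANGUAGE sql STABLE AS $$
    SELECT * FROM my_table ORDER BY created_at DESC LIMIT n;
$$;

SELECT * FROM latest_rows(20);
```

A plain select * from my_table still has no guaranteed order; the ordering only applies to queries that ask for it.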

Implications of using ADD COLUMN on large dataset

Docs for Redshift say:
ALTER TABLE locks the table for reads and writes until the operation completes.
My question is:
Say I have a table with 500 million rows and I want to add a column. This sounds like a heavy operation that could lock the table for a long time - yes? Or is it actually a quick operation since Redshift is a columnar db? Or it depends if column is nullable / has default value?
I find that adding (and dropping) columns is a very fast operation even on tables with many billions of rows, regardless of whether there is a default value or it's just NULL.
As you suggest, I believe this is a feature of it being a columnar database, so the rest of the table is undisturbed. It simply creates empty (or nearly empty) column blocks for the new column on each node.
I added an integer column with a default to a table of around 65M rows in Redshift recently and it took about a second to process. This was on a dw2.large (SSD type) single node cluster.
Just remember you can only add a column to the end (right) of the table; you have to use temporary tables etc. if you want to insert a column somewhere in the middle.
Personally, I have found that rebuilding the table works best.
I do it in the following way:
Create a new table N_OLD_TABLE
Define the datatype/compression encoding in the new table
Insert data: insert into N_OLD_TABLE(old_columns) select old_columns from OLD_TABLE
Rename OLD_TABLE to OLD_TABLE_BKP
Rename N_OLD_TABLE to OLD_TABLE
This is a much faster process. It doesn't block any table, and you always have a backup of the old table in case anything goes wrong.
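The steps above could look roughly like this. Everything except the table names from the steps is an assumption for illustration: the columns id, payload and new_col, their types, and the zstd encoding are hypothetical. Rows written to old_table after the copy starts are not carried over, so writers would need to be paused.

```sql
-- 1. New table with the extra column and the desired encodings (hypothetical columns).
CREATE TABLE n_old_table (
    id      BIGINT       ENCODE zstd,
    payload VARCHAR(256) ENCODE zstd,
    new_col INTEGER      DEFAULT 0 ENCODE zstd
);

-- 2. Copy the existing data; new_col is filled from its default.
INSERT INTO n_old_table (id, payload)
SELECT id, payload
FROM   old_table;

-- 3. Swap the tables, keeping the original as a backup.
ALTER TABLE old_table   RENAME TO old_table_bkp;
ALTER TABLE n_old_table RENAME TO old_table;
```

Once the new table has been checked, old_table_bkp can be dropped to free the space.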

ADD COLUMN with DEFAULT value to a huge table

I have a PostgreSQL DB and a table with almost a billion rows.
When I try to add a new column with a default value:
ALTER TABLE big_table
ADD COLUMN some_flag integer NOT NULL DEFAULT 0;
The transaction goes on for 30+ min, and the DB logs start to show warnings.
Any way to optimize the query ?
Besides doing it in batches (which will still take a while):
You could dump the table as COPY statements and write a script to edit the contents of the COPY statements to insert another column (COPY can be CSV IIRC).
Then you just reload your altered COPY dump, and it should in theory be faster than the ALTER because COPY can skip WAL logging under certain conditions (for example with wal_level = minimal and the table created or truncated in the same transaction).
The other option is to turn off fsync while you run the command. This risks data corruption if the server crashes in the meantime, so remember to turn it back on.
You can also do both of the above in batches.
Starting from PostgreSQL 11 this behaviour will change.
Waiting for PostgreSQL 11 – Fast ALTER TABLE ADD COLUMN with a non-NULL default:
So, for the longest time, when you did:
alter table x add column z text;
it was virtually instantaneous. Get a lock on table, add information about new column to system catalogs, and it's done.
But when you tried:
alter table x add column z text default 'some value';
then it took long time. How long it did depend on size of table.
This was because postgresql was actually rewriting the whole table, adding the column to each row, and filling it with default value.
"What happens if you want to set the column to NOT NULL also? Are we back to the slow version in that case or does this handle that as well?"
not null doesn’t change anything. it is a constraint for new rows. so adding a column with “not null default ‘xxx'” will be fast.
I'd consider creating the column without the default and manually updating the rows in batches with intermittent commits to apply the default.
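A rough sketch of that batched approach, using big_table and some_flag from the question. The id primary key and the batch size of 10000 are assumptions; the UPDATE would be repeated from a script, one transaction per batch, until it reports 0 rows.

```sql
-- Catalog-only change: nullable column without a default.
ALTER TABLE big_table ADD COLUMN some_flag integer;

-- Applies to rows inserted from now on; existing rows stay NULL.
ALTER TABLE big_table ALTER COLUMN some_flag SET DEFAULT 0;

-- One batch: fill a limited number of the remaining rows, then commit.
-- Assumes a primary key column named id (hypothetical).
UPDATE big_table
SET    some_flag = 0
WHERE  id IN (
    SELECT id
    FROM   big_table
    WHERE  some_flag IS NULL
    LIMIT  10000
);

-- After the last batch: enforce the constraint.
-- This scans the table to verify that no NULLs remain.
ALTER TABLE big_table ALTER COLUMN some_flag SET NOT NULL;
```

Each batch holds its row locks only briefly, at the cost of bloat from the updated rows, so running VACUUM between batches may be worthwhile.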