On a development server I'd like to remove unused databases. To do that, I need to know whether a database is still being used by anyone.
Is there a way to get the last access or modification date of a given database, schema or table?
You can do it by checking the last modification time of the table's file.
In PostgreSQL, every table corresponds to one or more OS files, like this:
select relfilenode from pg_class where relname = 'test';
The relfilenode is the file name of the table "test". You can then find that file in the database's directory.
In my test environment:
cd /data/pgdata/base/18976
ls -l -t | head
The last command lists all files ordered by last modification time.
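If you don't already know the directory (the 18976 above is the current database's OID under base/ in the data directory), you can look both up in SQL; a small sketch:

-- OID of the current database (the directory name under base/)
select oid from pg_database where datname = current_database();
-- location of the data directory itself
show data_directory;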
There is no built-in way to do this - and all the approaches that check the file mtime described in other answers here are wrong. The only reliable option is to add triggers to every table that record a change to a single change-history table, which is horribly inefficient and can't be done retroactively.
If you only care about "database used" vs "database not used" you can potentially collect this information from the CSV-format database log files. Detecting "modified" vs "not modified" is a lot harder; consider SELECT writes_to_some_table(...).
If you don't need to detect old activity, you can use pg_stat_database, which records activity since the last stats reset. e.g.:
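For reference, the record below is one row of that view, shown with psql's expanded display (\x on):

select * from pg_stat_database;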
-[ RECORD 6 ]--+------------------------------
datid | 51160
datname | regress
numbackends | 0
xact_commit | 54224
xact_rollback | 157
blks_read | 2591
blks_hit | 1592931
tup_returned | 26658392
tup_fetched | 327541
tup_inserted | 1664
tup_updated | 1371
tup_deleted | 246
conflicts | 0
temp_files | 0
temp_bytes | 0
deadlocks | 0
blk_read_time | 0
blk_write_time | 0
stats_reset | 2013-12-13 18:51:26.650521+08
So I can see that there has been activity on this DB since the last stats reset. However, I don't know anything about what happened before the stats reset, so if I had a DB showing zero activity since a stats reset half an hour ago, I'd know nothing useful.
PostgreSQL 9.5 lets us track the time of the last modifying commit.
Step 1: check whether commit timestamp tracking is on or off using the following query:
show track_commit_timestamp;
Step 2: if step 1 returns "on", skip to step 3; otherwise modify postgresql.conf:
cd /etc/postgresql/9.5/main/
vi postgresql.conf
Change
track_commit_timestamp = off
to
track_commit_timestamp = on
Restart PostgreSQL (or the system).
Repeat step 1.
Step 3: use one of the following queries to track the last commit:
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME;
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME where COLUMN_NAME=VALUE;
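To reduce that to a single "last modified" timestamp for the whole table, you can aggregate (a sketch; the table name is a placeholder as above):

SELECT max(pg_xact_commit_timestamp(xmin)) AS last_modified FROM YOUR_TABLE_NAME;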
My way to get the modification date of my tables:
Python Function
CREATE OR REPLACE FUNCTION py_get_file_modification_timestamp(afilename text)
RETURNS timestamp without time zone AS
$BODY$
import os
import datetime
return datetime.datetime.fromtimestamp(os.path.getmtime(afilename))
$BODY$
LANGUAGE plpythonu VOLATILE
COST 100;
SQL Query
SELECT
schemaname,
tablename,
py_get_file_modification_timestamp('*postgresql_data_dir*/*tablespace_folder*/'||relfilenode)
FROM
pg_class
INNER JOIN
pg_catalog.pg_tables ON (tablename = relname)
WHERE
schemaname = 'public'
I'm not sure if things like vacuum can mess with this approach, but in my tests it's a pretty accurate way to find tables that are no longer used, at least for INSERT/UPDATE operations.
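A variant that builds the path instead of hard-coding it (a sketch; it assumes the tables live in the default tablespace, i.e. under base/<database oid> in the data directory):

SELECT
    schemaname,
    tablename,
    py_get_file_modification_timestamp(
        current_setting('data_directory') || '/base/' ||
        (SELECT oid FROM pg_database WHERE datname = current_database()) || '/' || relfilenode)
FROM
    pg_class
    INNER JOIN pg_catalog.pg_tables ON (tablename = relname)
WHERE
    schemaname = 'public';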
I guess you should activate some log options. You can get information about logging in PostgreSQL here.
Related
I'm using pgAdmin 4.23, PostgreSQL 12.3 for Windows
Looks like all functions are dumped into one "folder". I installed the uuid and tablefunc extensions and they get tossed in with my own user-defined functions. At least the 10 uuid ones are all prefixed with "uuid_". The 11 tablefunc ones all start with "connectby", "crosstab*" or "normal_rand".
I did prefix my own functions, so those are at least grouped together. But as this thing grows and I add extensions, I'm concerned that maintenance will become more difficult. Is there some sort of sub-foldering option I am missing, or is a naming convention the normal approach for organization? It looks like stored procedures would work the same way.
Would also be nice to be able to filter the Functions based on the names. I see the Search Objects popup, but it isn't as useful as a filter.
To filter functions based on their names you can use this function:
CREATE OR REPLACE FUNCTION public.find_function(fname text DEFAULT NULL::text)
RETURNS TABLE(routine_name text, routine_schema text, return_type text)
LANGUAGE sql
AS $function$
select routine_name, routine_schema, data_type
from information_schema.routines
where specific_schema not in ('pg_catalog', 'information_schema')
and case when fname is null then true else routine_name ~* fname end
order by routine_name;
$function$;
Here is an example - find all functions that have "test" in their name:
select * from find_function('test');
+-----------------------------+----------------+-------------+
| routine_name                | routine_schema | return_type |
+-----------------------------+----------------+-------------+
| clear_web_tests             | datavato       | void        |
| etl_generic_tests           | webaccess      | text        |
| fill_web_tests              | datavato       | void        |
| pan_arguments_test          | helpers        | jsonb       |
| test_bizday                 | public         | boolean     |
| test_checkdigits            | public         | boolean     |
| test_jasper_dynamic_columns | scratch        | record      |
+-----------------------------+----------------+-------------+
I'm adding this here in case it helps for future searches...
Per @horse_with_no_name's suggestion above, and the official documentation, I went with a separate schema for my third-party stuff. Here is a bit from a conversation with the other DB people in our company:
I didn't want third party extensions (like uuid-ossp for Guids)
installed in my public schema since that is where we keep all of our
user defined functions specific to that database. When I originally
installed the extension I put it into the public schema, then just
transferred it to a schema named extfunc. Then we have a table named
usr that references extfunc.uuid_generate_v4() in one of the column
constraints. Everything works as expected.
However, when I try to backup our Dev db and restore to QA, the
pg_dump and pg_restore tasks do not handle it properly. The restore
would error and not create the usr table. The extension was not being
properly restored to extfunc, which prevented the usr table from being
created.
The solution is to create the extension in the desired schema from the
start. Do not create it and try and move it to the destination
schema. Backup/restore now work as expected.
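For reference, a minimal sketch of creating an extension in the desired schema from the start (using the extfunc schema and the uuid-ossp extension from the quote above):

CREATE SCHEMA IF NOT EXISTS extfunc;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp" WITH SCHEMA extfunc;
-- sanity check
SELECT extfunc.uuid_generate_v4();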
I am using Postgres 9.4.5 and I detected corruption in one of my tables. I noticed this when running queries on a specific table caused the entire database to go into recovery mode. The symptoms lined up with those found in this article:
https://www.endpoint.com/blog/2010/06/01/tracking-down-database-corruption-with
I tried following the steps to zero out the corrupt block, and they pointed me at a ctid in block 507578.
database=# \set FETCH_COUNT 1
database=# \pset pager off
Pager usage is off.
database=# SELECT ctid, left(coded_element_key::text, 20) FROM coded_element WHERE ctid >= '(507577,1)';
ctid | left
------------+----------
(507577,1) | 30010491
(507577,2) | 30010507
(507577,3) | 30010552
(507577,4) | 30010556
(507577,5) | 30010559
(507577,6) | 30010564
(507577,7) | 30010565
(507577,8) | 30010625
...
...
...
(507578,26) | 0A1717281.0002L270&.
(507578,27) | L270&.*)0000.0000000
(507578,28) | 30011452
(507578,29) | -L0092917\x10)*(0117001
(507578,30) | 0.00003840\x10)*)300114
ERROR: invalid memory alloc request size 1908473862
The problem is that when I went to my /data/base directory and found the corresponding file for my table, the file was only 1073741824 bytes. With a block size of 8192 bytes this only gives me a block count of 131072, way under the 507578 value where the supposed corruption is. Is this the correct way to determine the block offset or is there a different way?
PostgreSQL stores the table data in files of 1GB in size.
Assuming that the result of
SELECT relfilenode
FROM pg_class
WHERE relname = 'coded_element';
is 12345, those files would be called 12345, 12345.1, 12345.2 and so on.
Since each of these 1GB segments contains 131072 blocks, block 507578 is actually block 114362 in 12345.3.
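The arithmetic, as a quick SQL sketch (segment number and in-segment block computed from the relation-wide block number):

SELECT 507578 / 131072 AS segment_no,        -- 3, i.e. file 12345.3
       507578 % 131072 AS block_in_segment;  -- 114362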
The data corruption is either in that block or the following one.
At this point, make sure you have a backup of the complete data directory.
To zero out block 507578, you can use
dd if=/dev/zero of=12345.3 bs=8192 seek=114362 count=1 conv=notrunc,nocreat,fsync
If that doesn't do the trick, try the next block.
To salvage data from the block before zeroing it, you can use the pageinspect extension.
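A hedged sketch of inspecting the block with pageinspect before zeroing it (same table and block number as above; interpreting a corrupt page may itself fail):

CREATE EXTENSION IF NOT EXISTS pageinspect;
-- page header of block 507578 of the table
SELECT * FROM page_header(get_raw_page('coded_element', 507578));
-- line pointers and tuple headers on that page
SELECT * FROM heap_page_items(get_raw_page('coded_element', 507578));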
I'm trying to create indexes with a "create index concurrently" statement through Hikari on PostgreSQL 9.6.
The create statement is blocked by another transaction that is working on another table and whose state is IIT (idle in transaction).
The code creates the indexes dynamically through a Hikari connection pool.
There is only one connection pool for all the actions, such as select/create index and so on.
The two SQL statements run in the same thread, on different DB connections, in async mode.
When using "create index" instead of "create index concurrently", everything is OK.
PostgreSQL shows that "create index concurrently on B" (active) is blocked by "select * from A" (idle in transaction).
I tried to reproduce the issue from the command line and everything worked fine: I opened two windows, executed "begin; select * from A;" in the first window, then executed "create index concurrently on B;" in the second window. The index was created as expected and nothing blocked (I checked that the first session was in the "IIT" state).
To use fetch_size on a cursor, the select statement disables autocommit on the connection it gets from the pool, and Hikari itself sets the value back to the pool's global setting afterwards; the default pool setting is autocommit=true.
The two statements work on different tables; there is no relation between the two tables.
When the "IIT" transaction was cancelled, the "create" statement continued and worked as expected.
wait_event_type | pid | state | query
Lock | 25707 | active | CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS "idx_tr-parameters__id_json" ON "tr-parameters" ((info->'_id') ASC)
| 25701 | idle in transaction | SELECT t.info FROM "configuration-profiles" t
05-29 21:22:53.458 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "organizations" t HikariProxyConnection#379242839 wrapping org.postgresql.jdbc.PgConnection#645bae4d
05-29 21:22:53.529 [vert.x-worker-thread-11] DEBUG com.zaxxer.hikari.pool.PoolBase - hikari-cp-threads - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#645bae4d
05-29 21:22:53.533 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "configuration-profiles" t HikariProxyConnection#358392671 wrapping org.postgresql.jdbc.PgConnection#645bae4d
05-29 21:22:53.693 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "groups" t HikariProxyConnection#269112314 wrapping org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.701 [vert.x-worker-thread-11] DEBUG com.zaxxer.hikari.pool.PoolBase - hikari-cp-threads - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.701 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - SELECT t.info FROM "configuration-profiles" t WHERE COALESCE((t.info->'configurations'->'parameterValues')::jsonb ?? 'OUI_FilterList', false) = true HikariProxyConnection#1431456353 wrapping org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.704 [vert.x-worker-thread-11] DEBUG com.zaxxer.hikari.pool.PoolBase - hikari-cp-threads - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#63822471
05-29 21:22:53.712 [vert.x-worker-thread-11] DEBUG com.calix.sxa.VertxPGVertice - CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_tr-parameters__id_json" ON "tr-parameters" ((info->>'timestamp') ASC) HikariProxyConnection#454316525 wrapping org.postgresql.jdbc.PgConnection#63822471
I don't know why the block happens, since the tables are completely different, and when testing manually with two command-line windows everything works fine.
How can I fix this issue? Is there any workaround?
Thanks for your kind help.
As mentioned by @jjanes, the build is blocked by other transactions (transactions on the same table, or any transaction holding a snapshot) during the two scans performed when building an index concurrently.
The official documentation also mentions this: https://www.postgresql.org/docs/9.1/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
After the second scan, the index build must wait for any transactions
that have a snapshot (see Chapter 13) predating the second scan to terminate
In my case,
wait_event_type | pid | state | backend_xid | backend_xmin | query
----------------+-------+---------------------+-------------+--------------+--------------------------------------------------
| 5226 | idle in transaction | | 7973432 | select * from "configuration-profiles"
The backend_xmin of the IIT is 7973432, so the "create index concurrently" is blocked by the IIT, which holds a snapshot.
BTW, when using the command line with isolation level "read committed", the "create index concurrently" is not blocked, but with the Java code the "create" action is blocked even at the same isolation level:
wait_event_type | pid | state | backend_xid | backend_xmin | query
----------------+-------+---------------------+-------------+--------------+--------------------------------------------------
| 5226 | idle in transaction | | 7973432 | select * from "configuration-profiles"
| 5210 | idle in transaction | | | select * from "configuration-profiles";
| 5455 | idle in transaction | | 7973432 | declare cur cursor for select * from "configuration-profiles";
As shown above:
Using "select * from "configuration-profiles";" from the command line, there is no backend_xmin, since no cursor is opened; all the records are returned as soon as the statement executes.
Using "declare cur cursor for select * from "configuration-profiles";" from the command line, backend_xmin has a value, since the cursor is open and waiting for fetches.
Using "select * from "configuration-profiles"" through Java, backend_xmin also has a value, since the library also uses a cursor.
PostgreSQL has no way of knowing that that other connection will not want to use the table the index is being built on at some point in the future of its snapshot. Just because it hasn't used the table yet doesn't mean it never will. The way to know that for sure is to wait for that transaction (or snapshot) to finish, which is what it does.
The 2 SQLs are running in the same thread with different db connection with async mode
Why is the other connection IIT? What is it waiting for? (It is waiting in your code, not in the database). Since it is async, it shouldn't be waiting on the CIC. Is it just waiting for you to issue a COMMIT? Since you turned off autocommit, it is your responsibility to issue COMMITs at the appropriate points. If you are using higher isolation levels, even SELECT only statements need to be committed.
I tried to reproduce the issue through command line, all the things worked well, just open 2 window, execute "begin; select * from A;" in first window, and tried to execute "create index concurrently on B;" in second window,
You can reproduce it by changing the first one to begin isolation level repeatable read; select * from A;
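Spelled out as a two-session sketch (the index and column names are placeholders):

-- session 1: take a snapshot and leave the transaction open
begin isolation level repeatable read;
select * from A;

-- session 2: waits until session 1 commits or rolls back
create index concurrently some_idx on B (some_column);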
Workarounds:
Don't leave things hanging around in an open transaction
Don't use higher isolation levels than you need to.
Don't use CIC.
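To spot the offending sessions, the same pg_stat_activity columns used above work well; a quick sketch:

SELECT pid, state, backend_xid, backend_xmin, query
FROM pg_stat_activity
WHERE state = 'idle in transaction';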
In the official documentation, both functions have the same description:
Current date and time (start of current transaction)
Is there a difference between the two functions, and if not, why do both exist? Thanks.
now and transaction_timestamp are equivalent to the SQL standard current_timestamp. All report the start time of the transaction.
In terms of transactions, there are two timestamps to think of: the start of the transaction and the time each individual statement is executed. Conceptually, there is also the end time of the transaction, which you can get by running select statement_timestamp() as trx_end_timestamp at the very end, just before the commit / rollback.
If you run the following in psql [copy the whole line & paste into psql shell]
BEGIN; SELECT PG_SLEEP(5); SELECT NOW(), CURRENT_TIMESTAMP, TRANSACTION_TIMESTAMP(), STATEMENT_TIMESTAMP(); COMMIT;
I got this output:
now | current_timestamp | transaction_timestamp | statement_timestamp
-------------------------------+-------------------------------+-------------------------------+-------------------------------
2019-04-23 11:15:18.676855-04 | 2019-04-23 11:15:18.676855-04 | 2019-04-23 11:15:18.676855-04 | 2019-04-23 11:15:23.713275-04
(1 row)
You can see clearly that NOW, CURRENT_TIMESTAMP and TRANSACTION_TIMESTAMP are equivalent, and that STATEMENT_TIMESTAMP is offset by 5 seconds because we slept for 5 seconds.
BTW, CURRENT_TIMESTAMP is in the SQL standard. The others are PostgreSQL-specific, though other databases may also implement them.
The answer is in the doc you mention:
now() is a traditional PostgreSQL equivalent to
transaction_timestamp().
So they are the same; both exist for historical / backward-compatibility reasons, and some could argue for the simplicity of the function name.
Background Information:
I need to auto generate a bunch of records in a table. The only piece of information I have is a start range and an end range.
Let's say my table looks like this:
id
widgetnumber
The logic needs to be contained within a .sql file.
I'm running PostgreSQL.
Code
This is what I have so far... as a test... and it seems to be working:
DO $$
DECLARE widgetnum text;
BEGIN
SELECT 5 INTO widgetnum;
INSERT INTO widgets VALUES(DEFAULT, widgetnum);
END $$;
And then to run it, I do this from a command line on my database server:
testbox:/tmp# psql -U myuser -d widgets -f addwidgets.sql
DO
Questions
How would I modify this code to loop through a range of widget numbers and insert them all?
For example, I would be provided with a start range and an end range (100 to 150, let's say).
Can you point me to a good online resource to learn the syntax i should be using?
Thanks.
How would I modify this code to loop through a range of widget numbers and insert them all?
You can use generate_series() for that.
insert into widgets (widgetnumber)
select i
from generate_series(100, 150) as t(i);
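If you'd rather keep the DO block from your question and loop explicitly, here is a PL/pgSQL sketch (the range is hard-coded as in your example):

DO $$
DECLARE
    widgetnum int;
BEGIN
    FOR widgetnum IN 100..150 LOOP
        INSERT INTO widgets (widgetnumber) VALUES (widgetnum);
    END LOOP;
END $$;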
Can you point me to a good online resource to learn the syntax i should be using?
https://www.postgresql.org/docs/current/static/index.html
dvdrental=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
id | integer | | |
name | character varying(250) | | |
dvdrental=# begin;
BEGIN
dvdrental=# insert into test(id,name) select generate_series(1,100000),'Kishore';
INSERT 0 100000