I am using the pg_trgm extension for fuzzy search. The default threshold is 0.3, as shown by:
# select show_limit();
show_limit
------------
0.3
(1 row)
I can change it with:
# select set_limit(0.1);
set_limit
-----------
0.1
(1 row)
# select show_limit();
show_limit
------------
0.1
(1 row)
But when I restart my session, the threshold is reset to its default value:
# \q
$ psql -Upostgres my_db
psql (9.3.5)
Type "help" for help.
# select show_limit();
show_limit
------------
0.3
(1 row)
I want to execute set_limit(0.1) every time I start a session. In other words, I want to make 0.1 the default threshold for the pg_trgm extension. How do I do that?
This has been asked before:
Set default limit for pg_trgm
The initial setting is hard coded in the source. One could hack the source and recompile the extension.
To address your comment:
You could put a command in your psqlrc or ~/.psqlrc file. A plain and simple SELECT command on a separate line:
SELECT set_limit(0.1);
Be aware that the additional module is installed per database, while psql can connect to any database cluster and any database within that cluster. It will cause an error message when connecting to any database where pg_trgm is not installed. Nothing bad will happen, though.
On the other hand, connecting with any other client will not set the limit, which may be a bit of a trap.
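If the extra row of output on every connection bothers you, one possible variation (a sketch; \gset is available from psql 9.3 and stores the result in a psql variable instead of printing it, and trgm_limit is just an arbitrary variable name):
SELECT set_limit(0.1) AS trgm_limit \gset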
pg_trgm should really provide a config setting ...
Using the Windows PostgreSQL terminal to connect to the same database, we are getting responses in different languages on two different machines (one in Chinese, one in English). I've not been able to work out what is different about the setup of these two machines in order to fix it. Of specific note, several questions (here and here) seem to indicate that the LC_MESSAGES setting is what needs to be changed, but both machines are set to en_GB.UTF-8.
Machine 1:
show LC_MESSAGES;
lc_messages
-------------
en_GB.UTF-8
(1 row)
Machine 2:
show LC_MESSAGES;
lc_messages
-------------
en_GB.UTF-8
(1 行记录)
There is clearly something else involved in deciding what language messages from Postgres are returned in, but I've been unable to figure out what.
Update: While Laurenz Albe's answer explains why what I've tried so far has failed, I've still been unable to find any documentation or advice on how the message language in psql is set, or how to fix it.
Set the LANG environment variable; for example, in a batch file:
@echo off
set PGDATABASE=my_database
set PGUSER=my_user
set PGPASSWORD=my_password
set LANG=C
psql -f %1
lc_messages determines the language for messages from the server.
The (1 行记录) you see is written by psql, and determined by the locale environment of psql.
To change this, you'd have to change the locale environment of your Windows session. Not sure how that is done.
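If the batch-file approach above works for you, the same idea should carry over to an interactive cmd session: set the variable before launching psql (a sketch; LANG=C gives untranslated English messages, and my_user / my_database are placeholders):
set LANG=C
psql -U my_user my_database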
I have a query that returns about 14M rows (I was not aware of this). When I used psql to run the query, my Fedora machine froze; even after the query finished, I could not use Fedora anymore and had to restart the machine. Redirecting standard output to a file did not help, Fedora froze as well.
So how should I handle large resultsets with psql?
psql accumulates the complete result set in client memory by default. This is the usual behavior for all libpq-based Postgres applications and drivers. The solution is a cursor, which fetches only N rows from the server at a time. psql can use cursors too: set the FETCH_COUNT variable and it will retrieve the result in batches of FETCH_COUNT rows.
postgres=# \set FETCH_COUNT 1000
postgres=# select * from generate_series(1,100000); -- big query
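To make this the default for interactive sessions, the same variable can also be passed on the command line or put into ~/.psqlrc (a sketch; 1000 is just an arbitrary batch size and my_db is a placeholder):
$ psql -v FETCH_COUNT=1000 my_db
or in ~/.psqlrc:
\set FETCH_COUNT 1000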
I want to set a default statement_timeout for my access to a Postgres database. After configuring my environment variables, psql now logs me straight into my preferred database. However, while I'm exploring several of the tables in it, I'd like to have a statement timeout of around a minute. This can be done simply by typing SET statement_timeout TO '1min'; at the beginning of each session, but it is obnoxious to type every time. I don't have access to the server configuration, nor would I want to change it. Ideally I could do something to the effect of alias psql='psql -c "SET statement_timeout TO '1min';"', except the -c flag of psql doesn't allow interactive input. Are there any nice solutions to this problem, or am I doomed to set the timeout manually for each interactive session?
You could use your .psqlrc file (if you don't have one in your home directory, create it; on Windows the file is %APPDATA%\postgresql\psqlrc.conf instead) and add the following command:
set statement_timeout to 60000; commit;
That setting is in milliseconds, so it sets the timeout to 1 minute. .psqlrc isn't read for -c or -X invocations of psql, so this applies only to your interactive sessions.
You can then execute the following in psql to verify that the configuration has taken effect:
show statement_timeout;
Postgres allows you to set configuration parameters such as statement_timeout on a per-role (user) level.
ALTER ROLE <your-username> SET statement_timeout = '60s';
This change will apply to all new sessions for that user, starting on the next login.
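A couple of related sketches, in case they are useful (role and database names are placeholders): the setting can also be scoped to a single database for that role, and removed again with RESET:
ALTER ROLE my_user IN DATABASE my_db SET statement_timeout = '60s';
ALTER ROLE my_user RESET statement_timeout;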
Source: Postgres docs
Postgres 9.3 introduces a data checksum feature which can detect corruption in pages. Is there a way to query the database to determine if this is on?
Being hosted on a PaaS system, I don't have access to the actual server to check any configuration settings there. I also only have access to our database, not the main postgres database. Is there a way to determine whether this is on from a psql console only?
show data_checksums;
data_checksums
----------------
off
http://www.postgresql.org/docs/current/static/runtime-config-preset.html
You can use pg_controldata to see whether data checksums are enabled for your PostgreSQL cluster.
If the reported checksum version is 0, the feature is disabled.
The data_checksums parameter was added in PostgreSQL 9.3.4; if your PostgreSQL version is older than that, you cannot select this GUC parameter and must check the control file instead.
pg93#db-172-16-3-150-> pg_controldata |grep checksum
Data page checksum version: 0
From 9.4 on, you can try the following query:
select * from pg_settings where name ~ 'checksum';
https://paquier.xyz/postgresql-2/postgres-9-4-feature-highlight-data-checksum-switch-as-a-guc-parameter/
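If you prefer a scalar value rather than a pg_settings row, the same parameter can be read with current_setting (a sketch; again this only works on versions where the GUC exists):
select current_setting('data_checksums');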
This is specifically about maintaining confidence, when using various replication solutions, that you'd be able to fail over to the other server without data loss; or, in a master-master situation, that you'd know within a reasonable amount of time if one of the databases has fallen out of sync.
Are there any tools out there for this, or do people generally depend on the replication system itself to warn over inconsistencies? I'm currently most familiar with postgresql WAL shipping in a master-standby setup, but am considering a master-master setup with something like PgPool. However, as that solution is a little less directly tied with PostgreSQL itself (my basic understanding is that it provides the connection an app would use, thus intercepting the various SQL statements, and would then send them on to whatever servers were in its pool), it got me thinking more about actually verifying data consistency.
Specific requirements:
I'm not talking about just table structure. I'd want to know that actual record data is the same, so that I'd know if records were corrupted or missed (in which case, I would re-initialize the bad database with a recent backup + WAL files before bringing it back into the pool)
Databases are in the order of 30-50 GB. I doubt that raw SELECT queries would work very well.
I don't see the need for real-time checking (though it would, of course, be nice). Hourly or even daily would be better than nothing.
Block-level checking wouldn't work. It would be two databases with independent storage.
Or is this type of verification simply not realistic?
You can check the current WAL locations on both machines.
If they show the same value, your underlying databases are consistent with each other:
$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (do it on primary host)
pg_current_xlog_location
--------------------------
0/2000000
(1 row)
$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (do it on standby host)
pg_last_xlog_receive_location
-------------------------------
0/2000000
(1 row)
$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (do it on standby host)
pg_last_xlog_replay_location
------------------------------
0/2000000
(1 row)
You can also check this with the help of the WAL sender and WAL receiver processes:
[do it on primary] $ ps -ef | grep sender
postgres 6879 6831 0 10:31 ? 00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000
[ do it on standby] $ ps -ef | grep receiver
postgres 6878 6872 1 10:31 ? 00:00:01 postgres: wal receiver process streaming 0/2000000
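The same information is also exposed in SQL on the primary via the pg_stat_replication view (9.1 and later); a sketch, using the 9.x column names (they were renamed to *_lsn in version 10):
select client_addr, state, sent_location, replay_location from pg_stat_replication;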
If you are looking for the whole table you should be able to do something like this (assuming a table that quite easily fits in RAM):
SELECT md5(array_to_string(array_agg(mytable ORDER BY id), ' '))
FROM mytable;
That will give you a hash of the tuple representation of the table.
Note that you could break this down by ranges, etc. Depending on the type of replication you could even break it down by page range (for streaming replication).
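For example, a chunked variant might look like this (a sketch, assuming an integer id column; string_agg with ORDER BY needs 9.0 or later). Comparing the per-chunk hashes on both servers narrows any mismatch down to a range of ids:
SELECT id / 100000 AS chunk,
       md5(string_agg(mytable::text, ' ' ORDER BY id)) AS chunk_hash
FROM   mytable
GROUP  BY 1
ORDER  BY 1;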