Why is pgsql sometimes not listening for the first few seconds after start even though "service postgres status" returns OK?

I have a web app that uses postgresql 9.0 with some plperl functions that call custom libraries of mine. So, when I want to start fresh as if just released, my build process for my development area does basically this:
dumps data and roles from production
drops dev data and roles
restores production data and roles onto dev
restarts postgresql so that any cached versions of my custom libraries are flushed and newly-changed ones will be picked up
applies my dev delta
vacuums
Since switching my app's stack from win32 to CentOS, I now sometimes get an error when my build script tries to apply the delta. It seems to happen if and only if I haven't run this build process in "a while" (perhaps at least a day):
psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
Specifically, what's failing to execute at the shell level is this:
psql --host=$host -U $superuser -p $port -d $db -f "$delta_filename.sql"
If, immediately after seeing this error, I try to connect to the dev database with psql, I can do so with no trouble. Also, if I just re-run the build script, it works fine the second time, every time I've encountered this. Acceptable workaround, but is the underlying cause something to be concerned about?
So far in my attempts to debug this, I inserted a step just after the server restart (which of course reports OK shutdown, OK startup) whereby I check the results of service postgresql-dev status in a loop, waiting 2 seconds between tries if it fails. On my latest build script run, said loop succeeds on the first try--status returns "is running"--but then applying the delta still fails with the above connection error. Again, second try succeeds, as does connecting via psql outside the script just after it fails.
My next debug attempt was to sleep for 5 seconds before the first status check and see what happens. So far this seems to solve the problem.
So why is pgsql not listening on the socket for up to 5 seconds after it starts [OK] and reports status "running", unless it has "recently" been restarted?

The status check only checks whether the process is running. It doesn't check whether you can connect. There can be any amount of time between starting the process and the process being ready to accept connections. It's usually a few seconds, but it could be longer. If you need to cope with this, you need to script it so that it checks whether it is possible to connect before proceeding. You could argue that the CentOS package should do this for you, but it doesn't.
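A minimal sketch of such a check, written in Python (the language is my choice, not the poster's). It retries a plain socket connect until the connect succeeds or a deadline passes. The function names and timeouts are hypothetical. A successful connect only proves that the postmaster is listening. The server may still refuse sessions for a moment (for example with "the database system is starting up"), so a stricter script would retry the real psql command instead. On 9.3 and later, pg_isready performs this kind of check, and pg_ctl -w start waits for startup before returning.

```python
import socket
import time


def can_connect(address, family=socket.AF_INET, timeout=1.0):
    """Return True if a listener accepts a connection at `address`."""
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(address)
            return True
        except OSError:
            # Covers a missing socket file, connection refused, and timeouts.
            return False


def wait_until_listening(address, family=socket.AF_INET,
                         total_timeout=30.0, interval=0.5):
    """Poll `address` until it accepts connections or `total_timeout` passes."""
    deadline = time.monotonic() + total_timeout
    while True:
        if can_connect(address, family):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the build script, call wait_until_listening("/tmp/.s.PGSQL.5432", socket.AF_UNIX) (the socket path from the error message) after the restart. Apply the delta only if it returns True.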
Actually, I think in your case there is no reason to do a full restart. Unless you are loading libraries with shared_preload_libraries, it is sufficient to restart the connection to pick up new libraries.


pgagent - not running jobs - pgpass file is correct

I have pgAgent installed on Debian, along with PostgreSQL 9.4.
I have checked the .pgpass file, as this seems to be the most common reason a job doesn't run.
It has entries with host, port 5432, database = *, username = postgres, password = xxxx
for both the local and the remote host. The database I'm trying to set a job for is on the remote host.
I made sure it was enabled. It's just a simple INSERT script that should repeat every 5 minutes.
No errors are being triggered that I can find. Any ideas of what would cause the job not to run at all - even when selecting 'run now'?
Check the postgres database, pgAgent catalog, table pga_jobsteplog.
I don't know about Linux, but I had a similar problem on Windows: the job wouldn't run and no error was raised, even after "Run now". The only clue I found was in the job's Statistics tab. It showed a huge number of runs, and every run had status F (failed).
The failure happened because pgAgent couldn't connect to the main PostgreSQL database.
The pgAgent service wasn't running at all (you can see this under Services in Task Manager on Windows).
Forcing the service to start produced a failure, which can be viewed in the Windows Event Viewer.
To solve this, first try adding the pgpass file's location to the environment variables if the installer didn't do it (libpq reads PGPASSFILE). If that doesn't work, here is what I did: I uninstalled Postgres, pgAgent and pgAdmin, deleted all their folders and temp files, and removed the registry entries and environment variables they had created. After reinstalling, it normally works :)
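Both the question and this answer come down to whether libpq actually reads the password file. The small Python sketch below imitates how libpq picks a password from a .pgpass file. It is an illustration, not pgAgent's or libpq's code, and the function names are hypothetical. The rules it follows:

- Each line is hostname:port:database:username:password.
- An unescaped * in one of the first four fields matches anything.
- \: and \\ are escapes.
- The first matching line wins.
- On Unix, the file is ignored if group or others have any access; chmod 0600 fixes that.

It is simplified and skips other libpq details. The file that counts is presumably the one belonging to the OS user pgAgent runs as, which is worth checking on Debian.

```python
import os
import stat


def _split_pgpass_line(line):
    """Split on unescaped ':' and undo backslash escapes.

    Returns (value, is_wildcard) pairs; only an unescaped lone '*' is a wildcard.
    """
    fields, value, raw = [], [], []
    escaped = False
    for ch in line:
        if escaped:
            value.append(ch)
            raw.append("\\" + ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":
            fields.append(("".join(value), "".join(raw) == "*"))
            value, raw = [], []
        else:
            value.append(ch)
            raw.append(ch)
    fields.append(("".join(value), "".join(raw) == "*"))
    return fields


def find_password(path, host, port, database, user):
    """Return the password from the first matching line, or None."""
    if os.name == "posix":
        mode = os.stat(path).st_mode
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            return None  # libpq warns and ignores a group/world-accessible file
    wanted = [host, str(port), database, user]
    with open(path, encoding="utf-8") as fh:
        for raw_line in fh:
            fields = _split_pgpass_line(raw_line.rstrip("\r\n"))
            if len(fields) < 5:
                continue  # blank or malformed line
            if all(is_wild or value == want
                   for (value, is_wild), want in zip(fields[:4], wanted)):
                return fields[4][0]
    return None
```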

Is (sudo) service postgresql restart a clean shutdown?

I know database indexes can become corrupted if the server crashes. If I do:
sudo service postgresql restart
can that cause the same kind of corruption as a server crash?
That depends on the system, I believe. Look at the init script to check the actual command it issues. For example, in one such script, restart is equivalent to stop followed by start, and stop runs killproc postmaster and removes the PID file. According to its man page, killproc sends SIGTERM unless another signal is specified. From the PostgreSQL documentation:
SIGTERM
This is the Smart Shutdown mode. After receiving SIGTERM, the
server disallows new connections, but lets existing sessions end their
work normally. It shuts down only after all of the sessions terminate.
If the server is in online backup mode, it additionally waits until
online backup mode is no longer active. While backup mode is active,
new connections will still be allowed, but only to superusers (this
exception allows a superuser to connect to terminate online backup
mode). If the server is in recovery when a smart shutdown is
requested, recovery and streaming replication will be stopped only
after all regular sessions have terminated.
So in the case presented, indexes should survive, because a SIGTERM (smart) shutdown is a clean shutdown, not a crash. But you should definitely check your /etc/init.d/ script to be sure.
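The Python sketch below shows what such an init script does. It is a hypothetical helper, not any distribution's script.

- The first line of postmaster.pid in the data directory holds the postmaster's PID.
- Sending that PID SIGTERM requests the smart shutdown quoted above.
- The postmaster removes postmaster.pid once it has shut down cleanly.

pg_ctl stop -m smart does essentially the same. The data directory and timeout are assumptions. The sketch has to run as the user that owns the server (usually postgres).

```python
import os
import signal
import time


def read_postmaster_pid(data_dir):
    """Return the PID stored on the first line of postmaster.pid, or None."""
    path = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(path, encoding="ascii") as fh:
            return int(fh.readline().strip())
    except (FileNotFoundError, ValueError):
        return None


def smart_shutdown(data_dir):
    """Send SIGTERM (smart shutdown) to the postmaster and return its PID."""
    pid = read_postmaster_pid(data_dir)
    if pid is None:
        raise RuntimeError("no postmaster.pid; is the server running?")
    os.kill(pid, signal.SIGTERM)
    return pid


def wait_for_clean_exit(data_dir, timeout=60.0, interval=0.5):
    """Wait until postmaster.pid disappears, as pg_ctl stop does."""
    pid_file = os.path.join(data_dir, "postmaster.pid")
    deadline = time.monotonic() + timeout
    while os.path.exists(pid_file):
        if time.monotonic() >= deadline:
            return False  # sessions still open, or the server is stuck
        time.sleep(interval)
    return True
```

Under a smart shutdown, wait_for_clean_exit can return False simply because clients are still connected. That is expected behavior, not corruption.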

Postgres Restore taking ages (days)

I've been working on a backup / restore for a Postgres server for quite a while now. It's an Azure Windows Virtual Machine (Windows server 2012).
The database isn't that big (about 5 GB), but the restore takes (literally) days. I've tried several times with different settings to restore the database, but every time it took days to "finish". It never finished: I killed the process because I didn't see anything happening, which is why I'm running the job in verbose mode this time.
I've now been running the job (verbose one) for 5 days straight and still it isn't finished. It's inserting rows (or at least displaying the rows), but it's still running.
Currently I'm using this command:
pg_restore -Fc -v --jobs=2 --host=localhost [filename]
Jobs is set to 2 because it's a dual-core server. As I said, it's still very, very slow with different settings.
What is wrong - should I be "tuning" the database before the restore or what?
This is a test-server setup. When we're done with the test, the current data needs to be restored (again) to the new production server: we can't afford to wait days on end before the production environment comes online.
It's not writing errors to the logs or anything; it just keeps running and running and running...
So what am I doing wrong?

breakpoints in eclipse using postgresql

I am using Eclipse Helios to debug my code in PostgreSQL.
My aim is to understand how PostgreSQL uses join algorithms during a join query, so I started debugging nodenestloop.c, which is in the executor folder.
I set breakpoints in that file, but whenever I try to debug it, control goes to main.c and never comes back. How do I restrict control to that particular file (nodenestloop.c)?
These are the fields I set in the Debug Configurations of Eclipse Helios:
C/C++ Application - src/backend/postgres and
project - pgsql
I followed the steps given in the following link for running the program.
https://wiki.postgresql.org/wiki/Working_with_Eclipse#
I even unchecked the "Stop on startup at: main" option, but when I do that, the Step Into and Step Over buttons are not enabled and the following problem pops up:
Could not save master table to file '/home/ravi/workspace/.metadata/.plugins/org.eclipse.core.resources/.safetable/org.eclipse.core.resources'.
/home/ravi/workspace/.metadata/.plugins/org.eclipse.core.resources/.safetable/org.eclipse.core.resources (Permission denied)
So I started Eclipse using sudo, but this time the following error appeared in the Eclipse console:
"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromise. See the documentation for
more information on how to properly start the server.
Could anyone help me with this?
Thank you
Problem 1: User ID mismatch
Reading between the lines, it sounds like you're trying to debug a PostgreSQL instance that's running as the postgres user, or a different user ID to your own anyway. Hence your attempt to use sudo.
That's painful, especially when using an IDE like Eclipse. With plain gdb you can just sudo the gdb command to the desired uid, e.g. sudo -u postgres gdb -p 12345 to attach to pid 12345 running as user postgres. This will not work with Eclipse. In fact, running Eclipse with sudo has probably left your workspace with messed-up file permissions; run:
sudo chown -R ravi /home/ravi/workspace/
to fix file ownership.
If you want to debug processes under other user IDs with Eclipse, you'll need to figure out how to make Eclipse run gdb with sudo. Do not just run all of Eclipse with sudo.
Problem 2: Trying to run PostgreSQL under the control of Eclipse
This:
"root" execution of the PostgreSQL server is not permitted. The server must be started under an unprivileged user ID to prevent possible system security compromise. See the documentation for more information on how to properly start the server.
suggests that you're also attempting to let Eclipse start postgres directly. That's very useful if you're trying to debug the postmaster. But since you're asking about join execution, you clearly want to debug a particular backend. Launching the postmaster under Eclipse is useless for that, because you'll be attached to the wrong process.
I think you probably need to read the documentation on PostgreSQL's internals:
Tour of PostgreSQL Internals
PostgreSQL internals through pictures
Documentation chapter - internals
Doing it right
Here's what you need to do - rough outline, since I've only used Eclipse for Java development and do my C development with vim and gdb:
Compile a debug build of PostgreSQL (compiled with ./configure --enable-debug and preferably also CFLAGS="-ggdb -Og -fno-omit-frame-pointer"). Specify a --prefix within your home directory, like --prefix=$HOME/postgres-debug.
Put your debug build's bin directory first on your PATH, e.g. export PATH=$HOME/postgres-debug/bin:$PATH
Create a new instance of PostgreSQL from your debug build with initdb -U postgres -D $HOME/postgres-debug-data
Start the new instance with PGPORT=5599 pg_ctl -D $HOME/postgres-debug-data -l $HOME/postgres-debug-data.log -w start
Connect with PGPORT=5599 psql postgres
Do whatever setup you need to do
Get the backend process ID with SELECT pg_backend_pid() in a psql session. Leave that session open; it's the one you'll be debugging.
Attach Eclipse's debugger to that process ID, using the Eclipse project that contains the PostgreSQL extension source code you're debugging. Make sure Eclipse is configured so it can find the PostgreSQL source code you compiled with too (no idea how to do that, see the manual).
Set any desired breakpoints and resume execution
In the psql session, do whatever you need to do to make your extension run and hit the breakpoint
When execution pauses at the breakpoint in Eclipse, debug as desired.
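If you rebuild often, the non-interactive steps (build, initdb, start) can be scripted. Below is a minimal Python sketch with hypothetical function names. It assumes a POSIX system and the paths and port from the outline, and by default it does a dry run that only prints the commands. The psql / pg_backend_pid() / attach steps stay manual because that psql session must remain open.

```python
import os
import shlex
import subprocess

# Compiler flags recommended in the outline above.
DEBUG_CFLAGS = "-ggdb -Og -fno-omit-frame-pointer"


def build_steps(prefix):
    """Commands to configure, compile and install a debug build into `prefix`.

    They must be run from the top of the PostgreSQL source tree.
    """
    return [
        ["./configure", "--enable-debug", f"--prefix={prefix}",
         f"CFLAGS={DEBUG_CFLAGS}"],
        ["make"],
        ["make", "install"],
    ]


def instance_steps(prefix, data_dir):
    """Commands to initdb a fresh cluster and start it (-w waits for startup)."""
    bin_dir = os.path.join(prefix, "bin")
    return [
        [os.path.join(bin_dir, "initdb"), "-U", "postgres", "-D", data_dir],
        [os.path.join(bin_dir, "pg_ctl"), "-D", data_dir,
         "-l", data_dir + ".log", "-w", "start"],
    ]


def run_all(src_dir, prefix, data_dir, port=5599, dry_run=True):
    """Print every step and, unless dry_run is set, execute it."""
    env = dict(os.environ)
    env["PGPORT"] = str(port)  # the new instance listens on this port
    env["PATH"] = os.path.join(prefix, "bin") + os.pathsep + env.get("PATH", "")
    for argv in build_steps(prefix):
        print(f"[{src_dir}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=src_dir, env=env, check=True)
    for argv in instance_steps(prefix, data_dir):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, env=env, check=True)
    # The interactive part stays manual: PGPORT=5599 psql postgres,
    # SELECT pg_backend_pid(), then attach the debugger to that PID.
```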
Basic misunderstandings?
Also, in case you're really confused about how all this works: PostgreSQL is a client/server application. Suppose you debug a client program that uses libpq or ODBC and expect a breakpoint to trigger in some PostgreSQL backend extension code. That is not going to happen. The client application is a separate program that communicates with PostgreSQL over a TCP/IP (or Unix-domain) socket. gdb cannot set breakpoints in the PostgreSQL server while it's attached to the client, because they are separate programs. If you want to debug the server, you have to attach gdb to the server. PostgreSQL uses one process per connection, so you have to attach gdb to the correct server process. That is why I said to use SELECT pg_backend_pid() above and attach to that process ID.
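Each connection gets its own backend process, so the PID returned by SELECT pg_backend_pid() is the one to attach to. A small Python sketch with hypothetical helper names:

- On Linux, it reads /proc/<pid>/comm to check the process's short name, which for a backend is typically postgres.
- It builds the gdb command line, including the sudo -u postgres gdb -p <pid> form from Problem 1 for a server owned by another user.

```python
def process_name(pid):
    """Return the kernel's short name for `pid` (Linux /proc), or None."""
    try:
        with open(f"/proc/{pid}/comm", encoding="utf-8") as fh:
            return fh.read().strip()
    except OSError:
        return None  # no such process, or not a Linux /proc system


def gdb_attach_command(pid, run_as=None):
    """Build the gdb command that attaches to one backend process."""
    cmd = ["gdb", "-p", str(pid)]
    if run_as is not None:
        # For a server owned by another user: sudo -u postgres gdb -p PID
        cmd = ["sudo", "-u", run_as] + cmd
    return cmd
```

Usage: with pid taken from pg_backend_pid() in the open psql session, check process_name(pid) == "postgres", then run subprocess.run(gdb_attach_command(pid, "postgres")) from a terminal.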
See the internals documentation linked above, and:
PostgreSQL site - coding
PostgreSQL wiki - developer resources
Developer FAQ
Attaching gdb to a backend on linux/bsd/unix
I also faced a similar issue and resolved it after some struggle.
I misunderstood the following point under Debugging with child processes in the wiki (https://wiki.postgresql.org/wiki/Working_with_Eclipse).
5. "Start postmaster & one instant of postgresql client (for creating one new postgres)"
That step should be performed from a terminal, by starting the postgres server and one client.
Once this is done, start the Eclipse debugger with a "C/C++ Attach to Application" configuration.
Hope this helps.

postgres libpq: synchronous COPY mysteriously cancelled "due to user request"

My application is using libpq to write data to Postgres using the COPY API. After over 900000 successful COPY+commit (each containing a single row, don't ask) actions, one errored out with the following:
ERROR: canceling statement due to user request
CONTEXT: COPY [...]
My code never calls PQcancel or related friends, which I think is precluded anyway by the fact that libpq is being used synchronously and my app is not multi-threaded.
libpq v8.3.0
Postgres v9.2.4
Is there any reasonable explanation for what might have caused the COPY to be cancelled? Will upgrading libpq (as I have done in more recent versions of my application) be expected to improve the situation?
The customer reports that the Postgres server may have been shut down when this error was reported, but I'm not convinced since the error text is pretty specific.
That error will be emitted when you:
send a PQcancel
use pg_cancel_backend
Hit control-C in psql (which invokes PQcancel)
Send SIGINT to a backend, e.g. kill -INT or kill -2.
My initial answer incorrectly claimed that the following also produce the same error. They don't; these:
pg_terminate_backend
pg_ctl stop -m fast
will emit a different error FATAL: terminating connection due to administrator command.
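A synchronous, single-threaded client is not protected from a cancel, because PQcancel does not use the existing connection at all. It opens a new connection and sends a CancelRequest message. That message holds the backend's process ID and the secret key the server handed out at connection start (BackendKeyData). So any process that knows those two values can cancel your COPY, as can anything that signals the backend. The Python sketch below builds and sends that message following the protocol 3.0 layout. The function names are hypothetical, and it is an illustration of the protocol, not libpq's code.

```python
import socket
import struct

CANCEL_REQUEST_CODE = (1234 << 16) | 5678  # 80877102 in the protocol docs


def cancel_request_packet(backend_pid, secret_key):
    """Build the 16-byte CancelRequest message (protocol 3.0 layout)."""
    # Int32 length (including itself), Int32 request code,
    # Int32 backend process ID, Int32 secret key; all big-endian.
    # Masking keeps negative C int32 keys encoded with the same bits.
    return struct.pack("!IIII", 16, CANCEL_REQUEST_CODE,
                       backend_pid & 0xFFFFFFFF, secret_key & 0xFFFFFFFF)


def send_cancel(address, backend_pid, secret_key, family=socket.AF_INET):
    """Open a fresh connection, send CancelRequest, and close it.

    The server sends no reply; the cancel happens asynchronously.
    """
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.connect(address)
        sock.sendall(cancel_request_packet(backend_pid, secret_key))
```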