Postgresql query did not terminate, after restart the service postgresql doesn't start


At work I was running a complex query. I cancelled it and went home yesterday. This morning the query was still running in the background and could not be terminated, even with the 'terminate backend' functionality. A colleague of mine restarted the host machine where postgres is installed. After the machine restart, the postgres database server would not start up.
In my log files I see the error:
'pg_ctl: this data directory appears to be running a pre-existing postmaster'
I am not sure how to handle this problem. I could try to fix it, or try to recover the data from the saved files. What is the most logical step to take, and do you know how to fix this?
Earlier it gave this error message :
2016-01-28 15:52:33 GMT FATAL: lock file "postmaster.pid" already exists
2016-01-28 15:52:33 GMT HINT: Is another postmaster (PID 2100) running in data directory "C:/PostgreSQL/9.1/data"?
UPDATE... I located the file postmaster.pid and deleted it. Now I am restarting the computer and hoping it will start.
UPDATE... It works now. I rebooted the computer and postgres just instantly started. Happy as a child but at the same time not fully satisfied because of the following forum: https://superuser.com/questions/553045/fatal-lock-file-postmaster-pid-already-exists . Here it is stated to NEVER delete the postmaster.pid because of possible data corruption. So because of that I will backup all databases I have in postgres now.
So if anyone can shed some more light on my ICT adventure of today I would be very grateful. I will not mark this question as answered yet, since I have no idea what went wrong and may run into it again someday.

The explanation is pretty straightforward. PostgreSQL writes its process ID to a file called postmaster.pid; the presence of the file is supposed to indicate that the server is running. When PostgreSQL shuts down cleanly, it removes the postmaster.pid file.
However, when your colleague restarted the host machine, the PostgreSQL server got killed without having had a chance to remove the postmaster.pid file. Therefore, when you tried to start PostgreSQL, the presence of the file made it complain that the server was already running.
This answer provides more complete advice. In general, you should never delete postmaster.pid for no good reason, because it's supposed to help prevent two servers from running at once on the same data files. However, if you are certain that the process indicated by the postmaster.pid file is already dead, then by all means just delete the stale PID file manually.
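A minimal sketch of the "make sure the process is dead first" check described above. It relies on the first line of postmaster.pid holding the postmaster's PID; the function names and the example path are only illustrative. The liveness test uses signal 0, which is safe on POSIX systems only, so on Windows the sketch refuses to guess and you would check Task Manager or tasklist by hand. Note also that a live PID is not proof that PostgreSQL is running, since the operating system can reuse PIDs after a reboot.

```python
import os
from pathlib import Path


def read_postmaster_pid(data_dir):
    """Return the PID stored on the first line of postmaster.pid, or None if absent/empty."""
    pid_file = Path(data_dir) / "postmaster.pid"
    if not pid_file.exists():
        return None
    lines = pid_file.read_text().splitlines()
    if not lines:
        return None
    return int(lines[0].strip())


def pid_is_alive(pid):
    """POSIX-only check: signal 0 tests for existence without delivering a signal."""
    if os.name != "posix":
        # On Windows os.kill() would terminate the process, so do not call it here.
        raise NotImplementedError("check Task Manager or tasklist on Windows")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Hypothetical data directory, taken from the log lines in the question.
    pid = read_postmaster_pid("C:/PostgreSQL/9.1/data")
    print("no postmaster.pid found" if pid is None else f"postmaster.pid names PID {pid}")
```

Only if the PID is confirmed dead (or belongs to some unrelated program) would the stale file be a candidate for manual removal.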

On Windows, end all running postgres processes and then start the service.

Related

My MongoDB database was lost after running a read-only script

I use MongoDB 4.2 on my local machine (windows 10). I have not changed any configurations, so the default behavior of only accepting local connections should be in place. (I only need to access it locally)
I was running a script that was reading data from my MongoDB, there are no writes to the db in this script. When all the numbers were crunched I noticed weird results, and saw that my database was suddenly gone. I checked my dbpath and the data was gone from there too! Could it be a hack, or was it MongoDB that dropped both the database and the raw data in the dbpath?
I've seen similar questions on this forum, mostly resolved by the author forgetting to reroute to the correct dbpath, which is not the case here. I've checked the log but the log seems to be very limited (I restarted mongod and could only see logging happening after the restart).

MongoDB does not delete all of the files in its data directory.
Most likely either you are checking in the wrong place or something external to MongoDB deleted its files.

What is the /var/lib/postgresql/10/main/ equivalent in PSQL-12?

I'm working on setting up point in time backups for a PSQL server that I have running, and I'm following a tutorial for an earlier version. I'm trying to figure out what the specific directory is for the DB cluster in PSQL-12 so that I can clear out that directory and test what I've setup. In the video, he runs a recursive remove on the PSQL-10 directory /var/lib/postgresql/10/main, and is still able to start the PSQL-10 service again when he's finished the restoration.
When I attempted it, I ran the recursive remove on the directory /var/lib/pgsql/12/data/ because the command SHOW data_directory; told me that is where my server's cluster data is stored. Removing all the data, however, messes up the postgresql-12.service, so I can't start it back up when I've completed the recovery.
This is displayed when I restore the backup and run systemctl start postgresql-12.service:
Process: 26672 ExecStart=/usr/pgsql-12/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
Dec 31 11:07:29 localhost.localdomain systemd[1]: Failed to start PostgreSQL 12 data....
I've tried making a backup of the working /data/ directory and doing a diff -qr to see what files differ between the working backup and the point in time backup, but copying those files from the working directory to the PIT directory doesn't seem to fix the issue, and I'm still unable to start the postgresql-12.service. It seems, however, that I am able to start the service back up successfully if I just do a mass copy of the working directory to /var/lib/postgresql/10/main.
Can someone please point me in the right direction? I've done plenty of research trying to find the working cluster directory so I can just erase table information and work on a PIT recovery without messing up the core application prereqs (such as the service), but I can't seem to find the information I'm looking for. Any assistance would be greatly appreciated! Additionally, if there's a way to spot this directory more quickly in the future, either by a command or looking at the files within, I would love to know so I can implement this procedure on different PSQL versions. Thank you!

MongoDB WiredTiger error: WiredTiger.turtle: handle-open: open: operation not permitted

MongoDB was working beautifully for me for several months until I had an unexpected shutdown a week or two ago. Since then, I've been getting the error in the title that snowballs into an invalid argument, then a library panic, then some fatal assertions which cause MongoDB to crash.
Now, I've done my research: the normal answers are to run the repair function and to make sure SELinux isn't screwing up the process. Neither of those have worked. The error gets thrown during WiredTiger's checkpoint process, so reads/writes to the database aren't the issue, and because it's during the checkpoint process, it guarantees that MongoDB won't stay up for more than a day.
To be clear: all the files in the database are owned by mongod:mongod, have permissions set to 600 (default, and I tried setting them to 755 to see if that fixed it, and it didn't). I'm running mongodb as a service on a CentOS 7 box, and the service file specifies that it should run as user mongod. The mongod.conf file specifies a mounted filesystem as the database, and it was happy with that until the unexpected shutdown. I'm running MongoDB version 4.0.1, so WiredTiger really doesn't like it if I disable Journaling either (disregarding the fact that I shouldn't disable it in the first place).
I feel like I've exhausted all my options, and that the only thing I can do is backup my data and reinstall MongoDB. Are there any that I've missed?

After creating a backup of my data via mongodump, shutting down mongo, removing the entire database with rm -rf 'path-to-database', rebooting mongo (without the replication config), and restoring the data with mongorestore, mongodb still crashes. This time, however, it's with an Invariant failure after the open: operation not permitted. The only conclusion I can think of is that the data itself has become corrupted in some way. Thankfully, this isn't "mission critical" data, so to speak, and I can easily obtain new data.
Unfortunately, this doesn't answer my original question of "what other options do I have?". However, I'm still posting this in case others run into this same kind of issue.
EDIT: invariant issue was caused by me forgetting to re-initialize my replication set. After fixing that, it's clean. Because of this, I no longer believe it was a data corruption issue, but a checkpoint corruption issue.
EDIT 2: So the issue arose again after about a week, and after another week of trying various debugging methods, I tried simply moving the mongo process to another server. So far, that's been working. The previous server was acting up (I couldn't even run top at one point - another process had a lock on a necessary library file to run it), so here's to hoping that the current server doesn't follow suit.

Can not stop postgres despite immediate stop

I have this issue that is driving me nuts. Despite all my efforts, I am not able to force my postgres server to shut down. I have followed those instructions : http://www.question-defense.com/2008/10/17/pg_ctl-server-does-not-shut-down-force-postgres-to-shutdown
but still, nothing happens and all I got in the shell is
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
Any help much appreciated.
Update: Checking the logs, I have this recurring error :
LOG: checkpoints are occurring too frequently (25 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".

After giving it a lot of thought, especially about the way I installed it in the first place, I realized that I had set up the install so that a daemon would launch postgres when my machine starts. Thus, any manual killing would simply result in those processes being recreated by the same daemon.
To resolve this problem you need to stop the daemon from working using launchctl and remove a .plist file in your postgres directory.
Good luck if you face the same problem.

You probably run with the default setting of "checkpoint_segments = 3", which produces the warnings. Your database does many writes, right? It takes some time to write all of this to disk, and your database is quite busy rotating the logfiles instead of doing real work.
If you increase checkpoint_segments, you will see performance improvements and less I/O.
For further readings: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
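As a rough worked example of why the default triggers that warning: assuming the default WAL segment size of 16 MB and the default checkpoint_segments = 3 guessed at above, a checkpoint is forced after about 48 MB of WAL. The sketch below (the function names are made up for illustration) turns the "25 seconds apart" from the log into an approximate write rate and shows how the interval would stretch for larger settings. It ignores checkpoint_timeout, which can also trigger checkpoints.

```python
WAL_SEGMENT_MB = 16  # assumed default WAL segment size


def wal_mb_between_checkpoints(checkpoint_segments):
    """WAL volume that forces a checkpoint for a given checkpoint_segments value."""
    return checkpoint_segments * WAL_SEGMENT_MB


def implied_write_rate_mb_s(checkpoint_segments, seconds_apart):
    """Approximate WAL write rate if checkpoints are purely segment-triggered."""
    return wal_mb_between_checkpoints(checkpoint_segments) / seconds_apart


# 3 segments and 25 seconds come from the answer and the log in the question.
rate = implied_write_rate_mb_s(3, 25)
print(f"about {rate:.2f} MB/s of WAL")

# Candidate values below are illustrative, not recommendations.
for segments in (3, 16, 32):
    seconds = wal_mb_between_checkpoints(segments) / rate
    print(f"checkpoint_segments={segments}: roughly {seconds:.0f} s between checkpoints")
```

Newer PostgreSQL releases (9.5 and later) replaced checkpoint_segments with max_wal_size, so the parameter name only applies to older servers like the one in the question.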

Mongo DB Invariant failure

Our DB of +- 400Gb is stopping on our one server.
From the logs:
2015-07-07T09:09:51.072+0200 I STORAGE [conn10] _getOpenFile() invalid file index requested 8388701
2015-07-07T09:09:51.072+0200 I - [conn10] Invariant failure false src/mongo/db/storage/mmap_v1/mmap_v1_extent_manager.cpp 201
2015-07-07T09:09:51.082+0200 I CONTROL [conn10]
Any idea in what area I should start looking? Storage issue?

I am just answering this question in case some people make the same non-technical mistake again:
I tried to scp all the files in the /data/db directory to the server. As there are many files (dbname.1 to dbname.55, about 100GB), the transfer was interrupted in the middle (last successful file dbname.22), so I restarted and uploaded dbname.23 to dbname.55. When I ran queries in the mongo client, they worked in some cases and failed in others, showing the same error message as in the question. I thought some file might have been corrupted during the transfer, but the md5 checks were all right. Only after I spent a long time finishing all the md5 checks did I find the reason.
It turned out that scp uploads dbname.21 to dbname.29 right after it uploads dbname.2, so dbname.3 to dbname.9 were never uploaded to the server. I am going to upload them, and this should solve the problem.
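The ordering pitfall can be reproduced in a few lines. This is only an illustration that assumes the first transfer went through the files in plain lexicographic (string) order and stopped after dbname.22, and that the second transfer covered dbname.23 to dbname.55 as described above.

```python
def numeric_suffix(name):
    """Sort key: the number after the last dot in a file name like dbname.12."""
    return int(name.rsplit(".", 1)[1])


all_files = [f"dbname.{i}" for i in range(1, 56)]

# String order puts dbname.10..19 before dbname.2, and dbname.20..29 before dbname.3.
lexicographic = sorted(all_files)
first_pass = lexicographic[: lexicographic.index("dbname.22") + 1]

# The manual restart picked up at 23 in numeric order.
second_pass = [f"dbname.{i}" for i in range(23, 56)]

missing = sorted(set(all_files) - set(first_pass) - set(second_pass), key=numeric_suffix)
print(missing)
```

Comparing the expected file list against what actually arrived, rather than checksumming only the files that are present, would have caught the gap immediately.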

I ran into a variant of this today as well. Mysteriously one of my data files disappeared (or didn't make it in a migration from another server). None of the repair/recovery procedures would work, failing on the same error you reference. Luckily I have a separate mongod that has a collection with the same name, so as a cheap hack I copied the (admittedly wrong) data file to the other server, and while I knew I wouldn't get any data back, the repair tools (such as mongod --repair) were then able to work their magic, but as expected, they recovered some data from the bad file I copied in, so I had to weed out some docs. Luckily it was the "mycollection.1" file, which is only 128MB.
I don't think this applies in your case, since the index of the missing data file that your log mentions is ridiculously high. Your log is essentially saying it can't find /data/dbname/mycollection.8388701. You said your data set is only 400GB, so an index that high just doesn't make sense. You should have only roughly 200 data files, since most of them are 2GB each by default. What is the result of db.stats() (specifically the fileSize attribute)?
This mongolab blog entry helped me understand the data file structure.
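To put a number on "roughly 200 data files", here is a small sketch. It assumes the usual MMAPv1 allocation pattern, where the first data file is 64MB and each following file doubles in size until the 2GB cap, which matches the 128MB size of the ".1" file mentioned above. The function name is made up, and the result is an estimate, not something read from a real server.

```python
def estimated_data_file_count(total_mb, first_mb=64, cap_mb=2048):
    """Count data files needed to hold total_mb when sizes double up to cap_mb."""
    count, size, allocated = 0, first_mb, 0
    while allocated < total_mb:
        allocated += size
        count += 1
        size = min(size * 2, cap_mb)
    return count


files = estimated_data_file_count(400 * 1024)  # 400GB expressed in MB
print(files)                    # estimated number of data files
print(8388701 > files)          # the index from the log is far beyond that
```

Under these assumptions the highest expected file index is in the low hundreds, which is why a request for index 8388701 points at damaged metadata rather than a genuinely missing file.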
My advice for where you should start looking:
Run the db.stats() command to get an idea of how big your data on disk actually is.
Does it make sense for your server to be looking for a data file with a crazy high index? If not, the issue isn't really with storage, but with the extents and the metadata of your collection/database.
Do your repair tools work? If you have at least enough free disk space as the size of your data set (on disk), try the mongod --repair, or db.repairDatabase() tools to start a repair. I'm assuming it won't work since my repair attempts crashed with the same invalid file index requested error.
Try copying a "bad" file like I did that roughly matches what the missing file would look like (keeping in mind how the file sizes of the data files aren't all the same, do your best to match it up and try a repair). If this works, your data files will be cleaned up (but it does take a lot of disk space).
Hope that helps point you in the right direction.

In my case this happened in a development setting with MongoDB 3.6.20 on macOS 10.14.6. Another program restarted the Mac and closed any open terminals, including the terminal that ran the mongod process. After the OS restart, I could not restart mongod because of the Invariant failure. The error also mentioned a bad lockfile.
I was able to solve the issue with the following steps, yet I am not exactly sure which did the job:
remove corrupted lock file: rm -rf data/db/mongod.lock
direct outcome: mongod still failed due to Invariant failure but at least no mention about the lockfile anymore.
run mongod --repair
direct outcome: repair still failed due to Invariant failure. Error output mentions SocketException: Address already in use.
restart the machine again to free the socket.
direct outcome: mongod starts and runs without problems. Yay.
The first successful mongod run after the issue gave the following output:
[ftdc] Unclean full-time diagnostic data capture shutdown detected, found interim file, some metrics may have been lost.
Thus, it runs smoothly again. Maybe I was fortunate. I hope the same approach helps some of you.
