I have Postgres installed on a CentOS machine, and an application uses it to store data.
For some time now, and I can't find the reason, all the database tables have been ending up empty on weekends.
I have searched a lot for clues about what causes this behaviour, but the logs are not giving me that information.
I am pretty sure the application is not executing anything that clears the records, so my suspicion is that some process on the Postgres side is doing it.
The pg_log only shows this warning the day it happens:
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (11 seconds apart)
Apart from that I have no other clues.
Running VACUUM ANALYZE VERBOSE reports no dead rows, so it has nothing to remove.
Can you tell me where I should look to find the cause? Could a Postgres process be doing this?
LOG: checkpoints are occurring too frequently (11 seconds apart)
This log message should also include all the information log_line_prefix tells it to include. So you should set log_line_prefix to include more information, like application name (voluntarily supplied by the client), database username, and host name/IP from which the connection came.
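Once the prefix carries that information, the log can be filtered by who issued what. Here is a minimal sketch, assuming a hypothetical setting of log_line_prefix = '%m [%p] user=%u,db=%d,app=%a,client=%h ' (my own choice of layout, not something from the question) and assuming statements are being logged at all, for example with log_statement = 'mod'. The function names are made up for illustration, and the pattern assumes the application name contains no commas.

```python
import re

# Assumed (hypothetical) prefix, not taken from the original post:
#   log_line_prefix = '%m [%p] user=%u,db=%d,app=%a,client=%h '
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+ \S+) \[(?P<pid>\d+)\] "
    r"user=(?P<user>[^,]*),db=(?P<db>[^,]*),"
    r"app=(?P<app>[^,]*),client=(?P<client>\S*) "
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)

# Statements that could empty a table.
DESTRUCTIVE_RE = re.compile(r"\b(DELETE|TRUNCATE|DROP)\b", re.IGNORECASE)


def parse_line(line):
    """Split one log line into its prefix fields, or return None."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def find_destructive(lines):
    """Return parsed entries whose message mentions DELETE, TRUNCATE or DROP."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and DESTRUCTIVE_RE.search(e["msg"])]
```

Feeding the weekend's log file through find_destructive would show the user, client address and application name behind each matching statement, if any were logged.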
But perhaps more directly at issue, if things are connecting to your database and doing things you don't understand or approve of, it is time to change your passwords.
Related
MongoDB was working beautifully for me for several months until I had an unexpected shutdown a week or two ago. Since then, I've been getting the error in the title that snowballs into an invalid argument, then a library panic, then some fatal assertions which cause MongoDB to crash.
Now, I've done my research: the normal answers are to run the repair function and to make sure SELinux isn't screwing up the process. Neither of those has worked. The error gets thrown during WiredTiger's checkpoint process, so reads/writes to the database aren't the issue, and because it's during the checkpoint process, it guarantees that MongoDB won't stay up for more than a day.
To be clear: all the files in the database are owned by mongod:mongod, have permissions set to 600 (default, and I tried setting them to 755 to see if that fixed it, and it didn't). I'm running mongodb as a service on a CentOS 7 box, and the service file specifies that it should run as user mongod. The mongod.conf file specifies a mounted filesystem as the database, and it was happy with that until the unexpected shutdown. I'm running MongoDB version 4.0.1, so WiredTiger really doesn't like it if I disable Journaling either (disregarding the fact that I shouldn't disable it in the first place).
I feel like I've exhausted all my options, and that the only thing I can do is back up my data and reinstall MongoDB. Are there any that I've missed?
After creating a backup of my data via mongodump, shutting down mongo, removing the entire database with rm -rf 'path-to-database', rebooting mongo (without the replication config), and restoring the data with mongorestore, mongodb still crashes. This time, however, it's with an Invariant failure after the open: operation not permitted. The only conclusion I can think of is that the data itself has become corrupted in some way. Thankfully, this isn't "mission critical" data, so to speak, and I can easily obtain new data.
Unfortunately, this doesn't answer my original question of "what other options do I have?". However, I'm still posting this in case others run into this same kind of issue.
EDIT: invariant issue was caused by me forgetting to re-initialize my replication set. After fixing that, it's clean. Because of this, I no longer believe it was a data corruption issue, but a checkpoint corruption issue.
EDIT 2: So the issue arose again after about a week, and after another week of trying various debugging methods, I tried simply moving the mongo process to another server. So far, that's been working. The previous server was acting up (I couldn't even run top at one point - another process had a lock on a necessary library file to run it), so here's to hoping that the current server doesn't follow suit.
I have two cloud servers running PostgreSQL. The first one works fine, but with the second one my Java application can no longer connect after about 30 minutes. When I connect with pgAdmin, it shows 30 to 40 connections, and after I kill those connections everything runs smoothly again.
Its configuration:
postgresql/9.3
max_connections = 100
shared_buffers = 4GB
When the same application connects to the other PostgreSQL server with the same schema, everything keeps working indefinitely.
Configuration:
postgresql/9.1
max_connections = 100
shared_buffers = 32MB
Can you please help me understand or fix the issue?
I work on a PostgreSQL 9.3 instance with hundreds of open connections. I agree with you that the open connections themselves shouldn't be a problem. Since we don't have much information, what follows is a description of how to get started troubleshooting.
Check server logs for anything wrong. Maybe there is an issue on the OS level with initiating connections?
Try logging in with psql as the application user. Does the problem persist? If not, the problem is not with PostgreSQL. I would take a closer look at the Java code and see if something is happening there.
Note that psql and other libpq actions may not give you the full picture. Try connecting locally over a non-SSL connection while watching a packet capture. You can find (and look up) the SQLSTATE error of the connection in this case. This is because, for legacy and backwards-compatibility reasons, libpq does not pass the SQLSTATE up to the client app when connecting to the database.
My bet, though, is that this is not a PostgreSQL issue. It may be an operating system issue. It may be a resource issue. It may be a client application issue.
I have an issue that is driving me nuts. Despite all my efforts, I am not able to force my Postgres server to shut down. I have followed these instructions: http://www.question-defense.com/2008/10/17/pg_ctl-server-does-not-shut-down-force-postgres-to-shutdown
but still nothing happens, and all I get in the shell is
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
Any help is much appreciated.
Update: Checking the logs, I see this recurring message:
LOG: checkpoints are occurring too frequently (25 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
After giving it a lot of thought, especially about the way I installed it in the first place, I realized that I had set up the install so that a daemon launches Postgres when my machine starts. Thus, any manual killing simply results in the same daemon recreating those processes.
To resolve this problem, you need to stop the daemon using launchctl and remove a .plist file in your Postgres directory.
Good luck if you face the same problem.
You are probably running with the default setting of "checkpoint_segments = 3", which produces the warnings. Your database does many writes, right? It takes some time to write all of this to disk, and your database is busy rotating the log files instead of doing real work.
If you increase checkpoint_segments, you will see performance improvements and less I/O.
For further reading: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
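To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the default 16 MB WAL segment size, a roughly constant write rate, and that every checkpoint is being triggered by WAL volume rather than by checkpoint_timeout. The function names and the 300-second target are my own choices for illustration, not a recommendation from the documentation.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size (an assumption about the build)


def wal_mb_between_checkpoints(checkpoint_segments):
    """WAL volume that forces a checkpoint at the given setting."""
    return checkpoint_segments * WAL_SEGMENT_MB


def segments_for_interval(current_segments, seconds_apart, target_seconds):
    """Segments needed to stretch volume-triggered checkpoints to target_seconds.

    Scales the current setting by target/observed interval, which is only
    valid if the write rate stays about the same.
    """
    return math.ceil(current_segments * target_seconds / seconds_apart)


# Default setting: a checkpoint is forced every 48 MB of WAL.
print(wal_mb_between_checkpoints(3))
# Checkpoints 25 seconds apart, aiming for one every 300 seconds.
print(segments_for_interval(3, 25, 300))
```

Under these assumptions, the 25-second interval from the log works out to about 36 segments for a five-minute spacing. Note that releases from 9.5 onward replace checkpoint_segments with max_wal_size, so this only applies to older versions.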
I am using PostgreSQL 8.4 and PostGIS 1.5. What I'm trying to do is INSERT data from one table into another (but not strictly the same data). For each column, a few queries are run, and there are a total of 50143 rows stored in the table. But the query is quite resource-heavy: after it has run for a while, the connection is lost. It happens about 21-22k ms into the execution of the query, after which I have to start the DBMS manually again. How should I go about solving this issue?
The error message is as follows:
[Err] server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Additionally, here is the psql error log:
2013-07-03 05:33:06 AZOST HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-07-03 05:33:06 AZOST WARNING: terminating connection because of crash of another server process
2013-07-03 05:33:06 AZOST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
My guess, reading your problem, is that you are hitting out of memory issues. Craig's suggestion to turn off overcommit is a good one. You may also need to reduce work_mem if this is a big query. This may slow down your query but it will free up memory. work_mem is per operation so a query can use many times that setting.
Another possibility is you are hitting some sort of bug in a C-language module in PostgreSQL. If this is the case, try updating to the latest version of PostGIS etc.
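The point about work_mem being per operation can be made concrete with a small estimate. This is only a sketch with made-up numbers: the function name, the 64 MB setting, and the count of six sort/hash operations are hypothetical, and real usage depends on the actual query plan.

```python
def worst_case_work_mem_mb(work_mem_mb, operations_per_query, concurrent_queries=1):
    """Upper bound on memory used by sort/hash steps.

    Each sort or hash operation in a plan may use up to work_mem on its own,
    so the bound multiplies rather than caps at a single work_mem.
    """
    return work_mem_mb * operations_per_query * concurrent_queries


# One query with six sort/hash steps at work_mem = 64MB.
print(worst_case_work_mem_mb(64, 6))
# The same query running on four connections at once.
print(worst_case_work_mem_mb(64, 6, concurrent_queries=4))
```

Halving work_mem halves this bound, which is why lowering it can help even when the configured value looks small next to the machine's RAM.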
I'm trying to run a Drupal migration via SSH and drush (a command line shell), copying data from a postgres database to mysql.
It works fine for a while (~5 mins or so), but then I get the error:
SQLSTATE[HY000]: General error: 7 SSL SYSCALL error: EOF detected [error]
The postgres database connection seems to have gone, and I just get errors:
SQLSTATE[HY000]: General error: 7 no connection to the server [error]
It works fine locally, so I think the problem must be with postgres and running a script over SSH - but googling these errors returns nothing useful. Does anyone know what could be causing this?
Could be a timeout. First inspect the log (and maybe change ssl_renegotiation_limit).
BTW: IIRC, renegotiation does not take place after a fixed amount of time, but after a certain amount of transmitted data (2GB?)
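Since the limit is a data volume rather than a time, whether it lines up with a failure after about five minutes depends on throughput. Here is a rough sketch of that arithmetic. The 512 MB figure is, as far as I recall, the documented default in the releases that have ssl_renegotiation_limit, and the 2 GB figure is the guess above. The throughput value and the function name are hypothetical.

```python
def seconds_until_renegotiation(limit_mb, throughput_mb_per_s):
    """Time until the renegotiation limit is reached at a steady transfer rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return limit_mb / throughput_mb_per_s


# Assumed steady 2 MB/s over the SSL connection (hypothetical figure).
print(seconds_until_renegotiation(512, 2))   # with a 512 MB limit
print(seconds_until_renegotiation(2048, 2))  # with a 2 GB limit
```

If the measured transfer rate puts the limit near the point where the migration dies, setting ssl_renegotiation_limit = 0 (which disables renegotiation) is a cheap way to test the theory.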
You should check both your PostgreSQL and MySQL logs for further potential details. If there's not much in the PostgreSQL log, look at the log_min_error_statement setting in postgresql.conf. As you'll find through that link, you can tune it to increase the amount of logging. If there are still no clues in the PostgreSQL log, I would look at other components in your system for the problem.