Mongo cannot find master during data lookups

I am running a large data update using pymongo. To run the updates, individual records are found using collection.find_one(unique criteria), changes are made, the updates are batched, and finally sent in chunks using db.collection.save([long list of records to save])
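Roughly, the loop looks like this (a sketch only; the host, database, collection, criteria, and the "processed" change are hypothetical, and the chunked write is shown as one save() call per document, since save() takes a single record):
from pymongo import Connection  # pymongo 1.x style, matching the question

connection = Connection("remote-host")      # hypothetical host
collection = connection.mydb.mycollection   # hypothetical database/collection

# hypothetical list of unique lookup criteria, one per record to update
criteria_list = [{"_id": i} for i in range(10000)]

batch = []
for criteria in criteria_list:
    record = collection.find_one(criteria)  # look up the original record
    record["processed"] = True              # hypothetical change
    batch.append(record)
    if len(batch) >= 1000:                  # send the updates in chunks
        for doc in batch:
            collection.save(doc)            # save() upserts a single document
        batch = []
for doc in batch:                           # flush any remainder
    collection.save(doc)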
On my local machine (running 1.6.3), the imports work fine.
On a remote server (running 1.6.0), which is much faster than my local machine, I can get through a portion of the inserts just fine, but then will suddenly get the following error when looking up original records:
connection = Connection(...)
...
raise AutoReconnect("could not find master/primary")
pymongo.errors.AutoReconnect: could not find master/primary
The number of records I can get through varies somewhat, but is not random.
At first I thought I was running into the connection limit. I started closing connections manually after each record lookup:
collection.database.connection.disconnect()
That didn't solve the problem. Am I on the right track?

So there are a couple of potential issues here:
raise AutoReconnect("could not find master/primary")
pymongo.errors.AutoReconnect: could not find master/primary
That error indicates that the existing connection has somehow been invalidated. There are a number of reasons this could happen.
The most common reason this happens is that the Primary of a Replica Set has stepped down or has failed. In this case your code needs to:
Catch (or trap) the error.
Decide on a retry strategy. (fail? retry once?...)
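For example, a minimal retry-once sketch (not the poster's code; "collection" and "criteria" are placeholders for your own objects):
import pymongo

# "collection" and "criteria" are placeholders for the caller's objects
try:
    record = collection.find_one(criteria)
except pymongo.errors.AutoReconnect:
    # pymongo re-establishes the connection on the next operation,
    # so one simple strategy is to retry once and let a second failure bubble up
    record = collection.find_one(criteria)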
Are you doing this?
Are you running Replica Sets or Master/Slave?
Do you have any tracking for the performance of these servers?
Are they having network issues?
Are they switching roles?
collection.database.connection.disconnect()
That didn't solve the problem. Am I on the right track?
Where is the exception "happening"? Is it coming from the connection itself or the save command?
On a remote server (running 1.6.0)
As of this writing, 1.6.0 is a very old version of MongoDB. There were multiple replication bugs fixed in the subsequent 1.6.x versions and 1.7.x versions. (we're already at 1.8.1rc-0)
I would start by looking at what's happening with your servers, but that may well lead you down the upgrade path.

I've encountered this problem in interactive python usage with pymongo, where I leave the session idle and encounter AutoReconnect upon returning. I've handled it this way:
import functools
import pymongo
import time

MAX_AUTO_RECONNECT_ATTEMPTS = 5

def graceful_auto_reconnect(mongo_op_func):
    """Gracefully handle a reconnection event."""
    @functools.wraps(mongo_op_func)
    def wrapper(*args, **kwargs):
        for attempt in xrange(MAX_AUTO_RECONNECT_ATTEMPTS):
            try:
                return mongo_op_func(*args, **kwargs)
            except pymongo.errors.AutoReconnect as e:
                wait_t = 0.5 * pow(2, attempt)  # exponential backoff
                time.sleep(wait_t)
    return wrapper

@graceful_auto_reconnect
def some_func_that_does_mongodb_ops():
    ...
    ...
YMMV


Connected To XEPDB1 From SQL Developer [duplicate]

I am using an ORACLE database in a Windows environment and running a JSP/servlet web application in Tomcat. After I do some operations with the application it gives me the following error:
ORA-12518, TNS: listener could not hand off client connection
Can anyone help me identify the reason for this problem and propose a solution?
The solution to this question is to increase the number of processes:
1. Open a command prompt
2. sqlplus / as sysdba // log in as the sysdba user
3. startup force;
4. show parameter processes; // shows the currently allocated processes (150 by default); increase the count to 800
5. alter system set processes=800 scope=spfile;
Tried and tested.
In my case I found that it was because I hadn't closed the database connections properly in my application. Too many connections were open and Oracle could not create any more; it is a resource limitation. Later, when I checked the Oracle forums, I could see some reasons mentioned there for this problem. Some of them are:
In most cases this happens due to a network problem.
Your server is probably running out of memory and needs to swap memory to disk. One cause can be an Oracle process consuming too much memory.
If it is the second one, verify that large_pool_size and the dispatchers were sufficient for all connections.
You can refer to the link below for further details.
https://community.oracle.com/message/1874842#1874842
I ran across the same problem. In my case it was a new install of the Oracle client on a new desktop that was giving the error; other clients were working, so I knew the fix wouldn't be in the database configuration. tnsping worked properly, but sqlplus failed with the ORA-12518 listener error.
My tnsnames.ora entry used a SID instead of a service_name. Once I fixed that I still got the same error, and then found I had the wrong service_name as well. After fixing that too, the error went away.
If the issue appears from one day to the next for no apparent reason, add the following lines at the bottom of the listener.ora file. For example, if your ORACLE_HOME environment variable is set like this:
(ORACLE_HOME = C:\oracle11\app\oracle\product\11.2.0\server)
The lines to add are:
ADR_BASE_LISTENER = C:\oracle11\app\oracle\
DIRECT_HANDOFF_TTC_LISTENER=OFF
I had the same problem when executing queries in my application. I'm using Oracle client with Ruby on Rails.
The problem started when I accidentally started several connections with the DB and didn't close them.
When I fixed this, everything started to work fine again.
Hope this helps another one with the same problem.
I experienced the same error after upgrading to Windows 10. I solved it by starting the Oracle services that had stopped. Make sure all of the Oracle services are running.
I had the same issue. After restarting all Oracle services it worked again.
I encountered the same problem.
The Oracle server's listener log showed more information.
I found that the SERVICE_NAME did not match the service name configured in tnsnames.ora, so I changed the application's data source configuration from the SID value to the SERVICE_NAME value and that fixed it.
23-MAY-2019 02:44:21 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=XXXXXX$))(SERVICE_NAME=orclaic)) * (ADDRESS=(PROTOCOL=tcp)(HOST=::1)(PORT=50818)) * establish * orclaic * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
64-bit Windows Error: 203: Unknown error
I had the same issue in a real-time application, and it went away by itself the next day. Upon checking, it was found that the server had run out of memory due to additional processes running.
So in my case, the reason was that the server ran out of memory.
First of all:
check the listener log
check show parameter processes vs. select count(*) from v$process;
increase the processes parameter; it may also require an SGA increase

mongodb i/o timeout when using clustered mongo instances

I have an application that is using the upper.io/db package for communication with a Mongo database server (which is a fairly simple wrapper around gopkg.in/mgo.v2). The way the application works is that it creates a session in the main thread on start-up, and then each individual go routine that needs to make requests to the mongo server calls Clone on the session and does a defer session.Close on the resulting value. As far as I can tell, this is all standard operating procedure.
This setup works without any errors in our development environments where we are either using a locally run MongoDB or a sandbox instance on MongoLab. Recently we promoted the application up to our staging environment where we have the application talking to a Shared Cluster instance of MongoDB on MongoLab (the cheapest $15 option). This is where the weirdness starts happening. The first request that goes through (from the first goroutine that gets invoked) comes back with the expected response, but the subsequent ones all return
read tcp <ip address>:47112: i/o timeout
This happens both from our local development machines pointed at the cluster or from the AWS host for the staging environment. Since the Mongo cluster is from Mongolabs I am going to assume that they've configured everything correctly on their end.
The code is somewhat boring TBH: It literally just opens the session in the main function and maintains a reference to it, and then there are multiple goroutines with this basic structure:
sess := session.Clone()
defer sess.Close()
// make requests to Mongo
During testing, I even restricted it to run only one thing at once (i.e. only one goroutine is active at any given time), and it still fails in the same fashion.
Has anybody run into this before? Do I need to configure upper.io/db in a specific fashion? Maybe use mgo directly? I am at my wits' end with this :(
In a rather long and grueling process, we finally tracked down where this issue and similar ones like it came from in our program. It ended up being a session leak in the v1 version of the upper.io/db library. The bug and fix are outlined here, but the v1 version of this library is horribly outdated at this point and the later versions do not exhibit this issue.
I doubt this answer will be useful for anybody so late in the game (especially since we ourselves solved it like.. 3 years ago at this point), but just wanted to leave the answer here for completeness.

Orientdb network connection lost during commit

I am using the Blueprints graph API for Orient against a 2-node cluster running Orient 1.7.10. When ingesting simple parent/child data I intermittently get the following error on commit:
Warning: caught I/O errors from not connected (local socket=?), trying to reconnect (error: java.io.IOException: Channel is closed)
The connection is then reestablished:
Connection re-acquired transparently after 31ms and 1 retries: no errors will be thrown at application level.
This occurs midway through the commit (100 vertices and edges), with the result that the server thinks it has sent the response but the client hangs forever.
Is there a way to catch this at the application level and e.g. rollback?
I would be very grateful for any help.
As far as I know, a very similar issue was fixed some time ago: https://github.com/orientechnologies/orientdb/issues/2930
One thing to be aware of is the graph's autostart-transaction setting: if it is enabled (and it is by default) you don't need to call begin, just commit. If you do call begin, the transaction will only be committed at shutdown, and in that case it can create that problem.
Another suggestion is to migrate to the 2.0-* releases, which have important improvements on that side as well, especially if you are in the development phase; the 2.0 final is going to be released very soon and will be the one with the major focus in the next months.
bye

IBM DB2 ODBC Driver Issue [Error 69899] Error occurred in the database host server code. SQLSTATE= S1000

After upgrading our IBM System i (aka i5/OS or AS/400) from V5R4 to V7R1, one of our applications that connects to DB2 using ODBC fails with the following error:
Error Code: 69899
SQLSTATE: S1000
[IBM] [System i Access ODBC Driver] [DB2 for i5/OS] PWS0005
Error occurred in the database host server code.
The symptoms are:
In a While/Wend loop, a CURSOR is declared, then opened, fetched from, and closed.
If in any iteration the cursor does not retrieve any rows, then in the following iteration the error occurs after declaring the cursor (with a different SQL query), when you try to open it.
First we updated the ODBC driver to the latest version available, but the problem persists.
Because we needed an urgent solution, I worked around the problem by running a pre-select to determine whether the cursor will return rows, and otherwise skipping that iteration. This solves the problem for now but does not seem like a very elegant solution.
Any idea how to get more information about the error that occurs on the host?
Thank you very much in advance.
Generally speaking, if an error occurs in the server side code, you should call IBM support and report it. They'll ask if you're on the latest cume and probably the latest database group PTFs.
The server runs the ODBC connection in a job called QZDASOINIT. Since there are probably many connections to the system, there are probably many QZDASOINIT jobs. To find yours, go to a terminal session and WRKOBJLCK MYPROFILE *USRPRF. You'll be presented with a list of jobs running with your user profile. At least one of them will be the QZDASOINIT job you're looking for. Use option 5 to look at the job, then option 10 to see the job log. Press F10 to see the detailed messages and F18 to go to the bottom (most recent) entries.
If the error was so severe that the server job terminated abnormally, there won't be a lock on your user profile. Instead, go to the spooled job log by using WRKSPLF.
IBM has been logging some SQL internal errors since V5R4. Run select * from qrecovery.qsq901s; to see any SQLCODE -901 errors.
Make sure that you have installed the latest fix pack for the latest version of System i Access.
I've had this error before and it was caused by a syntax error in the connection string. It was a setting that was insignificant in older versions of the OS and more significant in newer versions, but did not cause the connection itself to fail so it was hard to track down.
For example: Port Number:8471 had a spelling mistake and read Porte Number:8471. Hard to spot, but once found, fixing it solved the problem for me. Basically everything past that part of the connection string got ignored.
Wanted to add another solution to this problem. The SQL packages that exist on your system can get corrupted during or after an upgrade. You MUST delete these packages after an upgrade. This gets rid of the old packages and allows the system to recreate them at the new OS version level. When deleting SQL packages, some connections/jobs may have locks on those packages, so you might have to shut host services down. Use the DLTSQLPKG command to do the delete. In V7R2 and higher there are some additional steps to do, as IBM changed some things when it comes to packages; you can find the info here: http://www-01.ibm.com/support/docview.wss?uid=nas8N1015556
Or tell your ODBC/JDBC/.Net Data adapter/provider to not use packages. This is probably less desirable as there are performance benefits to packages.

Postgres: "ERROR: cached plan must not change result type"

This exception is being thrown by the PostgreSQL 8.3.7 server to my application.
Does anyone know what this error means and what I can do about it?
ERROR: cached plan must not change result type
STATEMENT: select code,is_deprecated from country where code=$1
I figured out what was causing this error.
My application opened a database connection and prepared a SELECT statement for execution.
Meanwhile, another script was modifying the database table, changing the data type of one of the columns being returned in the above SELECT statement.
I resolved this by restarting the application after the database table was modified. This reset the database connection, allowing the prepared statement to execute without errors.
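For anyone who wants to see the failure mode outside their application, here is a rough reproduction sketch. The use of psycopg2, the explicit PREPARE, the 'US' value, and the ALTER TABLE statement are my own illustrative choices (the question does not say which driver or column types were involved); it assumes a table like the one in the quoted STATEMENT exists:
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
conn.autocommit = True
cur = conn.cursor()

# prepare a statement whose result type depends on the current table definition
cur.execute("PREPARE get_country AS "
            "SELECT code, is_deprecated FROM country WHERE code = $1")
cur.execute("EXECUTE get_country('US')")  # works while the table is unchanged

# ... meanwhile another session changes the type of a returned column, e.g.:
#   ALTER TABLE country ALTER COLUMN is_deprecated TYPE text;

cur.execute("EXECUTE get_country('US')")  # now fails with:
# ERROR: cached plan must not change result type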
I'm adding this answer for anyone landing here by googling ERROR: cached plan must not change result type when trying to solve the problem in the context of a Java / JDBC application.
I was able to reliably reproduce the error by running schema upgrades (i.e. DDL statements) while my back-end app that used the DB was running. If the app was querying a table that had been changed by the schema upgrade (i.e. the app ran queries before and after the upgrade on a changed table), the Postgres JDBC driver would return this error because it apparently caches some schema details.
You can avoid the problem by configuring your pgjdbc driver with autosave=conservative. With this option, the driver will be able to flush whatever details it is caching and you shouldn't have to bounce your server or flush your connection pool or whatever workaround you may have come up with.
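For example (host and database names are placeholders), the option can simply be appended to the JDBC connection URL:
jdbc:postgresql://dbhost:5432/mydb?autosave=conservative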
Reproduced on Postgres 9.6 (AWS RDS) and my initial testing seems to indicate the problem is completely resolved with this option.
Documentation: https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
You can look at the pgjdbc Github issue 451 for more details and history of the issue.
JRuby ActiveRecords users see this: https://github.com/jruby/activerecord-jdbc-adapter/blob/master/lib/arjdbc/postgresql/connection_methods.rb#L60
Note on performance:
As per the reported performance issues in the above link - you should do some performance / load / soak testing of your application before switching this on blindly.
On doing performance testing on my own app running on an AWS RDS Postgres 10 instance, enabling the conservative setting does result in extra CPU usage on the database server. It wasn't much, though; I could only see the autosave functionality show up as using a measurable amount of CPU after I'd tuned every single query my load test was using and started pushing the load test hard.
We were facing a similar issue. Our application works with multiple schemas. Whenever we made schema changes, this issue started occurring.
Setting the prepareThreshold=0 parameter in the JDBC connection string disables the driver's use of server-side prepared statements. This solved it for us.
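For example (host and database are placeholders), the parameter can be appended to the JDBC connection URL in the same way:
jdbc:postgresql://dbhost:5432/mydb?prepareThreshold=0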
I got this error, manually ran the failing select query, and that fixed the error.