Unable to insert or retrieve data from MongoDB - mongodb

I am trying to insert and retrieve some data from MongoDB.
The connection was set up following the instructions on mongodb.com:
try:
    client = MongoClient(
        'mongodb+srv://user:pw!@cluster0-nghj0.gcp.mongodb.net/test?retryWrites=true',
        ssl=True)
    print("connected")
except:
    print('failed')
I manually created a database and collection, messager.messager, and put some JSON documents in it.
When I try to use collection.find() or collection.insert_one(...):
db = client.messager
collection = db.messager
for i in collection.find():
    print(i)
It raises a timeout error:
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/cursor.py", line 1225, in next
if len(self.__data) or self._refresh():
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/cursor.py", line 1117, in _refresh
self.__session = self.__collection.database.client._ensure_session()
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1598, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1551, in __start_session
server_session = self._get_server_session()
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1584, in _get_server_session
return self._topology.get_server_session()
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/topology.py", line 434, in get_server_session
None)
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/topology.py", line 200, in _select_servers_loop
self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: connection closed,connection closed,connection closed
Where did it go wrong?
Here is my Mongodb.com setup:

According to the PyMongo documentation on errors, this is the issue you are hitting:
exception pymongo.errors.ServerSelectionTimeoutError(message='', errors=None)
Thrown when no MongoDB server is available for an operation
If there is no suitable server for an operation PyMongo tries for serverSelectionTimeoutMS (default 30 seconds) to find one, then throws this exception. For example, it is thrown after attempting an operation when PyMongo cannot connect to any server, or if you attempt an insert into a replica set that has no primary and does not elect one within the timeout window, or if you attempt to query with a Read Preference that the replica set cannot satisfy.
Check that your network can actually reach the network where your MongoDB server resides.
It is also possible that the primary node of the replica set has become unresponsive. In that case, restart your cluster (if you have the access permissions).
Also, create the connection as follows:
mongo_conn = MongoClient('mongodb+srv://cluster0-nghj0.gcp.mongodb.net/test?retryWrites=true', username=your_username, password=pwd, authSource='admin', authMechanism='SCRAM-SHA-1')
The above is the recommended pattern. mongodb+srv URIs enable TLS by default, so ssl=True does not need to be passed.
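If the credentials are kept inside the URI instead of being passed as keyword arguments, the PyMongo documentation asks for the username and password to be percent-escaped. The password in the question contains reserved characters, so this may be worth checking. The following is a minimal sketch using only the standard library; the helper name build_srv_uri and the example host are assumptions made for illustration, not part of the original post.

```python
from urllib.parse import quote_plus


def build_srv_uri(username, password, host, database, options="retryWrites=true"):
    """Return a mongodb+srv URI with percent-escaped credentials."""
    # quote_plus escapes reserved characters such as '!', '@', ':' and '/'
    # so they cannot be mistaken for URI delimiters.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/{database}?{options}"


# Placeholder credentials and host, for illustration only.
uri = build_srv_uri("user", "pw!@", "cluster0.example.mongodb.net", "test")
print(uri)
# mongodb+srv://user:pw%21%40@cluster0.example.mongodb.net/test?retryWrites=true
```

The resulting string can be passed to MongoClient as its first argument. When the credentials are given through the username and password keyword arguments, as in the answer above, no escaping is needed.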

Related

MongoDB replica set error reading from secondary (using primaryPreferred) after primary shut down

I created a MongoDB replica set and I am testing the failover scenarios.
I am using pymongo as my driver, connecting with a mongodb+srv:// connection string, and I set my readPreference to primaryPreferred.
However, when I shut down my primary server I get the following error from pymongo. It occurs before the secondary assumes the role of primary; once the secondary is primary, everything works as expected. My question: since I am using a readPreference of primaryPreferred, shouldn't the read be directed to the secondary instead of erroring out?
Traceback (most recent call last):
File ".\main.py", line 86, in <module>
main()
File ".\main.py", line 18, in main
commands[command](*args)
File ".\main.py", line 26, in print_specs
for db in client.list_databases():
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1890, in list_databases
res = admin._retryable_read_command(cmd, session=session)
File "C:\Python38\lib\site-packages\pymongo\database.py", line 748, in _retryable_read_command
return self.__client._retryable_read(
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1453, in _retryable_read
server = self._select_server(
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1253, in _select_server
server = topology.select_server(server_selector)
File "C:\Python38\lib\site-packages\pymongo\topology.py", line 233, in select_server
return random.choice(self.select_servers(selector,
File "C:\Python38\lib\site-packages\pymongo\topology.py", line 192, in select_servers
server_descriptions = self._select_servers_loop(
File "C:\Python38\lib\site-packages\pymongo\topology.py", line 208, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: No replica set members match selector "Primary()"
You have either:
Not specified the read preference at all, or specified it incorrectly, or
Have issued an operation that doesn't respect read preference, or
Are attempting a write, to which read preference doesn't apply.
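Going by the stack trace, you are listing databases, which is a read operation, so it must be one of the first two reasons. The selector "Primary()" in the error message also shows that the operation ran with the primary read preference rather than primaryPreferred.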
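One way to rule out a misspelled or missing option is to set readPreference on the connection string programmatically and validate the mode name first. This is only a sketch using the standard library; the helper with_read_preference and the example host are hypothetical names introduced here, and the question does not show how the option was actually passed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The five read preference modes defined by MongoDB.
VALID_MODES = {"primary", "primaryPreferred", "secondary",
               "secondaryPreferred", "nearest"}


def with_read_preference(uri, mode):
    """Return uri with its readPreference query option set to mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown read preference mode: {mode!r}")
    parts = urlsplit(uri)
    # Drop any existing readPreference option (compared case-insensitively
    # so a differently-cased spelling is replaced too), then append the new one.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() != "readpreference"]
    query.append(("readPreference", mode))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical cluster address, for illustration only.
print(with_read_preference(
    "mongodb+srv://cluster0.example.mongodb.net/test?retryWrites=true",
    "primaryPreferred"))
```

A typo such as "primaryPrefered" raises ValueError here instead of being passed along to the driver.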

Connecting Apache Superset with PostgreSQL

Suppose I run Apache Superset in Docker and I want it to connect to my local PostgreSQL server. I used the following URI but got an error:
postgresql+psycopg2://username:password@localhost:5432/mydb
The error is:
ERROR: {"error": "Connection failed!\n\nThe error message returned was:\n(psycopg2.OperationalError) could not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (127.0.0.1) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Cannot assign requested address\n\tIs the server running on host \"localhost\" (::1) and accepting\n\tTCP/IP connections on port 5432?\n\n(Background on this error at: http://sqlalche.me/e/e3q8)", "stacktrace": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py\", line 2265, in _wrap_pool_connect\n return fn()\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 303, in unique_connection\n return _ConnectionFairy._checkout(self)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 760, in _checkout\n fairy = _ConnectionRecord.checkout(pool)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 492, in checkout\n rec = pool._do_get()\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py\", line 238, in _do_get\n return self._create_connection()\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 308, in _create_connection\n return _ConnectionRecord(self)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 437, in __init__\n self.__connect(first_connect_check=True)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 639, in __connect\n connection = pool._invoke_creator(self)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py\", line 114, in ...
How can I solve it?
Instead of using localhost or 127.0.0.1, open up your pgAdmin.
The servers are on the left.
Click the dropdown.
Right-click on the now opened cluster (the level above "Databases") and open Properties.
Navigate to the Connection tab; the Hostname/Address shown there is your replacement for "localhost".
Also make sure the final part of your connection string points at your database, which is one level below "Databases" in pgAdmin.
I encountered the same problem when connecting Superset to a local PostgreSQL database, and after consulting many sites this trick solved it. Instead of localhost, try putting this in the SQLAlchemy URI:
postgresql+psycopg2://user:password@host.docker.internal:5432/database
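Inside a container, localhost refers to the container itself, which is why the original URI gets "Connection refused". A quick way to compare host names is a plain TCP check run from inside the Superset container. The sketch below uses only the standard library; can_connect is a hypothetical helper name, and host.docker.internal is assumed to be resolvable in your Docker setup, which may not hold everywhere.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts the TCP
        # handshake; the with-block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, calling can_connect("localhost", 5432) and can_connect("host.docker.internal", 5432) from inside the container shows which address actually reaches the PostgreSQL port. A True result only means the port is reachable; authentication is a separate matter.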
I understand that connecting Docker to a database on the host is bad practice, so I changed my approach: I now use the postgres image inside Docker and push my data to that Postgres server.
Please let me know if I am wrong.

Airflow psycopg2.OperationalError: FATAL: sorry, too many clients already

I have a four node clustered Airflow environment that's been working fine for me for a few months now.
ec2-instances
Server 1: Webserver, Scheduler, Redis Queue, PostgreSQL Database
Server 2: Webserver
Server 3: Worker
Server 4: Worker
Recently I've been working on a more complex DAG that has a few dozen tasks in it compared to my relatively small ones I was working on beforehand. I'm not sure if that's why I'm just now seeing this error pop up or what but I'll sporadically get this error:
On the Airflow UI under the logs for the task:
psycopg2.OperationalError: FATAL: sorry, too many clients already
And on the Webserver (output from running airflow webserver) I get the same error too:
[2018-07-23 17:43:46 -0400] [8116] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect
return fn()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 403, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 788, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 532, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1193, in _do_get
self._dec_overflow()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1190, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 350, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 477, in __init__
self.__connect(first_connect_check=True)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 671, in __connect
connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 106, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 410, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: sorry, too many clients already
I can fix this by running sudo /etc/init.d/postgresql restart and restarting the DAG but then after about three runs I'll start seeing the error again.
I can't find any specifics on this issue in regards to Airflow but from other posts I've found such as this one they're saying it's because my client (I guess in this case that's Airflow) is trying to open up more connections to PostgreSQL than what PostgreSQL is configured to handle. I ran this command to find that my PostgreSQL can accept 100 connections:
[ec2-user@ip-1-2-3-4 ~]$ sudo su
root@ip-1-2-3-4
[/home/ec2-user]# psql -U postgres
psql (9.2.24)
Type "help" for help.
postgres=# show max_connections;
max_connections
-----------------
100
(1 row)
In this solution the post says I can increase my PostgreSQL max connections, but I'm wondering if I should instead set a value in my airflow.cfg file so that the number of connections Airflow is allowed to open matches my PostgreSQL max connections. Does anyone know where I can set this value in Airflow? Here are the fields I think are relevant:
# The SqlAlchemy pool size is the maximum number of database connections
# in the pool.
sql_alchemy_pool_size = 5
# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite.
sql_alchemy_pool_recycle = 3600
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 32
# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128
# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 32
Open to any suggestions for fixing this issue. Is this something related to my Airflow configuration or is it an issue with my PostgreSQL configuration?
Also, because I'm testing a new DAG I'll sometimes terminate the running tasks and start them over. Perhaps doing this is causing some of the processes to not die correctly and they're keeping dead connections open to PostgreSQL?
I ran into a similar issue. I changed max_connections in Postgres to 10000 and sql_alchemy_pool_size in the Airflow config to 1000. Now I am able to run hundreds of tasks in parallel.
PS: My machine has 32 cores and 60GB of memory, so it can take the load.
Quoting the airflow documentation:
sql_alchemy_max_overflow: The maximum overflow size of the pool. When the number of checked-out connections reaches the size set in pool_size, additional connections will be returned up to this limit. When those additional connections are returned to the pool, they are disconnected and discarded. It follows then that the total number of simultaneous connections the pool will allow is pool_size + max_overflow, and the total number of “sleeping” connections the pool will allow is pool_size. max_overflow can be set to -1 to indicate no overflow limit; no limit will be placed on the total number of concurrent connections. Defaults to 10.
It seems that the variables you'll want to set in your airflow.cfg are both sql_alchemy_pool_size and sql_alchemy_max_overflow. Your PostgreSQL max_connections must be at least the sum of those two Airflow configuration variables, since a single Airflow process can hold at most sql_alchemy_pool_size + sql_alchemy_max_overflow open connections to your database. Each process (scheduler, webserver worker, task) keeps its own pool, so in practice the limit has to cover that sum multiplied by the number of processes.
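The connection budget can be sketched as simple arithmetic. This assumes that every Airflow process keeps its own SQLAlchemy pool, so the per-pool bound is multiplied by the number of processes. The function name max_db_connections and the process count of 8 are hypothetical and only serve the illustration.

```python
def max_db_connections(pool_size, max_overflow, processes=1):
    """Upper bound on connections opened by `processes` SQLAlchemy pools."""
    if max_overflow < 0:
        # max_overflow = -1 means the pool places no limit on overflow.
        return float("inf")
    return processes * (pool_size + max_overflow)


# Pool size 5 from the question, and the documented default overflow of 10.
per_process = max_db_connections(5, 10)
print(per_process)  # 15

# Hypothetical process count, purely for illustration.
print(max_db_connections(5, 10, processes=8) <= 100)  # False: 120 exceeds 100
```

Under these assumptions, either max_connections has to go up or the pool settings and the number of concurrent processes have to come down until the product fits.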

Does pymongo replica set client connection support auto fail over?

I created the following mongo replica sets by using mongo cli:
> config = { _id:"repset", members:[{_id:0,host:"192.168.0.1:27017"},{_id:1,host:"192.168.0.2:27017"},{_id:2,host:"192.168.0.3:27017"}]}
> rs.initiate(config);
All the mongo servers run properly.
>>> import pymongo
>>> from pymongo import MongoClient
>>> servers = ["192.168.0.1:27017", "192.168.0.2:27017", "192.168.0.3:27017"]
>>> MongoClient(servers)
>>> xc = MongoClient()
>>> print xc
MongoClient('localhost', 27017)
>>> print xc.database_names()
[u'test_repsets', u'local', u'admin', u'test']
After I kill the local mongodb server, it shows me connection timeout error:
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused
It seems there is no auto fail over, although I defined the mongodb servers.
I am wondering if pymongo handles fail over automatically, or how this situation is handled properly?
Thank you in advance.
In PyMongo 3.x you may want to explicitly state which replica set you are connecting to. PyMongo 3.x changed some of the ways it handles being given a list of servers. I got this from the PyMongo API documentation about connecting to replica sets and automatic failover.
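A minimal sketch of naming the replica set explicitly, assuming the set name repset and the three hosts from the question's rs.initiate config. The helper replica_set_uri is a hypothetical name used here only to assemble the connection string.

```python
def replica_set_uri(hosts, replica_set):
    """Build a mongodb:// URI that lists every seed host and names the set."""
    return "mongodb://{}/?replicaSet={}".format(",".join(hosts), replica_set)


# Hosts and set name taken from the question.
servers = ["192.168.0.1:27017", "192.168.0.2:27017", "192.168.0.3:27017"]
uri = replica_set_uri(servers, "repset")
print(uri)
# mongodb://192.168.0.1:27017,192.168.0.2:27017,192.168.0.3:27017/?replicaSet=repset
```

The string can then be passed to MongoClient(uri). Passing the host list together with the replicaSet keyword argument, as in MongoClient(servers, replicaSet="repset"), should be equivalent.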
In your code:
MongoClient(servers)
The line above is not assigned to any variable. It should be assigned to one; in your case you then created a second instance, MongoClient(), which connects only to localhost:27017, and that is what causes the error.
Please make the following changes:
>>> #MongoClient(servers) # remove this line
>>> #xc = MongoClient() # remove this line
>>> xc = MongoClient(servers) # add this line

Pymongo permissions issue for safe inserts

I have an instance of Mongo running and can connect and authenticate successfully to a database. I can bulk insert records using collection.insert([list of records to insert]).
However, when I add safe=True to ensure that the records are inserted, like the following command, I get the error below, which seems like a permissions issue. How can I fix this?
collection.insert(records_to_insert, safe=True)
File "/.../python2.6/site-packages/pymongo/collection.py", line 270, in insert
check_keys, safe, kwargs), safe)
File "/.../python2.6/site-packages/pymongo/connection.py", line 732, in _send_message
return self.__check_response_to_last_error(response)
File "/.../lib/python2.6/site-packages/pymongo/connection.py", line 684, in __check_response_to_last_error
raise OperationFailure(error["err"])
pymongo.errors.OperationFailure: unauthorized
You are running MongoDB in auth mode and did not provide the related credentials at connection time. Calling db.authenticate(...) should fix this.