MongoEngine connects to incorrect database during testing - mongodb

Context
I'm creating Flask app connected to mongodb using MongoEngine via flask-mongoengine extension. I create my app using application factory pattern as specified in configuration instructions.
Problem
While running test(s), I specified testing database named datazilla_test which is passed to mongo instance via mongo.init_app(app). Even though my app.config['MONGODB_DB'] and mongo.app.config['MONGODB_DB'] instance has correct value (datazilla_test), this value is not reflected in mongo instance. Thus, when I run assertion assert mongo.get_db().name == mongo.app.config['MONGODB_DB'] this error is triggered AssertionError: assert 'datazzilla' == 'datazzilla_test'
Question
What am I doing wrong? Why database connection persist with default database datazzilla rather, than datazilla_test? How to fix it?
Source Code
# __init__.py
from flask_mongoengine import MongoEngine
mongo = MongoEngine()
def create_app(config=None):
app = Flask(__name__)
app.config['MONGODB_HOST'] = 'localhost'
app.config['MONGODB_PORT'] = '27017'
app.config['MONGODB_DB'] = 'datazzilla'
# override default config
if config is not None:
app.config.from_mapping(config)
mongo.init_app(app)
return app
# conftest.py
import pytest
from app import mongo
from app import create_app
#pytest.fixture
def app():
app = create_app({
'MONGODB_DB': 'datazzilla_test',
})
assert mongo.get_db().name == mongo.app.config['MONGODB_DB']
# AssertionError: assert 'datazzilla' == 'datazzilla_test'
return app

Mongoengine is already connected when your fixture is called, when you call app = create_app from your fixture, it tries to re-establish the connection but fails silently (as it sees that there is an existing default connection established).
This got reworked in the development version of mongoengine (see https://github.com/MongoEngine/mongoengine/pull/2038) but wasn't released yet (As of 04-JUN-2019). When that version gets out, you'll be able to disconnect any existing mongoengine connection by calling disconnect_all
In the meantime, you can either:
- check where the existing connection is created and prevent it
- try to disconnect the existing connection by using the following:
from mongoengine.connection import disconnect, _connection_settings
#pytest.fixture
def app():
disconnect()
del _connection_settings['default']
app = create_app(...)
...
But it may have other side effects

Context
Coincidentally, I figure-out fix for this problem. #bagerard answer is correct! It works for MongoClient where client's connect is set to True -this is/should be default value.
MongoClient(host=['mongo:27017'], document_class=dict, tz_aware=False, connect=False, read_preference=Primary())
If that is the case, then you have to disconnect database and delete connection settings as #bagerard explains.
Solution
However, if you change MongoClient connection to False, then you don't have to disconnect database and delete connection settings. At the end solution that worked for me was this solution.
def create_app(config=None):
...
app.config['MONGODB_CONNECT'] = False
...
Notes
As I wrote earlier. I found this solution coincidentally, I was trying to solve this problem MongoClient opened before fork. Create MongoClient only after forking. It turned out that it fixes both problems :)
P.S If there are any side effects I'm not aware of them at this point! If you find some then please share them in comments section.

Related

Authenticate with ECE ElasticSearch Sink from Apache Fink (Scala code)

Compiler error when using example provided in Flink documentation. The Flink documentation provides sample Scala code to set the REST client factory parameters when talking to Elasticsearch, https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
When trying out this code i get a compiler error in IntelliJ which says "Cannot resolve symbol restClientBuilder".
I found the following SO which is EXACTLY my problem except that it is in Java and i am doing this in Scala.
Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)
I tried copy pasting the solution code provided in the above SO into IntelliJ, the auto-converted code also has compiler errors.
// provide a RestClientFactory for custom configuration on the internally created REST client
// i only show the setMaxRetryTimeoutMillis for illustration purposes, the actual code will use HTTP cutom callback
esSinkBuilder.setRestClientFactory(
restClientBuilder -> {
restClientBuilder.setMaxRetryTimeoutMillis(10)
}
)
Then i tried (auto generated Java to Scala code by IntelliJ)
// provide a RestClientFactory for custom configuration on the internally created REST client// provide a RestClientFactory for custom configuration on the internally created REST client
import org.apache.http.auth.AuthScope
import org.apache.http.auth.UsernamePasswordCredentials
import org.apache.http.client.CredentialsProvider
import org.apache.http.impl.client.BasicCredentialsProvider
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
import org.elasticsearch.client.RestClientBuilder
// provide a RestClientFactory for custom configuration on the internally created REST client// provide a RestClientFactory for custom configuration on the internally created REST client
esSinkBuilder.setRestClientFactory((restClientBuilder) => {
def foo(restClientBuilder) = restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder = { // elasticsearch username and password
val credentialsProvider = new BasicCredentialsProvider
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(es_user, es_password))
httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
}
})
foo(restClientBuilder)
})
The original code snippet produces the error "cannot resolve RestClientFactory" and then Java to Scala shows several other errors.
So basically i need to find a Scala version of the solution described in Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)
Update 1: I was able to make some progress with some help from IntelliJ. The following code compiles and runs but there is another problem.
esSinkBuilder.setRestClientFactory(
new RestClientFactory {
override def configureRestClientBuilder(restClientBuilder: RestClientBuilder): Unit = {
restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder = {
// elasticsearch username and password
val credentialsProvider = new BasicCredentialsProvider
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(es_user, es_password))
httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
httpClientBuilder.setSSLContext(trustfulSslContext)
}
})
}
}
The problem is that i am not sure if i should be doing a new of the RestClientFactory object. What happens is that the application connects to the elasticsearch cluster but then discovers that the SSL CERT is not valid, so i had to put the trustfullSslContext (as described here https://gist.github.com/iRevive/4a3c7cb96374da5da80d4538f3da17cb), this got me past the SSL issue but now the ES REST Client does a ping test and the ping fails, it throws an exception and the app shutsdown. I am suspecting that the ping fails because of the SSL error and maybe it is not using the trustfulSslContext i setup as part of new RestClientFactory and this makes me suspect that i should not have done the new, there should be a simple way to update the existing RestclientFactory object and basically this is all happening because of my lack of Scala knowledge.
Happy to report that this is resolved. The code i posted in Update 1 is correct. The ping to ECE was not working for two reasons:
The certificate needs to include the complete chain including the root CA, the intermediate CA and the cert for the ECE. This helped get rid of the whole trustfulSslContext stuff.
The ECE was sitting behind an ha-proxy and the proxy did the mapping for the hostname in the HTTP request to the actual deployment cluster name in ECE. this mapping logic did not take into account that the Java REST High Level client uses the org.apache.httphost class which creates the hostname as hostname:port_number even when the port number is 443. Since it did not find the mapping because of the 443 therefore the ECE returned a 404 error instead of 200 ok (only way to find this was to look at unencrypted packets at the ha-proxy). Once the mapping logic in ha-proxy was fixed, the mapping was found and the pings are now successfull.

How to share connection pool with multiprocessing Python

I'm trying to share a psycopg2.pool.(Simple/Threaded)ConnectionPool among multiple python processes. What is the correct way to approach this?
I am using Python 2.7 and Postgres 9.
I would like to provide some context. I want to use a connection pool is because I am running an arbitrary, but large (>80) number of processes that initially use the database to query results, perform some actions, then update the database with the results of the actions.
So far, I've tried to use the multiprocessing.managers.BaseManager and pass it to child processes so that the connections that are being used/unused are synchronized across the processes.
from multiprocessing import Manager, Process
from multiprocessing.managers import BaseManager
PASSWORD = 'xxxx'
def f(connection, connection_pool):
with connection.cursor() as curs: # ** LINE REFERENCED BELOW
curs.execute('SELECT * FROM database')
connection_pool.putconn(connection)
BaseManager.register('SimpleConnectionPool', SimpleConnectionPool)
manager = BaseManager()
manager.start()
conn_pool = manager.SimpleConnectionPool(5, 20, dbname='database', user='admin', host='xxxx', password=PASSWORD, port='8080')
with conn_pool.getconn() as conn:
print conn # prints '<connection object at 0x7f48243edb48; dsn:'<unintialized>', closed:0>
proc = Process(target=f, args=(conn, conn_pool))
proc.start()
proc.join()
** raises an error, 'Operational Error: asynchronous connection attempt underway'
If anyone could recommend a method to share the connection pool with numerous processes it would be greatly appreciated.
You don't. You can't pass a socket to another process. The other process must open the connection itself.

uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac

I'm trying to setup an application webserver using uWSGI + Nginx, which runs a Flask application using SQLAlchemy to communicate to a Postgres database.
When I make requests to the webserver, every other response will be a 500 error.
The error is:
Traceback (most recent call last):
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
context)
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
The above exception was the direct cause of the following exception:
sqlalchemy.exc.OperationalError: (OperationalError) SSL error: decryption failed or bad record mac
The error is triggered by a simple Flask-SQLAlchemy method:
result = models.Event.query.get(id)
uwsgi is being managed by supervisor, which has a config:
[program:my_app]
command=/usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini --catch-exceptions
directory=/path/to/my/app
stopsignal=QUIT
autostart=true
autorestart=true
and uwsgi's config looks like:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
The furthest that I can get is that it has something to do with uwsgi's forking. But beyond that I'm not clear on what needs to be done.
The issue ended up being uwsgi's forking.
When working with multiple processes with a master process, uwsgi initializes the application in the master process and then copies the application over to each worker process. The problem is if you open a database connection when initializing your application, you then have multiple processes sharing the same connection, which causes the error above.
The solution is to set the lazy configuration option for uwsgi, which forces a complete loading of the application in each process:
lazy
Set lazy mode (load apps in workers instead of master).
This option may have memory usage implications as Copy-on-Write semantics can not be used. When lazy is enabled, only workers will be reloaded by uWSGI’s reload signals; the master will remain alive. As such, uWSGI configuration changes are not picked up on reload by the master.
There's also a lazy-apps option:
lazy-apps
Load apps in each worker instead of the master.
This option may have memory usage implications as Copy-on-Write semantics can not be used. Unlike lazy, this only affects the way applications are loaded, not master’s behavior on reload.
This uwsgi configuration ended up working for me:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
# the fix
lazy = true
lazy-apps = true
As an alternative you might dispose the engine. This is how I solved the problem.
Such issues may happen if there is a query during the creation of the app, that is, in the module that creates the app itself. If that states, the engine allocates a pool of connections and then uwsgi forks.
By invoking 'engine.dispose()', the connection pool itself is closed and new connections will come up as soon as someone starts making queries again. So if you do that at the end of the module where you create your app, new connections will be created after the UWSGI fork.
I am running a flask app using gunicorn on Heroku. My application started exhibiting this problem when I added the --preload option to my Procfile. When I removed that option, my application resumed functioning as normal.
Not sure whether to add this as an answer to this question or ask a separate question and put this as an answer there. I was getting this exact same error for reasons that are slightly different from the people who have posted and answered. In my setup, I using gunicorn as a wsgi for a Flask application. In this application, I was offloading some intense database operations off to a celery worker. The error would come from the celery worker.
From reading a lot of the answers here and looking at the psycopg2 as well as sqlalchemy session documentation, it became apparent to me that it is a bad idea to share an SQLAlchemy session between separate processes (the gunicorn worker and the sqlalchemy worker in my case).
What ended up solving this for me was creating a new session in the celery worker function so it used a new session each time it was called and also destroying the session after every web request so flask used a session per request. The overall solution looked like this:
Flask_app.py
#app.teardown_appcontext
def shutdown_session(exception=None):
session.close()
celery_func.py
#celery_app.task(bind=True, throws=(IntegrityError))
def access_db(self,entity_dict, tablename):
with Session() as session:
try:
session.add(ORM_obj)
session.commit()
except IntegrityError as e:
session.rollback()
print('primary key violated')
raise e

FTP timing out on Heroku

Using Apache Commons FTPClient in a Scala application works as expected on my local machine, but always times out when running on Heroku. Relevant code:
val ftp = new FTPClient
ftp.connect(hostname)
val success = ftp.login(username, pw)
if (success) {
ftp.changeWorkingDirectory(path)
//a logging statement here WILL print
val engine = ftp.initiateListParsing(ftp.printWorkingDirectory)
//a logging statement here will NOT print
while (engine.hasNext) {
val files = engine.getNext(5)
//do stuff with files
}
}
By adding some loggings I've confirmed the client is successfully connecting, logging in, and changing the directory. But stops when trying to begin retrieving files (either via the above paged access or unpaged with ftp.listFiles) and throws a connection time out exception (after about 15 minutes).
Any reason the above code works fine locally but not on Heroku?
Turns out FTP has two modes, active and passive, and Apache-Commons' FTPClient defaults to active. Heroku's firewall presumably doesn't allow active FTP (which is why this was functioning properly locally but not once deployed) - changing to passive mode by adding ftp.enterLocalPassiveMode() resolved the issue.

Squeryl 0.9.5 (with Lift 2.4) not releasing database connections/pools

Following the recommended transaction setup for Squeryl, in my Boot.scala:
import net.liftweb.squerylrecord.SquerylRecord
import org.squeryl.Session
import org.squeryl.adapters.H2Adapter
SquerylRecord.initWithSquerylSession(Session.create(
DriverManager.getConnection("jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1", "sa", ""),
new H2Adapter
))
The first startup works fine. I can connect via H2's web-interface and if I use my app, it updates the database appropriately. However if I restart jetty without restarting the JVM, I get:
java.sql.SQLException: No suitable driver found for jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1
The same result is had if I replace "DB_CLOSE_DELAY=-1" with "AUTO_SERVER=TRUE", or remove it entirely.
Following the recommendations on the Squeryl list, I tried C3P0:
import com.mchange.v2.c3p0.ComboPooledDataSource
val cpds = new ComboPooledDataSource
cpds.setDriverClass("org.h2.Driver")
cpds.setJdbcUrl("jdbc:h2:lift_proto")
cpds.setUser("sa")
cpds.setPassword("")
org.squeryl.SessionFactory.concreteFactory =
Some(() => Session.create(
cpds.getConnection, new H2Adapter())
)
This produces similar behavior:
WARNING: A C3P0Registry mbean is already registered. This probably means that an application using c3p0 was undeployed, but not all PooledDataSources were closed prior to undeployment. This may lead to resource leaks over time. Please take care to close all PooledDataSources.
To be sure it wasn't anything I was doing which was causing this, I started and stopped the server without calling a transaction { } block. No exceptions were thrown. I then added to my Boot.scala:
transaction { /* Do nothing */ }
And the exception was once again thrown (I'm assuming because connections are lazy). So I moved the db initialization code to its own file away from Lift:
SessionFactory.concreteFactory = Some(()=>
Session.create(
java.sql.DriverManager.getConnection("jdbc:h2:mem:test", "sa", ""),
new H2Adapter
))
transaction {}
Results were unchanged. What am I doing wrong? I cannot find any mention of needing to explicitly close connections or sessions in the Squeryl documentation, and this is my first time using JDBC.
I found mention of the same issue here on the Lift google group, but no resolution.
Thanks for any help.
When you say you are restarting Jetty, I think what you're actually doing is reloading your webapp within Jetty. Neither the h2 database or C3P0 will automatically shut down when your app reloads, which explains the errors you are receiving when Lift tries to initialize them a second time. You don't see the error when you don't create a transaction block because both h2 and C3P0 are initialized when the first DB connection is retrieved.
I tend to use BoneCP as a connection pool myself. You can configure the minimum number of pooled connections to be > 1, which will stop h2 from shutting down without the need for DB_CLOSE_DELAY=-1. Then you can use:
LiftRules.unloadHooks append { () =>
yourPool.close() //should destroy the pool and it's associated threads
}
That will close all of the connections when Lift is shutdown, which should properly shutdown the h2 database as well.