Options for tuning mongoose - mongodb

I am new to MongoDB and having a hard time understanding the different flags used in the connect() method, which are passed in the second object argument.
const mongoose = require('mongoose');

const connectDB = async () => {
  const conn = await mongoose.connect(process.env.MONGO_URI, {
    useNewUrlParser: true,
    useCreateIndex: true,
    useFindAndModify: false,
    useUnifiedTopology: true
  });
};

There is documentation available for mongoose.connect():
[options] «Object» passed down to the MongoDB driver's connect() function, except for 4 mongoose-specific options explained below.
[options.bufferCommands=true] «Boolean» Mongoose-specific option. Set to false to disable buffering on all models associated with this connection.
[options.dbName] «String» The name of the database we want to use. If not provided, the database name from the connection string is used.
[options.user] «String» username for authentication, equivalent to options.auth.user. Maintained for backwards compatibility.
[options.pass] «String» password for authentication, equivalent to options.auth.password. Maintained for backwards compatibility.
[options.autoIndex=true] «Boolean» Mongoose-specific option. Set to false to disable automatic index creation for all models associated with this connection.
[options.useNewUrlParser=false] «Boolean» False by default. Set to true to opt in to the MongoDB driver's new URL parser logic.
[options.useUnifiedTopology=false] «Boolean» False by default. Set to true to opt in to the MongoDB driver's replica set and sharded cluster monitoring engine.
[options.useCreateIndex=false] «Boolean» Mongoose-specific option. If true, this connection will use createIndex() instead of ensureIndex() for automatic index builds via Model.init().
[options.useFindAndModify=true] «Boolean» True by default. Set to false to make findOneAndUpdate() and findOneAndRemove() use native findOneAndUpdate() rather than findAndModify().
[options.reconnectTries=30] «Number» If you're connected to a single server or mongos proxy (as opposed to a replica set), the MongoDB driver will try to reconnect every reconnectInterval milliseconds for reconnectTries times, and give up afterward. When the driver gives up, the mongoose connection emits a reconnectFailed event. This option does nothing for replica set connections.
[options.reconnectInterval=1000] «Number» See the reconnectTries option above.
[options.promiseLibrary] «Class» Sets the underlying driver's promise library.
[options.poolSize=5] «Number» The maximum number of sockets the MongoDB driver will keep open for this connection. By default, poolSize is 5. Keep in mind that, as of MongoDB 3.4, MongoDB only allows one operation per socket at a time, so you may want to increase this if you find you have a few slow queries that are blocking faster queries from proceeding. See Slow Trains in MongoDB and Node.js.
[options.bufferMaxEntries] «Number» The MongoDB driver also has its own buffering mechanism that kicks in when the driver is disconnected. Set this option to 0 and set bufferCommands to false on your schemas if you want your database operations to fail immediately when the driver is not connected, as opposed to waiting for reconnection.
[options.connectTimeoutMS=30000] «Number» How long the MongoDB driver will wait before killing a socket due to inactivity during initial connection. Defaults to 30000. This option is passed transparently to Node.js' socket#setTimeout() function.
[options.socketTimeoutMS=30000] «Number» How long the MongoDB driver will wait before killing a socket due to inactivity after initial connection. A socket may be inactive because of either no activity or a long-running operation. This is set to 30000 by default; you should set this to 2-3x your longest-running operation if you expect some of your database operations to run longer than 20 seconds. This option is passed to Node.js' socket#setTimeout() function after the MongoDB driver successfully completes the initial connection.
[options.family=0] «Number» Passed transparently to Node.js' dns.lookup() function. May be either 0, 4, or 6. '4' means use IPv4 only, '6' means use IPv6 only, and '0' means try both.
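
Putting a few of these together, a tuned connect call might look like the following sketch (the option values are illustrative, not recommendations):

const mongoose = require('mongoose');

mongoose.connect(process.env.MONGO_URI, {
  useNewUrlParser: true,      // opt in to the new URL parser
  useUnifiedTopology: true,   // opt in to the new topology engine
  useCreateIndex: true,       // createIndex() instead of ensureIndex()
  useFindAndModify: false,    // native findOneAndUpdate()
  poolSize: 10,               // more sockets for concurrent operations
  socketTimeoutMS: 45000,     // close sockets inactive for 45s after connect
  family: 4                   // resolve hostnames to IPv4 only
});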

Related

Psycopg2 idle session timeout with ThreadedConnectionPool

I am trying to set up a ThreadedConnectionPool in my AWS Lambda; Postgres 14 is being used. The lambda might die abruptly, and I want to make sure that the Postgres server closes the connection after, for example, 1 minute of idle time.
The documentation for the idle_session_timeout parameter states the following:
Be wary of enforcing this timeout on connections made through connection-pooling software or other middleware, as such a layer may not react well to unexpected connection closure. It may be helpful to enable this timeout only for interactive sessions, perhaps by applying it only to particular users.
Is PgBouncer the right answer here? Or is it safe to apply this setting in my case? Or is there a better approach? What I want to make sure of is that the server does its own cleanup of connections created by the lambda's ThreadedConnectionPool if the lambda happens to die.
Are you explicitly closing the connection when you are done with it? If not and you just let the connection go out of scope, maybe the garbage collection system is just not very aggressive about cleaning it up.
Pgbouncer could be helpful for this, but it would have to be run in transaction pooling mode (because the default session pooling mode can't be very useful when the sessions don't get closed promptly), and that mode does impose some restrictions on what you can do, such as using prepared transactions.
Or, if you created a database user for your lambdas to use, then you could apply the idle timeout only to that user, and so prevent it from killing administrator, monitoring, or developer connections. But combining the pooler and the timeout is probably neither needed nor advisable.
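
If you take the per-user route, the timeout can be attached to the role your lambdas log in as, so other sessions are unaffected (a sketch; lambda_app is a hypothetical role name):

-- Requires PostgreSQL 14+; applies only to sessions of this role.
ALTER ROLE lambda_app SET idle_session_timeout = '1min';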

Postgres: processes terminated after connection break / invalidation

I don't understand some of Postgres's mechanisms, and it makes me quite upset.
I usually use DBeaver as an SQL client to query an external pg database. If I run CREATE .. or INSERT .. queries and the connection is then broken or invalidated for some reason, the pid is still running and finishes the transaction.
But for some more complicated PL/pgSQL functions we wrote (with temp tables, loops, inserts, etc.), breaking the connection always causes process termination (it disappears from the session list just before making the next SQL operation, e.g. inserting a row into logtable). No matter if it's the DBeaver editor or the psql command.
I know that disconnecting is maybe a critical problem that should be eliminated, and maybe I shouldn't expect the process to continue successfully, but I do :) Or at least I'd like to know why it happens and whether it is possible to prevent it.
If the network connection fails, the database server can detect that in two ways:
if it tries to send data to the client, it will figure out pretty quickly that the connection is down
if it tries to receive data from the client, it will only notice when the kernel's TCP keepalive mechanism has determined that the connection is down
When you say that sometimes execution of a function is terminated right away, I would say that is because the function returned data to the client.
In the case where a query keeps running, it is not attempting to return any data yet.
There is no cure for the former, but in PostgreSQL v14 you can prevent the latter by setting client_connection_check_interval. In addition, you have to set the PostgreSQL keepalive parameters so that the dead connection becomes known quickly.
See my article for more.
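
A sketch of the relevant settings (values are illustrative; client_connection_check_interval exists only in v14 and later):

-- Poll for a dead client every 10 seconds while a query is running.
ALTER SYSTEM SET client_connection_check_interval = '10s';
-- Tighten TCP keepalives so the kernel notices dead connections quickly.
ALTER SYSTEM SET tcp_keepalives_idle = 60;
ALTER SYSTEM SET tcp_keepalives_interval = 10;
ALTER SYSTEM SET tcp_keepalives_count = 3;
SELECT pg_reload_conf();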

What to do after a query when auto_commit is disabled

In some scenarios we should call setAutoCommit(false) before a query; see https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor and "When does the PostgreSQL JDBC driver fetch rows after executing a query?".
But none of these topics mention what to do after the query, when the ResultSet and Statement are closed but the Connection is not (it may be recycled by a connection pool or DataSource).
I have these choices:
Do nothing (keep autoCommit = false for next query)
set autoCommit = true
commit
rollback
Which one is the best practice?
Even queries are executed in a transaction. If you started a transaction (which implicitly happened when you executed the query), then you should also end it. Generally, doing nothing would - with a well-behaved connection pool - result in a rollback when your connection is returned to the pool. However, it is best not to rely on such implicit behaviour, because not all connection pools or drivers will adhere to it. For example, the Oracle JDBC driver will commit on connection close (or at least, it did so in the past; I'm not sure if it still does), and that might not be the correct behaviour for your program. Explicitly calling commit() or rollback() will clearly document the boundaries and expectations of your program.
Though committing or rolling back a transaction that only executed a query (and thus did not modify the database) will have the same end result, I would recommend using commit() rather than rollback(), to clearly indicate that the result was successful. For some databases committing might be cheaper than rollback (or vice versa), but such systems usually have heuristics that will convert a commit to a rollback (or vice versa, whichever is 'cheaper') if the result would be equivalent.
You generally don't need to switch auto-commit mode when you're done. A well-behaved connection pool should do that for you (though not all do, or sometimes you need to explicitly configure this). Double check the behaviour and options of your connection pool to be sure.
If you want to continue using a connection yourself (without returning to the pool), then switching back to auto-commit mode is sufficient: calling setAutoCommit(true) with an active transaction will automatically commit that transaction.
It depends what you want to do afterwards. If you want to return to autocommit mode after the operation:
conn.setAutoCommit(true);
This will automatically commit the open transaction.
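
Putting that advice together, the overall pattern might look like this sketch (dataSource is a hypothetical pooled javax.sql.DataSource; the query is a placeholder):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);
    try (PreparedStatement stmt = conn.prepareStatement("SELECT id FROM items");
         ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // process rs.getLong("id") ...
        }
        conn.commit();   // explicitly end the transaction, even for a pure query
    } catch (Exception e) {
        conn.rollback(); // undo on failure
        throw e;
    } finally {
        conn.setAutoCommit(true); // restore the default before the pool reclaims it
    }
}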

How to abort mongo operation after given time limit using the pymongo MongoClient constructor?

According to this response you can set a time limit for a query operation via a find() parameter or a cursor method:
cursor = db.collection.find(max_time_ms=1)
or
cursor = db.collection.find().max_time_ms(1)
The doc says:
max_time_ms (optional): Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. Pass this as an alternative to calling max_time_ms() on the cursor.
We're currently experiencing a problem where a query runs for ~30 minutes, eats all the RAM, and the server dies. I hope this parameter puts a hard limit on the query, so that after the given time the server gives up.
Since our app is full of finds and cursors: is there a way how to set this parameter directly in the MongoClient constructor?
The doc says:
socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to None (no timeout).
connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds).
serverSelectionTimeoutMS: (integer) Controls how long (in milliseconds) the driver will wait to find an available, appropriate server to carry out a database operation; while it is waiting, multiple server monitoring operations may be carried out, each controlled by connectTimeoutMS. Defaults to 30000 (30 seconds).
I couldn't find another timeout, and none of these seems to be the equivalent of max_time_ms. Am I missing something?
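
For reference, the per-operation limit from the approach above looks like this in context (a sketch; the URI, database, and collection names are placeholders):

from pymongo import MongoClient
from pymongo.errors import ExecutionTimeout

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["mycollection"]

try:
    # The server aborts the query once it has spent 1000 ms executing it.
    docs = list(coll.find({}, max_time_ms=1000))
except ExecutionTimeout:
    print("query exceeded its time limit")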

How do I place a read lock on MongoDB?

My application needs to access a Mongo db where if more than one process/thread is reading from a specific collection, bad things will happen.
I need to restrict the ability of a group of processes to read from the collection (or db, if need be). So for example, if there are multiple processes trying to read from the db, they read sequentially, not in parallel.
This can be done at the driver level. If you set the connection pool size to 1, then all access to the database will happen in sequence.
In Node.js you can configure the driver like this:
const { MongoClient } = require('mongodb');

const client = await MongoClient.connect(url, {
  poolSize: 1  // a single socket forces operations to run one at a time
});
From the documentation:
poolSize: this allows you to control how many tcp connections are opened in parallel. The default value for this is 5 but you can set it as high as you want. The driver will use a round-robin strategy to dispatch and read from the tcp connection.
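
To see the serialization in action, you can dispatch two operations concurrently; with a single socket the driver queues them, so the server processes them one after the other (a sketch; db and collection names are placeholders, and this assumes the client created above):

const coll = client.db('test').collection('items');

// Dispatched "concurrently", but executed sequentially over the one socket.
await Promise.all([
  coll.find({}).toArray(),
  coll.find({}).toArray()
]);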