Mongo cursorFinalizerEnabled performance effect - mongodb

I'm using Spring Boot with Mongo 3.4 (in a cluster with mongos).
The mongo client options configuration has the option cursorFinalizerEnabled.
According to the documentation, this flag "attempts to clean up DBCursors that are not closed"; setting it to true spawns a thread for every new MongoClient. (MongoTemplate normally closes the cursors itself.)
MongoClientOptions options = MongoClientOptions.builder()
.cursorFinalizerEnabled(false)
.build();
What is the best practice: true or false? And what is the performance effect?

The default value of cursorFinalizerEnabled is true (see MongoClientOptions). So, your MongoClient will spawn this thread (and apply this behaviour) unless you choose not to.
This feature provides a safety net for client code which is (or might be) casual about handling cursors. So, depending on how you treat your cursors it might be useful or it might be a no-op.
The standard advice is: if your client code ensures that the close method of DBCursor is always invoked then you can set this to false. Otherwise, just accept the default.
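For example, a minimal sketch with the legacy DBCollection/DBCursor API (the database and collection names are made up) where every cursor is closed deterministically, so the finalizer thread would have nothing to do:
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;

public class ClosedCursorExample {
    public static void main(String[] args) {
        MongoClientOptions options = MongoClientOptions.builder()
                .cursorFinalizerEnabled(false) // safe only because every cursor below is closed explicitly
                .build();
        MongoClient client = new MongoClient("localhost", options);
        DBCollection collection = client.getDB("mydb").getCollection("mycoll");

        // try-with-resources guarantees DBCursor.close() runs, even if iteration throws
        try (DBCursor cursor = collection.find()) {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        }
        client.close();
    }
}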
As for the performance implications: that is hard to measure. If your client code does not leave any open, unused cursors then it's a no-op, but if it does, this flag will help to reduce the impact on shared resources. Spawning a single thread to run this harvester seems like a low cost, so if you are at all unsure about how your client code handles cursors then it's worth enabling.
And, of course, as with all performance questions, the most reliable way of determining the performance effect (if any) is to test with and without this flag and then compare :)

Related

Stateful session memory management in Drools

Just have a general question regarding memory management when using "Stateful sessions" in Drools. For context, I'm specifically looking to use a ksession in "Stream" mode, together with fireUntilHalt(), to process an infinite stream of events. Each event is timestamped; however, I'm mainly writing rules using length-based window notation (i.e. window:length()) and the from accumulate syntax for decision making.
The docs are a little vague, though, about how memory management works in this case. They suggest that, when temporal operators are used, the engine can automatically remove any facts/events that can no longer match. However, would this also apply to rules that only use window:length()? Or would my system need to manually delete events that are no longer applicable, in order to prevent running out of memory?
window:time() calculates an expiration, so it works with automatic removal. However, window:length() doesn't calculate expiration, so events are retained.
You can confirm the behaviour with my example:
https://github.com/tkobayas/kiegroup-examples/tree/master/Ex-cep-window-length-8.32
FYI:
https://github.com/kiegroup/drools/blob/8.32.0.Final/drools-core/src/main/java/org/drools/core/rule/SlidingLengthWindow.java#L145
You would need to explicitly delete them, or set @expires (if it's possible for your application to specify expiration by time), to avoid an OOME.
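For example, a minimal Java sketch of explicit deletion (the class name, capacity, and event handling are hypothetical) that keeps working memory bounded when only window:length() is used:
import java.util.ArrayDeque;
import java.util.Deque;
import org.kie.api.runtime.KieSession;
import org.kie.api.runtime.rule.FactHandle;

// Hypothetical helper: keeps at most MAX_EVENTS facts in working memory, because
// window:length() alone does not expire events that have fallen out of the window.
public class BoundedEventFeeder {
    private static final int MAX_EVENTS = 1000; // >= the largest window:length() used by the rules
    private final Deque<FactHandle> handles = new ArrayDeque<>();
    private final KieSession kieSession;

    public BoundedEventFeeder(KieSession kieSession) {
        this.kieSession = kieSession;
    }

    public void onEvent(Object event) {
        handles.addLast(kieSession.insert(event));
        if (handles.size() > MAX_EVENTS) {
            // retract the oldest event so the session cannot grow without bound
            kieSession.delete(handles.removeFirst());
        }
    }
}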
Thank you for pointing out that the documentation is not clear about this. I have filed a doc JIRA to get it explained.
https://issues.redhat.com/browse/DROOLS-7282

Parameter sniffing / bind peeking in PostgreSQL

The Prepare and Execute combination in PostgreSQL permits the use of bound parameters. However, Prepare does not produce a plan optimized for one set of parameter bindings that can be reused with a different set of parameter bindings. Does anybody have pointers on implementing such functionality? With this, the plan would be optimized for the given set of parameter bindings but could be reused for another set. The plan might not be efficient for the subsequent set, but if the plan cost was recomputed using the new parameter bindings, it might be found to be efficient.
Reading and using parameter binding values for cardinality estimation is called "parameter sniffing" in SQL Server and "bind peeking" in Oracle. Basically, has anybody done anything similar in PostgreSQL?
PostgreSQL uses a heuristic to decide whether to do "bind peeking". It peeks for the first 5 executions (I think it is) of a prepared statement, and if none of those produce plans that are expected to be better than the generic plan, it stops checking and uses the generic plan from then on.
Starting in v12, you can change this heuristic by setting plan_cache_mode.
Note that some drivers implement their own heuristics--just because you call the driver's prepare method doesn't mean it actually transmits this to the server as a PREPARE. It might instead stash the statement text away, wait until you execute, and then quote/escape your parameters and bundle them up with your previously pseudo-prepared statement and send them to the server in one packet. That is, they might treat the prepare/execute separation simply as a way to prevent SQL injections, not as a way to increase performance.
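To make that concrete, here is a hedged JDBC sketch (the table, column, and connection details are made up) that forces custom plans for one session on v12+ and uses PgJDBC's prepareThreshold property to control when the driver actually switches to server-side prepared statements:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlanCacheModeExample {
    public static void main(String[] args) throws Exception {
        // prepareThreshold=1 asks PgJDBC to use a server-side prepared statement from the first
        // execution instead of its default of five; URL, credentials, and table are made up.
        String url = "jdbc:postgresql://localhost:5432/test?prepareThreshold=1";
        try (Connection conn = DriverManager.getConnection(url, "app", "secret")) {
            try (Statement st = conn.createStatement()) {
                // PostgreSQL 12+: always re-plan with the actual parameter values ("peek" every time)
                st.execute("SET plan_cache_mode = force_custom_plan");
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT count(*) FROM orders WHERE customer_id = ?")) {
                ps.setInt(1, 42);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }
}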

How to use "Try" Postgres Advisory Locks

I am experiencing some unexpected (to me) behavior using pg_try_advisory_lock. I believe this may be connection pooling / timeout related.
pg_advisory_lock is working as expected. When I call the function and the desired lock is already in use, my application waits until the specified command timeout on the function call.
However, when I replace it with pg_try_advisory_lock and instead check the result of this function (true/false) to determine whether the lock was acquired, some scenario is allowing multiple processes (single-threaded .NET Core deployed to ECS) to get "true" for the same lock key at the same time.
In the C# code, I have implemented this within an IDisposable, and I release the lock and dispose of the underlying connection on disposal. This is the case both for my calls to pg_advisory_lock and pg_try_advisory_lock. All of the work that needs to be synchronized happens inside a using block.
My operating theory is that the settings around connection pooling / timeouts are at play here. Since the try call doesn't block, the session context for the lock gets "disposed" on the Postgres side, perhaps as a result of the connection being idle(?).
If that is the cause, the simplest solution seems to be to disable any kind of pooling for the connections used in try locking. But since pooling is just a theory at this point, it seems a bit early to start targeting a specific solution.
Any ideas what may be the cause?
Example of the C#:
using (Api.Instance.Locking.TryAcquire(someKey1, someKey2, out var acquired))
{
    if (acquired)
    {
        // do some locked work
    }
}
Under the hood, TryAcquire is calling:
select pg_try_advisory_lock as acquired from pg_try_advisory_lock(#key1,#key2)
This turned out to be kind of dumb. No changes to pooling were required.
I am using the Dapper and Npgsql libraries. An NpgsqlConnection returns to a closed state after being used for a .Query() unless the connection was explicitly opened first, and since advisory locks are scoped to the session, closing the connection released the lock right after it was acquired.
This was impacting both the try and blocking versions of the advisory lock calls, albeit in a less adverse way in the blocking case.
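For anyone hitting the same thing: the essential point is that a session-level advisory lock only lives as long as the connection that acquired it, so that connection has to stay open until the locked work is done. A minimal JDBC sketch of that pattern (connection URL, credentials, and lock keys are made up):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TryAdvisoryLockExample {
    public static void main(String[] args) throws Exception {
        int key1 = 1001, key2 = 2002; // made-up lock keys
        // The lock is released when this session ends, so the connection is kept open
        // for the whole critical section and the lock is released explicitly at the end.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "app", "secret")) {
            boolean acquired;
            try (PreparedStatement ps = conn.prepareStatement("SELECT pg_try_advisory_lock(?, ?)")) {
                ps.setInt(1, key1);
                ps.setInt(2, key2);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    acquired = rs.getBoolean(1);
                }
            }
            if (acquired) {
                try {
                    // do the work that must be synchronized, on this same connection/session
                } finally {
                    try (PreparedStatement ps = conn.prepareStatement("SELECT pg_advisory_unlock(?, ?)")) {
                        ps.setInt(1, key1);
                        ps.setInt(2, key2);
                        ps.execute();
                    }
                }
            }
        }
    }
}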

MongoDB tailable cursor - Isn't it bad practice to have a continually spinning loop?

I'm looking into the best way to have my app get notified when a collection is updated in mongo. From everything I read on the interwebs, the standard practice is to use a capped collection with a tailable cursor and here's a snippet from mongodb's docs on tailable cursors.
I notice in their snippet that they have a continuous while loop that never stops. Doesn't this seem like a bad practice? I can't imagine this would perform well when scaling.
Does anyone have any insight as to how this could possibly scale and still be performant? Is there something I'm not understanding?
EDIT
So this is a good example where I see the stream is just left open, and since the stream is never closed, it just has a listener listening. That makes sense to me, I guess.
I'm also looking at this mubsub implementation where they use a setTimeout with 1 second pause.
Aren't these typically bad practices - to leave a stream open or to use a setTimeout like that? Am I just being old school?
I notice in their snippet that they have a continuous while loop that never stops. Doesn't this seem like a bad practice?
Looks like it to me as well, yes.
Does anyone have any insight as to how this could possibly scale and still be performant?
You can set the AwaitData flag to make the more() call block for some time. It won't block until data is available, but it will block for a while. This requires server support (from v1.6 or so). That is also what's being done in the node.js example you posted ({awaitdata:true}).
where they use a setTimeout with 1 second pause.
The way I read it, they retry at regular intervals to get the cursor back when it is lost, and return an error only if that has failed for a full second.
Aren't these typically bad practices - to leave a stream open [...]?
You're not forgetting the stream (that would be bad), you keep using it - that's pretty much the definition of a tailable cursor.
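As an illustration of the awaitData behaviour described above, here is a hedged sketch with the MongoDB Java driver (3.x API; the database and capped collection names are made up):
import com.mongodb.CursorType;
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class TailableAwaitExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        // "events" has to be a capped collection for a tailable cursor to work
        MongoCollection<Document> events = client.getDatabase("test").getCollection("events");

        // TailableAwait = tailable + awaitData: when the cursor reaches the end of the
        // collection, next()/hasNext() block on the server for a while instead of the
        // client spinning in a hot loop.
        try (MongoCursor<Document> cursor =
                     events.find().cursorType(CursorType.TailableAwait).iterator()) {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        }
        client.close(); // reached only if the cursor dies, e.g. the collection is dropped
    }
}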

MongoDB 2.6: maxTimeMS on an instance/database level

MongoDB 2.6 introduced the .maxTimeMS() cursor method, which allows you to specify a max running time for each query. This is awesome for ad-hoc queries, but I wondered if there was a way to set this value on a per-instance or per-database (or even per-collection) level, to try and prevent locking in general.
And if so, could that value then be OVERRIDDEN on a per-query basis? I would love to set an instance-level timeout of 3000ms or thereabouts (since that would be a pretty extreme running time for queries issued by my application), but then be able to ignore it if I had a report to run.
Here's the documentation from mongodb.org, for reference: http://docs.mongodb.org/manual/reference/method/cursor.maxTimeMS/#behaviors
Jason,
Currently MongoDB does not support a global / universal "maxTimeMS". At the moment this option is applicable to individual operations only. If you wish to have such a global setting available in MongoDB, I would suggest raising a SERVER ticket at https://jira.mongodb.org/browse/SERVER along with use-cases that can take advantage of such a setting.
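For the per-operation case, a hedged sketch with a 3.x Java driver (database, collection, and filter are made up) could look like this; a report can simply use a much larger limit, or none at all:
import java.util.concurrent.TimeUnit;

import com.mongodb.MongoClient;
import com.mongodb.MongoExecutionTimeoutException;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MaxTimeExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        MongoCollection<Document> orders = client.getDatabase("test").getCollection("orders");

        try {
            // application query: the server may spend at most 3 seconds on it
            for (Document doc : orders.find(new Document("status", "open"))
                                      .maxTime(3, TimeUnit.SECONDS)) {
                System.out.println(doc);
            }
        } catch (MongoExecutionTimeoutException e) {
            // the server killed the operation once maxTimeMS elapsed
            System.err.println("Query exceeded maxTimeMS: " + e.getMessage());
        }

        // a report can use a much larger limit (or omit maxTime entirely)
        Document report = orders.find(new Document("status", "open"))
                                .maxTime(10, TimeUnit.MINUTES)
                                .first();
        System.out.println(report);
        client.close();
    }
}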
I know this is old, but I came here with a similar question and decided to post my findings. The timeout, as a global parameter, is supported by the drivers as part of the connection string, which makes sense because it's the driver that can control these global parameters. You can find documentation about this here: https://docs.mongodb.com/manual/reference/connection-string/, but each driver can handle it slightly differently (C#, for example, uses a Mongo Settings parameter for this, Python takes it as parameters in the init constructor, etc.).
To test this you can start a test server using mtools like this:
mlaunch --replicaset --name testrepl --nodes 3 --port 27000
Then an example in Python would look like this:
from pymongo import MongoClient
c = MongoClient(host="mongodb://localhost:27000,localhost:27001,localhost:27002/?replicaSet=testrepl&wtimeoutMS=2000&w=3")
c.test_database.col.insert({ "name": "test" })
I'm using the URI method so this can be used with other drivers, but Python also supports the parameters w and wtimeout directly. In this example, all write operations default to a 2-second timeout and have to be confirmed by 3 nodes before returning. If you restart the database and use a wtimeout of 1 (meaning 1 ms), you will see an exception, because the replication will take a bit longer to initialize the first time you execute the Python script.
Hope this helps others coming with the same question.
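For completeness, a rough Java equivalent of the same connection-string approach (3.x driver; database and collection names mirror the Python example above):
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import org.bson.Document;

public class ConnectionStringDefaults {
    public static void main(String[] args) {
        // same connection-string defaults as the Python example: w=3, wtimeoutMS=2000
        MongoClientURI uri = new MongoClientURI(
                "mongodb://localhost:27000,localhost:27001,localhost:27002/?replicaSet=testrepl&w=3&wtimeoutMS=2000");
        MongoClient client = new MongoClient(uri);
        client.getDatabase("test_database").getCollection("col")
              .insertOne(new Document("name", "test"));
        client.close();
    }
}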