Limits on maximum parallel trigger executions of MongoDB Stitch

I am trying to use MongoDB Stitch for real-time analytics. The MongoDB Stitch documentation states the following note:
Stitch limits the execution of trigger functions to a rate of 50 executions per second across all triggers in an application. If additional triggers fire beyond this threshold, Stitch adds their associated functions to a queue and executes the functions once capacity becomes available.
I am looking for more clarity on the statement above; my questions are listed below:
Is the limit of 50 executions per second bound by the capacity of the Atlas MongoDB instance?
If an execution takes 2 seconds to process, does the limit of 50 executions per second still hold?
Is there an upper limit on the number of pending operations in the queue?

The limit of 50 executions per second is just to protect Stitch and isn’t related to the Atlas instance size.
The function execution time and the 50 executions per second aren’t really related. The statement just means that 50 jobs can be added to the queue every second, regardless of how long a function takes to run.
There is a maximum number of jobs that can be added to the queue, but it isn’t a hard cap. Once the queue hits that limit, Stitch simply slows down the rate at which new jobs are added so that the consumers have time to catch up.
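To picture the behaviour described above, here is a small, purely illustrative Python sketch of a producer throttled to 50 enqueues per second feeding consumers that drain a queue at their own pace. This is not Stitch's implementation; the soft queue limit and all other numbers are assumptions for the example.

```python
import queue
import threading
import time

MAX_ENQUEUES_PER_SECOND = 50   # the documented Stitch rate
SOFT_QUEUE_LIMIT = 1000        # hypothetical soft cap; the real value is not documented here

job_queue = queue.Queue()

def enqueue_trigger_events(total_events):
    """Add trigger events to the queue at no more than 50 per second.
    If the queue grows past the soft limit, back off so consumers can catch up."""
    for i in range(total_events):
        while job_queue.qsize() >= SOFT_QUEUE_LIMIT:
            time.sleep(0.1)                      # slow the producer down instead of rejecting
        job_queue.put(f"event-{i}")
        time.sleep(1.0 / MAX_ENQUEUES_PER_SECOND)

def run_trigger_functions(worker_id):
    """Consume events; each function may take far longer than 1/50 s to run."""
    while True:
        event = job_queue.get()
        time.sleep(2)                            # pretend the trigger function takes 2 seconds
        print(f"worker {worker_id} finished {event}")
        job_queue.task_done()

if __name__ == "__main__":
    for w in range(4):                           # a handful of concurrent consumers
        threading.Thread(target=run_trigger_functions, args=(w,), daemon=True).start()
    enqueue_trigger_events(200)
    job_queue.join()
```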

Related

Google Firestore Scalability Limit

As per the Google Firebase documentation, Firestore supports up to 1,000,000 simultaneous connections and 10,000 writes per second per database. We ran a scalability test to check whether Firestore is viable for our solution and encountered errors like "Connect to firebase.googleapis.com:443:Connection timed out" during write operations for 40K samples, and "firestore.googleapis.com:443:failed to respond" during GET operations while the 40K samples were being written. We would like to understand the Firestore limits and its scalability.
We are running a JMeter script that writes data to Firebase (Firestore Blaze Plan purchased for testing) on multiple VMs and PCs connected to a wired network to check scalability. The JMeter script writes data to Firebase using REST API PATCH, and each script on a PC/VM writes 5K records over a period of 5 minutes. There are a total of 8 PCs/VMs, which together write 40K records to Firebase. During this we also GET 500 records in 5 minutes, 2 times a day. During this test we are hitting the Firebase failures.
The errors are "Connect to firebase.googleapis.com:443:Connection timed out" during write operations for 40K samples, "firestore.googleapis.com:443:failed to respond", and connection resets.
Are you considering the other limits for your specific operations?
According to https://firebase.google.com/docs/firestore/quotas
For example, for the write operation:
Maximum writes per second per database = 10,000 (up to 10 MiB per second)
Maximum write rate to a document = 1 per second
Maximum write rate to a collection in which documents contain sequential values in an indexed field = 500 per second
Maximum number of writes that can be passed to a Commit operation or performed in a transaction = 500
During GET operations you have the limits: Maximum number of exists(), get(), and getAfter() calls per request:
10 for single-document requests and query requests.
20 for multi-document reads, transactions, and batched writes. The previous limit of 10 also applies to each operation.
For example, imagine you create a batched write request with 3 write operations and that your security rules use 2 document access calls to validate each write. In this case, each write uses 2 of its 10 access calls and the batched write request uses 6 of its 20 access calls.
Exceeding either limit results in a permission denied error.
Some document access calls may be cached, and cached calls do not count towards the limits.
I think that one of these limits could be causing these connections to be aborted.
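If you later compare the raw REST calls against the official client, a sketch like the following (Python, google-cloud-firestore) shows one way to keep each commit under the 500-write limit quoted above; the collection name and data shape are made up for the example.

```python
from google.cloud import firestore  # pip install google-cloud-firestore

MAX_WRITES_PER_COMMIT = 500  # Firestore limit quoted above

def write_in_batches(db, docs):
    """Write an iterable of dicts in chunks that respect the
    500-writes-per-commit limit. The 'samples' collection is hypothetical."""
    batch = db.batch()
    pending = 0
    for doc in docs:
        ref = db.collection("samples").document()   # auto-generated document ID
        batch.set(ref, doc)
        pending += 1
        if pending == MAX_WRITES_PER_COMMIT:
            batch.commit()
            batch = db.batch()
            pending = 0
    if pending:
        batch.commit()

if __name__ == "__main__":
    client = firestore.Client()                      # uses default credentials
    write_in_batches(client, ({"n": i} for i in range(5000)))
```

Chunking at 500 keeps each Commit under the documented maximum; the per-document and per-collection write-rate limits above still apply regardless of batching.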

What is the default lock wait time for a MongoDB (WiredTiger) update query?

I have an embedded sub-document array in a MongoDB document, and multiple users can add sub-documents to that array. I use an update ($push) query to add a document to the array, but if multiple users try to add an entry from the UI, how do I make sure the second $push doesn't fail because of a lock held by the first? Only a couple of users are likely to add entries to a single document at the same time, so I am not worried about the case where hundreds of users exist. What is the default wait time of an update in WiredTiger, so that the second $push doesn't abort immediately? It can take up to 1 second, as long as the $push completes successfully.
I tried to find the default wait time in the MongoDB and WiredTiger docs; I could find the default transaction wait times, but nothing for the update query.
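For reference, the update in question looks roughly like this with PyMongo; the collection and field names are placeholders for the real schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["parents"]          # hypothetical database/collection names

# Each user appends a sub-document to the embedded array with $push.
coll.update_one(
    {"_id": "parent-1"},                  # the shared parent document
    {"$push": {"entries": {"user": "alice", "text": "hello", "ts": 1}}},
)
```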
Internally, WiredTiger uses an optimistic locking method. This means that when two threads are trying to update the same document, one of them will succeed, and the other will back off. This will manifest as a "write conflict" (see metrics.operation.writeConflicts).
This conflict will be retried transparently, so from a client's point of view, the write will just take longer than usual.
The back-off algorithm waits longer the more conflicts it encounters, starting at 1 ms and capped at 100 ms per wait. So after enough conflicts it will eventually wait 100 ms per retry.
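To make the shape of that back-off concrete, here is a tiny Python model of the delays; it only illustrates the behaviour described above and is not WiredTiger's actual implementation.

```python
import time

def backoff_delays(max_retries, base_ms=1, cap_ms=100):
    """Yield illustrative back-off delays: roughly doubling from 1 ms,
    capped at 100 ms per wait (a simplified model, not WiredTiger's code)."""
    delay = base_ms
    for _ in range(max_retries):
        yield min(delay, cap_ms)
        delay *= 2

for ms in backoff_delays(10):
    print(f"retrying after {ms} ms")
    time.sleep(ms / 1000.0)   # 1, 2, 4, ..., 64, 100, 100, 100 ms
```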
Having said that, the design of updating a single document from multiple sources will have trouble scaling in the future due to two reasons:
MongoDB has a 16 MB document size limit, so the array cannot keep growing indefinitely.
Locking issues, as you have readily identified.
For #2, in a pathological case, a write can encounter conflict after conflict, waiting 100 ms between retries. There is no limit on the number of retries, so it can potentially wait for minutes. This is a sign that the workload is bottlenecked on a single document, and the app essentially operates on a single-threaded model.
Typically the solution is to not create artificial bottlenecks, but to spread out the work across many different documents, perhaps in a separate collection. This way, concurrency can be maintained.
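A hedged sketch of that "separate collection" idea: instead of $push-ing into one array, each entry becomes its own document keyed by the parent's ID (names are again placeholders).

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
entries = client["mydb"]["entries"]       # hypothetical separate collection

# One document per entry: writers never contend on a single parent document,
# and the 16 MB document limit no longer bounds how many entries can exist.
entries.create_index([("parent_id", 1), ("ts", 1)])
entries.insert_one({"parent_id": "parent-1", "user": "bob", "text": "hi", "ts": 2})

# Reading the entries back for a given parent:
for e in entries.find({"parent_id": "parent-1"}).sort("ts", 1):
    print(e)
```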

MongoDB TTL doesn't delete documents if under load

Use case
I am using MongoDB to persist messages from a message queue system (e.g. RabbitMQ / Kafka). Each message has a timestamp, and based on that timestamp I want to expire the documents 1 hour later. Therefore I have a deleteAt field which is indexed with expireAfterSeconds: 0. Everything works fine, except when MongoDB is under heavy load.
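For context, that setup looks roughly like this with PyMongo; the collection name is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
messages = client["mydb"]["messages"]     # hypothetical collection name

# TTL index: a document becomes eligible for deletion as soon as `deleteAt` passes.
messages.create_index("deleteAt", expireAfterSeconds=0)

# Each message expires one hour after its own timestamp.
msg_ts = datetime.now(timezone.utc)
messages.insert_one({
    "payload": "...",
    "ts": msg_ts,
    "deleteAt": msg_ts + timedelta(hours=1),
})
```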
We are inserting roughly 5-7k messages per second into a single replica set. The TTL thread seems to be far slower than the rate at which messages come in, so storage grows quickly (which is exactly what we want to avoid with TTLs).
To describe the behaviour more precisely: when I sort the messages by deleteAt ascending (oldest date first), I can see that sometimes none of those messages are deleted for hours. Because of this observation I believe the TTL thread is sometimes stuck or not active at all.
My question
What could I do to ensure that the TTL thread is not negatively impacted by the rate of messages coming in? According to our metrics our only bottleneck seems to be CPU, even though the SSD disk I/O is pretty high too.
Do I need to tune something (e. g. give MongoDB more threads for document deletion) so that the TTL thread can keep up with the write rate?
I believe I am facing a known bug as described in MongoDB's Jira Dashboard: https://jira.mongodb.org/browse/SERVER-19334
From https://docs.mongodb.com/manual/core/index-ttl/:
The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a collection during the period between the expiration of the document and the running of the background task.
Because the duration of the removal operation depends on the workload of your mongod instance, expired data may exist for some time beyond the 60 second period between runs of the background task.
I'm not aware of any way to tune that TTL thread, and I suspect you'll need to run your own cron to do batched deletes.
The other thing to look at might be what's taking up CPU and IO and see if there's any way of reducing that load.
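A minimal sketch of the batched-delete cron idea, assuming the same deleteAt field; the batch size and sleep interval are arbitrary and would need tuning.

```python
import time
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
messages = client["mydb"]["messages"]          # hypothetical collection name
BATCH_SIZE = 1000                              # arbitrary; tune to your load

def delete_expired_batch():
    """Delete up to BATCH_SIZE expired documents and return how many were removed."""
    now = datetime.now(timezone.utc)
    ids = [d["_id"] for d in messages.find(
        {"deleteAt": {"$lt": now}}, {"_id": 1}).limit(BATCH_SIZE)]
    if not ids:
        return 0
    return messages.delete_many({"_id": {"$in": ids}}).deleted_count

if __name__ == "__main__":
    while True:                                # run from cron or as a small daemon
        if delete_expired_batch() < BATCH_SIZE:
            time.sleep(30)                     # idle a little when caught up
```

Deleting by _id in bounded batches spreads the delete load out instead of issuing one huge delete_many during peak traffic.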
You can create the index with "sparse"; this should perform the clean-up on a separate thread in the background.

How can I speed up MongoDB?

I have a crawler which consists of 6+ processes. Half of the processes are masters which crawl the web, and when they find jobs they put them into the jobs collection. Most of the time the masters save 100 jobs at once (I mean they get 100 jobs and save each one of them separately, as fast as possible).
The other half of the processes are slaves which constantly check whether new jobs of some type are available for them; if so, a slave marks the job in_process (this is done using findOneAndUpdate), then processes the job and saves the result in another collection.
Moreover, from time to time the master processes have to read a lot of data from the jobs collection to synchronize.
So, to sum up, there are a lot of read and write operations on the DB. When the DB was small it worked fine, but now that I have ~700k job records (a job document is small, with 8 fields and proper indexes / compound indexes) my DB struggles. I can observe this when displaying the "stats" page, which basically executes ~16 count operations with some conditions (on indexed fields).
When the master/slave processes are not running, the stats page displays after 2 seconds. When the masters/slaves are running, the same page takes around 1 minute to display, and sometimes it does not display at all (timeout).
So how can I make my DB handle more requests per second? Do I have to replicate it?
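For reference, the job-claiming step the slaves perform looks roughly like this with PyMongo; the field names are guesses at the real schema.

```python
from pymongo import MongoClient, ReturnDocument

client = MongoClient("mongodb://localhost:27017")
jobs = client["crawler"]["jobs"]               # hypothetical database/collection names

# A slave atomically claims the next available job of its type, so two slaves
# can never pick up the same job. A compound index on (type, status) keeps
# both this lookup and the stats-page counts on indexed fields.
jobs.create_index([("type", 1), ("status", 1)])
job = jobs.find_one_and_update(
    {"type": "images", "status": "new"},
    {"$set": {"status": "in_process"}},
    return_document=ReturnDocument.AFTER,
)
if job is not None:
    print("claimed", job["_id"])
```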

Memcache or Queue for Hits Logging

We have a busy website which needs to log 'hits' for certain pages or API endpoints that are visited, to help populate stats, popularity grids, etc. The hits we're logging aren't simple page hits, so we can't use log parsing.
In the past, we've just directly logged to the database with an update query, but under heavy concurrency, this creates a database load that we don't want.
We are currently using Memcache but experiencing some issues with the stats not being quite accurate due to non-atomic updates.
So my question:
Should we continue to use Memcache but improve atomic increments:
1) When a page is hit, create a memcache key such as "stats:pageid:3" and atomically increment it on each hit
2) Write a batch script to cycle through all the memcache keys and create a batch update to the database once every 10 mins (see the sketch after this list)
PROS: Fewer database hits, as we're only updating once per page per 10 mins (with however many hits accumulated in that 10 min period)
CONS: We can atomically increment the individual counters, but would still need a memcache key to store which pageids have had hits, to loop through and log. This won't be atomic, so when we flush the data to the DB and reset everything, things may linger in this key. We could lose up to 10 mins of data.
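A rough Python sketch of that Memcache approach using pymemcache; the key names, the flush hook, and the 10-minute cadence are assumptions, and the bookkeeping key is exactly the non-atomic gap called out in the cons above.

```python
from pymemcache.client.base import Client  # pip install pymemcache

mc = Client(("localhost", 11211))
KNOWN_PAGES_KEY = "stats:known_pageids"     # the non-atomic bookkeeping key

def record_hit(page_id):
    """Hit path: atomic increment of a per-page counter."""
    key = f"stats:pageid:{page_id}"
    mc.add(key, "0")                        # no-op if the counter already exists
    mc.incr(key, 1)
    mc.add(KNOWN_PAGES_KEY, "")             # tracking which pages were hit is NOT atomic
    mc.append(KNOWN_PAGES_KEY, f"{page_id},")

def flush_to_db(save_batch):
    """Batch path (run every ~10 minutes): read the counters, hand them to the
    database via `save_batch`, then reset. Hits arriving between the get and
    the delete are lost, matching the data-loss caveat above."""
    raw = mc.get(KNOWN_PAGES_KEY) or b""
    page_ids = {p for p in raw.decode().split(",") if p}
    counts = {}
    for page_id in page_ids:
        key = f"stats:pageid:{page_id}"
        counts[page_id] = int(mc.get(key) or 0)
        mc.delete(key)
    mc.delete(KNOWN_PAGES_KEY)
    save_batch(counts)
```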
OR Use a queue/task system:
1) When page is hit, add a job to the task queue
2) Task queue can then be rate limited and in the background process these 'hits' to the database.
PROS: Easy to code, we can scale up queue workers if required.
CONS: We're still hitting the database once per hit as each task would be processed individually, rather than 'summing' up all the hits.
Or any other suggestions?
OR: use something designed for recording stats at high traffic levels, such as StatsD & Graphite. The original StatsD is written in JavaScript on top of Node.js, which can be a little complex to set up (though there are easier ways to install it, e.g. with a Docker container), or you can use a work-alike (not using Node.js) that does the same job, such as one written in Go.
I've used the original StatsD and Graphite pair to great effect, plus it makes pretty graphs (this was for tens of millions of events per day).
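For comparison, recording a hit through StatsD from Python is essentially a one-liner with the `statsd` package; the server address, prefix, and metric name here are placeholders.

```python
import statsd  # pip install statsd

# Fire-and-forget UDP, so the hot path never blocks on the stats backend.
stats = statsd.StatsClient("localhost", 8125, prefix="mysite")

def record_hit(page_id):
    stats.incr(f"hits.page.{page_id}")   # aggregated by StatsD, graphed by Graphite
```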