The admin page shows a graph of DAU and other generated data (statistics-related figures). Simple figures, including the graphs, were implemented using only aggregation queries, without writing application code. But once the number of documents went over 50,000, these queries started to take a very long time (10 to 50 seconds).
The administration pages usually query data by date (createdAt), and $group and $addFields are used frequently.
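A minimal sketch of the kind of pipeline involved (the collection and field names here are placeholders, not my actual schema):

// Hypothetical daily-active-users aggregation over a "logins" collection.
db.logins.aggregate([
  { $match: { createdAt: { $gte: ISODate("2021-01-01"), $lt: ISODate("2021-02-01") } } },
  { $group: {
      // bucket by calendar day
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
      users: { $addToSet: "$userId" }        // distinct users per day
  } },
  { $project: { _id: 0, date: "$_id", dau: { $size: "$users" } } },
  { $sort: { date: 1 } }
])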
In my collection, writes occur far more frequently than reads: three new documents are generated every second, and the volume will grow quickly in the future. Because of this, I hesitate to create an index on createdAt.
When writes are this frequent, do I have to split the work into several smaller requests to MongoDB and combine the results myself? Or should I add indexes and optimize the queries so the aggregations run efficiently?
I am using Google Cloud Functions, and read and write operations are performed on Firestore through these Cloud Functions. We are seeing an unexpectedly high number of read operations on Firestore, and I am unable to figure out their source.
No more than 20K documents are generated on a daily basis, but the daily read count is usually more than 25,000,000.
What I am looking for are ways to identify the root cause of this high number of reads in the Cloud Functions.
To start with, I have captured the result sizes of all the Firestore get() calls in the Cloud Functions, but the sum total of those sizes is far lower than the read count mentioned above.
I need suggestions on ways/practices to identify where these high read counts are coming from.
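For what it's worth, this is roughly how I am capturing those result sizes (loggedGet is a made-up wrapper name, not an SDK function):

// admin.initializeApp() is assumed to have been called elsewhere.
const admin = require('firebase-admin');

// Wraps a collection/query get() and logs how many documents it returned.
async function loggedGet(query, label) {
  const snapshot = await query.get();
  console.log(`${label}: ${snapshot.size} documents returned`);
  return snapshot;
}

// Example usage inside a Cloud Function:
// const snap = await loggedGet(
//   admin.firestore().collection('orders').where('status', '==', 'open'),
//   'openOrders'
// );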
You can use a snapshot listener as a workaround, which allows you to listen for changes in real time.
If the listener is disconnected for more than 30 minutes, you will be charged for reads as if you had issued a brand-new query. In the worst-case scenario, with the listener disconnecting every 31 minutes, a 50-document query would be charged 50 reads each time.
As a result, this technique is only practical when the listener is not disconnected frequently.
According to the documentation, you can also reduce the number of reads when using get() by fetching only the documents that have changed.
To do that, add a new property named lastModified of type Date to every document in the collection. Whenever you create a new document or edit an existing one, use FieldValue.serverTimestamp() to set or update the field's value.
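A rough sketch of that pattern with the Node Admin SDK (the collection name and the lastSync bookkeeping are assumptions):

const admin = require('firebase-admin');
// admin.initializeApp() is assumed to have been called elsewhere.
const db = admin.firestore();

// Write path: stamp every create/update with a server-side timestamp.
function saveItem(itemId, data) {
  return db.collection('items').doc(itemId).set(
    { ...data, lastModified: admin.firestore.FieldValue.serverTimestamp() },
    { merge: true }
  );
}

// Read path: fetch only documents changed since the last sync point,
// instead of re-reading the whole collection on every invocation.
function fetchChangedSince(lastSyncTimestamp) {
  return db.collection('items')
    .where('lastModified', '>', lastSyncTimestamp)
    .get();
}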
Reviewing the Firebase documentation, I found that high read or write rates to lexicographically close documents should be avoided, otherwise the application will run into contention errors. This problem is called hotspotting, and it can occur if your application does any of the following:
Creates new documents at a very high rate and allocates its own monotonically increasing IDs. (Cloud Firestore allocates document IDs using a scatter algorithm, so you should not see hotspotting on writes if you create new documents using automatic document IDs, as sketched below.)
Creates new documents at a high rate in a collection with few documents.
Creates new documents at a high rate with a monotonically increasing field, such as a timestamp.
Deletes a large number of documents from a collection.
Writes to the database at a very high rate without gradually increasing traffic.
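As an illustration of the first point, letting Firestore pick the document ID avoids the monotonically increasing key (the collection name and payload are placeholders):

const admin = require('firebase-admin');
// admin.initializeApp() is assumed to have been called elsewhere.
const db = admin.firestore();
const payload = { domain: 'example.com', count: 1 };   // placeholder data

// Risky under heavy write load: IDs increase monotonically (timestamp-based).
// db.collection('events').doc(String(Date.now())).set(payload);

// Preferred: add() lets Firestore assign a scattered, auto-generated ID.
db.collection('events').add(payload);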
I was using MongoDB to store some time-series data at 1 Hz. To facilitate this, my central document represented an hour of data per device per user, with 3600 values pre-allocated at document creation time. Two drawbacks:
Every insert is an update: I need to find the correct record (by user, device, day, and hour), write the latest IoT reading into the pre-allocated list, and update the record.
Paged queries require complex custom code: I need to query for a count of all the data matching my search range, then manually build each page of data to be returned.
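For context, the per-hour document and the update it requires look roughly like this (field names are illustrative, not my exact schema):

// One document per user, per device, per hour, with 3600 slots pre-allocated.
// Storing a reading means locating that document and setting one slot.
db.readings.updateOne(
  { userId: "u1", deviceId: "d1", day: "2021-07-01", hour: 13 },
  { $set: { "values.1800": 42.7 } }    // the reading for second 1800 of the hour
)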
I was hoping the native time-series collections introduced in MongoDB 5.0 would give me some performance improvements, and they did, but only on ingestion rate. The comparison I used was to push 108,000 records into the database as quickly as possible and average the response time, then perform paged queries and record the range of response times for those. This is what I observed:
Mongo custom-code solution: 30 millisecond inserts, 10-20 millisecond paged queries.
Mongo native time-series: 138 microsecond inserts, 50-90 millisecond paged queries.
The difference in insert rate was expected (see drawback #1 above), but for paged queries I didn't expect my custom time-series kluge to be significantly faster than a native implementation. I don't really see a discussion of the expected advantages in the MongoDB 5.0 documentation. Should I expect to see improvements in this regard?
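For reference, a native time-series collection in 5.0 is created along these lines (the field names here are placeholders, not necessarily the options I used):

db.createCollection("readings_ts", {
  timeseries: {
    timeField: "timestamp",    // required: the field holding each reading's time
    metaField: "metadata",     // optional: per-series metadata such as user/device
    granularity: "seconds"
  }
})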
I have a collection in Mongo that stores every user action in my application, and it is very large (about 3 million documents per day). In the UI I have a requirement to show user actions for a period of at most 6 months.
The queries on this collection are becoming very slow with all the historic data, even though indexes are in place. So I want to move the documents that are older than 6 months to a separate collection.
Is this the right way to handle my issue?
The following are some of the techniques you can use to manage data growth in MongoDB:
Using capped collections
Using TTL indexes (see the sketch after this list)
Using multiple collections, for example one per month
Using different databases on the same host
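As a sketch of the TTL option, an index like the following would expire old documents automatically (the collection and field names are assumptions):

// Documents whose createdAt is older than ~180 days are removed automatically
// by MongoDB's background TTL monitor.
db.userActions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 15552000 }    // 180 days
)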
We are using MongoDB (3.0.6) for our application. We have around 150 entries in one of the collections, but it takes approximately 500 ms to fetch all of these records, which I didn't expect. Mongo is deployed on a company server. What can be done to reduce this time?
We are not getting many reads and there is no CPU load, etc. What mistake might be causing this, or what configuration should be changed to fix it?
Here is my schema: http://pastebin.com/x18iDPKf
I am just querying all the entries, which are 160 in number. I don't think the time taken is due to Mongoose or Node.js, because when I query using Robomongo it still takes the same time.
Output of db.<collection>.stats().size :
223000
The query I am doing is:
db.getCollection('collectionName').find({})
This most likely isn't a problem with MongoDB itself. It may be a temporary issue or be due to system constraints such as available memory.
If the problem persists, add indexes on the fields you are actually filtering on.
https://docs.mongodb.com/manual/indexes/
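A quick way to confirm where the time is going is to look at the execution stats; the commands below are a sketch, and createdAt is just a hypothetical filter field:

// Shows whether the 500 ms is spent in the query itself (docs examined, time in ms).
db.getCollection('collectionName').find({}).explain("executionStats")

// find({}) returns every document, so an index cannot reduce that scan.
// An index only helps once you filter or sort on a field, e.g. a hypothetical createdAt:
db.getCollection('collectionName').createIndex({ createdAt: 1 })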
I'm building a very large counter system. To be clear, the system is counting the number of times a domain occurs in a stream of data (that's about 50 - 100 million elements in size).
The system will individually process each element and make a database request to increment a counter for that domain and the date it is processed on. Here's the structure:
stats_table (or collection)
-----------
id
domain (string)
date (date, YYYY-MM-DD)
count (integer)
My initial inkling was to use MongoDB because of its atomic counter feature. However, as I thought about it more, I figured Postgres updates already occur atomically (at least that's what this question leads me to believe).
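The increment I have in mind on the Mongo side would be roughly this (a sketch using the structure above):

// Upsert one counter document per (domain, date) and atomically increment it.
db.stats_table.updateOne(
  { domain: "example.com", date: "2021-07-01" },
  { $inc: { count: 1 } },
  { upsert: true }
)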
My question is this: is there any benefit of using one database over the other here? Assuming that I'll be processing around 5 million domains a day, what are the key things I need to be considering here?
All single operations in Postgres are automatically wrapped in transactions, and all operations on a single document in MongoDB are atomic. Atomicity isn't really a reason to prefer one database over the other in this case.
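For completeness, the equivalent atomic upsert on the Postgres side, sketched with the node-postgres client (table and column names are taken from the question, and a unique constraint on (domain, date) is assumed):

const { Pool } = require('pg');
const pool = new Pool();   // connection settings come from environment variables

// One statement: insert the counter row if it is new, otherwise bump it.
// The statement runs atomically, so concurrent workers don't lose increments.
function incrementCounter(domain, date) {
  return pool.query(
    `INSERT INTO stats_table (domain, date, count)
     VALUES ($1, $2, 1)
     ON CONFLICT (domain, date)
     DO UPDATE SET count = stats_table.count + 1`,
    [domain, date]
  );
}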
While the individual counts may get quite high, if you're only storing aggregate counts and not each instance of a count, the total number of records should not be too significant. Even if you're tracking millions of domains, either Mongo or Postgres will work equally well.
MongoDB is a good solution for logging events, but I find Postgres to be preferable if you want to do a lot of interesting, relational analysis on the analytics data you're collecting. To do so efficiently in Mongo often requires a high degree of denormalization, so I'd think more about how you plan to use the data in the future.