Why is Postgres slow updating a simple JSONB field? - postgresql

I am running PostgreSQL 12 and I have a pretty small users table (~5000 records).
I am logging slow queries, and I found that updating JSONB fields is pretty slow. Here is an example:
update "users" set "artifacts" = '[{"xx": "xxx", "xxx": "xxx"}]' where "id" = 1000;
It is a pretty simple query on an indexed column, but on my production node this query shows up in the slow query log (~100 ms).
I ran EXPLAIN ANALYZE on it but can't get anything useful out of it, at least to my knowledge :)
https://explain.depesz.com/s/2DGg
If I run an UPDATE query on the same table, but on a non-JSONB field, the query is super fast.
Any hint?

The slow query log only shows you the times it was slow. How many times did it run when it wasn't slow? You can use pg_stat_statements to help find that out. (You could also log the duration of every query to avoid selection bias, but that might cause excessive log bloat).
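For example, something like this (a sketch; it assumes the pg_stat_statements extension is installed, and uses the PostgreSQL 12 column names):

-- How often does this UPDATE run, and what are its typical vs. worst-case timings?
-- pg_stat_statements normalizes constants, so match on the query prefix.
SELECT calls, mean_time, max_time, stddev_time
FROM pg_stat_statements
WHERE query LIKE 'update "users" set "artifacts"%';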
If I run an UPDATE query on the same table, but on a non-JSONB field the query is super fast.
And when you ran this query on staging, it was also super fast. Maybe it is only slow when your server is severely overloaded. Is the column indexed? Maybe the update had to stop and clean up the GIN fastupdate pending list.
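If it is a GIN index with fastupdate enabled, there are two things you could try (a sketch; users_artifacts_idx is a hypothetical index name):

-- Flush the GIN pending list on your own schedule instead of mid-UPDATE (PostgreSQL 9.6+):
SELECT gin_clean_pending_list('users_artifacts_idx');
-- Or disable the buffering altogether, trading slightly slower average writes for predictable latency:
ALTER INDEX users_artifacts_idx SET (fastupdate = off);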

Related

Insert bursts on a collection with indexes

I have a very big collection with almost every field indexed (I know, it sounds like I need to redesign my system or my queries, but let's assume it stays as it is). So: a lot of data and a lot of indexes. The writes happen in big bursts (millions of inserts at once, using insertMany). The first burst is fine, because I write everything first and then build the indexes, which works very fast. The next burst is much slower than the first, and every one after that is slower still. I know that if I wrote all the data first and then indexed, it would be fast; but done in bursts it gets 100-1000 times slower. Of course it is something to do with the indexes here; looking at the mongod logs, I see a lot of index file checkpoints taking quite some time. I tried to find an index-building job I could kill (MongoDB 4.2, so background index builds should kick in), but did not find one. Should there be one?
The only solution I found is to drop the indexes before the burst and rebuild them afterwards, but that sounds like a very pessimistic approach. Any suggestions on how to delay or suppress indexing temporarily? Is there any parameter I can adjust in the DB or in the insert queries to ease the insertion?
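What I do today is roughly the following (bigcoll and docs are placeholders for my collection and batch):

// Drop all secondary indexes; the mandatory _id index stays:
db.bigcoll.dropIndexes();
// Load the burst without per-document index maintenance:
db.bigcoll.insertMany(docs, { ordered: false });
// Rebuild each needed index afterwards:
db.bigcoll.createIndex({ fieldA: 1 });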

Is it possible to run queries on 200 GB of data in MongoDB with 16 GB of RAM?

I am trying to run a simple query to count all records with a particular value, using:
db.ColName.find({id_c:1201}).count()
I have 200 GB of data. When I run this query, MongoDB takes up all the RAM and my system starts lagging. After an hour of futile waiting, I gave up without getting any results.
What can the issue be here, and how can I solve it?
I believe the right approach in the NoSQL world isn't to perform a full query like that, but to accumulate stats over time.
For example, you could have a stats collection of arbitrary objects, each with a kind or id property that can take a value like "totalUserCount". Whenever you add a user, you also update this count.
This way you get instant results: it's just reading a property value from a small stats collection.
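In the shell, that pattern might look like this (a sketch; the stats collection and totalUserCount key are just the names from above):

// Atomically bump the pre-aggregated counter on every insert:
db.stats.updateOne(
    { _id: "totalUserCount" },
    { $inc: { value: 1 } },
    { upsert: true }
);
// Reading the total is then a single lookup by _id:
db.stats.findOne({ _id: "totalUserCount" });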
BTW, the slowness is most likely caused by querying objects on a non-indexed property of your collection. Index id_c and you will probably get much quicker results.
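Something like this (note that the index build itself can take a long time on 200 GB):

// Build an ascending index on id_c so the count can be answered from the index:
db.ColName.createIndex({ id_c: 1 });
// The original query then no longer has to scan every document:
db.ColName.find({ id_c: 1201 }).count();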
That amount of data can easily be managed by MySQL, MSSQL, or Oracle with the given hardware specification. You don't need a NoSQL database for that; NoSQL databases are made for much larger storage needs, which actually require lots of hardware (RAM, hard disks) to be efficient.
Define an index to read that id, and use a normal SQL database.

Mongo shell query timing out at 90 seconds

I am using MongoDB via the mongo shell to query a large collection. For some reason, after 90 seconds the mongo shell seems to stop my query and nothing is returned.
I have tried the following two commands, but neither returns anything. After 90 seconds it just gives me a new prompt where I can type another command.
db.cards.find("Field":"Something").maxTimeMS(9999999)
db.cards.find("Field":"Something").addOption(DBQuery.Option.tailable)
db.cards.find() returns results, but anything with parameters times out at exactly 90 seconds with nothing returned.
Any help would be greatly appreciated.
Given the level of detail in your question, I am going to focus on 'query a large collection' and guess that you are using the MMAPv1 storage engine with no index coverage on your query.
Are you disk bound?
Given the above assumptions, you could be cycling data between RAM and disk. Mongo has a default 100 MB RAM limit, so if your query has to examine a lot of documents (no index coverage), paging data from disk to RAM could be the culprit. I have heard of the mongo shell acting as you describe, or locking up/terminating, when memory constraints are exceeded.
32-bit systems can also impose severe memory limits for large collections.
You could look at your OS-specific disk activity monitor to get a clue as to whether this is your problem.
Just how large is your collection?
Next, how big is your collection? You can run show collections to see the physical size of the collection, and db.cards.count() to see your record count. This helps quantify "large collection".
NOTE: you might need the mongo-hacker extensions to see collection disk use in show collections.
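If you don't want to install mongo-hacker, db.cards.stats() reports roughly the same information:

// Document count plus data and storage sizes, in bytes by default:
db.cards.stats();
// Pass a scale factor to get the sizes in megabytes instead:
db.cards.stats(1024 * 1024);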
Mongo shell investigation
Within the mongo shell, you have a couple more places to look.
By default, mongo will log slow queries (> 100 ms). After your 90-second timeout, run:
db.adminCommand({getLog: "global" })
and look for slow query log entries.
Next, look at your winning query plan:
var e = db.cards.explain()
e.find({ "Field": "Something" })
I am guessing you will see
"stage": "COLLSCAN",
which means you are doing a full collection scan, and you need index coverage for your query (a good idea for queries and sorts).
Suggestions
You should have at least partial index coverage on any production query. A proper index should solve your problem (assuming you don't have documents > 16MB).
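A sketch, assuming your filter really is on a field named Field:

// Create an ascending index covering the filter:
db.cards.createIndex({ "Field": 1 });
// Re-run the explain; the winning plan should now show "IXSCAN" instead of "COLLSCAN":
db.cards.explain().find({ "Field": "Something" });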
Another approach (that I don't recommend; indexing is better) is to iterate with a cursor instead:
var cursor = db.cards.find({ "Field": "Something" });
while (cursor.hasNext()) {
    print(tojson(cursor.next()));
}
Depending on the root cause, this may work for you.

Do I need to reindex a MongoDB collection periodically, like an RDBMS?

I would like to get some knowledge about reindexing in MongoDB. Please forgive me, as I am asking a somewhat subjective question.
The question is: does MongoDB need to be reindexed periodically, as we do for an RDBMS, or does Mongo manage it automatically?
Thanks for your feedback.
MongoDB takes care of indexes during routine updates. This operation may be expensive for collections that have a large amount of data and/or a large number of indexes. For most users, the reIndex command is unnecessary. However, it may be worth running if the collection size has changed significantly or if the indexes are consuming a disproportionate amount of disk space.
Call reIndex using the following form:
db.collection.reIndex();
Reference: https://docs.mongodb.com/manual/reference/method/db.collection.reIndex/
That's a good question, because nowhere in the documentation does it mention explicitly that indexes are automatically maintained*. But, they are. You rarely need to reindex manually.
*I filed a bug for that, go vote for it :)

Is MongoDB search without an index really slow?

I am testing the performance of MongoDB to compare it with my current MySQL-based solution.
In a collection/table X with three attributes A, B, and C, I have attribute A indexed in both MongoDB and MySQL.
Now I throw 1M records at MongoDB and MySQL, and try the search performance in this straightforward scenario.
The insert speed on MongoDB is only 10% faster than inserting into MySQL. That is OK; I knew adopting MongoDB wouldn't magically improve my CRUD performance. But I am really surprised by search in MongoDB without an index.
The results show that a MongoDB select on a non-indexed field is ten times slower than a select on an indexed field.
On the other hand, a MySQL select (MyISAM) on a non-indexed field is only about 70% slower than a select on an indexed field.
Last but not least, in the select-with-index scenario, MongoDB is about 30% quicker than my MySQL solution.
I want to know: are the above figures normal? Especially the performance of a MongoDB select without an index?
My code looks like this:
// Query for documents whose field A equals the given value:
BasicDBObject query = new BasicDBObject("A", value_of_field_A);
DBCursor cursor = currentCollection.find(query);
while (cursor.hasNext()) {
    DBObject obj = cursor.next();
    // do nothing after that, only for testing purposes
}
BTW, from a business-logic perspective, my collection could be really large (TB and more). What would you suggest for the size of each physical collection? 10 million documents, or 1 billion documents?
Thanks a lot!
------------------------------ Edit ------------------------------
I tried the insert with 10 million records on both MongoDB and MySQL, and MongoDB is about 20% faster than MySQL; not as much as I thought.
I am curious: if I set up MongoDB auto-sharding, will the insert speed improve? If so, do I need to put the shards on different physical machines, or can I put them on the same machine with multiple cores?
------------------------------ Update ------------------------------
First, I changed the MongoDB write concern from ACKNOWLEDGED to UNACKNOWLEDGED, and the MongoDB insert speed became 3x faster.
Later on, I made the insert program parallel (8 threads on an 8-core computer). In ACKNOWLEDGED mode, MongoDB inserts also improved 3x; in UNACKNOWLEDGED mode, the speed actually became 50% slower.
For MySQL, parallel inserts increased the speed 5x, which is faster than the best insert case from MongoDB!
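For reference, UNACKNOWLEDGED corresponds to write concern w: 0; the per-insert equivalent in the shell would look something like this (doc is a placeholder):

// Fire-and-forget insert: the client does not wait for the server to acknowledge the write:
db.X.insert(doc, { writeConcern: { w: 0 } });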
MongoDB queries without an index will do a full collection scan, and keep in mind that MongoDB's data size is much larger than MySQL's for the same records. I am guessing this might be one of the reasons for the slowness when doing a full scan.
Regarding queries with indexes, MongoDB may turn out faster because of caching, the absence of a complex query-optimizer plan (like MySQL's), etc.
The size of the collection is not an issue; in fact, 10 million documents can easily be handled in one collection. If you have a requirement to archive data, you can break it into smaller collections, which will make that process easy.