MongoDB upsert operation blocks inconsistently (with syncdelay set to 0)

There is a dataset of 9 million rows covering 3 million distinct entities. It is loaded into MongoDB every day using the Perl driver. The first load runs smoothly, but from the second load onward the process slows down considerably and blocks for long stretches every now and then.
I initially suspected the automatic flush to disk every 60 seconds, so I tried setting syncdelay to 0 and running with the nojournal option. I have indexed the fields used for the upsert. I have also observed that the blocking is inconsistent: it does not always happen at the same point or on the same input line.
I have 17 GB of RAM and enough hard disk space. I am replicating across two servers with one arbiter, and no other significant processes are running in the background. Is there an explanation or a solution for this blocking?
UPDATE: mongostat reports around 3.6 GB in the 'res' (resident memory) column.
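For reference, the pieces involved look roughly like this with a reasonably recent MongoDB Perl driver. This is a sketch, not the poster's code: the database, collection and field names (mydb.entities, entity_id) are placeholders, and older driver releases expose the same operations as update/ensure_index rather than update_one/create_one.

use strict;
use warnings;
use MongoDB;

# Connect and pick the collection (names are placeholders).
my $client = MongoDB->connect('mongodb://localhost:27017');
my $coll   = $client->ns('mydb.entities');

# Index the field(s) used as the upsert filter; without this every
# upsert has to scan the collection to find its match.
$coll->indexes->create_one( [ entity_id => 1 ] );

# syncdelay can also be changed at runtime via setParameter instead
# of at mongod startup (the value here is just an example).
$client->get_database('admin')
       ->run_command( [ setParameter => 1, syncdelay => 60 ] );

# The upsert itself: match on the indexed field, update or insert.
$coll->update_one(
    { entity_id => 42 },
    { '$set' => { name => 'example', loaded_at => time() } },
    { upsert => 1 },
);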

Related

What can cause random slowdowns for a JPA native insert query?

I have a NativeQuery which copies about 1000 rows (selects and inserts with some columns changed).
The problem is that sometimes (about 1/3 of the runs) executeUpdate takes 50 ms and sometimes (2/3 of the runs) it takes 1500 ms.
The same query is run many times during a request, and usually those runs are all slow or all fast.
All the requests were started from the same database state (i.e. the exact same records are selected and inserted), and neither the database nor Tomcat had other users at the time. Sometimes it is slow up to 5 times in a row, sometimes fast up to 5 times in a row, and sometimes it alternates slow, fast, slow, fast.
I've tried restarting both Tomcat and Postgres. Sometimes it is slow after the restart, sometimes it is fast. I tried adding System.gc() to the beginning of the request but the randomness remained.
When I run the same query directly (via DBeaver/JDBC), it is always fast.
My environment:
Tomcat 8.5.33
Eclipselink 2.6.4
PostgreSQL 9.6.11
Ubuntu 18.04
Any ideas how to debug this situation?

MongoDB poor performance a few hours after data deletion

I've got 3 MongoDB (v3.4.10) servers (256 GB RAM, 1 TB HDD, 12 CPUs each) in a replica set. The servers are under decent load and disk space is being eaten up quite rapidly. I'm considering sharding the big collections, but I'm not there yet.
In the meantime, the typical scenario I face:
In the morning I see an alert that the database disk is 92% full.
At midday I delete a bunch of redundant data from the big collections (1M-4M entries) on the primary. I either update the collection like this:
update({}, {'$unset' : {'key_1' : true, 'key_2' : true, 'key_3' : true}}, {"multi" : 1})
or create a new collection, insert only the data I need into it, and drop the old one (see the sketch below).
In the evening (about 4-5 hours after the deletion, usually at peak load) Mongo response time increases dramatically, from 3-4 ms to 500 ms. This lasts for a while, and during that period my application is almost down. Performance only returns to normal after I stop the application completely for 10-20 minutes and then start it again.
On days when I do not delete data, the database performs normally.
I have read a bit about the oplog and the nuances of deleting data on replicated servers. In my case, however, the lag between the deletion and the performance drop is several hours.
Is there an internal Mongo process that kicks in hours after a massive update/insert? How should I bulk update/insert to avoid this?
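Since the question does not say which driver is in use, here is a rough sketch of the copy-and-drop approach mentioned above, written against the MongoDB Perl driver to match the rest of this page; the database and collection names and the $match filter are placeholders.

use strict;
use warnings;
use MongoDB;

my $client = MongoDB->connect('mongodb://localhost:27017');
my $db     = $client->get_database('mydb');    # placeholder name

# Copy only the documents worth keeping into a fresh collection
# server-side; $out overwrites 'big_collection_new' if it exists.
$db->get_collection('big_collection')->aggregate(
    [
        { '$match' => { keep => 1 } },          # placeholder filter
        { '$out'   => 'big_collection_new' },
    ],
    { allowDiskUse => 1 },
);

# Drop the bloated original once the copy is in place.
$db->get_collection('big_collection')->drop;

Afterwards the application either reads from the new name, or the new collection is renamed over the old one with the renameCollection admin command.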

Redshift cluster: queries hanging and filling up storage

I have a Redshift cluster with 3 nodes. Every now and then, with users running queries against it, we end up in an unpleasant situation where some queries run far longer than expected (even simple ones exceed 15 minutes), and cluster storage usage climbs until, unless you terminate the long-running queries, it reaches 100%.
I wonder why this happens. My experience varies: sometimes a single query is responsible, and sometimes it is several concurrent queries running at the same time.
One specific scenario where we saw this happen involved LISTAGG. The result type of LISTAGG is varchar(65535), and while Redshift optimizes away the implicit trailing blanks when the value is stored on disk, the full width is required in memory during query processing.
If a query returns a million rows, that is 1,000,000 rows times 65,535 bytes per LISTAGG value, or roughly 65 GB. That can quickly get you into a situation like the one you describe, with queries taking unexpectedly long or failing with "Disk Full" errors.
My team discussed this a bit more on our team blog the other day.
This typically happens when a poorly constructed query spills too much data to disk, for instance when the user accidentally specifies a Cartesian product (every row of tblA joined to every row of tblB).
If this happens regularly, you can implement a QMR (query monitoring rule) that limits how much a query may spill to disk before it is aborted.
QMR Documentation: https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-query-monitoring-rules.html
QMR Rule Candidates query: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_qmr_rule_candidates.sql

MongoDB is giving inconsistent write times

I am using Scala, ReactiveMongo 0.10.5 and Mongo 2.6.4 running on Ubuntu. I have tested a few machine configurations, but right now I am working with 15 GB of memory, 2 cores and 60 GB of SSD storage (AWS).
I have just set up a test Mongo instance and have been using it to benchmark a few things; however, I am seeing some inconsistency that I can't explain.
I am writing a steady stream of data from 10 separate threads to a single collection. Each write is a document containing an array of 1000 elements, where each element is a complex sub-document with several fields and nested fields. I have tested with array sizes of 100, 1000 and 10000 and see the same behavior in every case. Each write is unique (i.e. I never write to the same document twice).
Write times tend to be around 100-200 ms per write on the current hardware. I would like that to be better, but it isn't my main issue.
My main issue is that the write times sometimes spike. When they do, a single write can take several seconds to complete. The writes do finish eventually, but it takes a while. The app doing the writing has a 10-second timeout built in, and when the spikes happen it frequently hits that timeout. I have increased the timeout and verified that the write does eventually complete, but it can take a long time (30+ seconds).
I have worked with Mongo before using the Mongo Java driver from Scala and have not noticed this problem, so it is unclear whether the issue lies in the driver or in my Mongo setup.
I have looked at the logs; they report when a query takes longer than expected, but not why. The profiler does the same: it records a slow query but gives no reason for the slowness.
I have run mongostat during the test, and when the writes start taking a long time I see a corresponding stall in mongostat itself, i.e. mongostat pauses for several seconds before continuing.
The Mongo machine itself is idle while this is happening: load averages, CPU and memory usage are all minimal, and it does not appear to be swapping.
I suspect I simply have something configured incorrectly in Mongo, but I haven't found anything that indicates what.
Has anyone seen this behavior before? Is it something in my configuration or perhaps something with the Reactive Mongo driver?
UPDATE:
Using iostat I determined that normal write throughput is around 1 MB/s, but during the slow periods it spikes to 6-7 MB/s.
I also found the following in the mongo logs.
[DataFileSync] flushing mmaps took 15621ms for 35 files
[DataFileSync] flushing mmaps took 14816ms for 22 files
In at least one case this log statement corresponds exactly to one of the slowdowns.
This definitely seems to be a disk flush problem based on these observations.
Does this imply that I am pushing more data than the current Mongo configuration can handle? Or is there some other configuration change that could reduce the impact of those flushes?
It appears that in this case the problem may actually have been related to thread locking within the application itself. Once I resolved the issues with thread locking these other issues seemed to go away.
To be honest I don't know why thread locking would result in the observed behavior in Mongo, but if the problem is gone I am not going to complain.
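Even though this particular case turned out to be thread locking in the application, the "flushing mmaps" stalls noted in the update can be watched directly by polling serverStatus from any driver. A minimal sketch, kept in Perl to match the rest of this page and assuming an MMAPv1-era mongod that still reports the backgroundFlushing section:

use strict;
use warnings;
use MongoDB;

my $admin = MongoDB->connect('mongodb://localhost:27017')->get_database('admin');

# A jump in last_ms here lines up with the 'flushing mmaps took ...'
# lines in the mongod log.
while (1) {
    my $status = $admin->run_command( [ serverStatus => 1 ] );
    my $flush  = $status->{backgroundFlushing};
    printf "flushes=%d last_ms=%d average_ms=%.1f\n",
        $flush->{flushes}, $flush->{last_ms}, $flush->{average_ms};
    sleep 5;
}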

TokyoCabinet Write speed too slow

I have a Perl script (on Ubuntu 12.04 LTS) writing to 26 TCH files. The keys are roughly evenly distributed. The writes become very slow after 3 million inserts (spread evenly across all the files): throughput drops from 240,000 inserts/min at the beginning to 14,000 inserts/min. Individually the shard files are no more than 150 MB, and together they come to around 2.7 GB.
I run optimize on each TCH file after every 100K inserts to that file, with bnum set to 4x the record count at that point and options set to TLARGE, and I make sure xmsiz matches the size implied by bnum (as mentioned in "Why does tokyo tyrant slow down exponentially even after adjusting bnum?").
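Roughly, the tuning described above looks like the following with the TokyoCabinet Perl binding; this is a sketch of the approach rather than the actual script, and the file name, record counts and xmsiz value are placeholders.

use strict;
use warnings;
use TokyoCabinet;

my $path     = 'shard_00.tch';        # placeholder shard file
my $expected = 500_000;               # placeholder expected record count
my $bnum     = $expected * 4;         # ~4x the number of records
my $xmsiz    = 256 * 1024 * 1024;     # placeholder: sized to cover the bucket array

my $hdb = TokyoCabinet::HDB->new;

# Tuning parameters must be set before the file is opened.
$hdb->tune($bnum, -1, -1, $hdb->TLARGE);
$hdb->setxmsiz($xmsiz);

$hdb->open($path, $hdb->OWRITER | $hdb->OCREAT)
    or die 'open error: ' . $hdb->errmsg($hdb->ecode);

# Insert loop (keys and values are placeholders).
for my $i (1 .. 100_000) {
    $hdb->put("key_$i", "value_$i")
        or warn 'put error: ' . $hdb->errmsg($hdb->ecode);
}

# Periodic re-optimize with a larger bucket count, as described above.
$hdb->optimize($hdb->rnum * 4, -1, -1, $hdb->TLARGE);

$hdb->close;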
Even after this, the inserts start at high speed and then slowly decrease from 240k inserts/min to 14k inserts/min. Could it be due to holding 26 TCH handles open in a single script? Or is there a configuration setting I'm missing? (Would disabling journaling help? The thread above says journaling only affects performance once a TCH file grows beyond 3-4 GB, and my shards are under 150 MB each.)
I would turn off journaling and measure what changes.
The cited thread talks about a 2-3 GB TCH file, but if you sum the sizes of your 26 TCH files you are in the same league. From the filesystem's point of view, the total amount of data being written should be the relevant parameter.