Postgres Logical Replication - Limiting Bandwidth Per Publisher/Subscriber - postgresql

I have one publisher with around 50 subscribers. Occasionally (a few times a month) a binary file of about 30MB is written to the database. When that happens, all subscribers receive the file at the same time and I run into network bandwidth issues.
Is it possible to limit (in Postgres or at the OS level) the bandwidth used by logical replication per publisher/subscriber?
Is it possible to limit the bandwidth used during the first sync?

At the PostgreSQL level, I suggest trying to reduce the max_wal_senders parameter on the sending server (it is 10 by default).
Depending on the latency you can accept, you can limit the number of concurrent sending processes, down to 1 process at a time.
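
To get a feel for the size of the burst described in the question, here is a rough back-of-the-envelope sketch. The file size and subscriber count come from the question; the 100 Mbit/s uplink is a hypothetical value, and the calculation assumes decimal megabytes and ignores protocol and WAL decoding overhead.

```python
def burst_estimate(file_mb, subscribers, link_mbit_s):
    """Return (total_mb, seconds) for sending one file to every subscriber.

    link_mbit_s is an assumed uplink speed, not a value from the question.
    """
    total_mb = file_mb * subscribers          # every subscriber gets its own copy
    seconds = total_mb * 8 / link_mbit_s      # megabytes -> megabits, then divide by rate
    return total_mb, seconds


total_mb, seconds = burst_estimate(file_mb=30, subscribers=50, link_mbit_s=100)
print(f"{total_mb} MB in total, about {seconds:.0f} s on a saturated link")
```

The total volume is the same no matter how many senders run at once; limiting concurrency only spreads it over a longer period.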

Related

mongodb max number of parallel find() requests from single instance

What is the maximum theoretical number of parallel requests that we can squeeze out of a single mongodb instance before deciding to shard?
Assume the database and indexes fit in memory and all requests are find() queries fetching a single document based on an indexed field. The hosting OS is Ubuntu, the data partition is on SSD, and ulimits are set to max.
On my laptop, with a simple test on a single instance, I reach nearly 40k/sec; after that the average execution times start to increase significantly. What could the theoretical upper limit be?
It depends. If your active dataset can fit in the memory - if most of the requests don't need to perform any disk I/O - then you can achieve 24k+ requests pretty easily. If not on a (bigger) single machine, then at least use a replica set cluster with multiple secondaries.
If the active dataset is much larger than the available RAM, then you have the same problem as with any other database. The advantage of MongoDB's new engine WiredTiger (since v3.0) is transparent compression: it can reduce the amount of data and I/O and thus improve performance, even though compression adds CPU load.
For more performance it really helps:
if the most accessed documents are small, so it takes less time to load them, transfer them, and less time to deserialize in your app
if you use projections in find(), for the same reasons
if you use bulk operations to reduce networking I/O and context switches
Even MongoDB itself has an option to limit the maximum number of incoming connections. It defaults to 64k.
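
The point about small documents and projections can be illustrated with a rough sketch. It uses plain Python dicts and JSON as a stand-in for BSON, so the byte counts are only indicative; the document contents and the project() helper are made up for illustration and are not part of any MongoDB driver.

```python
import json


def project(doc, fields):
    """Keep only the requested top-level fields, similar in spirit to a find() projection."""
    return {key: value for key, value in doc.items() if key in fields}


# Hypothetical document with one large field that the caller does not need.
full_doc = {"_id": 1, "name": "sensor-1", "status": "ok", "history": list(range(1000))}
small_doc = project(full_doc, {"_id", "status"})

full_size = len(json.dumps(full_doc))
small_size = len(json.dumps(small_doc))
print(f"full: {full_size} bytes, projected: {small_size} bytes")
```

Fewer bytes per result means less to transfer and less to deserialize in the application.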

How to figure out optimal max_worker_processes?

So far I have not been able to find any reliable recommendation regarding the optimal value for max_worker_processes.
Some sources suggest that the value should not be higher than the number of available cores, but is that correct given that server processes do a lot of I/O?
Say I have 8 cores for PG container and plan to handle about 100 clients in parallel. Is that feasible, especially with the default max_worker_processes=8 ?
Any trusted reference would be much appreciated.
The reasonable limit does not depend on the number of client connections, but on the actual upper limit on concurrent queries.
If it is guaranteed that only one of these clients will ever be active at the same time, you could set max_worker_processes, max_parallel_workers and max_parallel_workers_per_gather to one less than the number of cores or the number of parallel I/O operations that your storage can handle, whichever of the two is smaller. In essence, one query can then consume all the available resources.
On the other hand, if many of these clients are likely to run queries concurrently, you should disable parallel query by setting max_parallel_workers_per_gather to 0 to avoid overloading your database.
Concerning your comments to my answer: if your goal is to limit the number of active queries, use a connection pool.
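
The rule of thumb above can be written down as a small sketch. The function name and the io_concurrency input (how many parallel I/O operations the storage can handle) are assumptions for illustration; only the three parameter names come from PostgreSQL.

```python
def suggest_parallel_settings(cores, io_concurrency, many_concurrent_queries):
    """Apply the rule of thumb from the answer above; a sketch, not an official formula."""
    if many_concurrent_queries:
        # Many clients run queries at once: disable parallel query.
        return {"max_parallel_workers_per_gather": 0}
    # One active query at a time: one less than the smaller of the two limits.
    workers = max(min(cores, io_concurrency) - 1, 0)
    return {
        "max_worker_processes": workers,
        "max_parallel_workers": workers,
        "max_parallel_workers_per_gather": workers,
    }


# 8 cores as in the question; io_concurrency=16 is a hypothetical storage limit.
print(suggest_parallel_settings(cores=8, io_concurrency=16, many_concurrent_queries=False))
print(suggest_parallel_settings(cores=8, io_concurrency=16, many_concurrent_queries=True))
```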

Mirth performance benchmark

We are using Mirth Connect for message transformation from HL7 to text and storing the transformed messages in an Azure SQL database. Our current performance is 45000 messages per hour.
The machine configuration is
8 GB RAM and a 2 core CPU. Memory assigned to Mirth is -Xms = 6122MB
We don't have any idea what the performance parameters for Mirth should be with the above configuration. Does anyone know of performance benchmarks for Mirth Connect?
I'd recommend looking into the Max Processing Threads option in version 3.4 and above. It's configurable in the Source Settings (Source tab). By default it's set to 1, which means only one message can process through the channel's main processing thread at any given time. This is important for certain interfaces where order of messages is paramount, but obviously it limits throughput.
Note that whatever client is sending your channel messages also needs to be reconfigured to send multiple messages in parallel. For example if you have a single-threaded process that is sending your channel messages via TCP/MLLP one after another in sequence, increasing the max processing threads isn't necessarily going to help because the client is still single-threaded. But, for example, if you stand up 10 clients all sending to your channel simultaneously, then you'll definitely reap the benefits of increasing the max processing threads.
If your source connector is a polling type, like a File Reader, you can still benefit from this by turning the Source Queue on and increasing the Max Processing Threads. When the source queue is enabled and you have multiple processing threads, multiple queue consumers are started and all read and process from the source queue at the same time.
Another thing to look at is destination queuing. In the Advanced (wrench icon) queue settings, there is a similar option to increase the number of Destination Queue Threads. By default when you have destination queuing enabled, there's just a single queue thread that processes messages in a FIFO sequence. Again, good for message order but hampers throughput.
If you do need messages to be ordered and want to maximize parallel throughput (AKA have your cake and eat it too), you can use the Thread Assignment Variable in conjunction with multiple destination Queue Threads. This allows you to preserve order among messages with the same unique identifier, while messages pertaining to different identifiers can process simultaneously. A common use-case is to use the patient MRN for this, so that all messages for a given patient are guaranteed to process in the order they were received, but messages longitudinally across different patients can process simultaneously.
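
The idea behind ordering by a key while still processing in parallel can be sketched as follows. This is only an illustration of the general technique (hash the key to pick a queue); it is not Mirth's actual implementation, and the function names and sample messages are made up.

```python
import zlib
from collections import defaultdict


def assign_thread(key, thread_count):
    """Map a key to a stable thread index; equal keys always land on the same thread."""
    return zlib.crc32(key.encode("utf-8")) % thread_count


def partition(messages, thread_count):
    """Split (key, payload) pairs into per-thread queues, preserving arrival order."""
    queues = defaultdict(list)
    for key, payload in messages:
        queues[assign_thread(key, thread_count)].append((key, payload))
    return queues


# Hypothetical messages keyed by patient MRN.
messages = [("MRN-1", "admit"), ("MRN-2", "admit"), ("MRN-1", "discharge")]
queues = partition(messages, thread_count=4)
for index, queue in sorted(queues.items()):
    print(index, queue)
```

Each queue is drained in FIFO order by its own thread, so messages for one MRN stay in sequence while different MRNs may proceed in parallel.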
We are using an AWS EC2 c4.4xlarge instance to test the performance limit of a bare-bones proof of concept. We got about 50 msgs/sec without obvious bottlenecks in CPU, memory, network, disk I/O, DB I/O, etc. We want to push the limits higher. Please share your observations, if any.
We run the same process. Mirth -> Azure SQL Database. We're running through performance testing right now and have been stuck at 12 - 15 messages/second (43000 - 54000 per hour).
We've run tests on each channel and found this:
1 channel source: file reader -> destination: Azure SQL DB was about 36k per hour
2 channel source: file reader -> destination: Azure SQL DB was about 59k per hour
3 channel source: file reader -> destination: Azure SQL DB was about 80k per hour
We've added multi-threading (2,4,8) to both the source and destination on 1 channel with no performance increase. Mirth is running on 8GB mem and 2 Cores with heap size set to 2048MB.
We are now going to run a few tests with Mirth running on "hardware" similar to a C4.4xlarge, which in Azure is 16 cores and 32GB of memory. There is 200GB of SSD available as well.
Our goal is 100k messages per hour per channel.
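
For comparison with the per-second figures mentioned earlier, the hourly numbers above can be converted with a quick sketch. The labels assume that "1 channel", "2 channel" and "3 channel" refer to the number of channels running in the test.

```python
def per_second(messages_per_hour):
    """Convert an hourly message rate to messages per second."""
    return messages_per_hour / 3600


measurements = [
    ("1 channel", 36_000),
    ("2 channels", 59_000),
    ("3 channels", 80_000),
    ("goal per channel", 100_000),
]
for label, hourly in measurements:
    print(f"{label}: {per_second(hourly):.1f} msg/s")
```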

MongoDB concurrent queries limit?

Is there any limit of concurrent queries that mongodb can run in a second?
I am trying to implement an API that runs 300 queries in each request in mongodb.
So if there are 100 client requests in a second, the number of queries becomes 100 x 300, which results in high latency.
Any clue?
The only limit is the number of connections mongo can have in parallel. Check out net.maxIncomingConnections:
The maximum number of simultaneous connections that mongos or mongod will accept. This setting has no effect if it is higher than your operating system’s configured maximum connection tracking threshold.
But still you'll need to monitor your system to figure out what's actually happening.
And yes, 300 queries per API request is a bit too much, even just in terms of networking and request parsing overhead.
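
A rough sketch of the load implied by the numbers in the question. The 1 ms per-query latency is a hypothetical value chosen for illustration; the in-flight estimate uses Little's law (average number in the system = arrival rate x time in the system).

```python
def query_load(requests_per_s, queries_per_request, latency_ms):
    """Return (queries per second, average queries in flight, serial ms per request)."""
    qps = requests_per_s * queries_per_request
    in_flight = qps * latency_ms / 1000           # Little's law: L = lambda * W
    serial_ms = queries_per_request * latency_ms  # if one request issues its queries one by one
    return qps, in_flight, serial_ms


qps, in_flight, serial_ms = query_load(requests_per_s=100, queries_per_request=300, latency_ms=1)
print(f"{qps} queries/s, about {in_flight:.0f} in flight, {serial_ms} ms per request if run serially")
```

Even with fast individual queries, issuing 300 of them sequentially adds up to noticeable latency per API request.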

Factors affecting write I/O operations - mongodb

I was doing some tests to figure out the performance of Replica Sets in our environment. The set up consists of 1 Primary and 1 Secondary in local Data Center and 1 Secondary in remote Data Center.
My record consists of 1 field of size 512 bytes. The numbers of inserts were 100,000 and 500,000.
During week 1 the inserts in primary were happening within the following time:
100,000 writes - 5 seconds
500,000 writes - 20 seconds
Week 2 -
100,000 writes - 14 seconds
500,000 writes - 66 seconds
I can't seem to figure out what could have caused the rate to drop so much. I have an oplog of size 1 GB and journaling enabled. I am not concerned about replication lag since there isn't much lag. There are no other I/O processes running in the environments where MongoDB is set up. I have also deleted files and restarted the machines, but I still see this dip.
Can anyone let me know what could be the cause?
Thanks,
Ganesh
If these are virtual machines, then you might have a "noisy neighbor". If you're using NAS or SAN storage, then write throughput can be affected by network traffic or by I/O load for other hosts sharing the NAS or SAN.
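
To quantify the drop described in the question, the timings can be turned into write rates with a short sketch (only the numbers from the question are used):

```python
def rate(writes, seconds):
    """Writes per second for one test run."""
    return writes / seconds


# (number of writes, week 1 seconds, week 2 seconds) as reported in the question.
runs = [(100_000, 5, 14), (500_000, 20, 66)]
for writes, week1_s, week2_s in runs:
    slowdown = week2_s / week1_s
    print(
        f"{writes} writes: {rate(writes, week1_s):.0f}/s in week 1, "
        f"{rate(writes, week2_s):.0f}/s in week 2, {slowdown:.1f}x slower"
    )
```

Both batch sizes slow down by roughly a factor of three, which is consistent with a change in the environment rather than something specific to one test size.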