Which is better for memcached: more instances with small memory or fewer instances with big memory? - memcached

I am preparing hardware for memcached servers and I have a question:
Should I start more instances with less memory each, or fewer instances with more memory? For example, is it better to start a single instance with 128 GB of memory on one port, or 16 instances splitting that 128 GB (8 GB each) across 16 ports? Which is the better setup for memcached?
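For illustration only, a minimal sketch of the two layouts being compared (assuming the 128 GB is split evenly in the multi-instance case, ports starting at 11211, and a dedicated memcache user; memcached's -m flag takes megabytes):

# one large instance: 128 GB on a single port
memcached -d -u memcache -m 131072 -p 11211

# 16 smaller instances: 8 GB each on ports 11211-11226
for i in $(seq 0 15); do memcached -d -u memcache -m 8192 -p $((11211 + i)); done

Note that with the second layout the clients must hash keys across all 16 endpoints.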

Related

OpenShift/Kubernetes memory handling

What is the correct way of memory handling in OpenShift/Kubernetes?
If I create a project in OKD, how can I determine the optimal memory usage of pods? For example, if I use 1 deployment for 1-2 pods and each pod uses 300-500 MB of RAM (Spring Boot apps), then 20 pods use around 6-10 GB of RAM. But as I see it, sometimes a project can have around 100-150 containers, which need at least 30-50 GB of RAM.
I have also tried horizontal scaling and requests/limits, but each microservice still uses a lot of memory.
However, starting a pod requires around 500-700 MB of RAM; after the Spring container has started, it can live with around 300 MB as mentioned.
So, I have 2 questions:
Is it possible to give extra memory, but only for the first X minutes after each pod starts?
If not, then what is the best practice for handling memory shortage if I have limited memory (16 GB) and want to run 35-40 pods?
Thanks for the answer in advance!
Is it possible to give extra memory, but only for the first X minutes after each pod starts?
You get this behavior when you set the memory limit to a higher value than the request. This allows pods to burst, as long as they do not all need the extra memory at the same time.
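As a minimal sketch (container name and values are placeholders, not taken from the question), a container spec that requests what a settled Spring Boot pod needs while allowing a higher startup burst could look like this:

containers:
- name: spring-app            # hypothetical name
  resources:
    requests:
      memory: "300Mi"         # what the pod needs once the Spring context is up; used for scheduling
    limits:
      memory: "700Mi"         # headroom for the startup spike; exceeding this gets the pod OOM-killed

Because scheduling is based on requests, 35-40 such pods add up to roughly 10-12 GB of requested memory, while each pod can still burst towards 700 MiB during startup.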
If not, then what is the best practice for handling memory shortage if I have limited memory (16 GB) and want to run 35-40 pods?
It is common to use some form of cluster autoscaler to add more nodes to your cluster if it needs more capacity. This is easy if you run in the cloud.
In general, Java and the JVM are memory hungry; consider another technology if you want to use less memory. How much memory an application needs depends entirely on the application itself, e.g. what data structures are used.
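If you stay on the JVM, one common lever (shown only as an illustration; flag availability depends on your Java version, roughly 8u191+ or 10+) is to cap the heap relative to the container's memory limit:

# hypothetical container start command
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar

This keeps the heap at about 75% of the container limit, leaving the rest for metaspace, threads, and off-heap buffers.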

BsonChunkPool and memory leak

I use MongoDB 3.6 + the .NET driver (MongoDB.Driver 2.10) to manage our data. Recently, we've noticed that our background services consume a lot of memory. After analyzing a dump, it turned out that there's a Mongo object called BsonChunkPool that always consumes around 0.5 GB of memory. Is this normal? I cannot really find any valuable documentation about this type and what it actually does. Can anyone help?
The BsonChunkPool exists so that large memory buffers (chunks) can be reused, thus easing the amount of work the garbage collector has to do.
Initially the pool is empty, but as buffers are returned to the pool the pool is no longer empty. Whatever memory is held in the pool will not be garbage collected. This is by design. That memory is intended to be reused. This is not a memory leak.
The default configuration of the BsonChunkPool is such that it can hold a maximum of 8192 chunks of 64 KB each, so if the pool were to grow to its maximum size it would use 512 MB of memory, which is consistent with the roughly 0.5 GB you are observing.
If for some reason you don't want the BsonChunkPool to use that much memory, you can configure it differently by putting the following statement at the beginning of your application:
BsonChunkPool.Default = new BsonChunkPool(16, 64 * 1024); // e.g. max 16 chunks of 64KB each, for a total of 1MB
We haven't experimented with different values for chunk counts and sizes so if you do decide to change the default BsonChunkPool configuration you should do some benchmarking and verify that it doesn't have an adverse impact on your performance.
From jira: BsonChunkPool and memory leak
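For context, a hedged sketch of where such an override might sit in a .NET service; the namespaces reflect the 2.x driver layout and the exact values are placeholders you should benchmark:

using MongoDB.Bson.IO;   // BsonChunkPool
using MongoDB.Driver;    // MongoClient

public static class Program
{
    public static void Main()
    {
        // Shrink the shared chunk pool before any driver objects are created:
        // at most 16 chunks of 64 KB each, i.e. 1 MB retained by the pool.
        BsonChunkPool.Default = new BsonChunkPool(16, 64 * 1024);

        var client = new MongoClient("mongodb://localhost:27017");
        // ... start the background service as usual ...
    }
}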

Questions about HPC on SLURM

I have a few questions about HPC. I have a code with serial and parallel sections. The parallel sections work on different chunks of memory and at some point they communicate with each other. For this I used MPI on our cluster. SLURM is the resource manager. Below are the specifications of a node in the cluster.
Specifications of a node:
Processor: 2x Intel Xeon E5-2690 (16 cores, 32 threads in total)
Memory : 256 GB 1600MHz ECC
Disk : 2 x 600 GB 2.5" SAS (configured with raid 1)
Questions:
1) Do all cores on a node share the same memory (RAM)? If yes, do all cores access memory at the same speed?
2) Consider a case:
--nodes = 1
--ntasks-per-node = 1
--cpus-per-task = 16 (all cores on a node)
If all cores share the same memory (depending on the answer to question 1), will all cores be used, or will 15 of them sit idle since OpenMP (for shared memory) is not used?
3) If the required memory is less than the total memory of a node, isn't it much better to use a single node, use OpenMP to achieve core-level parallelism, and avoid the time lost to communication between nodes? That is, use this (a job-script sketch of the single-node variant follows the two listings below)
--nodes = 1
--ntasks-per-core = 1
instead of this:
--nodes = 16
--ntasks-per-node = 1
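For concreteness, a hedged job-script sketch of the single-node variant (program name and memory figure are placeholders; --ntasks-per-node is used here to place one MPI rank per core, which mirrors the intent of --ntasks-per-core=1 on a 16-core node):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16   # one MPI rank per core, all on one node
#SBATCH --mem=200G             # must fit within the node's 256 GB
srun ./my_mpi_app              # all MPI traffic stays inside the node

The multi-node variant would instead use --nodes=16 and --ntasks-per-node=1, so every message crosses the interconnect.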
The rest of the questions relate to statements in this link.
Use core allocation if your application is CPU bound; the more processors you can throw at it the better!
Does this statement mean that --ntasks-per-core is good when cores don't access RAM too often?
Use socket allocation if memory access is what bottlenecks your application’s performance. Since how much data can come in from memory is what limits the speed of the job, running more tasks on the same memory bus won’t result in speed-up since all of those tasks are fighting over the path to memory.
I just don't get this. As far as I know, all sockets and the cores on them share the same memory, so I don't see why the --ntasks-per-socket option exists.
Use node allocation if some node-wide resource is what bottlenecks your application. This is the case with applications that are relying heavily on access to disk or to network resources. Running multiple tasks per node won’t result in a speed-up since all of those tasks are waiting for access to the same disk or network pipe.
Does this mean that, if the memory required is more than the total RAM of a single node, then it's better to use multiple nodes?
In order:
Yes, all cores share the same memory, but usually not at the same speed. Each processor (in your configuration you have two processors, or sockets) has memory that is 'closer' to it (this is NUMA). The Linux kernel will usually attempt to allocate memory from the nearby region. This is not something that a user application usually has to worry about.
If it is a serial job, then yes, 15 cores will sit idle. If your job uses MPI, then it can use the other cores on the same node. In fact, MPI within the same node is usually much faster than MPI stretched across multiple nodes.
You can use OpenMP or MPI on a single node. I'm not sure about the speed difference, but if you are already familiar with MPI, I would just stick with that. The difference probably isn't that big. But, the difference between running MPI on a single node vs. multiple nodes is going to be large. Running MPI on a single node will be significantly faster than across multiple nodes.
Use core allocation if your application is CPU bound; the more processors you can throw at it the better!
This is likely targeting OpenMP or single node parallel jobs.
Use socket allocation if memory access is what bottlenecks your application’s performance. Since how much data can come in from memory is what limits the speed of the job, running more tasks on the same memory bus won’t result in speed-up since all of those tasks are fighting over the path to memory.
See the answer to 1. Though it is the same memory, cores on different sockets usually have separate buses to memory.
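A quick way to see this layout on a node (standard Linux tools, shown only as an illustration):

numactl --hardware       # lists the NUMA nodes and which CPUs/memory belong to each
lscpu | grep -i numa     # e.g. "NUMA node0 CPU(s): 0-7,16-23"

Tasks pinned to one socket then mostly use that socket's memory controller, which is what the --ntasks-per-socket advice is about.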
Use node allocation if some node-wide resource is what bottlenecks your application. This is the case with applications that are relying heavily on access to disk or to network resources. Running multiple tasks per node won’t result in a speed-up since all of those tasks are waiting for access to the same disk or network pipe.
If you need more RAM than a single node can provide, then you have no choice but to divide your program and use MPI.

mongodb architect design with hardware specification

Currently we have one replica set of 3 members with 25 GB of data. Normal CPU load is 1.5 on both secondaries and 0.5 on the primary (reads happen on the secondary instances only), and normally about 1200 users hit our website. We now plan to increase the number of hits to our website and expect about 5000 concurrent users. Can you please suggest how many instances need to be added to my replica set?
Current infra in our replica set:
1. Primary instance
CPUs: 16
RAM: 32 GB
HDD: 100 GB
2. Secondary instance
CPUs: 8
RAM: 16 GB
HDD: 100 GB
3. Secondary instance
CPUs: 8
RAM: 16 GB
HDD: 100 GB
Assuming your application scales linearly with the number of users, CPU capacity should not be a problem (does it scale linearly? Only you can tell; we don't know what your application does).
The real question is: how much do you expect your data to grow? With 25 GB of data and 16 GB of RAM, 64% of your data currently fits into RAM. That likely means many queries can be served directly from the RAM cache without hitting the hard drives, and those queries are usually very fast. But when your working set grows further beyond the size of your RAM, you may see increased latency for data that now has to be read from the hard drives (it depends, though: if your application interacts primarily with recent data and rarely with older data, you might not even notice much of a difference).
The solution to this is obvious: get more RAM. Should this not be an option (for example because the server reached the maximum RAM capacity the hardware allows), your next option is building a sharded cluster where each shard is responsible for serving an interval of your data.
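One hedged way to judge whether the working set still fits in RAM is to watch the WiredTiger cache counters (run against a secondary that serves the reads; field names as reported by serverStatus in the 3.x series):

# print cache size, current usage, and pages read from disk into the cache
mongo --quiet --eval 'var c = db.serverStatus().wiredTiger.cache;
  print("configured:", c["maximum bytes configured"]);
  print("in use:    ", c["bytes currently in the cache"]);
  print("read in:   ", c["pages read into cache"])'

If the cache stays full and the read-into-cache counter keeps climbing as traffic grows, that is the point where more RAM (or sharding) starts to pay off.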

Cassandra storage vs memory sizing

I am considering developing an application with a Cassandra backend. I am hoping that I will be able to run each Cassandra node on commodity hardware with the following specs:
Quad Core 2GHz i7 CPU
2x 750GB disk drives
16 GB installed RAM
Now, I have been reading online that the available disk space for Cassandra should be double the amount actually stored on the disks, which would mean that each node (set up in a RAID-1 configuration) would be able to store 375 GB of data, which is acceptable.
My question is whether 16 GB of RAM is enough to efficiently serve 375 GB of data per node. The data in the application will also be fairly time-dependent, such that recent data will be the data most read from the database. In fact, most of the data will be deleted after about 6 months.
Also, should I assign Cassandra a heap (-Xmx) close to 16 GB, or does Cassandra utilize off-heap memory?
You should not set the Cassandra heap to more than 8GB; bigger than that, and garbage collection will kill you with large pauses. Cassandra will use the buffer cache (like other applications) so the remaining memory isn't wasted.
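For reference, a hedged sketch of pinning the heap in conf/cassandra-env.sh (the values below follow the commonly cited guidance, not a tuning for your workload):

MAX_HEAP_SIZE="8G"      # keep the JVM heap at or below ~8 GB to avoid long GC pauses
HEAP_NEWSIZE="400M"     # young generation; a common rule of thumb is ~100 MB per physical core (quad-core here)

The rest of the 16 GB is then left to the OS page cache, which Cassandra benefits from.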
16 GB of RAM will be enough to serve the data if your hot set fits in RAM, or if your required read rate can be served from disk. Each disk can do roughly 100 random IOs per second, so with your setup, if you need more than about 200 reads per second you will need to make sure the data is in cache. Cassandra exports good cache statistics (cassandra-cli show keyspaces), so you should easily be able to tell how effective your cache is.
Do bear in mind that with only two disks in RAID-1 you will not have a dedicated commit log disk. This could hamper write performance quite badly. If it does, you may want to consider turning off the commit log and forgoing durable writes.
Although it is probably wise not to use a really huge heap with Cassandra, at my company we have used 10GB to 12GB heaps without any issues so far. Our servers typically have at least 48 GB of memory (RAM is cheap -- so why not :-)) and so we may try expanding the heap a bit more and see what happens.