How many concurrent users can the latest ejabberd XMPP server handle? - xmpp

I want to build an instant messaging app, and I need to know how many concurrent users the ejabberd XMPP server can handle. The following are my server hardware specs:
CPU: 2x Intel Xeon E5-2630 v4 (2 sockets x 10 cores @ 2.20 GHz)
RAM: 256 GB REG ECC
Storage: 1TB SSD
Thanks for the help in advance.

There is no hard limit in ejabberd; it all depends on whether the machine's CPU and RAM can handle the load.
It matters a great deal whether those accounts will generate traffic (have many contacts, change presence, and send messages) or will be mostly quiet.
Many mostly-quiet accounts: consume mainly memory.
Few accounts chatting and changing presence a lot: consume mainly CPU.
It is also important in the long term that you use a powerful SQL database instead of the internal Mnesia database (which is acceptable only for small servers).
You may want to run a benchmarking tool, configured to match your expected usage, to check ejabberd (or any other XMPP server).
For example, 18 years ago I used jabsimul on a machine with a single 1.7 GHz processor and less than 1 GB of RAM, and it handled thousands of chatty users perfectly:
https://www.ejabberd.im/benchmark/index.html
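As a rough modern stand-in for jabsimul, here is a minimal load-generation sketch in Python using slixmpp. Everything specific in it is an assumption: the example.com server, the user0…user99 accounts and their shared password, the 100-client count, and the one-message-per-second rate; the accounts would also need to be provisioned first (e.g. with ejabberdctl register).

    # Hedged load-generation sketch: N chatty XMPP clients that log in,
    # send presence, and message each other once per second.
    import asyncio
    import slixmpp

    class ChattyClient(slixmpp.ClientXMPP):
        def __init__(self, jid, password, peer):
            super().__init__(jid, password)
            self.peer = peer
            self.add_event_handler("session_start", self.start)

        async def start(self, event):
            self.send_presence()                  # generate presence traffic
            await self.get_roster()
            while True:                           # steady message traffic
                self.send_message(mto=self.peer, mbody="ping", mtype="chat")
                await asyncio.sleep(1)

    clients = []
    for i in range(100):                          # assumed user count
        peer = f"user{(i + 1) % 100}@example.com"
        client = ChattyClient(f"user{i}@example.com", "secret", peer)
        client.connect()                          # resolves the domain via DNS
        clients.append(client)

    asyncio.get_event_loop().run_forever()

Raising the client count while lengthening the sleep interval approximates the memory-bound "many quiet accounts" case above; a low count with a short interval approximates the CPU-bound chatty case.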

Related

Maximizing S3 upload performance with AWS C++ SDK

I am using a c5.18xlarge instance with the ENA adapter enabled (so I expect 25 Gbps connectivity to S3, per AWS support). I am using the AWS C++ SDK (version 1.3.59) on RHEL 7 to upload a 70 GB file to a single S3 object using a 256 MB part size. Per AWS support, I've set the ClientConfiguration's maxConnections field to 999 and its executor field to use a PooledThreadExecutor with a pool size of 999 (and these have improved my performance). I am performing a series of S3Client::UploadPart() calls, threading these myself; I get very similar performance when using UploadPartCallable() and letting the SDK manage the threading.
Here's the performance I'm seeing:
- 36 threads: 7.5 Gbps
- 200 threads: 15.7 Gbps
AWS support reported similar behavior (actually they were using 900 threads).
I've looked through the underlying implementation of S3Client and all the low level thread management and curl handle management. I don't see anything obviously inefficient going on. It just doesn't make any sense to me that I would need 200 threads to achieve this performance on a machine that has 36 physical cores. Is this expected? Could someone provide an explanation for what's happening or a different way to configure the SDK to not require this many threads? I think I could provide my own HTTPClientFactory and customize things to cut out a mutex in how the curl handles are managed if I'm careful, but this seems unlikely to account for what I'm seeing.
Thanks for any help.
-Adam
I am using the AWS C++ SDK (version 1.3.59) on RHEL 7 to upload a 70 GB file to a single S3 object using a 256 MB part size.
You're probably being limited by your disk/storage device's read throughput. It's actually impressive that you're able to reach 15.7 Gbps.
In my test, I see that all threads created by Aws::Utils::Threading::PooledThreadExecutor run on one single CPU core (while the spot instance has 72 vCPUs). Have you seen the same behavior in your tests?
The way I further improved performance was by using my own threading model with the S3Client blocking APIs instead of PooledThreadExecutor with the S3 async methods (such as UploadPartAsync()).
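For comparison, the same pattern can be sketched in Python with boto3 rather than the C++ SDK the question uses: one multipart upload whose 256 MB parts are pushed from a thread pool, each worker reading its own byte range so memory stays flat. The bucket, key, file name, and 200-worker count are assumptions (the worker count mirrors the thread count the question settled on).

    # Hedged multipart-upload sketch; boto3 clients are thread-safe.
    from concurrent.futures import ThreadPoolExecutor
    import os
    import boto3

    s3 = boto3.client("s3")
    bucket, key, path = "my-bucket", "big-object", "bigfile.bin"  # assumptions
    part_size = 256 * 1024 * 1024            # 256 MB parts, as in the question

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    offsets = range(0, os.path.getsize(path), part_size)

    def upload_part(part_number, offset):
        with open(path, "rb") as f:          # each thread reads its own slice
            f.seek(offset)
            body = f.read(part_size)
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                              PartNumber=part_number, Body=body)
        return {"PartNumber": part_number, "ETag": resp["ETag"]}

    with ThreadPoolExecutor(max_workers=200) as pool:
        parts = list(pool.map(upload_part, range(1, len(offsets) + 1), offsets))

    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})

One plausible explanation for the original question: with blocking uploads, each thread spends most of its time waiting on the network, so the useful thread count tracks the number of in-flight parts rather than the number of CPU cores.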

httperf for benchmarking web servers

I am using httperf to benchmark web servers. My configuration: i5 processor and 4 GB RAM. How do I stress this configuration to get accurate results? I mean, I have to put 100% load on this server (Ubuntu 12.04 LTS server).
You can use httperf like this (substitute your own server name and port):
$ httperf --server your.server.com --port 80 --wsesslog=200,0,urls.log --rate 10
Here urls.log contains the different URIs/paths to be requested. Check the documentation for details.
Now try changing the rate or session values and see how many requests per second you can achieve and what the reply time is. In the meantime, monitor CPU and memory utilization with mpstat or top to see whether either is reaching 100%.
What's tricky about httperf is that it often saturates the client first, because of 1) the per-process open-files limit, 2) the TCP port number limit (excluding the reserved ports 0-1023, there are only 64512 ports available for TCP connections; since closed connections linger in TIME_WAIT for about a minute, that sustains only about 1075 new connections per second), and 3) the socket buffer size. You will probably need to tune these limits to avoid saturating the client.
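The 1075 figure is just that arithmetic; a quick sketch with the assumed 60-second TIME_WAIT:

    # Client-side connection-rate ceiling from ephemeral-port exhaustion.
    ephemeral_ports = 65536 - 1024      # ports 1024-65535 -> 64512 usable
    time_wait_seconds = 60              # assumed TCP TIME_WAIT duration
    rate = ephemeral_ports / time_wait_seconds
    print(f"~{rate:.0f} new connections/s per client IP")  # ~1075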
To saturate a server with 4 GB of memory, you would probably need multiple physical machines. I tried 6 clients, each issuing 300 req/s against a 4 GB VM, and together they saturated it.
However, there are still other factors impacting the result, e.g., the pages deployed on your Apache server and the workload's access patterns. But the general suggestions are:
1. Test the request workload that is closest to your target scenarios.
2. Add more physical clients and watch how the response rate, response time, and error count change, to make sure you are not saturating the clients.

How to increase number of messages that can be stored in MSMQ

We have a number of MSMQ queues throughout our system, both private and public queues. Sometimes a Windows service that reads from a queue will crash, and so messages will build up in that queue. Once the queue gets to a certain size (maybe 60K messages), all queues on that server will stop working, throwing errors about insufficient resources.
My question is: how do the queues really work behind the scenes? Do they store messages in RAM or on the hard drive? Does MSMQ run out of resources and crash when the server runs out of RAM? If it's using some allocated space on the hard drive, is there a way to increase the allowable size? If it's using RAM, can I simply add RAM to the servers to increase the allowable size?
I need to make sure that when a service goes down, we can handle storing 100K or 200K messages in that queue while we work on fixing the service, as those messages are critical to our business.
Here is an article on MSDN that seems to address your question (as John points out below, this only applies to Windows Server 2000 so should probably be ignored by most people): Resource management in MSMQ applications. Specifically:
For MSMQ 1.0 and MSMQ 2.0, the combined size of messages capable of being stored on one machine is not limited to the amount of RAM in the machine or the size of the hard disk, but to the amount of virtual address space provided to the MSMQ service by the operating system (this limitation has been lifted in MSMQ 3.0). Each process in an x86 machine is allotted a virtual 4 GB of addressable memory. 2GB is reserved for use in kernel mode and 2GB for user mode. The MSMQ Queue Manager operates in user mode and therefore has an addressable 2GB of virtual address space to work with. Each message's data is stored in RAM, which is backed up by the system's paging file or memory mapped files. MSMQ uses memory mapped files to store both express and recoverable messages. Since we are limited to 2GB of addressable memory, we are limited to 2GB worth of messages on a disk. When you take into account the memory utilized by MSMQ code and its internal data structures, as well as file allocation to store message files on disk, we end up with between 1.4GB and 1.6GB worth of messages that can be stored on disk.
Note: This limitation of 1.6GB can be raised to approximately 2.6GB by enabling 3GB tuning on the MSMQ Service. See Q171793 for more information on how to enable 3GB tuning.
Edit: the tuning link seems to be broken. I believe it should be pointing here.
In terms of later versions of MSMQ, John discusses the issue in a blog post.
Maximum number of messages
This one is not as simple to work out. From my Insufficient Resources post we know that each message needs 75 bytes of kernel memory for indexing so, for example, 2 million messages would require roughly 150 megabytes. It would seem, therefore, that all you need to do is add more RAM. After looking at a comparison of 32-bit and 64-bit memory architectures, though, you will quickly have to move to the 64-bit platform to take advantage of your investment, as 32-bit machines max out at 450 MB of paged pool memory regardless of the amount of RAM fitted.
But, again, if you are trying to work out what amount of RAM will generate the paged pool memory required to accommodate a billion MSMQ messages, your design spec is up for some serious reviewing.
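To put that 75-bytes-per-message figure into the questioner's terms, here is a back-of-envelope sketch (assuming the figure still holds for your MSMQ version):

    # Paged-pool cost of MSMQ message indexing at 75 bytes per message.
    bytes_per_message = 75
    for messages in (60_000, 200_000, 2_000_000):
        mb = messages * bytes_per_message / 1024**2
        print(f"{messages:>9,} messages -> ~{mb:5.1f} MB of paged pool")

60K messages need only about 4.3 MB of index, so a box hitting "insufficient resources" at that size is more likely running into the 1.4-1.6 GB message-storage quota quoted above than into paged-pool limits.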
Not sure about the in-depth answer, but on a surface level anyhow, a non-transactional queue stores messages in memory, whereas a transactional queue stores messages on disk.
UPDATE
As John states below, all messages are held on disk whether durable or non-durable queues are used.

Statistics for requests on deployed VPS servers

I was thinking about different scalability features and suddenly realized that I don't really know how much a single server (VPS) can handle. This question is for those who run loaded projects.
Imagine server with:
1 GB RAM
1 Xeon CPU
CentOS
LAMP with FastCGI
PostgreSQL on the same machine
And we need to estimate the request rate, so I decided to assume middle-of-the-road parameters for the app:
80% of requests make one indexed call to the DB
40-50 KB of HTML per response
cache hit in 60% of cases
Add any other parameters you like, and let's calculate (a rough worked example follows the answer below), or tell your story about your loads.
I would look at Cacti; it can give you plenty of stats to choose from.
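For what it's worth, here is a hedged back-of-envelope calculation using the parameters from the question; every per-request cost below is an assumption, not a measurement:

    # Rough single-core capacity estimate for the LAMP + PostgreSQL VPS above.
    cache_hit_rate   = 0.60     # from the question: cache in 60% of cases
    db_call_fraction = 0.80     # from the question: 80% of requests hit the DB
    page_size_kb     = 45       # from the question: 40-50 KB of HTML
    cached_cost_ms   = 2.0      # assumed: serving a cached page
    php_cost_ms      = 10.0     # assumed: PHP via FastCGI
    db_query_cost_ms = 15.0     # assumed: one indexed PostgreSQL query

    dynamic_cost_ms = php_cost_ms + db_call_fraction * db_query_cost_ms
    avg_cost_ms = (cache_hit_rate * cached_cost_ms
                   + (1 - cache_hit_rate) * dynamic_cost_ms)
    req_per_s = 1000 / avg_cost_ms
    mbit_per_s = req_per_s * page_size_kb * 8 / 1024
    print(f"~{req_per_s:.0f} req/s per core, ~{mbit_per_s:.0f} Mbit/s of HTML")

With these placeholder costs the estimate comes out around 100 req/s and ~35 Mbit/s of HTML per core; substitute costs measured from your own app, since they dominate the result.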

What is your experience with Sun CoolThreads technology?

My project has some money to spend before the end of the fiscal year and we are considering replacing a Sun-Fire-V490 server we've had for a few years. One option we are looking at is the CoolThreads technology. All I know is the Sun marketing, which may not be 100% unbiased. Has anyone actually played with one of these?
I suspect it will be of no value to us, since we don't use threads or virtual machines much and we can't spend a lot of time retrofitting code. We do spawn a ton of processes, but I doubt CoolThreads will be of help there.
(And yes, the money would be better spent on bonuses or something, but that's not going to happen.)
IIRC, the CoolThreads technology refers to the fact that rather than just ramping the clock speed ever higher to improve performance, Sun is now using multi-core processors with hardware multithreading, effectively giving you loads of processors on one chip. The overall processing capacity available is higher, but without the additional electrical power and air-conditioning requirements you would expect (hence "cool"). Its usefulness definitely depends on what you are planning to run on it. If you are running Apache with a multi-threaded MPM, it will love it, as it can run the individual response threads on the individual CPU cores. If you are simply running single-threaded processes, you will get some performance increase over a single-CPU box, but not as great (any old-fashioned non-mod_perl/mod_python CGI processes would still be sharing the CPU a bit). If your application consists of one single-threaded process running maxed out on the box, you will get very little improvement over a single-core CPU running at the same speed.
Peter
Edit:
Oh, and for a benchmark: we compared a T2000 in our server farm to our current V240s (may have been V480s, I don't recall). The T2000 took the load of 12-13 of the older boxes in a live test without any OS tweaking for performance. As I said, Apache loves it :-)
Disclosure: I work for Sun (but as an engineer in client software).
You don't necessarily need multithreaded code to make use of these machines. Having multiple processes will make use of multiple hardware threads on multiple cores.
The old T1 processors (T1000 and T2000 boxes) did have only a single FPU, and weren't really suitable for tasks with much more than about 1% floating point. The newer T2 and T2+ processors have an FPU per core. That's probably still not great for massive floating point crunching, but is much more respectable.
(Note: Hyper-Threading Technology is a trademark of Intel. Sun uses the term Chip MultiThreading (CMT).)
We used Sun Fire T2000s for my last system. The boxes themselves far exceeded our capacity requirements in terms of processing power. For us the decision was based on the lower power consumption and space requirements. We successfully ran WebSphere 6, Oracle 10g and SunONE Directory Server on the same box.
My info may be a bit out of date (I last used these servers 2 years ago), but as I recall one big gotcha was that all the cores on a single CPU shared the same FPU, so if your code did a lot of floating point (we were doing GIS) the FPU was a massive bottleneck and you didn't get much benefit from the large number of threads.
For any process with high parallelism these machines (e.g., the T1000/T2000) are great for their cost. I've been running Oracle on them for about 18 months now and it works great.
If your task is single-threaded/single-process, then you'd be better off with a high-speed dual/quad-core Intel machine.
If your application has lots of threads/lots of processes, then these machines will likely be great for it.
Best of all, Sun will send you one for 60 days to evaluate; that is what we did before committing to it. We ended up getting 2 T2000s and have recently purchased another 4 T1000s.
It hit me last night that our core processes aren't multi-threaded, but the machine in question does have a bunch of system processes that are. In particular, it acts as an NFS server. It sounds like running hundreds of processes will benefit from all those cores, as well.
I'll see if we can get a demo unit to test on first.
Sun has been selling the Niagara machines to be all things to all comers. They do have their place: web services being the best deployment. We have run Oracle on some T2000s and it worked well for highly parallelized operations. But the machines fall flat on single-threaded operations, whose performance is rather bad. If you have floating-point work to do, look elsewhere; even the newer chips with an FPU per core are inadequate. Also, these machines cannot take an enterprise-class pounding for long, and we've had reliability problems. Multi-core technology is more hype than substance. Sandia National Labs researched it and found that four to eight cores is about the top end of usefulness, and that a 16-core chip has the same throughput as a dual-core chip. So a 16-core chip is a waste of a lot of money. Also, as the number of cores increases, the clock speed must decrease because of the thermal wall. Most manufacturers will probably settle on quad-core chips until memory technology improves (you can't keep 16 cores fed with memory, and most of the cores are stalled). Finally, given the chaos at Sun, you'd do better to look elsewhere.