RDMA cluster benchmarking

Requirement:
I have a cluster of 4 machines and I want to measure the collective latency and bandwidth of RDMA Write and RDMA Read operations in a full mesh.
I used ib_write_lat/ib_read_lat for latency and ib_write_bw/ib_read_bw for bandwidth, but those are point-to-point tools and do not work across a cluster.
I want to use a tool that is already available rather than a home-grown one, since publishing results from home-grown tools would be a problem.

You can use MPI to run cluster-wide RDMA-enabled point-to-point or collective benchmarks.
Check out the OSU Micro-Benchmarks: http://mvapich.cse.ohio-state.edu/benchmarks/
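To illustrate what such a benchmark does, here is a minimal full-mesh ping-pong latency sketch in MPI. It is my own sketch, not one of the published OSU benchmarks (for publishable numbers, use those). Run it with one rank per machine over an RDMA-capable MPI such as MVAPICH2 so the messages actually go over the fabric.

    /* Minimal full-mesh latency sketch: every ordered pair of ranks runs a
       ping-pong and the sender reports the one-way latency. Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    #define ITERS 1000
    #define MSG_SIZE 8

    int main(int argc, char **argv) {
        int rank, size;
        char buf[MSG_SIZE] = {0};
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                if (i == j) continue;
                MPI_Barrier(MPI_COMM_WORLD); /* keep pairs from overlapping */
                if (rank == i) {
                    double t0 = MPI_Wtime();
                    for (int k = 0; k < ITERS; k++) {
                        MPI_Send(buf, MSG_SIZE, MPI_CHAR, j, 0, MPI_COMM_WORLD);
                        MPI_Recv(buf, MSG_SIZE, MPI_CHAR, j, 0, MPI_COMM_WORLD,
                                 MPI_STATUS_IGNORE);
                    }
                    /* Half the averaged round-trip time is the one-way latency. */
                    double usec = (MPI_Wtime() - t0) * 1e6 / (2.0 * ITERS);
                    printf("rank %d -> rank %d: %.2f us\n", i, j, usec);
                } else if (rank == j) {
                    for (int k = 0; k < ITERS; k++) {
                        MPI_Recv(buf, MSG_SIZE, MPI_CHAR, i, 0, MPI_COMM_WORLD,
                                 MPI_STATUS_IGNORE);
                        MPI_Send(buf, MSG_SIZE, MPI_CHAR, i, 0, MPI_COMM_WORLD);
                    }
                }
            }
        }
        MPI_Finalize();
        return 0;
    }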

Related

Eclipse Milo - Performance/scalability when deploying an OPC UA server in the cloud

I have created an OPC UA server with Eclipse Milo that is installed on the same machine as the clients, so communication is fast and reliable.
I did a bit of sniffing with Wireshark to see how much communication happens under the hood, and apparently there is a lot going on when monitoring variables, alarms, etc.
So I am wondering what issues I may expect in terms of performance and scalability if the server gets deployed in the cloud. I have seen people talk about OPC UA cloud services, but since this is not a hot topic it is hard to foresee what challenges may come, and how well it scales and performs.
I would imagine that OPC UA uses sticky sessions, which means you can only support a maximum number of users/requests, so dynamic scaling may not be an option, right?
I tried the samples provided by Eclipse Milo, which are hosted somewhere on the network, and it took a long time to connect. If that is the performance one may expect, then the perception of the service for non-technical users would be that it does not work well.
Is the cloud the right place to run OPC UA, considering the network overhead? Would you recommend sticking to local networks (intranets) only and skipping the cloud?
Any feedback would be appreciated, thanks!
If you wanted to get into more detail and share Wireshark captures we might be able to go over parameters that would reduce traffic.
If bandwidth is a concern because you're using cellular or other constrained connections then sure, OPC UA may not be the best fit.
I'm curious what kind of delays or latency you experienced running the examples - connecting over the internet generally does not take very long, so perhaps you were also measuring the time it took to compile and start the example, or something was going on with your network.
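On the traffic-reducing parameters mentioned above: in OPC UA the main knobs are the subscription publishing interval, the monitored-item sampling interval, and the queue size. The question is about Eclipse Milo (Java), but as a hedged illustration here is a sketch using the open62541 C client, which exposes the same standard OPC UA parameters; the endpoint URL, node ID, and interval values are invented for this example.

    /* Sketch only (open62541 C client, not Eclipse Milo): the subscription
       parameters that drive wire traffic. Endpoint, node ID, and interval
       values are invented for illustration. */
    #include <open62541/client.h>
    #include <open62541/client_config_default.h>
    #include <open62541/client_subscriptions.h>
    #include <stdio.h>

    static void onChange(UA_Client *client, UA_UInt32 subId, void *subCtx,
                         UA_UInt32 monId, void *monCtx, UA_DataValue *value) {
        printf("value changed\n");
    }

    int main(void) {
        UA_Client *client = UA_Client_new();
        UA_ClientConfig_setDefault(UA_Client_getConfig(client));
        if (UA_Client_connect(client, "opc.tcp://example-host:4840")
                != UA_STATUSCODE_GOOD) {
            UA_Client_delete(client);
            return 1;
        }

        /* Fewer notification messages on the wire: relax the publishing
           interval first when the server moves to a WAN/cloud link. */
        UA_CreateSubscriptionRequest req = UA_CreateSubscriptionRequest_default();
        req.requestedPublishingInterval = 1000.0; /* ms; hypothetical value */
        UA_CreateSubscriptionResponse resp =
            UA_Client_Subscriptions_create(client, req, NULL, NULL, NULL);

        /* Sampling no faster than you publish, with a small queue, trades
           update freshness for less traffic. */
        UA_MonitoredItemCreateRequest mon = UA_MonitoredItemCreateRequest_default(
            UA_NODEID_STRING(1, "demo.value")); /* hypothetical node */
        mon.requestedParameters.samplingInterval = 1000.0; /* ms */
        mon.requestedParameters.queueSize = 1;
        UA_Client_MonitoredItems_createDataChange(client, resp.subscriptionId,
            UA_TIMESTAMPSTORETURN_BOTH, mon, NULL, onChange, NULL);

        for (int i = 0; i < 30; i++)
            UA_Client_run_iterate(client, 1000); /* pump notifications */

        UA_Client_delete(client);
        return 0;
    }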

How to identify a network performance issue?

I am a little confused about my message server's network bottleneck. I can clearly see the problem is caused by the large number of network operations, but I am not sure why, or how to identify it.
Currently we are using GCP VMs, with 4 cores/8 GB RAM for our message server. Redis and Cassandra run on other servers in the same location. The problem occurs in the network operations to the Redis and Cassandra servers.
I need to handle 3000+ requests at once saving data to Redis and 12000+ requests to the Cassandra server.
My task consumes all my CPU power, and CPU usage drops right after I merge the Redis and Cassandra requests into a kind of batch request. The penalty is that I have to delay saving my data.
What I want to know is how I can determine the network capability of my system. How many requests per second is a reasonable load? From my testing it is obvious that the bottleneck is the network operations, but I can't prove it, and I don't even know how to estimate a reasonable network usage for my system. Are there tools or anything else that can help me confirm the network problem? Or is this just a misconfiguration of my GCP system?
Thanks,
Eric
There is a "Monitoring" tab for each instance where you can check graphs of values like instance CPU, network, and RAM usage.
But to further check the performance of your instance you should use Stackdriver Logging and Monitoring. They store a lot of information from the internal servers and about system performance; for that you will need to install the agent on the instance. Stackdriver also stores information about your load balancer, in case you are using one with your web application, which is advisable since it scales your resources up or down with intelligent autoscaling.
But in order to test your network you will need a third-party tool to put load on it. There are multiple tools that achieve this, such as JMeter.
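On the batching you describe: merging many small writes into one pipelined round trip is the standard way to cut per-request network overhead against Redis. Here is a minimal sketch with the hiredis C client; the host, port, and key names are made up, and the batch of 3000 mirrors the request count from the question.

    /* Hypothetical sketch: batching Redis writes with hiredis pipelining.
       Host, port, and key names are invented for illustration. */
    #include <hiredis/hiredis.h>
    #include <stdio.h>

    int main(void) {
        redisContext *c = redisConnect("10.0.0.5", 6379); /* assumed Redis host */
        if (c == NULL || c->err) {
            fprintf(stderr, "connect failed: %s\n", c ? c->errstr : "alloc");
            return 1;
        }

        /* Queue 3000 SET commands in the local output buffer; they are
           sent together instead of one request per round trip. */
        for (int i = 0; i < 3000; i++)
            redisAppendCommand(c, "SET msg:%d %d", i, i);

        /* Drain the replies in one pass; this flushes the pipeline. */
        for (int i = 0; i < 3000; i++) {
            redisReply *reply;
            if (redisGetReply(c, (void **)&reply) != REDIS_OK) {
                fprintf(stderr, "pipeline error: %s\n", c->errstr);
                break;
            }
            freeReplyObject(reply);
        }
        redisFree(c);
        return 0;
    }

The Cassandra drivers offer asynchronous execution for the same purpose; in both cases the gain comes from not paying one network round trip per request, which is consistent with the CPU drop you saw after batching.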

Jitsi server hardware requirements on test environment

We are implementing secure videoconferencing/chat using Jitsi. We could not find any hardware requirements for a Jitsi server. Could you please share your thoughts regarding the hardware requirements for a Jitsi server in test as well as production environments?
Thanks,
Syed
I am using https://github.com/matrix-org/docker-jitsi on a free-tier EC2 instance.
With 1 active conference (8 participants), it didn't seem to spike resource consumption: CPU usage was 0.0% and used RAM was about 450 MB.
The hardware requirements will depend on the number of users you have. From what I've seen, Jitsi does not require huge resources to run smoothly.
According to this Jitsi Videobridge Performance Evaluation:
"On a plain Xeon server that you can rent for about a hundred dollars, for about 20% CPU you will be able to run 1000+ video streams using an average of 550 Mbps!"
To size the server, we need an idea of how many simultaneous conferences there are and how many participants join each conference. Other important parameters are how many users enable their video and audio streams, and what their network bandwidth is. Based on that we can decide the server requirements.

Running Kafka cluster in Docker containers?

From a performance perspective, is it a good choice to run Kafka in Docker containers? Are there things one should watch out for, tune specifically, etc.?
There is a good research paper from IBM on this topic - it is a bit dated by now, but I am sure the basic statements still hold true and have only been improved upon. The gist is that the overhead introduced by Docker is quite small when it comes to CPU and memory, but for IO-heavy applications you need to be a bit more careful. Depending on the workload, I'd put Kafka squarely in the IO-heavy group, so it is probably not a no-brainer.
Kafka benefits a lot from fast disk access, so if you run your containers on some sort of distributed platform with storage attached over a SAN or an NFS share or something like that, I'd assume you will notice a difference. But if you chose containers only to ease deployment and run them on one physical machine, I'd assume the difference to be negligible.
But as with all performance questions, it is hard to generalize; you'll have to test your specific use case and environment to be sure.
I believe the performance would largely be affected by the type of machine you use. LinkedIn and other large users of Kafka often recommend using spinning disks rather than SSDs because of the predominantly linear reads and writes, along with Kafka's use of the operating system's zero-copy transfer (sendfile). On a machine hosting many containers, you'd lose the advantages that spinning disks give Kafka.

In network programming there is a limit to the number of sockets/connections; how do web servers exceed this limit?

I have started exploring network programming in Linux using sockets. I am wondering how web servers like Yahoo, Google, etc. are able to establish millions or billions of connections. I believe the core is still socket programming to reach the remote server. If that is the case, how are millions or billions of people able to connect to the server? That would mean millions or billions of socket connections, which is not possible, right? The spec says a maximum of 5 socket connections only. What is the mystery behind it?
Can you also explain it in terms of this API?
listen(sock,5);
To get an idea of tuning an individual server you may want to start with Apache Performance Tuning and maybe Linux Tuning Parameters, though the latter is somewhat outdated. Also see Upper limit of file descriptors in Linux.
Once you have a number of finely tuned servers, a network load balancer is used; it typically distributes IP traffic across a cluster of such hosts. Sometimes DNS load balancing is used in addition, to split traffic further between IP balancers.
If you are interested, you can follow Google's Compute Engine Load Balancing, which provides a single IP address and does away with the need for additional DNS balancing, and reproduce their results:
"The following instructions walk you step by step through setting up a Google Compute load balancer benchmark that achieves 1,000,000 requests per second. This is the code and the steps that were used when writing a post for the Google Cloud Platform blog (http://googlecloudplatform.blogspot.com/). This gist is a combination of instructions and scripts from Eric Hankland and Anthony F. Voellm. You are free to reuse the code snippets."
https://gist.github.com/voellm/1370e09f7f394e3be724
It doesn't say 'maximum 5 connections only'. The argument to listen() that you refer to is the backlog, not the total number of connections. It is the number of incoming connections that TCP will accept and hold on the 'backlog' queue before the application picks them up via accept().
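To make the distinction concrete, here is a minimal TCP server sketch; the port number is an arbitrary choice for the example. The backlog only bounds connections that have completed the handshake but have not yet been accept()ed; established connections are bounded by file descriptors, memory, and kernel settings (on Linux, net.core.somaxconn also silently caps the backlog value you pass).

    /* Minimal sketch: the listen() backlog bounds the queue of
       not-yet-accepted connections, not the total number of sockets. */
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        if (srv < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080); /* arbitrary example port */
        if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }

        /* 5 is the backlog: connections that completed the TCP handshake
           but have not yet been returned by accept(). Once accepted, a
           connection no longer counts against this limit, so the server
           can hold far more than 5 established sockets at once. */
        if (listen(srv, 5) < 0) { perror("listen"); return 1; }

        for (;;) {
            int conn = accept(srv, NULL, NULL);
            if (conn < 0) { perror("accept"); continue; }
            /* Each accepted connection gets its own descriptor; a real
               server would hand it to an event loop or worker thread. */
            close(conn);
        }
    }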