TCP connection throughput - bare metal vs libvirt VM - sockets

I measured TCP throughput between two directly connected bare-metal hosts and observed about 90% of the link capacity.
When running the same TCP test from a bare-metal sender to a VM receiver, I observe only about 20% of the link capacity.
The VM is given the maximum core and RAM allocation of the node, and I have also tried setting core affinity (CPU pinning).
Can anybody suggest a reason for this difference, for example how the VM's cores are allocated on the node?
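For reference, the kind of one-way throughput test described above can be reproduced with a minimal plain-Java sketch like the one below; the port and test duration are placeholders, and a dedicated tool such as iperf will give more reliable numbers:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal one-way TCP throughput test: run the receiver on one host, the sender on the other.
public class ThroughputTest {
    static final int PORT = 5001;      // placeholder port
    static final int SECONDS = 10;     // test duration on the sender side

    // Receiver: counts bytes read and prints the resulting Mbit/s.
    public static void receiver() throws Exception {
        try (ServerSocket server = new ServerSocket(PORT);
             Socket s = server.accept();
             InputStream in = s.getInputStream()) {
            byte[] buf = new byte[64 * 1024];
            long bytes = 0, start = System.nanoTime();
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n;
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%.1f Mbit/s%n", bytes * 8 / secs / 1e6);
        }
    }

    // Sender: writes as fast as possible for SECONDS, then closes the socket.
    public static void sender(String host) throws Exception {
        try (Socket s = new Socket(host, PORT);
             OutputStream out = s.getOutputStream()) {
            byte[] buf = new byte[64 * 1024];
            long end = System.nanoTime() + SECONDS * 1_000_000_000L;
            while (System.nanoTime() < end) {
                out.write(buf);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) receiver(); else sender(args[0]);
    }
}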

Related

High number of socket descriptors in RabbitMQ

We recently had an issue with our RabbitMQ server where it was unable to accept new connections and was dropping TCP connections.
We didn't see any spike in our channels or consumers.
Socket descriptors (SD) and Erlang processes shot up in a short span of time, causing RabbitMQ to get stuck, and no new connections could be established after that.
We do not see any significant increase in channels, connections or consumers that would explain the sudden increase in SDs and Erlang processes.
RMQ version: 3.7.14
Erlang version: 21.3.8.1
RMQ is running on Kubernetes as a StatefulSet.
(Chart: RMQ Erlang process spike.)
(Chart: sockets used.)
After restarting the server it works fine, but the issue keeps resurfacing.
I suggest checking the server's half-open connections. You can end up in that kind of situation if clients reconnect aggressively: they create connections, then reconnect again and again.
Also, even if you have the same number of consumers, the number of publishers may have increased.
So my suggestion here is to check logs and metrics for reconnects to RabbitMQ.
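To illustrate the point about aggressive reconnects: a client that relies on the Java RabbitMQ client's built-in automatic recovery with a backoff interval, instead of recreating connections in a tight retry loop, looks roughly like the sketch below. The host name, interval and heartbeat values are placeholders, not taken from the question.

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class RecoveringClient {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq.example.internal");    // placeholder host
        // Let the client library handle recovery instead of reconnecting
        // in a tight application-level loop.
        factory.setAutomaticRecoveryEnabled(true);
        factory.setNetworkRecoveryInterval(5000);        // wait 5 s between recovery attempts
        factory.setRequestedHeartbeat(30);               // detect dead/half-open connections sooner

        Connection connection = factory.newConnection();
        // ... create channels, declare queues, consume/publish as usual ...
    }
}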

In JMeter: SEVERE: No buffer space available (maximum connections reached?): connect exception

In JMeter I am testing 100000 concurrent MQTT users with a ramp-up of 10000 and a loop count of 1.
The library I am using for MQTT in JMeter is https://github.com/emqx/mqtt-jmeter. But I am getting
SEVERE: No buffer space available (maximum connections reached?): connect exception after reaching 64378 connections.
Specification:
OS: Windows 10
RAM: 64 GB
CPU: i7
Configuration in Registry Editor:
This is due to Windows having too many active client connections.
The default number of ephemeral TCP ports is 5000. Sometimes this number is insufficient if the server has too many active client connections. In that case the ephemeral TCP ports are all used up and no more can be allocated to a new client connection request, resulting in the error message above (for a Java application).
You should specify TCP/IP settings by editing the following registry values in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey:
MaxUserPort
Specifies the maximum port number for ephemeral TCP ports.
TcpNumConnections
Specifies the maximum number of concurrent connections that TCP can open. This value significantly affects the number of concurrent osh.exe processes that are allowed. If the value of TcpNumConnections is too low, Windows cannot assign TCP ports to stages in parallel jobs, and the parallel jobs cannot run.
These keys are not added to the registry by default.
Follow the documentation "Configuring the Windows registry: Specifying TCP/IP settings" and make the necessary edits.
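For example, the two DWORD values can be added from an elevated command prompt as sketched below. The numbers shown are the documented maximums, not a tuned recommendation for this workload, and a reboot is typically required before the new settings take effect.

reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65534 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpNumConnections /t REG_DWORD /d 16777214 /f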
Hope this will help.

Linux kernel parameters that can be tuned when the TCP backlog is exceeded on a WebSphere MQ server

We are facing an issue where the TCP backlog exceeds its default value (100) on our MQ server (v7.5) running on Linux (Red Hat) during bursts of connection requests to the MQ server. ListenerBacklog is configured as 100 in qm.ini, which is the default listener backlog value (maximum pending connection requests) for Linux. Whenever we have a connection burst and the TCP backlog is exceeded, the queue manager stops functioning and only resumes when the queue manager/server is restarted.
So we are looking at whether there are Linux kernel attributes related to socket tuning that can improve the TCP backlog at the network layer without harming the queue manager. Will increasing the values below in /etc/sysctl.conf help resolve this issue or improve the performance of the queue manager?
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
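As a sketch only: if these kernel limits are raised, they can be loaded without a reboot with sysctl -p, and the queue manager's own listener backlog should normally be raised to match in the TCP stanza of qm.ini (the value 400 below is just an illustrative number, not an IBM recommendation).

TCP:
   ListenerBacklog=400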

Amazon EC2 Elastic Load Balancer TCP disconnect after a couple of hours

I am testing the reliability of TCP connections through an Amazon Elastic Load Balancer, compared to not using the load balancer, to see if it has any impact.
I have set up a small Elastic Load Balancer in the Amazon EC2 us-east region with 8 t2.micro instances, using an auto scaling group without a policy and with min/max set to 8 instances.
Each instance runs a simple TCP server that accepts connections on port 8017 and relays to the clients some data coming from another remote server located in my network. The same data is sent to all clients.
For the purpose of the test, the servers running on the micro instances send only 1 byte of data every 60 seconds (to make sure the connections don't time out).
I connected multiple clients from various outside networks using the ELB DNS name provided, and after maybe 6-24 hours I always stop receiving data and eventually the connections all die.
All clients stop around the same time, even though they are on different networks/ISPs. Each "client" application holds about 10 TCP connections, and they all stop receiving data.
All server instances look fine after this happens; they are still sending data.
To do further testing and rule out a problem in the TCP server code, I also have external clients connected directly to the public IP of a single instance, without the ELB, and in that case the data doesn't stop and the connection is not lost (so far).
The load balancer idle timeout is set to 900 seconds.
Cross-zone load balancing is enabled and I am using the following zones: us-east-1e, us-east-1b, us-east-1c, us-east-1d.
I read the documentation and searched everywhere to see whether this is a known behaviour, but I couldn't find a clear answer or confirmation of others having the same issue; it seems clear it is happening in my case.
My question: is this known/expected behaviour for a TCP load balancer? Otherwise, any idea what could be the problem in my setup?
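For context, the client side of such a test looks roughly like the sketch below; the ELB DNS name and timeouts are placeholders. Enabling OS-level TCP keepalive and a read timeout on the client socket (in addition to the server's 60-second application byte) is an assumption about something worth trying when idle connections through a load balancer stall silently, not a confirmed fix.

import java.io.InputStream;
import java.net.Socket;

public class ElbTestClient {
    public static void main(String[] args) throws Exception {
        // Placeholder DNS name of the load balancer.
        String host = "my-elb-1234567890.us-east-1.elb.amazonaws.com";

        try (Socket socket = new Socket(host, 8017)) {
            socket.setKeepAlive(true);       // OS-level TCP keepalive probes
            socket.setSoTimeout(180_000);    // give up if nothing arrives for 3 minutes

            InputStream in = socket.getInputStream();
            byte[] buf = new byte[1024];
            int n;
            // read() throws SocketTimeoutException if the stream stalls,
            // which matches the silent disconnect described above.
            while ((n = in.read(buf)) != -1) {
                System.out.println("received " + n + " byte(s)");
            }
            System.out.println("server closed the connection");
        }
    }
}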

Why is my Netty-based TCP server hanging with 100% CPU usage?

I've developed a Netty-based TCP server to maintain connections with GSM/GPRS-based devices and to persist their data in a MySQL database. Currently 5K connections are handled. Devices send periodic messages at intervals of 30-60 seconds, but the connections are kept alive to maintain duplex communication.
The server application consumes 1-2% CPU in normal operation with peaks up to 10%, and the average load is very low. However, after 6 to 48 hours of normal operation, the server application hangs with constant 100% CPU consumption; a thread dump indicates that the epoll selector is the reason for the high CPU usage. The application still keeps connections for a few hours, then CPU consumption increases to 200% and most of the connections are released.
At the beginning of the project we used MINA and had the same issue with 1K active connections, which is why we switched to Netty. Up to 5K connections Netty was much more stable, and the hang-up period was 1-2 weeks.
Our server configuration:
i7-2600 quad-core CPU,
8 GB RAM, CentOS 5.0,
OpenJDK 6.0,
Netty 3.2.4 (Netty was updated to 3.5.2 a few hours ago)
To overcome this problem we will update the JDK to 7.0 (the JDK has a new I/O implementation optimized for asynchronous operations) and try different operating systems, including FreeBSD and Windows Server, since each operating system has a different strategy for handling I/O.
Any help will be appreciated, thanks.
This sounds like the epoll bug.
"The app is proxying connections to backend systems. The proxy has a pool of channels that it can use to send requests to the backend systems. If the pool is low on channels, new channels are spawned and put into the pool so that requests sent to the proxy can be serviced. The pools get populated on app startup, so that is why it doesn't take long at all for the CPU to spike through the roof (22 seconds into the app lifecycle)." (Source)
Netty has a workaround built in. I'm not sure from which version, though; I will have to check and update this later.
System.setProperty("org.jboss.netty.epollBugWorkaround", "true");
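In context, the property has to be set before the NIO selectors are created, i.e. before the channel factory is constructed. A minimal Netty 3.x sketch of where it belongs; the class name, port and thread pools are placeholders, and the pipeline/handler setup from the real server is omitted:

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class GsmTcpServer {
    public static void main(String[] args) {
        // Must be set before any NIO selector is created.
        System.setProperty("org.jboss.netty.epollBugWorkaround", "true");

        ServerBootstrap bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(),    // boss threads
                        Executors.newCachedThreadPool()));  // worker threads

        // bootstrap.setPipelineFactory(...);  // the existing handlers go here
        bootstrap.bind(new InetSocketAddress(4000));        // placeholder port
    }
}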