Connection lost on data send: server receives RST, ACK after handshake - sockets

I have a simple TCP server hosted on 64-bit Windows Server 2008 R2. The server just accepts connections and replies to incoming data with the received message (echo). About 600-700 clients try to connect and send some information. The problem is that the server loses almost all of the connections (about 90%) when data is sent from client to server (the first 15-20 connections complete normally). I've sniffed the TCP traffic with Wireshark.
The capture on the server side is:
+--------------+--------------+--------------------------------+
| Source       | Destination  | Info                           |
+--------------+--------------+--------------------------------+
| 1. client ip | server ip    | [SYN] **Handshake step1**      |
| 2. server ip | client ip    | [SYN, ACK] **Handshake step2** |
| 3. client ip | server ip    | [ACK] **Handshake step3**      |
| 4. client ip | server ip    | [RST, ACK] **Connection lost** |
+--------------+--------------+--------------------------------+
The capture on the client side is:
+--------------+--------------+--------------------------------+
| Source       | Destination  | Info                           |
+--------------+--------------+--------------------------------+
| 1. client ip | server ip    | [SYN] **Handshake step1**      |
| 2. server ip | client ip    | [SYN, ACK] **Handshake step2** |
| 3. client ip | server ip    | [ACK] **Handshake step3**      |
| 4. client ip | server ip    | [PSH, ACK] Message             |
| 5. client ip | server ip    | [PSH, ACK] CRLF message        |
| 6. server ip | client ip    | [RST, ACK] **Connection lost** |
+--------------+--------------+--------------------------------+
In both cases the «Reset cause» is: \000\000\000......\000
The connection is not lost when we connect from the local network.

I don't think it's related to your code, but I do have several questions:
1. What is the network speed between the client and the server? Are other applications losing packets as well? What is the size of the message sent from the client?
2. How much time passes between the end of the handshake (server side) or the sending of the message (client side) and the arrival of the RST?
3. Do you know whether there are any firewalls between the client and the server? You also said it works well on the LAN. The Great Firewall of China (GFW) often resets connections this way.

I found the solution. The problem was that the provider had changed the tariff plan without any notice, and the new plan limits the maximum number of connections.
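For anyone who wants to check for this kind of cap, here is a minimal sketch of a probe that opens a number of connections, keeps them all open, and echoes one message on each. It is not the code from the question: `start_echo_server` and `probe_connections` are hypothetical helper names, and the local echo server exists only so the sketch runs on its own. Pointed at the real server from outside the LAN, a sharp rise in failures after a certain number of open connections would be consistent with a limit somewhere on the path.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1", port=0):
    """Start a tiny echo server in background threads; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(128)

    def handle(conn):
        # Echo everything back until the peer closes or resets.
        with conn:
            while True:
                try:
                    data = conn.recv(4096)
                except OSError:
                    break
                if not data:
                    break
                conn.sendall(data)

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                break  # listening socket was closed
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe_connections(host, port, count, timeout=5.0):
    """Open `count` connections, keep them open, echo one message on each.

    Returns (ok, failed): connections that completed a round trip versus
    connections that were refused, reset, or timed out.
    """
    open_socks, ok, failed = [], 0, 0
    try:
        for i in range(count):
            try:
                s = socket.create_connection((host, port), timeout=timeout)
                open_socks.append(s)
                msg = f"ping {i}\r\n".encode()
                s.sendall(msg)
                received = b""
                while len(received) < len(msg):
                    chunk = s.recv(4096)
                    if not chunk:
                        raise ConnectionError("closed before full echo")
                    received += chunk
                ok += 1
            except OSError:  # covers resets, refusals, and timeouts
                failed += 1
    finally:
        for s in open_socks:
            s.close()
    return ok, failed


if __name__ == "__main__":
    server, port = start_echo_server()
    print(probe_connections("127.0.0.1", port, 50))
    server.close()
```

Running the same probe once from the LAN and once from outside gives two numbers to compare, which is more convincing evidence to hand to a provider than a packet capture alone.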

Related

RDS Connection being in idle state. Airflow Celery worker

I am using Airflow 1.10.9 with Celery workers. My DAGs run whenever a task arrives: a new EC2 instance is spun up and connects to RDS as the logic requires. However, the EC2 instance holds the connection even when no task is running, and keeps holding it until Auto Scaling scales the instance down.
RDS Details -
Class : db.t3.xlarge
Engine : PostgreSQL
I have checked the RDS logs, but with no luck:
LOG: could not receive data from client: Connection reset by peer
Here are the RDS connections:
 state  | wait_event          | wait_event_type | count
--------+---------------------+-----------------+-------
        | AutoVacuumMain      | Activity        |     1
        | BgWriterHibernate   | Activity        |     1
        | CheckpointerMain    | Activity        |     1
 idle   | ClientRead          | Client          |   525
        | LogicalLauncherMain | Activity        |     1
        | WalWriterMain       | Activity        |     1
 active |                     |                 |     1
All the connections are from celery workers.
Any help is appreciated.

Cygnus does not reconnect to kafka broker

I am using the cygnus-kafka connector. When the connection between Cygnus and ZooKeeper is lost, Cygnus cannot reconnect to ZooKeeper once the connection is back. I have to restart it before it can reconnect.
Any ideas why Cygnus is not able to reconnect to the Kafka broker once the connection has been lost?
This is the error that I got:
time=2016-11-30T11:29:26.254Z | lvl=WARN | corr=2a924ba4-b6f0-11e6-8836-fa163e68f7a2 | trans=ce766745-ae85-415a-a6f3-0bed9f121e79 | srv=service| subsrv=/servicepath | function=run | comp=cygnusagent | msg=org.apache.zookeeper.ClientCnxn$SendThread[1185] : Session 0x0 for server kafkaServerIp/kafkaServerIp:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:856)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
time=2016-11-30T11:29:28.211Z | lvl=WARN | corr=2a924ba4-b6f0-11e6-8836-fa163e68f7a2 | trans=ce766745-ae85-415a-a6f3-0bed9f121e79 | srv=service| subsrv=/servicepath | function=processNewBatches | comp=cygnusagent | msg=com.telefonica.iot.cygnus.sinks.NGSISink[439] : Unable to connect to zookeeper server within timeout: 10000
Thanks!
The problem is that the connection from Cygnus to Kafka is permanent, for efficiency reasons. However, the code is missing a check for a connection reset by the peer. I'll fix it ASAP so that it is ready for the next release (1.7.0) by the end of January (of course, it will be available in the master branch as soon as it is fixed, much sooner).
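To illustrate the kind of check described above, here is a minimal sketch of a persistent connection that is discarded and re-created when the peer resets it. This is not the Cygnus code (Cygnus is written in Java) and not the eventual fix: `ReconnectingClient`, `connect`, and `operation` are hypothetical names used only for illustration.

```python
import time


class ReconnectingClient:
    """Keep one long-lived connection, replacing it when the peer resets it.

    `connect` is a hypothetical factory returning a fresh connection object.
    """

    def __init__(self, connect, retries=3, delay=1.0):
        self._connect = connect
        self._retries = retries
        self._delay = delay
        self._conn = None

    def call(self, operation):
        """Run operation(conn), reconnecting after a reset up to `retries` times."""
        for attempt in range(self._retries + 1):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return operation(self._conn)
            except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
                # The persistent connection is dead: drop it so that the next
                # attempt (or the next call) builds a new one.
                self._conn = None
                if attempt == self._retries:
                    raise
                time.sleep(self._delay)
```

The connection is still reused across calls, so the efficiency argument for a permanent connection holds. Only a reset triggers a new `connect()`, and errors raised while connecting (for example a refused connection) are deliberately left to the caller in this sketch.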

pcp_attach_node gives EOFError in pgpool

I have successfully set up replication for my Postgres database using pgpool.
Then I stopped the master server and checked the pool status, which is shown below:
postgres=# show pool_nodes;
 node_id | hostname   | port | status | lb_weight | role
---------+------------+------+--------+-----------+--------
 0       | 10.140.0.9 | 5432 | 3      | 0.500000  | slave
 1       | 10.140.0.7 | 5432 | 2      | 0.500000  | master
(2 rows)
Then I started the server again, but pgpool still shows the same status for the slave. So I used the following command to start the node:
/usr/sbin/pcp_node_info 10 10.140.0.9 5432 postgres postgres 1
But it gives an "EOFError". Please help me solve this issue.
Alternatively, please let me know a way to bring the node back from status 3 to status 2.
I solved the issue myself. In the configuration, the PCP port is 9898. Also, there must be no space before the password in the pcp.conf file.
The pcp command should be as follows:
/usr/sbin/pcp_node_info 10 localhost 9898 postgres postgres 1
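Since the whitespace issue is easy to miss, here is a small sketch for building and checking pcp.conf entries. It assumes the `username:md5hash` line format described in the pgpool-II documentation, with the hash being the plain MD5 of the password. The helper names `pcp_conf_entry` and `find_whitespace_problems` are made up for this example, so please verify the result against your own pgpool version.

```python
import hashlib


def pcp_conf_entry(username, password):
    """Build a 'username:md5hash' line with no whitespace around either field.

    Assumption: the hash is the plain MD5 hex digest of the password.
    """
    digest = hashlib.md5(password.encode("utf-8")).hexdigest()
    return f"{username.strip()}:{digest}"


def find_whitespace_problems(lines):
    """Return 1-based numbers of entries that have stray whitespace.

    Blank lines and lines starting with '#' are skipped. A line without a
    ':' separator is also reported, since it cannot be a valid entry.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip() or text.lstrip().startswith("#"):
            continue
        user, sep, digest = text.partition(":")
        if not sep or user != user.strip() or digest != digest.strip():
            bad.append(number)
    return bad
```

For example, `find_whitespace_problems(open("pcp.conf"))` would report the line number of an entry such as `postgres: <hash>`, where a space sits before the password hash.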

ActiveMQ 5.8 / XMPP Federation (Support for dialback?)

I'm trying to set up XMPP federation between a Cisco UCM platform and ActiveMQ 5.8. (Would like to consume XMPP messages over JMS). I've verified XMPP is set up on ActiveMQ by attaching to it with iChat, and have sent messages through it that arrive on a JMS topic.
Cisco federation, however, is not working. I'm seeing the following in the ActiveMQ logs, and I'm not sure where to go with this. I do see dialback classes in the XMPP jar files in ActiveMQ.
2013-08-27 11:48:29,789 | DEBUG | Creating new instance of XmppTransport | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport Server Thread Handler: xmpp://0.0.0.0:61222
2013-08-27 11:48:29,796 | DEBUG | XMPP consumer thread starting | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:29,800 | DEBUG | Sending initial stream element | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ BrokerService[localhost] Task-106
2013-08-27 11:48:29,801 | DEBUG | Initial stream element sent! | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ BrokerService[localhost] Task-106
2013-08-27 11:48:29,852 | DEBUG | Unmarshalled new incoming event - jabber.server.dialback.Result | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:29,852 | WARN | Unkown command: jabber.server.dialback.Result#6b7acfe1 of type: jabber.server.dialback.Result | org.apache.activemq.transport.xmpp.ProtocolConverter | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,864 | DEBUG | Unmarshalled new incoming event - org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,864 | WARN | Unkown command: org.jabber.etherx.streams.Error#69d2b85a of type: org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.ProtocolConverter | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,865 | DEBUG | Unmarshalled new incoming event - org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,865 | WARN | Unkown command: org.jabber.etherx.streams.Error#94552fd of type: org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.ProtocolConverter | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,865 | DEBUG | XMPP consumer thread starting | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,866 | DEBUG | Transport Connection to: tcp://10.67.55.53:50750 failed: java.io.IOException: Unexpected EOF in prolog

loopback on tcp port from localhost to localhost

Running netstat, I see two strange TCP connections open:
tcp4 0 0 localhost.49153 localhost.1023 ESTABLISHED
tcp4 0 0 localhost.1023 localhost.49153 ESTABLISHED
I wonder if this is normal. Can someone help me? Thank you!
Yes, that's just the normal loopback used by OS X. Check out Apple's publication on port usage:
port         | service    | description
-------------+------------+-----------------------------
600-1023     | ipcserver  | Mac OS X RPC-based services
49152-65535  | Xsans      | Xsan Filesystem Access
49152-65535  | misc       | Back to My Mac