I'm trying to set up XMPP federation between a Cisco UCM platform and ActiveMQ 5.8 (I'd like to consume XMPP messages over JMS). I've verified that XMPP is set up on ActiveMQ by attaching to it with iChat and sending messages through it that arrive on a JMS topic.
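For reference, the XMPP side of the broker is just a transport connector in conf/activemq.xml; mine looks roughly like this (a trimmed sketch, the connector name is arbitrary and only the port matches the logs below):

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost">
  <transportConnectors>
    <!-- normal OpenWire connector for JMS clients -->
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
    <!-- XMPP connector; iChat connects here, and federation should too -->
    <transportConnector name="xmpp" uri="xmpp://0.0.0.0:61222"/>
  </transportConnectors>
</broker>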
Federation from the Cisco side, however, is not working. I'm seeing the following in the ActiveMQ logs and I'm not sure where to go from here; I do see dialback classes in the XMPP jar files that ship with ActiveMQ.
2013-08-27 11:48:29,789 | DEBUG | Creating new instance of XmppTransport | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport Server Thread Handler: xmpp://0.0.0.0:61222
2013-08-27 11:48:29,796 | DEBUG | XMPP consumer thread starting | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:29,800 | DEBUG | Sending initial stream element | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ BrokerService[localhost] Task-106
2013-08-27 11:48:29,801 | DEBUG | Initial stream element sent! | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ BrokerService[localhost] Task-106
2013-08-27 11:48:29,852 | DEBUG | Unmarshalled new incoming event - jabber.server.dialback.Result | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:29,852 | WARN | Unkown command: jabber.server.dialback.Result#6b7acfe1 of type: jabber.server.dialback.Result | org.apache.activemq.transport.xmpp.ProtocolConverter | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,864 | DEBUG | Unmarshalled new incoming event - org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,864 | WARN | Unkown command: org.jabber.etherx.streams.Error#69d2b85a of type: org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.ProtocolConverter | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,865 | DEBUG | Unmarshalled new incoming event - org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,865 | WARN | Unkown command: org.jabber.etherx.streams.Error#94552fd of type: org.jabber.etherx.streams.Error | org.apache.activemq.transport.xmpp.ProtocolConverter | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,865 | DEBUG | XMPP consumer thread starting | org.apache.activemq.transport.xmpp.XmppTransport | ActiveMQ Transport: tcp:///10.67.55.53:50750#61222
2013-08-27 11:48:59,866 | DEBUG | Transport Connection to: tcp://10.67.55.53:50750 failed: java.io.IOException: Unexpected EOF in prolog
I already read a lot of the documentation from Datadog and Strimzi about JMX autodiscovery and JMX configuration, but I'm missing something; at the very least it's not working (Datadog doesn't get the metrics).
I'm using kubectl against an AKS cluster, and I installed Strimzi to run Kafka on AKS:
helm install strimzi-kafka-release strimzi/strimzi-kafka-operator
and with kafka-single.yaml I set up the Kafka and ZooKeeper pods:
kubectl apply -f kafka-single.yaml -n aks
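For reference, kafka-single.yaml is roughly the single-node example from the Strimzi docs plus JMX; a trimmed sketch (the cluster name is just a placeholder, and jmxOptions is what is supposed to expose JMX on port 9999 of the broker pods):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster              # assumed cluster name
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    jmxOptions: {}              # enables an (unauthenticated) JMX endpoint on port 9999
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}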
Then I installed the Datadog agent with a datadog-values.yaml file:
helm install datadog-agent -f datadog-values.yaml --set datadog.site='datadoghq.com' --set datadog.apiKey='$DD-KEY' datadog/datadog
and I can even see the JMX options on the Kafka processes in the process inspection in Datadog.
I'm pretty sure something is misplaced or misnamed, but I'm a little frustrated right now and can't figure out what is preventing the metrics from being discoverable by Datadog.
I tried editing the confd option in datadog-values.yaml, but it creates the files in /etc/datadog-agent/conf.d instead of /etc/datadog-agent/conf.d/kafka.d/, which is where the conf file gets recognized and the agent tries to do something with it (I guess, since it at least fails when I change the host).
So I'm editing kafka-conf.yaml and copying it directly into the pod:
kubectl cp kafka-conf.yaml datadog-agent-pod:/etc/datadog-agent/conf.d/kafka.d/conf.yaml
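kafka-conf.yaml itself is roughly this (paraphrased; the ad_identifiers value is a guess on my part, and 9999 is the port Strimzi's jmxOptions is supposed to open):

ad_identifiers:
  - kafka                    # should match the broker container name/image for autodiscovery
init_config:
  is_jmx: true
  collect_default_metrics: true
instances:
  - host: "%%host%%"         # autodiscovery template variable, resolved to the matched pod's IP
    port: 9999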
and then I try the command
kubectl exec -it datadog-agent-pod agent jmx list matching
which fails if I put localhost or anything other than %%host%%.
(This is the failure message when I tried directly with an IP:)
Loading configs...
Config kafka was loaded.
2022-02-03 18:49:23 GMT | JMX | INFO | App | JMX Fetch 0.44.6 has started
2022-02-03 18:49:23 GMT | JMX | INFO | App | Found 0 config files
2022-02-03 18:49:24 GMT | JMX | INFO | App | update is in order - updating timestamp: 1643914164
2022-02-03 18:49:24 GMT | JMX | INFO | App | Cleaning up instances...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Dealing with YAML config instances...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Dealing with Auto-Config instances collected...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Instantiating instance for: kafka
2022-02-03 18:49:24 GMT | JMX | INFO | App | Started instance initialization...
2022-02-03 18:49:24 GMT | JMX | INFO | Instance | Trying to connect to JMX Server at 10.244.0.66:9999
2022-02-03 18:49:24 GMT | JMX | INFO | Instance | Connection closed or does not exist. Attempting to create a new connection...
2022-02-03 18:49:24 GMT | JMX | INFO | ConnectionFactory | Connecting using JMX Remote
2022-02-03 18:49:24 GMT | JMX | INFO | Connection | Connecting to: service:jmx:rmi:///jndi/rmi://10.244.0.66:9999/jmxrmi
2022-02-03 18:49:27 GMT | JMX | INFO | App | Completed instance initialization...
2022-02-03 18:49:27 GMT | JMX | WARN | App | Could not initialize instance: kafka-10.244.0.66-9999:
java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is:
java.net.NoRouteToHostException: No route to host (Host unreachable)]
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.datadog.jmxfetch.App.processRecoveryResults(App.java:1001)
at org.datadog.jmxfetch.App$6.invoke(App.java:977)
at org.datadog.jmxfetch.tasks.TaskProcessor.processTasks(TaskProcessor.java:63)
at org.datadog.jmxfetch.App.init(App.java:969)
at org.datadog.jmxfetch.App.run(App.java:205)
at org.datadog.jmxfetch.App.main(App.java:153)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is:
java.net.NoRouteToHostException: No route to host (Host unreachable)]
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at org.datadog.jmxfetch.Connection.createConnection(Connection.java:64)
at org.datadog.jmxfetch.RemoteConnection.<init>(RemoteConnection.java:101)
at org.datadog.jmxfetch.ConnectionFactory.createConnection(ConnectionFactory.java:38)
at org.datadog.jmxfetch.Instance.getConnection(Instance.java:403)
at org.datadog.jmxfetch.Instance.init(Instance.java:416)
at org.datadog.jmxfetch.InstanceInitializingTask.call(InstanceInitializingTask.java:15)
at org.datadog.jmxfetch.InstanceInitializingTask.call(InstanceInitializingTask.java:3)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is:
java.net.NoRouteToHostException: No route to host (Host unreachable)]
at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:137)
at java.naming/com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:207)
at java.naming/javax.naming.InitialContext.lookup(InitialContext.java:409)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1839)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1813)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:302)
... 12 more
Caused by: java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is:
java.net.NoRouteToHostException: No route to host (Host unreachable)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:635)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:209)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:196)
at java.rmi/sun.rmi.server.UnicastRef.newCall(UnicastRef.java:343)
at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:116)
at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:133)
... 17 more
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable)
at org.datadog.jmxfetch.util.JmxfetchRmiClientSocketFactory.getSocketFromFactory(JmxfetchRmiClientSocketFactory.java:67)
at org.datadog.jmxfetch.util.JmxfetchRmiClientSocketFactory.createSocket(JmxfetchRmiClientSocketFactory.java:40)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:617)
... 22 more
But when the host is set with %%host%%, there's no error, yet it gets nothing from the Kafka pods.
What am I doing wrong? Or, what do I have wrong in this setup?
I've checked other answers and questions and a lot of docs over the last few days just to get the Kafka metrics, and apparently one does not simply configure Datadog for JMX autodiscovery in AKS with Strimzi/Kafka... I just need the topic metrics.
I know that Strimzi is geared toward Prometheus metrics, but I need Datadog, and I already got scolded for trying the Prometheus option (because I couldn't enable it and get the metrics from there into Datadog).
I feel like it has to be something with the annotations, but to be honest I don't know.
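What I understand from the docs is that the autodiscovery annotations would have to end up on the broker pods themselves, via the Strimzi pod template, with something like this in the Kafka resource (a sketch; I'm assuming the broker container is named kafka and that JMX is on 9999):

spec:
  kafka:
    jmxOptions: {}
    template:
      pod:
        metadata:
          annotations:
            ad.datadoghq.com/kafka.check_names: '["kafka"]'
            ad.datadoghq.com/kafka.init_configs: '[{"is_jmx": true, "collect_default_metrics": true}]'
            ad.datadoghq.com/kafka.instances: '[{"host": "%%host%%", "port": "9999"}]'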
Please help, I can't be the only one with this problem.
I am using Airflow 1.10.9 with Celery workers. I have DAGs that run whenever a task comes in; each run spins up a new EC2 instance, which connects to RDS based on its logic. But the EC2 instance holds the connection even when no task is running, and it keeps holding the connection until Auto Scaling scales the instance down.
RDS Details -
Class : db.t3.xlarge
Engine : PostgreSQL
I have checked the RDS logs but no luck.
LOG: could not receive data from client: Connection reset by peer
Here are the RDS connections:
 state  |     wait_event      | wait_event_type | count
--------+---------------------+-----------------+-------
        | AutoVacuumMain      | Activity        |     1
        | BgWriterHibernate   | Activity        |     1
        | CheckpointerMain    | Activity        |     1
 idle   | ClientRead          | Client          |   525
        | LogicalLauncherMain | Activity        |     1
        | WalWriterMain       | Activity        |     1
 active |                     |                 |     1
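(That table comes from grouping pg_stat_activity, roughly with the query below.)

-- count RDS sessions by state and wait event
SELECT state, wait_event, wait_event_type, count(*)
FROM pg_stat_activity
GROUP BY state, wait_event, wait_event_type;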
All the connections are from the Celery workers.
Any help is appreciated.
I am new to Kafka and data ingestion. I know Kafka is fault tolerant, as it keeps the data redundantly on multiple nodes. However, what I don't understand is how we can achieve fault tolerance on the source/producer end. For example, if I have netcat as the source, as in the example below:
nc -l [some_port] | ./bin/kafka-console-producer --broker-list [kafka_server]:9092 --topic [my_topic]
The producer would stop pushing messages if the node running netcat goes down. I was wondering whether there is a mechanism by which Kafka can pull the input itself, so that, for example, if netcat fails on one node, another node can take over and start pushing messages with netcat.
My second question is how this is achieved in Flume, given that it is a pull-based architecture. Would Flume work in this case, that is, if one node running netcat fails?
Every topic is a particular stream of data (similar to a table in a database). Topics are split into partitions (as many as you like), where each message within a partition gets an incremental id, known as its offset, as shown below.
Partition 0:
+---+---+---+-----+
| 0 | 1 | 2 | ... |
+---+---+---+-----+
Partition 1:
+---+---+---+---+----+
| 0 | 1 | 2 | 3 | .. |
+---+---+---+---+----+
Now a Kafka cluster is composed of multiple brokers. Each broker is identified with an ID and can contain certain topic partitions.
Example of 2 topics (each having 3 and 2 partitions respectively):
Broker 1:
+-------------------+
|   Topic 1         |
|   Partition 0     |
|                   |
|                   |
|   Topic 2         |
|   Partition 1     |
+-------------------+
Broker 2:
+-------------------+
|   Topic 1         |
|   Partition 2     |
|                   |
|                   |
|   Topic 2         |
|   Partition 0     |
+-------------------+
Broker 3:
+-------------------+
|   Topic 1         |
|   Partition 1     |
|                   |
|                   |
|                   |
|                   |
+-------------------+
Note that data is distributed (and Broker 3 doesn't hold any data of topic 2).
Topics should have a replication factor greater than 1 (usually 2 or 3) so that when a broker is down, another one can still serve the data of the topic. For instance, assume that we have a topic with 2 partitions and a replication factor set to 2, as shown below:
Broker 1:
+-------------------+
|   Topic 1         |
|   Partition 0     |
|                   |
|                   |
|                   |
|                   |
+-------------------+
Broker 2:
+-------------------+
|   Topic 1         |
|   Partition 0     |
|                   |
|                   |
|   Topic 1         |
|   Partition 1     |
+-------------------+
Broker 3:
+-------------------+
|   Topic 1         |
|   Partition 1     |
|                   |
|                   |
|                   |
|                   |
+-------------------+
Now assume that Broker 2 has failed. Brokers 1 and 3 can still serve the data for topic 1. So a replication factor of 3 is always a good idea, since it allows one broker to be taken down for maintenance and another to go down unexpectedly. Therefore, Apache Kafka offers strong durability and fault-tolerance guarantees.
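For example, a topic with 3 partitions and a replication factor of 3 can be created like this (topic name and connection flag are placeholders; newer Kafka versions take --bootstrap-server instead of --zookeeper):

./bin/kafka-topics --create --zookeeper localhost:2181 --topic my_topic --partitions 3 --replication-factor 3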
Note about Leaders:
At any time, only one broker can be the leader of a partition, and only that leader can receive and serve data for that partition. The remaining brokers just synchronize the data (in-sync replicas). Also note that when the replication factor is set to 1, the leader cannot be moved elsewhere when a broker fails. In general, when all replicas of a partition fail or go offline, the leader is automatically set to -1.
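You can check which broker is the leader of each partition, and which replicas are in sync, with the describe command (same placeholder names as above); each partition line lists Leader, Replicas and Isr:

./bin/kafka-topics --describe --zookeeper localhost:2181 --topic my_topic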
Having said that, as long as your producer lists all the addresses of the Kafka brokers in the cluster (bootstrap_servers), you should be fine. Even when a broker is down, your producer will attempt to write the record to another broker.
Finally, make sure to set acks=all (it might have an impact on throughput, though) so that all in-sync replicas acknowledge that they received the message.
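With the console producer from your example, that roughly translates to listing several brokers and passing the acks setting through as a producer property (a sketch; adjust the flags to your Kafka version):

nc -l [some_port] | ./bin/kafka-console-producer --broker-list broker1:9092,broker2:9092,broker3:9092 --topic my_topic --producer-property acks=all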
I am using the cygnus-kafka connector. When the connection between Cygnus and ZooKeeper is lost, Cygnus cannot reconnect to ZooKeeper once the connection comes back; I need to restart it so that it is able to reconnect.
Any ideas why Cygnus is not able to reconnect to the Kafka broker if the connection was lost once?
This is the error that I got:
time=2016-11-30T11:29:26.254Z | lvl=WARN | corr=2a924ba4-b6f0-11e6-8836-fa163e68f7a2 | trans=ce766745-ae85-415a-a6f3-0bed9f121e79 | srv=service| subsrv=/servicepath | function=run | comp=cygnusagent | msg=org.apache.zookeeper.ClientCnxn$SendThread[1185] : Session 0x0 for server kafkaServerIp/kafkaServerIp:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:856)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
time=2016-11-30T11:29:28.211Z | lvl=WARN | corr=2a924ba4-b6f0-11e6-8836-fa163e68f7a2 | trans=ce766745-ae85-415a-a6f3-0bed9f121e79 | srv=service| subsrv=/servicepath | function=processNewBatches | comp=cygnusagent | msg=com.telefonica.iot.cygnus.sinks.NGSISink[439] : Unable to connect to zookeeper server within timeout: 10000
Thanks!
The problem is that the connection from Cygnus to Kafka is permanent, for efficiency reasons. Nevertheless, a check for a connection reset by the peer is missing in the code. I'll fix it ASAP so that it is ready for the next version release (1.7.0) by the end of January (of course, it will be available on the master branch once fixed, much sooner).
I've got a simple TCP server hosted on 64-bit Windows Server 2008 R2. The TCP server just accepts connections and replies to incoming data with the received message (echo). There are about 600-700 clients that try to connect and send some information. The problem is that the server loses almost all of the connections (about 90%) when data is sent from client to server (the first 15-20 connections are handled normally). I've sniffed the TCP traffic with Wireshark.
The log from the server side is:
+--------------+--------------+--------------------------------+
| Source       | Destination  | Info                           |
+--------------+--------------+--------------------------------+
| 1. client ip | server ip    | [SYN] **Handshake step1**      |
| 2. server ip | client ip    | [SYN, ACK] **Handshake step2** |
| 3. client ip | server ip    | [ACK] **Handshake step3**      |
| 4. client ip | server ip    | [RST, ACK] **Loses connection**|
+--------------+--------------+--------------------------------+
The log from the client side is:
+--------------+--------------+--------------------------------+
| Source       | Destination  | Info                           |
+--------------+--------------+--------------------------------+
| 1. client ip | server ip    | [SYN] **Handshake step1**      |
| 2. server ip | client ip    | [SYN, ACK] **Handshake step2** |
| 3. client ip | server ip    | [ACK] **Handshake step3**      |
| 4. client ip | server ip    | [PSH, ACK] Message             |
| 5. client ip | server ip    | [PSH, ACK] CRLF message        |
| 6. server ip | client ip    | [RST, ACK] **Loses connection**|
+--------------+--------------+--------------------------------+
In both cases the «Reset cause» is: \000\000\000......\000
The connections are not lost when we connect from the local network.
I don't think it's related to your code, but I do have several questions:
1. What is the network speed between the client and the server? Are any packets lost for other applications? What is the size of the messages sent from the clients?
2. How long is it between the handshake finishing (server) or the message being sent (client) and the RST arriving?
3. Do you know if there are any firewalls between the client and the server? You also said it worked well on the LAN. Middleboxes such as the China GFW often reset connections this way.
I found the solution. The problem was that the provider had changed the tariff plan without any notice, and the new tariff plan limited the maximum number of connections.
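If you want to confirm a connection cap like that, a quick check is to count established connections on the server while the clients ramp up, e.g. from a Windows command prompt (a rough diagnostic, not a fix):

netstat -an | find /c "ESTABLISHED"

If the count plateaus at the same value every time just before the resets start, something between the clients and the server process is enforcing a limit.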