I am new to Akka and am trying to get a sample to work across multiple JVMs. I am working from the example shown here but am encountering a problem. At the end of the log below, you can see that there is an AssociationError saying the connection to 127.0.0.1:2552 was refused.
Earlier in the log (15:35:24.543), I see that remoting started and is listening on my LAN address (shown here as WWW.XXX.YYY.ZZZ) rather than on localhost.
15:35:24.307 [RandomOrgSystem-akka.actor.default-dispatcher-2] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
15:35:24.319 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started
15:35:24.320 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/UnhandledMessageForwarder#883086044]
15:35:24.320 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - Default Loggers started
15:35:24.320 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.e.LoggingBus$$anonfun$startDefaultLoggers$2$$anon$1 - started (akka.event.LoggingBus$$anonfun$startDefaultLoggers$2$$anon$1#45de530a)
15:35:24.325 [RandomOrgSystem-akka.actor.default-dispatcher-4] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/remoting-terminator#-1650950485]
15:35:24.339 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG a.r.RemoteActorRefProvider$RemotingTerminator - started (akka.remote.RemoteActorRefProvider$RemotingTerminator#4f27077b)
15:35:24.355 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/transports#300935452]
15:35:24.356 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG a.r.Remoting$TransportSupervisor - started (akka.remote.Remoting$TransportSupervisor#2cf5006)
15:35:24.359 [RandomOrgSystem-akka.actor.default-dispatcher-3] INFO Remoting - Starting remoting
15:35:24.364 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/endpointManager#-594071077]
15:35:24.364 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.r.RemoteActorRefProvider$RemotingTerminator - now monitoring Actor[akka://RandomOrgSystem/system]
15:35:24.373 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG akka.remote.EndpointManager - started (akka.remote.EndpointManager#7ad99f4a)
15:35:24.534 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG a.r.Remoting$TransportSupervisor - now supervising Actor[akka://RandomOrgSystem/system/transports/akkaprotocolmanager.tcp0#-2000194523]
15:35:24.537 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG a.r.transport.AkkaProtocolManager - started (akka.remote.transport.AkkaProtocolManager#749cd006)
15:35:24.543 [RandomOrgSystem-akka.actor.default-dispatcher-5] INFO Remoting - Remoting started; listening on addresses :[akka.tcp://RandomOrgSystem@WWW.XXX.YYY.ZZZ:2552]
15:35:24.545 [RandomOrgSystem-akka.actor.default-dispatcher-2] INFO Remoting - Remoting now listens on addresses: [akka.tcp://RandomOrgSystem@WWW.XXX.YYY.ZZZ:2552]
15:35:24.548 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/remote-watcher#-245042739]
15:35:24.549 [RandomOrgSystem-akka.actor.default-dispatcher-2] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/remote-deployment-watcher#1846115901]
15:35:24.550 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG akka.remote.RemoteDeploymentWatcher - started (akka.remote.RemoteDeploymentWatcher#730a4a32)
15:35:24.550 [RandomOrgSystem-akka.actor.default-dispatcher-4] DEBUG a.a.LocalActorRefProvider$SystemGuardian - now supervising Actor[akka://RandomOrgSystem/system/deadLetterListener#1544852868]
15:35:24.552 [RandomOrgSystem-akka.actor.default-dispatcher-4] DEBUG akka.event.DeadLetterListener - started (akka.event.DeadLetterListener#79a422d9)
15:35:24.555 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG akka.remote.RemoteWatcher - started (akka.remote.RemoteWatcher#4aa0560e)
15:35:24.555 [RandomOrgSystem-akka.actor.default-dispatcher-5] INFO akka.actor.ActorSystemImpl - Started
15:35:24.559 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG a.a.LocalActorRefProvider$Guardian - now supervising Actor[akka://RandomOrgSystem/user/buffer#-400157724]
15:35:24.570 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG RemoteActorRefProvider - [akka://RandomOrgSystem/] Instantiating Remote Actor [akka.tcp://RandomOrgSystem@127.0.0.1:2552/remote/akka.tcp/RandomOrgSystem@WWW.XXX.YYY.ZZZ:2552/user/buffer/client]
15:35:24.581 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG akka.remote.RemoteWatcher - Watching: [akka://RandomOrgSystem/system/remote-deployment-watcher -> akka.tcp://RandomOrgSystem@127.0.0.1:2552/remote/akka.tcp/RandomOrgSystem@WWW.XXX.YYY.ZZZ:2552/user/buffer/client]
15:35:24.582 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG c.b.n.akka.demo.RandomOrgBuffer - started (com.blogspot.nurkiewicz.akka.demo.RandomOrgBuffer#3012db7c)
15:35:24.584 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG akka.remote.EndpointManager - now supervising Actor[akka://RandomOrgSystem/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FRandomOrgSystem%40127.0.0.1%3A2552-0#-126941538]
15:35:24.594 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG a.remote.ReliableDeliverySupervisor - started (akka.remote.ReliableDeliverySupervisor#b8235a1)
15:35:24.594 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG a.remote.ReliableDeliverySupervisor - now monitoring Actor[akka://RandomOrgSystem/system/endpointManager#-594071077]
15:35:24.594 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG a.remote.ReliableDeliverySupervisor - now supervising Actor[akka://RandomOrgSystem/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FRandomOrgSystem%40127.0.0.1%3A2552-0/endpointWriter#118183353]
15:35:24.606 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG akka.remote.EndpointWriter - started (akka.remote.EndpointWriter#4b069693)
15:35:24.606 [RandomOrgSystem-akka.actor.default-dispatcher-5] DEBUG akka.remote.EndpointWriter - now monitoring Actor[akka://RandomOrgSystem/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FRandomOrgSystem%40127.0.0.1%3A2552-0#-126941538]
15:35:24.616 [RandomOrgSystem-akka.actor.default-dispatcher-3] DEBUG a.r.transport.AkkaProtocolManager - now supervising Actor[akka://RandomOrgSystem/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FRandomOrgSystem%40127.0.0.1%3A2552-1#-1464884820]
15:35:24.631 [RandomOrgSystem-akka.actor.default-dispatcher-4] DEBUG a.r.transport.ProtocolStateActor - started (akka.remote.transport.ProtocolStateActor#620b5b80)
15:35:24.656 [RandomOrgSystem-akka.actor.default-dispatcher-5] ERROR akka.remote.EndpointWriter - AssociationError [akka.tcp://RandomOrgSystem@WWW.XXX.YYY.ZZZ:2552] -> [akka.tcp://RandomOrgSystem@127.0.0.1:2552]: Error [Association failed with [akka.tcp://RandomOrgSystem@127.0.0.1:2552]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://RandomOrgSystem@127.0.0.1:2552]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /127.0.0.1:2552
]
I'd like remoting to start on the localhost address so that I don't have to reconfigure whenever my IP address changes. My application.conf contains:
akka {
  log-config-on-start = off
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
    debug {
      receive = on
      autoreceive = on
      lifecycle = on
      unhandled = on
    }
    deployment {
      /buffer/client {
        remote = "akka.tcp://RandomOrgSystem@127.0.0.1:2552"
      }
    }
  }
  remote {
    transport = "akka.remote.netty.NettyRemoteTransport"
    log-sent-messages = on
    netty {
      hostname = "127.0.0.1"
    }
  }
}
Thanks,
Tom
Try changing your remote section to:
remote {
  transport = "akka.remote.netty.NettyRemoteTransport"
  log-sent-messages = on
  netty.tcp {
    hostname = "127.0.0.1"
  }
}
Because that config key is wrong (netty instead of netty.tcp), the hostname property you set is never read, so remoting falls back to the host's IP address. With the corrected section above, it will bind to the loopback address instead.
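To sanity-check the fix, here is a minimal sketch (the BindCheck object name is mine) that just boots the system so the "Remoting started" log line reveals the bound address:
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object BindCheck extends App {
  // Loads application.conf from the classpath; with netty.tcp.hostname set,
  // the "Remoting started; listening on addresses" INFO line should now show
  // akka.tcp://RandomOrgSystem@127.0.0.1:2552.
  val system = ActorSystem("RandomOrgSystem", ConfigFactory.load())
}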
I am trying to install ZooKeeper on my Windows machine. I am getting the error below no matter which suggestion I follow from zookeeper + Kafka - Unable to create data directory.
I am running it as Administrator and I have tried all these options:
#dataDir=/tmp/zookeeper
#dataDir=:\zookeeper-3.4.14\
#dataDir=C:\\_d\\WSs\\kafka\\zookeeper-3.4.14\\data
#dataDir=:\\\\zookeeper\\\\data
dataDir=C:\\_d\\WSs\\kafka\\zookeeper-3.4.14
I don't think it is relevant, but let me add: I have Java 11.
Any idea why this is happening will be appreciated.
Full logs
C:\Windows\system32>zkserver
C:\Windows\system32>call "C:\Program Files\Java\jdk-11.0.2"\bin\java "-Dzookeeper.log.dir=C:\_d\WSs\kafka\zookeeper-3.4.14\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\classes;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\lib\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf\zoo.cfg"
2019-04-18 15:17:42,629 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf\zoo.cfg
2019-04-18 15:17:42,644 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2019-04-18 15:17:42,644 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2019-04-18 15:17:42,644 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2019-04-18 15:17:42,644 [myid:] - WARN [main:QuorumPeerMain#116] - Either no config or no quorum defined in config, running in standalone mode
2019-04-18 15:17:42,769 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf\zoo.cfg
2019-04-18 15:17:42,769 [myid:] - INFO [main:ZooKeeperServerMain#98] - Starting server
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:host.name=DESKTOP-AKCNE7F
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.version=11.0.2
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.vendor=Oracle Corporation
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.home=C:\Program Files\Java\jdk-11.0.2
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.class.path=C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\classes;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\lib\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\zookeeper-3.4.14.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\audience-annotations-0.5.0.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\jline-0.9.94.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\log4j-1.2.17.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\netty-3.10.6.Final.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\slf4j-api-1.7.25.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\slf4j-log4j12-1.7.25.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.library.path=C:\Program Files\Java\jdk-11.0.2\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\iCLS\;C:\Program Files\Intel\Intel(R) Management Engine Components\iCLS\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\Java\jdk-11.0.2\bin;C:\Program Files\Git\cmd;C:\ProgramData\chocolatey\bin;C:\_d\tools\apache-maven-3.6.0\bin;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\;C:\Users\jimis\AppData\Local\Programs\Python\Python37-32\Scripts\;C:\Users\jimis\AppData\Local\Programs\Python\Python37-32\;C:\Users\jimis\AppData\Local\Microsoft\WindowsApps;.
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:java.io.tmpdir=C:\Users\jimis\AppData\Local\Temp\
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:java.compiler=<NA>
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:os.name=Windows 10
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:os.arch=amd64
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:os.version=10.0
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:user.name=jimis
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:user.home=C:\Users\jimis
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:user.dir=C:\Windows\system32
2019-04-18 15:17:47,391 [myid:] - INFO [main:ZooKeeperServer#836] - tickTime set to 2000
2019-04-18 15:17:47,391 [myid:] - INFO [main:ZooKeeperServer#845] - minSessionTimeout set to -1
2019-04-18 15:17:47,391 [myid:] - INFO [main:ZooKeeperServer#854] - maxSessionTimeout set to -1
2019-04-18 15:17:47,782 [myid:] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-04-18 15:17:47,797 [myid:] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-04-18 15:18:00,365 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /127.0.0.1:54057
2019-04-18 15:18:00,375 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /127.0.0.1:54058
2019-04-18 15:18:00,378 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: Len error 1195725856
2019-04-18 15:18:00,379 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /127.0.0.1:54057 (no session established for client)
*** Edited: the answer to my question is "you can ignore the fact that I get an error while curl 127.0.0.1:port. Kafka is working anyway."
Are you trying to do an HTTP GET against the ZooKeeper client port?
The error comes from readLength in NIOServerCnxn.java, which expects either a 4-letter command or a buffer whose first 4 bytes represent the length.
The number 1195725856 in hex is 0x47455420 which is "GET " in ASCII.
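You can verify that decoding with a couple of lines of Scala:
// 1195725856 read as four big-endian bytes spells out "GET ".
val len = 1195725856
println(f"0x$len%08X") // prints 0x47455420
println((24 to 0 by -8).map(s => ((len >> s) & 0xFF).toChar).mkString) // prints "GET "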
So the error message is caused when you do an HTTP GET against port 2181:
$ curl http://0.0.0.0:2181/
curl: (52) Empty reply from server
$ sudo tail /var/log/zookeeper/zookeeper.out
...
2019-04-19 12:56:25,303 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted
2019-04-19 12:56:25,304 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: Len error 1195725856
2019-04-19 12:56:25,304 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1040] - Closed socket connection for client /127.0.0.1:33011 (no session established for client)
This WARN message is safe to ignore since ZooKeeper will just close that client session which is implied by the curl response.
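(By contrast, a real four-letter word such as ruok should get a reply, assuming the command is enabled on the server:)
$ echo ruok | nc 127.0.0.1 2181
imok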
Problem Description
Two computers (203 and 204) form a standalone-mode HA Flink v1.6.1 cluster; each computer runs both a JobManager and a TaskManager (2 task slots).
After I start a job on the JobManager node (the bundled example: ./flink run ../examples/streaming/SocketWindowWordCount.jar --hostname 10.1.2.9 --port 9000), I kill the working TaskManager instance. In the Web Dashboard I can see the job being cancelled and then failing (Web Dashboard image).
flink-conf.yaml
state.backend: filesystem
state.checkpoints.dir: hdfs://10.1.2.109:8020/wulin/flink-checkpoints
rest.port: 9081
blob.server.port: 6124
query.server.port: 6125
web.tmpdir: /home/flink/deploy/webTmp
web.log.path: /home/flink/deploy/log
io.tmp.dirs: /home/flink/deploy/taskManagerTmp
high-availability: zookeeper
high-availability.zookeeper.quorum: 10.0.1.79:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: flink
high-availability.storageDir: hdfs://10.1.2.109:8020/wulin
security.kerberos.login.principal: xxxx
security.kerberos.login.keytab: /home/ctu/flink/flink-1.6/conf/user.keytab
full logs
log-standalonesession-203
log-taskexecutor-203
log-standalonesession-204
exception
When I kill the working TM, I get an exception like this:
2018-12-28 11:04:27,877 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@hz203:42861] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@hz203:42861]] Caused by: [Connection refused: hz203/10.0.0.203:42861]
2018-12-28 11:04:28,660 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: hz203/10.0.0.203:42861
2018-12-28 11:04:28,660 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@hz203:42861] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@hz203:42861]] Caused by: [Connection refused: hz203/10.0.0.203:42861]
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - The heartbeat of TaskManager with id 0f41bca09600cd25000e19801076fa1f timed out.
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Closing TaskExecutor connection 0f41bca09600cd25000e19801076fa1f because: The heartbeat of TaskManager with id 0f41bca09600cd25000e19801076fa1f timed out.
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager dcf3bb5b7ed2208cf45b658d212fd8d2 from the SlotManager.
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (88aa62ad152f4df6b39a969dd32c0249) switched from RUNNING to FAILED.
org.apache.flink.util.FlinkException: The assigned slot 0f41bca09600cd25000e19801076fa1f_0 was removed.
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:803)
at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener$1.run(ResourceManager.java:1116)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-12-28 11:04:28,680 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window WordCount (61f55876e79934d515c163d095d706a6) switched from state RUNNING to FAILING.
submit job
Running ./bin/flink run -d ./examples/streaming/SocketWindowWordCount.jar --port 9000 --hostname 10.1.2.9 produces JM logs like this:
2018-12-28 19:20:01,354 INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job Socket Window WordCount (5cdb91c15ee12ec6e74256eed10b5291)
2018-12-28 19:20:01,354 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window WordCount (5cdb91c15ee12ec6e74256eed10b5291) switched from state CREATED to RUNNING.
2018-12-28 19:20:01,356 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (e30439b9f548c6013d8b8689e30d0dd7) switched from CREATED to SCHEDULED.
2018-12-28 19:20:01,359 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (102d04f5aa6fc50cfe5088e20902c72e) switched from CREATED to SCHEDULED.
2018-12-28 19:20:01,364 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e33a40832a3922897470fb76bcf76b29}]
2018-12-28 19:20:01,367 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@hz203:46596/user/resourcemanager(b22f96303e74df23645fe4567f884b9e)
2018-12-28 19:20:01,370 INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration
2018-12-28 19:20:01,370 INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)
2018-12-28 19:20:01,371 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/5cdb91c15ee12ec6e74256eed10b5291/job_manager_lock.
2018-12-28 19:20:01,371 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering job manager 9a31e8b4e8dfbf7b31d6ed3d227648b6@akka.tcp://flink@hz203:46596/user/jobmanager_0 for job 5cdb91c15ee12ec6e74256eed10b5291.
2018-12-28 19:20:01,431 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registered job manager 9a31e8b4e8dfbf7b31d6ed3d227648b6@akka.tcp://flink@hz203:46596/user/jobmanager_0 for job 5cdb91c15ee12ec6e74256eed10b5291.
2018-12-28 19:20:01,432 INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: b22f96303e74df23645fe4567f884b9e.
2018-12-28 19:20:01,433 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{e33a40832a3922897470fb76bcf76b29}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
2018-12-28 19:20:01,434 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 5cdb91c15ee12ec6e74256eed10b5291 with allocation id AllocationID{f7a24e609e2ec618ccb456076049fa3b}.
2018-12-28 19:20:01,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (e30439b9f548c6013d8b8689e30d0dd7) switched from SCHEDULED to DEPLOYING.
2018-12-28 19:20:01,511 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Socket Stream -> Flat Map (1/1) (attempt #0) to hz203
2018-12-28 19:20:01,515 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (102d04f5aa6fc50cfe5088e20902c72e) switched from SCHEDULED to DEPLOYING.
2018-12-28 19:20:01,515 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (attempt #0) to hz203
2018-12-28 19:20:01,674 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (102d04f5aa6fc50cfe5088e20902c72e) switched from DEPLOYING to RUNNING.
2018-12-28 19:20:01,708 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (e30439b9f548c6013d8b8689e30d0dd7) switched from DEPLOYING to RUNNING.
2018-12-28 19:20:43,267 INFO org.apache.flink.runtime.blob.BlobClient - Downloading null/t-61808afb630553305c73a0a23f9231ffd6b2b448-513fbe1e6ddf69d10689eccf4c65da97 from hz203/10.0.0.203:6124
2018-12-28 19:20:48,339 INFO org.apache.flink.runtime.blob.BlobClient - Downloading null/t-dd915bb9821ff6ced34dd5e489966b674de5a48f-7ea2600930e5fc5a4fbb7d47ee198789 from hz203/10.0.0.203:6124
2018-12-28 19:20:52,623 INFO org.apache.flink.runtime.blob.BlobClient - Downloading null/t-61808afb630553305c73a0a23f9231ffd6b2b448-0bd1ab86fa4cc54daeb472079bfbea8c from hz203/10.0.0.203:6124
kill TM
The question body is limited to 30000 characters, so please read the linked JM logs for what happens when the TM is killed.
The logs indicate that your RestartStrategy has depleted its restart attempts or that no RestartStrategy has been configured. Please check whether you specified a RestartStrategy in your program via env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 0L)) or in flink-conf.yaml via restart-strategy: fixed-delay. If you want to learn more about Flink's restart strategies check out the documentation.
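For instance, here is a minimal sketch against the Flink 1.6 Scala API (host and port taken from the question) of setting a fixed-delay restart strategy in the job itself:
import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.scala._

object SocketWordCountWithRestarts {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Retry the job up to 10 times, waiting 10 seconds between attempts,
    // instead of failing permanently when a TaskManager is lost.
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, Time.seconds(10)))
    // Equivalent cluster-wide default in flink-conf.yaml:
    //   restart-strategy: fixed-delay
    //   restart-strategy.fixed-delay.attempts: 10
    //   restart-strategy.fixed-delay.delay: 10 s
    env.socketTextStream("10.1.2.9", 9000).print()
    env.execute("SocketWindowWordCount with restart strategy")
  }
}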
I'm trying to connect with a basic Milo client (ReadExample) but am getting an UnresolvedAddressException. Both client and server are on a remote network, and I only have access to the client. I'm fairly sure it's not a firewall issue, since I can connect with other clients (Prosys OPC UA Client), and I can see in the logs that the IP is resolved to a hostname:
The server is opc.tcp://192.168.115.40:49580, aka opc.tcp://Extern-Mess-Rec:49580 (I tried both in UaTcpStackClient.getEndpoints(url).get()):
13:24:51.530 [main] DEBUG io.netty.util.internal.logging.InternalLoggerFactory - Using SLF4J as the default logging framework
13:24:51.546 [main] DEBUG io.netty.channel.MultithreadEventLoopGroup - -Dio.netty.eventLoopThreads: 8
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: true
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent - Platform: Windows
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent - Java version: 8
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noUnsafe: false
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent - sun.misc.Unsafe: available
13:24:51.561 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noJavassist: false
13:24:51.686 [main] DEBUG io.netty.util.internal.PlatformDependent - Javassist: available
13:24:51.686 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.tmpdir: C:\Users\SOFTWA~1\AppData\Local\Temp\3 (java.io.tmpdir)
13:24:51.686 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
13:24:51.686 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
13:24:51.718 [main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.noKeySetOptimization: false
13:24:51.718 [main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.selectorAutoRebuildThreshold: 512
13:24:51.858 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.level: simple
13:24:51.858 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.maxRecords: 4
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 8
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 8
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 11
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 16777216
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.tinyCacheSize: 512
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
13:24:52.264 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
13:24:52.296 [main] DEBUG io.netty.util.internal.ThreadLocalRandom - -Dio.netty.initialSeedUniquifier: 0x35f32988e43eab85 (took 10 ms)
13:24:52.327 [main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.allocator.type: unpooled
13:24:52.327 [main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.threadLocalDirectBufferSize: 65536
13:24:52.327 [main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.maxThreadLocalCharBufferSize: 16384
13:24:52.358 [ua-netty-event-loop-0] DEBUG io.netty.util.internal.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.matchers.org.eclipse.milo.opcua.stack.client.handlers.UaRequestFutureMatcher
13:24:52.389 [ua-netty-event-loop-0] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.bytebuf.checkAccessible: true
13:24:52.858 [ua-netty-event-loop-0] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacity.default: 262144
13:24:52.890 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler - Sent Hello message on channel=[id: 0xa0ec7fec, L:/130.83.225.169:58872 - R:/192.168.115.40:49580].
13:24:52.905 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler - Received Acknowledge message on channel=[id: 0xa0ec7fec, L:/130.83.225.169:58872 - R:/192.168.115.40:49580].
13:24:52.921 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientMessageHandler - OpenSecureChannel timeout scheduled for +5s
13:24:52.967 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientMessageHandler - OpenSecureChannel timeout canceled
13:24:52.967 [ua-shared-pool-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientMessageHandler - Sent OpenSecureChannelRequest (Issue, id=0, currentToken=-1, previousToken=-1).
13:24:52.999 [ua-shared-pool-1] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientMessageHandler - Received OpenSecureChannelResponse.
13:24:52.999 [ua-shared-pool-1] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientMessageHandler - SecureChannel id=1140, currentTokenId=1, previousTokenId=-1, lifetime=3600000ms, createdAt=DateTime{utcTime=131384570808248472, javaDate=Fri May 05 13:24:40 CEST 2017}
13:24:52.999 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientMessageHandler - 0 message(s) queued before handshake completed; sending now.
13:24:52.999 [ForkJoinPool.commonPool-worker-1] DEBUG org.eclipse.milo.opcua.stack.client.ClientChannelManager - Channel bootstrap succeeded: localAddress=/130.83.225.169:58872, remoteAddress=/192.168.115.40:49580
13:24:53.061 [ForkJoinPool.commonPool-worker-1] DEBUG org.eclipse.milo.opcua.stack.client.ClientChannelManager - Sending CloseSecureChannelRequest...
13:24:53.061 [main] INFO org.eclipse.milo.examples.client.ClientExampleRunner - Using endpoint: opc.tcp://Extern-Mess-Rec:49580 [None]
13:24:53.077 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.ClientChannelManager - channelInactive(), disconnect complete
13:24:53.077 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.ClientChannelManager - disconnect complete, state set to Idle
13:24:53.124 [main] DEBUG org.eclipse.milo.opcua.sdk.client.OpcUaClient - Added ServiceFaultListener: org.eclipse.milo.opcua.sdk.client.ClientSessionManager$$Lambda$1049/664457955@58134517
13:24:53.171 [main] DEBUG org.eclipse.milo.opcua.sdk.client.OpcUaClient - Added SessionActivityListener: org.eclipse.milo.opcua.sdk.client.subscriptions.OpcUaSubscriptionManager$1@2d2e5f00
13:24:55.592 [ForkJoinPool.commonPool-worker-1] DEBUG org.eclipse.milo.opcua.stack.client.ClientChannelManager - Channel bootstrap failed: null
java.nio.channels.UnresolvedAddressException: null
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1279)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:453)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:439)
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:453)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:439)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:421)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:1024)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
13:24:55.608 [main] ERROR org.eclipse.milo.examples.client.ClientExampleRunner - Error running client example: java.nio.channels.UnresolvedAddressException
java.util.concurrent.ExecutionException: java.nio.channels.UnresolvedAddressException
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.eclipse.milo.examples.client.ReadExample.run(ReadExample.java:43)
at org.eclipse.milo.examples.client.ClientExampleRunner.run(ClientExampleRunner.java:106)
at org.eclipse.milo.examples.client.ReadExample.main(ReadExample.java:35)
Caused by: java.nio.channels.UnresolvedAddressException: null
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1279)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:453)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:439)
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:453)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:439)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:421)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:1024)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
The server you're getting endpoints from is probably returning "Extern-Mess-Rec" as its hostname, which you can't resolve.
See this answer for how to deal with that scenario.
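A common workaround, sketched here in Scala against the 0.1.x-era stack API the question uses (the EndpointDescription constructor argument order is an assumption based on that version), is to discover the endpoints via the reachable IP and rebuild each one with that IP before creating the client:
import org.eclipse.milo.opcua.stack.client.UaTcpStackClient
import org.eclipse.milo.opcua.stack.core.types.structured.EndpointDescription

val reachableUrl = "opc.tcp://192.168.115.40:49580"
val endpoints = UaTcpStackClient.getEndpoints(reachableUrl).get()
// Keep the security settings the server returned, but swap the
// unresolvable hostname ("Extern-Mess-Rec") for the IP we can reach.
val usable: Array[EndpointDescription] = endpoints.map { e =>
  new EndpointDescription(
    reachableUrl,
    e.getServer,
    e.getServerCertificate,
    e.getSecurityMode,
    e.getSecurityPolicyUri,
    e.getUserIdentityTokens,
    e.getTransportProfileUri,
    e.getSecurityLevel)
}
// Pass one of `usable` to the client config instead of the raw discovered endpoint.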
I'm running Kafka (0.10.0.0) in Docker on a Mac (with docker-machine). I derived my Dockerfile from Spotify's, which means Kafka and Zookeeper run in the same image.
My instance starts cleanly and poking around inside it appears everything is normal/OK.
Docker maps ports 2181 and 9092 to high-ports 32822 and 32820 in this case. From outside my running Kafka Docker I am able to successfully telnet 192.168.99.100 32822 (where 192.168.99.100 is the IP of my docker-machine). From there I can issue a zookeeper command and get expected output.
It all seems so encouraging, but... I then try this code:
import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

val numPartitions = 4
val replicationFactor = 1
val topicConfig = new java.util.Properties
val topic = "test" // the topic name was elided in the original; placeholder
val zookeeper = "192.168.99.100:32822"
// ZkUtils(connect string, session timeout ms, connection timeout ms, isZkSecurityEnabled)
val zkClient = ZkUtils(zookeeper, 10000, 10000, false)
try {
  AdminUtils.createTopic(zkClient, topic, numPartitions, replicationFactor, topicConfig)
} catch {
  case k: kafka.common.TopicExistsException => // do nothing...topic exists
}
zkClient.close()
This produces this error output:
DEBUG ZkConnection - Creating new ZookKeeper instance to connect to 192.168.99.100:32822.
INFO ZkEventThread - Starting ZkClient event thread.
INFO ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
INFO ZooKeeper - Client environment:host.name=172.25.42.82
INFO ZooKeeper - Client environment:java.version=1.8.0_60
INFO ZooKeeper - Client environment:java.vendor=Oracle Corporation
INFO ZooKeeper - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre
INFO ZooKeeper - Client environment:java.class.path=/usr/local/Cellar/sbt/0.13.11/libexec/sbt-launch.jar
INFO ZooKeeper - Client environment:java.library.path=/Users/wmy965/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
INFO ZooKeeper - Client environment:java.io.tmpdir=/var/folders/ph/ccz4n1qs62n0bn8mqdg94gswt1jlwk/T/
INFO ZooKeeper - Client environment:java.compiler=<NA>
INFO ZooKeeper - Client environment:os.name=Mac OS X
INFO ZooKeeper - Client environment:os.arch=x86_64
INFO ZooKeeper - Client environment:os.version=10.11.5
INFO ZooKeeper - Client environment:user.name=wmy965
INFO ZooKeeper - Client environment:user.home=/Users/wmy965
INFO ZooKeeper - Client environment:user.dir=/Users/wmy965/git/LateKafka
INFO ZooKeeper - Initiating client connection, connectString=192.168.99.100:32822 sessionTimeout=10000 watcher=org.I0Itec.zkclient.ZkClient@55397e3
DEBUG ClientCnxn - zookeeper.disableAutoWatchReset is false
DEBUG ZkClient - Awaiting connection to Zookeeper server
INFO ZkClient - Waiting for keeper state SyncConnected
INFO ClientCnxn - Opening socket connection to server 192.168.99.100/192.168.99.100:32822. Will not attempt to authenticate using SASL (unknown error)
WARN ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
DEBUG ClientCnxnSocketNIO - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:780)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:399)
at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:200)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1185)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1110)
DEBUG ClientCnxnSocketNIO - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:797)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:407)
at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:207)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1185)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1110)
INFO ClientCnxn - Opening socket connection to server 192.168.99.100/192.168.99.100:32822. Will not attempt to authenticate using SASL (unknown error)
INFO ClientCnxn - Socket connection established to 192.168.99.100/192.168.99.100:32822, initiating session
DEBUG ClientCnxn - Session establishment request sent on 192.168.99.100/192.168.99.100:32822
INFO ClientCnxn - Session establishment complete on server 192.168.99.100/192.168.99.100:32822, sessionid = 0x155225c51720000, negotiated timeout = 10000
DEBUG ZkClient - Received event: WatchedEvent state:SyncConnected type:None path:null
INFO ZkClient - zookeeper state changed (SyncConnected)
DEBUG ZkClient - Leaving process event
DEBUG ZkClient - State is SyncConnected
DEBUG ClientCnxn - Reading reply sessionid:0x155225c51720000, packet:: clientPath:null serverPath:null finished:false header:: 1,8 replyHeader:: 1,1,-101 request:: '/brokers/ids,F response:: v{}
It looks like I can't connect (presumably to zookeeper). Why not?
With the newer Kafka clients, the producer must be able to resolve the address the broker advertises. Kafka (in Docker) advertises its container hostname, a generated id (you can see it in /etc/hosts inside the Kafka container), and expects clients to connect back using that name.
Summary:
Map the Kafka container's hostname to the docker-machine IP in /etc/hosts on macOS.
Here is how to edit the hosts file on a Mac:
https://www.tekrevue.com/tip/edit-hosts-file-mac-os-x/
A cleaner fix is to set advertised.listeners=host-ip:port in the Kafka server.properties file, since advertised.host.name and advertised.port are deprecated.
If you set the host IP to 0.0.0.0, it will listen for requests from anywhere, but that is insecure.
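For illustration, a sketch of the relevant server.properties entries, assuming the docker-machine IP from the question and the default broker port 9092:
# What the broker binds to inside the container:
listeners=PLAINTEXT://0.0.0.0:9092
# What the broker tells clients to connect back to:
advertised.listeners=PLAINTEXT://192.168.99.100:9092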
I am new to Spark. I am running Spark in standalone mode on my Mac. I bring up the master and the worker and they both come up fine. The master's log file looks like:
...
14/02/25 18:52:43 INFO Slf4jLogger: Slf4jLogger started
14/02/25 18:52:43 INFO Remoting: Starting remoting
14/02/25 18:52:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077]
14/02/25 18:52:43 INFO Master: Starting Spark master at spark://Shirishs-MacBook-Pro.local:7077
14/02/25 18:52:43 INFO MasterWebUI: Started Master web UI at http://192.168.1.106:8080
14/02/25 18:52:43 INFO Master: I have been elected leader! New state: ALIVE
14/02/25 18:53:03 INFO Master: Registering worker Shirishs-MacBook-Pro.local:53956 with 4 cores, 15.0 GB RAM
The worker log looks like:
14/02/25 18:53:02 INFO Slf4jLogger: Slf4jLogger started
14/02/25 18:53:02 INFO Remoting: Starting remoting
14/02/25 18:53:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@192.168.1.106:53956]
14/02/25 18:53:02 INFO Worker: Starting Spark worker 192.168.1.106:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:53:02 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/25 18:53:02 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/25 18:53:02 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/25 18:53:03 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077
Now, when I submit a job, the job fails to execute (because of a class-not-found error), but the worker also dies. Here is the master log:
14/02/25 18:55:52 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
14/02/25 18:55:52 INFO Master: Launching driver driver-20140225185552-0000 on worker worker-20140225185302-192.168.1.106-53956
14/02/25 18:55:55 INFO Master: Registering worker Shirishs-MacBook-Pro.local:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:55:55 INFO Master: Attempted to re-register worker at same address: akka.tcp://sparkWorker@192.168.1.106:53956
14/02/25 18:55:55 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.1.106%3A53962-2#-21389169] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] -> [akka.tcp://driverClient@192.168.1.106:53961]: Error [Association failed with [akka.tcp://driverClient@192.168.1.106:53961]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@192.168.1.106:53961]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:53961
]
...
...
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:56:03 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:10 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:18 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:25 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:33 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:40 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/
The worker log looks like this
14/02/25 18:55:52 INFO Worker: Asked to launch driver driver-20140225185552-0000
2014-02-25 18:55:52.534 java[11415:330b] Unable to load realm info from SCDynamicStore
14/02/25 18:55:52 INFO DriverRunner: Copying user jar file:/Users/shirish_kumar/Developer/spark_app/SimpleApp to /Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140225185552-0000/SimpleApp
14/02/25 18:55:53 INFO DriverRunner: Launch Command: "/Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/bin/java" "-cp" ":/Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140225185552-0000/SimpleApp:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/conf:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker@192.168.1.106:53956/user/Worker" "SimpleApp"
14/02/25 18:55:55 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/02/25 18:55:55 INFO Worker: Starting Spark worker 192.168.1.106:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:55:55 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/25 18:55:55 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/25 18:55:55 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/25 18:55:55 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077
After this, the worker is shown as dead in the web UI.
My question is: has anyone encountered this problem? The worker should not die just because a job fails.
Check your Spark work folder.
You can see the exact error for that particular driver.
For me it was a class-not-found exception. Give the fully qualified name for the application's main class (include the package name too).
Then clear out the work directory and launch your application in standalone mode again.
That should work.
You have to specify the path to your JAR files.
Programmatically, you can do it this way:
sparkConf.set("spark.jars", "file:/myjar1, file:/myjarN")
This implies you first have to compile a JAR file.
You also have to link dependent JARs; there are multiple ways of automating that, but they are well beyond the scope of this question.
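For illustration, a minimal sketch against the Spark 0.9-era API (the master URL comes from the logs above; the JAR path is a hypothetical placeholder):
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setMaster("spark://Shirishs-MacBook-Pro.local:7077")
  .setAppName("SimpleApp")
  // Ship the application JAR to the workers so the driver can find
  // your main class when the job launches.
  .set("spark.jars", "file:/path/to/SimpleApp.jar")
val sc = new SparkContext(sparkConf)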