When and why does Curator throw ConnectionLossException? - apache-zookeeper

I use Curator 1.2.4 and I keep getting ConnectionLossException when I want to monitor one znode for its children's changes.
I then implemented a watcher like this:
public class CuratorChildWatcherImpl implements CuratorWatcher {
private CuratorFramework client;
public CuratorChildWatcherImpl(CuratorFramework client) {
this.client = client;
}
@Override
public void process(WatchedEvent event) throws Exception {
List<String> children=client.getChildren().usingWatcher(this).forPath(event.getPath());
// Do other stuff with the children znode.
}
}
With connectionTimeout set to 10 seconds, the code throws ConnectionLossException every 11 seconds, so the interval appears to be connectionTimeout plus 1 second. Why?
I checked the source code and found that GetChildrenBuilderImpl calls CuratorZookeeperClient's blockUntilConnectedOrTimeout method, which checks the connection state every 1 second.
2013-04-17 17:22:08 [ERROR]-[com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:97)] Connection timed out for connection string (...) and timeout (10000) / elapsed (10317913)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:413)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:213)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:202)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:198)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:190)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:37)
at com.netflix.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:56)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)

This was a known bug in the Curator/ZooKeeper interaction, tracked as CURATOR-24 ("The current method of managing hung ZK handles needs improvement"). It was fixed in version 2.0.1-incubating.

Related

Timeout in Vert.x 3.9 WebClient not working as expected

I need to set a request timeout on a downstream backend call. However, the WebClient class in Vert.x 3.9 doesn't seem to work as I expected. Here's some test code for it:
package client;
import io.vertx.reactivex.core.AbstractVerticle;
import io.vertx.reactivex.core.Vertx;
import io.vertx.reactivex.ext.web.client.WebClient;
public class Timeout extends AbstractVerticle {
private static final int port = 8080;
private static final String host = "localhost";
private static final int timeoutMilliseconds = 50;
@Override
public void start() {
WebClient client = WebClient.create(vertx);
for (int i = 0; i < 100; i++) {
client.get(port, host, "/").timeout(timeoutMilliseconds).send(
ar -> {
if (ar.succeeded()) {
System.out.println("Success!");
} else {
System.out.println("Fail: " + ar.cause().getMessage());
}
});
}
vertx.timerStream(1000).handler(aLong -> { vertx.close(); });
}
public static void main(String[] args) {
Vertx vertx = Vertx.vertx();
vertx.deployVerticle(new Timeout());
}
}
I'm running the following Go server on the same host for testing:
package main
import (
"fmt"
"net/http"
)
func main() {
http.HandleFunc("/", HelloServer)
http.ListenAndServe(":8080", nil)
}
func HelloServer(w http.ResponseWriter, r *http.Request) {
fmt.Println("Saying hello!")
fmt.Fprintf(w, "Hello, %s!", r.URL.Path[1:])
}
The output for my test server shows that WebClient opens 5 concurrent connections and every request is stopped by the timeout. What am I doing wrong here? How should I set a connection timeout on the requests? The output from the client is:
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
...
I would expect to only see "Success!" printed, since the Go server running on the same host should respond well within 50ms.
EDIT: Removed the vertx.close() and clarified original question... Didn't actually have the vertx.close() in my original test code, but added it when editing the SO post, so people running it wouldn't need to hit CTRL-C.
It hangs because you are blocking the main thread.
Remove this:
try {
Thread.sleep(1000);
} catch(InterruptedException ex) {
Thread.currentThread().interrupt();
}
vertx.close();
The application will keep running as long as vert.x is alive.
If you really want to close vert.x yourself, do it in a separate thread.
Or alternatively, do it with Vert.x itself:
vertx.timerStream(1000).handler(aLong -> {
vertx.close();
});
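The "separate thread" option mentioned above can be sketched in plain Java roughly as follows. This is only an illustration under assumptions: `DelayedCloseSketch`, `closeLater` and `closesWithin` are made-up names, and `closeAction` is a hypothetical stand-in for something like `vertx::close`. The point is that the caller schedules the close and returns at once instead of sleeping.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Plain-Java illustration; closeAction is a hypothetical stand-in for something like vertx::close.
final class DelayedCloseSketch {

    // Schedules closeAction on a separate thread and returns immediately,
    // so the calling thread is never put to sleep.
    static ScheduledExecutorService closeLater(Runnable closeAction, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> {
            try {
                closeAction.run();
            } finally {
                scheduler.shutdown(); // let the JVM exit once the close has run
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    // Demo helper: returns true if the scheduled close ran within waitMillis.
    static boolean closesWithin(long delayMillis, long waitMillis) {
        CountDownLatch closed = new CountDownLatch(1);
        closeLater(closed::countDown, delayMillis);
        try {
            return closed.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("close ran: " + closesWithin(1000, 3000));
    }
}
```

Inside a Vert.x application the timer-based variant shown above is probably the more natural choice, since it needs no extra executor.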
I'm not sure what you are trying to do there, but several things are incorrect:
In AbstractVerticle.start() you should only run start-up logic. Also, if that logic is asynchronous, you need to use the async variant, start(Promise<Void> startPromise), and report completion properly so that Vert.x waits for your start logic to finish.
You are blocking the start process here:
try {
Thread.sleep(1000);
} catch(InterruptedException ex) {
Thread.currentThread().interrupt();
}
As long as this runs, your verticle is not really started and the main Vert.x thread is blocked.
Never close Vert.x from within a verticle's start()! Remove the vertx.close() line and quit the running application another way.
In general, check the docs to understand the lifecycle and usage of verticles.

Apache Curator : No leader is getting selected intermittently

I am using the Apache Curator Leader Election recipe (https://curator.apache.org/curator-recipes/leader-election.html) in my application.
Zookeeper version : 3.5.7
Curator : 4.0.1
Below is the sequence of steps:
1. Whenever my Tomcat server instance starts up, I create a single CuratorFramework instance (one per Tomcat server) and start it:
CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
client.start();
if(!client.blockUntilConnected(10, TimeUnit.MINUTES)){
LOGGER.error("Zookeeper connection could not establish!");
throw new RuntimeException("Zookeeper connection could not establish");
}
2. Create an instance of LSAdapter and start it:
LSAdapter adapter = new LSAdapter(client, <some_metadata>);
adapter.start();
Below is my LSAdapter class :
public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {
//<Class instance variables defined>
public LSAdapter(CuratorFramework client, <some_metadata>) {
leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
leaderSelector.autoRequeue();
}
public void start() throws IOException {
leaderSelector.start();
}
@Override
public void close() throws IOException {
leaderSelector.close();
}
@Override
public void takeLeadership(CuratorFramework client) throws Exception {
final int waitSeconds = (int) (5 * Math.random()) + 1;
LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
while (true) {
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
//do leader tasks
} catch (InterruptedException e) {
LOGGER.error(name + " was interrupted.");
//cleanup
Thread.currentThread().interrupt();
} finally {
}
}
}
}
3. When the server instance shuts down, close the LSAdapter instance (the one the application is using) and close the CuratorFramework client that was created:
CloseableUtils.closeQuietly(lsAdapter);
curatorFrameworkClient.close();
The issue I am facing is that, at times, no leader gets elected when the server is restarted. I verified this by tracing the log inside takeLeadership(). I have two Tomcat server instances running the above code and connecting to the same ZooKeeper quorum. Most of the time one of the instances becomes leader, but when this issue happens, both of them become followers. Please suggest what I am doing wrong.
As I answered on Curator's Jira, you are swallowing the interrupted exception. When you get InterruptedException you must exit your takeLeadership(). In your code example, you are merely resetting the interrupted state and continuing the loop - this will cause an infinite loop of interrupted exceptions, btw. After calling Thread.currentThread().interrupt(); you should exit the while loop.
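The loop shape described above could look roughly like the sketch below. It is written in plain Java without any Curator dependency so that it runs on its own: `LeaderLoopSketch`, `leadUntilInterrupted` and `exitsOnInterrupt` are hypothetical names, and the method body only stands in for what the inside of takeLeadership() might do. The try/catch is placed around the whole loop, so an InterruptedException leaves the loop and the method returns.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the body of takeLeadership(); no Curator classes are used here.
final class LeaderLoopSketch {

    // Counts completed rounds of the placeholder leader work.
    static final AtomicInteger rounds = new AtomicInteger();

    // Works in a loop until the thread is interrupted, then returns.
    static void leadUntilInterrupted(long pauseMillis) {
        try {
            while (true) {
                TimeUnit.MILLISECONDS.sleep(pauseMillis);
                rounds.incrementAndGet(); // placeholder for "do leader tasks"
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers, then fall through and return
            // instead of going back into the loop.
            Thread.currentThread().interrupt();
        }
    }

    // Demo helper: starts the loop on a thread, interrupts it, and reports whether it exited.
    static boolean exitsOnInterrupt() {
        Thread leader = new Thread(() -> leadUntilInterrupted(10));
        leader.start();
        try {
            Thread.sleep(100);
            leader.interrupt(); // simulates the interrupt delivered to the leader thread
            leader.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !leader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("loop exited after interrupt: " + exitsOnInterrupt());
    }
}
```

With the catch block inside the loop, as in the question, the restored interrupt flag makes the next sleep throw again immediately, which is the endless cycle the answer warns about.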

Netty Client Connect with Server, but server does not fire channelActive/Registered

I have the following architecture in use:
- [Client] - The enduser connecting to our service.
- [GameServer] - The game server on which the game is running.
- [GameLobby] - A server that is responsible for matching Clients with a GameServer.
For example, if 4 Clients want to play a game and are matched through a GameLobby, then the first time all of these connections succeed.
However, when they decide to rematch, one of the Clients will not connect properly.
The connections between all the Clients and the GameServer are made simultaneously.
Clients that rematch first close their current connection with the GameServer and then head into the lobby again.
This connection will succeed, no errors are thrown. Even using a ChannelFuture it shows that the client connection was made properly, the following values are retrieved to show that the client thinks the connection was correct:
- ChannelFuture.isSuccess() = True
- ChannelFuture.isDone() = True
- ChannelFuture.cause() = Null
- ChannelFuture.isCancelled() = False
- Channel.isOpen() = True
- Channel.isActive() = True
- Channel.isRegistered() = True
- Channel.isWritable() = True
Thus the connection was properly made according to the Client. However, on the GameServer, the SimpleChannelInboundHandler's channelRegistered/channelActive methods are never called for that specific Client, only for the other 3 Clients.
All 4 Clients, the GameServer, and the Lobby are running on the same IP address.
Since it only happens when (re)connecting to the GameServer, I thought that it had to do with not properly closing the connection. Currently this is done through:
try {
group.shutdownGracefully();
channel.closeFuture().sync();
} catch (InterruptedException e) {
e.printStackTrace();
}
On the GameServer, channelUnregistered is called, so this part works and the connection is destroyed.
I have tried adding listeners to the ChannelFuture of the malfunctioning connection; however, according to the ChannelFuture everything works, which is not the case.
I tried adding ChannelOptions to allow more Clients to be queued on the server.
GameServer
The GameServer server is initialized as follows:
// Create the bootstrap to make this act like a server.
ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.group(bossGroup)
.channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitialisation(new ClientInputReader(gameThread)))
.option(ChannelOption.SO_BACKLOG, 1000)
.childOption(ChannelOption.SO_KEEPALIVE, true)
.childOption(ChannelOption.TCP_NODELAY, true);
bossGroup.execute(gameThread); // Executing the thread that handles all games on this GameServer.
// Launch the server with the specific port.
serverBootstrap.bind(port).sync();
The GameServer ClientInputReader
@ChannelHandler.Sharable
public class ClientInputReader extends SimpleChannelInboundHandler<Packet> {
private ServerMainThread serverMainThread;
public ClientInputReader(ServerMainThread serverMainThread) {
this.serverMainThread = serverMainThread;
}
@Override
public void channelRegistered(ChannelHandlerContext ctx) throws Exception {
System.out.println("[Connection: " + ctx.channel().id() + "] Channel registered");
super.channelRegistered(ctx);
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Packet handling
}
}
The malfunctioning connection does not trigger any of the SimpleChannelInboundHandler callbacks, not even exceptionCaught.
The GameServer ChannelInitialisation
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler channelInputReader;
public ChannelInitialisation(SimpleChannelInboundHandler channelInputReader) {
this.channelInputReader = channelInputReader;
}
@Override
protected void initChannel(SocketChannel ch) throws Exception {
ChannelPipeline pipeline = ch.pipeline();
// every packet is prefixed with the amount of bytes that will follow
pipeline.addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
pipeline.addLast(new LengthFieldPrepender(4));
pipeline.addLast(new PacketEncoder(), new PacketDecoder(), channelInputReader);
}
}
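For readers unfamiliar with the framing configured above: a 4-byte length prefix means each packet travels as a 4-byte big-endian length followed by that many payload bytes, and the payload is passed on only once the whole frame has arrived. Below is a rough stand-alone sketch of that idea in plain Java (no Netty). `LengthPrefixSketch`, `frame` and `tryDecode` are made-up names for illustration and are not part of Netty or of the code in this question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Plain-Java illustration of 4-byte length-prefixed framing; not Netty code.
final class LengthPrefixSketch {

    // Prepends a 4-byte big-endian length to the payload.
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Returns the next payload if a complete frame is buffered;
    // otherwise returns null and leaves the buffer position where it was.
    static byte[] tryDecode(ByteBuffer in) {
        if (in.remaining() < 4) {
            return null; // length header not complete yet
        }
        in.mark();
        int length = in.getInt();
        if (length < 0) {
            throw new IllegalArgumentException("negative frame length: " + length);
        }
        if (in.remaining() < length) {
            in.reset(); // body not complete yet, wait for more bytes
            return null;
        }
        byte[] payload = new byte[length];
        in.get(payload); // the length header itself is stripped from the result
        return payload;
    }

    public static void main(String[] args) {
        byte[] wire = frame("hello".getBytes(StandardCharsets.UTF_8));
        ByteBuffer partial = ByteBuffer.wrap(wire, 0, 6); // only part of the frame has arrived
        System.out.println("partial frame decoded: " + (tryDecode(partial) != null));
        byte[] payload = tryDecode(ByteBuffer.wrap(wire));
        System.out.println("full frame payload: " + new String(payload, StandardCharsets.UTF_8));
    }
}
```

Both ends have to agree on the prefix width and on whether the length counts the header, which is why the client and server pipelines in this question use the same decoder and prepender settings.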
Client
Client creating a GameServer connection:
// Configure the client.
group = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.handler(new ChannelInitialisation(channelHandler));
// Start the client.
channel = b.connect(address, port).await().channel();
/* At this point, the client thinks that the connection was successful, as the channel is active, open, registered and writable... */
ClientInitialisation:
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler<Packet> channelHandler;
ChannelInitialisation(SimpleChannelInboundHandler<Packet> channelHandler) {
this.channelHandler = channelHandler;
}
@Override
public void initChannel(SocketChannel ch) throws Exception {
// prefix messages by the length
ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
ch.pipeline().addLast(new LengthFieldPrepender(4));
// our encoder, decoder and handler
ch.pipeline().addLast(new PacketEncoder(), new PacketDecoder(), channelHandler);
}
}
ClientHandler:
public class ClientPacketHandler extends SimpleChannelInboundHandler<Packet> {
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
super.channelActive(ctx);
System.out.println("Channel active: " + ctx.channel().id());
ctx.channel().writeAndFlush(new PacketSetupClientToGameServer());
System.out.println("Sending setup packet to the GameServer: " + ctx.channel().id());
// This is successfully called, as the client thinks the connection was properly made.
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Reading packets.
}
}
I expect the Client to connect properly to the server, since the other Clients connect properly and this Client could previously connect just fine.
TL;DR: When multiple Clients try to create a new match, there is a possibility that one, possibly more, Client(s) will not connect properly with the server, after the previous connection was closed.
For anyone who struggles with this issue in some way or another:
I implemented a workaround that allows me to continue, even though (as far as I am concerned) there is still a bug inside the Netty framework. The workaround is quite simple: just create a connection pool.
My solution uses a maximum of five connections inside the connection pool. If one of the connections gets no reply from the GameServer, it is not that big of a deal, since there are four others that have a high chance of succeeding. I know this is a bad workaround, but I could not find any information on this issue. It works and adds a maximum delay of only 5 seconds (each retry takes a second).

Kafka listener, get all messages

Good day, colleagues.
I have a Kafka project using Spring Kafka that listens to a specific topic.
Once a day, I need to read all the messages, put them into a collection, and find a specific message there.
I can't work out how to read all the messages in one @KafkaListener method.
My class is:
@Component
public class KafkaIntervalListener {
public CountDownLatch intervalLatch = new CountDownLatch(1);
private final SCDFRunnerService scdfRunnerService;
public KafkaIntervalListener(SCDFRunnerService scdfRunnerService) {
this.scdfRunnerService = scdfRunnerService;
}
@KafkaListener(topics = "${kafka.interval-topic}", containerFactory = "intervalEventKafkaListenerContainerFactory")
public void intervalListener(IntervalEvent event) throws UnsupportedEncodingException, JSONException {
System.out.println("Received interval message: " + event);
IntervalType type = event.getType();
Instant instant = event.getInterval();
List<IntervalEvent> events = new ArrayList<>();
events.add(event);
events.size();
this.intervalLatch.countDown();
}
}
My events collection always has size = 1.
I tried using different loops, but then my collection was filled with the same message 530,000,000 times.
UPDATE:
I have found a way to do it with factory.setBatchListener(true), but I need to launch it with @Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow"). Right now this method is always listening. Now I am trying something like this:
@Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow")
public void run() throws Exception {
kafkaIntervalListener.intervalLatch.await();
}
It doesn't work; in debug mode my breakpoint is never hit at this point.
The listener container is, by design, message-driven.
For fetching messages on-demand, it's better to use the Kafka Consumer API directly and fetch messages using the poll() method.

Form Instantiation time in Restlet

I am new to the Restlet framework and I have the following timing issue in the post method of my server resource.
My post method code:
@Post
public Representation represent(Representation entity){
try{
//Thread.sleep(1000);
long start = System.currentTimeMillis();
Form aForm = new Form(getRequestEntity());
System.err.println("FORM Instantiation TIME: " + (System.currentTimeMillis()-start));
}catch(Exception ex){
ex.printStackTrace();
}
return new StringRepresentation("hello");
}
Across different trials, the output I get is 1900-1999 ms. But if I uncomment the Thread.sleep(1000) line, the reported time is 900-999 ms. Can anyone please explain what happens when the Form object is instantiated and why the time is always 1900+ ms? Sorting out this timing issue is important for me, as I have to implement token-based authentication to reduce the post method processing time.
Sorry for the late reply. The Restlet version I am using is 2.0.7.
Here are the details:
public static void main(String[] args) throws Exception {
Component component = new Component();
component.getServers().add(Protocol.HTTP, 8182);
VirtualHost aHost = component.getDefaultHost();
aHost.attach("/sample", new MyApplication());
component.getLogger().setLevel(Level.OFF);
component.start();
System.err.println("REST SERVICE STARTED ON PORT NUMBER 8182...");
}
I am running this application locally and not in any web/app server.