Hazelcast LifecycleListener stateChanged never firing

I would like to keep track of the Hazelcast lifecycle. The default logging produces events like:
INFO: [192.168.201.11]:5701 [MyApp] [4.1.2] [192.168.201.11]:5701 is STARTING
INFO: [192.168.201.11]:5701 [MyApp] [4.1.2] [192.168.201.11]:5701 is STARTED
so I'd like to receive these events, telling me that the status is STARTING, STARTED, etc.
Here's how I'm starting up
var hzConfig = new Config();
hzConfig.setClusterName(clusterName);
hzConfig.setInstanceName(memberId);
instance = Hazelcast.newHazelcastInstance(hzConfig);
var lifecycleListener = new LifecycleListener();
lifecycleListenerId = instance.getLifecycleService().addLifecycleListener(lifecycleListener);
And here's my LifecycleListener
public class LifecycleListener implements com.hazelcast.core.LifecycleListener {
    @Override
    public void stateChanged(LifecycleEvent lifecycleEvent) {
        System.out.println("Lifecycle change: " + lifecycleEvent.getState());
    }
}
All very basic, but my LifecycleListener doesn't receive any events, either on startup or on shutdown.
Any ideas what I'm missing here?

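The code you wrote is correct. You don't see the STARTING and STARTED events because your Hazelcast instance has already started by the time your LifecycleListener is added; a listener added afterwards only receives later events, such as those fired on shutdown.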
Try the following code.
var hzConfig = new Config();
hzConfig.setClusterName(clusterName);
hzConfig.setInstanceName(memberId);
instance = Hazelcast.newHazelcastInstance(hzConfig);
var lifecycleListener = new LifecycleListener();
lifecycleListenerId = instance.getLifecycleService().addLifecycleListener(lifecycleListener);
Thread.sleep(5000);
instance.shutdown();
Thread.sleep(5000);
You should see the following log.
Lifecycle change: SHUTTING_DOWN
...
Lifecycle change: SHUTDOWN
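To make the timing explicit, here is a toy model in plain Java. It does not use Hazelcast at all; LifecycleTimingSketch and its methods are made-up names for illustration. A listener that is only added after construction misses STARTING and STARTED, while one handed over before construction sees every state. In real Hazelcast code, the counterpart of "handing it over before construction" should be registering the listener on the Config via addListenerConfig(new ListenerConfig(lifecycleListener)) before calling Hazelcast.newHazelcastInstance(hzConfig); please check the documentation for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-in for an instance lifecycle. This is NOT Hazelcast's API.
class LifecycleTimingSketch {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    // Listeners passed in here are known before startup, so they see the start events.
    LifecycleTimingSketch(List<Consumer<String>> configuredListeners) {
        listeners.addAll(configuredListeners);
        fire("STARTING");
        fire("STARTED");
    }

    // Listeners added here only see events fired after this call.
    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    void shutdown() {
        fire("SHUTTING_DOWN");
        fire("SHUTDOWN");
    }

    private void fire(String state) {
        for (Consumer<String> listener : listeners) {
            listener.accept(state);
        }
    }

    // Runs one start/shutdown cycle and returns the states the listener observed.
    static List<String> demo(boolean registerBeforeStart) {
        List<String> seen = new ArrayList<>();
        Consumer<String> recorder = seen::add;
        List<Consumer<String>> configured = new ArrayList<>();
        if (registerBeforeStart) {
            configured.add(recorder);
        }
        LifecycleTimingSketch instance = new LifecycleTimingSketch(configured);
        if (!registerBeforeStart) {
            instance.addListener(recorder);
        }
        instance.shutdown();
        return seen;
    }
}
```

Calling LifecycleTimingSketch.demo(false) corresponds to the code in the question and only records the two shutdown states; demo(true) records all four.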

Related

Apache Curator : No leader is getting selected intermittently

I am using the Apache Curator Leader Election recipe (https://curator.apache.org/curator-recipes/leader-election.html) in my application.
ZooKeeper version: 3.5.7
Curator version: 4.0.1
Below is the sequence of steps:
1. Whenever my Tomcat server instance starts up, I create a single CuratorFramework instance (one per Tomcat server) and start it:
CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
client.start();
if (!client.blockUntilConnected(10, TimeUnit.MINUTES)) {
    LOGGER.error("Zookeeper connection could not establish!");
    throw new RuntimeException("Zookeeper connection could not establish");
}
2. Create an instance of LSAdapter and start it:
LSAdapter adapter = new LSAdapter(client, <some_metadata>);
adapter.start();
Below is my LSAdapter class:
public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {
    //<Class instance variables defined>
    public LSAdapter(CuratorFramework client, <some_metadata>) {
        leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
        leaderSelector.autoRequeue();
    }
    public void start() throws IOException {
        leaderSelector.start();
    }
    @Override
    public void close() throws IOException {
        leaderSelector.close();
    }
    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        final int waitSeconds = (int) (5 * Math.random()) + 1;
        LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
        LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
        while (true) {
            try {
                Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
                //do leader tasks
            } catch (InterruptedException e) {
                LOGGER.error(name + " was interrupted.");
                //cleanup
                Thread.currentThread().interrupt();
            } finally {
            }
        }
    }
}
3. When the server instance shuts down, I close the LSAdapter instance (the one the application is using) and the CuratorFramework client I created:
CloseableUtils.closeQuietly(lsAdapter);
curatorFrameworkClient.close();
The issue I am facing is that sometimes, when the server is restarted, no leader gets elected. I checked this by tracing the log inside takeLeadership(). I have two Tomcat server instances running the above code and connecting to the same ZooKeeper quorum. Most of the time one of the instances becomes leader, but when this issue happens, both of them become followers. Please suggest what I am doing wrong.

As I answered on Curator's Jira, you are swallowing the InterruptedException. When you get an InterruptedException you must exit takeLeadership(). In your code example, you merely reset the interrupted state and continue the loop; by the way, this causes an infinite loop of interrupted exceptions, because the next Thread.sleep() throws again immediately. After calling Thread.currentThread().interrupt(); you should exit the while loop.
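A minimal sketch of that loop shape, in plain Java without any Curator classes (LeadershipLoopSketch, leadLoop and exitsOnInterrupt are hypothetical names used only for this illustration). The body of leadLoop stands in for the body of takeLeadership(): on interrupt it restores the flag and then returns instead of looping again.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the body of takeLeadership(); no Curator classes are involved.
class LeadershipLoopSketch {
    // Counts completed work cycles (hypothetical, for illustration only).
    static final AtomicInteger cycles = new AtomicInteger();

    // Does "leader work" until the thread is interrupted, then returns.
    static void leadLoop(long pauseMillis) {
        while (true) {
            try {
                TimeUnit.MILLISECONDS.sleep(pauseMillis);
                cycles.incrementAndGet(); // leader tasks would go here
            } catch (InterruptedException e) {
                // Clean up, restore the interrupted flag for the caller...
                Thread.currentThread().interrupt();
                // ...and leave the loop so that leadership is actually given up.
                return;
            }
        }
    }

    // Starts the loop on its own thread, interrupts it, and reports whether it exited.
    static boolean exitsOnInterrupt() {
        Thread leader = new Thread(() -> leadLoop(10));
        try {
            leader.start();
            Thread.sleep(50);   // let the loop run a few cycles
            leader.interrupt(); // simulates losing leadership or the selector being closed
            leader.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !leader.isAlive();
    }
}
```

Without the return in the catch block, the restored interrupt flag would make every following sleep throw immediately, which is the busy loop described above.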

Modify connector config in KafkaConnect before sending to task

I'm writing a SinkConnector in Kafka Connect and hitting an issue. The connector is configured as follows:
{
    "connector.class": "a.b.ExampleFileSinkConnector",
    "tasks.max": "1",
    "topics": "mytopic",
    "maxFileSize": "50"
}
I define the connector's config like this:
@Override
public ConfigDef config() {
    ConfigDef result = new ConfigDef();
    result.define("maxFileSize", Type.STRING, "10", Importance.HIGH, "size of file");
    return result;
}
In the connector, I start the tasks like this:
@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
    List<Map<String, String>> result = new ArrayList<Map<String,String>>();
    for (int i = 0; i < maxTasks; i++) {
        Map<String, String> taskConfig = new HashMap<>();
        taskConfig.put("connectorName", connectorName);
        taskConfig.put("taskNumber", Integer.toString(i));
        taskConfig.put("maxFileSize", maxFileSize);
        result.add(taskConfig);
    }
    return result;
}
and all goes well.
However, if I add this when starting the task (in taskConfigs()):
taskConfig.put("epoch", "123");
it breaks the whole infrastructure: all connectors are stopped and restarted in an endless loop.
There is no exception or error in the Connect log file that could help.
The only way to make it work is to add "epoch" to the connector config, which I don't want to do since it is an internal parameter that the connector has to send to the task. It is not intended to be exposed to the connector's users.
Another thing I noticed is that it is not possible to update the value of any connector config parameter, other than setting it to its default value. Changing a parameter and sending it to the task produces the same behavior.
I would really appreciate any help on this issue.
EDIT: here is the code of SinkTask::start()
@Override
public void start(Map<String, String> taskConfig) {
    try {
        connectorName = taskConfig.get("connectorName");
        log.info("{} -- Task.start()", connectorName);
        fileNamePattern = taskConfig.get("fileNamePattern");
        rootDir = taskConfig.get("rootDir");
        fileExtension = taskConfig.get("fileExtension");
        maxFileSize = SimpleFileSinkConnector.parseIntegerConfig(taskConfig.get("maxFileSize"));
        maxTimeMinutes = SimpleFileSinkConnector.parseIntegerConfig(taskConfig.get("maxTimeMinutes"));
        maxNumRecords = SimpleFileSinkConnector.parseIntegerConfig(taskConfig.get("maxNumRecords"));
        taskNumber = SimpleFileSinkConnector.parseIntegerConfig(taskConfig.get("taskNumber"));
        epochStart = SimpleFileSinkConnector.parseLongConfig(taskConfig.get("epochStart"));
        log.info("{} -- fileNamePattern: {}, rootDir: {}, fileExtension: {}, maxFileSize: {}, maxTimeMinutes: {}, maxNumRecords: {}, taskNumber: {}, epochStart : {}",
                connectorName, fileNamePattern, rootDir, fileExtension, maxFileSize, maxTimeMinutes, maxNumRecords, taskNumber, epochStart);
        if (taskNumber == 0) {
            checkTempFilesForPromotion();
        }
        computeInitialFilename();
        log.info("{} -- Task.start() END", connectorName);
    } catch (Exception e) {
        log.info("{} -- Task.start() EXCEPTION : {}", connectorName, e.getLocalizedMessage());
    }
}

We found the root cause of the issue. The Kafka Connect framework is actually behaving as designed; the problem has to do with how we are trying to use the taskConfigs() configuration mechanism.
The Problem
In our design, the FileSinkConnector sets an epoch in its start() lifecycle method, and this epoch is passed down to its tasks by way of the taskConfigs() lifecycle method. So each time the connector's start() lifecycle method runs, a different configuration is generated for the tasks, and that is the problem.
Generating a different configuration each time is a no-no. It turns out that the Connect framework detects differences in configuration and restarts/rebalances when it finds one, stopping and restarting the connector and its tasks. That restart calls the stop() and start() methods of the connector, which (of course) produces yet another configuration change because of the new epoch, and the vicious cycle is on!
This was an interesting and unexpected issue ... due to a behavior in Connect that we had no appreciation for. This is the first time we tried to generate task configuration that was not a simple function of the connector configuration.
Note that this behavior in Connect is intentional and addresses real issues of dynamically-changing configuration - like a JDBC Sink Connector that spontaneously updates its configuration when it detects a new database table it wants to sink.
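To illustrate the difference, here is a small sketch in plain Java with no Connect classes. TaskConfigSketch, stableTaskConfigs, epochTaskConfigs and the counter standing in for "a new epoch on every start()" are all hypothetical. The equality check only mirrors the idea that Connect notices changed task configurations; it is not how the framework implements the comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java illustration; none of these names come from Kafka Connect.
class TaskConfigSketch {
    // Stand-in for an epoch that changes every time the connector starts.
    private static final AtomicLong EPOCH_SOURCE = new AtomicLong();

    // A pure function of the connector config: same input, same task configs.
    static List<Map<String, String>> stableTaskConfigs(Map<String, String> connectorConfig, int maxTasks) {
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            Map<String, String> taskConfig = new HashMap<>(connectorConfig);
            taskConfig.put("taskNumber", Integer.toString(i));
            result.add(taskConfig);
        }
        return result;
    }

    // Injects a fresh value on every call, so two calls never produce equal configs.
    static List<Map<String, String>> epochTaskConfigs(Map<String, String> connectorConfig, int maxTasks) {
        String epoch = Long.toString(EPOCH_SOURCE.incrementAndGet());
        List<Map<String, String>> result = stableTaskConfigs(connectorConfig, maxTasks);
        for (Map<String, String> taskConfig : result) {
            taskConfig.put("epoch", epoch);
        }
        return result;
    }
}
```

Two calls to stableTaskConfigs with the same input compare equal, so nothing would look "changed". Two calls to epochTaskConfigs never compare equal, which is the situation that kept triggering the restart loop.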
Thanks to those who helped us!

Kafka listener, get all messages

Good day, colleagues.
I have a project using Spring Kafka that listens to a specific topic.
Once a day I need to read all the messages, put them into a collection and find a specific message there.
I can't work out how to read all the messages in one @KafkaListener method.
My class is:
@Component
public class KafkaIntervalListener {
    public CountDownLatch intervalLatch = new CountDownLatch(1);
    private final SCDFRunnerService scdfRunnerService;
    public KafkaIntervalListener(SCDFRunnerService scdfRunnerService) {
        this.scdfRunnerService = scdfRunnerService;
    }
    @KafkaListener(topics = "${kafka.interval-topic}", containerFactory = "intervalEventKafkaListenerContainerFactory")
    public void intervalListener(IntervalEvent event) throws UnsupportedEncodingException, JSONException {
        System.out.println("Received interval message: " + event);
        IntervalType type = event.getType();
        Instant instant = event.getInterval();
        List<IntervalEvent> events = new ArrayList<>();
        events.add(event);
        events.size();
        this.intervalLatch.countDown();
    }
}
My events collection always has size 1.
I tried using different loops, but then my collection was filled with the same message 530 000 000 times.
UPDATE:
I have found a way to do it with factory.setBatchListener(true), but I need to launch it on a schedule with @Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow"). Right now this method is always listening. I am now trying something like this:
@Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow")
public void run() throws Exception {
    kafkaIntervalListener.intervalLatch.await();
}
It doesn't work; in debug mode my breakpoint at this point is never hit.

The listener container is, by design, message-driven.
For fetching messages on-demand, it's better to use the Kafka Consumer API directly and fetch messages using the poll() method.

Windows Workflow not terminating after Transaction Failure

I am a bit new to Windows Workflow Foundation, so this might be very straightforward, but I am stuck. I have a very simple sequential workflow with a couple of code activities inside a TransactionScope activity.
I am running my workflow from a console application with the following code:
Activity workflow = new Process();
var inputArgument = new Dictionary<string, object>();
inputArgument["Argument 1"] = 1234567;
inputArgument["Argument 2"] = 1234567;
inputArgument["Argument 3"] = "GUID";
inputArgument["Argument 4"] = @"\\filepath\";
var syncEvent = new AutoResetEvent(false);
var workflowApp = new WorkflowApplication(workflow, inputArgument);
workflowApp.OnUnhandledException =
    delegate (WorkflowApplicationUnhandledExceptionEventArgs e)
    {
        return UnhandledExceptionAction.Terminate;
    };
workflowApp.Completed +=
    delegate (WorkflowApplicationCompletedEventArgs e)
    {
        syncEvent.Set();
    };
workflowApp.Run();
syncEvent.WaitOne();
If I don't add the TransactionScope activity, my workflow runs fine, and in case of an exception the workflow instance terminates and my console application closes as well.
However, when I add the TransactionScope activity and any activity inside it fails, my workflow instance keeps running, and so does my console application. Can anyone guide me on how to terminate the instance?
I am not handling any exceptions within my workflow, and I want to keep it that way so that I can log the exception details.

If you go to the properties of the TransactionScope in the workflow, there is a property called AbortInstanceOnTransactionFailure that is set to true by default. Set it to false. It should then behave as you're expecting.
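When this property is enabled, a transaction failure causes the workflow instance to abort rather than terminate. An aborted instance does not invoke the Completed handler, so syncEvent is never set and the console application keeps waiting.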

When is a started service not a started service? (SQL Express)

We require programmatic access to a SQL Server Express service as part of our application. Depending on what the user is trying to do, we may have to attach a database, detach a database, back one up, etc. Sometimes the service might not be started before we attempt these operations. So we need to ensure the service is started. Here is where we are running into problems. Apparently the ServiceController.WaitForStatus(ServiceControllerStatus.Running) returns prematurely for SQL Server Express. What is really puzzling is that the master database seems to be immediately available, but not other databases. Here is a console application to demonstrate what I am talking about:
namespace ServiceTest
{
    using System;
    using System.Data.SqlClient;
    using System.Diagnostics;
    using System.ServiceProcess;
    using System.Threading;
    class Program
    {
        private static readonly ServiceController controller = new ServiceController("MSSQL$SQLEXPRESS");
        private static readonly Stopwatch stopWatch = new Stopwatch();
        static void Main(string[] args)
        {
            stopWatch.Start();
            EnsureStop();
            Start();
            OpenAndClose("master");
            EnsureStop();
            Start();
            OpenAndClose("AdventureWorksLT");
            Console.ReadLine();
        }
        private static void EnsureStop()
        {
            Console.WriteLine("EnsureStop enter, {0:N0}", stopWatch.ElapsedMilliseconds);
            if (controller.Status != ServiceControllerStatus.Stopped)
            {
                controller.Stop();
                controller.WaitForStatus(ServiceControllerStatus.Stopped);
                Thread.Sleep(5000); // really, really make sure it stopped ... this has a problem too.
            }
            Console.WriteLine("EnsureStop exit, {0:N0}", stopWatch.ElapsedMilliseconds);
        }
        private static void Start()
        {
            Console.WriteLine("Start enter, {0:N0}", stopWatch.ElapsedMilliseconds);
            controller.Start();
            controller.WaitForStatus(ServiceControllerStatus.Running);
            // Thread.Sleep(5000);
            Console.WriteLine("Start exit, {0:N0}", stopWatch.ElapsedMilliseconds);
        }
        private static void OpenAndClose(string database)
        {
            Console.WriteLine("OpenAndClose enter, {0:N0}", stopWatch.ElapsedMilliseconds);
            var connection = new SqlConnection(string.Format(@"Data Source=.\SQLEXPRESS;initial catalog={0};integrated security=SSPI", database));
            connection.Open();
            connection.Close();
            Console.WriteLine("OpenAndClose exit, {0:N0}", stopWatch.ElapsedMilliseconds);
        }
    }
}
On my machine, this will consistently fail as written. Notice that the connection to "master" has no problems; only the connection to the other database. (You can reverse the order of the connections to verify this.) If you uncomment the Thread.Sleep in the Start() method, it will work fine.
Obviously I want to avoid an arbitrary Thread.Sleep(). Besides the rank code smell, what arbitrary value would I put there? The only thing we can think of is to put some dummy connections to our target database in a while loop, catching the SqlException thrown and trying again until it works. But I'm thinking there must be a more elegant solution out there to know when the service is really ready to be used. Any ideas?
EDIT: Based on feedback provided below, I added a check on the status of the database. However, it is still failing. It looks like even the state is not reliable. Here is the function I am calling before OpenAndClose(string):
private static void WaitForOnline(string database)
{
    Console.WriteLine("WaitForOnline start, {0:N0}", stopWatch.ElapsedMilliseconds);
    using (var connection = new SqlConnection(string.Format(@"Data Source=.\SQLEXPRESS;initial catal
    using (var command = connection.CreateCommand())
    {
        connection.Open();
        try
        {
            command.CommandText = "SELECT [state] FROM sys.databases WHERE [name] = @DatabaseName";
            command.Parameters.AddWithValue("@DatabaseName", database);
            byte databaseState = (byte)command.ExecuteScalar();
            Console.WriteLine("databaseState = {0}", databaseState);
            while (databaseState != OnlineState)
            {
                Thread.Sleep(500);
                databaseState = (byte)command.ExecuteScalar();
                Console.WriteLine("databaseState = {0}", databaseState);
            }
        }
        finally
        {
            connection.Close();
        }
    }
    Console.WriteLine("WaitForOnline exit, {0:N0}", stopWatch.ElapsedMilliseconds);
}
I found another discussion dealing with a similar problem. Apparently the solution is to check the sys.database_files of the database in question. But that, of course, is a chicken-and-egg problem. Any other ideas?

Service start != database start.
The service counts as started once the SQL Server process is running and has reported to the SCM that it is 'alive'. After that, the server starts bringing user databases online. As part of this process, it runs recovery on each database to ensure transactional consistency. Recovery of a database can last anywhere from microseconds to whole days; it depends on the amount of log to be redone and the speed of the disk(s).
After the SCM returns that the service is running, you should connect to 'master' and check your database status in sys.databases. Only when the status is ONLINE can you proceed to open it.
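The "check until ONLINE" step can be sketched as a bounded polling loop. The version below is in Java, but the loop has the same shape in C#. The Supplier stands in for a query against master such as SELECT state_desc FROM sys.databases WHERE name = ?; WaitForOnlineSketch and its parameters are hypothetical names, and the timeout is an assumption added so that the loop cannot spin forever.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Generic polling helper; the database access itself is left to the caller.
class WaitForOnlineSketch {
    // Polls stateReader until it reports "ONLINE" or the timeout elapses.
    // Returns true if the database came online in time, false otherwise.
    static boolean waitForOnline(Supplier<String> stateReader, Duration timeout, Duration pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if ("ONLINE".equals(stateReader.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of waiting forever
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a real program the supplier would run the query over a connection to master and return the state description for the target database; only after waitForOnline returns true would the code open a connection to that database.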