Apache Beam multiple consumer groups in KafkaIO.read() | Out of memory - apache-beam

I'm working on an Apache Beam streaming pipeline. I've built a stream that reads a lot of topics and puts all the data in GCS.
My KafkaIO reader is:
KafkaIO.<String, AvroGenericRecord>read()
    .withBootstrapServers(bootstrapServers)
    .withConsumerConfigUpdates(configUpdates)
    .withTopics(inputTopics)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
    .withMaxNumRecords(maxNumRecords)
    .commitOffsetsInFinalize()
    .withoutMetadata();
In configUpdates I set the ConsumerConfig.GROUP_ID_CONFIG value.
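For reference, configUpdates here is presumably just a plain consumer-properties map along these lines (the group id value is only a placeholder):
Map<String, Object> configUpdates = new HashMap<>();
configUpdates.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group"); // placeholder group id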
I would like to somehow read with 2-3 consumer groups; is it possible to achieve this? Some of my topics receive data quickly and some do not.
UPD
The reason I wanted multiple consumer groups is that my job runs out of memory.
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:102)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1057)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:438)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.emit(ReduceFnRunner.java:930)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:368)
    at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94)
    at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
    at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:218)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:180)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:335)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:182)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:102)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1057)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:438)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.emit(ReduceFnRunner.java:930)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:368)
    at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94)
    at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
As I understand it now, the problem is not caused by reading from Kafka; I think it comes from incorrect windowing. I have a lot of topics (40+) and I try to read them all, which is a lot of data... I use event-time windowing to handle everything.
This is my windowing:
records.apply(Window.<AvroGenericRecord>into(
        FixedWindows.of(Duration.standardMinutes(options.getWindowInMinutes())))
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane())
        .withLateFirings(AfterPane.elementCountAtLeast(options.getElementsCountToWaitAfterWatermark())))
    .withAllowedLateness(Duration.standardHours(1))
    .discardingFiredPanes());
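As a side note on the early firings: AfterProcessingTime.pastFirstElementInPane() with no delay tends to fire early panes very frequently. A sketch of the same windowing with spaced-out early firings (the 5-minute delay is an arbitrary example, not something from the question) would be:
records.apply(Window.<AvroGenericRecord>into(
        FixedWindows.of(Duration.standardMinutes(options.getWindowInMinutes())))
    .triggering(AfterWatermark.pastEndOfWindow()
        // fire early panes at most every few minutes instead of essentially per bundle
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
            .plusDelayOf(Duration.standardMinutes(5)))
        .withLateFirings(AfterPane.elementCountAtLeast(options.getElementsCountToWaitAfterWatermark())))
    .withAllowedLateness(Duration.standardHours(1))
    .discardingFiredPanes());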
UPD 2.0
I think it happens during writing.
This is my class, which puts Avro data into GCS buckets. It should partition data by topic name and timestamp. The final output should be bucket/{topic}/{date}/{avroContainerPerWindowOrPane}.
This is how I did it.
public class DynamicAvroGenericRecordDestinations extends DynamicAvroDestinations<AvroGenericRecord, AvroDestination, GenericRecord> {

    private static final DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss");

    private final String baseDir;
    private final String fileExtension;

    public DynamicAvroGenericRecordDestinations(String baseDir, String fileExtension) {
        this.baseDir = baseDir;
        this.fileExtension = fileExtension;
    }

    @Override
    public Schema getSchema(AvroDestination destination) {
        return new Schema.Parser().parse(destination.jsonSchema);
    }

    @Override
    public GenericRecord formatRecord(AvroGenericRecord record) {
        return record.getRecord();
    }

    @Override
    public AvroDestination getDestination(AvroGenericRecord record) {
        Schema schema = record.getRecord().getSchema();
        return AvroDestination.of(record.getName(), record.getDate(), record.getVersionId(), schema.toString());
    }

    @Override
    public AvroDestination getDefaultDestination() {
        return new AvroDestination();
    }

    @Override
    public FileBasedSink.FilenamePolicy getFilenamePolicy(AvroDestination destination) {
        String pathStr = baseDir + "/" + destination.name + "/" + destination.date + "/" + destination.name;
        return new WindowedFilenamePolicy(FileBasedSink.convertToFileResourceIfPossible(pathStr), destination.version, fileExtension);
    }

    private static class WindowedFilenamePolicy extends FileBasedSink.FilenamePolicy {

        final ResourceId outputFilePrefix;
        final String fileExtension;
        final Integer version;

        WindowedFilenamePolicy(ResourceId outputFilePrefix, Integer version, String fileExtension) {
            this.outputFilePrefix = outputFilePrefix;
            this.version = version;
            this.fileExtension = fileExtension;
        }

        @Override
        public ResourceId windowedFilename(
                int shardNumber,
                int numShards,
                BoundedWindow window,
                PaneInfo paneInfo,
                FileBasedSink.OutputFileHints outputFileHints) {
            IntervalWindow intervalWindow = (IntervalWindow) window;
            String filenamePrefix =
                    outputFilePrefix.isDirectory() ? "" : firstNonNull(outputFilePrefix.getFilename(), "");
            String filename =
                    String.format("%s-%s(%s-%s)-(%s-of-%s)%s", filenamePrefix,
                            version,
                            formatter.print(intervalWindow.start()),
                            formatter.print(intervalWindow.end()),
                            shardNumber,
                            numShards - 1,
                            fileExtension);
            ResourceId result = outputFilePrefix.getCurrentDirectory();
            return result.resolve(filename, RESOLVE_FILE);
        }

        @Override
        public ResourceId unwindowedFilename(
                int shardNumber, int numShards, FileBasedSink.OutputFileHints outputFileHints) {
            throw new UnsupportedOperationException("Expecting windowed outputs only");
        }

        @Override
        public void populateDisplayData(DisplayData.Builder builder) {
            builder.add(
                    DisplayData.item("fileNamePrefix", outputFilePrefix.toString())
                            .withLabel("File Name Prefix"));
        }
    }
}
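For completeness, a destinations class like this would typically be plugged into a windowed AvroIO write roughly as sketched below; the transform name, temp directory and shard count are assumptions rather than something from the question:
records.apply("WriteAvroToGcs",
    AvroIO.<AvroGenericRecord>writeCustomTypeToGenericRecords()
        .to(new DynamicAvroGenericRecordDestinations(baseDir, ".avro"))
        .withTempDirectory(FileSystems.matchNewResource(tempDir, true)) // tempDir is an assumed option
        .withWindowedWrites()
        .withNumShards(numShards)); // assumed option; an explicit shard count bounds files per pane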

I don't think KafkaIO allows different GROUP_IDs in the same Kafka read transform. We do allow two different consumer configs, but that is because, under the hood, KafkaIO actually uses two consumers, one for messages and one for offsets, so it's a different story.
By the way, what is the problem with consuming messages from topics whose data arrives at different rates in your case?
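If separate group ids are really needed (say, one for the fast topics and one for the slow ones), a common workaround is one KafkaIO.read() per group, flattened back into a single PCollection. A minimal sketch, reusing the settings from the question; the topic lists and group names here are placeholders chosen by the reader:
private static PTransform<PBegin, PCollection<KV<String, AvroGenericRecord>>> readWithGroup(
        String bootstrapServers, List<String> topics, String groupId) {
    return KafkaIO.<String, AvroGenericRecord>read()
        .withBootstrapServers(bootstrapServers)
        .withTopics(topics)
        .withConsumerConfigUpdates(ImmutableMap.of(ConsumerConfig.GROUP_ID_CONFIG, groupId))
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
        .commitOffsetsInFinalize()
        .withoutMetadata();
}

// one read per consumer group, flattened back into a single PCollection
PCollection<KV<String, AvroGenericRecord>> records = PCollectionList
    .of(pipeline.apply("ReadFastTopics", readWithGroup(bootstrapServers, fastTopics, "group-fast")))
    .and(pipeline.apply("ReadSlowTopics", readWithGroup(bootstrapServers, slowTopics, "group-slow")))
    .apply(Flatten.pCollections());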

Related

How to avoid OutOfMemoryError using Kotlin Coroutines

I have a Ktor application with two Kafka consumers running in parallel. To achieve parallel execution I'm using coroutines, but eventually I get a java.lang.OutOfMemoryError: Java heap space in one of the consumers. I am no expert in either Kafka or Kotlin coroutines, so I'm not sure which of the two is causing the issue. I would like to rule out the coroutine implementation first, because the Kafka consumer implementation is much more complex and works fine in other applications without coroutines. The code looks like this:
private val parentJob = Job()

private val embeddedServer = embeddedServer(Netty, config.port) {
    // setting up ktor, routing, etc...
    startKafkaService(kafkaService1, "FirstConsumer", log, parentJob)
    startKafkaService(kafkaService2, "SecondConsumer", log, parentJob)
}

fun <T> CoroutineScope.startKafkaService(
    kafkaService: KafkaService<T, TbotUser>,
    serviceName: String,
    logger: Logger,
    parentJob: CompletableJob
) {
    val handler = CoroutineExceptionHandler { context, exception ->
        val jobName = context[CoroutineName.Key]?.name ?: Thread.currentThread().name
        logger.error("Exception caught in $jobName:\n${exception.stackTraceToString()}")
    }
    launch(parentJob + handler + CoroutineName(serviceName)) {
        while (isActive) kafkaService.startConsuming()
        kafkaService.close()
    }
}

fun main(args: Array<String>) {
    val logger = JsonLogger("...")
    embeddedServer.start(true)
    Runtime.getRuntime().addShutdownHook(object : Thread() {
        override fun run() = runBlocking {
            parentJob.cancelAndJoin()
        }
    })
}
and this is the error log I'm getting:
09:24:07.165 [kafka-coordinator-heartbeat-thread] ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=clientid, groupId=groupid] Heartbeat thread failed due to unexpected error
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
at java.base/java.lang.String.encodeUTF8(String.java:1265)
at java.base/java.lang.String.encode(String.java:825)
at java.base/java.lang.String.getBytes(String.java:1783)
at org.apache.kafka.common.message.HeartbeatRequestData.addSize(HeartbeatRequestData.java:239)
at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101)
at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:524)
at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:500)
at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:460)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:499)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:306)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1386)
09:24:07.407 [eventLoopGroupProxy-4-1] ERROR ktor.application - Unhandled: GET - /_/health
java.lang.OutOfMemoryError: Java heap space
09:24:08.400 [DefaultDispatcher-worker-2] INFO ktor.application - Responding at http://0.0.0.0:8080
09:24:09.151 [DefaultDispatcher-worker-1] ERROR App - Exception caught in SecondConsumer:
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1468)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
at java.base/java.lang.String.encodeUTF8(String.java:1265)
at java.base/java.lang.String.encode(String.java:825)
at java.base/java.lang.String.getBytes(String.java:1783)
at org.apache.kafka.common.message.HeartbeatRequestData.addSize(HeartbeatRequestData.java:239)
at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101)
at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:524)
at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:500)
at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:460)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:499)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:306)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1386)

How to Set Spring Kafka consumer max attempts when using Schema Registry

I am developing a Spring Boot server with Spring Kafka (1.3.2.RELEASE), Apache Avro (1.8.2) and Confluent's Schema Registry (3.1.2). Every time the Kafka listener gets a message, it finds the schema id in the message and fetches the Avro schema from the registry server by that id. The problem is that if the Schema Registry server is down, my listener keeps sending HTTP requests to the registry to get the schema whenever a message arrives (and prints a large amount of error logs), and it blocks all subsequent Kafka messages since the offset never moves on.
16:56:41.541 ERROR KafkaMessageListenerContainer$ListenerConsumer - - org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1 - Container exception
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition trade-0 at offset 810845
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 21
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:153)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:187)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:323)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:316)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:63)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndID(CachedSchemaRegistryClient.java:118)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:92)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:54)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:918)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$2600(Fetcher.java:93)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1095)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1200(Fetcher.java:944)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:567)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:528)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1086)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
I have tried to use a RetryTemplate to set the max attempts, but it didn't work; it seems the RetryTemplate may only apply to my listener method. I also didn't find any helpful config on Confluent's website.
For now I replace KafkaAvroDeserializer with a CustomAvroDeserializer, which extends KafkaAvroDeserializer and overrides its deserialize method, wrapping the body in a try-catch, like this:
@Log4j
public class CustomAvroDeserializer extends KafkaAvroDeserializer {

    @Override
    public Object deserialize(String s, byte[] bytes) {
        try {
            return this.deserialize(bytes);
        } catch (Exception e) {
            log.error("encounter a problem when deserializer message with schema registry:{}", e);
            return null;
        }
    }
}
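For reference, this custom deserializer would then be registered in the consumer factory properties in place of KafkaAvroDeserializer, roughly like this (the registry URL is a placeholder); note that returning null from deserialize means the listener has to be prepared for null payloads:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, CustomAvroDeserializer.class);
props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry URL
// ...plus bootstrap servers, group id, and the other consumer props
ConsumerFactory<String, Object> consumerFactory = new DefaultKafkaConsumerFactory<>(props);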

JBoss 6 AS with AOP throws StackOverflowError

I am using JBoss 6 AS and trying to add, via AOP, an interceptor to the classes in a certain package of a deployed application. This is the scenario:
I have an app.jar that contains the classes to which I want to add advices. This JAR also has some EJBs (ejb-jar.xml, jboss.xml).
I created my own JBoss interceptor like this:
package util;
import org.jboss.aop.joinpoint.Invocation;
import org.jboss.aop.joinpoint.MethodInvocation;
public class MyInterceptor implements org.jboss.aop.advice.Interceptor {

    @Override
    public Object invoke(Invocation invocation) throws Throwable {
        long startTime = System.currentTimeMillis();
        try {
            return invocation.invokeNext();
        } finally {
            long endTime = System.currentTimeMillis() - startTime;
            System.out.println("MyInterceptor : " + endTime);
            if (invocation instanceof MethodInvocation) {
                MethodInvocation mi = (MethodInvocation) invocation;
                String clazz = "";
                String method = "";
                try {
                    clazz = mi.getTargetObject().getClass().toString();
                    method = mi.getMethod().getName();
                } catch (Throwable e) {
                    System.out.println("Error when trying to get target info");
                }
                System.out.println("MyInterceptor : " + endTime);
            }
        }
    }

    @Override
    public String getName() {
        return "MyInterceptor";
    }
}
I created a jboss-aop.xml file that contains:
<?xml version="1.0" encoding="UTF-8"?>
<aop xmlns="urn:jboss:aop-beans:1.0">
    <interceptor name="MyInterceptor" class="util.MyInterceptor"/>
    <bind pointcut="execution(* my.app.*->*(..))">
        <interceptor-ref name="MyInterceptor"/>
    </bind>
</aop>
I have set enableLoadTimeWeaving (bootstrap/aop.xml)
I have pluggable-instrumentor.jar in the right place (JBOSS/bin)
I have started the server with the option -javaagent:pluggable-instrumentor.jar
I have created a JAR file interceptor.jar where I put MyInterceptor.class and in its META-INF I've placed the jboss-aop.xml file
Now, that being said, the problem is that when I run my application and some method from any class in the my.app package is called, the interceptor seems to intercept the call but then throws a nasty StackOverflowError. This is a part of my error stack:
java.lang.StackOverflowError
at org.jboss.resteasy.core.SynchronousDispatcher.unwrapException(SynchronousDispatcher.java:345) [:6.1.0.Final]
at org.jboss.resteasy.core.SynchronousDispatcher.handleApplicationException(SynchronousDispatcher.java:321) [:6.1.0.Final]
at org.jboss.resteasy.core.SynchronousDispatcher.handleException(SynchronousDispatcher.java:214) [:6.1.0.Final]
at org.jboss.resteasy.core.SynchronousDispatcher.handleInvokerException(SynchronousDispatcher.java:190) [:6.1.0.Final]
at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:534) [:6.1.0.Final]
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:496) [:6.1.0.Final]
at org.jboss.resteasy.core.SynchronousDispatcher.invokePropagateNotFound(SynchronousDispatcher.java:155) [:6.1.0.Final]
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:212) [:6.1.0.Final]
at org.jboss.resteasy.plugins.server.servlet.FilterDispatcher.doFilter(FilterDispatcher.java:59) [:6.1.0.Final]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274) [:6.1.0.Final]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242) [:6.1.0.Final]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275) [:6.1.0.Final]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161) [:6.1.0.Final]
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181) [:6.1.0.Final]
at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285) [:1.1.0.Final]
at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261) [:1.1.0.Final]
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88) [:6.1.0.Final]
at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100) [:6.1.0.Final]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:159) [:6.1.0.Final]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [:6.1.0.Final]
at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158) [:6.1.0.Final]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [:6.1.0.Final]
at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53) [:6.1.0.Final]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362) [:6.1.0.Final]
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877) [:6.1.0.Final]
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654) [:6.1.0.Final]
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951) [:6.1.0.Final]
at java.lang.Thread.run(Thread.java:745) [:1.7.0_79]
Caused by: java.lang.StackOverflowError
at my.app.JoinPoint_invoke_N_5164114663869737738_3.invokeJoinpoint(JoinPoint_invoke_N_5164114663869737738_3.java) [:]
at my.app.MyInterceptor$MyInterceptorAdvisor.invoke_N_5164114663869737738(MyInterceptor$MyInterceptorAdvisor.java) [:]
at my.app.MyInterceptor.invoke(MyInterceptor.java) [:]
at my.app.JoinPoint_invoke_N_5164114663869737738_3.invokeNext(JoinPoint_invoke_N_5164114663869737738_3.java) [:]
at my.app.JoinPoint_invoke_N_5164114663869737738_3.invokeJoinpoint(JoinPoint_invoke_N_5164114663869737738_3.java) [:]
at my.app.MyInterceptor$MyInterceptorAdvisor.invoke_N_5164114663869737738(MyInterceptor$MyInterceptorAdvisor.java) [:]
at my.app.MyInterceptor.invoke(MyInterceptor.java) [:]
Basically what happens is that these four lines repeat until the StackOverflowError is raised:
at my.app.JoinPoint_invoke_N_5164114663869737738_3.invokeNext(JoinPoint_invoke_N_5164114663869737738_3.java) [:]
at my.app.JoinPoint_invoke_N_5164114663869737738_3.invokeJoinpoint(JoinPoint_invoke_N_5164114663869737738_3.java) [:]
at my.app.MyInterceptor$MyInterceptorAdvisor.invoke_N_5164114663869737738(MyInterceptor$MyInterceptorAdvisor.java) [:]
at my.app.MyInterceptor.invoke(MyInterceptor.java) [:]
If anyone has had a similar problem, any help would be appreciated!
Well, I found the problem... I put the interceptor in the same package, my.app, and not in util... so it ended up calling itself endlessly until the stack was full. So... my bad.

Getting NullPointerException while using RollingSink

I am using the Windows platform. I am reading messages from Kafka and want to store them in files using RollingSink. I am receiving messages, but when I add the rolling sink to the DataStream it throws a NullPointerException. Below are the code and the stack trace.
It creates the folder structure but there is no data in it,
i.e. the folder 2016-07-13--2031 and three files in this folder:
._part-0-0.in-progress.crc, _part-0-0.in-progress, _part-0-0.pending
StreamExecutionEnvironment sev = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> kafkaStream = sev.addSource(new FlinkKafkaConsumer08<String>("test", new SimpleStringSchema(), properties));
String basePath = "C:\\project\\IOT\\testData\\SinkData";
RollingSink<String> rollingSink = new RollingSink<String>(basePath);
kafkaStream.addSink(rollingSink);
07/14/2016 00:48:48 Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:717)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:663)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:663)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:559)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:42)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1698)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1680)
at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:1733)
at org.apache.flink.streaming.connectors.fs.RollingSink.open(RollingSink.java:339)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:91)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:317)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:215)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)

NPE during concurrent thread access of a single tess4j instance

I am working with Tesseract 3.0.2 and tess4j 1.4.1. This is not working in a thread-safe manner; I get an NPE. I am using Grizzly/Jersey/Spring.
#Service("textExtractorService")
public class TextExtractorServiceImpl implements TextExtractorService {
Logger LOGGER = Logger.getLogger(TextExtractorServiceImpl.class);
private final Tesseract instance = Tesseract.getInstance(); // JNA Interface
...
..
}
...
...
public ExtractedInfo extract(BufferedImage bufferedImage)
        throws IOException {
    ExtractedInfo extractedInfo = new ExtractedInfo();
    try {
        BufferedImage preProcessed = preProcess(bufferedImage);
        String result = null;
        // the line below gives me the NPE when multiple threads call this method
        result = instance.doOCR(preProcessed);
        String[] r = StringUtils.split(result, "\n");
        extractedInfo.setRawText(r);
    } catch (TesseractException e) {
        throw new IOException(e);
    }
    return extractedInfo;
}
...
...
Full stack Trace:
SEVERE: service exception:
javax.servlet.ServletException: java.lang.Error: Invalid memory access
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:420)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at com.sun.grizzly.http.servlet.ServletAdapter$FilterChainImpl.doFilter(ServletAdapter.java:1059)
at com.sun.grizzly.http.servlet.ServletAdapter$FilterChainImpl.invokeFilterChain(ServletAdapter.java:999)
at com.sun.grizzly.http.servlet.ServletAdapter.doService(ServletAdapter.java:434)
at com.sun.grizzly.http.servlet.ServletAdapter.service(ServletAdapter.java:379)
at com.sun.grizzly.tcp.http11.GrizzlyAdapter.service(GrizzlyAdapter.java:179)
at com.sun.grizzly.tcp.http11.GrizzlyAdapterChain.service(GrizzlyAdapterChain.java:196)
at com.sun.grizzly.tcp.http11.GrizzlyAdapter.service(GrizzlyAdapter.java:179)
at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:850)
at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:747)
at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1032)
at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:231)
at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokeVoid(Native Method)
at com.sun.jna.Function.invoke(Function.java:367)
at com.sun.jna.Function.invoke(Function.java:315)
at com.sun.jna.Library$Handler.invoke(Library.java:212)
at com.sun.proxy.$Proxy55.TessBaseAPIDelete(Unknown Source)
at net.sourceforge.tess4j.Tesseract.dispose(Tesseract.java:346)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:242)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:200)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:184)
at com.vanitysoft.thirdeye.service.impl.TextExtractorServiceImpl.extract(TextExtractorServiceImpl.java:69)
at com.vanitysoft.thirdeye.web.TextExtractorResource.extract(TextExtractorResource.java:49)
I'm not sure if this is the exact same issue, but I found this answer on a similar question.
https://stackoverflow.com/a/24806132/2596497
In short, it appears that the underlying engine in Tesseract does not support multi-threading.
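A minimal defensive sketch consistent with that, applied to the extract(...) method above, is to serialize access to the shared engine so only one thread at a time calls into the native API:
// 'instance' is the shared Tesseract field from the question
String result;
synchronized (instance) {
    result = instance.doOCR(preProcessed); // only one thread touches the native handle at a time
}
Depending on the tess4j version, another option is to give each worker thread its own engine instance instead of sharing the singleton.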