I want to disable auto commit for the Kafka SimpleConsumer. I am using version 0.8.1. For the high-level consumer, config options can be set and passed via consumerConfig as follows:
kafka.consumer.Consumer.createJavaConsumerConnector(this.consumerConfig);
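For reference, here is a minimal sketch of that high-level consumer setup with auto commit disabled (these are the 0.8.x property names; the ZooKeeper address is a placeholder):
Properties props = new Properties();
props.put("zookeeper.connect", "localhost:2181");   // placeholder
props.put("group.id", "testGroup");
props.put("auto.commit.enable", "false");           // disables auto commit in the 0.8.x high-level consumer
ConsumerConfig consumerConfig = new ConsumerConfig(props);
ConsumerConnector connector = kafka.consumer.Consumer.createJavaConsumerConnector(consumerConfig);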
How can I achieve the same for SimpleConsumer? I mainly want to disable auto commit. I tried setting auto commit to false in consumer.properties and restarted the Kafka server, ZooKeeper and the producer, but that does not work. I think I need to apply this setting through code, not in consumer.properties.
Can anyone help here?
Here is what my code looks like:
List<TopicAndPartition> topicAndPartitionList = new ArrayList<>();
topicAndPartitionList.add(topicAndPartition);

OffsetFetchResponse offsetFetchResponse = consumer.fetchOffsets(
        new OffsetFetchRequest("testGroup", topicAndPartitionList, (short) 0, correlationId, clientName));
Map<TopicAndPartition, OffsetMetadataAndError> offsets = offsetFetchResponse.offsets();
long readOffset = offsets.get(topicAndPartition).offset();

FetchRequest req = new FetchRequestBuilder()
        .clientId(clientName)
        .addFetch(a_topic, a_partition, readOffset, 100000)
        .build();
FetchResponse fetchResponse = consumer.fetch(req);

// Consume messages from fetchResponse

Map<TopicAndPartition, OffsetMetadataAndError> requestInfo = new HashMap<>();
requestInfo.put(topicAndPartition, new OffsetMetadataAndError(readOffset, "metadata", (short) 0));
OffsetCommitResponse offsetCommitResponse = consumer.commitOffsets(
        new OffsetCommitRequest("testGroup", requestInfo, (short) 0, correlationId, clientName));
If the above code crashes before committing the offset, I still get the latest offset as the result of offsets.get(topicAndPartition).offset() in the next run, which makes me think that the offset is auto committed as the code executes.
Using SimpleConsumer means you take care of everything about message consumption yourself, including offset commits, so no auto commit is supported by the low-level API.
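Since nothing is committed until commitOffsets is explicitly called, you can make the commit conditional on successful processing. A rough sketch reusing the variables and request classes from the code above:
long readOffset = offsets.get(topicAndPartition).offset();
boolean processed = false;
try {
    // consume the messages from fetchResponse here
    processed = true;
} catch (Exception e) {
    // on failure, do NOT commit; the next run re-reads from the last committed offset
}
if (processed) {
    Map<TopicAndPartition, OffsetMetadataAndError> requestInfo = new HashMap<>();
    requestInfo.put(topicAndPartition, new OffsetMetadataAndError(readOffset, "metadata", (short) 0));
    consumer.commitOffsets(
            new OffsetCommitRequest("testGroup", requestInfo, (short) 0, correlationId, clientName));
}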
Related
I am using Apache Flink to read data from a Kafka topic and store it in files on the server. I am using FileSink to store the files; it creates the directory structure by date and hour, but no log files are getting created.
When I run the program it creates the directory structure as below, but no log files are stored there:
/flink/testlogs/2021-12-08--07
/flink/testlogs/2021-12-08--06
I want the logs to be written to a new log file every 15 minutes.
Below is the code:
DataStream<String> kafkaTopicData = env.addSource(
        new FlinkKafkaConsumer<String>("MyTopic", new SimpleStringSchema(), p));

OutputFileConfig config = OutputFileConfig
        .builder()
        .withPartPrefix("prefix")
        .withPartSuffix(".ext")
        .build();

DataStream<Tuple6<String, String, String, String, String, Integer>> newStream =
        kafkaTopicData.map(new LogParser());

final FileSink<Tuple6<String, String, String, String, String, Integer>> sink =
        FileSink.forRowFormat(new Path("/flink/testlogs"),
                new SimpleStringEncoder<Tuple6<String, String, String, String, String, Integer>>("UTF-8"))
        .withRollingPolicy(DefaultRollingPolicy.builder()
                .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                .withMaxPartSize(1024 * 1024 * 1024)
                .build())
        .withOutputFileConfig(config)
        .build();

newStream.sinkTo(sink);
env.execute("DataReader");
LogParser returns Tuple6.
When used in streaming mode, Flink's FileSink requires that checkpointing be enabled. To do this, you need to specify where you want checkpoints to be stored, and at what interval you want them to occur.
To configure this in flink-conf.yaml, you would do something like this:
state.checkpoints.dir: s3://checkpoint-bucket
execution.checkpointing.interval: 10s
Or in your application code you can do this:
env.getCheckpointConfig().setCheckpointStorage("s3://checkpoint-bucket");
env.enableCheckpointing(10000L);
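Put together with the job from the question, the setup would look roughly like this (a sketch: the checkpoint path is a placeholder, and passing a URI string to setCheckpointStorage assumes Flink 1.14 or later):
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// FileSink only finalizes part files when a checkpoint completes, so enable checkpointing first
env.enableCheckpointing(10_000L);                                             // every 10 seconds
env.getCheckpointConfig().setCheckpointStorage("file:///flink/checkpoints");  // placeholder location

// ... build kafkaTopicData, newStream and the FileSink exactly as in the question ...
// newStream.sinkTo(sink);
env.execute("DataReader");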
Another important detail from the docs:
Given that Flink sinks and UDFs in general do not differentiate between normal job termination (e.g. finite input stream) and termination due to failure, upon normal termination of a job, the last in-progress files will not be transitioned to the “finished” state.
I'm obviously a beginner with Kafka / Kafka Streams. I just need to read given messages from a few topics, given their IDs. While our actual topology is fairly complex, this Streams app just needs to achieve this single, simple goal.
This is how a store is created:
final StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.table(
        topic,
        Materialized.<String, String>as(persistentKeyValueStore(storeNameOf(topic)))
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String())
                .withCachingDisabled()
        // Materialized.<String, String>as(inMemoryKeyValueStore(storeNameOf(topic)))
        //         .withKeySerde(Serdes.String())
        //         .withValueSerde(Serdes.String())
        //         .withCachingDisabled()
);

KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(),
        new Properties() {{ /* config items go here */ }});
kafkaStreams.start();
// logic for awaiting kafkaStreams to reach `RUNNING` state, as well as InvalidStateStoreException handling (by retrying), is omitted for simplicity:
ReadOnlyKeyValueStore<String, String> replyStore = kafkaStreams.store(storeNameOf(topicName), QueryableStoreTypes.keyValueStore());
So, when using the commented-out inMemoryKeyValueStore materialization, replyStore is successfully created and I can query the values within it without a problem.
With persistentKeyValueStore the last line fails with java.lang.IllegalStateException: KafkaStreams is not running. State is ERROR. Note that I do check that KafkaStreams is in state RUNNING before the store call; the ERROR state is somehow reached within the call itself.
Do you think I might have missed anything when setting up the persistent store? Debugging hints would also help greatly; I'm quite stuck here, I must confess.
Thanks!
Edit: The execution happens inside a Docker container. This turned out to be quite relevant, but I omitted it initially.
As Matthias J. Sax pointed out in a comment, registering an uncaughtExceptionHandler helped greatly in debugging the problem.
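For reference, a minimal sketch of that registration (this uses the Thread.UncaughtExceptionHandler overload from older Kafka Streams releases; newer releases also offer a StreamsUncaughtExceptionHandler variant). It must be registered before start():
kafkaStreams.setUncaughtExceptionHandler((thread, throwable) -> {
    // surfaces the root cause (here, the RocksDB loading failure) instead of the
    // stream silently transitioning to ERROR
    System.err.println("Stream thread " + thread.getName() + " died: " + throwable);
});
kafkaStreams.start();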
The actual issue was an incompatibility between RocksDB and the Docker image I was using (so I changed from openjdk:8-jdk-alpine to anapsix/alpine-java:8).
Related:
https://issues.apache.org/jira/browse/KAFKA-4988
UnsatisfiedLinkError: /tmp/snappy-1.1.4-libsnappyjava.so Error loading shared library ld-linux-x86-64.so.2: No such file or directory
We are using Alpakka Kafka streams to consume events from Kafka. Here is how the stream is defined:
ConsumerSettings<GenericKafkaKey, GenericKafkaMessage> consumerSettings =
ConsumerSettings
.create(actorSystem, new KafkaJacksonSerializer<>(GenericKafkaKey.class),
new KafkaJacksonSerializer<>(GenericKafkaMessage.class))
.withBootstrapServers(servers).withGroupId(groupId)
.withClientId(clientId).withProperties(clientConfigs.defaultConsumerConfig());
CommitterSettings committerSettings = CommitterSettings.create(actorSystem)
.withMaxBatch(20)
.withMaxInterval(Duration.ofSeconds(30));
Consumer.DrainingControl<Done> control =
Consumer.committableSource(consumerSettings, Subscriptions.topics(topics))
.mapAsync(props.getMessageParallelism(), msg ->
CompletableFuture.supplyAsync(() -> consumeMessage(msg), actorSystem.dispatcher())
.thenCompose(param -> CompletableFuture.supplyAsync(() -> msg.committableOffset())))
.toMat(Committer.sink(committerSettings), Keep.both())
.mapMaterializedValue(Consumer::createDrainingControl)
.run(materializer);
Here is the piece of code that is shutting down the stream:
CompletionStage<Done> completionStage = control.drainAndShutdown(actorSystem.dispatcher());
completionStage.toCompletableFuture().join();
I tried calling get on the completable future too, but neither join nor get returns. Has anyone else faced a similar problem? Is there something I am doing wrong here?
If you want to control stream termination from outside the stream, you need to use a KillSwitch: https://doc.akka.io/docs/akka/current/stream/stream-dynamic.html
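For illustration, a minimal sketch of a KillSwitch on a plain source (not wired into the Alpakka consumer above; the source and sink here are placeholders):
Pair<UniqueKillSwitch, CompletionStage<Done>> pair =
        Source.range(1, Integer.MAX_VALUE)
                .viaMat(KillSwitches.single(), Keep.right())
                .toMat(Sink.ignore(), Keep.both())
                .run(materializer);

pair.first().shutdown();  // completes the stream from the outside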
Your usage looks correct and I can't identify anything that would hinder draining.
A common thing to miss with Alpakka Kafka consumers is the stop-timeout, which defaults to 30 seconds.
When using the DrainingControl you can safely set it to 0 seconds.
See https://doc.akka.io/docs/alpakka-kafka/current/consumer.html#draining-control
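For example (a sketch; withStopTimeout is assumed to be available on ConsumerSettings in recent Alpakka Kafka versions, and the equivalent config key is akka.kafka.consumer.stop-timeout):
// With Consumer.DrainingControl the stream is drained explicitly, so the extra
// stop-timeout wait is not needed and can be set to zero.
ConsumerSettings<GenericKafkaKey, GenericKafkaMessage> settings =
        consumerSettings.withStopTimeout(Duration.ZERO);
// or equivalently in application.conf:
// akka.kafka.consumer.stop-timeout = 0s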
I have an Apache Storm topology and would like to perform a certain action every once in a while. I'm not sure how to approach this in a way which would be natural and elegant.
Should it be a Bolt or a Spout using ScheduledExecutorService, or something else?
Tick tuples are a decent option: https://kitmenke.com/blog/2014/08/04/tick-tuples-within-storm/
Edit: Here's the essential code for your bolt:
@Override
public Map<String, Object> getComponentConfiguration() {
    // configure how often a tick tuple will be sent to our bolt
    Config conf = new Config();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
    return conf;
}
Then you can use TupleUtils.isTick(tuple) in execute to check whether the received tuple is a tick tuple.
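For example, a sketch of the matching execute method (doPeriodicWork and the regular processing are placeholders):
@Override
public void execute(Tuple tuple) {
    if (TupleUtils.isTick(tuple)) {
        // fires roughly every TOPOLOGY_TICK_TUPLE_FREQ_SECS seconds (300 above)
        doPeriodicWork();  // placeholder for the periodic action
    } else {
        // regular tuple processing and acking goes here
    }
}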
I don't know if this is a correct approach, but it seems to be working fine:
At the end of the prepare method of a Bolt, I added a call to initScheduler(), which contains the following code:
Calendar calendar = Calendar.getInstance();
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(
        new PeriodicAction(),          // class implementing Runnable
        millisToFullHour(calendar),    // initial delay: start at the top of the hour
        60 * 60 * 1000,                // period: run every hour
        TimeUnit.MILLISECONDS);
This needs to be used with caution though, because the bolt can have multiple instances depending on your setup.
I'm trying to create a Kafka producer that sends messages to Kafka brokers (and not to ZooKeeper).
I know that the better practice is to work with ZooKeeper, but for the moment I would like to send messages directly to a broker.
To do that, I'm setting the property "broker.list" as described in the documentation. The thing is, it appears that in order for it to work a minimum of 3 brokers is required (otherwise I get an exception).
In the source code of Kafka I can see:
if(brokerInfo.size < 3) throw new InvalidConfigException("broker.list has invalid value")
This is weird, because in my data center I have only 2 Kafka nodes (and 3 ZooKeeper nodes). What can I do in this case?
Is there a way around this?
brokerInfo is obtained by splitting an individual broker entry, NOT by counting the number of brokers. If you check the source code more carefully, you will see something like:
// check if each individual broker info is valid => (brokerId: brokerHost: brokerPort)
and then they split this info as below
brokerInfoList.foreach { bInfo =>
val brokerInfo = bInfo.split(":")
if(brokerInfo.size < 3) throw new InvalidConfigException("broker.list has invalid value")
}
So every single broker entry is expected to have an id, a host name and a port, separated by the : delimiter.
Regarding the number of brokers, it basically just does this:
val brokerInfoList = config.brokerList.split(",")
if(brokerInfoList.size == 0) throw new InvalidConfigException("broker.list is empty")
So you should be fine with a single broker, I guess; just try passing one and it should work. Let us know how it goes.
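In other words, a value like the following should satisfy the parsing above (host names and ports are placeholders):
Properties props = new Properties();
// each entry is brokerId:host:port; multiple brokers are comma-separated
props.put("broker.list", "0:broker1.example.com:9092,1:broker2.example.com:9092");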
Apparently when writing
props.put("broker.list", "0:" + <host:port>);
it works (I added the "0:" to the original string).
I found this in section 9 of the quick start guide.
I'm not sure I understand it; maybe this zero is the partition number(?), maybe something else (it would be nice if someone could shed some light here).