How to use withDynamicRead with KafkaIO in Apache Beam - apache-beam

I'm using KafkaIO.read() in Apache Beam and I'm trying to call withDynamicRead. I also have a basic call to withCheckStopReadingFn:
.withCheckStopReadingFn(new SerializableFunction<TopicPartition, Boolean>() {
    @Override
    public Boolean apply(TopicPartition input) {
        return false; // never tell KafkaIO to stop reading this partition
    }
})
I'm getting this error and I can't make sense of it. Does anyone know how to call withDynamicRead properly? I'm using Apache Beam version 2.29.
Exception in thread "main" org.apache.beam.sdk.transforms.display.DisplayData$InternalBuilder$PopulateDisplayDataException: Error while populating display data for component 'org.apache.beam.sdk.transforms.ParDo$MultiOutput': null

This looks like a bug in Beam. Information on contacting the Beam community (devs and other users) and on filing bug reports is at https://beam.apache.org/community/contact-us/
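For reference (not part of the original answer), here is a minimal sketch of how withDynamicRead is typically wired up in Beam 2.29. The broker address and topic name are placeholders; withDynamicRead takes the interval at which KafkaIO re-checks Kafka for new topics and partitions:
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

Pipeline p = Pipeline.create();
PCollection<KafkaRecord<String, String>> records =
    p.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")       // placeholder broker address
            .withTopic("my-topic")                        // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withDynamicRead(Duration.standardMinutes(5)) // re-check topics/partitions every 5 minutes
            .withCheckStopReadingFn(
                (SerializableFunction<TopicPartition, Boolean>) input -> false));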

Related

Is it possible to create a batch Flink job in a streaming Flink job?

I have a streaming job using Apache Flink (version 1.8.1) written in Scala. The job flow requirements are as follows:
Kafka -> Write to HBase -> Send to Kafka again with a different topic
While writing to HBase, the job needs to retrieve data from another table. To ensure that the data is not empty (NULL), the job must check repeatedly (within a certain time) whether the data is empty.
Is this possible with Flink? If so, can you provide examples for conditions similar to my needs?
Edit:
To clarify: given the problem described above, I thought about creating some kind of batch job inside the streaming job, but I couldn't find the right example for my case. So, is it possible to create a batch Flink job inside a streaming Flink job? If so, can you provide examples for conditions similar to my needs?
With more recent versions of Flink you can do lookup queries (with a configurable cache) against HBase from the SQL/Table APIs. Your use case sounds like it might be easily implemented in this fashion. See the docs for more info.
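As a rough illustration (not from the original answer), assuming Flink 1.13+ with the flink-connector-hbase-2.2 dependency, an HBase table named dim, and an already-registered Kafka-backed source table kafka_source with a processing-time attribute proc_time, a lookup join with a configurable cache could look like this:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// HBase-backed dimension table with a lookup cache (connector options and names are assumptions)
tEnv.executeSql(
    "CREATE TABLE dim_table (" +
    "  rowkey STRING," +
    "  cf ROW<v STRING>," +
    "  PRIMARY KEY (rowkey) NOT ENFORCED" +
    ") WITH (" +
    "  'connector' = 'hbase-2.2'," +
    "  'table-name' = 'dim'," +
    "  'zookeeper.quorum' = 'localhost:2181'," +
    "  'lookup.cache.max-rows' = '1000'," +
    "  'lookup.cache.ttl' = '10min'" +
    ")");

// Lookup join: each record from the (assumed) kafka_source table is enriched from HBase at processing time
tEnv.executeSql(
    "SELECT s.id, d.cf.v " +
    "FROM kafka_source AS s " +
    "JOIN dim_table FOR SYSTEM_TIME AS OF s.proc_time AS d " +
    "ON s.id = d.rowkey");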
Just to clarify my comment, I will post a sketch of what I was trying to suggest, based on the Broadcast State Pattern. The linked docs provide an example in Java, so I will follow it; the Scala version should not be too different. You will likely have to implement the code below as explained in the link I mentioned:
DataStream<String> output = colorPartitionedStream
    .connect(ruleBroadcastStream)
    .process(
        // type arguments in our KeyedBroadcastProcessFunction represent:
        // 1. the key of the keyed stream
        // 2. the type of elements in the non-broadcast side
        // 3. the type of elements in the broadcast side
        // 4. the type of the result, here a string
        new KeyedBroadcastProcessFunction<Color, Item, Rule, String>() {
            // my matching logic
        }
    );
I was suggesting that you can build the ruleBroadcastStream by polling the database (or whatever your store is) at fixed intervals. Instead of getting:
// broadcast the rules and create the broadcast state
BroadcastStream<Rule> ruleBroadcastStream = ruleStream
.broadcast(ruleStateDescriptor);
like the web page says, you will need to add a source that you can schedule to run every X minutes:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
BroadcastStream<Rule> ruleBroadcastStream = env
    .addSource(new YourStreamSource())
    .broadcast(ruleStateDescriptor);
public class YourStreamSource extends RichSourceFunction<YourType> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<YourType> ctx) throws Exception {
        while (running) {
            // TODO: yourData = FETCH DATA;
            ctx.collect(yourData);
            Thread.sleep(TimeUnit.MINUTES.toMillis(X)); // sleep for X minutes
        }
    }

    @Override
    public void cancel() {
        this.running = false;
    }
}

How to send messages one by one to Kafka

I'm new to reactive programming and I'm trying to implement a very basic scenario.
I want to send a message to Kafka each time a file is dropped into a specific folder.
I think I don't understand the basic things well... so could you please help me?
So I have a few questions:
What is the difference between smallrye-reactive-messaging and smallrye-reactive-streams-operators?
I have this simple code:
#Outgoing( "my-topic" )
public PublisherBuilder<Message<MessageWrapper>> generate() {
if(Objects.isNull(currentMessage)){
//currentMessage is an instance variable which is null when I start the application
return ReactiveStreams.of(new MessageWrapper()).map(Message::of);
}
else {
//currentMessage has been correctly set with the file information
LOGGER.info(currentMessage);
return ReactiveStreams.of(currentMessage).map(Message::of);
}
}
When the code goes into the if branch, everything is OK and I get a JSON serialization of my object with null values. However, I don't understand why, when my code goes into the else branch, nothing reaches the topic. It seems that the .of instruction in the if branch has broken the stream, or something like that...
How do I keep a continuous stream that 'reacts' to newly dropped files (or other events like an HTTP GET request)?
If I return an Integer instead of a PublisherBuilder, for example, then my Kafka topic gets populated with a huge stream of Integer values. This is why the examples use intervals when sending messages...
Should I use a CompletionStage or CompletableFuture? RxJava2? It's a bit confusing which library to use (Vert.x, SmallRye, RxJava2, MicroProfile, ...).
What are the differences between:
ReactiveStreams.fromCompletionStage
ReactiveStreams.fromProcessor
ReactiveStreams.fromPublisher
ReactiveStreams.fromSubscriber
Which one should I use in which scenario?
Thank you very much!
Let's start with the difference between smallrye-reactive-messaging and smallrye-reactive-streams-operators: smallrye-reactive-streams-operators is the same as smallrye-reactive-messaging, but in addition it supports MicroProfile Context Propagation. Since most reactive-messaging providers use Vert.x behind the scenes, your message will be processed in an event-loop style, which means it will run in a separate thread. Sometimes you need to propagate some context from your base thread into that new thread (e.g. populating the CDI and transaction context to execute some JPA EntityManager logic). This is where context propagation helps.
For method signatures, you can take a look at the official SmallRye Reactive Streams documentation, sections 3, 4 and 5. Each one has a different use case; it is up to you which flavor you want to use.
When to use what? If you are not running within a reactive context, you can use the following to send messages:
@Inject
@Channel("my-channel")
Emitter emitter;
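With the emitter injected, a minimal sketch of sending one record per dropped file could look like the following. The class name, channel name, and the folder-watching caller are hypothetical; MessageWrapper is the type from the question, and in older SmallRye releases Emitter and @Channel live in io.smallrye.reactive.messaging.annotations instead:
import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;
import org.eclipse.microprofile.reactive.messaging.Channel;
import org.eclipse.microprofile.reactive.messaging.Emitter;

@ApplicationScoped
public class FileEventPublisher {

    @Inject
    @Channel("my-topic") // hypothetical channel, mapped to a Kafka topic in the connector config
    Emitter<MessageWrapper> emitter;

    // Hypothetical callback, invoked by whatever watches the folder (e.g. a java.nio WatchService).
    public void onFileDropped(MessageWrapper wrapper) {
        emitter.send(wrapper); // emits exactly one message per file event
    }
}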
For message consumption you can use a method signature like this:
#Incoming("channel-2")
public CompletionStage doSomething(Message anEvent)
Or
#Incoming("channel-2")
public void doSomething(String anEvent)
Hope that helps.

How to set the offset.commit.policy to AlwaysCommitOffsetPolicy in debezium?

I created a Debezium embedded engine to capture MySQL change data. I want to commit the offsets as soon as I can. In the code, the config is created including the following:
.with("offset.commit.policy",OffsetCommitPolicy.AlwaysCommitOffsetPolicy.class.getName())
Running this throws: java.lang.NoSuchMethodException: io.debezium.embedded.spi.OffsetCommitPolicy$AlwaysCommitOffsetPolicy.<init>(io.debezium.config.Configuration)
However, when I start the embedded engine with
.with("offset.commit.policy", OffsetCommitPolicy.PeriodicCommitOffsetPolicy.class.getName())
the embedded engine works fine.
Note that the OffsetCommitPolicy.PeriodicCommitOffsetPolicy constructor takes a Configuration parameter, while OffsetCommitPolicy.AlwaysCommitOffsetPolicy's doesn't:
public PeriodicCommitOffsetPolicy(Configuration config) {
...
}
How to get the debezium embedded engine to use its AlwaysCommitOffsetPolicy?
Thanks for the report. This is partly a bug (which we would appreciate you logging in our Jira). You can solve the issue by calling a dedicated method on the embedded engine builder, like `io.debezium.embedded.EmbeddedEngine.create().with(OffsetCommitPolicy.always())`.
Tested with version 1.4.0.Final:
new EmbeddedEngine.BuilderImpl()         // create builder
    .using(config)                       // regular config
    .using(OffsetCommitPolicy.always())  // explicit commit policy
    .notifying(this::handleEvent)        // event processor
    .build();                            // and finally build!

Spring Batch 3.0.3 gets "updates to tables using non-transactional storage engines such as MyISAM" with MySQL 5.6

I use Spring Batch 3.0.3.RELEASE in Grails 2.4.4.
I get the following exception when I execute the code below:
"When @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1, updates to non-transactional tables can only be done in either autocommitted statements or single-statement transactions, and never in the same statement as updates to transactional tables."
The code is:
List<Flow> flowList = Lists.newArrayList()
Shop.findAllByCityIdAndTypeAndStatus(cityId, 1 as byte, 1 as byte).each { Shop stationShop ->
    TaskletStep taskletStep = stepBuilderFactory.get("copy_city_item_to_station").tasklet(new Tasklet() {
        @Override
        RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
            copyCityItemToStationItem(item, stationShop)
            return RepeatStatus.FINISHED
        }
    }).build()
    Flow flow = new FlowBuilder<Flow>("subflow").from(taskletStep).end();
    flowList.add(flow)
}
Flow splitFlow = new FlowBuilder<Flow>("split_city_item_to_station").split(eventTaskExecutor).add(flowList.toArray(new Flow[0])).build();
FlowJobBuilder builder = jobBuilderFactory.get("push_item_to_all_station").start(splitFlow).end();
Job job = builder.preventRestart().build()
jobLauncher.run(job, new JobParametersBuilder().addLong("city.item.id", item.id).toJobParameters())
Googling suggests the problem may be the one described in https://dev.mysql.com/doc/refman/5.6/en/replication-gtids-restrictions.html, so I replaced every ENGINE=MyISAM with ENGINE=InnoDB in the file 'schema-mysql.sql', and it works.
Now I want to know whether what I did is right or not. Is there a potential bug in my approach?
What you did is correct. That's a bug in Spring Batch's generated SQL file for MySQL. I've created an issue in Jira that you can follow here: https://jira.spring.io/browse/BATCH-2373.

How to catch an AssertionError in GWT production mode?

In production mode in GWT, assertions are not available, which is good, but because of a GXT error I get an assertion error, and since the necessary classes are not available all I get is a com.google.gwt.core.client.JavaScriptException in the browser, which is not enough to debug it properly. The reason I need this is that in my custom framework I created a class that's responsible for error handling:
GWT.setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
    @Override
    public void onUncaughtException(Throwable e) {
        addError(e);
    }
});

public void addError(Throwable ex) {
    if (!ex.getClass().equals(AssertionError.class)) // (ex instanceof AssertionError)
    {
        this.addError(ex, true);
    }
}
As you can see, I've tried to capture the error, but I'm unable to in production mode. I somehow need to be able to identify the exception so I can filter it. All errors get into the logs and I don't want these errors to appear there.
GXT Error => http://www.sencha.com/forum/showthread.php?171409-RowEditor-AssertionError&p=709005
Thanks for any help.
You need to enable assertions in your compiled code, and this is done very similarly to how you would do it in a standard JVM, using the -ea flag. Instead of passing it to the JVM, though, it needs to be passed to the Compiler class (or put in the program args if running from Eclipse or some other tool).
See http://code.google.com/webtoolkit/doc/latest/DevGuideCompilingAndDebugging.html#DevGuideCompilerOptions for the list of all arguments you can pass to the compiler.