Apache Flink - how to send and consume POJOs using AWS Kinesis

I want to consume POJOs arriving from Kinesis with Flink.
Is there any standard for how to correctly send and deserialize the messages?
Thanks

I resolved it with:
DataStream<SamplePojo> kinesis = see.addSource(new FlinkKinesisConsumer<>(
"my-stream",
new POJODeserializationSchema(),
kinesisConsumerConfig));
and
public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
private ObjectMapper mapper;
@Override
public SamplePojo deserialize(byte[] message) throws IOException {
if (mapper == null) {
mapper = new ObjectMapper();
}
SamplePojo retVal = mapper.readValue(message, SamplePojo.class);
return retVal;
}
@Override
public boolean isEndOfStream(SamplePojo nextElement) {
return false;
}
}

Related

Spring framework integration TCP IP - Client application SSL not working and posting incomplete requests

I am new to Spring framework. We have a requirement where our application is acting as a client and needs to integrate with another application using TCP. We will be sending them fixed length requests and we will receive response for the same. We have been asked to use the same TCP connection for each request. Using the same open connection, our application will also be receiving heartbeat messages from server application and we do not need to send any response for them.
Each request message is a header plus a body, where the header carries the message type and length.
We will be using SSL. When we test with SSL, getConnection does not throw any exception, but we do not receive any heartbeat messages.
When we test without SSL, we can send requests and receive responses as well as heartbeat messages. But after the first request/response, subsequent requests reach the server application only partially, so the server sees an unexpected message and the connection is reset by the peer.
I have tried many things from the online documentation but have not been able to implement the requirement.
Please find below code. Thanks in advance.
public class ClientConfig implements ApplicationEventPublisherAware{
protected String port;
protected String host;
protected String connectionTimeout;
protected String keyStorePath;
protected String trustStorePath;
protected String keyStorePassword;
protected String trustStorePassword;
protected String protocol;
private ApplicationEventPublisher applicationEventPublisher;
@Override
public void setApplicationEventPublisher(ApplicationEventPublisher applicationEventPublisher) {
this.applicationEventPublisher = applicationEventPublisher;
}
@Bean
public DefaultTcpNioSSLConnectionSupport connectionSupport() {
if("SSL".equalsIgnoreCase(getProtocol())) {
DefaultTcpSSLContextSupport sslContextSupport =
new DefaultTcpSSLContextSupport(getKeyStorePath(),
getTrustStorePath(), getKeyStorePassword(), getTrustStorePassword());
sslContextSupport.setProtocol(getProtocol());
DefaultTcpNioSSLConnectionSupport tcpNioConnectionSupport =
new DefaultTcpNioSSLConnectionSupport(sslContextSupport);
return tcpNioConnectionSupport;
}
return null;
}
@Bean
public AbstractClientConnectionFactory clientConnectionFactory() {
if(StringUtils.isNullOrEmptyTrim(getHost()) || StringUtils.isNullOrEmptyTrim(getPort())) {
return null;
}
TcpNioClientConnectionFactory tcpNioClientConnectionFactory =
new TcpNioClientConnectionFactory(getHost(), Integer.valueOf(getPort()));
tcpNioClientConnectionFactory.setApplicationEventPublisher(applicationEventPublisher);
tcpNioClientConnectionFactory.setSoKeepAlive(true);
tcpNioClientConnectionFactory.setDeserializer(new CustomSerializerDeserializer());
tcpNioClientConnectionFactory.setSerializer(new CustomSerializerDeserializer());
tcpNioClientConnectionFactory.setLeaveOpen(true);
tcpNioClientConnectionFactory.setSingleUse(false);
if("SSL".equalsIgnoreCase(getProtocol())) {
tcpNioClientConnectionFactory.setSslHandshakeTimeout(60);
tcpNioClientConnectionFactory.setTcpNioConnectionSupport(connectionSupport());
}
return tcpNioClientConnectionFactory;
}
@Bean
public MessageChannel outboundChannel() {
return new DirectChannel();
}
@Bean
public PollableChannel receiverChannel() {
return new QueueChannel();
}
@Bean
@ServiceActivator(inputChannel = "outboundChannel")
public TcpSendingMessageHandler outboundClient
(AbstractClientConnectionFactory clientConnectionFactory) {
TcpSendingMessageHandler outbound = new TcpSendingMessageHandler();
outbound.setConnectionFactory(clientConnectionFactory);
if(!StringUtils.isNullOrEmpty(getConnectionTimeout())) {
long timeout = Long.valueOf(getConnectionTimeout());
outbound.setRetryInterval(TimeUnit.SECONDS.toMillis(timeout));
}
outbound.setClientMode(true);
return outbound;
}
@Bean
public TcpReceivingChannelAdapter inboundClient(TcpNioClientConnectionFactory connectionFactory) {
TcpReceivingChannelAdapter inbound = new TcpReceivingChannelAdapter();
inbound.setConnectionFactory(connectionFactory);
if(!StringUtils.isNullOrEmpty(getConnectionTimeout())) {
long timeout = Long.valueOf(getConnectionTimeout());
inbound.setRetryInterval(TimeUnit.SECONDS.toMillis(timeout));
}
inbound.setOutputChannel(receiverChannel());
inbound.setClientMode(true);
return inbound;
}
}
public class CustomSerializerDeserializer implements Serializer<String>, Deserializer<String> {
@Override
public String deserialize(InputStream inputStream) throws IOException {
int i = 0;
byte[] lenbuf = new byte[8];
String message = null;
while ((i = inputStream.read(lenbuf)) != -1) {
String messageType = new String(lenbuf);
if(messageType.contains(APP_DATA_LEN)){
byte byteResp[] = new byte[RESP_MSG_LEN-8];
inputStream.read(byteResp, 0, RESP_MSG_LEN-8);
String readMsg = new String(byteResp);
message = messageType + readMsg;
}else {
byte byteResp[] = new byte[HANDSHAKE_LEN-8];
inputStream.read(byteResp, 0, HANDSHAKE_LEN-8);
String readMsg = new String(byteResp);
message = messageType + readMsg;
}
}
return message;
}
@Override
public void serialize(String object, OutputStream outputStream) throws IOException {
outputStream.write(object.getBytes());
outputStream.flush();
}
}
@Override
public String sendMessage(String message) {
Message<String> request = MessageBuilder.withPayload(message).build();
DirectChannel outboundChannel = (DirectChannel) applicationContext.getBean(DirectChannel.class);
outboundChannel.send(request);
}
//Below code is being used to open connection
TcpNioClientConnectionFactory cf = (TcpNioClientConnectionFactory) applicationContext.getBean(AbstractClientConnectionFactory.class);
if(cf != null) {
TcpNioConnection conn = (TcpNioConnection) cf.getConnection();
}
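One detail worth checking in the deserializer above: InputStream.read(byte[]) may return fewer bytes than requested, and the while loop keeps reading until end of stream instead of returning after one message. Whether that explains the partial requests is not certain from the question alone. The sketch below shows one way to read exactly one fixed-length frame per call using only java.io. The class name, the marker string, and the length constants are hypothetical stand-ins for APP_DATA_LEN, RESP_MSG_LEN, and HANDSHAKE_LEN, whose real values the question does not show.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FixedLengthFrameReader {
    // Hypothetical values; the real constants are not shown in the question.
    static final int HEADER_LEN = 8;
    static final String APP_DATA_MARKER = "DATA";
    static final int RESP_MSG_LEN = 16;   // total length of a data response
    static final int HANDSHAKE_LEN = 12;  // total length of a heartbeat/handshake

    /**
     * Reads exactly one frame (header + body) and returns it.
     * Throws EOFException if the stream ends before the frame is complete.
     */
    public static String readFrame(InputStream in) throws IOException {
        // DataInputStream does not buffer ahead, so wrapping per call is safe.
        DataInputStream data = new DataInputStream(in);

        // readFully blocks until the whole header has arrived.
        byte[] header = new byte[HEADER_LEN];
        data.readFully(header);
        String headerText = new String(header, StandardCharsets.US_ASCII);

        // Pick the total frame length from the header, then read the rest.
        int total = headerText.contains(APP_DATA_MARKER) ? RESP_MSG_LEN : HANDSHAKE_LEN;
        byte[] body = new byte[total - HEADER_LEN];
        data.readFully(body);

        return headerText + new String(body, StandardCharsets.US_ASCII);
    }
}
```

A deserialize(InputStream) implementation could delegate to a method like this so that each call consumes one message and leaves the following bytes in the stream for the next call.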

Kafka RecordFilterStrategy does not filter records when using spring-kafka ReplyingKafkaTemplate

Hi, I have the following configuration for ReplyingKafkaTemplate, and I want to filter messages before they reach the consumer based on the correlation ID, but for some reason the filter is not applied. Can anyone suggest what is wrong?
@Bean
public ConcurrentMessageListenerContainer<String, FireflyResponse> replyContainer() {
ConcurrentKafkaListenerContainerFactory<String, FireflyResponse> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
RetryTemplate retryTemplate = new RetryTemplate();
retryTemplate.setRetryPolicy(new SimpleRetryPolicy(retry));
factory.setRetryTemplate(retryTemplate);
factory.setConcurrency(3);
factory.setBatchListener(true);
factory.setAckDiscarded(true);
factory.setRecordFilterStrategy(new RecordFilterStrategy<String, FireflyResponse>() {
@Override
public boolean filter(ConsumerRecord<String, FireflyResponse> consumerRecord) {
return consumerRecord.headers().lastHeader(KafkaHeaders.CORRELATION_ID) == null;
}
});
return factory.createContainer(responseTopic);
}
@Bean
public ReplyingKafkaTemplate<String, FireflyRequest, FireflyResponse> kafkaTemplate(
ConcurrentMessageListenerContainer<String, FireflyResponse> replyContainer) {
ReplyingKafkaTemplate<String, FireflyRequest, FireflyResponse> template = new ReplyingKafkaTemplate<>(
producerFactory(), replyContainer);
template.setDefaultReplyTimeout(Duration.ofSeconds(connectionTimeout));
template.setSharedReplyTopic(true);
return template;
}
The replying template ALWAYS sets the correlation id header...
@Override
public RequestReplyFuture<K, V, R> sendAndReceive(ProducerRecord<K, V> record, @Nullable Duration replyTimeout) {
Assert.state(this.running, "Template has not been start()ed"); // NOSONAR (sync)
CorrelationKey correlationId = this.correlationStrategy.apply(record);
Assert.notNull(correlationId, "the created 'correlationId' cannot be null");
...
It needs it to correlate the reply with a request.
EDIT
It appears you are trying to filter the response; that is not supported; only requests are filtered.
Simply return null from the listener if you don't want to reply.

Apache Beam IOException in decoder

I have a simple pipeline that reads from Kafka with the KafkaIO reader, applies a few transforms, and finally writes to GCP in Avro format. When I run the pipeline on Dataflow it works perfectly, but with the DirectRunner it reads all the data from the topics and then throws this exception:
java.lang.IllegalArgumentException: Forbidden IOException when reading from InputStream
at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118)
at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98)
at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141)
at org.apache.beam.runners.direct.CloningBundleFactory$CloningBundle.add(CloningBundleFactory.java:84)
at org.apache.beam.runners.direct.GroupAlsoByWindowEvaluatorFactory$OutputWindowedValueToBundle.outputWindowedValue(GroupAlsoByWindowEvaluatorFactory.java:251)
at org.apache.beam.runners.direct.GroupAlsoByWindowEvaluatorFactory$OutputWindowedValueToBundle.outputWindowedValue(GroupAlsoByWindowEvaluatorFactory.java:237)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1057)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:438)
at org.apache.beam.repackaged.direct_java.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:768)
at org.apache.beam.runners.direct.GroupAlsoByWindowEvaluatorFactory$GroupAlsoByWindowEvaluator.processElement(GroupAlsoByWindowEvaluatorFactory.java:185)
at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at org.apache.beam.sdk.util.VarInt.decodeLong(VarInt.java:73)
at org.apache.beam.sdk.coders.IterableLikeCoder.decode(IterableLikeCoder.java:136)
at org.apache.beam.sdk.coders.IterableLikeCoder.decode(IterableLikeCoder.java:60)
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:115)
... 19 more
I use a custom serializer and deserializer to read Avro and get the payload.
Kafka Reader
private PTransform<PBegin, PCollection<KV<String, AvroGenericRecord>>> createKafkaRead(Map<String, Object> configUpdates) {
return KafkaIO.<String, AvroGenericRecord>read()
.withBootstrapServers(bootstrapServers)
.withConsumerConfigUpdates(configUpdates)
.withTopics(inputTopics)
.withKeyDeserializer(StringDeserializer.class)
.withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
.withMaxNumRecords(maxNumRecords)
.commitOffsetsInFinalize()
.withoutMetadata();
}
AvroGenericCoder
public class AvroGenericCoder extends CustomCoder<AvroGenericRecord> {
private final Map<String, Object> config;
private transient BeamKafkaAvroGenericDeserializer deserializer;
private transient BeamKafkaAvroGenericSerializer serializer;
public static AvroGenericCoder of(Map<String, Object> config) {
return new AvroGenericCoder(config);
}
protected AvroGenericCoder(Map<String, Object> config) {
this.config = config;
}
private BeamKafkaAvroGenericDeserializer getDeserializer() {
if (deserializer == null) {
BeamKafkaAvroGenericDeserializer d = new BeamKafkaAvroGenericDeserializer();
d.configure(config, false);
deserializer = d;
}
return deserializer;
}
private BeamKafkaAvroGenericSerializer getSerializer() {
if (serializer == null) {
serializer = new BeamKafkaAvroGenericSerializer();
}
return serializer;
}
@Override
public void encode(AvroGenericRecord record, OutputStream outStream) {
getSerializer().serialize(record, outStream);
}
@Override
public AvroGenericRecord decode(InputStream inStream) {
try {
return getDeserializer().deserialize(null, IOUtils.toByteArray(inStream));
} catch (IOException e) {
throw new RuntimeException("Error translating into bytes ", e);
}
}
@Override
public void verifyDeterministic() {
}
@Override
public Object structuralValue(AvroGenericRecord value) {
return super.structuralValue(value);
}
@Override
public int hashCode() {
return HashCodeBuilder.reflectionHashCode(this);
}
@Override
public boolean equals(Object obj) {
return EqualsBuilder.reflectionEquals(this, obj);
}
}
This is the main pipeline:
PCollection<AvroGenericRecord> records = p.apply(readKafkaTr)
.apply(Window.<AvroGenericRecord>into(FixedWindows.of(Duration.standardMinutes(options.getWindowInMinutes())))
.triggering(AfterWatermark.pastEndOfWindow()
.withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
.plusDelayOf(Duration.standardMinutes(options.getWindowInMinutes())))
.withLateFirings(AfterPane.elementCountAtLeast(options.getElementsCountToWaitAfterWatermark())))
.withAllowedLateness(Duration.standardDays(options.getAfterWatermarkInDays()))
.discardingFiredPanes()
);
records.apply(Filter.by((ProcessFunction<AvroGenericRecord, Boolean>) Objects::nonNull))
.apply(new WriteAvroFilesTr(options.getBasePath(), options.getNumberOfShards()));
Yes, I think @RyanSkraba is right: the DirectRunner does many things that other runners do not (its initial goal was to be used for testing, so it performs many additional checks compared to other runners).
Btw, why not use Beam's AvroCoder in this case? A simple example of how to use it with KafkaIO:
https://github.com/aromanenko-dev/beam-issues/blob/master/kafka-io/src/main/java/KafkaAvro.java
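A likely cause, consistent with the EOFException raised from IterableLikeCoder in the stack trace, is that the custom decode method reads the stream to the end via IOUtils.toByteArray(inStream). When a coder is nested inside another one (for example inside KvCoder or an iterable coder after a grouping step), its bytes are followed by other elements' bytes, so consuming the whole stream leaves nothing for the outer coder. This is an assumption drawn from the trace and the posted coder, not a confirmed diagnosis. The sketch below shows the usual remedy, a length prefix, with plain java.io only; the class name and the String payload are hypothetical stand-ins and not Beam API.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedCodec {

    /** Writes a 4-byte length followed by the payload bytes. */
    public static void encode(String value, OutputStream out) throws IOException {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /**
     * Reads the length prefix, then exactly that many bytes.
     * Bytes belonging to later elements stay in the stream.
     */
    public static String decode(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

With this layout, several values can be written back to back into one stream and decoded one at a time, which is what a nested coder needs. Using Beam's own AvroCoder, as suggested above, avoids having to maintain such framing by hand.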

Unable to send message with KafkaNull as Value

I am building a Kafka application that uses log compaction on a topic, but I am not able to send a tombstone value (KafkaNull).
I tried the default serializer configuration, and when that did not work I applied the changes suggested in "Publish null/tombstone message with raw headers" and set application.properties to:
spring.cloud.stream.output.producer.useNativeEncoding=true
spring.cloud.stream.kafka.binder.configuration.value.serializer=org.springframework.kafka.support.serializer.JsonSerializer
The code I have to send a message to a stream is
this.stockTopics.compactedStocks().send(MessageBuilder
.withPayload(KafkaNull.INSTANCE)
.setHeader(KafkaHeaders.MESSAGE_KEY,company.getBytes())
.build())
this.stockTopics.compactedStocks() returns a messageStream that I can send messages to.
Every time I try to send that message with a KafkaNull instance as the payload, I get the error Failed to convert message: 'GenericMessage [payload=org.springframework.kafka.support.KafkaNull@1c2d8163, headers={id=f81857e7-fbd0-56f5-8418-6a1944e7f2b1, kafka_messageKey=[B@36ec022a, contentType=application/json, timestamp=1547827957485}]' to outbound message.
I expect the message to simply be sent to the consumer with a null value but obviously it errors.
I opened a GitHub issue for this.
EDIT
Workaround - this works...
@SpringBootApplication
@EnableBinding(Source.class)
public class So54257687Application {
public static void main(String[] args) {
SpringApplication.run(So54257687Application.class, args);
}
@Bean
public ApplicationRunner runner(MessageChannel output) {
return args -> output.send(new GenericMessage<>(KafkaNull.INSTANCE));
}
@KafkaListener(id = "foo", topics = "output")
public void listen(@Payload(required = false) byte[] in) {
System.out.println(in);
}
@Bean
@StreamMessageConverter
public MessageConverter kafkaNullConverter() {
class KafkaNullConverter extends AbstractMessageConverter {
KafkaNullConverter() {
super(Collections.emptyList());
}
@Override
protected boolean supports(Class<?> clazz) {
return KafkaNull.class.equals(clazz);
}
@Override
protected Object convertFromInternal(Message<?> message, Class<?> targetClass, Object conversionHint) {
return message.getPayload();
}
@Override
protected Object convertToInternal(Object payload, MessageHeaders headers, Object conversionHint) {
return payload;
}
}
return new KafkaNullConverter();
}
}

Why does using the Kryo serialization framework in Apache Storm overwrite data when the bolt gets the values?

Most developers probably use Avro as the serialization framework with Kafka and Apache Storm schemes. But I need to handle more complex data, so I chose the Kryo serialization framework and successfully integrated it into our project, which runs on Kafka and Apache Storm. However, when I process the data further, I see strange behavior.
I sent 5 messages to Kafka, and the Storm job read and deserialized all 5 successfully. But the next bolt gets the wrong values: it prints the same value as the last message every time. I added a log statement right after deserialization, and it does print 5 different messages there. Why can't the next bolt get the correct values? See my code below:
KryoScheme.java
public abstract class KryoScheme<T> implements Scheme {
private static final long serialVersionUID = 6923985190833960706L;
private static final Logger logger = LoggerFactory.getLogger(KryoScheme.class);
private Class<T> clazz;
private Serializer<T> serializer;
public KryoScheme(Class<T> clazz, Serializer<T> serializer) {
this.clazz = clazz;
this.serializer = serializer;
}
@Override
public List<Object> deserialize(byte[] buffer) {
Kryo kryo = new Kryo();
kryo.register(clazz, serializer);
T scheme = null;
try {
scheme = kryo.readObject(new Input(new ByteArrayInputStream(buffer)), this.clazz);
logger.info("{}", scheme);
} catch (Exception e) {
String errMsg = String.format("Kryo Scheme failed to deserialize data from Kafka to %s. Raw: %s",
clazz.getName(),
new String(buffer));
logger.error(errMsg, e);
throw new FailedException(errMsg, e);
}
return new Values(scheme);
}}
PrintFunction.java
public class PrintFunction extends BaseFunction {
private static final Logger logger = LoggerFactory.getLogger(PrintFunction.class);
@Override
public void execute(TridentTuple tuple, TridentCollector collector) {
List<Object> data = tuple.getValues();
if (data != null) {
logger.info("Scheme data size: {}", data.size());
for (Object value : data) {
PrintOut out = (PrintOut) value;
logger.info("{}.{}--value: {}",
Thread.currentThread().getName(),
Thread.currentThread().getId(),
out.toString());
collector.emit(new Values(out));
}
}
}}
StormLocalTopology.java
public class StormLocalTopology {
public static void main(String[] args) {
........
BrokerHosts zk = new ZkHosts("xxxxxx");
Config stormConf = new Config();
stormConf.put(Config.TOPOLOGY_DEBUG, false);
stormConf.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 1000 * 5);
stormConf.put(Config.TOPOLOGY_WORKERS, 1);
stormConf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 5);
stormConf.put(Config.TOPOLOGY_TASKS, 1);
TridentKafkaConfig actSpoutConf = new TridentKafkaConfig(zk, topic);
actSpoutConf.fetchSizeBytes = 5 * 1024 * 1024 ;
actSpoutConf.bufferSizeBytes = 5 * 1024 * 1024 ;
actSpoutConf.scheme = new SchemeAsMultiScheme(scheme);
actSpoutConf.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
TridentTopology topology = new TridentTopology();
TransactionalTridentKafkaSpout actSpout = new TransactionalTridentKafkaSpout(actSpoutConf);
topology.newStream(topic, actSpout).parallelismHint(4).shuffle()
.each(new Fields("act"), new PrintFunction(), new Fields());
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(topic+"Topology", stormConf, topology.build());
}}
There is also another problem: why can the Kryo scheme only read one message buffer at a time? Is there a way to get multiple message buffers so the data can be sent to the next bolt in a batch?
Also, if I send 1 message, the full flow seems to succeed.
If I send 2 messages, it goes wrong. The printed output looks like this:
56157 [Thread-18-spout0] INFO s.s.a.s.s.c.KryoScheme - 2016-02-05T17:20:48.122+0800,T6mdfEW#N5pEtNBW
56160 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Scheme data size: 1
56160 [Thread-18-spout0] INFO s.s.a.s.s.c.KryoScheme - 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
56161 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Thread-20-b-0.99--value: 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
56162 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Scheme data size: 1
56162 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Thread-20-b-0.99--value: 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
I'm sorry, this was my mistake. I found a bug in my Kryo deserialization class: it kept the event in an instance field instead of a local variable, so the value could be overwritten in a multi-threaded environment. After moving the variable to method-local scope, the code runs well.
Reference code below:
public class KryoSerializer<T extends BasicEvent> extends Serializer<T> implements Serializable {
private static final long serialVersionUID = -4684340809824908270L;
// Wrong: keeping the event in an instance field means every read overwrites the same object.
// private T event;
@Override
public void write(Kryo kryo, Output output, T event) {
event.write(output);
}
@Override
public T read(Kryo kryo, Input input, Class<T> type) {
// Create a new instance on every call; this assumes the type can be instantiated (e.g. has a no-arg constructor).
T event = kryo.newInstance(type);
event.read(input);
return event;
}
}
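The effect described above (two messages deserialized correctly, but the bolt printing the second value twice) can be reproduced without Kryo or Storm. The following is a minimal sketch using only the JDK; Event, SharedReader, and FreshReader are hypothetical names standing in for the event type and the two serializer variants, not classes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SharedInstanceDemo {

    /** Hypothetical stand-in for the deserialized event type. */
    static class Event {
        String value;
        void read(String input) { this.value = input; }
    }

    /** Buggy variant: every read() fills and returns the same shared object. */
    static class SharedReader {
        private final Event event = new Event();
        Event read(String input) {
            event.read(input);
            return event;
        }
    }

    /** Fixed variant: every read() creates a new object through a factory. */
    static class FreshReader {
        private final Supplier<Event> factory;
        FreshReader(Supplier<Event> factory) { this.factory = factory; }
        Event read(String input) {
            Event event = factory.get();
            event.read(input);
            return event;
        }
    }

    /** Reads all inputs first, then collects the values, like a downstream consumer would. */
    public static List<String> readAllShared(String... inputs) {
        SharedReader reader = new SharedReader();
        List<Event> events = new ArrayList<>();
        for (String input : inputs) {
            events.add(reader.read(input));
        }
        return values(events);
    }

    public static List<String> readAllFresh(String... inputs) {
        FreshReader reader = new FreshReader(Event::new);
        List<Event> events = new ArrayList<>();
        for (String input : inputs) {
            events.add(reader.read(input));
        }
        return values(events);
    }

    private static List<String> values(List<Event> events) {
        List<String> result = new ArrayList<>();
        for (Event event : events) {
            result.add(event.value);
        }
        return result;
    }
}
```

With the shared variant, every element handed downstream is the same object, so a consumer that looks at it after the next read sees only the latest value. The factory-based variant avoids this and also sidesteps the fact that Java cannot instantiate a type parameter directly.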