Kafka producing message key as STRING even though the REST program has INT? - apache-kafka

I am using the following program to produce records in Kafka:
import java.io.IOException;
import java.security.SecureRandom;
public class SensorStatusProducer {
    private final static String TOPIC = "SENSOR_STATUS_DETAILS";
    private final static String PRODUCER_URI = "http://xxx.xxx.xxx.xxx:8082/topics/" + TOPIC;
    private final static SecureRandom randomNumber = new SecureRandom();
    private final static SensorDetails sensorDetails = new SensorDetails();
    public static void main(String[] args) {
        int[] sensorid = sensorDetails.getSensorid(); // this will return [1001,1002,1003,1004,1005]
        try {
            HttpRestProxyUtil rest = new HttpRestProxyUtil(); // this is defined in another class
            for (int sid : sensorid) {
                rest.produceRecords(PRODUCER_URI, String.format("{\"records\":[{\"key\": %d," +
                        "\"value\":{" +
                        "\"sensorid\":%d," +
                        "\"status\":%s," +
                        "\"lastconnectedtime\":%s}}]}",
                        sid, sid, "\"CONNECTED\"",
                        String.format("\"%s\"", sensorDetails.currentTimestamp()))); // currentTimestamp() is defined in another class
            }
        } catch (InterruptedException | IOException me) {
            me.printStackTrace();
        }
    }
}
The key uses the %d format specifier, but the record is produced with a key of type STRING.
This shows up when I try to create a table:
CREATE TABLE STATUS_IB_TABLE (ROWKEY INT KEY,
sensorid INTEGER,
status VARCHAR,
lastconnectedtime STRING)
WITH (TIMESTAMP='lastconnectedtime', TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss', KAFKA_TOPIC='SENSOR_STATUS_DETAILS', VALUE_FORMAT='JSON', KEY='sensorid');
The KEY is serialized as a STRING, as pointed out by @Andrew Coates.
I don't understand how that is possible.
Can someone please clarify what I am doing wrong?
PS:
=> this is a follow-up to my earlier question, "ksqlDB not taking rowkey properly"
=> Confluent Platform version: 5.5
=> This is the main class of the program.

The REST Proxy supports several content types, but none of them writes the key as a serialized 32-bit integer primitive.
Your code is therefore producing records to the topic with a string key. For an example of how to produce an INT key, see the example here, which uses kafkacat.
Since you're using Java, you could use the native Java Producer API to control exactly how the data is produced to Kafka (it is also more performant and flexible than the REST API).
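To make the difference concrete, here is a minimal sketch in plain Java (no Kafka dependency) comparing the two byte layouts for key 1001. It assumes the text form is simply the digits of the number in UTF-8, and that a serialized 32-bit integer key is 4 bytes in big-endian order, which is the layout Kafka's IntegerSerializer produces. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: KeyEncodingDemo and its methods are hypothetical names.
class KeyEncodingDemo {
    // 4-byte big-endian layout of a 32-bit integer key (ByteBuffer is big-endian by default).
    static byte[] intKeyBytes(int key) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(key).array();
    }

    // Text layout: the decimal digits of the key as UTF-8 characters.
    static byte[] textKeyBytes(int key) {
        return Integer.toString(key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intKeyBytes(1001)));  // prints [0, 0, 3, -23]
        System.out.println(Arrays.toString(textKeyBytes(1001))); // prints [49, 48, 48, 49]
    }
}
```

A consumer that expects the first layout cannot read the second one as the same number, which would be consistent with the key being treated as a STRING.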

Related

How should I define Flink's Schema to read Protocol Buffer data from Pulsar

I am using Pulsar-Flink to read data from Pulsar in Flink, and I am having difficulty when the data format is Protocol Buffers.
The example on the project's GitHub front page uses SimpleStringSchema, which does not seem to handle Protocol Buffers data. Does anyone know how to deal with this data format? How should I define the schema?
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
Properties props = new Properties();
props.setProperty("topic", "test-source-topic");
FlinkPulsarSource<String> source = new FlinkPulsarSource<>(serviceUrl, adminUrl, new SimpleStringSchema(), props);
DataStream<String> stream = see.addSource(source);
// chain operations on dataStream of String and sink the output
// end method chaining
see.execute();
FYI, I am writing Scala code, so an explanation in Scala (rather than Java) would be really helpful. That said, any kind of advice is welcome, including Java.
You should implement your own DeserializationSchema. Let's assume that you have a protobuf message Address and have generated the respective Java class. Then the schema should look like the following:
public class ProtoDeserializer implements DeserializationSchema<Address> {
    @Override
    public TypeInformation<Address> getProducedType() {
        return TypeInformation.of(Address.class);
    }
    @Override
    public Address deserialize(byte[] message) throws IOException {
        return Address.parseFrom(message);
    }
    @Override
    public boolean isEndOfStream(Address nextElement) {
        return false;
    }
}

Stateful filtering/flatMapValues in Kafka Streams?

I'm trying to write a simple Kafka Streams application (targeting Kafka 2.2/Confluent 5.2) to transform an input topic with at-least-once semantics into an exactly-once output stream. I'd like to encode the following logic:
For each message with a given key:
Read a message timestamp from a string field in the message value
Retrieve the greatest timestamp we've previously seen for this key from a local state store
If the message timestamp is less than or equal to the timestamp in the state store, don't emit anything
If the timestamp is greater than the timestamp in the state store, or the key doesn't exist in the state store, emit the message and update the state store with the message's key/timestamp
(This is guaranteed to provide correct results based on ordering guarantees that we get from the upstream system; I'm not trying to do anything magical here.)
At first I thought I could do this with the Kafka Streams flatMapValues operator, which lets you map each input message to zero or more output messages with the same key. However, that documentation explicitly warns:
This is a stateless record-by-record operation (cf. transformValues(ValueTransformerSupplier, String...) for stateful value transformation).
That sounds promising, but the transformValues documentation doesn't make it clear how to emit zero or one output messages per input message. Unless that's what the // or null aside in the example is trying to say?
flatTransform also looked somewhat promising, but I don't need to manipulate the key, and if possible I'd like to avoid repartitioning.
Anyone know how to properly perform this kind of filtering?
You could use a Transformer to implement the stateful operation you described. To avoid propagating a message downstream, return null from the transform method; this is mentioned in the Transformer Javadoc. You can then control propagation yourself via processorContext.forward(key, value). A simplified example is provided below:
kStream.transform(() -> new DemoTransformer(stateStoreName), stateStoreName);
public class DemoTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private ProcessorContext processorContext;
    private String stateStoreName;
    private KeyValueStore<String, String> keyValueStore;
    public DemoTransformer(String stateStoreName) {
        this.stateStoreName = stateStoreName;
    }
    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext processorContext) {
        this.processorContext = processorContext;
        this.keyValueStore = (KeyValueStore<String, String>) processorContext.getStateStore(stateStoreName);
    }
    @Override
    public KeyValue<String, String> transform(String key, String value) {
        // Value previously stored for this key, or null if the key has not been seen yet
        String existingValue = keyValueStore.get(key);
        if (/* your condition */) {
            processorContext.forward(key, value);
            keyValueStore.put(key, value);
        }
        // Returning null means nothing else is emitted for this input record
        return null;
    }
    @Override
    public void close() {
    }
}
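As a rough illustration of what could go in place of the /* your condition */ placeholder, here is a minimal sketch of the "forward only if the timestamp is newer" rule from the question. It is plain Java: a HashMap stands in for the state store, the timestamp is assumed to be already parsed into a long, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the transformer's decision logic;
// a HashMap plays the role of the KeyValueStore here.
class LatestTimestampFilter {
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Returns true (and records the timestamp) when the message should be forwarded.
    boolean shouldForward(String key, long timestamp) {
        Long previous = lastSeen.get(key);
        if (previous != null && timestamp <= previous) {
            return false; // duplicate or older message: emit nothing
        }
        lastSeen.put(key, timestamp); // unseen key or strictly newer timestamp
        return true;
    }
}
```

In the transformer, the same comparison would read from and write to keyValueStore instead of the map, with processorContext.forward(key, value) called only when the check passes.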

How to handle GROUPBY key(long) values using an aggregate method in kafka API

I am facing a problem when calling the aggregate method on a KGroupedStream whose key type is Long or Integer. It seems that aggregate doesn't accept a key of type Integer, Long, or Double, although it works fine with a String key, as in KGroupedStream<String, JsonNode>, using Kafka Streams.
So my conclusion so far is that the aggregate method doesn't allow keys of type Integer or Long.
My code so far is:
groupValue = groupBy((key, value) -> value.asLong(), Serialized.with(Serdes.Long(), jsonSerde));
groupValue.aggregate(
    new Initializer<Long>() {
        @Override
        public Long apply() { return value; }
    },
    new Aggregator<Long, JsonNode, Long>() {
        public Long apply(Long, JsonNode, Long) {
            return value1;
        }
    }
);
So please provide a solution to the given problem.

Kafka Streams: Appropriate way to find min value in a stream

I'm using Kafka Streams version 0.10.0.1, and trying to find the min value in a stream.
The incoming messages come from a topic called kafka-streams-topic and have a key and the value is a JSON payload that looks like this:
{"value":2334}
This is a simple payload but I want to find the min value of this JSON.
The outgoing message is just a number:
2334
and the key is also part of the message.
So if the incoming topic got:
key=1, value={"value":1000}
outgoing topic, named min-topic, would get
key=1,value=1000
another message comes through:
key=1, value={"value":100}
Because this is the same key, I would now like to produce a message with key=1 value=100, since this is smaller than the first message.
Now let's say we got:
key=2 value=99
A new message would be produced where:
key=2 and value=99 but the key=1 and associated value shouldn't change.
Additionally if we got the message:
key=1 value=2000
No message should be produced since this message is larger than the current value of 100
This works but I'm wondering if this adheres to the intent of the API:
public class MinProcessor implements Processor<String,String> {
    private ProcessorContext context;
    private KeyValueStore<String, Long> kvStore;
    private Gson gson = new Gson();
    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        this.context.schedule(1000);
        kvStore = (KeyValueStore) context.getStateStore("Counts");
    }
    @Override
    public void process(String key, String value) {
        Long incomingPotentialMin = ((Double)gson.fromJson(value, Map.class).get("value")).longValue();
        Long minForKey = kvStore.get(key);
        System.out.printf("key: %s incomingPotentialMin: %s minForKey: %s \n", key, incomingPotentialMin, minForKey);
        if (minForKey == null || incomingPotentialMin < minForKey) {
            kvStore.put(key, incomingPotentialMin);
            context.forward(key, incomingPotentialMin.toString());
            context.commit();
        }
    }
    @Override
    public void punctuate(long timestamp) {}
    @Override
    public void close() {
        kvStore.close();
    }
}
Here is the code that actually runs the processor:
public class MinLauncher {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        StateStoreSupplier countStore = Stores.create("Counts")
                .withKeys(Serdes.String())
                .withValues(Serdes.Long())
                .persistent()
                .build();
        builder.addSource("source", "kafka-streams-topic")
                .addProcessor("process", () -> new MinProcessor(), "source")
                .addStateStore(countStore, "process")
                .addSink("sink", "min-topic", "process");
        KafkaStreams streams = new KafkaStreams(builder, KafkaStreamsProperties.properties("kafka-streams-min-poc"));
        streams.cleanUp();
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
I'm not sure what your exact input data and result are (maybe you can update your question with this information: what are your input records? What is your output? What "EXTRA messages [] are produced [] that [you] don't expect"?).
However, here are a few general clarifications (I can refine this answer later if required).
You do your computation based on keys, so you should expect a result for each key (I'm not sure whether you have multiple different keys in your input).
You emit data in punctuate(), which is called periodically (based on the internally tracked stream-time, i.e., on the timestamp values extracted from your input records via TimestampExtractor). Hence, the current min value of each key is written to the topic whenever punctuate() gets called, and you can therefore have multiple updates per key that are all appended to your result topic. (Topics are append-only; if you write two messages with the same key, you see both. There is no overwrite.)
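For reference, the per-key behaviour walked through in the question (emit only when a new minimum arrives for that key) can be sketched without any Kafka classes. This is only an illustration: a HashMap stands in for the state store, and MinTracker and offer are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the "emit only on a new per-key minimum" rule.
class MinTracker {
    private final Map<String, Long> minByKey = new HashMap<>();

    // Returns the new minimum to emit for this key, or empty when nothing should be produced.
    Optional<Long> offer(String key, long candidate) {
        Long current = minByKey.get(key);
        if (current == null || candidate < current) {
            minByKey.put(key, candidate);
            return Optional.of(candidate);
        }
        return Optional.empty();
    }
}
```

Feeding it the sequence from the question (key 1 with 1000, key 1 with 100, key 2 with 99, key 1 with 2000) yields a value for the first three calls and nothing for the last one.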

How to set values in ItemPreparedStatementSetter for one to many mapping

I am trying to use JdbcBatchItemWriter for a domain object, RemittanceClaimVO. RemittanceClaimVO has a List of another domain object, ClaimVO.
public class RemittanceClaimVO {
    private long remitId;
    private List<ClaimVO> claims = new ArrayList<ClaimVO>();
    // setters and getters
}
So for each remit id there would be multiple claims, and I wish to use a single batch statement to insert all the rows.
With plain JDBC, I used to write this object by adding the values in batches, like below:
List<ClaimVO> claims = remittanceClaimVO.getClaims();
if (claims != null && !claims.isEmpty()) {
    for (ClaimVO claim : claims) {
        int counter = 1;
        stmt.setLong(counter++, remittanceClaimVO.getRemitId());
        stmt.setLong(counter++, claim.getClaimId());
        stmt.addBatch();
    }
}
stmt.executeBatch();
I am not sure how to achieve the same in Spring Batch using ItemPreparedStatementSetter.
I have tried a similar loop in the setValues method, but the values are not getting set.
@Override
public void setValues(RemittanceClaimVO remittanceClaimVO, PreparedStatement ps) throws SQLException {
    List<ClaimVO> claims = remittanceClaimVO.getClaims();
    for (ClaimVO claim : claims) {
        int counter = 1;
        ps.setLong(counter++, remittanceClaimVO.getRemitId());
        ps.setLong(counter++, claim.getClaimId());
    }
}
This seems to be another related question.
Please suggest.
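One possible direction, offered as a sketch rather than a definitive answer: assuming the writer calls setValues once per item and then adds a single batch entry for that item, the loop above would overwrite parameters 1 and 2 on every iteration, so at most the last claim could be written. Under that assumption, each (remitId, claimId) pair needs to be its own item before it reaches the writer. The flattening step could look like this in plain Java; ClaimRowFlattener and toRows are hypothetical names, and the claim ids are passed as a simple list instead of ClaimVO objects to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns one remittance with many claims into one row per claim.
class ClaimRowFlattener {
    // Each long[] is {remitId, claimId}: one row, and therefore one batch entry.
    static List<long[]> toRows(long remitId, List<Long> claimIds) {
        List<long[]> rows = new ArrayList<>();
        for (long claimId : claimIds) {
            rows.add(new long[] {remitId, claimId});
        }
        return rows;
    }
}
```

With items shaped like this, setValues would only need to set the two parameters for the single row it receives, with no loop.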