How to consume messages from kafka producer in batches (kafka-python) - apache-kafka

I have a Kafka producer and a consumer in Python. I want to consume messages from the Kafka producer in batches, let's say 2 at a time. From the producer, I have been sending email data like the following:
[{
    "email": "sukhi215c#gmail.com",
    "subject": "Test 1",
    "message": "this is a test"
},
{
    "email": "sukhi215c#gmail.com",
    "subject": "Test 2",
    "message": "this is a test"
},
{
    "email": "sukhi215c#gmail.com",
    "subject": "Test 3",
    "message": "this is a test"
},
{
    "email": "sukhi215c#gmail.com",
    "subject": "Test 4",
    "message": "this is a test"
}]
I am trying to consume this data in batches: 2 messages at a time, send emails based on those 2 records, then consume the next set. The workaround that I tried is:
consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))
for message in consumer[:2]:
    string = message.value.decode("utf-8")
    dict_value = ast.literal_eval(string)
The error that I am getting is:
for message in consumer[:2]:
TypeError: 'KafkaConsumer' object is not subscriptable
Can someone help me get past this?

The consumer is not a collection; its iterator is infinite.
If you want to perform an action every two events, use a counter or your own list:
data = []
consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))
for message in consumer:
    data.append(message)
    if len(data) >= 2:
        action(data)
        data.clear()

Use the poll() interface documented here:
https://kafka-python.readthedocs.io/en/master/_modules/kafka/consumer/group.html#KafkaConsumer.poll
This allows you to set a timeout to return early if there are no messages to consume.
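For example, here is a minimal sketch (reusing the topic and server variables and the literal_eval parsing from the question; sending the emails is left as a placeholder) that pulls at most two records per poll:

import ast
from kafka import KafkaConsumer

consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))

while True:
    # Block for up to 1 second and return at most 2 records across all partitions
    batch = consumer.poll(timeout_ms=1000, max_records=2)
    for tp, records in batch.items():
        emails = [ast.literal_eval(record.value.decode("utf-8")) for record in records]
        # send the emails for this batch of (up to) 2 messages here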

Related

How To parse Json in Flink Kafka + Scala

Hi, I am a new developer with Apache Flink. I am trying to build a data-streaming application in Apache Flink integrated with Kafka.
I already have a topic named source-json created in Kafka containing JSON, which I am required to parse using Apache Flink (Scala).
I have already written the code to extract the JSON in Apache Flink, but I am facing an issue parsing it with a deserializer; I could only find examples for Spark, and apparently there is not much support for this in the Apache Flink docs.
The JSON I am trying to parse:
{
    "timestamp": "2020-01-09",
    "integrations": {},
    "context": {
        "page": {
            "path": "/",
            "title": "shop",
            "url": "www.example.com"
        },
        "library": {
            "version": "3.10.1"
        }
    },
    "properties": {
        "url": "https://www.example.com/",
        "page_type": "Home"
    },
    "Time": "2020-01-09",
    "_sorting": {
        "SortAs": ["SGML"],
        "Acronym": ["SGML"]
    }
}
I was able to write the code to print the JSON from Kafka using Flink:
def main(args: Array[String]): Unit = {
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  val source = KafkaSource.builder[String]
    .setBootstrapServers("localhost:9092")
    .setTopics("source-json")
    .setGroupId("my-group")
    .setStartingOffsets(OffsetsInitializer.earliest)
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build
  val streamEnv: DataStream[String] = env.fromSource(source, WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(20)), "Kafka Source")
  streamEnv.print()
  env.execute("running stream")
}
I need help parsing the JSON and storing the filtered JSON in a different Kafka topic named sink-json, in Apache Flink.

ActiveMQ Artemis how to filter message by part of the "text" field

Is there any way to filter messages in ActiveMQ Artemis 2.10.0 by part of the "text" field using the management console?
I use the "browse(java.lang.String)" method and try to filter my message (example below) with this expression:
text LIKE '%777-555-333-111%'
Message example:
{
    "address": "ADDRESS.EXAMPLE",
    "ShortProperties": {},
    "messageID": "11111",
    "priority": 4,
    "type": 3,
    "redelivered": false,
    "ByteProperties": {
        "_AMQ_ROUTING_TYPE": 1
    },
    "IntProperties": {
        "CamelHttpResponseCode": 200
    },
    "durable": true,
    "StringProperties": {
        "Server": "nginx\/1.19.5",
        "CamelHttpCharacterEncoding": "UTF-8",
        "Content_HYPHEN_Type": "application\/xop+xml",
        "connection": "keep-alive"
    },
    "DoubleProperties": {},
    "expiration": 0,
    "text": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><processId>777-555-333-111<\/processId><\/error>",
    "BooleanProperties": {},
    "FloatProperties": {}
}
However, it doesn't give me any results.
I would be grateful for a hint on whether this is possible on my current Artemis version.
The filter used by the browse management operation (as well as that used by JMS consumers, etc.) only applies to message headers and properties. You can't filter a message by the text in its body.
The data that you pasted is just the serialized message data sent to the client after the filter has already been applied.
Apache ActiveMQ Artemis also supports special XPath filters which operate on the body of a message. The body must be XML; see the documentation for further details.
To use an XPath filter use this syntax:
XPATH '<xpath-expression>'
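For example, a filter along these lines (a sketch only; the XML body shown above appears to be truncated, so the exact path depends on the real document) would match on the processId element:
XPATH '//processId[text()="777-555-333-111"]'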

kafka connect sink to mongo only last result with delay

I have an aggregation query of pageView grouped by country, with the results pushed to an output topic.
They are then sunk to MongoDB by a Kafka connector:
{
    "connector.class": "MongoDbAtlasSink",
    "name": "confluent-mongodb-sink",
    "input.data.format": "JSON",
    "connection.host": "ip",
    "topics": "viewPageCountByUsers",
    "max.num.retries": "3",
    "retries.defer.timeout": "5000",
    "max.batch.size": "0",
    "database": "test",
    "collection": "ViewPagesCountByUsers",
    "tasks.max": "1"
}
The problem is that this data is very frequent and puts a heavy load on MongoDB. How can I configure the Kafka connection so that it sends only the last value per key as a batch, for example with a 5 second delay?
Example: It's pointless to update the database 5 times
{countryID:7, viewCount: 111}
{countryID:7, viewCount: 112}
{countryID:7, viewCount: 113}
{countryID:7, viewCount: 114}
{countryID:7, viewCount: 115}
If there were a way to send only the last result per key with a 5 second delay, I could update just once:
// collect batch 5 sec and flush:
{countryID:7, viewCount: 115}
{countryID:8, viewCount: 573}
How can I do this?
Sink connectors just take whatever is in the topic, generally without batching.
You'd need to use a stream processor such as Kafka Streams / ksqlDB to run a windowed aggregation, then output to a new topic, which you'd read from the sink connector.
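For illustration, a rough ksqlDB sketch (the pageviews stream and countryID column are assumptions based on the example above) of a 5-second tumbling-window aggregation whose output topic the sink connector would then read instead; note that intermediate per-record updates may still be emitted unless the query is configured to suppress them:

-- assumes a pageviews stream with a countryID column already exists; names are hypothetical
CREATE TABLE viewPageCountByCountryWindowed
    WITH (KAFKA_TOPIC = 'viewPageCountByCountryWindowed') AS
    SELECT countryID, COUNT(*) AS viewCount
    FROM pageviews
    WINDOW TUMBLING (SIZE 5 SECONDS)
    GROUP BY countryID
    EMIT CHANGES;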

how to access latest offset of topic in confluent kafka rest proxy to calculate lag

In Confluent Kafka REST Proxy we can get the last committed offset of a particular consumer group, but how can we get the latest offset of the topic to calculate the lag?
You can use Kafka REST Proxy to fetch the latest offset committed for a particular partition. According to the Confluent Docs,
GET /consumers/(string: group_name)/instances/(string: instance)/offsets
Get the last committed offsets for the given partitions (whether the
commit happened by this process or another).
Note that this request must be made to the specific REST proxy
instance holding the consumer instance.
Parameters:
group_name (string) -- The name of the consumer group
instance (string) -- The ID of the consumer instance
Request JSON Array of Objects:
partitions -- A list of partitions to find the last committed offsets for
partitions[i].topic (string) -- Name of the topic
partitions[i].partition (int) -- Partition ID
Response JSON Array of Objects:
offsets -- A list of committed offsets
offsets[i].topic (string) -- Name of the topic for which an offset was committed
offsets[i].partition (int) -- Partition ID for which an offset was committed
offsets[i].offset (int) -- Committed offset
offsets[i].metadata (string) -- Metadata for the committed offset
Status Codes:
404 Not Found --
Error code 40402 -- Partition not found
Error code 40403 -- Consumer instance not found
Example Request:
GET /consumers/testgroup/instances/my_consumer/offsets HTTP/1.1
Host: proxy-instance.kafkaproxy.example.com
Accept: application/vnd.kafka.v2+json, application/vnd.kafka+json, application/json
{
    "partitions": [
        {
            "topic": "test",
            "partition": 0
        },
        {
            "topic": "test",
            "partition": 1
        }
    ]
}
Example Response:
HTTP/1.1 200 OK
Content-Type: application/vnd.kafka.v2+json
{"offsets":
[
{
"topic": "test",
"partition": 0,
"offset": 21,
"metadata":""
},
{
"topic": "test",
"partition": 1,
"offset": 31,
"metadata":""
}
]
}
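As an illustration, a minimal Python sketch of the same request (the proxy host, group name, and instance name are placeholders taken from the example above):

import requests

# Placeholders from the example above; adjust to your deployment
url = "http://proxy-instance.kafkaproxy.example.com/consumers/testgroup/instances/my_consumer/offsets"
payload = {"partitions": [{"topic": "test", "partition": 0},
                          {"topic": "test", "partition": 1}]}
headers = {
    "Content-Type": "application/vnd.kafka.v2+json",
    "Accept": "application/vnd.kafka.v2+json",
}

# Note: this endpoint expects the partition list in the body of a GET request
response = requests.get(url, json=payload, headers=headers)
response.raise_for_status()
for offset in response.json()["offsets"]:
    print(offset["topic"], offset["partition"], offset["offset"])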
Looks like there is an early access feature for this: https://docs.confluent.io/platform/current/kafka-rest/api.html#get--clusters-cluster_id-consumer-groups-consumer_group_id-lags

Sample messages from IOT sensors for MQTT communications

There is an M2M application which wants to talk to temperature sensors in the field, i.e. send/receive messages using the MQTT pub/sub protocol.
I have set up both IOTDM as well as one with Eclipse oneM2M using Mosquitto. But I am looking for some sample APIs/commands through which an M2M application can send a message to the MQTT client and vice versa.
Or if any of you could point me to the appropriate call flows that would be helpful.
Any help would be highly appreciated.
Here is a GET MQTT message example:
topic: /oneM2M/req/{{origin}}/{{cse-id}}/json
message:
{
    "m2m:rqp": {
        "op": "2",
        "to": "{{resource_uri}}",
        "fr": "{{origin}}",
        "rqi": 12345,
        "pc": ""
    }
}
{{resource_uri}} is the relative path of a resource existing on the
oneM2M server (e.g. /my_cse_base/my_ae)
{{origin}} is the origin enabled (by ACP) to retrieve the resource
{{cse-id}} is the CSEbase ID
The message received could be similar to:
topic: /oneM2M/resp/{{origin}}/{{cse-id}}/json
message:
{
    "m2m:rsp": {
        "rsc": 2000,
        "rqi": 12345,
        "pc": {
            "m2m:ae": {
                "pi": "Sy2XMSpbb",
                "ty": 2,
                "ct": "20170706T085259",
                "ri": "r1NX_cOiVZ",
                "rn": "my_ae",
                "lt": "20170706T085259",
                "et": "20270706T085259",
                "acpi": ["/my_cse_base/acp_my_ae"],
                "aei": "my_ae_id",
                "rr": true
            }
        }
    }
}
A POST example:
topic: /oneM2M/req/{{origin}}/{{cse-id}}/json
message:
{
    "m2m:rqp": {
        "op": "1",
        "to": "{{resource_uri}}",
        "fr": "{{origin}}",
        "rqi": 12345,
        "ty": "4",
        "pc": {
            "m2m:cin": {
                "cnf": "text/plain:0",
                "con": "123",
                "lbl": ["test"]
            }
        }
    }
}
{{resource_uri}} is the relative path of a resource existing on the
oneM2M server (e.g. /my_cse_base/my_ae)
{{origin}} is the origin enabled (by ACP) to create a new resource
{{cse-id}} is the CSEbase ID
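As an illustration, a minimal Python sketch (assuming a local Mosquitto broker, the paho-mqtt 1.x client as one possible MQTT library, and the placeholder values from the examples above) that publishes the POST request and prints whatever comes back on the response topic:

import json
import paho.mqtt.client as mqtt

origin = "my_ae_id"       # placeholder: the originator enabled by ACP
cse_id = "my_cse_base"    # placeholder: the CSEbase ID
req_topic = "/oneM2M/req/{}/{}/json".format(origin, cse_id)
resp_topic = "/oneM2M/resp/{}/{}/json".format(origin, cse_id)

# Same POST body as the example above
request = {
    "m2m:rqp": {
        "op": "1",
        "to": "/my_cse_base/my_ae",   # placeholder resource URI
        "fr": origin,
        "rqi": 12345,
        "ty": "4",
        "pc": {"m2m:cin": {"cnf": "text/plain:0", "con": "123", "lbl": ["test"]}}
    }
}

def on_connect(client, userdata, flags, rc):
    client.subscribe(resp_topic)                     # listen for the oneM2M response
    client.publish(req_topic, json.dumps(request))   # send the oneM2M request

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()        # paho-mqtt 1.x callback API assumed
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()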
For a JS talk I made an app to measure soil moisture. I used MQTT to send information from my Arduino to a server written in NodeJS. I don't know if you have any JS skills, but you can see the code on my GitHub repo. I hope this solution can help you.