Parsing Kafka messages - apache-kafka

My question will be short and clear. I would like to parse JSON data coming from a Kafka topic, so my application will run as a Kafka consumer. I am only interested in some parts of the JSON data. Do I need to process this data with a library such as Apache Flink? After that I will send the data somewhere else.

In the beginning you say "filter data", so it looks like you need a RecordFilterStrategy injected into the AbstractKafkaListenerContainerFactory. See the documentation on this matter: https://docs.spring.io/spring-kafka/docs/current/reference/html/#filtering-messages
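For illustration, a minimal sketch of such a filter wired into a ConcurrentKafkaListenerContainerFactory (the "interesting" marker and the class name below are made up for the example; returning true from the strategy discards the record):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaFilterConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // true means "discard this record"; keep only records carrying the marker field we care about
        factory.setRecordFilterStrategy(record -> !record.value().contains("\"interesting\""));
        return factory;
    }
}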
Then you say you are "interested in some part in JSON". That doesn't sound like record filtering; it sounds more like data projection. For that you can use a ProjectingMessageConverter to slice the data via a ProjectionFactory. See their JavaDocs for more info.
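And a rough sketch of the projection side, assuming Spring Kafka's ProjectingMessageConverter together with Spring Data's @JsonPath annotation (this needs com.jayway.jsonpath on the classpath; the OrderView interface and its fields are invented for the example, and the exact converter setter can differ between Spring Kafka versions):

import org.springframework.data.web.JsonPath;

// Declares only the slices of the JSON the listener cares about, addressed by JSON Path.
public interface OrderView {

    @JsonPath("$.customer.name")
    String getCustomerName();

    @JsonPath("$.items[0].sku")
    String getFirstSku();
}

// In the container factory configuration, something along the lines of:
// factory.setMessageConverter(new ProjectingMessageConverter(new ObjectMapper()));
// then a @KafkaListener method can receive OrderView instances directly.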

Related

How can I deserialize geometry fields from Kafka messages stream via Debezium Connect?

I have a PostGIS + Debezium/Kafka + Debezium/Connect setup that is streaming changes from one database to another. I have been watching the messages via Kowl and everything is moving accordingly.
My problem arises when I'm reading the message from my Kafka topic, the geometry (wkb) column in particular.
This is my Kafka message:
{
  "schema": {
    "type": "struct",
    "fields": [...],
    "optional": false,
    "name": "ecotx_geometry_kafka.ecotx_geometry_impo..."
  },
  "payload": {
    "before": null,
    "after": {
      "id": "d6ad5eb9-d1cb-4f91-949c-7cfb59fb07e2",
      "type": "MultiPolygon",
      "layer_id": "244458fa-e6e0-4c6c-a7e1-5bf0afce2fb8",
      "geometry": {
        "wkb": "AQYAACBqCAAAAQAAAAEDAAAAAQAAAAUAAABwQfUo...",
        "srid": 2154
      },
      "custom_style": null,
      "style_id": "default_layer_style"
    },
    "source": {...},
    "op": "c",
    "ts_ms": 1618854994546,
    "transaction": null
  }
}
As can be seen, the WKB information is something like "AQAAAAA...", despite the information inserted in my database being "01060000208A7A000000000000" or "LINESTRING(0 0,1 0)".
And I don't know how to parse/transform it into a ByteArray or a Geometry in my consumer app (Kotlin/Java) for further use in GeoTools.
I don't know if I'm missing an import that is able to translate this information.
I have found only a few questions around from people posting their JSON messages, and every message that has a geometry field (streamed with Debezium) got changed to this "AAAQQQAAAA".
Having said that, how can I parse/decode/translate it into something that can be used by GeoTools?
Thanks.
#UPDATE
Additional info:
After an insert, when I analyze my slot changes (querying the database using pg_logical_slot_get_changes function), I'm able to see my changes in WKB:
{"change":[{"kind":"insert","schema":"ecotx_geometry_import","table":"geometry_data","columnnames":["id","type","layer_id","geometry","custom_style","style_id"],"columntypes":["uuid","character varying(255)","uuid","geometry","character varying","character varying"],"columnvalues":["469f5aed-a2ea-48ca-b7d2-fe6e54b27053","MultiPolygon","244458fa-e6e0-4c6c-a7e1-5bf0afce2fb8","01060000206A08000001000000010300000001000000050000007041F528CB332C413B509BE9710A594134371E05CC332C4111F40B87720A594147E56566CD332C4198DF5D7F720A594185EF3C8ACC332C41C03BEDE1710A59417041F528CB332C413B509BE9710A5941",null,"default_layer_style"]}]}
This would be useful in the consumer app; the issue definitely lies in the Kafka message content itself, I'm just not sure what is transforming this value, Kafka or Debezium/Connect.
I think it is just a different way to represent binary columns in PostGIS and in JSON. The WKB is a binary field, meaning it holds bytes with arbitrary values, many of which have no corresponding printable characters. PostGIS prints it out using hex encoding, so it looks like '01060000208A7A...' (hex digits), but internally it is just bytes. Kafka's JSON uses Base64 encoding instead for exactly the same binary content.
Let's test with a prefix of your string,
select to_base64(from_hex('01060000206A080000010000000103000000010000000500'))
AQYAACBqCAAAAQAAAAEDAAAAAQAAAAUA
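So in the consumer app it should be enough to Base64-decode the string and feed the bytes to a WKB reader. A minimal sketch, assuming the JTS library that ships with recent GeoTools (org.locationtech.jts); JTS's WKBReader also understands the PostGIS EWKB variant, so the embedded SRID (2154 above) should come through:

import java.util.Base64;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKBReader;

public class WkbDecoder {

    // base64Wkb is the full value of payload.after.geometry.wkb from the Kafka message
    static Geometry decode(String base64Wkb) throws ParseException {
        byte[] wkb = Base64.getDecoder().decode(base64Wkb);
        return new WKBReader().read(wkb);
    }
}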

insert msg object to mongodb (node-red)?

We're experimenting with storing data in MongoDB by using Node-RED. As it is now, we can store data in the database, but it seems like only msg.payload is stored (as a document) and not the whole msg object, which confuses us a little...
The flow is very simple and nothing much has really been done.
We actually don't need ALL the data, but we wish to store the payload as well as the metadata as a document in our collection. We've tried searching for an answer to this, but couldn't find anything relevant on how to do it. Hopefully we can get some help on this forum.
Thanks in advance! (btw. we're using mongodb3 on node-red to store data)
The node you are using is working as intended.
The normal pattern for Node-RED is that the focus of any given node is the msg.payload entry; any other msg properties are considered to be metadata.
The simplest thing here would be to use the built-in core Change node to move the other fields you are interested in to be properties of the msg.payload object.
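For example (the field names are illustrative), a Change node rule of type "Set", targeting msg.payload and using a JSONata expression, can fold the metadata you want into the payload before it reaches the mongodb3 node:

{ "data": payload, "topic": topic, "receivedAt": $now() }

In Node-RED's JSONata context, payload and topic refer to msg.payload and msg.topic, so the stored document would contain both alongside a timestamp.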

Conditional routing in Apache NiFi

I'm using NiFi to get data from an Oracle database and put some of this data in Kafka (using the processor PutKafka).
For example: only send the data if the attribute "id" contains "aaabb".
Is that possible in Apache NiFi? How can I do it?
This should definitely be possible, the flow might be something like this...
1) ExecuteSQL or QueryDatabaseTable to get the data from the database, these produce Avro
2) ConvertAvroToJSON processor to convert the Avro to JSON
3) EvaluateJsonPath to extract the id field into an attribute
4) RouteOnAttribute to route flow files where the id attribute contains "aaabbb" (an example expression is sketched after this list)
5) PutKafka to deliver any of the matching results from RouteOnAttribute
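For step 4, the RouteOnAttribute processor would get a user-defined routing property (named, say, "matched") whose value is a NiFi Expression Language predicate along the lines of:

${id:contains('aaabbb')}

Flow files for which the expression evaluates to true are routed to the "matched" relationship, which can then be connected to PutKafka.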
To add on to Bryan's example flow, I wanted to point you to some great documentation that should help introduce you to Apache NiFi.
Firstly, I would suggest checking out the NiFi documentation. It is very good and should help a lot. In addition to providing details on each of the processors Bryan mentioned it also has general documentation for every type of user.
For a basic introduction to build a NiFi flow check out this video.
For example templates, check out this repo. It has an Excel file at its root level with a description and a list of processors for each template.

post body in REST

I was referring to the O'Reilly book on REST API design, which clearly lays down the message format, specifically around how links should be used to represent interrelated resources. But all the examples are for reading a resource (GET) and how the server structures the message. What about a create (POST)? Should the message structure for creating a similarly interconnected object be similar, i.e. through links?
By way of an example, let us consider that we want to create a Person object with a Parent field. Should the JSON message sent to the server through POST (the POST message body) be like:
{
  name: 'test',
  age: 12,
  links: [
    {
      rel: 'parent',
      href: '/people/john'
    }
  ]
}
Here is a media type you could look at
http://stateless.co/hal_specification.html
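In HAL, related resources go under a reserved _links property keyed by relation name, so the Person from the question would look something like:

{
  "name": "test",
  "age": 12,
  "_links": {
    "parent": { "href": "/people/john" }
  }
}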
Yes, that is one way of doing it. GET information might be usefully made human-readable, but POST/PUT information targets the machine.
Adding information to reduce the server's processing effort (e.g. so that it only has to verify that the information makes sense rather than reconstruct it all from scratch) also makes a lot of sense, performance-wise. As long as you do verify: keep in mind that user data must be treated as suspect on general principles. You don't want the first ExtJS-savvy guy to be able to forge requests to your services.
You might also format the data as XML or CSV, depending on what's best for the specific application. Keep in mind that you might want to refactor or reuse the code, so adhering to a single standard also makes sense. All things considered, JSON is probably the best option.

MSMQ querying for a specific message

I have a question regarding MSMQ...
I designed an async architecture like this:
Client -> WCF Service (hosted in a Windows Service) -> MSMQ
So basically the WCF service takes the requests, processes them, adds them to an INPUT queue and returns a GUID. The same WCF service (through a listener) takes the first message from the queue (does some stuff...) and then puts it into another queue (OUTPUT).
The problem is how I can retrieve the result from the OUTPUT queue when a client requests it, because MSMQ does not allow random access to its messages and the only solution would be to iterate through all messages and push them back in until I find the exact one I need. I do not want to use a DB for this OUTPUT queue, because of some limitations imposed by the client...
You can look in your output queue for your message by using:
var mq = new MessageQueue(outputQueueName);
var message = mq.PeekById(yourId); // reads the message without removing it from the queue
Receiving by Id (removes the message from the queue):
var message = mq.ReceiveById(yourId);
A queue is inherently a "first-in-first-out" kind of data structure, while what you want is a "random access" data structure. It's just not designed for what you're trying to achieve here, so there isn't any "clean" way of doing this. Even if there was a way, it would be a hack.
If you elaborate on the limitations imposed by the client perhaps there might be other alternatives. Why don't you want to use a DB? Can you use a local SQLite DB, perhaps, or even an in-memory one?
Edit: If you have a client dictating implementation details to their own detriment then there are really only three ways you can go:
Work around them. In this case, that could involve using a SQLite DB - it's just a file and the client probably wouldn't even think of it as a "database".
Probe deeper and find out just what the underlying issue is, i.e. why don't they want to use a DB? What are their real concerns and underlying assumptions?
Accept a poor solution and explain to the client that this is due to their own restriction. This is never nice and never easy, so it's really a last resort.
You could use a CorrelationId and set it when you send the message. Then, to receive the same message, you can pick out the specific message with ReceiveByCorrelationId as follows:
message = queue.ReceiveByCorrelationId(correlationId);
Moreover, CorrelationId is a string with the following format:
Guid()\\Number