Calculate differences between consecutive Kafka messages in one Topic - apache-kafka

I have some temperature sensors that generate Kafka messages to a Kafka topic (my-sensors-topic). The messages generally look like this:
{"Offset": 7, "Id": 1, "Time": 1643718777898, "Value": 21}
{"Offset": 6, "Id": 1, "Time": 1643718768592, "Value": 20}
{"Offset": 5, "Id": 2, "Time": 1643718755443, "Value": 21}
{"Offset": 4, "Id": 3, "Time": 1643718746678, "Value": 21}
{"Offset": 3, "Id": 4, "Time": 1643718733408, "Value": 22}
{"Offset": 2, "Id": 2, "Time": 1643718709450, "Value": 20}
{"Offset": 1, "Id": 3, "Time": 1643718667375, "Value": 22}
{"Offset": 0, "Id": 1, "Time": 1643718386944, "Value": 19}
What I want to do is, for a newly generated message:
{"Offset": 8, "Id": 2, "Time": 1643719318393, "Value": 21}
First, find the last existing message that has the same Id. In this case:
{"Offset": 5, "Id": 2, "Time": 1643718755443, "Value": 21}
because it is the latest existing message that also has Id 2.
Second, I want to compute the "Time" difference (in milliseconds) between these two messages.
If the difference is greater than 60000, it counts as an error for this sensor, and I need to create a message that records the error and write it to another Kafka topic (my-sensors-error-topic).
The created message might look like:
{"Id": 2, "Time_lead": 1643719318393, "Time_lag": 1643718755443, "Latency": 562950}
// Latency is calculated as (Time_lead - Time_lag)
Later I can count the records in my-sensors-error-topic by (sensor) Id, so I know how many errors occurred for each sensor.
From my own investigation, it seems I need to use the Kafka Processor API with a state store. Some examples implement the Processor interface, while others use transform().
Which way is better for implementing my scenario, and how?
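As background: both approaches give access to a keyed state store that can hold the last "Time" per sensor Id; a Processor is added to a Topology directly, while transform() plugs the same kind of stateful logic into the Streams DSL. The sketch below illustrates only the per-sensor logic, using plain Python clients (confluent-kafka) with an in-memory dict standing in for the state store; the broker address and group id are assumptions, and a real Streams state store would add fault tolerance that this sketch lacks.

# Minimal sketch of the per-sensor latency check with plain Kafka clients.
# Topic names, field names, and the 60000 ms threshold come from the question;
# broker address and group id are assumptions.
import json

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "group.id": "sensor-latency-checker",   # assumption: any unused group id
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["my-sensors-topic"])

last_time_by_id = {}  # stands in for a keyed state store: last "Time" per Id

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    sensor_id, time_ms = event["Id"], event["Time"]
    prev = last_time_by_id.get(sensor_id)
    if prev is not None and time_ms - prev > 60000:
        error = {"Id": sensor_id, "Time_lead": time_ms,
                 "Time_lag": prev, "Latency": time_ms - prev}
        producer.produce("my-sensors-error-topic", json.dumps(error))
        producer.poll(0)  # serve delivery callbacks
    last_time_by_id[sensor_id] = time_ms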

Related

How to parse dynamic json?

I'm new to Flutter development and I can't figure out how to parse JSON correctly, since the lists are variable.
Maybe there are libraries that can turn JSON into an object right away?
{
"name": "Nuan",
"key": "wqewewrwerer",
"places": {
"1": {
"place": 1,
"numberPlaces": 4,
"type": "TABLE"
},
"2": {
"place": 2,
"numberPlaces": 4,
"type": "TABLE"
},
"3": {
"place": 3,
"numberPlaces": 4,
"type": "TABLE"
}
},
"menu": {
"categories": [{
"name": "Salads",
"elements": [{
"id": 1,
"name": "Caesar salad",
"img": "cesar.jpg",
"price": 4,
"currency": "USD",
"weight": "222",
"typeWeight": "g",
"time": 3,
"calories": 500,
"description": "The salad\u0027s creation is generally attributed to the restaurateur Caesar Cardini, an Italian immigrant who operated restaurants in Mexico and the United States. His daughter Rosa recounted that her father invented the salad at his Prohibition-era restaurant in Tijuana, Mexico when a Fourth of July rush in 1924 depleted the kitchen\u0027s supplies. (Cardini lived in San Diego but ran the family restaurant in Tijuana to attract American customers seeking to circumvent the restrictions of the Prohibition). Cardini made do with what he had, adding the dramatic flair of the table-side tossing \"by the chef.\" A number of Cardini\u0027s staff have said that they invented the dish.",
"structure": "croutons, romaine, anchovies, parmeasan cheese, olive oil, vinegar and plenty of black pepper."
}, {
"id": 2,
"name": "Greek salad",
"img": "grek.jpg",
"price": 5,
"currency": "USD",
"weight": "220",
"typeWeight": "g",
"time": 3,
"calories": 300,
"description": "Various other salads have also been called \"Greek\" in the English language in the last century, including some with no apparent connection to Greek cuisine. A 1925 Australian newspaper described a Greek salad of boiled squash dressed with sour milk; a 1934 American newspaper described a mayonnaise-dressed lettuce salad with shredded cabbage and carrots",
"structure": "Lettuce, tomatoes, feta, olives, cucumber"
}, {
"id": 3,
"name": "Salmon salad",
"img": "semga.jpg",
"price": 8,
"currency": "USD",
"weight": "250",
"typeWeight": "g",
"time": 3,
"calories": 400
}, {
"id": 4,
"name": "Hunting salad",
"img": "hunting.jpg",
"price": 7,
"currency": "USD",
"weight": "220",
"typeWeight": "g",
"time": 4,
"calories": 390,
"description": "Hunting salad has such a name, since it includes meat of animals caught by hunters. History is silent at what point the game meat was replaced with beef, hunting sausages or even pork, but in our reality it is these meat products that are used for such a salad. The salad turns out to be very satisfying, nutritious and quite dense � ideal for men\u0027s snacks.",
"structure": "sausages, potatoes, greens"
}]
}, {
"name": "soups",
"elements": [{
"id": 5,
"name": "Solyanka soup",
"img": "solyanka.jpg",
"price": 10,
"currency": "USD",
"weight": "300",
"typeWeight": "g",
"time": 15,
"calories": 300,
"description": "There is no consensus on the correctness of the name selyanka in relation to soup. Russian linguist and writer L. I. Skvortsov writes about the traditionality of the name selyanka and the ambiguity of the etymology of the word \"solyanka\", at the same time, Russian historian, researcher and popularizer of cooking V. V. Pokhlebkin writes about the incorrectness and distortion of the name selyanka and claims that the name solyanka is fixed in the \"House-building\" of 1547, while the term selyanka took root only in the XIX century and at the beginning of the XX century was again replaced by the term solyanka. The Dictionary of the Russian Academy (1794) indicated the name solyanka as the main variant, and marked the selyanka variant as \"simple\"",
"structure": "sausage, potato, onion, tomato, pickle"
}, {
"id": 6,
"name": "Borsch soup",
"img": "borsch.jpg",
"price": 8,
"currency": "USD",
"weight": "300",
"typeWeight": "g",
"time": 15,
"calories": 300,
"description": "In the old days, borscht was called soup made from borscht. Later borscht was cooked on beet kvass: it was diluted with water, the mixture was poured into a clay pot or cast iron and brought to a boil. Chopped beets, cabbage, carrots and other vegetables were put in boiling water and put the pot in the oven. The cooked borscht was salted and refilled",
"structure": "meat, carrots, onions, potatoes, beets, white cabbage, beans, tomato paste, vegetable oil, bay leaf, sugar, vinegar"
}, {
"id": 7,
"name": "Mushroom soup",
"img": "mushroom.jpg",
"price": 10,
"currency": "USD",
"weight": "230",
"typeWeight": "g",
"time": 16,
"calories": 320,
"description": "Cream soup with mushrooms has been prepared since about the 17th century, when entrepreneurs already learned how to grow champignons in artificial conditions. It was in France, so this country is considered the birthplace of mushroom soups-puree. But the soup with dried porcini mushrooms is cooked on water and already only one aroma, spreading around the house, can drive even the most refined gourmets crazy.",
"structure": "potatoes, onions, herbs, salt, pepper, vegetable oil, mushrooms, soft processed cheese"
}]
}, {
"elements": [{
"id": 8,
"name": "Crackers",
"img": "crackers.png",
"price": 4,
"currency": "USD",
"weight": "200",
"typeWeight": "g",
"time": 12,
"calories": 600,
"description": "Crackers are second-baked bread, \"dried for the purpose of either storage or further culinary use in various dishes,\" is the definition of the product given by William Pohlebkin (1923-2000), a Russian historian and writer. The main distinguishing feature of crackers from all other bakery products is their reduced humidity, ideally no more than forty-nine percent, more often - up to eight.",
"structure": "bread, garlic, salt, oil"
}, {
"id": 9,
"name": "Croutons",
"img": "croutons.png",
"price": 4,
"currency": "USD",
"weight": "200",
"typeWeight": "g",
"time": 12,
"calories": 580,
"description": "Croutons are made from any bread and are used as a light snack, for example, croutons with garlic for beer, or as an ingredient in soups, broths, salads (\"Caesar\"), cutlets and other dishes. To add to soups (French onion soup), bread is simply fried with salt and/or black pepper.",
"structure": "white bread, garlic, salt, pepper, oil"
}, {
"id": 11,
"name": "Chips",
"img": "chips.png",
"price": 3,
"currency": "USD",
"weight": "100",
"typeWeight": "g",
"time": 3,
"calories": 330,
"structure": "potatoes, salt, pepper, oil"
}]
}]
},
"description": "There are many things in the world, friend Horatio, that our wise men never dreamed of"
}
I've seen the standard approach where data is passed to a constructor, but it's not entirely clear how to handle such a large JSON there.
You can use the json2Dart utility to generate your Dart objects and call the fromJson() method on your root object :)
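The generated code follows one pattern throughout: a class per JSON object with a fromJson factory, and map-valued fields with dynamic keys (like "places" above) parsed by iterating the entries rather than naming them. A language-neutral sketch of that pattern (in Python for illustration; the generated Dart is structurally the same, with fromJson factory constructors):

# Sketch of the fromJson pattern: one class per JSON object, and dynamic-key
# maps (like "places") parsed entry by entry. Names mirror the JSON above.
import json
from dataclasses import dataclass


@dataclass
class Place:
    place: int
    number_places: int
    type: str

    @staticmethod
    def from_json(data: dict) -> "Place":
        return Place(data["place"], data["numberPlaces"], data["type"])


@dataclass
class Venue:
    name: str
    key: str
    places: dict  # Place objects keyed by the dynamic "1", "2", ... keys

    @staticmethod
    def from_json(data: dict) -> "Venue":
        # "places" has dynamic keys, so iterate the entries instead of
        # hard-coding field names; the same trick works for any variable map.
        places = {k: Place.from_json(v) for k, v in data["places"].items()}
        return Venue(data["name"], data["key"], places)


raw = '{"name": "Nuan", "key": "wqewewrwerer", "places": {"1": {"place": 1, "numberPlaces": 4, "type": "TABLE"}}}'
venue = Venue.from_json(json.loads(raw))
print(venue.places["1"].type)  # -> TABLE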

Why doesn't this pymongo subdocument find work?

I'm looking at using MongoDB, and so far most things that I've tried work. But I don't know why this find doesn't work.
col = db.create_collection("test")
x = col.insert_many([
    {"item": "journal", "qty": 25, "size": {"h": 14, "w": 21, "uom": "cm"}, "status": "A"},
    {"item": "notebook", "qty": 50, "size": {"h": 8.5, "w": 11, "uom": "in"}, "status": "A"},
    {"item": "paper", "qty": 100, "size": {"h": 8.5, "w": 11, "uom": "in"}, "status": "D"},
    {"item": "planner", "qty": 75, "size": {"h": 22.85, "w": 30, "uom": "cm"}, "status": "D"},
    {"item": "postcard", "qty": 45, "size": {"h": 10, "w": 15.25, "uom": "cm"}, "status": "A"}
])
cursor = col.find({"size": {"h": 14, "w": 21, "uom": "cm"}})
if cursor.retrieved == 0:
    print("found nothing")  # <<<<<<<<< prints this
As explained in the docs, in the section Match an Embedded/Nested Document:
Equality matches on the whole embedded document require an exact match of the specified document, including the field order.
So you have to pass the object to find() with its fields in the same order as they exist in the DB.
I don't really know whether keys in stored objects follow a strict order (alphabetical or otherwise), but with this query the result comes back most of the time. Not always, though, so I think field order is effectively arbitrary (or at least not something you can control), at least in the Mongo playground.
In any case, the reliable way to get results is to use dot notation, so this query will always work:
col.find({
    "size.h": 14,
    "size.w": 21,
    "size.uom": "cm"
})
I was thinking that cursor.retrieved would be non-zero if it found something. I guess not. I found that this works:
lst = list(cursor)
print(lst)
cursor.rewind()
print(list(cursor))
if len(lst) != 0:
    for d in lst:
        print(d)
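For what it's worth, cursor.retrieved only counts documents already fetched from the server, so it is 0 before the cursor is iterated. A more direct existence check (assuming PyMongo 3.7+, and reusing col from above) is count_documents:

# count_documents runs the count server-side; no need to exhaust a cursor
# just to test whether anything matched. Requires PyMongo 3.7+.
n = col.count_documents({"size.h": 14, "size.w": 21, "size.uom": "cm"})
if n == 0:
    print("found nothing")
else:
    print(f"found {n} document(s)")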

MongoDB - MongoImport of JSON (jsonl) - Rename, change types and add fields

I'm new to MongoDB and have 4 different problems importing a big (16 GB) JSONL file into my MongoDB (a simple PSA cluster).
Attached below is a sample entry from the mentioned JSON dump.
With this file, which I get from an external provider, I have 4 problems:
"hotel_id" is the key and should be (re)named "_id"
"hotel_id" should be treated as a Number rather than as a string
"location" is not properly formatted as GeoJSON (if I understood the MongoDB manual correctly); it should be like
"location": {
"type": "Point",
"coordinates": [-93.26838,37.15845]
}
instead of
"location": {
"coordinates": {
"latitude": 37.15845,
"longitude": -93.26838
}
}
"dates" can this be used to efficiently update just the records which needs to be updated?
So my challenge is now to transform the data according to my needs before importing the data or at time of import, but in both cases of course as quickly as possible.
Therefore i searched a lot for hints and best practices, but i was not able to find a solution yet, maybe due to the fact that i'm a beginner with MongoDB.
I played around with "jq" to adjust the data and for example add the type which seems to be necessary for the location (point 3), but wasn't really successful.
cat dump.jsonl | ./bin/jq --arg typeOfField Point '.location + {type: $typeOfField}'
Beside that i was injecting a sample dump of round-about 500MB which took 1,5 mins when importing it the first time (empty database). If i run it in "upsert" mode it will take round-about 12 hours. So i was also wondering what is the best practice to import such a big JSON-dump?
Any help is appreciated!! :-)
Kind regards,
Lumpy
{
"hotel_id": "12345",
"name": "Test Hotel",
"address": {
"line_1": "123 Test St",
"line_2": "Apt A",
"city": "Test City",
},
"ratings": {
"property": {
"rating": "3.5",
"type": "Star"
},
"guest": {
"count": 48382,
"average": "3.1"
}
},
"location": {
"coordinates": {
"latitude": 22.54845,
"longitude": -90.11838
}
},
"phone": "555-0153",
"fax": "555-7249",
"category": {
"id": 1,
"name": "Hotel"
},
"rank": 42,
"dates": {
"added": "1998-07-19T05:00:00.000Z",
"updated": "2018-03-22T07:23:14.000Z"
},
"statistics": {
"11": {
"id": 11,
"name": "Total number of rooms - 220",
"value": "220"
},
"12": {
"id": 12,
"name": "Number of floors - 7",
"value": "7"
}
},
"chain": {
"id": -2,
"name": "Test Hotels"
},
"brand": {
"id": 2,
"name": "Test Brand"
}
}
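For illustration, the first three transforms (rename "hotel_id" to "_id", cast it to a number, reshape "location" into a GeoJSON Point) can be applied in one streaming pass over the JSONL before running mongoimport. A minimal Python sketch, assuming the field names from the sample above (the file names are placeholders):

# One streaming pass over the dump: rename/cast the id and reshape
# "location" into a GeoJSON Point.
import json

with open("dump.jsonl") as src, open("dump_fixed.jsonl", "w") as dst:
    for line in src:
        doc = json.loads(line)
        doc["_id"] = int(doc.pop("hotel_id"))    # problems 1 and 2
        coords = doc["location"]["coordinates"]  # problem 3: GeoJSON wants
        doc["location"] = {                      # [longitude, latitude] order
            "type": "Point",
            "coordinates": [coords["longitude"], coords["latitude"]],
        }
        dst.write(json.dumps(doc) + "\n")

For point 4, the same pass could compare "dates"."updated" against the time of the last import and emit only changed records, which would keep the slow upsert run small.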

CouchDB: query reduced value on complex key with timeframe

An application user can perform different tasks. Each kind of task has a unique identifier, and every user activity is recorded in the database.
So we have the following Event entity to keep in the database:
{
    "user_id": 1,
    "task_id": 2,
    "event_dt": [2013, 11, 15, 10, 0, 0, 0]
}
I need to know how many tasks of each type were performed by a particular user during a particular timeframe. The timeframe might be quite long (e.g. a rolling chart for the last year is requested).
For better understanding, the map function might be something like:
emit([doc.user_id, doc.task_id, doc.event_dt], 1)
and it might be queried using group_level=2 (or group_level=1 if just the number of user events is needed).
Is it possible to answer the above question with a single view query using the map/reduce mechanism? Or do I have to use the list functionality (though it may cause performance issues)?
Just use the flat key [doc.user_id, doc.task_id].concat(doc.event_dt), since it simplifies the request and grouping logic:
with group_level=1: you get the number of tasks per user for all time
with group_level=2: the number of each specific task id per user for all time
with group_level=3: the same as above, but within a specific year
with group_level=4: the same as above, but also grouped by month
and so on down to days, hours, minutes and seconds
For instance, the result for group_level=3 may be:
{"rows":[
{"key": ["user1", "task1", 2012], "value": 3},
{"key": ["user1", "task2", 2013], "value": 14},
{"key": ["user1", "task3", 2013], "value": 15},
{"key": ["user2", "task1", 2012], "value": 9},
{"key": ["user2", "task4", 2012], "value": 26},
{"key": ["user2", "task4", 2013], "value": 53},
{"key": ["user3", "task1", 2013], "value": 5}
]}
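With a _count (or _sum) reduce over that flat key, a single HTTP query answers the question. A sketch using Python's requests library; the database, design document, and view names are assumptions:

# Query the reduced view over the flat key
# [user_id, task_id, year, month, day, hour, minute, second].
# group_level picks the grouping depth; startkey/endkey restrict to one user.
import json

import requests

VIEW = "http://localhost:5984/events/_design/stats/_view/by_user_task_date"

params = {
    "group_level": 3,                     # user / task / year
    "startkey": json.dumps(["user1"]),
    "endkey": json.dumps(["user1", {}]),  # {} collates after all other keys
}
for row in requests.get(VIEW, params=params).json()["rows"]:
    print(row["key"], row["value"])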

OData JSON response from server comes back with line return characters

When you ask the OData server for JSON, the response comes back with "\r\n" line returns. Currently I'm stripping the line returns from the response on the client side. Is there a way to have the JSON response come back without the "pretty" formatting, i.e. without the "\r\n" line returns?
Response from server:
{\r\n"d" : [\r\n{\r\n"__metadata": {\r\n"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(0)", "type": "ODataDemo.Category"\r\n}, "ID": 0, "Name": "Food", "Products": {\r\n"__deferred": {\r\n"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(0)/Products"\r\n}\r\n}\r\n}, {\r\n"__metadata": {\r\n"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(1)", "type": "ODataDemo.Category"\r\n}, "ID": 1, "Name": "Beverages", "Products": {\r\n"__deferred": {\r\n"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(1)/Products"\r\n}\r\n}\r\n}, {\r\n"__metadata": {\r\n"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(2)", "type": "ODataDemo.Category"\r\n}, "ID": 2, "Name": "Electronics", "Products": {\r\n"__deferred": {\r\n"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(2)/Products"\r\n}\r\n}\r\n}\r\n]\r\n}
Expected response:
{"d" : [{"__metadata": {"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(0)", "type": "ODataDemo.Category"}, "ID": 0, "Name": "Food", "Products": {"__deferred": {"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(0)/Products"}}}, {"__metadata": {"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(1)", "type": "ODataDemo.Category"}, "ID": 1, "Name": "Beverages", "Products": {"__deferred": {"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(1)/Products"}}}, {"__metadata": {"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(2)", "type": "ODataDemo.Category"}, "ID": 2, "Name": "Electronics", "Products": {"__deferred": {"uri": "http://services.odata.org/(S(cxfoyevtmm2e2elq52yherkc))/OData/OData.svc/Categories(2)/Products"}}}]}
This is a known issue in the last release. In the next release, we will fix the code to never indent the response payload. If the client