Getting measurements into InfluxDB, a NoSQL database

I have a measurement I want to persist in an InfluxDB database. The measurement itself consists of approx. 4000 measurement points generated by a microcontroller. The points are floats and are produced periodically (every few minutes) at a constant frequency.
I'm trying to learn about NoSQL databases; InfluxDB is my first try here.
My question is: how do I get these measurements into InfluxDB, assuming they arrive in an MQTT message (in JSON format)? How are the insert strings generated/handled?
{
    "begin_time_of_meas": "2020-11-19T16:02:48+0000",
    "measurement": [
        1.0,
        2.2,
        3.3,
        ...,
        3999.8,
        4000.4
    ],
    "device": "D01"
}
I have used Node-RED in the past and I know there is a plugin for InfluxDB, so I guess that would be one way. But I'm quite unsure how the insert string is generated/handled for an array of measurement points. Every example I have seen so far handles only single-point measurements, like one temperature reading every few seconds or CPU load. Thanks for your help.

I've successfully used the influxdb plugin with a time precision of milliseconds. Not sure how to make it work for more precise timestamps, and I've never needed to.
It sounds like you have more than a handful of points arriving per second; send groups of messages as an array to the influx batch node.
In your case, it depends on what those 4000 measurements are and how it makes the most sense to group them. If the variables all measure the same kind of point, something like this might work (I've no idea what the measurements actually are). A function that takes the MQTT message and converts it to a block of messages like this might work well; a sketch of such a function follows the example below (note that this function's output could replace the join node):
[{
    measurement: "microcontroller_data",
    timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
    tags: {
        device: "D01",
        point: "0001",
    },
    fields: {
        value: 1.0
    }
},
{
    measurement: "microcontroller_data",
    timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
    tags: {
        device: "D01",
        point: "0002",
    },
    fields: {
        value: 2.2
    }
},
...etc...
]
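A minimal sketch of such a Node-RED function node, assuming the MQTT payload has already been parsed to a JavaScript object by a previous JSON node (the zero-padded point tag is just one way to label the 4000 values):

// Node-RED function node: convert the parsed MQTT payload into the
// array-of-points shape shown above.
const data = msg.payload;
const ts = new Date(data.begin_time_of_meas).getTime();   // ms precision

msg.payload = data.measurement.map(function (value, i) {
    return {
        measurement: "microcontroller_data",
        timestamp: ts,
        tags: {
            device: data.device,
            point: String(i + 1).padStart(4, "0")   // "0001" ... "4000"
        },
        fields: {
            value: value
        }
    };
});

return msg;   // the whole array goes on to the influx batch node

Because the output is already an array, it can be wired straight to the influx batch node without a join node.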
That looks like a lot of information to store, but the measurement and tags values are basically header values that don't get written with every entry. The field values do get stored, but they are compressed. The JSON describing the data to be stored is much larger than the on-disk space the storage will actually use.
It's also possible to have multiple fields, but I believe this will make data retrieval trickier:
{
    measurement: "microcontroller_data",
    timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
    tags: {
        device: "D01",
        point: "0001",
    },
    fields: {
        value_0001: 1.0,
        value_0002: 2.2,
        ...etc...
    }
}
Easier to code, but it would make for some ugly and inflexible queries.
You will likely have some more meaningful names than "microcontroller_data", or "0001", "0002" etc. If the 4000 signals are for very distinct measurements, it's also possible that there is more than one "measurement" that makes sense, e.g. cpu_parameters, flowrate, butterflies, etc.
Parse your MQTT messages into that shape. If the messages are sent one-at-a-time, then send to a join node; mine is set to send after 500 messages or 1 second of inactivity; you'll find something that fits.
If the json objects are grouped into an array by your processing, send directly to the influx batch node.
In the influx batch node, under "Advanced Query Options", I set the precision to ms because that's the default from Date().getTime().

Related

MongoDB schema/collection design for monitoring sensors

I'm designing a monitoring application and am hesitating about the MongoDB schema. The app will monitor 32 sensors. Each sensor will have multiple scalar values, but they are not the same between sensors (they could have different units, config, etc.). Scalar data will be pushed every minute. For each sensor, one or more arrays (3200 values) will be pushed as well. Each sensor could have a completely different config, which has to be stored too (the config is complex, but does not change often at all), and of course an event log.
I don't know whether I have to create different collections for each sensor, or go with a flat DB like:
Config
1D_data
2D_data
event_log
Or should I create sub-collections for each sensor, with config/1D/2D in each of them?
Requests to display data will be "display all 1D data from one sensor" (to show a trend over time) and "show the 2D_data at a given time".
If it were only scalar values I'd choose the first solution, but I don't know whether the "big" 2D results also fit that approach.
Thanks for your advice!
It may seem a little oversimplified at first, but all the information can be stored in a single collection. There is no added value in having 32 different collections. Each document will have a shape similar to this:
{
    "device": "D1",
    "created": ISODate("2017-10-02T13:59:48.194Z"),
    "moreCommonMetadata": { whatever: 77 },
    "data": {
        "color": "red",
        "maxAmps": 10,
        "maxVolts": 240
    }
},
{
    "device": "D2",
    "created": ISODate("2017-10-02T13:59:48.194Z"),
    "moreCommonMetadata": { whatever: 77 },
    "data": {
        "bigArray": [23, 34, 45, 56, 7, 78, 78, 78],
        "adj": [{foo: 'bar', bin: 'baz'}, {corn: 'dog'}],
        "vers": 2
    }
}
Queries can easily sweep across the metadata, and the data field can be any shape; it is logically tied to the device field, i.e. the value of the device field drives the logic that manipulates the data field.
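For example, pulling back recent documents for one device is a plain query on the common metadata (a sketch in the mongo shell; the collection name sensors is assumed):

// All documents for device D2 from the last 24 hours, newest first
db.sensors.find({
    device: "D2",
    created: { $gte: new Date(Date.now() - 24 * 3600 * 1000) }
}).sort({ created: -1 });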
A fairly comprehensive example can be found at
http://moschetti.org/rants/lhs.html

How does MongoDB work for this case?

I have a question about MongoDB. I know what Mongo is, but I am not sure whether this database is a good fit for a requirement I have to implement. Well, here I go.
Description:
I need to store data from devices (200 devices, more or less), and those devices will report geolocation data (lat, long) every 30 seconds, so that's 576,000 objects/day (2,880 requests per device per day).
I thought of this structure for my documents inside a 'locations' collection:
{
    "mac": "dc:a6:32:d4:b6:dc",
    "company_id": 5,
    "locations": [
        {
            "date": "2021-02-23 10:00:02",
            "value": "-32.955465, -60.661143"
        }
    ]
}
where 'locations' is an array that will store all locations every 30 seconds.
Questions:
Is MongoDB able to do this?
Is my document structure correct for solving this?
What will happen when this array becomes very big a month later?
Is there a better way to do this (database, framework, etc.)?
TIA !!
Is MongoDB able to do this?
Yes, this will be fine.
Is my document structure correct for solving this?
No, not at all!
Never store date/time values as strings; that's a design flaw. Always use a proper Date object. (This applies to any database.)
A similar point applies to the coordinates: don't store them as a string. I recommend GeoJSON objects; then you can also create an index on them and run spatial queries. Example: location: { type: "Point", coordinates: [ -60.661143, -32.955465 ] } (note that GeoJSON uses [longitude, latitude] order).
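For instance, with GeoJSON points you can create a 2dsphere index and run proximity queries. A minimal sketch in the mongo shell, assuming (for illustration) one document per report with a top-level location field and a collection named reports:

// Spatial index on the GeoJSON location field
db.reports.createIndex({ location: "2dsphere" });

// Reports within 500 m of a point; GeoJSON order is [longitude, latitude]
db.reports.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [ -60.661143, -32.955465 ] },
            $maxDistance: 500
        }
    }
});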
What will happen when this array becomes very big a month later?
The document size in MongoDB cannot exceed 16 MiB; it's a hard limit. So an ever-growing array does not look like a good design. Maybe store locations per day, or even one document per report.
Is there a better way to do this (database, framework, etc.)?
Well, ask 5 people and you will get 6 answers. At least your approach is not wrong.
Is MongoDB able to do this? Yes.
Is my document structure correct for solving this? No.
What will happen when this array becomes very big a month later? The maximum BSON document size is 16 megabytes.
Is there a better way to do this (database, framework, etc.)? Yes. The Bucket Pattern is a great solution when you need to manage Internet of Things (IoT) data like this.
You can have one document per device per hour, with a locations sub-document whose keys are the 30-second slots within that hour:
{
    "mac": "dc:a6:32:d4:b6:dc",
    "company_id": 5,
    "date": ISODate("2021-02-23T10:00:00Z"),
    "locations": {
        "0": "-32.955465, -60.661143",
        "1": "-33.514655, -60.664143",
        "2": "-33.122435, -59.675685"
    }
}
Adjust this solution considering your workload and main queries of your system.
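A sketch of how a single report could be appended to the current hourly bucket with one upsert (mongo shell; the collection name locations and the slot calculation are illustrative):

// "slot" is the 30-second lapse within the current hour: 0..119
var now = new Date();
var hour = new Date(now);
hour.setUTCMinutes(0, 0, 0);
var slot = Math.floor((now.getUTCMinutes() * 60 + now.getUTCSeconds()) / 30);

var update = { $set: {} };
update.$set["locations." + slot] = "-32.955465, -60.661143";

db.locations.updateOne(
    { mac: "dc:a6:32:d4:b6:dc", company_id: 5, date: hour },
    update,
    { upsert: true }   // creates the hourly bucket document if it does not exist yet
);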

Mongo for Meteor data design: opposite of normalizing?

I'm new to Meteor and Mongo. Really digging both, but I want to get feedback on something. I am porting an app I made with Django over to Meteor and want to handle certain kinds of relations in a way that makes sense in Meteor. Granted, I am more used to thinking about things in a Postgres way. So here goes.
Let's say I have three related collections: Locations, Beverages and Inventories. For this question though, I will only focus on the Locations and the Inventories. Here are the models as I've currently defined them:
Location:
    _id: "someID"
    beverages:
        _id: "someID"
        fillTo: "87"
        name: "Beer"
        orderWhen: "87"
        startUnits: "87"
    name: "Second"
    number: "102"
    organization: "The Second One"

Inventories:
    _id: "someID"
    beverages:
        0: Object
            name: "Diet Coke"
            units: "88"
    location: "someID"
    timestamp: 1397622495615
    user_id: "someID"
But here is my dilemma: I often need to retrieve one or many Inventories documents and render the "fillTo", "orderWhen" and "startUnits" per beverage. Doing things the MongoDB way, it looks like I should actually embed these properties as I store each Inventory. But that feels really non-DRY (and dirty).
On the other hand, it seems like a lot of effort and querying to render a table for each Inventory taken. I would need to fetch each Inventory, then look up "fillTo", "orderWhen" and "startUnits" per beverage per location, then render these in a table (I'm not even sure how I'd do that well).
TIA for the feedback!
If you only need this for rendering purposes (i.e. no further queries), then you can use the transform hook like this:
var myAwesomeCursor = Inventories.find(/* selector */, {
    transform: function (doc) {
        _.each(doc.beverages, function (bev) {
            // use whatever method you want to receive these data,
            // possibly from some cache or even another collection
            // bev.fillTo = ...
            // bev.orderWhen = ...
            // bev.startUnits = ...
        });
        return doc;   // the transform must return the (modified) document
    }
});
Now the myAwesomeCursor can be passed to each helper, and you're done.
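For instance (a sketch; the template and helper names are made up), the helper could simply return that cursor:

// Hypothetical Meteor template helper; {{#each inventories}} in the
// template then iterates over the transformed documents.
Template.inventoryTable.helpers({
    inventories: function () {
        return myAwesomeCursor;
    }
});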
In your case you might find that denormalizing the inventories so they are a property of locations is the best option, especially since they are a one-to-many relationship. In MongoDB and several other document databases, denormalizing is often preferred because it requires fewer queries and updates. As you've noticed, joins are not supported and must be done manually. As apendua mentions, Meteor's transform callback is probably the best place for the joins to happen.
However, the inventories may contain many beverage records and could cause the location records to grow too large over time. I highly recommend reading this page in the MongoDB docs (and the rest of the docs, of course). Essentially, this is a complex decision that could eventually have important performance implications for your application. Both normalized and denormalized data models are valid options in MongoDB, and both have their pros and cons.

MongoDB database schema design

I have a website with 500k users (running on SQL Server 2008). I now want to include activity streams of users and their friends. After testing a few things on SQL Server, it became apparent that an RDBMS is not a good choice for this kind of feature; it's slow (even when I heavily de-normalized my data). So after looking at NoSQL solutions, I've figured that I can use MongoDB for this. I'll be following a data structure based on the activitystrea.ms JSON specification for activity streams.
So my question is: what would be the best schema design for an activity stream in MongoDB? (With this many users you can pretty much predict that it will be very write-heavy, hence my choice of MongoDB and its great write performance.) I've thought about 3 types of structures; please tell me if these make sense or whether I should use other schema patterns.
1 - Store each activity with all friends/followers in this pattern:
{
    _id: 'activ123',
    actor: {
        id: person1
    },
    verb: 'follow',
    object: {
        objecttype: 'person',
        id: 'person2'
    },
    updatedon: Date(),
    consumers: [
        person3, person4, person5, person6, ... so on
    ]
}
2 - Second design (collection name: activity_stream_fanout):
{
    _id: 'activ_fanout_123',
    personId: person3,
    activities: [
        {
            _id: 'activ123',
            actor: {
                id: person1
            },
            verb: 'follow',
            object: {
                objecttype: 'person',
                id: 'person2'
            },
            updatedon: Date()
        },
        {
            // activity feed 2
        }
    ]
}
3 - This approach would be to store the activity items in one collection, and the consumers in another. In activities, you might have a document like:
{
    _id: "123",
    actor: { person: "UserABC" },
    verb: "follow",
    object: { person: "someone_else" },
    updatedOn: Date(...)
}
And then, for followers, I would have the following "notifications" documents:
{ activityId: "123", consumer: "someguy", updatedOn: Date(...) }
{ activityId: "123", consumer: "otherguy", updatedOn: Date(...) }
{ activityId: "123", consumer: "thirdguy", updatedOn: Date(...) }
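For illustration, rendering one consumer's feed with this layout would be a two-step lookup (a sketch in the mongo shell, using the collection names activities and notifications assumed above):

// 1) Latest notification entries for one consumer
var notes = db.notifications.find({ consumer: "someguy" })
    .sort({ updatedOn: -1 })
    .limit(20)
    .toArray();

// 2) Fetch the referenced activity documents in one query
var ids = notes.map(function (n) { return n.activityId; });
var feed = db.activities.find({ _id: { $in: ids } }).toArray();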
Your answers are greatly appreciated.
I'd go with the following structure:
Use one collection for all actions that happened, Actions.
Use another collection for who follows whom, Subscribers.
Use a third collection, Newsfeed, for a given user's news feed; items are fanned out from the Actions collection.
The Newsfeed collection will be populated by a worker process that asynchronously processes new Actions, so news feeds won't populate in real time. I disagree with Geert-Jan that real-time is important; I believe most users don't mind even a minute of delay in most (not all) applications (for real time, I'd choose a completely different architecture).
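A minimal sketch of that worker's core step, assuming the three collections named above and a simple { follower, followee } shape for Subscribers (the field names are made up for illustration):

// Fan out one newly stored action to the news feeds of all followers
function fanOut(action) {
    db.Subscribers.find({ followee: action.actor.id }).forEach(function (sub) {
        db.Newsfeed.insertOne({
            consumer: sub.follower,
            activityId: action._id,
            verb: action.verb,
            object: action.object,
            updatedOn: action.updatedon
        });
    });
}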
If you have a very large number of consumers, the fan-out can take a while, true. On the other hand, putting the consumers right into the object won't work with very large follower counts either, and it will create overly large objects that take up a lot of index space.
Most importantly, however, the fan-out design is much more flexible and allows relevancy scoring, filtering, etc. I have just recently written a blog post about news feed schema design with MongoDB where I explain some of that flexibility in greater detail.
Speaking of flexibility, I'd be careful about that activitystrea.ms spec. It seems to make sense as a specification for interop between different providers, but I wouldn't store all that verbose information in my database as long as you don't intend to aggregate activities from various applications.
I believe you should look at your access patterns: what queries are you likely to perform most on this data, etc.
To me, the use case that needs to be fastest is being able to push a given activity to the 'wall' (in Facebook terms) of each of the 'activity consumers', and doing it immediately when the activity comes in.
From this standpoint (I haven't given it much thought) I'd go with 1, since 2 seems to batch activities for a given user before processing them, which fails the 'immediate' requirement for updates. Moreover, I don't see the advantage of 3 over 1 for this use case.
Some enhancements on 1? Ask yourself whether you really need the flexibility of defining an array of consumers for every activity. Is there really a need to specify this at such a fine-grained level? Wouldn't a reference to the 'friends' of the 'actor' suffice instead? (This would save a lot of space in the long run, since I see the consumers array being the bulk of the entire message for each activity when consumers typically number in the hundreds.)
On a somewhat related note: depending on how you want to implement real-time notifications for these activity streams, it might be worth looking at Pusher (http://pusher.com/) and similar solutions.
hth

How can I efficiently use MongoDB to create real-time analytics with pivots?

So I'm getting a ton of data continuously that's getting put into a processedData collection. The data looks like:
{
    date: "2011-12-4",
    time: 2243,
    gender: {
        males: 1231,
        females: 322
    },
    age: 32
}
So I'll get lots and lots of data objects like this, continually. I want to be able to see all "males" that are above 40 years old. This does not seem to be an efficient query because of the sheer size of the data.
Any tips?
Generally speaking, you can't.
However, there may be some shortcuts, depending on actual requirements. Do you want to count 'males above 40' across the whole dataset, or just for one day?
One day: split your data into daily collections (processedData-20111121, ...); this will help your queries. You can also cache the results of such a query.
Whole dataset: pre-aggregate the data. That is, upon insertion of a new data entry, do something like this:
db.preaggregated.update({_id : 'male_40'},
                        {$set : {gender : 'm', age : 40}, $inc : {count : 1231}},
                        true);
Similarly, if you know all your queries beforehand, you can just precalculate them (and not keep raw data).
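Reading the precalculated results back is then cheap (a sketch; the bucket _id and field names follow the 'male_40' convention above):

// Males above 40 across the whole dataset: sum the precalculated buckets
var total = 0;
db.preaggregated.find({ gender: 'm', age: { $gt: 40 } }).forEach(function (b) {
    total += b.count;
});
print(total);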
It also depends on how you define "real-time" and how big a query load you will have. In some cases it is ok to just fire ad-hoc map-reduces.
My guess is that your target GUI is a website? In that case you are looking for something called Comet. You should build a layer which processes all the data and broadcasts new mutations to your client or an event bus (more on that below). Mongo doesn't give you real-time data out of the box, as it doesn't emit anything on a mutation. So you can use any data store which suits you.
Depending on the language you'll use you have different options (for comet):
Socket.io (nodejs) - Javascript
Cometd - Java
SignalR - C#
Libwebsocket - C++
Most of the time you'll need an event bus or message queue to put the mutation events on. Take a look at JMS, Redis, or NServiceBus (depending on what you'll use).
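A minimal sketch of such a layer with Node.js and Socket.io (assumptions: the official mongodb driver, a collection named processedData as in the question, and an ingest() entry point that the data source calls; the database name and port are made up):

// Ingest layer: persist each incoming document, then broadcast the mutation
// to connected dashboard clients so they can update their pivots live.
const { MongoClient } = require('mongodb');
const io = require('socket.io')(3000);          // standalone Socket.io server

async function main() {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const processedData = client.db('analytics').collection('processedData');

    // Hypothetical entry point called for every new measurement document
    async function ingest(doc) {
        await processedData.insertOne(doc);
        io.emit('mutation', doc);               // clients apply the delta locally
    }

    // Example document shaped like the one in the question
    await ingest({ date: "2011-12-4", time: 2243, gender: { males: 1231, females: 322 }, age: 32 });
}

main().catch(console.error);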