MongoDB schema/collection design for monitoring sensors - mongodb

I'm designing a monitoring application and hesitating about the MongoDB schema. The app will monitor 32 sensors. Each sensor will have multiple scalar values, but these are not the same between sensors (units, config, etc. can differ). Scalar data will be pushed every minute. And for each sensor, there will also be one or more arrays (3200 values) to push. Each sensor can have a completely different config, which has to be stored too (the config is complex, but it rarely changes), and of course an event log.
I don't know whether I should go with a flat DB of shared collections like:
Config
1D_data
2D_data
event_log
Or create sub-collections for each sensor, with config/1d/2d in each of them.
The queries to display data will be "display all 1D data from one sensor" (to show a trend over time) and "show the 2D data at a given time".
If it were only scalar values I'd choose the first solution, but I don't know whether "big" 2D results also fit that scheme.
Thanks for your advice!

It seems a little oversimplified at first, but all the information can be stored in a single collection. There is no added value in having 32 different collections. Each document will have a shape similar to this:
{
  "device": "D1",
  "created": ISODate("2017-10-02T13:59:48.194Z"),
  "moreCommonMetadata": { whatever: 77 },
  "data": {
    "color": "red",
    "maxAmps": 10,
    "maxVolts": 240
  }
},
{
  "device": "D2",
  "created": ISODate("2017-10-02T13:59:48.194Z"),
  "moreCommonMetadata": { whatever: 77 },
  "data": {
    "bigArray": [23, 34, 45, 56, 7, 78, 78, 78],
    "adj": [ { foo: 'bar', bin: 'baz' }, { corn: 'dog' } ],
    "vers": 2
  }
}
Queries can easily sweep across the metadata, and the data field can be any shape; it is logically tied to the device field, i.e. the value of the device field drives the logic that manipulates the data field.
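For example, the "display all 1D data from one sensor" trend query becomes a simple indexed range scan. A minimal mongo shell sketch (the sensordata collection name, the index, and the date range are my assumptions, not part of the answer above):

// Hypothetical collection holding one document per reading.
db.sensordata.createIndex({ device: 1, created: 1 });

// Trend over time: all readings for one sensor, oldest first.
db.sensordata.find({
  device: "D1",
  created: { $gte: ISODate("2017-10-01"), $lt: ISODate("2017-10-08") }
}).sort({ created: 1 });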
A fairly comprehensive example can be found at
http://moschetti.org/rants/lhs.html

Related

Structuring data in Firebase Firestore for "File explorer/Google Drive" style application

So recently I started using Firebase with its NoSQL database for my Flutter project.
I'd like to create something in the shape of the Windows file explorer, where basically the user is able to create a "folder", or a file of one type, and inside each folder he has the same options. In the end the user will be able to organize his workspace however he wants.
At first my idea was to create a collection of "Folders/Files", where each document would have Parent and Children parameters, with document IDs referencing other documents. But I feel like this would create a lot of unnecessary queries to the database.
Now I'm thinking of creating a collection "Structures" that would basically hold all the data in array/list style, but then I feel like I would have trouble getting all the data and handling it.
Also, this project is about wine storage, where each user would basically recreate their real-life wine cellar, with all the wine cabinets and shelves, and then be able to filter/search for the bottles they need at any given time.
Currently my Firestore has a top-level WineVaults collection. WineVaults are basically wineries, so when the user logs in, he sees the ones he has access to; when he clicks one, he should go inside that structure and be able to create folders/wine bottles inside.
The question's title is misleading relative to what you're actually after. A "file explorer" implies an infinitely-sized tree, with unknown size both vertically and horizontally. What you're actually after is a tree with a fixed vertical size (Winery > Folder > Bottle) and an infinite horizontal size.
As such, I would structure the data like this: //wineries/shelves/bottle.
So for example, you might have a structure similar to the one below, with wineries, shelves, and bottles being collections (and sub-collections):
{
  "wineries": {
    "wineryId1": { ... },
    "wineryId2": {
      "address": "",
      "shelves": {
        "shelfId1": { ... },
        "shelfId2": {
          "bottles": {
            "bottleId1": {
              "label": "",
              "name": "",
              "year": ""
            }
          },
          "description": "",
          "location": ""
        }
      }
    }
  }
}
Note that one of the nice things about Firestore is that you can use Collection Group queries to fetch, for example, all bottles in the system, even if they are "located" in a nested sub-collection somewhere.
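As a rough sketch of what a collection group query looks like (shown with the Firebase JS SDK for brevity; the Flutter SDK exposes an equivalent collectionGroup API, and the year filter is a hypothetical example field):

import { getFirestore, collectionGroup, query, where, getDocs } from "firebase/firestore";

// Fetch every document from any sub-collection named "bottles",
// no matter which winery/shelf it is nested under.
const db = getFirestore();
const q = query(collectionGroup(db, "bottles"), where("year", "==", "2015"));
const snap = await getDocs(q); // inside an async context
snap.forEach((doc) => console.log(doc.ref.path, doc.data()));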
I think that your original idea was good.
Keeping all the shelves and bottles in an array/list/map has the following limitations:
Firestore documents have a size limit of 1 MB - what will you do when your user reaches that limit?
every time a user searches / modifies his data, you need to read and write his whole inventory
Your original idea will create more reads, but you only need to read the next level of the tree, not the whole thing.
I also like the flexibility of the original idea.
If I were to implement this, I would create a user collection, a "folder"/"node" collection (for wineries, shelves, etc.), and a "file" collection for the bottles.
If you need security (allowing a user to see only his own files/folders), these can be sub-collections under the user.
A DB's intent is to store data and allow you to search it; I don't think that downloading all of the user's data and searching it manually is a good idea.
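To illustrate the "read only the next level" point, a minimal sketch with the Firebase JS SDK (the flat nodes collection and its parent field are my assumptions, one possible shape for the "folder"/"node" collection above):

import { getFirestore, collection, query, where, getDocs } from "firebase/firestore";

// Fetch only the direct children of one folder/node.
const db = getFirestore();
const folderId = "someFolderId"; // id of the node being expanded
const children = await getDocs(
  query(collection(db, "nodes"), where("parent", "==", folderId))
); // inside an async context
children.forEach((doc) => console.log(doc.id, doc.data()));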

Getting measurements into InfluxDB, a NoSQL database

I have a measurement I want to persist in an InfluxDB database. The measurement itself consists of approx. 4000 measurement points which are generated by a microcontroller. The measurement points are in float format and are generated periodically (every few minutes) with a constant frequency.
I am trying to gain some knowledge about NoSQL databases; InfluxDB is my first try here.
The question is: how do I get these measurements into InfluxDB, assuming they arrive within an MQTT message (in JSON format)? How are the insert strings generated/handled?
{
  "begin_time_of_meas": "2020-11-19T16:02:48+0000",
  "measurement": [
    1.0,
    2.2,
    3.3,
    ...,
    3999.8,
    4000.4
  ],
  "device": "D01"
}
I have used Node-RED in the past and I know there is a plugin for InfluxDB, so I guess this would be a way. But I'm quite unsure how the insert string is generated/handled for an array of measurement points. Every example I have seen so far handles only single-point measurements, like one temperature measurement every few seconds, or CPU load. Thanks for your help.
I've successfully used the influxdb plugin with a time precision of milliseconds. Not sure how to make it work for more precise timestamps, and I've never needed to.
It sounds like you have more than a handful of points arriving per second; send groups of messages as an array to the influx batch node.
In your case, it depends what those 4000 measurements are and how it best makes sense to group them. If the variables all measure the same point, something like this might work. I've no idea what the measurements are, etc. A function that takes the MQTT message and converts it to a block of messages like this might work well (note that this function's output could replace the join node):
[{
  measurement: "microcontroller_data",
  timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
  tags: {
    device: "D01",
    point: "0001",
  },
  fields: {
    value: 1.0
  }
},
{
  measurement: "microcontroller_data",
  timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
  tags: {
    device: "D01",
    point: "0002",
  },
  fields: {
    value: 2.2
  }
},
...etc...
]
That looks like a lot of information to store, but the measurement and tags values are basically header values that don't get written with every entry. The fields values do get stored, but these are compressed. The JSON describing the data to be stored is much larger than the on-disk space the storage will actually use.
It's also possible to have multiple fields, but I believe this will make data retrieval trickier:
{
  measurement: "microcontroller_data",
  timestamp: new Date("2020-11-19T16:02:48+0000").getTime(),
  tags: {
    device: "D01",
    point: "0001",
  },
  fields: {
    value_0001: 1.0,
    value_0002: 2.2,
    ...etc...
  }
}
Easier to code, but it would make for some ugly and inflexible queries.
You will likely have more meaningful names than "microcontroller_data" or "0001", "0002", etc. If the 4000 signals are for very distinct measurements, it's also possible that more than one "measurement" makes sense, e.g. cpu_parameters, flowrate, butterflies, etc.
Parse your MQTT messages into that shape, as in the sketch below. If the messages are sent one at a time, then send them to a join node; mine is set to send after 500 messages or 1 second of inactivity; you'll find something that fits.
If the json objects are grouped into an array by your processing, send directly to the influx batch node.
In the influx batch node, under "Advanced Query Options", I set the precision to ms because that's what new Date().getTime() returns.
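Putting it together, a minimal sketch of such a function node (my sketch, not tested against your setup; it assumes msg.payload is the JSON object from the question and that the whole 4000-point array arrives in one MQTT message):

// Node-RED function node: fan the measurement array out into
// one point per value, in the shape shown above.
const data = msg.payload;
const ts = new Date(data.begin_time_of_meas).getTime();

msg.payload = data.measurement.map((value, i) => ({
    measurement: "microcontroller_data",
    timestamp: ts,
    tags: {
        device: data.device,
        point: String(i + 1).padStart(4, "0"),
    },
    fields: { value: value },
}));

return msg; // an array payload can go straight to the influx batch node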

Mongo Collections and Meteor Reactivity

I'm trying to decide the best approach for an app I'm working on. In my app each user has a number of custom forms; for example, user X will have custom forms and user Y will have 5 different forms customized to their needs.
My idea is to create a MongoDB collection for each custom form. At the start I wouldn't have too many users, and I understand the MongoDB collection limit is set to 24000 (I think, not sure). If that's correct I'm OK for now.
But I think this might create issues down the line, and I'm also not sure this is the best approach for performance, management and so forth.
The other option is to create one collection "forms" and add the custom data under an object field like so:
{
  _id: dfdfd34df4efdfdfdf,
  data: {}
}
My concerns with this are Meteor reactivity and scale.
First, I'm expecting each user to fill out each form at least 30 to 50 times per week, so I'm expecting the collection size to increase very fast. This makes me question this approach and lean toward the collection-per-form option, which breaks down the size.
My second concern or question is whether Meteor will be able to identify changes in the first-level and second-level objects, as I need the data to be reactive.
First Level
{
  _id: dfdfd34df4efdfdfdf,
  data: {}
}
Second Level
{
  _id: dfdfd34df4efdfdfdf,
  data: {
    Object: {
      name: Y,
      _id: random id
    }
  }
}
The answer is somewhat covered here: limits of number of collections in databases.
It's not a yes or no, but it's clear regarding the mongo collection limit. As for Meteor reactivity, that's another topic.

Links vs References in Document databases

I am confused by the term 'link' for connecting documents.
The OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ states that OrientDB uses links to connect documents, while in MongoDB documents are embedded.
Since in MongoDB (http://docs.mongodb.org/manual/core/data-modeling-introduction/) documents can be referenced as well, I cannot see the difference between linking documents and referencing them.
The goal of document-oriented databases is to reduce "impedance mismatch", which is the degree to which data is split up to match some sort of database schema versus the actual objects residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time vary from one database implementation to another.
An embedded document, on the other hand, is simply the act of storing an object type that somehow relates to a parent type inside the parent. For example, say I have a class as follows:
class User
{
    string Name;
    List<Achievement> Achievements;
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
  "name": "John Q Taxpayer",
  "achievements": [
    {
      "name": "High Score",
      "point": 10000
    },
    {
      "name": "Low Score",
      "point": -10000
    }
  ]
}
Whereas a linked document might look something like this:
{
  "name": "John Q Taxpayer",
  "achievements": [
    "somelink1", "somelink2"
  ]
}
Inside an Achievements Collection
{
  "somelink1": {
    "name": "High Score",
    "point": 10000
  },
  "somelink2": {
    "name": "Low Score",
    "point": -10000
  }
}
Keep in mind these are just approximate representations.
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help with deduplication of stored data. However, it adds a layer of complexity, requiring the database engine to make multiple disk I/O calls to form the final document returned to user code. An embedded document more closely matches the object in memory; this reduces impedance mismatch and (in theory) reduces the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
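To make the join cost concrete: in MongoDB, links like these can be resolved client-side with a second query, or server-side with $lookup. A minimal mongo shell sketch (the collection and field names are my assumptions, mirroring the example above):

// Resolve the achievement links server-side. Assumes users.achievements
// holds _id values from the achievements collection.
db.users.aggregate([
  { $match: { name: "John Q Taxpayer" } },
  { $lookup: {
      from: "achievements",
      localField: "achievements",
      foreignField: "_id",
      as: "achievements"
  } }
]);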
UPDATE
I should add that choosing the right database for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each supplier and get some of their training material. MongoDB offers 2 free courses you can take to learn more about their product and its best uses at MongoDB University. OrientDB does offer training, however it is not free. It might be best to try contacting them directly and getting some sort of pre-sales training (if you are looking to license the db); usually they will put you in touch with a pre-sales consultant to help you evaluate their product.
MongoDB works like an RDBMS here: the object id is like a foreign key. This means a "JOIN" that is run-time expensive. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.

How can I efficiently use MongoDB to create real-time analytics with pivots?

So I'm getting a ton of data continuously that's getting put into a processedData collection. The data looks like:
{
  date: "2011-12-4",
  time: 2243,
  gender: {
    males: 1231,
    females: 322
  },
  age: 32
}
So I'll get lots and lots of data objects like this continually. I want to be able to see all "males" that are above 40 years old. This does not seem to be an efficient query, because of the sheer size of the data.
Any tips?
Generally speaking, you can't.
However, there may be some shortcuts, depending on actual requirements. Do you want to count 'males above 40' across the whole dataset, or just for one day?
1 day: split your data into daily collections (processedData-20111121, ...); this will help your queries. You can also cache the results of such a query.
whole dataset: pre-aggregate the data. That is, upon insertion of a new data entry, do something like this:
db.preaggregated.update(
  { _id: 'male_40' },
  { $set: { gender: 'm', age: 40 }, $inc: { count: 1231 } },
  true  // upsert: create the document if it doesn't exist yet
);
Similarly, if you know all your queries beforehand, you can just precalculate them (and not keep raw data).
It also depends on how you define "real-time" and how big a query load you will have. In some cases it is ok to just fire ad-hoc map-reduces.
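For the ad-hoc route, a minimal sketch using the aggregation pipeline (my sketch, assuming the document shape from the question; the original answer suggested map-reduce):

// Count all "males" in documents whose age field is above 40.
db.processedData.aggregate([
  { $match: { age: { $gt: 40 } } },
  { $group: { _id: null, males: { $sum: "$gender.males" } } }
]);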
My guess is that your target GUI is a website? In that case you are looking for something called Comet. You should build a layer which processes all the data and broadcasts new mutations to your clients or to an event bus (more on that below). Mongo doesn't enable real-time data, as it doesn't emit anything on a mutation, so you can use any data store which suits you.
Depending on the language you'll use, you have different options (for Comet):
Socket.io (nodejs) - Javascript
Cometd - Java
SignalR - C#
Libwebsocket - C++
Most of the time you'll need an event bus or message queue to put the mutation events on. Take a look at JMS, Redis or NServiceBus (depending on what you'll use).
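To make the broadcast layer concrete, a minimal Socket.IO sketch (my sketch using the current v4 API, not part of the original answer; the ingestion layer is assumed to call broadcast() after each write to the data store):

// Node.js + socket.io
const { Server } = require("socket.io");
const io = new Server(3000);

// Called by the ingestion layer after each new document is stored.
function broadcast(doc) {
  io.emit("mutation", doc); // push the mutation to every connected client
}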