How does MongoDB work for this case? - mongodb

I have a question about MongoDB. I know what Mongo is, but I am not sure whether this database is a good fit for a requirement I need to implement. Well, here I go.
Description:
I need to store data from devices (roughly 200 of them), and each device will report geolocation data (lat, long) every 30 seconds, so that is 576,000 objects/day (2,880 reports per device per day).
I came up with this structure for the documents inside a 'locations' collection:
{
  "mac": "dc:a6:32:d4:b6:dc",
  "company_id": 5,
  "locations": [
    {
      "date": "2021-02-23 10:00:02",
      "value": "-32.955465, -60.661143"
    }
  ]
}
where 'locations' is an array that accumulates a new location every 30 seconds.
Questions:
Is MongoDB able to do this?
Is my document structure correct for this?
What will happen when this array gets very big a month later?
Is there a better way to do this (database, framework, etc.)?
TIA !!

Is MongoDB able to do this?
Yes, this will be fine.
Is my document structure correct for this?
No, not at all!
Never store date/time values as strings; it's a design flaw. Always use a proper Date object. (This applies to any database.)
The same applies to the coordinates: don't store them as a string. I recommend a GeoJSON object; then you can also create an index on it and run spatial queries. Example (note that GeoJSON puts longitude before latitude): location: { type: "Point", coordinates: [ -60.661143, -32.955465 ] }
What will happen when this array gets very big a month later?
The document size in MongoDB cannot exceed 16 MB; it's a hard limit. So this does not look like a good design. Consider storing the locations per day, or even one document per report.
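Putting both recommendations together, a minimal mongo-shell sketch of the one-document-per-report layout (the collection name is taken from the question; everything else is illustrative):

// one document per report, with a proper Date and a GeoJSON point
db.locations.insertOne({
  mac: "dc:a6:32:d4:b6:dc",
  company_id: 5,
  date: ISODate("2021-02-23T10:00:02Z"),
  location: { type: "Point", coordinates: [ -60.661143, -32.955465 ] }  // [lng, lat]
})

// a 2dsphere index on the GeoJSON field enables spatial queries
db.locations.createIndex({ location: "2dsphere" })

// e.g. find reports within 500 m of a point
db.locations.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [ -60.661143, -32.955465 ] },
      $maxDistance: 500
    }
  }
})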
Is there a better way to do this (database, framework, etc.)?
Well, ask 5 people and you will get 6 answers. At least your approach is not wrong.

Is MongoDB able to do this? Yes.
Is my document structure correct for this? No.
What will happen when this array gets very big a month later? The maximum BSON document size is 16 megabytes.
Is there a better way to do this (database, framework, etc.)?
Yes. The Bucket Pattern is a great solution when you need to manage Internet of Things (IoT) workloads.
You can have one document per device per hour, with a locations sub-document whose keys are the 30-second slots within that hour.
{
  "mac": "dc:a6:32:d4:b6:dc",
  "company_id": 5,
  "date": ISODate("2021-02-23T10:00:00Z"),
  "locations": {
    "0": "-32.955465, -60.661143",
    "1": "-33.514655, -60.664143",
    "2": "-33.122435, -59.675685"
  }
}
Adjust this solution to your workload and the main queries of your system.
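As a rough mongo-shell sketch of how such a bucket could be maintained (the slot arithmetic and collection name are assumptions, not part of the pattern itself), each report can upsert its 30-second slot into the current hour's document:

// compute the hour bucket and the 30-second slot within it
const now  = new Date();
const hour = new Date(now);
hour.setUTCMinutes(0, 0, 0);                    // truncate to the start of the hour
const slot = Math.floor((now - hour) / 30000);  // 0..119 within the hour

// upsert this report's slot into the device's document for the current hour
db.locations.updateOne(
  { mac: "dc:a6:32:d4:b6:dc", company_id: 5, date: hour },
  { $set: { ["locations." + slot]: "-32.955465, -60.661143" } },
  { upsert: true }
);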

Related

Server side paging and grouping of large dataset

I'll try to explain the issue as best I can. I need to implement a grid with server-side paging. On a request for N entities, the DB should return a set of data that is grouped, or rather transformed, in such a way that when the transformation phase is done it results in exactly those N entities.
Best way as I can see is something like this:
Query_all_data() => Result; (10000000 documents)
Transform(Result) => Transformed (100 groups)
Transformed.Skip(N).Take(N)
Transformation phase should be something like this:
Result = [d0, d1, d2, ..., dN]
Transformed = [
  { info: "foo", docs: [d0, d2, d21, d67, d100042] },
  { info: "bar", docs: [d3, d28, d121, d6271, d100042] },
  { info: "baz", docs: [d41, d26, d221, d567, d100043] },
  { info: "waz", docs: [d22, d24, d241, d167, d1000324] }
]
Every object in Transformed is an entity in grid.
I'm not sure if it's important, but the DB in question is MongoDB and all documents are stored in one collection. The huge pitfall of this approach is that it's way too slow on a large dataset, which will most certainly be the case.
Is there a better approach? Maybe a different DB design?
@dakt, you can store your data in a couple of different ways, based on how you are going to use it. In the process it may also be useful to store the data in denormalized form, in which case some duplication may occur.
Store data as individual documents as mentioned in your problem statement
Store the data in the transformed format from your problem statement. It looks like you have a consistent way of mapping docs to some tag; if so, why not keep the documents embedded under those tags. This certainly limits the number of docs you can embed, given the 16 MB document limit.
I would suggest looking at the MongoDB use cases - http://docs.mongodb.org/ecosystem/use-cases/ - to see if any of them are similar to what you are trying to achieve.
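If you keep individual documents (the first option), the grouping and paging can also be pushed into the aggregation framework instead of application code. A minimal sketch, assuming a field tag drives the grouping and that pageIndex and N come from the grid request:

// group documents by tag server-side, then page over the groups
db.collection.aggregate([
  { $group: { _id: "$tag", docs: { $push: "$_id" } } },
  { $sort:  { _id: 1 } },              // a stable order makes the paging deterministic
  { $skip:  pageIndex * N },
  { $limit: N }
], { allowDiskUse: true })             // $group over millions of docs may spill to disk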

Mongodb real basic use case

I'm approaching the NoSQL world.
I studied a little around the web (not the best way to study!) and read the MongoDB documentation.
Around the web I wasn't able to find a real-world case example (only flights of fancy about big architectures that were not well explained, or examples too basic to be real-world).
So I still have some huge holes in my understanding of NoSQL and MongoDB.
I'll try to summarise one of them, the worst one actually, below:
Let's imagine the data structure for a post in a simple blog:
{
  "_id": ObjectId(),
  "title": "Title here",
  "body": "text of the post here",
  "date": ISODate("2010-09-24"),
  "author": "author_of_the_post_name",
  "comments": [
    {
      "author": "comment_author_name",
      "text": "comment text",
      "date": ISODate("date")
    },
    {
      "author": "comment_author_name2",
      "text": "comment text",
      "date": ISODate("date")
    },
    ...
  ]
}
So far so good.
Everything works fine as long as the post's author never changes his name (leaving aside profile picture and description).
The same goes for all comment authors.
So if I want to handle this situation I have to use references:
"authorID": <author_of_the_post_id>,
for post's author and
"authorID": <comment_author_id>,
for comments authors.
But MongoDB does not allow joins when querying. So there will be a different query for each authorID.
So what happens if I have 100 comments on my blog post?
1 query for the post
1 query to retrieve the author's information
100 queries to retrieve the comment authors' information
**total of 102 queries!!!**
Am I right?
Where is the advantage of using a noSQL here?
In my understanding that's 102 queries vs. 1 bigger query using joins.
Or am I missing something and there is a different way to model this situation?
Thanks for your contribution!
Have you seen this?
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
It sounds like what you are doing is NOT a good use case for NoSQL. Use a relational database for the basic data storage backing your application, and NoSQL for caching and the like.
NoSQL databases are used for storing non-sensitive data, for instance posts and comments.
You are able to retrieve all the data with one query. For example, you may not care about outdated fields such as author_name or profile_picture_url, because it's just a post and over time it will be less visible than newer ones. But if you do want up-to-date fields, you have two options:
The first option is to use some kind of worker service. If a user changes his username or profile picture, you send a signal to your service to traverse all posts and comments and update those fields with the new values.
The second option is to store authorId instead of the author name; then instead of 2 queries you will make N+2 queries to fetch each comment author's profile. But use pagination: instead of querying for 100 comments, take 10 and show a "load more" button/link, so you make 12 queries.
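Note that the per-comment author queries can often be batched. A small sketch (postId and the collection names are assumptions) showing all comment authors fetched with a single $in query, which brings the total down to 3 queries regardless of the comment count:

// 1) fetch the post (with its embedded comments and their authorIDs)
var post = db.posts.findOne({ _id: postId });

// 2) fetch the post's author
var postAuthor = db.users.findOne({ _id: post.authorID });

// 3) fetch every comment author in a single query
var authorIds = post.comments.map(function (c) { return c.authorID; });
var commentAuthors = db.users.find({ _id: { $in: authorIds } }).toArray();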
Hope this helps.

Structuring Nested Collections in RavenDB

I've got a question about how I should structure my data in RavenDB. Like most people, I'm coming from a relational database background, and it feels slightly like I'm having to re-program my brain :).
Anyway, I have a utility which looks as below:
{
  "Name": "Gas",
  "Calendars": [
    { "Name": "EFA" },
    { "Name": "Calendar" }
  ]
}
And I have a contract. Whilst creating the contract I need to first pick a utility type, and then, based upon that, a calendar type.
For example, I would pick Gas and then EFA. My question is how I should store this information against the contract object. It almost feels like each of my calendars should have an id, but I'm guessing this is wrong? Or should I just be storing the text values?
Any advice on the correct way to do this would be appreciated.
You can have internal objects with ids in RavenDB, but those ids are application-managed, not managed by RavenDB.
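For example, a sketch of what the contract document could look like (the field names and the "utilities/1" id are purely illustrative): it references the utility by its document id and simply denormalizes the chosen calendar name, since the calendars have no ids of their own:

{
  "Name": "Example contract",
  "UtilityId": "utilities/1",
  "CalendarName": "EFA"
}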

Extract data lists from Mongo Documents

As a Mongo/NoSQL newbie with an RDBMS background, I wonder what's the best way to proceed.
Currently I've got a large set of documents containing, in some fields, what I consider "reference data".
I need a search interface summarizing the possible values of those "reference fields", so that a filter can then be applied to the document set.
Let's take a very simple, silly example about food.
Here is an extract of some mongo documents:
{ "_id": 1, "name": "apple", "category": "fruit"}
{ "_id": 1, "name": "orange", "category": "fruit"}
{ "_id": 1, "name": "cucumber", "category": "vegetable"}
In the application I'd like a select box displaying all the possible values for "category"; here it would display "fruit" and "vegetable".
What's the best way to proceed?
extract the data from the existing documents?
create some reference documents listing the unique possible values (as I would do in an RDBMS)?
store the reference data in an RDBMS and programmatically link Mongo and the RDBMS?
something else?
The first option is the easiest to implement and should be efficient if you have the indexes properly set (see the distinct command), so I would go with this.
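A minimal sketch, assuming the documents above live in a food collection:

// distinct values of "category", e.g. [ "fruit", "vegetable" ]
db.food.distinct("category")

// an index on the field lets distinct answer from the index alone
db.food.createIndex({ category: 1 })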
You could also choose the second option (linking to a reference collection, the RDBMS way), which trades performance (you will need more queries to fetch the data) for space (you will need less of it). This option is also preferred if the category is used in other collections as well.
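A sketch of that second option (the categories collection name is an assumption):

// reference collection listing the allowed values once
db.categories.insertMany([ { _id: "fruit" }, { _id: "vegetable" } ])

// populate the select box straight from the reference collection
db.categories.find()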
I would advise against using a mixed system (NoSQL + RDBMS) in this case as the other options are better.
You could also store category values directly in application code - depends on your use case. Sometimes it makes sense, although any RDBMS fanatic would burst into tears (or worse) if you tell him that. YMMV. ;)

When to embed documents in Mongo DB

I'm trying to figure out how to best design Mongo DB schemas. The Mongo DB documentation recommends relying heavily on embedded documents for improved querying, but I'm wondering if my use case actually justifies referenced documents.
A very basic version of my current schema is basically:
(Apologies for the pseudo-format; I'm not sure how to express Mongo schemas.)
users {
  email (string)
}

games {
  user (reference to user document)
  date_started (timestamp)
  date_finished (timestamp)
  mode (string)
  score: {
    total_points (integer)
    time_elapsed (integer)
  }
}
Games are short (about 60 seconds long) and I expect a lot of concurrent writes to be taking place.
At some point, I'm going to want to calculate a high score list, and possibly in a segregated fashion (e.g., high score list for a particular game.mode or date)
Are embedded documents the best approach here? Or is this truly a problem that relations solve better? How would these use cases best be solved in MongoDB?
... is this truly a problem that relations solve better?
The key here is less about "is this a relation?" and more about "how am I going to access this?"
MongoDB is not "anti-reference". MongoDB does not have the benefits of joins, but it does have the benefit of embedded documents.
As long as you understand these trade-offs then it's perfectly fair to use references in MongoDB. It's really about how you plan to query these objects.
Are embedded documents the best approach here?
Maybe. Some things to consider.
Do games have value outside of the context of the user?
How many games will a single user have?
Are games transactional in nature?
How are you going to access games? Do you always need all of a user's games?
If you're planning to build leaderboards and a user can generate hundreds of game documents, then it's probably fair to put games in their own collection. Storing ten thousand instances of "game" inside each user document isn't particularly useful.
But depending on your answers to the above, you could really go either way. As a litmus test, I would try running some map/reduce jobs (i.e. build a simple leaderboard) to see how you feel about the structure of your data.
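For instance, a rough sketch of a per-mode leaderboard using the aggregation framework (a common alternative to map/reduce), with the field names from the schema above and an illustrative mode value:

// top 10 scores for one mode ("classic" is an illustrative value)
db.games.aggregate([
  { $match: { mode: "classic" } },
  { $sort: { "score.total_points": -1 } },
  { $limit: 10 },
  { $project: { user: 1, "score.total_points": 1 } }
])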
Why would you use a relation here? If 'email' is the only user property, then denormalization and an embedded document would be perfectly fine. If the user object contains other information, I would go for a reference.
I think you should use the "entity" and "value object" definitions from DDD: for an entity use a reference, but for a value object use an embedded document.
You can also denormalize your objects, i.e. duplicate your data. E.g.:
// root document
game {
  // duplicate the part of the root user that the game needs
  user: { FirstName: "Some name", Id: "some ID" }
}

// root document
user {
  Id: "ID",
  FirstName: "someName",
  LastName: "last name",
  ...
}