How to see the bucket structure that MongoDB stores for time series data?

I have created a timeseries collection for weather data.
db.createCollection(
    "weather",
    {
        timeseries: {
            timeField: "timestamp",
            metaField: "metadata",
            granularity: "hours"
        }
    }
)
When I retrieve the data I get it back just as I stored it, but MongoDB stores this data differently internally: it creates buckets based on the metaField and granularity.
How can I see those underlying buckets?

MongoDB treats time series collections as writable non-materialized views backed by an internal collection.
MongoDB stores system information in collections that use the <database>.system.* namespace, which MongoDB reserves for internal use.
system.buckets stores the underlying data associated with a time series collection in an optimized format and schema, for an efficient representation of the persisted time series data.
In your case, it will be the system.buckets.weather collection.
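As a minimal sketch in mongosh (how visible this namespace is can vary by server version), you can read one raw bucket directly:
// Read one raw bucket document from the backing collection. Each bucket
// carries the 'meta' value it groups on, 'control.min'/'control.max'
// summaries, and the columnar 'data' for its measurements.
db.getCollection("system.buckets.weather").findOne()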

Related

How does MongoDB store data in key-value pairs in Wired Tiger?

I know (or think I know) that WiredTiger stores all non-index data as table-like structures. MongoDB somehow stores NoSQL BSON documents in this structure, and supports searches and indexes on specific columns. How does it do this? In other words, what is the schema by which MongoDB stores data in WiredTiger?

Combining multiple collections, but each individual query might not need all columns

Here is the scenario:
We have 2 tables (issues, anomalies) in BigQuery, which we plan to combine into a single document in MongoDB, since both collections contain data about a particular site.
[
    {
        "site": "abc",
        "issues": {
            -- issues data --
        },
        "anomalies": {
            -- anomalies data --
        }
    }
]
There are some queries which require the 'issues' data, while others require 'anomalies' data.
In the future, we might need to show 'issues' and 'anomalies' data together, which is why I'm planning to combine the two in a single document.
Questions on the approach above, with respect to performance and the volume of data read:
When we read the combined document, is there a way to read only specific columns (so the volume of data read is not huge)?
Or does this mean that when we read the document, the entire document is loaded into memory?
Please let me know. Thanks in advance!
UPDATE:
Going over the MongoDB docs, we can use projections to pull only the required fields from MongoDB documents.
In this case, the data transferred over the network is limited to the specific fields that are read.
However, the MongoDB server still has to select those specific fields from the documents.
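As a minimal sketch of such a projection, assuming the combined documents live in a collection named sites (the name is hypothetical):
// Return only the 'issues' part for one site; '_id: 0' suppresses the id.
db.sites.find(
    { site: "abc" },
    { site: 1, issues: 1, _id: 0 }
)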

How does MongoDB store and manage metadata?

How does MongoDB store and manage metadata?
For example, I want to know how many fields are in a collection, or how many collections are in a database.
I can use show collections to list the collections, but I don't know how this is stored in MongoDB. As a document?
MongoDB stores its collections in files under the configured data folder.
The format of data in collections is BSON (Binary JSON).
MongoDB is document-oriented, so each document may have entirely different fields. Each document is well-formed BSON, so the document's metadata is stored inside the document itself.
For example:
{
    "_id": { "$oid": "5fbcec5b0f751ebbc4d46516" },
    "health": "green",
    "status": "open",
    "pri": 1,
    "rep": 2
}
You have the fields' metadata stored in the document itself.
The type of a field is derived as in JSON: quoted values are strings, unquoted numeric values are numbers.
Internally, Date objects are stored as a signed 64-bit integer representing the number of milliseconds since the Unix epoch (Jan 1, 1970).
You can create a Date using the new Date(string) constructor.
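As a minimal sketch in mongosh (the collection name mycol is a placeholder), the catalog helpers expose this metadata, and the $type query operator checks a field's BSON type:
// Count the collections in the current database:
db.getCollectionNames().length

// Collection-level metadata (options, UUID); indexes live in getIndexes():
db.getCollectionInfos({ name: "mycol" })

// Check a field's BSON type per document with the $type query operator:
db.mycol.find({ pri: { $type: "number" } })

// Dates are stored as 64-bit millisecond counts; create one like this:
db.mycol.insertOne({ createdAt: new Date("2020-11-24T10:00:00Z") })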

Which mongo document schema/structure is correct?

I have two document formats and I can't decide which is the Mongo way of doing things. Are the two examples equivalent? The idea is to search by userId and have userId be indexed. It seems to me the performance will be equal for either schema.
multiple bookmarks as separate documents in a collection:
{
    userId: 123,
    bookmarkName: "google",
    bookmarkUrl: "www.google.com"
},
{
    userId: 123,
    bookmarkName: "yahoo",
    bookmarkUrl: "www.yahoo.com"
},
{
    userId: 456,
    bookmarkName: "google",
    bookmarkUrl: "www.google.com"
}
multiple bookmarks within one document per user:
{
    userId: 123,
    bookmarks: [
        {
            bookmarkName: "google",
            bookmarkUrl: "www.google.com"
        },
        {
            bookmarkName: "yahoo",
            bookmarkUrl: "www.yahoo.com"
        }
    ]
},
{
    userId: 456,
    bookmarks: [
        {
            bookmarkName: "google",
            bookmarkUrl: "www.google.com"
        }
    ]
}
The problem with the second option is that it causes growing documents. Growing documents are bad for write performance under the legacy MMAPv1 storage engine, because the database constantly has to move them around in the data files.
To improve write performance, MMAPv1 writes each document as a consecutive sequence in the database files, with little padding between documents. When a change makes a document grow beyond its current padding, the document has to be deleted and rewritten at the end of the current file, which is a quite slow operation. (WiredTiger, the default storage engine since MongoDB 3.2, does not update documents in place, so this concern applies mainly to older deployments.)
Also, MongoDB has a hard limit of 16 MB per document (mostly to discourage growing documents). In your illustrated use case this might not be a problem, but I assume this is a simplified example and your actual data will have a lot more fields per bookmark entry. When you store a lot of metadata with each entry, that 16 MB limit could become a problem.
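If you want to check how close your documents are to that limit, a minimal sketch using the $bsonSize aggregation operator (available since MongoDB 4.4; the collection name bookmarks is assumed):
// Report each stored document's size in bytes:
db.bookmarks.aggregate([
    { $project: { userId: 1, sizeBytes: { $bsonSize: "$$ROOT" } } }
])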
So I would recommend picking the first option.
I would go with option 2, multiple bookmarks within one document per user, because this schema takes advantage of MongoDB's rich documents, also known as "denormalized" models.
Embedded data models allow applications to store related pieces of information in the same database record. As a result, applications may need to issue fewer queries and updates to complete common operations.
There are two tools that allow applications to represent these relationships: references and embedded documents.
When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.
The second type of structure is the embedded type.
Generally, the embedded structure should be chosen when our application needs:
a) better performance for read operations;
b) the ability to request and retrieve related data in a single database operation;
c) data consistency, i.e. the ability to update related data in a single atomic write operation (see the sketch after this list);
In MongoDB, operations are atomic at the document level. No single write operation can change more than one document. Operations that modify more than a single document in a collection still operate on one document at a time. Ensure that your application stores all fields with atomic dependency requirements in the same document. If the application can tolerate non-atomic updates for two pieces of data, you can store these data in separate documents. A data model that embeds related data in a single document facilitates these kinds of atomic operations.
d) to issue fewer queries and updates to complete common operations.
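For point (c), a minimal sketch, assuming the embedded documents live in a collection named bookmarks: appending a bookmark to the embedded array is one atomic write.
// Atomically add one bookmark to a single user's document:
db.bookmarks.updateOne(
    { userId: 123 },
    { $push: { bookmarks: { bookmarkName: "bing", bookmarkUrl: "www.bing.com" } } }
)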
When not to choose it: embedding related data in documents may lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation (and documents are capped at the 16 MB limit).
Now let's compare the structures from a developer's perspective:
Say I want to see all the bookmarks of a particular user:
The first type would require an aggregation over all the matching documents.
The minimum set of stages required to get the aggregated result is $match and $group (with the $push operator):
db.collection.aggregate([
    { $match: { userId: 123 } },
    { $group: {
        _id: "$userId",
        bookmarkNames: { $push: "$bookmarkName" },
        bookmarkUrls: { $push: "$bookmarkUrl" }
    } }
])
or a find() which returns multiple documents to be iterated.
Whereas the embedded type lets us fetch everything with a simple match in the find query:
db.collection.find({ userId: 123 });
This just indicates the added overhead from the developer's point of view. We can view the first type as an unwound form of the embedded document.
The first type, multiple bookmarks as separate documents in a collection, is normally used for things like logging, where the entries are numerous and expire after a TTL (time to live). The collections in that case would be capped collections (bounded in size) or would use a TTL index, so that documents are automatically deleted after a particular period of time.
Bottom line: if your documents will not grow beyond 16 MB at any particular time, opt for the embedded type; it will save development effort as well.
See Also: MongoDB relationships: embed or reference?

Store records without keys in MongoDB, or document mapping with keys

MongoDB always stores records in a form like this:
{
    '_id': '1',
    'name': 'nidhi'
}
But I want to store them like
{ 123, 'nidhi' }
I do not want to store the keys again and again in the database.
Is this possible with MongoDB, or with any other database?
Is there anything SQL-like possible in NoSQL, where I define the structure first (as in MySQL) and then start inserting values into documents?
That is not possible with MongoDB. Documents are defined by key/value pairs. This is because BSON (Binary JSON), the internal storage format of MongoDB, was developed from JSON (JavaScript Object Notation). And without keys, you couldn't query the database at all, except by rather awkward positional parameters.
However, if disk space is that precious, you could revise your model to something like:
{
    _id: 1,
    values: ['nidhi', 'foo', 'bar', 'baz']
}
However, since disk space is relatively cheap compared to computing power (not a resource MongoDB uses a lot, though) and RAM (rule of thumb for MongoDB: the more, the better), your approach doesn't make sense. For a REST API to return a record to the client, all you have to do is (pseudo code):
var docToReturn = collection.findOne({ _id: requestedId });
if (docToReturn) {
    response.send(200, docToReturn);
} else {
    response.send(404, { 'state': 404, 'error': 'resource ' + requestedId + ' not available' });
}
Even if the data were possible to query with your approach, you would have to map the returned values back to meaningful keys. And how would you deal with the fact that each value's meaning would depend entirely on its position in the array? Or with the fact that MongoDB has dynamic schemas, so one doc in the collection may have a totally different structure than another?
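To make the positional-parameter awkwardness concrete, a minimal sketch: with a bare values array, queries have to address elements by index.
// Match documents whose first array element is 'nidhi' (index-based dot notation):
db.collection.find({ "values.0": "nidhi" })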