MongoDB schema diagram - mongodb

Would it be feasible to diagram a MongoDB schema as a class diagram (UML format), given that ER diagrams relate more to SQL?
When representing the id in a high-level schema, which of the following three has the correct type (int, ObjectId, or _id)?
id: int
OR
id: bson.ObjectId
OR
id: _id
When representing a subdocument object in a schema diagram, which of the following two has the correct type (String or Object)?
comments : String [
{
userName : String
date : String
actualComment : String
}
]
OR
comments : Object [
{
userName : String
date : String
actualComment : String
}
]
UPDATE
If I have the following subdocument (here is JSON representation), how does Mongo store the replies - what type would it be?
comments : String [
{
userName : String
date : String
actualComment : String
replies : comment [ ] // how does mongo store nested replies
}
]

A UML class diagram is for classes in object-oriented programming and an ER diagram is for relations in a relational database. MongoDB is neither an object database nor a relational database, so neither tool is really a good fit for MongoDB. But given only those two tools, I would rather use UML class diagrams, because ER emphasizes something which should best be avoided in MongoDB: relations between documents.
By default, the _id field is filled with a generated value of type BSON ObjectId, so your second example, bson.ObjectId, would be technically correct if you use the default. However, you don't have to use the default. You can also explicitly set your _id fields to a value of your own, of almost any type (arrays, for instance, are not allowed). So if you want to use integers for your _id values for some reason (remember that you then need to take care of keeping them unique), you can of course do so and should say so in your documentation. If you don't use custom values for your _id fields and don't otherwise make any use of them, you might consider simply omitting them from your diagrams, because they are implied.
In my opinion, embedded documents are best expressed in UML class diagrams by using composition (black-diamond arrow), while referenced documents are expressed using aggregation (white-diamond arrow). A sub-document is definitely not a String. Object is better, but even better would be to use the correct type.
Regarding your follow-up question: infinitely nested data structures (comments with an array of comments with an array of comments...) can be visualized in UML through a composition arrow pointing back at the box it comes from. But keep in mind that such data structures are a bad fit for MongoDB and usually best avoided. I would rather recommend putting each comment into its own document, which references the topic it belongs to and the parent comment (aggregation). But even that's not a particularly elegant solution. MongoDB isn't built for storing graphs.
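A minimal sketch of the "one document per comment" layout described above, in plain JavaScript with an in-memory array standing in for the collection. The field names topicId and parentId and the helper buildThread are illustrative assumptions, not anything MongoDB prescribes; the nested thread is rebuilt in application code.

```javascript
// Each comment is its own document. topicId and parentId are hypothetical
// field names chosen for this sketch.
const comments = [
  { _id: 1, topicId: "t1", parentId: null, userName: "ann", actualComment: "First" },
  { _id: 2, topicId: "t1", parentId: 1, userName: "bob", actualComment: "Reply to ann" },
  { _id: 3, topicId: "t1", parentId: 2, userName: "cat", actualComment: "Reply to bob" },
  { _id: 4, topicId: "t1", parentId: null, userName: "dan", actualComment: "Second" },
];

// Rebuild the nested thread from the flat documents.
function buildThread(docs) {
  // Copy every document and give it an empty replies array.
  const byId = new Map(docs.map((d) => [d._id, { ...d, replies: [] }]));
  const roots = [];
  for (const node of byId.values()) {
    const parent = node.parentId === null ? undefined : byId.get(node.parentId);
    if (parent) {
      parent.replies.push(node);
    } else {
      roots.push(node); // top-level comment, or one whose parent is missing
    }
  }
  return roots;
}

const thread = buildThread(comments);
console.log(JSON.stringify(thread, null, 2));
```

The stored documents stay flat, so nesting depth is unbounded without any single document growing; the cost is that the tree has to be assembled on the client.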

Feasible, but not suitable in all cases. FK relationships can be represented the same way. For arrays, embedded, etc. you'd have to establish a representation/interpretation.
ObjectId is the type; that's a BSON type. _id is the field name. I'm not sure where int came from; the BSON integer types are 32-bit integer and 64-bit integer.
None of them. It's simply a (sub)document.
UPDATE
Technically, it's an array, with no specific element type. In that case you were probably thinking of an array of ids of comment entities, but I think it could hold anything you want (including subdocuments).

Related

nosql inconsistent data structure

I'm new to nosql (MongoDB) so go easy on me.
I'm scraping json-ld from various web pages and want to store/recall the data. However the value types keep changing. For instance sometimes the "author" field uses an "organization" type, other times it's a "person" type sometimes it's simply a string, and sometimes it's just missing.
Should I convert the data to some type of standard?
Should each object be put into its own collection and referenced?
How do you deal with displays being different?
Looking for words of experience or links to good articles on how to deal with inconsistent data structure.
The whole point of a NoSQL database is that it is schema-less and the structure can vary from one document to another, so I see no issue here.
I think you are asking on how you should deal with it in your application business logic, so here is my suggestion:
You can save the author as an embedded sub-document which always has a field called "type" (as an enum of values: String, Person, Organization, etc.) and act accordingly when you fetch the data.
For example, if the author is simply a String then the document would look something like:
{
…,
"author": {
"type": "String",
"text": <text>
}
}
If it's a Person type then:
{
…,
"author": {
"type": "Person",
"first_name": <first name>,
"last_name": <last name>
}
}
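A rough sketch of how scraped values could be converted into that tagged sub-document before saving. The input shapes (a plain string, or an object with "@type", givenName, familyName, name) are assumptions about what the scraped JSON-LD might contain, and normalizeAuthor, the "Organization" layout and the "Unknown" fallback are hypothetical additions for illustration.

```javascript
// Normalise a scraped "author" value into a sub-document with a "type" tag.
// The keys read from the input are assumptions about the scraped data.
function normalizeAuthor(raw) {
  if (raw === undefined || raw === null) {
    return null; // author missing
  }
  if (typeof raw === "string") {
    return { type: "String", text: raw };
  }
  if (raw["@type"] === "Person") {
    return {
      type: "Person",
      first_name: raw.givenName ?? null,
      last_name: raw.familyName ?? null,
    };
  }
  if (raw["@type"] === "Organization") {
    return { type: "Organization", name: raw.name ?? null };
  }
  // Keep unrecognised shapes so they can be inspected later.
  return { type: "Unknown", raw };
}

console.log(normalizeAuthor("Jane Doe"));
console.log(normalizeAuthor({ "@type": "Person", givenName: "Jane", familyName: "Doe" }));
```

Display code can then switch on author.type instead of probing the shape of the value each time.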

mgo - bson.ObjectId vs string id

Using mgo, it seems that best practice is to set object ids to be bson.ObjectId.
This is not very convenient, as the result is that instead of a plain string id the id is stored as binary in the DB. Googling this seems to yield tons of questions like "how do I get a string out of the bson id?", and indeed in golang there is the Hex() method of the ObjectId to allow you to get the string.
The bson becomes even more annoying to work with when exporting data from mongo to another DB platform (this is the case when dealing with big data that is collected and you want to merge it with some properties from the back office mongo DB), this means a lot of pain (you need to transform the binary ObjectId to a string in order to join with the id in different platforms that do not use bson representation).
My question is: what are the benefits of using bson.ObjectId vs string id? Will I lose anything significant if I store my mongo entities with a plain string id?
As was already mentioned in the comments, storing the ObjectId as a hex string would double the space needed for it and in case you want to extract one of its values, you'd first need to construct an ObjectId from that string.
But you have a misconception. There is absolutely no need to use an ObjectId for the mandatory _id field. Quite often, I advise against that. Here is why.
Take the simple example of a book, with relations and some other considerations set aside for simplicity:
{
_id: ObjectId("56b0d36c23da2af0363abe37"),
isbn: "978-3453056657",
title: "Neuromancer",
author: "William Gibson",
language: "German"
}
Now, what use would the ObjectId have here? Actually, none. It would be an index with hardly any use, since you would never search your book database by an artificial key like that. It holds no semantic value. It would be a unique ID for an object which already has a globally unique ID: the ISBN.
So we simplify our book document like this:
{
_id: "978-3453056657",
title: "Neuromancer",
author: "William Gibson",
language: "German"
}
We have reduced the size of the document, made use of a preexisting globally unique ID, and no longer have a basically unused index.
Back to your basic question of whether you lose something by not using ObjectIds: quite often, not using the ObjectId is the better choice. But if you use it, use the binary form.

Referencing Other Documents by String rather than ObjectId

Let's say I have two collections:
Products and Categories.
The former collection's documents have 2 fields:
_id (BSON ObjectId)
Name (String)
The latter collection's documents have 3 fields:
_id (BSON ObjectId)
Name (String)
Products (Array of Strings)
Assume I have the following Product document:
{ "_id" : ObjectId("AAA"), "name" : "Shovel" }
Let's say I have the following Category document:
{ "_id" : ObjectId("BBB"), "Name" : "Gardening", "Products" : ["AAA"] }
For purposes of this example, assume that AAA and BBB are legitimate ObjectId's - example: ObjectId("523c7df5c30cc960b235ddee") where they would equal the inner ObjectId's string.
Should the Products field be stored as ObjectId(...)'s rather than as Strings?
I don't think it really matters that much.
I'm pretty sure that the ObjectId format encodes a hex number, so it is probably slightly more efficient with memory and bandwidth. I have done it both ways. As long as you decide, for each field, how you are going to encode it, either will work just fine.
As long as you consistently use the same type (so that comparisons happen correctly), the difference is:
An ObjectId cannot be compared to a String representation of the same ObjectId value. Thus, ObjectId("523c7df5c30cc960b235ddee") is not equal to "523c7df5c30cc960b235ddee".
ObjectIds, when stored natively, will be stored as 12 bytes, plus field name
An ObjectId, when stored as a string, will be commonly stored in 24 bytes (as it will be converted to a hexadecimal number), plus field name
Comparisons can be made SLIGHTLY more efficiently with the 12-byte value, as it's comparing fewer bytes. It won't matter in most types of usage though, so it's a micro-optimization (but something you should know).
Bonus -- if you don't use short abbreviated field names, the size benefit of using an ObjectId natively as 12 bytes really won't matter, as the field names will far outweigh the size of bytes when stored as a string.
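The size difference in the list above can be checked with a small sketch. It assumes Node.js (for Buffer) and uses no MongoDB driver; the Buffer only stands in for the 12 raw bytes an ObjectId holds, and the hex value is the example from the question.

```javascript
// Example ObjectId value from the question, as a hex string.
const hex = "523c7df5c30cc960b235ddee";

// The 12 raw bytes behind the hex representation.
const asBytes = Buffer.from(hex, "hex");
const binarySize = asBytes.length;                 // 12
const stringSize = Buffer.byteLength(hex, "utf8"); // 24

console.log(binarySize, stringSize);

// The two representations are different values: the raw bytes are not
// equal to the bytes of the hex text, so they never compare as equal.
const sameBytes = asBytes.equals(Buffer.from(hex, "utf8"));
console.log(sameBytes); // false

// Converting back to hex recovers the original string.
const roundTrip = asBytes.toString("hex");
console.log(roundTrip === hex); // true
```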
I'd recommend storing them as native ObjectIds. Some drivers can optionally and transparently translate an ObjectId to a String and back so that the client code can more easily manipulate it. The C# driver, for example, can do this, and I've used it so that when serializing to JSON, the ObjectId is in a simple format that is easily consumed in JavaScript.
This will matter most when you try to find the details of a product starting from the Categories collection.
Since there is no server-side JOIN in Mongo, your code will have to match documents together. ObjectIds are encoded as 12 bytes, which you can easily compare in any language. Using either strings or ObjectIds does not really matter.
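Matching documents together in application code might look roughly like the sketch below. It uses in-memory arrays in place of the two collections; the second product ("CCC", "Rake") and the helper productsForCategory are made up for illustration. Both sides of the lookup are converted with String() so that the comparison happens on a single type.

```javascript
// In-memory stand-ins for the Products and Categories collections.
// "CCC" / "Rake" is a hypothetical extra product for this sketch.
const products = [
  { _id: "AAA", name: "Shovel" },
  { _id: "CCC", name: "Rake" },
];
const categories = [
  { _id: "BBB", Name: "Gardening", Products: ["AAA", "CCC"] },
];

// Resolve a category's product ids to the product documents.
function productsForCategory(category, allProducts) {
  // Index products by the string form of their _id.
  const byId = new Map(allProducts.map((p) => [String(p._id), p]));
  return category.Products
    .map((id) => byId.get(String(id)))
    .filter(Boolean); // drop ids that match no product
}

const gardening = productsForCategory(categories[0], products);
console.log(gardening.map((p) => p.name));
```

Against a real server, the products would typically be fetched in one query using the $in operator on _id rather than loaded wholesale; the matching logic stays the same.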
The real issue you are facing is one of data normalization (or lack thereof). If you store each product's name in your Categories documents, instead of the ObjectId, you will be able to return the product names in a single call (instead of multiple calls, one for each product of the category).
It feels wrong the first time you do it. After all, you will have to update many documents if you ever change the name of a product, which might or might not be frequent. You have to model your data by thinking of the way your application will use it.
Finally, index the Name attribute in the Products collection. Getting the details of a product, starting with the string you found in a Categories document, will then be fast.
Another way to do it is not to have a Categories collection at all, but to add a Category attribute to your Products documents. You can then find the documents that match {'Category':'Gardening'}. Indexing the Category field will probably be a good idea.
Again, ObjectID or String does not matter much. It is about modeling your data thinking of how your application will use it.

Change size of Objectid

In MongoDb ObjectId is a 12-byte BSON type.
Is there any way to reduce the size of objectID?
No. It's a BSON data type. It's like asking a 32-bit integer to shrink itself.
Every object must have _id property, but you are not restricted to ObjectId.
Every document in a MongoDB collection needs to have a unique _id but the value does not have to be an ObjectId. Therefore, if you are looking to reduce the size of documents in your collection you have two choices:
1. Pick one of the unique properties of your documents and use it as the _id field. For example, if you have an accounts collection where the account ID--provided externally--is part of your data model, you could store the account ID in the _id field.
2. Manage primary keys for the collection yourself. Many drivers support custom primary key factories. As @assylias suggests, going with an int will give you good space savings, but you will still use more space than if you can use one of the fields in your model as the _id.
BTW, the value of an _id field can be composite: you can use an Object/hash/map/dictionary. See, for example, this SO question.
If you are using some type of object/model framework on top of Mongo, I'd be careful with (1). Some frameworks have a hard time with developers overriding id generation. For example, I've had bad experience with Mongoid in Ruby. In that case, (2) may be the safer way to go as the generation happens at the driver layer.

How do I describe a collection in Mongo?

So this is Day 3 of learning Mongo Db. I'm coming from the MySql universe...
A lot of times when I need to write a query for a MySql table I'm unfamiliar with, I would use the "desc" command - basically telling me what fields I should include in my query.
How would I do that for a Mongo db? I know, I know...I'm searching for a schema in a schema-less database. =) But how else would users know what fields to use in their queries?
Am I going at this the wrong way? Obviously I'm trying to use a MySql way of doing things in a Mongo db. What's the Mongo way?
Run the query below in an editor or the mongo shell:
var col_list= db.emp.findOne();
for (var col in col_list) { print (col) ; }
The output lists the field names found in that one document of the collection:
_id
name
salary
There is no good answer here. Because there is no schema, you can't 'describe' the collection. In many (most?) MongoDb applications, however, the schema is defined by the structure of the object hierarchy used in the writing application (java or c# or whatever), so you may be able to reflect over the object library to get that information. Otherwise there is a bit of trial and error.
This is my day 30 or something like that of playing around with MongoDB. Unfortunately, we have switched back to MySQL after working with MongoDB because of my company's current infrastructure issues. But having implemented the same model on both MongoDB and MySQL, I can clearly see the difference now.
Of course, there is a schema involved when dealing with schema-less databases like MongoDB, but the schema is dictated by the application, not the database. The database will shove in whatever it is given. As long as you know that admins are not secretly logging into Mongo and making changes, and all access to the database is controlled through some wrapper, the only place you should look at for the schema is your model classes. For instance, in our Rails application, these are two of the models we have in Mongo,
class Consumer
include MongoMapper::Document
key :name, String
key :phone_number, String
one :address
end
class Address
include MongoMapper::EmbeddedDocument
key :street, String
key :city, String
key :state, String
key :zip, String
key :country, String
end
Now after switching to MySQL, our classes look like this,
class Consumer < ActiveRecord::Base
has_one :address
end
class Address < ActiveRecord::Base
belongs_to :consumer
end
Don't get fooled by the brevity of the classes. In the latter version with MySQL, the fields are being pulled from the database directly. In the former example, the fields are right there in front of our eyes.
With MongoDB, if we had to change a particular model, we simply add, remove, or modify the fields in the class itself and it works right off the bat. We don't have to worry about keeping the database tables/columns in-sync with the class structure. So if you're looking for the schema in MongoDB, look towards your application for answers and not the database.
Essentially I am saying exactly the same thing as @Chris Shain :)
While factually correct, you're all making this too complex. I think the OP just wants to know what his/her data looks like. If that's the case, you can just
db.collectionName.findOne()
This will show one document (aka. record) in the database in a pretty format.
I had this need too, Cavachon. So I created an open source tool called Variety which does exactly this: link
Hopefully you'll find it to be useful. Let me know if you have questions, or any issues using it.
Good luck!
AFAIK, there isn't a way and it is logical for it to be so.
MongoDB being schema-less allows a single collection to have documents with different fields. So there can't really be a description of a collection like the description of a table in a relational database.
Though this is the case, most applications do maintain a schema for their collections and as said by Chris this is enforced by your application.
As such you wouldn't have to worry about first fetching the available keys to make a query. You can just ask MongoDB for any set of keys (i.e the projection part of the query) or query on any set of keys. In both cases if the keys specified exist on a document they are used, otherwise they aren't. You will not get any error.
For instance (On the mongo shell) :
If this is a sample document in your people collection and all documents follow the same schema:
{
name : "My Name",
place : "My Place",
city : "My City"
}
The following are perfectly valid queries :
These two will return the above document :
db.people.find({name : "My Name"})
db.people.find({name : "My Name"}, {name : 1, place :1})
This will not return anything, but will not raise an error either :
db.people.find({first_name : "My Name"})
This will match the above document, but you will have only the default "_id" property on the returned document.
db.people.find({name : "My Name"}, {first_name : 1, location :1})
print('\n--->', Object.getOwnPropertyNames(db.users.findOne())
.toString()
.replace(/,/g, '\n---> ') + '\n');
---> _id
---> firstName
---> lastName
---> email
---> password
---> terms
---> confirmed
---> userAgent
---> createdAt
This is an incomplete solution because it doesn't give you the exact types, but useful for a quick view.
const doc = db.collectionName.findOne();
for (x in doc) {
print(`${x}: ${typeof doc[x]}`)
};
If you're OK with running a Map / Reduce, you can gather all of the possible document fields.
Start with this post.
The only problem here is that you're running a Map / Reduce, which can be resource intensive. Instead, as others have suggested, you'll want to look at the code that writes the actual data.
Just because the database doesn't have a schema doesn't mean that there is no schema. Generally speaking the schema information will be in the code.
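As a lighter-weight illustration of the "gather all possible fields" idea, the sketch below does the same kind of aggregation client-side over a sample of documents instead of a server-side Map / Reduce. It is plain JavaScript over an in-memory array; summarizeFields, describeType and the sample data are hypothetical names and values for this sketch, and only top-level fields are inspected.

```javascript
// Give a slightly more useful type name than typeof alone,
// which reports "object" for null, arrays and dates.
function describeType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (value instanceof Date) return "date";
  return typeof value;
}

// Count, per top-level field, how often each type occurs across the documents.
function summarizeFields(docs) {
  const summary = {};
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      const type = describeType(value);
      summary[key] = summary[key] || {};
      summary[key][type] = (summary[key][type] || 0) + 1;
    }
  }
  return summary;
}

// Made-up sample standing in for documents fetched from a collection.
const sample = [
  { _id: 1, name: "A", salary: 100 },
  { _id: 2, name: "B", tags: ["x"] },
  { _id: 3, name: null },
];

const summary = summarizeFields(sample);
console.log(summary);
```

Fields that appear in only some documents, or with more than one type, show up directly in the counts, which a single findOne() cannot reveal.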
I wrote a small mongo shell script that may help you.
https://gist.github.com/hkasera/9386709
Let me know if it helps.
You can use the UI tool MongoDB Compass. It shows all the fields in a collection and also shows the variation of the data in them.
If you are using NodeJS and want to get all the field names via an API request, this code works for me:
let arrayResult = [];
// db is the model being queried; callback comes from the enclosing function
db.findOne().exec(function (err, docs) {
if (err) {
return callback(err);
}
// Round-trip through JSON to get a plain object
const JSONobj = JSON.parse(JSON.stringify(docs));
for (let key in JSONobj) {
arrayResult.push(key);
}
return callback(null, arrayResult);
});
arrayResult will then contain the field (column) names of the returned document.
Output-
[
"_id",
"emp_id",
"emp_type",
"emp_status",
"emp_payment"
]
Hope this works for you!
Suppose you have a collection called people and you want to find its fields and their data types. You can use the query below:
function printSchema(obj) {
for (var key in obj) {
print( key, typeof obj[key]) ;
}
};
var obj = db.people.findOne();
printSchema(obj)
The result of this query will be like below,
You can use Object.keys, as in JavaScript:
Object.keys(db.movies.findOne())