MongoDB best practice: nesting

Is this example of nesting generally accepted as good or bad practice (and why)?
A collection called users:
user
basic
name : value
url : value
contact
email
primary : value
secondary : value
address
en-gb
address : value
city : value
state : value
postalcode : value
country : value
es
address : value
city : value
state : value
postalcode : value
country : value
Edit: From the answers in this post I've updated the schema applying the following rules (the data is slightly different from above):
Nest, but only one level deep
Remove unnecessary keys
Make use of arrays to make objects more flexible
{
"_id": ObjectId("4d67965255541fa164000001"),
"name": {
"0": {
"name": "Joe Bloggs",
"il8n": "en"
}
},
"type": "musician",
"url": {
"0": {
"name": "joebloggs",
"il8n": "en"
}
},
"tags": {
"0": {
"name": "guitar",
"points": 3,
"il8n": "en"
}
},
"email": {
"0": {
"address": "joe.bloggs#example.com",
"name": "default",
"primary": 1,
"il8n": "en"
}
},
"updates": {
"0": {
"type": "news",
"il8n": "en"
}
},
"address": {
"0": {
"address": "1 Some street",
"city": "Somecity",
"state": "Somestate",
"postalcode": "SOM STR",
"country": "UK",
"lat": 49.4257641,
"lng": -0.0698241,
"primary": 1,
"il8n": "en"
}
},
"phone": {
"0": {
"number": "+44 (0)123 4567 890",
"name": "Home",
"primary": 1,
"il8n": "en"
},
"1": {
"number": "+44 (0)098 7654 321",
"name": "Mobile",
"il8n": "en"
}
}
}
Thanks!

In my opinion the schema above is not 'generally accepted', but it looks great. I'd suggest some improvements that will help you query your documents in future:
User
Name
Url
Emails {email, emailType(primary, secondary)}
Addresses{address, city, state, postalcode, country, language}
Nesting is good in general, but nesting two or three levels deep can cause extra trouble when querying and updating.
Hope my suggestions help you make the right choice of schema design.
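As a minimal sketch, the flatter layout suggested above could look like the plain JavaScript object below. The field values, the second email address, and the `primaryEmail` helper are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical user document following the flatter layout suggested above:
// one level of nesting, with arrays for repeatable items.
const user = {
  name: "Joe Bloggs",
  url: "joebloggs",
  emails: [
    { email: "joe.bloggs@example.com", emailType: "primary" },
    { email: "joe@example.org", emailType: "secondary" },
  ],
  addresses: [
    {
      address: "1 Some street",
      city: "Somecity",
      state: "Somestate",
      postalcode: "SOM STR",
      country: "UK",
      language: "en-gb",
    },
  ],
};

// Illustrative helper: return the primary email, or null if none is marked.
function primaryEmail(doc) {
  const match = doc.emails.find((e) => e.emailType === "primary");
  return match ? match.email : null;
}

console.log(primaryEmail(user));
```

With this shape, a filter on a single dotted key such as "emails.emailType" is enough to query the array entries, which is the kind of simplification the answer seems to be after.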

You may want to take a look at schema design in MongoDB, and specifically the advice on embedding vs. references.
Embedding is preferred as "Data is then colocated on disk; client-server turnarounds to the database are eliminated". If the parent object is in RAM, then access to the nested objects will always be fast.

In my experience, I've never found any "best practices" for what a MongoDB record actually looks like. The question to really answer is, "Does this MongoDB schema allow me to do what I need to do?"
For example, if you had a list of addresses and needed to update one of them, it would be a pain: you'd need to iterate through all of them or know the position of that particular address in the list. You're safe from that here, since each address has its own key.
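To illustrate why keyed addresses are easy to update, here is a small sketch that builds a `$set` update document for one field of one address. It assumes the layout from the question (`contact.address.<language>.<field>`); the `addressUpdate` helper and the value "Madrid" are hypothetical.

```javascript
// Build a $set update for one field of one language-keyed address.
// Assumes the question's layout: contact.address.<language>.<field>.
function addressUpdate(language, field, value) {
  const path = ["contact", "address", language, field].join(".");
  return { $set: { [path]: value } };
}

const update = addressUpdate("es", "city", "Madrid");
console.log(JSON.stringify(update));
// prints {"$set":{"contact.address.es.city":"Madrid"}}
```

Because the key names the address directly, no array position is needed; the resulting object could be passed as the update argument of an update call on the users collection.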
However, I'd say nix the basic and contact keys. What do these really give you? If you index name, it'd be basic.name rather than just name. AFAIK, there are some performance impacts to long vs. short key names.
Keep it simple enough to do what you need to do. Try something out and iterate on it...you won't get it right the first time, but the nice thing about mongo is that it's relatively easy to rework your schema as you go.

That is acceptable practice. There are some problems with nesting an array inside of an array. See SERVER-831 for one example. However, you don't seem to be using arrays in your collection at all.
Conversely, if you were to break this up into multiple collections, you would have to deal with a lack of transactions and the resulting race conditions in your data access code.

Related

MongoDB database design for users and their data

New to MongoDB and databases in general. I'm trying to make a basic property app with Express and MongoDB for practice.
I'm looking for some help on the best way to scheme this out.
Basically, my app would have landlords and tenants. Each landlord would have a bunch of properties that information is stored about. Things like lease terms, tenant name, maintenance requests, images, etc.
The tenants would be able to sign up and be associated with the property they live in. They could submit maintenance forms, etc.
Is this a good approach? Should everything be kept in the same collection? Thanks.
{
"_id": "507f1f77bcf86cd799439011",
"user": "Corey",
"password": "hashed#PASSWORD",
"email": "corey#email.com",
"role": "landlord",
"properties": [
{
"addressId": "1",
"address": "101 Main Street",
"tenant": "John Smith",
"leaseDate": "04/21/2016",
"notes": "These are my notes about this property.",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"]
},
{
"addressId": "2",
"address": "105 Maple Street",
"tenant": "John Jones",
"leaseDate": "01/01/2018",
"notes": "These are my notes about 105 Maple Ave property.",
"images": ["http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"],
"forms": [
{
"formType": "lease",
"leaseTerm": "12 months",
"leaseName": "John Jones",
"leaseDate": "01/01/2018"
},
{
"formType": "maintenance",
"maintenanceNotes": "Need furnace looked at. Doesn't heat properly.",
"maintenanceName": "John Jones",
"maintenanceDate": "01/04/2018",
"status": "resolved"
}
]
},
{
"addressId": "3",
"address": "110 Chestnut Street",
"tenant": "John Brown",
"leaseDate": "07/28/2014",
"notes": "These are some notes about 110 Chestnut Ave property.",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"]
}
]
}
{
"_id": "507f1f77bcf86cd799439012",
"user": "John",
"password": "hashed#PASSWORD",
"email": "john#email.com",
"role": "tenant",
"address": "2",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2" ]
}
For this relation I'd suggest three collections (Landlords, Properties, and Tenants), with each tenant having a "landLordId" and a "propertyId".
The "landLordId" would simply be the ObjectId of the landlord, and likewise the "propertyId" would be the ObjectId of the property.
This will make your life easier if you plan to do any kind of roll-up reports, or if you have more than one-to-one mappings between landlords and properties or landlords and tenants (for example, more than one property manager for a given property).
It also makes everything more intuitive: you can simply add things like maintenance requests and lease terms as arrays on the tenants, with references to whatever else is needed.
This offers the most flexibility for aggregating data for any kind of report or query.
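A rough sketch of the three-collection idea, using in-memory arrays as stand-ins for the collections. The IDs, the `landLordId` field on properties, and the `propertiesWithTenants` helper are assumptions for illustration only; in a real app the IDs would be ObjectIds and the lookups would be database queries.

```javascript
// In-memory stand-ins for three hypothetical collections.
const landlords = [{ _id: "L1", user: "Corey" }];
const properties = [
  { _id: "P1", landLordId: "L1", address: "101 Main Street" },
  { _id: "P2", landLordId: "L1", address: "105 Maple Street" },
];
const tenants = [
  { _id: "T1", user: "John Smith", landLordId: "L1", propertyId: "P1" },
  { _id: "T2", user: "John Jones", landLordId: "L1", propertyId: "P2" },
];

// Roll-up: list each property of a landlord with the names of its tenants.
function propertiesWithTenants(landLordId) {
  return properties
    .filter((p) => p.landLordId === landLordId)
    .map((p) => ({
      address: p.address,
      tenants: tenants
        .filter((t) => t.propertyId === p._id)
        .map((t) => t.user),
    }));
}

console.log(propertiesWithTenants(landlords[0]._id));
```

Because tenants point at properties by ID, a property with several tenants (or none) needs no change to the landlord document.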

Xcode: model adding entities

I am experimenting with an API that returns some fields with a leading underscore, like _id. I am not able to map such a field in the xcdatamodel, because an attribute name must begin with a letter.
I've also tried mapping this field as "id" and providing a key/value pair like id : _id in the "User Info" section, but without success.
Do you have a solution for this problem? As far as I know, many APIs have fields with a leading underscore.
Other non underscore fields are mapped without problems.
{
"__v": 0,
"_avRateDelay": 5,
"_avRateRecommend": 5,
"_avRateStaff": 5,
"_id": "530f733df222bf594b190e0a10",
"_reviews": 1,
"active": 1,
"address": {
"city": "Little Rock",
"country": "USA",
"other": "",
"state": "AZ",
"street": "2701 E Roosevelt Rd",
"zip": "72206"
},
"location": {
"lat": 34.721175,
"lng": -92.24168600000002
},
"name": "Certainteed 69"
}
Don't use id or _id in Objective-C. id is a reserved word. Since many servers like to use that I recommend that you write mapping code so that it is mapped from the server id to something like identifier.
Since you need to write code to parse the fields anyway there is no hardship to look for that key and change it. You can even store the mapping in the NSEntityDescription and set up code to look for other mappings and change them. That way you can change other server styled values like created_at to their Objective-C counterparts like createdAt.
The key/values are editable directly in the model editor and then accessible via the -entity property on the NSManagedObject.
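The renaming step described above can be sketched independently of Core Data. The snippet below is written in JavaScript purely to show the mapping logic (in the app itself this would be Objective-C); the `overrides` table, the `mapServerKeys` function, and the sample payload are hypothetical.

```javascript
// Explicit renames for keys that need a specific local name (assumed mapping).
const overrides = new Map([
  ["_id", "identifier"],
  ["id", "identifier"],
]);

// Strip leading underscores, then convert snake_case to camelCase,
// e.g. created_at -> createdAt, _reviews -> reviews.
function toLocalName(key) {
  return key
    .replace(/^_+/, "")
    .replace(/_([a-z])/g, (_, ch) => ch.toUpperCase());
}

// Map the top-level keys of a server payload (nested objects are left as-is).
function mapServerKeys(payload) {
  const mapped = {};
  for (const [key, value] of Object.entries(payload)) {
    const name = overrides.has(key) ? overrides.get(key) : toLocalName(key);
    mapped[name] = value;
  }
  return mapped;
}

console.log(mapServerKeys({ _id: "abc", created_at: "2014-02-27", _reviews: 1 }));
```

Keeping the explicit overrides in a table mirrors the idea of storing the mapping as key/value pairs on the entity rather than hard-coding it in the parser.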

Mongo - What is best design: store menu documents or menu item documents?

I want to store website menus in Mongo for the navigation of my CMS, but since I'm new to Mongo and the concept of documents, I'm trying to figure out what would be best:
a) Should I store menu documents, containing children and those having more children, or
b) Should I store menu item documents with parent_id and child_ids ?
Both would appear to have benefits, since in case A it's normal to load an entire menu at once as you'll need everything to display, but B might be easier to update single items?
I'm using Spring data mongo.
PS: If I asked this question in a wrong way, please let me know. I'm sure this question can be expanded to any general parent-child relationship, but I was having trouble finding the right words.
Since menus are typically going to be very small (under 16MB I hope) then the embedded form should give you the best performance:
{
  "topItem1": [
    { "name": "item1", "link": "linkValue" },
    { "name": "item2", "link": "linkValue" }
  ],
  "topItem2": [
    { "name": "item1", "link": "linkValue" },
    { "name": "item2", "link": "linkValue" },
    {
      "name": "sub-menu",
      "type": "sub",
      "items": [
        { "name": "item1", "link": "linkValue" },
        { "name": "item2", "link": "linkValue" }
      ]
    }
  ]
}
The only possible issue is updating content inside nested arrays, as MongoDB can only "match" the first found array index. See the positional $ operator documentation for this.
But as long as you know the positions then this should not be a problem, using "dot notation" concepts:
db.menu.update({}, {
"$set": {
"topItem2.2.items.1": { "name": "item3", "link": "linkValue" }
}
})
But general adding should be simple:
db.menu.update(
{ "topItem2.name": "sub-menu" },
{
"$push": {
"topItem2.2.items": { "name": "item4", "link": "linkValue" }
}
}
)
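If the position of the sub-menu is not known in advance, it can be looked up in a copy of the document that was read beforehand. The sketch below derives the dot-notation path on the client; the `submenuItemsPath` helper is hypothetical, and it assumes the document has not changed between the read and the update.

```javascript
// A menu document shaped like the embedded example above.
const menu = {
  topItem1: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" },
  ],
  topItem2: [
    { name: "item1", link: "linkValue" },
    { name: "item2", link: "linkValue" },
    {
      name: "sub-menu",
      type: "sub",
      items: [
        { name: "item1", link: "linkValue" },
        { name: "item2", link: "linkValue" },
      ],
    },
  ],
};

// Return the dot-notation path of a named sub-menu's "items" array,
// or null when no such sub-menu exists under the given top-level key.
function submenuItemsPath(doc, topKey, submenuName) {
  const index = (doc[topKey] || []).findIndex(
    (entry) => entry.type === "sub" && entry.name === submenuName
  );
  return index === -1 ? null : `${topKey}.${index}.items`;
}

const path = submenuItemsPath(menu, "topItem2", "sub-menu");
const pushUpdate = { $push: { [path]: { name: "item4", link: "linkValue" } } };
console.log(path, JSON.stringify(pushUpdate));
```

Here `path` comes out as "topItem2.2.items", matching the hard-coded path used in the update statements above.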
So that is a perspective on how to use the inherent embedded structure rather than associating "parent" and "child" items.
After long hard thinking I believe I would use:
{
_id: {},
submenu1: [
{label: "Whatever", url: "http://localhost/whatever"}
]
}
I thought about using related documents with IDs, all sitting in one collection, but then you would have to fire off multiple queries to get the parent and its children, and possibly sub-children too. With this structure you need only one query for everything.
This structure is not infallible, however: if you change your menu items regularly you might start to notice fragmentation. You can remedy this a little with usePowerOf2Sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
But yes, with careful planning you should be able to use a single document for every parent menu item.
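For comparison, option (b) from the question (one document per menu item with a parent_id) leaves the tree assembly to the client. The following is only a sketch of that assembly step; the sample items, the `label` field, and the `buildTree` function are assumptions, not code from either answer.

```javascript
// Hypothetical menu-item documents for option (b): one document per item.
const items = [
  { _id: 1, parent_id: null, label: "Products" },
  { _id: 2, parent_id: 1, label: "Software" },
  { _id: 3, parent_id: 1, label: "Hardware" },
  { _id: 4, parent_id: 2, label: "Downloads" },
];

// Index the items by _id, then attach each node to its parent.
// Items whose parent_id does not resolve are treated as top-level entries.
function buildTree(flatItems) {
  const byId = new Map(
    flatItems.map((item) => [item._id, { label: item.label, children: [] }])
  );
  const roots = [];
  for (const item of flatItems) {
    const node = byId.get(item._id);
    const parent = byId.get(item.parent_id);
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

console.log(JSON.stringify(buildTree(items), null, 2));
```

With the embedded form this step disappears entirely, because the stored document already is the tree.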

Embed or reference in Mongodb

I am developing a small app which will store information on users, accounts and transactions. The users will have many accounts (probably less than 10) and the accounts will have many transactions (perhaps 1000's). Reading the Docs it seems to suggest that embedding as follows is the way to go...
{
"username": "joe",
"accounts": [
{
"name": "account1",
"transactions": [
{
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
}
]
},
{
"name": "account2",
"transactions": [
{
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
}
]
}
]
}
My question is: since the list of transactions will grow to perhaps thousands within the document, will the data become fragmented and slow performance? Would I be better off having one document to store the users and accounts, which will not grow as big, and a separate collection for transactions that reference the accounts? Or is there a better way?
This is not the way to go. You have a lot of transactions, and you don't know how many you will get. Instead of this, you should store them like:
{
"username": "joe",
"name": "account1",
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"username": "joe",
"name": "account1",
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"username": "joe",
"name": "account1",
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
},
{
"username": "joe",
"name": "account2",
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"username": "joe",
"name": "account2",
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"username": "joe",
"name": "account2",
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
}
In a NoSQL database like MongoDB you shouldn't be afraid to denormalise. As you noticed, I haven't even bothered with a separate collection for users. If your users have more information that you will have to show with each transaction, you might want to consider including that information as well.
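As an illustration of the denormalised layout, the sketch below turns a nested user document like the one in the question into flat transaction documents. The `flattenTransactions` helper and the shortened sample data are assumptions for the example.

```javascript
// A shortened version of the nested document from the question.
const userDoc = {
  username: "joe",
  accounts: [
    {
      name: "account1",
      transactions: [
        { date: "2013-08-06", desc: "transaction1", amount: "123.45" },
        { date: "2013-08-07", desc: "transaction2", amount: "123.45" },
      ],
    },
    {
      name: "account2",
      transactions: [
        { date: "2013-08-06", desc: "transaction1", amount: "123.45" },
      ],
    },
  ],
};

// Produce one flat document per transaction, copying the username and
// account name into each of them.
function flattenTransactions(doc) {
  return doc.accounts.flatMap((account) =>
    account.transactions.map((tx) => ({
      username: doc.username,
      name: account.name,
      ...tx,
    }))
  );
}

console.log(flattenTransactions(userDoc));
```

Each resulting object can be stored as its own document, so adding a transaction never rewrites an existing one.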
If you need to search on, or select by, any of those fields, then don't forget to create indexes, for example:
// look up all transactions for an account
// (createIndex() replaces ensureIndex(), which is deprecated since MongoDB 3.0)
db.transactions.createIndex( { username: 1, name: 1 } );
and:
// look up all transactions for "2013-08-06"
db.transactions.createIndex( { date: 1 } );
etc.
There are a lot of advantages to duplicating data. With a schema like the one above, you can have as many transactions as you like and you will never get any fragmentation, because documents never change: you only add new ones. This also increases write performance and makes it a lot easier to do other queries.
Alternative
An alternative might be to store username/name in a collection and only use its ID with the transactions:
Accounts:
{
"username": "joe",
"name": "account1",
"account_id": 42
}
Transactions:
{
"account_id": 42,
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
}
This creates smaller transaction documents, but it does mean you have to do two queries to also get user information.
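The two-step lookup can be sketched with in-memory arrays standing in for the two collections. The second account, the extra transactions, and the `transactionsFor` helper are hypothetical; in practice each step would be a separate query against the database.

```javascript
// In-memory stand-ins for the Accounts and Transactions collections.
const accounts = [
  { username: "joe", name: "account1", account_id: 42 },
  { username: "joe", name: "account2", account_id: 43 },
];
const transactions = [
  { account_id: 42, date: "2013-08-06", desc: "transaction1", amount: "123.45" },
  { account_id: 42, date: "2013-08-07", desc: "transaction2", amount: "123.45" },
  { account_id: 43, date: "2013-08-06", desc: "transaction1", amount: "123.45" },
];

// Step 1: resolve the account. Step 2: fetch its transactions by ID.
function transactionsFor(username, name) {
  const account = accounts.find(
    (a) => a.username === username && a.name === name
  );
  if (!account) return [];
  return transactions.filter((t) => t.account_id === account.account_id);
}

console.log(transactionsFor("joe", "account1"));
```

The first step is the extra round trip mentioned above; it is the price paid for not repeating username and name in every transaction.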
"Since the list of transactions will grow to perhaps 1000's within the document will the data become fragmented and slow the performance."
Almost certainly. In fact, I would be surprised if, over a period of years, transactions only reached into the thousands rather than tens of thousands for a single account.
Add to that the level of fragmentation you will see from a document that grows consistently over time, and you could end up with serious problems, if not run out of space in the root document altogether (the limit being 16 MB). In fact, given that you store all accounts for a person under one document, I would say you run a high risk of filling up a document within about 2 years.
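How quickly a single document approaches the 16 MB limit depends on the size and rate of the transactions. The sketch below is a back-of-the-envelope estimate only: it approximates the stored size by the JSON byte length (BSON sizes differ), and the `daysUntilFull` helper and the rate of 50 transactions per day are made-up inputs.

```javascript
// MongoDB's maximum BSON document size: 16 megabytes.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

const sampleTransaction = {
  date: "2013-08-06",
  desc: "transaction1",
  amount: "123.45",
};

// Rough size of one embedded transaction, using JSON length as a proxy.
const approxBytes = Buffer.byteLength(JSON.stringify(sampleTransaction), "utf8");

// Days until an ever-growing document would reach the limit,
// ignoring the size of everything else stored in it.
function daysUntilFull(bytesPerTransaction, transactionsPerDay) {
  return Math.floor(
    MAX_DOCUMENT_BYTES / (bytesPerTransaction * transactionsPerDay)
  );
}

console.log(approxBytes, daysUntilFull(approxBytes, 50));
```

Real transactions usually carry more fields than this sample, which shortens the estimate accordingly.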
I would reference this relationship.
I would separate the transactions into a different collection. It seems the data and update patterns of users and transactions are quite different. If transactions are constantly added to the user document, causing it to grow all the time, it will be moved a lot within the mongo data file. So yes, it has a performance impact (fragmentation, more IO, more work for mongo).
Also, array operation performance sometimes degrades on big arrays in documents, so holding thousands of objects in an array might not be a good idea (it depends on what you do with it).
You should consider creating indexes using the ensureIndex() function (createIndex() in current MongoDB versions); this should reduce the risk of performance issues.
The earlier you add these, the better you'll understand how the collection should be structured.
I haven't been using mongo for long, but I haven't come across any issues with fragmented data (not yet, anyway).
Edit: If you intend to use this for multi-object commits, note that mongo doesn't support rollbacks. You need to use the 64-bit version to allow journaling and make transactions durable.

REST - Resource and Collection Representations

I have a confusion with the design of collection resources.
Let's say I have a user resource - represented as below.
http://www.example.com/users/{user-id}
user : {
id : "",
name : "",
age : "",
addresses : [
{
line1 : "",
line2 : "",
city : "",
state : "",
country : "",
zip : ""
}
]
}
Now, how should my users collection resource representation be? Should it be a list of user representations (as above)? Or can it be a subset of that like below:
http://www.example.com/users/
users : [
{
id : "",
name : "",
link : {
rel : "self",
href : "/users/{id}"
}
}
]
Should the collection resource representation include the complete representation of the containing resources or can it be a subset?
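To make the "subset" option concrete, here is a small sketch that derives the collection representation from full user representations. The `toSummary` and `toCollection` functions and the sample user are hypothetical names and data chosen for this example.

```javascript
// Reduce a full user representation to the summary used in the collection:
// id, name, and a self link to the complete resource.
function toSummary(user) {
  return {
    id: user.id,
    name: user.name,
    link: { rel: "self", href: `/users/${encodeURIComponent(user.id)}` },
  };
}

// Build the collection representation from a list of full users.
function toCollection(users) {
  return { users: users.map(toSummary) };
}

const fullUser = {
  id: "42",
  name: "Joe",
  age: "30",
  addresses: [{ line1: "1 Some street", city: "Somecity" }],
};

console.log(JSON.stringify(toCollection([fullUser]), null, 2));
```

A client that needs the age or addresses follows the self link to fetch the complete representation.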
Media types define the rules on how you can convey information. Look at the specifications for Collection+JSON and HAL for examples of how to do what you are trying to do.
That falls entirely on what you want it to do. The great thing about REST APIs is that they are so flexible. You can represent data in any way (theoretically) that you want.
Personally, I would have an attribute that allows the user to specify a subset or style of representation. For instance /users/{userid}.json?output=simple or /users/{userid}.json?subset=raw
Something along those lines would also allow you to nest representations and fine tune what you want without sacrificing flexibility:
/users/{userid}.json?output=simple&subset=raw
The sky is the limit
I would make the list service fine-grained by accepting an include parameter:
http://www.example.com/users?include=address|profile|foo|bar
Any delimiter other than & (URL-encoded where necessary), such as , or -, can be used instead of |. On the server side, check for those include values and render the JSON response accordingly.
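A minimal server-side sketch of this idea, using the standard `URL` class to read the include parameter. The `parseInclude` and `renderUser` functions and the sample user are assumptions for illustration, and unknown section names are simply ignored here.

```javascript
// Split the "include" query parameter on "|" into a list of section names.
function parseInclude(requestUrl) {
  const raw = new URL(requestUrl).searchParams.get("include") || "";
  return raw.split("|").filter((part) => part.length > 0);
}

// Always render id and name; add only the requested sections that exist.
function renderUser(user, include) {
  const out = { id: user.id, name: user.name };
  for (const section of include) {
    if (Object.prototype.hasOwnProperty.call(user, section)) {
      out[section] = user[section];
    }
  }
  return out;
}

const sampleUser = {
  id: "jsmith",
  name: "John Smith",
  address: { city: "New York" },
  profile: { bio: "..." },
};

const include = parseInclude(
  "http://www.example.com/users?include=address|profile|foo|bar"
);
console.log(include, renderUser(sampleUser, ["address"]));
```

Whether unknown names such as foo and bar should be ignored or rejected with an error is a design choice the answer leaves open.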
There isn't really a standard for this. You have options:
1. List of links
Return a list of links to the collection item resources (i.e., the user IDs).
http://www.example.com/users/
users : [
"jsmith",
"mjones",
...
]
Note that these can actually be interpreted as relative URIs, which somewhat supports the "all resources should be accessible by following URIs from the root URI" ideal.
http://www.example.com/users/ + jsmith = http://www.example.com/users/jsmith
2. List of partial resources
Return a list of partial resources (users), allowing the caller to specify which fields to include. You might also have a default selection of fields in case the user doesn't supply any - the default might even be "include all fields."
http://www.example.com/users/?field=id&field=name&field=link
users : [
{
id : "jsmith",
name : "John Smith",
link : "www.google.com"
},
...
]
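The field-selection option can be sketched in the same way. The snippet below reads the repeated field parameter and falls back to a default selection; `DEFAULT_FIELDS`, `requestedFields`, and `pick` are hypothetical names, and the default list is an assumption.

```javascript
// Fields returned when the caller does not ask for specific ones (assumed).
const DEFAULT_FIELDS = ["id", "name", "link"];

// Collect every "field" query parameter, e.g. ?field=id&field=name.
function requestedFields(requestUrl) {
  const fields = new URL(requestUrl).searchParams.getAll("field");
  return fields.length > 0 ? fields : DEFAULT_FIELDS;
}

// Copy only the listed fields that the resource actually has.
function pick(resource, fields) {
  const out = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(resource, field)) {
      out[field] = resource[field];
    }
  }
  return out;
}

const resource = { id: "jsmith", name: "John Smith", link: "www.google.com", age: 25 };
const fields = requestedFields("http://www.example.com/users/?field=id&field=name");
console.log(pick(resource, fields));
```

Applying `pick` to every user in the list yields the partial-resource collection described in option 2.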
It can be a subset, but that depends on the data. Take a look at the code below.
{
"usersList": {
"users": [{
"firstName": "Venkatraman",
"lastName": "Ramamoorthy",
"age": 27,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": 10021
},
"phoneNumbers": [{
"type": "mobile",
"number": "+91-9999988888"
}, {
"type": "fax",
"number": "646 555-4567"
}]
}, {
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": 10021
},
"phoneNumbers": [{
"type": "home",
"number": "212 555-1234"
}, {
"type": "fax",
"number": "646 555-4567"
}]
}]
}
}