Advice on collection schema for MongoDB

I am developing a Little League app for my son's league as a "weekend project" and as a way to learn MongoDB. I'm struggling with the best way to set up the schema. My biggest hang-up is whether or not I should duplicate some of the data. Here's my first stab at the schema:
Collection -
Player
{ "firstname": "Test",
"lastname" : "Player",
"street":"123 Lamar",
"city": "Austin",
"state":"TX" ,
"zip": "78701",
"littleleagueid": "123",
"league":"minors",
"team":"Rangers",
"parents":
[ {
"firstname": "Bob",
"lastname": "Player",
"relationship": "father",
"street":"123 Lamar",
"city": "Austin",
"state":"TX",
"zip": "78701"
},
{
"firstname": "Sally",
"lastname": "Player",
"relationship": "stepmother",
"street":"123 Lamar",
"city": "Austin",
"state":"TX",
"zip": "78701"
},
{
"firstname": "Sue",
"lastname": "Explayer",
"relationship": "mother",
"street":"456 Congress",
"city": "Austin",
"state":"TX",
"zip": "78761"}
]
}
My biggest question is: should I embed the parents in the kids collection, or should they be separated into their own collection? The address is repeated multiple times. This might be the best method, but in a SQL environment I would have just pulled it into its own table.
Any and all advice would be greatly appreciated.

If you need fast queries:
Your schema is more or less OK as it is, since a single read returns the player together with the parents.
If you need fast updates:
I would instead store the parents as separate documents and refer to them from the player by their _id, turning the player collection into a people collection. That way, if a parent's details change, you update one document rather than one copy per player who has that parent.
Keep in mind that, even if you need fast queries, the embedded version is harder to maintain because, as you say, the data is duplicated, and duplicated data can drift out of sync.
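A minimal sketch of the referenced layout this answer describes, in plain JavaScript with an in-memory array standing in for the collection. The `people` name, the `parentIds` field, the `parentsOf` helper, and the new street value are assumptions made for illustration; none of them come from the original schema.

```javascript
// Hypothetical "people" collection: players and parents share one shape.
const people = [
  { _id: 1, firstname: "Bob", lastname: "Player", street: "123 Lamar", city: "Austin", state: "TX", zip: "78701" },
  { _id: 2, firstname: "Sue", lastname: "Explayer", street: "456 Congress", city: "Austin", state: "TX", zip: "78761" },
  // The player refers to the parents by _id instead of embedding them.
  { _id: 3, firstname: "Test", lastname: "Player", league: "minors", team: "Rangers", parentIds: [1, 2] },
];

// Resolve a player's parents by _id (an extra lookup, which is the cost of referencing).
function parentsOf(player) {
  return people.filter((p) => player.parentIds.includes(p._id));
}

// An address change touches exactly one document, however many players refer to it.
const bob = people.find((p) => p._id === 1);
bob.street = "789 Guadalupe"; // made-up new address

const player = people.find((p) => p._id === 3);
const parentStreets = parentsOf(player).map((p) => p.street);
```

The trade-off is visible here: the update is a single write, but reading a player with parents takes a second lookup.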

Related

Which is the best design for a MongoDB database model?

I feel like the MVP of my current database needs some design changes. The number of users is growing quite fast and some requests are performing badly. I also want to get rid of all the DBRefs we used.
Our current model can be summarized as follows:
A company can have multiple employees (thousands)
A company can have multiple teams (hundreds)
An employee can be part of a team
A company can have multiple devices (thousands)
An employee is assigned to multiple devices
Our application displays the following on different pages:
The company data
The users
The devices
The teams
I guess I have different options, but I'm not familiar enough with MongoDB to make the best decision.
Option 1
Do not embed and use list of ids for one to many relationships.
// Company document
{
"companyName": "ACME",
"users": [ObjectId(user1), ObjectId(user2)],
"teams": [ObjectId(team1), ObjectId(team2)],
"devices": [ObjectId(device1), ObjectId(device2)]
}
// User Document
{
"userName": "Foo",
"devices": [ObjectId(device2)]
}
// Team Document
{
"teamName": "Foo",
"users": [ObjectId(user1)]
}
// Device Document
{
"deviceName": "Foo"
}
Option 2
Embed data and duplicate information.
// User Document
{
"companyName": "ACME",
"userName": "Foo",
"team": {
"teamName": "Foo"
},
"device": {
"deviceName": "Foo"
}
}
// Team Document
{
"teamName": "Foo",
"companyName": "ACME",
"users": [
{
"userName": "Foo"
}
]
}
// Device Document
{
"deviceName": "Foo",
"companyName": "ACME",
"user": {
"userName": "Foo"
}
}
Option 3
Do not embed and use id for one to one relationship.
// Company document
{
"companyName": "ACME"
}
// User Document
{
"userName": "Foo",
"company": ObjectId(company),
"team": ObjectId(team1)
}
// Team Document
{
"teamName": "Foo",
"company": ObjectId(company)
}
// Device Document
{
"deviceName": "Foo",
"company": ObjectId(company),
"user": ObjectId(user1)
}
MongoDB recommends embedding data as much as possible, but I don't think it is possible to embed all the data in the company document. A company can have many devices or users, and I believe the document could grow too big.
I'm switching from SQL to NoSQL and I don't think I've quite figured it out yet!
Thanks!
One of MongoDB's features is that it handles unstructured data.
Every database can contain collections, which in turn contain documents.
Moreover, MongoDB does not have joins in the relational sense (the aggregation $lookup stage, added in version 3.2, is the closest equivalent). So storing the information in one company model is the better choice here, because you won't need a join in that scenario.
One more thing: you don't need to embed all the models. For example, you can already get both users and devices from the company collection, so why embed users and devices elsewhere as well?
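For comparison, here is a rough sketch of how the pages listed in the question could be served under Option 3, where each child document holds the id of its parent. It uses plain JavaScript arrays in place of collections; the string ids, the second user, the device names, and the two helper functions are placeholders invented for the example.

```javascript
// Option 3 shape: users and devices each carry a "company" reference.
const users = [
  { _id: "user1", userName: "Foo", company: "company1", team: "team1" },
  { _id: "user2", userName: "Bar", company: "company1", team: null },
];
const devices = [
  { _id: "device1", deviceName: "Laptop", company: "company1", user: "user1" },
  { _id: "device2", deviceName: "Phone", company: "company1", user: "user1" },
  { _id: "device3", deviceName: "Tablet", company: "company2", user: null },
];

// "Users" page: one filter on the company reference.
const usersOfCompany = (companyId) => users.filter((u) => u.company === companyId);

// Devices assigned to one employee: one filter on the user reference.
const devicesOfUser = (userId) => devices.filter((d) => d.user === userId);

const acmeUsers = usersOfCompany("company1");
const fooDevices = devicesOfUser("user1").map((d) => d.deviceName);
```

With this shape the company document stays small no matter how many users or devices are added, which addresses the growth concern raised in the question.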

MongoDB text search across two collections

I have an Order collection with address fields and User collection with names. The Order collection contains a string called userId, which is a "foreign key" into the users collection.
I am using an aggregation pipeline to filter, join, sort, and paginate queries. The problem is that I need to provide full text search on the address and name fields.
Because the $text match must be the first stage in a pipeline, I am not sure how to accomplish the goal of finding text matching any address or name field.
User collection
[{
"_id": "5cb8caa069fc1a4351cc3705",
"firstName": "James",
"lastName": "Bond"
},{
"_id": "5c58b8de8596d52c248f34d5",
"firstName": "Jack",
"lastName": "Ryan"
}]
Order Collection
[{
"_id": "5ccc94602e67ca44fe69f160",
"address": {
"streetAddress1": "1112 main st",
"streetAddress2": null,
"unitNumber": "unit 1112",
"city": "Jackson Hole",
"state": "WY",
"postalCode": "83001"
},
"userId": "5cb8caa069fc1a4351cc3705"
}]
A search for "Jack" should match both the name "Jack" and the city "Jackson Hole".
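To make the requirement concrete, the sketch below reproduces the desired result in plain JavaScript on the sample data: join each order to its user, then keep the orders where any address or name field contains the term. It only illustrates the intended behaviour. It does not use `$text` or any index, the substring matching is an assumption about what "match" should mean, and `containsTerm` and `searchOrders` are hypothetical helper names.

```javascript
const users = [
  { _id: "5cb8caa069fc1a4351cc3705", firstName: "James", lastName: "Bond" },
  { _id: "5c58b8de8596d52c248f34d5", firstName: "Jack", lastName: "Ryan" },
];
const orders = [
  {
    _id: "5ccc94602e67ca44fe69f160",
    address: {
      streetAddress1: "1112 main st", streetAddress2: null, unitNumber: "unit 1112",
      city: "Jackson Hole", state: "WY", postalCode: "83001",
    },
    userId: "5cb8caa069fc1a4351cc3705",
  },
];

// True when any string value contains the term (case-insensitive); nulls are skipped.
function containsTerm(values, term) {
  const needle = term.toLowerCase();
  return values.some((v) => typeof v === "string" && v.toLowerCase().includes(needle));
}

// Join each order to its user, then test both the address and the name fields.
function searchOrders(term) {
  return orders
    .map((order) => ({ ...order, user: users.find((u) => u._id === order.userId) }))
    .filter((o) =>
      containsTerm(Object.values(o.address), term) ||
      (o.user !== undefined && containsTerm([o.user.firstName, o.user.lastName], term)));
}
```

On this data, `searchOrders("Jack")` returns the one order because of the city "Jackson Hole", and `searchOrders("Bond")` returns it because of the joined user's last name.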

MongoDB database design for users and their data

New to MongoDB and databases in general. I'm trying to make a basic property app with Express and MongoDB for practice.
I'm looking for some help on the best way to structure the schema.
Basically, my app would have landlords and tenants. Each landlord would have a number of properties, with information stored about each one: things like lease terms, tenant name, maintenance requests, images, etc.
The tenants would be able to sign up and be associated with the property they live in. They could submit maintenance forms, etc.
Is this a good approach? Should everything be kept in the same collection? Thanks.
{
"_id": "507f1f77bcf86cd799439011",
"user": "Corey",
"password": "hashed#PASSWORD",
"email": "corey@email.com",
"role": "landlord",
"properties": [
{
"addressId": "1",
"address": "101 Main Street",
"tenant": "John Smith",
"leaseDate": "04/21/2016",
"notes": "These are my notes about this property.",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"]
},
{
"addressId": "2",
"address": "105 Maple Street",
"tenant": "John Jones",
"leaseDate": "01/01/2018",
"notes": "These are my notes about 105 Maple Ave property.",
"images": ["http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"],
"forms": [
{
"formType": "lease",
"leaseTerm": "12 months",
"leaseName": "John Jones",
"leaseDate": "01/01/2018"
},
{
"formType": "maintenance",
"maintenanceNotes": "Need furnace looked at. Doesn't heat properly.",
"maintenanceName": "John Jones",
"maintenanceDate": "01/04/2018",
"status": "resolved"
}
]
},
{
"addressId": "3",
"address": "110 Chestnut Street",
"tenant": "John Brown",
"leaseDate": "07/28/2014",
"notes": "These are some notes about 110 Chestnut Ave property.",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2", "http://www.imagelink.com/image3"]
}
]
}
{
"_id": "507f1f77bcf86cd799439012",
"user": "John",
"password": "hashed#PASSWORD",
"email": "john@email.com",
"role": "tenant",
"address": "2",
"images": [ "http://www.imagelink.com/image1", "http://www.imagelink.com/image2" ]
}
For this relationship I'd suggest three collections (Landlords, Properties, and Tenants), with each tenant having a "landLordId" and a "propertyId".
The "landLordId" would simply be the ObjectId of the landlord, and likewise the "propertyId" would be the ObjectId of the property.
This will make your life easier if you plan to do any kind of roll-up reports, or if you have more than one-to-one mappings of landlords to properties or landlords to tenants (for example, more than one property manager for a given property).
It also makes everything easier and more intuitive, as you can simply add things like maintenance requests, lease terms, etc. as arrays on the tenants, with references to whatever else is needed.
This offers the most flexibility in terms of being able to aggregate easily for any kind of report or query.
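A small sketch of the three-collection layout suggested above, with in-memory arrays standing in for the collections. The short string ids, the `landLordId` field on properties, the `maintenanceRequests` array name, and the `maintenanceCountByProperty` helper are assumptions added for the example; the answer itself only specifies `landLordId` and `propertyId` on tenants.

```javascript
const landlords = [{ _id: "L1", user: "Corey" }];
const properties = [
  { _id: "P1", landLordId: "L1", address: "101 Main Street" },
  { _id: "P2", landLordId: "L1", address: "105 Maple Street" },
];
const tenants = [
  {
    _id: "T1", user: "John", landLordId: "L1", propertyId: "P2",
    maintenanceRequests: [{ notes: "Need furnace looked at.", status: "resolved" }],
  },
];

// Roll-up report: number of maintenance requests per property for one landlord.
function maintenanceCountByProperty(landLordId) {
  const counts = {};
  for (const p of properties.filter((p) => p.landLordId === landLordId)) {
    counts[p.address] = 0; // properties without tenants still appear in the report
  }
  for (const t of tenants.filter((t) => t.landLordId === landLordId)) {
    const property = properties.find((p) => p._id === t.propertyId);
    if (property) {
      counts[property.address] = (counts[property.address] || 0) + t.maintenanceRequests.length;
    }
  }
  return counts;
}

const report = maintenanceCountByProperty(landlords[0]._id);
```

Because every tenant carries both ids, the report needs no knowledge of how landlords and properties are nested inside each other.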

Embed or reference in Mongodb

I am developing a small app which will store information on users, accounts and transactions. The users will have many accounts (probably fewer than 10) and the accounts will have many transactions (perhaps 1000s). Reading the docs, it seems to suggest that embedding as follows is the way to go:
{
"username": "joe",
"accounts": [
{
"name": "account1",
"transactions": [
{
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
}
]
},
{
"name": "account2",
"transactions": [
{
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
}
]
}
]
}
My question is: since the list of transactions will grow to perhaps 1000s within the document, will the data become fragmented and slow down performance? Would I be better off with a document that stores the users and accounts, which will not grow as big, and a separate collection that stores the transactions, which reference the accounts? Or is there a better way?
This is not the way to go. You have a lot of transactions, and you don't know how many you will get. Instead of this, you should store them like:
{
"username": "joe",
"name": "account1",
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"username": "joe",
"name": "account1",
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"username": "joe",
"name": "account1",
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
},
{
"username": "joe",
"name": "account2",
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
},
{
"username": "joe",
"name": "account2",
"date": "2013-08-07",
"desc": "transaction2",
"amount": "123.45"
},
{
"username": "joe",
"name": "account2",
"date": "2013-08-08",
"desc": "transaction3",
"amount": "123.45"
}
In a NoSQL database like MongoDB you shouldn't be afraid to denormalise. As you noticed, I haven't even bothered with a separate collection for users. If your users have more information that you will have to show with each transaction, you might want to consider including that information as well.
If you need to search on, or select by, any of those fields, then don't forget to create indexes, for example:
// look up all transactions for an account
db.transactions.createIndex( { username: 1, name: 1 } );
and:
// look up all transactions for "2013-08-06"
db.transactions.createIndex( { date: 1 } );
etc.
There are a lot of advantages to duplicating data. With a schema like the one above, you can have as many transactions as you like and you will never get any fragmentation, because documents never change; you only add new ones. This also improves write performance and makes other queries a lot easier.
Alternative
An alternative might be to store username/name in a collection and only use its ID with the transactions:
Accounts:
{
"username": "joe",
"name": "account1",
"account_id": 42
}
Transactions:
{
"account_id": 42,
"date": "2013-08-06",
"desc": "transaction1",
"amount": "123.45"
}
This creates smaller transaction documents, but it does mean you have to do two queries to also get user information.
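The two-query pattern can be sketched as follows, using in-memory arrays in place of the accounts and transactions collections. The second account with `account_id` 43 and the `transactionsFor` helper are assumptions added so the example has something to filter out.

```javascript
const accounts = [
  { account_id: 42, username: "joe", name: "account1" },
  { account_id: 43, username: "joe", name: "account2" }, // hypothetical second account
];
const transactions = [
  { account_id: 42, date: "2013-08-06", desc: "transaction1", amount: "123.45" },
  { account_id: 42, date: "2013-08-07", desc: "transaction2", amount: "123.45" },
  { account_id: 43, date: "2013-08-06", desc: "transaction1", amount: "123.45" },
];

function transactionsFor(username, accountName) {
  // Query 1: find the account to learn its id.
  const account = accounts.find((a) => a.username === username && a.name === accountName);
  if (!account) return [];
  // Query 2: fetch the transactions that reference that id.
  return transactions.filter((t) => t.account_id === account.account_id);
}

const joeAccount1 = transactionsFor("joe", "account1");
```

Each transaction now stores a single number instead of two strings, at the price of the extra round trip shown in `transactionsFor`.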
"Since the list of transactions will grow to perhaps 1000's within the document will the data become fragmented and slow the performance."
Almost certainly. In fact, I would be surprised if, over a period of years, transactions only reached into the thousands rather than tens of thousands for a single account.
Add to that the level of fragmentation you will see from a document that grows constantly over time, and you could end up with serious problems, if not run out of space in the root document altogether (the limit is 16 MB). Indeed, given that you store all of a person's accounts in one document, I would say you run a high risk of filling up a document within about 2 years.
I would reference this relationship.
I would separate the transactions into a different collection. The data and update patterns of users and transactions seem quite different. If transactions are constantly added to the user document, causing it to grow all the time, it will be moved a lot within the mongo data file. So yes, there is a performance impact (fragmentation, more IO, more work for mongo).
Also, array operation performance sometimes degrades on big arrays in documents, so holding 1000s of objects in an array might not be a good idea (it depends on what you do with it).
You should consider creating indexes using the createIndex() function (ensureIndex() in older shells); it should reduce the risk of performance issues.
The earlier you add these, the better you'll understand how the collection should be structured.
I haven't been using mongo for long, but I haven't come across any issues (not yet, anyway) with data being fragmented.
Edit: If you intend to use this for multi-object commits, mongo doesn't support rollbacks. You need to use the 64-bit version to allow journaling and make transactions durable.

mongodb best practice: nesting

Is this example of nesting generally accepted as good or bad practice (and why)?
A collection called users:
user
  basic
    name : value
    url : value
  contact
    email
      primary : value
      secondary : value
    address
      en-gb
        address : value
        city : value
        state : value
        postalcode : value
        country : value
      es
        address : value
        city : value
        state : value
        postalcode : value
        country : value
Edit: From the answers in this post I've updated the schema applying the following rules (the data is slightly different from above):
Nest, but only one level deep
Remove unnecessary keys
Make use of arrays to make objects more flexible
{
"_id": ObjectId("4d67965255541fa164000001"),
"name": {
"0": {
"name": "Joe Bloggs",
"il8n": "en"
}
},
"type": "musician",
"url": {
"0": {
"name": "joebloggs",
"il8n": "en"
}
},
"tags": {
"0": {
"name": "guitar",
"points": 3,
"il8n": "en"
}
},
"email": {
"0": {
"address": "joe.bloggs@example.com",
"name": "default",
"primary": 1,
"il8n": "en"
}
},
"updates": {
"0": {
"type": "news",
"il8n": "en"
}
},
"address": {
"0": {
"address": "1 Some street",
"city": "Somecity",
"state": "Somestate",
"postalcode": "SOM STR",
"country": "UK",
"lat": 49.4257641,
"lng": -0.0698241,
"primary": 1,
"il8n": "en"
}
},
"phone": {
"0": {
"number": "+44 (0)123 4567 890",
"name": "Home",
"primary": 1,
"il8n": "en"
},
"1": {
"number": "+44 (0)098 7654 321",
"name": "Mobile",
"il8n": "en"
}
}
}
Thanks!
In my opinion the schema above is not 'generally accepted', but it looks fine. I would suggest some improvements that will make it easier to query your documents in the future:
User
Name
Url
Emails {email, emailType(primary, secondary)}
Addresses{address, city, state, postalcode, country, language}
Nesting is generally good, but nesting two or three levels deep can create additional trouble when querying or updating.
Hope my suggestions will help you make right choice of schema design.
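As a rough illustration of the flatter shape suggested in this answer, the object below keeps emails and addresses as arrays of small objects. The field names follow the outline above in camelCase; the secondary email address is a made-up placeholder, and the two lookups at the end are only there to show how such a document might be read.

```javascript
const user = {
  name: "Joe Bloggs",
  url: "joebloggs",
  emails: [
    { email: "joe.bloggs@example.com", emailType: "primary" },
    { email: "jb@example.com", emailType: "secondary" }, // placeholder value
  ],
  addresses: [
    {
      address: "1 Some street", city: "Somecity", state: "Somestate",
      postalcode: "SOM STR", country: "UK", language: "en",
    },
  ],
};

// Every email has the same shape, so one predicate finds the primary one.
const primaryEmail = user.emails.find((e) => e.emailType === "primary").email;

// Language is a field rather than a key, so filtering by it is a plain comparison.
const englishAddresses = user.addresses.filter((a) => a.language === "en");
```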
You may want to take a look at schema design in MongoDB, and specifically the advice on embedding vs. references.
Embedding is preferred as "Data is then colocated on disk; client-server turnarounds to the database are eliminated". If the parent object is in RAM, then access to the nested objects will always be fast.
In my experience, I've never found any "best practices" for what a MongoDB record actually looks like. The question to really answer is, "Does this MongoDB schema allow me to do what I need to do?"
For example, if you had a list of addresses and needed to update one of them, it'd be a pain since you'd need to iterate through all of them or know the position at which a particular address is located. You're safe from that since there is a key for each address.
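The difference between the two shapes can be seen in a few lines of plain JavaScript. The city values are invented for the example, and the objects are simplified to a single field per address.

```javascript
// Array form: the entry has to be located (or its index known) before it can be updated.
const asArray = {
  addresses: [
    { language: "en-gb", city: "London" },
    { language: "es", city: "Madrid" },
  ],
};
const index = asArray.addresses.findIndex((a) => a.language === "es");
asArray.addresses[index].city = "Barcelona";

// Keyed form, as in the question: the key addresses the entry directly.
const asKeyed = {
  address: {
    "en-gb": { city: "London" },
    es: { city: "Madrid" },
  },
};
asKeyed.address.es.city = "Barcelona";
```

The keyed form makes a single known entry easy to update, while the array form is easier to extend and to search by field value, so which one fits depends on how the data is accessed.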
However, I'd say nix the basic and contact keys. What do these really give you? If you index name, it'd be basic.name rather than just name. AFAIK, there are some performance impacts to long vs. short key names.
Keep it simple enough to do what you need to do. Try something out and iterate on it. You won't get it right the first time, but the nice thing about mongo is that it's relatively easy to rework your schema as you go.
That is acceptable practice. There are some problems with nesting an array inside of an array. See SERVER-831 for one example. However, you don't seem to be using arrays in your collection at all.
Conversely, if you were to break this up into multiple collections, you would have to deal with a lack of transactions and the resulting race conditions in your data access code.