MongoDB: mapping collection names for user


I have the following corner case for MongoDB that I hope you can help me solve.
My MongoDB database is used by multiple independent users, and technical limitations prevent me from creating a database per user. Users are untrusted, and they will create collections with arbitrary names.
Is there any way to "namespace" one user's collections from another user's? For example, when user "jim" creates a collection "orders", it should not clash with user "bob" creating a collection "orders".
Users are authenticated and connect through an SSL-protected channel.

Don't create this many collections in the database. Instead, keep a single collection in which each of a user's logical collections is stored as one document. You can then access every logical collection by its unique "_id" field, and the fields and values in each document are up to the user.
For example, you could have one collection, "user_collection", which stores the collection details for all users:
user_collection
{ "_id" : "0921092109227812",
"collectionName": "orders",
"user": ObjectId(Ref_Id1),
"fields": []
}
{ "_id" : "5686565681232344",
"collectionName": "orders",
"user": ObjectId(Ref_Id2),
"fields": []
}
This is only the basic schema; you can extend it to suit your requirements.
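Below is a minimal in-memory JavaScript sketch of the schema above. It shows why "jim" and "bob" can both own an "orders" collection without clashing. A Map stands in for user_collection, and the class and method names are hypothetical. In a real deployment, the same uniqueness rule would presumably come from a unique compound index such as db.user_collection.createIndex({ user: 1, collectionName: 1 }, { unique: true }), with every query filtering on the authenticated user.

```javascript
// In-memory sketch of "one document per logical collection".
// A Map stands in for the real "user_collection" MongoDB collection.
class UserCollectionStore {
  constructor() {
    this.docs = new Map(); // compound key -> document
    this.nextId = 1;
  }

  // Mirrors a unique compound index on { user: 1, collectionName: 1 }.
  static key(user, collectionName) {
    return JSON.stringify([user, collectionName]);
  }

  createCollection(user, collectionName) {
    const k = UserCollectionStore.key(user, collectionName);
    if (this.docs.has(k)) {
      throw new Error(`"${collectionName}" already exists for user ${user}`);
    }
    const doc = { _id: this.nextId++, collectionName, user, fields: [] };
    this.docs.set(k, doc);
    return doc;
  }

  // Lookups are always scoped by the authenticated user, never by name alone.
  find(user, collectionName) {
    return this.docs.get(UserCollectionStore.key(user, collectionName)) ?? null;
  }
}

const store = new UserCollectionStore();
const jimOrders = store.createCollection("jim", "orders");
const bobOrders = store.createCollection("bob", "orders"); // no clash with jim
```

Because the user is part of the key, a user who never created "orders" simply gets nothing back. Creating the same name twice for the same user is rejected.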

Related

MongoDB Embedding alongside referencing

There is a lot of content about which kind of relationship to use in a database schema. However, I have not seen anything about mixing both techniques.
The idea is to embed only the necessary attributes, along with a reference. This way the application has the data it needs for rendering, plus the reference for update operations.
The problem I see is that the logic for handling CRUD operations becomes trickier, because multiple collections must be updated; on the other hand, I get all the information in a single read.
Here is a basic schema for a page that only needs the names of the students in a classroom:
CLASSROOM COLLECTION
{"_id": ObjectId(),
"students": [{"studentId" : ObjectId(),
"name" : "John Doe"
},
...
]
}
STUDENTS COLLECTION
{"_id": ObjectId(),
"name" : "John Doe",
"address" : "...",
"age" : "...",
"gender": "..."
}
I use the students collection on a different page, where I don't want any information about the classroom. That is why I don't embed the full student documents.
I started learning Mongo a few days ago, and I don't know whether this kind of schema brings any problems.

You can embed some fields and store others in a different collection, as you suggest.
In my opinion, the issues with such an arrangement would be:
1. Which copy is the authority for a field? For example, what if a field like name is both embedded and stored in the separate collection, and the values differ?
2. Both updating and querying become awkward, because you need to handle each field differently depending on where it lives. If you make a mistake and go to the wrong place, you create (or compound) the first issue.
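One way to keep the first issue under control is to treat the students collection as the single authority and to treat the embedded name as a derived copy. Every rename then goes through one function that updates both. The sketch below is in-memory, and the collection and function names are hypothetical. In the shell, the two steps would presumably map to db.students.updateOne({ _id: id }, { $set: { name: newName } }) and db.classroom.updateMany({ "students.studentId": id }, { $set: { "students.$[s].name": newName } }, { arrayFilters: [{ "s.studentId": id }] }), optionally inside a multi-document transaction.

```javascript
// In-memory sketch: "students" is the single authority for a student's name;
// the copy embedded in each classroom is derived data that must be refreshed.
const students = new Map([
  ["s1", { _id: "s1", name: "John Doe", address: "...", age: "...", gender: "..." }],
]);
const classrooms = [
  { _id: "c1", students: [{ studentId: "s1", name: "John Doe" }] },
  { _id: "c2", students: [{ studentId: "s1", name: "John Doe" }] },
];

// The only code path allowed to change a student's name.
function renameStudent(studentId, newName) {
  const student = students.get(studentId);
  if (!student) throw new Error(`unknown student ${studentId}`);
  // 1. Update the authoritative document first.
  student.name = newName;
  // 2. Propagate to every embedded copy that references this student.
  for (const room of classrooms) {
    for (const entry of room.students) {
      if (entry.studentId === studentId) entry.name = newName;
    }
  }
}

renameStudent("s1", "John Q. Doe");
```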

MongoDB collections design

I have the following four tables:
The point is that users who have joined a particular group have access to a survey during a time interval, from a start date to an end date. How should I organize the collection structure of such a database in MongoDB?
For surveys and questions, this will be a simple collection of surveys, each with an array of questions. But for the start/end behavior of a survey, it is not clear to me how to store the data.

What about something like this:
Groups
{
_id : "group1",
"members" : [{"name":"A"...},{"name":"B"...}],
"surveys" : [{"surveyId":"survey1", "startDate": ISODate(),"endDate":ISODate()},{"surveyId":"survey2", "startDate": ISODate(),"endDate":ISODate()}]
}
Surveys
{
_id : "survey1",
questions : [{"text":"Atheist??"...},{....}]
}
Honestly, it depends on which pattern you want to use; for example, you could instead embed the groups inside each survey, along with the registration details.
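With the Groups schema above, "which surveys can this member take right now?" becomes a check on group membership plus the survey's date window. The sketch below is in-memory: JS Date values stand in for ISODate, and the dates and the function name are made up. The shell equivalent would presumably be something like db.groups.find({ "members.name": "A", surveys: { $elemMatch: { startDate: { $lte: now }, endDate: { $gte: now } } } }), followed by filtering the returned surveys array.

```javascript
// In-memory sketch of the Groups schema above; dates are hypothetical.
const groups = [
  {
    _id: "group1",
    members: [{ name: "A" }, { name: "B" }],
    surveys: [
      { surveyId: "survey1", startDate: new Date("2024-01-01"), endDate: new Date("2024-01-31") },
      { surveyId: "survey2", startDate: new Date("2024-03-01"), endDate: new Date("2024-03-31") },
    ],
  },
];

// Survey ids a member can take at time `now`: the member must belong to the
// group, and `now` must fall inside that group's [startDate, endDate] window.
function openSurveysFor(memberName, now) {
  const ids = new Set();
  for (const group of groups) {
    if (!group.members.some((m) => m.name === memberName)) continue;
    for (const s of group.surveys) {
      if (s.startDate <= now && now <= s.endDate) ids.add(s.surveyId);
    }
  }
  return [...ids];
}
```

Keeping the window on the group side means the same survey can be open at different times for different groups. The survey documents themselves only hold the questions.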

Best way to represent multilingual database on mongodb

I have a MySQL database supporting a multilingual website, where the data is represented as follows:
table1 (id, is_active, created)
table1_lang (table1_id, name, surname, address)
What's the best way to achieve the same on mongo database?

You can design a schema that either references or embeds documents. Let's look at the first option, embedded documents. For your application above, you might store the information in a document as follows:
// db.table1 schema
{
"_id": 3, // table1_id
"is_active": true,
"created": ISODate("2015-04-07T16:00:30.798Z"),
"lang": [
{
"name": "foo",
"surname": "bar",
"address": "xxx"
},
{
"name": "abc",
"surname": "def",
"address": "xyz"
}
]
}
In the example schema above, you have essentially embedded the table1_lang information within the main table1 document. This design has its merits, one of them being data locality. Since MongoDB stores each document contiguously on disk, putting all the data you need in one document means spinning disks spend less time seeking. If your application frequently accesses table1 information together with the table1_lang data, you'll almost certainly want to go the embedded route.
The other advantage of embedded documents is atomicity and isolation when writing data. For example, removing the document whose lang array contains an entry with "name" equal to "foo" takes one single (atomic) operation:
db.table1.remove({"lang.name": "foo"});
For more details on data modeling in MongoDB, read the docs "Data Modeling Introduction", specifically "Model One-to-Many Relationships with Embedded Documents".
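The filter {"lang.name": "foo"} works because a dotted path into an array matches a document if any element of the array has that value. The in-memory sketch below imitates that rule for this schema only; the helper names are hypothetical, and this is not a general query engine. It also shows the parent and all of its embedded translations disappearing together in one write.

```javascript
// In-memory sketch of how a dotted path like "lang.name" matches a document
// whose "lang" field is an array: ANY element with that value is a match.
const table1 = [
  { _id: 3, is_active: true, lang: [{ name: "foo", surname: "bar" }, { name: "abc", surname: "def" }] },
  { _id: 4, is_active: true, lang: [{ name: "xyz", surname: "uvw" }] },
];

function matchesLangName(doc, value) {
  return Array.isArray(doc.lang) && doc.lang.some((entry) => entry.name === value);
}

// Removing a matching document drops the parent and every embedded
// translation at once; there are no orphaned child rows to clean up.
function removeByLangName(collection, value) {
  return collection.filter((doc) => !matchesLangName(doc, value));
}

const remaining = removeByLangName(table1, "foo");
```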
The other design option is referencing documents where you follow a normalized schema. For example:
// db.table1 schema
{
"_id": 3,
"is_active": true,
"created": ISODate("2015-04-07T16:00:30.798Z")
}
// db.table1_lang schema
{
"_id": 1,
"table1_id": 3,
"name": "foo",
"surname": "bar",
"address": "xxx"
}
{
"_id": 2,
"table1_id": 3,
"name": "abc",
"surname": "def",
"address": "xyz"
}
This approach gives more flexibility in querying. For instance, retrieving all child table1_lang documents for the parent table1 entity with id 3 is straightforward; simply query the table1_lang collection:
db.table1_lang.find({"table1_id": 3});
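In the referenced design, the application (or an aggregation) reassembles the parent and its children. The in-memory sketch below does that by hand, and the function names are hypothetical. On the server, a comparable result would presumably come from db.table1.aggregate([{ $match: { _id: 3 } }, { $lookup: { from: "table1_lang", localField: "_id", foreignField: "table1_id", as: "lang" } }]).

```javascript
// In-memory sketch of the normalized (referenced) design: table1_lang rows
// point back to their parent through table1_id.
const table1 = [{ _id: 3, is_active: true, created: new Date("2015-04-07T16:00:30.798Z") }];
const table1Lang = [
  { _id: 1, table1_id: 3, name: "foo", surname: "bar", address: "xxx" },
  { _id: 2, table1_id: 3, name: "abc", surname: "def", address: "xyz" },
];

// Same idea as db.table1_lang.find({ "table1_id": 3 }).
function findLangFor(parentId) {
  return table1Lang.filter((row) => row.table1_id === parentId);
}

// Parent plus children, roughly the shape a $lookup stage would produce.
function withLang(parentId) {
  const parent = table1.find((doc) => doc._id === parentId);
  return parent ? { ...parent, lang: findLangFor(parentId) } : null;
}
```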
The normalized, document-referencing schema above also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of table1_lang documents per table1 entity, embedding has serious drawbacks as far as space is concerned: the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
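A quick back-of-the-envelope check makes the 16MB limit concrete. The maximum BSON document size is 16 × 1024 × 1024 = 16,777,216 bytes, so with a hypothetical 200-byte embedded entry, at most about 83,886 entries fit, before counting the parent's own fields. The entry sizes here are assumptions you would replace with your own, and JSON text length is only a rough proxy for the real BSON size.

```javascript
// Back-of-the-envelope check against MongoDB's 16MB document limit.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// Upper bound on embedded entries of a given size, after reserving room
// for the parent document's own fields.
function maxEmbeddedEntries(entryBytes, parentBytes = 0) {
  if (entryBytes <= 0) throw new RangeError("entryBytes must be positive");
  return Math.floor((MAX_BSON_BYTES - parentBytes) / entryBytes);
}

// Rough size of one sample table1_lang entry, measured as JSON text.
const sampleEntry = { name: "foo", surname: "bar", address: "xxx" };
const approxEntryBytes = JSON.stringify(sampleEntry).length;
```

Real entries with longer names and addresses, plus any growth over time, shrink that ceiling quickly. That is the practical argument for referencing when the arity is unbounded.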
The general rule of thumb is that if your application's query patterns are well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is more appropriate.
Ref: MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database, by Rick Copeland

MongoDB design for scalability

We want to design a scalable database. If we have N users with 1 billion user responses, which of the two options below is the better design? We want to query by userID as well as by responseID.
Option 1: two collections, one for the user information and another for the responses along with the user ID. Each response is stored as its own document, so we will have 1 billion documents.
User Collection
{
"userid" : "userid1",
"password" : "xyz",
...
"City" : "New York"
},
{
"userid" : "userid2",
"password" : "abc",
...
"City" : "New York"
}
responses Collection
{
"userid": "userid1",
"responseID": "responseID1",
"response" : "xyz"
},
{
"userid": "userid1",
"responseID": "responseID2",
"response" : "abc"
},
{
"userid": "userid2",
"responseID": "responseID3",
"response" : "mno"
}
Option 2: one collection that stores both kinds of information, as below. Each response is represented by a new key (responseIDX).
{
"userid" : "userid1",
"responseID1" : "xyz",
"responseID2" : "abc",
...
"responseN" : "mno",
"city" : "New York"
}

If you go with your first option, I'd use a relational database (like MySQL) rather than MongoDB. If you're set on MongoDB, use it to your advantage:
{
"userId": n,
"city": "foo",
"responses": {
"responseId1": "response message 1",
"responseId2": "response message 2"
}
}
As for which would perform better, run a few benchmark tests.

Between the two options you've listed, I think a separate collection would scale better, possibly combined with embedded documents inside that collection.
Embedded documents can be a boon to your schema design, but they do not work as well when you have an endlessly growing set of embedded documents (responses, in your case). This is because of document growth: as a document grows and outgrows the space allocated for it on disk, MongoDB must move it to a new location to accommodate the new size (this relocation behavior applies to the older MMAPv1 storage engine). That can be expensive and carry severe performance penalties when it happens often or in highly concurrent environments.
Also, querying those embedded documents can become troublesome when you want to return only a subset of responses, especially across users: a plain find() projection cannot return only the matching embedded documents. The positional $ projection operator can, however, return the first matching embedded document. (In MongoDB 3.2 and later, the aggregation $filter operator can return all matching array elements.)
So, I would recommend using a separate collection for the responses.
Though, as mentioned above, I would also suggest experimenting with other ways to group those responses in that collection: a document per day, per user, or per whatever other dimensions you might have.
Group them in ways that allow multiple embedded documents and that complement how you will query them. If you can find the sweet spot between using embedded documents in that collection and minimizing document growth, you'll have fewer documents overall and smaller indexes. Obviously this requires benchmarking and testing, as the caveats listed above still apply.
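As one example of such grouping, here is an in-memory sketch with one document per user per UTC day. Each document then grows only for a single day instead of forever. The _id format "<userid>:<YYYY-MM-DD>" and the count field are hypothetical conventions, not something the original answer specifies. Against a real server, each addResponse call would presumably become an upsert such as db.responses.updateOne({ _id: id }, { $push: { responses: { responseID, response } }, $inc: { count: 1 }, $setOnInsert: { userid, day } }, { upsert: true }).

```javascript
// In-memory sketch of bucketing: one document per (user, UTC day) instead of
// one document per response. The _id format and "count" are hypothetical.
function addResponse(buckets, userid, responseID, response, when) {
  const day = when.toISOString().slice(0, 10); // UTC calendar day
  const id = `${userid}:${day}`;
  let bucket = buckets.get(id);
  if (!bucket) {
    // First response of the day for this user creates the bucket (an upsert).
    bucket = { _id: id, userid, day, count: 0, responses: [] };
    buckets.set(id, bucket);
  }
  bucket.responses.push({ responseID, response });
  bucket.count += 1; // lets the application cap or split oversized buckets
  return bucket;
}

const buckets = new Map();
addResponse(buckets, "userid1", "responseID1", "xyz", new Date("2024-05-01T09:00:00Z"));
addResponse(buckets, "userid1", "responseID2", "abc", new Date("2024-05-01T17:30:00Z"));
addResponse(buckets, "userid2", "responseID3", "mno", new Date("2024-05-02T08:00:00Z"));
```

Three responses end up in two documents here. At a billion responses, the same idea trades some per-document size for far fewer documents and index entries, which is exactly what needs benchmarking.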
Lastly (and optionally), with that type of data set, consider using increment counters on the front end wherever you can, to supply any aggregated reporting you might need down the road. Though the Aggregation Framework in MongoDB is great, having, say, the total response count for a user pre-aggregated is far more convenient than running an aggregate query over the full dataset to get a count.
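A minimal in-memory sketch of that pre-aggregation: every write also bumps a per-user counter, so the total becomes a single lookup. The responseCount field name is a hypothetical choice. In the shell, the counter update would presumably be db.users.updateOne({ userid: userid }, { $inc: { responseCount: 1 } }), issued alongside the response insert.

```javascript
// In-memory sketch of pre-aggregation: bump a per-user counter on every
// write, so "how many responses does userid1 have?" is a single lookup
// instead of an aggregation over the whole responses collection.
const responses = [];
const userStats = new Map(); // userid -> { responseCount }

function recordResponse(userid, responseID, response) {
  responses.push({ userid, responseID, response });
  const stats = userStats.get(userid) ?? { responseCount: 0 };
  stats.responseCount += 1; // the $inc on the user's document
  userStats.set(userid, stats);
}

recordResponse("userid1", "responseID1", "xyz");
recordResponse("userid1", "responseID2", "abc");
recordResponse("userid2", "responseID3", "mno");
```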

How to enforce foreign keys in NoSql databases (MongoDB)?

Let's say I have a collection of documents such as:
{ "_id" : 0 , "owner":0, "name":"Doc1"},{ "_id" : 1 , "owner":1, "name":"Doc1"}, etc
And, on the other hand the owners are represented as a separate collection:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
How can I make sure that, when I insert a document, it correctly references a user? In an old-school RDBMS this could easily be done with a foreign key.
I know I can check the correctness of an insertion in my business code, BUT what if an attacker tampers with my request to the server and sets "owner" : 100? Mongo doesn't throw any exception back.
I would like to know how this situation should be handled in a real-world application.
Thank you in advance!

MongoDB doesn't have foreign keys (as you have presumably noticed). Fundamentally the answer is therefore, "Don't let users tamper with the requests. Only let the application insert data that follows your referential integrity rules."
MongoDB is great in lots of ways... but if you find that you need foreign keys, then it's probably not the correct solution to your problem.
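In practice, "only let the application insert data that follows your rules" means the insert path itself checks the reference. The in-memory sketch below rejects any owner id that does not exist; the function name is hypothetical. Note that a separate check-then-insert is not atomic against an owner being deleted concurrently. With MongoDB 4.0+ one could wrap both steps in a multi-document transaction, but that is an assumption about your setup, not something this sketch does.

```javascript
// In-memory sketch of enforcing the "foreign key" in application code:
// the insert path refuses any owner id that does not exist in owners.
const owners = new Map([
  [0, { _id: 0, username: "John" }],
  [1, { _id: 1, username: "Sam" }],
]);
const docs = [];

function insertDoc(doc) {
  if (!owners.has(doc.owner)) {
    throw new Error(`owner ${doc.owner} does not exist`);
  }
  docs.push(doc);
  return doc;
}

insertDoc({ _id: 0, owner: 0, name: "Doc1" });
```

A tampered request such as { owner: 100 } now fails in the application layer before anything reaches the database.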

To answer your specific question: while MongoDB encourages handling foreign-key relationships on the client side, it also provides the idea of "Database References" (DBRefs); see the MongoDB documentation page on Database References.
That said, I don't recommend using a DBRef. Either let your client code manage the associations or (better yet) link the documents together from the start. You may want to consider embedding the owner's "documents" inside the owner object itself. Assemble your documents to match your usage patterns and MongoDB will shine.

This is a one-to-one relationship. It's better to embed one document in the other instead of maintaining separate collections; see the MongoDB documentation on how to model such relationships and their advantages.
Although it's not explicitly mentioned in the docs, embedding gives you the same effect as a foreign key constraint. I just want to make this idea clear. Suppose you have two collections like these:
C1:
{ "_id" : 0 , "owner":0, "name":"Doc1"},{ "_id" : 1 , "owner":1, "name":"Doc1"}, etc
C2:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
Now suppose you were to declare a foreign key constraint on C2._id referencing C1._id (assuming MongoDB allowed it). It would mean that you cannot insert a document into C2 whose _id does not exist in C1. Compare this with an embedded document:
{
"_id" : 0 ,
"owner" : 0,
"name" : "Doc1",
"owner_details" : {
"username" : "John"
}
}
Now the owner_details field represents the data from the C2 collection, and the remaining fields represent the data from C1. You can't add an owner_details field to a non-existent document. You're essentially achieving the same effect.

This question was originally answered in 2011, so I decided to post an update here.
Starting with version 4.0 (released in June 2018), MongoDB supports multi-document ACID transactions.
Relations can now be modeled in two ways:
- Embedded
- Referenced (NEW!)
You can model a referenced relationship like so:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000")
]
}
where the address document has this sample structure:
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
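With manual references like address_ids, the application follows the ids itself. This is typically a second query, such as db.address.find({ _id: { $in: ids } }), where the collection name "address" is an assumption. The in-memory sketch below does the same and reports dangling ids instead of silently dropping them, because MongoDB will not flag them for you. Plain strings stand in for ObjectId values.

```javascript
// In-memory sketch of following manual references: the person document keeps
// an array of address _ids, and the application resolves them separately.
const addresses = new Map([
  ["52ffc4a5d85242602e000000", {
    _id: "52ffc4a5d85242602e000000",
    building: "22 A, Indiana Apt",
    pincode: 123456,
    city: "Los Angeles",
    state: "California",
  }],
]);

const person = {
  _id: "52ffc33cd85242f436000001",
  name: "Tom Benzamin",
  address_ids: ["52ffc4a5d85242602e000000"],
};

// Returns the referenced addresses plus any ids that no longer resolve.
function resolveAddresses(p) {
  const found = [];
  const missing = [];
  for (const id of p.address_ids) {
    const addr = addresses.get(id);
    if (addr) found.push(addr);
    else missing.push(id);
  }
  return { found, missing };
}
```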

If you really need to enforce foreign keys in a project or web app, you should use a mixed approach, i.e. SQL + NoSQL.
I would store bulky data that doesn't have many references, such as hotels or places, in a NoSQL store.
But for critical data, such as OAuth module tables, TokenStore, UserDetails, and UserRole (a mapping table), go with SQL.

I would also recommend that if usernames are unique, you use them as the _id. You will save an index. In the document being stored, have the application set 'owner' to the value of 'username' when the document is created, and never let any other piece of code update it.
If there are requirements to change the owner, provide appropriate APIs that implement the business rules.
There wouldn't be any need for foreign keys.
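A minimal in-memory sketch of this advice follows. Usernames act as the users' _id, and the server derives owner from the authenticated session, discarding any owner the client sends. Owner changes go through one dedicated function. The session object shape and both function names are hypothetical.

```javascript
// In-memory sketch: usernames are unique, so they serve as users' _id, and
// the server sets "owner" from the authenticated session, never the payload.
const users = new Map([["John", { _id: "John" }], ["Sam", { _id: "Sam" }]]);
const docs = [];

function createDoc(session, payload) {
  if (!users.has(session.username)) throw new Error("not authenticated");
  const { owner: _ignored, ...fields } = payload; // drop any client-supplied owner
  const doc = { ...fields, owner: session.username };
  docs.push(doc);
  return doc;
}

// The only code path that may change "owner", with its business rules.
function transferOwnership(session, doc, newOwner) {
  if (doc.owner !== session.username) throw new Error("only the owner can transfer");
  if (!users.has(newOwner)) throw new Error(`unknown user ${newOwner}`);
  doc.owner = newOwner;
  return doc;
}

// A tampered payload claiming owner "Sam" still ends up owned by John.
const d = createDoc({ username: "John" }, { name: "Doc1", owner: "Sam" });
```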