MongoDB: How to organize data

I am a little bit uncertain about how to organize data when using MongoDB.
I have a user with various data. Say it is a classified-ads service, with a profile and possibly some items for sale. In a relational database this data would be split into a profile table and a for-sale table. As I understand it, in MongoDB this would probably all go into one "document" (except, perhaps, if there is a very large number of items for sale).
But my classified service is a little special: for each item for sale, an administrator (salesman) adds information to the item, such as approval for the ad to go public, a comment on the item, and possibly more. The user should obviously not be able to alter this admin-added information.
What would be the recommended way to deal with this? Can the administrator just change (add to) the user's item document? But then I guess the user could change what the administrator has added, right? So perhaps a better approach would be for the admin to create a separate document containing the added data, and have the two documents merged before being displayed?
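One way to make the first option safe: MongoDB's role-based access control grants privileges on databases and collections, not on individual fields, so it is usually the application that decides which fields a user's update may touch. The sketch below assumes hypothetical field names (title, description and price for the user, plus an embedded admin subdocument for the salesman). It only builds the update documents in plain JavaScript, without a driver.

```javascript
// Hypothetical field names: adjust to your own item schema.
const USER_EDITABLE = new Set(["title", "description", "price"]);

// Build a $set update from untrusted user input, dropping every key the
// user is not allowed to change (e.g. "admin" or "admin.public").
function buildUserUpdate(input) {
  const $set = {};
  for (const [key, value] of Object.entries(input)) {
    if (USER_EDITABLE.has(key)) {
      $set[key] = value;
    }
  }
  return { $set };
}

// Admin-only data lives in an embedded "admin" subdocument; dot notation
// updates just those fields and leaves the user's fields untouched.
function buildAdminUpdate({ isPublic, comment }) {
  return { $set: { "admin.public": isPublic, "admin.comment": comment } };
}

console.log(JSON.stringify(buildUserUpdate({ title: "Red bike", "admin.public": true })));
// {"$set":{"title":"Red bike"}}
```

With the Node.js driver, such an object would be passed as the update argument of updateOne. The filter does the rest: for example, the item's _id plus a hypothetical owner field keeps users away from other people's items.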

Maybe the following will be helpful: http://docs.mongodb.org/manual/applications/data-models/
Also: http://docs.mongodb.org/manual/data-modeling/

Related

Which method of storing USERS, ROLES & TEAMS in my relational DB is most efficient

I'm developing an app as part of my college assignment. It's a project management app, and I'm having trouble deciding the best way to store users and teams in my Postgres DB. Basically, users can sign up and create or join teams. A user can be part of multiple teams (each working on multiple projects). Users also have roles in teams (with varying permissions according to the role), and while they have only one role in a given team, they may have a different role in another one. In addition, users can mark some of their teams as favorites for easy access from the front end.
I've come up with 3 ERDs to solve this.
First, store all users in one table and all teams in another. The users table has all the data pertaining to a user, while the teams table has the team data along with the members, their roles, and whether or not each user has marked the team as a favorite, as below.
This will have a lot of data duplication - if a team has a hundred members, there will be 100 entries where teamid, name, description are the same.
So, in v2 I separated them and added a members table. Now, each team is saved once, and so is each user. A reference to the team and user is made each time a user joins/creates a team and is stored in the members table along with the user's role and whether or not they have favorited the team.
But, I thought it might be bad to save roles as a string. If roles ever need to be changed/updated or I need to add new roles/rename roles, it would be easier with an ID rather than a string (I think).
So, then I came up with this.
Now all roles, users and teams are stored once (it's possible that I've turned the roles table into something like a lookup table, which I've heard is bad practice). All of these can be referenced from the members table.
My DBMS concepts are a little weak, though I have tried my best to follow the normalization steps and bring the design into BCNF. But I'm still unsure whether I've done this right, or what to fix if something is wrong.
So essentially, I would like to know:
Is my table structure correct or incorrect?
Should everything be split into multiple tables, or is some data duplication okay (since I can use multiple or creative queries to get whatever I need)?
I like your ERD3 best. I don't think it is overkill, I think it looks fine. Having a "members" table be mostly foreign keys into other tables is a common thing.
It is not necessary to eliminate every trace of commonality in every table - sometimes it is more efficient to put up with a small amount of duplication - but in your example I think your ERD3 looks good.
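To make ERD3 concrete, here is the same structure with plain JavaScript arrays standing in for the tables. The column names are guesses, since the ERDs themselves are not reproduced here. The teamsOfUser function does by hand what a JOIN of members with teams and roles would do in Postgres. Renaming a role touches a single row.

```javascript
// Hypothetical rows mirroring ERD3: each entity is stored exactly once.
const users = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }];
const teams = [{ id: 10, name: "Backend" }, { id: 11, name: "Design" }];
const roles = [{ id: 100, name: "owner" }, { id: 101, name: "member" }];

// The members table is foreign keys plus per-membership data.
const members = [
  { userId: 1, teamId: 10, roleId: 100, favorite: true },
  { userId: 1, teamId: 11, roleId: 101, favorite: false },
  { userId: 2, teamId: 10, roleId: 101, favorite: false },
];

const byId = (rows, id) => rows.find((row) => row.id === id);

// Equivalent of joining members -> teams and members -> roles for one user.
function teamsOfUser(userId) {
  return members
    .filter((m) => m.userId === userId)
    .map((m) => ({
      team: byId(teams, m.teamId).name,
      role: byId(roles, m.roleId).name,
      favorite: m.favorite,
    }));
}

console.log(teamsOfUser(1));

// Renaming a role is one change in the roles table; every membership sees it.
byId(roles, 101).name = "contributor";
console.log(teamsOfUser(2));
```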

Why are there two refs when declaring a one-to-many association in Mongoose?

I'm very new to MongoDB; see this one-to-many example.
As I understand it, this example says that a person can write many stories, i.e. a story belongs_to a person. I think storing person._id in the stories collection would be enough.
So why does the person collection have a stories field?
Cases for fetching data:
Case 1: Fetch all stories of a person whose id is, say, x.
Solution: just query the stories collection where author = x.
Case 2: Fetch the author name of a particular story.
Solution: the story document already has the author field.
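The asker's point can be checked with a small in-memory sketch: with only the author reference stored on each story, both cases can be answered. Arrays stand in for the persons and stories collections, and the data is made up. Case 2 needs a second lookup in persons to get the name, because the story only holds the author's _id.

```javascript
// Only stories carry a reference; persons have no "stories" array.
const persons = [
  { _id: "p1", name: "Ian" },
  { _id: "p2", name: "Ada" },
];
const stories = [
  { _id: "s1", title: "First story", author: "p1" },
  { _id: "s2", title: "Second story", author: "p1" },
  { _id: "s3", title: "Third story", author: "p2" },
];

// Case 1: all stories of person x (shell equivalent: db.stories.find({author: x})).
const storiesOf = (x) => stories.filter((s) => s.author === x);

// Case 2: author name of a given story, resolved through the stored _id.
const authorNameOf = (story) => persons.find((p) => p._id === story.author).name;

console.log(storiesOf("p1").map((s) => s.title));
console.log(authorNameOf(stories[2])); // Ada
```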
TL;DR
Put simply: because there is no notion of explicit relations in MongoDB.
Mongoose cannot know how you want to resolve the relationship. Will the search start from a given story object, with the author to be found? Or will it be to find all stories for a given author object? So it makes sure it can resolve the relation either way.
Note that there is a problem with that approach, and a big one. Say we are not talking about a one-to-few relation as in this example, but a "One-To-A-Shitload"™ relation. Since BSON documents have a size limit of 16MB, there is a limit to the number of relations you can manage this way. Quite a lot, but an artificial limit nonetheless.
How to solve this: instead of relying on an ODM, do proper modelling yourself, since you know your use cases. I will give you an example below.
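Here is a rough number for "quite a lot". The sketch assumes the person document holds nothing but its _id and a stories array of ObjectIds. It adds up the BSON encoding cost of that array against the 16 MiB (16,777,216 bytes) limit. Each array element needs a type byte, its index as a null-terminated decimal string key, and the 12-byte ObjectId. The result is a little over 840,000 references. Any other fields in the document lower that number.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MiB maximum document size

// One ObjectId element inside a BSON array:
// 1 type byte + index as decimal string + 1 null terminator + 12-byte ObjectId.
const elementBytes = (index) => 1 + String(index).length + 1 + 12;

function maxReferences() {
  // Parent document: int32 length + trailing 0x00, the "_id" ObjectId field
  // (1 + "_id\0" + 12) and the "stories" field header (1 + "stories\0").
  const parentOverhead = 4 + 1 + (1 + 4 + 12) + (1 + 8);
  // The embedded array is itself a document: int32 length + trailing 0x00.
  let size = parentOverhead + 4 + 1;
  let n = 0;
  while (size + elementBytes(n) <= MAX_BSON_BYTES) {
    size += elementBytes(n);
    n += 1;
  }
  return n;
}

console.log(maxReferences());
```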
Detailed
Let us first elaborate on your cases a bit:
1. For a given user (aka "we already have all the data of that user document"), what are his or her stories?
2. List all stories together with the user name on an overview page.
3. For a selected ("given") story, what are the author's details?
4. And just for demonstration purposes: a given user wants to change the name under which a story is displayed, be it his user name, natural name (it happens!) or even a pseudonym.
OK, now let's put Mongoose aside and think about how we could implement this ourselves, keeping in mind that:
Data modelling in MongoDB means deriving your model from the questions that arise from your use cases, so that the most common use cases are covered in the most efficient way.
As opposed to RDBMS modelling, where you identify your entities, their properties and relations and then jump through some hoops to get your questions answered somehow.
So, looking at our user stories, I guess we can agree that 2 is the most common use case, 3 and 1 come next, and 4 is rather rare compared to the others.
So now we can start
Modelling
We model the data involved in our most common use cases first.
So, we want to make the query for stories the most efficient one. And we want to sort the stories in descending order of submission. Simple enough:
{
_id: new ObjectId(),
user: "Name to Display",
story: "long story cut short",
}
Now let's say you want to display your stories, 10 of them:
db.stories.find({}).sort({_id:-1}).limit(10)
No relation, all the data we need, a single query, and the default index on _id is used for sorting. Since a timestamp is the most significant part of the ObjectId, we can use it to sort the stories by time. The question "Hey, but what if someone changes his or her user name?" usually comes now. Simple:
db.stories.update({"user":oldname},{$set:{"user":newname}},{multi:true})
Since this is a rare use case, it only has to be doable and does not have to be extremely efficient. However, later we will see that we have to put an index on user anyway.
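You can check by hand that sorting on _id sorts by time. The first 4 bytes of an ObjectId are a big-endian count of seconds since the Unix epoch. The small sketch below works on lowercase 24-character hex strings, and the ids are made up. Mind the one-second resolution: ids created within the same second, possibly by different clients, are not guaranteed to sort in exact insertion order.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are a big-endian
// count of seconds since the Unix epoch.
function objectIdTime(hex) {
  if (!/^[0-9a-f]{24}$/.test(hex)) {
    throw new Error(`not a lowercase ObjectId hex string: ${hex}`);
  }
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

// Made-up ids, created one second apart.
const ids = ["5f5b2bff00000000000000bb", "5f5b2c0000000000000000aa"];

// For equal-length lowercase hex, string order equals byte order, so this
// mirrors sort({_id: -1}): newest first.
const newestFirst = [...ids].sort().reverse();
console.log(newestFirst.map((id) => objectIdTime(id).toISOString()));
```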
Talking of authors: Here it really depends on how you want to do it. But I will show you how I tend to model something like that:
{
_id: "username",
info1: "foo",
info2: "bar",
active: true,
...
}
We make use of some properties of _id here: It is a required index with a unique constraint. Just what we want for usernames.
However, it comes with a caveat: _id is immutable. So if somebody wants to change his or her username, we need to copy the original user to a document whose _id is the new username and change the user property in our stories accordingly. The advantage of doing it this way is that even if the username update (see above) fails partway through, each and every story can still be related to a user. If the update succeeds, I tend to log the user out and have him log in again with the new username.
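The rename procedure described above, sketched in plain JavaScript: a Map stands in for the authors collection and an array for the stories collection, and the data is hypothetical. Deleting the old author document at the very end is my assumption. What matters is the order of the steps, which keeps every story pointing at an existing author document at all times. In the shell, step 2 is the multi-update shown earlier.

```javascript
// Hypothetical data standing in for the authors and stories collections.
const authors = new Map([["jdoe", { _id: "jdoe", active: true }]]);
const stories = [
  { _id: 1, user: "jdoe", story: "first" },
  { _id: 2, user: "jdoe", story: "second" },
];

function renameUser(oldName, newName) {
  const original = authors.get(oldName);
  if (!original) throw new Error(`no author ${oldName}`);
  if (authors.has(newName)) throw new Error(`${newName} is taken`); // _id must stay unique
  // 1. _id is immutable, so copy the document under the new _id first.
  authors.set(newName, { ...original, _id: newName });
  // 2. Re-point the stories. If this step dies halfway, every story still
  //    references an existing author document (the old or the new one).
  for (const s of stories) {
    if (s.user === oldName) s.user = newName;
  }
  // 3. Only now remove the old author document.
  authors.delete(oldName);
}

renameUser("jdoe", "john.doe");
console.log(stories.map((s) => s.user), [...authors.keys()]);
```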
In case you want to have a distinction between username and displayed name, it all becomes even easier:
{
_id: "username",
displayNames: ["Foo B. Baz","P.S. Eudonym"],
...
}
Then you use the display name in your stories, of course.
Now let us see how we can get the user details of a given story. We know the author's name so it is as easy as:
db.authors.find({"_id":authorNameOfStory})
or
db.authors.find({"displayNames": authorNameOfStory})
Finding all stories for a given user is quite simple, too. It is either:
db.stories.find({"user":idFieldOfUser})
or
db.stories.find({"user":{$in:displayNamesOfUser}})
Now that all our use cases are covered, we can make them even more efficient with
Indexing
An obvious index is on the story model's user field, so we create it:
db.stories.createIndex({"user":1})
If you are good with the "username as _id" way only, you are done with indexing. Using display names, you obviously need to index them. Since you most likely want display names and pseudonyms to be unique, it is a bit more complicated:
db.authors.createIndex({"displayNames":1},{sparse:true, unique:true})
Note: we need to make this a sparse index to prevent unnecessary errors when somebody has not decided on a display name or pseudonym yet. Make sure you explicitly add this field to an author document only when a user decides on a display name. Otherwise, it would evaluate to null server-side, which is a valid value, and you will get a constraint violation error, namely "E11000 duplicate key error".
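Here is a simplified model of why sparse matters in this case. It is not MongoDB's actual index implementation. A non-sparse unique index indexes a missing field as null, so a second author without displayNames collides with the first one. A sparse index skips such documents. Array values are indexed per element (a multikey index), so uniqueness applies to each display name across documents. Edge cases such as empty arrays are ignored.

```javascript
// Simplified model of a unique index: returns { ok: true } or the first
// duplicated key. Not MongoDB's implementation; empty arrays etc. are ignored.
function checkUnique(docs, field, { sparse }) {
  const seen = new Set();
  for (const doc of docs) {
    const present = field in doc;
    if (!present && sparse) continue; // sparse: skip docs without the field
    const value = present ? doc[field] : null; // non-sparse: missing means null
    // Multikey: an array contributes one key per distinct element.
    const keys = new Set(Array.isArray(value) ? value : [value]);
    for (const key of keys) {
      if (seen.has(key)) return { ok: false, duplicate: key };
      seen.add(key);
    }
  }
  return { ok: true };
}

const authors = [
  { _id: "alice", displayNames: ["Alice A.", "A. Nonymous"] },
  { _id: "bob" },   // has not chosen a display name yet
  { _id: "carol" }, // has not chosen a display name yet
];

console.log(checkUnique(authors, "displayNames", { sparse: false })); // bob and carol collide on null
console.log(checkUnique(authors, "displayNames", { sparse: true }));  // { ok: true }
```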
Conclusion
We have covered all your use cases with relations handled by the application, thereby simplifying our data model a great deal, and we have the most efficient queries for our most common use cases. Every use case is covered with a single query, taking into account the information we already have at the time we run the query.
Note that there is no artificial limit on how many stories a user can publish since we use implicit relations to our advantage.
As for more complicated queries ("How many stories does each user submit per month?"), use the aggregation framework. That is what it is there for.
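To show what such a query computes, here is the grouping done in plain JavaScript rather than as a pipeline. It counts stories per user and per month, taking the month from the timestamp in each story's _id. The data and the fakeId helper are hypothetical. Server-side you would express the same thing with a $group stage, for example grouping on user plus a month derived from _id with $toDate, which is available from MongoDB 4.0.

```javascript
// Seconds since the epoch, stored in the first 4 bytes of an ObjectId (hex string).
const secondsOf = (hex) => parseInt(hex.slice(0, 8), 16);

// Hypothetical helper: builds an ObjectId-like hex string for a given instant.
const fakeId = (iso, suffix) =>
  Math.floor(Date.parse(iso) / 1000).toString(16).padStart(8, "0") + suffix.padStart(16, "0");

function storiesPerUserPerMonth(stories) {
  const counts = {};
  for (const { _id, user } of stories) {
    const month = new Date(secondsOf(_id) * 1000).toISOString().slice(0, 7); // "YYYY-MM", UTC
    const key = `${user} ${month}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const stories = [
  { _id: fakeId("2020-01-05T10:00:00Z", "1"), user: "alice" },
  { _id: fakeId("2020-01-20T10:00:00Z", "2"), user: "alice" },
  { _id: fakeId("2020-02-01T10:00:00Z", "3"), user: "alice" },
  { _id: fakeId("2020-01-07T10:00:00Z", "4"), user: "bob" },
];
console.log(storiesPerUserPerMonth(stories));
```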

MongoDB default items with user overwrite

So the problem seems very simple, but every way I approach the solution seems to be a poor implementation, with either duplicated content or messy data.
The Problem
I want to provide an option for per-user “overwrites” of “default” items. Basically, I have a MongoDB database with a collection containing items with the following information:
ID
Name
Icon
Description
There is a set of 20-30 items in this collection, which every user of the app sees.
Most users will be happy with the defaults, but if a user wishes to, say, change the icon or the name of an item, how do I handle this “overwrite” of the “default” item for just that single user?
Possible solutions
My thoughts are to implement one of the following options but all just seem a little wrong (I have provided my thoughts on this):
for each “overwritten” item, add a duplicate item to the collection with the changes and a user_id field linking it to the user - this seems like a bit of duplicated content, as the user might only change the icon and not the name/description. Also, if the name of the default item is changed in the future, how do you handle this, and how do you know that this item must replace one of the “defaults” for just that user? I worry it will also be a bit of a performance issue when looking up the items and then replacing the changed item
having all the items duplicated per user in the same collection - a lot of duplicated content, and while it might be the best-performing option, it could cause issues in the future if new “default” items need to be added or default options need changing
collection per user - same as the previous one. This option seems all kinds of wrong, but maybe I’m just new to this and it is actually the best option.
collection containing overwrites - this seems like a good idea but equally a bad one, due to the lookup and comparison. If everything is changed, then why not just have all-new items rather than effectively a find-and-replace?
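A minimal sketch of the last option. It assumes overrides store only the fields a user actually changed, and the item and user ids are hypothetical. Because the merge happens at read time, a later change to a default's name still shows up for a user who only overrode the icon. With 20-30 items, the merge is a cheap in-memory step after fetching the defaults and that user's overrides (two queries).

```javascript
// Defaults: one document per item (hypothetical data).
const defaults = [
  { _id: "inbox", name: "Inbox", icon: "tray", description: "Incoming items" },
  { _id: "later", name: "Later", icon: "clock", description: "Deferred items" },
];

// Overrides: only the fields a user changed, linked by userId and itemId.
const overrides = [{ userId: "u1", itemId: "later", icon: "hourglass" }];

function itemsForUser(userId) {
  const mine = new Map(
    overrides.filter((o) => o.userId === userId).map((o) => [o.itemId, o])
  );
  return defaults.map((item) => {
    const o = mine.get(item._id);
    if (!o) return item; // untouched default
    const { userId: _u, itemId: _i, ...changed } = o; // drop the link fields
    return { ...item, ...changed }; // overridden fields win
  });
}

console.log(itemsForUser("u1"));
```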
Reason for wanting to get this right
Maybe I’m overthinking this, but it seems like I will face this issue a lot, and I think I need to get it right to avoid future issues with performance, data management and updates to default items.

some questions about designing on OrientDB

We were looking for the most suitable database for our innovative “collaboration application” (sorry, we don’t know what to call it in a generally understood way). In fact, highly complicated relationships among tenants, roles, users, tasks and bills need to be handled effectively.
After reading about 5 DBs (PostgreSQL, MongoDB, CouchDB, ArangoDB and Neo4j), when the words “… relationships among things are more important than things themselves” caught my eye, I made up my mind to dig into OrientDB. Both the design philosophy and the innovative features of OrientDB (multi-model, cluster, OO, native graph, full graph API, SQL-like, LiveQuery, multi-master, auditing, simple RID and version number ...) keep intensifying my enthusiasm.
OrientDB enlightens me to re-think and try to model from a totally different viewpoint!
We are now designing the data structure based on OrientDB. However, there are some questions puzzling me.
LINK vs. EDGE
Take a case where a CLIENT may place thousands of ORDERs: how should we choose between LINKs and EDGEs to store the relationships? I prefer EDGEs, but they seem to store thousands of RIDs of ORDERs in the CLIENT record.
Embedded records’ Security
Can an embedded record be authorized independently from its container record?
Record-level Security
How does activating Record-level Security affect the query performance?
I hope I have expressed this clearly. Any advice will be truly appreciated.
LINK vs EDGE
If you don't have properties on the relationship (the arc), you can use a link; if you do, use edges. You really need edges if you need to traverse the relationship in both directions, while with a linklist you can traverse it in only one direction (just like a hyperlink on the web), without the overhead of edges. Edges are the right choice if you need to walk through a graph. Edges require more storage space than a linklist. Another difference between them is that if you have two vertices linked to each other through a link, A --> (link) B, and you delete B, the link doesn't disappear: it remains, but points to nothing. It is designed this way because when you delete a document, finding all the other documents that link to it would mean doing a full scan of the database, which typically takes ages to complete. The Graph API, with bi-directional links, is specifically designed to resolve this problem, so in general we suggest that customers use it, or be careful and manage link consistency at the application level.
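Here is a conceptual model of the difference described above. It is not OrientDB's API or storage format. With a link, only the source record knows about the target, so deleting the target leaves the link dangling. An edge is recorded on both vertices, so deleting a vertex can find and remove its edges without scanning the database.

```javascript
// Records keyed by id; a simplified model, not OrientDB's storage format.
const db = new Map();

function addVertex(id) {
  db.set(id, { id, links: [], outEdges: [], inEdges: [] });
}

// One-directional link: only the source record knows about the target.
function link(from, to) {
  db.get(from).links.push(to);
}

// Bidirectional edge: both endpoints record it.
function edge(from, to) {
  db.get(from).outEdges.push(to);
  db.get(to).inEdges.push(from);
}

function deleteVertex(id) {
  const v = db.get(id);
  // Edges are cheap to clean up because v lists its neighbours.
  for (const src of v.inEdges) {
    db.get(src).outEdges = db.get(src).outEdges.filter((t) => t !== id);
  }
  for (const dst of v.outEdges) {
    db.get(dst).inEdges = db.get(dst).inEdges.filter((s) => s !== id);
  }
  // Links pointing at v cannot be found without scanning every record.
  db.delete(id);
}

["A", "B", "C"].forEach(addVertex);
link("A", "B");
edge("A", "C");
deleteVertex("B");
deleteVertex("C");
console.log(db.get("A").links);    // still contains "B": a dangling link
console.log(db.get("A").outEdges); // empty: the edge went away with C
```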
RECORD - LEVEL SECURITY
With 1 million vertices and an admin user called Luke, a query like select from where title = ? using a NOTUNIQUE_HASH_INDEX took 0.027 sec to execute.
OrientDB has the concept of users and roles, as well as Record Level Security. It also supports token based authentication, so it's possible to use OrientDB as your primary means of authorizing/authenticating users.
EMBEDDED RECORD'S SECURITY
I've made this example to try to answer your question.
I have this structure:
If I want to access the embedded data, I have to run this command: select prop from User
because if I try to access it through the class that defines the car type, I don't get any result:
select from Car
UPDATE
OrientDB supports that kind of authorization/authentication, but it's a little different from your example. For example, if a user A without admin permission inserts a record, another user B without admin permission can't see the record inserted by user A. A user can see only the records he or she has inserted.
Hope it helps

Practical usage of noSQL [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I’m starting a new web project and have to decide what database to use. I know the question is very long, but please bear with me.
I am very familiar with relational databases and have used frameworks like Hibernate to get my data from the DB into objects. But I have no experience with noSQL DBs. I am aware of the concepts of document, key-value, etc. types.
While doing my research, one question keeps popping up, and I don’t know how someone would handle it in noSQL DBs like MongoDB or any other document-oriented noSQL DB where consistency takes top priority.
For example: let’s assume that we are creating a small shopping management system where customers can buy and sell stuff.
We have:
CUSTOMERs
ORDERs
PRODUCTs
A single CUSTOMER can have multiple ORDERs and an ORDER can have multiple PRODUCTs.
In a traditional RDBMS I would of course have 3 tables.
In the first version of our application, the front end for the customer should display his/her personal data, ORDERs and all the PRODUCTs he or she bought per order, as well as which products are available for sale. So I guess in noSQL I would model the CUSTOMER class like this:
{
"id": 993784,
"firstname": "John",
"lastname": "Doe",
"orders": [
{
"id": 3234,
"quantity": 4,
"products": [
{
"id:" 378234,
"type": "TV",
"resolution": "1920x1080",
"screenSize":37,
"price": 999
}
]
}
],
"products": [
{
"id:" 7932,
"type": "car",
"sold": false,
"horsepower": 90
}
]
}
But later I want to extend my application to have 3 different UIs instead of only the first one:
The CUSTOMER Dashboard where a customer can view all his/her orders.
The PRODUCT Dashboard where a customer can add or remove products in his/her store.
The SOLD Dashboard where a customer can view all sold PRODUCTs ready for shipping.
One very important thing to consider (the reason why I even bother asking this question): I want to be flexible with the classes like PRODUCT because products can have different properties. For Example: A TV has screen size and resolution while a car has horsepower and other properties. And if a user adds a new product, he or she should be able to dynamically add those properties depending on what he/she knows about it.
Now to some practical use cases of two fictional users Jane and John:
Let's say Jane buys from John. Does that mean I have to create the PRODUCTs twice? Once as a child of Jane's ORDER and once more in John's "products" property?
Later, Jane wants to view all products that are available from any user. Do I have to load every user and query the "products" property to generate a list of all products?
In version 2 of the application, I want to enable John to view all outgoing orders (not orders he made, but orders from other users who bought stuff from him) instead of viewing all sold products. How would this be done in noSQL? Would I now need to create an "outgoing" array of orders and duplicate them? (An outgoing order of Jane is an incoming order of John.)
Some of you may say that noSQL is not right for this use case but isn’t that very common? Especially when we do not know what the future brings? If it does not fit for this use case, what use case would it fit into? Only baby applications (I guess not)? Wasn’t noSQL designed for more complex and flexible data?
Thank you very much for your advice and opinions!
EDIT 1:
Since this question was put on hold for being imprecise:
I made a very clear and simple example, so my question is not about the use of noSQL in general, but about how to handle this specific example. How would an experienced noSQL user handle this use case? How should this data be modelled? A recommendation to simply not use noSQL at all for this use case is also a valid answer for me.
I simply want to know how to use a noSQL database but still be able to manage entities and avoid redundancy.
For example: Are MongoDB's DBRefs/Manual refs a good way to achieve this? Performance issues because of multiple queries? What else to think about? I guess these questions can probably be answered quite well.
There probably isn't the one right answer to your question. But I'll make a start.
While it is technically possible in NoSQL to store some business entity together with all entities that are transitively linked to it (like Customer, Order, Product), it isn't always wise to do so. The traditional reasons for separating entities, namely redundancy and the resulting update and delete anomalies, don't just go away because a different platform is used.
So if you stored the product description with every customer who buys or sells this product, you will get update anomalies. If you have to change the screen size from 37 to 35, you'll have to find all customer records containing this product, which can be quite cumbersome.
Also, building up such a deep nested structure favors one direction of evaluating those structures over all other directions. If you put all orders and products into the customer document, this is very fine for getting a comprehensive view for a customer: whatever she bought throughout her lifetime. But if you want to query your database by orders (which orders need to be fulfilled tonight?) or products (who ordered product 1234?) you'll have to load tons of data that are of no interest to this query.
Similar questions arise from storing all orders with a customer. Old orders will sometimes still be of interest, so they may not be deleted. But do you want to load lots of orders every time you load the customer?
This doesn't mean not to make use of the complex structuring made possible by a document store. As a rule of thumb, I would suggest: As long as the nested information belongs to the same business entity, put it into one document. If, e.g., the product description has some hierarchic structure, like nested sections consisting of text, pics, and videos, they may all go into one document. But entities with a totally different life cycle, like customers, orders, and suppliers, should be kept separate. Another indicator is references: A product will frequently be referenced as a whole, e.g. when it is ordered by a customer or ordered from a supplier. But the different parts of the product description may possibly never be referenced from the outside.
This rule of thumb wasn't completely precise, and it's not supposed to be. One person's business entity is another person's dumb attribute. Imagine the color of a car: For the car owner, it's just a piece of information describing a car. For the manufacturer, it's a business entity, having an availability, a price, one or more suppliers, a way of handling it, etc.
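Applied to the shop example, the rule of thumb suggests separate customer, product and order collections, with orders holding manual references (plain ids) to the others. The sketch below uses plain arrays in place of collections, and the field names (sellerId, buyerId, items) are assumptions. A product exists once, so changing its screen size touches one record. Order-centric or product-centric questions scan only the collection they are about.

```javascript
// Three separate "collections"; orders reference the others by id.
const customers = [{ _id: 1, name: "John" }, { _id: 2, name: "Jane" }];
const products = [
  { _id: 378234, sellerId: 1, type: "TV", screenSize: 37, sold: true },
  { _id: 7932, sellerId: 1, type: "car", horsepower: 90, sold: false },
];
const orders = [
  { _id: 3234, buyerId: 2, items: [{ productId: 378234, quantity: 1 }] },
];

// "Who ordered product X?" touches only the orders collection.
const buyersOf = (productId) =>
  orders
    .filter((o) => o.items.some((i) => i.productId === productId))
    .map((o) => o.buyerId);

// Orders other users placed for a seller's products: no duplicated array needed.
function incomingOrdersOf(sellerId) {
  const mine = new Set(products.filter((p) => p.sellerId === sellerId).map((p) => p._id));
  return orders.filter((o) => o.items.some((i) => mine.has(i.productId)));
}

// Products still available from any seller.
const available = () => products.filter((p) => !p.sold);

console.log(buyersOf(378234), incomingOrdersOf(1).map((o) => o._id), available().length);
```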
Your question also touches the aspect of dynamically adding attributes. This is often praised as one of the goodies of NoSQL, but it's no free lunch. Let's assume, as you mentioned, that the user may add attributes. That's technically possible, but how will these attributes be processed by the system? There won't be a specific view, nor specific business rules, for those attributes. So the best the system can do is offer some generic mechanism for displaying those attributes that were defined at runtime and never reflected in the program code.
This doesn't mean the feature is useless. Imagine your product description may be complex, as described above. You might build a generic mechanism to display (and edit) descriptions made up of sections, texts, images, etc., and afterwards the users may enter descriptions of unlimited width and depth. But in contrast, imagine your user will add a tiny delivery date attribute to the order. Unless the system knows specifically how to interpret this date, it will just be a dumb piece of information without any effect.
Now imagine not the user, but the developer adds new attributes. She has the opportunity to enhance the code at the same time, e.g. building some functionality around delivery dates. But this means that, although the database doesn't require it on its own, a new release of the software needs to be rolled out to make use of the new information.
The absence of a database schema even makes the programmer's task more complicated. When a relational table has a certain column, you may be sure that each of its records has this column. If you want to make sure that it has a meaningful value, make it not null, and you may be sure that each record contains a value of the correct data type. Nothing like that is guaranteed by schemaless databases. So, when reading a record, defensive programming is needed to find out which parts are present and whether they have the expected content. The same holds for database maintenance via administrative tools. Adding an attribute and initializing it with a default value is a 2-liner in SQL, or a couple of mouse clicks in pgAdmin. For a schemaless database, you will have to write a short program of your own to achieve this.
This doesn't mean that I dislike NoSQL databases. But I think the "schemaless" characteristic is sometimes overestimated, and I wouldn't make it the main, or only, reason to employ such a database.
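Here is what the defensive programming mentioned in this answer can look like. With no schema, the reader has to check presence and type itself. Adding a field with a default means touching every record that lacks it. This is a plain JavaScript sketch, and the records and the currency field are hypothetical.

```javascript
// Schemaless records: nothing guarantees a field exists or has the right type.
const products = [
  { _id: 1, type: "TV", screenSize: 37 },
  { _id: 2, type: "TV", screenSize: "37 inch" }, // wrong type
  { _id: 3, type: "car", horsepower: 90 },        // field absent
];

// Defensive read: a finite number or undefined, never a surprise value.
function screenSizeOf(product) {
  const v = product.screenSize;
  return typeof v === "number" && Number.isFinite(v) ? v : undefined;
}

// The "short program" standing in for SQL's ALTER TABLE ... ADD COLUMN ... DEFAULT:
// set a default on every record that lacks the field; returns how many changed.
function backfill(records, field, defaultValue) {
  let changed = 0;
  for (const r of records) {
    if (!(field in r)) {
      r[field] = defaultValue;
      changed += 1;
    }
  }
  return changed;
}

console.log(products.map(screenSizeOf), backfill(products, "currency", "EUR"));
```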