MongoDB constraint allowing only one entry in a collection

Can I create a constraint that prevents more than one document from appearing in a collection? The collection will store the version of the database structures.
In SQL I would have put a check constraint on the table verifying that the row count is less than 2.

I believe the best way would be using User-Defined Roles to provide Collection-Level Access Control. You can create a document with an initial database_version and then assign your user a role that restricts access to the collection to only the find and update actions.
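A minimal sketch in the mongo shell; the database name "myapp", collection name "db_version", and user name "appUser" are all assumptions:

use myapp
// Created once, by an administrator, before handing out the restricted role:
db.db_version.insertOne({ database_version: 1 })
db.createRole({
    role: "versionReadUpdate",
    privileges: [
        { resource: { db: "myapp", collection: "db_version" },
          actions: [ "find", "update" ] }   // no insert or remove granted
    ],
    roles: []
})
db.createUser({
    user: "appUser",
    pwd: "changeThisPassword",              // placeholder credential
    roles: [ "versionReadUpdate" ]
})

With only find and update granted, the application user can read and change the version document but can never insert a second one.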
P.S.: While searching for a solution you may have come across a possible alternative called Capped Collections. They won't work in your case, as capped collections reject updates that result in increased document size.

Related

mongodb indexing user-defined schemas

We are currently using MongoDB to allow tenants in a SaaS application to define entities that they can use in the application. We do not know upfront how each tenant is going to define the fields for the entities they are creating. Each entity will have a collection dynamically created for it in a separate database that belongs to the tenant.
For example, one tenant might define a Customer as First Name, Last Name, Email. Another tenant might define a Shipment as Shipment Ref, Ship Date, Owner, etc. Each tenant will have many entities/collections in their tenant database.
We have one field (ID) which we will always force the user to include in each entity/collection. We will index this field upfront when creating the collection.
However, how do we handle the case where we want to allow the tenant to sort/search/order/query large collections/entities quickly when the dataset becomes large?
That is, since we do not know upfront what fields the user will be sorting/filtering/ordering by, what is the indexing strategy to use in this case with Mongo?
First of all, MongoDB requires every document to have an _id field, which it indexes automatically. If you require your clients to have an ID field, you should take advantage of this and not create yet another ID field; I'm not sure whether that's the case in your application.
What you are asking for can't have a perfect solution, or even a most optimal one, but I can suggest a couple of options:
Create a single-field index for each field in the document and let Mongo's query optimizer decide which index to use depending on the query (a sketch of this follows below). Disadvantages: it takes a lot of space on disk and in memory, and it makes inserts slower. Also, Mongo can use only one index per condition clause, so it will not be able to use them like a compound index. You can easily extract the schema with a tool like this; I wrote this little prototype that analyzes and prints a Mongo schema.
Let your application learn which indexes to create. Collect slow queries from the Mongo profiler (in the Mongo log), analyze the common parts (automatically?), and create indexes on the most commonly used fields. That's not so easy to implement, and the efficiency may change over time if your clients change their queries or data. The application will be slow at the start, until it learns about itself :).
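A rough sketch of the first option in the mongo shell; the collection name "customers" is an assumption, and only the top-level fields of one sample document are considered:

var sample = db.customers.findOne();
for (var field in sample) {
    if (field === "_id") continue;    // _id is already indexed automatically
    var spec = {};
    spec[field] = 1;                  // ascending single-field index
    db.customers.createIndex(spec);
}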
I would just like to emphasize, in choosing your design, that if the ID field you mention (as opposed to _id) is actually some unique entity identifier, then you are better off putting it in _id.
The reason is that the performance trade-off of maintaining another unique index on top of the required _id is considerable overhead. And since _id is required, it is the first thing MongoDB looks at when determining which index to use. Alternatively, consider a compound _id field containing your entity information plus some other useful source of uniqueness.
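For illustration, both variants in the mongo shell (collection names, field names, and values are all made up):

// Reuse the tenant's unique entity identifier as _id...
db.customers.insertOne({ _id: "CUST-00042", first_name: "Ada", last_name: "Lovelace" })
// ...or use a compound _id that combines entity information with another
// source of uniqueness:
db.shipments.insertOne({ _id: { ref: "SHP-17", tenant: "acme" }, ship_date: new Date() })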
As for the user-defined fields, which are kind of the essence of Mongo documents, for my money I would make it part of the API to set up indexes as required. Depending on the type of searching that is happening, you'll probably want compound indexes and generated queries that make sense for them.
Simply indexing every field will probably be of limited use, as only one index is going to be picked for the find anyhow, and the query optimizer is going to try all of them. As has been mentioned, a longer-term option could be to set indexes according to the usage patterns, but that could take some work.
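For example, an index-management endpoint in such an API might end up issuing something like the following for a tenant whose common query filters on owner and sorts by ship date (the database and field names are assumptions):

db.getSiblingDB("tenant_acme")                 // hypothetical per-tenant database
  .shipments
  .createIndex({ owner: 1, ship_date: -1 })    // serves find({owner: X}).sort({ship_date: -1})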

How to manage relation in MongoDB?

I am new to MongoDB. I have one master collection, user_group. A sample document is shown below:
{"group_name": "xyz", "previlege": ["Add", "Delete"], ...}
And a second collection, user_detail:
{"user_name":"pt123","group_name":"xyz",...}
How can I maintain the relation between these two collections? Should I use a reference from user_group in user_detail, or is there some other alternative?
Often, in MongoDB, the "has many" relationship is managed on the opposite side compared to a relational database. A MongoDB document will often have an array of ObjectIds or group names (or whatever you're using to identify the foreign documents), whereas in a relational database the other side usually has a "belongs to" column.
To be clear, this is not required. In your example, you could store an array of user detail IDs in your group document if that were the most common query you were going to make. Basically, the question you should ask is "what query am I likely to need?" and then design your documents to support it.
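A sketch of that layout using the collections from the question; the "members" array field is an assumption:

db.user_group.insertOne({
    group_name: "xyz",
    previlege: ["Add", "Delete"],
    members: ["pt123", "aq456"]       // user_name values of the group's users
})
// Which groups does user pt123 belong to?
db.user_group.find({ members: "pt123" })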
Simple answer: You don't.
The entire design philosophy changes when you start looking at MongoDB. If I were you, I would maintain the previlege field inside the user_detail documents themselves, like this:
{"user_name":"abc","group_name":"xyz","previlege" : ["add","delete"]}
This may not be ideal if you keep changing group privileges, though. But the idea is that you design your data storage so that all the information for one "record" can be stored in one object.
MongoDB, being NoSQL, does not have explicit joins. Workarounds are possible, but not recommended (read: MapReduce).
Your best bet is to retrieve both documents from their collections on the client side and apply the user-specific privileges there. Make sure you have an index on group_name in the user_group collection.
Or better still, store the permissions [read, del, etc.] for the user in the same document after applying the join on the client side. But then you cannot update the collection externally, since that might break invariants. Every time an update to the user group occurs, you will need to apply those permissions (privileges) yourself on the client side and save them in the same document. Writes might suffer, but reads will be fast (assuming a few fields are indexed, like user_name).
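A sketch of that client-side join and write-back in the mongo shell, using the documents from the question:

db.user_group.createIndex({ group_name: 1 })    // makes the group lookup fast
var user  = db.user_detail.findOne({ user_name: "pt123" })
var group = db.user_group.findOne({ group_name: user.group_name })
// Denormalize the resolved privileges onto the user document:
db.user_detail.updateOne(
    { _id: user._id },
    { $set: { previlege: group.previlege } }
)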

MongoDB - Using email id as identifier across collections

I have a user collection which holds email_id and _id as unique fields. I want to store user data across various collections and would like to use email_id as the identifier in those collections, because it is easier to query against them in the shell with email_id than with a complex ObjectId.
Is this the right way? Will it cause any performance problems when creating indexes on big email IDs?
Also, don't consider this option if you plan to offer an email_id change option in the future.
While relational databases encourage you to normalize your data and spread it over many tables, this approach is usually not the best for MongoDB. MongoDB doesn't support JOINs over multiple collections, or even over multiple documents from the same collection. So you should try to design your documents in such a way that each query can be satisfied by fetching a single document. That means it is usually a good idea to store all information about a user in one document.
An exception to this is when certain pieces of the user's data grow indefinitely (like the posts made by a user in a forum). First, MongoDB documents have a size limit, and second, when the size of a document increases, the database needs to reallocate its hard drive space frequently, which slows down writes and leads to fragmentation. In that case it's better to put each entity in a different collection.
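As a sketch of that split, profile-sized data lives in the user document while the unbounded posts get their own collection (collection and field names are made up):

// Everything that stays small lives in one user document:
db.users.insertOne({
    email_id: "franck@example.com",
    name: "Franck",
    settings: { theme: "dark", locale: "fr" }
})
// Posts grow without bound, so each one is its own document:
db.posts.insertOne({
    email_id: "franck@example.com",    // the identifier discussed above
    text: "hello world",
    date: new Date()
})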
The size of the fields covered by an index doesn't matter when you search for equality. When you have a unique index on email_id, searching by it should be just as fast as searching by _id.
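A minimal sketch of that, with the same assumed collection names as above:

db.users.createIndex({ email_id: 1 }, { unique: true })
// Equality lookups through the unique index are comparable to _id lookups:
db.users.findOne({ email_id: "franck@example.com" })
db.posts.find({ email_id: "franck@example.com" })    // same identifier across collections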

Mongodb : multiple specific collections or one "store-it-all" collection for performance / indexing

I'm logging the different actions users make on our website. Each action can be of a different type: a comment, a search query, a page view, a vote, etc. Each of these types has its own schema plus common fields. For instance:
comment: {"_id": (mongoId), "type": "comment", "date": 4/7/2012, "user": "Franck", "text": "This is a sample comment"}
search: {"_id": (mongoId), "type": "search", "date": 4/6/2012, "user": "Franck", "query": "mongodb"}
etc...
Basically, in OOP or an RDBMS, I would design an Action class/table and a set of inherited classes/tables (Comment, Search, Vote).
As MongoDB is schemaless, I'm inclined to set up a single collection ("Actions") where I would store these objects, instead of multiple collections (an Actions collection plus a Comments collection with a link key to its parent Action, etc.).
My question is: what about performance/response time if I try to search by type-specific fields?
As I understand indexing best practices, if I want "every user searching for mongodb", I would index the "type" + "query" fields. But that index will not concern the whole data set, only the documents of type "search".
Will the MongoDB engine scan the whole collection, or merely focus on the data having this specific schema?
If you create sparse indexes, Mongo will ignore any rows that don't have the key. There is the specific limitation of sparse indexes, though, that they can only index one field.
However, if you are only going to query by common fields, there's absolutely no reason not to use a single collection.
I.e. if an index on user + type (or date + user + type) will satisfy all your querying needs, there's no reason to create multiple collections.
Tip: use Date objects for dates, and use ObjectIds rather than names where appropriate.
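A sketch of the single-collection setup using the documents from the question (the collection name "actions" is an assumption):

// Sparse index on a type-specific field: only documents that actually
// have "query" (the search actions) appear in this index.
db.actions.createIndex({ query: 1 }, { sparse: true })
// Compound index on the common fields shared by every action type:
db.actions.createIndex({ user: 1, type: 1, date: -1 })
// "Every user searching for mongodb":
db.actions.find({ type: "search", query: "mongodb" })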
Here is some useful information from MongoDB's Best Practices:
Store all data for a record in a single document.
MongoDB provides atomic operations at the document level. When data for a record is stored in a single document the entire record can be retrieved in a single seek operation, which is very efficient. In some cases it may not be practical to store all data in a single document, or it may negatively impact other operations. Make the trade-offs that are best for your application.
Avoid Large Documents.
The maximum size for documents in MongoDB is 16MB. In practice most documents are a few kilobytes or less. Consider documents more like rows in a table than the tables themselves. Rather than maintaining lists of records in a single document, instead make each record a document. For large media documents, such as video, consider using GridFS, a convention implemented by all the drivers that stores the binary data across many smaller documents.

Is there a NoSQL database that can effectively detect duplicate items?

I'm looking to implement a system that searches for duplicate entries before saving new entries, mostly by IP address. Since NoSQL databases have eventual consistency, this doesn't seem like a natural use case. Is there a way to make it work?
CouchDB enforces uniqueness within the _id field of a document. Here's an excerpt from http://guide.couchdb.org:
Within a CouchDB database, each document must have a unique _id field. If you require unique values in a database, just assign them to a document’s _id field and CouchDB will enforce uniqueness for you.
There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.
Every relational database, as well as MongoDB, supports unique indexes on tables/collections, preventing the insertion of duplicate data... why isn't that good enough?
Creating a unique index in MongoDB is straightforward. Trying to insert duplicate entries will then raise an error (if you have safe mode enabled or check the result of the insert operation).
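A minimal sketch; the "entries" collection and "ip" field are assumptions:

db.entries.createIndex({ ip: 1 }, { unique: true })
db.entries.insertOne({ ip: "203.0.113.7" })    // succeeds
db.entries.insertOne({ ip: "203.0.113.7" })    // fails with a duplicate key error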