Automatically trigger operations when creating collections - mongodb

I know that MongoDB creates collections on the fly, but I would like to have a server-side script so that each time a new collection is created, MongoDB automatically executes that script or set of operations.
The idea is that my application code can stay unaware of indexes, sharding configuration, etc.
Can I do such a thing, and if so, how?

I answered this over on the Google Group: http://groups.google.com/group/mongodb-user/browse_thread/thread/94d19658299f6bcc
The question is quite vague, but I took a shot at it anyway - try being a bit more specific in terms of what you are trying to do and you may get better responses.

There is no such functionality. Implement something inside your application code.

A possible approach is to check whether a collection exists before performing any ISUD operation (insert, select, update, delete) against it. It seems a bit sledgehammer, though. I'm also not sure how you would know which indexes to apply to an arbitrarily named collection, unless you are taking free text from user input and executing it against your MongoDB install? If you're looking to verify the database structure against an 'expected' structure, then you could look at "Testing your document structure for inconsistencies".
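The check-before-use idea can be sketched in plain JavaScript, runnable without a server. This is a minimal sketch under stated assumptions: it assumes a hypothetical naming convention (collection-name prefixes mapped to setup functions) as one way to decide which indexes an arbitrarily named collection should get, and `CollectionSetup`, `ensureReady` and the prefix rules are illustrative names, not MongoDB APIs. In a real application the setup function would call something like the driver's `createIndex`.

```javascript
// Hypothetical helper: runs one-time setup (for example index creation) the
// first time the application touches a collection, so the rest of the code
// can stay unaware of indexes.
class CollectionSetup {
  constructor(rules) {
    this.rules = rules;     // [{ prefix, setup }]: assumed naming convention
    this.ready = new Set(); // collections already handled by this process
  }

  // Returns true if the collection was seen for the first time.
  ensureReady(name) {
    if (this.ready.has(name)) return false;
    const rule = this.rules.find((r) => name.startsWith(r.prefix));
    if (rule) rule.setup(name); // e.g. call createIndex for this collection
    this.ready.add(name);
    return true;
  }
}

// Usage: record the setup work instead of talking to a real server.
const calls = [];
const setup = new CollectionSetup([
  { prefix: "logs_", setup: (name) => calls.push(`index ${name} on { createdAt: 1 }`) },
  { prefix: "users", setup: (name) => calls.push(`unique index ${name} on { username: 1 }`) },
]);

setup.ensureReady("logs_2024");
setup.ensureReady("logs_2024"); // second call is a no-op
setup.ensureReady("users");
console.log(calls.length); // 2
```

This version remembers which collections it has set up per process instead of asking the server whether the collection exists. Since creating an index that already exists with the same specification is a no-op in MongoDB, re-running the setup after a restart is harmless.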

Related

What is the correct procedure to delete records from a large MongoDB database?

So, here is the problem I am running into. I have a MongoDB database of about 2000 TB with constant read and write operations. I have decided to remove some old data that is no longer used, and this is where things start to get interesting: since the database is large and under constant read and write load, won't a delete query on the old records be processing-intensive for the database, or block reads and writes? So here are my questions:
Which parameters or situations should be considered when writing a delete script like this?
What might be affected when I run a query that deletes a batch of records?
Is MongoDB designed to handle situations like this? Will a simple delete query be effective? Can anyone point me to a resource on MongoDB's internal workings when a delete is issued (e.g. deleting the record, then re-indexing, etc.)?
I found this; however, I am not sure it is the best fit for this situation.
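One commonly used pattern for large deletes on a busy server is to delete in small batches with a pause in between, rather than issuing one huge delete. The plain-JavaScript sketch below shows only that control flow and assumes everything about data access: `fetchOldIds` and `deleteIds` are hypothetical callbacks (with the Node.js driver they might wrap a `find` on an indexed field with `.limit(n)` and a `deleteMany({ _id: { $in: ids } })`), and the batch size and pause are placeholder values to tune, not recommendations.

```javascript
// Sketch of a throttled, batched delete: remove old documents in small
// batches and pause between them so regular reads and writes keep running.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function deleteInBatches({ fetchOldIds, deleteIds, batchSize = 1000, pauseMs = 100 }) {
  let total = 0;
  for (;;) {
    const ids = await fetchOldIds(batchSize); // hypothetical: next batch of ids to remove
    if (ids.length === 0) break;              // nothing left to delete
    total += await deleteIds(ids);            // hypothetical: returns number removed
    await sleep(pauseMs);                     // give other operations room to run
  }
  return total;
}

// Usage with an in-memory array standing in for the collection.
const docs = Array.from({ length: 2500 }, (_, i) => ({ _id: i, old: i < 2100 }));
const batches = [];
deleteInBatches({
  batchSize: 500,
  pauseMs: 0,
  fetchOldIds: async (n) => docs.filter((d) => d.old).slice(0, n).map((d) => d._id),
  deleteIds: async (ids) => {
    batches.push(ids.length);
    const drop = new Set(ids);
    for (let i = docs.length - 1; i >= 0; i--) if (drop.has(docs[i]._id)) docs.splice(i, 1);
    return ids.length;
  },
}).then((total) => console.log(total, batches));
```

Smaller batches and longer pauses trade total run time for less pressure on concurrent reads and writes.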

MongoDB: why does the Rust MongoDB driver not support bulk write operations? Are there any workarounds?

I found a discussion about the topic here: https://jira.mongodb.org/browse/RUST-271
The latest answer:
Hello! We don't implement the bulk write API and have no plans to add
it. From dealing with it in other drivers, we've found that the
semantics tend to be non-obvious, especially with regards to error
handling, and it doesn't provide a whole lot of technical benefit
given that it still sends different write types in separate commands
to the server (i.e. the same way as if each of insert_many,
update_many, delete_many, etc. were called separately).
But something is still not clear to me.
Why does that answer mention *_many operations (update_many, insert_many, etc.) if a bulk write is a set of *_one operations (update_one, insert_one, etc.)?
OK, insert_many uses bulk write under the hood. But bulk write and update_many are different: the former is a set of update_one operations, each with its own filter and update body, while update_many applies one filter and one update to many documents.
So update_many cannot replace bulk write. Am I right? If so, why does the MongoDB Rust driver not support bulk write? If I am not right, can you please explain why? Maybe I do not understand some details.
It seems that if I call update_one many times, I get worse performance than with a bulk write, because with a bulk write I connect to MongoDB once, whereas with update_one I need one connection per call.
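The driver team's point about "separate commands" can be illustrated with a small model. On the wire, MongoDB's `update` command accepts an array of update statements, each with its own filter (`q`) and update (`u`), so several differently filtered updates can travel in one command; that is what a bulk write exploits and what a single `update_many` call (one filter, one update) cannot express. The sketch below is plain JavaScript, not the Rust driver's API; `groupIntoCommands` is a hypothetical name, and it ignores the size limits real drivers use to split batches.

```javascript
// Simplified model of an *ordered* bulk write: consecutive operations of the
// same type share one server command, and each update statement keeps its
// own filter. A change of type starts a new command.
function groupIntoCommands(ops) {
  const commands = [];
  for (const op of ops) {
    const last = commands[commands.length - 1];
    if (last && last.type === op.type) {
      last.statements.push(op); // same write type: extend the current command
    } else {
      commands.push({ type: op.type, statements: [op] }); // new command
    }
  }
  return commands;
}

const ops = [
  { type: "update", filter: { _id: 1 }, update: { $set: { a: 1 } } },
  { type: "update", filter: { _id: 2 }, update: { $set: { b: 2 } } },
  { type: "insert", doc: { _id: 3 } },
  { type: "update", filter: { _id: 4 }, update: { $inc: { c: 1 } } },
];

for (const cmd of groupIntoCommands(ops)) {
  console.log(cmd.type, cmd.statements.length);
}
// update 2
// insert 1
// update 1
```

In this model the two leading updates, despite having different filters, share one command, which is the saving the question is after; interleaving types breaks the grouping.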

How can I handle a very large database without losing performance?

If I develop an application, I'm worried about its performance once the number of users and the amount of stored data grow.
Actually, I don't know the best way to implement a program that works with really large data and does things like searching it, finding and retrieving user information, text search and so on, in real time without any delay!
Let me explain the problem more.
For example, I have chosen MongoDB as the database, and suppose we have at least five million users. A user wants to log in to the system and has sent their username and password.
The first thing we should do is find the user with that username and then check the password. In MongoDB we would use something like the 'find' method to get the user's information, something like below:
Users.find({ username: entered_username })
Then we get the user information and check the password.
But the 'find' method has to search for the username among millions of users, which is a large number, and since this method runs for every authentication request, it causes heavy processing on the system.
Unfortunately, this problem is not limited to finding a user: if we want to search text when the database holds a lot of texts and posts, the problem is even bigger.
I don't know how big companies like Facebook and LinkedIn search through millions of records in such a short span of time. I don't want to create something like Facebook, but I do have a large amount of data and I'm looking for a good way to handle it.
Is there any framework or anything else that helps handle large amounts of data in databases, or is there a way to store data in the database so that we can search and find it quickly? Should I use a particular data structure?
I found an open-source project, Elasticsearch, that helps us search faster, but if I find something with Elasticsearch, I don't know how to find it in MongoDB too in order to do things like update it. If I use Elasticsearch, should I still use MongoDB or not? Can I use Elasticsearch as a database and a search engine simultaneously?
If I use Elasticsearch and MongoDB together, should I have two copies of my data, one in MongoDB and one in Elasticsearch, kept separately? :( I wish Elasticsearch could search inside MongoDB so that I don't have to create two copies of the data.
Thank you if you can help me find a good way and understand what I should do.
When you talk about performance, it usually boils down to three things:
Your design
Your definition of "quick", and
How much you're willing to pay
Your design
MongoDB is great if you want to iterate on your data model; it can scale horizontally and is very quick if used properly. Elasticsearch, on the other hand, is not a database. However, it is very quick for searching. A traditional relational database will be useful if you know exactly what your data looks like and don't expect it to change much, or if the data is relational by nature.
You can, for example, use a relational database for user login, use MongoDB for everything else, and use Elastic for textual, searchable data. There is no rule that tells you to keep everything within a single database.
Make sure you understand indexing, and know how to utilize it to its fullest potential. The fastest hardware will not help you if you don't design your database properly.
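To make the indexing advice concrete for the login example: assuming the collection is called `users`, in MongoDB you would create an index such as `db.users.createIndex({ username: 1 }, { unique: true })`, after which a lookup by username walks the index instead of scanning every document. The plain-JavaScript model below only illustrates why that matters; it approximates MongoDB's B-tree index with a sorted array and binary search, and `linearScan`, `buildIndex` and `indexedLookup` are hypothetical names.

```javascript
// Without an index: compare the username against every document in turn.
function linearScan(users, username) {
  let comparisons = 0;
  for (const user of users) {
    comparisons++;
    if (user.username === username) return { user, comparisons };
  }
  return { user: null, comparisons };
}

// "Index": sorted [key, position] pairs, standing in for a B-tree's ordered keys.
function buildIndex(users) {
  return users
    .map((user, pos) => [user.username, pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// With the index: binary search over the sorted keys.
function indexedLookup(users, index, username) {
  let lo = 0;
  let hi = index.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons++;
    const key = index[mid][0];
    if (key === username) return { user: users[index[mid][1]], comparisons };
    if (key < username) lo = mid + 1;
    else hi = mid - 1;
  }
  return { user: null, comparisons };
}

const users = Array.from({ length: 100000 }, (_, i) => ({ username: `user${i}` }));
const index = buildIndex(users);
const scan = linearScan(users, "user99999");
const seek = indexedLookup(users, index, "user99999");
console.log(scan.comparisons, seek.comparisons);
```

With 100,000 users the scan needs 100,000 comparisons to find the last user, while the indexed lookup needs at most 17. The index cost grows roughly with log2 of the number of users (about 23 steps for five million), which is why a properly indexed login query stays cheap as the user base grows.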
Conclusion: use any tool you need, combine if necessary, but understand their strengths and weaknesses.
Your definition of "quick"
How "quick" is quick enough for your application? Is 100 ms quick enough? Is 10 ms? Remember that the more performance you ask of the machine, the more expensive it will be. You can get more performance with a better design, but design can only go so far.
Usually this boils down to what is acceptable to you and your client. Not every application needs a sub-10 ms response time. There are plenty of applications that can tolerate queries that return in seconds.
Conclusion: determine what is acceptable, and design accordingly.
How much you're willing to pay
Of course, it all depends on how much you're willing to pay for all the hardware needed to host all that stuff. MongoDB might be open source, but you need somewhere to host it. Also, you cannot expect magic: you can't throw thousands of queries and updates per second at it and expect it to be blazing fast when you only give it 1 GB of RAM.
Conclusion: never under-provision to save money if you want your application to be successful.

mongodb: atomically rename two collections?

I have two existing collections "A" and "B". I need to rename "B" to "C", and rename "A" to "B", without permitting any writes to B during that time. The rename itself activates the global lock, but I need to prevent writes from occurring in between renames. Is this possible?
Here's my code:
db.B.renameCollection('C')
// <-- prevent writes to B from occurring between the two commands
db.A.renameCollection('B')
Edit: I'm using mongodb version 1.8.1, and changing versions is not currently an option.
MongoDB itself cannot handle this; the only way you could do it is with some custom code.
If this will only happen once in your app (I guess renaming collections is not something that is done often), you could take a more 'aggressive' approach: check for a flag in your database that means 'collection db.B has been renamed but db.A not yet'. If all your writes check for this flag before submitting the write to the server and simply return if it is set, the app is protected from writing to B between the two renames.
I consider this the 'aggressive' approach since it clearly affects performance (still, reads are so fast that you probably won't notice it).
If your app runs on a single web server (and not a web farm), you can put the synchronization mechanism in the web app itself, using thread synchronization tools like semaphores, or even a thread-safe variable used as the flag I suggested above (depending on the server-side technology you are using).
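A minimal sketch of the flag approach for the single-server case, in plain JavaScript with an in-memory Map standing in for the database. `renameInProgress`, `guardedInsert` and `swapCollections` are hypothetical names; in the multi-server variant the flag would live in a database document instead of a process variable. The in-memory `renameCollection` mimics MongoDB refusing, by default, to rename onto an existing collection, which is exactly what a stray write to B between the two renames would cause.

```javascript
// In-memory stand-in for the database: collection name -> documents.
const db = new Map([
  ["A", [{ v: "new" }]],
  ["B", [{ v: "old" }]],
]);
let renameInProgress = false; // the flag from the answer (here a process variable)

// Mimics MongoDB refusing, by default, to rename onto an existing collection.
function renameCollection(from, to) {
  if (db.has(to)) throw new Error(`target namespace exists: ${to}`);
  db.set(to, db.get(from));
  db.delete(from);
}

// Every write in the app goes through this wrapper and checks the flag first.
function guardedInsert(name, doc) {
  if (renameInProgress) return false;  // bail out instead of writing
  if (!db.has(name)) db.set(name, []); // writes create collections implicitly
  db.get(name).push(doc);
  return true;
}

async function swapCollections() {
  renameInProgress = true;
  try {
    renameCollection("B", "C");
    // Simulated gap between the two commands, where other requests can run.
    await new Promise((resolve) => setTimeout(resolve, 10));
    renameCollection("A", "B");
  } finally {
    renameInProgress = false;
  }
}

const pending = swapCollections();
console.log(guardedInsert("B", { v: "stray" })); // false: write refused mid-swap
pending.then(() => console.log(db.get("B"), db.get("C")));
```

Because Node.js runs this code on a single thread, a plain variable is enough here; a multi-threaded server would need a proper synchronization primitive, as the answer says.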
You can create a function named "renameCollection" and put a lock on it:
db.runCommand({eval:renameCollection,args:["Collection1","Collection2"],nolock:false});
The lock makes it possible to do this kind of operation safely, and makes other requests wait.
As you might guess, this is not possible: there is no transaction support, only atomic operations.
MongoDB has no notion of transactional renames (in fact, I am not sure SQL does in this case either); however, you could accomplish this with a bit of server-side programming and a lock collection.
From your server-side language, you can fire off the commands while writing a row to a lock table; each query against B checks for the lock, writes if none is found, and otherwise bails out.
This is a simple method, but most likely a bit tedious, especially if you have a very segmented code base that does not have a standardised query layer between the server-side code and the database.
I should also note that renameCollection will not work on sharded collections; you most likely already knew that, but I thought I would mention it anyway. For a sharded collection it would be better to "move" the collection via copy operations instead.
I work for Tokutek on TokuMX, which has multi-statement transactions.
As other answers have said, MongoDB cannot do this (to the best of my knowledge), but TokuMX can. TokuMX has multi-statement transactions on non-sharded clusters. To perform this operation, you can do:
db.beginTransaction()
db.B.renameCollection('C')
db.A.renameCollection('B')
db.commitTransaction()

Propagated delete in code or database?

I'm working on an iPhone application with a few data relationships (Author -> Books for example). When a user deletes an Author object from the application, I have a few SQLite triggers that run on the delete to remove any books from the database that have a foreign key matching the Author's primary key.
I'm also using a trigger to insert some data when a new item is created.
I can't shake the feeling that this might be bad design or lead to some problems down the road that I am not thinking of. That said, should I rely on code in my app to propagate deletes like this when the database has the capability built in?
What say you?
True. Use the built-in capabilities of the database as much as possible. At least try to start off like that, and only compromise when things really demand it.
I would make use of the database's features to ensure relational integrity, especially with respect to updates/deletes. There are cases where I might use a trigger to insert some additional data (auditing comes to mind), though I would tend to avoid this and insert all of the data from my application. If you are doing multiple inserts, though, make sure to wrap it all in a single transaction so that you don't end up with a partial insert which could lead to loss of relational integrity.
I like the idea of using the database's built-in functionality (I am not familiar with how it works), but I would worry that if I went back to the code a year from now, I wouldn't remember how it worked (given the code isn't right in front of me).
I imagine that if you add plenty of comments now to remind yourself how it works, then if anything goes wrong in the future, at least you won't need to relearn the database features when you go to do some debugging.
You're a few steps ahead of me: I recently learned about how to do that stuff with triggers and I am tempted to use them myself.
Based on the other answers here, it seems like a philosophical choice. It would probably be fine to use either triggers or code, but best to be consistent. So don't use triggers for cascading deletes on one table but then C code for another table.
Since you tagged the question iphone, I think the most important difference would be relative performance of C code versus a trigger. You'd probably have to code both and experiment to determine the difference, if any.
Another thing that comes to mind is that, of all the horror stories that I read on thedailywtf.com, about half of them seem to involve database triggers.
At the time of writing, SQLite did NOT support ON DELETE CASCADE etc. From the SQLite documentation:
http://www.sqlite.org/omitted.html
FOREIGN KEY constraints are parsed but are not enforced. However, the equivalent constraint enforcement can be achieved using triggers. The SQLite source tree contains source code and documentation for a C program that will read an SQLite database, analyze the foreign key constraints, and generate appropriate triggers automatically.
There is some support for triggers but it is not complete. Missing subfeatures include FOR EACH STATEMENT triggers (currently all triggers must be FOR EACH ROW), INSTEAD OF triggers on tables (currently INSTEAD OF triggers are only allowed on views), and recursive triggers - triggers that trigger themselves.
Therefore, with those versions, the only way to implement ON DELETE CASCADE etc. in SQLite was with triggers. (Foreign key enforcement, including ON DELETE CASCADE, was added in SQLite 3.6.19; it is off by default and is enabled per connection with PRAGMA foreign_keys = ON.)
Code goes in your app.
Triggers are code. The functionality goes in your app. Not in the database.
I think that databases should be used for data, not processing. I think apps should be used for processing, not data.
Database processing features merely muddy the water.
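For comparison with the trigger approach, here is a minimal sketch of the "code goes in your app" position for the Author -> Books example. Plain JavaScript arrays stand in for the two SQLite tables, and `deleteAuthorCascade`, `authorId` and the other field names are hypothetical. In a real app the same logic would be two statements, something like `DELETE FROM books WHERE author_id = ?` and `DELETE FROM authors WHERE id = ?` (table and column names assumed), run inside one transaction as suggested earlier in the thread.

```javascript
// Hypothetical app-side cascading delete: the application, not a trigger,
// removes an author's books together with the author.
function deleteAuthorCascade(data, authorId) {
  // Build the complete new state first and return it in one step, so the
  // caller never sees books pointing at a missing author (the in-memory
  // analogue of wrapping both deletes in a single transaction).
  const books = data.books.filter((b) => b.authorId !== authorId);
  const authors = data.authors.filter((a) => a.id !== authorId);
  return { authors, books };
}

let data = {
  authors: [{ id: 1, name: "A" }, { id: 2, name: "B" }],
  books: [{ id: 10, authorId: 1 }, { id: 11, authorId: 2 }, { id: 12, authorId: 1 }],
};
data = deleteAuthorCascade(data, 1);
console.log(data.authors.length, data.books.length); // 1 1
```

Whichever side you choose, the answers above agree on one point: keep the cascade logic in one place so every table is handled the same way.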