We're using MongoDB, and rewriting parts of our stack in Scala. I'm wondering whether I should stick with Morphia, or use a Scala MongoDB library such as Subset.
The question is: what do I get out of Subset? For example, with Morphia I don't have to manually define the MongoDB field names the way I have to in Subset...
Is Subset really worth using?
We use Casbah + Salat, and they work well in almost all cases.
With Scala you should consider using Casbah, which is an officially supported interface for MongoDB that builds on the Java driver.
Casbah's approach is intended to add fluid, Scala-friendly syntax on top of MongoDB and handle conversions of common types. If you try to save a Scala List or Seq to MongoDB, we automatically convert it to a type the Java driver can serialize. If you read a Java type, we convert it to a comparable Scala type before it hits your code. All of this is intended to let you focus on writing the best possible Scala code using Scala idioms. A great deal of effort is put into providing you the functional and implicit conversion tools you’ve come to expect from Scala, with the power and flexibility of MongoDB.
Casbah provides improved interfaces to GridFS, Map/Reduce and the core Mongo APIs. It also provides a fluid query syntax which emulates an internal DSL and allows you to write code which looks like what you might write in the JS Shell. There is also support for easily adding new serialization/deserialization mechanisms for common data types.
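For instance, a simple session with Casbah might look like this (a minimal sketch; the database, collection and field names here are made up):

    import com.mongodb.casbah.Imports._

    // Connect and pick a collection (hypothetical names)
    val coll = MongoClient("localhost", 27017)("mydb")("venues")

    // A Scala List is converted automatically to a type the Java driver can store
    coll += MongoDBObject("name" -> "Thai Palace", "tags" -> List("thai", "cheap"))

    // The query DSL reads much like the JS shell:
    // db.venues.find({ rating: { $gte: 4 } }).limit(10)
    coll.find("rating" $gte 4).limit(10).foreach(println)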
In addition to the ORM mappers/client libraries, I would suggest you give Rogue a try. It provides a nice query DSL for MongoDB. Rogue 1.x only supports Lift-MongoDB, but version 2.x (which will ship in the very near future) will work with many more MongoDB libraries.
A sample query would be (pure Scala code, with compile-time type checking):
Venue where (_.mayor eqs 1234) and (_.categories contains "Thai") fetch(10)
which fetches up to 10 entries from the Venue collection where the mayor is 1234 and "Thai" is one of the categories.
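For context, a query like that presupposes a Lift record roughly along these lines (a sketch based on Rogue's documented usage, not a complete setup):

    import net.liftweb.mongodb.record.{MongoRecord, MongoMetaRecord}
    import net.liftweb.mongodb.record.field.{ObjectIdPk, MongoListField}
    import net.liftweb.record.field.LongField

    class Venue extends MongoRecord[Venue] with ObjectIdPk[Venue] {
      def meta = Venue
      // Each field object gives Rogue a typed handle to build queries from
      object mayor extends LongField(this)
      object categories extends MongoListField[Venue, String](this)
    }
    object Venue extends Venue with MongoMetaRecord[Venue]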
I am the author of Subset. I would say Subset is not really a kind of ORM library. It has no methods for working with databases and collections, leaving that to the Java/Scala drivers; it is focused instead on transformations of MongoDB documents. This transformation core is rather generic and suits not only reading/writing fields, but also applications that need to perform e.g. document migrations. The query/update builders Subset provides are built on top of this "core".
That said, if you need an ORM, there are indeed simpler alternatives. I never intended Subset to compete with true ORM libraries; I simply filled a gap I ran into in my own projects.
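To illustrate the general idea, deliberately without Subset's own API: a document transformation is essentially a function from document to document, and a migration is a composition of such functions. A sketch in plain Casbah types, with made-up field names:

    import com.mongodb.casbah.Imports._

    // A migration step is just a function over documents
    type Migration = DBObject => DBObject

    // e.g. rename "tel" to "phone" where present
    val renameTel: Migration = { doc =>
      doc.getAs[String]("tel").foreach { v =>
        doc += ("phone" -> v)
        doc -= "tel"
      }
      doc
    }

    // Steps compose into a full migration
    val migrate: Migration = renameTel // andThen furtherSteps ...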
Can I use mongoengine or djongo for ODM and pymongo for interaction with the db?
I've read these two questions, which seem related to mine:
Insert data by pymongo using mongoengine ORM in pyramid
Use MongoEngine and PyMongo together
But I couldn't find what I'm looking for.
So here's what I'm trying to find:
Does this practice affect the performance of my application?
How well recommended is this practice?
If it is recommended and everything is fine, do I need to put an extra layer of security or something on top? I ask because I want to build an API using the model serializers that django-rest-framework-mongoengine offers, and then do whatever I need to do in the view of the API endpoint.
It could be djongo or something like it; what I want is just an ODM for serializing and defining a structure for the API, while using pymongo for the queries, because from what I've been reading, mongoengine can slow down interaction with the db.
The term "ORM" does not apply to MongoDB since MongoDB is non-relational. The proper term is "ODM" - object-document mapper.
Generally, a MongoDB ODM is built on top of a MongoDB driver. The functionalities of the ODM and the driver are complementary - the driver provides low-level database access and the ODM provides high-level features like schema, associations, callbacks.
If you want to use the high-level features, it makes sense to use an ODM. If you don't need any of those features and just want to perform basic CRUD operations, using a driver directly is more efficient. Some applications use both of these strategies depending on the operation that needs to be performed.
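To make the tradeoff concrete in Scala terms (the language used elsewhere on this page), here is a sketch with Casbah as the driver and Salat as the mapper; all names are illustrative:

    import com.mongodb.casbah.Imports._
    import com.novus.salat._
    import com.novus.salat.global._

    case class Venue(name: String, rating: Int)

    val coll = MongoClient()("mydb")("venues")

    // Driver level: raw documents, basic CRUD, no mapping overhead
    coll += MongoDBObject("name" -> "Noodle Bar", "rating" -> 4)

    // ODM/mapper level: case classes converted to documents and back
    coll += grater[Venue].asDBObject(Venue("Thai Palace", 5))
    val venue = coll.findOne(MongoDBObject("name" -> "Thai Palace"))
                    .map(grater[Venue].asObject(_))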
Previous answers to this question:
Difference between MongoDB and Mongoose
Why do we need, what advantages to use mongoose
The main reason given in these answers is "schemas". Since version 3.6, MongoDB has offered its own schema validation:
https://docs.mongodb.com/manual/core/schema-validation/
These are more thorough and work by default on inserts and updates.
Are there any other significant reasons to use Mongoose? That was the main one, and it now seems to have been integrated into the native API. I have also noticed that Mongoose lacks various newer features implemented in MongoDB.
Mongoose, the library I'm using right now, is much more intuitive to learn if you're a beginner. Many criticize Mongoose because they claim that creating collection schemas is the opposite of what MongoDB and NoSQL databases were conceived for. However, I think that even using the native MongoDB driver you will always end up defining a minimum of schema, if only for validation and to have an idea of what you are putting into the database. Mongoose is extremely convenient because, in addition to letting you define models, it allows you to declare methods on documents and hook into events. In addition, Mongoose automatically performs extra validation, and it has many more query helpers. The real downside of Mongoose is performance (this page shows the performance difference between the two: https://medium.com/@bugwheels94/performance-difference-in-mongoose-vs-mongodb-60be831c69ad).
Naturally, all these Mongoose features come at a cost in performance.
It's hard to recommend a single driver, because the right choice depends on the kind of project you have in mind.
I'm trying to use Play2 with ReactiveMongo to build my web application. I spent a few days reading the related documentation and tutorials.
In my opinion, one of the most powerful features of MongoDB is schema flexibility: the ability to store, in the same collection, documents that don't have exactly the same structure but may differ from one another. For example, one document may have a field that another doesn't.
Play with ReactiveMongo uses case classes to implement models, but case classes obviously have a fixed structure, so all instances of a class will have the same structure.
Does this represent a loss of flexibility? Or is there a way to implement schema flexibility with ReactiveMongo?
In addition to Andre's answer: ReactiveMongo also supports optional fields in documents, represented as Options in your case classes. So you can have both the convenience and type-safety of a Scala class modelling your document, and a flexible document structure.
If your document structure is one where the field-names are totally dynamic (which generally is a bad idea in Mongo), then as Andre said, you may not want to use the case-class-based ReactiveMongo document modelling at all. But you can also usually use a hybrid approach, where some aspects of a document are de/serialized dynamically using name-value maps, and some use case-classes.
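A minimal sketch of the Option-field approach, using ReactiveMongo's BSON macros (the model and field names are made up):

    import reactivemongo.bson._

    // "phone" may be absent from the stored document
    case class Venue(name: String, phone: Option[String])

    // The macro generates the BSON reader/writer at compile time
    implicit val venueHandler = Macros.handler[Venue]

    // A None simply omits the field, so both shapes live in one collection
    val withPhone = venueHandler.write(Venue("Thai Palace", Some("555-0100")))
    val noPhone   = venueHandler.write(Venue("Noodle Bar", None))
    noPhone.getAs[String]("phone") // None: the field was never written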
From what I've read in the documentation for both ReactiveMongo and the ReactiveMongo Play plugin, they work with BSON and JSON structures, respectively.
It's only when you use the extensions that you have the models defined as case classes to build the DAOs. So you have all the flexibility you need or all the convenience you want. It's just a matter of choosing what structure you work with.
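At the fully dynamic end of the spectrum you can work with BSONDocument directly and never declare a model at all (a sketch with made-up fields):

    import reactivemongo.bson._

    // Build a document with whatever fields this record happens to have
    val doc = BSONDocument("name" -> "Thai Palace", "rating" -> 5)

    // Read fields dynamically; absent or mistyped fields come back as None
    val name  = doc.getAs[String]("name")  // Some("Thai Palace")
    val phone = doc.getAs[String]("phone") // None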
I am wondering whether map-reduce jobs in MongoDB have anything to do with Hadoop. Is map-reduce in MongoDB standalone, with no dependency on any Hadoop installation? If what I'm guessing is correct, is the map-reduce syntax the same between the two, or does it just mean that MongoDB supports its own map-reduce (with different syntax)?
MongoDB has its own MapReduce. You write map/reduce/finalize functions in JavaScript (as opposed to Hadoop, where they are typically written in Java).
It is also said to be possible to use Hadoop on top of MongoDB, but I haven't tried that yet.
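To give an idea of what this looks like from Scala, here is a sketch that runs the mapReduce command through Casbah (the collection and field names are made up):

    import com.mongodb.casbah.Imports._

    val db = MongoClient()("mydb")

    // Map and reduce are JavaScript, shipped to the server as strings
    val mapFn    = """function() { emit(this.category, 1); }"""
    val reduceFn = """function(key, values) { return Array.sum(values); }"""

    // Count venues per category, returning the results inline
    val result = db.command(MongoDBObject(
      "mapReduce" -> "venues",
      "map"       -> mapFn,
      "reduce"    -> reduceFn,
      "out"       -> MongoDBObject("inline" -> 1)))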
Map-reduce is not the fastest interface in MongoDB for ad hoc queries; it is designed more for background jobs, creating reports, etc. I wrote about how to do this on my blog some time ago:
http://no-fucking-idea.com/blog/2012/04/01/using-map-reduce-with-mongodb/
MongoDB borrowed the idea of map-reduce, which, by the way, predates Hadoop: it was used at Google on their infrastructure (and they in turn borrowed the idea from functional programming languages).
The "syntax" is also totally different (note: there are several Hadoop APIs for different languages, so a direct comparison is hard to make).
What are the options for MongoDB schema migrations/upgrades?
We (my colleagues and I) have a somewhat large (~100 million record) MongoDB collection. This collection is mapped (ORM'd) to a Scala lift-mongodb object that has been through a number of different iterations. We've got all sorts of code in there which handles missing fields, renames, removals, migrations, etc.
As much as the whole "schema-less" thing can be nice and flexible, in this case it's causing a lot of code clutter as our object continues to evolve. Continuing down this "flexible object" path is simply not sustainable.
How have you guys implemented schema migrations/upgrades in MongoDB with Scala? Does a framework for this exist? I know that Foursquare uses Scala with MongoDB and Rogue (their own query DSL)... does anyone know how they handle their migrations?
Thank you.
Perhaps this can help somewhat; this is how guardian.co.uk handles it:
http://qconlondon.com/dl/qcon-london-2011/slides/MatthewWall_WhyIChoseMongoDBForGuardianCoUk.pdf
Schema upgrades
This can be mitigated by the following, as sketched in code below:
Adding a “version” key to each document
Updating the version each time the application modifies a document
Using MapReduce capability to forcibly migrate documents from older versions if required
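A minimal sketch of that version-key scheme in Scala with Casbah (collection, field names and version numbers are made up):

    import com.mongodb.casbah.Imports._

    val coll = MongoClient()("mydb")("venues")
    val CurrentVersion = 2

    // One migration step: documents below v2 get "tel" renamed to "phone"
    coll.update(
      "version" $lt CurrentVersion,
      MongoDBObject(
        "$rename" -> MongoDBObject("tel" -> "phone"),
        "$set"    -> MongoDBObject("version" -> CurrentVersion)),
      multi = true)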
I program migrations of MongoDB data with my own Scala library, "Subset". It lets you define document fields quite easily, fine-tune data serialization (e.g. write a "date" in a specific format, and so on), and build queries and update modifiers in terms of the defined fields. This "gist" gives a good introduction.