Efficient paging in MongoDB using mgo.v2 and MongoDB > 4.2 - mongodb

I have already looked at "Efficient paging in MongoDB using mgo" and asked about it at https://stackoverflow.com/review/low-quality-posts/25723764
I got the excellent response provided by @icza, who shared his library https://github.com/icza/minquery.
However, as he said, "Starting with MongoDB 4.2, an index hint must be provided. Use the minquery.NewWithHint() constructor."
The problem is that the minquery.NewWithHint() constructor seems to be available only in version 2.0.0, which switched from gopkg.in/mgo.v2 to github.com/globalsign/mgo.
How can I solve this problem?

gopkg.in/mgo.v2 has long been unmaintained. The easiest solution for you would be to switch to the github.com/globalsign/mgo driver. It has an identical API, so most likely you only have to change the import paths. It is still somewhat supported, but I believe it will fade away in favor of the official mongo-go driver. If you choose to switch to mongo-go, it has built-in support for specifying the index min parameter for queries. But be aware that the mongo-go driver has a different API.
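As a rough sketch only (not code from the question or from minquery): index-based paging can be done directly with the official mongo-go driver using the Find options SetSort, SetHint and SetMin. The database, collection, field names and the compound index below are assumptions made for the example.

    // Sketch: cursor-style paging with the official mongo-go driver,
    // resuming each page from the last seen index key via min + hint.
    package main

    import (
        "context"
        "fmt"
        "log"

        "go.mongodb.org/mongo-driver/bson"
        "go.mongodb.org/mongo-driver/mongo"
        "go.mongodb.org/mongo-driver/mongo/options"
    )

    func main() {
        ctx := context.Background()
        client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
        if err != nil {
            log.Fatal(err)
        }
        defer client.Disconnect(ctx)

        coll := client.Database("testdb").Collection("users")
        // Assumed compound index; with MongoDB 4.2+ a hint is required when min is used.
        index := bson.D{{Key: "name", Value: 1}, {Key: "_id", Value: 1}}

        // First page: iterate in index order, limited to 10 documents.
        opts := options.Find().SetSort(index).SetHint(index).SetLimit(10)
        cur, err := coll.Find(ctx, bson.D{}, opts)
        if err != nil {
            log.Fatal(err)
        }

        var lastName, lastID interface{}
        for cur.Next(ctx) {
            var doc bson.M
            if err := cur.Decode(&doc); err != nil {
                log.Fatal(err)
            }
            lastName, lastID = doc["name"], doc["_id"]
            fmt.Println(doc)
        }

        // Next page: resume from the last seen index key. min is inclusive,
        // so skip one document to avoid repeating the last one of the previous page.
        nextOpts := options.Find().
            SetSort(index).
            SetHint(index).
            SetMin(bson.D{{Key: "name", Value: lastName}, {Key: "_id", Value: lastID}}).
            SetSkip(1).
            SetLimit(10)
        if _, err := coll.Find(ctx, bson.D{}, nextOpts); err != nil {
            log.Fatal(err)
        }
    }

Since min is an inclusive lower bound, the sketch skips one document when resuming so the last document of the previous page is not returned twice.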
Another option would be to fork minquery and apply the commits I made for the v2.0.0 version, including the index hint support.

Related

Tinkerpop 3.1 on OrientDB?

Is there any way to run TinkerPop Gremlin 3.1 traversals in OrientDB?
I've noticed that the DBMS currently supports the previous version (2.x) of the TinkerPop traversal language, which, for example, only allows filtering edges directly by label, but not vertices :(
I was quite satisfied with gremlin-scala and orientdb-gremlin, but I found that not all my queries were executed efficiently (some indexes were ignored).
Is there any other way?
Thanks in advance :)
orientdb-gremlin is indeed the only available driver, and while it works pretty well for basic cases, there's some work left on index usage. If you report your cases in a GitHub issue we can have a look. Best of all, obviously, would be if you submit a PR :)

Is it possible to default all MongoDB writes to safe? What is the performance hit from doing this?

For MongoDB 2.2.2, is it possible to default all writes to safe, or do you have to include the right flags for each write operation individually?
If you use safe writes, what is the performance hit?
We're using MongoMapper on Rails.
If you are using the latest version of the official 10gen drivers, then the default actually is safe, not fire-and-forget (which used to be the default).
You can read this blog post by the 10gen co-founder and CTO, which explains some of the history and announces that, as of late November, all MongoDB drivers use "safe" mode by default rather than "fire-and-forget".
MongoMapper is built on top of the 10gen-supported Ruby driver, and its code has also been updated to be consistent with the new defaults. You can see the check-in and comments here for the master branch. Since I'm not certain what their release schedule is, I would recommend asking on the MongoMapper mailing list.
Even prior to this change you could set the "safe" value at the connection level in MongoMapper, which is as good as global. Starting with 0.11, you can do it in the mongo.yml file. You can see how in the release notes.
The bottom line is that you don't have to specify safe mode for each write. You can still specify higher or lower durability than the default for an individual write operation if you wish, but once you switch to the latest versions of everything you will be using "safe writes" globally by default.
I do not use MongoMapper, so I can only answer part of this.
In terms of the database, it depends. A safe write basically (basically being the key word) waits for the database to finish the work it would normally do after a fire-and-forget had already returned its default "I am done" response.
There is more work depending on how safe you want the write to be. A good example is a write to a single node versus one to many nodes: if you write to that single node you will get a quicker response from the database than if you replicate the command (safely) to other nodes in the network.
Any level of safe write does, of course, cause a performance hit in the number of writes you can send to the server, since more work is required before a response is given, which means fewer writes can be thrown at the database. The key thing is getting the balance between speed and the durability of your data just right for your application.
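The same tradeoff exists in every driver. As a hedged illustration only (shown with the official Go driver because the idea is driver-agnostic; the database, collection and field names are placeholders), a relaxed write concern can be the collection default while the writes that really matter override it with a stricter one:

    // Illustration of the speed-vs-durability tradeoff: a cheap w:1 write
    // concern for bulk data, a majority+journal write concern for critical data.
    package main

    import (
        "context"
        "log"

        "go.mongodb.org/mongo-driver/bson"
        "go.mongodb.org/mongo-driver/mongo"
        "go.mongodb.org/mongo-driver/mongo/options"
        "go.mongodb.org/mongo-driver/mongo/writeconcern"
    )

    func main() {
        ctx := context.Background()
        client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
        if err != nil {
            log.Fatal(err)
        }
        defer client.Disconnect(ctx)

        db := client.Database("testdb")

        // Faster: acknowledged by the primary only (w: 1).
        fast := db.Collection("events",
            options.Collection().SetWriteConcern(writeconcern.New(writeconcern.W(1))))

        // More durable: acknowledged by a majority of replica-set members and journaled.
        durable := db.Collection("events",
            options.Collection().SetWriteConcern(
                writeconcern.New(writeconcern.WMajority(), writeconcern.J(true))))

        if _, err := fast.InsertOne(ctx, bson.M{"kind": "click"}); err != nil {
            log.Fatal(err)
        }
        if _, err := durable.InsertOne(ctx, bson.M{"kind": "payment"}); err != nil {
            log.Fatal(err)
        }
    }

In MongoMapper the equivalent knobs are the connection-level and per-write safe settings discussed above.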
Many drivers (including the MongoDB PHP driver 1.3, using a write concern of 1: http://php.net/manual/en/mongo.writeconcerns.php) now use safe writes by default, and plain fire-and-forget writes are being phased out as the default.
Looking at the MongoMapper documentation (http://mongomapper.com/documentation/plugins/safe.html), it seems you must still add the flags everywhere.

About using MongoDB and Linq. What's better or worse with Norm?

I'd like to use MongoDB with LINQ, simply because I don't like not being able to check the query at compile time.
So I searched a bit and found NoRM. However, I am having a hard time deciding whether it's "safe" to move away from the official driver.
So I was wondering if someone can tell me the key differences between the official driver and NoRM?
Also, what can NoRM do that the official driver can't?
Is it possible to implement LINQ on top of the official driver?
Thanks in advance
I suggest using the official MongoDB driver, because it will contain all the latest features and any issues will be fixed ASAP. As far as I know, the last commit in the NoRM repository was almost half a year ago. If you want LINQ support you can use fluent mongo on top of the official driver, but I believe LINQ support should arrive in the official driver soon.

Which driver should I use to run MongoDB?

I'm wondering which driver is the best among the following:
mongodb-csharp driver
simple-mongodb driver
NoRM
Which one is considered the best?
I think there are even more flavours: the one you call mongodb-csharp is actually two:
https://github.com/samus/mongodb-csharp
https://github.com/mongodb/mongo-csharp-driver
The first is a bit more mature and is used widely in the field. The second is a recent development, but it comes from 10gen, the creators of MongoDB. The implementation is looking good and is rather like the samus driver. If you need something in production right now, I'm not sure what to advise, but in the long run I'd go for the 10gen driver.
The one thing it currently doesn't offer is LINQ integration. This could be important to you.
I have no experience with the NoRM and simple-mongodb drivers.
I would use the official C# driver released by MongoDB.
http://www.mongodb.org/display/DOCS/CSharp+Language+Center
I've been using it and I like it so far.

MapReduce implementation in Scala

I'd like to find a good and robust MapReduce framework that can be used from Scala.
To add to the answer on Hadoop: there are at least two Scala wrappers that make working with Hadoop more palatable.
Scala Map Reduce (SMR): http://scala-blogs.org/2008/09/scalable-language-and-scalable.html
SHadoop: http://jonhnny-weslley.blogspot.com/2008/05/shadoop.html
Update, 5 Oct 2011:
There is also the Scoobi framework, which has awesome expressiveness.
http://hadoop.apache.org/ is language agnostic.
Personally, I've become a big fan of Spark
http://spark-project.org/
You have the ability to do in-memory cluster computing, significantly reducing the overhead you would experience from disk-intensive MapReduce operations.
You may be interested in scouchdb, a Scala interface for using CouchDB.
Another idea is to use GridGain. ScalaDudes have an example of using GridGain with Scala. And here is another example.
A while back, I ran into exactly this problem and ended up writing a little infrastructure to make it easy to use Hadoop from Scala. I used it on my own for a while, but I finally got around to putting it on the web. It's named (very originally) ScalaHadoop.
For a Scala API on top of Hadoop, check out Scoobi; it is still in heavy development but shows a lot of promise. There is also an effort to implement distributed collections on top of Hadoop in the Scala incubator, but that effort is not usable yet.
There is also a new Scala wrapper for Cascading from Twitter, called Scalding.
After looking very briefly over the documentation for Scalding, it seems that while it makes the integration with Cascading smoother, it still does not solve what I see as the main problem with Cascading: type safety. Every operation in Cascading works on Cascading's tuples (basically a list of field values with or without a separate schema), which means that type errors, e.g. joining a key as a String with a key as a Long, lead to run-time failures.
To further jshen's point:
Hadoop streaming simply uses Unix streams: your code (in any language) just has to be able to read from stdin and write tab-delimited records to stdout. Implement a mapper and, if needed, a reducer (and, if relevant, configure it as the combiner).
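As a minimal sketch of what that looks like (Go is used here purely as an example of "any language", and the word-count logic is made up), a streaming mapper is just a stdin-to-stdout filter:

    // Minimal Hadoop Streaming mapper sketch: read lines from stdin and
    // emit "word<TAB>1" pairs on stdout. A matching reducer would sum the
    // counts per key; the framework sorts and groups the keys in between.
    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        scanner := bufio.NewScanner(os.Stdin)
        out := bufio.NewWriter(os.Stdout)
        defer out.Flush()

        for scanner.Scan() {
            for _, word := range strings.Fields(scanner.Text()) {
                fmt.Fprintf(out, "%s\t%d\n", strings.ToLower(word), 1)
            }
        }
        if err := scanner.Err(); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }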
I've added a MapReduce implementation using Hadoop, with a few test cases, on GitHub: https://github.com/sauravsahu02/MapReduceUsingScala.
Hope that helps. Note that the application is already tested.