How to retrieve all objects in a MongoDB collection, including the ids? - scala

I'm using Casbah and Salat to create my own Mongodb dao and am implementing a getAll method like this:
val dao: SalatDAO[T, ObjectId]
def getAll(): List[T] = dao.find(ref = MongoDBObject()).toList
What I want to know is:
Is there a better way to retrieve all objects?
When I iterate through the objects, I can't find the object's _id. Is it excluded? How do I include it in the list?

1°/ The ModelCompanion trait provides a def findAll(): SalatMongoCursor[ObjectType] = dao.find(MongoDBObject.empty) method. You will have to make a dedicated request for every collection in your database.
If you iterate over the returned objects, it may be better to iterate directly over the SalatMongoCursor[T] returned by dao.find rather than doing two iterations (one for toList from the Iterator trait, then another over your List[T]).
2°/ Salat maps the Mongo _id key to your class's id field: if you define a class with an id: ObjectId field, that field is mapped to the _id key.
You can change this behaviour with the @Key annotation, as pointed out in the Salat documentation.
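For instance, a minimal sketch of how @Key can remap an id field (the Widget class and its fields are purely illustrative, not from the question):
import org.bson.types.ObjectId
import com.novus.salat.annotations._

// Hypothetical model: @Key("_id") stores the "id" field under Mongo's _id key,
// so the identifier is populated again when documents are read back.
case class Widget(@Key("_id") id: ObjectId = new ObjectId, name: String)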

I implemented something like:
MyDAO.ids(MongoDBObject("_id" -> MongoDBObject("$exists" -> true)))
This fetches all the ids, but given the wide range of what you might be doing, it's probably not the best solution for all situations. Right now, I'm building a small system with 5 records of data and using this to help understand how MongoDB works.
If this were a production database with 1,000,000 entries, then this (or any getAll query) would be a terrible idea. Instead, consider writing a targeted query that goes after the results you actually need.
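As a hedged sketch of what a more targeted query could look like with the same DAO (the "name" field is purely illustrative and not part of the original model):
// Instead of loading everything, pass a real query to find() and only
// materialise the documents you actually need.
def findByName(n: String): List[T] =
  dao.find(MongoDBObject("name" -> n)).toList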

Related

How does mongoose populate work under the hood

Could somebody tell me how populate works under the hood?
I have a collection
a {
  b: String
  c: Date
  d: ObjectId --> j
}
j {
  k: String
  l: String
  m: String
}
When I carry out a query like:
a.find({ b: 'thing' }).populate('d').exec(etc..)
in the background, is this actually carrying out two queries against MongoDB in order to return all the 'j' items?
I have no issues getting populate to work, what concerns me is the performance implications of the task.
Thanks
Mongoose uses two queries to fulfill the request.
The a collection is queried to get the docs that match the main query, and then the j collection is queried to populate the d field in the docs.
You can see the queries Mongoose is using by enabling debug output:
mongoose.set('debug', true);
Basically, the model 'a' contains an attribute 'd' which references (points to) the model 'j'.
So whenever we use
a.find({ b: 'thing' }).populate('d').exec(etc..)
Then through populate we can individually access the properties of 'j', such as:
d.k
d.l
d.m
populate() lets us access the properties of other, referenced models.
Adding to @JohnnyHK's answer on the performance implications you are worried about: I believe that no matter what, these queries have to execute sequentially, so whether you use the populate() method Mongoose provides or implement the second lookup yourself on the server side, both have the same time complexity.
This is because, in order to populate, we first need the results of the main query; only after getting those results can the referenced ids be used to query the documents in the other collection.
So I believe it's a waste to hand-roll this on the server side rather than use the method Mongoose provides; the performance will remain the same.

Mongodb: list all values for an indexed field quickly

In MongoDB I want to be able to rapidly list all indexed values. For instance, let's say I have numerous Foo documents:
public class Foo {
    @Id
    private ObjectId id;
    @Indexed
    private List<String> bars;
    @Indexed
    private List<String> bazs;
    ...
}
There may be repeats in bars and bazs, such that iterating through every Foo and looking at its bars list would be inefficient, since I would spend most of my time looking at repeats.
If I want to quickly list all 'bars' values without having to look at each Foo object, can I do that? Since they are indexed, there must be a structure somewhere with all the indexed values listed in an easily iterated manner. However, I can't seem to find a MongoDB command to do this, or better yet a Morphia command, since I'm using Java to interface with Mongo.
You are looking for distinct, which should work for lists / arrays as well. MongoDB will use an index if one is available.
Unfortunately this feature isn't yet implemented in Morphia, but you can do the following with the Java driver:
DBCollection c = collection;
List bars = c.distinct("bars");
For a more complex example see the unit test for this feature.
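For reference, the same lookup from Scala via Casbah would be roughly along these lines (a sketch only; the database and collection names are illustrative):
import com.mongodb.casbah.Imports._

// Hypothetical Casbah collection; distinct returns the unique "bars" values
// and can use the index on that field when one is available.
val foos = MongoClient()("test")("foos")
val bars = foos.distinct("bars")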

Upsert many records using ReactiveMongo and Scala

I am writing a DAO Actor for MongoDB that uses ReactiveMongo. I want to implement some very simple CRUD operations, among which the ability to upsert many records in one shot. Since I have a reactive application (built on Akka), it's important for me to have idempotent actions, so I need the operation to be an upsert, not an insert.
So far I have the following (ugly) code to do so:
case class UpsertResult[T](nUpd: Int, nIns: Int, failed: List[T])

def upsertMany[T](l: List[T], collection: BSONCollection)
                 (implicit ec: ExecutionContext, w: BSONDocumentWriter[T]): Future[UpsertResult[T]] = {
  Future.sequence(l.map(o => collection.save(o).map(r => (o, r))))
    .transform({ results =>
      val failed: List[T] = results.filter(!_._2.ok).unzip._1
      val nUpd = results.count(_._2.updatedExisting)
      UpsertResult(nUpd, results.size - nUpd - failed.size, failed)
    }, t => t)
}
Is there an out-of-the-box way of upserting many records at once using the reactivemongo API alone?
I am a MongoDB beginner so this might sound trivial to many. Any help is appreciated!
Mongo has no support for upserting multiple documents in one query. The update operation, for example, can only ever insert at most one new document. So this is not a flaw in the ReactiveMongo driver; there simply is no DB command to achieve the result you expect. Iterating over the documents you want to upsert is the right way to do it.
The MongoDB manual on upserts contains further information:
http://docs.mongodb.org/manual/core/update/#update-operations-with-the-upsert-flag
According to the docs, BSONCollection.save inserts the document, or updates it if it already exists in the collection: see here. Now, I'm not sure exactly how it decides whether the document already exists: presumably it's based on what MongoDB tells it... so the primary key/id or a unique index.
In short: I think you're doing it the right way (including your result counts from LastError).
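To round this out, here is a minimal usage sketch of the upsertMany from the question; the Person model, its writer, and the people collection are assumptions made for illustration, not part of the original code:
import scala.concurrent.ExecutionContext.Implicits.global
import reactivemongo.bson._

// Hypothetical model and writer, only for the sketch
case class Person(_id: BSONObjectID, name: String)
implicit val personWriter = Macros.writer[Person]

// people: BSONCollection is assumed to be defined elsewhere
upsertMany(List(Person(BSONObjectID.generate, "Ada")), people).foreach {
  case UpsertResult(nUpd, nIns, failed) =>
    println(s"updated: $nUpd, inserted: $nIns, failed: ${failed.size}")
}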

Mapping MongoDB documents to case class with types but without embedded documents

Subset looks like an interesting, thin MongoDB wrapper.
In one of the examples given, there are Tweets and Users. However, User is a subdocument of Tweet. In classical SQL, this would be normalized into two separate tables with a foreign key from Tweet to User. In MongoDB, this wouldn't necessitate a DBRef; storing the user's ObjectId would be sufficient.
Both in Subset and Salat this would result in these case classes:
case class Tweet(_id: ObjectId, content: String, userId: ObjectId)
case class User(_id: ObjectId, name: String)
So there's no guarantee that the ObjectId in Tweet actually resolves to a User (making it less typesafe). I also have to write the same query for each class that references User (or move it to some trait).
So what I'd like to achieve is to have case class Tweet(_id: ObjectId, content: String, userId: User), in code, and the ObjectId in the database. Is this possible, and if so, how? What are good alternatives?
Yes, it's possible. Actually it's even simpler than having a "user" sub-document in a "tweet": when "user" is a reference, it is just a scalar value, so neither MongoDB nor "Subset" needs any mechanism for querying subdocument fields.
I've prepared a simple REPLable snippet of code for you (it assumes you have two collections -- "tweets" and "users").
Preparations...
import org.bson.types.ObjectId
import com.mongodb._
import com.osinka.subset._
import Document.DocumentId
val db = new Mongo("localhost") getDB "test"
val tweets = db getCollection "tweets"
val users = db getCollection "users"
Our User case class
case class User(_id: ObjectId, name: String)
A number of fields for tweets and users
val content = "content".fieldOf[String]
val user = "user".fieldOf[User]
val name = "name".fieldOf[String]
Here more complicated things start to happen. What we need is a ValueReader that is capable of getting an ObjectId based on the field name, and then going to another collection and reading an object from there.
This can be written as a single piece of code, that does all things at once (you may see such a variant in the answer history), but it would be more idiomatic to express it as a combination of readers. Suppose we have a ValueReader[User] that reads from DBObject:
val userFromDBObject = ValueReader({
  case DocumentId(id) ~ name(name) => User(id, name)
})
What's left is a generic ValueReader[T] that expects an ObjectId and retrieves an object from a specific collection using the supplied underlying reader:
class RefReader[T](val collection: DBCollection, val underlying: ValueReader[T]) extends ValueReader[T] {
  override def unpack(o: Any): Option[T] =
    o match {
      case id: ObjectId =>
        Option(collection findOne id) flatMap {underlying.unpack _}
      case _ =>
        None
    }
}
Then, we may say our type class for reading Users from references is merely
implicit val userReader = new RefReader[User](users, userFromDBObject)
(I am grateful to you for this question, since this use case is quite rare and I had no real motivation to develop a generic solution. I think I should finally include this kind of helper in "Subset". I would appreciate your feedback on this approach.)
And this is how you would use it:
import collection.JavaConverters._

tweets.find.iterator.asScala foreach {
  case Document.DocumentId(id) ~ content(content) ~ user(u) =>
    println("%s - %s by %s".format(id, content, u))
}
Alexander Azarov's answer probably works fine, but I would personally not do it this way.
What you have is a Tweet that only has an ObjectId reference to the user.
And you want to load the user while loading the tweet because it is probably easier to manipulate in your domain. In any case, unless you use subdocuments (not always a good choice), you have to query the DB again to retrieve the user data, which is what Alexander Azarov's reader does.
Instead, you could write a transformation function that transforms a Tweet into a TweetWithUser, or something like that:
def transform(tweet: Tweet) = TweetWithUser(tweet._id, tweet.content, findUserWithId(tweet.userId))
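For completeness, a hedged sketch of the missing pieces (TweetWithUser and findUserWithId are illustrative names, and the lookup reuses a plain Java-driver "users" collection like the one in the answer above):
case class TweetWithUser(_id: ObjectId, content: String, user: Option[User])

// Hypothetical lookup against the "users" collection; yields None when the id does not resolve.
def findUserWithId(id: ObjectId): Option[User] =
  Option(users findOne id) map { dbo =>
    User(dbo.get("_id").asInstanceOf[ObjectId], dbo.get("name").asInstanceOf[String])
  }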
I don't really see why you would expect a framework to resolve something that you could have done yourself very easily in a single line of code.
And remember that in your application you don't always need the whole User object, so it is expensive to query the database twice when it isn't needed. You should only use the case class with the full User data when you really need it, and not simply always load the full user data because it seems more convenient.
Or, if you want to manipulate User objects anyway, you could have a User proxy on which the id attribute is accessed directly and any other access triggers a DB query. In Java/SQL this is what Hibernate does with lazy loading of relationships, but I'm not sure it's a good idea with MongoDB, and it breaks immutability.

Mongoid Query Syntax Question

I need to retrieve a set of answers according to 2 attributes.
This is what I want to do:
# where liker_ids is an array and user_id is a BSON id in the answer document
feed_answers = Answer.any_in(:liker_ids => to_use, :user_id.in => to_use).desc().map{|a| a}
What I ended up doing:
# to use
to_use = [some array of ids]
friend_answers = Answer.any_in(:liker_ids => to_use).map{|a| a}
liked_answers = Answer.where(:user_id.in => to_use).map{|a| a}
feed_answers = (friend_answers + liked_answers).sort{|x, y| y.created_at <=> x.created_at}
The problem is that I do not know how to combine the two queries into one. I have been trying out various combinations, but nothing seems to work, and my hacked-together method is highly inefficient, of course.
You should do this (you're missing the parameter to desc):
Answer.any_in(:liker_ids=>to_use, :user_id.in=>to_use).desc(:created_at)
But any_in is not used correctly here; it behaves similarly to where in this situation. You probably want or:
Answer.or(:liker_ids=>to_use).or(:user_id.in=>to_use).desc(:created_at)
# or
Answer.any_of({:liker_ids=>to_use}, {:user_id.in=>to_use}).desc(:created_at)
# or
Answer.or({:liker_ids=>to_use}, {:user_id.in=>to_use}).desc(:created_at)
You don't need that map at the end of the criteria chain; Mongoid criteria are lazily loaded, executing when they encounter a method the criteria do not respond to. They can also leverage MongoDB cursors, so it is advised not to use map when it is not necessary. Use Criteria#only or Criteria#without if you want to retrieve a subset of fields from MongoDB.