Include score in Morphia full text search - mongodb

I am trying to use the MongoDB full text indices in Morphia. I need to return the score for each document and have the results sorted by it. This is what my query looks like without Morphia:
db.getCollection('disease').find( { $text: { $search: "brain" } },
{ score: { $meta: "textScore" } } )
.sort( { score: { $meta: "textScore" } } )
This works correctly and returns hits sorted by score.
I also can do this using the MongoDB Java driver directly without Morphia.
// search with the Java driver
BasicDBObject textSearch = new BasicDBObject("$search", "brain");
BasicDBObject search = new BasicDBObject("$text", textSearch);
BasicDBObject meta = new BasicDBObject("$meta", "textScore");
BasicDBObject score = new BasicDBObject("score", meta);
List<DBObject> diseases = collection.find(search, score).sort(score).toArray();
Assert.assertEquals(2, diseases.size());
Assert.assertEquals("brain", diseases.get(0).get("name"));
Assert.assertEquals("benign-brain", diseases.get(1).get("name"));
I can't figure out how to accomplish the same thing in Morphia. Here is the example from the Morphia documentation (http://mongodb.github.io/morphia/1.0/guides/querying/#text-searching):
List<Greeting> good = datastore.createQuery(Greeting.class)
.search("good")
.order("_id")
.asList();
Assert.assertEquals(4, good.size());
The example does not return score and is ordering by "_id". I don't see any way to handle the $meta operator in Morphia. Has anyone done something similar?

Because Morphia maps back to your type, the only way to expose the score in this situation would be to add a score field to your entity type and map it back to that field. This isn't awesome because it starts to pollute your business types with database metadata fields. You could always try mapping back out a special value object type containing whatever metadata you'd like. That would at least keep your business objects free of this metadata.

Following the advice from @evanchooly and the OP, I was able to resolve my version of this issue as follows.
First, I created the query along with the search:
Datastore morphiaDS = ...;
Query<myMorphiaModel> query = morphiaDS.createQuery(myMorphiaModel.class)
.field("helloField").equal("world")
.search("yadayadayada"); // Search performs a text search
Next, I switched to using the MongoDB Java driver directly. For now, while this issue by @evanchooly is still open, we need to use the driver and can't go pure Morphia.
BasicDBObject meta = new BasicDBObject("$meta", "textScore");
BasicDBObject score = new BasicDBObject("score", meta);
List<DBObject> results = query.getCollection()
.find(query.getQueryObject(), score)
.sort(score).toArray();
Finally, I converted the list of generic objects back to my original Morphia model:
Morphia morphia = mongoService.getMorphia();
List<myMorphiaModel> searchDocs = results.stream()
.map((result) -> morphia.fromDBObject(myMorphiaModel.class, result))
.collect(Collectors.toList());
One pitfall to watch out for: I had to make sure 'score' was included in the find projection, not just in the sort.
Hope this clarifies the helpful answers that others have presented....

Looks like this issue has been fixed with the commit https://github.com/mongodb/morphia/commit/af4d64f6de3c0b1437dd216f5762d03bf98cdcb0 and now you can just do:
List<Greeting> good = datastore.createQuery(Greeting.class)
.search("good")
.project(Meta.textScore("score"))
.order(Meta.textScore("score"))
.asList();
The only caveat is that projecting the score is mandatory in order to sort by it, so if no other projections are added, the result contains only the score field. Projections for all needed fields therefore have to be added to the query.
Hope this helps.

Related

DataStax Stargate Document API

The DataStax Document API Swagger UI describes a parameter as "a JSON blob with search filters, allowed operators: $eq, $ne, $in, $nin, $gt, $lt, $gte, $lte, $exists". What does that mean? It is not well documented, so I want to ask whether the query string is based on MongoDB.
The Document API exposed on top of Cassandra is provided by the open source project Stargate, which is indeed developed by DataStax and embedded in their SaaS solution Astra.
The JSON query string that you created is parsed and converted into a proper CQL query under the hood.
The source code doesn't lie: you can find the full code here, and specifically the parsing of the where clause here:
public List<FilterCondition> convertToFilterOps(
    List<PathSegment> prependedPath,
    JsonNode filterJson) {
  List<FilterCondition> conditions = new ArrayList<>();
  if (!filterJson.isObject()) {
    throw new DocumentAPIRequestException("Search was expecting a JSON object as input.");
  }
  ObjectNode input = (ObjectNode) filterJson;
  Iterator<String> fields = input.fieldNames();
  while (fields.hasNext()) {
    String fieldName = fields.next();
    if (fieldName.isEmpty()) {
      throw new DocumentAPIRequestException(
          "The field(s) you are searching for can't be the empty string!");
    }
    ...
The query string is pretty similar in spirit to what you'd find with Mongo.
Here are some sample where clauses to give an idea:
{"name": {"$eq": "Eric"}} - simple enough, matches documents that have a field name with value Eric
{"a.age": {"$gt": 0}} - You can also reference nested fields in a document
{"friends.[0].name": {"$in": ["Cassandra"]}} - Array elements are referenced using [], this would match if the document's first friend is named Cassandra.
{"friends.*.age": {"$gte": 24}} - Wildcard * can be used to match any element in an array, or any field at a particular level of nesting. This matches any friend whose age is >= 24.

optimizing query for $exists in sub property

I need to search for the existence of a property that is nested within another object.
The collection contains documents that look like:
"properties": {
"source": {
"a/name": 12837,
"a/different/name": 76129
}
}
As you can see below, part of the query string is from a variable.
With some help from JohnnyHK (see mongo query - does property exist? for more info), I've got a query that works by doing the following:
var name = 'a/name';
var query = {};
query['properties.source.' + name] = {$exists: true};
collection.find(query).toArray(function...
Now I need to see if I can index the collection to improve the performance of this query.
I don't have a clue how to do this or if it is even possible to index for this.
Suggestions?
Two things are happening here.
First, you are probably looking for sparse indexes.
http://docs.mongodb.org/manual/core/index-sparse/
In your case it could be a sparse index on the "properties.source.a/name" field. An index on the queried field lets MongoDB answer the query without scanning every document. Because the field name comes from a variable, each distinct name needs its own index.
db.yourCollectionName.createIndex( { "properties.source.a/name": 1 }, { sparse: true } )
Second, whenever you want to know whether your query is fast or slow, open the mongo console, run your query, and call the explain method on its result.
db.yourCollectionName.find(query).explain();
This tells you whether your query uses an index, how many documents it had to examine to complete, and other useful information.
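Since the property name is only known at runtime, it can help to derive the dotted path once and reuse it for both the index key and the filter. The following is a minimal sketch that only builds plain objects; the helper names sourcePath, sparseIndexSpec and existsQuery are hypothetical.

```javascript
// Hypothetical helpers: derive the dotted field path once, then reuse it
// for both the sparse index key and the $exists filter.
function sourcePath(name) {
  return "properties.source." + name;
}

function sparseIndexSpec(name) {
  // Computed property names put the runtime path in the key position.
  return { keys: { [sourcePath(name)]: 1 }, options: { sparse: true } };
}

function existsQuery(name) {
  return { [sourcePath(name)]: { $exists: true } };
}

const spec = sparseIndexSpec("a/name");
const query = existsQuery("a/name");
console.log(JSON.stringify(spec.keys), JSON.stringify(query));
```

In the shell, spec.keys and spec.options would be passed to createIndex, and query to find. A name that itself contains a dot would be read as a nested path, so this only suits names like the ones shown in the question.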

spring data mongo - mongotemplate count with query hint

The mongo docs state that you can specify a query hint for count queries using the following syntax:
db.orders.find(
{ ord_dt: { $gt: new Date('01/01/2012') }, status: "D" }
).hint( { status: 1 } ).count()
Can you do this using the mongo template? I have a Query object and am calling the withHint method. I then call mongoTemplate.count(query); However, I'm pretty sure it's not using the hint, though I'm not positive.
Sure. There are a few ways to do this, including dropping down to the basic driver, but assuming you are using your own mapped classes you can do:
// Joda-Time DateTime for 1 January 2012, converted to java.util.Date
Date date = new DateTime(2012, 1, 1, 0, 0).toDate();
Query query = new Query();
query.addCriteria(Criteria.where("ord_dt").gt(date)); // $gt, as in the shell query
query.addCriteria(Criteria.where("status").is("D"));
query.withHint("status_1"); // name of the index to use
// Class stands in for your mapped entity class
long count = mongoOperation.count(query, Class);
So you basically build up a Query object and pass it to your operation, which is .count() in this case.
The "hint" here is the name, as a string, of the index to use on the collection. It is probably something like "status_1" by default, but use whatever name the index actually has.

MongoDB: Retrieving the first document in a collection

I'm new to Mongo, and I'm trying to retrieve the first document from a find() query:
> db.scores.save({a: 99});
> var collection = db.scores.find();
[
{ "a" : 99, "_id" : { "$oid" : "51a91ff3cc93742c1607ce28" } }
]
> var document = collection[0];
JS Error: result is undefined
This is a little weird, since a collection looks a lot like an array. I'm aware of retrieving a single document using findOne(), but is it possible to pull one out of a collection?
The find method returns a cursor, which works like an iterator over the result set. If you have too many results and try to display them all on the screen, the shell will display only the first 20, and the cursor will then point to the 20th result of the result set. If you type "it", the next 20 results will be displayed, and so on.
In your example I think that you have hidden from us one line in the shell.
This command
> var collection = db.scores.find();
will just assign the result to the collection variable and will not print anything on the screen. So, that makes me believe that you have also run:
> collection
Now, here is what is really happening. If you did use the above command to display the content of the collection, then the cursor will have reached the end of the result set (since you have only one document in your collection) and it will automatically close. That's why you get the error.
There is nothing wrong with your syntax. You can use it any time you want. Just make sure that your cursor is still open and has results. You can use the collection.hasNext() method for that.
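The exhaustion behaviour described above can be modelled with a toy class in plain JavaScript. This is only an illustration under stated assumptions: ToyCursor is a hypothetical stand-in, not the shell's real cursor, and it just hands out each document once.

```javascript
// Toy stand-in for a cursor (not the shell's real implementation):
// it hands out each document once and is then exhausted.
class ToyCursor {
  constructor(docs) {
    this.docs = docs;
    this.position = 0;
  }

  hasNext() {
    return this.position < this.docs.length;
  }

  next() {
    if (!this.hasNext()) {
      throw new Error("cursor exhausted");
    }
    return this.docs[this.position++];
  }
}

const cursor = new ToyCursor([{ a: 99 }]);
// Check hasNext() before reading, as suggested above.
const first = cursor.hasNext() ? cursor.next() : null;
const second = cursor.hasNext() ? cursor.next() : null;
console.log(first, second);
```

The first read returns the only document; the second finds the cursor empty and falls back to null instead of failing.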
Is that the Mongo shell? What version? When I try the commands you type, I don't get any extra output:
MongoDB shell version: 2.4.3
connecting to: test
> db.scores.save({a: 99});
> var collection = db.scores.find();
> var document = collection[0];
In the Mongo shell, find() returns a cursor, not an array. In the docs you can see the methods you can call on a cursor.
findOne() returns a single document and should work for what you're trying to accomplish.
You have several options.
My examples use Java. One option is to get a DBCursor and iterate over the elements that are returned, or simply grab the first one.
DBCursor cursor = db.getCollection(COLLECTION_NAME).find();
List<DOCUMENT_TYPE> retVal = new ArrayList<DOCUMENT_TYPE>(cursor.count());
while (cursor.hasNext()) {
retVal.add(cursor.next());
}
return retVal;
If you're looking for a particular object within the document, you can write a query and search all the documents for it. You can use the findOne method or simply find and get a list of objects matching your query. See below:
DBObject query = new BasicDBObject();
query.put(SOME_ID, ID);
DBObject result = db.getCollection(COLLECTION_NAME).findOne(query); // for a single object
DBCursor cursor = db.getCollection(COLLECTION_NAME).find(query); // for a cursor of multiple objects

How to add a field to a document which contains the result of the comparison of two other fields

I would like to speed up a query on my MongoDB which uses $where to compare two fields in the document, which seems to be really slow.
My query looks like this:
db.mycollection.find({ $where : "this.lastCheckDate < this.modificationDate" })
What I would like to do is add a field to my document, i.e. isCheckDateLowerThenModDate, on which I could execute a probably much faster query:
db.mycollection.find({"isCheckDateLowerThenModDate":true})
I am quite new to MongoDB and have no idea how to do this. I would appreciate it if someone could give me some hints or examples on:
How to initialize such a field on an existing collection
How to maintain this field. Which means how to update this field when lastCheckDate or modificationDate changes.
Thanks in advance for your help!
You are thinking along the right lines!
1. How to initialize such a field on an existing collection.
The simplest way is to load each document (from your language), calculate this field, update, and save.
Or you could perform an update via mongo shell:
db.mycollection.find().forEach(function(doc) {
    // store the result of the comparison as a boolean field
    doc.isCheckDateLowerThenModDate = doc.lastCheckDate < doc.modificationDate;
    db.mycollection.save(doc);
});
2. How to maintain this field, which means how to update it when lastCheckDate or modificationDate changes.
You have to do it yourself from your client code. Write a wrapper around your update and save operations and recalculate this value there every time. To be sure that this update works, write unit tests.
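A minimal sketch of such a wrapper, in plain JavaScript so it runs without a database. The names withCheckFlag and saveWithFlag are hypothetical, and saveFn stands in for whatever actually persists the document in your code.

```javascript
// Hypothetical wrapper: recompute the derived flag before every write.
function withCheckFlag(doc) {
  return Object.assign({}, doc, {
    isCheckDateLowerThenModDate: doc.lastCheckDate < doc.modificationDate,
  });
}

// saveFn is an assumption: it represents the real persistence call.
function saveWithFlag(doc, saveFn) {
  return saveFn(withCheckFlag(doc));
}

// Usage with an in-memory array standing in for the collection.
const stored = [];
saveWithFlag(
  { lastCheckDate: new Date("2012-01-01"), modificationDate: new Date("2012-06-01") },
  (doc) => stored.push(doc)
);
console.log(stored[0].isCheckDateLowerThenModDate); // true
```

Routing every write through one function like this keeps the flag from drifting out of sync, and the pure withCheckFlag part is easy to cover with unit tests.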
The $where clause is slow because it is evaluating each document using the JavaScript interpreter.
There are a few alternatives:
1) Assuming your use case is "look for records that need updating", take advantage of a sparse index:
add a boolean field like needsChecking and $set this whenever the modificationDate is updated
in your "check" procedure, find the documents that have this field set (should be fast due to the sparse index)
db.mycollection.find({'needsChecking':true});
after you've done whatever check is needed, $unset the needsChecking field.
2) A new (and faster) feature in MongoDB 2.2 is the Aggregation Framework.
Here is an example of adding an "isUpdated" field based on the date comparison, and then filtering the matching documents:
db.mycollection.aggregate(
{ $project: {
_id: 1,
name: 1,
type: 1,
modificationDate: 1,
lastCheckDate: 1,
isUpdated: { $gt:["$modificationDate","$lastCheckDate"] }
}},
{ $match : {
isUpdated : true,
}}
)
Some current caveats of using the Aggregation Framework are:
you have to specify fields to include aside from _id
the result is limited to the current maximum BSON document size (16MB in MongoDB 2.2)
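To make the two pipeline stages above concrete, here is a plain-JavaScript model of what they compute on a couple of made-up documents. It is not the server's implementation, and the sample data is invented for illustration: the map step plays the role of $project and the filter step the role of $match.

```javascript
// Plain-JavaScript model of the $project and $match stages (illustrative only).
const docs = [
  { _id: 1, modificationDate: new Date("2012-06-01"), lastCheckDate: new Date("2012-01-01") },
  { _id: 2, modificationDate: new Date("2012-01-01"), lastCheckDate: new Date("2012-06-01") },
];

// $project: keep the fields and add the computed isUpdated flag.
const projected = docs.map((d) => ({
  ...d,
  isUpdated: d.modificationDate > d.lastCheckDate,
}));

// $match: keep only documents where the flag is true.
const matched = projected.filter((d) => d.isUpdated === true);
console.log(matched.map((d) => d._id)); // [ 1 ]
```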