Sort results by search term similarity - mongodb

I have this users collection:
{
"_id" : ObjectId("501faa18a34feb05890004f2"),
"username" : "joanarocha",
}
{
"_id" : ObjectId("501faa19a34feb05890005d3"),
"username" : "cristianarodrigues",
}
{
"_id" : ObjectId("501faa19a34feb05890006d8"),
"username" : "anarocha",
}
When I run this query: db.users.find({'username': /anaro/i}) the results are returned in natural order (insertion order).
I would like to sort them by similarity to the search term. In this case the results should be returned in this order:
{
"_id" : ObjectId("501faa19a34feb05890006d8"),
"username" : "anarocha",
}
{
"_id" : ObjectId("501faa18a34feb05890004f2"),
"username" : "joanarocha",
}
{
"_id" : ObjectId("501faa19a34feb05890005d3"),
"username" : "cristianarodrigues",
}

Unfortunately, MongoDB doesn't support full text search ranking by default.
First, you will need an algorithm to calculate the similarity between two strings. See the following links:
String similarity algorithms?
String similarity -> Levenshtein distance
Then write a JavaScript function that uses the algorithm to compare two strings, and pass it in your query. The following link shows how to do that:
Mongo complex sorting?
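As a rough sketch of the idea, the matches could also be ranked on the client after the regex query returns them. The example below assumes Levenshtein distance as the similarity measure; sortBySimilarity is a hypothetical helper, and the matches array is a stand-in for what db.users.find({'username': /anaro/i}).toArray() would return.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: order documents by the distance between doc[field] and term.
function sortBySimilarity(docs, field, term) {
  const t = term.toLowerCase();
  return docs.slice().sort(
    (x, y) => levenshtein(x[field].toLowerCase(), t) - levenshtein(y[field].toLowerCase(), t)
  );
}

// Stand-in for the documents matched by /anaro/i, in insertion order.
const matches = [
  { username: "joanarocha" },
  { username: "cristianarodrigues" },
  { username: "anarocha" }
];

const ranked = sortBySimilarity(matches, "username", "anaro");
console.log(ranked.map(d => d.username));
// → [ 'anarocha', 'joanarocha', 'cristianarodrigues' ]
```

This only works well when the regex already narrows the result set to a manageable size, since every match is pulled to the client before sorting.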

Related

mongodb - is it possible to filter after an $elemMatch projection in a find query?

I have documents like this in a collection called 'variants':
{
"_id" : "An_FM000900_Var_10_100042505_100042505_G_A",
"analysisId" : "FM000900",
"chromosome" : 10,
"start" : 100042505,
"end" : 100042505,
"size" : 1,
"reference" : "G",
"alternative" : "A",
"effects" : [
{
"_id" : "Analysis:FM000900-Variant:An_FM000900_Var_10_100042505_100042505_G_A-Effect:0",
"biotype" : "protein_coding",
"impact" : "LOW",
},
{
"_id" : "Analysis:FM000900-Variant:An_FM000900_Var_10_100042505_100042505_G_A-Effect:1",
"biotype" : "protein_coding",
"impact" : "MODERATE",
}
]
}
I want to find documents in that collection that meet some criteria ("analysisId":"FM000900"), and then project over the 'effects' array field to return just the first element of the 'effects' array that meets some criteria ("biotype" : "protein_coding" and "impact" : "MODERATE").
The thing is that I want to show the main 'variant' document if and only if at least one element in the 'effects' array meets the criteria.
With the following query I get the expected result, except that I also get 'variant' documents with an empty 'effects' array field.
db.getCollection('variants').find(
{
"analysisId":"FM000900"
}
,
{
"effects":{
"$elemMatch" : {
"biotype" : "protein_coding",
"impact" : "MODERATE"
}
}
}
).skip(0).limit(200)
Can somebody transform this query so that it only returns 'variant' documents that still have some element in the 'effects' array after the projection?
Can it be done without using the aggregation framework, if possible? The collection has millions of documents and the query has to be performant.
Thanks a lot, guys!!!
Simply use $elemMatch as a query operator in addition to your projection; it will keep only the variants that have at least one effects array element matching all conditions.
So your query will be:
db.getCollection('variants').find(
{
"analysisId":"FM000900",
"effects":{
"$elemMatch" : {
"biotype" : "protein_coding",
"impact" : "MODERATE"
}
}
}
,
{
"effects":{
"$elemMatch" : {
"biotype" : "protein_coding",
"impact" : "MODERATE"
}
}
}
).skip(0).limit(200)
In addition, a compound multikey index on the fields used in the query (analysisId, effects.biotype, effects.impact) can improve read performance, but use it carefully, as it can drastically reduce write performance.
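To make the difference between the two uses of $elemMatch concrete, here is a minimal plain-JavaScript model of the behaviour described above. It does not talk to MongoDB; matchesAll, elemMatchQuery and elemMatchProjection are hypothetical helpers, and the two sample documents are simplified stand-ins for the 'variants' collection.

```javascript
// True when the array element has every field/value pair in criteria.
const matchesAll = (elem, criteria) =>
  Object.keys(criteria).every(k => elem[k] === criteria[k]);

// Models $elemMatch in the query: at least one element meets all criteria.
const elemMatchQuery = (doc, field, criteria) =>
  Array.isArray(doc[field]) && doc[field].some(e => matchesAll(e, criteria));

// Models $elemMatch in the projection: _id plus only the first matching element.
const elemMatchProjection = (doc, field, criteria) => {
  const first = (doc[field] || []).find(e => matchesAll(e, criteria));
  return first ? { _id: doc._id, [field]: [first] } : { _id: doc._id };
};

const variants = [
  { _id: "v1", analysisId: "FM000900", effects: [
    { biotype: "protein_coding", impact: "LOW" },
    { biotype: "protein_coding", impact: "MODERATE" }
  ] },
  { _id: "v2", analysisId: "FM000900", effects: [
    { biotype: "protein_coding", impact: "LOW" }
  ] }
];
const criteria = { biotype: "protein_coding", impact: "MODERATE" };

// Projection only: v2 is still returned, just without any matching element.
const projectionOnly = variants
  .filter(d => d.analysisId === "FM000900")
  .map(d => elemMatchProjection(d, "effects", criteria));

// Query + projection: v2 is filtered out before the projection is applied.
const queryAndProjection = variants
  .filter(d => d.analysisId === "FM000900" && elemMatchQuery(d, "effects", criteria))
  .map(d => elemMatchProjection(d, "effects", criteria));

console.log(projectionOnly.length, queryAndProjection.length); // 2 1
```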

Find by sub-documents field value with case insensitive

I know this is a bit of a newb question, but I'm having a hard time figuring out how to write a query to find some information. I have several documents (or orders) much like the one below, and I am trying to see if there is any athlete with the name I place in my query.
How do I write a query to find all records where the athleteLastName = Doe (without case sensitivity)?
{
"_id" : ObjectId("57c9c885950f57b535892433"),
"userId" : "57c9c74a0b61b62f7e071e42",
"orderId" : "1000DX",
"updateAt" : ISODate("2016-09-02T18:44:21.656Z"),
"createAt" : ISODate("2016-09-02T18:44:21.656Z"),
"paymentsPlan" :
[
{
"_id" : ObjectId("57c9c885950f57b535892432"),
"customInfo" :
{
"formData" :
{
"athleteLastName" : "Doe",
"athleteFirstName" : "John",
"selectAttribute" : ""
}
}
}
]
}
You need to use dot notation to access the embedded documents, and a regex because you want a case-insensitive match.
db.collection.find({'paymentsPlan.customInfo.formData.athleteLastName': /Doe/i})
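Note that an unanchored pattern such as /Doe/i also matches longer names that merely contain "doe". If equality with the whole value is what is wanted, one possible approach is to build an anchored, escaped pattern from the input. This is a sketch only; escapeRegExp and exactInsensitive are hypothetical helper names, not part of MongoDB.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored, case-insensitive pattern that matches the whole value.
function exactInsensitive(value) {
  return new RegExp("^" + escapeRegExp(value) + "$", "i");
}

const lastName = exactInsensitive("Doe");
console.log(lastName.test("doe"));   // true
console.log(lastName.test("DOE"));   // true
console.log(lastName.test("Doerr")); // false

// Filter object using the same dot-notation path as the query above.
const filter = { "paymentsPlan.customInfo.formData.athleteLastName": lastName };
```

The filter object can then be passed to find() in place of the literal /Doe/i version.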

Zipping two collections in mongoDB

Not a question about joins in mongoDB
I have two collections in mongoDB, which do not have a common field and which I would like to apply a zip function to (like in Python, Haskell). Both collections have the same number of documents.
For example:
Let's say one collection (Users) is for users, and the other (Codes) is of unique randomly generated codes.
Collection Users:
{ "_id" : ObjectId(""), "userId" : "123"}
{ "_id" : ObjectId(""), "userId" : "456"}
Collection Codes:
{ "_id" : ObjectId(""), "code" : "randomCode1"}
{ "_id" : ObjectId(""), "code" : "randomCode2"}
The desired output would be to assign each user to a unique code, as follows:
Output
{ "_id" : ObjectId(""), "code" : "randomCode1", "userId" : "123"}
{ "_id" : ObjectId(""), "code" : "randomCode2", "userId" : "456"}
Is there any way of doing this with the aggregation pipeline?
Or perhaps with map reduce? Don't think so because it only works on one collection.
I've considered inserting another random id into both collections for each document pair, and then using $lookup with this new id, but this seems like an overkill. Also the alternative would be to export and use Python, since there aren't so many documents, but again I feel like there should be a better way.
I would do something like this to get the records from both collections and merge the required fields into a single object.
You have already confirmed that the number of records in both collections is the same.
The code below loops through both cursors and maps the required fields into one object. Finally, you can print the object to the console or insert it into a new collection (the insert is commented out).
var usersCursor = db.users.find( { } );
var codesCursor = db.codes.find( { } );
while (usersCursor.hasNext() && codesCursor.hasNext()) {
var user = usersCursor.next();
var code = codesCursor.next();
var outputObj = {};
outputObj ["_id"] = new ObjectId();
outputObj ["userId"] = user["userId"];
outputObj ["code"] = code["code"];
printjson( outputObj);
//db.collectionName.insertOne(outputObj);
}
Output:-
{
"_id" : ObjectId("58348512ba41f1f22e600c74"),
"userId" : "123",
"code" : "randomCode1"
}
{
"_id" : ObjectId("58348512ba41f1f22e600c75"),
"userId" : "456",
"code" : "randomCode2"
}
Unlike in a relational database, in MongoDB you do JOIN-like work at the application level (which makes it easier to scale the database horizontally).

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB database is made up of 2 main collections:
1) Maps
{
"_id" : ObjectId("542489232436657966204394"),
"fileName" : "importFile1.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042e9")
},
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042ea")
}
]
},
{
"_id" : ObjectId("542489262436657966204398"),
"fileName" : "importFile2.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("542489232436657966204395")
}
],
"uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects :
{
"_id" : ObjectId("5424892224366579662042e9"),
"name" : "Afghanistan",
"area" : 653958
},
{
"_id" : ObjectId("5424892224366579662042ea"),
"name" : "Angola",
"area" : 1252651
},
{
"_id" : ObjectId("542489232436657966204395"),
"name" : "Unknown",
"area" : 0
}
My objective is to list every map with their cumulative area and number of territories. I am trying the following query :
db.maps.aggregate(
{'$unwind':'$territories'},
{'$group':{
'_id':'$fileName',
'numberOf': {'$sum': '$territories.name'},
'locatedArea':{'$sum':'$territories.area'}
}
})
However the results show 0 for each of these values :
{
"result" : [
{
"_id" : "importFile2.json",
"numberOf" : 0,
"locatedArea" : 0
},
{
"_id" : "importFile1.json",
"numberOf" : 0,
"locatedArea" : 0
}
],
"ok" : 1
}
I probably did something wrong when trying to access the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo docs. area is stored as an integer, and name as a string.
Yes indeed, the field "territories" has an array of database references and not the actual documents. DBRefs are objects that contain information with which we can locate the actual documents.
You can see this clearly in the above example by running the following mongo query:
db.maps.find({"_id":ObjectId("542489232436657966204394")}).forEach(function(doc){print(doc.territories[0]);})
it will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
So '$sum': '$territories.name' and '$sum': '$territories.area' yield 0, since the DBRef objects have no fields such as name or area.
You therefore need to resolve each reference to a document before using something like $territories.name.
To achieve what you want, you can use the cursor's map() function, since neither aggregation nor map-reduce supports subqueries, and you already have a self-contained map document with references to its territories.
Steps to achieve:
a) get each map
b) resolve the `DBRef`.
c) calculate the total area, and the number of territories.
d) make and return the desired structure.
Mongo shell script:
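db.maps.find().map(function(doc) {
    // Name of the referenced collection, taken from the DBRefs.
    var refName = null;
    // Collect the _id values that the DBRefs point to.
    var territory_refs = doc.territories.map(function(terr_ref) {
        refName = terr_ref.$ref;
        return terr_ref.$id;
    });
    var areaSum = 0;
    // Resolve all references with one $in query and sum the areas.
    // db.getCollection(refName) uses the variable; db.refName would
    // query a collection literally named "refName".
    if (refName !== null) {
        db.getCollection(refName).find({
            "_id" : {
                $in : territory_refs
            }
        }).forEach(function(i) {
            areaSum += i.area;
        });
    }
    return {
        "id" : doc.fileName,
        "noOfTerritories" : territory_refs.length,
        "areaSum" : areaSum
    };
})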
o/p:
[
{
"id" : "importFile1.json",
"noOfTerritories" : 2,
"areaSum" : 1906609
},
{
"id" : "importFile2.json",
"noOfTerritories" : 1,
"areaSum" : 0
}
]
Map-reduce functions should not be, and cannot be, used to resolve DBRefs on the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the
function (i.e. side effects.)
The reduce function should not access the database, even to perform
read operations. The reduce function should not affect the outside
system.
Moreover, a reduce function, even if used (which can never work anyway), would never be called for your problem, since a group by "fileName" or "ObjectId" would always contain only one document in your dataset.
MongoDB will not call the reduce function for a key that has only a
single value

Find one document in mongodb with a preference toward "Starts With"

I have a mongo database of names.
Let's say it looks like this:
{ "_id" : ObjectId("513a18c1f9e9b5c19fd80014"), "name" : "Mary Sue" }
{ "_id" : ObjectId("513a18d9f9e9b5c19fd80015"), "name" : "Tammy Sue" }
{ "_id" : ObjectId("513a18e4f9e9b5c19fd80016"), "name" : "Sueellen" }
{ "_id" : ObjectId("513a18eaf9e9b5c19fd80017"), "name" : "Ellen" }
{ "_id" : ObjectId("513a195af9e9b5c19fd80018"), "name" : "Sue" }
{ "_id" : ObjectId("513a1ccaf9e9b5c19fd80019"), "name" : "Eddie" }
I would like to be able to perform a (case-insensitive) query for a single result which will prioritize the return value like so:
If "name" starts with my string, then return the first alphabetical "starts with" result.
Otherwise, if name contains my string, then return the first alphabetical result.
Examples:
A search for /sue/i should return "Sue".
A search for /e/i should return "Eddie".
A search for /len/i should return "Ellen".
A search for /ue/i should return "Mary Sue".
Is it possible to do this without either doing 2 separate calls (one for /^len/i, then for /len/i if I got 0 results), or finding every match and parsing the results myself?
I happen to be using node.js and mongoose here, but a generic mongo answer would also be fine so I can understand the concepts.
This is just a matter of writing the right regexp. You can specify multiple alternatives in a single regexp. It's worth having a look at http://regex.learncodethehardway.org/book/ to get a deeper understanding of regexps. Or download 2.4 from www.mongodb.org and try out the new text index option:
http://docs.mongodb.org/manual/release-notes/2.4/#text-indexes
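For reference, the prioritisation rule from the question can be pinned down in a few lines of plain JavaScript. This is only a sketch of the rule applied to names already in memory, not a single-query MongoDB solution; findPreferred is a hypothetical helper, and the names array is copied from the example documents.

```javascript
// Pick one name: prefer the first alphabetical "starts with" match,
// otherwise fall back to the first alphabetical "contains" match.
function findPreferred(names, term) {
  const t = term.toLowerCase();
  const sorted = names.slice().sort(); // default code-unit string order
  const startsWith = sorted.find(n => n.toLowerCase().startsWith(t));
  if (startsWith !== undefined) return startsWith;
  const contains = sorted.find(n => n.toLowerCase().includes(t));
  return contains !== undefined ? contains : null;
}

const names = ["Mary Sue", "Tammy Sue", "Sueellen", "Ellen", "Sue", "Eddie"];

console.log(findPreferred(names, "sue")); // Sue
console.log(findPreferred(names, "e"));   // Eddie
console.log(findPreferred(names, "len")); // Ellen
console.log(findPreferred(names, "ue"));  // Mary Sue
```

The same rule corresponds to the two-call approach mentioned in the question: a query with /^term/i sorted by name and limited to one result, followed by /term/i only if the first returns nothing.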