MongoDB WildCard query taking too much time using Grails - mongodb

MongoDB v4.0.2
Grails 3.3.5
I have more than 20 million records stored in a collection. I'm trying to run a wildcard search on that collection as follows:
def personList = Person.collection.find(['vehicleNumber': ['$regex':/.*GJ18AD.*/] ]).sort(["datetime":-1])
Index on the Person collection:
db.person.getIndexes()
{
    "v" : 2,
    "key" : {
        "vehicleNumber" : 1
    },
    "name" : "vehicleNumber_1",
    "ns" : "analytics.person",
    "weights" : {
        "numberPlate" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 3
}
Is there any other way to do the wildcard search?

No changes are required to the indexing, just a minor change to the filter object I'm passing to the collection.
Previously, I was using the following filter object syntax:
def personList = Person.collection.find(['vehicleNumber': ['$regex':/.*GJ18AD.*/] ]).sort(['datetime':-1])
Then I changed only the regex in that syntax to a plain quoted string:
def personList = Person.collection.find(['vehicleNumber': ['$regex':'.*GJ18AD.*'] ]).sort(['datetime':-1])
That works for me on MongoDB 4.2.1.
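For completeness, here is a sketch of how to check the same filter directly from the mongo shell: explain() reports whether the vehicleNumber_1 index is used (a regex that is not anchored to the start of the string can at best scan the whole index rather than seek into it).
// Minimal verification sketch: run the equivalent filter in the shell and inspect the plan.
db.person.find({ vehicleNumber: { $regex: ".*GJ18AD.*" } }).sort({ datetime: -1 }).explain("executionStats")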

Related

Zipping two collections in mongoDB

Not a question about joins in mongoDB
I have two collections in mongoDB, which do not have a common field and which I would like to apply a zip function to (like in Python, Haskell). Both collections have the same number of documents.
For example:
Let's say one collection (Users) is for users, and the other (Codes) contains unique, randomly generated codes.
Collection Users:
{ "_id" : ObjectId(""), "userId" : "123"}
{ "_id" : ObjectId(""), "userId" : "456"}
Collection Codes:
{ "_id" : ObjectId(""), "code" : "randomCode1"}
{ "_id" : ObjectId(""), "code" : "randomCode2"}
The desired output would be to assign each user a unique code, as follows:
Output
{ "_id" : ObjectId(""), "code" : "randomCode1", "userId" : "123"}
{ "_id" : ObjectId(""), "code" : "randomCode2", "userId" : "456"}
Is there any way of doing this with the aggregation pipeline?
Or perhaps with map-reduce? I don't think so, because it only works on one collection.
I've considered inserting another random id into both collections for each document pair and then using $lookup with this new id, but this seems like overkill. The alternative would be to export the data and use Python, since there aren't that many documents, but again I feel like there should be a better way.
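The $lookup version I have in mind would look roughly like this (just a sketch: pairId is a made-up field I would first have to write into every document of both collections, and it needs a server with $lookup, i.e. 3.2+):
// Join users to codes on a shared, pre-inserted pairId field (hypothetical).
db.users.aggregate([
    { $lookup: { from: "codes", localField: "pairId", foreignField: "pairId", as: "codeDoc" } },
    { $unwind: "$codeDoc" },
    { $project: { userId: 1, code: "$codeDoc.code" } }
])
But that means touching every document just to pair them up once, which is why it feels like overkill.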
I would do something like this to read the records from both collections and merge the required fields into a single object.
You have already confirmed that the number of records in the two collections is the same.
The code below walks both cursors in step and maps the required fields into one object. Finally, you can print the object to the console or insert it into a new collection (the insert is commented out).
// Open a cursor on each collection; the documents are paired up in natural order.
var usersCursor = db.users.find({});
var codesCursor = db.codes.find({});
while (usersCursor.hasNext() && codesCursor.hasNext()) {
    var user = usersCursor.next();
    var code = codesCursor.next();
    // Build the merged document for this user/code pair.
    var outputObj = {};
    outputObj["_id"] = new ObjectId();
    outputObj["userId"] = user["userId"];
    outputObj["code"] = code["code"];
    printjson(outputObj);
    //db.collectionName.insertOne(outputObj);
}
Output:-
{
    "_id" : ObjectId("58348512ba41f1f22e600c74"),
    "userId" : "123",
    "code" : "randomCode1"
}
{
    "_id" : ObjectId("58348512ba41f1f22e600c75"),
    "userId" : "456",
    "code" : "randomCode2"
}
Unlike a relational database, MongoDB expects you to do this kind of JOIN work at the application level (which is part of what makes the database easy to scale horizontally).

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB database is made up of 2 main collections:
1) Maps
{
    "_id" : ObjectId("542489232436657966204394"),
    "fileName" : "importFile1.json",
    "territories" : [
        {
            "$ref" : "territories",
            "$id" : ObjectId("5424892224366579662042e9")
        },
        {
            "$ref" : "territories",
            "$id" : ObjectId("5424892224366579662042ea")
        }
    ]
},
{
    "_id" : ObjectId("542489262436657966204398"),
    "fileName" : "importFile2.json",
    "territories" : [
        {
            "$ref" : "territories",
            "$id" : ObjectId("542489232436657966204395")
        }
    ],
    "uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects:
{
    "_id" : ObjectId("5424892224366579662042e9"),
    "name" : "Afghanistan",
    "area" : 653958
},
{
    "_id" : ObjectId("5424892224366579662042ea"),
    "name" : "Angola",
    "area" : 1252651
},
{
    "_id" : ObjectId("542489232436657966204395"),
    "name" : "Unknown",
    "area" : 0
}
My objective is to list every map with its cumulative area and number of territories. I am trying the following query:
db.maps.aggregate(
    { '$unwind' : '$territories' },
    { '$group' : {
        '_id' : '$fileName',
        'numberOf' : { '$sum' : '$territories.name' },
        'locatedArea' : { '$sum' : '$territories.area' }
    }
})
However the results show 0 for each of these values :
{
    "result" : [
        {
            "_id" : "importFile2.json",
            "numberOf" : 0,
            "locatedArea" : 0
        },
        {
            "_id" : "importFile1.json",
            "numberOf" : 0,
            "locatedArea" : 0
        }
    ],
    "ok" : 1
}
I probably did something wrong when trying to access the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo docs. area is stored as an integer, and name as a string.
Yes indeed, the field "territories" has an array of database references and not the actual documents. DBRefs are objects that contain information with which we can locate the actual documents.
In the above example you can see this clearly; run the mongo query below:
db.maps.find({ "_id" : ObjectId("542489232436657966204394") }).forEach(function(doc) { print(doc.territories[0]); })
It will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
So '$sum': '$territories.name' and '$sum': '$territories.area' give you 0, since the DBRef objects have no fields called name or area.
You need to resolve each reference to its document before you can do something like $territories.name.
To achieve what you want, you can make use of the cursor's map() function, since neither aggregation nor map-reduce supports sub-queries, and you already have a self-contained map document with references to its territories.
Steps to achieve this:
a) get each map;
b) resolve the DBRefs;
c) calculate the total area and the number of territories;
d) build and return the desired structure.
Mongo shell script:
db.maps.find().map(function(doc) {
    // Collect the referenced _ids and remember which collection the DBRefs point to.
    var refName;
    var territory_refs = doc.territories.map(function(terr_ref) {
        refName = terr_ref.$ref;
        return terr_ref.$id;
    });
    // Resolve the references and sum up the areas.
    var areaSum = 0;
    db[refName].find({
        "_id" : {
            $in : territory_refs
        }
    }).forEach(function(i) {
        areaSum += i.area;
    });
    return {
        "id" : doc.fileName,
        "noOfTerritories" : territory_refs.length,
        "areaSum" : areaSum
    };
})
o/p:
[
    {
        "id" : "importFile1.json",
        "noOfTerritories" : 2,
        "areaSum" : 1906609
    },
    {
        "id" : "importFile2.json",
        "noOfTerritories" : 1,
        "areaSum" : 0
    }
]
Map-reduce functions should not be, and cannot be, used to resolve DBRefs on the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the function (i.e. side effects).
The reduce function should not access the database, even to perform read operations.
The reduce function should not affect the outside system.
Moreover, a reduce function, even if used (which could never work here anyway), would never be called for your problem, since a group on "fileName" or "ObjectId" always contains only a single document in your dataset:
MongoDB will not call the reduce function for a key that has only a single value.

Query returning null - Using Java Driver for Mongo

I am trying to update a collection by querying on the document's id, but for some reason I am not able to update the collection at all. What am I doing wrong?
DBCollection collection = db.getCollection("sv_office_src");
for (OutputSet o : outputData) {
    BasicDBObject searchOfficeQuery = new BasicDBObject();
    searchOfficeQuery.append("_id", o.getId());
    //collection.findOne(searchOfficeQuery);
    System.out.println(searchOfficeQuery);

    BasicDBObject newUpdateDocument = new BasicDBObject();
    newUpdateDocument.append("target_key", o.getTarget_key());
    newUpdateDocument.append("target_office_id", o.getTarget_office_id());
    newUpdateDocument.append("target_firm_id", o.getTarget_firm_id());
    newUpdateDocument.append("score", o.getScore());

    BasicDBObject updatedDocument = new BasicDBObject();
    updatedDocument.append("$set", newUpdateDocument);
    collection.update(searchOfficeQuery, updatedDocument);
    System.out.println("Output : " + updatedDocument);
}
and the output is as follows:
{ "_id" : "52c6f6d5250c7ef0f654c7dd"}
Output :
{ "$set" : { "target_key" : "440786|PO BOX 15007|||WILMINGTON|US-NC|28408-5007|US" , "target_office_id" : "503677" , "target_firm_id" : "87277" , "score" : "17"}}
So I am getting the right query document, but when I run the following in the mongo shell, you can see that the updated fields are still blank.
I know the key Firm_Name for the above id.
> db.sv_office_src.find({ Firm_Name: "1717 Capital Management Company" })
{ "_id" : ObjectId("52c77b8d250ca11d792200aa"), "Firm_Name" : "1717 Capital Management Company", "Firm_Id" : "6715", "Office_Id" : "200968", "Office_Address_Line_1" : "PO BOX 15626", "Office_Address_Line_2" : "", "Office_Address_Line_3" : "", "Office_City" : "WILMINGTON", "Office_Region_Ref_Code" : "US-DE", "Office_Postal_Code" : "19850-5626", "Office_Country_Ref_Code" : "US", "src_key" : "200968|PO BOX 15626|||WILMINGTON|US-DE|19850-5626|US", "target_key" : "", "target_office_id" : "", "target_firm_id" : "", "target_firm_name" : "", "score" : "" }
The id's for the two documents are not the same:
{ "_id" : "52c6f6d5250c7ef0f654c7dd"}
vs.
{ "_id" : ObjectId("52c77b8d250ca11d792200aa") <snip/>
Two issues here: the hex values are different, and the type of the first looks to be a string while the type of the second is an ObjectId.
If your OutputSet.getId() method is returning the hex string then you can convert it to an ObjectId (http://api.mongodb.org/java/current/org/bson/types/ObjectId.html) by passing it to the constructor:
searchOfficeQuery.append("_id", new ObjectId( o.getId() ) );
You can also inspect the WriteResult (http://api.mongodb.org/java/current/com/mongodb/WriteResult.html) returned by the update command to see how many documents each update modified. Look at the WriteResult.getN() method: in this case I would expect it to be 1 if the update finds and modifies the document, and 0 if it does not.
HTH,
Rob.
The _id in your searchOfficeQuery and the _id in your "proof" from your Firm_Name query don't match. You might try
db.sv_office_src.find({ "_id" : ObjectId("52c6f6d5250c7ef0f654c7dd")})
to see the document that you actually $set all those fields in.
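A quick way to see the type mismatch from the shell (a sketch; a string _id can never match a document whose _id is stored as an ObjectId):
// The string form matches nothing, because the stored _id values are ObjectIds.
db.sv_office_src.find({ "_id" : "52c6f6d5250c7ef0f654c7dd" }).count()             // expect 0
// Querying with the ObjectId of an existing document does match.
db.sv_office_src.find({ "_id" : ObjectId("52c77b8d250ca11d792200aa") }).count()   // expect 1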

Calling ensureIndex with compound key results in _id field in index object

When I call ensureIndex from the mongo shell on a collection for a compound index an _id field of type ObjectId is auto-generated in the index object.
> db.system.indexes.find();
{ "name" : "_id_", "ns" : "database.coll", "key" : { "_id" : 1 } }
{ "_id" : ObjectId("4ea78d66413e9b6a64c3e941"), "ns" : "database.coll", "key" : { "a.b" : 1, "a.c" : 1 }, "name" : "a.b_1_a.c_1" }
This makes intuitive sense as all documents in a collection need an _id field (even system.indexes, right?), but when I check the indexes generated by morphia's ensureIndex call for the same collection *there is no _id property*.
Looking at morphia's source code, it's clear that it's calling the same code that the shell uses, but for some reason (whether it's the fact that I'm creating a compound index or indexing an Embedded document or both) they produce different results. Can anyone explain this behavior to me?
Not exactly sure how you managed to get an _id field in the indexes collection, but both shell- and Morphia-originated ensureIndex calls for compound indexes do not put an _id field in the index object:
> db.test.ensureIndex({'a.b':1, 'a.c':1})
> db.system.indexes.find({})
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.test", "name" : "_id_" }
{ "v" : 1, "key" : { "a.b" : 1, "a.c" : 1 }, "ns" : "test.test", "name" : "a.b_1_a.c_1" }
>
Upgrade to 2.x if you're running an older version to avoid running into now-resolved issues; judging from your output, you are running 1.8 or earlier.

Datatype changes from Int to Float when using group aggregation

I'm trying to use the group aggregation.
I have documents of the following structure in my mongodb:
{ "_id" : ObjectId("4ddcdc9ab4d8a3a90345508e"), "vehicleId" : "1", "timestamp" : ISODate("2011-05-25T10:40:25.856Z"), "speed" : 1 }
{ "_id" : ObjectId("4ddcdc9ab4d8a3a90345508f"), "vehicleId" : "2", "timestamp" : ISODate("2011-05-25T10:40:26.232Z"), "speed" : 2 }
In a test, I want to get the latest speed per vehicleId, i.e. I'm doing the following:
val key = MongoDBObject("vehicleId" -> true)
val cond = MongoDBObject.empty
val initial = MongoDBObject("timestamp" -> 0)
val reduce =
"""function(doc, prev) {
if (doc.timestamp > prev.timestamp) {
prev.speed = doc.speed;
prev.timestamp = doc.timestamp;
}
}"""
val groupedSpeed = collection.group(key, cond, initial, reduce)
for (dbObject: DBObject <- groupedSpeed) {
  println(dbObject.toString)
}
The weird thing is that in the grouped result groupedSpeed, the field speed is not an Int anymore:
{ "vehicleId" : "2" , "timestamp" : { "$date" : "2011-05-25T10:40:49Z"} , "speed" : 2.0}
{ "vehicleId" : "1" , "timestamp" : { "$date" : "2011-05-25T10:40:49Z"} , "speed" : 1.0}
Did I miss something? I'm using casbah 2.1.2.
Cheers,
Christian
[UPDATE] Looks like this is normal in JavaScript and BSON, see here: casbah mailing list
JavaScript represents all numeric values as 64-bit floating-point doubles, and the group command's reduce function runs in the server's JavaScript engine, so any value that passes through it comes back as a double.
You might want to extend your application logic so that whenever a new document is inserted, you also update the max speed for that vehicle in a separate collection; then you never have to run group queries at read time.
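A rough mongo-shell sketch of that idea (vehicleStats and maxSpeed are made-up names, and the $max update operator needs MongoDB 2.6 or newer):
// Hypothetical per-vehicle summary collection, updated on every insert of a reading.
db.vehicleStats.update(
    { _id: "1" },                   // the vehicleId of the new reading
    { $max: { maxSpeed: 1 } },      // only writes maxSpeed when the new speed is higher
    { upsert: true }
)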