Select latest documents in MongoDB - mongodb

My app records some data in a MongoDB collection. There are three types of events: 'event1', 'event2' and 'event3', and new elements are added periodically.
Event structure example:
{ 'Data' : 'a234235', 'Type' : 'event1', 'Timestamp' : 1366006599 }
{ 'Data' : 'b978543', 'Type' : 'event2', 'Timestamp' : 1366006600 }
{ 'Data' : 'c567921', 'Type' : 'event3', 'Timestamp' : 1366006601 }
{ 'Data' : 'd327863', 'Type' : 'event1', 'Timestamp' : 1366007100 }
{ 'Data' : 'e012315', 'Type' : 'event2', 'Timestamp' : 1366007102 }
{ 'Data' : 'f834721', 'Type' : 'event3', 'Timestamp' : 1366007103 }
Please help me compose the right query to get the current state of each event type in the database. I need three elements, one of each type, each with the maximal timestamp.

Since you may have ~200 events (as noted in your comments on the question), I would suggest that the most efficient approach is to create a summary document that records the latest event of each type. This avoids the need for separate queries and should be easy to maintain with a $set as new event types are observed.
The summary doc would look like:
{
'Event1' : { 'Data' : 'd327863', 'Timestamp' : 1366007100 },
'Event2' : { 'Data' : 'e012315', 'Timestamp' : 1366007102 },
'Event3' : { 'Data' : 'f834721', 'Timestamp' : 1366007103 }
}
This pre-aggregated report pattern will avoid the need for multiple queries if you often need to find the maximal event by type, and save on (potentially large) space for an index on Timestamp which would otherwise be needed to find the latest entry efficiently.
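A minimal sketch of maintaining that summary document from Python, assuming one summary document with a hypothetical _id of "latest_events" and using the raw Type values ("event1", ...) as keys instead of "Event1". The helper names are illustrative, and apply_set is only an in-memory stand-in for a top-level $set so the resulting state can be shown without a server.

```python
from copy import deepcopy

# Hypothetical _id for the single summary document (not from the answer above).
SUMMARY_ID = "latest_events"


def summary_update(event):
    """$set update recording `event` as the latest one of its Type.

    Keys are the raw Type values ("event1", ...) rather than "Event1".
    """
    return {"$set": {event["Type"]: {"Data": event["Data"],
                                     "Timestamp": event["Timestamp"]}}}


def apply_set(doc, update):
    """In-memory stand-in for a top-level $set, only to show the resulting state."""
    result = deepcopy(doc)
    result.update(update["$set"])
    return result


events = [
    {"Data": "a234235", "Type": "event1", "Timestamp": 1366006599},
    {"Data": "b978543", "Type": "event2", "Timestamp": 1366006600},
    {"Data": "c567921", "Type": "event3", "Timestamp": 1366006601},
    {"Data": "d327863", "Type": "event1", "Timestamp": 1366007100},
    {"Data": "e012315", "Type": "event2", "Timestamp": 1366007102},
    {"Data": "f834721", "Type": "event3", "Timestamp": 1366007103},
]

summary = {"_id": SUMMARY_ID}
for ev in events:  # assumes events arrive in Timestamp order
    summary = apply_set(summary, summary_update(ev))
    # Against a real collection (pymongo) this step would be:
    # db.summary.update_one({"_id": SUMMARY_ID}, summary_update(ev), upsert=True)
```

The ordering assumption matters: an event that arrives late with an older Timestamp would overwrite a newer entry, so out-of-order writers would need an extra Timestamp check before the $set.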

You can use the following JavaScript to get the job done. The drawback is that it assumes your set of event types is fixed; otherwise you need to update the "Type" array.
You can invoke this when connecting to the database.
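a = function () {
    // Event types to look up; edit this array if your set of types changes
    var Type = ["event1", "event2", "event3", "event4", "event5"];
    // "var i" keeps the loop counter local instead of creating a global
    for (var i = 0; i < Type.length; i++) {
        // Newest document of this type: sort by Timestamp descending, keep one
        var myCursor = db.foo.find({Type: Type[i]}).sort({Timestamp: -1}).limit(1);
        myCursor.forEach(printjson);
    }
}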
And this is the output I got for my test case:
MongoDB shell version: 2.2.2
connecting to: test
{
"_id" : ObjectId("516bc991cde2925693705103"),
"Data" : "d327863",
"Type" : "event1",
"Timestamp" : 1366007100
}
{
"_id" : ObjectId("516bc991cde2925693705104"),
"Data" : "e012315",
"Type" : "event2",
"Timestamp" : 1366007102
}
{
"_id" : ObjectId("516bc992cde2925693705105"),
"Data" : "f834721",
"Type" : "event3",
"Timestamp" : 1366007103
}
bye
Also note that by default the shell connects to the test database. A compound index on Type and Timestamp will help performance.
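For readers driving this from Python, the per-type loop above might translate to pymongo roughly as follows. This is a sketch: INDEX_KEYS, EVENT_TYPES, latest_query and latest_per_type are hypothetical names, the collection is passed in (e.g. db.foo), and find_one(..., sort=...) stands in for find().sort().limit(1).

```python
# Compound index: equality on Type first, then Timestamp newest-first.
# Create it once with: coll.create_index(INDEX_KEYS)
INDEX_KEYS = [("Type", 1), ("Timestamp", -1)]

# Hypothetical list of known event types; extend it as new types appear.
EVENT_TYPES = ("event1", "event2", "event3")


def latest_query(event_type):
    """Filter and sort spec for the newest document of one event type."""
    return {"Type": event_type}, [("Timestamp", -1)]


def latest_per_type(coll, types=EVENT_TYPES):
    """One query per type against a pymongo-style collection; returns {type: doc}."""
    latest = {}
    for event_type in types:
        flt, sort = latest_query(event_type)
        doc = coll.find_one(flt, sort=sort)  # find_one returns at most one document
        if doc is not None:  # types with no documents yet are skipped
            latest[event_type] = doc
    return latest
```

Usage would be along the lines of latest_per_type(db.foo); with the compound index in place, each of the per-type queries can be answered from the index.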

You will need a separate query for each event type:
db.yourCollection.find({Type:"event1"}).sort({Timestamp:-1}).limit(1);
db.yourCollection.find({Type:"event2"}).sort({Timestamp:-1}).limit(1);
db.yourCollection.find({Type:"event3"}).sort({Timestamp:-1}).limit(1);
Note that creating an index on the Timestamp field will likely improve the performance a lot.
See also: http://docs.MongoDB.org/manual/reference/method/cursor.sort/

Related

PyMongo delete document by type

Cashbook Collection
{ "_id" : ObjectId("1"), "DR" : "Bank", "CR" : "Roger", "Amount" : "100.00" }
{ "_id" : ObjectId("2"), "DR" : "Bank", "CR" : "Amy", "Amount" : 999.99 }
...
CB = conn['Cashbook']
def CB_del_mult(search, value):
    query = {search: value}
    CB.delete.many(query)
CB.delete_many('Amount'," { $type : 'string' }")
According to the MongoDB docs, I can query by type.
I am attempting to remove all documents from the Cashbook collection whose 'Amount' is a string value. In this example, the first entry should be removed completely because its Amount is the string "100.00" (not the number 100.00).
The above code raises no errors, but it does not delete the string values.
Something like CB.delete_many('Amount',{$type : 'string'} ) raises an invalid syntax error.
Many thanks
JS and Python syntaxes are different and you need to use the correct one for the language you are working with.
Use pymongo documentation to find out the proper method signatures, options etc. to use with pymongo. E.g. https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html for delete_many.
CB.delete_many({'Amount': {'$type': 'string'}})
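A small sketch of the corrected call wrapped in a helper, in the spirit of the question's CB_del_mult. The names type_filter and delete_by_type are hypothetical; coll is any pymongo collection. The key point is that in Python the whole filter is one dict and operators such as "$type" are quoted string keys.

```python
def type_filter(field, bson_type):
    """Filter matching documents whose `field` has the given BSON type alias."""
    # "$type" is just a string key here; a bare $type is not valid Python syntax.
    return {field: {"$type": bson_type}}


def delete_by_type(coll, field, bson_type):
    """Delete every document whose `field` has type `bson_type`; return the count."""
    result = coll.delete_many(type_filter(field, bson_type))  # DeleteResult
    return result.deleted_count
```

For example, delete_by_type(CB, 'Amount', 'string'); running CB.count_documents(type_filter('Amount', 'string')) first shows how many documents would be removed.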

Adding to a double-nested array in MongoDB

I have a double-nested array in my MongoDB schema and I'm trying to add an entirely new array element to a second-level nested array using $push. I'm getting the error "cannot use the part (...) to traverse the element".
The documents have the following structure:
{
"_id" : ObjectId("5d8e37eb46c064790a28a467"),
"org-name" : "Manchester University NHS Foundation Trust",
"domain" : "mft.nhs.uk",
"subdomains" : [ {
"name" : "careers.mft.nhs.uk",
"firstSeen" : "2017-10-06 11:32:00",
"history" : [
{
"a_rr" : "80.244.185.184",
"timestamp" : ISODate("2019-09-27T17:24:57.148Z"),
"asn" : 61323,
"asn_org" : "Ukfast.net Limited",
"city" : null,
"country" : "United Kingdom",
"shodan" : {
"ports" : [
{
"port" : 443,
"versions" : [
"TLSv1",
"-SSLv2",
"-SSLv3",
"TLSv1.1",
"TLSv1.2",
"-TLSv1.3"
],
"cpe" : "cpe:/a:apache:http_server:2.4.18",
"product" : "Apache httpd"
}
],
"timestamp" : ISODate("2019-09-27T17:24:58.538Z")
}
}
]
}
]
}
What I'm attempting to do is refresh the details held in the history array by adding another entire array entry that represents the most recently collected data for the given subdomain name.
The net result is that I will have multiple entries in the history array, each one timestamped with the date that the data was refreshed. That way I have a historical record of changes to any of the data held.
I've read that I can't use $push on a double-nested array, but the other advice, about using arrayFilters, all appears to relate to updating an existing entry in an array rather than simply appending an entirely new document, unless I'm missing something!
I'm using PyMongo and would simply like to build a new dictionary containing all of the data elements and append it to the history.
Thanks!
Straightforward in pymongo:
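# Read the whole document, append a new entry to the first subdomain's history, write it back.
# Note: this is not atomic; a change made by another client between find_one and replace_one would be lost.
record = db.mycollection.find_one()
record['subdomains'][0]['history'].append({'another': 'record'})
db.mycollection.replace_one({'_id': record['_id']}, record)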
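If you would rather keep the question's $push approach, a single update can append to the nested history array: the positional $ operator handles one array level, which is all that is needed here because the new entry is pushed rather than matched inside history. A sketch under that assumption; push_history and push_history_filtered are hypothetical helpers that only build the documents for pymongo's update_one, and the arrayFilters variant needs MongoDB 3.6 or later.

```python
def push_history(domain_id, subdomain_name, entry):
    """Filter/update pair appending `entry` to one subdomain's history array.

    The filter must mention "subdomains.name" so that the positional "$"
    in the update refers to the matched element of subdomains.
    """
    flt = {"_id": domain_id, "subdomains.name": subdomain_name}
    update = {"$push": {"subdomains.$.history": entry}}
    return flt, update


def push_history_filtered(domain_id, subdomain_name, entry):
    """Same append expressed with an arrayFilters identifier ("sub")."""
    flt = {"_id": domain_id}
    update = {"$push": {"subdomains.$[sub].history": entry}}
    array_filters = [{"sub.name": subdomain_name}]
    return flt, update, array_filters
```

Usage would look like flt, upd = push_history(record['_id'], 'careers.mft.nhs.uk', new_entry) followed by db.mycollection.update_one(flt, upd), or, for the second form, db.mycollection.update_one(flt, upd, array_filters=array_filters).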

Using MongoDB field as key reference for another value

I'm still new to MongoDB. In an aggregate query, I'm trying to use the value of one field as the key for getting another value. For example, in the following document,
{
"_id" : ObjectId("5d9c245bb6c0ac7a34a43bf5"),
"status" : {
"code" : "ES004",
"params" : {
"star" : "VSP"
}
},
"description" : "{star} has been added to the cast/crew officially",
"stringToReplace" : "{star}",
"valueToReplace" : "status.params.star"
}
I want to replace the value of stringToReplace in description with the value referenced by valueToReplace (i.e., the value in status.params.star, which is VSP).
Is there any way of doing this in an aggregate query, or is the only way to do this in the Mongo shell?
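This question has no answer in the thread. One client-side option, which does not attempt the substitution inside the aggregation pipeline, is to resolve the dotted path in Python after fetching the documents (for example from pymongo's find() or aggregate()). get_path and render_description are hypothetical helpers, and the sketch assumes every path segment is a plain nested-document key (no array indexes).

```python
def get_path(doc, dotted):
    """Follow a dotted path such as "status.params.star" through nested dicts."""
    value = doc
    for part in dotted.split("."):
        value = value[part]  # raises KeyError if the path does not exist
    return value


def render_description(doc):
    """Replace doc["stringToReplace"] in doc["description"] with the referenced value."""
    value = get_path(doc, doc["valueToReplace"])
    return doc["description"].replace(doc["stringToReplace"], str(value))
```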

MONGODB - cast type of every object in array of objects

I have a MongoDB collection where some documents have arrays of objects. One of the fields of these objects is timestamp.
The problem is that, historically, some of the timestamp values are strings (e.g. '2018-02-25T13:33:56.675000') or Dates, and others are doubles (e.g. 1528108521726.26).
I have to convert all of them to Double.
I've built the query to get all the documents with the problematic type:
db.getCollection('Cases').find({sent_messages: {$elemMatch:{timestamp: {$type:[2, 9]}}}})
And I also know how to convert Date-string to double using JS:
new Date("2018-02-18T06:39:20.797Z").getTime()
> 1518935960797
But I can't build the proper query to perform the update.
Here is an example of such a document:
{
"_id" : ObjectId("6c88f656532aab00050dc023"),
"created_at" : ISODate("2018-05-18T03:43:18.986Z"),
"updated_at" : ISODate("2018-05-18T06:39:20.798Z"),
"sent_messages" : [
{
"timestamp" : ISODate("2018-02-18T06:39:20.797Z"),
"text" : "Hey",
"sender" : "me"
}
],
"status" : 1
}
After the update it should be:
{
"_id" : ObjectId("6c88f656532aab00050dc023"),
"created_at" : ISODate("2018-05-18T03:43:18.986Z"),
"updated_at" : ISODate("2018-05-18T06:39:20.798Z"),
"sent_messages" : [
{
"timestamp" : 1518935960797.00,
"text" : "Hey",
"sender" : "me"
}
],
"status" : 1
}
As I understand your question, you first fetch the records:
db.getCollection('Cases').find({sent_messages: {$elemMatch:{timestamp: {$type:[2, 9]}}}})
Then convert the date in JS:
new Date("2018-02-18T06:39:20.797Z").getTime()
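And then run an update query. The positional $ operator needs the array condition in the filter, and the new value should be a number rather than a string:
db.getCollection('Cases').updateOne({_id: ObjectId("6c88f656532aab00050dc023"), sent_messages: {$elemMatch: {timestamp: {$type: [2, 9]}}}}, { $set: { "sent_messages.$.timestamp" : 1518935960797 }})
This updates only the first matching element of sent_messages. If you want to update all records, you need to loop over them (e.g. with forEach); I think you have already implemented this.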
Hope this may help you.
In the end, I did it with JS code that can be run in the mongo console:
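db.getCollection('Cases').find({sent_messages: {$elemMatch: {timestamp: {$type: [2, 9]}}}}).forEach(function (doc) {
    print('=================');
    print(JSON.stringify(doc));  // document before the change
    doc.sent_messages.forEach(function (msg) {
        // Only convert strings (type 2) and Dates (type 9); leave existing doubles untouched,
        // otherwise new Date(double).getTime() would drop their fractional part
        if (typeof msg.timestamp === 'string' || msg.timestamp instanceof Date) {
            var dbl = new Date(msg.timestamp).getTime();  // milliseconds since the epoch
            print(dbl);
            msg.timestamp = dbl;
        }
    });
    print(JSON.stringify(doc));  // document after the change
    // Write the whole document back (db.Cases.save(doc) also works in the legacy mongo shell)
    db.Cases.replaceOne({_id: doc._id}, doc);
});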
Thanks all for your help!
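For comparison, the same migration can be sketched in Python with pymongo. Assumptions: naive datetimes (pymongo's default) and strings without an offset are treated as UTC, which may differ from how the mongo shell's Date parses such strings; values that are already numbers are left unchanged; to_epoch_ms, convert_messages and migrate are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Same filter as the shell version: string (2) or Date (9) timestamps.
PROBLEM_FILTER = {"sent_messages": {"$elemMatch": {"timestamp": {"$type": [2, 9]}}}}


def to_epoch_ms(value):
    """Convert a string/datetime timestamp to float milliseconds since the epoch.

    Numbers are returned as floats unchanged. Naive datetimes and offset-less
    strings are ASSUMED to be UTC.
    """
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return (value - EPOCH) / timedelta(milliseconds=1)


def convert_messages(doc):
    """Return a copy of doc["sent_messages"] with every timestamp as a double."""
    return [dict(msg, timestamp=to_epoch_ms(msg["timestamp"]))
            for msg in doc["sent_messages"]]


def migrate(coll):
    """Rewrite sent_messages for every affected document in a pymongo collection."""
    for doc in coll.find(PROBLEM_FILTER):
        coll.update_one({"_id": doc["_id"]},
                        {"$set": {"sent_messages": convert_messages(doc)}})
```

Setting the whole sent_messages array avoids the positional operator's one-element limitation, at the cost of rewriting every message in the affected documents.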

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB database is made up of two main collections:
1) Maps
{
"_id" : ObjectId("542489232436657966204394"),
"fileName" : "importFile1.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042e9")
},
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042ea")
}
]
},
{
"_id" : ObjectId("542489262436657966204398"),
"fileName" : "importFile2.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("542489232436657966204395")
}
],
"uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects:
{
"_id" : ObjectId("5424892224366579662042e9"),
"name" : "Afghanistan",
"area" : 653958
},
{
"_id" : ObjectId("5424892224366579662042ea"),
"name" : "Angola",
"area" : 1252651
},
{
"_id" : ObjectId("542489232436657966204395"),
"name" : "Unknown",
"area" : 0
}
My objective is to list every map with its cumulative area and number of territories. I am trying the following query:
db.maps.aggregate(
{'$unwind':'$territories'},
{'$group':{
'_id':'$fileName',
'numberOf': {'$sum': '$territories.name'},
'locatedArea':{'$sum':'$territories.area'}
}
})
However, the results show 0 for each of these values:
{
"result" : [
{
"_id" : "importFile2.json",
"numberOf" : 0,
"locatedArea" : 0
},
{
"_id" : "importFile1.json",
"numberOf" : 0,
"locatedArea" : 0
}
],
"ok" : 1
}
I probably did something wrong when trying to access the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo docs. area is stored as an integer, and name as a string.
Yes indeed, the field "territories" has an array of database references and not the actual documents. DBRefs are objects that contain information with which we can locate the actual documents.
You can clearly see this in the example above by running the following mongo query:
db.maps.find({"_id":ObjectId("542489232436657966204394")}).forEach(function(doc){print(doc.territories[0]);})
it will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
So '$sum': '$territories.name' and '$sum': '$territories.area' show 0, because a DBRef has no fields such as name or area (and $sum ignores non-numeric values such as the string name in any case).
So you need to resolve each reference to its document before doing something like $territories.name.
To achieve what you want, you can use the cursor's map() function in the mongo shell, since neither aggregation nor map-reduce supports subqueries, and each map document is already self-contained, with references to its territories.
Steps to achieve:
a) get each map
b) resolve the `DBRef`.
c) calculate the total area, and the number of territories.
d) make and return the desired structure.
Mongo shell script:
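db.maps.find().map(function(doc) {
    var refName = "territories";  // collection the DBRefs point to (overwritten below)
    // Collect the referenced _ids, remembering which collection they live in
    var territory_refs = doc.territories.map(function(terr_ref) {
        refName = terr_ref.$ref;
        return terr_ref.$id;
    });
    var areaSum = 0;
    // Resolve all references with one $in query; db.getCollection(refName) uses the
    // variable, whereas db.refName would look up a collection literally named "refName"
    db.getCollection(refName).find({
        "_id" : {
            $in : territory_refs
        }
    }).forEach(function(i) {
        areaSum += i.area;
    });
    return {
        "id" : doc.fileName,
        "noOfTerritories" : territory_refs.length,
        "areaSum" : areaSum
    };
})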
o/p:
[
{
"id" : "importFile1.json",
"noOfTerritories" : 2,
"areaSum" : 1906609
},
{
"id" : "importFile2.json",
"noOfTerritories" : 1,
"areaSum" : 0
}
]
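The same steps (a) to (d) can be sketched client-side in Python. ref_parts and summarize_map are hypothetical helpers; fetch is whatever returns the referenced documents, for instance a pymongo $in lookup as in the shell script. pymongo decodes DBRefs as bson.dbref.DBRef objects exposing .collection and .id, while plain {"$ref", "$id"} dicts are also accepted so the sketch runs without pymongo.

```python
from collections import defaultdict


def ref_parts(ref):
    """(collection, id) of a DBRef: bson.dbref.DBRef object or {"$ref", "$id"} dict."""
    if isinstance(ref, dict):
        return ref["$ref"], ref["$id"]
    return ref.collection, ref.id


def summarize_map(map_doc, fetch):
    """Resolve a map's territory DBRefs and return its count and total area.

    `fetch(collection_name, ids)` must return the referenced documents, e.g. in
    pymongo: lambda name, ids: db[name].find({"_id": {"$in": ids}}, {"area": 1})
    """
    ids_by_coll = defaultdict(list)  # a) + b): group referenced ids per collection
    for ref in map_doc.get("territories", []):
        coll_name, ref_id = ref_parts(ref)
        ids_by_coll[coll_name].append(ref_id)
    area_sum = 0  # c): total area of the resolved documents
    for coll_name, ids in ids_by_coll.items():
        for territory in fetch(coll_name, ids):
            area_sum += territory.get("area", 0)
    return {  # d): same shape as the shell output above
        "id": map_doc["fileName"],
        "noOfTerritories": sum(len(ids) for ids in ids_by_coll.values()),
        "areaSum": area_sum,
    }
```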
Map-Reduce functions should not be, and cannot be, used to resolve DBRefs on the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the function (i.e. side effects.)
The reduce function should not access the database, even to perform read operations. The reduce function should not affect the outside system.
Moreover, a reduce function, even if used (which could never work anyway), would never be called for your problem, since a group keyed on "fileName" or "ObjectId" would always contain only one document in your dataset.
MongoDB will not call the reduce function for a key that has only a single value.