MongoDB: Missing fields after sort() when using projection - mongodb

So I have a database filled with image information, and I want to retrieve a subset of the fields sorted by ascending date. I use the following query to retrieve the aggregated set:
db.images.find({}, {rel_path: 1, date: 1}).sort({'date.year': 1, 'date.month': 1})
I expect this query to return a set looking something like this:
{
"_id": ObjectId("530deb1060832c64291a11a7"),
"date": { "year": 2006, "month": 2 },
"rel_path": "/mnt/backup/Backup/Photos/asdfasdfasdf.jpg"
}
{
"_id": ObjectId("530de1db60832c64291a05ec"),
"date": { "year": 2006, "month": 5 },
"rel_path": "/mnt/backup/Backup/Photos/qweqweqwe.jpg"
}
... <more documents> ...
What I get, however, looks like this:
{
"_id": ObjectId("530deb1060832c64291a11a7"),
"rel_path": "/mnt/backup/Backup/Photos/asdfasdfasdf.jpg"
}
{
"_id": ObjectId("530de1db60832c64291a05ec"),
"rel_path": "/mnt/backup/Backup/Photos/qweqweqwe.jpg"
}
... <more documents> ...
If I skip the 'sort()' I get all fields from my projection, so it seems the 'date' field somehow is removed by the 'sort()' call.
Anyone have any idea what's going on here?
Edit: Here's a sample document by request:
{
"_id" : ObjectId("530de16860832c64291a0562"),
"orientation" : 1,
"camera_make" : "Apple",
"camera_model" : "iPhone 4",
"rel_path" : "Bröllopsbilder/IMG_0997.JPG",
"file_size" : 1827977,
"date" : { "month" : "10", "year" : "2011" },
"root" : "/mnt/backup/Backup/Bilder/",
"md5" : "fb26ebf24914d515144be5e53797744b"
}

The find() query looks fine and works as expected; I tested it against a similar data set.
This can happen when some documents in the collection do not have the "date" field. An ascending sort treats a missing field like null and places those documents first, so the first results you see are the ones without "date". Try adding a filter to the find query so that only documents where the "date" field exists are returned, using the $exists operator:
db.images.find({date:{$exists:true}}, {rel_path: 1, date: 1})
.sort({'date.year': 1, 'date.month': 1})
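To see why documents without "date" would show up first, here is a minimal sketch in plain JavaScript (no MongoDB involved). It only imitates the ordering rule described above; the sample documents and the helper names getPath, compareNullFirst and sortByDate are made up for illustration.

```javascript
// Plain-JavaScript simulation of an ascending sort in which a missing
// field is treated like null and therefore sorts before real values.
const docs = [
  { rel_path: "a.jpg", date: { year: 2006, month: 5 } },
  { rel_path: "b.jpg" }, // hypothetical document with no "date" field
  { rel_path: "c.jpg", date: { year: 2006, month: 2 } },
];

// Return the nested value, or null when any part of the path is missing.
function getPath(doc, path) {
  let value = doc;
  for (const key of path.split(".")) {
    if (value === null || value === undefined) return null;
    value = value[key];
  }
  return value === undefined ? null : value;
}

// Compare two values so that null (missing) comes before everything else.
function compareNullFirst(a, b) {
  if (a === null && b === null) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sort a copy of the list by date.year, then date.month, both ascending.
function sortByDate(list) {
  return [...list].sort(
    (x, y) =>
      compareNullFirst(getPath(x, "date.year"), getPath(y, "date.year")) ||
      compareNullFirst(getPath(x, "date.month"), getPath(y, "date.month"))
  );
}

const sorted = sortByDate(docs);
console.log(sorted.map((d) => d.rel_path)); // the document without "date" comes first
```

If the first page of real results looks like b.jpg here, the projection is working and the field is simply absent in those documents.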

Related

Display field document in mongoDB after execute Query aggregate

This is an example of a data document
{
"_id" : ObjectId("5f437e7846103b2ad0fc5d7d"),
"order_no" : "O-200824-AGFJDQW",
"shipment_no" : "S-200824-AGWCRRM",
"member_id" : 2200140,
"ponta_id" : "9990010100280214",
"plu" : 14723,
"product_name" : "AQUA Air Mineral Botol Air Pet 600ml",
"qty" : 2,
"store_id" : "TD46",
"stock_on_hand" : 0,
"transaction_date" : ISODate("2020-08-24T08:28:29.931Z"),
"created_date" : ISODate("2020-08-24T08:46:48.441Z")
}
This is the query that I run:
var bulan = 12 //month is written with number. example: August = 8
db.log_stock_oos.aggregate([
{
$project: {
month: {
$month: '$transaction_date'
}
}
},
{
$match: {month: bulan}
}
]);
But the result looks like this after I run the query:
{
"_id" : ObjectId("5f44689607fe453fbfba433e"),
"month" : 12
}
How can I make the output look exactly like the document shown above?
this is my reference
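In a projection, a value of 1 includes a field and a value of 0 excludes it. As soon as you include fields explicitly, every field you do not list (apart from _id) is dropped from the output, which is why only _id and month come back. See the MongoDB documentation on projection.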
You can do two things
Use the projection
db.collection.aggregate([
    {
        $project: {
            month: {
                $month: "$transaction_date"
            },
            order_no: 1,
            shipment_no: 1,
            member_id: 1,
            // other fields like above with the value 1
        }
    },
    // match stages
])
Use $addFields
Use $addFields instead of $project in your code. It creates the field if it does not exist in the document; otherwise it overwrites the existing field.
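As a rough sketch of the $addFields option, the pipeline from the question could look like the following. $addFields requires MongoDB 3.4 or later; the value 8 for bulan is an assumption chosen to match the August sample document, and the pipeline is built as a plain array so it can be inspected before it is run.

```javascript
// Hypothetical month filter; 8 = August, matching the sample document.
const bulan = 8;

const pipeline = [
  // Add a computed "month" field while keeping every original field.
  { $addFields: { month: { $month: "$transaction_date" } } },
  // Keep only the documents from the requested month.
  { $match: { month: bulan } },
];

// In the mongo shell this would be run as:
//   db.log_stock_oos.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

The matching documents then come back with all of their original fields plus the extra month field.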

MongoDb How to group by month and year from string

I have a field dateStr in my collection:
{ .... "dateStr" : "07/01/2020" .... }
{ .... "dateStr" : "07/01/1970" .... }
I want to group by month and year from dateStr field
I have tried
db.collection.aggregate(
{$project : {
month : {$month : new Date("$dateStr")},
year : {$year : new Date("$dateStr")}
}},
{$group : {
_id : {month : "$month" ,year : "$year" },
count : {$sum : 1}
}})
Output :
{
"result" : [
{
"_id" : {
"month" : 1,
"year" : 1970
},
"count" : 2
}
],
"ok" : 1
}
But I have two years, 1970 and 2020. Why am I getting a single record?
You cannot use the date aggregation operators on anything other than a Date object itself. Your best option is to convert these "strings" to proper Date objects so you can query correctly in this and future operations.
That said, if your "strings" always have a common structure then there is a way to do this with the aggregation framework tools. It requires a lot of manipulation, though, which does not make this an "optimal" approach to dealing with the problem. But with a set structure of "double digits" and a consistent delimiter this is possible with the $substr operator:
db.collection.aggregate([
    { "$group": {
        "_id": {
            // $substr takes a zero-based start index and a length;
            // in "07/01/2020" the year starts at index 6, the month at index 3
            "year": { "$substr": [ "$dateStr", 6, 4 ] },
            "month": { "$substr": [ "$dateStr", 3, 2 ] }
        },
        "count": { "$sum": 1 }
    }}
])
So JavaScript casting does not work inside the aggregation framework. You can always "feed" input to the pipeline based on "client code" evaluation, but the aggregation process itself does not evaluate any code. Just like the basic query engine, this is all based on a "data structure" implementation that uses "native operator" instructions to do the work.
At the time of writing you cannot convert strings to dates in the aggregation pipeline (MongoDB 3.6 and later add the $dateFromString operator for this). You should work with real BSON Date objects, but you can do it with strings if there is a consistent format that you can present in a "lexical order".
I still suggest that you convert these to BSON Dates ASAP. And beware that the "ISODate" or UTC value is constructed with a different string form, i.e.:
new Date("2020-01-07")
That is, in "yyyy-mm-dd" format, at least for the JavaScript invocation.
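For the suggested conversion to real dates, a client-side helper along these lines could be used while rewriting the documents. This is only a sketch: parseDayMonthYear is a made-up name, and it assumes the strings are day first ("dd/mm/yyyy"), as the new Date("2020-01-07") example above implies.

```javascript
// Parse a "dd/mm/yyyy" string into a Date at midnight UTC.
// Assumption: the day comes first and both day and month are two digits.
function parseDayMonthYear(dateStr) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(dateStr);
  if (!match) {
    throw new Error("Unexpected date format: " + dateStr);
  }
  const [, day, month, year] = match;
  // Date.UTC expects a zero-based month index, hence the "- 1".
  return new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
}

const parsed = parseDayMonthYear("07/01/2020");
console.log(parsed.toISOString()); // 2020-01-07T00:00:00.000Z
```

Once the field holds real Date values, operators such as $month and $year can be applied to it directly in the pipeline.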

Mongodb Update/Upsert array exact match

I have a collection :
gStats : {
"_id" : "id1",
"criteria" : [{"key1":"value1"}, {"key2":"value2"}],
"groups" : [
{"id":"XXXX", "visited":100, "liked":200},
{"id":"YYYY", "visited":30, "liked":400}
]
}
I want to be able to update a document in the stats array for a given array of criteria (exact match).
I try to do this in 2 steps:
Pull the stat document with a given "id" from the array:
db.gStats.update({
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2096955"},{"value1" : "2015610"}]}
},
{
$pull : {groups : {"id" : "XXXX"}}
}
)
Push the new document
db.gStats.findAndModify({
query : {
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2015610"}, {"key2" : "2096955"}]}
},
update : {
$push : {groups : {"id" : "XXXX", "visited" : 29, "liked" : 144}}
},
upsert : true
})
The Pull query works perfect.
The Push query gives an error :
2014-12-13T15:12:58.571+0100 findAndModifyFailed failed: {
"value" : null,
"errmsg" : "exception: Cannot create base during insert of update. Caused by :ConflictingUpdateOperators Cannot update 'criteria' and 'criteria' at the same time",
"code" : 12,
"ok" : 0
} at src/mongo/shell/collection.js:614
Neither query is really working. You cannot use a key name like "criteria" more than once unless it is under an operator such as $and. You are also specifying different fields (i.e. groups) and querying elements that do not exist in your sample document.
So it is hard to tell what you really want to do here. But the error is essentially caused by the first issue I mentioned, with a little something extra. Because the key is repeated, your { "$size": 2 } condition is being ignored and only the second condition is applied.
A valid query form should look like this:
query: {
"$and": [
{ "criteria" : { "$size" : 2 } },
{ "criteria" : { "$all": [{ "key1": "2015610" }, { "key2": "2096955" }] } }
]
}
As each set of conditions is specified within the array provided by $and the document structure of the query is valid and does not have a hash-key name overwriting the other. That's the proper way to write your two conditions, but there is a trick to making this work where the "upsert" is failing due to those conditions not matching a document. We need to overwrite what is happening when it tries to apply the $all arguments on creation:
update: {
"$setOnInsert": {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
},
"$push": { "stats": { "id": "XXXX", "visited": 29, "liked": 144 } }
}
This uses $setOnInsert so that when the "upsert" is applied and a new document is created, the values specified here are used instead of the field values set in the query portion of the statement.
Of course, if what you are really looking for is truly an exact match of the content in the array, then just use that for the query instead:
query: {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
}
Then MongoDB will be happy to apply those values when a new document is created and does not get confused on how to interpret the $all expression.
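The point about repeated key names can be checked in plain JavaScript without a database. This is a small sketch; the variable names duplicateKeys and withAnd are made up for illustration.

```javascript
// A JavaScript object literal keeps only the last value for a repeated key,
// so the $size condition below silently disappears.
const duplicateKeys = {
  criteria: { $size: 2 },
  criteria: { $all: [{ key1: "2015610" }, { key2: "2096955" }] },
};
console.log(Object.keys(duplicateKeys)); // [ 'criteria' ]
console.log(duplicateKeys.criteria); // only the $all condition survives

// Wrapping each condition in $and keeps both, because each one
// lives in its own object inside the array.
const withAnd = {
  $and: [
    { criteria: { $size: 2 } },
    { criteria: { $all: [{ key1: "2015610" }, { key2: "2096955" }] } },
  ],
};
console.log(withAnd.$and.length); // 2
```

The same thing happens to the query document before it is sent to the server, which is why the $size condition never takes part in the match.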

In a Mongo collection, how do you query for a specific object in an array?

I'm trying to retrieve an object from an array in mongodb. Below is my document:
{
"_id" : ObjectId("53e9b43968425b29ecc87ffd"),
"firstname" : "john",
"lastname" : "smith",
"trips" : [
{
"submitted" : 1407824585356,
"tripCategory" : "staff",
"tripID" : "1"
},
{
"tripID" : "2",
"tripCategory" : "volunteer"
},
{
"tripID" : "3",
"tripCategory" : "individual"
}
]
}
My ultimate goal is to update only when trips.submitted is absent, so I thought I could query and determine what the mongo find behavior would look like if I used the $and query operator. So I try this:
db.users.find({
$and: [
{ "trips.tripID": "1" },
{ "trips": { $elemMatch: { submitted: { $exists: true } } } }
]
},
{ "trips.$" : 1 } //projection limits to the FIRST matching element
)
and I get this back:
{
"_id" : ObjectId("53e9b43968425b29ecc87ffd"),
"trips" : [
{
"submitted" : 1407824585356,
"tripCategory" : "staff",
"tripID" : "1"
}
]
}
Great. This is what I want. However, when I run this query:
db.users.find({
$and: [
{ "trips.tripID": "2" },
{ "trips": { $elemMatch: { submitted: { $exists: true } } } }
]
},
{ "trips.$" : 1 } //projection limits to the FIRST matching element
)
I get the same result as the first! So I know there's something odd about my query that isn't correct. But I don't know what. The only thing I've changed between the queries is "trips.tripID" : "2", which in my head, should have prompted mongo to return no results. What is wrong with my query?
If you know the array is in a specific order, you can refer to a specific index in the array like this:
db.users.find({"trips.0.submitted" : {$exists:true}})
Or you could simply element match on both values:
db.users.find({"trips" : {$elemMatch : {"tripID" : "1",
    "submitted" : {$exists:true}
}}})
Your query, by contrast, is looking for a document where both are true, not an element within the trips field that holds for both.
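The difference between the two kinds of matching can be imitated with plain JavaScript arrays. This is a simplified sketch, not how the server evaluates queries; sampleTrips copies the array from the question, and matchesSeparately and matchesSameElement are made-up helper names.

```javascript
// The trips array from the question's document.
const sampleTrips = [
  { submitted: 1407824585356, tripCategory: "staff", tripID: "1" },
  { tripID: "2", tripCategory: "volunteer" },
  { tripID: "3", tripCategory: "individual" },
];

// Document-level matching: each condition may be satisfied by a
// different array element, as in the question's $and query.
function matchesSeparately(list, tripID) {
  return (
    list.some((t) => t.tripID === tripID) &&
    list.some((t) => "submitted" in t)
  );
}

// Element-level matching: a single element must satisfy both
// conditions, as with one $elemMatch holding both criteria.
function matchesSameElement(list, tripID) {
  return list.some((t) => t.tripID === tripID && "submitted" in t);
}

console.log(matchesSeparately(sampleTrips, "2")); // true
console.log(matchesSameElement(sampleTrips, "2")); // false
console.log(matchesSameElement(sampleTrips, "1")); // true
```

Trip "2" passes the document-level check only because trip "1" supplies the submitted field, which mirrors the surprising result in the question.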
The output for your query is correct. Your query asks mongo to return a document which has the given tripID and the field submitted somewhere within its trips array. The document you have provided in your question satisfies both conditions for both tripIDs. You are getting the first element in the array trips because of your projection.
I have assumed you will be filtering records by the person's name and then retrieving the elements inside trips based on the field-exists criteria. The output you are expecting can be obtained using the following:
db.users.aggregate([
    { $match: { "firstname" : "john", "lastname" : "smith" } },
    { $unwind: "$trips" },
    { $match: {
        "trips.tripID": "1",
        "trips.submitted": { $exists: true }
    } }
])
The aggregation pipeline works as follows. The first $match operator filters one document (in this case the document for john smith). The $unwind operator unwinds the specified array (trips in this case), in effect denormalizing the sub-records associated with the parent record: each array element becomes its own document. The second $match operator filters the unwound documents further to obtain the one required by your query.
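The unwind-then-match steps can be pictured with a few lines of plain JavaScript. This only imitates the idea on an in-memory object; the unwind helper is a made-up name and not part of any MongoDB API.

```javascript
// The user document from the question (without _id, for brevity).
const user = {
  firstname: "john",
  lastname: "smith",
  trips: [
    { submitted: 1407824585356, tripCategory: "staff", tripID: "1" },
    { tripID: "2", tripCategory: "volunteer" },
    { tripID: "3", tripCategory: "individual" },
  ],
};

// $unwind-like step: emit one copy of the parent per array element,
// with the array field replaced by that single element.
function unwind(doc, field) {
  return doc[field].map((element) => ({ ...doc, [field]: element }));
}

const unwound = unwind(user, "trips");

// Second $match-like step: both conditions now apply to the same trip.
const matched = unwound.filter(
  (d) => d.trips.tripID === "1" && "submitted" in d.trips
);

console.log(unwound.length); // 3
console.log(matched.length); // 1
```

Filtering for tripID "2" in the last step would leave nothing, which is the behaviour the question was expecting.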

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that the 'minute' subdocument only ever holds the last hour: { minute: metric } entry. It does not create new entries for other hours; it always overwrites the one entry.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute') to a particular value. The only reason it is an addition the first time you update is that the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, this is required for a new entry to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
I'm setting the metrics like this:
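import pytz  # needed for the UTC conversion below

data = {"$set": {}}
for metric in csv:  # csv: the parsed rows, each with 'date' and 'metric'
    date_utc = metric['date'].astimezone(pytz.utc)
    # Dot notation targets a single leaf, e.g. "minute.16.45"
    data["$set"]["minute.%d.%d" % (date_utc.hour,
                                   date_utc.minute)] = float(metric['metric'])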
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
When self.collection.update(query, data, upsert=True) is run, it updates only those fields and leaves the other hour and minute entries in place.
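Why the dotted keys behave differently from setting the whole 'minute' field can be sketched in plain JavaScript (used here instead of pymongo so it runs without a database). applySet is a made-up, heavily simplified stand-in for $set and only handles the two cases discussed in this thread.

```javascript
// Simplified stand-in for $set: each key is either a plain field name
// or a dotted path, and the value is written at that location.
function applySet(doc, setSpec) {
  for (const [path, value] of Object.entries(setSpec)) {
    const keys = path.split(".");
    let target = doc;
    // Walk (and create) intermediate objects for every key but the last.
    for (const key of keys.slice(0, -1)) {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return doc;
}

// Setting the whole "minute" field replaces the earlier hour.
const replaced = { minute: { 16: { 45: 1.6693091 } } };
applySet(replaced, { minute: { 17: { 48: 1.6693391 } } });
console.log(Object.keys(replaced.minute)); // [ '17' ]

// A dotted path only touches the named leaf, so the earlier hour survives.
const merged = { minute: { 16: { 45: 1.6693091 } } };
applySet(merged, { "minute.17.48": 1.6693391 });
console.log(Object.keys(merged.minute)); // [ '16', '17' ]
```

The first call mirrors the behaviour described in the question, and the second mirrors the 'minute.16.45' style keys built in the loop above.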