MongoDB get object id by finding on another column value - mongodb

I am new to querying dbs and especially mongodb.If I run :
db.<customers>.find({"contact_name: Anny Hatte"})
I get:
{
"_id" : ObjectId("55f7076079cebe83d0b3cffd"),
"company_name" : "Gap",
"contact_name" : "Anny Hatte",
"email" : "ahatte#gmail.com"
}
I wish to get the value of the "_id" attribute from this query result. How do I achieve that?
Similarly, if I have another collection, named items, with the following data:
{
"_id" : ObjectId("55f7076079cebe83d0b3d009"),
"_customer" : ObjectId("55f7076079cebe83d0b3cfda"),
"school" : "St. Patrick's"
}
Here, the "_customer" field is the "_id" of the customer collection (the previous collection). I wish to get the "_id", the "_customer" and the "school" field values for the record where "_customer" of items-collection equals "_id" of customers-collection.
How do I go about this?

I wish to get the value of the "_id" attribute from this query result.
How do I achieve that?
The find() method returns a cursor to the results, which you can iterate and retrieve the documents in the result set. You can do this using forEach().
var cursor = db.customers.find({"contact_name: Anny Hatte"});
cursor.forEach(function(customer){
//access all the attributes of the document here
var id = customer._id;
})
You could make use of the aggregation pipeline's $lookup stage that has been introduced as part of 3.2, to look up and fetch the matching rows in some other related collection.
db.customers.aggregate([
{$match:{"contact_name":"Anny Hatte"}},
{$lookup:{
"from":"items",
"localField":"_id",
"foreignField":"_customer",
"as":"items"
}}
])
In case you are using a previous version of mongodb where the stage is not supported, then, you would need to fire an extra query to lookup the items collection, for each customer.
db.customers.find(
{"contact_name":"Anny Hatte"}).map(function(customer){
customer["items"] = [];
db.items.find({"_customer":customer._id}).forEach(function(item){
customer.items.push(item);
})
return customer;
})

Related

pymongo update creates a new record without upserting

I have an issue where I do an update on a document, however, the update creates a new document and I'm not upserting in my update.
This is my testing code.
I do a find to see if the document exists by checking if "lastseen" doesn't exist:
result = DATA_Collection.find({"sessionID":"12345","lastseen":{"$exists":False}})
if result.count() == 1:
DATA_Collection.update({"sessionID":"12345"},{"$set":{"lastseen":"2021-05-07"}})
When I do an aggregate check to find duplicates I get a few, one example below.
> db.DATA_Collection.find({ "sessionID" : "237a5fb8" })
{ "_id" : ObjectId("60bdf7b05c961b4d27d33bde"), "sessionID" : "237a5fb8", "firstseen" : ISODate("1970-01-19T20:51:09Z"), "lastseen" : ISODate("2021-06-07T12:34:20Z") }
{ "_id" : ObjectId("60bdf7fa7d35ea0f046a2514"), "sessionID" : "237a5fb8", "firstseen" : ISODate("1970-01-19T20:51:09Z") }
I remove all the records in the collection and rerun the script, the same happens again.
Any advice will be much appreciated.
Firstly your pymongo commands are deprecated; use update_one() or update_many() instead of update(); count_documents() instead of count().
Secondly double check you are referencing the same collections as you mention DATA_Collection and VPN_DATA;
How are you defining a "duplicate"? Unless you create a unique index on the field(s), the records won't be duplicates as they have different _id fields.
You need something like:
record = db.VPN_DATA.find_one({'sessionID': '12345', 'lastseen': {'$exists': False}})
if record is not None:
db.VPN_DATA.update_one({'_id': record.get('_id')}, {'$set': {'lastseen': '2021-05-07'}})

How to check if a portion of an _id from one collection appears in another

I have a collection where the _id is of the form [message_code]-[language_code] and another where the _id is just [message_code]. What I'd like to do is find all documents from the first collection where the message_code portion of the _id does not appear in the second collection.
Example:
> db.colA.find({})
{ "_id" : "TRM1-EN" }
{ "_id" : "TRM1-ES" }
{ "_id" : "TRM2-EN" }
{ "_id" : "TRM2-ES" }
> db.colB.find({})
{ "_id" : "TRM1" }
I want a query that will return TRM2-EN and TRM-ES from colA. Of course in my live data, there are thousands of records in each collection.
According to this question which is trying to do something similar, we have to save the results from a query against colB and use it in an $in condition in a query against colA. In my case, I need to strip the -[language_code] portion before doing this comparison, but I can't find a way to do so.
If all else fails, I'll just create a new field in colA that contains only the message code, but is there a better way do it?
Edit:
Based on Michael's answer, I was able to come up with this solution:
var arr = db.colB.distinct("_id")
var regexs = arr.map(function(elm){
return new RegExp(elm);
})
var result = db.colA.find({_id : {$nin : regexs}}, {_id : true})
Edit:
Upon closer inspection, the above method doesn't work after all. In the end, I just had to add the new field.
Disclaimer: This is a little hack it may not end well.
Get distinct _id using collection.distinct method.
Build a regular expression array using Array.prototype.map()
var arr = db.colB.distinct('_id');
arr.map(function(elm, inx, tab) {
tab[inx] = new RegExp(elm);
});
db.colA.find({ '_id': { '$nin': arr }})
I'd add a new field to colA since you can index it and if you have hundreds of thousands of documents in each collection splitting the strings will be painfully slow.
But if you don't want to do that you could make use of the aggregation framework's $substr operator to extract the [message-code] then do a $match on the result.

Override existing Docs in production MongoDB

I have recently changed one of my fields from object to array of objects.
In my production I have only 14 documents with this field, so I decided to change those fields.
Is there any best practices to do that?
As it is in my production I need to do it in a best way possible?
I got the document Id's of those collections.like ['xxx','yyy','zzz',...........]
my doc structure is like
_id:"xxx",option1:{"op1":"value1","op2":"value2"},option2:"some value"
and I want to change it like(converting object to array of objects)
_id:"xxx",option1:[{"op1":"value1","op2":"value2"},
{"op1":"value1","op2":"value2"}
],option2:"some value"
Can I use upsert? If so How to do it?
Since you need to create the new value of the field based on the old value, you should retrieve each document with a query like
db.collection.find({ "_id" : { "in" : [<array of _id's>] } })
then iterate over the results and $set the value of the field to its new value:
db.collection.find({ "_id" : { "in" : [<array of _id's>] } }).forEach(function(doc) {
oldVal = doc.option1
newVal = compute_newVal_from_oldVal(oldVal)
db.collection.update({ "_id" : doc._id }, { "$set" : { "option" : newVal } })
})
The document structure is rather schematic, so I omitted putting in actual code to create newVal from oldVal.
Since it is an embedded document type you could use push query
db.collectionname.update({_id:"xxx"},{$push:{option1:{"op1":"value1","op2":"value2"}}})
This will create document inside embedded document.Hope it helps

using an Object (subdocument) with varying fields as _id

Our (edX) original Mongo persistence representation uses a bson dictionary (aka object or subdocument) as the _id value (see, mongo/base.py). This id is missing a field.
Can some documents' _id values have more subfields than others without totally screwing up indexing?
What's the best way to handle existing documents without the additional field? Remove and replace them? Try to query w/ new _id format and if fails fall over to query w/o the new field? Try to query with both new and old _id format in one query?
To be more specific, the current format is
{'_id': {
'tag': 'i4x', // yeah, it's always this fixed value
'org': your_school_x,
'course': a_catalog_number,
'category': the_xblock_type,
'name': uniquifier_within_course
}}
I need to add 'run': the_session_or_term_for_course_run or 'course_id': org/course/run.
Documents within a collection need not have values for _id that are of the same structure. Hence, it is perfectly acceptable to have the following documents within a collection:
> db.foo.find()
{ "_id" : { "a" : 1 } }
{ "_id" : { "a" : 1, "b" : 2 } }
{ "_id" : { "c" : 1, "b" : 2 } }
Note that because the index is on only _id, only queries that specify a value for _id will use the index:
db.foo.find({_id:1}) // will use the index on _id
db.foo.find({_id:{state:"Alaska"}) // will use the index on _id
db.foo.find({"_id.a":1}) // will NOT use the index on _id
Note also that only a complete match of the "value" of _id will return a document. So this returns no documents for the collection above:
db.foo.find({_id:{c:1}})
Hence, for your case, you are welcome to add fields to the object that is the value for the _id key. And it does not matter that all documents have a different structure. But if you are hoping to query the collection by_id and have it be efficient, you are going to need to add indexes for all relevant sub parts that might be used in isolation. That is not super efficient.
_id is no different than any other key in this regard.

Get position of selected document in collection [mongoDB]

How to get position (index) of selected document in mongo collection?
E.g.
this document: db.myCollection.find({"id":12345})
has index 3 in myCollection
myCollection:
id: 12340, name: 'G'
id: 12343, name: 'V'
id: 12345, name: 'A'
id: 12348, name: 'N'
If your requirement is to find the position of the document irrespective of any order, that is not
possible as MongoDb does not store the documents in specific order.
However,if you want to know the index based on some field, say _id , you can use this method.
If you are strictly following auto increments in your _id field. You can count all the documents
that have value less than that _id, say n , then n + 1 would be index of the document based on _id.
n = db.myCollection.find({"id": { "$lt" : 12345}}).count() ;
This would also be valid if documents are deleted from the collection.
As far as I know, there is no single command to do this, and this is impossible in general case (see Derick's answer). However, using count() for a query done on an ordered id value field seems to work. Warning: this assumes that there is a reliably ordered field, which is difficult to achieve in a concurrent writer case. In this example _id is used, however this will only work with a single writer case.:
MongoDB shell version: 2.0.1
connecting to: test
> use so_test
switched to db so_test
> db.example.insert({name: 'A'})
> db.example.insert({name: 'B'})
> db.example.insert({name: 'C'})
> db.example.insert({name: 'D'})
> db.example.insert({name: 'E'})
> db.example.insert({name: 'F'})
> db.example.find()
{ "_id" : ObjectId("4fc5f040fb359c680edf1a7b"), "name" : "A" }
{ "_id" : ObjectId("4fc5f046fb359c680edf1a7c"), "name" : "B" }
{ "_id" : ObjectId("4fc5f04afb359c680edf1a7d"), "name" : "C" }
{ "_id" : ObjectId("4fc5f04dfb359c680edf1a7e"), "name" : "D" }
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
{ "_id" : ObjectId("4fc5f053fb359c680edf1a80"), "name" : "F" }
> db.example.find({_id: ObjectId("4fc5f050fb359c680edf1a7f")})
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
> db.example.find({_id: {$lte: ObjectId("4fc5f050fb359c680edf1a7f")}}).count()
5
>
This should also be fairly fast if the queried field is indexed. The example is in mongo shell, but count() should be available in all driver libs as well.
This might be very slow but straightforward method. Here you can pass as usual query. Just I am looping all the documents and checking if condition to match the record. Here I am checking with _id field. You can use any other single field or multiple fields to check it.
var docIndex = 0;
db.url_list.find({},{"_id":1}).forEach(function(doc){
docIndex++;
if("5801ed58a8242ba30e8b46fa"==doc["_id"]){
print('document position is...' + docIndex);
return false;
}
});
There is no way that MongoDB can return this as it does not keep documents in order in the database, just like MySQL f.e. doesn't name row numbers.
The ObjectID trick from jhonkola will only work if only one client creates new elements, as the ObjectIDs are generated on the client side, with the first part being a timestamp. There is no guaranteed order if different clients talk to the same server. Still, I would not rely on this.
I also don't quite understand what you are trying to do though, so perhaps mention that in your question? I can then update the answer.
Restructure your collection to include the position of any entry i.e {'id': 12340, 'name': 'G', 'position': 1} then when searching the database collection(myCollection) using the desired position as a query
The queries I use that return the entire collection all use sort to get a reproducible order, find.sort.forEach works with the script above to get the correct index.