MongoDB $addToSet failure, specify full document to insert

I've done a bit of research on this and haven't come across anything that jumps out at me immediately as what I'm looking for.
Say we have a document (or documents) in a collection that look something like this:
//First example document
{
"_id": "JK",
"letters": ["J", "K"]
}
//Second example document
{
"_id": "LM",
"letters": ["L"]
}
So I run a query like the one below to see if I have any matching documents and of course I don't so I expect to get null.
> db.example.findOne({"_id": "LM", "letters": {"$in": ["M"]}})
null
So I do an update and add "M" to the letters array on the documents (syntax may not be quite right):
> db.example.update({"_id": "LM"}, {"$addToSet": {"letters": "M"}})
I run the risk of not having a matching _id, so the findOne would also return null given the example documents in the collection for this query.
> db.example.findOne({"_id": "AB", "letters": {"$in": ["A"]}})
null
Based on the way I've constructed the above query, I get null back when "A" is not found in letters or the _id of "AB" is not found on any document. In this case I know that this document isn't in there because I know what is in the collection.
What I'd like to do is keep my update query from above with $addToSet and modify it to use upsert, while also specifying the document to insert in the event that $addToSet fails because the document does not exist. The goal is to cut down on database transactions. Is this possible? Or will I have to break up my queries a bit in order to accommodate this?
Because this information may influence answers:
I do my querying through mongo shell and pymongo.
mongo version: 2.6.11
pymongo version: 2.8
Thanks for any help!
EDIT: So after a break and a bit more digging, it seems $setOnInsert does what I was looking for. I believe this probably solves my issue, but I've not had a chance to test it yet.
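A minimal sketch of the upsert mentioned in the EDIT, written as a plain JavaScript helper that only builds the arguments for db.example.update(). The helper name buildLetterUpsert and the createdAt field are hypothetical and not part of the question. Note that with upsert enabled, $addToSet on its own already creates the missing document with a one-element letters array; $setOnInsert is only needed for extra fields that should be written on insert, and it should not target the same field as $addToSet.

```javascript
// Hypothetical helper: builds the three arguments for db.example.update().
// "createdAt" is an illustrative insert-only field, not from the question.
function buildLetterUpsert(id, letter, createdAt) {
  return {
    query: { _id: id },
    update: {
      // Adds the letter only if it is not already in the array.
      $addToSet: { letters: letter },
      // Applied only when the upsert inserts a new document.
      $setOnInsert: { createdAt: createdAt }
    },
    // Insert a new document when no document matches the query.
    options: { upsert: true }
  };
}

var spec = buildLetterUpsert("AB", "A", new Date(0));
// Assumed usage in the mongo shell:
// db.example.update(spec.query, spec.update, spec.options);
```

In pymongo 2.8 the same query and update documents should be usable with collection.update(query, update, upsert=True), though I have not verified that against this exact version.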

Related

MongoDB $in operator array max length

In Meteor I use MongoDB to store a collection of Objects. There is around 500k docs inserted.
I use Objets.find({ "_id": { "$in": objIds } }), where objIds is an array of _ids. This works fine when the array has a length of 1000, but when I try with 13145 _ids the app stops responding.
Obviously there's already an index on the _id field and also this search probably won't ever happen but I'm not sure if this is normal behavior. Is there a max length for the $in operator? Couldn't find one in the documentation.
Here's my publish in Meteor :
Meteor.publish('objetsByIds', function objetsByIdsPublication(objIds) {
return Objets.find({ "_id": { "$in": objIds } });
})
I don't know much about Meteor, but MongoDB uses cursors when retrieving large amounts of data, and how Meteor handles this depends on the driver implementation.
You could take a look at cursors, but another idea that comes to mind is to split the query. If you know 1000 works well, loop over the ids (using mod on the index, for example, to decide where each batch ends) so that each query returns at most 1000 documents.
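The batching idea above can be sketched in plain JavaScript as follows. The helper names chunkIds and buildBatchSelectors are hypothetical, and the batch size of 1000 is simply the value the question reports as working; this is a sketch of the splitting step only, not a tested Meteor publication.

```javascript
// Split an array into consecutive batches of at most `size` elements.
function chunkIds(ids, size) {
  var batches = [];
  for (var i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Build one {_id: {$in: [...]}} selector per batch.
function buildBatchSelectors(ids, size) {
  return chunkIds(ids, size).map(function (batch) {
    return { _id: { $in: batch } };
  });
}
```

Each selector can then be run as its own find(); how the per-batch results are combined is left to the caller.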

How to query all documents, filter for a specific field and return the value for each document in Elasticsearch?

I'm currently running an Elasticsearch instance which is synchronizing from a MongoDB via river. The MongoDB contains entries like this:
{field1: "value1", field2: "value2", cars: ["BMW", "Ford", "Porsche"]}
Not every entry in Mongo has a cars field.
Now I want to create an Elasticsearch query that searches over every document and returns just the cars field from every single document indexed in Elasticsearch.
Is it even possible? Elasticsearch must touch every single document to return the cars field. Maybe querying with Mongo is just easier and as fast as Elasticsearch. What do you think?
The following query POSTed to hostname:9200/_search should get you started:
{
  "filter": {
    "exists": {
      "field": "cars"
    }
  },
  "fields": ["cars"]
}
The filter clause limits the results to documents with a cars field.
The fields clause says to only return the cars field. If you wanted the entire document returned, you would leave this section out.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#_response_filtering
Make elasticsearch only return certain fields?
Elasticsearch (from my understanding) is not intended to be a SSoT database. It is very good at text searching, and analytics aggregations, but it isn't necessarily intended to be your primary database.
However, your use case isn't necessarily non performant in elasticsearch, it sounds like you just want to filter for your cars field, which you can do as documented here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fields.html
Lastly, I would actually venture that elasticsearch is faster than mongo in this case (assuming that the cars field is NOT indexed in mongo and is indexed in elasticsearch, which are their respective defaults), since you probably want to filter out the case in which the cars field is not set.
tl;dr elasticsearch isn't intended for your particular use case, but it probably is faster than mongo, assuming you filter out documents where the cars field is 'missing'.

Why does db.collection.find({}).maxTimeMS(100) return everything, but db.collection.find({}, {$maxTimeMS: 100}) only returns object IDs?

I'm using MongoDB 2.6.8. According to the $maxTimeMS reference, these two queries should behave identically:
> db.collection.find({}).maxTimeMS(100)
> db.collection.find({}, {$maxTimeMS: 100})
The first query does exactly what I want, but the second query only returns the object IDs of the documents. I tried increasing $maxTimeMS to 100000000 and there was no change in behavior.
Why am I getting different results for these two commands?
You found a bug in the documentation.
The reason that db.collection.find({}, {$maxTimeMS: 100}) returns only the _id of each object is that MongoDB interprets the second argument, {$maxTimeMS: 100}, as a projection.
So it thinks you want to see all the documents, but only the _id field and the $maxTimeMS field of each. Of course, none of your documents have a $maxTimeMS field, so they only show the _id.
The proper way to perform the query you want without the shortcut is:
db.collection.find({ $query: {}, $maxTimeMS: 100 })

$elemMatch Projection on a Simple Array

Imagine a collection of movies (stored in a MongoDB collection), with each one looking something like this:
{
_id: 123456,
name: 'Blade Runner',
buyers: [1123, 1237, 1093, 2910]
}
I want to get a list of movies, each one with an indication whether buyer 2910 (for example) bought it.
Any ideas?
I know I can change [1123, 1237, 1093, 2910] to [{id:1123}, {id:1237}, {id:1093}, {id:2910}] to allow the use of $elemMatch in the projection, but would prefer not to touch the structure.
I also know I can perhaps use the $unwind operator (within the aggregation framework), but that seems very wasteful in cases where buyer has thousands of values (basically exploding each document into thousands of copies in memory before matching).
Any other ideas? Am I missing something really simple here?
You can use the $setIsSubset aggregation operator to do this:
var buyer = 2910;
db.movies.aggregate(
    {$project: {
        name: 1,
        buyers: 1,
        boughtIt: {$setIsSubset: [[buyer], '$buyers']}
    }}
)
That will give you all movie docs with a boughtIt field added that indicates whether buyer is contained in the movie's buyers array.
This operator was added in MongoDB 2.6.
Not really sure of your intent here, but you don't need to change the structure just to use $elemMatch in projection. You can just issue like this:
db.movies.find({},{ "buyers": { "$elemMatch": { "$eq": 2910 } } })
That would filter the returned array elements to just the "buyer" that was indicated, or nothing where this was not present. It is true to point out that the $eq operator used here is not actually documented, but it does exist, so it may not be immediately clear that you can construct a condition in that way.
It seems a little wasteful to me though as you are returning "everything" regardless of whether the "buyer" is present or not. So a "query" seems more logical than a projection:
db.movies.find({ "buyers": 2910 })
And optionally keep only the matched element in the result:
db.movies.find({ "buyers": 2910 },{ "buyers.$": 1})
Set operators in the aggregation framework give you more options with $project, which can do more to alter the document. But if you just want to know if someone "bought" the item, then a "query" seems to be the logical and fastest way to do so.

Mongo remove last documents

I would like to know how to delete, for example, the last 100 documents inserted in my collection.
How is it possible from the shell?
You should be able to use the _id to sort on last inserted, as outlined in the answer here:
db.coll.find().sort({_id:-1}).limit(100);
It looks like using limit on the standard mongo remove operation isn't supported though, so you might use something like this to delete the 100 documents:
// Remove the newest document (highest _id) 100 times, one per iteration.
for (var i = 0; i < 100; i++) {
    db.coll.findAndModify({query: {}, sort: {"_id": -1}, remove: true})
}
See the docs for more on findAndModify.
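As an alternative to looping over findAndModify, a sketch that collects the last _ids first and removes them in one call could look like this. The helper name removeLastN is hypothetical, and the code assumes the mongo shell, where cursor.map() returns a plain array, and that _id values increase with insertion order (roughly true for default ObjectIds).

```javascript
// Hypothetical mongo shell helper: removes the n documents with the
// highest _id values. Assumes _id grows with insertion order.
function removeLastN(coll, n) {
  // Fetch only the _id of the n newest documents.
  var ids = coll.find({}, { _id: 1 })
    .sort({ _id: -1 })
    .limit(n)
    .map(function (doc) { return doc._id; });
  // Remove them all with a single $in query.
  if (ids.length > 0) {
    coll.remove({ _id: { $in: ids } });
  }
  return ids.length;
}

// Assumed usage in the mongo shell:
// removeLastN(db.coll, 100);
```

This issues two round trips instead of 100, at the cost of not being atomic: documents inserted between the find and the remove are left untouched.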