MongoDB find if all array elements are in the other bigger array - mongodb

I have an array of ids of LEGO parts in a LEGO building document.
// building collection
{
  "name": "Gingerbird House",
  "buildingTime": 45,
  "rating": 4.5,
  "elements": [
    { "_id": 23, "requiredElementAmt": 14 },
    { "_id": 13, "requiredElementAmt": 42 }
  ]
}
and then
// elements collection
{ "_id": 23, "name": "blue 6 dots brick", "availableAmt": 20 }
{ "_id": 13, "name": "red 8 dots brick", "availableAmt": 50 }
{ "_id": 254, "name": "green 4 dots brick", "availableAmt": 12 }
How can I find out whether it's possible to build a building? I.e. the database should return a building only if every element in its "elements" array exists in the warehouse (the elements collection) and requires less than (or equal to) the available amount of that element.
In SQL (which I came from recently) I would write something like:
SELECT * FROM building WHERE id NOT IN (SELECT fk_building FROM building_elemnt_amt WHERE fk_element NOT IN (1, 3))
Thank you in advance!

I won't pretend I understand how that works in SQL without any comparison, but in MongoDB you can do something like this:
db.buildings.find({/* building filter, if any */}).map(function(b){
    var ok = true;
    b.elements.forEach(function(e){
        ok = ok && 1 == db.elements.find({_id: e._id, availableAmt: {$gte: e.requiredElementAmt}}).count();
    });
    return ok ? b : false;
}).filter(function(b){return b;});
or
db.buildings.find({/* building filter, if any */}).map(function(b){
    var condition = [];
    b.elements.forEach(function(e){
        condition.push({_id: e._id, availableAmt: {$gte: e.requiredElementAmt}});
    });
    return db.elements.find({$or: condition}).count() == b.elements.length ? b : false;
}).filter(function(b){return b;});
The last one should be a bit quicker, but I did not test it. If performance is key, it may be better to mapReduce it so the subqueries run in parallel.
Note: the examples above assume that buildings.elements has no elements with the same id. Otherwise the elements array needs to be pre-processed before b.elements.forEach to compute the total requiredElementAmt for non-unique ids.
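For that non-unique-id case, the totals can be accumulated in one preliminary pass; a minimal Python sketch, with the duplicate entry invented for illustration:

```python
from collections import defaultdict

# Building document shaped like the one in the question,
# plus a hypothetical duplicate id that needs merging.
building = {
    "name": "Gingerbird House",
    "elements": [
        {"_id": 23, "requiredElementAmt": 10},
        {"_id": 13, "requiredElementAmt": 42},
        {"_id": 23, "requiredElementAmt": 4},  # duplicate id to merge
    ],
}

# Sum requiredElementAmt per element id before running the availability checks.
required = defaultdict(int)
for e in building["elements"]:
    required[e["_id"]] += e["requiredElementAmt"]

print(dict(required))  # {23: 14, 13: 42}
```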
EDIT: How it works:
Select all (or some) documents from the buildings collection with find:
db.buildings.find({/* building filter, if any */})
This returns a cursor, which we iterate with map, applying the function to each document:
map(function(b){...})
The function itself iterates over the elements array of each buildings document b:
b.elements.forEach(function(e){...})
and finds the number of documents in the elements collection for each element e
db.elements.find({_id:e._id, availableAmt:{$gte:e.requiredElementAmt}}).count();
that match the condition:
elements._id == e._id
and
elements.availableAmt >= e.requiredElementAmt
until the first request that returns 0.
Since elements._id is unique, this subquery returns either 0 or 1.
The first 0 in the expression ok = ok && 1 == 0 turns ok to false, so the rest of the elements array is iterated without touching the db.
The function returns either the current buildings document or false:
return ok ? b : false
So the result of the map function is an array containing the full buildings documents that can be built, or false for the ones that lack at least one resource.
Then we filter this array to get rid of the false elements, since they hold no useful information:
filter(function(b){return b})
This returns a new array with all elements for which function(b){return b} doesn't return false, i.e. only the full buildings documents.
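The same check can be sketched without per-element subqueries; here is a pure-Python sketch of the logic, with in-memory dicts standing in for the two collections (data copied from the question, plus a hypothetical second building that cannot be built):

```python
# In-memory stand-ins for the elements and buildings collections.
elements = {
    23: {"name": "blue 6 dots brick", "availableAmt": 20},
    13: {"name": "red 8 dots brick", "availableAmt": 50},
    254: {"name": "green 4 dots brick", "availableAmt": 12},
}
buildings = [
    {"name": "Gingerbird House",
     "elements": [{"_id": 23, "requiredElementAmt": 14},
                  {"_id": 13, "requiredElementAmt": 42}]},
    {"name": "Tower",  # hypothetical building lacking stock
     "elements": [{"_id": 254, "requiredElementAmt": 99}]},
]

def buildable(b):
    # Every required element must exist and be stocked in sufficient amount.
    return all(
        e["_id"] in elements
        and elements[e["_id"]]["availableAmt"] >= e["requiredElementAmt"]
        for e in b["elements"]
    )

print([b["name"] for b in buildings if buildable(b)])  # ['Gingerbird House']
```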

Related

Replace part of an array in a mongo db document

With a document structure like:
{
  "_id": "1234",
  "values": [
    1, 23, ... (~2000 elements)
  ]
}
where values represents some time series.
I need to update some elements in the values array and I'm looking for an efficient way to do it. The number of elements and the positions to update vary.
I would not like to pull the whole array back to the client (application layer), so I'm doing something like:
db.coll.find({ "_id": 1234 })
db.coll.update(
  { "_id": 1234 },
  { "$set": {
      "values.100": 123,
      "values.200": 124
  }})
To be more precise, I'm using pymongo and bulk operations:
dc = dict()
dc["values.100"] = 102
dc["values.200"] = 103
bulk = db.coll.initialize_ordered_bulk_op()
bulk.find({ "_id": 1234 }).update_one({ "$set": dc })
....
bulk.execute()
Would you know of a better way to do it?
Would it be possible to indicate a range in the array, like values 100 to 110?
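For the range question, the positional $set keys can be generated programmatically; a sketch (start index and values are illustrative) that builds one $set document for positions 100 through 110:

```python
# Build a single $set document updating positions 100..110 of the "values" array.
start = 100
new_chunk = [123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133]  # 11 values

set_doc = {"values.%d" % (start + i): v for i, v in enumerate(new_chunk)}
# e.g. db.coll.update_one({"_id": 1234}, {"$set": set_doc})  # pymongo call, needs a live DB

print(set_doc["values.100"], set_doc["values.110"])  # 123 133
```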

MongoDB, retrieve specific field in array of objects

In my collection I have an array of objects. I'd like to retrieve only a subset of those objects, but I can't find out how to do this.
Here are a few things I tried:
db.collections.find({},
  { fields: {
    'myField': 1,                 // works
    'myArray': 1,                 // works
    'myArray.$': 1,               // doesn't work
    'myArray.$.myNestedField': 1, // doesn't work
    'myArray.0.myNestedField': 1  // doesn't work
  }
});
Use 'myArray.myNestedField': 1 for projecting nested fields from the array.
I'll briefly explain all the variants you have.
'myField': 1 -- Projecting a field value
'myArray': 1 -- Projecting an array as a whole (the elements can be scalars, embedded documents, or sub-documents)
The variants below work only with the positional operator ($) in the query preceding the projection, and they project only the first element matching the query.
'myArray.$': 1
'myArray.$.myNestedField': 1
This is not a valid projection operation.
'myArray.0.myNestedField': 1
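To make the working variant concrete, here is a client-side Python sketch of what projecting 'myArray.myNestedField': 1 returns for a hypothetical document (field names taken from the question):

```python
doc = {
    "_id": 1,
    "myField": "x",
    "myArray": [
        {"myNestedField": "a", "other": 10},
        {"myNestedField": "b", "other": 20},
    ],
}

# 'myArray.myNestedField': 1 keeps only that field inside every array element
# (plus _id, which is included by default).
projected = {
    "_id": doc["_id"],
    "myArray": [{"myNestedField": e["myNestedField"]} for e in doc["myArray"]],
}

print(projected)  # {'_id': 1, 'myArray': [{'myNestedField': 'a'}, {'myNestedField': 'b'}]}
```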

Is there a way to return part of an array in a document in MongoDB?

Pretend I have this document:
{
  "name": "Bob",
  "friends": ["Alice", "Joe", "Phil"],
  "posts": [12, 15, 55, 61, 525, 515]
}
All is good with only a handful of posts. However, let's say posts grows substantially (and gets to the point of 10K+ posts). A friend mentioned that I might be able to keep the array in order (i.e. the first entry is the ID of the newest post so I don't have to sort) and append new posts to the beginning. This way, I could get the first, say, 10 elements of the array to get the 10 newest items.
Is there a way to retrieve only n posts at a time? I don't need 10K posts returned when most of them won't even be looked at, but I still need to keep them around for the records.
You can use MongoDB's $slice operator in a projection to get n elements from an array, like the following:
db.collection.find({
  // add condition here
}, {
  "posts": {
    $slice: 3 // number of elements; a negative number slices from the end of the array
  }
})
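$slice also accepts a [skip, n] pair, which is what enables fetching posts n at a time; a pure-Python sketch of both forms' semantics (no database needed, and assuming |skip| does not exceed the array length):

```python
def mongo_slice(arr, spec):
    # Mimic MongoDB's $slice projection:
    # an int takes the first n (or last n if negative); [skip, n] takes a page.
    if isinstance(spec, int):
        return arr[:spec] if spec >= 0 else arr[spec:]
    skip, n = spec
    return arr[skip:][:n]

posts = [12, 15, 55, 61, 525, 515]  # the posts array from the question
print(mongo_slice(posts, 3))       # [12, 15, 55]
print(mongo_slice(posts, -2))      # [525, 515]
print(mongo_slice(posts, [2, 2]))  # [55, 61]
```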
You can do this: build a list of the posts you want (say you want the first 3 posts) and return that list:
for doc in db.collections.find({your query}):
    temp = []
    for i in range(3):
        temp.append(doc['posts'][i])
    return temp

PyMongo updating array records with calculated fields via cursor

Basically the collection output of an elaborate aggregate pipeline for a very large dataset is similar to the following:
{
  "_id": {
    "clienta": NumberLong(460011766),
    "clientb": NumberLong(2886729962)
  },
  "states": [
    ["fixed", "fixed.rotated", "fixed.rotated.off"]
  ],
  "VBPP": [244, 182, 184, 11, 299],
  "PPF": 72.4
}
The intuitive, albeit slow, way to replace these fields with calculations over their former values (the length of states and the variance of VBPP) in PyMongo is as follows:
import numpy as np

records_list = []
cursor = db.clientAgg.find({}, {'_id': 0,
                                'states': 1,
                                'VBPP': 1,
                                'PPF': 1})
for record in cursor:
    records_list.append(record)
for dicts in records_list:
    dicts['states'] = len(dicts['states'])
    dicts['VBPP'] = np.var(dicts['VBPP'])
I have written various forms of this basic flow to optimize for speed, but bringing in 500k dictionaries in memory to modify them before converting them to arrays to go through a machine learning estimator is costly. I have tried various ways to update the records directly via a cursor with variants of the following with no success:
cursor = db.clientAgg.find().skip(0).limit(50000)

def iter():
    for item in cursor:
        yield item

l = []
for x in iter():
    x['VBPP'] = np.var(x['VBPP'])
    # Or
    # db.clientAgg.update({'_id': x['_id']}, {'$set': {'x.VBPS': somefunction as above}}, upsert=False, multi=True)
I also unsuccessfully tried using Mongo's usual operators since the variance is as simple as subtracting the mean from each element of the array, squaring the result, then averaging the results.
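For reference, the variance that np.var computes by default is just the mean of squared deviations, so it can be reproduced or checked without NumPy; a dependency-free sketch using the VBPP array from the sample document:

```python
def variance(xs):
    # Population variance, matching np.var's default (ddof=0):
    # subtract the mean, square, then average.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

vbpp = [244, 182, 184, 11, 299]  # the VBPP array from the sample document
print(len(vbpp), variance(vbpp))  # 5 9351.6
```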
If I could successfully modify the collection directly then I could utilize something very fast like Monary or IOPro to load data directly from Mongo and into a numpy array without the additional overhead.
Thank you for your time
MongoDB has no way to update a document with values calculated from the document's fields; currently you can only use update to set values to constants that you pass in from your application. So you can set document.x to 2, but you can't set document.x to document.y + document.z or any other calculated value.
See https://jira.mongodb.org/browse/SERVER-11345 and https://jira.mongodb.org/browse/SERVER-458 for possible future features.
In the immediate future, PyMongo will release a bulk API that allows you to send a batch of distinct update operations in a single network round-trip which will improve your performance.
Addendum:
I have two other ideas. First, run some Javascript server-side. E.g., to set all documents' b fields to 2 * a:
db.eval(function() {
  var collection = db.test_collection;
  collection.find().forEach(function(doc) {
    var b = 2 * doc.a;
    collection.update({_id: doc._id}, {$set: {b: b}});
  });
});
The second idea is to use the aggregation framework's $out operator, new in MongoDB 2.5.2, to transform the collection into a second collection that includes the calculated field:
db.test_collection.aggregate({
  $project: {
    a: '$a',
    b: {$multiply: [2, '$a']}
  }
}, {
  $out: 'test_collection2'
});
Note that $project must explicitly include all the fields you want; only _id is included by default.
For a million documents on my machine the former approach took 2.5 minutes, and the latter 9 seconds. So you could use the aggregation framework to copy your data from its source to its destination, with the calculated fields included. Then, if desired, drop the original collection and rename the target collection to the source's name.
My final thought on this, is that MongoDB 2.5.3 and later can stream large result sets from an aggregation pipeline using a cursor. There's no reason Monary can't use that capability, so you might file a feature request there. That would allow you to get documents from a collection in the form you want, via Monary, without having to actually store the calculated fields in MongoDB.

Mongo DB: how to select items with nested array count > 0

The database is near 5GB. I have documents like:
{
  _id: ..,
  user: "a",
  hobbies: [
    { _id: .., name: "football" },
    { _id: .., name: "beer" },
    ...
  ]
}
I want to return users who have more than 0 hobbies.
I've tried
db.collection.find({"hobbies" : { $gt : 0 }}).limit(10)
and it takes all the RAM and returns no result.
How can I conduct this select?
And how can I return only: id, name, count?
How can I do it with the official C# driver?
TIA
P.S.
Nearby I've found:
"Add a new field to handle category size. It's a usual practice in the mongo world."
Is this true?
In this specific case, you can use list indexing to solve your problem:
db.collection.find({"hobbies.0" : {$exists : true}}).limit(10)
This just makes sure a 0th element exists. You can do the same to make sure the list is shorter than n, or between x and y in length, by checking the existence of elements at the ends of the range.
Have you tried using hobbies.length? I haven't tested this, but I believe it is the right way to query the range of an array in MongoDB:
db.collection.find({$where: '(this.hobbies.length > 0)'})
You can (sort of) check for a range of array lengths with the $size operator using a logical $not:
db.collection.find({array: {$not: {$size: 0}}})
That's somewhat true.
According to the manual
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
$size
The $size operator matches any array with the specified number of
elements. The following example would match the object {a:["foo"]},
since that array has just one element:
db.things.find( { a : { $size: 1 } } );
You cannot use $size to find a range of sizes (for example: arrays
with more than 1 element). If you need to query for a range, create an
extra size field that you increment when you add elements
So you can check for array size 0, but not for things like 'larger than 0'
Earlier answers explain how to handle the array-count issue. Although in your case, if ZERO really is the only value you want to test for, you could set the array to null when it's empty, set the option to not serialize it, and then test for the existence of that field. Remember to test for null and to create the array when you want to add a hobby to a user.
For #2, provided you added the count field it's easy to select the fields you want back from the database and include the count field.
If you need to find only zero hobbies, and the hobbies key is not set for someone with zero hobbies, use the $exists flag. Add an index on "hobbies" for a performance enhancement:
db.collection.find( { hobbies : { $exists : true } } );
However, if a person with zero hobbies has an empty array, and a person with 1 hobby has an array with 1 element, then use this generic solution:
1. Maintain a field called "hcount" (hobby count), and always set it equal to the size of the hobbies array in any update.
2. Index the field "hcount". Then you can run queries like:
db.collection.find( { hcount : 0 } ) // people with 0 hobbies
db.collection.find( { hcount : 5 } ) // people with 5 hobbies
3. From @JohnP's answer, $size is also a good operator for this purpose:
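If you go with the hcount approach, the counter can be kept in sync by pairing every $push with an $inc in the same update document; a minimal sketch of that document's shape (the hobby itself is hypothetical):

```python
# One update document that adds a hobby and bumps the counter atomically.
new_hobby = {"name": "chess"}  # hypothetical hobby
update_doc = {
    "$push": {"hobbies": new_hobby},  # append to the array
    "$inc": {"hcount": 1},            # keep the size counter in sync
}
# e.g. db.collection.update_one({"_id": user_id}, update_doc)  # needs a live DB

print(update_doc["$inc"])  # {'hcount': 1}
```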
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size