Delete matching two regular expressions on the same field - mongodb

I need to delete documents using multiple conditions on the same field.
db.getCollection('system_parameter').remove(
{
"variable":/^pickup_kota/,
"variable":
{
$nin:
[
/^pickup_kota_jakarta/
]
}
}
)
I want to delete all documents whose value has the prefix 'pickup_kota', but keep the 'pickup_kota_jakarta' documents.
When I run the query above, everything except the 'pickup_kota_jakarta' documents is removed, including documents with unrelated prefixes such as 'some_doc'.

All MongoDB query arguments are already AND conditions, so just include them on the same key:
db.getCollection('system_parameter').remove({
"variable": {
"$regex": /^pickup_kota/,
"$nin": [
/^pickup_kota_jakarta/
]
}
})
Or you can always write the "long form" with $and:
db.getCollection('system_parameter').remove({
"$and": [
{ "variable": /^pickup_kota/ },
{ "variable": { "$nin": [/^pickup_kota_jakarta/] } }
]
})
So with two documents like this:
{ "variable" : "pickup_kota_somewhere" }
{ "variable" : "pickup_kota_jakarta" }
Only the first one gets removed
But as long as you can use a different operator, such as $regex here, to keep the conditions under separate operator keys, you don't need the long form.
Also, since both expressions are anchored to the start of the string, it is more efficient for MongoDB to do the two comparisons than to attempt a single regular expression that meets both conditions. The only regular expression that could do so would break that anchoring, and lose the efficiency of a search anchored to the beginning of the string, which can use an index.
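The reason the original query removed too much is that a JavaScript object literal cannot hold the same key twice: the second "variable" replaces the first, so only the $nin condition is sent. Here is a minimal sketch in plain JavaScript that shows this without a database. The `matches` helper is a hypothetical in-memory stand-in for the server's matching, not a MongoDB or driver API.

```javascript
// Duplicate keys in an object literal: the last definition wins.
const brokenQuery = {
  variable: /^pickup_kota/,
  variable: { $nin: [/^pickup_kota_jakarta/] },
};
console.log(Object.keys(brokenQuery)); // [ 'variable' ]
console.log("$nin" in brokenQuery.variable); // true

// Hypothetical in-memory stand-in for the combined condition on one value.
function matches(value, { $regex, $nin = [] }) {
  const included = $regex ? $regex.test(value) : true;
  const excluded = $nin.some((re) => re.test(value));
  return included && !excluded;
}

const values = ["pickup_kota_somewhere", "pickup_kota_jakarta", "some_doc"];

// What the broken query effectively asks for: only the $nin part.
const removedByBroken = values.filter((v) => matches(v, brokenQuery.variable));

// What the combined query asks for: the prefix AND not the excluded prefix.
const removedByFixed = values.filter((v) =>
  matches(v, { $regex: /^pickup_kota/, $nin: [/^pickup_kota_jakarta/] })
);

console.log(removedByBroken); // [ 'pickup_kota_somewhere', 'some_doc' ]
console.log(removedByFixed); // [ 'pickup_kota_somewhere' ]
```

Putting both operators inside one object under "variable" avoids the collision because "$regex" and "$nin" are different keys.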

You can try $all operator.
db.getCollection('system_parameter').remove({
"variable": {$all: [
{$regex: /^pickup_kota/},
{$nin:[/^pickup_kota_jakarta/]}
]}
})

Related

Appending a string to all values in an array field in MongoDB

I have a collection filled with docs that contain an "ip_addresses" field, which is an array of strings (IPs). I want to append '/32' to all of these values in all of my docs that don't already have a CIDR range suffix.
Here's my issue:
I don't know how to use the current value of the IP which is being iterated on.
Even if I did, $concat doesn't seem to work and throws an error even with placeholder values (as in the query below) - The dollar ($) prefixed field '$concat' in 'ip_addresses.0.$concat' is not valid for storage.
Here is my current query which throws the error:
db.getCollection('docs').update(
{},
{ $set: { "ip_addresses.$[ip]": { "$concat": [ "1.1.1.1", "/32" ] } } },
{
arrayFilters: [ { "ip": { $not: /.+\/\d{1,2}/ } } ],
multi: true
}
)
I'd appreciate help using the current values in the array in the $concat command and resolving the error.
Let's tackle each of the problems.
2 - $concat is an aggregation operator, hence it cannot be used in a regular update. The operators available for an update are listed in the MongoDB documentation on update operators. You might notice that none of these "work" on actual document values, which brings me back to the first question you had.
1 - In order to use the current document's values within an update you have to use pipelined updates, which are only available in Mongo v4.2+. If you're using an earlier Mongo version you have to fetch the documents into memory and do the update one by one in code. If you are on Mongo v4.2+ then the pipeline update syntax goes like this:
db.collection.updateMany(
{ip_addresses: {$exists: true}},
[
{
$set: {
ip_addresses: {
$map: {
input: '$ip_addresses',
as: 'ipAddress',
in: {
$cond: [
{
$regexMatch: {input: "$$ipAddress", regex: /.+\/\d{1,2}/}
},
'$$ipAddress',
{
$concat: [
'$$ipAddress',
'/32'
]
}
]
}
}
}
}
}
]
);
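For versions without pipeline updates, the per-document work done in application code is the same mapping. The following is a rough sketch of that logic in plain JavaScript, using the same regular expression as the $regexMatch above. The function names are made up for illustration.

```javascript
// Same test as the $regexMatch in the pipeline: "something/NN" counts as having a suffix.
const hasCidrSuffix = (ip) => /.+\/\d{1,2}/.test(ip);

// Equivalent of the $map + $cond expression for one document's array.
function appendDefaultCidr(ipAddresses) {
  return ipAddresses.map((ip) => (hasCidrSuffix(ip) ? ip : ip + "/32"));
}

const updated = appendDefaultCidr(["1.1.1.1", "10.0.0.0/8", "192.168.1.7/32"]);
console.log(updated); // [ '1.1.1.1/32', '10.0.0.0/8', '192.168.1.7/32' ]
```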

Remove complete document or element from array based on condition

My collection documents are:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"},
{"name":"bananas"} ],
}
{
"_id" : 2,
"fruits" : [ {"name":"bananas"} ],
}
I need to remove the whole document when the fruits contains only "bananas" or only remove the fruit "bananas" when there are more than one fruit in the fruits array.
My final collection after running the required query should be:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"}],
}
I am currently using two queries to get this done:
db.collection.remove({'fruits':{$size:1, $elemMatch:{'name': 'bananas'} }}) [this will remove the document when only one fruit present]
and
db.collection.update({},{$pull:{'fruits':{'name':'bananas'}}},{multi: true}) [this will remove the entry 'bananas' from the array]
Is there any way to combine these into one query?
EDIT: Final take
-- I guess there is no "one query" to perform the above tasks, since the intents of the two actions are very different.
-- The best that can be done is to combine the actions into a bulk_write call, which saves on network I/O (as suggested in the answer by Neil). I believe this is more beneficial when you have multiple such actions being fired. Also, bulk_write can provide a form of locking, in the sense that its "ordered" mode makes the actions sequential and halts execution in case of an error.
Hence bulk_write is more beneficial when the actions performed need to be sequential, somewhat like "chaining" in JS. There is also the option to perform unordered bulk writes.
Also, the actions specified in the bulk write operate on the collection level as individual actions.
You basically want bulk_write() here to do them both. Also use $exists to ensure there's only one element:
from pymongo import UpdateMany, DeleteMany
db.collection.bulk_write(
[
UpdateMany(
{ "fruits.1": { "$exists": True }, "fruits.name": "bananas" },
{ "$pull":{
'fruits': { 'name':'bananas' }
}}
),
DeleteMany(
{ "fruits.1": { "$exists": False }, "fruits.name": "bananas" }
)
],
ordered=False
)
You don't really need $elemMatch for "one" condition and you should be using update_many() and in this case UpdateMany() instead of { "multi": true }. And that option is different in "pymongo" anyway. Then of course there is delete_many() or DeleteMany() for the "bulk" context.
Bulk operations send one request with one response, which is better than sending multiple requests. Also "update" and "delete" are two different things, but the single request can combine just like this.
The $size operator is valid but $exists can apply to a "range" where $size cannot, so it's generally a bit more flexible.
For example, a $exists range:
# Array between 2 and 4 elements
db.collection.update_many(
{
"fruits.1": { "$exists": True },
"fruits.4": { "$exists": False },
"fruits.name": "bananas"
},
{ "$pull":{
'fruits': { 'name':'bananas' }
}}
)
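To see why a pair of $exists tests on array indexes acts as a length range, here is a small in-memory sketch in plain JavaScript. It assumes the field always holds an array, and the helper names are hypothetical.

```javascript
// "fruits.N": { $exists: true } holds when index N is present, i.e. length > N.
const existsAt = (arr, index) => arr.length > index;

// Equivalent of { "fruits.1": { $exists: true }, "fruits.4": { $exists: false } }.
const hasTwoToFour = (arr) => existsAt(arr, 1) && !existsAt(arr, 4);

const lengths = [1, 2, 4, 5];
const results = lengths.map((n) => hasTwoToFour(new Array(n).fill("x")));
console.log(results); // [ false, true, true, false ]
```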
And of course in the context here you actually want to know the difference between other possible things in the array and those with "only" a single "bananas".
The ordered=False here refers to one of the two ways that "bulk write" requests can be handled:
Ordered - Where True (which is the "default"), the operations are executed in "serial order" as they appear in the array of operations sent with the "bulk write". If any error occurs then the batch stops execution at the point of the error and returns an exception.
UnOrdered - Where False, the operations are executed in "parallel" within reasonable constraints on the server. If any error occurs there is still an exception raised, however this does not stop other operations within the "bulk write" from completing. Any errors are returned with the "array index", from the list provided to the command, of the operation that caused the error.
This option can be used to "tune" the desired behavior, in particular for error reporting and continuation, and it also allows a degree of "parallelism" in the execution where "serial" is not actually required of the operations. Since these two statements do not depend on one another and will in fact select different documents anyway, ordered=False is probably the better option in terms of efficiency here.
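The difference in error handling between the two modes can be illustrated with a toy executor in plain JavaScript. This is only a sketch of the semantics described above: `runBulk` is a hypothetical function, it runs everything sequentially, and it does not model any server-side parallelism.

```javascript
// Hypothetical stand-in for a bulk executor; `ops` is an array of functions.
function runBulk(ops, { ordered = true } = {}) {
  const errors = [];
  let succeeded = 0;
  for (let index = 0; index < ops.length; index++) {
    try {
      ops[index]();
      succeeded++;
    } catch (err) {
      // Errors are reported with the index of the failing operation.
      errors.push({ index, message: err.message });
      if (ordered) break; // ordered: stop at the first failure
    }
  }
  return { succeeded, errors };
}

const ops = [
  () => "ok",
  () => { throw new Error("write failed"); },
  () => "ok",
];

const orderedResult = runBulk(ops, { ordered: true });
const unorderedResult = runBulk(ops, { ordered: false });

console.log(orderedResult.succeeded, orderedResult.errors.length); // 1 1
console.log(unorderedResult.succeeded, unorderedResult.errors.length); // 2 1
```

In the ordered run the third operation never executes; in the unordered run it still completes and the failure is reported with index 1.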
db.users.aggregate(
[{
$project: {
data: {
$filter: {
input: "$fruits",
as: "filterData",
cond: { $ne: [ "$$filterData.name", 'bananas' ] }
}
}
}
},
{
$unwind: {
path : "$data",
preserveNullAndEmptyArrays : false
}
},
{
$group: {
_id:"$_id",
data: { $addToSet: "$data" }
}
},
])
I think the above query would give you the desired results.

How to do regex (or text) search in multiple objects in a document

My MongoDB document has structure like below:
{
"sentence 0":{
"chunk":["some text",
"text",
"abc"]
},
"sentence 1":{
"chunk":["some text",
"this is a perfect thing",
"abc"]
}
}
I need to find all the documents which have word "perfect" in chunk of any sentence X.
So far I got this, which is wrong as it doesn't even search inside all sentence fields.
db.collection.find({"Sentence 0":{ $elemMatch: {"$regex": ".*perfect.*"}}}).limit(10)
Those are not arrays and therefore $elemMatch does not apply, since its only use is with actual arrays, and then for "multiple" criteria rather than a single condition.
They are in fact "sub-documents" specified by "key". Your path therefore needs to be exact:
db.collection.find({ "sentence 1.chunk": { "$regex": ".*perfect.*" }})
If you want both "paths" you need an $or:
db.collection.find({
"$or": [
{ "sentence 0.chunk": { "$regex": ".*perfect.*" }},
{ "sentence 1.chunk": { "$regex": ".*perfect.*" }}
]
})
To do that "without" specifying each path, you run the query as JavaScript logic using $where:
db.collection.find(function() {
return Object.keys(this).filter(k => /^sentence/.test(k)).some(k => {
return this[k].chunk.some(ch => /.*perfect.*/.test(ch))
})
})
Either case is pretty horrible since you are searching with a $regex that is not "anchored" with the caret ^ for the beginning of the string. As such a "full collection scan" is performed in order to match as opposed to using any available index. The same constraint applies to $where.
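The JavaScript predicate can be checked outside MongoDB before it is used with $where. Below is a minimal sketch as a plain function, where `doc` plays the role of `this`. Note that a regular expression literal on its own is always truthy, so each test needs an explicit .test() call. The function name and the `|| []` guard for a missing chunk are additions made for this illustration.

```javascript
// Plain-function version of the predicate; `doc` stands in for `this` in $where.
function hasPerfectChunk(doc) {
  return Object.keys(doc)
    .filter((key) => /^sentence/.test(key))
    .some((key) => (doc[key].chunk || []).some((text) => /perfect/.test(text)));
}

const withMatch = {
  "sentence 0": { chunk: ["some text", "text", "abc"] },
  "sentence 1": { chunk: ["some text", "this is a perfect thing", "abc"] },
};
const withoutMatch = {
  "sentence 0": { chunk: ["some text", "text", "abc"] },
};

console.log(hasPerfectChunk(withMatch)); // true
console.log(hasPerfectChunk(withoutMatch)); // false

// A bare regex is an object and therefore truthy, so this "matches" anything.
const alwaysTrue = ["abc"].some((text) => /perfect/);
console.log(alwaysTrue); // true
```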
The structure therefore is not great. Instead you should be using "real arrays" which can represent a "consistent path" to the data to search:
{
"sentences": [
{
"chunk": [ "some text",
"text",
"abc"
]
},
{
"chunk": [ "some text",
"this is a perfect thing",
"abc"
]
}
]
}
Then we can actually at least create an index and query at a specific path:
db.collection.find({
"sentences.chunk": { "$regex": "^some" }
})
Or for "real words", use a text index on "sentences.chunk" and search efficiently using $text:
db.collection.find({
"$text": {
"$search": "something"
}
})
But of course that does not match things like "the" or "and" because of how text search works.
It all depends on your "real" use case. But you should at the very least avoid structuring documents using "named keys" which have "specific paths" since they are inherently bad for query purposes.
N.B. Spaces in key names are also bad practice. It might seem "human readable", but you are asking a "machine" to read it more than you are asking a "human" to understand it. Label names are a separate thing from how you structure data.

mongodb: document with the maximum number of matched targets

I need help to solve the following issue. My collection has a "targets" field.
Each user can have 0 or more targets.
When I run my query I'd like to retrieve the document with the maximum number of matched targets.
Ex:
documents=[{
targets:{
"cluster":"01",
}
},{
targets:{
"cluster":"01",
"env":"DC",
"core":"PO"
}
},{
targets:{
"cluster":"01",
"env":"DC",
"core":"PO",
"platform":"IG"
}
}];
userTarget={
"cluster":"01",
"env":"DC",
"core":"PO"
}
You seem to be asking to return the document where the most conditions were met, and possibly not all conditions. The basic process is an $or query to return the documents that can match either of the conditions. Then you basically need a statement to calculate "how many terms" were met in the document, and return the one that matched the most.
So the combination here is an .aggregate() statement using the initial results from $or to calculate and then sort the results:
// initial targets object
var userTarget = {
"cluster":"01",
"env":"DC",
"core":"PO"
};
// Convert to $or condition
// and the calculation condition to match
// (the documents in the question keep these fields under "targets")
var orCondition = [],
scoreCondition = [];
Object.keys(userTarget).forEach(function(key) {
var query = {},
cond = { "$cond": [{ "$eq": ["$targets." + key, userTarget[key]] },1,0] };
query["targets." + key] = userTarget[key];
orCondition.push(query);
scoreCondition.push(cond);
});
// Run aggregation
Model.aggregate(
[
// Match with condition
{ "$match": { "$or": orCondition } },
// Calculate a "score" based on matched fields
{ "$project": {
"targets": 1,
"score": {
"$add": scoreCondition
}
}},
// Sort on the greatest "score" (descending)
{ "$sort": { "score": -1 } },
// Return the first document
{ "$limit": 1 }
],
function(err,result) {
// check errors
// Remember that result is an array, even if limited to one document
console.log(result[0]);
}
)
So before processing the aggregate statement, we are going to generate the dynamic parts of the pipeline operations based on the input in the userTarget object. This would produce an orCondition like this:
{ "$match": {
"$or": [
{ "targets.cluster" : "01" },
{ "targets.env" : "DC" },
{ "targets.core" : "PO" }
]
}}
And the scoreCondition would expand to an expression like this:
"score": {
"$add": [
{ "$cond": [{ "$eq": [ "$targets.cluster", "01" ] },1,0] },
{ "$cond": [{ "$eq": [ "$targets.env", "DC" ] },1,0] },
{ "$cond": [{ "$eq": [ "$targets.core", "PO" ] },1,0] }
]
}
Those are going to be used in the selection of possible documents and then for counting the terms that could match. In particular the "score" is made by evaluating each condition within the $cond ternary operator, and then either attributing a score of 1 where there was a match, or 0 where there was not a match on that field.
If desired, it would be simple to alter the logic to assign a higher "weight" to each field with a different value going towards the score depending on the deemed importance of the match. At any rate, you simply $add these score results together for each field for the overall "score".
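A weighted variant might look like the sketch below, which only builds the array passed to $add and does not run a query. The `weights` object and its values are made up for illustration, and the field name "targets" is taken from the documents in the question.

```javascript
const userTarget = { cluster: "01", env: "DC", core: "PO" };

// Hypothetical weights; any field without an entry counts as 1.
const weights = { cluster: 5, env: 2 };

// Field that holds the target values in the question's documents.
const field = "targets";

// One $cond per field: its weight when the value matches, otherwise 0.
const scoreCondition = Object.keys(userTarget).map((key) => ({
  $cond: [
    { $eq: ["$" + field + "." + key, userTarget[key]] },
    weights[key] || 1,
    0,
  ],
}));

console.log(JSON.stringify(scoreCondition, null, 2));
```

With these weights a document matching only "cluster" scores 5, which outranks one matching both "env" and "core" (2 + 1 = 3).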
Then it is just a simple matter of applying the $sort to the returned "score", and then using $limit to just return the top document.
This is not especially efficient. Even when some document matches all three conditions, the query cannot presume that one does, so it has to examine every document where "at least one" condition matched and then work out the "best match" from those results.
Ideally, I would personally run an additional query "first" to see if all three conditions were met, and if not then look for the other cases. That still is two separate queries, and would be different from simply just pushing the "and" conditions for all fields as the first statement in $or.
So the preferred implementation I think should be:
Look for a document that matches all given field values; if not then
Run the either/or on every field and count the condition matches.
That way, if all fields match then the first query is fastest, and you only fall back to the slower but required implementation shown in the listing if there was no result.
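The two-step idea can be sketched in memory with plain JavaScript, using the documents from the question. This only illustrates the control flow. In a real application each step would be a query, and the function names here are hypothetical.

```javascript
const documents = [
  { targets: { cluster: "01" } },
  { targets: { cluster: "01", env: "DC", core: "PO" } },
  { targets: { cluster: "01", env: "DC", core: "PO", platform: "IG" } },
];

// Number of wanted fields that a document's targets match.
function score(doc, wanted) {
  return Object.keys(wanted).filter((k) => doc.targets[k] === wanted[k]).length;
}

function findBest(docs, wanted) {
  const total = Object.keys(wanted).length;
  // Step 1: look for a document matching every field.
  const full = docs.find((doc) => score(doc, wanted) === total);
  if (full) return full;
  // Step 2: fall back to scoring partial matches and taking the highest.
  const partial = docs
    .map((doc) => ({ doc, score: score(doc, wanted) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score);
  return partial.length ? partial[0].doc : null;
}

const best = findBest(documents, { cluster: "01", env: "DC", core: "PO" });
const fallback = findBest(documents, { cluster: "01", env: "XX" });

console.log(best === documents[1]); // true
console.log(fallback === documents[0]); // true
```

Both the second and third documents match all three fields, so ties exist. This sketch returns the first one in input order, whereas a database query gives no such ordering guarantee unless the sort includes a tie-breaker.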

Mongoexport - modify large array fields to their counts

I have a large collection that I'd like to export to CSV, but I'd like to do some trimming to some of the fields. (e.g. I just need to know the number of elements in some, and just to know if others exist or not in the doc)
I would like to do the equivalent to a map function on the fields, so that fields that contain a list will be exported to the list size, and some fields that sometimes exist and sometimes do not, I would like to have them exported as boolean flags.
e.g. if my rows looks like this
{_id:"id1", listField:[1,2,3], optionalField: "...", ... }
{_id:"id2", listField:[1,2,3,4], ... }
I'd like to run a mongoexport to CSV that will result in this
_id, listField.length, optionalField.exists
"id1", 3, , true
"id2", 4, , false
Is that possible using mongoexport? (assume MongoDB version 3.0)
If not, is there another way to do that?
The mongoexport utility itself is pretty spartan and just a basic tool bundled in the suite. You can add "query" filters, but pretty much just like .find() queries in general, the intention is to return documents "as is" rather than "manipulate" the content.
Just as with other query operations, the .aggregate() method is something useful for document manipulation. So in order to "manipulate" the output to something different from the original document source, you would do:
db.collection.aggregate([
{ "$project": {
"listField": { "$size": "$listField" },
"optionalField": {
"$cond": [
{ "$ifNull": [ "$optionalField", false ] },
true,
false
]
}
}}
])
The $size operator returns the "size" of the array, and the $ifNull tests for the presence, either returning the field value or the alternate. Pass that result into $cond to get a true/false return rather than the field value. "_id" is always implicit, unless you specifically ask to omit it.
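What that projection computes for each document can be mirrored in plain JavaScript, as in the rough sketch below. It assumes listField is always an array, and the `project` function is a hypothetical in-memory equivalent rather than anything provided by MongoDB.

```javascript
// In-memory equivalent of the $project stage for one document.
function project(doc) {
  // $ifNull substitutes false when the field is missing or null ...
  const value = doc.optionalField ?? false;
  return {
    _id: doc._id,
    // $size: the number of elements in the array
    listField: doc.listField.length,
    // ... and $cond then treats false and 0 as false, anything else as true.
    optionalField: !(value === false || value === 0),
  };
}

const projected = [
  { _id: "id1", listField: [1, 2, 3], optionalField: "..." },
  { _id: "id2", listField: [1, 2, 3, 4] },
].map(project);

console.log(JSON.stringify(projected));
// [{"_id":"id1","listField":3,"optionalField":true},{"_id":"id2","listField":4,"optionalField":false}]
```

One consequence of this approach is that a field which exists but holds false or 0 is also reported as false.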
That would give you the "reduced" output, but in order to go to CSV then you would have to code that export yourself, as mongoexport does not run aggregation pipeline queries.
But the code to do so should be quite trivial ( pick a library for your language ), and the aggregation statement is also trivial as you can see here.
For the "really basic" approach, then just send a script to the mongo shell, as a very rudimentary form of programming:
db.collection.aggregate([
{ "$project": {
"listField": { "$size": "$listField" },
"optionalField": {
"$cond": [
{ "$ifNull": [ "$optionalField", false ] },
true,
false
]
}
}}
]).forEach(function(doc) {
print(Object.keys(doc).map(function(key) {
return doc[key]
}).join(","));
});
Which would output:
id1,3,true
id2,4,false
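The shell script above relies on the key order of each document and does no quoting. If values may contain commas or quotes, a slightly more careful formatter could look like this sketch in plain JavaScript. The helper names are made up, and the quoting follows the usual CSV convention of doubling embedded quotes.

```javascript
// Quote a value only when it contains a comma, a double quote or a newline.
function csvCell(value) {
  const text = String(value);
  return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text with a header row, using an explicit column order.
function toCsv(rows, columns) {
  const header = columns.map(csvCell).join(",");
  const lines = rows.map((row) =>
    columns.map((col) => csvCell(row[col])).join(",")
  );
  return [header, ...lines].join("\n");
}

const rows = [
  { _id: "id1", listField: 3, optionalField: true },
  { _id: "id,2", listField: 4, optionalField: false },
];
const csv = toCsv(rows, ["_id", "listField", "optionalField"]);
console.log(csv);
// _id,listField,optionalField
// id1,3,true
// "id,2",4,false
```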