MongoDB Partial Unique Index on Object Field in Array Field

Let's say I have a collection whose objects have a structure like this:
{
"a": [
{"name": "foo", "meh": "whatever"},
{"name": "bar", "meh": "hem"}
],
"other_stuff": { }
}
An object in this collection may lack the "a" field. I'd like to enforce a database constraint such that, if an object has the "a" field, no "name" value in the objects contained in that field is duplicated anywhere in the collection.
So for example, the following object would be flagged as a constraint violation if the above object were already in the collection:
{
"a": [
{"name": "bar", "meh": "meh meh"}
],
"other_stuff": { }
}
Furthermore, the following object would be flagged as a constraint violation regardless of other documents already in the collection:
{
"a": [
{"name": "boo", "meh": "blub"},
{"name": "boo", "meh": "glub"}
],
"other_stuff": { }
}
Is it possible to specify a partial unique index for MongoDB? If it's different between Mongo 3.2, 3.4, 3.6 and 4.0 it would be nice to know that too - I don't care about earlier than 3.2.
I was thinking it might be something like this to prevent duplication across documents (for the passive reader: in JavaScript, [] > '' and undefined > '' are both false, while ['anything'] > '' is true):
db.MyCollection.createIndex(
{ "a.name": 1 },
{
"unique": true,
"partialFilterExpression": {
"a": { "$gt": '' }
}
}
)
... and I think the only way to prevent duplicate values within a document is to enforce in application logic the use of the $addToSet operator when operating on the "a" field.
I'd be happy to be corrected or corroborated on either count.

Related

Pymongo access specific field value within nested dict

In Pymongo application, while iterating through every document of the collection, how to access a specific field value of the JSON structure?
{
  "_id": {
    "$oid": "5e1c2b0bacbdaehujjjbdsh"
  },
  "a": {
    "data_type": "abc",
    "data_format": "xyz",
    "data_version": "1"
  },
  "b": "123",
  "c": "345"
}
Based on the following code snippet, how do I access the value associated with the key 'data_format', which is nested within the key 'a'?
for document in col.find():
    data_format_val = document['a']['data_format']  # not working
Relatively new to Mongodb query commands.
It's possible that some of the documents of the collection may not have the key 'a'.
Try using $exists to make sure the field is present. Syntax: { field: { $exists: <boolean> } }
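A minimal sketch of both options in Python, assuming documents shaped like the sample in the question. The list below is a stand-in for the cursor returned by col.find(); with a live collection, the filter {"a.data_format": {"$exists": True}} would be passed to col.find() instead of being applied in Python. The helper name data_format_of is made up for illustration.

```python
# Stand-in for the documents a pymongo cursor would yield (assumption:
# plain dicts shaped like the sample in the question).
documents = [
    {"a": {"data_type": "abc", "data_format": "xyz", "data_version": "1"}, "b": "123"},
    {"b": "456", "c": "789"},  # this document has no "a" key at all
]


def data_format_of(document):
    """Return document['a']['data_format'], or None when either key is missing."""
    return document.get("a", {}).get("data_format")


# Option 1: tolerate missing keys on the client side.
all_values = [data_format_of(doc) for doc in documents]

# Option 2: keep only documents where the nested field exists, mirroring
# what the server-side $exists filter would select.
present_values = [
    doc["a"]["data_format"]
    for doc in documents
    if "data_format" in doc.get("a", {})
]

print(all_values)
print(present_values)
```

Option 1 avoids the KeyError without changing the query; option 2 skips the documents entirely, which is closer to what $exists does.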

ElasticSearch Multi Index Query

Simple question: I have multiple indexes in my Elasticsearch engine, mirrored from PostgreSQL using Logstash. Elasticsearch performs well for fuzzy searches, but now I need my queries to handle references between the indexes.
Index A:
{
name: "alice",
_id: 5
}
...
Index B:
{
name: "bob",
_id: 3,
best_friend: 5
}
...
How do I query:
Get every match of index B with field name starting with "b" and index A referenced by "best_friend" with the name starting with "a"
Is this even possible with elasticsearch?
Yes, that's possible: POST A,B/_search queries multiple indexes at once.
To match a record from a specific index, you can use the metadata field _index.
Below is a query that gets every match of index B with field name starting with "b" and every match of index A with name starting with "a". It does not resolve the reference the way a join would in a relational SQL database: matching foreign key references (joins) in Elasticsearch, as in other NoSQL stores, is YOUR responsibility, AFAIK. Refer to Elasticsearch: The Definitive Guide to find the approach that best fits your needs. Lastly, NoSQL is not SQL, so expect to change how you think about the problem.
POST A,B/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"prefix": {
"name": "a"
}
},
{
"term": {
"_index": "A"
}
}
]
}
},
{
"bool": {
"must": [
{
"prefix": {
"name": "b"
}
},
{
"term": {
"_index": "B"
}
}
]
}
}
]
}
}
}
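Since the query above returns matches from both indexes without linking them, the join has to happen in the application. A rough sketch of that step in Python, assuming the hits have been pulled out of the search response into a list of dicts with _index, _id and _source keys. The sample hits and the function name join_best_friends are invented for illustration, and ids are kept as numbers as in the question.

```python
# Hypothetical hits, shaped like the entries of a search response's hits array.
hits = [
    {"_index": "A", "_id": 5, "_source": {"name": "alice"}},
    {"_index": "A", "_id": 7, "_source": {"name": "anna"}},
    {"_index": "B", "_id": 3, "_source": {"name": "bob", "best_friend": 5}},
    {"_index": "B", "_id": 4, "_source": {"name": "bill", "best_friend": 9}},
]


def join_best_friends(hits):
    """Pair each hit from index B with the index A hit its best_friend points to."""
    # Index the A hits by id so each lookup is a dict access.
    a_by_id = {h["_id"]: h["_source"] for h in hits if h["_index"] == "A"}
    pairs = []
    for h in hits:
        if h["_index"] != "B":
            continue
        friend = a_by_id.get(h["_source"].get("best_friend"))
        if friend is not None:  # drop B hits whose friend did not match the query
            pairs.append((h["_source"]["name"], friend["name"]))
    return pairs


print(join_best_friends(hits))
```

Only "bob" survives here, because the friend of "bill" (id 9) is not among the index A matches.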

mongodb $addToSet to a non-array field when update on upsert

My recent project encountered the same problem as this one: the question
db.test.update(
{name:"abc123", "config.a":1 },
{$addToSet:{ config:{a:1,b:2} } },
true
)
Will produce such error:
Cannot apply $addToSet to a non-array field
But after changing it to:
db.test.update(
{name:"abc123", "config.a":{$in:[1]} },
{$addToSet:{ config:{a:1,b:2} } },
true
)
It works fine.
Also referenced this link: Answer
Can anyone explain what's going on? Does "config.a":1 turn config into an object, whereas "config.a":{$in:[1]} doesn't?
What you are trying to do here is add a new item to an array only where the item does not exist, and also create a new document where it does not exist. You chose $addToSet because you want the items to be unique, but in fact you really want them to be unique by "a" only.
So $addToSet will not do that, and you rather need to "test" for the element being present. But the real problem here is that it is not possible to both do that and "upsert" at the same time. The logic cannot work, as a new document will be created whenever the array element was not found, rather than appending to the array like you want.
The current operation errors by design. On an upsert, the equality condition "config.a": 1 from the query is copied into the new document, so "config" is created as an embedded document rather than an array, and $addToSet can only "add" members to an array. With "config.a": {$in: [1]} the condition is not an equality match, so nothing is copied and $addToSet is free to create the array. But as stated already, you have other problems with achieving the logic.
What you need here is a sequence of update operations that each "try" to perform their expected action. This can only be done with multiple statements:
// attempt "upsert" where document does not exist
// do not alter the document if this is an update
db.test.update(
{ "name": "abc" },
{ "$setOnInsert": { "config": [{ "a": 1, "b": 2 }] }},
{ "upsert": true }
)
// $push the element where "a": 1 does not exist
db.test.update(
{ "name": "abc", "config.a": { "$ne": 1 } },
{ "$push": { "config": { "a": 1, "b": 2 } }}
)
// $set the element where "a": 1 does exist
db.test.update(
{ "name": "abc", "config.a": 1 },
{ "$set": { "config.$.b": 2 } }
)
On a first iteration the first statement will "upsert" the document and create the array with items. The second statement will not match the document because the "a" element has the value that was specified. The third statement will match the document but it will not alter it in a write operation because the values have not changed.
If you now change the input to "b": 3 you get different responses but the desired result:
db.test.update(
{ "name": "abc" },
{ "$setOnInsert": { "config": [{ "a": 1, "b": 3 }] }},
{ "upsert": true }
)
db.test.update(
{ "name": "abc", "config.a": { "$ne": 1 } },
{ "$push": { "config": { "a": 1, "b": 3 } }}
)
db.test.update(
{ "name": "abc", "config.a": 1 },
{ "$set": { "config.$.b": 3 } }
)
So now the first statement matches a document with "name": "abc" but does not do anything, since the only valid operations are on "insert". The second statement does not match because "a" matches the condition. The third statement matches the value of "a" and changes "b" in the matched element to the desired value.
Subsequently changing "a" to another value that does not exist in the array means statements 1 and 3 do nothing, but the second statement adds another member to the array, keeping the content unique by their "a" keys.
Also submitting a statement with no changes from existing data will of course result in a response that says nothing is changed on all accounts.
That's how you do your operations. You can do this with "ordered" Bulk operations so that there is only a single request to the server and a single response, reporting what was modified or created.
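To make the outcome of the three statements easier to follow, here is an in-memory model in Python. It is only a sketch: a plain list of dicts stands in for db.test, the function name upsert_config is made up, and the function returns as soon as one step acts, whereas the real sequence still sends the remaining statements as no-ops. The resulting document state is the same.

```python
def upsert_config(collection, name, a, b):
    """Model the three-statement sequence for one (a, b) pair.

    `collection` is a plain list of dicts standing in for db.test.
    Returns a label for the step that changed something.
    """
    # Statement 1: upsert with $setOnInsert, acts only when no document matches.
    doc = next((d for d in collection if d["name"] == name), None)
    if doc is None:
        collection.append({"name": name, "config": [{"a": a, "b": b}]})
        return "inserted"

    # Statement 2: $push only when no array element has this "a".
    match = next((e for e in doc["config"] if e["a"] == a), None)
    if match is None:
        doc["config"].append({"a": a, "b": b})
        return "pushed"

    # Statement 3: positional $set on the element that has this "a".
    if match["b"] != b:
        match["b"] = b
        return "modified"
    return "unchanged"


db_test = []
results = [
    upsert_config(db_test, "abc", 1, 2),  # no document yet
    upsert_config(db_test, "abc", 1, 3),  # same "a", new "b"
    upsert_config(db_test, "abc", 2, 5),  # new "a"
    upsert_config(db_test, "abc", 2, 5),  # nothing to change
]
print(results)
print(db_test)
```

The array ends up with one element per distinct "a", which is the uniqueness the question was after.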
As explained in this issue on the MongoDB JIRA (https://jira.mongodb.org/browse/SERVER-3946), this can be solved in a single query:
The following update fails because we use $addToSet on a field which we have also included in the first argument (the field which accepts the fields and values to query for). As far as I understand it, you can't use upsert: true in this scenario where we $addToSet to the same field we query with.
db.foo.update({x : "a"}, {$addToSet: {x: "b"}} , {upsert: true}); // FAILS
The solution is to query the field with $elemMatch, in the form field: {$elemMatch: {$eq: value}}:
db.foo.update({x: {$elemMatch: {$eq: "a"}}}, {$addToSet: {x: "b"}}, {upsert: true});

Update an Element if Position is Unknown with Upsert

It looks like you cannot have both upsert and an array element update operation in the same call.
If you do (python):
from bson import ObjectId

findDct = {
    "_id": ObjectId("535e3ab9c36b4417d031402f"),
    "events.ids": "176976332"
}
# col is the pymongo collection being updated
print(col.update(findDct, {"$set": {"events.$.foo": "bar"}}, upsert=True))
It will throw:
pymongo.errors.DuplicateKeyError: insertDocument :: caused by :: 11000 E11000
duplicate key error index: test.col.$_id_ dup key: { : ObjectId('535e3ab9c36b4417d031402f') }
This happens because "_id" of course has a unique index and, since the query fails to match on its 'events.ids': '176976332' part, mongo tries to insert the document as a new one with the same "_id".
Is it possible to update an element at an unknown position in an array with upsert=True, and if so, how?
Yes it is, but you are going about it in the wrong way. Rather than trying to "find" an element that may or may not exist, apply the $addToSet operator instead:
db.collection.update(
  { "_id": ObjectId("535e3ab9c36b4417d031402f") },
  {
    "$addToSet": { "events": { "foo": "bar" } }
  },
  { "upsert": true }
)
Please also note from the positional $ operator documentation that you should not use the $ operator with "upserts" as this will result in the field name being interpreted as a "literal" ( which includes the value as in "events.$.foo" ) and that will be the actual field inserted into the document.
Try to make sure that your array "insert/upsert" operations specify the whole array content in order to make this work.
Another adaptation is with the "bulk" methods, the pymongo driver already has a nice API for this, but this is a general form:
db.runCommand({
  "update": "collection",
  "updates": [
    {
      "q": { "_id": ObjectId("535e3ab9c36b4417d031402f") },
      "u": {
        "$addToSet": {
          "events": {
            "foo": "bar", "bar": "baz"
          }
        }
      },
      "upsert": true
    },
    {
      "q": { "_id": ObjectId("535e3ab9c36b4417d031402f") },
      "u": {
        "$set": { "events.foo": "bar" }
      }
    }
  ]
})
Still, be very careful that you are not producing duplicates in your sub-document array, as you can clearly see would happen in the case above. But it is a method, as each update will be attempted in turn even if the first one failed to add anything. Not the best example, but I hope you see the point.
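The duplicate risk comes from $addToSet comparing whole elements rather than a single key. A small Python model of that behaviour, as a sketch only: the add_to_set helper is made up, and it relies on Python dict equality, which ignores field order, whereas MongoDB also treats differently ordered fields as different documents.

```python
def add_to_set(array, element):
    """Append element unless an equal element is already present.

    Rough model of $addToSet: the whole embedded document is compared,
    not just one of its keys.
    """
    if element not in array:
        array.append(element)
    return array


events = []
add_to_set(events, {"foo": "bar"})
add_to_set(events, {"foo": "bar"})                # exact duplicate: ignored
add_to_set(events, {"foo": "bar", "bar": "baz"})  # differs by one field: added
print(events)
```

Two elements with "foo": "bar" now sit in the array, so uniqueness by a single key still has to be enforced separately.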

mongodb: return an array of document ids

Is it possible to query mongodb to return array of matching document id values, without the related keys?
Please consider the following 'parent' data structure:
{
"_id": ObjectId("52448e4697fb2b775cb5c3a7"),
"name": "Peter",
"children": [
{
"name": "joe"
}
]
},
{
"_id": ObjectId("52448e4697fb2b775cb5c3b6"),
"name": "Marry",
"children": [
{
"name": "joe"
}
]
}
I would like to query for an array of parent _ids whose children have the name "joe".
For provided sample data, I would like the following output returned from mongo:
[ObjectId("52448e4697fb2b775cb5c3a7"), ObjectId("52448e4697fb2b775cb5c3b6")]
I know that I can query for an output like this, which also contains the keys
[{"_id": ObjectId("52448e4697fb2b775cb5c3a7")}, {"_id": ObjectId("52448e4697fb2b775cb5c3b6")}]
However I need to push above array to another document with an update operation like this:
db.statistic.update({"date": today}, {$push: {"children": [ObjectId("52448e4697fb2b775cb5c3a7"), ObjectId("52448e4697fb2b775cb5c3b6")]}}, true, false)
I would like to avoid reshaping the documents myself if mongo can just return an array containing the appropriate values.
This should be possible with distinct:
db.coll.distinct("_id", {"children.name": "joe"})
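If the ids have already been fetched as documents containing only the _id key, flattening them on the client is also short. A sketch in Python, with plain strings standing in for the ObjectId values (an assumption made to keep the example free of driver imports):

```python
# Stand-in for the result of a query projected down to the _id key.
projected = [
    {"_id": "52448e4697fb2b775cb5c3a7"},
    {"_id": "52448e4697fb2b775cb5c3b6"},
]

# Keep only the values, dropping the "_id" keys.
parent_ids = [doc["_id"] for doc in projected]
print(parent_ids)
```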