Querying Mongo DB nested Documents - mongodb

I am using complex nested mongo DB document. I have tried to use aggregate with replaceroot but using replaceroot I am able to go for only one sub-level. In my ocument I am having 3 sub levels. I tried to get all document and loop information in python but I ended up using 5nested for loops.
[
{
"_id": "A-38",
"resourcePaths": [
{
"resourcePathName": "some1",
"method": "POST",
"scenarioDetails": [
{
"scenarioId": "4-ABCD",
"scenarioName": "scenaio1",
"propertiesDatatype": {},
"requestData": [
{
"sql": "Select * from tbl",
"insUpdFlag": "Y",
},
{
"digit": "",
"insUpdFlag": "N",
}
],
"updatedBy": "db_insert",
},
{
"scenarioId": "4-QZWBJ3",
"scenarioName": "Scen2",
"requestData": [
{
"sql": "",
"insUpdFlag": "N",
}
],
"updated": "01/31/2023 09:16:41"
}
]
}
],
}
]
I am trying get the scenario details which is having "insUpdFlag" as Y like this
{
"scenarioId": "4-ABCD",
"scenarioName": "scenaio1",
"propertiesDatatype": {},
"requestData": [
{
"sql": "Select * from tbl",
"insUpdFlag": "Y",
},
{
"digit": "",
"insUpdFlag": "N",
}
],
"updatedBy": "db_insert",
}
I tried but giving me entire document. I need some help here. TYIA
db.collection.find({
"resourcePaths.scenarioDetails.requestData.insUpdFlag": "Y"
},

Related

POSTGRES jsonb document GIN Indexes are created on what all objects?

I am using Postgres DB 13.5. From pgdocs -
The technical difference between a jsonb_ops and a jsonb_path_ops GIN
index is that the former creates independent index items for each key
and value in the data, while the latter creates index items only for
each value in the data. Basically, each jsonb_path_ops index item
is a hash of the value and the key(s) leading to it; for example to
index {"foo": {"bar": "baz"}}
Understanding the above in detail is important for me coz my jdata (document) is big with many keys and nested objects. Consider my json data that is stored as jsonb in a column named jdata looks like below -
{
"supplier": {
"id": "3c67b6eb-3b0d-492d-8736-66df107b83b3",
"customer": {
"type": "pro",
"name": "John George",
"address": [
{
"add-id": "098ad4df-2a90-4fda-8f92-dbe8d7196732",
"addressActive": true,
"street": "abc street",
"zip": 94044,
"staying-since": "long long",
"accessibility": {
"traffic": "heavy/congested",
"bestwaytoreach": {
"weekdays": {
"bart/metro/calltrain": true,
"price": {
"off-peak-hours": "affordable",
"peak-hours": "high"
},
"journey-time": "super-fast"
}
},
"weekends": {
"byroad": {
"ok": true,
"distance": "long",
"has-tolls": {
"true": true,
"toll-price": "relatively-high"
},
"journey-speed": "fast"
}
}
}
},
{
"add-id": "ddd1d2a0-9050-4bcf-a3ad-2e608d65e468",
"addressActive": true,
"street": "xyz street",
"zip": 10001,
"staying-since": "moved recently",
"accessibility": {
"traffic": "heavy/congested",
"bestwaytoreach": {
"weekdays": {
"subway": true,
"price": {
"off-peak-hours": "affordable",
"peak-hours": "high"
},
"journey-speed": "super-fast"
}
},
"weekends": {
"byroad": {
"ok": true,
"distance": "moderate",
"tolls": {
"has-tolls": true,
"toll-price": "relatively-high"
},
"journey-time": "super-fast"
}
}
}
}
],
"firstName": "John",
"lastName": "CRAWFORD",
"emailAddresses": {
"personal": [
"johnreplies#jg.com",
"ursjohn#jg.com",
"1234#jg.com"
],
"official": [
{
"repies-in": "1 day",
"email": "jg#jg.com"
},
{
"check's regularly": true,
"repies-in": "1 Hour",
"email": "jg-watching#jg.com"
}
]
},
"cities": [
"NYC",
"LA",
"SF",
"DC"
],
"splCustFlag": null,
"isPerson": true,
"isEntity": false,
"allowEmailSolicit": "Y",
"allowPhoneSolicit": "Y",
"taxPayer": true,
"suffix": null,
"title": null,
"birthDate": "05/10/1993",
"loyaltyPrograms": null,
"phoneNumbers-summary": [
1234567890,
1234567899,
1234567898,
1234567897
],
"phoneNumbers": [
{
"description": null,
"extension": null,
"number": 1234567890,
"countryCode": null,
"type": "Business"
},
{
"description": null,
"extension": null,
"number": 1234567899,
"countryCode": null,
"type": "Home"
}
],
"data-privacy": {
"required": true,
"laws": [
"CCPA",
"GDPR"
]
}
}
}
}
Now if I create GIN jsonb_ops index for jdata column - I want to clarify what all keys and values will be part of index.
For example - "staying-since" is a key nested at below path and it's part of "address" array too. But it's still a key, thought nested deep in the document. So will it be part of the index.
{
"supplier": {
"customer": {
"address": [
{
"staying-since": "long long" ...
And similarly "long long" is a value of a deeply nested key. Will it also be indexed.
And if GIN jsonb_path_ops index is created for jdata column --
Will a hash of "long long" value along with the path that leads to it will also be indexed.
hash(
"supplier": {
"customer": {
"address":[{"staying-since": "long long"}]
}
}
)
will the above also gets index.
I am aware about the operators that are supported by the GIN index types and am aware about the usage of these operators -
jsonb_ops ? ?& ?| #> #? ##
jsonb_path_ops #> #? ##

MongoDB: $set specific fields for a document array elements only if not null

I have a collection with the following documents (for example):
{
"_id": {
"$oid": "61acefe999e03b9324czzzzz"
},
"matchId": {
"$oid": "61a392cc54e3752cc71zzzzz"
},
"logs": [
{
"actionType": "CREATE",
"data": {
"talent": {
"talentId": "qq",
"talentVersion": "2.10",
"firstName": "Joelle",
"lastName": "Doe",
"socialLinks": [
{
"type": "FACEBOOK",
"url": "https://www.facebook.com"
},
{
"type": "LINKEDIN",
"url": "https://www.linkedin.com"
}
],
"webResults": [
{
"type": "VIDEO",
"date": "2021-11-28T14:31:40.728Z",
"link": "http://placeimg.com/640/480",
"title": "Et necessitatibus",
"platform": "Repellendus"
}
]
},
"createdBy": "DEVELOPER"
}
},
{
"actionType": "UPDATE",
"data": {
"talent": {
"firstName": "Joelle new",
"webResults": [
{
"type": "VIDEO",
"date": "2021-11-28T14:31:40.728Z",
"link": "http://placeimg.com/640/480",
"title": "Et necessitatibus",
"platform": "Repellendus"
}
]
}
}
}
]
},
{
"_id": {
"$oid": "61acefe999e03b9324caaaaa"
},
"matchId": {
"$oid": "61a392cc54e3752cc71zzzzz"
},
"logs": [....]
}
a brief breakdown: I have many objects like this one in the collection. they are a kind of an audit log for actions takes on other documents, 'Match(es)'. for example CREATE + the data, UPDATE + the data, etc.
As you can see, logs field of the document is an array of objects, each describing one of these actions.
data for each action may or may not contain specific fields, that in turn can also be an array of objects: socialLinks and webResults.
I'm trying to remove sensitive data from all of these documents with specified Match ids.
For each document, I want to go over the logs array field, and change the value of specific fields only if they exist, for example: change firstName to *****, same for lastName, if those appear. also, go over the socialLinks array if exists, and for each element inside it, if a field url exists, change it to ***** as well.
What I've tried so far are many minor variations for this query:
$set: {
'logs.$[].data.talent.socialLinks.$[].url': '*****',
'logs.$[].data.talent.webResults.$[].link': '*****',
'logs.$[].data.talent.webResults.$[].title': '*****',
'logs.$[].data.talent.firstName': '*****',
'logs.$[].data.talent.lastName': '*****',
},
and some play around with this kind of aggregation query:
[{
$set: {
'talent.socialLinks.$[el].url': {
$cond: [{ $ne: ['el.url', null] },'*****', undefined],
},
},
}]
resulting in errors like: message: "The path 'logs.0.data.talent.socialLinks' must exist in the document in order to apply array updates.",
But I just cant get it to work... :(
Would love an explanation on how to exactly achieve this kind of set-only-if-exists behaviour.
A working example would also be much appreciated, thx.
Would suggest using $\[<indentifier>\] (filtered positional operator) and arrayFilters to update the nested document(s) in the array field.
In arrayFilters, with $exists to check the existence of the certain document which matches the condition and to be updated.
db.collection.update({},
{
$set: {
"logs.$[a].data.talent.socialLinks.$[].url": "*****",
"logs.$[b].data.talent.webResults.$[].link": "*****",
"logs.$[b].data.talent.webResults.$[].title": "*****",
"logs.$[c].data.talent.firstName": "*****",
"logs.$[d].data.talent.lastName": "*****",
}
},
{
arrayFilters: [
{
"a.data.talent.socialLinks": {
$exists: true
}
},
{
"b.data.talent.webResults": {
$exists: true
}
},
{
"c.data.talent.firstName": {
$exists: true
}
},
{
"d.data.talent.lastName": {
$exists: true
}
}
]
})
Sample Mongo Playground

Cannot use Nested VariableOperators.mapItemsOf in Spring Data MongoDb

I'm forced to use the aggregation framework and the project operation of Spring Data MongoDb.
What I'd like to do is creating an array of object as a result of a project operation.
Considering this intermediate aggregation result:
{
"processes": [
{
"id": "101a",
"assignees": [
{
"id": "201a",
"username": "carl93"
},
{
"id": "202a",
"username": "susan"
}
]
},
{
"id": "101b",
"assignees": [
{
"id": "201a",
"username": "carl93"
},
{
"id": "202a",
"username": "susan"
}
]
}
]
}
I'm trying to get for each process, all the assignee usernames and ids. Hence, what I want to obtain is something like this:
[
{
"results": [
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101a"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101a"
},
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101b"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101b"
}
]
}
]
To reach this goal I'm using 2 nested VariableOperators.mapItemsOf obtaining:
org.springframework.data.mapping.MappingException: Cannot convert [Document{{id= 201a, value= carl93, parentObjectId= 101a}}, Document{{id= 202a, value = susan, parentObjectId= 101a}}]
of type class java.util.ArrayList into an instance of class java.lang.Object!
Implement a custom Converter<class java.util.ArrayList, class java.lang.Object> and register it with the CustomConversions.
Here's the code that I'm currently using:
new ProjectionOperation().and(
VariableOperators.mapItemsOf("processes")
.as("pr")
.andApply(
VariableOperators.mapItemsOf("$pr.ownership.assignees")
.as("ass")
.andApply(aggregationOperationContext -> {
Document document = new Document();
document.append("id", "$$ass.id");
document.append("value", "$$ass.username");
document.append("parentObjectId", "$$pr.id");
return document;
})
)
).as("results");
The code produces this:
[
[
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101a"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101a"
}
],
[
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101b"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101b"
}
]
]
As you can see there are 2 nested arrays, [[],[]]. This is the reason why the exception is thrown.
Nevertheless what I want to obtain is just one array, adding all the objects in it (possibly without duplicates or null values). I've tried the addToSet operator and other aggregtion operators, without any success.
Use $reduce with $concatArrays to join the arrays.
new ProjectionOperation().and(
ArrayOperators.arrayOf("processes")
.reduce(ArrayOperators.ConcatArrays.arrayOf("$$value").concat(
VariableOperators.mapItemsOf("$$this.ownership.assignees")
.as("ass")
.andApply(aggregationOperationContext -> {
Document document = new Document();
document.append("id", "$$ass.id");
document.append("value", "$$ass.username");
document.append("parentObjectId", "$$this.id");
return document;
})
)).startingWith(Arrays.asList())
).as("results");

MongoDB: addToSet not working when adding subdocuments to an array

I am new to MongoDB and I am stuck trying to get unique subdocuments in an array.
A document in my collection looks like this:
{
"PubDate": "1/01/01 00:00",
"Title": "Identification of DNA-Dependent Protein Kinase Catalytic Subunit (DNA-PKcs) as a Novel Target of Bisphenol A",
"Datums": [
{
"evidence_id": "3515620_6",
"evidence": [
"\n\nTo examine the interaction between DNA-PKcs and Ku70/Ku80 more directly, we performed immunoprecipitation (IP) using FLAG-Ku70 or FLAG-Ku80 recombinants, which were expressed in 293T cells after IR-irradiation (Fig. 4B\n ) or UV-irradiation (Fig. 4C\n ). After IR-irradiation, co-precipitation of DNA-PKcs with Ku80 increased compared with that in the non-irradiated controls (Fig. 4B\n lanes 7 and 8)."
],
"map": {
"change": [
{
"Text": "increased"
}
],
"subject": [
{
"Entity": {
"strings": [
"dna-pkcs"
],
"uniprotSym": "P78527"
}
}
],
"treatment": [
{
"Entity": {
"strings": [
"dna-pkcs"
],
"uniprotSym": "P78527"
}
}
],
"assay": [
{
"Text": "copptby"
}
]
}
},
{
"evidence_id": "3515620_6",
"evidence": [
"\n\nTo examine the interaction between DNA-PKcs and Ku70/Ku80 more directly, we performed immunoprecipitation (IP) using FLAG-Ku70 or FLAG-Ku80 recombinants, which were expressed in 293T cells after IR-irradiation (Fig. 4B\n ) or UV-irradiation (Fig. 4C\n ). After IR-irradiation, co-precipitation of DNA-PKcs with Ku80 increased compared with that in the non-irradiated controls (Fig. 4B\n lanes 7 and 8)."
],
"map": {
"change": [
{
"Text": "increased"
}
],
"subject": [
{
"Entity": {
"strings": [
"dna-pkcs"
],
"uniprotSym": "P78527"
}
}
],
"treatment": [
{
"Entity": {
"strings": [
"dna-pkcs"
],
"uniprotSym": "P78527"
}
}
],
"assay": [
{
"Text": "copptby"
}
]
}
},
{
"evidence_id": "3515620_6",
"evidence": [
"\n\nTo examine the interaction between DNA-PKcs and Ku70/Ku80 more directly, we performed immunoprecipitation (IP) using FLAG-Ku70 or FLAG-Ku80 recombinants, which were expressed in 293T cells after IR-irradiation (Fig. 4B\n ) or UV-irradiation (Fig. 4C\n ). After IR-irradiation, co-precipitation of DNA-PKcs with Ku80 increased compared with that in the non-irradiated controls (Fig. 4B\n lanes 7 and 8)."
],
"map": {
"change": [
{
"Text": "increased"
}
],
"subject": [
{
"Entity": {
"strings": [
"dna-pkcs"
],
"uniprotSym": "P78527"
}
}
],
"treatment": [
{
"Entity": {
"strings": [
"dna-pkcs"
],
"uniprotSym": "P78527"
}
}
],
"assay": [
{
"Text": "copptby"
}
]
}
}
],
"Volume": "7",
"FullJournalName": "PLoS ONE",
"Authors": "Ito Y, Ito T, Karasawa S, Enomoto T, Nashimoto A, Hase Y, Sakamoto S, Mimori T, Matsumoto Y, Yamaguchi Y, Handa H",
"Issue": "12",
"Pages": "e50481",
"PMCID": "3515620"
}
In the above example, the "Datums" field has only one subdocument, but usually, the "Datums" field will have around 20-30 subdocuments. I want my MongoDB query to output documents (that satisfy certain criteria), where the "Datums" field will have unique subdocuments in its array. To do that I am using the following MongoDB query:
db.My_Datums.aggregate(
[
{ "$match": {
"Datums":
{
"$elemMatch":
{
"map.treatment.Entity.uniprotSym": { "$in": ["P33981", "P78527"] },
"map.assay.Text": "copptby"
}
}
}},
{ "$project": { "PMCID":1, "Title":1, "PubDate":1, "Volume":1, "Issue":1, "Pages":1, "FullJournalName":1, "Authors":1, "Datums.map.assay.Text":1, "Datums.map.change.Text":1, "Datums.map.subject.Entity.strings":1, "Datums.map.treatment.Entity.uniprotSym":1, "Datums.evidence_id":1, "_id":0 }},
{ "$unwind": "$Datums" },
{ "$match": { "Datums.map.treatment.Entity.uniprotSym": { "$in": ["P33981", "P78527"] }, "Datums.map.assay.Text": "copptby" }},
{ "$group": { "_id": "$PMCID", "Datums": { "$addToSet": "$Datums" }}}
]
#{ allowDiskUse: 1 }
)
But on running the above command, I am getting the below output:
{u'Datums': [{u'evidence_id': u'3515620_6',
u'map': {u'assay': [{u'Text': u'copptby'}],
u'change': [{u'Text': u'increased'}],
u'subject': [{u'Entity': {u'strings': u'dna-pkcs'}}],
u'treatment': [{u'Entity': {u'uniprotSym': u'P78527'}}]}},
{u'evidence_id': u'3515620_6',
u'map': {u'assay': [{u'Text': u'copptby'}],
u'change': [{u'Text': u'increased'}],
u'subject': [{u'Entity': {u'strings': u'dna-pkcs'}}],
u'treatment': [{u'Entity': {u'uniprotSym': u'P78527'}}]}},
{u'evidence_id': u'3515620_6',
u'map': {u'assay': [{u'Text': u'copptby'}],
u'change': [{u'Text': u'increased'}],
u'subject': [{u'Entity': {u'strings': u'dna-pkcs'}}],
u'treatment': [{u'Entity': {u'uniprotSym': u'P78527'}}]}}],
u'_id': u'3515620'}
What I am not understanding is why addToSet adding duplicate subdocuments to "Datums". Is there any way I can filter out the duplicates? What am I doing wrong in my query? I have searched a lot and read up a lot, but couldnt find any solution. Any MongoDB guru out there who could help this noob?? I will be eternally grateful to you!
Thanks in advance!

Querying Multi Level Nested fields on Elastic Search

I'm new to Elastic Search and to the non-SQL paradigm.
I've been following ES tutorial, but there is one thing I couldn't put to work.
In the following code (I'me using PyES to interact with ES) I create a single document, with a nested field (subjects), that contains another nested field (concepts).
from pyes import *
conn = ES('127.0.0.1:9200') # Use HTTP
# Delete and Create a new index.
conn.indices.delete_index("documents-index")
conn.create_index("documents-index")
# Create a single document.
document = {
"docid": 123456789,
"title": "This is the doc title.",
"description": "This is the doc description.",
"datepublished": 2005,
"author": ["Joe", "John", "Charles"],
"subjects": [{
"subjectname": 'subject1',
"subjectid": [210, 311, 1012, 784, 568],
"subjectkey": 2,
"concepts": [
{"name": "concept1", "score": 75},
{"name": "concept2", "score": 55}
]
},
{
"subjectname": 'subject2',
"subjectid": [111, 300, 141, 457, 748],
"subjectkey": 0,
"concepts": [
{"name": "concept3", "score": 88},
{"name": "concept4", "score": 55},
{"name": "concept5", "score": 66}
]
}],
}
# Define the nested elements.
mapping1 = {
'subjects': {
'type': 'nested'
}
}
mapping2 = {
'concepts': {
'type': 'nested'
}
}
conn.put_mapping("document", {'properties': mapping1}, ["documents-index"])
conn.put_mapping("subjects", {'properties': mapping2}, ["documents-index"])
# Insert document in 'documents-index' index.
conn.index(document, "documents-index", "document", 1)
# Refresh connection to make queries.
conn.refresh()
I'm able to query subjects nested field:
query1 = {
"nested": {
"path": "subjects",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"subjects.subjectname": "subject1"}
},
{
"range": {"subjects.subjectkey": {"gt": 1}}
}
]
}
}
}
}
results = conn.search(query=query1)
for r in results:
print r # as expected, it returns the entire document.
but I can't figure out how to query based on concepts nested field.
ES documentation refers that
Multi level nesting is automatically supported, and detected,
resulting in an inner nested query to automatically match the relevant
nesting level (and not root) if it exists within another nested query.
So, I tryed to build a query with the following format:
query2 = {
"nested": {
"path": "concepts",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"concepts.name": "concept1"}
},
{
"range": {"concepts.score": {"gt": 0}}
}
]
}
}
}
}
which returned 0 results.
I can't figure out what is missing and I haven't found any example with queries based on two levels of nesting.
Ok, after trying a tone of combinations, I finally got it using the following query:
query3 = {
"nested": {
"path": "subjects",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"subjects.concepts.name": "concept1"}
}
]
}
}
}
}
So, the nested path attribute (subjects) is always the same, no matter the nested attribute level, and in the query definition I used the attribute's full path (subject.concepts.name).
Shot in the dark since I haven't tried this personally, but have you tried the fully qualified path to Concepts?
query2 = {
"nested": {
"path": "subjects.concepts",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"subjects.concepts.name": "concept1"}
},
{
"range": {"subjects.concepts.score": {"gt": 0}}
}
]
}
}
}
}
I have some question for JCJS's answer. why your mapping shouldn't like this?
mapping = {
"subjects": {
"type": "nested",
"properties": {
"concepts": {
"type": "nested"
}
}
}
}
I try to define two type-mapping maybe doesn't work, but be a flatten data; I think we should nested in nested properties..
At last... if we use this mapping nested query should like this...
{
"query": {
"nested": {
"path": "subjects.concepts",
"query": {
"term": {
"name": {
"value": "concept1"
}
}
}
}
}
}
It's vital for using full path for path attribute...but not for term key can be full-path or relative-path.