How to search a MongoDB collection by a value inside a map (JSON object)

I have the JSON documents below in MongoDB and would like to write a bson.M filter that selects a specific document from the collection.
Documents in the collection:
{
"Id": "3fa85f64",
"Type": "DDD",
"Status": "PRESENT",
"List": [{
"dd": "55",
"cc": "33"
}],
"SeList": {
"comm_1": {
"seId": "comm_1",
"serName": "nmf-comm"
},
"comm_2": {
"seId": "comm_2",
"serName": "aut-comm"
}
}
}
{
"Id": "3fa8556",
"Type": "CCC",
"Status": "PRESENT",
"List": [{
"dd": "22",
"cc": "34"
}],
"SeList": {
"dnn_1": {
"seId": "dnn_1",
"serName": "dnf-comm"
},
"dnn_2": {
"seId": "dnn_2",
"serName": "dn2-comm"
}
}
}
I wrote the bson.M filter below to select the first document, but it does not work because I do not know how to handle the map keys in "SeList.serName". The keys (comm_1, comm_2, dnn_1, etc.) could be any string.
filter := bson.M{"Type": "DDD", "Status": "PRESENT", "SeList.serName": "nmf-comm"} // does not work: "SeList.serName" skips the map-key level
How can I select a document based on a filter like the one above?
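No answer was recorded for this question. One possible approach, offered as a sketch rather than a confirmed fix, is to convert the SeList map into an array with $objectToArray inside $expr, so the unknown keys no longer matter. The filter is written here as a Python dict; the same nesting should carry over to bson.M. The names build_filter and matches_locally are hypothetical, and matches_locally only mirrors the intended logic in plain Python so the idea can be checked without a database.

```python
def build_filter(doc_type, status, ser_name):
    """Build a MongoDB filter matching documents whose SeList map has any
    entry with the given serName, whatever the map key is."""
    return {
        "Type": doc_type,
        "Status": status,
        "$expr": {
            "$in": [
                ser_name,
                {
                    "$map": {
                        # $objectToArray turns {"k1": {...}} into [{"k": "k1", "v": {...}}];
                        # $ifNull guards against documents that have no SeList field.
                        "input": {"$objectToArray": {"$ifNull": ["$SeList", {}]}},
                        "as": "entry",
                        "in": "$$entry.v.serName",
                    }
                },
            ]
        },
    }


def matches_locally(doc, doc_type, status, ser_name):
    """Plain-Python mirror of the filter above, for illustration only."""
    names = [entry.get("serName") for entry in doc.get("SeList", {}).values()]
    return (
        doc.get("Type") == doc_type
        and doc.get("Status") == status
        and ser_name in names
    )


# The two sample documents from the question (List field omitted for brevity).
docs = [
    {"Id": "3fa85f64", "Type": "DDD", "Status": "PRESENT",
     "SeList": {"comm_1": {"seId": "comm_1", "serName": "nmf-comm"},
                "comm_2": {"seId": "comm_2", "serName": "aut-comm"}}},
    {"Id": "3fa8556", "Type": "CCC", "Status": "PRESENT",
     "SeList": {"dnn_1": {"seId": "dnn_1", "serName": "dnf-comm"},
                "dnn_2": {"seId": "dnn_2", "serName": "dn2-comm"}}},
]

mongo_filter = build_filter("DDD", "PRESENT", "nmf-comm")
selected = [d["Id"] for d in docs if matches_locally(d, "DDD", "PRESENT", "nmf-comm")]
print(selected)
```

A filter of this shape probably cannot use an index on the map values, so if this lookup is frequent it may be worth storing SeList as an array of {seId, serName} objects instead, in which case a plain "SeList.serName" filter would work.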

Related

Mongodb subdocument structure best practices and queries

I've seen 2 main types of schema for subdocuments:
{
"cbill#boogiemail:com": {
"outbound": [
{
"name": "First",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
{
"name": "Second",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
],
"inbound" : [
{
"name": "First",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
{
"name": "Second",
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
]
}
}
The alternative structure is:
{
"cbill#boogiemail:com": {
"outbound": {
"First": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
"Second": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
},
"inbound" : {
"First": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "active"
},
"data": {
}
},
"Second": {
"state": {
"saved": "cbill#boogiemail.com",
"edited": "connie#boogiemail.com",
"status": "draft"
},
"data": {
}
}
}
}
}
The main difference between the two is the structure of the inbound/outbound subdocuments.
What is the best practice for MongoDB subdocument structures?
And in each case, what query would get me the subdocument pointed to by:
cbill#boogiemail:com.inbound.Second ?
To add a bit more information:
The collection will have many different documents starting with different email addresses, but each document in the collection will only have a few subdocuments under the inbound/outbound keys.
You want to structure your collections and documents in a way that reflects how you intend to use the data. If you're going to do a lot of complex queries, especially with subdocuments, you might find it easier to split your documents up into separate collections. An example of this would be splitting comments from blog posts.
Your comments could be stored as an array of subdocuments:
// Example post document with comment subdocuments
{
title: 'How to Mongo!',
content: 'So I want to talk about MongoDB.',
comments: [
{
author: 'Renold',
content: "This post, it's amazing."
},
...
]
}
This might cause problems, though, if you want to do complex queries on just comments (e.g. picking the most recent comments from all posts or getting all comments by one author.) If you plan on making these complex queries, you'd be better off creating two collections: one for comments and the other for posts.
// Example post document with "ForeignKeys" to comment documents
{
_id: ObjectId("50c21579c5f2c80000000000"),
title: 'How to Mongo!',
content: 'So I want to talk about MongoDB.',
comments: [
ObjectId("50c21579c5f2c80000000001"),
ObjectId("50c21579c5f2c80000000002"),
...
]
}
// Example comment document with a "ForeignKey" to a post document
{
_id: ObjectId("50c21579c5f2c80000000001"),
post_id: ObjectId("50c21579c5f2c80000000000"),
author: 'Renold',
content: "This post, it's amazing."
}
This is similar to how you'd store "ForeignKeys" in a relational database. Normalizing your documents like this makes querying both comments and posts easy. Also, since you're breaking up your documents, each document will take up less memory. The trade-off, though, is that you have to maintain the ObjectId references whenever there's a change to either document (e.g. when you insert/update/delete a comment or post). And since there are no event hooks in Mongo, you have to do all this maintenance in your application.
On the other hand, if you don't plan on doing any complex queries on a document's subdocuments, you might benefit from storing monolithic objects. For instance, a user's preferences aren't something you're likely to query on:
// Example user document with address subdocument
{
_id: ObjectId("50c21579c5f2c80000000421"),
name: 'Howard',
password: 'naughtysecret',
address: {
state: 'FL',
city: 'Gainesville',
zip: 32608
}
}
I found the answer here (https://www.tutorialspoint.com/how-to-select-a-specific-subdocument-in-mongodb), after some slight modifications.
The query for the second example (which was the one that I was most interested in) was:
find({ "cbill#boogiemail:com.inbound": {$exists: true}},{"cbill#boogiemail:com.inbound.Second":1}).pretty()
This results in:
{
"_id" : ObjectId("6216a9940b84b1a642cb925e"),
"cbill#boogiemail:com" : {
"inbound" : {
"Second" : {
"state" : {
"saved" : "cbill#boogiemail.com",
"edited" : "connie#boogiemail.com",
"status" : "draft"
},
"data" : {
}
}
}
}
}
Whether this is the most efficient query I'm not sure - feel free to post any better alternatives.

MongoDB: $set specific fields for a document array elements only if not null

I have a collection with the following documents (for example):
{
"_id": {
"$oid": "61acefe999e03b9324czzzzz"
},
"matchId": {
"$oid": "61a392cc54e3752cc71zzzzz"
},
"logs": [
{
"actionType": "CREATE",
"data": {
"talent": {
"talentId": "qq",
"talentVersion": "2.10",
"firstName": "Joelle",
"lastName": "Doe",
"socialLinks": [
{
"type": "FACEBOOK",
"url": "https://www.facebook.com"
},
{
"type": "LINKEDIN",
"url": "https://www.linkedin.com"
}
],
"webResults": [
{
"type": "VIDEO",
"date": "2021-11-28T14:31:40.728Z",
"link": "http://placeimg.com/640/480",
"title": "Et necessitatibus",
"platform": "Repellendus"
}
]
},
"createdBy": "DEVELOPER"
}
},
{
"actionType": "UPDATE",
"data": {
"talent": {
"firstName": "Joelle new",
"webResults": [
{
"type": "VIDEO",
"date": "2021-11-28T14:31:40.728Z",
"link": "http://placeimg.com/640/480",
"title": "Et necessitatibus",
"platform": "Repellendus"
}
]
}
}
}
]
},
{
"_id": {
"$oid": "61acefe999e03b9324caaaaa"
},
"matchId": {
"$oid": "61a392cc54e3752cc71zzzzz"
},
"logs": [....]
}
A brief breakdown: I have many objects like this one in the collection. They are a kind of audit log for actions taken on other documents, 'Match(es)', for example CREATE + the data, UPDATE + the data, etc.
As you can see, the logs field of the document is an array of objects, each describing one of these actions.
The data for each action may or may not contain specific fields, which in turn can also be arrays of objects: socialLinks and webResults.
I'm trying to remove sensitive data from all of the documents with specified Match ids.
For each document, I want to go over the logs array field and change the value of specific fields only if they exist. For example: change firstName to *****, and the same for lastName, if those appear. Also, go over the socialLinks array if it exists, and for each element inside it, if a url field exists, change it to ***** as well.
What I've tried so far are many minor variations of this query:
$set: {
'logs.$[].data.talent.socialLinks.$[].url': '*****',
'logs.$[].data.talent.webResults.$[].link': '*****',
'logs.$[].data.talent.webResults.$[].title': '*****',
'logs.$[].data.talent.firstName': '*****',
'logs.$[].data.talent.lastName': '*****',
},
and I played around with this kind of aggregation-pipeline update:
[{
$set: {
'talent.socialLinks.$[el].url': {
$cond: [{ $ne: ['el.url', null] },'*****', undefined],
},
},
}]
This results in errors like: message: "The path 'logs.0.data.talent.socialLinks' must exist in the document in order to apply array updates."
But I just can't get it to work... :(
I would love an explanation of exactly how to achieve this kind of set-only-if-exists behaviour.
A working example would also be much appreciated, thanks.
I would suggest using $[<identifier>] (the filtered positional operator) with arrayFilters to update the nested document(s) in the array field.
In arrayFilters, use $exists to select only the array elements that contain the field to be updated.
db.collection.update({},
{
$set: {
"logs.$[a].data.talent.socialLinks.$[].url": "*****",
"logs.$[b].data.talent.webResults.$[].link": "*****",
"logs.$[b].data.talent.webResults.$[].title": "*****",
"logs.$[c].data.talent.firstName": "*****",
"logs.$[d].data.talent.lastName": "*****",
}
},
{
arrayFilters: [
{
"a.data.talent.socialLinks": {
$exists: true
}
},
{
"b.data.talent.webResults": {
$exists: true
}
},
{
"c.data.talent.firstName": {
$exists: true
}
},
{
"d.data.talent.lastName": {
$exists: true
}
}
]
})
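To make the intended set-only-if-exists behaviour concrete, here is a plain-Python sketch of the same masking applied to a single document in application code. It illustrates the semantics only and is not a replacement for the update above; mask_logs and MASK are hypothetical names. One difference worth noting: a $set through the all-positional $[] operator writes the field on every element of the inner array, whereas this sketch masks url, link, and title only where the key is already present.

```python
import copy

MASK = "*****"


def mask_logs(document):
    """Return a copy of an audit-log document with sensitive fields masked.
    A field is overwritten only if it is already present."""
    masked = copy.deepcopy(document)
    for log in masked.get("logs", []):
        talent = log.get("data", {}).get("talent")
        if not isinstance(talent, dict):
            continue  # this log entry carries no talent data
        for key in ("firstName", "lastName"):
            if key in talent:
                talent[key] = MASK
        for link in talent.get("socialLinks", []):
            if "url" in link:
                link["url"] = MASK
        for result in talent.get("webResults", []):
            for key in ("link", "title"):
                if key in result:
                    result[key] = MASK
    return masked


# Trimmed-down version of the document from the question.
sample = {
    "logs": [
        {"actionType": "CREATE",
         "data": {"talent": {
             "firstName": "Joelle", "lastName": "Doe",
             "socialLinks": [{"type": "FACEBOOK", "url": "https://www.facebook.com"}],
             "webResults": [{"type": "VIDEO", "link": "http://placeimg.com/640/480",
                             "title": "Et necessitatibus"}]}}},
        {"actionType": "UPDATE",
         "data": {"talent": {"firstName": "Joelle new"}}},
    ]
}

masked_sample = mask_logs(sample)
print(masked_sample["logs"][1]["data"]["talent"])
```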

What is the best way to query an array of subdocument in MongoDB?

Let's say I have a collection like so:
{
"id": "2902-48239-42389-83294",
"data": {
"location": [
{
"country": "Italy",
"city": "Rome"
}
],
"time": [
{
"timestamp": "1626298659",
"data":"2020-12-24 09:42:30"
}
],
"details": [
{
"timestamp": "1626298659",
"data": {
"url": "https://example.com",
"name": "John Doe",
"email": "john#doe.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "https://www.myexample.com",
"name": "John Doe",
"email": "doe#john.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "http://example.com/sub/directory",
"name": "John Doe",
"email": "doe#johnson.com"
}
}
]
}
}
Now the main focus is on the array of subdocuments ("data.details"). I want to get only the relevant matches, e.g.:
db.info.find({"data.details.data.url": "example.com"})
How can I match every "data.details.data.url" that contains "example.com" without also matching "myexample.com"? When I use $regex I get too many results: a query for "example.com" also returns "myexample.com".
Even when I do get partial results (with $match), it's very slow. I tried these aggregation stages:
{ $unwind: "$data.details" },
{
$match: {
"data.details.data.url": /.*example.com.*/,
},
},
{
$project: {
id: 1,
"data.details.data.url": 1,
"data.details.data.email": 1,
},
},
I really don't understand the pattern: with $match, sometimes Mongo recognizes prefixes like "https://" or "https://www." and sometimes it does not.
More info:
My collection is dozens of GB in size. I created two indexes:
Compound like so:
"data.details.data.url": 1,
"data.details.data.email": 1
Text Index:
"data.details.data.url": "text",
"data.details.data.email": "text"
It did improve the query performance, but not enough, and I still have this issue with $match vs. $regex. Thanks for any help!
Your mistake is in the regex. It matches all of the URLs because the substring example.com appears in every one of them. For example, in https://www.myexample.com the trailing "example.com" is the part that matches.
To avoid this you have to use a different regex, for example one that only matches when the URL starts with that domain.
For example:
(http[s]?:\/\/|www\.)YOUR_SEARCH
checks that what you are searching for is preceded by http://, https://, or www.
https://regex101.com/r/M4OLw1/1
Here is the full query:
[
{
'$unwind': {
'path': '$data.details'
}
}, {
'$match': {
'data.details.data.url': /(http[s]?:\/\/|www\.)example\.com/
}
}
]
Note: you must escape special characters in the regex. An unescaped dot matches any character, and an unescaped slash closes the regex literal, causing an error.
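The difference between the two patterns can be checked outside MongoDB. The snippet below is a small sketch using Python's re module against the three URLs from the question; the variable names are illustrative, and Python's regex dialect is assumed to behave the same as MongoDB's for these simple patterns.

```python
import re

urls = [
    "https://example.com",
    "https://www.myexample.com",
    "http://example.com/sub/directory",
]

# Unanchored pattern, equivalent in effect to /.*example.com.*/ from the question.
loose = re.compile(r"example.com")
# Pattern from the answer: the domain must directly follow a scheme or "www.".
strict = re.compile(r"(https?://|www\.)example\.com")

loose_hits = [u for u in urls if loose.search(u)]
strict_hits = [u for u in urls if strict.search(u)]

print(loose_hits)   # all three URLs
print(strict_hits)  # the two example.com URLs, without myexample.com
```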

Cannot use Nested VariableOperators.mapItemsOf in Spring Data MongoDb

I'm forced to use the aggregation framework and the project operation of Spring Data MongoDB.
What I'd like to do is create an array of objects as the result of a project operation.
Considering this intermediate aggregation result:
{
"processes": [
{
"id": "101a",
"assignees": [
{
"id": "201a",
"username": "carl93"
},
{
"id": "202a",
"username": "susan"
}
]
},
{
"id": "101b",
"assignees": [
{
"id": "201a",
"username": "carl93"
},
{
"id": "202a",
"username": "susan"
}
]
}
]
}
I'm trying to get for each process, all the assignee usernames and ids. Hence, what I want to obtain is something like this:
[
{
"results": [
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101a"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101a"
},
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101b"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101b"
}
]
}
]
To reach this goal I'm using 2 nested VariableOperators.mapItemsOf obtaining:
org.springframework.data.mapping.MappingException: Cannot convert [Document{{id= 201a, value= carl93, parentObjectId= 101a}}, Document{{id= 202a, value = susan, parentObjectId= 101a}}]
of type class java.util.ArrayList into an instance of class java.lang.Object!
Implement a custom Converter<class java.util.ArrayList, class java.lang.Object> and register it with the CustomConversions.
Here's the code that I'm currently using:
new ProjectionOperation().and(
VariableOperators.mapItemsOf("processes")
.as("pr")
.andApply(
VariableOperators.mapItemsOf("$pr.ownership.assignees")
.as("ass")
.andApply(aggregationOperationContext -> {
Document document = new Document();
document.append("id", "$$ass.id");
document.append("value", "$$ass.username");
document.append("parentObjectId", "$$pr.id");
return document;
})
)
).as("results");
The code produces this:
[
[
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101a"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101a"
}
],
[
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101b"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101b"
}
]
]
As you can see there are 2 nested arrays, [[],[]]. This is the reason why the exception is thrown.
Nevertheless, what I want to obtain is just one array containing all the objects (possibly without duplicates or null values). I've tried the addToSet operator and other aggregation operators, without any success.
Use $reduce with $concatArrays to join the arrays.
new ProjectionOperation().and(
ArrayOperators.arrayOf("processes")
.reduce(ArrayOperators.ConcatArrays.arrayOf("$$value").concat(
VariableOperators.mapItemsOf("$$this.ownership.assignees")
.as("ass")
.andApply(aggregationOperationContext -> {
Document document = new Document();
document.append("id", "$$ass.id");
document.append("value", "$$ass.username");
document.append("parentObjectId", "$$this.id");
return document;
})
)).startingWith(Arrays.asList())
).as("results");
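The idea behind $reduce with $concatArrays is a fold: start from an empty array and, for each process, append its mapped assignees to the accumulator. A minimal plain-Python sketch of that fold follows, using the sample data from the question (where assignees sits directly under each process, without the ownership level that appears in the Java code). The name flatten_assignees is hypothetical.

```python
from functools import reduce

processes = [
    {"id": "101a", "assignees": [{"id": "201a", "username": "carl93"},
                                 {"id": "202a", "username": "susan"}]},
    {"id": "101b", "assignees": [{"id": "201a", "username": "carl93"},
                                 {"id": "202a", "username": "susan"}]},
]


def flatten_assignees(process_list):
    """Fold all processes into one flat list of assignee records."""
    def step(accumulated, process):
        # Plays the role of $map over the current process's assignees ...
        mapped = [
            {"id": a["id"], "value": a["username"], "parentObjectId": process["id"]}
            for a in process.get("assignees", [])
        ]
        # ... followed by $concatArrays of "$$value" and the mapped list.
        return accumulated + mapped

    # The empty list plays the role of startingWith(Arrays.asList()).
    return reduce(step, process_list, [])


results = flatten_assignees(processes)
print(results)
```

Because the accumulator is concatenated rather than nested, the result is a single array instead of the [[], []] shape produced by two nested map operations.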

Querying Multi Level Nested fields on Elastic Search

I'm new to Elasticsearch and to the non-SQL paradigm.
I've been following the ES tutorial, but there is one thing I couldn't get to work.
In the following code (I'm using PyES to interact with ES) I create a single document with a nested field (subjects) that contains another nested field (concepts).
from pyes import *
conn = ES('127.0.0.1:9200') # Use HTTP
# Delete and Create a new index.
conn.indices.delete_index("documents-index")
conn.create_index("documents-index")
# Create a single document.
document = {
"docid": 123456789,
"title": "This is the doc title.",
"description": "This is the doc description.",
"datepublished": 2005,
"author": ["Joe", "John", "Charles"],
"subjects": [{
"subjectname": 'subject1',
"subjectid": [210, 311, 1012, 784, 568],
"subjectkey": 2,
"concepts": [
{"name": "concept1", "score": 75},
{"name": "concept2", "score": 55}
]
},
{
"subjectname": 'subject2',
"subjectid": [111, 300, 141, 457, 748],
"subjectkey": 0,
"concepts": [
{"name": "concept3", "score": 88},
{"name": "concept4", "score": 55},
{"name": "concept5", "score": 66}
]
}],
}
# Define the nested elements.
mapping1 = {
'subjects': {
'type': 'nested'
}
}
mapping2 = {
'concepts': {
'type': 'nested'
}
}
conn.put_mapping("document", {'properties': mapping1}, ["documents-index"])
conn.put_mapping("subjects", {'properties': mapping2}, ["documents-index"])
# Insert document in 'documents-index' index.
conn.index(document, "documents-index", "document", 1)
# Refresh connection to make queries.
conn.refresh()
I'm able to query subjects nested field:
query1 = {
"nested": {
"path": "subjects",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"subjects.subjectname": "subject1"}
},
{
"range": {"subjects.subjectkey": {"gt": 1}}
}
]
}
}
}
}
results = conn.search(query=query1)
for r in results:
print r # as expected, it returns the entire document.
but I can't figure out how to query based on concepts nested field.
The ES documentation states that:
Multi level nesting is automatically supported, and detected,
resulting in an inner nested query to automatically match the relevant
nesting level (and not root) if it exists within another nested query.
So, I tried to build a query with the following format:
query2 = {
"nested": {
"path": "concepts",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"concepts.name": "concept1"}
},
{
"range": {"concepts.score": {"gt": 0}}
}
]
}
}
}
}
which returned 0 results.
I can't figure out what is missing and I haven't found any example with queries based on two levels of nesting.
OK, after trying a ton of combinations, I finally got it working with the following query:
query3 = {
"nested": {
"path": "subjects",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"subjects.concepts.name": "concept1"}
}
]
}
}
}
}
So, the nested path attribute (subjects) is always the same, no matter the nesting level of the attribute, and in the query definition I used the attribute's full path (subjects.concepts.name).
Shot in the dark since I haven't tried this personally, but have you tried the fully qualified path to Concepts?
query2 = {
"nested": {
"path": "subjects.concepts",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"text": {"subjects.concepts.name": "concept1"}
},
{
"range": {"subjects.concepts.score": {"gt": 0}}
}
]
}
}
}
}
I have a question about JCJS's answer: why shouldn't your mapping look like this?
mapping = {
"subjects": {
"type": "nested",
"properties": {
"concepts": {
"type": "nested"
}
}
}
}
Defining two separate type mappings, as I tried, may not work and can leave the data flattened; I think the inner nested field should be declared inside the outer nested field's properties.
Finally, if we use this mapping, the nested query should look like this:
{
"query": {
"nested": {
"path": "subjects.concepts",
"query": {
"term": {
"name": {
"value": "concept1"
}
}
}
}
}
}
It's vital to use the full path for the path attribute; the term key, however, can be either the full path or a relative path.
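As a rough sketch of how the pieces discussed in this thread fit together, the helper below builds a query body with one nested clause per nesting level, each using its full path. It only constructs a dict and does not talk to a cluster. The name nested_query is hypothetical, the leaf uses a match clause on the assumption that a current Elasticsearch version is in use (where match took over from the older text query), and full field names are used in the leaf to be on the safe side.

```python
import json


def nested_query(levels, leaf_query):
    """Wrap leaf_query in one nested clause per level.
    levels: full nested paths, ordered from outermost to innermost."""
    query = leaf_query
    for path in reversed(levels):
        query = {"nested": {"path": path, "query": query}}
    return query


leaf = {
    "bool": {
        "must": [
            {"match": {"subjects.concepts.name": "concept1"}},
            {"range": {"subjects.concepts.score": {"gt": 0}}},
        ]
    }
}

body = {"query": nested_query(["subjects", "subjects.concepts"], leaf)}
print(json.dumps(body, indent=2))
```

This assumes the mapping declares concepts as a nested field inside the properties of the nested subjects field, as in the mapping shown above.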