Mongo find value with unknown parent key

I am looking for a value in a MongoDB collection where its parent key might not have a descriptive or known name. Here is an example of what one of our documents looks like:
{
  "assetsId": {
    "0": "546cf2f8585ffa451bb68369"
  },
  "slotTypes": {
    "0": { "usage": "json" },
    "1": { "usage": "image" }
  }
}
I am looking to see if this contains "usage": "json" in slotTypes, but I can't guarantee that the parent key for this usage will be "0".
I tried using the following query without any luck:
db.documents.find(
  {
    slotTypes: {
      $elemMatch: {
        "usage": "json"
      }
    }
  }
)
Sorry in advance if this is a really basic question, but I'm not used to working in a NoSQL database.

I'm not sure you're going to be able to solve this elegantly with your current schema; slotTypes should be an array of sub-documents, which would allow your $elemMatch query to work. Right now, it's an object with numeric-ish keys.
That is, your document schema should be something like:
{
  "assetsId": {
    "0": "546cf2f8585ffa451bb68369"
  },
  "slotTypes": [
    { "usage": "json" },
    { "usage": "image" }
  ]
}
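With the array layout, your original $elemMatch query works as written, and for a single condition plain dot notation is enough. A quick sketch against the corrected schema:

// Matches documents where any slotTypes element has usage == "json"
db.documents.find({ "slotTypes.usage": "json" })

// $elemMatch is only needed when one element must satisfy several
// conditions at once:
db.documents.find({ slotTypes: { $elemMatch: { usage: "json" } } })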
If changing the data layout isn't an option, then you're going to need to basically scan through every document to find matches with $where. This is slow, unindexable, and awkward.
db.documents.find({ $where: function() {
  // Walk every key in slotTypes, whatever the key is named
  for (var key in this.slotTypes) {
    if (this.slotTypes[key].usage == "json") return true;
  }
  return false;
}})
You should read the documentation on $where to make sure you understand the caveats of it, and for the love of all that is holy, sanitize your inputs to the function; this is live code that is executing in the context of your database.
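If you're on MongoDB 3.4.4 or newer, an aggregation with $objectToArray is a middle ground worth considering; this is a sketch of an alternative, not part of the original answer. It still scans every matching document, but it avoids running JavaScript inside the server:

db.documents.aggregate([
  // Only consider documents where slotTypes is an embedded object
  { $match: { slotTypes: { $type: "object" } } },
  // Turn {"0": {...}, "1": {...}} into [{k: "0", v: {...}}, {k: "1", v: {...}}]
  { $addFields: { _slots: { $objectToArray: "$slotTypes" } } },
  // Keep documents where any value has usage == "json"
  { $match: { "_slots.v.usage": "json" } },
  // Drop the helper field from the output
  { $project: { _slots: 0 } }
])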

MongoDB: Update array elements on multiple conditions

Colleagues, good afternoon!
I'm struggling with an issue. I am using MongoDB 4.4.4. My assignment looks like this:
There is a set of elements methods[].subcategories[].actions[]. Please note that all of these are arrays and any of them may be absent. Elements of the actions[] array consist of the _id and title fields.
It is necessary to find the actual value of recordId by the actions.title field of an element and write it into that element of the actions array.
List of current values:
0af4cd2e-78cb-109b-8178-d5a7ba0e0012, Inspection
0af4cd2e-78cb-109b-8178-d5a7ba130014, Screening
0af4cd2e-78cb-109b-8178-d5a7ba170016, Poll
0af4cd2e-78cb-109b-8178-d5a7ba1b0018, Getting written explanations
0af4cd2e-78cb-109b-8178-d5a7ba1e001a, Request for documents
0af4cd2e-78cb-109b-8178-d5a7ba21001c, Sampling (samples)
0af4cd2e-78cb-109b-8178-d5a7ba23001e, Instrumental examination
0af4cd2e-78cb-109b-8178-d5a7ba260020, Test
0af4cd2e-78cb-109b-8178-d5a7ba2b0022, Expertise
0af4cd2e-78cb-109b-8178-d5a7ba2d0024, Experiment
3b7205c1-8282-4b63-8121-b82aacd7ca67, Request for documents that, in accordance with the mandatory requirements, must be located at the location (carrying out activities) of the controlled person (its branches, representative offices, separate structural divisions) or the object of control
I wrote the following code:
db.getSiblingDB("ervk_core").getCollection("supervision1").updateMany(
  {},
  {
    "$set": {
      "methods.subcategories.actions.$[elem1]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0012",
      "methods.subcategories.actions.$[elem2]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0014",
      "methods.subcategories.actions.$[elem3]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0016",
      "methods.subcategories.actions.$[elem4]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0018",
      "methods.subcategories.actions.$[elem5]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e001a",
      "methods.subcategories.actions.$[elem6]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e001c",
      "methods.subcategories.actions.$[elem7]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e001e",
      "methods.subcategories.actions.$[elem8]._id": "0af4cd2e-78cb-109b-8178-d5a7ba260020",
      "methods.subcategories.actions.$[elem9]._id": "0af4cd2e-78cb-109b-8178-d5a7ba260022",
      "methods.subcategories.actions.$[elem10]._id": "0af4cd2e-78cb-109b-8178-d5a7ba260024",
      "methods.subcategories.actions.$[elem11]._id": "3b7205c1-8282-4b63-8121-b82aacd7ca67"
    }
  },
  {
    "arrayFilters": [
      { "elem1.title": "Inspection" },
      { "elem2.title": "Search" },
      { "elem3.title": "Poll" },
      { "elem4.title": "Receipt of Written Explanations" },
      { "elem5.title": "Retrieval of Documents" },
      { "elem6.title": "Sampling (samples)" },
      { "elem7.title": "Instrumental examination" },
      { "elem8.title": "Trial" },
      { "elem9.title": "Expertise" },
      { "elem10.title": "Experiment" },
      { "elem11.title": "Request for documents that, in accordance with the mandatory requirements, must be located at the location (carrying out activities) of the controlled person (its branches, representative offices, separate structural divisions) or the object of control" }
    ]
  }
);
However, it gives the error "The path 'methods.subcategories.actions' must exist in the document in order to apply array updates." I understand why it occurs: some documents lack the actions[] array. But how can I account for the fact that the methods[].subcategories[].actions[] arrays may be missing? And did I write the code correctly? I'm already a little confused. Thanks a lot in advance!
What you can do is add arrayFilters checks on the method and subcategory objects to verify that the nested array exists. This solves the issue because Mongo will not go on to check the nested conditions when the arrays don't exist. Here's how you'd do it:
db.collection.updateMany(
  {
    "methods.subcategories.actions": { $exists: true }
  },
  {
    "$set": {
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem1]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0012",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem2]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0014",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem3]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0016",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem4]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e0018",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem5]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e001a",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem6]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e001c",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem7]._id": "0af4cd2e-78cb-109b-8178-d5a7ba0e001e",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem8]._id": "0af4cd2e-78cb-109b-8178-d5a7ba260020",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem9]._id": "0af4cd2e-78cb-109b-8178-d5a7ba260022",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem10]._id": "0af4cd2e-78cb-109b-8178-d5a7ba260024",
      "methods.$[methodElem].subcategories.$[subCatElem].actions.$[elem11]._id": "3b7205c1-8282-4b63-8121-b82aacd7ca67"
    }
  },
  {
    "arrayFilters": [
      { "methodElem.subcategories": { $exists: true } },
      { "subCatElem.actions": { $exists: true } },
      { "elem1.title": "Inspection" },
      { "elem2.title": "Search" },
      { "elem3.title": "Poll" },
      { "elem4.title": "Receipt of Written Explanations" },
      { "elem5.title": "Retrieval of Documents" },
      { "elem6.title": "Sampling (samples)" },
      { "elem7.title": "Instrumental examination" },
      { "elem8.title": "Trial" },
      { "elem9.title": "Expertise" },
      { "elem10.title": "Experiment" },
      { "elem11.title": "Request for documents that, in accordance with the mandatory requirements, must be located at the location (carrying out activities) of the controlled person (its branches, representative offices, separate structural divisions) or the object of control" }
    ]
  }
)
Mongo Playground
I also changed the update's filter to skip documents with no nested actions array at all; this is just to save time.

Change data type from string to date while skipping missing data

The core collection in my DB (other collections refer back to this one) contains 3 fields with date information, which at this point is formatted as strings like MM/DD/YYYY. Further, there is a range of documents for which these fields contain missing data, i.e. "". I populated this collection by running the mongoimport command on a JSON file.
My goal is to convert these date-fields into actual ISODate data types, so as to allow filtering the collection by dates. Further, I want MongoDB to know that empty strings indicate missing values. I have read quite widely on this, leading me to try a bunch of things:
Trying a forEach statement: this worked, but only for the very first document.
db.collection.find().forEach(function(element) {
  element.startDate = ISODate(element.startDate);
  db.collection.save(element);
})
Using an update with an aggregation pipeline, kind of a for-loop: this worked well, but it stopped once it encountered a missing value (so it transformed about 11 values):
db.collection.update(
  {
    "startDate": { "$type": "string" }
  },
  [
    {
      "$set": {
        "startDate": {
          "$dateFromString": {
            "dateString": "$startDate",
            "format": "%m/%d/%Y"
          }
        }
      }
    }
  ]
)
So, both of these approaches kind of worked - but I don't know how to apply them to the entire collection. Further, I'd be interested in performing this task in the most efficient way possible. However, I only want to do this once - data that will be added in the future should hopefully be correctly formatted at the import stage.
db.collection.updateMany(
  {
    "$and": [
      { "startDate": { "$type": "string" } },
      { "startDate": { "$ne": "" } }
    ]
  },
  [
    {
      "$set": {
        "startDate": {
          "$dateFromString": {
            "dateString": "$startDate",
            "format": "%m/%d/%Y"
          }
        }
      }
    }
  ]
)
Filtering out empty strings before doing the transformation means documents with an empty string in the date field are simply skipped.
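A variation worth knowing (not part of the original answer, and it assumes MongoDB 4.0+): $dateFromString accepts an onError value, so you can convert in a single pass and have empty or malformed strings become null instead of filtering them out. Repeat the same $set for each of the three date fields:

db.collection.updateMany(
  { "startDate": { "$type": "string" } },
  [
    {
      "$set": {
        "startDate": {
          "$dateFromString": {
            "dateString": "$startDate",
            "format": "%m/%d/%Y",
            // empty or unparsable strings are stored as null
            "onError": null
          }
        }
      }
    }
  ]
)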

Why doesn't this Cloudant/couchdb $regex query work?

I am trying to pull (and delete) all records from our database that don't have a URL with the word 'box' in it. This is the query I'm using:
{
  "selector": {
    "$not": {
      "url": {
        "$regex": ".*box.*"
      }
    }
  },
  "limit": 50
}
This query returns no records. But if I remove the $not, I get all records that do have the word 'box' in the url, but that's the opposite of what I want. Why do I get no results when adding the $not?
I have tried adding a simple base to the query like "_id":{"$gte":0} but that doesn't help.
From the Cloudant docs:
You can create more complex selector expressions by combining operators. However, for Cloudant NoSQL DB Query indexes of type json, you cannot use 'combination' or 'array logical' operators such as $regex as the basis of a query.
$not is a combination operator and therefore cannot be the basis of a query.
I am able to get the following to work:
index
{
  "index": {
    "fields": ["url"]
  },
  "name": "url-json-index",
  "type": "json"
}
query
{
  "selector": {
    "url": {
      "$not": {
        "$regex": ".*box.*"
      }
    }
  },
  "limit": 50,
  "use_index": "url-json-index"
}
If you are still seeing problems, can you provide the output from _explain and the indexes you have in place?
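For reference, _explain accepts the same request body as _find and reports which index would be used, so you can check the query plan without running the query:

POST /<db>/_explain
{
  "selector": {
    "url": {
      "$not": {
        "$regex": ".*box.*"
      }
    }
  },
  "limit": 50,
  "use_index": "url-json-index"
}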
The "no results" issue is due to a bug in text indexes that has been recently fixed. However, neither $not nor $regex operators are able to take advantage of global indexes so will always result in a full database or index scan.
The way to optimise this query is to use a partial index. A partial index filters documents at indexing time rather than at query time, creating an index over a subset of the database. You then need to tell the _find endpoint to explicitly use the partial index. For example, create an index which only includes documents not matching your regex:
POST /<db>/_index
{
  "index": {
    "partial_filter_selector": {
      "url": {
        "$not": {
          "$regex": ".*box.*"
        }
      }
    },
    "fields": ["type"]
  },
  "ddoc": "url-not-box",
  "type": "json"
}
then at query time:
{
  "selector": {
    "url": {
      "$not": {
        "$regex": ".*box.*"
      }
    }
  },
  "limit": 50,
  "use_index": "url-not-box"
}
You can see how many documents are scanned to fulfil the query in the Cloudant UI - the execution statistics are displayed in a popup underneath the query text area.
You may also find this article about partial indexes helpful.

Mongodb: Update a field with data from a sub-sub field?

I'm trying to update a field in a collection with data from the same collection, but from a sub-sub field in it, and either can't get the syntax right, or I'm just doing it wrong.
I've spent quite some time now digging around here, but can't seem to get anywhere.
Here's the example structure of the users collection:
{
  "_id": "12345qwerty",
  "services": {
    "oauth": {
      "CharacterID": 12345678,
      "CharacterName": "Official Username"
    }
  },
  "name": "what I want to change",
  "username": "OfficialUsername"
}
What I'm trying to do would be pretty trivial with SQL, i.e. update all the display names to match a trusted source:
update users
set name = services.oauth.CharacterName;
...but I'm having trouble doing it in MongoDB, and I have a feeling I'm doing it wrong.
Here's what I have so far, but it doesn't work as expected.
db.users.find().snapshot().forEach(
  function (elem) {
    db.users.update(
      { _id: elem._id },
      { $set: { name: elem.services.oauth.CharacterName } }
    );
  }
);
I can set the name to be anything at the base level, but can't set it to be something from the sublevel, as it doesn't recognise the sub-fields.
Any help would be greatly appreciated!
db.users.update({"services.oauth.CharacterName": {$exists: true}},{$set: {"name": "services.oauth.CharacterName"}},{multi:true})
I am setting name at the root of your document to be equal to the value in services.oauth.CharacterName in the sub sub document. multi = true will update multiple document, I am only updating documents that have the services.oauth.CharacterName value.
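If you're on a server older than MongoDB 4.2, where pipeline updates aren't available, a client-side cursor loop like the one in the question is the usual fallback. A minimal sketch:

// Fallback for MongoDB < 4.2: copy the value client-side
db.users.find({ "services.oauth.CharacterName": { $exists: true } }).forEach(function (elem) {
  db.users.update(
    { _id: elem._id },
    { $set: { name: elem.services.oauth.CharacterName } }
  );
});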

Elasticsearch: Find substring match

I want to perform both exact word match and partial word/substring match. For example, if I search for "men's shaver" then I should be able to find "men's shaver" in the result. But if I search for "en's shaver" then I should also be able to find "men's shaver" in the result.
I am using the following settings and mappings:
Index settings:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
Mappings:
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "name": {
        "type": "string",
        "index_analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
Insert records:
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "name": "men's shaver" }
{ "index": { "_id": 2 }}
{ "name": "women's shaver" }
Query:
1. To search by exact phrase match --> "men's"
POST /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "men's"
    }
  }
}
The above query returns "men's shaver" in the result.
2. To search by Partial word match --> "en's"
POST /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "en's"
    }
  }
}
The above query DOES NOT return anything.
I have also tried the following query:
POST /my_index/my_type/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "%en's%"
      }
    }
  }
}
Still not getting anything.
I figured it is because of the "edge_ngram" filter on the index, which is not able to find partial word/substring matches.
I tried an "ngram" filter as well, but it slows down the search a lot.
Please suggest how to achieve both exact phrase match and partial phrase match using the same index settings.
To search for partial field matches and exact matches, it will work better if you define the fields as "not analyzed" or as keywords (rather than text), then use a wildcard query.
See also this.
To use a wildcard query, append * on both ends of the string you are searching for:
POST /my_index/my_type/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*en's*"
      }
    }
  }
}
To use with case insensitivity, use a custom analyzer with a lowercase filter and keyword tokenizer.
Custom Analyzer:
"custom_analyzer": {
"tokenizer": "keyword",
"filter": ["lowercase"]
}
Then lowercase the search string before querying: if the search string comes in as AsD, change it to *asd*.
The answer given by #BlackPOP will work, but it uses the wildcard approach, which is not preferred: it has performance issues, and if abused it can create a huge domino effect in the Elastic cluster.
I have written a detailed blog on partial search/autocomplete covering the latest options available in Elasticsearch as of today (Dec 2020) with performance in mind. For more trade-off information please refer to this answer.
IMHO a better approach is to use an n-gram tokenizer customized to the use case. The index will then already contain the tokens needed for the search term, so searching is faster; the index gets bigger, but size is not that costly, and you get more control over how exactly substring search works.
Index size can also be kept in check by being conservative when defining the min and max gram in the tokenizer settings, as the sketch below shows.
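For illustration only (the index, analyzer, and tokenizer names below are made up, the gram sizes are tunable assumptions, and the syntax targets Elasticsearch 7+ with no mapping types), a substring-search index built on a custom ngram tokenizer might look like this:

PUT /my_ngram_index
{
  "settings": {
    "index.max_ngram_diff": 2,
    "analysis": {
      "tokenizer": {
        "substring_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 5
        }
      },
      "analyzer": {
        "substring_analyzer": {
          "type": "custom",
          "tokenizer": "substring_tokenizer",
          "filter": ["lowercase"]
        },
        "substring_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "substring_analyzer",
        "search_analyzer": "substring_search_analyzer"
      }
    }
  }
}

A plain match query for "en's" then hits the indexed 4-gram "en's" from "men's shaver", with no wildcard involved. The trade-off is that search strings longer than max_gram stop matching, so the gram range (and index.max_ngram_diff) has to be sized to the shortest and longest substrings you expect.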
To search with any string or substring across several fields, combine match_phrase_prefix queries inside a bool/should (str is the search string):
query: {
  bool: {
    should: [{
      match_phrase_prefix: {
        name: str
      }
    }, {
      match_phrase_prefix: {
        surname: str
      }
    }]
  }
}
Happy coding with Elasticsearch....