Find dictionary keys in MongoDB with a dot in front of their key names - mongodb

I have some data in a mongodb database collection that looks like this
{
  "_id": {
    "$oid": "63737b4b654d9b6a0c3a2006"
  },
  "tag": {
    "tagName": 0.10534846782684326
  }
}
and I want to check if a dictionary with a specific tagName exists. To do so, we can run this query:
mycollection.find({f"tag.{'tagName'}": {"$exists": True}})
However, some tagNames have a dot . in front, e.g.,
{
  "_id": {
    "$oid": "63737b4b654d9b6a0c3a2006"
  },
  "tag": {
    ".tagName": 0.10534846782684326
  }
}
So when I run the query
mycollection.find({f"tag.{'.tagName'}": {"$exists": True}})
it reports that the dictionary whose key name is .tagName is not found. This is because of the double dot in f"tag.{'.tagName'}". Can we write the query in such a way as to avoid this situation?
Mongodb version:
db version v4.4.13
Build Info: {
  "version": "4.4.13",
  "gitVersion": "df25c71b8674a78e17468f48bcda5285decb9246",
  "openSSLVersion": "OpenSSL 1.1.1f 31 Mar 2020",
  "modules": [],
  "allocator": "tcmalloc",
  "environment": {
    "distmod": "ubuntu2004",
    "distarch": "x86_64",
    "target_arch": "x86_64"
  }
}

The first syntax looks a little odd to me. I don't think it should have the curly brackets. You can see in this playground example that it doesn't find the first document. So you may be looking to remove the curly brackets from the query in both situations, and here is an example where doing so correctly returns the first document.
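For the plain tagName case, the simplified query in shell syntax would look something like this (or, in the question's pymongo form, mycollection.find({"tag.tagName": {"$exists": True}})):
db.mycollection.find({ "tag.tagName": { "$exists": true } })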
Now regarding the . character in the name, one approach would be to use the $getField operator. That operator helps retrieve the values of fields whose names are otherwise ambiguous or contain special characters. An example (that would only retrieve the second document) might look like this:
db.collection.find({
  $expr: {
    $ifNull: [
      {
        $getField: {
          field: ".tagName",
          input: "$tag"
        }
      },
      false
    ]
  }
})
Playground example here
You may combine the two conditions with an $or to return both documents, playground example here.
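A rough sketch of that combined query (my own phrasing of it, not necessarily identical to the playground):
db.collection.find({
  "$or": [
    { "tag.tagName": { "$exists": true } },
    {
      "$expr": {
        "$ifNull": [
          { "$getField": { "field": ".tagName", "input": "$tag" } },
          false
        ]
      }
    }
  ]
})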
I would recommend updating your data to remove the extra . character. Its presence is going to make working with the data more difficult and will probably cause performance issues, since many operations won't be able to use indexes effectively.
Version 4.4 and earlier
As noted in the comments, the $getField operator is new in version 5.0. To accomplish something similar prior to that you could use the $objectToArray operator.
Effectively what you will do here is convert $tag to an array of k, v pairs where k contains the field name. You can then filter directly against that name (k) looking for the value(s) of interest.
The verbose, but arguably more readable, approach to doing so looks like this:
db.collection.aggregate([
  {
    "$addFields": {
      "tagNames": {
        "$objectToArray": "$tag"
      }
    }
  },
  {
    $match: {
      "tagNames.k": {
        $in: [
          "tagName",
          ".tagName"
        ]
      }
    }
  },
  {
    $project: {
      tagNames: 0
    }
  }
])
You could probably collapse it down and do it directly in find() (via $expr usage), as demonstrated here. But doing so requires a little more knowledge about your schema and the structure of the tag field. Overall though, working with field names that contain dots is even more difficult prior to 5.0, which further strengthens the suggestion to correct the underlying data.
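To illustrate the collapsed form mentioned above, a rough sketch (assuming tag is always an embedded document) might be:
db.collection.find({
  "$expr": {
    "$gt": [
      {
        "$size": {
          "$filter": {
            "input": { "$objectToArray": "$tag" },
            "cond": { "$in": [ "$$this.k", [ "tagName", ".tagName" ] ] }
          }
        }
      },
      0
    ]
  }
})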

Related

MongoDB query that looks for documents with lowercase values

Is it possible to make a MongoDB query that searches a field for completely lowercase string values?
Something like this pseudo query perhaps?
{ address: { $eq: { $toLower: "$address" } } }
...that would return docs with data like: { "address": "123 main st" }, but won't return docs like { "address": "123 Main St" }, or is such a query not possible with MongoDB?
Based on the clarification, yes what you want is possible and you were pretty close with the original syntax. Try something like the following:
db.collection.find({
  $expr: {
    $eq: [
      {
        $toLower: "$address"
      },
      "$address"
    ]
  }
})
Playground link is here.
There may be some extra considerations depending on language, collation, etc. But this should serve as a good starting point.
Yes, you can use an aggregation pipeline that makes specific fields lowercase and then does the matching against them; for examples, look at
https://www.mongodb.com/docs/manual/reference/operator/aggregation/toLower/#example
and https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/#examples
On large datasets this way of querying would not be efficient, but for one-time queries it may be useful.
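One way to write such a pipeline, assuming the field is named address as in the question:
db.collection.aggregate([
  { "$addFields": { "addressLower": { "$toLower": "$address" } } },
  { "$match": { "$expr": { "$eq": [ "$addressLower", "$address" ] } } },
  { "$project": { "addressLower": 0 } }
])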

Change data type from string to date while skipping missing data

The core collection (other collections in the DB refer back to this one) in my DB contains 3 fields with date information which at this point is formatted as strings like MM/DD/YYYY. Further, there are a range of documents for which this field contains missing data, i.e. "". I populated this collection by running the mongoimport command on a JSON file.
My goal is to convert these date-fields into actual ISODate data types, so as to allow filtering the collection by dates. Further, I want MongoDB to know that empty strings indicate missing values. I have read quite widely on this, leading me to try a bunch of things:
Trying a forEach statement - This worked, but only for the very first document.
db.collection.find().forEach(function(element) {
  element.startDate = ISODate(element.startDate);
  db.collection.save(element);
})
Using a kind of for-loop: this worked well, but stopped once it encountered a missing value (so it transformed about 11 values):
db.collection.update(
  {
    "startDate": {
      "$type": "string"
    }
  },
  [
    {
      "$set": {
        "startDate": {
          "$dateFromString": {
            "dateString": "$startDate",
            "format": "%m/%d/%Y"
          }
        }
      }
    }
  ]
)
So, both of these approaches kind of worked - but I don't know how to apply them to the entire collection. Further, I'd be interested in performing this task in the most efficient way possible. However, I only want to do this once - data that will be added in the future should hopefully be correctly formatted at the import stage.
db.collection.updateMany(
  {
    "$and": [
      { "startDate": { "$type": "string" } },
      { "startDate": { "$ne": "" } }
    ]
  },
  [
    {
      "$set": {
        "startDate": {
          "$dateFromString": {
            "dateString": "$startDate",
            "format": "%m/%d/%Y"
          }
        }
      }
    }
  ]
)
Filtering out the empty strings before doing the transformation means the update simply skips documents that have an empty string in the date field.
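If you need to run the same conversion on all three date fields, one rough sketch is to loop in the shell (endDate and closeDate below are hypothetical placeholders for your other two field names):
["startDate", "endDate", "closeDate"].forEach(function(field) { // endDate/closeDate are made-up names
  db.collection.updateMany(
    { [field]: { "$type": "string", "$ne": "" } },
    [
      {
        "$set": {
          [field]: {
            "$dateFromString": { "dateString": "$" + field, "format": "%m/%d/%Y" }
          }
        }
      }
    ]
  );
});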

How does 'fuzzy' work in MongoDB's $searchBeta stage of aggregation?

I'm not quite understanding how fuzzy works in the $searchBeta stage of aggregation. I'm not getting the desired result that I want when I'm trying to implement full-text search on my backend. Full text search for MongoDB was released last year (2019), so there really aren't many tutorials and/or references to go by besides the documentation. I've read the documentation, but I'm still confused, so I would like some clarification.
Let's say I have these 5 documents in my db:
{
  "name": "Lightning Bolt",
  "set_name": "Masters 25"
},
{
  "name": "Snapcaster Mage",
  "set_name": "Modern Masters 2017"
},
{
  "name": "Verdant Catacombs",
  "set_name": "Modern Masters 2017"
},
{
  "name": "Chain Lightning",
  "set_name": "Battlebond"
},
{
  "name": "Battle of Wits",
  "set_name": "Magic 2013"
}
And this is my aggregation in MongoDB Compass:
db.cards.aggregate([
  {
    $searchBeta: {
      search: { // search has been deprecated, but it works in MongoDB Compass; replace with 'text'
        query: 'lightn',
        path: ["name", "set_name"],
        fuzzy: {
          maxEdits: 1,
          prefixLength: 2,
          maxExpansion: 100
        }
      }
    }
  }
]);
What I'm expecting my result to be:
[
  {
    "name": "Lightning Bolt", // lightn is in 'Lightning'
    "set_name": "Masters 25"
  },
  {
    "name": "Chain Lightning", // lightn is in 'Lightning'
    "set_name": "Battlebond"
  }
]
What I actually get:
[] //empty array
I don't really understand why my result is empty, so it would be much appreciated if someone explained what I'm doing wrong.
What I think is happening:
db.cards.aggregate... is looking in the "name" and "set_name" fields for words that are at most one character edit away from the query "lightn". The words in the cards collection are further away than that ("lightning" needs three additional characters, more than even the maximum maxEdits of 2), and therefore your expected result is an empty array. "Fuzzy is used to find strings which are similar to the search term or terms"; it is tuned with maxEdits and prefixLength.
Have you tried the term operator with the wildcard option? I think the below aggregation would get you the results you were actually expecting.
e.g.
db.cards.aggregate([
  {
    $searchBeta: {
      "term": {
        "path": ["name", "set_name"],
        "query": "l*h*",
        "wildcard": true
      }
    }
  }
]).pretty()
You need to provide an index to use with your search query.
The index is basically the analyzer that your query will use to process your results, depending on whether you want a full match of the text, a partial match, etc.
You can read more about Analyzers from here
In your case, an index based on STANDARD analyzer will help.
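For reference, a minimal index definition using the standard analyzer with dynamic mappings might look like this (created in the Atlas UI under a name of your choosing):
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true
  }
}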
After you create your index your code, modified below, will work:
db.cards.aggregate([
  {
    $search: {
      index: 'index_name_for_your_analyzer', // the index based on the STANDARD analyzer in your case
      text: {
        query: 'lightn',
        path: ["name"], // since you only want to search in one field
        fuzzy: {
          maxEdits: 1,
          prefixLength: 2,
          maxExpansions: 100
        }
      }
    }
  }
]);

Mongoexport - modify large array fields to their counts

I have a large collection that I'd like to export to CSV, but I'd like to do some trimming to some of the fields. (e.g. I just need to know the number of elements in some, and just to know if others exist or not in the doc)
I would like to do the equivalent to a map function on the fields, so that fields that contain a list will be exported to the list size, and some fields that sometimes exist and sometimes do not, I would like to have them exported as boolean flags.
e.g. if my rows looks like this
{_id:"id1", listField:[1,2,3], optionalField: "...", ... }
{_id:"id2", listField:[1,2,3,4], ... }
I'd like to run a mongoexport to CSV that will result in this
_id, listField.length, optionalField.exists
"id1", 3, true
"id2", 4, false
Is that possible using mongoexport? (assume MongoDB version 3.0)
If not, is there another way to do that?
The mongoexport utility itself is pretty spartan and just a basic tool bundled in the suite. You can add "query" filters, but pretty much just like .find() queries in general, the intention is to return documents "as is" rather than "manipulate" the content.
Just as with other query operations, the .aggregate() method is the useful tool for document manipulation. So in order to "manipulate" the output into something different from the original document source, you would do:
db.collection.aggregate([
  { "$project": {
    "listField": { "$size": "$listField" },
    "optionalField": {
      "$cond": [
        { "$ifNull": [ "$optionalField", false ] },
        true,
        false
      ]
    }
  }}
])
The $size operator returns the "size" of the array, and the $ifNull tests for the presence, either returning the field value or the alternate. Pass that result into $cond to get a true/false return rather than the field value. "_id" is always implicit, unless you specifically ask to omit it.
That would give you the "reduced" output, but in order to go to CSV then you would have to code that export yourself, as mongoexport does not run aggregation pipeline queries.
But the code to do so should be quite trivial ( pick a library for your language ), and the aggregation statement is also trivial as you can see here.
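As an illustration only, a rough Node.js sketch using the official mongodb driver (the connection string, database, and collection names here are assumptions) could look like:
const { MongoClient } = require("mongodb");

async function exportCsv() {
  const client = await MongoClient.connect("mongodb://localhost:27017"); // assumed URI
  const cursor = client.db("test").collection("collection").aggregate([
    { "$project": {
      "listField": { "$size": "$listField" },
      "optionalField": {
        "$cond": [ { "$ifNull": [ "$optionalField", false ] }, true, false ]
      }
    }}
  ]);
  console.log("_id,listField,optionalField"); // CSV header row
  for await (const doc of cursor) {
    // Naive join; a real export should use a CSV library to handle quoting/escaping.
    console.log([doc._id, doc.listField, doc.optionalField].join(","));
  }
  await client.close();
}

exportCsv().catch(console.error);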
For the "really basic" approach, then just send a script to the mongo shell, as a very rudimentary form of programming:
db.collection.aggregate([
  { "$project": {
    "listField": { "$size": "$listField" },
    "optionalField": {
      "$cond": [
        { "$ifNull": [ "$optionalField", false ] },
        true,
        false
      ]
    }
  }}
]).forEach(function(doc) {
  print(Object.keys(doc).map(function(key) {
    return doc[key]
  }).join(","));
});
Which would output:
id1,3,true
id2,4,false

How to get (or aggregate) distinct keys of array in MongoDB

I'm trying to get MongoDB to aggregate for me over an array with different key-value pairs, without knowing keys (Just a simple sum would be ok.)
Example docs:
{data: [{a: 3}, {b: 7}]}
{data: [{a: 5}, {c: 12}, {f: 25}]}
{data: [{f: 1}]}
{data: []}
So basically each doc (or its array really) can have 0 or many entries, and I don't know the keys for those objects, but I want to sum and average the values over those keys.
Right now I'm just loading a bunch of docs and doing it myself in Node, but I'd like to offload that work to MongoDB.
I know I can unwind those first, but how to proceed from there? How to sum/avg/min/max the values if I don't know the keys?
If you do not know the keys or cannot make a reasonable educated guess then you are basically stuck, as far as going any further with the aggregation framework. You could supply "all of the keys" for consideration, but I suspect your actual data looks more like this:
{ "data": [{ "film": 10 }, { "television": 5 }, { "boardGames": 1 }] }
So there would be little point here in finding out all the "key names" and then throwing them at an aggregation statement.
For the record though, "this is why you do not structure your data storage like this". Information like "film" here should not be used as a "key" name, because it is useful "data" that could be searched upon and most importantly "indexed" in a database system.
So your data should really look like this:
{
  "data": [
    { "type": "film", "value": 10 },
    { "type": "television", "value": 5 },
    { "type": "boardGames", "value": 1 }
  ]
}
Then the aggregation statement is simple, as are many other things:
db.collection.aggregate([
  { "$unwind": "$data" },
  { "$group": {
    "_id": null,
    "sum": { "$sum": "$data.value" },
    "avg": { "$avg": "$data.value" }
  }}
])
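For the single sample document shown above (values 10, 5, and 1), that pipeline would return something like:
{ "_id" : null, "sum" : 16, "avg" : 5.333333333333333 }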
But since the key names are constantly changing between documents and do not have a uniform structure, you need JavaScript processing on the server to traverse the keys, and that means mapReduce:
db.collection.mapReduce(
  function() {
    this.data.forEach(function(data) {
      Object.keys(data).forEach(function(key) {
        emit(null, data[key]); // emit the value regardless of key name
      });
    });
  },
  function(key, values) {
    return Array.sum(values); // Just summing for example
  },
  { "out": { "inline": 1 } }
)
And of course the JavaScript execution here will work much more slowly than the native coded operators available to the aggregation framework.
So this should be an abject lesson as to why you do not use "data" as "key names" when storing data in a database. The aggregation framework works with standard structures and is fast; falling back to JavaScript processing is more flexible, but the cost is mostly in speed and lost features.