Is it possible to make a MongoDB query that searches a field for completely lowercase string values?
Something like this pseudo query perhaps?
{ address: { $eq: { $toLower: "$address" } } }
...that would return docs with data like: { "address": "123 main st" }, but won't return docs like { "address": "123 Main St" }, or is such a query not possible with MongoDB?
Based on the clarification, yes what you want is possible and you were pretty close with the original syntax. Try something like the following:
db.collection.find({
$expr: {
$eq: [
{
$toLower: "$address"
},
"$address"
]
}
})
Playground link is here.
There may be some extra considerations depending on language, collation, etc. But this should serve as a good starting point.
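As a rough illustration of what the $expr comparison checks, here is a plain JavaScript sketch of the same test. The helper name isAllLowercase is made up for this example. JavaScript's toLowerCase is Unicode-aware, whereas the MongoDB documentation only describes $toLower as well-defined for ASCII strings, so treat this as an approximation rather than an exact equivalent.

```javascript
// A value counts as "completely lowercase" when lowercasing it changes nothing.
// isAllLowercase is a hypothetical helper, not part of any MongoDB driver.
function isAllLowercase(value) {
  return value.toLowerCase() === value;
}

const addresses = ["123 main st", "123 Main St"];

// Keep only the values that would satisfy the $expr comparison.
const matches = addresses.filter(isAllLowercase);
console.log(matches);
```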
Yes, you can use an aggregation pipeline that lowercases specific fields and then matches against them. For examples, look at
https://www.mongodb.com/docs/manual/reference/operator/aggregation/toLower/#example
and https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/#examples
On large datasets this way of querying would not be efficient, but it may be useful for one-time queries.
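A minimal sketch of such a pipeline, assuming the field is called address as in the question. The temporary field name loweredAddress is an assumption made for this example; the pipeline is built as a plain array so it can be inspected before being passed to aggregate().

```javascript
// Stage 1: store a lowercased copy of the field.
// Stage 2: keep documents whose original value equals the lowercased copy.
// Stage 3: drop the temporary field again.
// "loweredAddress" is a made-up name for the temporary field.
const lowercasePipeline = [
  { $addFields: { loweredAddress: { $toLower: "$address" } } },
  { $match: { $expr: { $eq: ["$loweredAddress", "$address"] } } },
  { $project: { loweredAddress: 0 } },
];

// In the mongo shell this would be run as:
//   db.collection.aggregate(lowercasePipeline)
console.log(JSON.stringify(lowercasePipeline, null, 2));
```

Because every document has to be read and lowercased, this cannot use an index on address, which is the efficiency concern mentioned above.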
I have some data in a mongodb database collection that looks like this
{
"_id": {
"$oid": "63737b4b654d9b6a0c3a2006"
},
"tag": {
"tagName": 0.10534846782684326
}
}
and I want to check if a dictionary with a specific tagName exists. To do so, we can apply this
mycollection.find({f"tag.{'tagName'}": {"$exists": True}})
However, some tagNames have a dot . in front, e.g.,
{
"_id": {
"$oid": "63737b4b654d9b6a0c3a2006"
},
"tag": {
".tagName": 0.10534846782684326
}
}
So when I run the query
mycollection.find({f"tag.{'.tagName'}": {"$exists": True}})
it reports that the dictionary whose key name is .tagName is not found. This is because of the double dot in f"tag.{'.tagName'}". Can the query be written in a way that avoids this situation?
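To illustrate the double-dot problem, here is a simplified model of dot-notation lookup in plain JavaScript. It is an assumption made for illustration, not MongoDB's actual implementation, and the function name resolvePath is made up.

```javascript
// Simplified model (an assumption, not MongoDB's real code):
// split the path on "." and walk the document one segment at a time.
function resolvePath(doc, path) {
  let current = doc;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object" || !(segment in current)) {
      return undefined; // a segment along the path is missing
    }
    current = current[segment];
  }
  return current;
}

const plainDoc = { tag: { tagName: 0.10534846782684326 } };
const dottedDoc = { tag: { ".tagName": 0.10534846782684326 } };

// The double dot produces an empty middle segment instead of ".tagName".
console.log("tag..tagName".split("."));
console.log(resolvePath(plainDoc, "tag.tagName"));
console.log(resolvePath(dottedDoc, "tag..tagName"));
```

In this model the path "tag..tagName" is read as three segments (tag, an empty name, tagName), so the key ".tagName" is never looked up.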
Mongodb version:
db version v4.4.13
Build Info: {
"version": "4.4.13",
"gitVersion": "df25c71b8674a78e17468f48bcda5285decb9246",
"openSSLVersion": "OpenSSL 1.1.1f 31 Mar 2020",
"modules": [],
"allocator": "tcmalloc",
"environment": {
"distmod": "ubuntu2004",
"distarch": "x86_64",
"target_arch": "x86_64"
}
}
The first syntax looks a little odd to me. I don't think it should have the curly brackets. You can see in this playground example that it doesn't find the first document. So you may be looking to remove the curly brackets from the query in both situations, and here is an example where doing so correctly returns the first document.
Now regarding the . character in the name, one approach would be to use the $getField operator. That operator helps retrieve field names that are otherwise ambiguous or contain special characters. An example (that would only retrieve the second document) might look like this:
db.collection.find({
$expr: {
$ifNull: [
{
$getField: {
field: ".tagName",
input: "$tag"
}
},
false
]
}
})
Playground example here
You may combine the two conditions with a $or to return both documents, playground example here.
I would recommend updating your data to remove the extra . character. Its presence is going to make working with the data more difficult and probably cause some performance issues since many of the operations won't be able to effectively use indexes.
Version 4.4 and earlier
As noted in the comments, the $getField operator is new in version 5.0. To accomplish something similar prior to that you could use the $objectToArray operator.
Effectively what you will do here is convert $tag to an array of k, v pairs where k contains the field name. You can then filter directly against that name (k) looking for the value(s) of interest.
The verbose, but arguably more readable, approach to doing so looks like this:
db.collection.aggregate([
{
"$addFields": {
"tagNames": {
"$objectToArray": "$tag"
}
}
},
{
$match: {
"tagNames.k": {
$in: [
"tagName",
".tagName"
]
}
}
},
{
$project: {
tagNames: 0
}
}
])
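To make the k, v conversion more concrete, here is a plain JavaScript analogue of what the pipeline does. The helper names objectToArray and hasAnyKey are made up for this sketch; it only mirrors the idea and does not talk to a database.

```javascript
// Analogue of $objectToArray: field names become ordinary string values in "k".
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// Analogue of matching { "tagNames.k": { $in: wanted } }.
function hasAnyKey(doc, wanted) {
  return objectToArray(doc.tag).some(({ k }) => wanted.includes(k));
}

const sampleDocs = [
  { tag: { tagName: 0.10534846782684326 } },
  { tag: { ".tagName": 0.10534846782684326 } },
  { tag: { other: 1 } },
];

// Both the plain and the dotted key are found; the unrelated key is not.
const found = sampleDocs.filter((doc) => hasAnyKey(doc, ["tagName", ".tagName"]));
console.log(found.length);
```

Once the name is a value rather than part of a path, the leading dot no longer has any special meaning.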
You could probably collapse it down and do it directly in find() (via $expr usage), as demonstrated here. But doing so requires a little more knowledge about your schema and the structure of the tag field. Overall though, working with field names that contain dots is even more difficult prior to 5.0, which further strengthens the suggestion to correct the underlying data.
I'm trying to achieve a join on three String fields between two collections on MongoDB v. 4.4.3: one containing the original documents, the other the translations.
Both document types look like this:
{
"_id" : ObjectId("60644367b521563be8044f07"),
"dsId" : "2051918",
"lcId" : "data_euscreenXL_EUS_15541BBE705033639D4E06691D7A5D2E",
"pgId" : "1",
(...)
This MongoDB query does what I need, embedding the Translations in the result:
db.Original.aggregate([
{ $match: { query parameters } },
{ $lookup:
{
from: "Translation",
let: { "origDsId": "$dsId", origLcId: "$lcId", "origPgId": "$pgId" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$dsId", "$$origDsId" ] },
{ $eq: [ "$lcId", "$$origLcId" ] },
{ $eq: [ "$pgId", "$$origPgId" ] }
]
}
}
},
{ $project: { dsId: 0, _id: 0 } }
],
as: "translations"
}
}])
However, I can't figure out how to write the equivalent Morphia query. I updated to Morphia v.2.2, which adds the required features, but it's all very new and hasn't yet been documented on morphia.dev; I couldn't find much more on Javadoc either. This Morphia unit test on Github looked interesting and I tried copying that approach:
Aggregation<Original> query = datastore.aggregate(Original.class)
.match(eq("dsId", datasetId), eq("lcId", localId))
.lookup(Lookup.lookup(Translation.class)
.let("origDsId", value("$dsId"))
.let("origLcId", value("$lcId"))
.let("origPgId", value("$pgId"))
.pipeline(Match.match(expr(Expressions.of()
.field("$and",
array(Expressions
.of().field("$eq",
array(field("dsId"),
field("$origDsId"))),
Expressions
.of().field("$eq",
array(field("lcId"),
field("$origLcId"))),
Expressions
.of().field("$eq",
array(field("pgId"),
field("$origPgId"))))))))
.as("translations"));
...
This returns the Original documents, but fails to join the Translations.
The problem is that the syntax of the pipeline stage is rather puzzling. I wonder if anyone can shed some light on this?
the unit test example does not use (or need?) the double-$ form seen in "$$origDsId"? From the MongoDB documentation I understand that this form is used to refer to externally defined variables (eg in the "let" assignment before the "pipeline") but they don't work in the quoted example either;
what is the role of the static ArrayExpression "array"? It looks as if it's a kind of assignment container, where Expressions.of().field("$eq", array(field("dsId"), field("$origDsId"))) might mean something like "dsId" = "$origDsId" - which would be what I need (if it would work ;) )
I tried all sorts of combinations, using field("$origDsId"), value("$origDsId"), field("$$origDsId"), value("$$origDsId"), etcetera, but having no luck so far.
Thanks in advance!
I'm not quite understanding how fuzzy works in the $searchBeta stage of aggregation. I'm not getting the desired result that I want when I'm trying to implement full-text search on my backend. Full text search for MongoDB was released last year (2019), so there really aren't many tutorials and/or references to go by besides the documentation. I've read the documentation, but I'm still confused, so I would like some clarification.
Let's say I have these 5 documents in my db:
{
"name": "Lightning Bolt",
"set_name": "Masters 25"
},
{
"name": "Snapcaster Mage",
"set_name": "Modern Masters 2017"
},
{
"name": "Verdant Catacombs",
"set_name": "Modern Masters 2017"
},
{
"name": "Chain Lightning",
"set_name": "Battlebond"
},
{
"name": "Battle of Wits",
"set_name": "Magic 2013"
}
And this is my aggregation in MongoDB Compass:
db.cards.aggregate([
{
$searchBeta: {
search: { //search has been deprecated, but it works in MongoDB Compass; replace with 'text'
query: 'lightn',
path: ["name", "set_name"],
fuzzy: {
maxEdits: 1,
prefixLength: 2,
maxExpansion: 100
}
}
}
}
]);
What I'm expecting my result to be:
[
{
"name": "Lightning Bolt", //lightn is in 'Lightning'
"set_name": "Masters 25"
},
{
"name": "Chain Lightning", //lightn is in 'Lightning'
"set_name": "Battlebond"
}
]
What I actually get:
[] //empty array
I don't really understand why my result is empty, so it would be much appreciated if someone explained what I'm doing wrong.
What I think is happening:
db.cards.aggregate... is searching the "name" and "set_name" fields for words that are at most one character edit away from the "lightn" query. The words in the cards collection are more than one edit away from it ("lightning" needs three more characters), and therefore the result you actually get is an empty array. "Fuzzy is used to find strings which are similar to the search term or terms"; it is used with maxEdits and prefixLength.
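The edit-count argument can be checked with a small sketch. It uses plain Levenshtein distance on lowercased words as an approximation; the exact distance measure and analysis used by the search engine are assumptions here, and editDistance is a made-up helper name.

```javascript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function editDistance(a, b) {
  // prev[j] is the distance between the processed prefix of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

const maxEdits = 1;
for (const word of ["lightning", "bolt", "chain"]) {
  const distance = editDistance("lightn", word);
  console.log(word, distance, distance <= maxEdits ? "within maxEdits" : "too far");
}
```

Under this approximation a query such as "lightnin" would be within one edit of "lightning", while "lightn" is not.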
Have you tried the term operator with the wildcard option? I think the below aggregation would get you the results you were actually expecting.
e.g.
db.cards.aggregate([
{$searchBeta:
{"term":
{"path":
["name","set_name"],
"query": "l*h*",
"wildcard":true}
}}]).pretty()
You need to provide an index to use with your search query.
The index determines the analyzer that your query will use to process the text, for example whether you want a full match of the text or a partial match.
You can read more about Analyzers from here
In your case, an index based on STANDARD analyzer will help.
After you create your index your code, modified below, will work:
db.cards.aggregate([
  {
    $search: {
      index: 'index_name_for_analyzer', // the index you created (STANDARD analyzer in your case)
      text: {
        query: 'lightn',
        path: ["name"], // since you only want to search in one field
        fuzzy: {
          maxEdits: 1,
          prefixLength: 2,
          maxExpansions: 100
        }
      }
    }
  }
]);
I'm trying to make views in MongoDB to avoid unnecessary returns. The documentation says that aggregation functions can take variables with a double dollar sign. With this in mind, I have created a view that, in this example, should take one variable to filter by customerId and group the results to sum the payments of different documents.
Example:
db.createView(
"viewName",
"myCollection",
[{
$match: { "customerId": "$$customerId", }
},{
$group: {
_id: null,
total: "$amount",
}
}]
)
The view is created OK, and if I put a valid customerId directly in the aggregation it works, but I don't have the slightest idea how to execute the view and pass the customerId that I need.
Any ideas? The MongoDB documentation does not help me in this situation, and I really need to create this as a view, since many applications will connect to this view(s).
I have tried:
db.viewName.find({customerId: "some valid id"});
You can access it just like a collection, for example I am creating a view via:
db.runCommand({
create: 'AuthorsView',
viewOn: 'authors',
pipeline: [{
"$group": {
"_id": "$email",
"count": {
"$sum": 1
}
}
}]
})
Since this is now an existing view I can simply do:
db.getCollection('AuthorsView').find({})
To see all the documents or to add more parameters to the find
Not sure what you mean by passing variables since views are just like collections ... you run queries against them via find & aggregate.
First, you can't pass variables to $match without $expr. There is no error because "$$..." is interpreted as a string.
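A small sketch of why no error appears: in a plain equality match the value is compared as-is, so "$$customerId" behaves like any other string. The plainMatch helper is a simplified stand-in written for this example, not MongoDB's matcher, and the sample documents are invented.

```javascript
// Simplified stand-in for a plain (non-$expr) equality $match:
// every query value is compared literally against the document field.
function plainMatch(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

const payments = [
  { customerId: "c1", amount: 10 },
  { customerId: "c2", amount: 20 },
  { customerId: "$$customerId", amount: 5 }, // contrived document holding the literal text
];

const literalQuery = { customerId: "$$customerId" };

// Only the contrived document matches; real customers never do.
console.log(payments.filter((doc) => plainMatch(doc, literalQuery)));
```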
Second, suppose we fix the $match like this:
db.createView(
  "viewName",
  "myCollection",
  [{
    $match: { $expr: { $eq: ["$customerId", "$$customerId"] } }
  },{
    $group: {
      _id: null,
      total: { $sum: "$amount" }
    }
  }]
)
$$customerId is still not a system variable, and nothing defines it, so this won't work. You could use a system variable like $$ROOT or a field path like $field.path; user-defined variables can only be built from system variables or collection data.
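Since a view cannot receive a variable, one possible workaround is to let the view compute totals for every customer and apply the filter when querying it. This is only a sketch under assumptions: the view name paymentTotalsView and the helper totalFilter are made up, and it assumes that grouping by customerId fits the data.

```javascript
// The view groups by customer instead of filtering on a variable.
// "paymentTotalsView" is a hypothetical name; "myCollection", "customerId"
// and "amount" come from the question.
const viewPipeline = [
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
];

// The customer id is supplied at query time, as an ordinary find() filter.
function totalFilter(customerId) {
  return { _id: customerId };
}

// In the mongo shell this would be used roughly as:
//   db.createView("paymentTotalsView", "myCollection", viewPipeline)
//   db.paymentTotalsView.find(totalFilter("some valid id"))
console.log(JSON.stringify(viewPipeline), JSON.stringify(totalFilter("some valid id")));
```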
I have an array of objects containing dates of when a hotel is available to book within Mongo. It looks something like this, using ISO Date formats as said here.
Here's what document looks like, trying to keep it short for the example.
available: [
{
"start":"2014-04-07T00:00:00.000000",
"end":"2014-04-08T00:00:00.000000"
},
{
"start":"2014-04-12T00:00:00.000000",
"end":"2014-04-15T00:00:00.000000"
},
{
"start":"2014-04-17T00:00:00.000000",
"end":"2014-04-22T00:00:00.000000"
},
]
Now, I need to query two dates: a check-in date and a check-out date. If the dates are available, Mongo should return the document; otherwise it shouldn't. Here are a few test cases:
2014-04-06 TO 2014-04-08 should NOT return.
2014-04-13 TO 2014-04-16 should NOT return.
2014-04-17 TO 2014-04-21 should return.
How would I go about forming this in to a Mongo query? Using $elemMatch looked like it would be a good start, but I don't know where to take it after that so all three examples I posted above work with the same query. Any help is appreciated.
db.collection.find({
"available": {
"$elemMatch": {
"start": { "$lte": new Date("2014-04-17") },
"end": { "$gte": new Date("2014-04-21") }
}
}
})
How about this command?
Well I actually hope your documents have real ISODates rather than what appears to be strings. When they do then the following query form matches as expected:
db.collection.find({
"available": {
"$elemMatch": {
"start": { "$lte": new Date("2014-04-17") },
"end": { "$gte": new Date("2014-04-21") }
}
}
})
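The three test cases from the question can be checked against a containment rule: a stay fits when a single availability window starts on or before check-in and ends on or after check-out. The sketch below applies that rule in plain JavaScript to the sample data, stored as real dates rather than strings; isAvailable is a made-up helper and no database is involved.

```javascript
// Availability windows from the question, as Date objects.
const available = [
  { start: new Date("2014-04-07"), end: new Date("2014-04-08") },
  { start: new Date("2014-04-12"), end: new Date("2014-04-15") },
  { start: new Date("2014-04-17"), end: new Date("2014-04-22") },
];

// True when one single window covers the whole stay (the $elemMatch idea:
// both conditions must hold for the same array element).
function isAvailable(windows, checkIn, checkOut) {
  const from = new Date(checkIn);
  const to = new Date(checkOut);
  return windows.some((w) => w.start <= from && w.end >= to);
}

console.log(isAvailable(available, "2014-04-06", "2014-04-08"));
console.log(isAvailable(available, "2014-04-13", "2014-04-16"));
console.log(isAvailable(available, "2014-04-17", "2014-04-21"));
```

Requiring both conditions on the same element matters: without it, one window could satisfy the start condition and a different window the end condition.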