Is there a way to include an Int32 field in a search index in MongoDB (with Atlas Search)?

I have a collection in a Mongo Atlas DB on which I have a search index covering some specific string fields. What I want to do is include an Int32 field in this search index so I can search on this number along with the other fields. I tried to add the field (Number) as a new field in the search index, with the type number, but it doesn't work. I guess it's because it compares the query, a string, with an Int32, but is there a way to make it work? Or do I have to copy "Number" into another field "NumberString" to include in the search index?
Here is an example of one of these documents:
{
    "_id" : ObjectId("010000000000000000000003"),
    "Description" : {
        "fr-CA" : "Un lot de test",
        "en-CA" : "A test item"
    },
    "Name" : {
        "fr-CA" : "Lot de test",
        "en-CA" : "Test item"
    },
    "Number" : 345,
    "Partners" : [],
    [...]
}
The index:
{
    "mappings": {
        "dynamic": false,
        "fields": {
            "Description": {
                "fields": {
                    "en-CA": {
                        "analyzer": "lucene.english",
                        "searchAnalyzer": "lucene.english",
                        "type": "string"
                    },
                    "fr-CA": {
                        "analyzer": "lucene.french",
                        "searchAnalyzer": "lucene.french",
                        "type": "string"
                    }
                },
                "type": "document"
            },
            "Name": {
                "fields": {
                    "en-CA": {
                        "analyzer": "lucene.english",
                        "searchAnalyzer": "lucene.english",
                        "type": "string"
                    },
                    "fr-CA": {
                        "analyzer": "lucene.french",
                        "searchAnalyzer": "lucene.french",
                        "type": "string"
                    }
                },
                "type": "document"
            },
            "Number": {
                "representation": "int64",
                "type": "number"
            },
            "Partners": {
                "fields": {
                    "Name": {
                        "type": "string"
                    }
                },
                "type": "document"
            }
        }
    }
}
And finally, the query I'm trying to run:
db.[myDB].aggregate([{ $search: { "index": "default", "text": { "query": "345", "path": ["Number", "Name.fr-CA", "Description.fr-CA", "Partners.Name"]}}}])
For this example, I want the query to be applied to Number, Name, Description and Partners and to return everything that matches. I would expect to get item #345, but also any items with 345 in the name or description. Is this possible?
Thanks!

With your current datatype, you should be able to search for #345 in text. However, I would structure the query like so, to support the numeric field as well:
db.[myDB].aggregate([
    {
        $search: {
            "index": "default",
            "compound": {
                "should": [
                    {
                        "text": {
                            "query": "345",
                            "path": ["Name.fr-CA", "Description.fr-CA", "Partners.Name"]
                        }
                    },
                    {
                        "near": {
                            "origin": 345,
                            "path": "Number",
                            "pivot": 2
                        }
                    }
                ]
            }
        }
    }
])
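If you want an exact numeric match instead of proximity scoring, one alternative (a sketch; check that the range operator and numeric paths are supported on your Atlas Search version) is a range clause with both bounds set to the number. Note that you have to parse the query string to a number yourself before building the stage:
db.[myDB].aggregate([
    {
        $search: {
            "index": "default",
            "compound": {
                "should": [
                    { "text": { "query": "345", "path": ["Name.fr-CA", "Description.fr-CA", "Partners.Name"] } },
                    // Exact numeric match: both bounds set to the parsed number, e.g. parseInt("345", 10).
                    { "range": { "path": "Number", "gte": 345, "lte": 345 } }
                ]
            }
        }
    }
])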

Related

How to use Atlas Search to find a text containing a subtext

I have a collection hosted on Atlas.
I have declared an Atlas Search index with the default configuration, but I am unable to use it to find documents that partially match the text.
For instance, I have the following documents:
[
    {
        _id: 'ABC123',
        designation: 'ENPHASE IQ TERMINAL CABLE 3PH-1 UD',
        supplierIdentifier: 205919
    },
    {
        _id: 'DEF456',
        designation: 'ENPHASE CABLE VERT IQ 60/72CELLS 400VAC',
        supplierIdentifier: 205919
    },
    {
        _id: 'GHI789',
        designation: 'P/SOLAR PC ASTROENERGY 275W 60 CELULAS',
        supplierIdentifier: 206382
    }
]
If I use the text search to search for "EN", nothing is returned:
[{ "$search" : { "index" : "default", "text" : { "query" : "EN", "path" : { "wildcard" : "*"}}, "count": {"type": "total"}}}]
No result
But if I use the regex search, my documents are correctly returned:
db.testproducts.aggregate([{ "$search" : { "index" : "default", "regex" : { "query" : "(.*)EN(.*)", "allowAnalyzedField" : true, "path" : { "wildcard" : "*"}}, "count": {"type": "total"}}}])
[
    {
        _id: 'ABC123',
        designation: 'ENPHASE IQ TERMINAL CABLE 3PH-1 UD',
        supplierIdentifier: 205919
    },
    {
        _id: 'DEF456',
        designation: 'ENPHASE CABLE VERT IQ 60/72CELLS 400VAC',
        supplierIdentifier: 205919
    },
    {
        _id: 'GHI789',
        designation: 'P/SOLAR PC ASTROENERGY 275W 60 CELULAS',
        supplierIdentifier: 206382
    }
]
As the regex operator is pretty slow, how can I achieve the same with the text search?
Gfhyser, you have a few options, and I'm not sure which one you will like best, as they all have limitations.
Option 1: you can specify a path. As you can imagine, wildcard paths and leading and trailing regex can be expensive. If you know the path you want to search is designation, performance will be better if you change your existing query to:
db.testproducts.aggregate([{ "$search" : { "index" : "default", "regex" : { "query" : "(.*)EN(.*)", "allowAnalyzedField" : true, "path" : "designation" }, "count": { "type": "total" }}}])
Option 2: you can refine your search. Ask yourself if you are truly looking for Enphase and Energy wherever they appear in the same result.
Option 3: this final option is somewhat experimental for me, because I need to spend more time on it, but it might be the best performing. It involves reversing the tokens you index, and querying with a custom analyzer, because that can speed up leading wildcard queries. If you don't mind a bit of complexity, here is how it would look. Let me know if it works out, as I don't use regular expressions as much these days.
I created a custom analyzer in the sample_airbnb.listings_and_reviews dataset to search with leading wildcard characters. The index looks like:
{
    "analyzer": "lucene.keyword",
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": [
                {
                    "dynamic": true,
                    "type": "document"
                },
                {
                    "type": "string"
                }
            ],
            "summary": {
                "analyzer": "fastRegex",
                "type": "string"
            }
        }
    },
    "analyzers": [
        {
            "charFilters": [],
            "name": "fastRegex",
            "tokenFilters": [
                {
                    "type": "reverse"
                }
            ],
            "tokenizer": {
                "type": "keyword"
            }
        }
    ]
}
And a query that exploits this speed and has the flexibility to potentially match both of your desired terms would look like this:
[
    {
        '$search': {
            'index': 'reviews_search',
            'compound': {
                'should': [
                    {
                        'wildcard': {
                            'query': '*cated*',
                            'path': 'summary',
                            'allowAnalyzedField': true
                        }
                    }
                ]
            }
        }
    }
]
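To see why the reverse token filter helps: Lucene can serve a trailing wildcard (a prefix query) efficiently, while a leading wildcard forces a term scan. Reversing the indexed text turns a suffix match into a prefix match. A minimal illustration of the idea in plain JavaScript (not Atlas code):
// The reverse filter stores the reversed text; reversing the query turns
// a leading wildcard ("*cable") into an efficient prefix lookup ("elbac*").
const reverse = s => [...s].reverse().join("");

const indexedToken = reverse("enphase iq terminal cable"); // what the index stores
const prefix = reverse("cable");                           // user searched "*cable"

console.log(indexedToken.startsWith(prefix));              // true: the text ends with "cable"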

Trying to iterate through a collection of MongoDb results to update another collection

I'm trying to update some documents from DB-collection1 (source DB) over to DB-collection2 (destination DB), all on the same MongoDB instance (with the same permissions, etc).
So for each document from DB-collection1, update a specific document in DB-collection2, if it exists.
The documents in DB-collection1 have the following shape:
{
    "_id": {
        "commentId": "082f3de6-a268-46b5-803f-89bafd172621"
    },
    "appliesTo": {
        "targets": [
            {
                "_id": {
                    "documentId": "b1eb1ad5-e74c-4a64-a4f3-bdc67ba70b35"
                },
                "type": "Document"
            }
        ]
    }
}
And the matching document in DB-collection2 is:
{
    "_id": {
        "documentId": "b1eb1ad5-e74c-4a64-a4f3-bdc67ba70b35"
    },
    "name": "jill"
}
I'm using a cursor to iterate through the source collection, but I'm not sure how to do the update.
This is the mongo shell script (JavaScript) I'm trying right now, run with the following command on a machine where mongo is installed:
CLI: root@f0cc2f13e70c:/src/scripts# mongo --host localhost --username root --password example copyFoosToBars.js
// copyFoosToBars.js
function main() {
    print('Starting script.')
    print()

    var foosDb = db.getSiblingDB('foos');
    var barsDb = db.getSiblingDB('bars');

    // Grab all the 'foos' which have some barId in some convoluted schema.
    var sourceFoos = foosDb.getCollection('foos')
        .find(
            {
                "appliesTo.targets.type" : "Document",
                "_meta.deleted": null
            },
            {
                "_id" : 0,
                "appliesTo.targets._id.documentId" : 1
            }
        );

    sourceFoos.forEach(function(foo){
        // Check if this document exists in the bars-db
        var desinationBars = barsDb.getCollection('bars')
            .find(
                {
                    "_id.documentId" : foo.appliesTo.targets._id.documentId,
                },
            );

        printjson(desinationBars);

        // If a destinationBars document exists, then add the field 'Text' : 'hi there'
        // to the document -or- update the existing field, if the 'Text' field already
        // exists in this document.
    });

    print()
    print()
    print('----------------------------------------------')
}

main()
So here's some sample JSON output for the first part of the query, which proves that I have some data that passes that find/search clause:
Starting script.

{
    "appliesTo" : {
        "targets" : [
            {
                "_id" : {
                    "barId" : "810e66e2-66d1-44f4-be0e-980309d8df8f"
                }
            }
        ]
    }
}
{
    "appliesTo" : {
        "targets" : [
            {
                "_id" : {
                    "barId" : "54f25223-67bb-4d5d-ad47-24392e4acbdf"
                }
            }
        ]
    }
}
{
    "appliesTo" : {
        "targets" : [
            {
                "_id" : {
                    "barId" : "34c83da3-eafd-41bf-93af-3a45d1644225"
                }
            }
        ]
    }
}
This doesn't work:
MongoDB server version: 4.0.22
WARNING: shell and server versions do not match
Starting script.
uncaught exception: TypeError: comment.appliesTo.targets._id is undefined :
main/
<snip>
Can someone please suggest how I can fix this?
First of all, you need to safeguard against multiple items in appliesTo.targets.
A document
{
    "_id": {
        "commentId": "082f3de6-a268-46b5-803f-89bafd172621"
    },
    "appliesTo": {
        "targets": [
            {
                "_id": {
                    "documentId": "should-not-be-updated"
                },
                "type": "AnyOtherType"
            },
            {
                "_id": {
                    "documentId": "b1eb1ad5-e74c-4a64-a4f3-bdc67ba70b35"
                },
                "type": "Document"
            }
        ]
    }
}
will be selected by
.find(
    {
        "appliesTo.targets.type" : "Document",
        "_meta.deleted": null
    },
    {
        "_id" : 0,
        "appliesTo.targets._id.documentId" : 1
    }
);
with the resulting document:
{
    "appliesTo": {
        "targets": [
            {
                "_id": {
                    "documentId": "should-not-be-updated"
                }
            },
            {
                "_id": {
                    "documentId": "b1eb1ad5-e74c-4a64-a4f3-bdc67ba70b35"
                }
            }
        ]
    }
}
so foo.appliesTo.targets[0]._id.documentId will be "should-not-be-updated".
The structure of the document does not allow using $elemMatch in the projection, so you have to either use the aggregation framework or filter the array client-side. The aggregation has the benefit of running server-side, reducing the amount of data transferred to the client.
Secondly, there is no point in finding documents from DB-collection2 first. You can update all matching ones straight away, like an "update ... where" in SQL.
So the code should be something like the following:
var sourceFoos = db.foos.aggregate([
    {
        $unwind: "$appliesTo.targets"
    },
    {
        $match: {
            "appliesTo.targets.type": "Document",
            "appliesTo.targets._id.documentId": {
                $exists: true
            },
            "_meta.deleted": null
        }
    },
    {
        $project: {
            _id: 0,
            "documentId": "$appliesTo.targets._id.documentId"
        }
    }
]);

sourceFoos.forEach(function(foo){
    db.bars.updateMany(
        {"_id.documentId" : foo.documentId},
        {$set: {'Text' : 'hi there'}}
    )
})
If a lot of documents are expected in the cursor, I would recommend looking at bulk updates to speed this up, though as I mentioned earlier, the mongo shell might not be the ideal tool in that case.
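As a minimal sketch of that bulk approach (same aggregation and collection names as above, with the forEach loop replaced; the batch size is arbitrary):
var ops = [];

sourceFoos.forEach(function (foo) {
    ops.push({
        updateMany: {
            filter: { "_id.documentId": foo.documentId },
            update: { $set: { Text: "hi there" } }
        }
    });

    // Flush in batches to bound client-side memory usage.
    if (ops.length >= 1000) {
        db.bars.bulkWrite(ops);
        ops = [];
    }
});

if (ops.length > 0) {
    db.bars.bulkWrite(ops);
}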
targets is an array, so to access the _id try this:
foo.appliesTo.targets[0]._id.barId
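As a small defensive addition (my suggestion, not part of the original tip), guard against a missing or empty targets array before dereferencing it:
// Avoid a TypeError when appliesTo or targets is absent or empty.
var target = foo.appliesTo && foo.appliesTo.targets && foo.appliesTo.targets[0];
if (target && target._id) {
    printjson(target._id.barId);
}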
Use async/await with try/catch, and use .toArray() after the find query:
// copyFoosToBars.js
async function main() {
    try {
        var foosDb = db.getSiblingDB("foos");
        var barsDb = db.getSiblingDB("bars");

        // Grab all the 'foos' which have some barId in some convoluted schema.
        var sourceFoos = await foosDb
            .getCollection("foos")
            .find(
                {
                    "appliesTo.targets.type": "Bar",
                    "_meta.deleted": null,
                },
                {
                    _id: 0,
                    "appliesTo.targets._id.fooId": 1,
                }
            ).toArray();

        for (foo of sourceFoos) {
            var desinationBars = await barsDb
                .getCollection("bars")
                .find({
                    "_id.barId": foo.appliesTo.targets[0]._id.barId,
                })
                .toArray();

            console.log(desinationBars);

            if (desinationBars.length > 0) {
                // do something
            }
        }
    } catch (error) {
        console.log(error)
    }
}

main();
Note that in the first find query you project "appliesTo.targets._id.fooId": 1, i.e. you select fooId, but the result JSON shows "barId" : "810e66e2-66d1-44f4-be0e-980309d8df8f", which is a conflict. Anyway, I hope this solution solves your problem.

How to lower-case the value of unique:true keys in MongoDB?

I have created a MongoDB database and by mistake have entered duplicate values in the form of capital and small-case letters.
I have made the index unique. MongoDB is case sensitive and hence considered the capital-letter and small-letter values as different.
Now my problem is that the database has grown to around 32 GB, and I came across this issue. Kindly help me.
Here is the sample:
db.tt.createIndex({'email':1},{unique:true})
> db.tt.find().pretty()
{
    "_id" : ObjectId("591d706c0ef9acde11d7af66"),
    "email" : "g@gmail.com",
    "src" : [
        {
            "acc" : "ln"
        },
        {
            "acc" : "drb"
        }
    ]
}
{
    "_id" : ObjectId("591d70740ef9acde11d7af68"),
    "email" : "G@gmail.com",
    "src" : [
        {
            "acc" : "ln"
        },
        {
            "acc" : "drb"
        },
        {
            "acc" : "dd"
        }
    ]
}
How can I make the email lowercase and merge the src values into the original document? Kindly help me.
You can achieve this using the $toLower aggregation operator like this:
db.tt.aggregate([
    {
        $project: {
            email: {
                $toLower: "$email"
            },
            src: 1
        }
    },
    {
        $unwind: "$src"
    },
    {
        $group: {
            _id: "$email",
            src: {
                $addToSet: "$src"
            }
        }
    },
    {
        $project: {
            _id: 0,
            email: "$_id",
            src: 1
        }
    },
    {
        $out: "anotherCollection"
    }
])
$addToSet keeps only one distinct occurrence of each src item.
This will write the following document to a new collection named anotherCollection:
{ "email" : "g@gmail.com", "src" : [ { "acc" : "dd" }, { "acc" : "drb" }, { "acc" : "ln" } ] }
Note that with $out you can directly overwrite your tt collection; however, before doing this, make sure you understand what you are doing, because all previous data will be lost.
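If you do want to end up replacing tt with the merged data, one cautious pattern (my suggestion, beyond the original answer) is to verify the new collection first and keep the original as a backup via a rename:
// Sanity-check the merged documents, then swap collections instead of overwriting in place.
db.anotherCollection.find().limit(5).pretty();
db.tt.renameCollection("tt_backup");
db.anotherCollection.renameCollection("tt");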
The most efficient way I can think of to merge the data is to run an aggregation and loop over the result, writing back to the collection in bulk operations:
var ops = [];

db.tt.aggregate([
    { "$unwind": "$src" },
    { "$group": {
        "_id": { "$toLower": "$email" },
        "src": { "$addToSet": "$src" },
        "ids": { "$addToSet": "$_id" }
    }}
]).forEach(doc => {
    var id = doc.ids.shift();

    ops = [
        ...ops,
        {
            "deleteMany": {
                "filter": { "_id": { "$in": doc.ids } }
            }
        },
        {
            "updateOne": {
                "filter": { "_id": id },
                "update": {
                    "$set": { "email": doc._id },
                    "$addToSet": { "src": { "$each": doc.src } }
                }
            }
        }
    ];

    if ( ops.length >= 500 ) {
        db.tt.bulkWrite(ops);
        ops = [];
    }
});

if ( ops.length > 0 )
    db.tt.bulkWrite(ops);
In steps, that's: $unwind the array items so they can be merged via $addToSet, under a $group keyed on the $toLower value of the email. You also want to keep the set of unique source document ids.
In the loop you shift the first _id value off of doc.ids and update that document with the lowercase email and the revised "src" set. Using $addToSet here makes the operation write-safe against any other updates that might occur to the document.
The other operation in the loop deletes the remaining documents that shared the same converted-case email, so there are no duplicates; it actually runs first, and the default "ordered" behavior of bulk operations makes sure that is fine.
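To make that ordering explicit (ordered is already the default, so this is only for clarity):
// Operations run in array order, stopping at the first error: each deleteMany
// removes the duplicates before the paired updateOne rewrites the survivor.
db.tt.bulkWrite(ops, { ordered: true });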
And do it in the shell, since it's a one-off operation and is really just as simple as the listing shown.
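Once the duplicates are merged, one way to keep the problem from recurring (my addition; requires MongoDB 3.4+) is to rebuild the unique index with a case-insensitive collation:
// strength: 2 compares case-insensitively, so "g@gmail.com" and "G@gmail.com"
// now collide on insert instead of creating duplicates.
db.tt.dropIndex({ email: 1 });
db.tt.createIndex(
    { email: 1 },
    { unique: true, collation: { locale: "en", strength: 2 } }
);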

How to limit the number of columns in the output of an aggregate operation in MongoDB

My function looks like this:
function (x)
{
    var SO2Min = db.AirPollution.aggregate(
        [
            {
                $match : { "SO2": {$ne: 'NA'}, "State": {$eq: x} }
            },
            {
                $group: {
                    _id: x,
                    SO2MinQuantity: { $min: "$SO2" }
                }
            },
            {
                $project: {
                    SO2MinQuantity: '$SO2MinQuantity'
                }
            }
        ]
    )

    db.AirPollution.update
    (
        {
            "State": "West Bengal"
        },
        {
            $set: {
                "MaxSO2": SO2Max
            }
        },
        {
            "multi": true
        }
    );
}
Here, AirPollution is my collection. If I run this function, the collection gets updated with a new column MaxSO2 as below:
{
    "_id" : ObjectId("5860a2237796484df5656e0c"),
    "Stn Code" : 11,
    "Sampling Date" : "02/01/15",
    "State" : "West Bengal",
    "City/Town/Village/Area" : "Howrah",
    "Location of Monitoring Station" : "Bator, Howrah",
    "Agency" : "West Bengal State Pollution Control Board",
    "Type of Location" : "Residential, Rural and other Areas",
    "SO2" : 10,
    "NO2" : 40,
    "RSPM/PM10" : 138,
    "PM 2.5" : 83,
    "MaxSO2" : {
        "_batch" : [
            {
                "_id" : "West Bengal",
                "SO2MaxQuantity" : 153
            }
        ],
        "_cursor" : {}
    }
}
As we can see, MaxSO2 has been added as a subdocument. But I want that column to be added to the same document as a plain field, not as part of a subdocument. Precisely, I don't want the _batch and _cursor fields to come up. Please help.
Since the aggregate function returns a cursor, you can use the toArray() method, which returns an array containing all the documents from the cursor, and then access the aggregated field. Because you are returning a single value from the aggregate, there's no need to iterate the results array; just access the first and only document in the result to get the value.
Once you have this value you can then update your collection using the updateMany() method. So you can refactor your code to:
function updateMinAndMax(x) {
    var results = db.AirPollution.aggregate([
        {
            "$match" : {
                "SO2": { "$ne": 'NA' },
                "State": { "$eq": x }
            }
        },
        {
            "$group": {
                "_id": x,
                "SO2MinQuantity": { "$min": "$SO2" },
                "SO2MaxQuantity": { "$max": "$SO2" }
            }
        }
    ]).toArray();

    var SO2Min = results[0]["SO2MinQuantity"];
    var SO2Max = results[0]["SO2MaxQuantity"];

    db.AirPollution.updateMany(
        { "State": x },
        { "$set": { "SO2MinQuantity": SO2Min, "SO2MaxQuantity": SO2Max } }
    );
}

updateMinAndMax("West Bengal");
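A quick way to check the result after running it (projection only; the values depend on your data):
// Inspect one updated document, projecting just the new fields.
db.AirPollution.findOne(
    { "State": "West Bengal" },
    { "SO2MinQuantity": 1, "SO2MaxQuantity": 1 }
);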

Select only subdocuments or arrays

{
    "_id": {
        "oid": "4f33bf69873dbc73a7d21dc3"
    },
    "country": "IND",
    "states": [
        {
            "name": "orissa",
            "direction": "east",
            "population": 41947358,
            "districts": [
                {
                    "name": "puri",
                    "headquarter": "puri",
                    "population": 1498604
                },
                {
                    "name": "khordha",
                    "headquarter": "bhubaneswar",
                    "population": 1874405
                }
            ]
        },
        {
            "name": "andhra pradesh",
            "direction": "south",
            "population": 84665533,
            "districts": [
                {
                    "name": "rangareddi",
                    "headquarter": "hyderabad",
                    "population": 3506670
                },
                {
                    "name": "vishakhapatnam",
                    "headquarter": "vishakhapatnam",
                    "population": 3789823
                }
            ]
        }
    ]
}
In the above collection (i.e. countries) I have only one document, and I want to fetch the details of a particular state (let's say "country.states.name" : "orissa"). But I want my result as below, instead of the entire document. Is there a way in Mongo?
{
    "name": "orissa",
    "direction": "east",
    "population": 41947358,
    "districts": [
        {
            "name": "puri",
            "headquarter": "puri",
            "population": 1498604
        },
        {
            "name": "khordha",
            "headquarter": "bhubaneswar",
            "population": 1874405
        }
    ]
}
Thanks
I tried this:
db.countries.aggregate(
    {
        "$project": {
            "state": "$states",
            "_id": 0
        }
    },
    {
        "$unwind": "$state"
    },
    {
        "$group": {
            "_id": "$state.name",
            "state": {
                "$first": "$state"
            }
        }
    },
    {
        "$match": {
            "_id": "orissa"
        }
    }
);
And got:
{
    "result" : [
        {
            "_id" : "orissa",
            "state" : {
                "name" : "orissa",
                "direction" : "east",
                "population" : 41947358,
                "districts" : [
                    {
                        "name" : "puri",
                        "headquarter" : "puri",
                        "population" : 1498604
                    },
                    {
                        "name" : "khordha",
                        "headquarter" : "bhubaneswar",
                        "population" : 1874405
                    }
                ]
            }
        }
    ],
    "ok" : 1
}
You can't do it right now, but you will be able to with $unwind in the aggregation framework. You can try it now with the experimental 2.1 branch; the stable version will come out in 2.2, probably in a few months.
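(That answer predates the aggregation framework's release. On modern servers, MongoDB 3.2+, one way to get exactly the requested shape is $filter plus $arrayElemAt. A sketch against the countries collection above:)
// Pull just the matching state out of the array, server-side.
db.countries.aggregate([
    { $match: { "states.name": "orissa" } },
    { $project: {
        _id: 0,
        state: {
            $arrayElemAt: [
                { $filter: {
                    input: "$states",
                    as: "s",
                    cond: { $eq: ["$$s.name", "orissa"] }
                } },
                0
            ]
        }
    } }
]);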
Any query in MongoDB always returns the root document.
There is only one way to load a single subdocument with its parent: via $slice, if you know the ordinal number of the state in the nested array:
// skip ordinalNumber - 1, limit 1
db.countries.find({_id: 1}, {states: {$slice: [ordinalNumber - 1, 1]}})
$slice works in the default order (the order in which documents were inserted into the nested array).
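For example, to fetch the second state in the array you would skip one element and take one:
// ordinalNumber = 2  =>  $slice: [1, 1]
db.countries.find({_id: 1}, {states: {$slice: [1, 1]}})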
Also, if you don't need the other fields from the country, you can include only _id and states in the result:
db.countries.find({_id: 1}, {states:{$slice: [ordinalNumber -1 , 1]}, _id: 1})
Then the result document will look like this one:
{
    "_id": {
        "oid": "4f33bf69873dbc73a7d21dc3"
    },
    "states": [
        {
            "name": "orissa",
            "direction": "east",
            "population": 41947358,
            "districts": [
                {
                    "name": "puri",
                    "headquarter": "puri",
                    "population": 1498604
                },
                {
                    "name": "khordha",
                    "headquarter": "bhubaneswar",
                    "population": 1874405
                }
            ]
        }
    ]
}
db.countries.find({ "states": { "$elemMatch": { "name": "orissa" }}}, { "country": 1, "states.$": 1 })
If you don't want to use aggregate, you can do it pretty easily at the application layer using underscore (included by default):
var country = Groops.findOne({"property": value});
var state = _.where(country.states, {"name": statename});
This will give you the entire state record that matches statename. Very convenient.