Use geoNear with fuzzy text search in MongoDB

I have the following query:
db.hotels.aggregate([
  {
    $search: {
      index: 'txtIdx', // this is the index name
      text: {
        query: "sun",
        path: ['name', 'address.landmark', 'address.city', 'address.state', 'address.country'],
        fuzzy: {
          maxEdits: 2,
          prefixLength: 1,
        },
      },
    },
  },
  {
    $project: {
      _id: 1,
      name: 1,
      address: 1,
      score: { $meta: "searchScore" }
    }
  },
  { $limit: 100 },
])
There is also a field called 'location' in the hotels collection, which holds coordinates as follows:
"location": {
"type": "Point",
"coordinates": [
72.867804,
19.076033
]
}
How can I use geoNear with this search query to return only hotels near the user, given a latitude, longitude, and distance?
I also tried this query:
{
  $search: {
    index: 'searchIndex',
    compound: {
      must: {
        text: {
          query: 'sun',
          path: ['name', 'address.landmark', 'address.city', 'address.state', 'address.country'],
          fuzzy: {
            maxEdits: 2,
            prefixLength: 3,
          },
        },
      },
      should: {
        near: {
          origin: {
            type: 'Point',
            coordinates: [-122.45665489904827, 37.75118012951178],
          },
          pivot: 1000,
          path: 'location'
        },
      }
    }
  }
},
but the above query returns results that are not even near that location. It returns the same results that $search would return without 'near'. I have created a 'geo' index for location, but it still doesn't return nearby hotels.
Is there another way apart from using geoNear along with search? I have been trying for the past two days and haven't found anything useful. I also want to keep the fuzzy text search. Please let me know if there is a way to solve this.
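One approach worth trying (a sketch only, not a verified solution): keep the fuzzy text clause in compound.must, and put the distance constraint in compound.filter using the Atlas Search geoWithin operator with a circle. Unlike should: near, which only boosts the score of nearby documents, filter excludes documents outside the radius. This assumes the location field is mapped with type geo in the search index; longitude, latitude, and distanceInMeters below are placeholders for the user-provided values.
{
  $search: {
    index: 'searchIndex',
    compound: {
      must: {
        text: {
          query: 'sun',
          path: ['name', 'address.landmark', 'address.city', 'address.state', 'address.country'],
          fuzzy: { maxEdits: 2, prefixLength: 1 },
        },
      },
      // removes documents outside the circle without affecting the relevance score
      filter: {
        geoWithin: {
          path: 'location',
          circle: {
            center: { type: 'Point', coordinates: [longitude, latitude] }, // GeoJSON order: [lng, lat]
            radius: distanceInMeters, // e.g. 5000 for 5 km
          },
        },
      },
    },
  },
},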

Related

Atlas Search works too slow when using facet

I have a big collection (over 22M records, approx. 25 GB) on an M10 cluster with MongoDB version 4.4.10. I set up an Atlas Search index on one field (address) and it works pretty fast when I query through the search tester. However, when I try to paginate by specifying a facet, it gets extremely slow compared with the query without the facet. Is there a way to optimize the facet, or somehow replace it with something faster? Below are the plain query and another one with the facet:
db.getCollection("users").aggregate([{
$search: {
index: 'address',
text: {
query: '7148 BIG WOODS DR',
path: {
'wildcard': '*'
}
}
}
}]);
db.getCollection("users").aggregate([{
$search: {
index: 'address',
text: {
query: '7148 BIG WOODS DR',
path: {
'wildcard': '*'
}
}
}
}, {
$facet: {
paginatedResult: [
{
$limit: 50
},
{
$skip: 0
}
],
totalCount: [
{
$count: 'total'
}
]
}
}]);
The fast and recommended way is to use facet with the $searchMeta stage, which retrieves only the metadata results for the query:
"$searchMeta": {
"index":"search_index_with_facet_fields",
"facet":{
"operator":{
"compound":{
"must":[
{
"text":{
"query":"red shirt",
"path":{
"wildcard":"*"
}
}
},
{
"compound":{
"filter":[
{
"text":{
"query":["clothes"],
"path":"category"
}
},
{
"text":{
"query":[
"maroon",
"blackandred",
"blackred",
"crimson",
"burgandy",
"burgundy"
],
"path":"color"
}
}
]
}
}
]
}
},
"facets":{
"brand":{
"type":"string",
"path":"brand"
},
"size":{
"type":"string",
"path":"size"
},
"color":{
"type":"string",
"path":"color"
}
}
}
}
}
Here we fetch three facets, brand, size, and color, which need to be defined in your search index as facet fields, such as:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "category": [
        {
          "type": "string"
        }
      ],
      "brand": [
        {
          "type": "string"
        },
        {
          "type": "stringFacet"
        }
      ],
      "size": [
        {
          "type": "string"
        },
        {
          "type": "stringFacet"
        }
      ],
      "color": [
        {
          "type": "string"
        },
        {
          "type": "stringFacet"
        }
      ]
    }
  }
}
category is defined only as a string, since we use it only as a filter field and not in the facets.
We can also replace the filter operator with must or should, depending on the requirement.
Finally, we get the facet counts as our result.
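For illustration, the single metadata document returned by $searchMeta is shaped roughly like the following (the bucket values and counts here are made up):
[
  {
    "count": { "lowerBound": 123 },
    "facet": {
      "brand": {
        "buckets": [
          { "_id": "brandA", "count": 80 },
          { "_id": "brandB", "count": 43 }
        ]
      },
      "size": {
        "buckets": [
          { "_id": "M", "count": 70 },
          { "_id": "L", "count": 53 }
        ]
      },
      "color": {
        "buckets": [
          { "_id": "maroon", "count": 90 },
          { "_id": "crimson", "count": 33 }
        ]
      }
    }
  }
]
Each facet field comes back as a list of buckets with a count per value, which is what you would otherwise compute with the much slower $facet stage.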
P.S. I am also new to Mongo and got to this solution after searching a lot, so please upvote if you find it useful, and let me know if you notice any error or improvement. Thanks!

How to group by geospatial attribute in mongodb?

I have a set of documents in MongoDB and I am trying to group the documents by the nearest geopoint coordinates within a 100 m radius of a given document, getting the average value of type and the $first value for cordinates. A sample document set is below. Is there a way to do this using existing functions in the MongoDB aggregation pipeline, or do I have to use the newly introduced $function to build a custom aggregation function? Any suggestions are highly appreciated.
{"_id":{"$oid":"5e790cfe46fa8260f41d2626"},
"cordinates":[103.96277219999999,1.3437526],
"timestamp":1584991486436,
"user":{"$oid":"5e4bbbc31eac8e2e3ca219a6"},
"type": 1,
"__v":0}
{"_id":{"$oid":"5e790d7346fa8260f41d2627"},
"cordinates":[103.97242539965999,1.33508],
"timestamp":1584991603400,
"user":{"$oid":"5e4bbbc31eac8e2e3ca219a6"},
"type": 1,
"__v":0}
{"_id":{"$oid":"5e790d7346fa8260f41d2627"},
"cordinates":[103.97242539990099,1.33518],
"timestamp":1584991603487,
"user":{"$oid":"5e4bbbc31eac8e2e3ca219a6"},
"type": 2,
"__v":0}
A sample document that would be expected as output from the aggregation pipeline:
{
  "avgCordinates": [103.97242539990099, 1.33518],
  "avgType": 1.6
}
I managed to solve this by building a custom function that maps each geospatial coordinate pair to a single scalar value and then grouping by the returned values. Nearby coordinates map to nearby scalar values, so they end up grouped into a single document. So far it has given me the expected outputs for the heatmap, but I'm still not sure this is the correct way to do it; there should be a better answer. I have posted my aggregation pipeline below. Any suggestions for improving this are appreciated.
[
{
'$match': {
'timestamp': {
'$gte': 1599889338000
}
}
}, {
'$addFields': {
'singleCoordinate': {
'$function': {
'body': 'function(coordinates){return ((coordinates[1]+90)*180+coordinates[0])*1000000000000;}',
'args': [
'$coordinates', '$geonear'
],
'lang': 'js'
}
}
}
}, {
'$group': {
'_id': {
'$subtract': [
'$singleCoordinate', {
'$mod': [
'$singleCoordinate', 100
]
}
]
},
'coordinates': {
'$first': '$coordinates'
},
'avgType': {
'$avg': '$type'
}
}
}, {
'$addFields': {
'latitude': {
'$arrayElemAt': [
'$coordinates', 1
]
},
'longitude': {
'$arrayElemAt': [
'$coordinates', 0
]
},
'weight': {
'$multiply': [
'$avgType', '$_id'
]
}
}
}, {
'$project': {
'_id': false,
'coordinates': false,
'avgType': false
}
}
]
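As a possible simplification (a sketch only, not tested against this data set), the same "snap to a grid" idea can be expressed with plain arithmetic operators instead of $function. This assumes the array field is named cordinates as in the sample documents above; multiplying by 1000 and truncating puts points into bins of roughly 0.001 degrees, which is on the order of 100 m.
[
  { '$match': { 'timestamp': { '$gte': 1599889338000 } } },
  {
    '$addFields': {
      // snap longitude/latitude to ~0.001 degree bins (~100 m at the equator)
      'gridLng': { '$trunc': { '$multiply': [{ '$arrayElemAt': ['$cordinates', 0] }, 1000] } },
      'gridLat': { '$trunc': { '$multiply': [{ '$arrayElemAt': ['$cordinates', 1] }, 1000] } }
    }
  },
  {
    '$group': {
      '_id': { 'lng': '$gridLng', 'lat': '$gridLat' },
      'avgCordinates': { '$first': '$cordinates' },
      'avgType': { '$avg': '$type' }
    }
  }
]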

Mongoose aggregate match returns empty array

I'm working with MongoDB aggregations using Mongoose, and I'm not sure what I'm doing wrong in my application.
Here is my document:
{
  "_id": "5bf6fe505ca52c2088c39a45",
  "loc": {
    "type": "Point",
    "coordinates": [
      -43.......,
      -19......
    ]
  },
  "name": "......",
  "friendlyName": "....",
  "responsibleName": "....",
  "countryIdentification": "0000000000",
  "categories": [
    "5bf43af0f9b41a21e03ef1f9"
  ],
  "created_at": "2018-11-22T19:06:56.912Z",
  "__v": 0
}
In the context of my application I need to search documents by GeoJSON, and I execute this search using geoNear. OK, that works fine! But in addition I need to match or filter specific categories in the document. I think it's possible using $match, but I'm certainly doing something wrong. Here is the code:
CompanyModel.aggregate(
  [
    {
      "$geoNear": {
        "near": {
          "type": "Point",
          "coordinates": [pageOptions.loc.lng, pageOptions.loc.lat]
        },
        "distanceField": "distance",
        "spherical": true,
        "maxDistance": pageOptions.distance
      }
    },
    {
      "$match": {
        categories: { "$in": [pageOptions.category] }
      }
    }
  ]
).then(data => {
  resolve({ statusCode: 200, data: data });
}).catch(err => {
  console.log(err);
  reject({ statusCode: 500, error: "Error getting documents", err: err });
})
pageOptions:
var pageOptions = {
  loc: {
    lat: parseFloat(req.query.lat),
    lng: parseFloat(req.query.lng)
  },
  distance: parseInt(req.query.distance) || 10000,
  category: req.params.category || ""
}
If I remove $match, I get all the documents by location, but I need to filter by specific categories. I don't believe I need to filter them manually; it should be possible with aggregation functions.
Can anyone help me with this Mongoose implementation?
Thanks for all the help.
In MongoDB you need to make sure that the data type in your document matches the type in your query. In this case you have a string stored in the database and you're trying to use an ObjectId to build the $match stage. To fix that you can call valueOf() on pageOptions.category, try:
{
  "$match": {
    categories: { "$in": [pageOptions.category.valueOf()] }
  }
}
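For the opposite situation, where the categories array holds ObjectIds and the incoming request parameter is a string, the same type-matching principle applies in reverse. A minimal sketch using Mongoose's ObjectId constructor (pageOptions here is the object from the question):
const mongoose = require('mongoose');

// cast the request string to an ObjectId so it matches the stored value type
const categoryId = new mongoose.Types.ObjectId(pageOptions.category);

const matchStage = {
  "$match": {
    categories: { "$in": [categoryId] }
  }
};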

MongoDB Text Search on Large Dataset

I have a book collection with currently 7.7 million records, and I have set up a text index as follows that allows me to search the collection by title and author:
db.book.createIndex( { title: "text", author: "text" }, {sparse: true, background: true, weights: {title: 15, author: 5}, name: "text_index"} )
The problem is that when I use a search query that returns a lot of results, e.g. "John", and then sort by the textScore, the query takes over 60 seconds.
Please see an example query below:
db.runCommand(
  {
    aggregate: "book",
    pipeline: [
      { $match: { $text: { $search: "John" } } },
      { $sort: { score: { $meta: "textScore" } } },
      { $limit: 6 }
    ],
    allowDiskUse: true
  }
)
Can anyone suggest a solution to reduce this search time to a reasonable level?
Many thanks.

$geoWithin not returning anything

I'm trying to use $geoWithin and $centerSphere to return a list of items within a radius, but no luck.
This is my item's schema:
var ItemSchema = new Schema({
  type: String,
  coordinates: []
});
ItemSchema.index({coordinates: '2dsphere'});
This is my database item that I should be seeing:
{
  "_id": {
    "$oid": "552fae4c13f82d0000000002"
  },
  "type": "Point",
  "coordinates": [
    6.7786656,
    51.2116958
  ],
  "__v": 0
}
This is currently running on the server just to test; the coordinates seen here will eventually be variable.
Item.find({
  coordinates: { $geoWithin: { $centerSphere: [[51, 6], 100 / 6378.1] } }
}, function(err, items) {
  console.log(items); // undefined
});
Items are always undefined, even though that coordinate is within 100 km of the other coordinate.
I get no errors in the console.
Any ideas of what's happening? Is the schema wrong?
Thanks.
The format's wrong. The GeoJSON needs to live under one field:
{
  "location": {
    "type": "Point",
    "coordinates": [6.7786656, 51.2116958]
  }
}
See e.g. create a 2dsphere index.
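A minimal sketch of how the restructured schema and query might look, assuming the GeoJSON point is stored under a single location field as shown above. Note also that $centerSphere expects the center as [longitude, latitude], so a point near the stored document is [6, 51] rather than [51, 6]:
// GeoJSON point nested under one 'location' field, with a 2dsphere index on it
var ItemSchema = new Schema({
  location: {
    type: { type: String, enum: ['Point'], default: 'Point' },
    coordinates: [] // [longitude, latitude]
  }
});
ItemSchema.index({ location: '2dsphere' });

// query the nested field; radius is 100 km expressed in radians
Item.find({
  location: {
    $geoWithin: { $centerSphere: [[6, 51], 100 / 6378.1] }
  }
}, function(err, items) {
  console.log(items);
});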