MongoDB full text search and lookup operator [duplicate] - mongodb

This question already has an answer here:
How to perform a $text search on a 'joined' collection via $lookup?
(1 answer)
Closed 3 years ago.
Full text search in MongoDB seems to be a nice feature, especially when high-performance search over an index is needed.
However, I am wondering why a full text search is not allowed over more than one collection.
For example: I have a suppliers and a products collection (every supplier has n products). I would really like to search over all products, but the supplier's name should also be matched.
Right now I am doing this with a $lookup and then a $match with a regular expression. This works, but it is very slow (500-800 ms).
If I used a full text search with an index, performance would improve significantly. But the $text operator has to be the first stage in the aggregation pipeline, which prevents me from using a $lookup first.
(See the restrictions section: https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text)
Any ideas how I could speed up a text search across multiple collections?

For anyone still looking for a solution:
db.getCollection('projects').aggregate([
    {
        "$match": {
            "$text": { "$search": query }
        }
    },
    {
        "$lookup": {
            "from": "users",
            "localField": "uuid",
            "foreignField": "uuid",
            "as": "user"
        }
    },
    {
        "$sort": {
            "score": { "$meta": "textScore" }
        }
    }
]);
where query is a text search string.
This query is for a projects collection whose uuid key refers to the uuid key in the users collection; the results are sorted by relevance.
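For pymongo users, the same pipeline can be written as plain Python dicts (a sketch; `query`, the collection names, and the uuid fields are taken from the shell example above and may differ in your schema):

```python
# Sketch of the $text + $lookup + $sort pipeline for pymongo.
# "query" is the user's search string; collection and field names
# mirror the shell example above.
query = "some search terms"

pipeline = [
    # $text must be the first stage, so the search runs before the join
    {"$match": {"$text": {"$search": query}}},
    # join each matching project to its user via the shared uuid field
    {"$lookup": {
        "from": "users",
        "localField": "uuid",
        "foreignField": "uuid",
        "as": "user",
    }},
    # sort by text-search relevance
    {"$sort": {"score": {"$meta": "textScore"}}},
]

# With a live connection this would run as:
# results = db["projects"].aggregate(pipeline)
```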

Related

Get the document with the maximum value of a field when all documents are referenced through a foreign key in Mongo

I have 2 collections, one for experiments containing experiment name and run number (an experiment can be run multiple times), and one for some metrics/values logged for a specific experiment-run pair.
experiments: {
    experiment_name: str
    run: int
}
metrics: {
    experiment: experiments.ID
    metric_name: str
    metric_value: float
}
What I would like is, for a given value of metric_name, find the maximum value of run from the referenced experiment that logged the metric_name.
One way I can gather is to first get all documents in metrics that contain the specific metric_name. Then iterate through the referenced experiments.IDs, materialize the corresponding list of experiments documents, and find the maximum value of run.
But this seems tedious and inelegant (and slow). Is there a faster way, or some built-in foreign key pattern reference I can use? (Also, would appreciate any solutions/links in pymongo specifically, if possible).
You could let the MongoDB server do the tedious work for you doing something like this.
N.B.: The pipeline should be directly usable by pymongo.
db.metrics.aggregate([
    {
        "$match": {
            "metric_name": your_metric_name
        }
    },
    {
        "$lookup": {
            "from": "experiments",
            "localField": "experiment",
            "foreignField": "_id",
            "as": "experiment"
        }
    },
    { "$unwind": "$experiment" },
    {
        "$group": {
            "_id": "$experiment._id",
            "metric_name": { "$first": "$metric_name" },
            "experiment_name": { "$first": "$experiment.experiment_name" },
            "max_run": { "$max": "$experiment.run" }
        }
    }
])
Try it on mongoplayground.net with a bit of nonsensical data.
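Since the answer notes the pipeline is directly usable from pymongo, here is what that looks like as Python dicts (a sketch; `your_metric_name` is a placeholder for the metric you filter on):

```python
# The same pipeline expressed as Python dicts for pymongo.
your_metric_name = "accuracy"  # placeholder metric name

pipeline = [
    {"$match": {"metric_name": your_metric_name}},
    {"$lookup": {
        "from": "experiments",
        "localField": "experiment",
        "foreignField": "_id",
        "as": "experiment",
    }},
    # $lookup yields an array; unwind to one document per joined experiment
    {"$unwind": "$experiment"},
    {"$group": {
        "_id": "$experiment._id",
        "metric_name": {"$first": "$metric_name"},
        "experiment_name": {"$first": "$experiment.experiment_name"},
        "max_run": {"$max": "$experiment.run"},
    }},
]

# With a live connection:
# results = db.metrics.aggregate(pipeline)
```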

How to search for text in a MongoDB collection based on data stored in another collection?

Say we have 2 collections in a Mongo Atlas database.
Users
Inventory
Users has name and userId
Inventory has inventoryId, userId
I want to search for all the inventory items by the name of the user. There can be multiple entries in the Users collection with the same name. What is an efficient way to do this that can also handle a large number of documents in these two collections?
In general, if you want to search a lot of documents across two collections, where text is one of the filter criteria (in this case name), the best solution is often $lookup.
Here is an example modified from the sample_mflix dataset on the steps you would need to cover in your aggregation pipeline:
const joinSearchQuery = db.Users.aggregate([
    {
        '$search': {
            'text': {
                'query': 'Ned Stark', // could also be an autocomplete operator for a nicer UX
                'path': 'name'
            }
        }
    },
    {
        '$lookup': {
            'from': 'Inventory',
            'localField': 'userId',
            'foreignField': 'userId',
            'as': 'inventoryFromUser',
            'pipeline': [
                ...
            ]
        }
    }
]);
What's interesting about this query is that the name-search step could be very expensive and imprecise as a plain regex or $text query; searches like this are best done using Atlas Search ($search). If there is an interactive search form, autocomplete via $search could also be interesting. There's a free-forever tier, so it doesn't cost money unless your deployment is very big.
From what I know, the most efficient way is to use $lookup, but it is only available as a stage of an aggregation pipeline:
mongo.collection('users').aggregate([
    ...
    {
        $lookup: {
            from: "inventory",      // name of the other collection
            localField: "userId",   // field in this collection
            foreignField: "userId", // field in the other collection
            as: "inventory"         // name of the new field in the document
        }
    },
    ...
]).toArray();
This approach usually requires some data manipulation afterwards, because the new field added by $lookup is an array.
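To illustrate that manipulation, here is a minimal sketch (with made-up sample documents) of flattening the array that $lookup attaches to each user:

```python
# Made-up sample results, shaped like the output of the $lookup above:
# each user carries an "inventory" array of joined documents.
users = [
    {"name": "Ned Stark", "userId": 1,
     "inventory": [{"item": "sword"}, {"item": "shield"}]},
    {"name": "Arya Stark", "userId": 2, "inventory": []},  # no matches
]

# Flatten (user, inventory-array) pairs into one row per inventory item.
flat = [
    {"name": u["name"], "userId": u["userId"], **item}
    for u in users
    for item in u["inventory"]
]
# flat -> [{'name': 'Ned Stark', 'userId': 1, 'item': 'sword'},
#          {'name': 'Ned Stark', 'userId': 1, 'item': 'shield'}]
```

Users with an empty array (no joined documents) simply produce no rows; keep the user document itself if you need left-join semantics.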

what is the difference between MongoDB find and aggregate in below queries?

select records using aggregate:
db.getCollection('stock_records').aggregate([
    {
        "$project": {
            "info.created_date": 1,
            "info.store_id": 1,
            "info.store_name": 1,
            "_id": 1
        }
    },
    {
        "$match": {
            "$and": [
                { "info.store_id": "563dcf3465512285781608802a" },
                { "info.created_date": { $gt: ISODate("2021-07-18T21:07:42.313+00:00") } }
            ]
        }
    }
])
select records using find:
db.getCollection('stock_records').find(
{
'info.store_id':'563dcf3465512285781608802a',
'info.created_date':{ $gt:ISODate('2021-07-18T21:07:42.313+00:00')}
})
What is difference between these queries and which is best for select by id and date condition?
I think your question should be rephrased to "what's the difference between find and aggregate?".
Before I dive into that, I will say that both commands are similar and will generally perform the same at scale. One concrete difference is that you did not add a projection to your find query, so it will return the full documents.
Regarding which is better: generally speaking, unless you need a specific aggregation operator, it's best to use find, as it performs better.
Now why is the aggregation framework's performance "worse"? It's simple: it just does more.
For every pipeline stage, aggregation has to fetch the BSON for each document, convert it into an internal object for processing, and, at the end of the pipeline, convert it back to BSON to send to the client.
Especially for large queries, this has significant overhead compared to a find, where the BSON is just sent straight back to the client.
Because of this, if you can express your aggregation as a find query, you should.
Aggregation is slower than find.
In your example, with aggregation:
In the first stage, you return all the documents with the projected fields. For example, if your collection has 1000 documents, you return all 1000 of them, each with only the specified projection fields. This impacts the performance of your query.
In the second stage, you filter the documents that match the query filter. For example, out of the 1000 documents from stage 1, you select only a few.
In your example, with find:
First, you filter the documents that match the query filter. For example, if your collection has 1000 documents, you return only the documents that match the query condition.
Here you did not specify the fields to return, so the returned documents have all fields.
You can use projection in find instead of using aggregation:
db.getCollection('stock_records').find(
{
'info.store_id': '563dcf3465512285781608802a',
'info.created_date': {
$gt: ISODate('2021-07-18T21:07:42.313+00:00')
}
},
{
"info.created_date": 1,
"info.store_id": 1,
"info.store_name": 1,
"_id": 1
}
)
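The same find-with-projection can be expressed in pymongo by passing the projection as a second argument (a sketch; in pymongo the ISODate becomes a timezone-aware datetime):

```python
from datetime import datetime, timezone

# Filter and projection mirroring the shell example above.
query_filter = {
    "info.store_id": "563dcf3465512285781608802a",
    "info.created_date": {
        # ISODate("2021-07-18T21:07:42.313+00:00")
        "$gt": datetime(2021, 7, 18, 21, 7, 42, 313000, tzinfo=timezone.utc)
    },
}
projection = {
    "info.created_date": 1,
    "info.store_id": 1,
    "info.store_name": 1,
    "_id": 1,
}

# With a live connection:
# cursor = db["stock_records"].find(query_filter, projection)
```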

How can I join a prefix Id like pat_XXXXXXXX to ObjectID(XXXXXXX) in MongoDB

I am trying to get the email address from a patients collection in MongoDB Compass, and I need to join the patientId in the orders collection, which looks like patientId : "pat_XXXXXXX", to the _id in the patients collection, which looks like _id : ObjectId('XXXXXXXX'). The problem I have is that when I do a $lookup, the joined array comes back empty.
{
    "$lookup": {
        "from": "patients",
        "localField": "patientId",
        "foreignField": "_id",
        "as": "patients"
    }
}
Then when I tried to use $toObjectId to convert the patientId, like this: { "$addFields": { "userObjectId": { "$toObjectId": "patientId" } } }, I got the error Failed to optimize pipeline :: caused by :: Failed to parse objectId 'patientId' in $convert with no onError value: Invalid string length for parsing to OID, expected 24 but found 9. The patientId is 24 characters long after the pat_ part, so I think that is the problem. Is there a way to join an Id with a prefix to an ObjectId?
I managed to fix the problem by doing the following:
Go to the patients collection and choose the Aggregations tab.
Click Add Stage and choose $addFields, then create a field patient_Id in which you convert the ObjectId to a string and concatenate pat_ onto the front of it:
{ patient_Id: { $concat: ["pat_", { $toString: "$_id" }] } }
Then add another stage and choose the option $lookup. This will connect the two collections of the patients and orders. This can be done using the following
{
from: 'orders',
localField: 'patient_Id',
foreignField: 'patientId',
as: 'orders'
}
Add another $unwind stage; this unpacks the orders array so you can use the fields inside it.
{
path: "$orders"
}
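An alternative (not the route taken above, so treat it as a sketch) is to do the conversion on the orders side: strip the pat_ prefix with $substrCP and convert the remainder back to an ObjectId, then $lookup into patients directly. Expressed as pymongo-style dicts:

```python
# Hypothetical reverse approach: convert orders.patientId back to an
# ObjectId instead of stringifying patients._id.
pipeline = [
    {"$addFields": {
        # drop the 4-character "pat_" prefix, keep the 24 hex characters
        "patientObjectId": {
            "$toObjectId": {"$substrCP": ["$patientId", 4, 24]}
        }
    }},
    {"$lookup": {
        "from": "patients",
        "localField": "patientObjectId",
        "foreignField": "_id",
        "as": "patients",
    }},
]
# With a live connection: db.orders.aggregate(pipeline)
```

This avoids materializing an extra string field on every patients document, at the cost of computing the conversion per order.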

Mongo db using result of query in another query using $in

I have the following model in mongo db:
User collection
{
_id:12345,
name:"Joe",
age:15,
}
Addresses collection
{
_id:7663,
userId:12345,
Street:"xyz",
number:"1235",
city:"New York",
state:"NY"
}
Now I want to get all the addresses of users above the age of 20. What I thought of was to query all the ids of users above 20 and, with the result of that query, use the $in operator to find the addresses.
My question is: is there a way to turn this into one query? Is there a better way to query this?
(Note: this is just an example; in my real case I cannot embed addresses into users.)
In Mongo shell you can use the result of one query in another. For example:
use database // the name of your database
db.coll1.find({_id:{$nin:db.coll2.distinct("coll1_id")}})
Here collection coll1 contains an _id field. Then you can check for any ids that are not in collection coll2's list of the coll1_id field. So this is a way to clean up two tables if you have records in coll1 which have no reference via the coll1_id field in coll2.
Another approach doing the same thing:
use database // the name of your database
temp = db.coll2.distinct("coll1_id");
db.coll1.find({_id:{$nin:temp}})
The first example does it in one command, the second does it in two, but the concept is the same. Using results from one query in another. Many different ways to do this. Also, the .toArray() method can be useful to create arrays if you're doing more than just using distinct().
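The same two-step idea carries over to pymongo; here is a sketch with in-memory stand-ins for the query results, just to show what the $nin filter keeps:

```python
# Stand-in for: referenced_ids = db.coll2.distinct("coll1_id")
referenced_ids = [1, 2]

# Stand-in documents for coll1; with a live connection you would run
# db.coll1.find({"_id": {"$nin": referenced_ids}}) instead.
coll1_docs = [{"_id": 1}, {"_id": 2}, {"_id": 3}]

# What the $nin filter keeps: documents whose _id is not referenced.
orphans = [d for d in coll1_docs if d["_id"] not in referenced_ids]
# orphans -> [{'_id': 3}]
```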
Use the aggregation framework where the $lookup pipeline stage provides the functionality to join the two collections:
db.user.aggregate([
{ "$match": { "age": { "$gt": 20 } } },
{
"$lookup": {
"from": "addresses",
"localField": "_id",
"foreignField": "userId",
"as": "address"
}
}
])
The above will create a new array field called address (as specified in the $lookup as option) and this contains the matching documents from the from collection addresses. If the specified name already exists in the input document, the existing field is overwritten.
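For completeness, the same age-filter-plus-join as a pymongo pipeline (a sketch; collection and field names follow the example above):

```python
# $match first so the age filter can use an index before the join.
pipeline = [
    {"$match": {"age": {"$gt": 20}}},
    # attach each user's addresses as an "address" array
    {"$lookup": {
        "from": "addresses",
        "localField": "_id",
        "foreignField": "userId",
        "as": "address",
    }},
]
# With a live connection: db.user.aggregate(pipeline)
```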