Find records where a field ends with another field - MongoDB

I have title, director and englishTitle fields.
{
  title: "Iron Man",
  director: "Someone Important",
  englishTitle: "Iron Man Someone Important"
}
I need to find all the records that have englishTitle ending with director's value.
How can I perform such query with MongoDB?

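As described here, you can use a regex: https://docs.mongodb.com/manual/reference/operator/query/regex/
A regex can only contain literal text, though: /^.*director$/ would match titles ending in the word "director", not in the value of the director field. If you know the director's name in advance, anchor it at the end:
{ englishTitle: { $regex: /Someone Important$/ } }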
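If the name comes from a variable (for example, user input), it should be escaped before being turned into a pattern, otherwise characters such as "." or "(" are interpreted as regex syntax. A minimal sketch, assuming a literal suffix match is wanted; escapeRegExp and endsWithQuery are hypothetical helper names, not MongoDB APIs:

```javascript
// Escape characters that have special meaning in a regular expression,
// so a value like "J. J. Abrams (II)" is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a filter matching documents whose `field`
// ends with the literal `suffix`. It only works for a value known in
// advance; it cannot reference another field of the same document.
function endsWithQuery(field, suffix) {
  return { [field]: { $regex: new RegExp(escapeRegExp(suffix) + "$") } };
}

const filter = endsWithQuery("englishTitle", "Someone Important");
// Usage (mongo shell or Node driver): db.myCollection.find(filter)
console.log(filter.englishTitle.$regex); // /Someone Important$/
```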
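To compare against the value of the director field in the same document, you can use $where:
https://docs.mongodb.com/manual/reference/operator/query/where/
db.myCollection.find( function() {
  // Skip missing/non-string fields and directors longer than the title
  if (typeof this.englishTitle !== "string" || typeof this.director !== "string" || this.director.length > this.englishTitle.length) return false;
  // Compare the last director.length characters of englishTitle with director
  var possibleDirector = this.englishTitle.substr(this.englishTitle.length - this.director.length);
  return (possibleDirector === this.director);
} );
Keep in mind that $where runs JavaScript against every document and cannot use an index, so combine it with other indexed filters where possible.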
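The comparison inside $where can be checked outside MongoDB as a plain function. A minimal sketch for unit-testing the logic; endsWithDirector is a hypothetical name, and in ES2015+ engines the final comparison is equivalent to englishTitle.endsWith(director):

```javascript
// Plain-JavaScript version of the $where predicate, so the comparison
// logic can be tested without a database. Field names are the ones from
// the question; endsWithDirector is a hypothetical helper.
function endsWithDirector(doc) {
  const { englishTitle, director } = doc;
  // Skip documents where either field is missing or not a string.
  if (typeof englishTitle !== "string" || typeof director !== "string") {
    return false;
  }
  // A director longer than the title can never be its suffix.
  if (director.length > englishTitle.length) return false;
  return englishTitle.slice(englishTitle.length - director.length) === director;
}

console.log(endsWithDirector({
  title: "Iron Man",
  director: "Someone Important",
  englishTitle: "Iron Man Someone Important",
})); // true
```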

Related

How to guarantee unique primary key with one update query

In my Movie schema, I have a field "release_date" which can contain nested subdocuments.
These subdocuments contain three fields:
country_code
date
details
I need to guarantee that the combination of the first two fields is unique (like a composite primary key).
I first tried to set a unique index, but then realized that a unique index on an array of subdocuments only enforces uniqueness across separate documents, not within the array of a single document.
The index is created, but validation does not trigger, and I can still add duplicates.
Then I tried to modify my update function to prevent duplicates, as explained in this article (see Workarounds): http://joegornick.com/2012/10/25/mongodb-unique-indexes-on-single-embedded-documents/
$ne works well for a single field, but in my case I have a combination of two fields, which makes it much more complicated.
$addToSet is nice, but not exactly what I am looking for, because it compares whole subdocuments, and the "details" field may differ between entries that share the same country_code and date.
I also tried plugins like mongoose-unique-validator, but they do not work with subdocuments.
I finally ended up with two queries: one to search for an existing subdocument, and another to add the subdocument if the first query returns no document.
insertReleaseDate: async (root, args) => {
  const { movieId, fields } = args
  // Searching for an existing primary key
  const document = await Movie.find(
    {
      _id: movieId,
      release_date: {
        $elemMatch: {
          country_code: fields.country_code,
          date: fields.date
        }
      }
    }
  )
  if (document.length > 0) {
    throw new Error('Duplicate error')
  }
  // Updating the document
  const response = await Movie.updateOne(
    { _id: movieId },
    { $push: { release_date: fields } }
  )
  return response
}
This code works fine, but I would have preferred to use only one query.
Any ideas? I don't understand why this is so complicated, since it seems like a common use case.
Thanks, RichieK, for your answer! It's working great.
Just be careful to put the field name before $not, like this:
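insertReleaseDate: async (root, args) => {
  const { movieId, fields } = args
  // Single query: the movie only matches if no release_date entry already
  // has the same (country_code, date) pair, so $push cannot add a duplicate
  const response = await Movie.updateOne(
    {
      _id: movieId,
      release_date: {
        $not: {
          $elemMatch: {
            country_code: fields.country_code,
            date: fields.date
          }
        }
      }
    },
    { $push: { release_date: fields } }
  )
  // formatResponse is the author's own helper (not shown)
  return formatResponse(response, movieId)
}
Thanks a lot!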
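The single-query version can be split into two small helpers: one that builds the filter and one that reads the update result, since a duplicate shows up as "nothing modified" rather than as an error. A minimal sketch; buildInsertFilter and wasInserted are hypothetical names, and the result shape assumes Mongoose 6+ (older versions report n and nModified instead):

```javascript
// Hypothetical helper: builds the filter from the accepted answer. The
// movie matches only if no release_date entry already has the same
// (country_code, date) pair; "details" is deliberately not part of the key.
function buildInsertFilter(movieId, fields) {
  return {
    _id: movieId,
    release_date: {
      $not: {
        $elemMatch: { country_code: fields.country_code, date: fields.date },
      },
    },
  };
}

// Hypothetical helper: interprets an updateOne result, assuming the
// Mongoose 6+ shape ({ matchedCount, modifiedCount, ... }). A result with
// modifiedCount 0 means either the movie does not exist or the pair is
// already present.
function wasInserted(result) {
  return result.modifiedCount === 1;
}

// Usage:
//   const result = await Movie.updateOne(
//     buildInsertFilter(movieId, fields),
//     { $push: { release_date: fields } }
//   );
//   if (!wasInserted(result)) throw new Error('Duplicate error or unknown movie');
```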

MongoDB Full and Partial Text Search

Env:
MongoDB (3.2.0) with Mongoose
Collection:
users
Text Index creation:
BasicDBObject keys = new BasicDBObject();
keys.put("name","text");
BasicDBObject options = new BasicDBObject();
options.put("name", "userTextSearch");
options.put("unique", Boolean.FALSE);
options.put("background", Boolean.TRUE);
userCollection.createIndex(keys, options); // using MongoTemplate
Document:
{"name":"LEONEL"}
Queries:
db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (case-insensitive: $caseSensitive defaults to false)
db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (diacritic-insensitive: $diacriticSensitive defaults to false)
db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (Partial search)
db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (Partial search)
db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (Partial search)
Any idea why I get 0 results when querying for "LEO" or "L"?
Regex is not allowed inside a text index search:
db.getCollection('users')
.find( { "$text" : { "$search" : "/LEO/i",
"$caseSensitive": false,
"$diacriticSensitive": false }} )
.count() // 0 results
db.getCollection('users')
.find( { "$text" : { "$search" : "LEO",
"$caseSensitive": false,
"$diacriticSensitive": false }} )
.count() // 0 results
MongoDB Documentation:
Text Search
$text
Text Indexes
Improve Text Indexes to support partial word match
As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.
There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to be working as such. For example: "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.
Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:
Efficient Techniques for Fuzzy and Partial matching in MongoDB by John Page
Efficient Partial Keyword Searches by James Tan
There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.
As MongoDB currently does not support partial search by default, I created a simple static method:
import mongoose from 'mongoose'
const PostSchema = new mongoose.Schema({
  title: { type: String, default: '', trim: true },
  body: { type: String, default: '', trim: true },
});
// Text index used by searchFull; title matches weigh more than body matches
PostSchema.index({ title: "text", body: "text" },
  { weights: { title: 5, body: 3 } })
// Callback style: requires Mongoose 6 or earlier (Mongoose 7 removed callbacks)
PostSchema.statics = {
  // Substring match via regex ("i" = case-insensitive; a "g" flag has no use in a query)
  searchPartial: function (q, callback) {
    return this.find({
      $or: [
        { "title": new RegExp(q, "i") },
        { "body": new RegExp(q, "i") },
      ]
    }, callback);
  },
  // Whole-word (stemmed) match via the text index
  searchFull: function (q, callback) {
    return this.find({
      $text: { $search: q, $caseSensitive: false }
    }, callback)
  },
  // Try the text search first; fall back to the partial search if it finds nothing
  search: function (q, callback) {
    this.searchFull(q, (err, data) => {
      if (err) return callback(err, data);
      if (data.length) return callback(err, data);
      return this.searchPartial(q, callback);
    });
  },
}
export default mongoose.models.Post || mongoose.model('Post', PostSchema)
How to use:
import Post from '../models/post'
Post.search('Firs', function (err, data) {
  console.log(data);
})
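Because searchPartial passes q straight into new RegExp, input such as "C++" throws a SyntaxError, and characters like "." are treated as pattern syntax. A minimal sketch of escaping the term first, assuming a plain-text substring match is what you want; escapeRegExp and buildPartialQuery are hypothetical helpers:

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the same $or filter as searchPartial, but
// from an escaped, case-insensitive pattern. No "g" flag: MongoDB regex
// options do not include a global flag.
function buildPartialQuery(q, fields) {
  const pattern = new RegExp(escapeRegExp(q), "i");
  return { $or: fields.map((field) => ({ [field]: pattern })) };
}

const query = buildPartialQuery("C++", ["title", "body"]);
// Usage inside the static: return this.find(query, callback);
console.log(query.$or.length); // 2
```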
Without creating an index, we could simply use:
db.users.find({ name: /<full_or_partial_text>/i }) (case-insensitive)
If you want to use all the benefits of MongoDB's full-text search AND want partial matches (maybe for auto-complete), the n-gram based approach mentioned by Shrikant Prabhu was the right solution for me. Obviously your mileage may vary, and this might not be practical when indexing huge documents.
In my case I mainly needed the partial matches to work for just the title field (and a few other short fields) of my documents.
I used an edge n-gram approach. What does that mean? In short, you turn a string like "Mississippi River" into a string like "Mis Miss Missi Missis Mississ Mississi Mississip Mississipp Mississippi Riv Rive River".
Inspired by this code by Liu Gen, I came up with this method:
function createEdgeNGrams(str) {
  if (str && str.length > 3) {
    const minGram = 3
    const maxGram = str.length
    return str.split(" ").reduce((ngrams, token) => {
      if (token.length > minGram) {
        // Emit every prefix of the token, from minGram characters up to the whole token
        for (let i = minGram; i <= maxGram && i <= token.length; ++i) {
          ngrams = [...ngrams, token.slice(0, i)]
        }
      } else {
        // Short tokens are kept as-is
        ngrams = [...ngrams, token]
      }
      return ngrams
    }, []).join(" ")
  }
  // Strings of 3 characters or fewer are returned unchanged
  return str
}
let res = createEdgeNGrams("Mississippi River")
console.log(res)
Now to make use of this in Mongo, I add a searchTitle field to my documents and set its value by converting the actual title field into edge n-grams with the above function. I also create a "text" index for the searchTitle field.
I then exclude the searchTitle field from my search results by using a projection:
db.collection('my-collection')
.find({ $text: { $search: mySearchTerm } }, { projection: { searchTitle: 0 } })
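The steps above can be sketched as a small helper that derives searchTitle before a document is written. This is a minimal sketch, assuming the document has a string title; createEdgeNGrams is restated in a compact form so the block runs on its own, and withSearchTitle is a hypothetical name:

```javascript
// Compact restatement of createEdgeNGrams: every whitespace-separated token
// longer than minGram contributes all of its prefixes of length
// minGram..token.length; shorter tokens are kept as-is.
function createEdgeNGrams(str, minGram = 3) {
  if (!str || str.length <= minGram) return str;
  return str
    .split(" ")
    .flatMap((token) => {
      if (token.length <= minGram) return [token];
      const grams = [];
      for (let i = minGram; i <= token.length; i++) grams.push(token.slice(0, i));
      return grams;
    })
    .join(" ");
}

// Hypothetical helper: returns a copy of the document with searchTitle
// derived from title. Call it before every insert/update so the two stay
// in sync, and create the index once, e.g.
// db.collection('my-collection').createIndex({ searchTitle: "text" })
function withSearchTitle(doc) {
  return { ...doc, searchTitle: createEdgeNGrams(doc.title) };
}

const doc = withSearchTitle({ title: "Mississippi River" });
console.log(doc.searchTitle);
// Mis Miss Missi Missis Mississ Mississi Mississip Mississipp Mississippi Riv Rive River
```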
I wrapped Ricardo Canelas' answer in a Mongoose plugin, published on npm as mongoose-partial-full-search.
Two changes were made:
- Uses promises
- Searches on any field of type String
Here's the important source code:
// mongoose-partial-full-search
module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    // Builds an $or of case-insensitive regexes over every String path in the schema
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        val.instance === "String" &&
          queries.push({
            [path]: new RegExp(q, "i")
          });
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },
    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },
    // Text search first; fall back to the regex search when it returns nothing
    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}
exports.version = require('../package').version;
Usage
// PostSchema.js
import addPartialFullSearch from 'mongoose-partial-full-search';
PostSchema.plugin(addPartialFullSearch);
// some other file.js
import Post from '../wherever/models/post'
Post.search('Firs').then(data => console.log(data))
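The plugin's query builder can also be written as a pure function over a Mongoose-style paths object, which makes it easy to test and to harden. A minimal sketch, assuming paths has the shape { path: { instance } } as in schema.paths; it additionally escapes the term and avoids an empty $or, which MongoDB rejects. Inside the plugin it would be called as makePartialSearchQueries(q, this.schema.paths):

```javascript
// Escape regex metacharacters so the term is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Standalone, hypothetical variant of the plugin's makePartialSearchQueries,
// taking the paths object (path -> { instance }) as a parameter.
function makePartialSearchQueries(q, paths) {
  if (!q) return {};
  const pattern = new RegExp(escapeRegExp(q), "i");
  const $or = Object.entries(paths)
    .filter(([, schemaType]) => schemaType.instance === "String")
    .map(([path]) => ({ [path]: pattern }));
  // MongoDB rejects an empty $or array, so match nothing instead
  // (every document has an _id).
  return $or.length ? { $or } : { _id: { $exists: false } };
}
```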
If you are using a variable to store the string or value to be searched, it will work with a RegExp:
collection.find({ <field name>: new RegExp(variable_name, 'i') })
Here, the 'i' is the ignore-case option.
The quick-and-dirty solution that worked for me: use text search first, and if nothing is found, make another query with a regexp. If you don't want to make two queries, combining $text and a regex in one $or works too, but then all clauses in the $or must be indexed.
Also, you'd better not use a case-insensitive regex, because it cannot use indexes efficiently. In my case, I made lowercase copies of the fields I search.
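The lowercase-copy idea can be sketched as follows. This assumes a hypothetical name_lower field maintained alongside name; the term is lowercased and escaped, and the pattern is anchored with ^ so it is a prefix expression, which MongoDB can evaluate against an index on name_lower far more efficiently than a case-insensitive regex. Note that it matches prefixes only, not arbitrary substrings:

```javascript
// Escape regex metacharacters so the term is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Store a lowercase copy of `field` (e.g. name -> name_lower) before writing.
// The `<field>_lower` naming is a hypothetical convention.
function addLowercaseCopy(doc, field) {
  return { ...doc, [`${field}_lower`]: String(doc[field]).toLowerCase() };
}

// Case-sensitive regex anchored with ^ on the lowercase copy (a prefix
// expression), instead of a case-insensitive /.../i regex on the original.
function prefixQuery(field, term) {
  return { [`${field}_lower`]: { $regex: "^" + escapeRegExp(term.toLowerCase()) } };
}

// Usage: db.users.createIndex({ name_lower: 1 }) once, then
//        db.users.find(prefixQuery("name", "LEO"))
```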
A good n-gram based approach to fuzzy matching is explained here
(it also explains how to score prefix matches higher):
https://medium.com/xeneta/fuzzy-search-with-mongodb-and-python-57103928ee5d
Note: n-gram based approaches can be storage-intensive, and the MongoDB collection size will increase.
I create an additional field that combines all the fields within a document that I want to search. Then I just use a regex:
user = {
  firstName: 'Bob',
  lastName: 'Smith',
  address: {
    street: 'First Ave',
    city: 'New York City',
  },
  notes: 'Bob knows Mary'
}
// add combined search field with '+' separator to preserve spaces
user.searchString = `${user.firstName}+${user.lastName}+${user.address.street}+${user.address.city}+${user.notes}`
db.users.find({searchString: {$regex: 'mar', $options: 'i'}})
// returns Bob because 'mar' matches his notes field
// TODO write a client-side function to highlight the matching fragments
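Rather than hard-coding the template string, the combined field can be built from a list of (possibly nested) paths. A minimal sketch; getPath and buildSearchString are hypothetical helpers, and missing fields are simply skipped:

```javascript
// Read a dotted path such as "address.city"; returns undefined if any
// segment is missing.
function getPath(obj, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Join the values at the given paths with the '+' separator used above.
function buildSearchString(doc, paths, separator = "+") {
  return paths
    .map((path) => getPath(doc, path))
    .filter((value) => value != null) // skip missing fields
    .join(separator);
}

const user = {
  firstName: 'Bob',
  lastName: 'Smith',
  address: { street: 'First Ave', city: 'New York City' },
  notes: 'Bob knows Mary',
};
user.searchString = buildSearchString(user, ["firstName", "lastName", "address.street", "address.city", "notes"]);
console.log(user.searchString); // Bob+Smith+First Ave+New York City+Bob knows Mary
```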
Full/partial search in MongoDB for a "pure" Meteor project
I adapted flash's code to work with Meteor collections and SimpleSchema but without Mongoose. That means removing the use of the .plugin() method and schema.path (although that looks like a SimpleSchema attribute in flash's code, it did not resolve for me), and returning the result array instead of a cursor.
Thought that this might help someone, so I share it.
export function partialFullTextSearch(meteorCollection, searchString) {
  // builds an "or"-mongoDB-query for all fields with type "String" with a regEx as search parameter
  const makePartialSearchQueries = () => {
    if (!searchString) return {};
    const $or = Object.entries(meteorCollection.simpleSchema().schema())
      .reduce((queries, [name, def]) => {
        def.type.definitions.some(t => t.type === String) &&
          queries.push({ [name]: new RegExp(searchString, "i") });
        return queries
      }, []);
    return { $or }
  };
  // returns a promise with result as array
  const searchPartial = () => meteorCollection.rawCollection()
    .find(makePartialSearchQueries()).toArray();
  // returns a promise with result as array
  const searchFull = () => meteorCollection.rawCollection()
    .find({ $text: { $search: searchString } }).toArray();
  // falls back to the partial search when the full-text search finds nothing (or fails)
  return searchFull().then(result => {
    if (result.length === 0) throw null
    else return result
  }).catch(() => searchPartial());
}
This returns a Promise, so call it like this (e.g. as the return value of an async Meteor method searchContact on the server side).
It assumes that you have attached a SimpleSchema to your collection before calling this method.
return partialFullTextSearch(Contacts, searchString).then(result => result);
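The throw null / catch chain above also falls back to the partial search when the full-text query fails for a real reason (for example, a missing text index). A minimal sketch of a fallback helper that only falls back on an empty result and lets genuine errors propagate; searchWithFallback is a hypothetical name, and searchFull/searchPartial are assumed to be zero-argument functions returning promises of arrays, as in the code above:

```javascript
// Hypothetical helper: run the full-text search first and fall back to the
// partial (regex) search only when it returns an empty array. Errors from
// either search are propagated to the caller instead of being swallowed.
async function searchWithFallback(searchFull, searchPartial) {
  const fullResults = await searchFull();
  return fullResults.length > 0 ? fullResults : searchPartial();
}

// Usage inside partialFullTextSearch, instead of the throw null / catch chain:
//   return searchWithFallback(searchFull, searchPartial);
```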
import re
db.collection.find({"$or": [{"your field name": re.compile(text, re.IGNORECASE)},{"your field name": re.compile(text, re.IGNORECASE)}]})

MongoDB - query embedded documents [duplicate]

I imported some sort-of sloppy XML data into a Mongo database. Each Document has nested sub-documents to a depth of around 5-10. I would like to find() documents that have a particular value of a particular field, where the field may appear at any depth in the sub-documents (and may appear multiple times).
I am currently pulling each Document into Python and then searching that dictionary, but it would be nice if I could state a filter prototype where the database would only return documents that have a particular value of the field name somewhere in their contents.
Here is an example document:
{
  "foo": 1,
  "bar": 2,
  "find-this": "Yes!",
  "stuff": {
    "baz": 3,
    "gobble": [
      "wibble",
      "wobble",
      {
        "all-fall-down": 4,
        "find-this": "please find me"
      }
    ],
    "plugh": {
      "plove": {
        "find-this": "Here too!"
      }
    }
  }
}
So, I'd like to find documents that have a "find-this" field, and (if possible) to be able to find documents that have a particular value of a "find-this" field.
You are right that a BSON document is not an XML document. Since XML is loaded into a tree structure consisting of "nodes", searching on an arbitrary key is quite easy.
A MongoDB document is not so simple to process, and this is a "database" in many respects, so it is generally expected to have a certain "uniformity" of data locations in order to make it easy to both "index" and search.
Nonetheless, it can be done. But of course this means a recursive process executing on the server, which means JavaScript processing with $where.
Here is a basic shell example; in other drivers, the same function is simply passed as a string argument to the $where operator:
db.collection.find(
  function () {
    var findKey = "find-this",
        findVal = "please find me";
    function inspectObj(doc) {
      return Object.keys(doc).some(function (key) {
        // typeof null is "object", so exclude null before recursing
        if (doc[key] !== null && typeof doc[key] === "object") {
          return inspectObj(doc[key]);
        } else {
          return (key == findKey && doc[key] == findVal);
        }
      });
    }
    return inspectObj(this);
  }
)
So basically, test the keys present in the object to see if they match the desired "field name" and content. If one of those keys happens to be an "object" then recurse into the function and inspect again.
JavaScript .some() stops at the first match and returns true, so the document is selected as soon as the "key/value" pair is found at any depth.
Note that $where essentially means traversing your whole collection unless there is some other valid query filter that can be applied to an "index" on the collection.
So use it with care, or not at all, and consider restructuring the data into a more workable form.
But this will give you your match.
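The recursive check can be exercised outside MongoDB as a plain function, which is handy for testing it against sample documents before running it through $where. A minimal sketch; hasKeyValue is a hypothetical name, and it uses strict equality where the shell version uses ==:

```javascript
// Recursively test whether `findKey` with value `findVal` appears anywhere
// in `doc`, including inside arrays (Object.keys yields array indices).
function hasKeyValue(doc, findKey, findVal) {
  return Object.keys(doc).some((key) => {
    const value = doc[key];
    // typeof null is "object", and Object.keys(null) would throw.
    if (value !== null && typeof value === "object") {
      return hasKeyValue(value, findKey, findVal);
    }
    return key === findKey && value === findVal;
  });
}

const doc = {
  foo: 1,
  bar: 2,
  "find-this": "Yes!",
  stuff: {
    baz: 3,
    gobble: ["wibble", "wobble", { "all-fall-down": 4, "find-this": "please find me" }],
    plugh: { plove: { "find-this": "Here too!" } },
  },
};
console.log(hasKeyValue(doc, "find-this", "please find me")); // true
console.log(hasKeyValue(doc, "find-this", "not there")); // false
```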
Here is one example that I use for a recursive key-value search anywhere in the document structure:
db.getCollection('myCollection').find({
  "$where": function () {
    var searchKey = 'find-this';
    var searchValue = 'please find me';
    // `this` is the document being tested
    return searchInObj(this);
    function searchInObj(obj) {
      for (var k in obj) {
        if (typeof obj[k] == 'object' && obj[k] !== null) {
          if (searchInObj(obj[k])) {
            return true;
          }
        } else {
          if (k == searchKey && obj[k] == searchValue) {
            return true;
          }
        }
      }
      return false;
    }
  }
})

How to search for text or expression in multiple fields

db.movies.find({"original_title" : {$regex: input_data, $options:'i'}}, function (err, datares){
  if (err || datares == false) {
    db.movies.find({"release_date" : {$regex: input_data + ".*", $options:'i'}}, function (err, datares){
      if(err || datares == false){
        db.movies.find({"cast" : {$regex: input_data, $options:'i'}}, function (err, datares){
          if(err || datares == false){
            db.movies.find({"writers" : {$regex: input_data, $options:'i'}}, function (err, datares){
              if(err || datares == false){
                db.movies.find({"genres.name" : {$regex: input_data, $options:'i'}}, function (err, datares){
                  if(err || datares == false){
                    db.movies.find({"directors" : {$regex: input_data, $options:'i'}}, function (err, datares){
                      if(err || datares == false){
                        res.status(451);
                        res.json({
                          "status" : 451,
                          "error code": "dataNotFound",
                          "description" : "Invalid Data Entry."
                        });
                        return;
                      } else{
                        res.json(datares);
                        return;
                      }
                    });
                  } else {
                    res.json(datares);
                    return;
                  }
                });
              } else {
                res.json(datares);
                return;
              }
            });
          } else {
            res.json(datares);
            return;
          }
        });
      } else {
        res.json(datares);
        return;
      }
    });
  } else {
    res.json(datares);
    return;
  }
});
I am trying to implement a so-called "all-in-one" search so that whenever a user types in any kind of movie-related information, my application tries to return all relevant information. However, I have noticed that this operation can be expensive on the backend, and sometimes the host is really slow.
How do I cleanly close the DB connection, and where should I do it?
I read here that it is best not to close a MongoDB connection in Node.js: Why is it recommended not to close a MongoDB connection anywhere in Node.js code?
Is this a proper way to implement an all-in-one search, using nested find commands?
Your current approach is full of problems, and there is no need to do it this way. As far as I can gather, all you are trying to do is search for a plain string within a number of fields in the same collection. It may possibly be a regular expression construct, but I'm basing two of the options below on a case-insensitive plain-text search.
I am not sure whether you ended up running each query dependent on the results of the previous one because you didn't know another way, or because you thought it would be better. Trust me on this: it is not a better approach than anything listed here, nor is it really required, as will be shown:
Regex query all at once
The first basic option here is to continue your $regex search but just in a singular query with the $or operator:
db.movies.find(
  {
    "$or": [
      { "original_title" : { "$regex": input_data, "$options":"i"} },
      { "release_date" : { "$regex": input_data, "$options":"i"} },
      { "cast" : { "$regex": input_data, "$options":"i"} },
      { "writers" : { "$regex": input_data, "$options":"i"} },
      { "genres.name" : { "$regex": input_data, "$options":"i"} },
      { "directors" : { "$regex": input_data, "$options":"i"} }
    ]
  },
  function(err,result) {
    if(err) {
      // respond error
    } else {
      // respond with data or empty
    }
  }
);
The $or condition here effectively works like "combining queries", as each element is treated as a query in itself as far as document selection goes. Since it is one query, all the results naturally come back together.
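The $or list above can also be generated from an array of field names, so adding a field to the search is a one-line change. A minimal sketch; buildOrRegexQuery is a hypothetical helper, and it escapes the input so it is matched as plain text (leave the escaping out if input_data is meant to be a regular expression):

```javascript
// Escape regex metacharacters so the term is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one case-insensitive $regex clause per field,
// combined with $or into a single query.
function buildOrRegexQuery(fields, term) {
  const cond = { $regex: escapeRegExp(term), $options: "i" };
  return { $or: fields.map((field) => ({ [field]: cond })) };
}

const fields = ["original_title", "release_date", "cast", "writers", "genres.name", "directors"];
const query = buildOrRegexQuery(fields, "tarantino");
// Usage: db.movies.find(query, function (err, result) { ... });
console.log(JSON.stringify(query.$or[4])); // {"genres.name":{"$regex":"tarantino","$options":"i"}}
```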
Full text Query, multiple fields
If you are not really using a "regular expression" built from regular expression operations (e.g. ^(\d+)\bword$), then you are probably better off using the "text search" capabilities of MongoDB. This approach is fine as long as you are not looking for things that would generally be excluded (such as stop words), but your data structure and subject actually suggest this is the best option for what you are likely doing here.
In order to be able to perform a text search, you first need to create a "text index", specifically here you want the index to span multiple fields in your document. Dropping into the shell for this is probably easiest:
db.movies.createIndex({
  "original_title": "text",
  "release_date": "text",
  "cast" : "text",
  "writers" : "text",
  "genres.name" : "text",
  "directors" : "text"
})
There is also an option to assign a "weight" to fields within the index, as you can read in the documentation. Assigning a weight gives "priority" to matches in that field. For example, "directors" might be assigned more "weight" than "cast", so matches for "Quentin Tarantino" would "rank higher" in the results where he was a director (and also a cast member) of the movie and not just a cast member (as in most Robert Rodriguez films).
But with this in place, performing the query itself is very simple:
db.movies.find(
  { "$text": { "$search": input_data } },
  function(err,result) {
    if(err) {
      // respond error
    } else {
      // respond with data or empty
    }
  }
);
Almost too simple really, but that is all there is to it. The $text query operator knows to use the required index (there can only be one text index per collection), and it will then look through all of the indexed fields.
This is why I think this is the best fit for your use case here.
Parallel Queries
The final alternative I'll give here is for when you still insist on running separate queries. I still maintain that you do not need to run each query only when the previous one returns no results, and I also re-assert that the above options should be considered first, with preference to text search.
Writing dependent or chained asynchronous functions is a pain, and very messy. Therefore I suggest borrowing a little help from another library dependency and using the async module here.
This provides an async.map() method, which is perfectly suited to "combining" results by running things in parallel:
var fields = [
  "original_title",
  "release_date",
  "cast",
  "writers",
  "genres.name",
  "directors"
];
async.map(
  fields,
  function(field,callback) {
    var search = {},
        cond = { "$regex": input_data, "$options": "i" };
    search[field] = cond; // assigns the field to search
    db.movies.find(search,callback);
  },
  function(err,result) {
    if(err) {
      // respond error
    } else {
      // respond with data or empty
    }
  }
);
And again, that is it. The .map() method takes each field and transposes it into a query, which in turn returns its results. Once all queries have run, those results are available together in the final callback, collected into one array (one result array per field), much as the other alternatives here return everything at once.
There is also a .mapSeries() variant that runs each query in series, or .mapLimit() if you are otherwise worried about database connections and concurrent tasks, but at this small size this should not be a problem.
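Without the async dependency, the same fan-out can be sketched with Promise.all. This swaps async.map() for native promises; findMovies is an assumed function (filter) => Promise of an array of documents, for example (filter) => db.collection("movies").find(filter).toArray() with the official Node.js driver. Unlike async.map(), which yields one array per field, this sketch also merges the arrays and removes documents that matched more than one field:

```javascript
// Hypothetical helper: one $regex query per field, all started in parallel,
// then flattened and de-duplicated by _id.
async function parallelFieldSearch(findMovies, fields, input) {
  const perField = await Promise.all(
    fields.map((field) => findMovies({ [field]: { $regex: input, $options: "i" } }))
  );
  const byId = new Map();
  for (const doc of perField.flat()) {
    // String() so that ObjectId values are compared by value, not by reference.
    const key = String(doc._id);
    if (!byId.has(key)) byId.set(key, doc);
  }
  return [...byId.values()];
}
```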
I really don't think this option is necessary; however, if the regular expression statements from the first option still apply, this "may" provide a small performance benefit by running the queries in parallel, but at the cost of increased memory and resource consumption in your application.
Anyhow, the round-up here is "Don't do what you are doing": you don't need to, and there are better ways to handle the task you want to achieve. All of them are cleaner and easier to code.