Since v3.4 collations are available for find operations, especially as matches for diacritic characters are concerned. While a find query with a definite value ($eq opeartor or corresponding construct) will match letters and correspondent diacritics, the same is not true if a $regex is used in order to achieve a match on a partial search string (a 'LIKE').
Is there a to make the $regex query use the collation the same way than the $eq query?
consider example collection testcoll:
{ "_id" : ObjectId("586b7a0163aff45945462bea"), "city" : "Antwerpen" },
{ "_id" : ObjectId("586b7a0663aff45945462beb"), "city" : "Antwërpen" }
this query will find both records
db.testcoll.find({city: 'antwerpen'}).collation({"locale" : "en_US", "strength" : 1});
the same query using a regex will not (finds the record with 'Antwerpen' only)
db.testcoll.find({city: /antwe/i}).collation({"locale" : "en_US", "strength" : 1});
I face this same issue today and I searched the Internet like crazy trying to find a solution. Didn't find any. So I came up with my on solution, a little frankenstein that worked for me.
I created a function which removes all the special characters from a string and then replaces all the characters that could be special to the equivalent regexp that could be special. In the end I just add a "i" option to cover the capitalize strings in my DB.
export const convertStringToRegexp = (text: string) => {
let regexp = '';
const textNormalized = text
.normalize('NFD')
.replace(/[\u0300-\u036f]/g, '') // remove all accents
.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') // remove all regexp reserved char
.toLowerCase();
regexp = textNormalized
.replace(/a/g, '[a,á,à,ä,â,ã]')
.replace(/e/g, '[e,é,ë,è,ê]')
.replace(/i/g, '[i,í,ï,ì,î]')
.replace(/o/g, '[o,ó,ö,ò,õ,ô]')
.replace(/u/g, '[u,ü,ú,ù,û]')
.replace(/c/g, '[c,ç]')
.replace(/n/g, '[n,ñ]')
.replace(/[ªº°]/g, '[ªº°]');
return new RegExp(regexp, 'i'); // "i" -> ignore case
};
And in my find() method, I just use this function with $regex option, like this:
db.testcoll.find({city: {$regex: convertStringToRegexp('twerp')} })
/*
Output:
[
{ "_id" : ObjectId("586b7a0163aff45945462bea"), "city" : "Antwerpen" },
{ "_id" : ObjectId("586b7a0663aff45945462beb"), "city" : "Antwërpen" }
]
*/
I also create a .spec.ts file (using Chai) to test this function. Of course you could adapt to Jest.
describe('ConvertStringToRegexp', () => {
it('should convert all "a" to regexp', () => {
expect(convertStringToRegexp('TAÁdaáh!')).to.deep.equal(
/t[a,á,à,ä,â,ã][a,á,à,ä,â,ã]d[a,á,à,ä,â,ã][a,á,à,ä,â,ã]h!/i
);
});
it('should convert all "e" to regexp', () => {
expect(convertStringToRegexp('MEÉeéh!')).to.deep.equal(
/m[e,é,ë,è,ê][e,é,ë,è,ê][e,é,ë,è,ê][e,é,ë,è,ê]h!/i
);
});
it('should convert all "i" to regexp', () => {
expect(convertStringToRegexp('VÍIiishí!')).to.deep.equal(
/v[i,í,ï,ì,î][i,í,ï,ì,î][i,í,ï,ì,î][i,í,ï,ì,î]sh[i,í,ï,ì,î]!/i
);
});
it('should convert all "o" to regexp', () => {
expect(convertStringToRegexp('ÓOoóhhhh!!!!')).to.deep.equal(
/[o,ó,ö,ò,õ,ô][o,ó,ö,ò,õ,ô][o,ó,ö,ò,õ,ô][o,ó,ö,ò,õ,ô]hhhh!!!!/i
);
});
it('should convert all "u" to regexp', () => {
expect(convertStringToRegexp('ÚUhuuúll!')).to.deep.equal(
/[u,ü,ú,ù,û][u,ü,ú,ù,û]h[u,ü,ú,ù,û][u,ü,ú,ù,û][u,ü,ú,ù,û]ll!/i
);
});
it('should convert all "c" to regexp', () => {
expect(convertStringToRegexp('Cacacacaca')).to.deep.equal(
/[c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã]/i
);
});
it('should remove all special characters', () => {
expect(
convertStringToRegexp('hello 123 °º¶§∞¢£™·ª•*!##$%^WORLD?.')
).to.deep.equal(
/h[e,é,ë,è,ê]ll[o,ó,ö,ò,õ,ô] 123 [ªº°][ªº°]¶§∞¢£™·[ªº°]•\*!##\$%\^w[o,ó,ö,ò,õ,ô]rld\?\./i
);
});
it('should accept all regexp reserved characters', () => {
expect(
convertStringToRegexp('Olá [-[]{}()*+?.,\\o/^$|#s] Mundo! ')
).to.deep.equal(
/* eslint-disable #typescript-eslint/no-explicit-any */
/[o,ó,ö,ò,õ,ô]l[a,á,à,ä,â,ã] \[-\[\]\{\}\(\)\*\+\?\.,\\[o,ó,ö,ò,õ,ô]\/\^\$\|#s\] m[u,ü,ú,ù,û][n,ñ]d[o,ó,ö,ò,õ,ô]! /i
);
});
});
Documentation
Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.
There is no need to use collation on top of regex. You can functionally implement this behaviour using the correct regex.
Considering to the Antwerpen example the following regex gives you all the matches in the database:
/antw[eë]rpen/i
To generate the above regex you have to regex-replace your search string first using the following replace formula:
str.replace(/e/ig, '[eë]')
And of course you have to do it with all diactric character. Also you can simply use the following library: diacritic-regex.
I'm trying to create a simple search-function in SailsJS (beta), using the waterline ORM for data handling. My function in my controller does the search using the predefined find()-method of waterline.
I wonder, is it possible to send an SQL-regexp pattern to the waterline find()-function to make it fetch all users having a matching REGEXP?
My find-method in my controller contains this (except the regexp functionality, which I'm asking for):
// Replace any spaces with SQL regex wildcards
var regexp = req.body.search.replace(' ', '*');
// Show found users
User.find()
.where({
or: [
{username: { contains: req.body.search }},
{email: { contains: req.body.search }},
{fullName: { regexp: regexp }}
]
}).exec(function(err, users){
if (err){
return res.send(403, 'Failed to find users' + JSON.stringify(err));
}
res.view('userlist', {
users: users
});
});
The fullName attribute is defined as an instance method within the attributes, and is probably not searchable. However, is there a way to make a decent search for names?
This is an extract from my User model:
module.exports = {
attributes: {
firstName: {
type: 'string'
},
lastName: {
type: 'string'
},
fullName: function(){
return this.firstName + ' ' + this.lastName;
},
(...some more attributes, as well as email and username...)
}
}
How do i make the find()-method work with SQL regex? My goal is to make a search for "Hans Eikerol" display "Hans Martin Eikerol" in the result set.
Author Edit:
Thanks for the answer:) However, I managed to find a suitable solution to my problem without having to use regex pattern search. This is my new controller method:
var split = req.body.search.split(' ');
// Assume last piece of search is the lastName, ignore middlenames
var firstName = split[0];
var lastName = split.pop();
// Show found users
User.find()
.where({
or: [
{username: { contains: req.body.search }},
{email: { contains: req.body.search }},
{firstName: { contains: firstName }},
{lastName: { contains: lastName }},
{firstName: {startsWith: firstName}, lastName: lastName}
]
}).exec(function(err, users){
if (err){
return res.send(403, 'Failed to find users' + JSON.stringify(err));
}
res.view('admin/users', {
users: users
});
});
Waterline maps .find() to the underlying find syntax for the database you're connected to. It doesn't support regular expressions because regex is not supported by all of the database adapters. One could argue that since regular expression search comes standard with most mainstream databases these days, that the feature might have a place in Waterline, but in the meantime you can use .query to do a native SQL query. Then you coerce the results back to Waterline models. For example, with sails-mysql:
User.query("[some sql query using RLIKE]", function(err, rows) {
if (err){
return res.send(403, 'Failed to find users' + JSON.stringify(err));
}
res.view('userlist', {
// Map the rows returned by the query to Waterline instances
users: rows.map(function(row){return new User._model(row);});
});
});
thanks a lot for your answer, I have implemented something like this, using waterline on sails:
.find({ $and:[{"userType":"basic"},{"name":{contains:req.params.name}} ] })
.exec(function(err,seller){
if(err){
res.send(err);
}else{
console.log(seller);
res.send(seller)
}
})