Mongo query with regex fails when backslash\newline is there in a field

Mongo query with regex fails when backslash\newline is there in a field - mongodb

Hi I have a field in a user collection called "Address".User saving their address from a textarea in my application. mongodb convert it to new line like following.
{
"_id": ObjectId("56a9ba0ffbe0856d4f8b456d"),
"address": "HOUSE NO. 3157,\r\nSECTOR 50-D",
"pincode": "",
},
{
"_id": ObjectId("56a9ba0ffbe0856d4f8b456d"),
"address": "HOUSE NO. 3257,\r\nSECTOR 50-C",
"pincode": "",
}
So now When I am running a search query on the basis of "address".Like following:
guardianAdd = $dm->getRepository('EduStudentBundle:GuardianAddress')->findBy(array(
'address' => new \MongoRegex('/.*' .$data['address'] . '.*/i'),
'isDelete' => false
));
echo count($guardianAdd);die;
it does not give any result. My Searchi key word is : "HOUSE NO.3157 SECTOR 50-D".
However if I am searching using like: HOUSE NO. 3157 its giving correct result.
Please advice how to fix this.Thanks in advance

First of all, trailing .* are redundant. regexps /.*aaa.*/ and /aaa/ are identical and match the same pattern.
Second, you probably need to use multiline modifier /pattern/im
Finally, it is not quite clear what you want to fix. The best think you can do is to provide some basic explanation of regex syntax in the search form, so users can search properly, e.g. HOUSE NO.*3157.*SECTOR 50-D to get best results.
You can make some bold assumptions and build the pattern with something like
$pattern = implode('\W+',preg_split('/\W+/', $data['address']))
which will give you a regexp HOUSE\W+NO\W+3157\W+SECTOR\W+50\W+D for different kind of HOUSE NO.3157 SECTOR 50-D requests, but it will cut all the regex flexibility available with bare input, and eventually will result with unexpected response anyway. You can follow this slippery slope and end up with your own query DSL to compile to regex, but I doubt it can be any better or more convenient than pure regex. It will be more error prone for sure.
Asking right question to get right answers is true not only on SO, but also in your application. Unfortunately there is no general solution to search for something that people have in mind, but fail to ask. I believe that in your particular case best code is no code.

Related

RESTapi nesting endpoint

Ok, new to RESTapi so not sure if I am using the correct terminology for what I want to ask so bear with me. I believe what I am asking about is nested resources in a service but I want to ask specifically about using it for separating a blob of "closely related" content. It may be easier to provide an example. Let's say I have the following service that could output the following:
/Policy
"data": [ {
"name": "PolicyName1",
"description": "",
"size": 25000,
.... (bunch of other fields)
"specialEnablement": true,
“specialEnablementOptions”: { <-- options below valid only if specialEnablement is true
“optionType”: “TypeII”,
“optionFlagA”: false,
“optionFlagB”: true,
“optionFlagC”: false,
...(bunch of other options here)
}
},
{ . . . }],
The specialEnablementOptions are only used if specialEnablement is 'true'. It is all part of this /Policy service so has no primary key other than the policy "name" (and doesnt make sense to have to generate one) so does not fall under some of the other questions I have been reading about nested resources.
It does make it more readable to separate this set of information since there are 12 or so options but, this is REST so, maybe human readability does not weigh heavily here.
I am being told that, if we do it this way, it makes it more complex to work with during POST/PUT/PATCH commands. Specifically, it is being said in my group that if we do this, we should require two calls....one that creates the policy main information then the user must call a second time to PATCH the specialEnablementOptions (assuming specialEnablement is true). This seems kludgy to me.
I am looking for expert advise on what the best practice is.
My questions:
Does having the specialEnablementOptions nested in this way cause a
lot of complexity. Seems to me that either way we have to verify
that the settings are valid?
Does having the specialEnablementOptions nested in this way require
two calls? In other words, can a user not do a POST/PATCH/PUT for
all the fields including those in the specialEnablementOptions in
one call? We are planning to provide a way for the user to do a
PATCH of just the specialEnablementOptions options without changing
any of the first level for ease of use but is there something that
prevents them from creating or modifying all settings in one call?
Another option is to just get rid of the nested
specialEnablementOptions and put everything at the same level. I
dont have a problem with this but wasn't sure if this was just being
lazy. I dont mind doing this if the consensus is it is the best way
to do it....but I also have a second example that is similar to this
scenario but is a bit more complex where putting everything under the parent level is not really optimal (I will show in the next example)
So, my second example is as follows:
/anotherPolicy
"data": [ {
"name": "APolicyName1",
"description": "",
"count": 123,
"lastModified": "2022-05-17-20.37.27.000000",
[{
"ownerId": 1
"ownerCount": 1818181
"specialFlags": 'ABA'
},
{ . . . }]
},
{ . . . }],
The above 'count' is the total number associated to that policy and then there is a nested resource by owner where the count by owner can be seen..plus maybe other information specific to that owner. The SUM(ownerCount) would equal "count" above it. Does this scenario change any of the answers to the questions above?
I appreciate your help. I found a ton of information and reference on when to use or not use nested endpoints but all the examples seem to orient around subjects that seem like they could easily be separated into two resource...for instance whether to nest /employees under /departments or /comments under /posts. Also, they didn't deal with the complexities of having nested endpoints vs avoiding them. And last, if using nesting is unnecessary as a readability standpoint.

Flutter and google places API results

hope someone can help me with this.
I've managed to return search autocomplete results and for the most part everything is ok and results are almost restricted to the area but ocassionaly I get areas outside of the radius when non of the queried letters match the area.
However I want to apply restrictions to display results that are only in the specified area/radius. I've tried applying strictbounds parameter, but strictbounds combined with types : 'address' is just showing no results or single result. When I remove types the results automatically show only points of interest which I don't need. Only need addresses.
Anyone have any idea whats wrong in this?
Uri uri = Uri.https("maps.googleapis.com", "maps/api/place/autocomplete/json", {
"input": query,
"location": poznanLocation,
"language": "pl",
"radius": searchRadius,
"key": apiKey,
"bounds": "52.22,15.33|53.13,17.36",
"strictbounds": "true",
"types": "address",
"sessiontoken": _sessionToken,
});

Are you sure there are suitable results that should be returned?
Depending on the values of location and radius(*), there may be no suitable results left after adding both strictbounds and types=address.
If results are returned when removing strictbounds, are they strictly within the region defined by location and radius? If they are, please file a bug. Otherwise, that is precisely the intended behavior of strictbounds.
(*) The Place Autocomplete web service will ignore the bounds parameter, because it is not supported.

Array in QueryParams wiremock

I need to pass through queryParams in wireMock
Person=[{age=6,name=AAA}]
I've been getting 404 when I test in the curl request. Aly leads would be appreciated.
Only thing i know is
"Person": {
"matches": "^[a-zA-Z0-9_-]*$"
}
I'm not sure on how to validate the array.

Your regex matching is incorrect. The value of Person is [{age=6,name=AAA}], so your matcher will need to capture all of that to be considered a match. You've used some special characters that need escaping, as well as an invalid capture group (which I don't think we really need here.) Without any more guidance from your post on what you precisely need to match, here's two rough ideas:
"Person": {
"matches": "^\[.*\]$"
}
The above just matches any character, an unlimited amount of times, between brackets.
"Person": {
"matches": "^\[\{age=\d*,name=\w*\}\]$"
}
The above matches any digit character, an unlimited amount of times, as well as any word character, an unlimited amount of times, so long as the rest of the value looks like [{age=,name=}].
I would advise you to take a look at a regex tool to work on forming these. My personal favorite is regex101.

How to escape some characters in postgresql

I have this data in one column in postgresql
{
"geometry":{
"status":"Point",
"coordinates":[
-122.421583,
37.795027
]
},
and i using his query
select * from students where data_json LIKE '%status%' ;
Above query return results but this one does not
select * from students where data_json LIKE '%status:%' ;
How can fix that

Of course the 2nd one doesn't find a match, there's no status: text in the value. I think you wanted:
select * from students where data_json LIKE '%"status":%'
... however, like most cases where you attempt text pattern matching on structured data this is in general a terrible idea that will bite you. Just a couple of problem examples:
{
"somekey": "the value is \"status\": true"
}
... where "status": appears as part of the text value and will match even though it shouldn't, and:
{
status : "blah"
}
where status has no quotes and a space between the quotes and colon. As far as JavaScript is concerned this is the same as "status": but it won't match.
If you're trying to find fields within json or extract fields from json, do it with a json parser. PL/V8 may be of interest, or the json libraries available for tools like pl/perl, pl/pythonu, etc. Future PostgreSQL versions will have functions to get a json key by path, test if a json value exists, etc, but 9.2 does not.
At this point you might be thinking "why don't I use regular expressions". Don't go there, you do not want to try to write a full JSON parser in regex. this blog entry is somewhat relevant.

whoosh doesn't search for short words like "C#"

i am using whoosh to index over 200,000 books. but i have encountered some problems with it.
the whoosh query parser returns NullQuery for words like "C#", "C++" with meta-characters in them and also for some other short words. this words are used in the title and body of some documents so i am not using keyword type for them. i guess the problem is in the analysis or query-parsing phase of searching or indexing but i can't touch my data blindly. can anyone help me to correct this issue. Tnx.
i fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements,here is the regex pattern:
'\w+[#+.\w]*'
this will make tokenizing of fields to be done successfully, and also the searching goes well.
but when i use queries like "some query++*" or "some##*" the parsed query will be a single Every query, just the '*'. also i found that this is not related to my analyzer and this is the Whoosh's default behavior. so here is my new question: is this behavior correct or it is a bug??
note: removing the WildcardPlugin from the query-parser solves this problem but i also need the WildcardPlugin.
now i am using the following code:
from whoosh.util import rcompile
#for matching words like: '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
#i don't need words shorter that two characters so i don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
this will solve my first problem, yes. but the main problem is in searching. i don't want to let users to search using the Every query or *. but when i parse queries like C++* i end up an Every(*) query. i know that there is some problem but i can't figure out what it is.

I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
# ...
)