Find the documents which contain a specific value in an array - mongodb

I want to find documents that contain a single - (symbol).
For a specific employerId, the occupationalCategory array was populated with a single - (symbol) instead of a double one.
Wrongly inserted (single - symbol):
"occupationalCategory" : [
"15-1132.00 - Software Developers, Applications"
],
It should be (double -- symbol):
"occupationalCategory" : [
"15-1132.00 -- Software Developers, Applications"
]
Please help me to get those documents.

Since the string pattern is consistent, you can use a regex to match it.
^\d+-\d+\.\d{2} - [\w\s]+, \w+$
^ - Start of string
\d - Matches a digit
+ - One or more occurrences of the preceding character/symbol
- - A literal hyphen
\d+\.\d{2} - Matches the decimal pattern (the dot is escaped so that it matches only a literal .)
\w - Word character
\s - Whitespace character
$ - End of string
Sample Regex 101 & Test Output
db.collection.find({
"occupationalCategory": {
$regex: "\\d+-\\d+\\.\\d{2} - [\\w\\s]+, \\w+",
$options: "m"
}
})
Sample Mongo Playground
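To check the pattern outside the database, here is a minimal JavaScript sketch that applies the same regex (with the dot escaped) to the two strings from the question. The docs array, its employerId values and the findSingleHyphenDocs helper are made up for illustration; the filter only mimics how a $regex query matches any element of an array field.

```javascript
// Pattern from the answer, with the dot escaped so it matches only a literal ".".
const singleHyphen = /^\d+-\d+\.\d{2} - [\w\s]+, \w+$/;

// Hypothetical sample documents (the employerId values are invented).
const docs = [
  { employerId: 1, occupationalCategory: ["15-1132.00 - Software Developers, Applications"] },
  { employerId: 2, occupationalCategory: ["15-1132.00 -- Software Developers, Applications"] },
];

// Keep a document if at least one array element matches the pattern.
function findSingleHyphenDocs(documents) {
  return documents.filter((doc) =>
    doc.occupationalCategory.some((value) => singleHyphen.test(value))
  );
}

console.log(findSingleHyphenDocs(docs).map((doc) => doc.employerId)); // [ 1 ]
```

Only the document with the single hyphen is returned, because " - " in the pattern requires a space on both sides of exactly one hyphen.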


Extracting range of unpadded string

I'd like to extract the Range<String.Index> of a sentence within its whitespace padding. For example,
let padded = " El águila (🦅). "
let sentenceRangeInPadded = ???
assert(padded[sentenceRangeInPadded] == "El águila (🦅).") // The test!
Here's the regex that I started with, but it looks like variable-length lookbehinds aren't supported.
let sentenceRangeInPadded = padded.range(of: #"(?<=^\s*).*?(?=\s*$)"#, options: .regularExpression)!
I'm not looking to extract the sentence (could just use trimmingCharacters(in:) for that), just the Range.
Thanks for reading!
You may use
#"(?s)\S(?:.*\S)?"#
See the regex demo.
Details
(?s) - a DOTALL modifier making . match any char, including line break chars
\S - the first non-whitespace char
(?:.*\S)? - an optional non-capturing group matching
.* - any 0+ chars as many as possible
\S - up to the last non-whitespace char.
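The same pattern can be tried in JavaScript, which is only a stand-in for the Swift call here. In this sketch the inline (?s) modifier is replaced by the s flag, and the unpaddedRange helper is a hypothetical name; it returns UTF-16 offsets rather than a Range<String.Index>.

```javascript
// Returns the [start, end) UTF-16 offsets of the text without its whitespace
// padding, or null when the string contains no non-whitespace character.
function unpaddedRange(padded) {
  // The "s" flag plays the role of (?s): "." also matches line breaks.
  const match = /\S(?:.*\S)?/s.exec(padded);
  if (match === null) return null;
  return { start: match.index, end: match.index + match[0].length };
}

const padded = " El águila (🦅). ";
const range = unpaddedRange(padded);
console.log(padded.slice(range.start, range.end) === "El águila (🦅)."); // true
```

Because the match itself carries its position, no lookbehind is needed to skip the leading whitespace.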

Avoiding duplicate items in a comma-separated list of two-letter words

I need to write a regex which allows each group of 2 chars only once. This is my current regex:
^([A-Z]{2},)*([A-Z]{2}){1}$
This allows me to validate something like this:
AL,RA,IS,GD
AL
AL,RA
The problem is that it validates also AL,AL and AL,RA,AL.
EDIT
Here are more details.
What is allowed:
AL,RA,GD
AL
AL,RA
AL,IS,GD
What shouldn't be allowed:
AL,RA,AL
AL,AL
AL,RA,RA
AL,IS,AL
IS,IS,AL
IS,GD,GD
IS,GD,IS
I need every group of two characters to appear only once in the sequence.
Try something like this expression:
/^(?:,?(\b\w{2}\b)(?!.*\1))+$/gm
I have no knowledge of Swift, so take it with a grain of salt. The idea is basically to only match a whole line while making sure that no single matched group occurs at a later point in the line.
First of all, let's shorten your pattern. It can be easily achieved since the length of each comma-separated item is fixed and the list items are only made up of uppercase ASCII letters. So, your pattern can be written as ^(?:[A-Z]{2}(?:,\b)?)+$. See this regex demo.
Now, you need to add a negative lookahead that checks whether any two-letter sequence in the string is repeated anywhere later in the string. Use
^(?!.*\b([A-Z]{2})\b.*\b\1\b)(?:[A-Z]{2}(?:,\b)?)+$
See the regex demo
Possible implementation in Swift:
import Foundation // needed for range(of:options:)

func isValidInput(input: String) -> Bool {
return input.range(of: #"^(?!.*\b([A-Z]{2})\b.*\b\1\b)(?:[A-Z]{2}(?:,\b)?)+$"#, options: .regularExpression) != nil
}
print(isValidInput(input: "AL,RA,GD")) // true
print(isValidInput(input: "AL,RA,AL")) // false
Details
^ - start of string
(?!.*\b([A-Z]{2})\b.*\b\1\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there is:
.* - any 0+ chars other than line break chars, as many as possible
\b([A-Z]{2})\b - a two-letter word as a whole word
.* - any 0+ chars other than line break chars, as many as possible
\b\1\b - the same whole word as in Group 1. NOTE: The word boundaries here are not necessary in the current scenario where the word length is fixed, it is two, but if you do not know the word length, and you have [A-Z]+, you will need the word boundaries, or other boundaries depending on the situation
(?:[A-Z]{2}(?:,\b)?)+ - 1 or more sequences of:
[A-Z]{2} - two uppercase ASCII letters
(?:,\b)? - an optional sequence: , only if followed with a word char: letter, digit or _. This guarantees that , won't be allowed at the end of the string
$ - end of string.
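As a quick cross-check of the breakdown above, the pattern can be run against every sample from the question. This is a sketch in JavaScript rather than Swift (the pattern uses only syntax both engines share); the allowed and rejected arrays simply copy the question's lists.

```javascript
// Same pattern as in the Swift snippet: no two-letter word may occur twice.
const uniquePairs = /^(?!.*\b([A-Z]{2})\b.*\b\1\b)(?:[A-Z]{2}(?:,\b)?)+$/;

function isValidInput(input) {
  return uniquePairs.test(input);
}

// Samples taken from the question.
const allowed = ["AL,RA,GD", "AL", "AL,RA", "AL,IS,GD"];
const rejected = ["AL,RA,AL", "AL,AL", "AL,RA,RA", "AL,IS,AL", "IS,IS,AL", "IS,GD,GD", "IS,GD,IS"];

console.log(allowed.every(isValidInput)); // true
console.log(rejected.some(isValidInput)); // false
```

A trailing comma such as "AL," is rejected as well, since the comma is only consumed when a word character follows it.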
You can use a negative lookahead with a back-reference:
^(?!.*([A-Z]{2}).*\1).*
if, as in all the examples in the question, it is known that the string contains only comma-separated pairs of capital letters. I will relax that assumption later in my answer.
Demo
The regex performs the following operations:
^ # match beginning of line
(?! # begin negative lookahead
.* # match 0+ characters (1+ OK)
([A-Z]{2}) # match 2 uppercase letters in capture group 1
.* # match 0+ characters (1+ OK)
\1 # match the contents of capture group 1
) # end negative lookahead
.* # match 0+ characters (the entire string)
Suppose now that one or more capital letters may appear between each pair of commas, or before the first comma or after the last comma, but it is only strings of two letters that cannot be repeated. Moreover, I assume the regex must confirm the string has the desired form. Then the following regex could be used:
^(?=[A-Z]+(?:,[A-Z]+)*$)(?!.*(?:^|,)([A-Z]{2}),(?:.*,)?\1(?:,|$)).*
Demo
The regex performs the following operations:
^ # match beginning of line
(?= # begin pos lookahead
[A-Z]+ # match 1+ uc letters
(?:,[A-Z]+) # match ',' followed by 1+ uc letters in a non-cap grp
* # execute the non-cap grp 0+ times
$ # match the end of the line
) # end pos lookahead
(?! # begin neg lookahead
.* # match 0+ chars
(?:^|,) # match beginning of line or ','
([A-Z]{2}) # match 2 uc letters in cap grp 1
, # match ','
(?:.*,) # match 0+ chars, then ',' in non-cap group
? # optionally match non-cap grp
\1 # match the contents of cap grp 1
(?:,|$) # match ',' or end of line
) # end neg lookahead
.* # match 0+ chars (entire string)
If there is no need to check that the string contains only comma-separated strings of one or more uppercase letters, the positive lookahead at the beginning can be removed.
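To see how the relaxed pattern behaves when items of other lengths are mixed in, here is a small JavaScript sketch. The sample strings are invented for illustration and are not from the question.

```javascript
// Relaxed pattern: items may have any length, but a two-letter item
// must not appear twice as a whole item.
const relaxed = /^(?=[A-Z]+(?:,[A-Z]+)*$)(?!.*(?:^|,)([A-Z]{2}),(?:.*,)?\1(?:,|$)).*/;

// Hypothetical samples: [input, expected result].
const samples = [
  ["AL,RA,GD", true],   // all two-letter items are distinct
  ["ALL,AL", true],     // "ALL" is a three-letter item, so "AL" occurs once
  ["AL,AL", false],     // "AL" repeated
  ["AL,XYZ,AL", false], // "AL" repeated with another item in between
  ["al,RA", false],     // lowercase fails the positive lookahead
];

for (const [input, expected] of samples) {
  console.log(input, relaxed.test(input) === expected);
}
```

Every line should end in true, meaning the regex agrees with the expected result for that sample.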

Swift Regex Search String Except \r\n and \t

I am attempting to match phone numbers that are 6 digits or more with the following regex in Swift. Phone numbers can also contain parentheses and + for country codes.
"[0-9\\s\\-\\+\\(\\)]{6,}".
However, the above implementation matches \r\n and \t as well. How can I write the regex such that it will not match any \r\n or \t?
I attempted the following, but neither worked:
"[0-9\\s\\-\\+\\(\\)(^\\r\\n\\t)]{6,}"
"[0-9\\s\\-\\+\\(\\)(?: (\\r|\\n|\\r\\n|\\t)]{6,}"
Thanks.
I suggest using
let regex = "^(?:[ +()-]*[0-9]){6,}[ +()-]*$"
Or
let regex = "^(?:[ +()-]*[0-9]){6,}[ +()-]*\\z"
Details
^ - start of string
(?:[ +()-]*[0-9]){6,} - six or more repetitions of
[ +()-]* - zero or more spaces, +, (, ) or - chars
[0-9] - a digit
[ +()-]* - zero or more spaces, +, (, ) or - chars
$ - end of string (\z is the very end of string).
If the pattern is used inside NSPredicate with MATCHES you may omit the ^ and $/\z anchors.
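The first suggested pattern can be exercised with a short JavaScript sketch. The sample numbers are invented. Note one engine difference: JavaScript has no \z, but without the m flag its $ already matches only at the very end of the input, so only the $ variant is shown.

```javascript
// At least six digits, each optionally preceded by spaces, "+", "(", ")" or "-".
// Tabs and line breaks are not in the character class, so they are rejected.
const phone = /^(?:[ +()-]*[0-9]){6,}[ +()-]*$/;

// Hypothetical samples: [input, expected result].
const samples = [
  ["+1 (555) 123-4567", true],
  ["123456", true],
  ["12345", false],        // only five digits
  ["123\t456", false],     // contains a tab
  ["123 456\r\n", false],  // contains a line break
];

for (const [input, expected] of samples) {
  console.log(JSON.stringify(input), phone.test(input) === expected);
}
```

Replacing \s with a literal space in the character class is what keeps \r, \n and \t out of the match.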

Parsing Infix Mathematical Expressions in Swift Using Regular Expressions

I would like to convert a string that is formatted as an infix mathematical expression to an array of tokens, using regular expressions. I'm very new to regular expressions, so forgive me if the answer to this question turns out to be too trivial.
For example:
"31+2--3*43.8/1%(1*2)" -> ["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"]
I've already implemented a method that achieves this task; however, it consists of many lines of code and a few nested loops. I figured that when I define more operators/functions that may even consist of multiple characters, such as log or cos, it would be easier to edit a regex string rather than adding many more lines of code to my working function. Are regular expressions the right tool for this, and if so, where am I going wrong? Or am I better off adding to my working parser?
I've already referred to the following SO posts:
How to split a string, but also keep the delimiters?
This one was very helpful, but I don't believe I'm using 'lookahead' correctly.
Validate mathematical expressions using regular expression?
The solution to the question above doesn't convert the string into an array of tokens. Rather, it checks to see if the given string is a valid mathematical expression.
My code is as follows:
func convertToInfixTokens(expression: String) -> [String]?
{
do
{
let pattern = "^(((?=[+-/*]))(-)?\\d+(\\.\\d+)?)*"
let regex = try NSRegularExpression(pattern: pattern)
let results = regex.matches(in: expression, range: NSRange(expression.startIndex..., in: expression))
return results.map
{
String(expression[Range($0.range, in: expression)!])
}
}
catch
{
return nil
}
}
When I do pass a valid infix expression to this function, it returns nil. Where am I going wrong with my regex string?
NOTE: I haven't even gotten to the point of trying to parse parentheses as individual tokens. I'm still figuring out why it won't work on this expression:
"-99+44+2+-3/3.2-6"
Any feedback is appreciated, thanks!
Your pattern does not work because it only matches text at the start of the string (see ^ anchor), then the (?=[+-/*]) positive lookahead requires the first char to be an operator from the specified set but the only operator that you consume is an optional -. So, when * tries to match the enclosed pattern sequence the second time with -99+44+2+-3/3.2-6, it sees +44 and -?\d fails to match it (as it does not know how to match + with -?).
You may tokenize the expression using
let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
See the regex demo
Details
(?<!\d) - there should be no digit immediately to the left of the current position
-? - an optional -
\d+ - 1 or more digits
(?:\.\d+)? - an optional sequence of . and 1+ digits
| - or
[-+*/%()] - one of the operator or parenthesis chars: -, +, *, /, %, ( or ).
Output using your function:
Optional(["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"])
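The same tokenizing pattern can be tried in JavaScript, which also supports the (?<!\d) lookbehind in modern engines. This is only a sketch: the tokenize helper is a hypothetical name, and the / inside the character class is escaped because the pattern is written as a regex literal.

```javascript
// A number (optionally negative when no digit precedes it), or a single
// operator / parenthesis character.
const tokenPattern = /(?<!\d)-?\d+(?:\.\d+)?|[-+*\/%()]/g;

// Returns all tokens in order, or an empty array when nothing matches.
function tokenize(expression) {
  return expression.match(tokenPattern) || [];
}

console.log(tokenize("31+2--3*43.8/1%(1*2)"));
console.log(tokenize("-99+44+2+-3/3.2-6"));
```

For the first expression this yields the same 15 tokens as the Swift output above. A - is treated as a sign only when the character before it is not a digit; otherwise it becomes a separate operator token.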

How to search MongoDB through the command line with a wildcard

I'm trying to search MongoDB for some info using a wildcard. I'm trying to find all the "agents" near a given zip code using some type of wildcard. Here's what I have:
db.agents.find({company_address:"49085"},{_id:1,email:1,company_address:1}).pretty()
For the zip code, can I use something like: ...find({company_address:"490*"}...?
You could use a regex to find patterns in text/strings.
Assuming an address starts with a number:
...find({company_address:{ $regex: '^490' }})
This matches any string that starts with 490, whatever follows.
If you want to match a zip code specifically, for example:
...find({company_address:{ $regex: '^490[0-9]+$' }})
That finds strings starting with 490 and followed by one or more digits.
...find({company_address:{ $regex: '^490[0-9]{1,5}$' }})
This one is for strings starting with 490 and followed by between 1 and 5 digits.
...find({company_address:{ $regex: '^490[0-9]{1,}$' }})
This one is for strings starting with 490 and having at least 1 more digit.
...find({company_address:{ $regex: '^490[0-9]{4}$' }})
This one is for strings starting with 490 and followed by exactly 4 digits.
The ^ pattern means start of string, and $ means end of string; together they ensure that the whole value is a number.
For more info on regex, look here: http://docs.mongodb.org/manual/reference/operator/query/regex/
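And you can test your regex at regex101. Note that MongoDB evaluates $regex with PCRE (Perl compatible regular expressions), so the PCRE flavor is the closest match, although simple patterns like these behave the same in the JavaScript flavor.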
Try this "starts with" regex pattern:
db.agents.find(
{
company_address: /^490/
}, //SQL equivalent like '490%'
{_id:1, email: 1, company_address: 1}
)
given that the company_address has a string value.
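To compare the patterns from both answers side by side, here is a small JavaScript sketch that runs them over a few made-up company_address values. It only illustrates the regex behavior and does not talk to a database; the addresses array and the matching helper are hypothetical.

```javascript
// Hypothetical company_address values.
const addresses = ["49085", "490", "4908512", "149085", "49085 MI"];

// Patterns from the answers above.
const startsWith490 = /^490/;           // like SQL '490%'
const digitsOnly = /^490[0-9]+$/;       // 490 plus one or more digits
const upToFive = /^490[0-9]{1,5}$/;     // 490 plus 1 to 5 digits
const exactlyFour = /^490[0-9]{4}$/;    // 490 plus exactly 4 digits

// Returns the addresses that a given pattern matches.
function matching(pattern) {
  return addresses.filter((address) => pattern.test(address));
}

console.log(matching(startsWith490)); // [ '49085', '490', '4908512', '49085 MI' ]
console.log(matching(digitsOnly));    // [ '49085', '4908512' ]
console.log(matching(upToFive));      // [ '49085', '4908512' ]
console.log(matching(exactlyFour));   // [ '4908512' ]
```

Note that a five-digit zip such as 49085 has only two digits after 490, so a pattern like ^490[0-9]{2}$ would be the one that matches exactly that shape.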