How to search MongoDB through the command line with a wildcard - mongodb

I'm trying to search MongoDB for some info using a wildcard. I'm trying to find all the "agents" near a given zip code using some type of wildcard. Here's what I have:
db.agents.find({company_address:"49085"},{_id:1,email:1,company_address:1}).pretty()
For the zip code, can I use something like: ...find({company_address:"490*"}...?

You could use a regex to find patterns in text/strings.
Assuming an address starts with a number:
...find({company_address:{ $regex: '^490' }})
This accepts anything after 490.
If you want to test for a zip code specifically, for example:
...find({company_address:{ $regex: '^490[0-9]+$' }})
This matches strings that start with 490 followed by one or more digits.
...find({company_address:{ $regex: '^490[0-9]{1,5}$' }})
This one matches strings that start with 490 followed by between 1 and 5 digits.
...find({company_address:{ $regex: '^490[0-9]{1,}$' }})
This matches strings that start with 490 followed by at least 1 more digit.
...find({company_address:{ $regex: '^490[0-9]{4}$' }})
This matches strings that start with 490 followed by exactly 4 digits.
The ^ anchor means start of string and $ means end of string; combined with [0-9], that ensures the whole value is a number.
For more info on regex, look here: http://docs.mongodb.org/manual/reference/operator/query/regex/
You can also test a regex at regex101; pick the JavaScript flavor on the right, as MongoDB works with JavaScript.
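To see how the patterns above differ, here is a small sketch in plain JavaScript that runs them against a few made-up values. The sample strings and the `matching` helper are assumptions for illustration only; they are not part of the original answer, and no database is involved.

```javascript
// Hypothetical sample values; only some look like ZIP codes starting with 490.
const samples = ["49085", "490", "4908512", "490 Main St", "14900"];

// The patterns discussed above, written as JavaScript regex literals.
const patterns = {
  startsWith490: /^490/,
  oneOrMoreDigits: /^490[0-9]+$/,
  oneToFiveDigits: /^490[0-9]{1,5}$/,
  exactlyFourDigits: /^490[0-9]{4}$/,
};

// Return the sample strings that a given pattern accepts.
function matching(re) {
  return samples.filter((s) => re.test(s));
}

for (const [name, re] of Object.entries(patterns)) {
  console.log(name, matching(re));
}
```

Note that a five-digit ZIP code beginning with 490 has only two digits after the prefix, so a strict check for that case would probably use {2} rather than {4}.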

Try this "starts with" regex pattern:
db.agents.find(
{
company_address: /^490/
}, //SQL equivalent like '490%'
{_id:1, email: 1, company_address: 1}
)
This assumes that company_address holds a string value.
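If you would rather keep writing shell-style wildcards such as "490*", one option is to translate them into an anchored regex first. The helper below is a hypothetical sketch (the name `wildcardToRegex` is made up here); it treats * as "any run of characters" and escapes everything else.

```javascript
// Hypothetical helper: turn a simple "*" wildcard into an anchored RegExp.
function wildcardToRegex(pattern) {
  // Escape regex metacharacters except "*", which is handled separately.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // Each "*" becomes ".*" (any run of characters, possibly empty).
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

const re = wildcardToRegex("490*");
console.log(re.source);
console.log(re.test("49085"), re.test("14900"));
```

The resulting RegExp object can then be used as the value for company_address in the find() filter, in the same way as the /^490/ literal above.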

Related

Regex expression in q to match specific integer range following string

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.
As the other comments and answers mentioned, this can't be done using a single regex. Another alternative method is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
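For readers less familiar with q, the same idea (build every allowed string, then test membership) can be sketched in JavaScript. The function name `matchByCandidates` is an assumption for illustration and is not a translation of any q built-in.

```javascript
// Build the set of allowed strings (prefix + each number in [lo, hi]),
// then flag which inputs are members of that set.
function matchByCandidates(strs, prefix, [lo, hi]) {
  const candidates = new Set();
  for (let n = lo; n <= hi; n++) {
    candidates.add(prefix + n);
  }
  return strs.map((s) => candidates.has(s));
}

const str = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13"];
console.log(matchByCandidates(str, "foo", [8, 12]));
```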
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how unary derived functions x~/: and in[;string range y]' are paired by #' to the split strings, then min used to AND the result:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.
As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known, as in the example above, then the following can be used to check multiple patterns:
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function to implement this logic. The function range checks that each string in a list starts with a passed string and ends with a number within the specified range.
range:{
 / checking for strings starting with string y
 s:((c:count y)#'x)like y;
 / convert remainder of string to long, check if within range
 d:("J"$c _'x)within z;
 / find strings satisfying both conditions
 s&d
 }
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
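The prefix-plus-numeric-range check can also be sketched outside q. The JavaScript below is an illustration under assumptions (the name `inRange` is made up); it rejects non-numeric suffixes explicitly, in the same way that "foo3x" fails the check in the q example above.

```javascript
// For each string: does it start with `prefix`, and is the remainder
// an integer within [lo, hi]?
function inRange(strs, prefix, [lo, hi]) {
  return strs.map((s) => {
    if (!s.startsWith(prefix)) return false;
    const rest = s.slice(prefix.length);
    // Reject empty or non-numeric suffixes such as "3x".
    if (!/^[0-9]+$/.test(rest)) return false;
    const n = Number(rest);
    return n >= lo && n <= hi;
  });
}

const str = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13", "foo3x", "foo123"];
const flags = inRange(str, "foo", [8, 12]);
console.log(flags);
console.log(str.filter((_, i) => flags[i]));
```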
For your example you can pad, fill with a char, and then simple regex works fine:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"

Find the documents which contain a specific value in an array

I want to find documents whose occupationalCategory array contains a single - symbol.
For a specific employerId, the occupationalCategory entries were inserted with a single - symbol instead of a double --.
Wrongly inserted (single - symbol):
"occupationalCategory" : [
"15-1132.00 - Software Developers, Applications"
],
It should be (double -- symbol):
"occupationalCategory" : [
"15-1132.00 -- Software Developers, Applications"
]
Please help me to get those documents.
As you mentioned that the string pattern is consistent, you can use regex to match the string pattern.
^\d+-\d+\.\d{2} - [\w\s]+, \w+$
^ - start of string
\d - matches a digit
+ - one or more occurrences of the preceding character or group
" - " - a literal single hyphen with a space on each side
\d+\.\d{2} - matches the decimal pattern (digits, a literal dot, then two digits)
\w - word character
\s - whitespace character
$ - end of string
Sample Regex 101 & Test Output
db.collection.find({
"occupationalCategory": {
$regex: "\\d+-\\d+\\.\\d{2} - [\\w\\s]+, \\w+",
$options: "m"
}
})
Sample Mongo Playground
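As a quick sanity check that does not need a database, the pattern can be run against the two example strings from the question in plain JavaScript. This sketch assumes the variant with the dot escaped and both anchors present.

```javascript
// Pattern for the wrongly inserted form: a single hyphen between code and title.
// The dot is escaped so that it only matches a literal ".".
const singleDash = /^\d+-\d+\.\d{2} - [\w\s]+, \w+$/;

const wrong = "15-1132.00 - Software Developers, Applications";
const right = "15-1132.00 -- Software Developers, Applications";

console.log(singleDash.test(wrong)); // the single-hyphen string
console.log(singleDash.test(right)); // the double-hyphen string
```

Only the single-hyphen string matches, because the pattern requires a space immediately after the first hyphen.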

Could I specify pattern match priority in lex code?

I've got a related thread on this site (My lex pattern doesn't work to match my input file, how to correct it?).
The problem I ran into is about how greedily lex matches patterns. For example, here is my lex file:
$ cat b.l
%{
#include<stdio.h>
%}
%%
"12" {printf("head\n");}
"34" {printf("tail\n");}
.* {printf("content\n");}
%%
What I want is: on "12", print "head"; on "34", print "tail"; otherwise print "content" for the longest match that contains neither "12" nor "34".
But in fact ".*" is a greedy match, so whatever I input, it prints "content".
My requirement is that when I use
12sdf2dfsd3sd34
as input, the output should be
head
content
tail
So there seem to be 2 possible ways:
1. Specify a match priority for ".*", so that it applies only when neither "12" nor "34" matches. Does lex support "priority"?
2. Change the 3rd expression so that it matches any contiguous string that doesn't contain the substring "12" or "34". But how do I write this regular expression?
Does (f)lex support priority?
(F)lex always produces the longest possible match. If more than one rule matches the same longest match, the first one is chosen, so in that case it supports priority. But it does not support priority for shorter matches, nor does it implement non-greedy matching.
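The "longest match wins, ties go to the first rule" behaviour can be imitated in a few lines of JavaScript. This is only a toy model of the rule selection described above, using the three rules from the question; it is not how flex is implemented, and `pickRule` is a made-up name.

```javascript
// The three rules from the question, in file order.
const rules = [
  { name: "head", re: /^12/ },
  { name: "tail", re: /^34/ },
  { name: "content", re: /^.*/ },
];

// Choose the rule with the longest match at the start of the input;
// on a tie, the earlier rule is kept (strict ">" comparison).
function pickRule(input) {
  let best = null;
  for (const rule of rules) {
    const m = rule.re.exec(input);
    if (m && (best === null || m[0].length > best.length)) {
      best = { name: rule.name, length: m[0].length };
    }
  }
  return best;
}

console.log(pickRule("12sdf2dfsd3sd34")); // ".*" matches the whole line
console.log(pickRule("12"));              // tie on length 2, first rule wins
```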
How to match a string which does not contain one or more sequences?
You can, with some work, create a regular expression which matches a string not containing specified substrings, but it is not particularly easy and (f)lex does not provide a simple syntax for such regular expressions.
A simpler (but slightly less efficient) solution is to match the string in pieces. As a rough outline, you could do the following:
"12" { return HEAD; }
"34" { if (yyleng > 2) {
         yyless(yyleng - 2);
         return CONTENT;
       }
       else
         return TAIL;
     }
.|\n { yymore(); }
This could be made more efficient by matching multiple characters when there is no chance of skipping a delimiter; change the last rule to:
.|[^13]+ { yymore(); }
yymore() causes the current token to be retained, so that the next match appends to the current token rather than starting a new token. yyless(x) returns all but the first x characters to the input stream; in this case, that is used to cause the end delimiter 34 to be rescanned after the CONTENT token is identified.
(That assumes you actually want to tokenize the input stream, rather than just print a debugging message, which is why I called it an outline solution.)
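The effect of accumulating content until a delimiter appears can be illustrated with a hand-written scanner. The JavaScript below is a rough approximation of the intended tokenization, not a translation of the flex rules; the `tokenize` function and the token shape are assumptions for illustration.

```javascript
// Split the input into HEAD ("12"), TAIL ("34") and CONTENT (everything else).
function tokenize(input) {
  const tokens = [];
  let pending = ""; // content accumulated so far, similar in spirit to yymore()
  let i = 0;
  while (i < input.length) {
    const two = input.slice(i, i + 2);
    if (two === "12" || two === "34") {
      // Emit accumulated content before the delimiter itself.
      if (pending) {
        tokens.push({ type: "CONTENT", text: pending });
        pending = "";
      }
      tokens.push({ type: two === "12" ? "HEAD" : "TAIL", text: two });
      i += 2;
    } else {
      pending += input[i];
      i += 1;
    }
  }
  if (pending) tokens.push({ type: "CONTENT", text: pending });
  return tokens;
}

console.log(tokenize("12sdf2dfsd3sd34").map((t) => t.type));
```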

Datastage, Remove only last two characters of string

This function: Trim(In.Col, Right(In.Col, 2), 'T') works unless the last >2 characters are the same.
What I want:
abczzzz -> abczz
What I get:
abczzzz -> abc
How do I solve this?
The "T" option removes all trailing occurrences. Since you are limiting your input to only two characters with the Right() function, the second occurrence will never be a trailing char.
It sounds though like you are just doing a substring..? If so, then you might just want to do a substring [ ] instead.
expression [ [ start, ] length ]
In.Col[(string length) - 2]
I prefer the Left() function, although it's equivalent here, as it's self-documenting.
Left(InLink.MyString, Len(InLink.MyString) - 2)
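The difference between the two approaches can be shown outside DataStage. In the JavaScript sketch below, `trimTrailing` models "remove all trailing occurrences" as the answer describes the "T" option, and `dropLastTwo` mirrors Left(s, Len(s) - 2); both function names are made up for illustration.

```javascript
// Remove every trailing occurrence of a single character.
function trimTrailing(s, ch) {
  let end = s.length;
  while (end > 0 && s[end - 1] === ch) {
    end--;
  }
  return s.slice(0, end);
}

// Keep everything except the last two characters.
function dropLastTwo(s) {
  return s.slice(0, Math.max(0, s.length - 2));
}

console.log(trimTrailing("abczzzz", "z")); // all trailing z removed
console.log(dropLastTwo("abczzzz"));       // only the last two removed
```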

Behaviour of the project stage operator in projecting Arrays

My question is closely related to this, but not similar.
I have a sample document in my collection:
db.t.insert({"a":1,"b":2});
My intent is to project a field named combined, of type array, containing the values of both a and b together ([1,2]).
I simply try to aggregate with a $project stage:
db.t.aggregate([
{$project:{"combined":[]}}
])
MongoDB throws an error: disallowed field type Array in object expression.
This suggests a field cannot be projected as an array.
But when I use a $cond operator to project an array, the field gets projected.
db.t.aggregate([
{$project:{"combined":{$cond:[{$eq:[1,1]},["$a","$b"],"$a"]}}}
])
I get the output: {"combined" : [ "$a", "$b" ] }.
Notice in the output that the values of a and b are treated as literals and not as field paths.
Can anyone please explain this behavior? When I make the condition fail,
db.t.aggregate([
{$project:{"combined":{$cond:[{$eq:[1,2]},["$a","$b"],"$a"]}}}
])
I get the expected output where $a is treated as a field path, since $a is not enclosed as an array element.
I've run into this before too and it's annoying, but it's actually working as documented for the literal ["$a", "$b"]; for the first error, about the disallowed field type, it is less clear why it complains. You have to follow the description of the grammar of the $project stage, which is spread out across the documentation. I'll try to do that here. Starting at $project,
The $project stage has the following prototype form:
{ $project: { <specifications> } }
and specifications can be one of the following:
1. <field> : <1 or true or 0 or false>
2. <field> : <expression>
What's an expression? From aggregation expressions,
Expressions can include field paths and system variables, literals, expression objects, and operator expressions.
What are each of those things? A field path/system variable should be familiar: it's a string literal prefixed with $ or $$. An expression object has the form
{ <field1>: <expression1>, ... }
while an operator expression has one of the forms
{ <operator>: [ <argument1>, <argument2> ... ] }
{ <operator>: <argument> }
for some enumerated list of values for <operator>.
What's an <argument>? The documentation isn't clear on it, but from my experience I think it's any expression, subject to the syntax rules of the given operator (examine the operator expression "cond" : ... in the question).
Arrays fit in only as containers for argument lists and as literals. Literals are literals - their content is not evaluated for field paths or system variables, which is why the array literal argument in the $cond ends up with the value [ "$a", "$b" ]. The expressions in the argument array are evaluated.
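The literal-versus-expression distinction described above can be made concrete with a toy evaluator. The JavaScript below is only a model of the behaviour reported in the question; it is not MongoDB's implementation, and the `evaluate` function and its handful of supported operators are assumptions for illustration.

```javascript
// Toy model: field paths are resolved, operator argument lists are evaluated,
// but an array that appears as a value is returned as an untouched literal.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // field path such as "$a"
  }
  if (Array.isArray(expr)) {
    return expr; // array literal: contents are not evaluated
  }
  if (expr !== null && typeof expr === "object") {
    const [op] = Object.keys(expr);
    if (op === "$literal") return expr[op];
    if (op === "$eq") {
      const [a, b] = expr[op].map((e) => evaluate(e, doc));
      return a === b;
    }
    if (op === "$cond") {
      const [test, thenExpr, elseExpr] = expr[op];
      return evaluate(test, doc) ? evaluate(thenExpr, doc) : evaluate(elseExpr, doc);
    }
    throw new Error("unsupported operator in this toy model: " + op);
  }
  return expr; // numbers, booleans and other plain literals
}

const doc = { a: 1, b: 2 };
console.log(evaluate({ $cond: [{ $eq: [1, 1] }, ["$a", "$b"], "$a"] }, doc));
console.log(evaluate({ $cond: [{ $eq: [1, 2] }, ["$a", "$b"], "$a"] }, doc));
```

In this model the first call returns the array with the strings "$a" and "$b" untouched, while the second resolves "$a" to the field value, which matches the two outputs reported in the question.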
The first error about Array being a disallowed value type is a bit odd to me, since an array literal is a valid expression, so according to the documentation it can be a value in an object expression. I don't see any ambiguity in parsing it as part of an object expression, either. It looks like it's just a rule they made to make the parsing easier? You can "dodge" it using $literal to put in a constant array value:
db.collection.aggregate([{ "$project" : { "combined" : { "$literal" : [1, 2] } } }])
I hope this helps explain why things work this way. I was surprised the first time I tried to do something like [ "$a", "$b" ] and it didn't work as I expected. It'd be nice if there were a feature to pack field paths into an array, at least. I've found uses for it when $grouping on ordered pairs of values, as well.
There's a JIRA ticket, SERVER-8141, requesting an $array operator to help with cases like this.