Net::LDAP filter with OR - Perl

In Perl, with Net::LDAP, I'm trying to use:
my $psearch = $ldap->search(
base => $my_base,
attr => ["mail","employeeNumber","physicalDeliveryOfficeName"],
filter => "(&(mail=*)(!(employeeNumber=9*)) (&(physicalDeliveryOfficeName=100)) (|(physicalDeliveryOfficeName=274)))");
Saying "give me everyone with a mail entry, where employee number does not begin with 9 and the physicalDeliveryOfficeName is either 100 or 274".
I can get it to work using just 100 or using just 274 but I can't seem to figure out how to specify 100 OR 274.
I can't seem to find the correct filter string, and I'm ready to pull my hair out... please help!!

I can't test this, but LDAP queries use prefix notation, while we're used to infix notation. Imagine you want something that's either a dog or a cat. In infix notation, it would look something like this:
((ANIMAL = "cat") OR (ANIMAL = "dog"))
With prefix notation, the boolean operator goes at the beginning of the query:
(OR (ANIMAL = "cat") (ANIMAL = "dog"))
The advantage to prefixed notation comes when you do more than two checks per boolean. Here I'm looking for something that's either a cat, a dog or a wombat:
(OR (ANIMAL = "cat") (ANIMAL = "dog") (ANIMAL = "wombat"))
Notice that I only needed a single boolean operator in the front of my statement. This will OR together all three statements. With our standard infix notation, I would have to have a second OR operator:
((ANIMAL = "cat") OR (ANIMAL = "dog") OR (ANIMAL = "wombat"))
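To make the "one operator, many operands" point concrete, here is a minimal sketch in Python (not LDAP or Perl). The tuple layout and the `evaluate` helper are hypothetical names invented for this illustration; the only idea taken from the text above is that the operator comes first and applies to every operand that follows.

```python
def evaluate(expr, animal):
    """Evaluate a prefix expression such as ("OR", ("=", "cat"), ("=", "dog"))."""
    op, *operands = expr
    if op == "=":
        # Leaf test: compare the animal with the single operand
        return animal == operands[0]
    # Boolean node: evaluate every operand, however many there are
    results = [evaluate(child, animal) for child in operands]
    if op == "OR":
        return any(results)
    if op == "AND":
        return all(results)
    raise ValueError(f"unknown operator: {op}")

# One OR in front covers all three checks
pets = ("OR", ("=", "cat"), ("=", "dog"), ("=", "wombat"))
print(evaluate(pets, "wombat"))  # True
print(evaluate(pets, "horse"))   # False
```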
Prefix notation was created by a Polish mathematician named Jan Lukasiewicz back in 1924 at Warsaw University, and thus became known as Polish Notation. Later on, it was discovered that computers could work an equation from front to back if the equation was written in postfix notation, which is the reverse of Polish Notation. Thus, Reverse Polish Notation (or RPN) was born.
Early HP calculators used RPN, which became the geek chic thing back in the early 1970s. Imagine the sense of brain superiority you get when you hand your calculator to someone and they have no earthly idea how to use it. The only way to be cooler back then was to have a Curta.
Okay, enough walking down nostalgia lane. Let's get back to the problem...
The easiest way to construct a prefix expression is to build a tree diagram of what you want. Thus, you should sketch out your LDAP query as a tree:
                          AND
                 /         |         \
                /          |          \
               /           |           \
             OR       employee!=9*    mail=*
            /  \
           /    \
          /      \
phyDelOfficeName=100  phyDelOfficeName=274
To build a query based upon this tree, start with the bottom of the tree, and work your way up each layer. The bottom part of our tree is the OR part of our query:
(OR (physicalDeliveryOfficeName = 100) (physicalDeliveryOfficeName = 274))
Using LDAP's OR operator, the pipe (|), and removing the extra spaces, we get:
(|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274))
When I build an LDAP query, I like to save each section as a Perl scalar variable. It makes it a bit easier to use:
$or_part = "|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274)";
Notice I've left off the outer pair of parentheses. The outer set of parentheses returns when you string all the queries back together. However, some people put them in anyway. An extra set of parentheses doesn't hurt an LDAP query.
Now for the other two parts of the query:
$mailAddrExists = "mail=*";
$not_emp_starts_9 = "!(employeeNumber=9*)";
And, now we AND all three sections together:
"(&($mailAddrExists)($not_emp_starts_9)($or_part))"
Note that a single ampersand weaves it all together. I can substitute back each section to see the full query:
(&(mail=*)(!(employeeNumber=9*))(|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274)))
Or like this:
my $psearch = $ldap->search(
base => $my_base,
attrs => ["mail","employeeNumber","physicalDeliveryOfficeName"],
filter => "(&(mail=*)(!(employeeNumber=9*))(|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274)))",
);
Or piecemeal:
my $mail = "mail=*";
my $employee = "!(employeeNumber=9*)";
my $physicalAddress = "|(physicalDeliveryOfficeName=100)"
. "(physicalDeliveryOfficeName=274)";
my $psearch = $ldap->search(
base => $my_base,
attrs => ["mail","employeeNumber","physicalDeliveryOfficeName"],
filter => "(&($mail)($employee)($physicalAddress))",
);
As I said before, I can't test this. I hope it works. If nothing else, I hope you understand how to create an LDAP query and can figure out how to do it yourself.
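The bottom-up assembly described in this answer can also be sketched as a small recursive function. This is an illustrative Python sketch, not Net::LDAP code: the nested-tuple tree and the `ldap_filter` helper are hypothetical, and it assumes the attribute values contain no characters that need escaping in an LDAP filter.

```python
def ldap_filter(node):
    """Render a nested (operator, operand, ...) tuple as an LDAP filter string."""
    if isinstance(node, str):
        # Leaf: a simple comparison such as "mail=*"
        return f"({node})"
    op, *operands = node
    # Prefix notation: the operator comes first, followed by every operand
    return "(" + op + "".join(ldap_filter(child) for child in operands) + ")"

# The tree from the answer: AND at the root, OR as one of its branches
tree = (
    "&",
    "mail=*",
    ("!", "employeeNumber=9*"),
    ("|", "physicalDeliveryOfficeName=100", "physicalDeliveryOfficeName=274"),
)
query = ldap_filter(tree)
print(query)
```

Because every node adds its own pair of parentheses, the pieces never need to carry outer parentheses themselves, which mirrors the convention of leaving them off the Perl scalars.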

Related

How to strip everything except digits from a string in Scala (quick one liners)

This is driving me nuts... there must be a way to strip out all non-digit characters (or perform other simple filtering) in a String.
Example: I want to turn a phone number ("+72 (93) 2342-7772" or "+1 310-777-2341") into a simple numeric String (not an Int), such as "729323427772" or "13107772341".
I tried "[\\d]+".r.findAllIn(phoneNumber), which returns an iterator, and then I would have to recombine the matches into a String somehow... seems horribly wasteful.
I also came up with: phoneNumber.filter("0123456789".contains(_)) but that becomes tedious for other situations. For instance, removing all punctuation... I'm really after something that works with a regular expression so it has wider application than just filtering out digits.
Anyone have a fancy Scala one-liner for this that is more direct?
You can use filter, treating the string as a character sequence and testing the character with isDigit:
"+72 (93) 2342-7772".filter(_.isDigit) // res0: String = 729323427772
You can use replaceAll and Regex.
"+72 (93) 2342-7772".replaceAll("[^0-9]", "") // res1: String = 729323427772
Another approach, define the collection of valid characters, in this case
val d = '0' to '9'
and so for val a = "+72 (93) 2342-7772", filter on collection inclusion for instance with either of these,
for (c <- a if d.contains(c)) yield c
a.filter(d.contains)
a.collect{ case c if d.contains(c) => c }
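For comparison, the regex-based approach generalizes beyond digits by swapping the character class. The following is a sketch in Python rather than Scala; `keep_matching` is a hypothetical helper name, and it assumes the caller passes a valid regex character-class body such as `0-9`.

```python
import re

def keep_matching(text, char_class):
    """Remove every character that is NOT in the given regex character class."""
    return re.sub(f"[^{char_class}]", "", text)

# Keep digits only
digits = keep_matching("+72 (93) 2342-7772", "0-9")
print(digits)  # 729323427772

# Same helper, different class: keep word characters and whitespace,
# which strips the punctuation
no_punct = keep_matching("Hello, world!", r"\w\s")
print(no_punct)  # Hello world
```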

Implement Scala-style String Interpolation In Scala

I want to implement a Scala-style string interpolation in Scala. Here is an example,
val str = "hello ${var1} world ${var2}"
At runtime I want to replace "${var1}" and "${var2}" with some runtime strings. However, when trying to use Regex.replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String), I ran into the following problem:
import scala.util.matching.Regex
val placeholder = new Regex("""(\$\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
java.lang.IllegalArgumentException: No group with name {var1}
at java.util.regex.Matcher.appendReplacement(Matcher.java:800)
at scala.util.matching.Regex$Replacement$class.replace(Regex.scala:722)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:700)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:743)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:410)
... 32 elided
However, when I removed '$' from the regular expression, it worked:
val placeholder = new Regex("""(\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
res2: String = hello $A{var1}B world $A{var2}B
So my question is whether this is a bug in Scala's Regex. And if so, are there other elegant ways to achieve the same goal (other than a brute-force replaceAllLiterally on all placeholders)?
$ is treated specially in the replacement string. This is described in the documentation of replaceAllIn:
In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.
(Actually, that doesn't mention named group references, so I guess it's only sort of documented.)
Anyway, the takeaway here is that you need to escape the $ characters in the replacement string if you don't want them to be treated as references.
new scala.util.matching.Regex("""(\$\{\w+\})""")
.replaceAllIn("hello ${var1} world ${var2}", m => s"A\\${m.matched}B")
// "hello A${var1}B world A${var2}B"
It's hard to tell what behavior you're expecting. The issue is that s"${m.matched}" is turning into "${var1}" (and "${var2}"). The '$' is a special character that says "place the group with name {var1} here instead".
For example:
scala> placeholder.replaceAllIn(str, m => "$1")
res0: String = hello ${var1} world ${var2}
It replaces the match with the first capturing group (which is m itself).
It's hard to tell exactly what you're doing, but you could escape any $ like so:
scala> placeholder.replaceAllIn(str, m => m.matched.replace("$", "\\$"))
res1: String = hello ${var1} world ${var2}
If what you really want to do is evaluate var1/var2 as variables in the local scope of the method, that's not possible. In fact, the s"Hello, $name" pattern is converted into new StringContext("Hello, ", "").s(name) at compile time.
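If the goal is simply to fill `${name}` placeholders from values known at runtime, a lookup table plus a replacer function is enough. The sketch below is in Python, not Scala, so the escaping rules differ: in Python's `re.sub`, the return value of a replacer function is inserted literally. The `interpolate` helper and the `values` dictionary are hypothetical names for this illustration.

```python
import re

# Capture only the name inside ${...}
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def interpolate(template, values):
    """Replace each ${name} with values[name]; a missing name raises KeyError."""
    # The function's return value is used as literal text, so a "$" or "\"
    # inside a value needs no escaping here
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)

result = interpolate("hello ${var1} world ${var2}", {"var1": "big", "var2": "$1"})
print(result)  # hello big world $1
```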

Does CoffeeScript have something like wildcard binding in destructuring assignment?

In Scala, I can do things like:
val List(first, _, third) = List(1, 2, 3)
// first = 1
// third = 3
I currently have some CoffeeScript code in which I'm also not interested in some elements of the array. Ideally, I would like to use a wildcard, like I'm doing in Scala.
[first, _, third] = [1, 2, 3]
Now, this does work, but it also adds this assignment:
_ = 2
… which is clearly not what I want. (I'm using underscore.) What's the preferred way of ignoring values in the array?
By the way, I'm using this for regular expression matching; the regular expression I'm using has optional groups that are really there for matching only, not to get any real data out.
match = /^(([a-z]+)|([0-9]+)) (week|day|year)(s)? (before|after) (.*)$/.exec str
if match?
[__, __, text, number, period, __, relation, timestamp] = match
…
In your specific case you could bypass the whole problem by using non-capturing groups in your regex:
(?:x)
Matches x but does not remember the match. These are called non-capturing parentheses. The matched substring can not be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.
If I'm reading your code right, you'd want:
/^(?:([a-z]+)|([0-9]+)) (week|day|year)(?:s)? (before|after) (.*)$/
You could also replace (?:s)? with s? since there's no need to group a single literal like that:
/^(?:([a-z]+)|([0-9]+)) (week|day|year)s? (before|after) (.*)$/
In either case, that leaves you with:
[__, text, number, period, relation, timestamp] = match
You could use an array slice to get rid of the leading __:
[text, number, period, relation, timestamp] = match[1..]
The match[1..] is a hidden call to Array.slice since the destructuring isn't smart enough (yet) to just skip match[0] when breaking match up. That extra method call may or may not matter to you.
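The effect of the non-capturing groups can be checked quickly in any regex engine that supports `(?:...)`. Here is a sketch in Python (not CoffeeScript) using the second pattern above; the sample input strings are made up for illustration.

```python
import re

# (?:...) groups for alternation only; "s?" needs no group at all
PATTERN = re.compile(
    r"^(?:([a-z]+)|([0-9]+)) (week|day|year)s? (before|after) (.*)$"
)

m = PATTERN.match("3 weeks before 2012-09-24")
# Only the five capturing groups come back, so nothing has to be ignored
text, number, period, relation, timestamp = m.groups()
print(text, number, period, relation, timestamp)
# None 3 week before 2012-09-24
```

`m.groups()` already skips the whole match (group 0), which plays the role of the `match[1..]` slice.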
There is no way to have wildcard assignments like that. You could use a double underscore __, like this
[first, __, third] = [1, 2, 3]
Personally, I would name the second variable in a way that makes sense, even if it is not used afterwards.

ignore spaces and cases MATLAB

diary_file = tempname();
diary(diary_file);
myFun();
diary('off');
output = fileread(diary_file);
I would like to search a string from output, but also to ignore spaces and upper/lower cases. Here is an example for what's in output:
the test : passed
number : 4
found = 'thetest:passed'
a = strfind(output,found )
How could I ignore spaces and cases from output?
Assuming you are not too worried about accidentally matching something like: 'thetEst:passed' here is what you can do:
Remove all spaces and only compare lower case
found = 'With spaces'
found = lower(found(found ~= ' '))
This will return
found =
withspaces
Of course you would also need to do this with each line of output.
Another way:
regexpi(output(~isspace(output)), found, 'match')
if output is a single string, or
regexpi(regexprep(output,'\s',''), found, 'match')
for the more general case (either class(output) == 'cell' or 'char').
Advantages:
Fast.
robust (ALL whitespace (not just spaces) is removed)
more flexible (you can return starting/ending indices of the match, tokenize, etc.)
will return original case of the match in output
Disadvantages:
more typing
less obvious (more documentation required)
will return original case of the match in output (yes, there's two sides to that coin)
That last point in both lists is easily forced to lower or uppercase using lower() or upper(), but if you want same-case, it's a bit more involved:
C = regexpi(output(~isspace(output)), found, 'match');
if ~isempty(C)
    C = found;
end
for single string, or
C = regexpi(regexprep(output, '\s', ''), found, 'match')
C(~cellfun('isempty', C)) = {found}
for the more general case.
You can use lower to convert everything to lowercase to solve your case problem. However ignoring whitespace like you want is a little trickier. It looks like you want to keep some spaces but not all, in which case you should split the string by whitespace and compare substrings piecemeal.
I'd advise using regex, e.g. like this:
a = regexpi(output, 'the\s*test\s*:\s*passed');
If you don't care about the position where the match occurs but only if there's a match at all, removing all whitespaces would be a brute force, and somewhat nasty, possibility:
a = strfind(strrep(output, ' ',''), found);
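The "normalize both sides, then search" idea from these answers can be sketched compactly. This version is in Python rather than MATLAB; the helper name `contains_ignoring_space_and_case` is hypothetical, and `output` is a stand-in for the captured diary text.

```python
import re

def contains_ignoring_space_and_case(output, needle):
    """True if needle occurs in output once whitespace and case are ignored."""
    def squeeze(s):
        # Drop ALL whitespace (spaces, tabs, newlines) and lower-case the rest
        return re.sub(r"\s+", "", s).lower()
    return squeeze(needle) in squeeze(output)

output = "the test : passed\nnumber : 4\n"
print(contains_ignoring_space_and_case(output, "thetest:passed"))     # True
print(contains_ignoring_space_and_case(output, "The Test : PASSED"))  # True
print(contains_ignoring_space_and_case(output, "thetest:failed"))     # False
```

Note that removing newlines joins adjacent lines, so a needle could in principle match across a line boundary; splitting `output` into lines first avoids that.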

Perl Xpath: search item before a date year

I have an xml database that contains films, for example:
<film id="5">
<title>The Avengers</title>
<date>2012-09-24</date>
<family>Comics</family>
</film>
From a Perl script I want to find films by date.
If I search for films from an exact year, for example:
my $query = "//collection/film[date = 2012]";
it works and returns all films from 2012, but if I search for all films before a given year, it doesn't work. For example:
my $query = "//collection/film[date < 2012]";
returns all films.
Well, as usual, there's more than one way to do it. Either you let the XPath tool know that it should compare dates (it doesn't know from the start) with something like this:
my $query = '//collection/film[xs:date(./date) < xs:date("2012-01-01")]';
... or you just bite the bullet and just compare the 'yyyy' substrings:
my $query = '//collection/film[substring(date, 1, 4) < "2012"]';
The former is better semantically, I suppose, but requires an advanced XML parser tool which supports XPath 2.0. And the latter was successfully verified with XML::XPath.
UPDATE: I'd like to give my explanation of why your first query works. See, you don't compare dates there; you compare numbers, but only because of the '=' operator. Quote from the doc:
When neither object to be compared is a node-set and the operator is =
or !=, then the objects are compared by converting them to a common
type as follows and then comparing them. If at least one object to be
compared is a boolean, then each object to be compared is converted to
a boolean as if by applying the boolean function. Otherwise, if at
least one object to be compared is a number, then each object to be
compared is converted to a number as if by applying the number
function.
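See? Your '2012-09-24' was converted to a number and became 2012, which, of course, is equal to 2012. (This looks like XML::XPath leaning on Perl's string-to-number conversion, which reads the leading digits; a strict XPath 1.0 engine would convert that string to NaN instead.)
This doesn't work with the other comparison operators, though: that's why you need to either use substring, or convert the date string to a number. I suppose the first approach would be more readable, and perhaps faster as well.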
Use this XPath, to check the year
//collection/film[substring-before(date, '-') < '2012']
Your Perl script will be,
my $query = "//collection/film[substring-before(date, '-') < '2012']";
OR
my $query = "//collection/film[substring-before(date, '-') = '2012']";
Simply use:
//collection/film[translate(date, '-', '') < 20120101]
This removes the dashes from the date then compares it for being less than 2012-01-01 (with the dashes removed).
In the same way you can get all films with dates prior a given date (not only year):
//collection/film[translate(date, '-', '') < translate($theDate, '-', '')]
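The reason the `translate` trick works is that a yyyy-mm-dd date with the dashes removed sorts the same way as the date itself. The sketch below shows that comparison in Python instead of XPath, using the standard-library `xml.etree.ElementTree` parser and filtering in code; the second film entry and the `films_before` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data: the first film is from the question, the second is invented
XML = """
<collection>
  <film id="5"><title>The Avengers</title><date>2012-09-24</date><family>Comics</family></film>
  <film id="6"><title>Sample Older Film</title><date>2008-05-02</date><family>Comics</family></film>
</collection>
"""
root = ET.fromstring(XML)

def films_before(root, cutoff):
    """Titles of films whose date is strictly before cutoff (yyyy-mm-dd)."""
    # Same idea as translate(date, '-', ''): strip the dashes, then compare
    key = int(cutoff.replace("-", ""))
    return [
        film.findtext("title")
        for film in root.iter("film")
        if int(film.findtext("date").replace("-", "")) < key
    ]

print(films_before(root, "2012-01-01"))  # ['Sample Older Film']
```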