Perl XPath: search for items dated before a given year

I have an XML database that contains films, for example:
<film id="5">
<title>The Avengers</title>
<date>2012-09-24</date>
<family>Comics</family>
</film>
From a Perl script I want to find films by date.
If I search for films from an exact year, for example:
my $query = "//collection/film[date = 2012]";
it works and returns all films from 2012, but if I search for all films before a year, it doesn't work, for example:
my $query = "//collection/film[date < 2012]";
it returns all films.

Well, as usual, there's more than one way to do it. Either you let the XPath tool know that it should compare dates (it doesn't know that from the start), with something like this:
my $query = '//collection/film[xs:date(./date) < xs:date("2012-01-01")]';
... or you just bite the bullet and compare the 'yyyy' substrings:
my $query = '//collection/film[substring(date, 1, 4) < "2012"]';
The former is semantically better, I suppose, but it requires an XPath 2.0 processor (xs:date is not available in XPath 1.0). The latter was successfully verified with XML::XPath.
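Here is a minimal Python sketch of the substring trick, using only the standard library's xml.etree.ElementTree. It is not XML::XPath. ElementTree's limited XPath has no `<` predicates, so the year filter runs in Python instead. The sample films (other than The Avengers) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the question's <film> elements.
XML = """<collection>
  <film id="4"><title>Older Film</title><date>2008-05-02</date></film>
  <film id="5"><title>The Avengers</title><date>2012-09-24</date></film>
  <film id="6"><title>Newer Film</title><date>2013-01-15</date></film>
</collection>"""


def films_before_year(xml_text, year):
    """Titles of films whose 'yyyy' prefix is below `year`.

    Mirrors //collection/film[substring(date, 1, 4) < "2012"]: XPath 1.0's
    `<` converts both operands to numbers, so the prefix is compared as an int.
    """
    root = ET.fromstring(xml_text)
    return [
        film.findtext("title")
        for film in root.findall("film")
        if int(film.findtext("date")[:4]) < year
    ]


print(films_before_year(XML, 2012))  # ['Older Film']
```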
UPDATE: I'd like to explain why your first query works. See, you don't compare dates there; you compare numbers, but only because of the '=' operator. Quote from the doc:
When neither object to be compared is a node-set and the operator is = or !=, then the objects are compared by converting them to a common type as follows and then comparing them. If at least one object to be compared is a boolean, then each object to be compared is converted to a boolean as if by applying the boolean function. Otherwise, if at least one object to be compared is a number, then each object to be compared is converted to a number as if by applying the number function.
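See? Your '2012-09-24' was converted to a number and became 2012, which, of course, is equal to 2012. (Strictly speaking, XPath 1.0's number() turns '2012-09-24' into NaN, so this result depends on how leniently your XPath implementation converts strings to numbers.)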
This doesn't work with the other comparison operators, though. That's why you need to either use substring or convert the date string to a number. I suppose the first approach is more readable, and perhaps faster as well.
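You can check the number-conversion point with a small Python sketch. `xpath_number` follows the XPath 1.0 rule for number() applied to a string. `lenient_number` is a hypothetical stand-in for an engine that reads the leading digits, roughly the way Perl numifies strings. Neither function is taken from XML::XPath's source.

```python
import math
import re

# XPath 1.0 Number: optional '-', digits with an optional fraction, surrounded
# only by whitespace. Any other string converts to NaN.
_XPATH_NUMBER = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")
# Hypothetical lenient engine: take the leading numeric prefix, roughly as Perl does.
_LEADING_NUMBER = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)")


def xpath_number(s):
    """Strict XPath 1.0 number() for a string argument."""
    return float(s) if _XPATH_NUMBER.fullmatch(s) else math.nan


def lenient_number(s):
    """Perl-style numification: the leading digits, or 0.0 if there are none."""
    m = _LEADING_NUMBER.match(s)
    return float(m.group(0)) if m else 0.0


date = "2012-09-24"
print(xpath_number(date) == 2012)     # False: number('2012-09-24') is NaN
print(lenient_number(date) == 2012)   # True: only '2012' is read
print(xpath_number(date[:4]) < 2012)  # False: '2012' is not before 2012
```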

Use this XPath to check the year:
//collection/film[substring-before(date, '-') < '2012']
Your Perl script will then be:
my $query = "//collection/film[substring-before(date, '-') < '2012']";
Or, to get the films from exactly that year:
my $query = "//collection/film[substring-before(date, '-') = '2012']";

Simply use:
//collection/film[translate(date, '-', '') < 20120101]
This removes the dashes from the date, then checks whether it is less than 2012-01-01 (with the dashes removed).
In the same way, you can get all films with dates prior to a given date (not only a year):
//collection/film[translate(date, '-', '') < translate($theDate, '-', '')]
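Here is a minimal Python sketch of the translate() idea. It assumes every date is a zero-padded yyyy-mm-dd string; without the padding, the numeric order of the dash-free digits no longer matches chronological order. `date_key` and `dates_before` are illustrative names, not part of any library.

```python
def date_key(iso_date):
    """Like translate(date, '-', ''): drop the dashes and read a number.

    Only valid for zero-padded yyyy-mm-dd strings, where numeric order
    matches chronological order.
    """
    return int(iso_date.replace("-", ""))


def dates_before(dates, the_date):
    """Keep the dates strictly earlier than the_date, mirroring
    //collection/film[translate(date, '-', '') < translate($theDate, '-', '')]."""
    limit = date_key(the_date)
    return [d for d in dates if date_key(d) < limit]


# Hypothetical dates for illustration.
dates = ["2011-12-31", "2012-01-01", "2012-09-24"]
print(dates_before(dates, "2012-01-01"))  # ['2011-12-31']
print(dates_before(dates, "2012-09-24"))  # ['2011-12-31', '2012-01-01']
```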

Related

SPARQL: how to make DAY/MONTH always return 2 digits

I have a datetime in my SPARQL query that I want to transform to a date.
To do so, I use:
BIND(CONCAT(YEAR(?dateTime), "-",MONTH(?dateTime), "-", DAY(?dateTime)) as ?date)
This works, but it returns, for example, 2022-2-3, and I want 2022-02-03. If the dateTime is 2022-11-23, nothing should change.
You can take the integers you get back from the YEAR, MONTH, and DAY functions and pad them with the appropriate number of zeros (after turning them into strings):
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
BIND(2022 AS ?yearInt) # this would come from your YEAR(?dateTime) call
BIND(2 AS ?monthInt) # this would come from your MONTH(?dateTime) call
BIND(13 AS ?dayInt) # this would come from your DAY(?dateTime) call
# convert to strings
BIND(STR(?yearInt) AS ?year)
BIND(STR(?monthInt) AS ?month)
BIND(STR(?dayInt) AS ?day)
# pad with zeros
BIND(CONCAT("00", ?year) AS ?paddedYear)
BIND(CONCAT("0000", ?month) AS ?paddedMonth)
BIND(CONCAT("00", ?day) AS ?paddedDay)
# extract the right number of digits from the padded strings
BIND(SUBSTR(?paddedYear, STRLEN(?paddedYear)-3) AS ?fourDigitYear)
BIND(SUBSTR(?paddedDay, STRLEN(?paddedDay)-1) AS ?twoDigitDay)
BIND(SUBSTR(?paddedMonth, STRLEN(?paddedMonth)-1) AS ?twoDigitMonth)
# put it all back together
BIND(CONCAT(?fourDigitYear, "-", ?twoDigitMonth, "-", ?twoDigitDay) as ?date)
}
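To check the pad-then-keep-the-rightmost-digits logic, here is a Python sketch that mirrors the query's CONCAT and 1-based SUBSTR calls step by step. `sparql_substr`, `pad_digits` and `to_date` are hypothetical helpers for illustration.

```python
def sparql_substr(s, start):
    """SUBSTR(s, start) without a length: 1-based, keeps positions >= start."""
    return s[max(start, 1) - 1:]


def pad_digits(n, width, zeros):
    """CONCAT(zeros, STR(n)), then keep the last `width` characters
    via SUBSTR(padded, STRLEN(padded) - (width - 1))."""
    padded = zeros + str(n)
    return sparql_substr(padded, len(padded) - (width - 1))


def to_date(year, month, day):
    # Same padding strings as the query above.
    return "-".join([
        pad_digits(year, 4, "00"),
        pad_digits(month, 2, "0000"),
        pad_digits(day, 2, "00"),
    ])


print(to_date(2022, 2, 13))   # 2022-02-13
print(to_date(2022, 11, 23))  # 2022-11-23
```

The query pads the year with only two zeros, which gives four digits for any year from 10 onward. In plain Python you would simply write f"{year:04d}-{month:02d}-{day:02d}".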
The answer by gregory-williams is portable. An alternative is the fn:format-... functions from F&O (XPath and XQuery Functions and Operators 3.1).
I'm not sure how well various triplestores cover them. Apache Jena provides fn:format-number, which is what this question needs, but not fn:format-dateTime etc.
See
https://www.w3.org/TR/xpath-functions-3/#formatting-the-number
https://www.w3.org/TR/xpath-functions-3/#formatting-dates-and-times
For example:
fn:format-number(1,"000") returns the string "001".
Apache Jena also has a local extension afn:sprintf using the C or Java syntax of sprintf:
afn:sprintf("%03d", 1) returns "001".

Trouble importing CSV data into MATLAB

I am trying to read in a CSV file that contains daily EUR/USD exchange rates, including dates specifying year, month and day. The problem is that readtable(filename) puts single quotes around every table entry, which prevents me from using the data at all.
Detect import options:
opts = detectImportOptions('EUR_USD Historische Data.csv');
Read in the data:
EUR_USD = readtable('EUR_USD Historische Data.csv');
Extract the dates and transform them to a datetime variable:
dt = EUR_USD(:,1);
dates = datetime(dt,'InputFormat','yyyyMMdd');
% Does not work because of single quotes
I was able to extract the closing prices and make them workable, but I am not sure this is an elegant way of doing it:
closing_prices = str2double(table2array(EUR_USD(:,5)));
Ultimately, the goal is to make the data workable. I need to compare two columns of datetime variables and, if the dates do not match between the two columns, remove that entry so that in the end both columns match.
This is the vector with dates:
[screenshot: dates vector, wrong]
I need it to look like this:
[screenshot: dates vector, correct]
I think all you need to do is remove the ' characters so that datetime can read the data correctly. Look at the following example:
%stringz is the same as dt here: just the string data
T = table;
T.stringz = string(['''string1'''; '''string2'''; '''string3''']);
stringz = T.stringz;
%Preallocate the output, then run the for loop to remove the ' chars
strmat = strings(numel(stringz), 1);
for i = 1:numel(stringz)
strval = char(stringz(i,1));
strval = strval(2:end-1); % drop the leading and trailing '
strmat(i,1) = string(strval);
end
%Then load the data into datetime after this for loop
%(with real yyyyMMdd strings; the 'stringN' placeholders above won't parse)
dates = datetime(strmat,'InputFormat','yyyyMMdd');
strmat is a 3x1 string array with no ' characters around the strings.
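Here is the same strip-the-quotes-then-parse idea, sketched in Python with the standard csv and datetime modules. The quoted sample rows are made up. The two-column layout (date, closing price) is an assumption, not the real file's layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical export in which every field is wrapped in single quotes.
SAMPLE = "'20190102','1.1345'\n'20190103','1.1390'\n"


def unquote(field):
    """Drop one surrounding pair of single quotes, like strval(2:end-1)."""
    if len(field) >= 2 and field[0] == field[-1] == "'":
        return field[1:-1]
    return field


def parse_rows(text):
    """Return (date, closing price) tuples from the quoted CSV text."""
    rows = []
    for date_field, close_field in csv.reader(io.StringIO(text)):
        day = datetime.strptime(unquote(date_field), "%Y%m%d").date()
        rows.append((day, float(unquote(close_field))))
    return rows


print(parse_rows(SAMPLE))
```

Passing quotechar="'" to csv.reader would strip the quotes during parsing instead. The explicit unquote helper just mirrors the MATLAB loop.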

Function to split a string in MATLAB and return the second number

I have a string and I need two characters from it to be returned.
I tried strsplit, but the delimiter must be a string and my string doesn't contain any delimiters. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg. I use the fileparts function to remove the image extension (jpg), so I get this string: 001a02
The expected return value is 02.
Another example: 001A43a. Return value: 43
Another one: 002A12. Return value: 12
All the filenames are in a 1002x1 matrix. Maybe I can use textscan, but in the second example it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach.)
One way to go about this is splitting with regular expressions (with MATLAB's strsplit, which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index that position.
In the examples you gave, the number is always the 5th and 6th characters of the string.
filename = '002A12';
num = str2double(filename(5:6));
Otherwise, if the formatting is more complex, you may want to use a regular expression. There is a similar question, "matlab - extracting numbers from (odd) string". Modifying the code found there, you can do the following:
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2double(all_num{2}) %Convert the second number from string
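For comparison, here are the regex and fixed-position approaches in Python. `second_number` and `fixed_position_number` are illustrative names. The `+` in the split pattern stands in for strsplit's default CollapseDelimiters behavior.

```python
import re


def second_number(name):
    r"""Second run of digits as an int, like regexp(name, '\d+', 'match') then {2}."""
    runs = re.findall(r"\d+", name)
    return int(runs[1])


def fixed_position_number(name):
    """MATLAB's 1-based name(5:6) is Python's 0-based name[4:6]."""
    return int(name[4:6])


# Splitting on runs of letters and dots, like strsplit with a regex delimiter.
print(re.split(r"[a-zA-Z.]+", "001a02.jpg"))  # ['001', '02', '']

for name in ["001a02", "001A43a", "002A12"]:
    print(name, second_number(name), fixed_position_number(name))
```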

Ignore spaces and case in MATLAB

diary_file = tempname();
diary(diary_file);
myFun();
diary('off');
output = fileread(diary_file);
I would like to search for a string in output, but ignore spaces and upper/lower case. Here is an example of what's in output:
the test : passed
number : 4
found = 'thetest:passed'
a = strfind(output,found )
How can I ignore spaces and case in output?
Assuming you are not too worried about accidentally matching something like 'thetEst:passed', here is what you can do.
Remove all spaces and compare only lower case:
found = 'With spaces'
found = lower(found(found ~= ' '))
This will return
found =
withspaces
Of course you would also need to do this with each line of output.
Another way:
regexpi(output(~isspace(output)), found, 'match')
if output is a single string, or
regexpi(regexprep(output,'\s',''), found, 'match')
for the more general case (either class(output) == 'cell' or 'char').
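Here is the remove-all-whitespace-then-match-case-insensitively idea, sketched in Python with the re module. One assumption differs from regexpi: `found` is escaped with re.escape, so it is matched literally rather than as a regular expression. `find_ignoring_space_and_case` is a hypothetical helper.

```python
import re


def find_ignoring_space_and_case(output, found):
    """Strip all whitespace from output, then search case-insensitively.

    The returned matches keep the original case from output, as regexpi does.
    """
    compact = re.sub(r"\s", "", output)
    return re.findall(re.escape(found), compact, flags=re.IGNORECASE)


OUTPUT = "The Test : Passed\nnumber : 4\n"
print(find_ignoring_space_and_case(OUTPUT, "thetest:passed"))  # ['TheTest:Passed']
```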
Advantages:
Fast
Robust (ALL whitespace, not just spaces, is removed)
More flexible (you can return starting/ending indices of the match, tokenize, etc.)
Will return the original case of the match in output
Disadvantages:
More typing
Less obvious (more documentation required)
Will return the original case of the match in output (yes, there are two sides to that coin)
That last point in both lists is easily handled by forcing the match to lower or upper case with lower() or upper(). If you want the match in the same case as found, though, it's a bit more involved:
C = regexpi(output(~isspace(output)), found, 'match');
if ~isempty(C)
    C = found;
end
for a single string, or
C = regexpi(regexprep(output, '\s', ''), found, 'match')
C(~cellfun('isempty', C)) = {found}
for the more general case.
You can use lower to convert everything to lowercase to solve your case problem. However ignoring whitespace like you want is a little trickier. It looks like you want to keep some spaces but not all, in which case you should split the string by whitespace and compare substrings piecemeal.
I'd recommend using a regex, e.g. like this:
a = regexpi(output, 'the\s*test\s*:\s*passed');
If you don't care where the match occurs, only whether there's a match at all, removing all spaces is a brute-force, and somewhat nasty, possibility:
a = strfind(strrep(output, ' ', ''), found);
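A variation on the 'the\s*test\s*:\s*passed' regex is to build the whitespace-tolerant pattern from `found` automatically, so match positions refer to the unmodified output. This is a Python sketch; `whitespace_tolerant_pattern` is a hypothetical helper.

```python
import re


def whitespace_tolerant_pattern(found):
    """Compile found so that any amount of whitespace may sit between its
    characters, ignoring case."""
    chars = [re.escape(c) for c in found if not c.isspace()]
    return re.compile(r"\s*".join(chars), re.IGNORECASE)


pattern = whitespace_tolerant_pattern("thetest:passed")
output = "the test : passed\nnumber : 4\n"
m = pattern.search(output)
print(m.start(), m.group(0))  # 0 the test : passed
```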

Net::LDAP filter with OR

In Perl, with Net::LDAP, I'm trying to use:
my $psearch = $ldap->search(
base => $my_base,
attr => ["mail","employeeNumber","physicalDeliveryOfficeName"],
filter => "(&(mail=*)(!(employeeNumber=9*)) (&(physicalDeliveryOfficeName=100)) (|(physicalDeliveryOfficeName=274)))");
Saying "give me everyone with a mail entry, where employee number does not begin with 9 and the physicalDeliveryOfficeName is either 100 or 274".
I can get it to work using just 100 or using just 274 but I can't seem to figure out how to specify 100 OR 274.
I can't seem to find the correct filter string, and I'm ready to pull my hair out... please help!!
I can't test this, but LDAP queries use prefix notation, while we're used to infix notation. Imagine you want something that's either a dog or a cat. In infix notation, it would look something like this:
((ANIMAL = "cat") OR (ANIMAL = "dog"))
With prefix notation, the boolean operator goes at the beginning of the query:
(OR (ANIMAL = "cat") (ANIMAL = "dog"))
The advantage of prefix notation shows when you do more than two checks per boolean operator. Here I'm looking for something that's either a cat, a dog or a wombat:
(OR (ANIMAL = "cat") (ANIMAL = "dog") (ANIMAL = "wombat"))
Notice that I only needed a single boolean operator in the front of my statement. This will OR together all three statements. With our standard infix notation, I would have to have a second OR operator:
((ANIMAL = "cat") OR (ANIMAL = "dog") OR (ANIMAL = "wombat"))
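To show that a single prefix operator can combine any number of operands, here is a small Python evaluator for prefix-notation trees. The tuple representation (operator first, leaves written as ("=", attribute, value)) is an assumption made for this sketch, not LDAP syntax.

```python
def evaluate(expr, record):
    """Evaluate a prefix-notation boolean tree against a dict of attributes.

    expr is ("OR", e1, e2, ...), ("AND", e1, ...), ("NOT", e) or a leaf
    ("=", attribute, value) that tests for equality.
    """
    op = expr[0]
    if op == "OR":
        return any(evaluate(e, record) for e in expr[1:])
    if op == "AND":
        return all(evaluate(e, record) for e in expr[1:])
    if op == "NOT":
        return not evaluate(expr[1], record)
    if op == "=":
        _, attr, value = expr
        return record.get(attr) == value
    raise ValueError(f"unknown operator: {op!r}")


# One OR in front covers all three alternatives.
query = ("OR",
         ("=", "ANIMAL", "cat"),
         ("=", "ANIMAL", "dog"),
         ("=", "ANIMAL", "wombat"))

print(evaluate(query, {"ANIMAL": "wombat"}))  # True
print(evaluate(query, {"ANIMAL": "emu"}))     # False
```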
Prefix notation was created by a Polish mathematician named Jan Lukasiewicz back in 1924 at Warsaw University, and thus became known as Polish Notation. Later on, it was discovered that computers could work an equation from front to back if the equation was written in postfix notation, which is the reverse of Polish Notation. Thus, Reverse Polish Notation (or RPN) was born.
Early HP calculators used RPN, which became the geek-chic thing back in the early 1970s. Imagine the sense of brain superiority you get when you hand your calculator to someone and they have no earthly idea how to use it. The only way to be cooler back then was to have a Curta.
Okay, enough walking down nostalgia lane. Let's get back to the problem...
The easiest way to construct a prefix expression is to build a tree diagram of what you want. So sketch out your LDAP query as a tree:
AND
 +-- OR
 |    +-- phyDelOfficeName=100
 |    +-- phyDelOfficeName=274
 +-- employee!=9*
 +-- mail=*
To build a query based upon this tree, start with the bottom of the tree, and work your way up each layer. The bottom part of our tree is the OR part of our query:
(OR (physicalDeliveryOfficeName = 100) (physicalDeliveryOfficeName = 274))
Using LDAP's OR operator, the pipe (|), and removing the extra spaces, we get:
(|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274))
When I build an LDAP query, I like to save each section as a Perl scalar variable. It makes it a bit easier to use:
my $or_part = "|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274)";
Notice I've left off the outer pair of parentheses. They come back when you string all the parts together. Some people include them in each part anyway. If you do, don't wrap that part in parentheses a second time: RFC 4515's filter grammar doesn't allow doubled parentheses such as ((mail=*)).
Now for the other two parts of the query:
my $mailAddrExists = "mail=*";
my $not_emp_starts_9 = "!(employeeNumber=9*)";
And, now we AND all three sections together:
"(&($mailAddrExists)($not_emp_starts_9)($or_part))"
Note that a single ampersand weaves it all together. I can substitute back each section to see the full query:
(&(mail=*)(!(employeeNumber=9*))(|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274)))
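You can also do the bottom-up construction mechanically. This Python sketch renders a prefix tree into LDAP filter syntax, using the question's employeeNumber attribute. The tuple layout is a made-up representation. Values are inserted verbatim with no RFC 4515 escaping, which is what lets `*` act as a wildcard here.

```python
# Map the tree's operator names to LDAP's prefix symbols.
LDAP_OPS = {"AND": "&", "OR": "|", "NOT": "!"}


def to_ldap_filter(expr):
    """Render a prefix tree as an LDAP filter string, children first.

    Leaves are ("=", attribute, value); values are copied verbatim, so no
    escaping of special characters is done.
    """
    op = expr[0]
    if op == "=":
        _, attr, value = expr
        return f"({attr}={value})"
    children = "".join(to_ldap_filter(e) for e in expr[1:])
    return f"({LDAP_OPS[op]}{children})"


tree = ("AND",
        ("=", "mail", "*"),
        ("NOT", ("=", "employeeNumber", "9*")),
        ("OR",
         ("=", "physicalDeliveryOfficeName", "100"),
         ("=", "physicalDeliveryOfficeName", "274")))

print(to_ldap_filter(tree))
```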
Or like this:
my $psearch = $ldap->search(
base => $my_base,
attr => ["mail","employeeNumber","physicalDeliveryOfficeName"],
filter => "(&(mail=*)(!(employeeNumber=9*))(|(physicalDeliveryOfficeName=100)(physicalDeliveryOfficeName=274)))",
);
Or piecemeal:
my $mail = "mail=*";
my $employee = "!(employeeNumber=9*)";
my $physicalAddress = "|(physicalDeliveryOfficeName=100)"
. "(physicalDeliveryOfficeName=274)";
my $psearch = $ldap->search(
base => $my_base,
attr => ["mail","employeeNumber","physicalDeliveryOfficeName"],
filter => "(&($mail)($employee)($physicalAddress))",
);
As I said before, I can't test this. I hope it works. If nothing else, I hope you understand how to create an LDAP query and can figure out how to do it yourself.