I'm trying to extract all name values in input fields using selenium and perl. Part of the value, enough to identify it, is known, the rest is unknown:
This xpath works in finding all relevant matches:
//tr/td//input[contains(@name,'partofname')]
So, in perl:
my $xpath = qq(//tr/td//input[contains(\@name,'partofname')]);
my $count = $sel->get_xpath_count($xpath);
Fine, $count gives a suitable count of matches.
However, how do I extract the value of the @name attribute for each individual match?
I understand the principle is to construct a loop:
foreach my $row (1 .. $count) {
#extract here
};
However, I can't seem to construct an xpath expression which will work to find each $row that the expression matched. So I think it's the correct xpath expression to get each individual match that I need help with.
Any pointers appreciated
Try //tr/td/descendant::input[contains(@name,'partofname')][1]
Replace 1 with your counter. If that doesn't work, could you add some HTML to your question so I can perhaps suggest a better XPath?
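As a rough sketch of the loop, assuming the classic Selenium RC attribute locator form (an element locator followed by @ and the attribute name): the helper name_locators below is hypothetical, not part of any Selenium module, and only builds the locator strings. Note that it wraps the whole expression in parentheses before indexing, because a trailing [N] on a path step counts position within each parent element rather than within the full set of matches.

```perl
use strict;
use warnings;

# Hypothetical helper: build one attribute locator per match.
sub name_locators {
    my ($xpath, $count) = @_;
    # (expr)[N] indexes the full result set; "\@name" appends the attribute part.
    return map { "xpath=($xpath)[$_]\@name" } 1 .. $count;
}

# q() does not interpolate, so @name needs no backslash here.
my $xpath    = q(//tr/td//input[contains(@name,'partofname')]);
my @locators = name_locators($xpath, 3);   # 3 stands in for $count
print "$_\n" for @locators;

# With a live session in $sel (assumed, so left commented out):
# my @names = map { $sel->get_attribute($_) } @locators;
```

Each generated string could then be handed to get_attribute inside the foreach loop from the question.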
How do I check if an array contains a time value? I've tried checking like this:
if ( @time =~ /$_:$_:$_/)
But it didn't work. Any ideas?
P.S.: The time is given like this: HH:MM:SS
Matching the time
To check for HH:MM:SS with a regular expression match, the simplest pattern would be
/\d\d:\d\d:\d\d/
If the string must contain nothing but the time, add anchors for the start (^) and end ($) of the string.
/^\d\d:\d\d:\d\d$/
If you want to make sure that your digits are only 0 to 9 and not digits from any other script, use a character class.
/^[0-9]{2}:[0-9]{2}:[0-9]{2}$/
If you also want to make sure the time is a valid time, things get more complicated.
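For illustration, here is one way the stricter check might look, assuming a 24-hour clock with hours 00 to 23 and no leap seconds. The helper name is_valid_time is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $str is a valid 24-hour HH:MM:SS time.
sub is_valid_time {
    my ($str) = @_;
    # Hours 00-23, minutes and seconds 00-59; \A and \z anchor the whole string.
    return $str =~ /\A(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\z/ ? 1 : 0;
}

for my $candidate ('23:59:59', '24:00:00', '12:60:00') {
    printf "%s => %s\n", $candidate, is_valid_time($candidate) ? 'valid' : 'invalid';
}
```

Using \z rather than $ also rejects a string with a trailing newline, which may or may not be what you want.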
You might want to read perlre and perlretut. The tag wiki on Regular Expressions here on Stack Overflow has a lot of useful information and links to tools as well.
On arrays and scalars
However, there is no array in the code you've shown. In Perl, a variable with a $ as its sigil is called a scalar and represents a single value. That's the only thing you can pattern match against. An array would start with an @ symbol.
What you can do is match against every element in your array. For that, you have to iterate the array.
A very verbose way to do that would be:
my $matches = 0;
foreach my $time (@times) {
++$matches if $time =~ m/\d\d:\d\d:\d\d/;
}
A more Perlish way would be to use grep.
my $matches = grep { m/\d\d:\d\d:\d\d/ } @times;
This makes use of the fact that the list returned by grep will be converted to its number of elements in scalar context. If all you want is to know whether any of the elements matched, this is enough.
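A small sketch of the two contexts, using made-up sample data in @times:

```perl
use strict;
use warnings;

my @times = ('12:34:56', 'not a time', '07:00:00');   # sample data, assumed

my $count   = grep { /\A\d\d:\d\d:\d\d\z/ } @times;   # scalar context: number of matches
my @matched = grep { /\A\d\d:\d\d:\d\d\z/ } @times;   # list context: the matching elements

print "count: $count\n";
print "matched: @matched\n";
print "at least one time found\n" if $count;
```

The same grep expression gives either the count or the elements, depending only on what it is assigned to.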
What your code did
The $_ variable is called the topic in Perl, and often contains some kind of default value for certain operators, if no other value is specified. Depending on where in your program you used your line of code, you are matching the number of elements in @time (because of scalar context, see above) against a pattern built up of the content of $_ and colons.
if (
@time # number of elements in array @time
=~ # because this operator forces scalar context
/
$_ # value of $_ based on surrounding code, or undef
: # a literal colon
$_ # see above
: # a literal colon
$_ # see above
/x # ( I added /x to allow comments so this compiles)
) { ... }
I am still learning Perl and have almost got a program written. My question, as simple as it may be, is: if I want to hardcode a string into a field, would the line below do that? Thank you :).
$out[45]="VUS";
In the other lines I use the code below to set the values that go into the @out array, but the one in question is hardcoded while the others come from a split.
my @vals = split/\t/; # this splits the line at tabs
my @mutations=split/,/,$vals[9]; # splits on comma to create an array of mutations
my ($gene,$transcript,$exon,$coding,$aa);
for (@mutations)
{
($gene,$transcript,$exon,$coding,$aa) = split/\:/; # this takes col AB and splits it at colons
grep {$transcript eq $_} keys %nms or next;
}
my @out=($.,@colsleft,$_,@colsright);
$out[2]=$gene;
$out[3]=$nms{$transcript};
$out[4]=$transcript;
$out[15]=$coding;
$out[17]=$aa;
Your line of code, $out[45]="VUS";, is correct: it sets the 46th element of the array @out to the string "VUS". I am trying to understand from your code, however, why you would want to do that. Usually, it is better practice not to hardcode if at all possible. Aim to make your program as dynamic as possible.
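One thing worth knowing about assigning to a fixed index: if the array is shorter than that index, Perl grows it and leaves the skipped slots undefined. A minimal sketch with made-up contents for @out:

```perl
use strict;
use warnings;

my @out = ('line', 'gene', 'transcript');   # sample contents, assumed
$out[45] = "VUS";                           # assigning past the end grows the array

my $size    = scalar @out;                  # 46: indices 0 through 45
my $gap_def = defined $out[10] ? 1 : 0;     # 0: skipped slots are undef

print "size=$size, out[45]=$out[45], out[10] defined? $gap_def\n";
```

So if @out has fewer than 45 elements when the assignment runs, printing the row later may produce "uninitialized value" warnings for the gaps.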
I have a file that I am reading in. I'm using Perl to reformat the data. It is a comma-separated file. In one of the files, I know that element.0 is a zipcode and element.1 is a counter. Each row can have 1-n cities. I need to know the number of elements from element.3 to the end of the line so that I can reformat them properly. I wanted to use a foreach loop starting at element.3 to format the other elements into a single string.
Any help would be appreciated. Basically I am trying to read in a csv file and create a cpp file that can then be compiled on another platform as a plug-in for that platform.
Best Regards
Michael Gould
You can do something like this to get the fields from a line:
my @fields = split /,/, $line;
To access all elements from 3 to the end, do this:
foreach my $city (@fields[3..$#fields])
{
#do stuff
}
(Note, based on your question I assume you are using zero-based indexing. Thus "element 3" is the 4th element).
Alternatively, consider Text::CSV to read your CSV file, especially if you have things like escaped delimiters.
Well if your line is being read into an array, you can get the number of elements in the array by evaluating it in scalar context, for example
my $elems = @line;
or to be really sure
my $elems = scalar(@line);
Although in that case the scalar is redundant, it's handy for forcing scalar context where it would otherwise be list context. You can also find the index of the last element of the array with $#line.
After that, if you want to get everything from element 3 onwards you can use an array slice:
my @threeonwards = @line[3 .. $#line];
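Putting the pieces together, a minimal sketch: the sample row and the ', ' separator are assumptions, and plain split is only safe when no field contains a quoted or escaped comma.

```perl
use strict;
use warnings;

# Sample row, assumed for illustration: zipcode, counter, one more field, then cities.
my $line = "90210,2,CA,Beverly Hills,Los Angeles\n";
chomp $line;

my @fields = split /,/, $line;

# Slice from index 3 to the last index; empty if the row has fewer than 4 fields.
my @cities   = @fields[3 .. $#fields];
my $n_cities = @cities;              # scalar context gives the element count
my $joined   = join ', ', @cities;   # one string for the reformatted output

print "$n_cities cities: $joined\n";
```

With join there is no need for an explicit foreach unless each city needs its own formatting first.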
I'm trying to find a way in Zend_Search_Lucene to pull off the following scenario:
Let's say we have a user and her name is Aïcha (note the special character). If I'm searching the index for Aicha (without the special derivative of i), I'd like for Aïcha to be returned in the results.
Is there something special I need to do when indexing or searching in order to make this work? I've read solutions about normalizing the data before indexing, replacing all special characters with normalized characters, but I'd rather not go that route.
Thanks in advance,
Gary
function normalize ($string){
$a = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûýýþÿŔŕ';
$b = 'aaaaaaaceeeeiiiidnoooooouuuuybsaaaaaaaceeeeiiiidnoooooouuuyybyRr';
$string = utf8_decode($string);
$string = strtr($string, utf8_decode($a), $b);
$string = strtolower($string);
return utf8_encode($string);
}
$passToIndexer = normalize(" Aïcha ");
Try using this function's output when creating the index, and store the actual value without indexing it. Hope it helps; frankly, I don't think there is any other way.
I've spent way too long trying to figure this out. I'm using XML::RSS and Perl to read and parse an eBay RSS feed. Within the <item></item> area, I see these entries:
<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>
However, I can't figure out how to grab the details during the loop. I wrote a regex to grab them:
@current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;
Which works if you place the above 'CurrentPrice' entry into a standalone string, but not while the script is reading through the RSS feed.
I can grab most of the information I want out of the item->description area (# bids, auction end time, BIN price, thumbnail image, etc.), but it would be nicer if I could grab the info from the feed without me having to deal with grabbing all that information manually.
How to grab custom fields from an RSS feed (short of writing regexes to parse the entire feed w/o a module)?
Here's the code I'm working with:
$my_limit = 0;
use LWP::Simple;
use XML::RSS;
$rss = XML::RSS->new();
$data = get( $mylink );
$rss->parse( $data );
$channel = $rss->{channel};
$NumItems = 0;
foreach $item (@{$rss->{'items'}}) {
if($NumItems > $my_limit){
last;
}
@current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;
print "$current_price[0]";
}
If you have the rss/xml document and want specific data you could use XPATH:
Perl CPAN XPATH
XPath Introduction
What is the way in which "it doesn't work" from an RSS feed? Do you mean no matches when there should be matches? Or one match where there should be several matches?
One thing that jumps out at me about your regular expression is that you use .*, which can sometimes be greedier than you want. That is, if $item contained the expression
<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>
<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:SomeMoreStuff xmlns:rx="urn:...nts">zzz</rx:SomeMoreStuff>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>
then the first part of your regular expression (\<rx\:CurrentPrice.*\>) will wind up matching everything on lines 2, 3, and 4, plus the first part of line 5 (up to the >). Instead, you might want to use the regular expression (see note 1)
m/\<rx:CurrentPrice[^>]*>(\d+)\<\/rx:CurrentPrice\>/
which will only match up to the closing </rx:CurrentPrice> tag after a single instance of an opening <rx:CurrentPrice> tag.
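To see the difference, here is a small sketch that runs both patterns against a made-up single-line string holding two CurrentPrice elements; the prices are sample values.

```perl
use strict;
use warnings;

# Sample data, assumed: three elements on one line, two of them CurrentPrice.
my $ns   = 'xmlns:rx="urn:ebay:apis:eBLBaseComponents"';
my $item = qq(<rx:CurrentPrice $ns>1255</rx:CurrentPrice>)
         . qq(<rx:BuyItNowPrice $ns>1395</rx:BuyItNowPrice>)
         . qq(<rx:CurrentPrice $ns>1300</rx:CurrentPrice>);

# Greedy .* runs to the last possible ">" before a price, swallowing the first element.
my @greedy  = $item =~ m{<rx:CurrentPrice.*>(\d+)</rx:CurrentPrice>}g;

# [^>]* cannot cross the end of the opening tag, so each element matches on its own.
my @bounded = $item =~ m{<rx:CurrentPrice[^>]*>(\d+)</rx:CurrentPrice>}g;

print "greedy:  @greedy\n";
print "bounded: @bounded\n";
```

The greedy pattern reports only the last price, while the bounded one reports both.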
1 The other obvious answer is that you really don't want to use a regular expression at all, that regular expressions are inferior tools for parsing XML compared to customized parsing modules, and that all the special cases you will have to deal with using regular expressions will eventually render you unconscious from having repeatedly beaten your head against your desk. See Salgar's answer, for example.