Quick Perl question with hopefully a simple answer. I'm trying to split a string containing non-breaking spaces (&nbsp;). This is after reading in an HTML page using HTML::TreeBuilder::XPath and retrieving the string with $titleString = $tree->findvalue('/html/head/title'):
use HTML::TreeBuilder::XPath;
$tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file( "filename" );
$titleString = $tree->findvalue('/html/head/title');
print "$titleString\n";
Pasted below is the original string and below that the string that gets printed:
Mr&nbsp;Dan Perkins&nbsp;(Active)
Mr?Dan Perkins?(Active)
I've tried splitting $titleString with @parts = split('\?',$titleString); and also with the original &nbsp;, though neither has worked. My hunch is that there's a simple piece of encoding code to be added somewhere?
HTML code:
<html>
<head>
<title>Dan Perkins (Active)</title>
</head>
</html>
You shouldn't have to know how the text in the document is encoded. Accordingly, findvalue returns an actual non-breaking space (U+00A0) when the document contains &nbsp;, so you'd use
split(/\xA0/, $title_string)
-or-
split(/\x{00A0}/, $title_string)
-or-
split(/\N{U+00A0}/, $title_string)
-or-
split(/\N{NBSP}/, $title_string)
-or-
split(/\N{NO-BREAK SPACE}/, $title_string)
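A minimal sketch of the split, using a hard-coded stand-in for the value findvalue returns (the sample string is an assumption based on the question). The ? in the printed output probably comes from printing the character without an output encoding, so the sketch also sets one on STDOUT, assuming a UTF-8 terminal.

```perl
use strict;
use warnings;

# Assumes a UTF-8 terminal; encode characters on the way out.
binmode(STDOUT, ':encoding(UTF-8)');

# Stand-in for what findvalue() returns: a character string with
# U+00A0 wherever the HTML source had &nbsp;.
my $title_string = "Mr\x{A0}Dan Perkins\x{A0}(Active)";

# Split on the non-breaking space itself, not on "?" or "&nbsp;".
my @parts = split /\x{A0}/, $title_string;

print join('|', @parts), "\n";   # Mr|Dan Perkins|(Active)
```

Any of the other spellings listed above can be substituted for \x{A0} in the pattern.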
Related
$encoded = encode_entities($input, '<>&"');
This encodes <, >, & and ". But how can I exclude these characters from the encoding instead?
There is an example in the documentation:
$encoded = encode_entities($input, '^\n\x20-\x25\x27-\x7e');
When using multibyte UTF-8 characters in a NOTE node, characters are garbled/lost around the newline.
For example:
$vcard = $address_book->add_vcard();
$vcard->version('3.0');
$vcard->FN('Tèśt Ûšér');
$vcard->NOTE('①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳');
say $vcard->as_string();
Produces:
BEGIN:VCARD
VERSION:3.0
FN:Tèśt Ûšér
NOTE:①②③④⑤⑥⑦⑧⑨⑩⑪��
�⑬⑭⑮⑯⑰⑱⑲⑳①②③④
⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯�
��⑱⑲⑳①②③④⑤⑥⑦⑧��
�⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①
②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬�
��⑮⑯⑰⑱⑲⑳
END:VCARD
How would I go about fixing this? I also posted this as an issue on the text-vcard project page. I think this is related to how the newlines are inserted (by inserting the raw bytes \x0D\x0A), but I'm not sure.
It looks like the culprit is Text::vCard::Node->_wrap_utf8(). I was able to at least get it to stop cutting up characters by bypassing that method altogether.
sub _wrap_utf8 {
my ( $self, $key, $value, $max, $newline ) = @_;
#bypass wrapping
return $key . $value;
…
}
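Bypassing the wrap leaves long lines unfolded. As a rough sketch of an alternative, the line can be folded while it is still a character string and encoded to UTF-8 only afterwards, so a line break can never land inside a multibyte character. The helper fold_by_chars and the width of 20 are hypothetical and not part of Text::vCard; the sketch counts characters, not bytes, and does not model whatever line-length limit the vCard format recommends.

```perl
use strict;
use warnings;
use utf8;                      # the source literals below are UTF-8
use Encode qw(encode);

# Hypothetical helper, not part of Text::vCard: cut a character string
# into chunks of at most $max characters and join them with CRLF plus
# a single space, the continuation marker for a folded line.
sub fold_by_chars {
    my ($line, $max) = @_;
    my @chunks = $line =~ /(.{1,$max})/gs;
    return join "\r\n ", @chunks;
}

my $note   = 'NOTE:' . ('①②③④⑤⑥⑦⑧⑨⑩' x 4);
my $folded = fold_by_chars($note, 20);

# Encode after folding, so every physical line is valid UTF-8.
my $bytes = encode('UTF-8', $folded);
print $bytes, "\r\n";
```

The design point is the order of operations: fold characters first, then encode, instead of inserting \x0D\x0A into already encoded bytes.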
I needed to write a custom module in Drupal to help out with my location search. Initially I simply needed to remove a comma from queries, and then I realized that I would need to replace all instances of states with their abbreviation (California -> CA) because of how information is stored in my database. However, upon doing this I found out that my use of preg_replace is case-sensitive. So in this line:
$form_state['values'] = preg_replace("/alabama/", 'al', $form_state['values']);
"alabama" will be replaced with "al", but "Alabama" or "ALABAMA" will not. Is there a way to replace any instance of Alabama with its abbreviation without accounting for every possible variation in casings?
You can also try str_ireplace(), which is case-insensitive:
<?php
$str = 'alabama ,Alabama,ALABAMA';
$replace = str_ireplace('alabama', 'al', $str);
echo $str;      // original string, unchanged
echo "<br/>";
echo $replace;  // every casing of "alabama" replaced with "al"
?>
$form_state['values'] = preg_replace("/alabama/i", 'al', $form_state['values']);
The 'i' modifier will make the pattern case-insensitive.
How do you get elements in the header/footer of an ODT document?
for example I have:
use OpenOffice::OODoc;
my $doc = odfDocument(file => 'whatever.odt');
my $t=0;
while (my $table = $doc->getTable($t))
{
print "Table $t exists\n";
$t++;
}
When I check the tables they are all from the body. I can't seem to find elements for anything in the header or footer?
I found sample code here which led me to the answer:
#! /usr/local/bin/perl
use OpenOffice::OODoc;
my $file='asdf.odt';
# odfContainer is a representation of the zipped odf file
# and all of its parts.
my $container = odfContainer("$file");
# We're going to look at the 'style' part of the container,
# because that's where the header is located.
my $style = odfDocument
(
container => $container,
part => 'styles'
);
# masterPageHeader takes the style name as its argument.
# This is not at all clear from the documentation.
my $masterPageHeader = $style->masterPageHeader('Standard');
my $headerText = $style->getText( $masterPageHeader );
print "$headerText\n";
The master page style defines the look and feel of the document -- think CSS. Apparently 'Standard' is the default name for the master page style of a document created by OpenOffice... that was the toughest nut to crack... once I found the example code, that fell out in my lap.
Do you ever escape single quotes in Template Toolkit for necessary JavaScript handlers? If so, how do you do it?
[% SET s = "A'B'C" %]
ABC
html_entity obviously doesn't work because it only handles the double quote. So how do you do it?
I don't use inline event handlers, for the same reason I refuse to use the style attribute for CSS. jQuery just makes it too easy to put class="foo" on the HTML and $('.foo').click( function () {} ) in an external .js file.
But, for the purpose of doing my best to answer this question, check out these docs on Template::Filter for the ones in core.
It seems as if you could do [% s | replace( "'", "\\'" ) %], to escape single quotes. Or you could probably write a more complex sanitizing javascript parser that permits only function calls, and make your own Template::Filter
2018 update for reference:
TT has a method for this called squote for escaping single quotes and dquote for double quotes.
[% tim = "Tim O'Reilly" %]
[% tim.squote %] # Tim O\'Reilly
The link in question would be something like:
ABC
http://www.template-toolkit.org/docs/manual/VMethods.html#section_squote
You can try: popup('[% s | html %]').
Perl isn't my strongest language... But!
Easiest way I've found is to use the JSON module. In a module called JS.pm or something:
use JSON;
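# No prototype here: an empty () prototype is wrong for a method
# that takes arguments.
sub encode {
my $self = shift;
my $string = shift;
# allow_nonref lets the encoder accept a bare string
my $json = JSON->new->allow_nonref;
return $json->encode( $string );
}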
More here: http://search.cpan.org/~makamaka/JSON-2.90/lib/JSON.pm
Then in your template:
[% USE JS %]
<script>
var escaped_string = [% JS.encode( some_template_variable ) %];
</script>
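To see what the encoder emits, here is a small standalone sketch. It swaps in JSON::PP (bundled with recent Perls) for the JSON module used above, purely so it runs without extra installs; the sample strings are made up.

```perl
use strict;
use warnings;
use JSON::PP;

# allow_nonref lets encode() accept a plain string, not just a
# hash or array reference.
my $json = JSON::PP->new->allow_nonref;

my $encoded = $json->encode("Tim O'Reilly");
print "$encoded\n";    # "Tim O'Reilly"

my $with_dq = $json->encode('say "hi"');
print "$with_dq\n";    # "say \"hi\""
```

The result is a complete double-quoted JavaScript string literal, so single quotes need no escaping and the template should not wrap the value in quotes of its own. Inside a double-quoted HTML attribute the value would still need HTML escaping.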
Remember to double-escape the backslash in the replacement; otherwise it will be interpreted as escaping the apostrophe.
[% string.replace( "'", "\\'" ) %]
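The same substitution in plain Perl shows why the backslash is doubled. This is only an illustration, assuming TT's double-quoted strings treat the backslash the way Perl's do, as the answer implies.

```perl
use strict;
use warnings;

my $s = "A'B'C";

# In a double-quoted string, "\\'" is two characters: a backslash
# followed by an apostrophe. A single "\'" would be the apostrophe alone.
my $replacement = "\\'";

(my $escaped = $s) =~ s/'/$replacement/g;
print "$escaped\n";    # A\'B\'C
```

If the input can itself contain backslashes, those would need escaping first, which this sketch does not do.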