How can I fix garbled multibyte text when using Text::vCard's as_string() method? - perl

When using multibyte UTF-8 characters in a NOTE node, characters are garbled/lost around the newline.
For example:
$vcard = $address_book->add_vcard();
$vcard->version('3.0');
$vcard->FN('Tèśt Ûšér');
$vcard->NOTE('①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳');
say $vcard->as_string();
Produces:
BEGIN:VCARD
VERSION:3.0
FN:Tèśt Ûšér
NOTE:①②③④⑤⑥⑦⑧⑨⑩⑪��
�⑬⑭⑮⑯⑰⑱⑲⑳①②③④
⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯�
��⑱⑲⑳①②③④⑤⑥⑦⑧��
�⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①
②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬�
��⑮⑯⑰⑱⑲⑳
END:VCARD
How would I go about fixing this? I also posted this as an issue on the text-vcard project page. I think this is related to how the newlines are inserted (as the raw bytes \x0D\x0A), but I'm not sure.

It looks like the culprit is Text::vCard::Node->_wrap_utf8(). I was able to at least get it to stop cutting up characters by bypassing that method altogether.
sub _wrap_utf8 {
my ( $self, $key, $value, $max, $newline ) = @_;
#bypass wrapping
return $key . $value;
…
}
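Bypassing the wrap stops the corruption, but it also gives up line folding entirely. As a rough sketch of the alternative, the code below folds by octet count while only ever breaking between characters. `fold_utf8` is a hypothetical helper, not part of Text::vCard, and it assumes the value is already a decoded Perl character string, that continuation lines begin with a single space, and that the caller picks the octet limit (75 by default here).

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper (not part of Text::vCard): fold a decoded character
# string into lines of at most $max octets, breaking only between characters.
sub fold_utf8 {
    my ($text, $max) = @_;
    $max //= 75;

    my @lines = ('');
    my $used  = 0;    # octets already placed on the current line

    for my $char (split //, $text) {
        my $len = length(encode('UTF-8', $char));
        if ($used + $len > $max) {
            push @lines, ' ';    # continuation lines start with one space
            $used = 1;
        }
        $lines[-1] .= $char;
        $used += $len;
    }
    return join "\r\n", @lines;
}

binmode STDOUT, ':encoding(UTF-8)';
print fold_utf8('NOTE:' . ('①②③④⑤⑥⑦⑧⑨⑩' x 4), 30), "\r\n";
```

This breaks between code points, so a base character and a following combining mark can still land on different lines; that is harmless for unfolding but worth knowing.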

Related

Why Laravel Request object is replacing spaces with underscores on my form names?

I have a Form posting variables containing spaces in their names
e.g.
I perform my AJAX request, and I can see in the Chrome inspector that the name is passed correctly (with the blank space).
In my api.php:
Route::post('/user', 'UserController@get');
UserController
function get(Request $request)
{
dd($request->input('Name Surname')); //display null
dd($request->all()); //I notice the key's changed to Name_Surname
}
Given that I can't change the names because they have to contain spaces (bad practice? OK, but it has to be like that):
how can I prevent the spaces from being replaced?
(Ideally without having to manipulate the array keys returned by $request->all() by hand.)
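Short answer: I don't believe there is a way to prevent this. PHP itself converts spaces and dots in incoming field names to underscores before Laravel ever sees the request.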
You can map the response with a bit of string replace though:
$data = $request->all()->mapWithKeys(function($item, $key) {
return [str_replace("_", " ", $key) => $item];
});
If it's something you want to apply across the board, you could possibly rig up some middleware to apply it to all requests.
If the previous answer does not work for you ($request->all() returns a plain array, which has no mapWithKeys method), try this:
$data = collect($request->all())->mapWithKeys(function($item, $key) {
return [str_replace("_", " ", $key) => $item];
})->toArray();
You may also normalize the Input Name if it is known...
$field_name = 'FIELD NAME WITH SPACES';
$value = request( str_replace( ' ', '_', $field_name ) );

Perl split string at character entity reference

Quick Perl question with hopefully a simple answer. I'm trying to perform a split on a string containing non-breaking spaces (&nbsp;). This is after reading in an HTML page using HTML::TreeBuilder::XPath and retrieving the string needed with $titleString = $tree->findvalue('/html/head/title')
use HTML::TreeBuilder::XPath;
$tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file( "filename" );
$titleString = $tree->findvalue('/html/head/title');
print "$titleString\n";
Pasted below is the original string and below that the string that gets printed:
Mr&nbsp;Dan Perkins&nbsp;(Active)
Mr?Dan Perkins?(Active)
I've tried splitting $titleString with @parts = split('\?',$titleString); and also with the original &nbsp;, though neither has worked. My hunch is that there's a simple piece of encoding code to be added somewhere?
HTML code:
<html>
<head>
<title>Dan Perkins (Active)</title>
</head>
</html>
You shouldn't have to know how the text in the document is encoded. findvalue returns an actual non-breaking space (U+00A0) when the document contains &nbsp;, so you'd use
split(/\xA0/, $title_string)
-or-
split(/\x{00A0}/, $title_string)
-or-
split(/\N{U+00A0}/, $title_string)
-or-
split(/\N{NBSP}/, $title_string)
-or-
split(/\N{NO-BREAK SPACE}/, $title_string)
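As a small sketch of the first variant in action, the snippet below splits a hard-coded string that stands in for the value findvalue() returns (an assumption, so that the example runs without an HTML file). It also sets an output encoding, since printing U+00A0 to an unencoded handle is what tends to produce the stray "?" or raw byte.

```perl
use strict;
use warnings;

# Assumed stand-in for what findvalue() returns: a decoded string in which
# each &nbsp; has become the single character U+00A0.
my $title_string = "Mr\x{00A0}Dan Perkins\x{00A0}(Active)";

# Split on the non-breaking space character itself, not on "?".
my @parts = split /\x{00A0}/, $title_string;

# Encode on output so non-ASCII characters are written as valid UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @parts;
```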

Perl Read a file into a variable and add suffix to each lines

I'm very new to Perl and I'm having a hard time finding out how to do what I want.
I have a text file containing something like
text 2015-02-02:
- blabla1
- blabla2
text2 2014-12-12:
- blabla
- ...
I'm trying to read the file, put it in a variable, add </br> to the end of each line (of my variable), and use it to send to a web page.
This is what I have for the moment. It works except for the </br> part.
if (open (IN, "CHANGELOG.OLD")) {
local $/;
$oldchangelog = <IN>'</br>';
close (IN);
$tmplhtml{'CHANGELOG'} = $oldchangelog;
} else {
# changelog not available
$tmplhtml{'CHANGELOG'} = "Changelog not available";
}
thanks for the help!
As someone comments - this looks like YAML, so parsing as YAML is probably more appropriate.
However to address your scenario:
3-argument file opens are good.
You're using local $/; which means you're reading the whole file into a single string. That is not suitable for line-by-line processing.
It looks like you're putting everything into one element of a hash. Is there any particular reason you're doing this?
Anyway:
if ( open ( my $input, "<", "CHANGELOG.OLD" ) ) {
while ( my $line = <$input> ) {
$tmplhtml{'CHANGELOG'} .= $line . " <BR/>\n";
}
}
else {
$tmplhtml{'CHANGELOG'} = "Changelog not available";
}
As an alternative - you can render text 'neatly' to HTML using <PRE> tags.
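If the lines are wanted in one go rather than appended inside a loop, a variation on the same idea is to read them in list context and join them. This is only a sketch: the sample text and the in-memory filehandle are assumptions used to keep it self-contained, and a real script would open CHANGELOG.OLD instead.

```perl
use strict;
use warnings;

# Assumed sample content standing in for CHANGELOG.OLD.
my $sample = "text 2015-02-02:\n- blabla1\n- blabla2\n";

# An in-memory filehandle keeps the sketch self-contained; a real script
# would pass the file name instead of \$sample.
open(my $in, '<', \$sample) or die "Cannot open changelog: $!";
my @lines = <$in>;    # list context: one element per line
close $in;

# Drop the newlines, then put them back with a <br> in front of each.
chomp @lines;
my $changelog_html = join '', map { "$_<br>\n" } @lines;

print $changelog_html;
```

If the changelog can contain characters such as < or &, they would also need HTML-escaping before being sent to the page.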

ReCaptcha Implementation in Perl

I want to implement reCAPTCHA on my website.
One option is the Google API, but for that I need to sign up with a domain name to get an API key.
Is there any other way to do it?
You don't necessarily need a domain name to sign up, per se.
They have a concept of a "global key", where one single key can be used on several domains. When signing up, select the "Enable this key on all domains (global key)" option and use a unique identifier (domainkey.abhilasha.com). You can then use the key from any domain.
One way: add this code to your perl file that is called by an html form:
Simplified of course
my @field_names=qw(name branch email g-recaptcha-response);
foreach $field_name (@field_names)
{
if (defined param("$field_name"))
{
$FIELD{$field_name} = param("$field_name");
}
}
$captcha=$FIELD{'g-recaptcha-response'};
use LWP::Simple;
$secretKey = "put your key here";
$ip = remote_host;
#Remove # rem to test submitted variables are present
#print "secret= $secretKey";
#print " and response= $captcha";
#print " and remoteip= $ip";
$URL = "https://www.google.com/recaptcha/api/siteverify?secret=".$secretKey."&response=".$captcha."&remoteip=".$ip;
$contents = get $URL or die;
# contents variable takes the form of: "success": true, "challenge_ts": "2016-11-21T16:02:41Z", "hostname": "www.mydomain.org.uk"
use Data::Dumper qw(Dumper);
# Split contents variable by comma:
my ($success, $challenge_time, $hostname) = split /,/, $contents;
# Split success variable by colon:
my ($success_title, $success_value) = split /:/, $success;
#strip whitespace:
$success_value =~ s/^\s+//;
if ($success_value eq "true")
{
print "it worked";
}else{
print "it did not";
}
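Splitting the response on commas and colons works only as long as the field order and spacing never change. Since the response is JSON, a less brittle sketch is to decode it with the core JSON::PP module. The sample body below is an assumption modelled on the response described in the comment above; in the real script $contents would come from the get call.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Assumed sample body, shaped like the siteverify response described above.
my $contents = '{ "success": true, "challenge_ts": "2016-11-21T16:02:41Z", "hostname": "www.mydomain.org.uk" }';

# decode_json dies on malformed input, so wrap it in eval if that matters.
my $result = decode_json($contents);

# JSON true/false decode to values that behave as booleans in Perl.
if ($result->{success}) {
    print "it worked\n";
} else {
    print "it did not\n";
}
```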
If you are just trying to block spam, I prefer the honeypot captcha approach: http://haacked.com/archive/2007/09/10/honeypot-captcha.aspx
Put an input field on your form that should be left blank, then hide it with CSS (preferably in an external CSS file). A robot will find it and put spam in it, but humans won't see it.
In your form validation script, check the length of the field. If it contains any characters, do not process the form submission.
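The server-side half of that check can be sketched in a few lines. The field name `website` and the helper `looks_like_spam` are hypothetical; the only assumption is that the submitted fields are available as a hash reference.

```perl
use strict;
use warnings;

# Hypothetical field name; it must match the hidden input on the form.
my $HONEYPOT_FIELD = 'website';

# Returns 1 when the hidden field was filled in, which suggests a bot,
# and 0 otherwise. $params is a hash reference of submitted form fields.
sub looks_like_spam {
    my ($params) = @_;
    my $value = $params->{$HONEYPOT_FIELD};
    return (defined($value) && length($value)) ? 1 : 0;
}

# Usage: skip processing when the honeypot has content.
my %submission = (name => 'Alice', website => '');
print looks_like_spam(\%submission) ? "rejected\n" : "accepted\n";
```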

Removing top-directory-only URLs from a list of URLs?

I have a question that I'm having trouble researching, as I don't know how to ask it correctly on a search engine.
I have a list of URLs. I would like to have some automated way (Perl for preference) to go through the list and remove all URLs that are top directory only.
So for example I might have this list:
http://www.example.com/hello.html
http://www.foo.com/this/thingrighthere.html
In this case I would want to remove example.com from my list, as it is either top-directory only or they reference files in a top directory.
I'm trying to figure out how to do that. My first thought was, count forward slashes and if there's more than two, eliminate the URL from the list. But then you have trailing forward slashes, so that wouldn't work.
Any ideas or thoughts would be much appreciated.
Something like this:
use URI::Split qw( uri_split );
my $url = "http://www.foo.com/this/thingrighthere.html";
my ($scheme, $auth, $path, $query, $frag) = uri_split( $url );
if (($path =~ tr/\///) > 1 ) {
print "I care about this $url";
}
http://metacpan.org/pod/URI::Split
You could do this with regexes, but it's much less work to let the URI library do it for you. You won't get caught out by funny schemes, escapes, and extra stuff before and after the path (query, anchor, authorization...). There's some trickiness around how paths are represented by path_segments(). See the comments in the code below and the URI docs for details.
I have assumed that http://www.example.com/foo/ is considered a top directory. Adjust as necessary, but it's something you have to think about.
#!/usr/bin/env perl
use URI;
use File::Spec;
use strict;
use warnings;
use Test::More 'no_plan';
sub is_top_level_uri {
my $uri = shift;
# turn it into a URI object if it isn't already
$uri = URI->new($uri) unless eval { $uri->isa("URI") };
# normalize it
$uri = $uri->canonical;
# split the path part into pieces
my @path_segments = $uri->path_segments;
# for an absolute path, which most are, the absoluteness will be
# represented by an empty string. Also /foo/ will come out as two elements.
# Strip that all out, it gets in our way for this purpose.
@path_segments = grep { $_ ne '' } @path_segments;
return @path_segments <= 1;
}
my @filtered_uris = (
"http://www.example.com/hello.html",
"http://www.example.com/",
"http://www.example.com",
"https://www.example.com/",
"https://www.example.com/foo/#extra",
"ftp://www.example.com/foo",
"ftp://www.example.com/foo/",
"https://www.example.com/foo/#extra",
"https://www.example.com/foo/?extra",
"http://www.example.com/hello.html#extra",
"http://www.example.com/hello.html?extra",
"file:///foo",
"file:///foo/",
"file:///foo.txt",
);
my @unfiltered_uris = (
"http://www.foo.com/this/thingrighthere.html",
"https://www.example.com/foo/bar",
"ftp://www.example.com/foo/bar/",
"file:///foo/bar",
"file:///foo/bar.txt",
);
for my $uri (@filtered_uris) {
ok is_top_level_uri($uri), $uri;
}
for my $uri (@unfiltered_uris) {
ok !is_top_level_uri($uri), $uri;
}
Use the URI module from CPAN. http://search.cpan.org/dist/URI
This is a solved problem. People have already written, tested and debugged code that handles this already. Whenever you have a programming problem that others have probably had to deal with, then look for existing code that does it for you.