Perl Regex Error Help - perl

I'm receiving a similar error in two completely unrelated places in our code that we can't seem to figure out how to resolve. The first error occurs when we try to parse XML using XML::Simple:
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /usr/local/lib/perl5/XML/LibXML/Error.pm line 217.
And the second is when we try to do simple string substitution:
Malformed UTF-8 character (unexpected non-continuation byte 0x78, immediately after start byte 0xe9) in substitution (s///) at /gold/content/var/www/alltrails.com/cgi-bin/API/Log.pm line 365.
The line in question in our Log.pm file is as follows where $message is a string:
$message =~ s/\s+$//g;
Our biggest problem in troubleshooting this is that we haven't found a way to identify the input that is causing this to occur. My hope is that someone else has run into this issue before and can provide advice or sample code that will help us resolve it.
Thanks in advance for your help!

Not sure what the cause is, but if you want to log the message that is causing this, you could always add a __DIE__ signal handler to make sure you capture the error:
$SIG{__DIE__} = sub {
if ($_[0] =~ /Malformed UTF-8 character/) {
print STDERR "message = $message\n";
}
};
That should at least let you know what string is triggering these errors.

Can you do a hex dump of the source data to see what it looks like?
If you're reading this from a file, you can do this with a tool like "od".
Or, you can do this inside the perl script itself by passing the string to a function like this:
sub DumpString {
    my @a = unpack('C*',$_[0]);             # the string as a list of byte values
    my $o = 0;                              # offset of the current 16-byte row
    while (@a) {
        my @b = splice @a,0,16;
        my @d = map sprintf("%03d",$_), @b; # decimal view
        my @x = map sprintf("%02x",$_), @b; # hex view
        my $c = substr($_[0],$o,16);
        $c =~ s/[[:^print:]]/ /g;           # blank out non-printable characters
        printf "%6d %s\n",$o,join(' ',@d);
        print " "x8,join(' ',@x),"\n";
        print " "x9,join(' ',split(//,$c)),"\n";
        $o += 16;
    }
}

Sounds like you have an "XML" file that is expected to contain UTF-8 encoded characters but doesn't. Try just opening it and looking for high-bit (non-ASCII) characters.
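One way to find the offending input is to test each string for well-formed UTF-8 before it reaches the substitution. The following is only a sketch: is_valid_utf8 is a hypothetical helper name, and the two sample byte strings are made up (the second reproduces the 0xe9 followed by 0x78 sequence from the error message). It relies on the core utf8::decode function, which returns false when its argument is not well-formed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # utf8::decode works in place, so test a copy
    return utf8::decode($copy) ? 1 : 0;
}

# Made-up samples: valid "cafe" with an accented e, and a Latin-1 byte
# (0xe9) followed by "x", which is not a UTF-8 continuation byte.
for my $sample ("caf\xc3\xa9", "caf\xe9x") {
    my $hex = join ' ', map { sprintf '%02x', $_ } unpack 'C*', $sample;
    printf "%-5s %s\n", (is_valid_utf8($sample) ? 'ok' : 'BAD'), $hex;
}
```

Logging the hex dump of every string that fails the check should point at the record that triggers the warning.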

Related

Perl Single Quote replacement

I've been struggling for the last days in regards to a character replacement in Perl:
I have a String which is surrounded by single quotes, yet, inside that String, I have a name which contains a single quote, let's say O'Neil. Now, given the fact that my String is surrounded by single quotes, Perl recognizes the single quote in the Name, as being the end of the String.
Surrounding the entire string in double quotes is not an option, since it's built from a URL.
Now, I did some research and didn't find anything, now I'm asking y'all:
I've tried to play around with the following syntax:
$Daten =~ s/\'/\\'/g; which of course doesn't work...
$Daten is the entire string which contains the name O'Neil.
Now, I want to replace the single quote with a backslash followed by a quote: ' -> \'
Does anyone have any ideas?
Best regards,
Ionut Sanda
Perhaps something like the following code would meet your requirements:
use strict;
use warnings;
my $debug = 1;
while( my $line = <DATA> ) {
$line =~ s/(.*)'(.+)'(.+)'(.*)/$1'$2\\'$3'$4/g;
print $line if $debug;
}
__DATA__
'USER1:O'NEILL:PATRICK:M:lastname_firstname#company.com'
datax 'USER1:O'NEILL:PATRICK:M:lastname_firstname#company.com' datay
output
'USER1:O\'NEILL:PATRICK:M:lastname_firstname#company.com'
datax 'USER1:O\'NEILL:PATRICK:M:lastname_firstname#company.com' datay
Well, as you do not provide a sample or your code I have to improvise
use strict;
use warnings;
my $debug = 1;
while( my $Daten = <DATA> ) {
$Daten =~ s/(.*)'(.+)'(.+)'(.*)/$1'$2\\'$3'$4/g; # Magic happens here
print $Daten if $debug;
}
__DATA__
'USER1:O'NEILL:PATRICK:M:lastname_firstname#company.com'
datax 'USER1:O'NEILL:PATRICK:M:lastname_firstname#company.com' datay
output
'USER1:O\'NEILL:PATRICK:M:lastname_firstname#company.com'
datax 'USER1:O\'NEILL:PATRICK:M:lastname_firstname#company.com' datay
Otherwise I do not have enough information to understand your problem (no sample of data, no snippet of the code).
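A narrower alternative, sketched here under the assumption that an apostrophe inside a name always sits between two word characters while the surrounding quotes do not, is to escape only those inner apostrophes with lookarounds. escape_inner_quotes is a hypothetical helper name, and the sample string is a shortened version of the data above.

```perl
use strict;
use warnings;

# Hypothetical helper: escape only apostrophes that have a word character
# on both sides, leaving the enclosing quotes untouched.
sub escape_inner_quotes {
    my ($text) = @_;
    $text =~ s/(?<=\w)'(?=\w)/\\'/g;
    return $text;
}

print escape_inner_quotes("datax 'USER1:O'NEILL:PATRICK:M' datay"), "\n";
# prints: datax 'USER1:O\'NEILL:PATRICK:M' datay
```

Unlike the capture-group version, this does not depend on the line containing exactly three quotes, but it would miss a name such as "Jones'" where the apostrophe ends the word.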

Reading CSV with Perl produces distorted lines

I am reading a CSV file using Perl 5.26.1 with lines that look like this:
B1_10,202337840166,R08C02,202337840166_R08C02.gtc
I'm reading this data into a hash that has the last element as a key, and the first as a value.
I read the file line by line (snippet only):
while (<$csv>) {
if (/^Sample/) { next }
say "-----start----\noriginal = $_";
chomp;
my @line = split /,/;
my $name = $line[0];
my $vcf = $line[3];
say "1st element = $name";
say "4th element = $vcf";
$vcf2dir{$vcf} = $name;
say "\$vcf2dir{$vcf} = '$name'";
say '-----end------';
}
which produces the following output:
-----start----
original = B1_10,202337840166,R08C02,202337840166_R08C02.gtc
1st element = B1_10
4th element = 202337840166_R08C02.gtc
} = 'B1_10'2337840166_R08C02.gtc
-----end-------
but it should look like
-----start----
original = B1_10,202337840166,R08C02,202337840166_R08C02.gtc
1st element = B1_10
4th element = 202337840166_R08C02.gtc
$vcf2dir{202337840166_R08C02.gtc} = 'B1_10'
-----end-------
and it shows strangely with the data printer package:
use DDP;
p %vcf2dir;
produces
{
' "B1_10"840166_R08C02.gtc
}
in other words, the last string is being cut up for some reason.
I have tried removing non-ascii characters with $_ =~ s/[[:^ascii:]]//g; but this still produces the same error.
I have no idea why Perl is ripping these strings apart :(
while (<$csv>) {
...
chomp;
My guess is that the input file uses \r\n line endings (Windows style) while you are running the code in a Unix-like environment (Linux, Mac, ...) where the line ending is \n. This means that $INPUT_RECORD_SEPARATOR is also \n, so chomp removes only the \n and leaves the \r. When printed, the leftover \r (carriage return) moves the cursor back to the start of the line, so the text after it overwrites what was already there, which produces the strange output.
To fix this, either fix the line endings in your input file, set $INPUT_RECORD_SEPARATOR to the expected separator, or just do s{\r?\n\z}{} instead of chomp to handle both \r\n and \n line endings.
I ran your snippet against your sample line and it worked as expected.
However, I have seen behavior like yours caused by spurious Control-M characters (carriage returns) in my data.
Try filtering them out: after your chomp, remove all Control-M characters with the command below:
s/\cM//g;
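To see the fix end to end, here is a minimal sketch that reads CRLF-terminated data from an in-memory filehandle and strips the line ending with s/\r?\n\z//. The header line and the in-memory $data string are assumptions made for the demo; the data row is the one from the question.

```perl
use strict;
use warnings;

# Assumed sample input with Windows line endings; the header row is made up.
my $data = "Sample,Chip,Position,File\r\n"
         . "B1_10,202337840166,R08C02,202337840166_R08C02.gtc\r\n";
open my $csv, '<', \$data or die "Cannot open in-memory file: $!";

my %vcf2dir;
while (my $line = <$csv>) {
    next if $line =~ /^Sample/;
    $line =~ s/\r?\n\z//;              # removes "\n" and "\r\n" alike
    my @fields = split /,/, $line;
    $vcf2dir{ $fields[3] } = $fields[0];
}
close $csv;

print "\$vcf2dir{$_} = '$vcf2dir{$_}'\n" for sort keys %vcf2dir;
# prints: $vcf2dir{202337840166_R08C02.gtc} = 'B1_10'
```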

Weird behavior with Perl string concatenation

I'm working on a pretty simple script, reading a maplist.txt file and using the \n separated map names in it to build a command string - however, I'm getting some unexpected behavior.
My full code:
# compiles a map pack from maplist.txt
# for every server.
# Filipe Dobreira <dobreira#gmail.com>
# v1 # Sept. 2011
use strict;
my @servers = <*>;
foreach my $server (@servers)
{
# we only want folders:
next if -f $server;
print "server: $server\n";
my $maplist = $server . '/orangebox/cstrike/maplist.txt';
my $mapdir = $server . '/orangebox/cstrike/maps';
print " maplist: $maplist\n";
print " map folder: $mapdir\n";
# check if the maplist actually exists:
if(!(-e $maplist))
{
print "!!! failed to find $maplist\n";
next;
}
open MAPLIST, "<$maplist";
foreach my $map (<MAPLIST>)
{
chomp($map);
next if !$map;
# full path to the map file:
my $mapfile = "$mapdir/$map.bsp";
print "$mapfile\n";
}
}
Where I declare $mapfile, I expect the result to be something like:
zombieescape1/orangebox/cstrike/maps/ze_stargate_escape_v8.bsp
However, it seems like the concatenation is being made to the START of the string, and the final result ends up being something like:
.bspiescape1/orangebox/cstrike/maps/ze_stargate_escape_v8
So the .bsp portion is actually being written over the start of the leftmost string. I have very little perl experience, and I can only assume this is me failing to understand some quirk or operator behavior.
Note: I've also tried using "${mapdir}/${map}.bsp", concatenating everything with the dot operator, and a join "", $mapdir, $map, ".bsp", with the same result.
Thanks in advance.
PS: for reference, here's what a maplist.txt looks like:
zm_3dubka_v3
zm_4way_tunnel_v2
zm_abstractchode_pyramid2
zm_anotheruglyzmap_v1e
zm_app7e_betterbworld_JDfix_v3
zm_atix_helicopter_mini
zm_base_winter_beta3
zm_battleforce_panic_ua
zm_black_lion_macd_v8
zm_bunker_f57_v2
zm_burbsdelchode_b3
zm_choddarena_b12
zm_choddasnowpanic_b4
zm_citylife_V2b
zm_crazycity
zm_deep_thought_nv
zm_desert_fortress_v2
ZM_desprerados_a1
zm_doomlike_station_v2
zm_dust_arena_v1_final
zm_exhibit_night_2F
zm_facility_v1
zm_farm3_nav72
zm_firewall_samarkand
zm_fortress_b7
zm_ghs_flats
zm_gl33m4x_errata
zm_idm_hauntedhouse_v1
zm_industry_v2
zm_kruma_kakariko_village_006
zm_kruma_panic_004
zm_lila_off!ce_v4
zm_little_city_v5pf_fix
zm_moonlight_v3_pF
zm_moon_roflicious_pF_02
zm_moocbblechode_b2
zm_mountain_b2
zm_neko_abura_v2
zm_neko_athletic_park_v2
zm_novum_v3_JDfix
zm_ocx_orly_v4
zm_officeattack_b5a
zm_officerush_betav7
zm_officesspace_pfss
zm_omi_facility_pfv2
zm_penumbra_PF3
zm_raindance_ak_v2
zm_roflicious_pfcf2
zm_roy_abandoned_canals_new
zm_roy_barricade_factory
zm_roy_highway
zm_roy_industrial_complex
zm_roy_old_industrial_pF
zm_roy_the_ship_pf
zm_roy_zombieranch_night_b4
zm_survival_f2a
zm_temple_v3pf
zm_towers_v3
zm_tx_highschool_zkedit_v2
zm_unpanicv2_pF
zm_vc2_office_redone_b1
zm_wasteyard_beta3
zm_winterfun_b4a
zm_wtfhax_v6
zm_wtfhax_v6e
zm_wwt_twinsteel_v8
I'd guess that maplist.txt has non-Unix line endings, probably DOS (\r\n), and as a result you see what looks like prepending.
The problem is that chomp() consumes only one of the two line-ending characters, leaving the carriage return behind. When the string is printed, the carriage return moves the cursor back to the start of the line, so .bsp overwrites the first characters.
If you set the Perl special variable $/ (the input record separator) before reading the map list, chomp will consume both line-ending characters:
$/ = qq{\r\n};
Another solution would be to convert the line endings in the file before processing, perhaps using dos2unix.
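To confirm the diagnosis before changing anything, it can help to print the string with its invisible characters made visible. The sketch below is illustrative only: visible is a hypothetical helper, and the map name is taken from the question with an assumed \r\n ending.

```perl
use strict;
use warnings;

# Hypothetical helper: show every non-printable byte as a \xNN escape.
sub visible {
    my ($s) = @_;
    $s =~ s{([^\x20-\x7e])}{sprintf '\\x%02x', ord $1}ge;
    return $s;
}

my $map = "ze_stargate_escape_v8\r\n";   # assumed DOS line ending
chomp $map;                               # removes only the "\n"
print visible("maps/$map.bsp"), "\n";
# prints: maps/ze_stargate_escape_v8\x0d.bsp
```

The \x0d in the output is the leftover carriage return sitting just before .bsp.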

Perl comparison operation between a variable and an element of an array

I am having quite a bit of trouble with a Perl script I am writing. I want to compare an element of an array to a variable to see if they are equal. For some reason I cannot seem to get the comparison operation to work correctly. It will either evaluate as true all the time (even when outputting both strings clearly shows they are not the same), or it will always be false (even if they are the same). I have found an example of just this kind of comparison operation on another website, but when I use it, it doesn't work. Am I missing something? Is the variable type I take from the file not a string? (It can't be an integer as far as I can tell, as it is an IP address.)
$ipaddress = '192.43.2.130'
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
@address = <FH>;
close(FH);
print $address[0];
print $address[1];
}
for ($i = 0; $i < @address; $i++)
{
print "hello";
if ($address[$i] eq $ipaddress)
{print $address[$i];
$file = "server_$i";
print "I got here first";
goto SENDING;}
}
SENDING:
print " I am here";
I am pretty weak in Perl, so forgive me for any rookie mistakes/assumptions I may have made in my very meager bit of code. Thank you for your time.
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
@address = <FH>;
close(FH);
You have several issues with this code. First, you should use strict, because it would tell you that @address is being used before it's declared. You're also using a numeric comparison (==) on a string.
Secondly, @address = <FH> does read every line of the file, but each element keeps its trailing newline. Loop through the lines of the file and chomp each address as you add it:
my @address = ();
while( my $addr = <FH> ) {
chomp($addr); # removes the newline character
push(@address, $addr);
}
However you really don't need to push into an array at all. Just loop through the file and find the IP. Also don't use goto. That's what last is for.
while( my $addr = <FH> ) {
chomp($addr);
if( $addr eq $ipaddress ) {
$file = "server_" . ($. - 1); # $. is the current line number; subtract 1 to match the original 0-based $i
print $addr,"\n";
print "I got here first"; # not sure what this means
last; # breaks out of the loop
}
}
When you're reading in from a file like that, you should use chomp() when doing a comparison with that line. When you do:
print $address[0];
print $address[1];
The output is on two separate lines, even though you haven't explicitly printed a newline. That's because $address[$i] contains a newline at the end. chomp removes this.
if ($address[$i] eq $ipaddress)
could read
my $currentIP = $address[$i];
chomp($currentIP);
if ($currentIP eq $ipaddress)
Once you're familiar with chomp, you could even use:
chomp(my $currentIP = $address[$i]);
if ($currentIP eq $ipaddress)
Also, please replace the goto with a last statement. That's perl's equivalent of C's break.
Also, from your comment on Jack's answer:
Here's some code you can use for finding how long it's been since a file was modified:
use File::stat; # provides the object-style stat() used below
my $secondsSinceUpdate = time() - stat('filename.txt')->mtime;
You probably are having an issue with newlines. Try using chomp($address[$i]).
First of all, please don't use goto. Every time you use goto, the baby Jesus cries while killing a kitten.
Secondly, your code is a bit confusing in that you seem to be populating @address after starting the if($address[0] == '') statement (not to mention that that if should be if($address[0] eq '')).
If you're trying to compare each element of @address with $ipaddress for equality, you can do something like the following
Note: This code assumes that you've populated @address.
my $num_matches=0;
foreach(@address)
{
$num_matches++ if $_ eq $ipaddress;
}
if($num_matches)
{
#You've got a match! Do something.
}
else
{
#You don't have any matches. This may or may not be bad. Do something else.
}
Alternatively, you can use the grep operator to get any and all matches from @address:
my @matches=grep{$_ eq $ipaddress}@address;
if(@matches)
{
#You've got matches.
}
else
{
#Sorry, no matches.
}
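Finally, if you're using Perl 5.10 or higher, you can use the smart match operator (i.e. ~~). Note that smart match has been marked experimental since Perl 5.18 and deprecated since Perl 5.38, so the approaches above are the safer choice in new code:
if($ipaddress~~@address)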
{
#You've got a match!
}
else
{
#Nope, no matches.
}
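If smart match is not available or not wanted, the any function from the core List::Util module gives the same yes/no answer and stops at the first hit. This is a sketch under two assumptions: List::Util is version 1.33 or later (bundled with Perl 5.20 and up), and the @address contents shown are made-up sample data standing in for the lines of serverips.txt.

```perl
use strict;
use warnings;
use List::Util qw(any);

my $ipaddress = '192.43.2.130';

# Made-up sample lines, with the newlines a file read would leave in place.
my @address = ("192.43.2.129\n", "192.43.2.130\n", "192.43.2.131\n");
chomp @address;    # chomp accepts a whole array

my $found = any { $_ eq $ipaddress } @address;
print $found ? "You've got a match!\n" : "Nope, no matches.\n";
# prints: You've got a match!
```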
When you read from a file like that you include the end-of-line character (generally \n) in each element. Use chomp @address; to get rid of it.
Also, use last; to exit the loop; goto is practically never needed.
Here's a rather idiomatic rewrite of your code. I'm excluding some of your logic that you might need, but isn't clear why:
$ipaddress = '192.43.2.130';
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
while (<FH>) { # loop over the file, using the default input space
chomp; # remove end-of-line
last if ($_ eq $ipaddress); # a RE could easily be used here also, but keep the exact match
}
$file = "server_$."; # $. is the line number - it's not necessary to keep track yourself; read it before close(), which resets it
close(FH);
print "The file is $file\n";
Some people dislike using perl's implicit variables (like $_ and $.) but they're not that hard to keep track of. perldoc perlvar lists all these variables and explains their usage.
Regarding the exact match vs. "RE" (regular expression, or regexp - see perldoc perlre for lots of gory details) -- the syntax for testing a RE against the default input space ($_) is very simple. Instead of
last if ($_ eq $ipaddress);
you could use
last if (/$ipaddress/);
Although treating an ip address as a regular expression (where . has a special meaning) is probably not a good idea.
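If a regular expression is still wanted, the \Q...\E escape (quotemeta) makes the dots literal, and anchoring with \A and \z prevents partial matches. The sketch below compares the loose and the escaped pattern; ip_matches_exactly is a hypothetical helper and the candidate strings are made up to show the difference.

```perl
use strict;
use warnings;

my $ipaddress = '192.43.2.130';

# Hypothetical helper: exact match via a regex with metacharacters escaped.
sub ip_matches_exactly {
    my ($candidate, $ip) = @_;
    return $candidate =~ /\A\Q$ip\E\z/ ? 1 : 0;
}

# Made-up candidates: a real match, dots replaced, and a longer string.
for my $candidate ('192.43.2.130', '192x43y2z130', '1192.43.2.1305') {
    my $loose = $candidate =~ /$ipaddress/ ? 1 : 0;    # "." matches anything
    my $exact = ip_matches_exactly($candidate, $ipaddress);
    print "$candidate loose=$loose exact=$exact\n";
}
# prints:
# 192.43.2.130 loose=1 exact=1
# 192x43y2z130 loose=1 exact=0
# 1192.43.2.1305 loose=1 exact=0
```

For a plain equality test, eq remains the simpler choice.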

Why can't I match my string from standard input in Perl?

Why will my script not work correctly?
I followed a YouTube video, and it worked for the guy in it.
I am running Perl on Windows using ActiveState ActivePerl 5.12.2.1202
Here is my tiny tiny code block.
print "What is your name?\n";
$name = <STDIN>;
if ($name eq "Jon") {
print "We have met before!\n";
} else {
print "We have not met before.\n";
}
The code automatically jumps to the else statement and does not even check the if statement.
The statement $name = <STDIN>; reads from standard input and includes the terminating newline character "\n". Remove this character using the chomp function:
print "What is your name?\n";
$name = <STDIN>;
chomp($name);
if ($name eq "Jon") {
print "We have met before!\n";
} else {
print "We have not met before.\n";
}
The trick in programming is to know what your data are. When something's not acting like you expect, look at the data to see if they are what you expect. For instance:
print "The name is [$name]\n";
You put the brackets around it so you can see any extra whitespace that might be there. In this case, you would have seen:
The name is [Jon
]
That's your clue that there is extra stuff. Since the eq has to match exactly, it fails to match.
If you're just starting with Perl, try Learning Perl. It's much better than random videos from YouTube. :)
When you read the name from standard input as $name = <STDIN>;
$name will have a trailing newline. So if I enter foo, $name will actually contain foo\n.
To get rid of this newline you can make use of the chomp function as:
chomp($name = <STDIN>);
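The whole flow can be tried without typing anything by reading from an in-memory filehandle that stands in for STDIN. This is only a sketch: read_name is a hypothetical helper, and $input simulates a user typing Jon and pressing Enter.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from a filehandle and drop its newline.
sub read_name {
    my ($fh) = @_;
    my $name = <$fh>;
    return unless defined $name;    # end of input
    chomp $name;
    return $name;
}

my $input = "Jon\n";                # simulated keyboard input
open my $fake_stdin, '<', \$input or die "Cannot open in-memory input: $!";
my $name = read_name($fake_stdin);
close $fake_stdin;

print "The name is [$name]\n";      # brackets expose stray whitespace
print $name eq 'Jon' ? "We have met before!\n" : "We have not met before.\n";
# prints:
# The name is [Jon]
# We have met before!
```

In a real script, passing \*STDIN to read_name would read from the keyboard instead.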