How to convert symbols "&'<> to entities &quot; &amp; &apos; &lt; &gt; in Perl - perl

What is the simplest way in Perl to convert the special symbols "&'<> to the entities &quot; &amp; &apos; &lt; &gt;? It is easy to write functions like these, but I think this problem has been solved many times already, so there should be no need to write your own.
sub add_entities {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}
sub remove_entities {
    my ($text) = @_;
    $text =~ s/&quot;/"/g;
    $text =~ s/&amp;/&/g;
    $text =~ s/&apos;/'/g;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    return $text;
}

You should never need remove_entities. Your parser shouldn't return any entities; if it does, it is horribly broken. I recommend XML::LibXML.
The same goes for add_entities: the XML writing library will handle all of that for you. You could use XML::LibXML for this too, but XML::Writer is much simpler to use for this task.
Note that both of your routines are broken. add_entities doesn't consider the character set, and remove_entities doesn't handle numeric entities or entities outside the base XML spec.
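To make the point about numeric entities concrete, here is a minimal sketch using only core Perl. It repeats the question's remove_entities with the entity names written out; the input strings are made up for illustration. The second call shows a further problem that follows from the order of the substitutions, which is an observation added here rather than something the answer states.

```perl
use strict;
use warnings;

# Copy of the question's routine, for illustration only
sub remove_entities {
    my ($text) = @_;
    $text =~ s/&quot;/"/g;
    $text =~ s/&amp;/&/g;
    $text =~ s/&apos;/'/g;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    return $text;
}

# "&#60;" and "&#x3E;" are numeric character references for "<" and ">".
# The routine has no rule for them, so they come back undecoded.
print remove_entities("1 &#60; 2 &#x3E; 0"), "\n";

# "&amp;lt;" should decode to the literal text "&lt;", but because "&amp;"
# is handled before "&lt;", the result is decoded a second time.
print remove_entities("&amp;lt;"), "\n";
```

A real XML parser handles both cases, which is why the answer recommends using one instead of hand-written substitutions.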

Related

usage of perl find and replacement (regex) for a verilog netlist

I am trying to develop a script that takes a Verilog netlist as input and creates a testbench for it. When we connect the testbench to the main module by name, we write something like this:
.a(a); .ext(ext); etc.
I have about 120 inputs for a bigger block, such as
in1, in2, ... and some arrays like [31:0] ext;
I want to match the pattern .in1; and replace it with .in1(in1);
I am trying
s/^\s+\.(.*)/\.$1\($1\)/g;
This should match a line starting with whitespace, followed by a single '.' character and then all remaining characters, and replace it with the pattern shown above.
The complete code is as follows:
use strict;
use warnings;
my $filename = shift;
open (my $fh , "<" , $filename) or die $!;
open (my $pr , ">" , "D:/dump/testbench.v") or die $!;
my @code;
while (my $line = <$fh>)
{
chomp $line;
@code = (@code, $line);
}
#foreach my $i (0..$#code)
#{
#print "$code[$i]\n";
#}
#
foreach my $j (0..$#code)
{
if ($code[$j] =~ /^\s+\..*/)
{
print "$code[$j]\n";
$code[$j] =~ s/^\s+\.(.*)/\.$1\($1\)/g;
print "$code[$j]\n";
}
}
foreach my $k (0..$#code)
{
print $pr "$code[$k]\n";
}
close $pr;
The substitution produces .in1;(in1;) instead of .in1(in1); and, for the array, .[31:0] ext;([31:0] ext;).
How can I do this in a better way?
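One possible approach, sketched under assumptions: the unwanted output comes from (.*) capturing everything up to the end of the line, including the semicolon. Capturing only the identifier with (\w+) and matching the semicolon outside the group avoids that. The helper name connect_by_name is hypothetical, and the sketch assumes that an array line such as .[31:0] ext; should become .ext(ext); with the range dropped, which the question does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite ".name;" (optionally ".[msb:lsb] name;")
# as ".name(name);" while keeping the leading whitespace.
sub connect_by_name {
    my ($line) = @_;
    # $1 = indentation, optional [range] is matched but not captured,
    # $2 = the signal name; the trailing ';' stays outside the capture.
    $line =~ s/^(\s+)\.(?:\[[^\]]*\]\s*)?(\w+)\s*;/$1.$2($2);/;
    return $line;
}

print connect_by_name("  .in1;"), "\n";
print connect_by_name("  .[31:0] ext;"), "\n";
```

Lines that do not start with whitespace followed by a dot are returned unchanged, so the helper can be applied to every line of the file.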

Unmatched ) in reg when using lc function

I am trying to run the following code:
$lines = "Enjoyable )) DAY";
$lines =~ lc $lines;
print $lines;
It fails on the second line where I get the error mentioned in the title. I understand the brackets are causing the trouble. I think I could use "quotemeta", but the thing is that my string contains info that I go on to process later, so I would like to keep the string intact as far as possible and not tamper with it too much.
You have two problems here.
1. =~ is used to execute a specific set of operations
The =~ operator is used to either match with //, m//, qr// or a string; or to substitute with s/// or tr///.
If all you want to do is lowercase the contents of $lines then you should use = not =~.
$lines = "Enjoyable )) DAY";
$lines = lc $lines;
print $lines;
2. Regular expressions have special characters which must be escaped
If you want to match $lines against a lower-case version of $lines, which should return true if $lines was already entirely lower case and false otherwise, then you need to escape the ")" characters.
#!/usr/bin/env perl
use strict;
use warnings;
my $lines = "enjoyable )) day";
if ($lines =~ lc quotemeta $lines) {
print "lines is lower case\n";
}
print $lines;
Note this is a toy example trying to find a reason for doing $lines =~ lc $lines - It would be much better (faster, safer) to solve this with eq as in $lines eq lc $lines.
See perldoc -f quotemeta or http://perldoc.perl.org/functions/quotemeta.html for more details on quotemeta.
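The eq comparison suggested above can be sketched as follows. The helper name is_lower_case is made up for illustration; no regex is involved, so the parentheses in the string need no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is unchanged by lc,
# i.e. it contains no upper-case characters.
sub is_lower_case {
    my ($s) = @_;
    return $s eq lc $s;
}

for my $lines ("enjoyable )) day", "Enjoyable )) DAY") {
    print "'$lines' is ", (is_lower_case($lines) ? "" : "not "), "lower case\n";
}
```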
=~ is used for regular expressions. "lc" is not part of the regex syntax; it's a function, used like this: $new = lc($old);
I don't recall the regex operator for lowercase, because I use lc() all the time.

Perl: how to split string without storing into array and continue split?

I guess this has been asked before, but I can't find it.
Say
my $string = "something_like:this-and/that";
my @w1 = split(/_/, $string);
my @w2 = split(/-/, $w1[1]);
my @w3 = split(/:/, $w2[0]);
print $w3[1]; #print out "this"
Is there any way to avoid the temporary array variables @w1, @w2 and @w3 and get $w3[1] directly? I remember that chaining split works, but I forget the syntax.
Thanks.
Yes, it's possible, but would be much harder to read, so isn't advised:
my $string = "something_like:this-and/that";
my $this = (split /:/, (split /-/, (split(/_/, $string))[1])[0])[1];
print $this; #print out "this"
Alternatively, you could use a regex in this instance, but I don't think it adds anything:
my $string = "something_like:this-and/that";
my ($this) = $string =~ /.*?_.*?:([^-]*)/ or warn "not found";
print $this;
Your own solution unnecessarily splits on underscores, unless your real data is significantly different from your example. You could write this
use strict;
use warnings;
my $string = "something_like:this-and/that";
my $value = (split /-/, (split /:/, $string)[1])[0];
print $value;
Or this solution uses regular expressions and does what you ask
use strict;
use warnings;
my $string = "something_like:this-and/that";
my ($value) = $string =~ /:([^_-]*)/;
print $value;
output
this
This will modify $string in place:
my $string = "something_like:this-and/that";
$string =~ s/^.*:(.+)-.*/$1/;

perl find and replace ../ and &nbsp;

I am using Perl to replace all instances of
../../../../../../abc and &nbsp;
in a string with
/ and a space, respectively.
The method I am using looks like this:
sub encode
{
    my $result = $_[0];
    $result =~ s/..\/..\/..\/..\/..\/..\//\//g;
    $result =~ s/&nbsp;/ /g;
    return $result;
}
Is this correct?
Essentially, yes, although the first regex has to be written differently: because . matches any character, we have to escape it as \. or put it in its own character class [.]. The first regex can also be written more cleanly as
...;
$result =~ s{ (?: [.][.]/ ){6} }
{/}gx;
...;
We look for the literal pattern ../ repeated 6 times and then replace it. Because I use curly braces as a delimiter I don't have to escape the slash. Because I use the /x modifier I can have these spaces inside the regex improving readability.
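As a runnable illustration of the fragment above, here is a minimal sketch; the sample path is made up.

```perl
use strict;
use warnings;

my $result = "../../../../../../abc/def";   # made-up sample input

# /x allows the whitespace inside the pattern; the curly-brace
# delimiters mean the slashes need no escaping.
$result =~ s{ (?: [.][.]/ ){6} }
            {/}gx;

print "$result\n";
```

Note that /x only affects the pattern part; the replacement part is taken literally, so it must not contain stray spaces.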
Try this. It will print /foo bar/baz.
#!/usr/bin/perl -w
use strict;
my $result = "../../../../../../foo&nbsp;bar/baz";
#$result =~ s/(\.\.\/)+/\//g; #for any number of ../
$result =~ s/(\.\.\/){6}/\//g; #for 6 exactly
$result =~ s/&nbsp;/ /g;
print $result . "\n";
You forgot the abc, I think:
sub encode
{
    my $result = $_[0];
    $result =~ s/(?:\.\.\/){6}abc/\//g;
    $result =~ s/&nbsp;/ /g;
    return $result;
}

What does =~ mean in Perl?

In a Perl program I am examining (namely plutil.pl), I see a lot of =~ in the XML parser portion. For example, here is UnfixXMLString (lines 159 to 167 in version 1.7):
sub UnfixXMLString {
    my ($s) = @_;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&amp;/&/g;
    return $s;
}
From what I can tell, it's taking a string, modifying it with the =~ operator, then returning that modified string, but what exactly is it doing?
=~ is the Perl binding operator. It's generally used to apply a regular expression to a string; for instance, to test if a string matches a pattern:
if ($string =~ m/pattern/) {
Or to extract components from a string:
my ($first, $rest) = $string =~ m{^(\w+):(.*)$};
Or to apply a substitution:
$string =~ s/foo/bar/;
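The three uses above can be put together in one short runnable sketch. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $string = "key:some value";   # made-up sample input

# 1. Test for a match: in scalar context =~ yields true or false.
print "has a colon\n" if $string =~ m/:/;

# 2. Extract components: in list context =~ returns the captured groups.
my ($first, $rest) = $string =~ m{^(\w+):(.*)$};
print "$first | $rest\n";

# 3. Substitute: s/// modifies the variable on the left of =~ in place.
$string =~ s/some/any/;
print "$string\n";
```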
=~ is the Perl binding operator. It can be used to determine whether a regular expression match occurred (true or false):
$sentence = "The river flows slowly.";
if ($sentence =~ /river/)
{
print "Matched river.\n";
}
else
{
print "Did not match river.\n";
}