Regex characters should be replaced as a text in perl - perl

I have two input files: a *.TEX file and an *.INI file.
The INI file contains the following (the user can add more patterns, but only in this format):
\bra{([^{}]*)} \langle $1|\sprangle
\ket{([^{}]*)} \splangle|$1\rangle
\braket{([^{}]*)} \langle $1\rangle
\set{([^{}]*)} \{$1\}
INPUT file contains:
\bra{ahdhaodoaddo4092039585038}
\ket{su093unfs}
\braket{adlkgnaogoaj}
\set{982509unflksndl}
OUTPUT file should be:
\langle ahdhaodoaddo4092039585038|\sprangle
\splangle|su093unfs\rangle
\langle adlkgnaogoaj\rangle
\{982509unflksndl\}
My code:
use strict;
use warnings;
my $texfile = "brktt.tex"; my $inifile = "brck.ini";
my $texcnt = ""; my $inicnt = "";
readFileinString($texfile,\$texcnt);
readFileinString($inifile,\$inicnt);
my %iniStore = ();
while($inicnt=~m/^([^\t]*)\t([^\t]*)/mgs)
{
my $find = $1; my $rep = $2;
$texcnt=~s/$find/$rep/g;
}
Could someone help me on this one.

Several issues:
readFileinString isn't provided, so we don't know exactly what's going on.
[^\t]* matches newlines as well.
$1 inside a variable doesn't interpolate in replacement. You need either /e or eval, but such an approach is fragile and dangerous. I tried to find a better way: just split the replacement on $1 and glue the parts back with the string $1 replaced by the real $1.
\ need to be doubled to keep their literal meaning.
And some minor ones:
/s changes what . matches, but you use no dot in the regex.
#! /usr/bin/perl
use warnings;
use strict;
sub readFileinString {
my ($filename, $content) = @_;
open my $fh, '<', $filename or die "$filename: $!";
$$content = do { local $/; <$fh> };
}
my ($texfile, $inifile) = @ARGV;
readFileinString($texfile, \ my $texcnt);
readFileinString($inifile, \ my $inicnt);
while ($inicnt =~ /^([^\t]*)\t(.*)/mg) {
my ($find, $rep) = ($1, $2);
$find =~ s/\\/\\\\/g;
my ($pre, $post) = split /\$1/, $rep;
$texcnt =~ s/$find/$pre$1$post/g;
}
print $texcnt;
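To see the $1 point in isolation, here is a minimal sketch. The sample string and the variable names $naive and $fixed are made up for illustration; it only shows that a "$1" stored inside a variable is not re-expanded, and how splitting the template around it gets the intended result.

```perl
use strict;
use warnings;

my $text = 'x=42';
my $rep  = '<$1>';    # single quotes: a literal dollar sign and digit

# The replacement interpolates $rep once; the "$1" inside it is not re-expanded.
(my $naive = $text) =~ s/x=(\d+)/$rep/;

# Split the template around the literal "$1" and let the real $1 fill the gap.
my ($pre, $post) = split /\$1/, $rep, 2;
(my $fixed = $text) =~ s/x=(\d+)/$pre$1$post/;

print "$naive\n";    # prints <$1>
print "$fixed\n";    # prints <42>
```

The limit of 2 passed to split keeps a trailing empty field, so a template that ends in $1 still yields a defined $post.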

You want to evaluate the second part of each INI file line, so replacing
$texcnt=~s/$find/$rep/g;
with
$texcnt=~s/$find/eval "qq($rep)"/eg;
might do the job, though using eval is usually a bad (and slow) idea.

Related

Grep using perl

I'm trying to grep multiple patterns from a log file using Perl. For the first pattern I get the desired match via the read-only capture variables ($1, $2, ...). But for the next pattern, the capture variable returns the previous value rather than the value matching the second pattern.
here is the code:
$tmp = `grep "solo_video_channel_.*(0): queueing" $log`;
chomp($tmp);
$tmp =~ m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/;
$chnl = $2;
$page = $3;
$timestamp = $1;
$tmp1 = `grep "(0): DUMP GO" $log`;
chomp($tmp1);
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
$dmp = $1;
print "dump go time = $1\n";
The value of $tmp1 after the grep is as expected, but the value of $1 remains the same as the previous one.
Any suggestions?
Always make sure that you verify that a regex matched before using a captured variable.
Additionally, there is no reason to shell out to grep. Use Perl's file processing instead:
use strict;
use warnings;
local @ARGV = $log;
while (<>) {
chomp;
if (/solo_video_channel_.*\(0\): queueing/) {
if ( my ( $timestamp, $chnl, $page ) = m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/ ) {
print "$. - $timestamp, $chnl, $page\n";
}
}
if ( my ($dmp) = m/(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
print "dump go time = $dmp\n";
}
}
Note, your first set of if's could almost certainly be combined into a single if statement, but I left it as is for now.
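The stale $1 from the question can be reproduced in a few lines. This is only a sketch with made-up sample strings; it shows that a failed match leaves $1 at its previous value, and that assigning the match in list context avoids the problem.

```perl
use strict;
use warnings;

my $first  = '12:00: queueing page 7';
my $second = 'nothing to see here';

$first  =~ /^(.*): queueing page (\d+)/;    # succeeds, $1 is '12:00'
$second =~ /^(.*): DUMP GO/;                # fails, $1 is left untouched
my $stale = $1;                             # still '12:00'

# In list context a failed match returns an empty list, so $dmp stays undef.
my ($dmp) = $second =~ /^(.*): DUMP GO/;

print "stale: $stale\n";
print defined $dmp ? "dump: $dmp\n" : "no dump line\n";
```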
Why not use Pure Perl? It's faster than running external greps. Plus, you can grep both regular expressions at once. Faster than looping through the file twice.
Always check the result of your regexp match. Here I'm using if statements to do this. Note too that I am printing all lines that don't match, prefixed with UNMATCHED LINES. You can remove the else when you see that everything is working, or simply redirect with 2> /dev/null.
use strict;
use warnings;
use autodie;
use feature qw(say);
my $log = "log.txt";
open my $log_fh, "<", $log;
while ( my $line = <$log_fh> ) {
my $timestamp;
my $channel;
my $page;
my $gotime;
if ( $line =~ /(.*):.*solo_video_channel_(.*):\s+queueing page (.*)/ ) {
$timestamp = $1;
$channel = $2;
$page = $3;
say qq(Timestamp = "$timestamp" Channel = "$channel" Page = "$page");
}
elsif ( $line =~ /(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
$gotime = $1;
say "Dump Go Time = $1";
}
else {
say STDERR qq(UNMATCHED LINES: "$line");
}
}
close $log_fh;
In the second regexp you need to escape the literal parentheses:
$tmp1 =~ m/(.*): solo_video_channel_write\(0\): DUMP GO/
This is because the expression \(0\) matches the literal text (0).
With that pattern, matching strings include, for example:
37: solo_video_channel_write(0): DUMP GO
In contrast, the expression (0) matches the single character 0 and sets a capture group.
With the regexp given in your original question
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
matching would occur on strings such as
37: solo_video_channel_write0: DUMP GO
Of course, in the original program the strings are not in this format, so they do not match and $1 is not set.
The regular expression syntax of the shell program grep is (confusingly) different.
To use parentheses for a capture group in grep's default (basic) syntax, they must be escaped with a backslash, which is the opposite of the syntax in Perl.
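The difference between the two Perl patterns can be checked directly. The sketch below uses the sample line from this answer; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = '37: solo_video_channel_write(0): DUMP GO';

# Unescaped: (0) is a capture group around the character 0, so "write0" is expected.
my $unescaped = $line =~ /(.*): solo_video_channel_write(0): DUMP GO/ ? 1 : 0;

# Escaped: \(0\) matches the three literal characters "(0)".
my ($time) = $line =~ /(.*): solo_video_channel_write\(0\): DUMP GO/;

print "unescaped matched: $unescaped\n";    # prints 0
print "time: $time\n";                      # prints 37
```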

doing a substitution until certain condition is true

I'm trying to edit a text using Perl. I need to make a substitution, but the substitution must not be applied once a specific word is found in the text. So, imagine I want to replace every "hello" with "goodbye", but the substitution must stop once the word "foo" is found.
I tried to do this:
use warnings;
use strict;
$/ = undef;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
while(<F>) {
do {s/hello/goodbay/} until (m{foo});
print;
}
close F;
But, as a result, only the first "hello" of my text is changed.
Any suggestion?
Trying to think what would be the most efficient. It should be one of the following:
s{^(.*?)(foo|\z)}{
my $s = $1;
$s =~ s{hello}{goodbay}g;
$s.$2
}se;
print;
or (same as above, but requires 5.14+)
s{^(.*?)(foo|\z)}{ s{hello}{goodbay}gr . $2 }se;
print;
or
my $pos = /foo/ ? $-[0] : length;
my $s = substr($_, 0, $pos, '');
$s =~ s{hello}{goodbay}g;
print($s);
print;
All three work even if foo isn't present.
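As a quick check of the first variant, here it is wrapped in a small function. This is a sketch: replace_before_foo is a hypothetical name, and the sample inputs are made up. Both captures are copied before the inner s/// runs, which is a cautious choice rather than something the original snippet does.

```perl
use strict;
use warnings;

# Hypothetical helper: replace "hello" only in the text before the first "foo".
sub replace_before_foo {
    my ($text) = @_;
    $text =~ s{^(.*?)(foo|\z)}{
        my ($head, $stop) = ($1, $2);    # copy before the inner s/// runs
        $head =~ s/hello/goodbye/g;
        $head . $stop
    }se;
    return $text;
}

print replace_before_foo("hello a\nhello foo hello\n");
print replace_before_foo("hello hello\n");
```

The first call leaves the "hello" after foo alone; the second has no foo, so \z ends the match and every "hello" is replaced.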
This solution uses less memory:
# Assumes foo will always be present
# (though it could be expanded to handle that).
# Assumes foo isn't a regex pattern.
local $/ = "foo";
$_ = <$fh>;
chomp;
s{hello}{goodbay}g;
print;
print $/;
local $/;
print <$fh>;
If the substrings you work on (the hello and foo of your example) are single words, an easy way would probably be to replace $/ = undef; with $/ = " ";. Currently you slurp in the whole file at once, meaning the while loop gets executed at most once.
That is because there is only one "line" in the whole input after you told Perl that there are no line separators.
If you use a space as the input separator, it will loop over the input word by word and hopefully work as you intend.
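A minimal sketch of that idea follows, assuming words are separated by single spaces. It reads from an in-memory string instead of a file, and the $seen_foo flag is an addition for illustration, since something still has to remember that foo went by.

```perl
use strict;
use warnings;

my $input = "hello world hello foo hello end\n";
open my $fh, '<', \$input or die "in-memory open failed: $!";

my ($out, $seen_foo) = ('', 0);
{
    local $/ = " ";    # each read now returns one word plus its trailing space
    while (my $word = <$fh>) {
        $seen_foo ||= $word =~ /foo/;
        $word =~ s/hello/goodbye/ unless $seen_foo;
        $out .= $word;
    }
}
print $out;    # prints: goodbye world goodbye foo hello end
```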
Use a flag variable:
use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
my $replace=1;
while(<F>) {
$replace = 0 if m{foo};
s/hello/goodbye/g if $replace;
print;
}
close F;
This stops at the line containing the end pattern. It will be slightly more complicated if you want to substitute up to just before the match.
This answer uses ${^PREMATCH} and the related variables introduced in Perl 5.10.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
my $foo_found;
while (my $line = <>) {
if (!$foo_found) {
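if ($line =~ m/foo/ip) {
# only replace hellos in the part before foo;
# copy the match variables first, since they are read-only
# and are reset by the next successful match
my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
$pre =~ s/hello/goodbye/ig;
$line = "$pre$match$post";
$foo_found ++;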
} else {
$line =~ s/hello/goodbye/ig;
}
}
print $line;
}
Given the following input:
hello cruel world
hello baseball
hello mudda, hello fadda
foo
The rest of the hellos should stay
Last hello
I get the following output
goodbye cruel world
goodbye baseball
goodbye mudda, goodbye fadda
foo
The rest of the hellos should stay
Last hello
If you don't have 5.10 you can use $` and related variables but they come with a performance hit. See perldoc perlvar for details.

How to search and replace string in a file in Perl

The content of my input file is shown below:
abc\**def\ghi**\abc\!!!!!
abc\**4nfiug\frgrefd\gtefe\wf4fs**\abc\df3gwddw
abc\**eg4/refw**\abc\f3
I need to replace whatever string is in between abc\ and \abc on each line of my input file with ABC\CBA.
I have tried something like below to get the strings that need to be replaced. But I get stuck when I need to use the search and replace:
my $string1 = qr/abc\W+([^a]+)/;
my $string2 = map{/$string1/ => " "} @input_file; # The string that needs to be replaced
my $string3 = 'ABC\CBA' # The string I want to replace it with
s/$string2/$string3/g
How can I fix this?
perl -i -pe 's/this/that/g;' file1
A one-liner to fix a file:
perl -plwe 's/abc\\\K.*(?=\\abc)/ABC\\CBA/' input.txt > output.txt
Or as a script:
use strict;
use warnings;
while (<DATA>) {
s/abc\\\K.*(?=\\abc)/ABC\\CBA/;
print;
}
__DATA__
abc\**def\ghi**\abc\!!!!!
abc\**4nfiug\frgrefd\gtefe\wf4fs**\abc\df3gwddw
abc\**eg4/refw**\abc\f3
The \K (keep) escape sequence means the characters matched before it are kept and not replaced. Similarly, the look-ahead assertion (?= ... ) checks for that part without consuming it, so it is kept as well. I assumed you only wanted to change the characters in between.
Instead of \K one can use a look-behind assertion: (?<=abc\\). As a personal preference, I used \K instead.
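For comparison, here are the two forms side by side on the first line of the sample data. This is just a sketch; the variable names are made up.

```perl
use strict;
use warnings;

my $line = 'abc\**def\ghi**\abc\!!!!!';

# \K version: everything matched before \K is left in place.
(my $with_keep = $line) =~ s/abc\\\K.*(?=\\abc)/ABC\\CBA/;

# Look-behind version: abc\ must precede the match but is not part of it.
(my $with_lookbehind = $line) =~ s/(?<=abc\\).*(?=\\abc)/ABC\\CBA/;

print "$with_keep\n";          # prints abc\ABC\CBA\abc\!!!!!
print "$with_lookbehind\n";    # prints abc\ABC\CBA\abc\!!!!!
```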
#!/usr/bin/perl
use strict;
use warnings;
open my $fh,"<", "tryit.txt" or die $!;
while (my $line = <$fh>) {
$line =~ s/(abc\\)(.*?)(\\abc)/$1ABC\\CBA$3/;
print $line;
}
gives the following with the input data.
abc\ABC\CBA\abc\!!!!!
abc\ABC\CBA\abc\df3gwddw
abc\ABC\CBA\abc\f3
If you do not want the substitution to operate on the default variable $_, you have to use the =~ operator:
#!/usr/bin/perl
use warnings;
use strict;
my @input_file = split /\n/, <<'__EOF__';
abc\**def\ghi**\abc\!!!!!
abc\**4nfiug\frgrefd\gtefe\wf4fs**\abc\df3gwddw
abc\**eg4/refw**\abc\f3
__EOF__
my $pattern = qr/abc\\.*\\abc/; # pattern to be matched
my $string2 = join "\n", @input_file; # the string that needs to be replaced
my $string3 = 'ABC\CBA'; # the string I want to replace it with
$string2 =~ s/$pattern/$string3/g;
print $string2;
To address your comment about replacing text "inplace" in the file directly, you can use the -i switch for a one-liner. In a script, you can perhaps look at using Tie::File, which allows read-write access to lines of a file as (mutable) elements in an array. To copy Mike/TLP's answer:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
tie my @file, "Tie::File", "tryit.txt" or die $!;
# I think you have to use $_ here (done implicitly)
for (@file) {
s/(abc\\)(.*?)(\\abc)/$1ABC\\CBA$3/;
print;
}

How to match Chinese character in the web pages whose charset is big5 (Perl)?

I am trying to match Chinese words, for example "语言中心", against a number of web files (php, html, htm, etc.).
However, somehow I get the following error:
Malformed UTF-8 character (1 byte, need 2, after start byte 0xdf) in regexp compilation at ../Final_FindOnlyNoReplace_CLE_Chinese.pl line 89, <INFILE> line 12.
Can anyone help?
Here is my code.
#!/usr/bin/env perl
use Encode qw/encode decode/;
use utf8;
use strict;
use Cwd;
use LWP::UserAgent;
my($path) = @_;
## append a trailing / if it's not there
$path .= '/' if($path !~ /\/$/);
use File::Glob ':glob';
my @all_files = bsd_glob($path."*");
for my $eachFile (@all_files) {
open(INFILE, "<$eachFile") || die ("Could not open '$eachFile'\n");
my(@inlines) = <INFILE>;
my($line, $find);
my $outkey = 1;
foreach $line (@inlines) {
$find = &find($line);
if ($find ne 'false') {
chomp($line);
print "\tline$outkey : $line\n";
}
$outkey ++;
}
}
#subroutine
sub find {
my $m = encode("utf8", decode("big5", @_));
my $html = LWP::UserAgent->new
->get($m)
->decoded_content;
my $str_chinese = '語言中心';
if ($m =~ /$str_chinese/) {
$m; ##if match, return the whole line.
}
}
You aren't searching in the $html you've retrieved and decoded, but in the URL instead: $m =~ /$str_chinese/, which, I guess, is not what you intend.
Also, you're comparing the result of the find function with the exact string "false", which will never work. Change if ($find ne 'false') to if (defined($find)) and add explicit returns for success and failure to find for clarity.
Finally, your script seems to fail because you point it at a directory that contains some other Perl scripts among other files. They're most likely in UTF-8, so when your script tries to read them as big5 data, it fails on decoding. Just change your glob to cover data files only.
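The "explicit returns plus defined()" advice might look like the sketch below. It is not the asker's code: find_in_line is a hypothetical helper, the input is assumed to be one line of raw Big5 bytes, and the sample data is produced with encode() only so the example is self-contained.

```perl
use strict;
use warnings;
use utf8;                          # the source file itself is UTF-8
use Encode qw(decode encode);

my $needle = '語言中心';

# Hypothetical helper: takes raw Big5 bytes, returns the decoded line
# on a match and undef otherwise.
sub find_in_line {
    my ($raw) = @_;
    my $text = decode('big5', $raw);
    return $text if $text =~ /\Q$needle\E/;
    return undef;
}

my $hit  = find_in_line(encode('big5', 'abc 語言中心 cde'));
my $miss = find_in_line(encode('big5', 'abc cde'));

print defined $hit  ? "first line matches\n"  : "first line does not match\n";
print defined $miss ? "second line matches\n" : "second line does not match\n";
```

The caller then tests the result with defined() instead of comparing against a string such as 'false'.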
#!/usr/bin/env perl
use utf8;
use strictures;
use LWP::UserAgent qw();
use Path::Class::Rule qw();
use URI::file qw();
my $start_directory = q(.);
my $search_text = qr'語言中心';
my $next = Path::Class::Rule->new->name(qw(*.php *.htm*))->iter($start_directory);
my @matching_lines;
while (my $file = $next->()) {
for my $line (split /\R/, LWP::UserAgent
->new
->get(URI::file->new_abs($file))
->decoded_content
) {
push @matching_lines, $line if $line =~ $search_text;
}
}
# @matching_lines is (
# '<title>Untitled 語言中心 Document</title>',
# 'abc 語言中心 cde',
# '天天向上語言中心他'
# )

How can I find the strings from one file in another file in Perl?

The script below takes function names in a text file and scans on a
folder that contains multiple c,h files. It opens those files one-by-one and
reads each line. If the match is found in any part of the files, it prints the
line number and the line that contains the match.
Everything is working fine except that the comparison is not working properly. I would be very grateful to whoever solves my problem.
#program starts:
use FileHandle;
print "ENTER THE PATH OF THE FILE THAT CONTAINS THE FUNCTIONS THAT YOU WANT TO
SEARCH: ";#getting the input file
our $input_path = <STDIN>;
$input_path =~ s/\s+$//;
open(FILE_R1,'<',"$input_path") || die "File open failed!";
print "ENTER THE PATH OF THE FUNCTION MODEL: ";#getting the folder path that
#contains multiple .c,.h files
our $model_path = <STDIN>;
$model_path =~ s/\s+$//;
our $last_dir = uc(substr ( $model_path,rindex( $model_path, "\\" ) +1 ));
our $output = $last_dir."_FUNC_file_names";
while(our $func_name_input = <FILE_R1> )#$func_name_input is the function name
#that is taken as the input
{
$func_name_input=reverse($func_name_input);
$func_name_input=substr($func_name_input,rindex($func_name_input,"\("+1);
$func_name_input=reverse($func_name_input);
$func_name_input=substr($func_name_input,index($func_name_input," ")+1);
#above 4 lines are func_name_input is choped and only part of the function
#name is taken.
opendir FUNC_MODEL,$model_path;
while (our $file = readdir(FUNC_MODEL))
{
next if($file !~ m/\.(c|h)/i);
find_func($file);
}
close(FUNC_MODEL);
}
sub find_func()
{
my $fh1 = FileHandle->new("$model_path//$file") or die "ERROR: $!";
while (!$fh1->eof())
{
my $func_name = $fh1->getline(); #getting the line
**if($func_name =~$func_name_input)**#problem here it does not take the
#match
{
next if($func_name=~m/^\s+/);
print "$.,$func_name\n";
}
}
}
$func_name_input=substr($func_name_input,rindex($func_name_input,"\("+1);
You're missing an ending parenthesis. Should be:
$func_name_input=substr($func_name_input,rindex($func_name_input,"\(")+1);
There's probably an easier way than those four statements, too. But it's a little early to wrap my head around it all. Do you want to match "foo" in "function foo() {"? If so, you could use a regex like /\s+([^( ]+)/.
When you say $func_name =~$func_name_input, you're treating all characters in $func_name_input as special regex characters. If this is not what you mean to do, you can use quotemeta (perldoc -f quotemeta): $func_name =~quotemeta($func_name_input) or $func_name =~ qr/\Q$func_name_input\E/.
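A small sketch of the difference, with a made-up function name that happens to contain a metacharacter:

```perl
use strict;
use warnings;

my $name = 'foo(';               # contains a regex metacharacter
my $line = 'int foo(void) {';

# Interpolated raw, the "(" opens a group that is never closed, so compilation dies.
my $raw_ok    = eval { $line =~ /$name/ ? 1 : 0 };
my $raw_error = defined $raw_ok ? '' : $@;

# \Q...\E (or quotemeta) escapes the metacharacters, so the name matches literally.
my $quoted_ok = $line =~ /\Q$name\E/ ? 1 : 0;

print "raw pattern failed to compile\n" if $raw_error;
print "quoted match: $quoted_ok\n";    # prints 1
```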
Debugging will be easier with strictures (and a syntax-highlighting editor). Also note that, if you're not using those variables in other files, "our" doesn't do anything "my" wouldn't do for file-scoped variables.
find + xargs + grep does 90% of what you want.
find . -name '*.[ch]' | xargs grep -n your_pattern
ack does it even easier.
ack --type=cc your_pattern
Simply take your list of patterns from your file and "or" them together.
ack --type=cc 'foo|bar|baz'
This has the benefit of searching the files only once, rather than once for each pattern as you're doing.
I still think you should just use ack, but your code needed some serious love.
Here is an improved version of your program. It now takes the directory to search and patterns on the command line rather than having to ask for (and the user write) files. It searches all the files under the directory, not just the ones in the directory, using File::Find. It does this in one pass by concatenating all the patterns into regular expressions. It uses regexes instead of index() and substr() and reverse() and oh god. It simply uses built in filehandles rather than the FileHandle module and checking for eof(). Everything is declared lexical (my) instead of global (our). Strict and warnings are on for easier debugging.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
die "Usage: search_directory function ...\n" unless @ARGV >= 2;
my $Search_Dir = shift;
my $Pattern = build_pattern(@ARGV);
find(
{
wanted => sub {
return unless $File::Find::name =~ m/\.(c|h)$/i;
find_func($File::Find::name, $Pattern);
},
no_chdir => 1,
},
$Search_Dir
);
# Join all the function names into one pattern
sub build_pattern {
my @patterns;
for my $name (@_) {
# Turn foo() into foo. This replaces all that reverse() and rindex()
# and substr() stuff.
$name =~ s{\(.*}{};
# Use \Q to protect against regex metacharacters in the input
push #patterns, qr{\Q$name\E};
}
# Join them up into one pattern.
return join "|", @patterns;
}
sub find_func {
my( $file, $pattern ) = @_;
open(my $fh, "<", $file) or die "Can't open $file: $!";
while (my $line = <$fh>) {
# XXX not all functions are unindented, but your choice
next if $line =~ m/^\s+/;
print "$file:$.: $line" if $line =~ $pattern;
}
}