Perl regex to capture strings between anchor words - perl

I am still working on cleaning up Oracle files, having to replace strings in files where the Oracle schema name is prepended to the function/procedure/package name within the file, as well as when the function/procedure/package name is double-quoted. Once the definition is corrected, I write the correction back to the file, along with the rest of the actual code.
I have code written to replace simple declarations (no input/output parameters). Now I am trying to get my regex to operate on declarations that do take parameters. (Note: this post is a continuation of this question.) Some examples of what I'm trying to clean up:
Replace:
CREATE OR REPLACE FUNCTION "TRON2000"."DC_F_DUMP_CSV_MMA" (
p_trailing_separator IN BOOLEAN DEFAULT FALSE,
p_max_linesize IN NUMBER DEFAULT 32000,
p_mode IN VARCHAR2 DEFAULT 'w'
)
RETURN NUMBER
IS
to
CREATE OR REPLACE FUNCTION DC_F_DUMP_CSV_MMA (
p_trailing_separator IN BOOLEAN DEFAULT FALSE,
p_max_linesize IN NUMBER DEFAULT 32000,
p_mode IN VARCHAR2 DEFAULT 'w'
)
RETURN NUMBER
IS
I have been trying to use the following regex to separate the declaration, for later reconstruction after I've cleaned out the schema name and fixed the name of the function/procedure/package so that it is no longer double-quoted. I am struggling with getting each part into a buffer. Here's my latest attempt to grab all the middle input/output into its own buffer:
\b(CREATE\sOR\sREPLACE\s(PACKAGE|PACKAGE\sBODY|PROCEDURE|FUNCTION))(?:\W+\w+){1,100}?\W+(RETURN)\s*(\W+\w+)\s(AS|IS)\b
Any / all help is GREATLY appreciated!
This is the script that I'm using right now to evaluate / write the corrected files:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Data::Dumper;
# utility to clean strings
sub trim($) {
my $string = shift;
$string = "" if !defined($string);
$string =~ s/^\s+//;
$string =~ s/\s+$//;
# aggressive removal of blank lines
$string =~ s/\n+/\n/g;
return $string;
}
sub cleanup_packages {
my $file = shift;
my $tmp = $file . ".tmp";
my $package_name;
open( OLD, "< $file" ) or die "open $file: $!";
open( NEW, "> $tmp" ) or die "open $tmp: $!";
while ( my $line = <OLD> ) {
# look for the first line of the file to contain a CREATE OR REPLACE STATEMENT
if ( $line =~
m/^(CREATE\sOR\sREPLACE)\s*(PACKAGE|PACKAGE\sBODY)?\s(.+)\s(AS|IS)?/i
)
{
# look ahead to next line, in case the AS/IS is next
my $nextline = <OLD>;
# from the above IF clause, the package name is in buffer 3
$package_name = $3;
# if the package name and the AS/IS is on the same line, and
# the package name is quoted/prepended by the TRON2000 schema name
if ( $package_name =~ m/"TRON2000"\."(\w+)"(\s*|\S*)(AS|IS)/i ) {
# grab just the name and the AS/IS parts
$package_name =~ s/"TRON2000"\."(\w+)"(\s*|\S*)(AS|IS)/$1 $2/i;
trim($package_name);
}
elsif ( ( $package_name =~ m/"TRON2000"\."(\w+)"/i )
&& ( $nextline =~ m/(AS|IS)/ ) )
{
# if the AS/IS was on the next line from the name, put them together on one line
$package_name =~ s/"TRON2000"\."(\w+)"(\s*|\S*)/$1/i;
$package_name = trim($package_name) . ' ' . trim($nextline);
trim($package_name); # remove trailing carriage return
}
# now put the line back together
$line =~
s/^(CREATE\sOR\sREPLACE)\s*(PACKAGE|PACKAGE\sBODY|FUNCTION|PROCEDURE)?\s(.+)\s(AS|IS)?/$1 $2 $package_name/ig;
# and print it to the file
print NEW "$line\n";
}
else {
# just a normal line - print it to the temp file
print NEW $line or die "print $tmp: $!";
}
}
# close up the files
close(OLD) or die "close $file: $!";
close(NEW) or die "close $tmp: $!";
# rename the temp file as the original file name
unlink($file) or die "unlink $file: $!";
rename( $tmp, $file ) or die "can't rename $tmp to $file: $!";
}
# find and clean up oracle files
sub eachFile {
my $ext;
my $filename = $_;
my $fullpath = $File::Find::name;
if ( -f $filename ) {
($ext) = $filename =~ /(\.[^.]+)$/;
}
else {
# ignore non files
return;
}
if ( $ext =~ /(\.spp|\.sps|\.spb|\.sf|\.sp)/i ) {
print "package: $filename\n";
cleanup_packages($fullpath);
}
else {
print "$filename not specified for processing!\n";
}
}
MAIN:
{
my ( @files, $file );
my $dir = 'C:/1_atest';
# grab all the files for cleanup
find( \&eachFile, "$dir/" );
#open and evaluate each
foreach $file (@files)
{
# skip . and ..
next if ( $file =~ /^\.$/ );
next if ( $file =~ /^\.\.$/ );
cleanup_file($file);
};
}

Assuming the entire content of a file is stored as scalar in a var, the following should do the trick.
$Str = '
CREATE OR REPLACE FUNCTION "TRON2000"."DC_F_DUMP_CSV_MMA" (
p_trailing_separator IN BOOLEAN DEFAULT FALSE,
p_max_linesize IN NUMBER DEFAULT 32000,
p_mode IN VARCHAR2 DEFAULT w
)
RETURN NUMBER
IS
CREATE OR REPLACE FUNCTION "TRON2000"."DC_F_DUMP_CSV_MMA" (
p_trailing_separator IN BOOLEAN DEFAULT FALSE,
p_max_linesize IN NUMBER DEFAULT 32000,
p_mode IN VARCHAR2 DEFAULT w
)
RETURN NUMBER
IS
';
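# escape the dot between schema and name; $1 already ends in whitespace, so no extra space is added
$Str =~ s#^(create\s+(?:or\s+replace\s+)?\w+\s+)"[^"]+"\."([^"]+)"#$1$2#mig;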
print $Str;
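A slightly more defensive variant of the substitution above might look like the sketch below. The helper name strip_schema and the explicit list of object types are assumptions made for illustration, not part of the original answer. The explicit list lets the pattern cover PACKAGE BODY, which a single \w+ would not match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): removes a quoted schema
# prefix and the quotes around the object name in CREATE statements.
sub strip_schema {
    my ($sql) = @_;
    $sql =~ s{
        ^(\s*create\s+(?:or\s+replace\s+)?                    # CREATE, optional OR REPLACE
          (?:package\s+body|package|procedure|function)\s+)   # object type
        "[^"]+"\."([^"]+)"                                    # quoted schema, dot, quoted name
    }{$1$2}xmig;
    return $sql;
}

my $sample = <<'SQL';
CREATE OR REPLACE FUNCTION "TRON2000"."DC_F_DUMP_CSV_MMA" (
p_mode IN VARCHAR2 DEFAULT 'w'
)
RETURN NUMBER
IS
SQL

print strip_schema($sample);
```

Because the whole file content is passed in as one string, the parameter list and the RETURN/IS lines are left untouched and do not need to be captured and reassembled.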

Related

How to check whether one file's value contains in another text file? (perl script)

I would like to check whether the values in one file are also present in another file. If a value is present, the script should report that there is an existing bin for that entry; if not, it should report that there is no existing bin. The problem is that I am not sure how to check all the values at once.
The first text file, DID1, contains:
L84A:D:O:M:
L84C:B:E:D:
The second text file, DID, contains:
L84A:B:E:Q:X:F:i:M:Y:
L84C:B:E:Q:X:F:i:M:Y:
L83A:B:E:Q:X:F:i:M:Y:
If the first four characters match, all the values on that line need to be checked.
For example, L84A appears in both files and both lines contain the value M, so the script should print that there is an existing M bin.
Below is my code:
use strict;
use warnings;
my $filename = 'DID.txt';
my $filename1 = 'DID1.txt';
my $count = 0;
open( FILE2, "<$filename1" )
or die("Could not open log file. $!\n");
while (<FILE2>) {
my ($number) = $_;
chomp($number);
my @values1 = split( ':', $number );
open( FILE, "<$filename" )
or die("Could not open log file. $!\n");
while (<FILE>) {
my ($line) = $_;
chomp($line);
my @values = split( ':', $line );
foreach my $val (@values) {
if ( $val =~ /$values1[0]/ ) {
$count++;
if ( $values[$count] =~ /$values1[$count]/ ) {
print
"Yes ,There is an existing bin & DID\n @values1\n";
}
else {
print "No, There is an existing bin & DID\n";
}
}
}
}
}
I cannot check all the values. Please give me any advice on this, since this is my first time learning Perl. Thanks a lot :)
Based on my understanding, I wrote this code:
use strict;
use warnings;
#use ReadWrite;
use Array::Utils qw(:all);
use vars qw($my1file $myfile1cnt $my2file $myfile2cnt @output);
$my1file = "did1.txt"; $my2file = "did2.txt";
We read both the first and the second file (DID1 and DID2).
readFileinString($my1file, \$myfile1cnt); readFileinString($my2file, \$myfile2cnt);
As per the OP's request, the first four characters of each line in the first file are matched against the second file; if they match, the remaining values on that line are checked against the matching line of the second file.
while($myfile1cnt=~m/^((\w){4})\:([^\n]+)$/mig)
{
print "<LineStart>";
my $lineChk = $1; my $full_Line = $3; #print ": $full_Line\n";
my @First_values = split /\:/, $full_Line; #print join "\n", @First_values;
If the first four characters match, then:
if($myfile2cnt=~m/^$lineChk\:([^\n]+)$/m)
{
Store the rest of the matched line and split it on colons to get the values to compare with the contents of the first file.
my $FullLine = $1; my @second_values = split /:/, $FullLine;
Then check each value from the first file against the values on the matching line of the second file...
foreach my $sngletter(#First_values)
{
If a value is present in both files, it is printed.
if( grep {$_ eq "$sngletter"} @second_values)
{
print "Matched: $sngletter\t";
}
}
}
else { print "Not Matched..."; }
This just marks the end of the line in the output.
print "<LineEnd>\n"
}
#------------------>Reading a file
sub readFileinString
#------------------>
{
my $File = shift;
my $string = shift;
use File::Basename;
my $filenames = basename($File);
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}
Read the search patterns and the data into hashes (the first field is the key), then go through the data and select only the fields that are included in the pattern for that key.
use strict;
use warnings;
use feature 'say';
my $input1 = 'DID1.txt'; # look for key,pattern(array)
my $input2 = 'DID.txt'; # data - key,elements(array)
my $pattern;
my $data;
my %result;
$pattern = file2hash($input1); # read pattern into hash
$data = file2hash($input2); # read data into hash
while( my($k,$v) = each %{$data} ) { # walk through data
next unless defined $pattern->{$k}; # skip keys that are not in the pattern hash
my $find = join '|', @{ $pattern->{$k} }; # form search pattern for grep
my @found = grep {/$find/} @{ $v }; # extract only those of interest
$result{$k} = \@found; # store in result hash
}
while( my($k,$v) = each %result ) { # walk through result hash
say "$k has " . join ':', @{ $v }; # output final result
}
sub file2hash {
my $filename = shift;
my %hash;
my $fh;
open $fh, '<', $filename
or die "Couldn't open $filename";
while(<$fh>) {
chomp;
next if /^\s*$/; # skip empty lines
my($key,@data) = split ':';
$hash{$key} = \@data;
}
close $fh;
return \%hash;
}
Output
L84C has B:E
L84A has M
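The grep {/$find/} step above is a regex substring match, which is fine for single-letter values. If exact comparison is wanted instead, a hash lookup is one alternative. The sketch below is only an illustration: the data is hard-coded from the question rather than read with file2hash, and the %common hash is a name introduced here.

```perl
use strict;
use warnings;

# Sample data copied from the question; in a real script these hashes
# would be filled from DID1.txt and DID.txt.
my %pattern = (
    L84A => [qw(D O M)],
    L84C => [qw(B E D)],
);
my %data = (
    L84A => [qw(B E Q X F i M Y)],
    L84C => [qw(B E Q X F i M Y)],
    L83A => [qw(B E Q X F i M Y)],
);

my %common;
for my $key ( sort keys %pattern ) {
    next unless exists $data{$key};                    # key missing from data file
    my %wanted = map { $_ => 1 } @{ $pattern{$key} };  # values to look for
    $common{$key} = [ grep { $wanted{$_} } @{ $data{$key} } ];
    print "$key has ", join( ':', @{ $common{$key} } ), "\n";
}
```

Sorting the keys also makes the output order stable, which each on a hash does not guarantee.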

How do I chomp off everything after a character?

I want to create a Perl program to take in a file, and for each line, chomp off everything after a certain character (let's say a /). For example, consider this example file:
foo1/thing 1.1.1 bar
foo2/item 2.3.2 bar
foo3/thing 3.4.5 bar
I want to remove everything after the slash on each line and print it out, so that that file becomes:
foo1
foo2
foo3
I tried to use this program, with readline in a foreach loop, but the output was not what I expected:
print ( "Enter file name: " ) ;
my $filename = <> ;
$/ = '' ;
chomp $filename ;
my $file = undef ;
open ( $file, "< :encoding(UTF-8)", $filename ) ;
$/ = '/' ;
foreach ( <$file> ) {
chomp ;
print ;
}
But all this does is remove the slashes from each line.
foo1thing 1.1.1 bar
foo2item 2.3.2 bar
foo3thing 3.4.5 bar
How can I alter this to produce the output I need?
As far as I know, the input record separator ($/) is a plain string and cannot be a regex.
You could proceed as follows:
print ( "Enter file name: " ) ;
my $filename = <> ;
chomp $filename ;
open ( my $file, "< :encoding(UTF-8)", $filename )
or die "could not open file $filename: $!";
while ( my $line = <$file> ) {
$line =~ s{/.*}{}s;
print "$line\n";
}
The substitution s{/.*}{}s matches the first slash and everything after it, and removes it (along with the trailing newline, because the /s modifier lets . match a newline).
Note: always check for errors when using open(), as noted in the documentation:
When opening a file, it's seldom a good idea to continue if the request failed, so open is frequently used with die.
$line =~ s{/.*}{}s; # In-place (destructive)
or
my ($extracted) = $line =~ m{([^/]*)}; # Returns (non-destructive)
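As a small self-contained illustration of the non-destructive form, the sketch below runs it over in-memory lines instead of a file. The sample lines (including one without a slash) and the @heads array are assumptions made for the example.

```perl
use strict;
use warnings;

# Sample input; the last line has no slash and is kept whole.
my @lines = (
    "foo1/thing 1.1.1 bar\n",
    "foo2/item 2.3.2 bar\n",
    "no slash here\n",
);

my @heads;
for my $line (@lines) {
    chomp( my $copy = $line );                  # work on a copy, drop the newline
    my ($head) = $copy =~ m{^([^/]*)};          # everything before the first slash
    push @heads, $head;
}

print "$_\n" for @heads;
```

The original lines are left unchanged, which is useful when the full line is still needed later in the loop.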

How to get a comment printed for each line of text that matches within a file?

I am trying to match each keyword/text/line given in a file called expressions.txt against all files matching *main_log. When a match is found, I want to print the comment for each line that matches.
Is there a better way to get this printed?
expressions.txt
Hello World ! # I want to print this comments#
Bye* #I want this to print when Bye Is match with main_log#
:::
:::
Below is the code I used:
{
open( my $kw, '<', 'expressions.txt' ) or die $!;
my @keywords = <$kw>;
chomp( @keywords ); # remove newlines at the end of keywords
# get list of files in current directory
my @files = grep { -f } ( <*main_log>, <*Project>, <*properties> );
# loop over each file to search keywords in
foreach my $file ( @files ) {
open( my $fh, '<', $file ) or die $!;
my @content = <$fh>;
close( $fh );
my $l = 0;
foreach my $kw ( @keywords ) {
my $search = quotemeta( $kw ); # otherwise keyword is used as regex, not literally
#$kw =~ m/\[(.*)\]/;
$kw =~ m/\((.*)\)/;
my $temp = $1;
print "$temp\n";
foreach ( @content ) { # go through every line for this keyword
$l++;
printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_ if /$search/;
}
}
}
I tried this code to print the comments given within parentheses (...), but it does not print in the way I want, which is as follows:
If expressions.txt contains
Hello World ! # I want to print this comments#
and the string Hello World ! is found in my file called main_log, then only Hello World ! should be matched against main_log, but # I want to print this comments# should be printed as a comment to help the user understand the keyword.
These keywords can be of any length and contain any characters.
It worked fine, but I have one remaining doubt about writing the output to a file. I have used perl -w Test.pl > my_output.txt at the command prompt, but I am not sure how to do the same from inside the Perl script itself.
open( my $kw, '<', 'expressions.txt') or die $!;
my @keywords = <$kw>;
chomp(@keywords); # remove newlines at the end of keywords
# post-processing your keywords file
my $kwhashref = {
map {
/^(.*?)(#.*?#)*$/;
defined($2) ? ($1 => $2) : ( $1 => undef )
} @keywords
};
# get list of files in current directory
my @files = grep { -f } (<*main_log>,<*Project>,<*properties>);
# loop over each file to search keywords in
foreach my $file (@files) {
open(my $fh, '<', $file) or die $!;
my @content = <$fh>;
close($fh);
my $l = 0;
#foreach my $kw (@keywords) {
foreach my $kw (keys %$kwhashref) {
my $search = quotemeta($kw); # otherwise keyword is used as regex, not literally
#$kw =~ m/\[(.*)\]/;
#$kw =~ m/\#(.*)\#/;
#my $temp = $1;
#print "$temp\n";
foreach (@content) { # go through every line for this keyword
$l++;
if (/$search/)
{
# only print if comment defined
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}) ;
printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_
#printf '$output';
}
}
}
}
Your example code has mismatched braces { ... } and won't compile.
If you were to add another closing brace to the end of your code then it would compile, but the line
$kw =~ m/\((.*)\)/;
will never succeed, since there are no parentheses anywhere in expressions.txt. If a match does not succeed, $1 retains its value from the most recent successful regex match operation.
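The stale capture behaviour can be seen in a few lines. This is a minimal sketch with made-up strings; the variable names are introduced here for illustration only.

```perl
use strict;
use warnings;

my $with    = "key (first)";
my $without = "no parentheses";

$with    =~ /\((.*)\)/;    # succeeds: $1 is now "first"
$without =~ /\((.*)\)/;    # fails: $1 is left untouched
my $stale = $1;            # still "first", although $without has no match

# Safer: capture in list context, which yields undef when the match fails.
my ($safe) = $without =~ /\((.*)\)/;

print "stale: $stale\n";
print "safe:  ", ( defined $safe ? $safe : "(undef)" ), "\n";
```

Assigning the captures in list context, or testing the match in a condition before reading $1, avoids acting on a value left over from an earlier match.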
You are also trying to search the lines from the files against the whole of the lines retrieved from expressions.txt, when you should be splitting those lines into keywords and their corresponding comments.
This seems to be a follow-up to this answer to another question of yours. What I tried to suggest in the last paragraph would start after the first three lines of your code:
# post-processing your keywords file
my $kwhashref = {
map {
/^(.*?)(#.*?#)*$/;
defined($2) ? ($1 => $2) : ( $1 => undef )
} @keywords
};
Now you have the keywords in a hashref containing the actual keywords to search for as keys, and the comments as values, if they exist (using your #comment# at the end of line syntax here).
Your keyword loop would now have to use keys %$kwhashref, and you can additionally print the comment in the inner loop, converted as shown in the answer I linked. The additional print:
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}); # only print if comment defined
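Put together, the keyword/comment split and the search could look like the sketch below. It is an illustration under assumptions: the entries and log lines are hard-coded stand-ins for expressions.txt and a *main_log file, the regex is a variant that also trims whitespace around the keyword and the comment, and the names %comment_for and @report are introduced here.

```perl
use strict;
use warnings;

# Sample lines in the "#comment#" format used by expressions.txt.
my @entries = (
    'Hello World ! # I want to print this comments#',
    'Bye* #I want this to print when Bye Is match with main_log#',
    'no comment here',
);

# keyword => comment (undef when the line carries no comment)
my %comment_for;
for my $entry (@entries) {
    my ( $keyword, $comment ) =
        $entry =~ /^(.*?)\s*(?:#\s*(.*?)\s*#)?\s*$/;
    $comment_for{$keyword} = $comment;
}

# Hypothetical log lines standing in for the contents of a *main_log file.
my @log = ( "startup: Hello World ! ok\n", "nothing to see\n", "Bye* for now\n" );

my @report;
for my $keyword ( sort keys %comment_for ) {
    my $search  = quotemeta $keyword;    # match the keyword literally
    my $line_no = 0;                     # reset for every keyword
    for my $line (@log) {
        $line_no++;
        next unless $line =~ /$search/;
        my $comment = $comment_for{$keyword};
        push @report, "line $line_no: '$keyword'"
            . ( defined $comment ? " ($comment)" : '' );
    }
}

print "$_\n" for @report;
```

Trimming the keyword matters because the original pattern leaves a trailing space on it, which would then have to appear in the log line for the match to succeed.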

How do I pattern match and keep writing to a new file until another pattern matches

My goal is to find and print all the lines in a "big.v" file starting from pattern match "module" until "endmodule" into individual files.
big.v: module test;
<bunch of code>
endmodule
module foo;
<bunch of code>
endmodule
And the individual files would look like:
test.v : module test;
..
endmodule
foo.v: module foo;
..
endmodule
I got most of it working using:
use strict;
use warnings;
#open(my $fh, ">", $f1) || die "Couldn't open '".$f."' for writing because: ".$!;
while (<>) {
my $line = $_;
if ($line =~ /(module)(\s+)(\w+)(.*)/) {
my $modname = $3;
open(my $fh1, ">", $modname.".v") ;
print $fh1 $line."\n";
## how do i keep writing next lines to this file until following pattern
if ($line =~ /(endmodule)(\s+)(.*)/) { close $fh1;}
}
}
Thanks,
There's a useful perl construct called the 'range operator':
http://perldoc.perl.org/perlop.html#Range-Operators
It works like this:
while ( <$file> ) {
if ( m/startpattern/ .. m/endpattern/ ) {
print;
}
}
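To see what the range operator selects, here is a minimal sketch over in-memory lines. The sample lines and the @kept array are made up for the example; the patterns are anchored so that "module" does not also match inside "endmodule".

```perl
use strict;
use warnings;

my @lines = (
    "// header comment\n",
    "module a;\n",
    "wire x;\n",
    "endmodule\n",
    "// between modules\n",
    "module b;\n",
    "endmodule\n",
);

my @kept;
for (@lines) {
    # In scalar context ".." acts as a flip-flop: it becomes true on a line
    # matching the left pattern and stays true through the line matching the right.
    push @kept, $_ if /^\s*module\b/ .. /^\s*endmodule\b/;
}

print @kept;
```

Lines outside a module/endmodule pair, such as the two comment lines here, are skipped.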
So given your example - I think this should do the trick:
my $output;
while ( my $line = <STDIN> ) {
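# both sides must test $line ($_ is not set here); anchoring keeps "module" from matching "endmodule"
if ( $line =~ m/^\s*module\b/ .. $line =~ m/^\s*endmodule\b/ ) {
my ( $modname ) = ( $line =~ m/^\s*module\s+(\w+)/ );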
if ( defined $modname) {
open ( $output, ">", "$modname.v" ) or warn $!;
}
print {$output} $line;
}
}
Edit: Given your source data, though, I don't think you actually need a range operator. You could just close the current output file and open a new one as you go. This assumes that you can 'cut up' your file based on 'module' lines, which isn't necessarily a valid assumption.
Something more like this:
use strict;
use warnings;
open ( my $input, "<", "big.v" ) or die $!;
my $output;
while ( my $line = <$input> ) {
if ( $line =~ m/^\s*module/ ) {
#start of module line found
#close filehandle if it's open
close($output) if defined $output;
#extract the module name from the line.
my ($modulename) = ( $line =~ m/module\s+(\w+)/ );
#open new output file (overwriting)
open( $output, ">", "$modulename.v" ) or warn $!;
}
#this test might not be necessary.
if ( defined $output ) {
print {$output} $line;
}
}

Output .Resx From .CS using perl script

The .CS files contain strings within double quotes, and I am trying to extract these strings into a .resx file.
The existing code outputs the .resx, but with only one string, whereas the .CS file contains more than one string in quotes.
Can you please provide any reference to achieve this?
use strict;
use warnings;
use File::Find;
use XML::Writer;
use Cwd;
#user input: [Directory]
my $wrkdir = getcwd;
system "attrib -r /s";
print "Processing $wrkdir\n";
find( \&recurse_src_path, $wrkdir );
sub recurse_src_path
{
my $file = $File::Find::name;
my $fname = $_;
my @lines;
my $line;
if ( ( -f $file ) && ( $file =~ /.*\.cs$/i ) )
{
print "..";
open( FILE, $file ) || die "Cannot open $file:\n$!";
while ( $line = <FILE> )
{
if ( $line =~ s/\"(.*?)\"/$1/m )
{
chomp $line;
push( @lines, $line );
my $nl = '0';
my $dataIndent;
my $output = new IO::File(">Test.resx");
#binmode( $output, ":encoding(utf-8)" );
my $writer = XML::Writer->new(
OUTPUT => $output,
DATA_MODE => 1,
DATA_INDENT => 2
);
$writer->xmlDecl("utf-8");
$writer->startTag('root');
foreach my $r ($line)
{
print "$1\n";
$writer->startTag( 'data', name => $_ );
$writer->startTag('value');
$writer->characters($1);
$writer->endTag('value');
$writer->startTag('comment');
$writer->characters($1);
$writer->endTag('comment');
$writer->endTag('data');
}
$writer->endTag('root');
$writer->end;
$output->close();
}
}
close FILE;
}
}
Use the /g regex modifier. For example:
use strict;
use warnings;
my $cs_string = '
// Imagine this is .cs code here
system "attrib -r /s";
print "Processing $wrkdir\n";
find( \&recurse_src_path, $wrkdir );
';
# non-greedy .*? so that two quoted strings on one line are found separately
while ($cs_string =~ /"(.*?)"/g) {
print "Found quoted string: '$1'\n";
}
See also: http://perldoc.perl.org/perlrequick.html#Matching-repetitions
You might also want to look at File::Slurp to read your .cs code into a single Perl scalar, provided that your .cs file is not too large.
Finally combine this with your existing code to get the .resx output format.
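In list context, a /g match returns all captures at once, which is convenient for collecting every string before writing the .resx entries. The sketch below uses a made-up C# fragment and does not handle escaped quotes (\") inside strings; both are assumptions of the example.

```perl
use strict;
use warnings;

# Hypothetical .cs content with more than one quoted string per line.
my $cs = <<'CS';
string a = "first"; string b = "second";
Console.WriteLine("third");
CS

# List-context /g returns every capture; .*? keeps each match within one pair of quotes.
my @strings = $cs =~ /"(.*?)"/g;

print "Found quoted string: '$_'\n" for @strings;
```

Each element of @strings can then be passed to the XML::Writer calls in place of $1, with the writer opened once before the loop rather than once per line.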