I have a script copied from another Stack Overflow question, but it seems to replace the content of the variable; could someone point me to the error? If I remove the if check for the ">OK<" it prints the whole XML to a file; if I put the if back it only prints the line containing the ">OK<". Why is the $xml variable modified by the =~?
# Example usage:
# perl script.pl data.xml RootTag > RootTag.xml
use strict;
use warnings;
my $tag = pop;
while (<>){
if ( s/.*(<$tag>)/$1/ .. s/(<(\/)$tag>).*/$1/ ) {
my $xml = $_;
if ($xml =~ m/>OK</) {
print "$xml";
}
}
}
An example of a input file could be
reioirioree
brebreberbre
rbebrbebre
<test>
<id>1</id>
<status>OK</status>
</test>
bbrtbtrbt
rtbtrb
<test>
<id>2</id>
<status>KO</status>
</test>
brtoibjtrbi
bebbetreb
<test>
<id>3</id>
<status>OK</status>
</test>
dfbreberbreb
berbrebre
In this case, if we use "test" as the parameter, I would like the following output:
<test>
<id>1</id>
<status>OK</status>
</test>
<test>
<id>3</id>
<status>OK</status>
</test>
The objective is to capture the whole tag when it contains a specific pattern (>OK<).
Here is a step-by-step way which spells out the details. I keep your program's interface.
use strict;
use warnings;
my $tag = pop;
my ($inside_tag, $found, @buff);
while (<>)
{
if (s/.*(<$tag>)/$1/) {
$inside_tag = 1;
}
elsif (s|(</$tag>).*|$1|) { #/
$inside_tag = 0;
if ($found) {
print @buff, $_;
$found = 0;
}
@buff = ();
}
next unless $inside_tag;
push @buff, $_;
$found = 1 if />OK</;
}
On the opening tag we set the flag that we are inside the tag. On the closing tag we unset it, and if the marker has been $found we print the buffer (and unset the marker's flag). We also clear the buffer at that point.
Then we skip the iteration if outside of the tag. Otherwise, add the line to the buffer and test for the marker on that line.
A glitch with using the range operator in this problem is that we must know when we are on the closing-tag line, and we would like to know the opening line as well. Then we need further tests and the flip-flop isn't so clean any more. We can use the sequence number that the .. operator returns:
The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.
It would go something like
if (my $seq = /BEG/ .. /END/)
{
if ($seq == 1) { # first line of range
# ...
}
elsif ($seq =~ /E0$/) { # last line of range
# ...
}
else { ... } # inside
}
and I don't see that this is clearer or better than keeping the state manually.
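Still, for reference, here is a minimal runnable sketch of that flip-flop variant applied to the original task. It assumes, as in the sample input, that the opening and closing tags sit on their own lines (so no surrounding text needs to be stripped); it buffers each range and prints it only if the >OK< marker was seen:
use strict;
use warnings;

my $tag = pop;                      # tag name from the command line, as before
my (@buff, $found);

while (<>) {
    if (my $seq = /<$tag>/ .. /<\/$tag>/) {
        push @buff, $_;
        $found = 1 if />OK</;
        if ($seq =~ /E0$/) {        # last line of the range (the closing tag)
            print @buff if $found;
            @buff  = ();
            $found = 0;
        }
    }
}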
Related
I have written a function that uses regex and prints the required string from a command output.
The script works as expected, but it does not support dynamic output. Currently, I use regexes for "icmp" and "OK" and print the values. Now, the type, destination and return code could change. There is also a high chance that the command doesn't return any output at all. How do I handle such scenarios?
sub check_summary{
my ($self) = @_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
Command output:
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
Updated per clarifications -- we need only the last line.
As is often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output -- as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop
my @lines = split '\n', $ipsla; # if $ipsla is a multi-line string
# my @lines = @$ipsla; # if $ipsla is an arrayref
pop @lines while $lines[-1] !~ /\S/; # remove possible empty lines at end
my ($type, $return_type) = (split ' ', $lines[-1])[1,4];
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on white space, by split ' ', $lines[-1], and take the 2nd and 5th element (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them.
Just in case the output ends with empty lines we first remove them, by doing pop @lines as long as the last line has no non-space characters, while $lines[-1] !~ /\S/. That is the same as
while ( $lines[-1] !~ /\S/ ) { pop #lines }
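To see the split-and-slice at work, here is a tiny self-contained snippet; the sample line is copied from the output above, with approximate spacing:
use strict;
use warnings;

# The last data line from the sample output above (spacing approximate)
my $last_line = '*1   icmp   192.168.25.14   RTT=1   OK   1 second ago';

my ($type, $return_type) = (split ' ', $last_line)[1, 4];
print "type: $type, return code: $return_type\n";   # prints: type: icmp, return code: OK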
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that the data starts after the line with only dashes. Set a flag once that line is reached, and process the line(s) if the flag is set. Given the rest of your code, the loop would be:
my $data_start;
foreach (@lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there are more lines than one.
I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = @_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (@$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate
check_summary();
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables...
$1
$2
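For example, here is a small self-contained illustration; the line and pattern are made up just for demonstration:
my $line = '*1   icmp   192.168.25.14';
if ($line =~ /^(\S+)\s+(\S+)\s+(\S+)/) {
    print "$1 $2 $3\n";    # prints: *1 icmp 192.168.25.14
}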
I have searched a fair bit and hope I'm not duplicating something someone has already asked. I have what amounts to a CSV that is specifically formatted (as required by a vendor). There are four values that are being delimited as follows:
"Name","Description","Tag","IPAddresses"
The list is quite long (and there are ~150 unique names--only 2 in the sample below) but it basically looks like this:
"2B_AppName-Environment","desc","tag","192.168.1.1"
"2B_AppName-Environment","desc","tag","192.168.22.155"
"2B_AppName-Environment","desc","tag","10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4"
"6G_ServerName-AltEnv","desc","tag","192.192.192.40"
"6G_ServerName-AltEnv","desc","tag","192.168.50.5"
I am hoping for a way in Perl (or sed/awk, etc.) to come up with the following:
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
So basically, the resulting file will APPEND the duplicates to the first match -- there should only be one line per each app/server name with a list of comma-separated IP addresses just like what is shown above.
Note that the "Decription" and "Tag" fields don't need to be considered in the duplication removal/append logic -- let's assume these are blank for the example to make things easier. Also, in the vendor-supplied list, the "Name" entries are all already sorted to be together.
This short Perl program should suit you. It expects the path to the input CSV file as a parameter on the command line and prints the result to STDOUT. It keeps track of the appearance of new name fields in the @names array so that it can print the output in the order that each name first appears, and it takes the values for desc and tag from the first occurrence of each unique name.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({always_quote => 1, eol => "\n"});
my (@names, %data);
while (my $row = $csv->getline(*ARGV)) {
my $name = $row->[0];
if ($data{$name}) {
$data{$name}[3] .= ','.$row->[3];
}
else {
push @names, $name;
$data{$name} = $row;
}
}
for my $name (@names) {
$csv->print(*STDOUT, $data{$name});
}
output
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
Update
Here's a version that ignores any record that doesn't have a valid IPv4 address in the fourth field. I've used Regexp::Common as it's the simplest way to get complex regex patterns right. It may need installing on your system.
use strict;
use warnings;
use Text::CSV;
use Regexp::Common;
my $csv = Text::CSV->new({always_quote => 1, eol => "\n"});
my (@names, %data);
while (my $row = $csv->getline(*ARGV)) {
my ($name, $address) = @{$row}[0,3];
next unless $address =~ $RE{net}{IPv4};
if ($data{$name}) {
$data{$name}[3] .= ','.$address;
}
else {
push @names, $name;
$data{$name} = $row;
}
}
for my $name (@names) {
$csv->print(*STDOUT, $data{$name});
}
I would advise you to use a CSV parser like Text::CSV for this type of problem.
Borodin has already posted a good example of how to do this.
One approach that I'd advise you NOT to use is regular expressions.
The following one-liner demonstrates how one could do this, but this is a very fragile approach compared to an actual csv parser:
perl -0777 -ne '
while (m{^((.*)"[^"\n]*"\n(?:(?=\2).*\n)*)}mg) {
$s = $1;
$s =~ s/"\n.*"([^"\n]+)(?=")/,$1/g;
print $s
}' test.csv
Outputs:
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
Explanation:
Switches:
-0777: Slurp the entire file
-n: Creates a while(<>){...} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
while (m{^((.*)"[^"]*"\n(?:(?=\2).*\n)*)}mg): Separate text into matching sections.
$s =~ s/"\n.*"([^"\n]+)(?=")/,$1/g;: Join all ip addresses by a comma in matching sections.
print $s: Print the results.
I have some tags with values like below,
<section>
<title id="ABC0123">is The human nervous system?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="DEF0123">Terms for anatomical directions in the nervous system</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="ABC4356">Anatomical terms: is referring to directions</title>
.
.
.
The output I need is like below,
<section>
<title id="ABC0123">Is the Human Nervous System?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="DEF0123">Terms for Anatomical Directions in the Nervous System</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="ABC4356">Anatomical Terms: Is Referring to Directions</title>
.
.
How could I do this using Perl? Here all prepositions and articles will be in lower case. Now the condition differs slightly, as below:
The condition is: if a word that is in @lowercase (suppose "is") is the first word of the title and is in lower case, then it should be upper case. Likewise, any @lowercase word after a colon in the title should be in upper case.
Probably something like this then:
#!/usr/bin/env perl
use strict;
use warnings;
my $lines = qq#
<title>The human nervous system</title>
<title>Terms for anatomical directions in the nervous system</title>
<title>Anatomical terms referring to directions</title>
#;
foreach my $line ( split(/\n/, $lines ) ) {
$line =~ s|</?title>||g;
if ( $line =~ /\w+/ ) { # Skip if blank
print "<title>" . ucfirst(
join(" ",
map{ !/^(in|the|on|or|to|for)$/i ? ucfirst($_) : lc($_); }
split(/\s/, $line )
)
) ."<\/title>\n";
}
}
Or however you want to loop over your file. But you are going to have to filter out the terms you don't want converted, as I have shown.
New answer to match the updated question (sample input and desired output changed since the original question). Updated again on Mar 9, 2014, per the OP's request to always uppercase the first word in a title tag.
#!/usr/bin/perl
use strict;
use warnings;
# Add your articles and prepositions here!!!
my @lowercase = qw(a an at for in is the to);
# Use a hash since lookup is easier later.
my %lowercase;
# Populate the hash with keys and values from @lowercase.
# Values could have been anything, but it needs to match the number of keys, so this is easiest.
@lowercase{@lowercase} = @lowercase;
open(F, "foo.txt") or die $!;
while(<F>) {
if (m/^<title/i) {
chomp;
my @words;
my $line = $_;
# Save the opening <title> tags
my $titleTag = $line;
$titleTag =~ s/^(<[^>]*>).*/$1/;
# Remove any tags in <brackets>
$line =~ s/<[^>]*>//g;
# Uppercase the first letter in every word, except for those in a certain list.
my $first = 1;
foreach my $word (split(/\s/, $line)) {
if ($first) {
$first = 0;
push(@words, ucfirst($word));
next;
}
if ($first || exists $lowercase{$word}) { push(@words, "$word") }
else { push(@words, ucfirst($word)) }
}
print $titleTag . join(" ", #words) . "</title>\n";
}
else {
print $_;
}
}
close(F);
This code does make 2 assumptions:
Each <title>...</title> is on a single line. It never wraps to more
than one line in the file.
The opening <title> tag is at the beginning of the line. This can be easily be changed in the code if desired though.
I want to add a line to a text file in Perl which has data in sorted form. I have seen examples which show how to append data at the end of a file, but I want the data to remain sorted.
Please guide me how can it be done.
Basically from what I have tried so far :
I open the file and grep its contents to see if the line which I want to add already exists. If it does, then exit; else add it to the file (such that the data remains in sorted order).
open(my $FH, $file) or die "Failed to open file $file \n";
@file_data = <$FH>;
close($FH);
my $line = grep (/$string1/, @file_data);
if($line) {
print "Found\n";
exit(1);
}
else
{
#add the line to the file
print "Not found!\n";
}
Here's an approach using Tie::File so that you can easily treat the file as an array, and List::BinarySearch's bsearch_str_pos function to quickly find the insert point. Once you've found the insert point, you check to see if the element at that point is equal to your insert string. If it's not, splice it into the array. If it is equal, don't splice it in. And finish up with untie so that the file gets closed cleanly.
use strict;
use warnings;
use Tie::File;
use List::BinarySearch qw(bsearch_str_pos);
my $insert_string = 'Whatever!';
my $file = 'something.txt';
my @array;
tie @array, 'Tie::File', $file or die $!;
my $idx = bsearch_str_pos $insert_string, @array;
splice @array, $idx, 0, $insert_string
if $array[$idx] ne $insert_string;
untie @array;
The bsearch_str_pos function from List::BinarySearch is an adaptation of a binary search implementation from Mastering Algorithms with Perl. Its convenient characteristic is that if the search string isn't found, it returns the index point where it could be inserted while maintaining the sort order.
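A quick way to see that behaviour, using the same bsearch_str_pos interface as the program above (the word list here is just an example):
use strict;
use warnings;
use List::BinarySearch qw(bsearch_str_pos);

my @sorted = qw(ananas apple banana pear);

# 'lemon' is not in the list: we get the index where it would keep the order
print bsearch_str_pos('lemon', @sorted), "\n";     # 3

# 'banana' is already there: we get its existing index, so no splice is needed
print bsearch_str_pos('banana', @sorted), "\n";    # 2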
Since you have to read the contents of the text file anyway, how about a different approach?
Read the lines in the file one-by-one, comparing against your target string. If you read a line equal to the target string, then you don't have to do anything.
Otherwise, you eventually read a line 'greater' than your current line according to your sort criteria, or you hit the end of the file. In the former case, you just insert the string at that position, and then copy the rest of the lines. In the latter case, you append the string to the end.
If you don't want to do it that way, you can do a binary search in @file_data to find the spot to add the line without having to examine all of the entries, then insert it into the array before outputting the array to the file.
Here's a simple version that reads from stdin (or filename(s) specified on the command line) and appends 'string to append' to the output if it's not found in the input. Output is printed on stdout.
#! /usr/bin/perl
$found = 0;
$append='string to append';
while(<>) {
$found = 1 if (m/$append/o);
print
}
print "$append\n" unless ($found);;
Modifying it to edit a file in-place (with perl -i) and taking the append string from the command line would be quite simple.
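A rough sketch of that modification could look like the following; it is only a sketch, assuming the file is non-empty and ends with a newline (the string to append is pulled off @ARGV in the BEGIN block, and eof marks the last line of each file):
perl -i -pe 'BEGIN { $append = shift }
             $found ||= /\Q$append\E/;
             if (eof) { $_ .= "$append\n" unless $found; $found = 0 }
            ' 'string to append' file.txt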
A 'simple' one-liner to insert a line without using any module could be:
perl -ni -le '$insert="lemon"; $eq=($insert cmp $_); if ($eq == 0){$found++}elsif($eq==-1 && !$found){print $insert; $found++} print'
Given a list.txt whose contents are:
ananas
apple
banana
pear
the output is:
ananas
apple
banana
lemon
pear
{
local ($^I, @ARGV) = ("", $file); # Enable in-place editing of $file
while (<>) {
# If the line is already there, print it once and stop looking
if ($_ eq $insert) { print; last }
# If we found the place where the line should be, insert it
if ($_ gt $insert) {
print $insert;
print;
last;
}
print;
}
# We've passed the insertion point, now output the rest of the file
print while <>;
}
Essentially the same answer as pavel's, except with a lot of readability added. Note that $insert should already contain a trailing newline.
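For completeness, the block above expects $file and $insert to be set beforehand; a hypothetical setup (the values are only illustrative) could be:
my $file   = 'list.txt';
my $insert = "lemon\n";    # note the trailing newline, as mentioned above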
I have basically the following perl I'm working with:
open I,$coupon_file or die "Error: File $coupon_file will not Open: $! \n";
while (<I>) {
$lctr++;
chomp;
my #line = split/,/;
if (!#line) {
print E "Error: $coupon_file is empty!\n\n";
$processFile = 0; last;
}
}
I'm having trouble determining what the split /,/ function returns if an empty file is given to it. The code block if (!@line) is never executed. If I change that to
if (@line)
then the code block is executed. I've read the information on the Perl split function at
http://perldoc.perl.org/functions/split.html and the discussion here about testing for an empty array, but I'm not sure what is going on here.
I am new to Perl so am probably missing something straightforward here.
If the file is empty, the while loop body will not run at all.
Evaluating an array in scalar context returns the number of elements in the array.
split /,/ returns a list with at least one element whenever $_ is defined and non-empty (a line with no commas yields a single-element list); splitting an empty string yields an empty list.
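A minimal demonstration of that behaviour, including the empty-string edge case:
my @none   = split /,/, '';         # ()              -- empty list, so !@none is true
my @one    = split /,/, 'abc';      # ('abc')         -- one element even with no comma
my @fields = split /,/, 'a,b,c';    # ('a', 'b', 'c')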
You might try some debugging:
...
chomp;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper( { "line is" => $_ } );
my @line = split /,/;
print Dumper( { "split into" => \@line } );
if (!@line) {
...
Below are a few tips to make your code more idiomatic:
The special variable $. already holds the current line number, so you can likely get rid of $lctr.
Are empty lines really errors, or can you ignore them?
Pull apart the list returned from split and give the pieces names.
Let Perl do the opening with the "diamond operator":
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the #ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The #ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work.
Say your input is in a file named input and contains
Campbell's soup,0.50
Mac & Cheese,0.25
Then with
#! /usr/bin/perl
use warnings;
use strict;
die "Usage: $0 coupon-file\n" unless #ARGV == 1;
while (<>) {
chomp;
my($product,$discount) = split /,/;
next unless defined $product && defined $discount;
print "$product => $discount\n";
}
that we run as below on Unix:
$ ./coupons input
Campbell's soup => 0.50
Mac & Cheese => 0.25
Empty file or empty line? Regardless, try this test instead of !@line.
if (scalar(@line) == 0) {
...
}
The scalar function returns the array's length in Perl.
Some clarification:
if (@line) {
}
Is the same as:
if (scalar(@line)) {
}
In a scalar context, arrays (like @line) evaluate to their length. So scalar(@line) forces @line into scalar context and returns the length of the array.
I'm not sure whether you're trying to detect if the line is empty (which your code is trying to) or whether the whole file is empty (which is what the error says).
If the line, please fix your error text and the logic should be like the other posters said (or you can put if ($line =~ /^\s*$/) as your if).
If the file, you simply need to test if (!$lctr) {} after the end of your loop -- as noted in another answer, the loop will not be entered if there are no lines in the file.
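A minimal sketch of that second case, reusing the names from your snippet ($lctr, the E error handle, $coupon_file and $processFile are assumed to exist as in your code):
my $lctr = 0;
while (<I>) {
    $lctr++;
    # ... process each line as before ...
}
if (!$lctr) {     # the loop body never ran, so the file was empty
    print E "Error: $coupon_file is empty!\n\n";
    $processFile = 0;
}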