I'm trying to grep multiple patterns from a log file using perl. For the first pattern i'm getting the desired matching pattern via read only variable($1,$2..). But for the next pattern the read only variable is returning the previous value but not the value matching the second pattern.
here is the code:
$tmp = `grep "solo_video_channel_.*(0): queueing" $log`;
chomp($tmp);
$tmp =~ m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/;
$chnl = $2;
$page = $3;
$timestamp = $1;
$tmp1 = `grep "(0): DUMP GO" $log`;
chomp($tmp1);
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
$dmp = $1;
print "dump go time = $1\n";
tmp1's value after grep is coming as expected. but $1 value remains same as the previous one.
Any suggestions?
Always make sure that you verify that a regex matched before using a captured variable.
Additionally, there is no reason to shell out to grep. Use Perl's file processing instead:
use strict;
use warnings;
local #ARGV = $log;
while (<>) {
chomp;
if (/solo_video_channel_.*\(0\): queueing/) {
if ( my ( $timestamp, $chnl, $page ) = m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/ ) {
print "$. - $timestamp, $chnl, $page\n";
}
}
if ( my ($dmp) = m/(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
print "dump go time = $dmp\n";
}
}
Note, your first set of if's could almost certainly be combined into a single if statement, but I left it as is for now.
Why not use Pure Perl? It's faster than running external greps. Plus, you can grep both regular expressions at once. Faster than looping through the file twice.
Always check the value of your rexp match. Here I'm using if statements to do this. Note too that I am printing all lines that don't match with UNMATCHED LINES. You can remove the else when you see that everything is working, or simply redirect 2> /dev/null.
use strict;
use warnings;
use autodie;
use feature qw(say);
my $log = "log.txt";
open my $log_fh, "<", $log;
while ( my $line = <$log_fh> ) {
my $timestamp;
my $channel;
my $page;
my $gotime;
if ( $line =~ /(.*):.*solo_video_channel_(.*):\s+queueing page (.*)/ ) {
$timestamp = $1;
$channel = $2;
$page = $3;
say qq(Timestamp = "$timestamp" Channel = "$channel" Page = "$page");
}
elsif ( $line =~ /(.*): solo_video_channel_write(0): DUMP GO/ ) {
$gotime = $1;
say "Dump Go Time = $1";
}
else {
say STDERR qq(UNMATCHED LINES: "$line");
}
}
close $log_fh;
In the second regexp you need to escape the literal brackets
$tmp1 =~ m/(.*): solo_video_channel_write\(0\): DUMP GO/
This is because the expression \(0\) matches the exact pattern (0)
In the example given in this answer this would include strings such as
37: solo_video_channel_write(0): DUMP GO
In contrast, the expression (0) matches the exact pattern 0 and sets a capture group.
With the regexp given in your original question
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
matching would occur on strings such as
37: solo_video_channel_write0: DUMP GO
Of course in the original program the strings are not in this format, so they do not match and $1 is not set
The regular expression syntax for the shell program grep is (confusingly) different
To use round brackets for setting a capture group they must be escaped with a backslash, which is the opposite to the syntax in perl
Related
I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my #matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } #matchingLines;
#matchingLines = ();
}
if ($spool) {
push (#matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that is in between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (#matchingLines, $_);
to
push (#matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } #matchingLines;
into
local $" = "\n";
print "#matchingLines \n";
$" specifies the delimiter used for list elements when the array is printed with print or say inside double quotes.
Alternatively, as per TLP's suggestion,
$\ = $/;
print for #lines;
or
print join("\n", #lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my #matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for #matching_lines; #say() is the same as print(), but it adds a newline at the end
#matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push #matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my #matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push #matches, /^(\S+)/;
}
}
# #matches should now contain the first words of those lines
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
It looks like you are not using strict and warnings. Use them, especially since you do not know Perl, you might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $. are very useful use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA_ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}
I'd to read a file, e.g. test.test which contains
#test:testdescription\n
#cmd:binary\n
#return:0\n
#stdin:|\n
echo"toto"\n
echo"tata"\n
#stdout:|\n
toto\n
tata\n
#stderr:\n
I succeeded in taking which are after #test: ; #cmd: etc...
but for stdin or stdout, I want to take all the line before the next # to a table #stdin and #stdout.
I do a loop while ($line = <TEST>) so it will look at each line. If i see a pattern /^#stdin:|/, I want to move to the next line and take this value to a
table until i see the next #.
How do I move to the next line in the while loop?
This file format can be easily handled with some creativity in selecting the appropriate value for $/:
use strict; use warnings;
my %parsed;
{
local $/ = '#';
while ( my $line = <DATA> ) {
chomp $line;
my $content = (split /:/, $line, 2)[1];
next unless defined $content;
$content =~ s/\n+\z//;
if ( my ($chan) = $line =~ /^(std(?:err|in|out))/ ) {
$content =~ s/^\|\n//;
$parsed{$chan} = [ split /\n/, $content];
}
elsif ( my ($var) = $line =~ /^(cmd|return|test)/ ) {
$parsed{ $var } = $content;
}
}
}
use YAML;
print Dump \%parsed;
__DATA__
#test:testdescription
#cmd:binary
#return:0
#stdin:|
echo"toto"
echo"tata"
#stdout:|
toto
tata
#stderr:
Output:
---
cmd: binary
return: 0
stderr: []
stdin:
- echo"toto"
- echo"tata"
stdout:
- toto
- tata
test: testdescription
UPDATED as per user's colmments
If I understand the question correctly, you want to read one more line within a loop?
If so, you can either:
just do another line read inside the loop.
my $another_line = <TEST>;
Keep some state flag and use it next iteration of the loop, and accumulate lines between stdins in a buffer:
my $last_line_was_stdin = 0;
my #line_buffer = ();
while ($line = <TEST>) {
if (/^#stdin:|/) {
#
# Some Code to process all lines acccumulated since last "stdin"
#
#line_buffer = ();
$last_line_was_stdin = 1;
next;
}
push #line_buffer, $line;
}
This solution may not do 100% of what you need but it defines a pattern you need to follow in your state machine implementation: read a line. Check your current state (if it matters). Based on the current state and a pattern in the line, verify what do do about the current line (add to the buffer? change the state? If changing a state, process the buffer based on last state?)
Also, as per your comment, you have a bug in your regex - the pipe (| character) means "OR" in regex, so you are saying "if line starts with #stdin OR matches an empty regex" - the latter part is always true so your regex will match 100% of time. You need to escape the "|" via /^#stdin:\|/ or /^#stdin:[|]/
Some of the environment variables which we use in Unix are as below (just an example):
VAR1=variable1
VAR2=variable2
VAR3=variable3
# and so on
Now, I have a perl script (let's call it test.pl) which reads a tab delimited text file (let's call it test.txt) and pushes the contents of it columnwise in separate arays. The first column of test.txt contains the following information for example (the strings in first column are delimited by / but I do not know how may / a string would contain and at what position the environment variable would appear):
$VAR1/$VAR2/$VAR3
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2
The extract of the script is as below:
use strict;
my $input0 = shift or die "must provide test.txt as the argument 0\n";
open(IN0,"<",$input0) || die "Cannot open $input0 for reading: $!";
my #first_column;
while (<IN0>)
{
chomp;
my #cols = split(/\t/);
my $first_col = `eval $cols[0]`; #### but this does not work
# here goes the push stmt to populate the array
### more code here
}
close(IN0);
Question: How can I access environment variables in such a situation so that the array is populated as below:
$first_column[0] = variable1/vraible2/variable3
$first_column[1] = variable3/some_string/SOME_OTHER_STRING/and_so_on/variable2
$first_column[2] = variable2/variable1/some_string/some_string_2/some_string_3/some_string_n/variable2
I think you are looking for a way to process configuration files. I like Config::Std for that purpose although there are many others on CPAN.
Here is a way of processing just the contents of $cols[0] to show in an explicit way what you need to do with it:
#!/usr/bin/perl
use strict; use warnings;
# You should not type this. I am assuming the
# environment variables are defined in the environment.
# They are here for testing.
#ENV{qw(VAR1 VAR2 VAR3)} = qw(variable1 variable2 variable3);
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
chomp $line;
my #components = split qr{/}, $line;
for my $c ( #components ) {
if ( my ($var) = $c =~ m{^\$(\w+)\z} ) {
if ( exists $ENV{$var} ) {
$c = $ENV{$var};
}
}
}
print join('/', #components), "\n";
}
__DATA__
$VAR1/$VAR2/$VAR3
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2
Instead of the split/join, you can use s/// to replace patterns that look like variables with the corresponding values in %ENV. For illustration, I put a second column in the __DATA__ section which is supposed to stand for a description of the path, and turned each line in to a hashref. Note, I factored out the actual substitution to eval_path so you can try alternatives without messing with the main loop:
#!/usr/bin/perl
use strict; use warnings;
# You should not type this. I am assuming the
# environment variables are defined in the environment.
# They are here for testing.
#ENV{qw(VAR1 VAR2 VAR3)} = qw(variable1 variable2 variable3);
my #config;
while ( my $config = <DATA> ) {
last unless $config =~ /\S/;
chomp $config;
my #cols = split /\t/, $config;
$cols[0] = eval_path( $cols[0] );
push #config, { $cols[1] => $cols[0] };
}
use YAML;
print Dump \#config;
sub eval_path {
my ($path) = #_;
$path =~ s{\$(\w+)}{ exists $ENV{$1} ? $ENV{$1} : $1 }ge;
return $path;
}
__DATA__
$VAR1/$VAR2/$VAR3 Home sweet home
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2 Man oh man
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2 Can't think of any other witty remarks ;-)
Output:
---
- Home sweet home: variable1/variable2/variable3
- Man oh man: variable3/some_string/SOME_OTHER_STRING/and_so_on/variable2
- Can't think of any other witty remarks ;-): variable2/variable1/some_string/some_string_2/some_string_3/some_string_n/variable2
I think you just want to do this:
my #cols = map { s/(\$(\w+))/ $ENV{$2} || $1 /ge; $_ } split /\t/;
What you would do here is after you split them you would take each sequence of '$' followed by word characters and check to see if there was an environment variable for the word portion of it, otherwise leave it as is.
The e switch on a substitution allows you to execute code for the replacement value.
If you expect a '0' for any environment variable value, it's better off to do a defined or, that came in with 5.10.
my #cols = map { s|(\$(\w+))| $ENV{$2} // $1 |ge; $_ } split /\t/;
(Ignore the markup. // is a defined-or, not a C-comment)
If you want to allow for full shell expansions, one option to use the shell to do the expansion for you, perhaps via echo:
$ cat input
$FOO
bar
${FOO//cat/dog}
$ FOO=cat perl -wpe '$_ = qx"echo $_"' input
cat
bar
dog
If you cannot trust the contents of the environment variable, this introduces a security risk, as invoking qx on a string may cause the shell to invoke commands embedded in the string. As a result, this scriptlet will not run under taint mode (-T).
Perl keeps its environment variables in %ENV, in your case you can change your code like so:
my $first_col = $ENV[$cols[0]];
I'm maintaining a script that can get its input from various sources, and works on it per line. Depending on the actual source used, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed(!).
When reading from a file it goes something like this:
#lines = <IN>;
process(\#lines);
...
sub process {
#lines = shift;
foreach my $line (#{$lines}) {
chomp $line;
#Handle line by line
}
}
So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks.
I'm coming up with way too many ways of solving this, one of the usual drawbacks of Perl :)
What's your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?
Edit: A small clarification - the method 'process' gets a list of lines from somewhere, not nessecarily read from a file. Each line might have
No trailing linebreaks
Unix-style linebreaks
Windows-style linebreaks
Just Carriage-Return (when original data has Windows-style linebreaks and is read with $/ = '\n')
An aggregated set where lines have different styles
After digging a bit through the perlre docs a bit, I'll present my best suggestion so far that seems to work pretty good. Perl 5.10 added the \R character class as a generalized linebreak:
$line =~ s/\R//g;
It's the same as:
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
I'll keep this question open a while yet, just to see if there's more nifty ways waiting to be suggested.
Whenever I go through input and want to remove or replace characters I run it through little subroutines like this one.
sub clean {
my $text = shift;
$text =~ s/\n//g;
$text =~ s/\r//g;
return $text;
}
It may not be fancy but this method has been working flawless for me for years.
$line =~ s/[\r\n]+//g;
Reading perlport I'd suggest something like
$line =~ s/\015?\012?$//;
to be safe for whatever platform you're on and whatever linefeed style you may be processing because what's in \r and \n may differ through different Perl flavours.
Note from 2017: File::Slurp is not recommended due to design mistakes and unmaintained errors. Use File::Slurper or Path::Tiny instead.
extending on your answer
use File::Slurp ();
my $value = File::Slurp::slurp($filename);
$value =~ s/\R*//g;
File::Slurp abstracts away the File IO stuff and just returns a string for you.
NOTE
Important to note the addition of /g , without it, given a multi-line string, it will only replace the first offending character.
Also, the removal of $, which is redundant for this purpose, as we want to strip all line breaks, not just line-breaks before whatever is meant by $ on this OS.
In a multi-line string, $ matches the end of the string and that would be problematic ).
Point 3 means that point 2 is made with the assumption that you'd also want to use /m otherwise '$' would be basically meaningless for anything practical in a string with >1 lines, or, doing single line processing, an OS which actually understands $ and manages to find the \R* that proceed the $
Examples
while( my $line = <$foo> ){
$line =~ $regex;
}
Given the above notation, an OS which does not understand whatever your files '\n' or '\r' delimiters, in the default scenario with the OS's default delimiter set for $/ will result in reading your whole file as one contiguous string ( unless your string has the $OS's delimiters in it, where it will delimit by that )
So in this case all of these regex are useless:
/\R*$// : Will only erase the last sequence of \R in the file
/\R*// : Will only erase the first sequence of \R in the file
/\012?\015?// : When will only erase the first 012\015 , \012 , or \015 sequence, \015\012 will result in either \012 or \015 being emitted.
/\R*$// : If there happens to be no byte sequences of '\015$OSDELIMITER' in the file, then then NO linebreaks will be removed except for the OS's own ones.
It would appear nobody gets what I'm talking about, so here is example code, that is tested to NOT remove line feeds. Run it, you'll see that it leaves the linefeeds in.
#!/usr/bin/perl
use strict;
use warnings;
my $fn = 'TestFile.txt';
my $LF = "\012";
my $CR = "\015";
my $UnixNL = $LF;
my $DOSNL = $CR . $LF;
my $MacNL = $CR;
sub generate {
my $filename = shift;
my $lineDelimiter = shift;
open my $fh, '>', $filename;
for ( 0 .. 10 )
{
print $fh "{0}";
print $fh join "", map { chr( int( rand(26) + 60 ) ) } 0 .. 20;
print $fh "{1}";
print $fh $lineDelimiter->();
print $fh "{2}";
}
close $fh;
}
sub parse {
my $filename = shift;
my $osDelimiter = shift;
my $message = shift;
print "Parsing $message File $filename : \n";
local $/ = $osDelimiter;
open my $fh, '<', $filename;
while ( my $line = <$fh> )
{
$line =~ s/\R*$//;
print ">|" . $line . "|<";
}
print "Done.\n\n";
}
my #all = ( $DOSNL,$MacNL,$UnixNL);
generate 'Windows.txt' , sub { $DOSNL };
generate 'Mac.txt' , sub { $MacNL };
generate 'Unix.txt', sub { $UnixNL };
generate 'Mixed.txt', sub {
return #all[ int(rand(2)) ];
};
for my $os ( ["$MacNL", "On Mac"], ["$DOSNL", "On Windows"], ["$UnixNL", "On Unix"]){
for ( qw( Windows Mac Unix Mixed ) ){
parse $_ . ".txt", #{ $os };
}
}
For the CLEARLY Unprocessed output, see here: http://pastebin.com/f2c063d74
Note there are certain combinations that of course work, but they are likely the ones you yourself naĆvely tested.
Note that in this output, all results must be of the form >|$string|<>|$string|< with NO LINE FEEDS to be considered valid output.
and $string is of the general form {0}$data{1}$delimiter{2} where in all output sources, there should be either :
Nothing between {1} and {2}
only |<>| between {1} and {2}
In your example, you can just go:
chomp(#lines);
Or:
$_=join("", #lines);
s/[\r\n]+//g;
Or:
#lines = split /[\r\n]+/, join("", #lines);
Using these directly on a file:
perl -e '$_=join("",<>); s/[\r\n]+//g; print' <a.txt |less
perl -e 'chomp(#a=<>);print #a' <a.txt |less
To extend Ted Cambron's answer above and something that hasn't been addressed here: If you remove all line breaks indiscriminately from a chunk of entered text, you will end up with paragraphs running into each other without spaces when you output that text later. This is what I use:
sub cleanLines{
my $text = shift;
$text =~ s/\r/ /; #replace \r with space
$text =~ s/\n/ /; #replace \n with space
$text =~ s/ / /g; #replace double-spaces with single space
return $text;
}
The last substitution uses the g 'greedy' modifier so it continues to find double-spaces until it replaces them all. (Effectively substituting anything more that single space)