Some of the environment variables which we use in Unix are as below (just an example):
VAR1=variable1
VAR2=variable2
VAR3=variable3
# and so on
Now, I have a Perl script (let's call it test.pl) which reads a tab-delimited text file (let's call it test.txt) and pushes its contents column-wise into separate arrays. The first column of test.txt contains, for example, the following information (the strings in the first column are delimited by /, but I do not know how many / a string will contain or at what position an environment variable will appear):
$VAR1/$VAR2/$VAR3
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2
The extract of the script is as below:
use strict;
my $input0 = shift or die "must provide test.txt as the argument 0\n";
open(IN0,"<",$input0) || die "Cannot open $input0 for reading: $!";
my @first_column;
while (<IN0>)
{
    chomp;
    my @cols = split(/\t/);
    my $first_col = `eval $cols[0]`; #### but this does not work
    # here goes the push stmt to populate the array
    ### more code here
}
close(IN0);
Question: How can I access environment variables in such a situation so that the array is populated as below:
$first_column[0] = variable1/variable2/variable3
$first_column[1] = variable3/some_string/SOME_OTHER_STRING/and_so_on/variable2
$first_column[2] = variable2/variable1/some_string/some_string_2/some_string_3/some_string_n/variable2
I think you are looking for a way to process configuration files. I like Config::Std for that purpose although there are many others on CPAN.
Here is a way of processing just the contents of $cols[0] to show in an explicit way what you need to do with it:
#!/usr/bin/perl
use strict; use warnings;
# You should not type this. I am assuming the
# environment variables are defined in the environment.
# They are here for testing.
@ENV{qw(VAR1 VAR2 VAR3)} = qw(variable1 variable2 variable3);
while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    chomp $line;
    my @components = split qr{/}, $line;
    for my $c ( @components ) {
        if ( my ($var) = $c =~ m{^\$(\w+)\z} ) {
            if ( exists $ENV{$var} ) {
                $c = $ENV{$var};
            }
        }
    }
    print join('/', @components), "\n";
}
__DATA__
$VAR1/$VAR2/$VAR3
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2
Instead of the split/join, you can use s/// to replace patterns that look like variables with the corresponding values in %ENV. For illustration, I put a second column in the __DATA__ section, which stands for a description of the path, and turned each line into a hashref. Note that I factored out the actual substitution into eval_path so you can try alternatives without touching the main loop:
#!/usr/bin/perl
use strict; use warnings;
# You should not type this. I am assuming the
# environment variables are defined in the environment.
# They are here for testing.
@ENV{qw(VAR1 VAR2 VAR3)} = qw(variable1 variable2 variable3);
my @config;
while ( my $config = <DATA> ) {
    last unless $config =~ /\S/;
    chomp $config;
    my @cols = split /\t/, $config;
    $cols[0] = eval_path( $cols[0] );
    push @config, { $cols[1] => $cols[0] };
}
use YAML;
print Dump \@config;
sub eval_path {
    my ($path) = @_;
    $path =~ s{\$(\w+)}{ exists $ENV{$1} ? $ENV{$1} : $1 }ge;
    return $path;
}
__DATA__
$VAR1/$VAR2/$VAR3 Home sweet home
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2 Man oh man
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2 Can't think of any other witty remarks ;-)
Output:
---
- Home sweet home: variable1/variable2/variable3
- Man oh man: variable3/some_string/SOME_OTHER_STRING/and_so_on/variable2
- Can't think of any other witty remarks ;-): variable2/variable1/some_string/some_string_2/some_string_3/some_string_n/variable2
I think you just want to do this:
my @cols = map { s/(\$(\w+))/ $ENV{$2} || $1 /ge; $_ } split /\t/;
After the split, this takes each '$' followed by word characters and checks whether an environment variable exists for the word portion; if not, the text is left as is.
The e switch on a substitution lets you execute code to produce the replacement value.
If any environment variable might hold the value '0', it is better to use the defined-or operator, which came in with Perl 5.10:
my @cols = map { s|(\$(\w+))| $ENV{$2} // $1 |ge; $_ } split /\t/;
(Ignore the markup. // is a defined-or, not a C-comment)
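The substitution above can also be wrapped in a small function, which makes it easy to try on sample values. The following is only a sketch: expand_vars is a hypothetical helper name, the %sample hash stands in for the real environment (pass \%ENV instead), and it uses defined rather than // so it also runs on Perls older than 5.10.

```perl
use strict;
use warnings;

# expand_vars is a hypothetical helper name. It replaces each $NAME in
# $text with the value found in the hash reference $env (pass \%ENV for
# the real environment) and leaves unknown names untouched.
sub expand_vars {
    my ($text, $env) = @_;
    $text =~ s{\$(\w+)}{ defined $env->{$1} ? $env->{$1} : "\$$1" }ge;
    return $text;
}

# Sample values stand in for the real environment; VAR3 is deliberately '0'.
my %sample = (VAR1 => 'variable1', VAR2 => 'variable2', VAR3 => '0');
print expand_vars('$VAR1/some_string/$VAR3/$UNSET', \%sample), "\n";
```

Because the test is defined rather than truth, a value of '0' is substituted correctly, and a name that is not set keeps its leading $.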
If you want to allow for full shell expansions, one option is to use the shell to do the expansion for you, perhaps via echo:
$ cat input
$FOO
bar
${FOO//cat/dog}
$ FOO=cat perl -wpe '$_ = qx"echo $_"' input
cat
bar
dog
If you cannot trust the contents of the environment variable, this introduces a security risk, as invoking qx on a string may cause the shell to invoke commands embedded in the string. As a result, this scriptlet will not run under taint mode (-T).
Perl keeps its environment variables in the %ENV hash. If a column holds just the name of a variable (without the leading $), you can look it up like so:
my $first_col = $ENV{$cols[0]};
I'm trying to grep multiple patterns from a log file using Perl. For the first pattern I get the desired values via the read-only capture variables ($1, $2, ...). For the next pattern, however, $1 returns the previous value rather than the value matching the second pattern.
Here is the code:
$tmp = `grep "solo_video_channel_.*(0): queueing" $log`;
chomp($tmp);
$tmp =~ m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/;
$chnl = $2;
$page = $3;
$timestamp = $1;
$tmp1 = `grep "(0): DUMP GO" $log`;
chomp($tmp1);
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
$dmp = $1;
print "dump go time = $1\n";
The value of $tmp1 after the grep is as expected, but $1 keeps the same value as before.
Any suggestions?
Always make sure that you verify that a regex matched before using a captured variable.
Additionally, there is no reason to shell out to grep. Use Perl's file processing instead:
use strict;
use warnings;
local @ARGV = $log;
while (<>) {
    chomp;
    if (/solo_video_channel_.*\(0\): queueing/) {
        if ( my ( $timestamp, $chnl, $page ) = m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/ ) {
            print "$. - $timestamp, $chnl, $page\n";
        }
    }
    if ( my ($dmp) = m/(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
        print "dump go time = $dmp\n";
    }
}
Note, your first set of if's could almost certainly be combined into a single if statement, but I left it as is for now.
Why not use Pure Perl? It's faster than running external greps. Plus, you can grep both regular expressions at once. Faster than looping through the file twice.
Always check the result of your regexp match. Here I'm using if statements to do this. Note too that I am printing all lines that don't match, prefixed with UNMATCHED LINES. You can remove the else when you see that everything is working, or simply redirect 2> /dev/null.
use strict;
use warnings;
use autodie;
use feature qw(say);
my $log = "log.txt";
open my $log_fh, "<", $log;
while ( my $line = <$log_fh> ) {
    chomp $line;
    my $timestamp;
    my $channel;
    my $page;
    my $gotime;
    if ( $line =~ /(.*):.*solo_video_channel_(.*):\s+queueing page (.*)/ ) {
        $timestamp = $1;
        $channel = $2;
        $page = $3;
        say qq(Timestamp = "$timestamp" Channel = "$channel" Page = "$page");
    }
    # The parentheses around 0 must be escaped to match them literally
    elsif ( $line =~ /(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
        $gotime = $1;
        say "Dump Go Time = $gotime";
    }
    else {
        say STDERR qq(UNMATCHED LINES: "$line");
    }
}
close $log_fh;
In the second regexp you need to escape the literal parentheses:
$tmp1 =~ m/(.*): solo_video_channel_write\(0\): DUMP GO/
This is because the expression \(0\) matches the exact pattern (0)
In the example given in this answer this would include strings such as
37: solo_video_channel_write(0): DUMP GO
In contrast, the expression (0) matches the exact pattern 0 and sets a capture group.
With the regexp given in your original question
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
matching would occur on strings such as
37: solo_video_channel_write0: DUMP GO
Of course, in the original program the strings are not in this format, so the match fails and $1 keeps the value from the previous successful match.
The regular expression syntax for the shell program grep is (confusingly) different.
To use round brackets for a capture group there, they must be escaped with a backslash, which is the opposite of the syntax in Perl.
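The difference between the escaped and unescaped forms can be checked directly. This is a minimal sketch: dump_time is a hypothetical helper, and the sample line reuses the made-up timestamp 37 from the example above.

```perl
use strict;
use warnings;

# Sample log line modelled on the question; the timestamp value is made up.
my $line = '37: solo_video_channel_write(0): DUMP GO';

# dump_time is a hypothetical helper: it returns the captured timestamp,
# or undef when the pattern does not match.
sub dump_time {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

my $unescaped = qr/(.*): solo_video_channel_write(0): DUMP GO/;
my $escaped   = qr/(.*): solo_video_channel_write\(0\): DUMP GO/;

for my $case ( [ unescaped => $unescaped ], [ escaped => $escaped ] ) {
    my ($name, $re) = @$case;
    my $time = dump_time($line, $re);
    print "$name: ", ( defined $time ? "captured '$time'" : 'no match' ), "\n";
}
```

Returning undef on failure forces the caller to check the result, which avoids reading a stale $1 left over from an earlier match.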
I'm trying to edit a text using Perl. I need to make a substitution, but the substitution must stop being applied once a specific word is found in the text. So, imagine I want to replace every "hello" with "goodbye", but no further replacements may be made once the word "foo" is found.
I tried to do this:
use warnings;
use strict;
$/ = undef;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
while(<F>) {
do {s/hello/goodbay/} until (m{foo});
print;
}
close F;
But, as a result, only the first "hello" of my text is changed.
Any suggestion?
Trying to think what would be the most efficient. It should be one of the following:
s{^(.*?)(foo|\z)}{
my $s = $1;
$s =~ s{hello}{goodbay}g;
$s.$2
}se;
print;
or (same as above, but requires 5.14+)
s{^(.*?)(foo|\z)}{ s{hello}{goodbay}gr . $2 }se;
print;
or
my $pos = /foo/ ? $-[0] : length;
my $s = substr($_, 0, $pos, '');
$s =~ s{hello}{goodbay}g;
print($s);
print;
Both work even if foo isn't present.
This solution uses less memory:
# Assumes foo will always be present
# (though it could be expanded to handle that)
# Assumes foo isn't a regex pattern.
local $/ = "foo";
$_ = <$fh>;
chomp;
s{hello}{goodbay}g;
print;
print $/;
local $/;
print <$fh>;
If the substrings you work on (the hello and foo of your example) are single words, an easy way would probably be to replace $/ = undef; with $/ = " ";. Currently you slurp in the whole file at once, meaning the while loop gets executed at most once.
That is because there is only one "line" in the whole input after you told perl that there are no line separators.
If you use a space as input separator, it will loop over the input word by word and hopefully work as you intend.
Use a flag variable:
use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
my $replace=1;
while(<F>) {
$replace = 0 if m{foo};
s/hello/goodbye/g if $replace;
print;
}
close F;
This stops at the line containing the end pattern. It will be slightly more complicated if you want to substitute up to just before the match.
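Substituting up to just before the match could look roughly like the following. It is only a sketch of one way to extend the flag approach: replace_until is a hypothetical helper that works on a list of lines, and it treats "foo" as a plain string rather than a pattern.

```perl
use strict;
use warnings;

# replace_until is a hypothetical helper: it rewrites "hello" to "goodbye"
# in a list of lines, stopping just before the first occurrence of "foo".
sub replace_until {
    my @lines   = @_;
    my $replace = 1;
    for my $line (@lines) {
        next unless $replace;
        my $stop = index $line, 'foo';
        if ( $stop < 0 ) {
            # No "foo" on this line: replace everywhere.
            $line =~ s/hello/goodbye/g;
        }
        else {
            # Edit only the text before "foo", then switch replacing off.
            my $head = substr $line, 0, $stop;
            $head =~ s/hello/goodbye/g;
            substr $line, 0, $stop, $head;
            $replace = 0;
        }
    }
    return @lines;
}

print replace_until( "hello a\n", "hello foo hello\n", "hello b\n" );
```

On the line that contains "foo", only the part in front of it is edited; everything after it, including later lines, is left alone.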
This answer uses the ${^PREMATCH} and related variables introduced in Perl 5.10.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
my $foo_found;
while (my $line = <>) {
    if (!$foo_found) {
        if ($line =~ m/foo/ip) {
            # ${^PREMATCH} is read-only, so work on copies
            my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
            # only replace hellos in the part before foo
            $pre =~ s/hello/goodbye/ig;
            $line = "$pre$match$post";
            $foo_found++;
        } else {
            $line =~ s/hello/goodbye/ig;
        }
    }
    print $line;
}
Given the following input:
hello cruel world
hello baseball
hello mudda, hello fadda
foo
The rest of the hellos should stay
Last hello
I get the following output
goodbye cruel world
goodbye baseball
goodbye mudda, goodbye fadda
foo
The rest of the hellos should stay
Last hello
If you don't have 5.10 you can use $` and related variables but they come with a performance hit. See perldoc perlvar for details.
I'm reading this text file to get ONLY the words in it and ignore all kinds of whitespace:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!/usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
    #print $line;
    @arr = split /\s+/, $line;
    foreach $w (@arr) {
        if ($w !~ /^\s+$/) {
            print $w."\n";
        }
    }
    #print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
    my @arr = split /\s+/, $line;
    foreach my $w (@arr) {
        if ($w !~ /^\s+$/) {
            print $w."\n";
        }
    }
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data was comma-separated: then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that:
my @arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split, which is a single quoted space (not a regular expression):
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
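The difference between the two forms of split is easy to see on a line with leading whitespace. A minimal sketch with a made-up sample line:

```perl
use strict;
use warnings;

# Made-up sample line with three leading spaces.
my $line = "   do you see\n";

my @by_regex = split /\s+/, $line;   # leading empty field is kept
my @by_space = split ' ',   $line;   # leading whitespace is skipped

print scalar(@by_regex), " fields with /\\s+/\n";
print scalar(@by_space), " fields with ' '\n";
```

The regex form yields an extra empty string as its first element, which is what printed as a blank line in the question's output.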
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
    my @arr = split;
    foreach my $w (@arr) {
        print "$w\n" if $w !~ /^\s+$/;
    }
}
This line is the problem:
@arr=split(/\s+/,$line);
When the line starts with whitespace, \s+ matches those leading spaces as a separator, so split returns an empty first field. Use ' ' instead.
@arr=split(' ',$line);
I believe that with this line:
if(!($w =~ /^\s+$/))
you wanted to say: if there is nothing in this field, don't print it.
But the "+" in the regex actually forces it to match at least one whitespace character, so an empty string does not match.
If you change the "\s+" to "\s*", you'll see that it works, because * means zero or more occurrences.
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
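A short sketch of why == goes wrong here, using a made-up character 'a' in place of the first character of a line:

```perl
use strict;
use warnings;

my $char = 'a';

# Numeric comparison: both strings convert to 0, so this is true. Perl
# would normally warn about non-numeric arguments; that warning is
# silenced here to keep the output clean.
my $numeric = do { no warnings 'numeric'; $char == '=' };

# String comparison: 'a' and '=' are different strings, so this is false.
my $string = $char eq '=';

print "== says ", ( $numeric ? 'equal' : 'different' ), "\n";
print "eq says ", ( $string  ? 'equal' : 'different' ), "\n";
```

With warnings enabled and not silenced, the == line would report that the argument isn't numeric, which is usually the quickest way to spot this mistake.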
It looks like you are not using strict and warnings. Use them, especially since you are new to Perl. You might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $., are very useful, so use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA__ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}
I'd like to read a file, e.g. test.test, which contains
#test:testdescription\n
#cmd:binary\n
#return:0\n
#stdin:|\n
echo"toto"\n
echo"tata"\n
#stdout:|\n
toto\n
tata\n
#stderr:\n
I succeeded in extracting the values that come after #test:, #cmd:, etc.,
but for stdin or stdout I want to collect all the lines before the next # into the arrays @stdin and @stdout.
I loop with while ($line = <TEST>), so it looks at each line. If I see the pattern /^#stdin:|/, I want to move to the next line and store its value in an array until I see the next #.
How do I move to the next line in the while loop?
This file format can be easily handled with some creativity in selecting the appropriate value for $/:
use strict; use warnings;
my %parsed;
{
local $/ = '#';
while ( my $line = <DATA> ) {
chomp $line;
my $content = (split /:/, $line, 2)[1];
next unless defined $content;
$content =~ s/\n+\z//;
if ( my ($chan) = $line =~ /^(std(?:err|in|out))/ ) {
$content =~ s/^\|\n//;
$parsed{$chan} = [ split /\n/, $content];
}
elsif ( my ($var) = $line =~ /^(cmd|return|test)/ ) {
$parsed{ $var } = $content;
}
}
}
use YAML;
print Dump \%parsed;
__DATA__
#test:testdescription
#cmd:binary
#return:0
#stdin:|
echo"toto"
echo"tata"
#stdout:|
toto
tata
#stderr:
Output:
---
cmd: binary
return: 0
stderr: []
stdin:
- echo"toto"
- echo"tata"
stdout:
- toto
- tata
test: testdescription
UPDATED as per user's comments
If I understand the question correctly, you want to read one more line within a loop?
If so, you can either:
just do another line read inside the loop.
my $another_line = <TEST>;
Keep some state flag and use it next iteration of the loop, and accumulate lines between stdins in a buffer:
my $last_line_was_stdin = 0;
my @line_buffer = ();
while (my $line = <TEST>) {
    if ($line =~ /^#stdin:\|/) {
        #
        # Some code to process all lines accumulated since last "stdin"
        #
        @line_buffer = ();
        $last_line_was_stdin = 1;
        next;
    }
    push @line_buffer, $line;
}
This solution may not do 100% of what you need, but it defines the pattern you need to follow in your state machine implementation: read a line, then check your current state (if it matters). Based on the current state and a pattern in the line, decide what to do with the current line (add it to the buffer? change the state? if changing state, process the buffer based on the last state?).
Also, as per your comment, you have a bug in your regex - the pipe (| character) means "OR" in regex, so you are saying "if line starts with #stdin OR matches an empty regex" - the latter part is always true so your regex will match 100% of time. You need to escape the "|" via /^#stdin:\|/ or /^#stdin:[|]/
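Putting the state-machine idea and the corrected marker match together might look like the sketch below. parse_sections is a hypothetical helper that takes the lines as a list, and the sample lines are copied from the question's file; treat it as an illustration of the pattern rather than a complete parser.

```perl
use strict;
use warnings;

# parse_sections is a hypothetical helper. It walks the lines once and
# keeps a "current section" state: a marker line such as "#stdin:|"
# switches the state, any other line is appended to the current section.
sub parse_sections {
    my @lines = @_;
    my %section;
    my $current;
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /^#(\w+):(.*)/ ) {
            $current = $1;
            my $rest = $2;
            $section{$current} = [];
            # Keep inline values such as "binary", but not the "|" marker.
            push @{ $section{$current} }, $rest
                if length($rest) && $rest ne '|';
        }
        elsif ( defined $current ) {
            push @{ $section{$current} }, $line;
        }
    }
    return \%section;
}

my $parsed = parse_sections(
    "#cmd:binary\n", "#stdin:|\n", "echo\"toto\"\n", "echo\"tata\"\n",
    "#stdout:|\n", "toto\n", "tata\n", "#stderr:\n",
);
print "stdin: @{ $parsed->{stdin} }\n";
print "stdout: @{ $parsed->{stdout} }\n";
```

The pipe is compared as a plain string here, so no escaping is needed; inside a regex it would have to be written as \| or [|].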