In perl pattern matching..how to exclude the \n character from pattern - perl

I am new to perl and writing my first few programs and using its pattern matching abilities. I am reading a file into array like this:
@list=<file>
Then indexing each line of array by $list[0..9] etc, and when I match it against a pattern, the $list[0] includes \n character, hence the match fails. So if ($string =~ $list[0]) fails though without \n character in pattern it would match.
How do I tell pattern matcher to not consider the \n character from pattern?
Thanks

You can shave the line ends from the array after reading:
@lines = …;
chomp @lines;
Now @lines contains the lines without line ends. See perldoc -f chomp for details.
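As a rough sketch of the above, assuming the lines come from an ordinary filehandle: the sample data, the in-memory filehandle, and the helper name read_chomped are made up for illustration, and the \Q...\E quoting is an optional extra so that a line is matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from a filehandle and strip
# the trailing newline from each one.
sub read_chomped {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp @lines;    # chomp accepts a list and trims every element
    return @lines;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @list = read_chomped($fh);
close $fh;

# \Q...\E (optional) makes the line match literally, not as a regex.
my $string = 'say alpha twice';
print "matched\n" if $string =~ /\Q$list[0]\E/;
```

With the newline gone, $list[0] can be used as a pattern directly.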

If you want to remove the \n from your lines you can:
chomp $list[0]
see perldoc -f chomp for the details.

This is a good opportunity to get to know how Perl modules work.
You can, for example, use Perl6::Slurp, which will a) read the file, b) put the contents in an array, and c) remove the newline characters for you.
For example:
use Perl6::Slurp;
my @lines = slurp '<:utf8', 'filename', {chomp=>1};

This will match with the \n:
if ( $list[0] =~ "$string\n")
Or if you want the \n to be optional:
if ( $list[0] =~ /$string\n?/ )

Related

Perl using regex to compare fields with multiple delimiters

I am studying Perl.
My data.txt file contains:
Lori:James Apple
Jamie:Eric Orange
My code below prints the first line "Lori:James Apple"
open(FILE,'data.txt');
while(<FILE>){
print if /James/;
}
But how do I modify my regular expression to search for a specific field?
For example, I'd like to use 2 delimiters ' ' and ':' to make each line contain 3 fields and check if the 3rd field of the first line is Apple. Which will be equivalent to awk -F'[ :]' '$3 = "Lori"' data.txt
One simple way with regex is to use the negated character class (also see it in perlretut)
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>)
{
my @fields = $line =~ /([^:\s]+)/g;
}
The [^...] matches any character other than those listed inside (after ^ which "negates"). The + quantifier means to match one-or-more times so the whole pattern matches a string of consecutive characters other than : and "white space." See docs for a precise description of \s. If you actually mean to skip only a single literal space use [^: ]. All this is captured by ().
The search keeps going through the string due to the global modifier /g, finding all such matches. Since it is in list context it returns the list of matches, which is assigned to the @fields array.
One can pick elements "on the fly" by indexing into the list, ($line =~ /([^:\s]+)/g)[2]. If we are matching $_ this is (/([^:\s]+)/g)[2].
I suggest a good read through perlretut, for starters.
On the other hand, it is often simpler and clearer to use split
my @fields = split /[:\s]/, $line;
This also uses regex for the pattern by which to split the string. The character class is not negated since here it specifies the delimiter itself, either : or \s (each delimiter may be either of these, they don't have to all be the same).
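To compare the two approaches side by side, here is a small sketch run on the two lines from the question. The helper names fields_by_match and fields_by_split are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers: both return the fields of a line, treating
# ':' and whitespace as delimiters.
sub fields_by_match {
    my ($line) = @_;
    return $line =~ /([^:\s]+)/g;    # collect runs of non-delimiters
}

sub fields_by_split {
    my ($line) = @_;
    return split /[:\s]/, $line;     # cut the line at each delimiter
}

for my $line ("Lori:James Apple\n", "Jamie:Eric Orange\n") {
    my @matched = fields_by_match($line);
    my @split   = fields_by_split($line);
    print join('|', @matched), "   ", join('|', @split), "\n";
}
```

Both give the same three fields here. They differ when two delimiters sit next to each other: split /[:\s]/ then produces an empty field, while the matching version skips over the run.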
I would now like to answer the specific question, but the question isn't clear to me.
It asks to "check if the 3rd field of the first line is Apple", which can be done, for example, by
while (<$fh>)
{
if ( (/([^:\s]+)/g)[2] eq 'Apple' ) {
# ....
}
}
but it isn't clear what to do with it. Perhaps get the first field by what the third one is?
I suggest to get an array and then process. One can write a regex to identify and pick fields directly but that's more brittle and the regex itself then depends on the position (and number) of fields.
At this point we are in a guessing game. If you need more detail please clarify.
The given awk code would yield Lori James Lori and I don't see how that fits.
The short answer is - don't. Regular expressions are about pattern matching, and not context.
You can define a pattern that builds in delimiters and fields, but ... it's not the right tool for the job.
The answer is use split and then handle the fields separately.
open ( my $input, '<', 'data.txt' ) or die $!;
while(<$input>){
chomp;
my @fields = split /[\s:]/;
print if $fields[2] eq "Apple";
}
You can compact this further if you wish, but I'd advise caution - compressing your code at the expense of readability isn't a virtue.
Also - whilst we're at it:
open(FILE,'data.txt');
is bad style - it doesn't check for success, and it also uses a global file handle name. It would be much better to:
open ( my $input, '<', 'data.txt' ) or die $!;
The autodie pragma also does this implicitly.
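A minimal sketch of what autodie buys you, assuming a path that does not exist (the path and the helper name try_open are made up): the failed open throws an exception, so no explicit "or die" is needed.

```perl
use strict;
use warnings;
use autodie;    # open, close, etc. now throw an exception on failure

# Hypothetical helper: report whether a file could be opened.
sub try_open {
    my ($path) = @_;
    my $ok = eval {
        open my $fh, '<', $path;    # no "or die" needed under autodie
        close $fh;
        1;
    };
    return $ok ? 'opened' : "failed: $@";
}

# Made-up path that is assumed not to exist.
print try_open('/no/such/dir/data.txt'), "\n";
```

The eval is only there so the demo can print the error instead of terminating; in a normal script the exception would stop the program with a message naming the file.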

Print Line By Line

I've been trying to work on a lyrical bot for my server, but before I started to work on it, I wanted to give it a test so I came up with this script using the Lyrics::Fetcher module.
use strict;
use warnings;
use Lyrics::Fetcher;
my ($artist, $song) = ('Coldplay', 'Adventures Of A Lifetime');
my $lyrics = Lyrics::Fetcher->fetch($artist, $song, [qw(LyricWiki AstraWeb)]);
my @lines = split("\n\r", $lyrics);
foreach my $line (@lines) {
sleep(10);
print $line;
}
This script works fine, it grabs the lyrics and prints it out in a whole(which is not what I'm looking for).
I was hoping to achieve a line by line print of the lyrics every 10 seconds. Help please?
Your call to split looks suspicious. In particular the regex "\n\r". Note, the first argument to split is always interpreted as a regex regardless of whether you supply a quoted string.
On Unix systems the line ending is typically "\n". On DOS/Windows it's "\r\n" (the reverse of what you have). On ancient Macs it was "\r". To match all three you could do:
my @lines = split(/\r\n|\n|\r/, $lyrics);
You will need to enable autoflush; otherwise the lines will be buffered and printed only when the buffer is full or when the program terminates:
STDOUT->autoflush;
You can use the regex generic newline pattern \R to split on any line ending, whether your data contains CR, LF, or CR LF. This feature is available only in Perl v5.10 or later:
my @lines = split /\R/, $lyrics;
And you will need to print a newline after each line of lyrics, because the split will have removed them
print $line, "\n";
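Putting those pieces together, a sketch might look like this. The lyrics text is a made-up stand-in for what Lyrics::Fetcher returns, the helper name split_lines is invented, the delay is shortened to one second, and IO::Handle is loaded explicitly on the assumption that an older perl may need it for STDOUT->autoflush.

```perl
use strict;
use warnings;
use IO::Handle;    # assumption: needed for STDOUT->autoflush on older perls

# Hypothetical helper: split text on any kind of line ending.
sub split_lines {
    my ($text) = @_;
    return split /\R/, $text;    # \R needs Perl 5.10 or later
}

# Made-up stand-in for the fetched lyrics, with mixed line endings.
my $lyrics = "first line\r\nsecond line\nthird line\r";

STDOUT->autoflush(1);            # print each line as soon as it is ready

for my $line (split_lines($lyrics)) {
    print $line, "\n";           # split removed the line ending, so add one
    sleep 1;                     # the question used 10 seconds
}
```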

How to remove all lines except .c extension (at last) lines using perl scripting

I've a string $string which has got list of lines, some ending with *.c, *.pdf,etc and few without any extensions(these are directories). I need to remove all lines except *.c lines. How can i do that using regular expression? I've written to get removed *.c files as below but how to do a not of it?
next if $line =~ /(\.c)/i;
Any ideas.
thanks,
Sharath
Use unless instead of if to reverse the sense of the condition.
next unless $line =~ /\.c$/i;
or simply invert the test:
next if $line !~ /\.c$/i;
Also, you don't need parentheses around the regexp, and you need $ to anchor it to the end of the line.
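If the lines all live in one string, as in the question, the same test can be applied with grep instead of a loop. This is only a sketch: the sample string and the helper name keep_c_lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that end in ".c"
# (case-insensitively, as in the question).
sub keep_c_lines {
    my ($string) = @_;
    return grep { /\.c$/i } split /\n/, $string;
}

# Made-up sample: directories, a PDF, and two C files.
my $string = "src\nmain.c\nnotes.pdf\nutil.C\ndocs\n";

print "$_\n" for keep_c_lines($string);
```

Only main.c and util.C survive. Without the $ anchor, a name such as a.cpp would also match.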

How to match \n at the end of variable using regexp

I stored a file in a variable, say $buffer. There is a "\n" at the end of $buffer. I want to replace it with an empty value. I tried
regexp "\n$" $buffer
Not working. The code is in TCL, but I need to know how we can do it in either Perl or TCL.
string trim $buffer \n
See the manual.
In Perl, chomp removes the end-of-record separator. So, to remove the '\n', all you need is chomp $buffer.
How about: regsub {\n$} $buffer ""
chomp is probably best, as @Borodin said, but you can also use \z to match only at the end of the string:
$buffer =~ s/\n\z//;
In Perl:
$buffer =~ s/\n$//;
=~ is the binding operator, s is the substitution operator, the / are delimiters for the operands, so \n$ is replaced by the empty string.
chomp($buffer) will accomplish the same thing.
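For a buffer with a single trailing newline, the three Perl variants mentioned in this thread give the same result. A quick sketch to check that (the helper names and the sample buffer are invented):

```perl
use strict;
use warnings;

# Hypothetical helpers, one per technique; each works on a copy.
sub strip_chomp  { my ($s) = @_; chomp $s;       return $s; }
sub strip_z      { my ($s) = @_; $s =~ s/\n\z//; return $s; }
sub strip_dollar { my ($s) = @_; $s =~ s/\n$//;  return $s; }

my $buffer = "some file content\n";    # made-up sample

for my $strip (\&strip_chomp, \&strip_z, \&strip_dollar) {
    my $result = $strip->($buffer);
    print "[$result]\n";
}
```

Note that chomp removes whatever $/ is set to, which is "\n" by default, while the two substitutions always look for a literal newline.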
Check whether this works.
regsub "\n$" $buffer "" buffer

Need to print the last occurrence of a string in Perl

I have a script in Perl that searches for an error that is in a config file, but it prints out any occurrence of the error. I need to match what is in the config file and print out only the last time the error occurred. Any ideas?
Wow...I was not expecting this much of a response. I should've been more clear in stating this is for log monitoring on a Windows box that sends an alert to Nagios. This is actually my first Perl program and all this information has been very helpful. Does anyone know how I can apply any of the tail answers on a wintel box?
Another way to do it:
perl -n -e '$e = $1 if /(REGEX_HERE)/; END{ print $e }' CONFIG_FILE_HERE
What exactly do you need to print? The line containing the error? More context than that?
File::ReadBackwards can be helpful.
In outline:
my $errinfo;
while (<>)
{
$errinfo = "whatever" if (m/the error pattern/);
}
print "error: $errinfo\n" if ($errinfo);
This catches all errors, but doesn't print until the end, when only the last one survives.
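A runnable version of that outline, as a sketch only: the log lines, the error pattern, and the helper name last_match are made up, since the question does not say what the errors look like.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last line matching the pattern,
# or undef if no line matches.
sub last_match {
    my ($pattern, @lines) = @_;
    my $last;
    for my $line (@lines) {
        $last = $line if $line =~ $pattern;    # later matches overwrite earlier ones
    }
    return $last;
}

# Made-up log lines and error pattern.
my @log = (
    "10:01 ERROR disk full",
    "10:02 INFO retrying",
    "10:03 ERROR timeout",
    "10:04 INFO done",
);

my $errinfo = last_match(qr/ERROR/, @log);
print "error: $errinfo\n" if defined $errinfo;
```

This needs nothing but Perl itself, so it should behave the same on Windows as anywhere else.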
A brute-force approach involves setting up your own pipeline by pointing STDOUT to tail. This allows you to print all errors, and then it's up to tail to worry about only letting the last one out.
You didn't specify, so I assume a legal config line is of the form
Name = some value
Matching that is straightforward:
^ (starting at the beginning of line)
\w+ (one or more “word characters”)
\s+ (followed by mandatory whitespace)
= (followed by an equals sign)
\s+ (more mandatory whitespace)
.+ (some mandatory value)
$ (finishing at the end of the line)
Gluing it together, we get
#! /usr/bin/perl
use warnings;
use strict;
# for demo only
*ARGV = *DATA;
my $pid = open STDOUT, "|-", "tail", "-1" or die "$0: open: $!";
while (<>) {
print unless /^ \w+ \s+ = \s+ .+ $/x;
}
close STDOUT or warn "$0: close: $!";
__DATA__
This = assignment is ok
But := not this
And == definitely not this
Output:
$ ./lasterr
And == definitely not this
With regular expressions, when you want the last occurrence of a pattern, place ^.* at the front of your pattern. For example, to replace the last X in the input with Y, use
$ echo XABCXXXQQQXX | perl -pe 's/^(.*)X/$1Y/'
XABCXXXQQQXY
Note that the ^ is redundant because regular-expression quantifiers are greedy, but I like having it there for emphasis.
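The same trick works for capturing rather than replacing. A small sketch, where the helper names and the "id=" sample string are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the last X with Y, as in the
# one-liner above.
sub replace_last_x {
    my ($s) = @_;
    $s =~ s/^(.*)X/$1Y/;    # greedy .* swallows everything up to the last X
    return $s;
}

# Hypothetical helper: capture the number after the last "id=".
sub last_id {
    my ($s) = @_;
    my ($id) = $s =~ /^.*id=(\d+)/;    # greedy .* backtracks to the last "id="
    return $id;
}

print replace_last_x("XABCXXXQQQXX"), "\n";
print last_id("id=1 id=22 id=333"), "\n";
```

Without /s, the dot does not cross newlines, so this finds the last occurrence within a single line.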
Applying this technique to your problem, you can search for the last line in your config file that contains an error as in the following program:
#! /usr/bin/perl
use warnings;
use strict;
local $_ = do { local $/; scalar <DATA> };
if (/\A.* ^(?! \w+ \s+ = \s+ [^\r\n]+ $) (.+?)$/smx) {
print $1, "\n";
}
__DATA__
This = assignment is ok
But := not this
And == definitely not this
The syntax of the regular expression is a bit different because $_ contains multiple lines, but the principle is the same. \A is similar to ^, but it matches only at the beginning of string to be searched. With the /m switch (“multi-line”), ^ matches at logical line boundaries.
Up to this point, we know the pattern
/\A.* ^ .../
matches the last line that looks like something. The negative look-ahead assertion (?!...) looks for a line that is not a legal config line. Ordinarily . matches any character except newline, but the /s switch (“single line”) lifts this restriction. Specifying [^\r\n]+, that is, one or more characters that are neither carriage return nor line feed, does not allow the match to spill into the next line.
Look-around assertions do not capture, so we grab the offending line with (.+?)$. The reason it's safe to use . in this context is because we know the current line is bad and the non-greedy quantifier +? stops matching as soon as it can, which in this case is the end of the current logical line.
All these regular expressions use the /x switch (“extended mode”) to allow extra whitespace: the aim is to improve readability.