Suppose I have a file with these inputs:
line 1
line 2
line3
My program should only store "line1", "line2" and "line3" not the newlines. How do I achieve that?
My program already removes leading and trailing whitespace, but that doesn't remove the newlines.
I am setting $/ as \n because each input is separated by a \n.
while (<>) {
chomp;
next unless /\S/;
print "$_\n";
}
Set
$/ = q(); # that's an empty string, like "" or ''
while (<>) {
chomp;
...
}
The special value of the defined empty string puts the input operator into paragraph mode: two or more consecutive newlines (that is, one or more blank lines) are treated as the terminator, however many there are, and chomp removes them all. That way each record always starts with real data.
Perl -n is the equivalent of wrapping while(<>) { } around your script. Assuming that all you need to do is eliminate blank lines, you can do it like this:
#! /usr/bin/perl -n
print unless ( /^$/ );
... On the other hand, if that's all you need to do, you might as well ditch perl and use
grep -v '^$'
Edit: your post says that you want to store values where lines are not blank... in that case, assuming that you don't have too much work to do in the rest of your script, you might do something like this:
#! /usr/bin/perl -n
our @values; # package variable: a "my" here would be re-created on every pass of the -n loop
push @values, $_ unless ( /^$/ );
END {
# do whatever work you want to do here
}
... but this quickly reaches a point of diminishing returns if you have very much code inside the END{} block.
I have a variable $string and I want to print all the lines after I find a keyword in a line (including the line with the keyword).
$string=~ /apple /;
I'm using this regexp to find the keyword, but I do not know how to print the lines after it.
It's not really clear where your data is coming from. Let's assume it's a string containing newlines. Let's start by splitting it into an array.
my @string = split /\n/, $string;
We can then use the flip-flop operator to decide which lines to print. I'm using \0 as a regex that is very unlikely to match any string (so, effectively, it's always false).
for (@string) {
say if /apple / .. /\0/;
}
Just keep a flag variable, set it to true when you see the string, print if the flag is true.
perl -ne 'print if $seen ||= /apple/'
If your data is in a scalar variable, there are several ways to do it.
Recommended method
($matching) = $string=~ /([^\n]*apple.+)/s;
print "$matching\n";
And there is another way to do it
$string=~ /[^\n]*apple.+/s;
print $&; # prints the text matched by the last successful regex
If you are reading the data from a file, try the following
while (<$fh>)
{
if(/apple/)
{
print $_, <$fh>; # the matching line itself, then the rest of the file
}
}
Or else try the following one liner
perl -ne 'print $_, <> and exit if /apple/' file.txt
I am an absolute beginner in Perl and I am trying to extract lines of text between two strings on different lines, but without success. It looks like I'm missing something in my code. The code should print out the file name and the found strings. Do you have any idea where the problem could be? Many thanks indeed for your help or advice. Here is the example:
*****************
example:
START
new line 1
new line 2
new line 3
END
*****************
and my script:
use strict;
use warnings;
my $command0 = "";
opendir (DIR, "C:/Users/input/") or die "$!";
my @files = readdir DIR;
close DIR;
splice (@files,0,2);
open(MYOUTFILE, ">>output/output.txt");
foreach my $file (@files) {
open (CHECKBOOK, "input/$file")|| die "$!";
while ($record = <CHECKBOOK>) {
if (/\bstart\..\/bend\b/) {
print MYOUTFILE "$file;$_\n";
}
}
close(CHECKBOOK);
$command0 = "";
}
close(MYOUTFILE);
I suppose that you are trying to use a flip-flop here, which might work well for your input, but you've written it wrong:
if (/\bstart\..\/bend\b/) {
A flip-flop (the range operator) takes two operands, separated by either .. or .... What you want is two regexes joined with ..:
if (/\bSTART\b/ .. /\bEND\b/)
Of course, you also want to match the case (upper), or use the /i modifier to ignore case. You might even want to use beginning of line anchor ^ to only match at the beginning of a line, e.g.:
if (/^START\b/ .. /^END\b/)
You should also know that your entire program can be replaced with a one-liner, such as
perl -ne 'print if /^START\b/ .. /^END\b/' input/*
Alas, this only works as written on Linux, where the shell expands the glob. The cmd shell in Windows does not glob, so you must do that manually:
perl -ne "BEGIN { @ARGV = map glob, @ARGV }; print if /^START\b/ .. /^END\b/" input/*
If you are having troubles with the whole file printing no matter what you do, I think the problem lies with your input file. So take a moment to study it and make sure it is what you think it is, for example:
perl -MData::Dumper -ne"$Data::Dumper::Useqq = 1; print Dumper $_;" file.txt
If you're matching a multi-line string, you might need to tell the regexp about it:
if (/\bstart\..\/bend\b/s) {
Note the s after the regex.
Perldoc says:
s
Treat string as single line. That is, change "." to match any
character whatsoever, even a newline, which normally it would not
match.
I'm using \D in a Perl regular expression so that digits are not displayed, but why are the digits still being displayed?
Here's the content of the test2.txt file
1. Hello Brue this is a test.
2. Hello Lisa this is a test.
This is a test 1.
This is a test 2.
Here is the perl program.
#!/usr/bin/perl
use strict;
use warnings;
open READFILE,"<", "test2.txt" or die "Unable to open file";
while(<READFILE>)
{
if(/\D/)
{
print;
}
}
/\D/ just checks that the line has at least one non-digit character (including the newline...). Can you explain what you wanted to check? What output you were expecting?
If you want to only print lines that don't have a digit, you want to do:
if ( ! /\d/ )
(does the line not have a digit), not
if ( /\D/ )
(does the line have a non-digit).
Let's take a look at what is going on behind the scenes. Your while loop is equivalent to:
while(defined($_ = <READFILE>))
{
if($_ =~ /\D/)
{
print $_;
}
}
So, you are checking if the line contains a non-digit character (which it does) and then printing that line.
If you want to print Hello Brue this is a test. instead of 1. Hello Brue this is a test., then you would have to use something like:
while(<READFILE>) {
s/^\d+\. //;
print;
}
Also, it would make for more readable code if you used a variable rather than $_.
What you want is to reject lines that have a digit, rather than match lines that have a non-digit (as you're doing):
while (<READFILE>) {
print unless /\d/;
}
This will print each line unless it has a digit on it.
Let's say I have the following lines:
1:a:b:c
2:d:e:f
3:a:b
4:a:b:c:d:e:f
How can I edit this with sed (or perl) in order to read:
1a1b1c
2d2e2f
3a3b
4a4b4c4d4e4f
I have done with awk like this:
awk -F':' '{gsub(/:/, $1, $0); print $0}'
but it takes ages to complete! So I'm looking for something faster.
'Tis a tad tricky, but it can be done with sed (assuming the file data contains the sample input):
$ sed '/^\(.\):/{
s//\1/
: retry
s/^\(.\)\([^:]*\):/\1\2\1/
t retry
}' data
1a1b1c
2d2e2f
3a3b
4a4b4c4d4e4f
$
You may be able to flatten the script to one line with semi-colons; sed on MacOS X is a bit cranky at times and objected to some parts, so it is split out into 6 lines. The first line matches lines starting with a single character and a colon, and starts a sequence of operations that runs only when that pattern is recognized. The first substitute replaces, for example, '1:' by just '1'. The : retry is a label for branching to, which is a key part of this. The next substitution copies the first character on the line over the first remaining colon. The t retry goes back to the label if the substitute changed anything. The last line closes the sequence of operations for the initially matched line.
#!/usr/bin/perl
use warnings;
use strict;
while (<DATA>) {
if ( s/^([^:]+)// ) {
my $delim = $1;
s/:/$delim/g;
}
print;
}
__DATA__
1:a:b:c
2:d:e:f
3:a:b
4:a:b:c:d:e:f
use feature qw/ say /;
use strict;
use warnings;
while( <DATA> ) {
chomp;
my @elements = split /:/;
my $interject = shift @elements;
local $" = $interject;
say $interject, "@elements";
}
__DATA__
1:a:b:c
2:d:e:f
3:a:b
4:a:b:c:d:e:f
Or on the linux shell command line:
perl -aF/:/ -pe '$i=shift @F;$_=$i.join $i,@F;' infile.txt
I have a directory full of files containing records like:
FAKE ORGANIZATION
799 S FAKE AVE
Northern Blempglorff, RI 99xxx
01/26/2011
These items are being held for you at the location shown below each one.
IF YOU ASKED THAT MATERIAL BE MAILED TO YOU, PLEASE DISREGARD THIS NOTICE.
The Waltons. The complete DAXXXX12118198
Pickup at:CHUPACABRA LOCATION 02/02/2011
GRIMLY, WILFORD
29 FAKE LANE
S. BLEMPGLORFF RI 99XXX
I need to remove all entries with the expression Pickup at:CHUPACABRA LOCATION.
The "record separator" issue:
I can't touch the input file's formatting -- it must be retained as is. Each record
is separated by roughly 40+ new lines.
Here's some awk ( this works ):
BEGIN {
RS="\n\n\n\n\n\n\n\n\n+"
FS="\n"
}
!/CHUPACABRA/{print $0}
My stab with perl:
perl -a -F\n -ne '$/ = "\n\n\n\n\n\n\n\n\n+";$\ = "\n";chomp;$regex="CHUPACABRA";print $_ if $_ !~ m/$regex/i;' data/lib51.000
Nothing is returned. I'm not sure how to specify 'field separator' in perl except at the commandline. Tried the a2p utility -- no dice. For the curious, here's what it produces:
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n\n\n\n\n\n\n\n\n+";
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
This has to run under someone's Windows box otherwise I'd stick with awk.
Thanks!
Bubnoff
EDIT ( SOLVED )
Thanks mob!
Here's a ( working ) perl script version ( adjusted a2p output ):
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n"x10;
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
Feel free to post improvements or CPAN goodies that make this more idiomatic and/or perl-ish. Thanks!
In Perl, the record separator is a literal string, not a regular expression. As the perlvar doc famously says:
Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
Still, it looks like you can get away with $/="\n" x 10 or something like that:
perl -a -F\n -ne '$/="\n"x10;$\="\n";chomp;$regex="CHUPACABRA";
print if /\S/ && !m/$regex/i;' data/lib51.000
Note the extra /\S/ &&, which will skip empty paragraphs from input that has more than 20 consecutive newlines.
Also, have you considered just installing Cygwin and having awk available on your Windows machine?
There is no need for (much) conversion if you can download gawk for Windows.
Did you know that Perl comes with a program called a2p that does exactly what you described you want to do in your title?
And, if you have Perl on your machine, the documentation for this program is already there:
C> perldoc a2p
My own suggestion is to get the Llama book and learn Perl anyway. Despite what the Python people say, Perl is a great and flexible language. If you know shell, awk and grep, you'll understand many of the Perl constructs without any problems.