How to compress 4 consecutive blank lines into one single line in Perl - perl

I'm writing a Perl script that reads a log and re-writes it into a new log, removing empty lines wherever 4 or more consecutive blank lines appear. In other words, I have to compress any run of 4 or more consecutive blank lines into a single line; runs of 1, 2 or 3 blank lines must keep their format. I have tried to find a solution online, but the only things I can find are
perl -00 -pe ''
or
perl -00pe0
Also, I have seen a vim example to delete blocks of 4 empty lines, :%s/^\n\{4}//, which matches what I'm looking for, but it is vim, not Perl. Can anyone help with this? Thanks.

To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
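For instance, reading from stdin rather than editing a file in place, just to show the effect of the first one-liner:
$ printf 'a\n\n\n\nb\n' | perl -0777 -pe 's|\n{4,}|\n|g'
a
b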

use strict;
use warnings;

my $cnt = 0;    # number of consecutive blank lines seen so far

sub flush_ws {
    $cnt = 1 if ($cnt >= 4);    # collapse a run of 4 or more blank lines into a single one
    while ($cnt > 0) { print "\n"; $cnt--; }
}

while (<>) {
    if (/^$/) {
        $cnt++;           # just count blank lines; print nothing yet
    } else {
        flush_ws();       # emit the pending blank lines, collapsed if necessary
        print $_;
    }
}
flush_ws();               # handle blank lines at the very end of the file

Your -0 hint is a good one, since you can use -0777 to slurp the whole file in -p mode. Read more about these switches in perlrun. So this one-liner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four newlines in a row, nothing happens. Five or more newlines (four or more empty lines) are replaced by two newlines (one empty line). Note the /g switch, which replaces every match rather than only the first.
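For example, a quick check on a throwaway file (the name demo.txt is purely illustrative):
$ printf 'a\n\n\n\n\nb\n' > demo.txt    # "a", four blank lines, "b"
$ perl -0777 -pe 's/\n{5,}/\n\n/g' demo.txt
a

b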
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    s/\n{5,}/\n\n/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
HTH! :)
HTH! :)

One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that your definition of an empty line excludes whitespace.

This will do what you need:
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile

Related

Print each line of a file

I have a file test.txt that reads as follows:
one
two
three
Now, I want to print each line of this file as follows:
.one (one)
.two (two)
.three (three)
I try this in Perl:
#ARGV = ("test.txt");
while (<>) {
print (".$_ \($_\)");
}
This doesn't seem to work and this is what I get:
.one
(one
).two
(two
).three
(three
)
Can someone help me figure out what's going wrong?
Update:
Thanks to Aureliano Guedes for the suggestion.
This one-liner seems to work:
perl -pe 's/([^\s]+)/.$1 ($1)/'
$_ will include the newline, e.g. "one\n", so print ".$_ \($_\)" outputs something like ".one\n (one\n)".
Use chomp to get rid of them, or use s/\s+\z// to remove all trailing whitespace.
while (<>) {
    chomp;
    print ".$_ ($_)\n";
}
(But add a \n to print the newline that you do want.)
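Equivalently, a sketch of the same fix using the whitespace-stripping substitution mentioned above instead of chomp:
while (<>) {
    s/\s+\z//;              # remove the trailing newline (and any other trailing whitespace)
    print ".$_ ($_)\n";
}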
Besides the correct answer already given, you can do this in a one-liner:
perl -pe 's/(.+)/.$1 ($1)/'
Or if you prefer a while loop:
while (<>) {
    s/(.+)/.$1 ($1)/;
    print;
}
This simply rewrites the current line into your desired output and then prints it.
Another Perl one-liner without using regex.
perl -ple ' $_=".$_ ($_)" '
with the given inputs
$ cat test.txt
one
two
three
$ perl -ple ' $_=".$_ ($_)" ' test.txt
.one (one)
.two (two)
.three (three)
$

Why does the following oneliner skip the first line?

Here is the example:
$ cat test.tsv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
$ perl -wne 'while(<STDIN>){ print $_;}' < test.tsv
GGGGCCCCTTTTAAAA bar
This should work like cat and not like tail -n +2. What is happening here? And what is the correct way?
The -n option effectively wraps your code like this (taken from perlrun):
while (<STDIN>) {
    while(<STDIN>){ print $_;} #< your code
}
This shows two while(<STDIN>) loops. Both read from the same STDIN, one line at a time.
When you run it on a test.tsv that is at least two lines long, the first (outer) while(<STDIN>) reads the first line and the second (inner) one reads the second line - so your print statement is first handed the second line.
If you had more than two lines in test.tsv then the inner loop would print out all lines from the second line onwards.
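A quick way to see this is to label each read (purely illustrative):
$ printf 'line1\nline2\nline3\n' | perl -wne 'print "outer: $_"; while (<STDIN>) { print "inner: $_" }'
outer: line1
inner: line2
inner: line3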
The correct way to make this work is simply to rely on the -n option you pass to perl:
perl -wne 'print $_;' < test.tsv
Because the -n switch implicitly puts your code inside a loop which goes through the file line by line. Remove the 'n' from the list of switches, or (even better) remove your loop from the code and leave only the print command there.
nbokor@nbokor:~/tmp$ perl -wne 'print $_;' < test.tsv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
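The other variant, dropping -n and keeping your own loop, works just as well; shown here for completeness:
$ perl -we 'while (<STDIN>) { print $_; }' < test.tsv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar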
Remove the -n command-line option. It duplicates the while(<STDIN>){ ... } loop.
$ perl -MO=Deparse -wne 'while(<STDIN>){ print $_;}'
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    while (defined($_ = <STDIN>)) {
        print $_;
    }
}
-e syntax OK

How to add new number into each line?

I have this line about 500 times in my file backup.xml:
my-company-review/</link>
Is there a way, using the command line, Perl, etc., to add a number to each line after the word review? For example, something like this:
my-company-review1/</link>
my-company-review2/</link>
my-company-review3/</link>
Thanks in advance for the help!
Why not use Perl, as I suggested with your last problem? Once again, this is a sort of hack solution that only works if there's a maximum of one replacement per line... But it's a quick throw-away program.
perl -e '$count=1; foreach (<>) { s/(my-company-review)(\/<\/link>)/$1$count$2/ && $count++; print; }'
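For example, fed two copies of the line on stdin (just to illustrate the numbering):
$ printf 'my-company-review/</link>\nmy-company-review/</link>\n' | perl -e '$count=1; foreach (<>) { s/(my-company-review)(\/<\/link>)/$1$count$2/ && $count++; print; }'
my-company-review1/</link>
my-company-review2/</link>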
An extra loop will do multiple substitutions on a line:
perl -e '$count=1; foreach (<>) { while(s/(my-company-review)(\/<\/link>)/$1$count$2/) {$count++;} print; }'
That awk solution looks way nicer =)
Here's one way:
perl -i -wpe ' BEGIN { $count = 1; }
++$count
if s{(my-company-review)(/</link>)}{$1$count$2};
' backup.xml
(Disclaimer: not tested.)
You can use awk (the trailing 1 prints every line, whether or not it was changed):
awk '{ gsub("/</link>", NR "/</link>") } 1' infile
or perl:
perl -ne 's:/</link>:$./</link>:; print' infile

Perl: extract rows from 1 to n (Windows)

I want to extract rows 1 to n from my .csv file. Using this
perl -ne 'if ($. == 3) {print;exit}' infile.txt
I can extract only one row. How can I specify a range of rows instead?
If you have only a single range and a single, possibly concatenated input stream, you can use:
#!/usr/bin/perl -n
if (my $seqno = 1 .. 3) {
    print;
    exit if $seqno =~ /E/;
}
But if you want it to apply to each input file, you need to catch the end of each file:
#!/usr/bin/perl -n
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
And if you want to be kind to people who forget args, add a nice warning in a BEGIN or INIT clause:
#!/usr/bin/perl -n
BEGIN { warn "$0: reading from stdin\n" if #ARGV == 0 && -t }
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
Notable points include:
You can use -n or -p on the #! line. You could also put some (but not all) other command line switches there, like -l or -a.
Numeric literals used as operands to the scalar flip-flop operator are each compared against the readline counter, so a scalar 1 .. 3 is really ($. == 1) .. ($. == 3).
Calling eof with neither an argument nor empty parens tests the last file read in the magic ARGV list of files. This contrasts with eof(), which tests the end of the entire <ARGV> iteration.
A flip-flop operator's final sequence number is returned with an "E0" appended to it, which is what the $seqno =~ /E/ test detects (see the short sketch after this list).
The -t operator, which calls libc's isatty(3), defaults to the STDIN handle, unlike any of the other filetest operators.
A BEGIN{} block happens during compilation, so if you try to decompile this script with -MO=Deparse to see what it really does, that check will execute. With an INIT{}, it will not.
Doing just that will also reveal that the implicit input loop has a label called LINE, which you might in other circumstances use to your advantage.
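Here is a minimal sketch of that sequence-number behavior; the script name flipflop.pl is made up, and the printout is only for illustration:
while (<>) {
    if (my $seqno = 1 .. 3) {              # flip-flop: true on lines 1 through 3
        print "line $.: seqno=$seqno\n";   # 1, then 2, then "3E0" on the last matching line
    }
}
Run on any input with at least three lines, e.g.:
$ seq 5 | perl flipflop.pl
line 1: seqno=1
line 2: seqno=2
line 3: seqno=3E0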
HTH
What's wrong with:
head -3 infile.txt
If you really must use Perl then this works:
perl -ne 'if ($. <= 3) {print} else {exit}' infile.txt
You can use the range operator:
perl -ne 'if (1 .. 3) { print } else { last }' infile.txt

How can I swap two consecutive lines in Perl?

I have this piece of code for editing cue sheets and I don't know how to swap two consecutive lines if they are found:
/^TITLE.*?"$/
/^PERFORMER.*?"$/
to reverse to
/^PERFORMER.*?"$/
/^TITLE.*?"$/
What would it be the solution in my case?
use strict;
use warnings;
use File::Find;
use Tie::File;
my $dir_target = 'test';
find(\&c, $dir_target);
sub c {
    /\.cue$/ or return;
    my $fn = $File::Find::name;
    tie my @lines, 'Tie::File', $fn or die "could not tie file: $!";
    for (my $i = 0; $i < @lines; $i++) {
        if ($lines[$i] =~ /^REM (DATE|GENRE|REPLAYGAIN).*?$/) {
            splice(@lines, $i, 3);
        }
        if ($lines[$i] =~ /^\s+REPLAYGAIN.*?$/) {
            splice(@lines, $i, 1);
        }
    }
    untie @lines;
}
This may seem like overkill, but seeing that your files aren't very large, I'm tempted to leverage the following one-liner (either from the command line or via a system call).
The one-liner works by slurping all the lines in one shot, then leaving the rest of the work to a regex substitution which flips the order of the lines.
If you're using *nix:
perl -0777 -i -pe 's/(TITLE.*?")\n(PERFORMER.*?")\n/$2\n$1\n/g' file1 file2 ..
If you're using Windows, you'll need to create a backup of the existing files:
perl -0777 -i.bak -pe "s/(TITLE.*?\")\n(PERFORMER.*?\")\n/$2\n$1\n/g" file1 file2 ..
Explanation
Command Switches (see perlrun for more info)
-0777 (an octal number) enforces file-slurping behavior
-i enables in-place editing (no need to splice-'n'-dice!). Windows systems require that you provide a backup extension, hence the additional .bak
-p loops over all lines in your file(s) and prints each one after the code has run (although since you're slurping them in, Perl treats the contents of each file as one line)
-e allows Perl to recognize code within the command-line
Regex
The substitution regex captures each occurrence of a TITLE line and the consecutive PERFORMER line, storing them in $1 and $2 respectively; the trailing \n keeps the lazy .*?" from stopping at the opening quote of the PERFORMER value. The replacement then flips the order of the two captures, separated by a newline.
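As a quick sanity check, feeding a two-line fragment on stdin instead of editing a file in place (the values "Foo" and "Bar" are just placeholders):
$ printf 'TITLE "Foo"\nPERFORMER "Bar"\n' | perl -0777 -pe 's/(TITLE.*?")\n(PERFORMER.*?")\n/$2\n$1\n/g'
PERFORMER "Bar"
TITLE "Foo"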
Filename Arguments
You could use *nix to provide the filenames of the directories in question, but I'll leave that to someone else to figure out as I'm not too comfortable with Unix pipes just yet (see this John Siracusa answer for more guidance).
I would create a backup of your files before you try these one-liners though.
Well, since you're tying into an array, I'd just check $lines[$i] and $lines[$i+1] (as long as the +1 address exists, that is), and if the former matches TITLE and the latter PERFORMER, swap them. Unless perhaps you need to transpose these even if they're not consecutive??
Here's an option (this snippet would go inside your for loop, perhaps above the REM-checking line) if you know they'll be consecutive:
if ($i < $#lines and $lines[$i] =~ /^TITLE.*?"$/
        and $lines[$i+1] =~ /^PERFORMER.*?"$/) {
    # swap the two adjacent lines
    my $tmp = $lines[$i];
    $lines[$i] = $lines[$i+1];
    $lines[$i+1] = $tmp;
}
Another option (which would work regardless of consecutiveness, and is arguably more elegant) would be to
use List::MoreUtils qw(first_index);
(up at the top, with your other use statements) and then do (inside &c, but outside the for loop):
my $title_idx     = first_index { /^TITLE.*?"$/ } @lines;
my $performer_idx = first_index { /^PERFORMER.*?"$/ } @lines;
if ($title_idx >= 0 and $performer_idx >= 0 and $title_idx < $performer_idx)
{
    # swap these lines:
    ($lines[$title_idx], $lines[$performer_idx]) =
        ($lines[$performer_idx], $lines[$title_idx]);
}
Is that what you're after?