Perl: extract rows from 1 to n (Windows)

I want to extract rows 1 to n from my .csv file. Using this
perl -ne 'if ($. == 3) {print;exit}' infile.txt
I can extract only one row. How can I specify a range of rows in this script?

If you have only a single range and a single, possibly concatenated input stream, you can use:
#!/usr/bin/perl -n
if (my $seqno = 1 .. 3) {
    print;
    exit if $seqno =~ /E/;
}
But if you want it to apply to each input file, you need to catch the end of each file:
#!/usr/bin/perl -n
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
And if you want to be kind to people who forget args, add a nice warning in a BEGIN or INIT clause:
#!/usr/bin/perl -n
BEGIN { warn "$0: reading from stdin\n" if @ARGV == 0 && -t }
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
Notable points include:
You can use -n or -p on the #! line. You could also put some (but not all) other command line switches there, like ‑l or ‑a.
Numeric literals used as operands to the scalar flip-flop operator are each compared against the readline counter $., so a scalar 1 .. 3 is really ($. == 1) .. ($. == 3).
Calling eof with neither an argument nor empty parens means the last file read in the magic ARGV list of files. This contrasts with eof(), which is the end of the entire <ARGV> iteration.
A flip‐flop operator’s final sequence number is returned with a "E0" appended to it.
The -t operator, which calls libc’s isatty(3), defaults to the STDIN handle, unlike any of the other filetest operators.
A BEGIN{} block happens during compilation, so if you try to decompile this script with ‑MO=Deparse to see what it really does, that check will execute. With an INIT{}, it will not.
Doing just that will reveal that the implicit input loop has a label called LINE, which you might in other circumstances use to your advantage (see the short sketch below).
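For instance, a quick sketch that uses that LINE label to skip blank lines before the rest of the body runs:
#!/usr/bin/perl -n
# "next LINE" targets the label of the implicit while loop that -n wraps
# around this code, so blank lines never reach the print below.
next LINE if /^\s*$/;
print;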
HTH

What's wrong with:
head -3 infile.txt
If you really must use Perl then this works:
perl -ne 'if ($. <= 3) {print} else {exit}' infile.txt

You can use the range operator:
perl -ne 'if (1 .. 3) { print } else { last }' infile.txt

Perl grep-like one-liner with regex match condition and assignment?

Say I have this file:
cat > testfile.txt <<'EOF'
test1line
test23line
test456line
EOF
Now, I want to use perl in a "grep like" manner (a one-liner expression/command with a file argument, which dumps output to terminal/stdout), such that I match all of the numbers in the above lines and turn them into a zero-padded three-digit representation. So, I tried this to check if matching generally works:
perl -nE '/(.*test)(\d+)(line.*)/ && print "$1 - $2 -$3\n";' testfile.txt
#test - 1 -line
#test - 23 -line
#test - 456 -line
Well, that works; so now, I was thinking, I'll just call sprintf to format, assign that to a variable, and print that variable instead, and I'm done; unfortunately, my attempt there failed:
perl -nE '/(.*test)(\d+)(line.*)/ && $b = sprintf("%03d", $2); print "$1 - $b -$3\n";' testfile.txt
#Can't modify logical and (&&) in scalar assignment at -e line 1, near ");"
#Can't modify pattern match (m//) in scalar assignment at -e line 1, near ");"
#Execution of -e aborted due to compilation errors.
Ok, so something went wrong there. As far as I can tell from the error messages, mixing logical AND (&&) and assignment like in the above one-liner does not work.
So, how can I have a Perl one-liner where a regex condition is checked and, if a match is detected, a series of commands is executed, which may involve one or more assignments and conclude with a print?
EDIT: found an invocation that works:
perl -nE '/(.*test)(\d+)(line.*)/ && printf("$1%03d$3\n", $2);' testfile.txt
#test001line
#test023line
#test456line
... but I'd still like to know how to do the same via sprintf and assignment to variable.
Precedence issue.
/.../ && $b = ...;
means
( /.../ && $b ) = ...;
You could use
/.../ && ( $b = ... );
/.../ && do { $b = ...; };
/.../ and $b = ...;
$b = ... if /.../;
But there's a second problem. You call print unconditionally.
perl -ne'printf "%s-%03d-%s\n", $1, $2, $3 if /(.*test)(\d+)(line.*)/'
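If you do want the intermediate sprintf variable the question's edit asked about, the do block form shown above works in a one-liner along these lines:
perl -nE '/(.*test)(\d+)(line.*)/ && do { my $b = sprintf "%03d", $2; say "$1$b$3" };' testfile.txt
#test001line
#test023line
#test456line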

How to pass both variables and files to a perl -p -e command

I have a command line written in Perl that executes on Solaris (maybe this is irrelevant as it is UNIX-like) which inserts a "wait" string every 6 lines:
perl -pe 'print "wait\n" if ($. % 6 == 0);' file
However, I want to replace that 6 by a parameter ($ARGV[0]), resulting in something like this:
perl -pe 'print "wait\n" if ($. % $ARGV[0] == 0);' file 6
It goes well, giving me the right output, until it finishes reading the file and treats "6" as the next file (even though it understood it as $ARGV[0] before).
Is there any way to use the -p option and specify which parameters are files and which ones are not?
Edited: I thought there was a problem with using the -f option, but as @ThisSuitIsBlackNot pointed out, I was using it wrongly.
-p, as a superset of -n, wraps the code with a while (<>) { } loop, which reads from the files named on the command line. You need to extract the argument before entering the loop.
perl -e'$n = shift; while (<>) { print "wait\n" if $. % $n == 0; print }' 6 file
or
perl -pe'BEGIN { $n = shift } print "wait\n" if $. % $n == 0' 6 file
Alternatively, you could also use an env var.
N=6 perl -pe'print "wait\n" if $. % $ENV{N} == 0' file
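Perl's -s switch (see perlrun) is yet another way to pass a named parameter; a sketch of that approach:
perl -spe'print "wait\n" if $. % $n == 0' -- -n=6 file
Here -s strips -n=6 from @ARGV and sets $n before the implicit -p loop starts, so only file is left to be read.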

perl line count a file containing a specific text

Basically I want to count the number of lines which contain the word Out.
my $lc1 = 0;
open my $file, "<", "LNP_Define.cfg" or die($!);
#return [ grep m|Out|, <$file> ]; (I tried something with return too, but that also failed)
#$lc1++ while <$file>;
#while (<$file>) { $lc1++ if ... } (the idea of the if statement is to count a line if it contains Out)
close $file;
print $lc1, "\n";
The command line might be a potential option for you too:
perl -ne '$lc1++ if /Out/; END { print "$lc1\n"; } ' LNP_Define.cfg
The -n wraps all your code before END in a while loop.
The -e expects code surrounded by ' '.
The $lc1++ will count only if the following if statement is true.
The if statement runs per line looking for "Out".
The END { } block is for processing after the while loop ends. This is where you can print the count.
Or without the command line:
my $lc1 = 0;
while ( readline ) {    # readline with no argument reads from *ARGV, like <>
    $lc1++ if /Out/;
}
print "$lc1\n";
Then run on the command line:
$ perl count.pl LNP_Define.cfg
Use index:
0 <= index $_, 'Out' and $lc1++ while <$file>;
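Since the commented-out attempt in the question was heading toward grep, that works too; in scalar context grep returns the number of matching lines (a minimal sketch, which reads the whole file into memory):
open my $file, "<", "LNP_Define.cfg" or die $!;
my $lc1 = grep { /Out/ } <$file>;
close $file;
print "$lc1\n";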

Perl Print to File error [closed]

I am rather new to Perl and am trying to combine several .pm files into a single script. Most of the modules copy over just fine, but some have an error where the end of file is reached, but the script keeps printing. Here is an example of the code:
$copy_line = 0;
sysopen(FILE, $file_path, O_WRONLY | O_CREAT, 0711);
sysopen(MODULE, $module_path, O_RDONLY | O_EXCL);
while(<MODULE>)
{
    my $line = $_;
    if(($line ne "# START\n") and ($copy_line eq 0))
    {
    }
    else
    {
        print FILE "$line";
        $copy_line = 1;
    }
}
close FILE;
close MODULE;
Each module has start and end tags so I do not copy any use statements, and so I know when to stop copying. An example of a module is:
#!/usr/bin/perl
# START
some code to copy over
some more code to copy
even more code to copy
# END
What happens in some files is that I see the end tag, followed by repeated code from the module. The output looks something like this:
# START
some code to copy over
some more code to copy
even more code to copy
# END
code to copy
even more code to copy
# END
What might be causing this?
Thanks,
-rusty
There are various things wrong with your script:
You didn't show the whole script; constants like O_WRONLY don't exist by default.
Therefore it may be that you didn't use strict; use warnings; at the beginning of your script. This is necessary to get warned about errors or possible mistakes.
The strict mode requires you to declare all your variables. You can do so with the my keyword, e.g. my $copy_line = 0.
Never use sysopen, except when you fully understand how open works and why it wouldn't be the best choice for a given situation. Considering that I don't have that level of knowledge, I think we'll stick to the normal open.
The open takes a variable, a mode, and a filename, like
open my $file, "<", $filename;
I encourage you to use autodie for automatic error handling, otherwise you should do
open my $file, "<", $filename or die "Can't open $filename: $!";
where $! contains the reason for the error. You can specify various modes for open, which are modelled after shell redirection operators. Important are: < read, > write (or create), >> append, |- write pipe to command, -| read pipe from command.
The eq operator tests for string equality. If you want to test for numeric equality, use the == operator.
if (COND) {} else { STUFF } could rather be written as unless (COND) { STUFF }.
You have successfully implemented some twisted logic that starts copying at the START marker. However, you don't stop at the END. For jobs like this, the flip-flop operator .. can be used: it takes two operands, which are arbitrary expressions. The operator returns false until the first operand is true, and then stays true until after the second operand has returned true. If one operand is a constant integer, it is interpreted as a line number. Thus, the script
while (<>) {
    print if 5 .. 10;
}
prints lines 5–10 of the input, inclusive.
For your problem, you should probably use regexes that match the start and the end marker:
while (<>) {
    print if /^ \s* # \s* START/x .. /^ \s* # \s* END/x;
}
I'll assume here that you know regexes, but I can add explanations if needed.
If the readline operator <> is used without an operand, it takes the command line arguments of the script, opens them, and reads them in sequence. If no arguments were provided, it uses STDIN.
This allows for flexible little scripts. The code can be summarized in the command-line oneliner
$ perl -ne'print if /^\s*#\s*START/../^\s*#\s*END/' INPUT-FILE1 INPUT-FILE2 >OUTPUT
There are two issues with this:
It prints out the start/end markers as well
If a file doesn't contain an # END, the next files will be printed out in full until the next # END is found.
We can mitigate issue #2 by testing for the end of file in the termination condition:
print if /^\s*#\s*START/ .. (/^\s*#\s*END/ or eof);
Issue #1 is slightly more complex; I'd reintroduce a flag for that:
my $print_this = 0;
while (<>) {
    if (/^\s*#\s*END/ or eof) {
        $print_this = 0;
    } elsif ($print_this) {
        print;
    } elsif (/^\s*#\s*START/) {
        $print_this = 1;
    }
}
Partial test case:
$ perl -e'
my $print_this = 0;
while (<>) {
    if (/^\s*#\s*END/ or eof) { $print_this = 0 }
    elsif ($print_this) { print }
    elsif (/^\s*#\s*START/) { $print_this = 1 }
}' <<'__END__'
no a 1
no a 2
# START
yes b 1
yes b 2
yes b 3
#END
no c 1
no c 2
# START
yes d 1
# END
no e 1
__END__
Output:
yes b 1
yes b 2
yes b 3
yes d 1
If you're copying files without modifying their contents, you should look into File::Copy http://perldoc.perl.org/File/Copy.html
File::Copy is a standard module and is installed along with Perl. For a list of standard modules, see perldoc perlmodlib http://perldoc.perl.org/perlmodlib.html#Standard-Modules
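A minimal example of using it with the path variables from the question might look like this:
use File::Copy;
# copy(FROM, TO) returns true on success; $! holds the error otherwise.
copy($module_path, $file_path)
    or die "Cannot copy $module_path to $file_path: $!";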

How to compress 4 consecutive blank lines into one single line in Perl

I'm writing a Perl script to read a log and rewrite it into a new log, removing blank lines whenever it encounters 4 or more consecutive blank lines. In other words, I have to compress any run of 4 (or more) consecutive blank lines into one single line, but runs of 1, 2 or 3 blank lines in the file have to keep their format. I have tried to find a solution online, but the only thing I can find is
perl -00 -pe ''
or
perl -00pe0
Also, I have seen a vim example that deletes blocks of 4 empty lines, :%s/^\n\{4}//, which matches what I'm looking for, but that is vim, not Perl. Can anyone help with this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;
my $cnt = 0;
sub flush_ws {
    $cnt = 1 if ($cnt >= 4);
    while ($cnt > 0) { print "\n"; $cnt--; }
}
while (<>) {
    if (/^$/) {
        $cnt++;
    } else {
        flush_ws();
        print $_;
    }
}
flush_ws();
Your -0 hint is a good one, since you can use -0777 to slurp the whole file in -p mode. Read more about these switches in perlrun. So this one-liner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four newlines in a row, nothing happens. Five newlines or more (four empty lines or more) are replaced by two newlines (one empty line). Note the /g modifier, which replaces all matches rather than only the first.
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    s/\n{5,}/\n\n/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
HTH! :)
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that your definition of empty excludes whitespace.
This will do what you need:
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile
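For example, against a made-up sample file with a run of four blank lines and a run of two:
$ printf 'a\n\n\n\n\nb\n\n\nc\n' > myfile   # 4 blank lines after "a", 2 after "b"
$ perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile
The four blank lines after a collapse into one, while the two after b are kept as they are. Note that blank lines at the very end of the file are dropped, since nothing follows them to flush the counter.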