How can I extract a line or row? - perl

How can I extract the whole line in a row, for example, row 3.
These data are saved in my text editor in linux.
Here's my data:
1,julz,kath,shiela,angel
2,may,ann,janice,aika
3,christal,justine,kim
4,kris,allan,jc,mine
I want output like:
3,christal,justine,kim

The following snippet reads in the first three lines, prints only the third then exits to ensure that no unnecessary processing takes place.
Without the exit, the script would continue to process the input file despite you knowing that you have no use for it.
perl -ne 'if ($. == 3) {print;exit}' infile.txt
As perlvar points out, $. is the current line number for the last file handle accessed.

$ perl -ne'print if $. == 3' your_file.txt
Below is a script version of #ysth's answer:
$ perl -mTie::File -e'tie #lines, q(Tie::File), q(your_file.txt);
> print $lines[2]'

If it's always the third line:
perl -ne 'print if 3..3' <infile >outfile
If it's always the one that has a numeric value of "3" as the first column:
perl -F, -nae 'print if $F[0] == 3' <infile >outfile # thanks for the comment doh!
Since you didn't say how you were identifying that line, I am providing alternatives.

For a more general solution:
open my $fh, '<', 'infile.txt';
while (my $line = <$fh>) {
print $line if i_want_this_line($line);
}
where i_want_this_line implements the criteria defining which line(s) you want.

Um, the -n answers are assuming the question is "what is a script that...". In which case, perl isn't even the best answer. But I don't read that into the question.
In general, if the lines are not of fixed length, you have to read through a file line by
line until you get to the line you want. Tie::File automates this process for you (though since the code it would replace is so trivial, I rarely bother with it, myself).
use Tie::File;
use Fcntl "O_RDONLY";
tie my #line, "Tie::File", "yourfilename", mode => O_RDONLY
or die "Couldn't open file: $!";
print "The third line is ", $line[2];

You can assign the diamond operator on your filehandle to a list, each element will be a line or row.
open $fh, "myfile.txt";
my #lines = <$fh>;
EDIT: This solution grabs all the lines so that you can access any one you want, e.g. row 3 would be $lines[2] ... If you really only want one specific line, that'd be a different solution, like the other answerers'.

Related

search a paragraph in reverse order without array in perl

i have a log file where the errors will be mentioned as "ERROR" in the beginning of the line and next line will have the detailed text about the error. I would like to search for "ERROR" in the reverse order so i can find the last error and print the next line or copy is the line to a variable.
In shell i can try the below command which will help me to achieve the same. Can some one give me a equivalent perl code.
grep -A2 ERROR sapinst.log | tail -2
As the log file will be huge (~5000+ lines), so I don't want to store it in an array.
Your file size is rather small and perl is pretty quick, so I wouldn't worry about reverse order that much. This little program reads lines of input from the files you specify on the command line (or standard input if you specify none), skips lines until it finds ERROR, then prints that and the next line:
#!perl
while( <> ) {
next unless /ERROR/;
print;
print scalar <>;
}
From there you can use tail if you like. This Perl does the same as the grep you posted (although since you already have that solution I wonder why you want a different one).
If you don't want to use tail, keep track of the two lines you'll output and replace them when you find a new set:
my( $error_line, $next_line );
while( <> ) {
next unless /ERROR/;
$error_line = $_;
$next_line = scalar <>;
}
print $error_line, $next_line;
If you have a recent enough perl, you can use the safer double diamond line input operator:
use v5.22;
my( $error_line, $next_line );
while( <<>> ) {
next unless /ERROR/;
$error_line = $_;
$next_line = scalar <<>>;
}
print $error_line, $next_line;
You can use File::ReadBackwards, but you'll have to do the same task by remembering every line then checking if the previous line had ERROR. For you data sizes, the benefit probably isn't apparent. If the simple solution isn't fast enough, it's time to get fancier (but not before then).

Remove multiple duplicate lines from a file

I have a Perl script run in crontab that generates a file rich with duplicate entries, because on each run it rewrites information previously written.
I would use a sort -u of file, but, I would do it at the end of the Perl script file.
My list
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
...
My code
#!/usr/bin/perl
# Libraries
use strict;
use warnings 'all';
%lines = ();
# Remove duplicate
open( TMP_GL_OUTPUT, '>', $OUTPUT_FILE ) or die $!;
while ( <TMP_GL_OUTPUT> ) {
$lines{$_}++;
}
open( OUTFILE, '>', $TMPOUTPUT_FILE ) or die $!;
print OUTFILE keys %lines;
close( OUTFILE );
close( TMP_GL_OUTPUT );
Where am I going wrong? In shell it feels shorter than in Perl.
sort -u $TMPOUTPUT_FILE > $OUTPUT_FILE
As Suggested by ikegamy user, I've do as following:
move $OUTPUT_FILE, $TMPOUTPUT_FILE; # Copy file
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE; # Remove duplicate
unlink $TMPOUTPUT_FILE;
I think you are asking why your Perl program is longer than your shell script.
First of all, your shell script does something completely different than your Perl program.
Your shell script executes a program, and stores its out in a file.
Your Perl program reads a file, manipulates the data it read, and stores the output in a file.
The Perl equivalent to
sort -u -- "$TMPOUTPUT_FILE" > "$OUTPUT_FILE"
is
use IPC::Run qw( run );
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE;
(There are differences in error handling between these two.)
They're not that different in length.
This brings up the second difference. The shell specializes in executing programs, but Perl is a general purpose language. It would be surprising if it wasn't longer in Perl!
(Now try comparing the size of your Perl program to the source of sort...)
List::Util is a core module.
use List::Util 'uniq';
print for uniq <>
Your code looks almost OK.
My proposition is only to chomp each line, before you
save an element in the hash.
The reason is that e.g. the last line, not terminated
with a \n may look just the same as one of previous lines,
but without chomp the previous line would have contained
the terminating \n, whereas the last - not.
The resut is that both these lines will be different keys in the hash.
Compare my example program (working, presented below) with yours, there are
no other significant differences, apart from reading from __DATA__ and
writing to the console.
In my program, for demonstration purposes, I put 2 variants of printout,
one with key values (repetition counts) and another, printing just keys.
In your program leave only the second printout.
use strict; use warnings; use feature qw(say);
my %lines;
while(<DATA>) {
chomp;
$lines{$_}++;
}
while(my($key, $val) = each %lines) {
printf "%-32s / %d\n", $key, $val;
}
say '========';
foreach my $key (keys %lines) {
say $key;
}
__DATA__
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
Edit
Your code assigns no names to $OUTPUT_FILE and $TMPOUTPUT_FILE,
you even didn't declare these variables, but I assume, that in your actual
code you did it.
Another detail is that %lines should be preceded with my,
otherwise, as you put use strict; the compiler prints an error.
Edit 2
There is a quicker and shorter solution than yours.
Instead of writing lines to a hash and printing them as late as in
the second step, you can do it in a single loop:
Read the line.
Check whether the hash already contains a key equal to the line just read.
If not, then:
write the line to the hash, to block the printout, if just the
same line occured again,
print the line.
You can even write this program as a Perl one-liner:
perl -lne"print if !$lines{$_}++" input.txt
If you run the above command from the Windows cmd, it will print the output
to the console. If you use Linux, instead of double quotes, you can use apostrophes.
You may of course redirect the output to any file, adding > output.txt to
the above command.
The code is executed for each input line, chomped due to -l option.
If any other details concerning Perl one-liners are not known to you, search the web.

Awk's output in Perl doesn't seem to be working properly

I'm writing a simple Perl script which is meant to output the second column of an external text file (columns one and two are separated by a comma).
I'm using AWK because I'm familiar with it.
This is my script:
use v5.10;
use File::Copy;
use POSIX;
$s = `awk -F ',' '\$1==500 {print \$2}' STD`;
say $s;
The contents of the local file "STD" is:
CIR,BS
60,90
70,100
80,120
90,130
100,175
150,120
200,260
300,500
400,600
500,850
600,900
My output is very strange and it prints out the desired "850" but it also prints a trailer of the line and a new line too!
ka#man01:$ ./test.pl
850
ka#man01:$
The problem isn't just printing. I need to use the variable generated by awk "i.e. the $s variable) but the variable is also being reserved with a long string and a new line!
Could you guys help?
Thank you.
I'd suggest that you're going down a dirty road by trying to inline awk into perl in the first place. Why not instead:
open ( my $input, '<', 'STD' ) or die $!;
while ( <$input> ) {
s/\s+\z//;
my #fields = split /,/;
print $fields[1], "\n" if $fields[0] == 500;
}
But the likely problem is that you're not handling linefeeds, and say is adding an extra one. Try using print instead, or chomp on the resultant string.
perl can do many of the things that awk can do. Here's something similar that replaces your entire Perl program:
$ perl -naF, -le 'chomp; print $F[1] if $F[0]==500' STD
850
The -n creates a while loop around your argument to -e.
The -a splits up each line into #F and -F lets you specify the separator. Since you want to separate the fields on a comma you use -F,.
The -l adds a newline each time you call print.
The -e argument is the program to run (with the added while from -n). The chomp removes the newline from the output. You get a newline in your output because you happen to use the last field in the line. The -l adds a newline when you print; that's important when you want to extract a field in the middle of the line.
The reason you get 2 newlines:
the backtick operator does not remove the trailing newline from the awk output. $s contains "850\n"
the say function appends a newline to the string. You have say "850\n" which is the same as print "850\n\n"

How to remove a specific word from a file in perl

A file contains:
rhost=localhost
ruserid=abcdefg_xxx
ldir=
lfile=
rdir=p01
rfile=
pgp=none
mainframe=no
ftpmode=binary
ftpcmd1=
ftpcmd2=
ftpcmd3=
ftpcmd1a=
ftpcmd2a=
notifycc=no
firstfwd=Yes
NOTIFYNYL=
decompress=no
compress=no
I want to write a simple code that removes the "_xxx" in that second line. Keep in mind that there will never be a file that contains the string "_xxx" so that should make it extremely easier. I'm just not too familiar with the syntax. Thanks!
The short answer:
Here's how you can remove just the literal '_xxx'.
perl -pli.bak -e 's/_xxx$//' filename
The detailed explanation:
Since Perl has a reputation for code that is indistinguishable from line noise, here's an explanation of the steps.
-p creates an implicit loop that looks something like this:
while( <> ) {
# Your code goes here.
}
continue {
print or die;
}
-l sort of acts like "auto-chomp", but also places the line ending back on the line before printing it again. It's more complicated than that, but in its simplest use, it changes your implicit loop to look like this:
while( <> ) {
chomp;
# Your code goes here.
}
continue {
print $_, $/;
}
-i tells Perl to "edit in place." Behind the scenes it creates a separate output file and at the end it moves that temporary file to replace the original.
.bak tells Perl that it should create a backup named 'originalfile.bak' so that if you make a mistake it can be reversed easily enough.
Inside the substitution:
s/
_xxx$ # Match (but don't capture) the final '_xxx' in the string.
/$1/x; # Replace the entire match with nothing.
The reference material:
For future reference, information on the command line switches used in Perl "one-liners" can be obtained in Perl's documentation at perlrun. A quick introduction to Perl's regular expressions can be found at perlrequick. And a quick overview of Perl's syntax is found at perlintro.
This overwrites the original file, getting rid of _xxx in the 2nd line:
use warnings;
use strict;
use Tie::File;
my $filename = shift;
tie my #lines, 'Tie::File', $filename or die $!;
$lines[1] =~ s/_xxx//;
untie #lines;
Maybe this can help
perl -ple 's/_.*// if /^ruserid/' < file
will remove anything after the 1st '_' (included) in the line what start with "ruserid".
One way using perl. In second line ($. == 2), delete from last _ until end of line:
perl -lpe 's/_[^_]*\Z// if $. == 2' infile

Perl substitute with regex

When I run this command over a Perl one liner, it picks up the the regular expression -
so that can't be bad.
more tagcommands | perl -nle 'print /(\d{8}_\d{9})/' | sort
12012011_000005769
12012011_000005772
12162011_000005792
12162011_000005792
But when I run this script over the command invocation below, it does not pick up the
regex.
#!/usr/bin/perl
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my #array_old = (<FILE>) ;
my #array_new = #array_old ;
foreach my $line(#array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
This is the data that I am feeding into the script
/CASPERBOT/START URL=simplefile:///data/tag/squirrels/squirrels /12012011_000005777N.dart.gz CASPER=SeqRashMessage
/CASPERBOT/ADDSERVER simplefile:///data/tag/squirrels/12012011_0000057770.dart.trans.gz
/CASPERRIP/newApp multistitch CASPER_BIN
/CASPER_BIN/START URLS=simplefile:///data/tag/squirrels /12012011_000005777R.rash.gz?exitOnEOF=false;binaryfile:///data/tag/squirrels/12162011_000005792D.binaryBlob.gz?exitOnEOF=false;simplefile:///data/tag/squirrels/12012011_000005777E.bean.trans.gz?exitOnEOF=false EXTRACTORS=rash;island;rash BINARY=T
You should study your one-liner to see how it works. First check perl -h to learn about the switches used:
-l[octal] enable line ending processing, specifies line terminator
-n assume "while (<>) { ... }" loop around program
The first one is not exactly self-explanatory, but what -l actually does is chomp each line, and then change $\ and $/ to newline. So, your one-liner:
perl -nle 'print /(\d{8}_\d{9})/'
Actually does this:
$\ = "\n";
while (<>) {
chomp;
print /(\d{8}_\d{9})/;
}
A very easy way to see this is to use the Deparse command:
$ perl -MO=Deparse -nle 'print /(\d{8}_\d{9})/'
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print /(\d{8}_\d{9})/;
}
-e syntax OK
So, that's how you transform that into a working script.
I have no idea how you went from that to this:
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my #array_old = (<FILE>) ;
my #array_new = #array_old ;
foreach my $line(#array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
First of all, why are you opening a pipe from the more command to read a text file? That is like calling a tow truck to fetch you a cab. Just open the file. Or better yet, don't. Just use the diamond operator, like you did the first time.
You don't need to first copy the lines of a file to an array, and then use the array. while(<FILE>) is a simple way to do it.
In your one-liner, you print the regex. Well, you print the return value of the regex. In this script, you print $line. I'm not sure how you thought that would do the same thing.
Your regex here will remove all set of numbers and replace it with the ones in your script. Nothing else.
You may also be aware that sleep 1 will not do what you think. Try this one-liner, for example:
perl -we 'for (1 .. 10) { print "line $_\n"; sleep 1; }'
As you will notice, it will simply wait 10 seconds then print everything at once. That's because perl by default prints to the standard output buffer (in the shell!), and that buffer is not printed until it is full or flushed (when the perl execution ends). So, it's a perception problem. Everything works like it should, you just don't see it.
If you absolutely want to have a sleep statement in your script, you'll probably want to autoflush, e.g. STDOUT->autoflush(1);
However, why are you doing that? Is it so you will have time to read the numbers? If so, put that more statement at the end of your one-liner instead:
perl ...... | more
That will pipe the output into the more command, so you can read it at your own pace. Now, for your one-liner:
Always also use -w, unless you specifically want to avoid getting warnings (which basically you never should).
Your one-liner will only print the first match. If you want to print all the matches on a new line:
perl -wnle 'print for /(\d{8}_\d{9})/g'
If you want to print all the matches, but keep the ones from the same line on the same line:
perl -wnle 'print "#a" if #a = /(\d{8}_\d{9})/g'
Well, that should cover it.
Your open call may be failing (you should always check the result of an open to make sure it succeeded if the rest of the program depends on it) but I believe your problem is in complicating things by opening a pipe from a more command instead of simply opening the file itself. Change the open to simply
open FILE, "/home/shortcasper/work/tagcommands" or die $!;
and things should improve.