Extract file contents between given lines using perl

I want to use only sed from within Perl to capture the contents between lines 1000 and 2000 of a given file.
I tried the code below, but it didn't work. Can someone help me with this, please?
$firstLIne="1000";
$lastline="2000";
$output=`sed -n '$firstLIne,$lastline'p sample.txt`;

Here is another pure perl solution:
my ($firstline, $lastline) = (1000,2000);
open my $fh, '<', 'sample.txt' or die "$!";
while(<$fh>){
print if $. == $firstline .. $. == $lastline;
}
If you don't use the variables anywhere else, you can use the special case of .. with constant operands (see the fourth paragraph of the perldoc entry: if either operand is a constant expression, it is automatically compared to $.):
while(<$fh>){
print if 1000 .. 2000;
}
Here is the important part from the perldoc for the .. operator:
In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors.
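As a rough illustration of the flip-flop behaviour described in that perldoc passage, here is a minimal sketch that runs the constant form of .. over an in-memory filehandle. The ten-line sample data and the variable names are assumptions made up for the example, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample data: ten numbered lines held in memory.
my $data = join '', map { "line $_\n" } 1 .. 10;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @picked;
while (my $line = <$fh>) {
    # With constant operands, 3 .. 5 is shorthand for ($. == 3) .. ($. == 5).
    push @picked, $line if 3 .. 5;
}
close $fh;

print @picked;   # the lines numbered 3, 4 and 5
```

The operator turns "on" when $. reaches 3 and stays on until $. reaches 5, so both end lines are included.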
Edit: Per request, here is a version that stores the intermediate lines in a variable.
my ($firstline, $lastline) = (1000,2000);
my $output = '';
open my $fh, '<', 'sample.txt' or die $!;
while(<$fh>){
$output .= $_ if $. == $firstline .. $. == $lastline;
}
print $output;
Also, if your file isn't too big (it fits completely into memory), you can also read it into a list and select the lines you're interested in. List indices start at 0, so line N sits at index N-1:
my $output = join '', (<$fh>)[$firstline-1 .. $lastline-1];

For comparison, to do this in Perl only, one could write:
my $firstLine=1000;
my $lastLine=2000;
my $fn="sample.txt";
my $output;
open (my $fh, "<", $fn) or die "Could not open file '$fn': $!\n";
while (<$fh>) {
last if $. > $lastLine;
$output .= $_ if $. >= $firstLine;
}
close($fh);
Note that this stops reading from the file as soon as it passes line $lastLine, so if the file contains 100,000 lines, it stops right after line 2000 instead of reading the whole file.
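The early-exit loop above could be wrapped in a small reusable routine. The following is only a sketch: read_line_range is a hypothetical helper name, and the 100-row in-memory data is invented for the demonstration.

```perl
use strict;
use warnings;

# read_line_range is a hypothetical helper name, not part of the answer above.
sub read_line_range {
    my ($fh, $first, $last) = @_;
    my $output = '';
    while (my $line = <$fh>) {
        last if $. > $last;                # stop reading once past the range
        $output .= $line if $. >= $first;  # keep lines inside the range
    }
    return $output;
}

# Hypothetical sample data: 100 numbered rows held in memory.
my $data = join '', map { "row $_\n" } 1 .. 100;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $chunk = read_line_range($fh, 10, 12);
my $lines_consumed = $.;   # how far the handle was actually read
close $fh;

print $chunk;
```

Note that the loop has to read one line past the end of the range before it can decide to stop, so $lines_consumed ends up one greater than the last requested line.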

If you just want to print out the lines then:
perl -ne 'print if 1000 .. 2000' example_data.txt
should work.
If you want to incorporate that into a script somehow then you can "semi-slurp" the filehandle:
use strict;
use warnings;
open my $filehandle, '<', 'example_data.txt' or die $!;
my $lines_1k_to_2k ;
while (<$filehandle>) {
$lines_1k_to_2k .= $_ if 1000 .. 2000 ;
}
print $lines_1k_to_2k ;
The .= operator appends each line to the string in $lines_1k_to_2k only if its line number is in the range 1000 .. 2000.

Related

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a
number and print the line if the other file contains that number
#!/usr/bin/perl
open(file, "textIds.txt"); #
@file = <file>; #file looking into
# close file; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
@arrayTemp = split("\t", $temp);
@found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
@found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles, as they are global identifiers. Use three-parameter open
instead: open( my $fh, '<', 'textIds.txt');
use autodie; or check whether the open succeeded.
You read and store textIds.txt into the array @file, but later on (in your grep) you are
again trying to read from that file(handle) (with <file>). As @PaulL said, this will always
come back empty (false) because the file was already read.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at the
tabs and pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You're talking about the "input file" and "the file" without saying exactly which is which.
I assume your "textIds.txt" is the one with only the numbers and the other input file is the
one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
# split "John|7791\t154" into ("John", "7791", "154"):
my ($name, $number1, $number2) = split(/\||\t/, $line);
# compare $number1 to each member of @file and print if found:
if ( grep( /$number1/, @file) ) {
print $line;
}
}
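One thing to be aware of with the grep above is that /$number1/ is a substring match, so an ID such as 779 would also match the line 7791. A possible alternative, swapped in here purely as an illustration, is an exact lookup in a hash. The arrays below are hypothetical stand-ins for the contents of textIds.txt and for the lines arriving on STDIN (the "Mary" row is invented to show the difference).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for textIds.txt and the piped input.
my @ids   = ("5432\n", "7791\n", "6590\n", "23140\n");
my @input = ("John|7791\t154\n", "Smith|5432\t290\n", "Mary|779\t12\n");

# Build a lookup table keyed on the chomped IDs.
chomp(my @clean = @ids);
my %wanted = map { $_ => 1 } @clean;

my @matches;
for my $line (@input) {
    # Split on pipe or tab; the second field is the ID.
    my (undef, $number1) = split /[|\t]/, $line;
    push @matches, $line if defined $number1 && $wanted{$number1};
}

print @matches;   # the John and Smith lines; Mary's 779 is not an exact ID
```

A hash lookup also avoids rescanning the whole ID list for every input line, which may matter if both files are large.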

Read in first 5 lines, print specified line

I am trying to read in the first 5 lines of my input file and print only a single line (in this case line 4) that is given to the perl script from the command line. I am having some trouble comparing the current line number to the specified line number.
Here is the important part of my perl script:
# Variables
my $sInputFile = $ARGV[0];
my $sOutputFile = $ARGV[1];
my $sRowExtractNumber = $ARGV[2];
# Open-Close / Exceptions
open(my $in, "<", $sInputFile) or die "cannot open output file: $sOutputFile\n";
open(my $out, ">", $sOutputFile) or die "cannot open input file: $sInputFile\n";
# Script
while (<$in>) {
if (1..5) {
print $out $_ if $_ == $sRowExtractNumber;
}
}
I am not getting an error per se, but nothing is being printed to the $out file.
How can I accomplish my goal?
Thanks.
The $. variable is the current input line number. I presume you used $_ by mistake.
There is no need to check both that the line number is five or less and that it matches the requested line, although you may want to verify your input beforehand.
You must always use strict and use warnings at the top of every module. It is a simple measure that will alert you to many trivial mistakes that you could otherwise easily overlook. And you should use lower case letters for local identifiers: upper case is reserved for globals such as package names.
use strict;
use warnings;
my ($input_file, $output_file, $row_extract_number) = @ARGV;
die "Line number must be between 1 and 5" if $row_extract_number < 1 or $row_extract_number > 5;
open my $in, '<', $input_file or die qq{Cannot open "$input_file" for input: $!};
open my $out, '>', $output_file or die qq{Cannot open "$output_file" for output: $!};
while (<$in>) {
if ($. == $row_extract_number) {
print $out $_;
last;
}
}

problems with replacing first line of file using perl

I have a file that looks like this:
I,like
blah...
I want to replace only the first line with 'i,am' to get:
i,am
blah...
These are big files, so this is what I did (based on this):
open(FH, "+< input.txt") or die "FAIL!";
my $header = <FH>;
chop($header);
$header =~ s/I,like/i,am/g;
seek FH, 0, 0; # go back to start of file
printf FH $header;
close FH;
However, I get this when I run it:
i,amke
blah...
It looks like the 'ke' from 'like' is still there. How do I get rid of it?
What I would do is probably something like this:
perl -i -pe 'if ($. == 1) { s/.*/i,am/; }' yourfile.txt
Which will only affect the first line, when the line counter for the current file handle $. is equal to 1. The regex will replace everything except newline. If you need it to match your specific line, you can include that in the if-statement:
perl -i -pe 'if ($. == 1 and /^I,like$/) { s/.*/i,am/; }' yourfile.txt
You can also look into Tie::File, which allows you to treat the file like an array, which means you can simply do $line[0] = "i,am\n". It is mentioned that there may be performance issues with this module, however.
If the replacement has a different length than the original, you cannot use this technique. You can for example create a new file and then rename it to the original name.
open my $IN, '<', 'input.txt' or die $!;
open my $OUT, '>', 'input.new' or die $!;
my $header = <$IN>;
$header =~ s/I,like/i,am/g;
print $OUT $header;
print $OUT $_ while <$IN>; # Just copy the rest.
close $IN;
close $OUT or die $!;
rename 'input.new', 'input.txt' or die $!;
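To see why overwriting in place leaves the 'ke' behind, here is a small sketch that reproduces the effect on a scratch file. The temporary file and its contents are assumptions that stand in for input.txt; writing at offset 0 replaces bytes one for one and never shortens the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file standing in for input.txt (hypothetical contents).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "I,like\nblah...\n";
close $tmp or die "Cannot close scratch file: $!";

# Overwrite the start of the file with a shorter string, as in the question.
open my $fh, '+<', $path or die "Cannot open $path: $!";
seek $fh, 0, 0 or die "Cannot seek: $!";
print {$fh} 'i,am';
close $fh or die "Cannot close $path: $!";

# Read the whole file back to see what is left.
open my $check, '<', $path or die "Cannot reopen $path: $!";
my $result = do { local $/; <$check> };
close $check;

print $result;
```

Only the first four bytes ("I,li") are replaced by "i,am"; the remaining "ke" and everything after it stay where they were, which is why writing a new file and renaming it is the safer route when the lengths differ.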
I'd just use Tie::File:
#! /usr/bin/env perl
use common::sense;
use Tie::File;
sub firstline {
tie my @f, 'Tie::File', shift or die $!;
$f[0] = shift;
untie @f;
}
firstline $0, '#! ' . qx(which perl);
Usage:
$ ./example
$ head -2 example
#! /bin/perl
use common::sense;

How to read specific lines from file and store in an array using perl?

How can i read/store uncommented lines from file into an array ?
file.txt looks like below
request abcd uniquename "zxsder,azxdfgt"
request abcd uniquename1 "nbgfdcbv.bbhgfrtyujk"
request abcd uniquename2 "nbcvdferr,nscdfertrgr"
#request abcd uniquename3 "kdgetgsvs,jdgdvnhur"
#request abcd uniquename4 "hvgsfeyeuee,bccafaderryrun"
#request abcd uniquename5 "bccsfeueiew,bdvdfacxsfeyeueiei"
Now i have to read/store the uncommented lines (first 3 lines in this script) into an array. is it possible to use it by pattern matching with string name or any regex ? if so, how can i do this ?
This below code stores all the lines into an array.
open (F, "test.txt") || die "Could not open test.txt: $!\n";
@test = <F>;
close F;
print @test;
how can i do it for only uncommented lines ?
If you know your comments will contain # at the beginning you can use
next if $_ =~ m/^#/
Or use whatever variable you have to read each line instead of $_
This matches # signs at the beginning of the line.
As far as adding the others to an array, you can use push (@arr, $_)
#!/usr/bin/perl
# Should always include these
use strict;
use warnings;
my @lines; # Hold the lines you want
open (my $file, '<', 'test.txt') or die $!; # Open the file for reading
while (my $line = <$file>)
{
next if $line =~ m/^#/; # Look at each line and if it isn't a comment
push (@lines, $line); # we will add it to the array.
}
close $file;
foreach (@lines) # Print the values that we got
{
print $_; # each line still ends with its own newline
}
You could do:
perl -lne 'push @ary,$_ unless /^#/;END{print join "\n",@ary}' file.txt
This skips any line that begins with #. Otherwise the line is added to an array for later use.
The smallest change to your original program would probably be:
open (F, "test.txt") || die "Could not open test.txt: $!\n";
@test = grep { $_ !~ /^#/ } <F>;
close F;
print @test;
But I'd highly recommend rewriting that slightly to use current best practices.
# Safety net
use strict;
use warnings;
# Lexical filehandle, three-arg open
open (my $fh, '<', 'test.txt') || die "Could not open test.txt: $!\n";
# Declare @test.
# Don't explicitly close filehandle (closed automatically as $fh goes out of scope)
my @test = grep { $_ !~ /^#/ } <$fh>;
print @test;

Select rows based on text pattern

I want to extract rows from a file that match a particular pattern and I want to do this for over 500 files. It should have the ability to retain the unique name of the file as well.
I used awk but then i have to do each file individually.
c:\>gawk "/S1901/" Census_Tract_*.csv > Census_Tract_*.csv
In the example shown in the link here (http://bit.ly/nMX8qh) I want to retain only those records that have S1901 in them. Apologies for the external link but i am not able to retain formatting of the table.
I found some perl code that I used to write it but it retains all the rows and does not select only those rows/records where the pattern matches. Any tips would be much appreciated. The perl code is below:
#perl -w
$pattern = "Subject_Census*.csv"; # process only those files that match pattern
while (defined ($in = glob($pattern))) {
($out = $in) =~ s/\.csv$/.outcsv/; # read from "xyz.in" and write to "xyz.out"
open (IN, "<", $in) or die "Can't open $in for reading: $!";
open (OUT,">>", $out) or die "Can't open $out for writing: $!";
while (<IN>) {
$mystring =~ /S1901/;
print OUT $_ if $mystring == 0;
}
close (IN) or die "Can't close $in: $!"; # good idea to do some housekeeping
close (OUT) or die "Can't close $out: $!";
}
Untested:
use strict;
use warnings;
use autodie;
my $files_list_filename = 'files.txt';
open my $fl, '<', $files_list_filename;
my @list_of_files = <$fl>;
chomp @list_of_files;
close $fl;
foreach my $file ( @list_of_files ) {
open my $test_fh, '<', $file;
while ( my $line = <$test_fh> ) {
if( $line =~ m/S1901/ ) {
print "$file at $.: $line";
}
}
close $test_fh;
}
Is that sort of what you had in mind? It opens a file named files.txt and reads in a list of however many filenames you want to give it. Then it iterates over that list, opening and scanning each file in turn. Whenever a line contains the trigger text, it prints the filename and line number, as well as the line itself. Then it moves on to the next file.
perl -ni.bak -e 'print if /S1901/' Subject_Census*.csv