Seek function not working in Perl

I tried the below code snippet and the seek function doesn't seem to work.
funct("ls -ltr /scratch/dummy/dum*");
sub funct {
print "\nRecording\n";
open(SENSOR_LIST1, "$_[0] |") || die "Failed to read sensor list: $!\n";
for $sensor_line1 (<SENSOR_LIST1>) {
print "$sensor_line1";
}
my $pos = tell SENSOR_LIST1;
print "\nposition is $pos"; #Here the position is 613
print "\nRecording again";
seek (SENSOR_LIST1, SEEK_SET, 0);
$pos = tell SENSOR_LIST1; # Here again the position is 613, even after a seek
print "\nposition now is $pos";
for $sensor_line1 (<SENSOR_LIST1>) {
print "$sensor_line1";
}
close SENSOR_LIST1;
}
Note: none of the seek variants works.
Output:
The position does not change even after the seek; it remains at 613.
Can you check and let me know what the issue is here?
Thanks.

You cannot seek on a pipe.
Either use a temporary file or store the data in memory.
Your choice as to the best solution.
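For illustration, here is a minimal sketch of the "store the data in memory" option, reusing the question's funct(); it assumes the command's output fits comfortably in memory:

sub funct {
    my ($cmd) = @_;

    # Read the pipe once into an array; after that the pipe is no longer needed.
    open( my $sensor_fh, '-|', $cmd ) or die "Failed to read sensor list: $!\n";
    my @lines = <$sensor_fh>;
    close $sensor_fh;

    print "\nRecording\n";
    print for @lines;

    print "\nRecording again\n";
    print for @lines;    # second pass, straight from memory, no seek needed
}

funct("ls -ltr /scratch/dummy/dum*");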

Try writing the output of your ls command to a file and opening that file instead of reading the command's output directly. You can't seek on a transient data stream (such as a command's output), only on data which still exists after being read (such as a file).

Related

Save contents of those files which contain a specific known string in a single .txt or .tmp file using Perl

I'm trying to write a Perl script that saves the whole contents of every file containing a specific string, 'PYAG_GENERATED', into a single .txt/.tmp file, one after another. The file names follow a specific pattern, 'output_nnnn.txt', where nnnn is 0001, 0002 and so on, but I don't know how many files with this 'output_nnnn.txt' name are present.
I'm new to Perl and I don't know how I can resolve this issue to get the output correctly. Can anyone help me? Thanks in advance.
I've tried to write the Perl script in different ways but nothing comes out in the output file. Here is one of the attempts. 'new_1.txt' is the new file where I want to save the expected output and "PYAG_GENERATED" is the specific string I'm searching for in the files.
open(NEW,">>new_1.txt") or die "could not open:$!";
$find2="PYAG_GENERATED";
$n='0001';
while('output_$n.txt'){
    if(/find2/){
        print NEW;
    }
    $n++;
}
close NEW;
I expect the output file 'new_1.txt' to contain the whole contents of the files (with filename pattern 'output_nnnn.txt') which have the 'PYAG_GENERATED' string at least once inside.
Well, you tried I guess.
Welcome to the wonderful world of Perl, where there are always a dozen ways of doing X :-) Here is one possible way to achieve what you want. I put in a lot of comments I hope are helpful. It's also a bit verbose for the sake of clarity. I'm sure it could be golfed down to 5 lines of code.
use warnings; # Always start your Perl code with these two lines,
use strict;   # and Perl will tell you about possible mistakes
use experimental 'signatures';
use File::Slurp;

# this is a subroutine/function, a block of code that can be called from
# somewhere else. it takes two arguments, which the caller must provide
sub find_in_file( $filename, $what_to_look_for )
{
    # the open function opens $filename for reading
    # (that's what the "<" means, ">" stands for writing)
    # if successful, open will return true and we will have a "file handle" in the variable $in
    # if not, open will return false ...
    open( my $in, "<", $filename )
        or die $!; # ... and the program will exit here. The variable $! will contain the error message

    # now we read the file using a loop
    # readline will give us the next line in the file
    # or something false when there is nothing left to read
    while ( my $line = readline($in) )
    {
        # now we test whether the current line contains what
        # we are looking for.
        # the index function gives us the position of a string within another string,
        # counting from 0, or -1 if it is not found.
        # for example index("abc", "c") will give us 2
        if ( index( $line, $what_to_look_for ) >= 0 )
        {
            # we found what we were looking for
            # so we don't need to keep looking in this file anymore
            # so we must first close the file
            close( $in );
            # and then we indicate to the caller that the search was a success
            # this will immediately end the subroutine
            return 1;
        }
    }

    # If we arrive here the search was unsuccessful
    # so we tell that to the caller
    return 0;
}

# Here starts the main program
# First we get a list of files
# we want to look at
my @possible_files = glob( "where/your/files/are/output_*.txt" );

# Here we will store the files that we are interested in, aka that contain PYAG_GENERATED
my @wanted_files;

# and now we can loop over the files and see if they contain what we are looking for
foreach my $filename ( @possible_files )
{
    # here we use the function we defined earlier
    if ( find_in_file( $filename, "PYAG_GENERATED" ) )
    {
        # with push we can add things to the end of an array
        push @wanted_files, $filename;
    }
}

# We are finished searching, now we can start adding the files together
# if we found any
if ( scalar @wanted_files > 0 )
{
    # Now we could code that ourselves: open the files, loop through them and write out
    # line by line. But we make life easy for us and just
    # use two functions from the module File::Slurp, which comes with Perl I believe.
    # If not you have to install it
    foreach my $filename ( @wanted_files )
    {
        append_file( "new_1.txt", read_file( $filename ) );
    }
    print "Output created from " . (scalar @wanted_files) . " files\n";
}
else
{
    print "No input files\n";
}
use strict;
use warnings;

my @a;
my $i=1;
my $find1="PYAG_GENERATED";
my $n=1;
my $total_files=47276; # got this no. of files by running 'ls' in the terminal

while($n<=$total_files){
    open(NEW,"<output_$n.txt") or die "could not open:$!";
    my $join=join('',<NEW>);
    $a[$i]=$join;
    #print "$a[10]";
    $n++;
    $i++;
}
close NEW;

for($i=1;$i<=$total_files;$i++){
    if($a[$i]=~m/$find1/){
        open(NEW1,">>new_1.tmp") or die "could not open:$!";
        print NEW1 $a[$i];
    }
}
close NEW1;

PERL: Jumping to lines in a huge text file

I have a very large text file (~4 GB).
It has the following structure:
S=1
3 lines of metadata of block where S=1
a number of lines of data of this block
S=2
3 lines of metadata of block where S=2
a number of lines of data of this block
S=4
3 lines of metadata of block where S=4
a number of lines of data of this block
etc.
I am writing a Perl program that reads in another file;
for each line of that file (which must contain a number),
it searches the huge file for an S-value of that number minus 1,
and then analyzes the lines of data of the block belonging to that S-value.
The problem is, the text file is HUGE, so processing each line with a
foreach $line {...} loop
is very slow. As the S=value is strictly increasing, are there any methods to jump to a particular line of the required S-value?
are there any methods to jump to a particular line of the required S-value?
Yes, if the file does not change then create an index. This requires reading the file in its entirety once and noting the positions of all the S=# lines using tell. Store it in a DBM file with the key being the number and the value being the byte position in the file. Then you can use seek to jump to that point in the file and read from there.
But if you're going to do that, you're better off exporting the data into a proper database such as SQLite. Write a program to insert the data into the database and add normal SQL indexes. This will probably be simpler than writing the index. Then you can query the data efficiently using normal SQL, and make complex queries. If the file changes you can either redo the export, or use the normal insert and update SQL to update the database. And it will be easy for anyone who knows SQL to work with, as opposed to a bunch of custom indexing and search code.
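As a rough illustration of the tell/seek index idea (a sketch only; the file name is made up, the index is kept in a plain hash here, and you would tie it to a DBM file as described above to make it persistent):

use strict;
use warnings;
use Fcntl qw(:seek);

my $big_file = 'huge.txt';    # hypothetical file name
my %index;                    # S-number => byte offset of its "S=<n>" line

open my $fh, '<', $big_file or die $!;

# Pass 1: note the position of every block header with tell.
my $pos = tell $fh;
while ( my $line = <$fh> ) {
    $index{$1} = $pos if $line =~ /^S=(\d+)/;
    $pos = tell $fh;
}

# Later: jump straight to the block for, say, S=41.
if ( defined( my $offset = $index{41} ) ) {
    seek $fh, $offset, SEEK_SET;
    my $header = <$fh>;       # the "S=41" line itself
    # ... read the three metadata lines and the block's data lines here ...
}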
I know the OP has already accepted an answer, but a method that's served me well is to slurp the file into an array, based on changing the "record separator" ($/).
If you do something like this (not tested, but this should be close):
$/ = "S=";
my @records = <$fh>;
print $records[4];
The output should be the entire fifth record (the array starts at 0, but your data starts at 1), starting with the record number (5) on a line by itself (you might need to strip that out later), followed by all the remaining lines in that record.
It's very simple and fast, although it is a memory pig...
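A slightly fuller sketch of the same idea (the file name and index are made up; note that each element still ends with the "S=" that terminated it, and all but the first start with just the number, since the leading "S=" is consumed as the separator):

use strict;
use warnings;

open my $fh, '<', 'huge.txt' or die $!;

my @records = do {
    local $/ = "S=";    # records end at the next "S="
    <$fh>;
};

print $records[4];      # the fifth chunk of the file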
If the blocks of text are of the same length (in bytes or characters) you can calculate the position of the needed S-value in the file and seek there, then read. Otherwise, in principle you need to read lines to find the S value.
However, if there are only a few S-values to find you can estimate the needed position and seek there, then read enough to capture an S-value. Then analyze what you read to see how far off you are, and either seek again or read lines with <> to get to the S-value.
use warnings;
use strict;
use feature 'say';
use Fcntl qw(:seek);

my ($file, $s_target) = @ARGV;
die "Usage: $0 filename\n" if not $file or not -f $file;
$s_target //= 5;    # default, S=5

open my $fh, '<', $file or die $!;

my $est_text_len = 1024;
my $jump_by      = $est_text_len * $s_target;   # to seek forward in file
my ($buff, $found);

seek $fh, $jump_by, SEEK_CUR;    # get in the vicinity

while (1) {
    my $rd = read $fh, $buff, $est_text_len;
    warn "error reading: $!" if not defined $rd;
    last if $rd == 0;

    while ($buff =~ /S=([0-9]+)/g) {
        my $s_val = $1;
        # Analyze $s_val and $buff:
        # (1) if overshot $s_target adjust $jump_by and seek back
        # (2) if in front of $s_target read with <> to get to it
        # (3) if $s_target is in $buff extract needed text
        if ($s_val == $s_target) {
            say "--> Found S=$s_val at pos ", pos $buff, " in buffer";

            seek $fh, - $est_text_len + pos($buff) + 1, SEEK_CUR;

            while (<$fh>) {
                last if /S=[0-9]+/;    # next block
                print $_;
            }
            $found = 1;
            last;
        }
    }
    last if $found;
}
Tested with your sample, enlarged and cleaned up (change S=n in text as it is the same as the condition!), with $est_text_len and $jump_by set at 100 and 20.
This is a sketch. A full implementation needs to negotiate over- and under-seeking, as outlined in the comments in the code. If text-block sizes don't vary much it can get in front of the needed S-value in two seek-and-reads, and then read with <> or use a regex as in the example.
Some comments
The "analysis" sketched above needs to be done carefully. For one, a buffer may contain multiple S-value lines. Also, note that the code keeps reading if an S-value isn't in the buffer.
Once you are close enough and in front of $s_target read lines by <> to get to it.
The read may not get as much as requested, so you should really put that in a loop (there are recent posts with examples); a minimal sketch follows after this list.
Change to sysread from read for efficiency. In that case use sysseek, and don't mix with <> (which is buffered).
The code above presumes one S-value to find; adjust for more. It absolutely assumes that S-values are sorted.
This is clearly far more complex than reading lines but it does run much faster, with a very large file and only a few S-values to find. If there are many values then this may not help.
The foreach (<$fh>), indicated in the question, would cause the whole file to be read first (to build the list for foreach to go through); use while (<$fh>) instead.
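As mentioned in the list above, here is a minimal sketch of a read loop that keeps going until the requested number of bytes has actually arrived (the names are made up for illustration):

# read() may return fewer bytes than asked for, so retry until we have
# $want bytes or hit end of file.
sub read_exactly {
    my ($fh, $want) = @_;
    my $buff = '';
    while ( length($buff) < $want ) {
        my $rd = read $fh, $buff, $want - length($buff), length($buff);
        die "error reading: $!" if not defined $rd;
        last if $rd == 0;    # EOF reached before we got everything
    }
    return $buff;
}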
If the file doesn't change (or the same file needs to be searched many times) you can first process it once to build an index of exact positions of S-values. Thanks to Danny_ds for a comment.
Binary search of a sorted list is an O(log N) operation. Something like this using seek:
open my $fh, '<', $big_file or die $!;

my $target = 123_456_789;
my $low    = 0;
my $high   = -s $big_file;

while ($high - $low > 0.01 * -s $big_file) {
    my $mid = ($low + $high) / 2;
    seek $fh, $mid, 0;
    while (<$fh>) {
        if (/^S=(\d+)/) {
            if ($1 < $target) { $low = $mid; }
            else              { $high = $mid; }
            last;
        }
    }
}

seek $fh, $low, 0;
while (<$fh>) {
    # now you are searching through the 1% of the file that contains
    # your target S
}
Sort the numbers in the second file. Now you can proceed through the huge file in order, processing each S-value as needed.
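A minimal sketch of that approach (the file names are made up; it assumes, as the question states, that the wanted S-value is the number from the second file minus 1 and that S-values increase through the file):

use strict;
use warnings;

# Collect and sort the wanted S-values.
open my $nums_fh, '<', 'numbers.txt' or die $!;
my @targets;
while ( my $line = <$nums_fh> ) {
    push @targets, $1 - 1 if $line =~ /(\d+)/;
}
close $nums_fh;
@targets = sort { $a <=> $b } @targets;

# One pass over the huge file, handling each block as its number comes up.
open my $big_fh, '<', 'huge.txt' or die $!;
my $want = shift @targets;
while ( defined $want ) {
    my $line = <$big_fh>;
    last unless defined $line;
    if ( $line =~ /^S=(\d+)/ && $1 == $want ) {
        # ... read the 3 metadata lines and the data lines of this block here ...
        $want = shift @targets;
    }
}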

Building indexes for files in Perl

I'm currently new to Perl, and I've stumbled upon a problem:
My task is to create a simple way to access a line of a big file in Perl, the fastest way possible.
I created a file consisting of 5 million lines with, on each line, the number of the line.
I've then created my main program that will need to be able to print any content of a given line.
To do this, I'm using two methods I've found on the internet:
use Config qw( %Config );

my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';

my $file = "testfile.err";
open(FILE, "< $file") or die "Can't open $file for reading: $!\n";
open(INDEX, "+>$file.idx")
    or die "Can't open $file.idx for read/write: $!\n";

build_index(*FILE, *INDEX);
my $line = line_with_index(*FILE, *INDEX, 129);
print "$line";

sub build_index {
    my $data_file  = shift;
    my $index_file = shift;
    my $offset     = 0;

    while (<$data_file>) {
        print $index_file pack($off_t, $offset);
        $offset = tell($data_file);
    }
}

sub line_with_index {
    my $data_file   = shift;
    my $index_file  = shift;
    my $line_number = shift;

    my $size;     # size of an index entry
    my $i_offset; # offset into the index of the entry
    my $entry;    # index entry
    my $d_offset; # offset into the data file

    $size     = length(pack($off_t, 0));
    $i_offset = $size * ($line_number-1);
    seek($index_file, $i_offset, 0) or return;
    read($index_file, $entry, $size);
    $d_offset = unpack($off_t, $entry);
    seek($data_file, $d_offset, 0);
    return scalar(<$data_file>);
}
Those methods sometimes work; I get a value once out of ten tries on different sets of values, but most of the time I get "Use of uninitialized value $line in string at test2.pl line 10" (when looking for line 566 in the file) or not the right numeric value. Moreover, the indexing seems to work fine on the first two hundred or so lines, but afterwards I get the error. I really don't know what I'm doing wrong.
I know you can use a basic loop that will parse each line, but I really need a way of accessing, at any given time, one line of a file without reparsing it all over again.
Edit: I've tried using a little tip found here: Reading a particular line by line number in a very large file
I've replaced the "N" template for pack with:
my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';
It makes the process work better, until line 128, where instead of getting 128, I get a blank string. For 129, I get 3, which doesn't mean much.
Edit 2: Basically what I need is a mechanism that lets me read the next 2 lines, for instance, of a file that is already being read, while keeping the read "head" at the current line (and not 2 lines after).
Thanks for your help !
Since you are writing binary data to the index file, you need to set the filehandle to binary mode, especially if you are in Windows:
open(INDEX, "+>$file.idx")
or die "Can't open $file.idx for read/write: $!\n";
binmode(INDEX);
Right now, when you perform something like this in Windows:
print $index_file pack("j", $offset);
Perl will convert any 0x0a's in the packed string to 0x0d0a's. Setting the filehandle to binmode will make sure line feeds are not converted to carriage return-line feeds.

Read a file from second line till end in perl

I have a file which has many lines. I want to discard the first line and
read the file from the second line till the end, but I'm not getting enough help on Google.
Please help me out in this case.
Below is the code in which I am trying to extract the 4th and 5th columns of a CSV file; however, it is including the first line, the header, as well, which I do not want.
My code should get me only the values, not the headers, starting from the second line.
foreach my $inputfile (glob("$previous_path/*Analysis*.txt")) {
    open(INFILE, $inputfile) or die("Could not open file.");
    foreach my $line (<INFILE>) {
        my @values = split(',', $line); # parse the file
        my $previous_result = $values[5];
        my $previous_time = $values[4];
        print $previous_result,"\n";
        print $previous_time,"\n";
        push (@previous_result, $previous_result);
        push (@previous_time, $previous_time);
    }
    close(INFILE);
}
Just skip the first line, then read the rest.
<>; # read and discard a line
while (<>) { # loop over the other lines
print $_
}
UPDATE: after you've edited the question, it turns out you want something completely different, to
read a CSV file in Perl
That is a completely different question, and what you should have asked for in the first place. The answer is to use an established library, like CSV::Slurp
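For illustration, here is roughly what that looks like with Text::CSV, a widely used CPAN module (the file name and column positions are taken from the question; the module choice here is an example, not necessarily the library mentioned above):

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in, '<', 'Analysis.txt' or die "Could not open file: $!";

$csv->getline($in);    # read and discard the header row

while ( my $row = $csv->getline($in) ) {
    my ($previous_time, $previous_result) = @{$row}[4, 5];
    print "$previous_result\n$previous_time\n";
}
close $in;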
Just skip line number ($.) 1, perhaps using next, like this:
while (<>) {
next if ($. == 1);
print $_;
}
You can skip the first line while reading the file itself, for example:
open(IN,"cat filename|tail -n +2|") || die "can not open file: $!";
while(<IN>){
    # process further
}
close(IN);

Nested while loop which does not seem to keep variables appropriately

I'm an amateur Perl coder, and I'm having a lot of trouble figuring out what is causing this particular issue. It seems as though it's a variable issue.
sub patch_check {
    my $pline;
    my $sline;
    while (<SYSTEMINFO>) {
        chomp($_);
        $sline = $_;
        while (<PATCHLIST>) {
            chomp($_);
            $pline = $_;
            print "sline $sline pline $pline underscoreline $_ "; # troubleshooting
            print "$sline - $pline\n";
            if ($pline =~ /($sline)/) {
                #print " - match $pline -\n";
            }
        } # end while
    }
}
There is more code, but I don't think it is relevant. When I print $sline in the first loop it works fine, but not in the second loop. I tried making the variables global, but that did not work either.
The point of the subroutine is that I want to open a file (patches) and see if it is in (systeminfo). I also tried reading the files into arrays and doing foreach loops.
Does anyone have another solution?
It looks like your actual goal here is to find lines which are in both files, correct? The normal (and much more efficient! - it only requires you to read in each file once, rather than reading all of one file for each line in the other) way to do this in Perl would be to read the lines from one file into a hash, then use hash lookups on each line in the other file to check for matches.
Untested (but so simple it should work) code:
sub patch_check {
my %slines;
while (<SYSTEMINFO>) {
# Since we'll just be comparing one file's lines
# against the other file's lines, there's no real
# reason to chomp() them
$slines{$_}++;
}
# %slines now has all lines from SYSTEMINFO as its
# keys and the values are the number of times the
# line appears, in case that's interesting to you
while (<PATCHLIST>) {
print "match: $_" if exists $slines{$_};
}
}
Incidentally, if you're reading your data from SYSTEMINFO and PATCHLIST, then you're doing it the old-fashioned way. When you get a chance, read up on lexical filehandles and the three-argument form of open if you're not already familiar with them.
Your code is not entering the PATCHLIST while loop the 2nd time through the SYSTEMINFO while loop because you already read all the contents of PATCHLIST the first time through. You'd have to re-open the PATCHLIST filehandle to accomplish what you're trying to do.
That's a pretty inefficient way to see if the lines of one file match the lines of another file. Take a look at grep with the -f flag for another way.
grep -f PATCHFILE SYSTEMINFO
What I like to do in such cases is: read one file and create hash keys from the values you are looking for, then read the second file and check whether the keys already exist. This way you have to read each file only once.
Here is example code, untested:
sub patch_check {
    my %patches = ();

    open(my $PatchList,  '<', "patch.txt")      or die $!;
    open(my $SystemInfo, '<', "SystemInfo.txt") or die $!;

    while ( my $PatchRow = <$PatchList> ) {
        $patches{$PatchRow} = 0;
    }

    while ( my $SystemRow = <$SystemInfo> ) {
        if ( exists $patches{$SystemRow} ) {
            # The patch is in SystemInfo
            # Do whatever you want
        }
    }
}
You cannot read one file inside the read loop of another. Slurp one file into an array first, then loop over that array with a foreach inside the outer read loop over the other file; a sketch follows below.
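A minimal sketch of that idea, assuming the bareword handles from the question are already open:

sub patch_check {
    my @patch_lines = <PATCHLIST>;    # slurp the whole file once
    chomp @patch_lines;

    while ( my $sline = <SYSTEMINFO> ) {        # outer read loop
        chomp $sline;
        foreach my $pline (@patch_lines) {      # inner loop over the slurped lines
            # \Q...\E so the system-info line is matched literally
            print " - match $pline -\n" if $pline =~ /\Q$sline\E/;
        }
    }
}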