Why does SEEK_SET fail? - perl

When I run the below script I get
Can't use an undefined value as a symbol reference at ./yaml-test.pl line 52.
where line 52 is
seek $fh, 0, SEEK_SET; # seek back to the beginning of file
The purpose of the script is to reproduce why I get corrupt yaml files, so the global file handles are suppose to be there.
My theory is that newly written yaml files are not overwritten, and therefore if the newly written yaml is smaller than the old, then old data will still be in the new yaml file.
Question
Can anyone see what is wrong with my script?
#!/usr/bin/perl
use strict;
use YAML::Syck;
use Fcntl ':flock', 'SEEK_SET';
use warnings;
use Data::Dumper;
my $acc;
my $acc_fh;
$acc->{1}{name1} = "abc";
system("rm -f test.yaml");
# write initial
open F, '>', 'test.yaml';
print F YAML::Syck::Dump($acc);
close F;
$acc->{1}{name2} = "abc";
write_yaml_with_lock($acc, $acc_fh);
$acc->{1}{name3} = "abc";
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
$acc->{1}{name4} = "abc";
write_yaml_with_lock($acc, $acc_fh);
sub read_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh, 0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}

You call write_yaml_with_lock() twice. The first time you call it $acc_fh is still undef because it is not set until two lines further down via read_yaml_with_lock().

Related

is there a way to locally change the input record separator in perl?

Limiting the scope of a variable $x to a particular code chunk or subroutine, by means of my $x, saves a coder from a world of "global variable"-caused confusion.
But when it comes to the input record separator, $/, apparently its scope cannot be limited.
Am I correct in this?
As a consequence, if I forget to reset the input record separator at the end of a loop, or inside a subroutine, the code below my call to the subroutine can give unexpected results.
The following example demonstrates this.
#!/usr/bin/perl
use strict; use warnings;
my $count_records; my $infile = $ARGV[0]; my $HANDLEinfile;
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
look_through_other_file();
print "\nNOW, after invoking look_through_other_file:\n";
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
sub look_through_other_file
{
$/ = undef;
# here, look through some other file with a while loop
return;
}
Here is how it behaves on an input file:
> z.pl junk
1:
All work
2:
and
3:
no play
4:
makes Jack a dull boy.
NOW, after invoking look_through_other_file:
1:
All work
and
no play
makes Jack a dull boy.
>
Note that if one tries to change to
my $/ = undef;
inside the subroutine, this generates an error.
Incidentally, among the stackoverflow tags, why is there no tag for "input record separator"?
The answer for the my $/ = undef; question is to change it to local $/ = undef;. Then the revised code is as follows.
#!/usr/bin/perl
use strict; use warnings;
my $count_records; my $infile = $ARGV[0]; my $HANDLEinfile;
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
look_through_other_file();
print "\nNOW, after invoking look_through_other_file:\n";
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
sub look_through_other_file
{
local $/ = undef;
# here, look through some other file with a while loop
return;
}
Then there is no need to return the input record separator to another value, or to the default, $/ = "\n";, by hand.
You can use local to temporarily update the value of a global variable, including $/.
sub look_through_other_file {
local $/ = undef;
# here, look through some other file with a while loop
return;
}
will use an undefined $/ as long as the look_through_other_file subroutine is in the call stack.
You may encounter this construction in this common idiom, to slurp the entire contents of a file into a variable without altering the value of $/ for the rest of the program:
open my $fh, "<", "/some/file";
my $o = do { local $/; <$fh> };

code loading in perl <progress-bar>

this perl code for loading
When running code
use Term::ProgressBar::IO;
open my $fh, '<', 'passwords.txt' or die "could not open file n.txt: $!";
my $pb = Term::ProgressBar::IO->new($fh);
my $line_count;
while (<$fh>) {
$line_count += 1;
$pb->update();
}
close $fh;
print "total lines $line_count"
C:\Users\USER\Desktop>r.pl
0% [* ]total lines 360
what is the problem
Term::ProgressBar::IO seems incompatible with regular filehandles, and expects an instance of IO::File as seen in the test script, so you'll want to define $fh with
my $fh = IO::File->new('passwords.txt', 'r') or die ...;
The docs say that this module will work with any seekable filehandle, but it still doesn't work (for me, anyway).
The relevant line during construction is:
if (ref($count) and $count->can("seek")) {
When $count is an IO::File type, this condition passes but it fails when $count is a regular GLOB, even one opened for reading and writing. A GLOB will support the seek method, but can("seek") won't return true until after a method has been called on it.
use feature 'say';
open my $fh, '<', 'some-file';
say $fh->can('seek'); # ""
say tell $fh; # 0
say $fh->can('seek'); # ""
say eval { $fh->tell }; # 0
say $fh->can('seek'); # 1
and this suggests another workaround (one that could be implemented inside Term::ProgressBar::IO to address this issue), and that is to make a filehandle method call on the filehandle before you pass it to Term::ProgressBar::IO:
open my $fh, '<', 'passwords.txt' or die "could not open file n.txt: $!";
eval { $fh->tell }; # endow $fh with methods detectable by UNIVERSAL::can
...
Here is a simple process bar code, output as below.
$n = 10;
for($i=1;$i<=$n;$i++){
proc_bar($i,$n);
select(undef, undef, undef, 0.2);
}
sub proc_bar{
local $| = 1;
my $i = $_[0] || return 0;
my $n = $_[1] || return 0;
print "\r [ ".("\032" x int(($i/$n)*50)).(" " x (50 - int(($i/$n)*50)))." ] ";
printf("%2.1f %%",$i/$n*100);
local $| = 0;
}

Perl - binary unpack using pointer and index

I have a binary file that contain 3 files, a PNG, a PHP and a TGA file.
Here the file to give you the idea : container.bin
the file is build this way:
first 6 bytes are a pointer to the index, in this case 211794
Then you have all 3 files stacked one after the other
and at the ofset 211794, you have the index, that tell you where the file start and end
in this example you have:
[offset start] [offset end] [random data] [offset start] [name]
6 15149 asdf 6 Capture.PNG
15149 15168 4584 15149 index.php
15168 211794 12 15168 untilted.tga
meaning that capture.png start at offset 6, finish at offset 15149, then asdf is a random data, and the start offset is repeated again.
Now what I want to do is a perl to separate the file on this binary files.
The perl need to check the first 6 offset of the file (header), then jump to the index location, and use the list to extract the file out.
A mix of seek and read can be used to achieve the task:
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl 'SEEK_SET';
sub get_files_info {
my ( $fh, $offset ) = #_;
my %file;
while (<$fh>) {
chomp;
my $split_count = my ( $offset_start, $offset_end, $random_data, $offset_start_copy,
$file_name ) = split /\s/;
next if $split_count != 5;
if ( $offset_start != $offset_start_copy ) {
warn "Start of offset mismatch: $file_name\n";
next;
}
$file{$file_name} = {
'offset_start' => $offset_start,
'offset_end' => $offset_end,
'random_data' => $random_data,
};
}
return %file;
}
sub write_file {
my ( $fh, $file_name, $file_info ) = #_;
seek $fh, $file_info->{'offset_start'}, SEEK_SET;
read $fh, my $contents,
$file_info->{'offset_end'} - $file_info->{'offset_start'};
open my $fh_out, '>', $file_name or die 'Error opening file: $!';
binmode $fh_out;
print $fh_out $contents;
print "Wrote file: $file_name\n";
}
open my $fh, '<', 'container.bin' or die "Error opening file: $!";
binmode $fh;
read $fh, my $offset, 6;
seek $fh, $offset, SEEK_SET;
my %file = get_files_info $fh, $offset;
for my $file_name ( keys %file ) {
write_file $fh, $file_name, $file{$file_name};
}
The only real difficulty here is to make sure that both input and output files are read in binary mode. This can be achieved by using the :raw PerlIO layer when the files are opened.
This program seems to do what you want. It first locates and reads the index block into a string, and then opens that string for input and reads the start and end position and name of each of the constituent files. Thereafter processing each file is simple.
Be aware that unless the formatting of the index block is more strict than you say, you can rely only on the first, second, and last whitespace-separated fields on each line since random text could contain spaces. There is also no way to specify a file name containing spaces.
The output, using Data::Dump, is there to demonstrate correct functionality and is not necessary for the functioning of the program.
use v5.10;
use warnings;
use Fcntl ':seek';
use autodie qw/ open read seek close /;
open my $fh, '<:raw', 'container.bin';
read $fh, my $index_loc, 6;
seek $fh, $index_loc, SEEK_SET;
read $fh, my ($index), 1024;
my %contents;
open my $idx, '<', \$index;
while (<$idx>) {
my #fields = split;
next unless #fields;
$contents{$fields[-1]} = [ $fields[0], $fields[1] ];
}
use Data::Dump;
dd \%contents;
for my $file (keys %contents) {
my ($start, $end) = #{ $contents{$file} };
my $size = $end - $start;
seek $fh, $start, SEEK_SET;
my $nbytes = read $fh, my ($data), $size;
die "Premature EOF" unless $nbytes == $size;
open my $out, '>:raw', $file;
print { $out } $data;
close $out;
}
output
{
"Capture.PNG" => [6, 15149],
"index.php" => [15149, 15168],
"untilted.tga" => [15168, 211794],
}

Why does this corrupt my yaml file?

When I run the below script I get a corrupt yaml file like so
---
1:
name1: abc
name2: abc
---
me3: abc
---
Question
Can anyone see that I am doing wrong?
#!/usr/bin/perl
use strict;
use YAML::Syck;
use Fcntl ':flock', 'SEEK_SET';
use warnings;
use Data::Dumper;
my $acc;
my $acc_fh;
$acc->{1}{name1} = "abc";
unlink 'test.yaml';
# write initial
open F, '>', 'test.yaml';
print F YAML::Syck::Dump($acc);
close F;
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
$acc->{1}{name2} = "abc";
$acc->{1}{name3} = "abc";
write_yaml_with_lock($acc, $acc_fh);
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
delete $acc->{1}{name3};
write_yaml_with_lock($acc, $acc_fh);
sub read_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh, 0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}
You write to the same file twice. During the second time the YAML code you're writing is shorter than the first time because you delete that hash key inbetween the calls. However, you neither unlink the file after the first time nor do you truncate it after writing to it the second time. So what you see as corruption is the part of the file that has been written the first time but that hasn't been overwritten the second time.
The "me3" part is what is left of " name3", which gets partially overwritten by "---\n" (4 characters). When you write the first time, you have more data. Then you rewind the file handle position and write a shorter data, which does not overwrite all of the old.
I think your solution "should" be to skip this passing a file handle around and rewinding it and instead use the appropriate open for each subroutine. E.g.:
sub read_yaml {
my $file = shift;
open my $fh, '<', $file or die $!;
...
close $fh;
}
sub write_yaml {
my ($file, $obj) = #_;
open my $fh, '>', $file or die $!;
...
close $fh;
}
Keeping the file handle open in between operations not really that useful or efficient, and it introduces some difficulties.

Read and Write in the same file with different process

I have written the two program. One program is write the content to the text file simultaneously. Another program is read that content simultaneously.
But both the program should run at the same time. For me the program is write the file is correctly. But another program is not read the file.
I know that once the write process is completed than only the data will be stored in the hard disk. Then another process can read the data.
But I want both read and write same time with different process in the single file. How can I do that?
Please help me.
The following code write the content in the file
sub generate_random_string
{
my $length_of_randomstring=shift;# the length of
# the random string to generate
my #chars=('a'..'z','A'..'Z','0'..'9','_');
my $random_string;
foreach (1..$length_of_randomstring)
{
# rand #chars will generate a random
# number between 0 and scalar #chars
$random_string.=$chars[rand #chars];
}
return $random_string;
}
#Generate the random string
open (FH,">>file.txt")or die "Can't Open";
while(1)
{
my $random_string=&generate_random_string(20);
sleep(1);
#print $random_string."\n";
print FH $random_string."\n";
}
The following code is read the content. This is another process
open (FH,"<file.txt") or die "Can't Open";
print "Open the file Successfully\n\n";
while(<FH>)
{
print "$_\n";
}
You might use an elaborate cooperation protocol such as in the following. Both ends, reader and writer, use common code in the TakeTurns module that handles fussy details such as locking and where the lock file lives. The clients need only specify what they want to do when they have exclusive access to the file.
reader
#! /usr/bin/perl
use warnings;
use strict;
use TakeTurns;
my $runs = 0;
reader "file.txt" =>
sub {
my($fh) = #_;
my #lines = <$fh>;
print map "got: $_", #lines;
++$runs <= 10;
};
writer
#! /usr/bin/perl
use warnings;
use strict;
use TakeTurns;
my $n = 10;
my #chars = ('a'..'z','A'..'Z','0'..'9','_');
writer "file.txt" =>
sub { my($fh) = #_;
print $fh join("" => map $chars[rand #chars], 1..$n), "\n"
or warn "$0: print: $!";
};
The TakeTurns module is execute-around at work:
package TakeTurns;
use warnings;
use strict;
use Exporter 'import';
use Fcntl qw/ :DEFAULT :flock /;
our #EXPORT = qw/ reader writer /;
my $LOCKFILE = "/tmp/taketurns.lock";
sub _loop ($&) {
my($path,$action) = #_;
while (1) {
sysopen my $lock, $LOCKFILE, O_RDWR|O_CREAT
or die "sysopen: $!";
flock $lock, LOCK_EX or die "flock: $!";
my $continue = $action->();
close $lock or die "close: $!";
return unless $continue;
sleep 0;
}
}
sub writer {
my($path,$w) = #_;
_loop $path =>
sub {
open my $fh, ">", $path or die "open $path: $!";
my $continue = $w->($fh);
close $fh or die "close $path: $!";
$continue;
};
}
sub reader {
my($path,$r) = #_;
_loop $path =>
sub {
open my $fh, "<", $path or die "open $path: $!";
my $continue = $r->($fh);
close $fh or die "close $path: $!";
$continue;
};
}
1;
Sample output:
got: 1Upem0iSfY
got: qAALqegWS5
got: 88RayL3XZw
got: NRB7POLdu6
got: IfqC8XeWN6
got: mgeA6sNEpY
got: 2TeiF5sDqy
got: S2ksYEkXsJ
got: zToPYkGPJ5
got: 6VXu6ut1Tq
got: ex0wYvp9Y8
Even though you went to so much trouble, there are still issues. The protocol is unreliable, so reader has no guarantee of seeing all messages that writer sends. With no writer active, reader is content to read the same message over and over.
You could add all this, but a more sensible approach would be using abstractions the operating system provides already.
For example, Unix named pipes seem to be a pretty close match to what you want, and note how simple the code is:
pread
#! /usr/bin/perl
use warnings;
use strict;
my $pipe = "/tmp/mypipe";
system "mknod $pipe p 2>/dev/null";
open my $fh, "<", $pipe or die "$0: open $pipe: $!";
while (<$fh>) {
print "got: $_";
sleep 0;
}
pwrite
#! /usr/bin/perl
use warnings;
use strict;
my $pipe = "/tmp/mypipe";
system "mknod $pipe p 2>/dev/null";
open my $fh, ">", $pipe or die "$0: open $pipe: $!";
my $n = 10;
my #chars = ('a'..'z','A'..'Z','0'..'9','_');
while (1) {
print $fh join("" => map $chars[rand #chars], 1..$n), "\n"
or warn "$0: print: $!";
}
Both ends attempt to create the pipe using mknod because they have no other method of synchronization. At least one will fail, but we don't care as long as the pipe exists.
As you can see, all the waiting machinery is handled by the system, so you do what you care about: reading and writing messages.
This works.
The writer:
use IO::File ();
sub generate_random_string {...}; # same as above
my $file_name = 'file.txt';
my $handle = IO::File->new($file_name, 'a');
die "Could not append to $file_name: $!" unless $handle;
$handle->autoflush(1);
while (1) {
$handle->say(generate_random_string(20));
}
The reader:
use IO::File qw();
my $file_name = 'file.txt';
my $handle = IO::File->new($file_name, 'r');
die "Could not read $file_name: $!" unless $handle;
STDOUT->autoflush(1);
while (defined (my $line = $handle->getline)) {
STDOUT->print($line);
}
are you on windows or *nix? you might be able to string something like this together on *nix by using tail to get the output as it is written to the file. On windows you can call CreateFile() with FILE_SHARE_READ and/or FILE_SHARE_WRITE in order to allow others to access the file while you have it opened for read/write. you may have to periodically check to see if the file size has changed in order to know when to read (i'm not 100% certain here.)
another option is a memory mapped file.