Why does this corrupt my yaml file?

Why does this corrupt my yaml file? - perl

When I run the below script I get a corrupt yaml file like so
---
1:
name1: abc
name2: abc
---
me3: abc
---
Question
Can anyone see that I am doing wrong?
#!/usr/bin/perl
use strict;
use YAML::Syck;
use Fcntl ':flock', 'SEEK_SET';
use warnings;
use Data::Dumper;
my $acc;
my $acc_fh;
$acc->{1}{name1} = "abc";
unlink 'test.yaml';
# write initial
open F, '>', 'test.yaml';
print F YAML::Syck::Dump($acc);
close F;
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
$acc->{1}{name2} = "abc";
$acc->{1}{name3} = "abc";
write_yaml_with_lock($acc, $acc_fh);
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
delete $acc->{1}{name3};
write_yaml_with_lock($acc, $acc_fh);
sub read_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh, 0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}

You write to the same file twice. During the second time the YAML code you're writing is shorter than the first time because you delete that hash key inbetween the calls. However, you neither unlink the file after the first time nor do you truncate it after writing to it the second time. So what you see as corruption is the part of the file that has been written the first time but that hasn't been overwritten the second time.

The "me3" part is what is left of " name3", which gets partially overwritten by "---\n" (4 characters). When you write the first time, you have more data. Then you rewind the file handle position and write a shorter data, which does not overwrite all of the old.
I think your solution "should" be to skip this passing a file handle around and rewinding it and instead use the appropriate open for each subroutine. E.g.:
sub read_yaml {
my $file = shift;
open my $fh, '<', $file or die $!;
...
close $fh;
}
sub write_yaml {
my ($file, $obj) = #_;
open my $fh, '>', $file or die $!;
...
close $fh;
}
Keeping the file handle open in between operations not really that useful or efficient, and it introduces some difficulties.

Related

is there a way to locally change the input record separator in perl?

Limiting the scope of a variable $x to a particular code chunk or subroutine, by means of my $x, saves a coder from a world of "global variable"-caused confusion.
But when it comes to the input record separator, $/, apparently its scope cannot be limited.
Am I correct in this?
As a consequence, if I forget to reset the input record separator at the end of a loop, or inside a subroutine, the code below my call to the subroutine can give unexpected results.
The following example demonstrates this.
#!/usr/bin/perl
use strict; use warnings;
my $count_records; my $infile = $ARGV[0]; my $HANDLEinfile;
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
look_through_other_file();
print "\nNOW, after invoking look_through_other_file:\n";
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
sub look_through_other_file
{
$/ = undef;
# here, look through some other file with a while loop
return;
}
Here is how it behaves on an input file:
> z.pl junk
1:
All work
2:
and
3:
no play
4:
makes Jack a dull boy.
NOW, after invoking look_through_other_file:
1:
All work
and
no play
makes Jack a dull boy.
>
Note that if one tries to change to
my $/ = undef;
inside the subroutine, this generates an error.
Incidentally, among the stackoverflow tags, why is there no tag for "input record separator"?

The answer for the my $/ = undef; question is to change it to local $/ = undef;. Then the revised code is as follows.
#!/usr/bin/perl
use strict; use warnings;
my $count_records; my $infile = $ARGV[0]; my $HANDLEinfile;
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
look_through_other_file();
print "\nNOW, after invoking look_through_other_file:\n";
open $HANDLEinfile, '<', $infile or die "cannot open $infile for reading";
$count_records = 0;
while(<$HANDLEinfile>)
{
$count_records++;
print "$count_records:\n";
print;
}
close $HANDLEinfile;
sub look_through_other_file
{
local $/ = undef;
# here, look through some other file with a while loop
return;
}
Then there is no need to return the input record separator to another value, or to the default, $/ = "\n";, by hand.

You can use local to temporarily update the value of a global variable, including $/.
sub look_through_other_file {
local $/ = undef;
# here, look through some other file with a while loop
return;
}
will use an undefined $/ as long as the look_through_other_file subroutine is in the call stack.
You may encounter this construction in this common idiom, to slurp the entire contents of a file into a variable without altering the value of $/ for the rest of the program:
open my $fh, "<", "/some/file";
my $o = do { local $/; <$fh> };

Recursive search in Perl?

I'm incredibly new to Perl, and never have been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded, or multi-facted. Anyway, my question today is about a boggle I cannot get over when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of txt in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have issue making things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
print "$_";
print "$searchString\n";
if (($_ =~ /$searchString/) and ($count == "0")) {
++ $count;
print FILE2 $_;
} else {
print "This isn't working\n";
}
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)

The secret of checking for uniqueness, is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
open my $in_fh, '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";
my %seen;
while (<$in_fh>) {
print $out_fh unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.

If your file is not very large, you can store each line readed from the input file as a key in a hash variable. And then, print the hash keys (ordered). Something like that:
my %lines = ();
my $order = 1;
open my $fhi, "<", $file2 or die "Cannot open file: $!";
while( my $line = <$fhi> ) {
$lines {$line} = $order++;
}
close $fhi;
open my $fho, ">", $file3 or die "Cannot open file: $!";
#Sort the keys, only if needed
my #ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);
for my $key( #ordered_lines ) {
print $fho $key;
}
close $fho;

You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;
open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '<', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";
my %seen;
while (my $line = <$fh_in>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $fh_out $line;
}
# remember this line
$seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;
my %seen;
while (my $line = <DATA>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $line;
}
# remember this line
$seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows a faster than linear, but your data structure is very flat.

Read and Write to a file in perl

this
is just
an example.
Lets assume the above is out.txt. I want to read out.txt and write onto the same file.
<Hi >
<this>
<is just>
<an example.>
Modified out.txt.
I want to add tags in the beginning and end of some lines.
As I will be reading the file several times I cannot keep writing it onto a different file each time.
EDIT 1
I tried using "+<" but its giving an output like this :
Hi
this
is just
an example.
<Hi >
<this>
<is just>
<an example.>
**out.txt**
EDIT 2
Code for reference :
open(my $fh, "+<", "out.txt");# or die "cannot open < C:\Users\daanishs\workspace\CCoverage\out.txt: $!";
while(<$fh>)
{
$s1 = "<";
$s2 = $_;
$s3 = ">";
$str = $s1 . $s2 . $s3;
print $fh "$str";
}

The very idea of what you are trying to do is flawed. The file starts as
H i / t h i s / ...
If you were to change it in place, it would look as follows after processing the first line:
< H i > / i s / ...
Notice how you clobbered "th"? You need to make a copy of the file, modify the copy, the replace the original with the copy.
The simplest way is to make this copy in memory.
my $file;
{ # Read the file
open(my $fh, '<', $qfn)
or die "Can't open \"$qfn\": $!\n";
local $/;
$file = <$fh>;
}
# Change the file
$file =~ s/^(.*)\n/<$1>\n/mg;
{ # Save the changes
open(my $fh, '>', $qfn)
or die "Can't create \"$qfn\": $!\n";
print($fh $file);
}
If you wanted to use the disk instead:
rename($qfn, "$qfn.old")
or die "Can't rename \"$qfn\": $!\n";
open(my $fh_in, '<', "$qfn.old")
or die "Can't open \"$qfn\": $!\n";
open(my $fh_out, '>', $qfn)
or die "Can't create \"$qfn\": $!\n";
while (<$fh_in>) {
chomp;
$_ = "<$_>";
print($fh_out "$_\n");
}
unlink("$qfn.old");
Using a trick, the above can be simplified to
local #ARGV = $qfn;
local $^I = '';
while (<>) {
chomp;
$_ = "<$_>";
print(ARGV "$_\n");
}
Or as a one-liner:
perl -i -pe'$_ = "<$_>"' file

Read contents in memory and then prepare required string as you write to your file. (SEEK_SET to zero't byte is required.
#!/usr/bin/perl
open(INFILE, "+<in.txt");
#a=<INFILE>;
seek INFILE, 0, SEEK_SET ;
foreach $i(#a)
{
chomp $i;
print INFILE "<".$i.">"."\n";
}
If you are worried about amount of data being read in memory, you will have to create a temporary result file and finally copy the result file to original file.

You could use Tie::File for easy random access to the lines in your file:
use Tie::File;
use strict;
use warnings;
my $filename = "out.txt";
my #array;
tie #array, 'Tie::File', $filename or die "can't tie file \"$filename\": $!";
for my $line (#array) {
$line = "<$line>";
# or $line =~ s/^(.*)$/<$1>/g; # -- whatever modifications you need to do
}
untie #array;
Disclaimer: Of course, this option is only viable if the file is not shared with other processes. Otherwise you could use flock to prevent shared access while you modify the file.
Disclaimer-2 (thanks to ikegami): Don't use this solution if you have to edit big files and are concerned about performance. Most of the performance loss is mitigated for small files (less than 2MB, though this is configurable using the memory arg).

One option is to open the file twice: Open it once read-only, read the data, close it, process it, open it again read-write (no append), write the data, and close it. This is good practice because it minimizes the time you have the file open, in case someone else needs it.
If you only want to open it once, then you can use the +< file type - just use the seek call between reading and writing to return to the beginning of the file. Otherwise, you finish reading, are at the end of the file, and start writing there, which is why you get the behavior you're seeing.

Need to specify
use Fcntl qw(SEEK_SET);
in order to use
seek INFILE, 0, SEEK_SET;
Thanks user1703205 for the example.

Why does SEEK_SET fail?

When I run the below script I get
Can't use an undefined value as a symbol reference at ./yaml-test.pl line 52.
where line 52 is
seek $fh, 0, SEEK_SET; # seek back to the beginning of file
The purpose of the script is to reproduce why I get corrupt yaml files, so the global file handles are suppose to be there.
My theory is that newly written yaml files are not overwritten, and therefore if the newly written yaml is smaller than the old, then old data will still be in the new yaml file.
Question
Can anyone see what is wrong with my script?
#!/usr/bin/perl
use strict;
use YAML::Syck;
use Fcntl ':flock', 'SEEK_SET';
use warnings;
use Data::Dumper;
my $acc;
my $acc_fh;
$acc->{1}{name1} = "abc";
system("rm -f test.yaml");
# write initial
open F, '>', 'test.yaml';
print F YAML::Syck::Dump($acc);
close F;
$acc->{1}{name2} = "abc";
write_yaml_with_lock($acc, $acc_fh);
$acc->{1}{name3} = "abc";
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
$acc->{1}{name4} = "abc";
write_yaml_with_lock($acc, $acc_fh);
sub read_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh, 0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}

You call write_yaml_with_lock() twice. The first time you call it $acc_fh is still undef because it is not set until two lines further down via read_yaml_with_lock().

Read and Write in the same file with different process

I have written the two program. One program is write the content to the text file simultaneously. Another program is read that content simultaneously.
But both the program should run at the same time. For me the program is write the file is correctly. But another program is not read the file.
I know that once the write process is completed than only the data will be stored in the hard disk. Then another process can read the data.
But I want both read and write same time with different process in the single file. How can I do that?
Please help me.
The following code write the content in the file
sub generate_random_string
{
my $length_of_randomstring=shift;# the length of
# the random string to generate
my #chars=('a'..'z','A'..'Z','0'..'9','_');
my $random_string;
foreach (1..$length_of_randomstring)
{
# rand #chars will generate a random
# number between 0 and scalar #chars
$random_string.=$chars[rand #chars];
}
return $random_string;
}
#Generate the random string
open (FH,">>file.txt")or die "Can't Open";
while(1)
{
my $random_string=&generate_random_string(20);
sleep(1);
#print $random_string."\n";
print FH $random_string."\n";
}
The following code is read the content. This is another process
open (FH,"<file.txt") or die "Can't Open";
print "Open the file Successfully\n\n";
while(<FH>)
{
print "$_\n";
}

You might use an elaborate cooperation protocol such as in the following. Both ends, reader and writer, use common code in the TakeTurns module that handles fussy details such as locking and where the lock file lives. The clients need only specify what they want to do when they have exclusive access to the file.
reader
#! /usr/bin/perl
use warnings;
use strict;
use TakeTurns;
my $runs = 0;
reader "file.txt" =>
sub {
my($fh) = #_;
my #lines = <$fh>;
print map "got: $_", #lines;
++$runs <= 10;
};
writer
#! /usr/bin/perl
use warnings;
use strict;
use TakeTurns;
my $n = 10;
my #chars = ('a'..'z','A'..'Z','0'..'9','_');
writer "file.txt" =>
sub { my($fh) = #_;
print $fh join("" => map $chars[rand #chars], 1..$n), "\n"
or warn "$0: print: $!";
};
The TakeTurns module is execute-around at work:
package TakeTurns;
use warnings;
use strict;
use Exporter 'import';
use Fcntl qw/ :DEFAULT :flock /;
our #EXPORT = qw/ reader writer /;
my $LOCKFILE = "/tmp/taketurns.lock";
sub _loop ($&) {
my($path,$action) = #_;
while (1) {
sysopen my $lock, $LOCKFILE, O_RDWR|O_CREAT
or die "sysopen: $!";
flock $lock, LOCK_EX or die "flock: $!";
my $continue = $action->();
close $lock or die "close: $!";
return unless $continue;
sleep 0;
}
}
sub writer {
my($path,$w) = #_;
_loop $path =>
sub {
open my $fh, ">", $path or die "open $path: $!";
my $continue = $w->($fh);
close $fh or die "close $path: $!";
$continue;
};
}
sub reader {
my($path,$r) = #_;
_loop $path =>
sub {
open my $fh, "<", $path or die "open $path: $!";
my $continue = $r->($fh);
close $fh or die "close $path: $!";
$continue;
};
}
1;
Sample output:
got: 1Upem0iSfY
got: qAALqegWS5
got: 88RayL3XZw
got: NRB7POLdu6
got: IfqC8XeWN6
got: mgeA6sNEpY
got: 2TeiF5sDqy
got: S2ksYEkXsJ
got: zToPYkGPJ5
got: 6VXu6ut1Tq
got: ex0wYvp9Y8
Even though you went to so much trouble, there are still issues. The protocol is unreliable, so reader has no guarantee of seeing all messages that writer sends. With no writer active, reader is content to read the same message over and over.
You could add all this, but a more sensible approach would be using abstractions the operating system provides already.
For example, Unix named pipes seem to be a pretty close match to what you want, and note how simple the code is:
pread
#! /usr/bin/perl
use warnings;
use strict;
my $pipe = "/tmp/mypipe";
system "mknod $pipe p 2>/dev/null";
open my $fh, "<", $pipe or die "$0: open $pipe: $!";
while (<$fh>) {
print "got: $_";
sleep 0;
}
pwrite
#! /usr/bin/perl
use warnings;
use strict;
my $pipe = "/tmp/mypipe";
system "mknod $pipe p 2>/dev/null";
open my $fh, ">", $pipe or die "$0: open $pipe: $!";
my $n = 10;
my #chars = ('a'..'z','A'..'Z','0'..'9','_');
while (1) {
print $fh join("" => map $chars[rand #chars], 1..$n), "\n"
or warn "$0: print: $!";
}
Both ends attempt to create the pipe using mknod because they have no other method of synchronization. At least one will fail, but we don't care as long as the pipe exists.
As you can see, all the waiting machinery is handled by the system, so you do what you care about: reading and writing messages.

This works.
The writer:
use IO::File ();
sub generate_random_string {...}; # same as above
my $file_name = 'file.txt';
my $handle = IO::File->new($file_name, 'a');
die "Could not append to $file_name: $!" unless $handle;
$handle->autoflush(1);
while (1) {
$handle->say(generate_random_string(20));
}
The reader:
use IO::File qw();
my $file_name = 'file.txt';
my $handle = IO::File->new($file_name, 'r');
die "Could not read $file_name: $!" unless $handle;
STDOUT->autoflush(1);
while (defined (my $line = $handle->getline)) {
STDOUT->print($line);
}

are you on windows or *nix? you might be able to string something like this together on *nix by using tail to get the output as it is written to the file. On windows you can call CreateFile() with FILE_SHARE_READ and/or FILE_SHARE_WRITE in order to allow others to access the file while you have it opened for read/write. you may have to periodically check to see if the file size has changed in order to know when to read (i'm not 100% certain here.)
another option is a memory mapped file.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Why does this corrupt my yaml file? - perl

Related

is there a way to locally change the input record separator in perl?

Recursive search in Perl?

Read and Write to a file in perl

Why does SEEK_SET fail?

Read and Write in the same file with different process

Categories

Resources