Perl's Digest module computes different SHA1 digests for the add and addfile functions.
I have created binary random data using /dev/urandom.
Running on Ubuntu:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise
$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for i686-linux-gnu-thread-multi-64int
Output from the script:
$ perl t.pl sha1 a.tmp
doesntwork da39a3ee5e6b4b0d3255bfef95601890afd80709
works ee49451434cffe001a568090c86f16f076677af5
$ openssl dgst -sha1 a.tmp
SHA1(a.tmp)= ee49451434cffe001a568090c86f16f076677af5
Following is my code:
use strict;
use warnings;
use Switch;
use Digest;
sub doesntwork {
my ($datafile, $hashfun) = @_;
open(my $fh, "<", $datafile ) or die "error: Can't open '$datafile', $!\n";
binmode($fh);
read($fh, my $data, -s $datafile);
close($fh);
$hashfun->add($data);
my $hashval = $hashfun->digest();
return unpack('H*', $hashval);
}
sub works {
my ($datafile, $hashfun) = @_;
open(my $fh, "<", $datafile ) or die "error: Can't open '$datafile', $!\n";
binmode($fh);
$hashfun->addfile($fh);
my $hashval = $hashfun->digest();
close($fh);
return unpack('H*', $hashval);
}
###############################################################################
(@ARGV >= 2) or die "usage: perl $0 algo datafile\n";
my ($algo, $datafile) = @ARGV;
my $hashfun;
switch($algo) {
case "md5" {$hashfun = Digest->new("MD5" );}
case "sha1" {$hashfun = Digest->new("SHA-1" );}
case "sha256" {$hashfun = Digest->new("SHA-256");}
case "sha512" {$hashfun = Digest->new("SHA-512");}
else {die "error: invalid algorithm '$algo'\n"}
}
print "doesntwork\t", doesntwork( $datafile, $hashfun ), "\n";
print "works \t", works ( $datafile, $hashfun ), "\n";
I would like the add function to work, since I want to compute the digest on buffered data rather than straight from a file. Possibly add treats the data as text, while for addfile the binmode on the file handle makes it use binary data. If so, how can I make add treat the buffer as binary data?
Edit: updated the post to print the size of the data read.
$ stat -c "%n %s" a.tmp
a.tmp 671088640
$ openssl dgst -sha1 a.tmp
SHA1(a.tmp)= 7dfcced1b0c8864e6a20b2daa63de7ffc1cd7a26
#### Works
$ perl -W -MDigest -e 'open(my $fh, "<", "a.tmp") or die "cant open $!\n";
> binmode($fh);
> my $hf = Digest->new("SHA-1");
> $hf->addfile($fh);
> print unpack("H*", $hf->digest()),"\n";
> close($fh);'
7dfcced1b0c8864e6a20b2daa63de7ffc1cd7a26
#### Doesn't Work
$ perl -W -MDigest -e 'open(my $fh, "<", "a.tmp") or die "cant open $!\n";
> binmode($fh);
> read($fh, my $data, -s "a.tmp") or die "cant read $!\n";
> close($fh);
> printf("## data.length=%d,file.length=%d\n",length($data),-s "a.tmp");
> length($data)==(-s "a.tmp") or die "couldnt read all the data";
> my $hf = Digest->new("SHA-1");
> $hf->add($data);
> print unpack("H*", $hf->digest()),"\n";'
## data.length=671088640,file.length=671088640
9eecafd368a50fb240e0388e3c84c0c94bd6cc2a
Also tried according to Fred's answer
$ perl -W -MDigest -e '
> open(my $fh, "<", "a.tmp") or die "cant open $!\n";
> binmode($fh);
> my $size = -s "a.tmp";
> my $got = read($fh, my $data, $size) or die "cant read $!\n";
> print "##read $got bytes, size=$size\n";
> my $done = $size - $got;
> print "done=$done, size=$size, got=$got\n";
> until(!$done) {
> $got = read($fh, my $newdata, $done);
> $done -= $got ;
> $data .= $newdata;
> print "##read1 $got bytes, size=$size, done=$done\n";
> }
> close($fh);
> printf("## data.length=%d,file.length=%d\n",length($data),-s "a.tmp");
> length($data)==(-s "a.tmp") or die "couldnt read all the data";
> my $hf = Digest->new("SHA-1");
> $hf->add($data);
> print unpack("H*", $hf->digest()),"\n";'
##read 671088640 bytes, size=671088640
done=0, size=671088640, got=671088640
## data.length=671088640,file.length=671088640
9eecafd368a50fb240e0388e3c84c0c94bd6cc2a
You have yet to provide data that reproduces the problem; I cannot replicate it using the Perl script itself as the input.
Here's the definition of addfile:
sub addfile {
my ($self, $handle) = @_;
my $n;
my $buf = "";
while (($n = read($handle, $buf, 4*1024))) {
$self->add($buf);
}
unless (defined $n) {
require Carp;
Carp::croak("Read failed: $!");
}
$self;
}
Your claim that addfile works but add doesn't makes little sense. I suppose there could be a bug in the module when it comes to handling long strings, but it's far more likely that you're passing different inputs to the module.
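To see that the two interfaces really do agree, here is a small self-contained check; it generates arbitrary binary data in memory (rather than reading a.tmp) purely for illustration, and digests it once with a single add() call and once in 4 KiB chunks the way addfile() does:
use strict;
use warnings;
use Digest;

# Arbitrary binary test data, generated in memory for illustration.
my $data = join '', map { chr int rand 256 } 1 .. 1_000_000;

# Digest the whole buffer with one add() call...
my $whole = Digest->new('SHA-1');
$whole->add($data);

# ...and again in 4 KiB chunks, which is what addfile() does internally.
my $chunked = Digest->new('SHA-1');
for (my $off = 0; $off < length $data; $off += 4096) {
    $chunked->add(substr $data, $off, 4096);
}

# Both lines print the same digest.
printf "whole:   %s\nchunked: %s\n",
    unpack('H*', $whole->digest), unpack('H*', $chunked->digest);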
You need to test the return value from read. There is no guarantee that you have read the full contents of the file.
read in Perl is generally implemented in terms of the C library's fread. When you use low-level reads like this, you must test the return value to see whether you got as much as you asked for.
my $size = -s $datafile;
my $got  = read($fh, my $data, $size);
my $done = $size - $got;
while ( $done ) {
$got   = read($fh, my $newdata, $done);
$done -= $got;
$data .= $newdata;
}
That's just off the top of my head and probably has a glaring fencepost error. This is why I avoid using read whenever possible. See http://perltricks.com/article/21/2013/4/21/Read-an-entire-file-into-a-string for some less painful ways to do this.
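One of those less painful ways is to slurp the whole file in one go by clearing the input record separator; a minimal sketch, using the a.tmp file name from the question:
use strict;
use warnings;

my $datafile = 'a.tmp';    # file name from the question, for illustration
open(my $fh, '<:raw', $datafile) or die "Can't open '$datafile': $!";
my $data = do { local $/; <$fh> };    # undef $/ so <> returns the entire file as raw bytes
close($fh);
printf "read %d bytes\n", length $data;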
Related
I would like to know of a fast/efficient way in any program (awk/perl/python) to split a csv file (say 10k columns) into multiple small files each containing 2 columns. I would be doing this on a unix machine.
#contents of large_file.csv
1,2,3,4,5,6,7,8
a,b,c,d,e,f,g,h
q,w,e,r,t,y,u,i
a,s,d,f,g,h,j,k
z,x,c,v,b,n,m,z
I now want multiple files like this:
# contents of 1.csv
1,2
a,b
q,w
a,s
z,x
# contents of 2.csv
1,3
a,c
q,e
a,d
z,c
# contents of 3.csv
1,4
a,d
q,r
a,f
z,v
and so on...
I can do this currently with awk on small files (say 30 columns) like this:
awk -F, 'BEGIN{OFS=",";} {for (i=1; i < NF; i++) print $1, $(i+1) > i ".csv"}' large_file.csv
The above takes a very long time with large files and I was wondering if there is a faster and more efficient way of doing the same.
Thanks in advance.
The main holdup here is in writing so many files.
Here is one way:
use warnings;
use strict;
use feature 'say';
my $file = shift // die "Usage: $0 csv-file\n";
my @lines = do { local @ARGV = $file; <> };
chomp #lines;
my @fhs = map {
open my $fh, '>', "f${_}.csv" or die $!;
$fh
}
1 .. scalar( split /,/, $lines[0] );
for (@lines) {
my ($first, @cols) = split /,/;
say {$fhs[$_]} join(',', $first, $cols[$_])
for 0..$#cols;
}
I didn't time this against any other approaches. Assembling data for each file first and then dumping it in one operation into each file may help, but first let us know how large the original CSV file is.
Opening so many output files at once (for @fhs filehandles) may pose problems. If that is the case then the simplest way is to first assemble all data and then open and write a file at a time:
use warnings;
use strict;
use feature 'say';
my $file = shift // die "Usage: $0 csv-file\n";
open my $fh, '<', $file or die "Can't open $file: $!";
my @data;
while (<$fh>) {
chomp;
my ($first, @cols) = split /,/;
push @{$data[$_]}, join(',', $first, $cols[$_])
for 0..$#cols;
}
for my $i (0..$#data) {
open my $fh, '>', $i+1 . '.csv' or die $!;
say $fh $_ for @{$data[$i]};
}
This depends on whether the entire original CSV file, plus a bit more, can be held in memory.
With your shown samples and attempts, please try the following awk code. Since opening all the output files at once may fail with the infamous "too many open files" error, all values are collected into an array and printed one by one in the END block, and each output file is closed as soon as its contents have been written.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<NF;i++){
value[i]=(value[i]?value[i] ORS:"") ($1 OFS $(i+1))
}
}
END{
for(i=1;i<NF;i++){
outFile=i".csv"
print value[i] > (outFile)
close(outFile)
}
}
' large_file.csv
I needed the same functionality and wrote it in bash.
Not sure if it will be faster than ravindersingh13's answer, but I hope it will help someone.
Current version: https://github.com/pgrabarczyk/csv-file-splitter
#!/usr/bin/env bash
set -eu
SOURCE_CSV_PATH="${1}"
LINES_PER_FILE="${2}"
DEST_PREFIX_NAME="${3}"
DEBUG="${4:-0}"
split_files() {
local source_csv_path="${1}"
local lines_per_file="${2}"
local dest_prefix_name="${3}"
local debug="${4}"
_print_log "source_csv_path: ${source_csv_path}"
local dest_prefix_path="$(pwd)/output/${dest_prefix_name}"
_print_log "dest_prefix_path: ${dest_prefix_path}"
local headline=$(awk "NR==1" "${source_csv_path}")
local file_no=0
mkdir -p "$(dirname ${dest_prefix_path})"
local lines_in_files=$(wc -l "${source_csv_path}" | awk '{print $1}')
local files_to_create=$(((lines_in_files-1)/lines_per_file))
_print_log "There is ${lines_in_files} lines in file. I will create ${files_to_create} files per ${lines_per_file} (Last file may have less)"
_print_log "Start processing."
for (( start_line=1; start_line<=lines_in_files; )); do
last_line=$((start_line+lines_per_file))
file_no=$((file_no+1))
local file_path="${dest_prefix_path}$(printf "%06d" ${file_no}).csv"
if [ $debug -eq 1 ]; then
_print_log "Creating file ${file_path} with lines [${start_line};${last_line}]"
fi
echo "${headline}" > "${file_path}"
awk "NR>${start_line} && NR<=${last_line}" "${source_csv_path}" >> "${file_path}"
start_line=$last_line
done
_print_log "Done."
}
_print_log() {
local log_message="${1}"
local date_time=$(date "+%Y-%m-%d %H:%M:%S.%3N")
printf "%s - %s\n" "${date_time}" "${log_message}" >&2
}
split_files "${SOURCE_CSV_PATH}" "${LINES_PER_FILE}" "${DEST_PREFIX_NAME}" "${DEBUG}"
Execution:
bash csv-file-splitter.sh "sample.csv" 3 "result_" 1
Tried a solution using the module Text::CSV.
#! /usr/bin/env perl
use warnings;
use strict;
use utf8;
use open qw<:std :encoding(utf-8)>;
use autodie;
use feature qw<say>;
use Text::CSV;
my %hsh = ();
my $csv = Text::CSV->new({ sep_char => ',' });
print "Enter filename: ";
chomp(my $filename = <STDIN>);
open (my $ifile, '<', $filename);
while (<$ifile>) {
chomp;
if ($csv->parse($_)) {
my #fields = $csv->fields();
my $first = shift @fields;
while (my ($i, $v) = each @fields) {
push @{$hsh{($i + 1).".csv"}}, "$first,$v";
}
} else {
die "Line could not be parsed: $_\n";
}
}
close($ifile);
while (my ($k, $v) = each %hsh) {
open(my $ifile, '>', $k);
say {$ifile} $_ for @$v;
close($ifile);
}
exit(0);
A Perl program uses IPC::Run to pipe a file through a series of commands determined at runtime and into another file, like this small test excerpt demonstrates:
#!/usr/bin/perl
use IO::File;
use IPC::Run qw(run);
open (my $in, 'test.txt');
my $out = IO::File->new_tmpfile;
my @args = ( [ split / /, shift ], "<", $in); # this code
while ($#ARGV >= 0) { # extracted
push @args, "|", [ split / /, shift ]; # verbatim
} # from the
push @args, ">pipe", $out; # program
print "Running...";
run @args or die "command failed ($?)";
print "Done\n";
It builds the pipeline from commands given as arguments, the test file is hard-coded. The problem is that the pipeline hangs if the file is bigger than 64KiB. Here is a demonstration that uses cat in the pipeline to keep things simple. First a 64KiB (65536 bytes) file works as expected:
$ dd if=/dev/urandom of=test.txt bs=1 count=65536
65536 bytes (66 kB, 64 KiB) copied, 0.16437 s, 399 kB/s
$ ./test.pl cat
Running...Done
Next, one byte more. The call to run never returns...
$ dd if=/dev/urandom of=test.txt bs=1 count=65537
65537 bytes (66 kB, 64 KiB) copied, 0.151517 s, 433 kB/s
$ ./test.pl cat
Running...
With IPCRUNDEBUG enabled, plus a few more cats you can see it's the last child that doesn't end:
$ IPCRUNDEBUG=basic ./test.pl cat cat cat cat
Running...
...
IPC::Run 0000 [#1(3543608)]: kid 1 (3543609) exited
IPC::Run 0000 [#1(3543608)]: 3543609 returned 0
IPC::Run 0000 [#1(3543608)]: kid 2 (3543610) exited
IPC::Run 0000 [#1(3543608)]: 3543610 returned 0
IPC::Run 0000 [#1(3543608)]: kid 3 (3543611) exited
IPC::Run 0000 [#1(3543608)]: 3543611 returned 0
(with a file under 64KiB you see all four exit normally)
How can this be made to work for files of any size ?
(Perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-thread-multi, tried on Alpine Linux, the target platform, and Arch Linux to rule out Alpine as a cause)
You have a deadlock: nothing ever reads from the >pipe handle, so once the pipe buffer fills up the last child blocks writing to it while run waits for that child to exit.
Consider using one of the following instead:
run [ 'cat' ], '<', $in_fh, '>', \my $captured;
# Do something with the captured output in $captured.
or
my $receiver = sub {
# Do something with the chunk in $_[0].
};
run [ 'cat' ], '<', $in_fh, '>', $receiver;
For example, the following "receiver" processes each line as they come in:
my $buffer = '';
my $receiver = sub {
$buffer .= $_[0];
while ($buffer =~ s/^(.*)\n//) {
process_line("$1");
}
};
run [ 'cat' ], '<', $in_fh, '>', $receiver;
die("Received partial line") if length($buffer);
Here is an example that does not deadlock but still uses the >pipe output handle. I would not recommend using this complicated approach for your use case; instead, consider the approach suggested by @ikegami.
The problem is that the >pipe handle is never read from. cat tries to write to the >pipe handle but it gets filled up (since no one reads from it) and the cat process blocks when the pipe content reaches 64 KiB which is the capacity of a pipe on Linux. Now the IPC::Run::finish() process is waiting for the child cat process to exit, but at the same time the cat process is waiting for the parent to read from its pipe so we have a deadlock situation.
To avoid this situation, we can use IPC::Run::start() instead of IPC::Run::run():
use feature qw(say);
use strict;
use warnings;
use constant READ_BUF_SIZE => 8192;
use Errno qw( EAGAIN );
use IO::Select;
use IPC::Run qw();
use Symbol 'gensym';
my $outfile = 'out.txt';
open (my $out, '>', $outfile) or die "Could not open file '$outfile': $!";
my $h = IPC::Run::start ['cat'], '<', 'test.txt', '>pipe', my $pipeout = gensym;
my $select = IO::Select->new( $pipeout );
my $data = '';
my $read_offset = 0;
while (1) {
my @ready = $select->can_read;
last if !@ready;
for my $fh (@ready) {
my $bytes_read = sysread $fh, $data, READ_BUF_SIZE, $read_offset;
say "Read $bytes_read bytes..";
if ( !defined $bytes_read ) {
die "sysread failed: $!" if $! != EAGAIN;
$bytes_read = 0;
}
elsif ( $bytes_read == 0 ) {
say "Removing pipe handle from select loop";
$select->remove( $fh );
close $fh;
}
$read_offset += $bytes_read;
}
}
say "Saving data to file..";
print $out $data; #Save data to file
close $out;
say "Finishing harness..";
IPC::Run::finish $h or die "cat returned $?";
say "Done.";
I'm very new to Perl and am working on a Bioinformatics project at University. I have FILE1 containing a list of positions, in the format:
99269
550
100
126477
1700
And FILE2 in the format:
517 1878 forward
700 2500 forward
2156 3289 forward
99000 100000 forward
22000 23000 backward
I want to compare every position in FILE1 to every range of values in FILE2, and if a position falls into one of the ranges then I want to print the position, range and direction.
So my expected output would be:
99269 99000 100000 forward
550 517 1878 forward
1700 517 1878 forward
Currently it runs with no errors, but it doesn't output any information, so I am unsure where I am going wrong. When I split up the final 'if' condition it runs, but it will only work if the position is on exactly the same line as the range.
My code is as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $outputfile = "/Users/edwardtickle/Documents/CC22CDS.txt";
open FILE1, "/Users/edwardtickle/Documents/CC22positions.txt"
or die "cannot open > CC22: $!";
open FILE2, "/Users/edwardtickle/Documents/CDSpositions.txt"
or die "cannot open > CDS: $!";
open( OUTPUTFILE, ">$outputfile" ) or die "Could not open output file: $! \n";
while (<FILE1>) {
if (/^(\d+)/) {
my $CC22 = $1;
while (<FILE2>) {
if (/^(\d+)\s+(\d+)\s+(\S+)/) {
my $CDS1 = $1;
my $CDS2 = $2;
my $CDS3 = $3;
if ( $CC22 > $CDS1 && $CC22 < $CDS2 ) {
print OUTPUTFILE "$CC22 $CDS1 $CDS2 $CDS3\n";
}
}
}
}
}
close(FILE1);
close(FILE2);
I have posted the same question on Perlmonks.
Because you read FILE2 only once, it is compared only with the first line of FILE1.
Subsequent lines are compared with the already-exhausted file handle.
Stash the lines from FILE1 in an array and then compare each line of FILE2 with each array entry, as shown below:
#!/usr/bin/perl
use strict;
use warnings;
my $outputfile = "out.txt";
open FILE1, "file1.txt"
or die "cannot open > CC22: $!";
open FILE2, "file2.txt"
or die "cannot open > CDS: $!";
open( OUTPUTFILE, ">$outputfile" ) or die "Could not open output file: $! \n";
my @file1list = ();
while (<FILE1>) {
if (/^(\d+)/) {
push @file1list, $1;
}
}
while (<FILE2>) {
if (/^(\d+)\s+(\d+)\s+(\S+)/) {
my $CDS1 = $1;
my $CDS2 = $2;
my $CDS3 = $3;
for my $CC22 (@file1list) {
if ( $CC22 > $CDS1 && $CC22 < $CDS2 ) {
print OUTPUTFILE "$CC22 $CDS1 $CDS2 $CDS3\n";
}
}
}
}
(There are also stylistic issues with the program, like capital letters for variable names, but I've ignored these; it's quite a nice program for a beginner.)
I thought I could simplify some of that by using split instead of regex, but I think my code is actually longer and more difficult to read! In any event, remember that split works great for problems like this:
# User config area
my $positions_file = 'input_positions.txt';
my $ranges_file = 'input_ranges.txt';
my $output_file = 'output_data.txt';
# Reading data
open my $positions_fh, "<", $positions_file;
open my $ranges_fh, "<", $ranges_file;
chomp( my @positions = <$positions_fh> );
# Store the range data in an array containing hash tables
my @range_data;
# to be used like $range_data[0] = {start => $start, end => $end, dir => $dir}
while (<$ranges_fh>) {
chomp;
my ( $start, $end, $dir ) = split; #splits $_ according to whitespace
push @range_data, { start => $start, end => $end, dir => $dir };
#print "start: $start, end: $end, direction: $dir\n";
} #/while
close $positions_fh;
close $ranges_fh;
# Data processing:
open my $output_fh, ">", $output_file;
#It feels like it should be more efficient to process one range at a time for all data points
foreach my $range (@range_data) { #start one range at a time
#each $range = $range_data[#] = { hash table }
foreach my $position (@positions) { #check all positions
if ( ( $range->{start} <= $position ) and ( $position <= $range->{end} ) ) {
my $output_string = "$position " . $range->{start} . " " . $range->{end} . " " . $range->{dir} . "\n";
print $output_fh $output_string;
} #/if
} #/foreach position
} #/foreach range
close $output_fh;
This code would probably run faster if the data processing was done during the while loop that's reading the range data.
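That restructuring might look roughly like the following sketch, using the same illustrative file names as above; each range line is checked against the positions as it is read, so the ranges never need to be stored:
use strict;
use warnings;

open my $positions_fh, '<', 'input_positions.txt' or die $!;
chomp( my @positions = <$positions_fh> );
close $positions_fh;

open my $ranges_fh, '<', 'input_ranges.txt' or die $!;
open my $output_fh, '>', 'output_data.txt' or die $!;

while (<$ranges_fh>) {
    chomp;
    my ($start, $end, $dir) = split;    # whitespace-separated: start end direction
    for my $position (@positions) {
        print {$output_fh} "$position $start $end $dir\n"
            if $position >= $start and $position <= $end;
    }
}

close $ranges_fh;
close $output_fh;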
Your bug was that you were nesting the file processing, so your inner loop went through the second file's contents a single time and then was stuck at EOF.
The easiest solution is just to load the inner loop file entirely into memory first.
The following demonstrates using more Modern Perl techniques:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $cc22file = "/Users/edwardtickle/Documents/CC22positions.txt";
my $cdsfile = "/Users/edwardtickle/Documents/CDSpositions.txt";
my $outfile = "/Users/edwardtickle/Documents/CC22CDS.txt";
my @ranges = do {
# open my $fh, '<', $cdsfile; # Using Fake Data instead below
open my $fh, '<', \ "517 1878 forward\n700 2500 forward\n2156 3289 forward\n99000 100000 forward\n22000 23000 backward\n";
map {[split]} <$fh>;
};
# open my $infh, '<', $cc22file; # Using Fake Data instead below
open my $infh, '<', \ "99269\n550\n100\n126477\n1700\n";
# open my $outfh, '>', $outfile; # Using STDOUT instead below
my $outfh = \*STDOUT;
CC22:
while (my $cc22 = <$infh>) {
chomp $cc22;
for my $cds (@ranges) {
if ($cc22 > $cds->[0] && $cc22 < $cds->[1]) {
print $outfh "$cc22 @$cds\n";
next CC22;
}
}
# warn "$cc22 No match found\n";
}
Outputs:
99269 99000 100000 forward
550 517 1878 forward
1700 517 1878 forward
Input:
orf00007 PHAGE_Prochl_MED4_213_NC_020845-gi|472340344|ref|YP_007673870.1| 7665 8618 0.210897481636936
orf00007 PHAGE_Prochl_MED4_213_NC_020845-gi|472340344|ref|YP_007673870.1| 7665 8618 0.210897481636936
orf00007 PHAGE_Prochl_P_HM2_NC_015284-gi|326783200|ref|YP_004323597.1| 7665 8618 0.207761175236097
orf00015 PHAGE_Megavi_lba_NC_020232-gi|448825467|ref|YP_007418398.1| 11594 13510 0.278721920668058
orf00015 PHAGE_Acanth_moumouvirus_NC_020104-gi|441432357|ref|YP_007354399.1| 11594 13510 0.278721920668058
The script I had implemented:
use feature qw/say/;
use Math::Trig;
open FILE,"out02.txt";
my @file=<FILE>;
close FILE;
my $aa=0;
for(my $i=$aa; $i <=17822; $i++){
if (($file[$i]>=0.210)){
open(OUTFILE,'>>OUT_t10-t10.txt');
print OUTFILE $file[$i];
}
else{}
}
Note:
I need to use the last column (the float value, e.g. 0.210897481636936) as the criterion for printing the entire row.
For example, if the user input value is '0.210', it has to produce the rows whose value is >= that input; the expected output is:
Output:
orf00007 PHAGE_Prochl_MED4_213_NC_020845-gi|472340344|ref|YP_007673870.1| 7665 8618 0.210897481636936
orf00007 PHAGE_Prochl_MED4_213_NC_020845-gi|472340344|ref|YP_007673870.1| 7665 8618 0.210897481636936
orf00015 PHAGE_Megavi_lba_NC_020232-gi|448825467|ref|YP_007418398.1| 11594 13510 0.278721920668058
orf00015 PHAGE_Acanth_moumouvirus_NC_020104-gi|441432357|ref|YP_007354399.1| 11594 13510 0.278721920668058
A script like the following could work for you:
use strict;
use warnings;
use autodie;
die "Usage: $0 number file\n"
if @ARGV != 2;
my $minval = shift;
while (<>) {
my @cols = split;
print if $cols[-1] >= $minval;
}
And execute it like:
perl yourscript.pl 0.210 out02.txt >> OUT_t10-t10.txt
Or using a perl one-liner:
perl -lane 'print if $F[-1] >= 0.210' out02.txt >> OUT_t10-t10.txt
Using awk:
awk -v value=0.210 '$NF >= value' file
Or
awk -v value=0.210 '$NF >= value' file > output_file
This script (inspired by yours) solves the problem:
use strict;
use warnings;
my $user_filter = 0.21;
open my $input_file, "<", "out02.txt" or die $!; # Modern way of open files
open my $output_file, ">>", "OUT_t10-t10.txt" or die $!;
while( my $line=<$input_file> ) {
if( $line =~ / ([\d\.]+)\s*$/ ) {
# If a number was found at the end of line
if( $1 > $user_filter ) { # Check condition
print $output_file $line; #Write to output file
}
}
}
close $input_file;
close $output_file;
I have written two programs. One program continuously writes content to a text file, and the other program reads that content at the same time.
Both programs should run at the same time. For me the writing program works correctly, but the other program does not read the file.
I know that normally the data only reaches the disk once the write is completed, and only then can another process read it. But I want to read and write the same file at the same time from different processes. How can I do that?
Please help me.
The following code writes the content to the file:
sub generate_random_string
{
my $length_of_randomstring=shift;# the length of
# the random string to generate
my @chars=('a'..'z','A'..'Z','0'..'9','_');
my $random_string;
foreach (1..$length_of_randomstring)
{
# rand @chars will generate a random
# number between 0 and scalar @chars
$random_string.=$chars[rand @chars];
}
return $random_string;
}
#Generate the random string
open (FH,">>file.txt")or die "Can't Open";
while(1)
{
my $random_string=&generate_random_string(20);
sleep(1);
#print $random_string."\n";
print FH $random_string."\n";
}
The following code reads the content. This is the other process:
open (FH,"<file.txt") or die "Can't Open";
print "Open the file Successfully\n\n";
while(<FH>)
{
print "$_\n";
}
You might use an elaborate cooperation protocol such as in the following. Both ends, reader and writer, use common code in the TakeTurns module that handles fussy details such as locking and where the lock file lives. The clients need only specify what they want to do when they have exclusive access to the file.
reader
#! /usr/bin/perl
use warnings;
use strict;
use TakeTurns;
my $runs = 0;
reader "file.txt" =>
sub {
my($fh) = @_;
my @lines = <$fh>;
print map "got: $_", @lines;
++$runs <= 10;
};
writer
#! /usr/bin/perl
use warnings;
use strict;
use TakeTurns;
my $n = 10;
my @chars = ('a'..'z','A'..'Z','0'..'9','_');
writer "file.txt" =>
sub { my($fh) = @_;
print $fh join("" => map $chars[rand @chars], 1..$n), "\n"
or warn "$0: print: $!";
};
The TakeTurns module is execute-around at work:
package TakeTurns;
use warnings;
use strict;
use Exporter 'import';
use Fcntl qw/ :DEFAULT :flock /;
our @EXPORT = qw/ reader writer /;
my $LOCKFILE = "/tmp/taketurns.lock";
sub _loop ($&) {
my($path,$action) = @_;
while (1) {
sysopen my $lock, $LOCKFILE, O_RDWR|O_CREAT
or die "sysopen: $!";
flock $lock, LOCK_EX or die "flock: $!";
my $continue = $action->();
close $lock or die "close: $!";
return unless $continue;
sleep 0;
}
}
sub writer {
my($path,$w) = @_;
_loop $path =>
sub {
open my $fh, ">", $path or die "open $path: $!";
my $continue = $w->($fh);
close $fh or die "close $path: $!";
$continue;
};
}
sub reader {
my($path,$r) = @_;
_loop $path =>
sub {
open my $fh, "<", $path or die "open $path: $!";
my $continue = $r->($fh);
close $fh or die "close $path: $!";
$continue;
};
}
1;
Sample output:
got: 1Upem0iSfY
got: qAALqegWS5
got: 88RayL3XZw
got: NRB7POLdu6
got: IfqC8XeWN6
got: mgeA6sNEpY
got: 2TeiF5sDqy
got: S2ksYEkXsJ
got: zToPYkGPJ5
got: 6VXu6ut1Tq
got: ex0wYvp9Y8
Even though you went to so much trouble, there are still issues. The protocol is unreliable, so reader has no guarantee of seeing all messages that writer sends. With no writer active, reader is content to read the same message over and over.
You could add all this, but a more sensible approach would be using abstractions the operating system provides already.
For example, Unix named pipes seem to be a pretty close match to what you want, and note how simple the code is:
pread
#! /usr/bin/perl
use warnings;
use strict;
my $pipe = "/tmp/mypipe";
system "mknod $pipe p 2>/dev/null";
open my $fh, "<", $pipe or die "$0: open $pipe: $!";
while (<$fh>) {
print "got: $_";
sleep 0;
}
pwrite
#! /usr/bin/perl
use warnings;
use strict;
my $pipe = "/tmp/mypipe";
system "mknod $pipe p 2>/dev/null";
open my $fh, ">", $pipe or die "$0: open $pipe: $!";
my $n = 10;
my @chars = ('a'..'z','A'..'Z','0'..'9','_');
while (1) {
print $fh join("" => map $chars[rand @chars], 1..$n), "\n"
or warn "$0: print: $!";
}
Both ends attempt to create the pipe using mknod because they have no other method of synchronization. At least one will fail, but we don't care as long as the pipe exists.
As you can see, all the waiting machinery is handled by the system, so you do what you care about: reading and writing messages.
This works.
The writer:
use IO::File ();
sub generate_random_string {...}; # same as above
my $file_name = 'file.txt';
my $handle = IO::File->new($file_name, 'a');
die "Could not append to $file_name: $!" unless $handle;
$handle->autoflush(1);
while (1) {
$handle->say(generate_random_string(20));
}
The reader:
use IO::File qw();
my $file_name = 'file.txt';
my $handle = IO::File->new($file_name, 'r');
die "Could not read $file_name: $!" unless $handle;
STDOUT->autoflush(1);
while (defined (my $line = $handle->getline)) {
STDOUT->print($line);
}
Are you on Windows or *nix? You might be able to string something like this together on *nix by using tail to get the output as it is written to the file. On Windows you can call CreateFile() with FILE_SHARE_READ and/or FILE_SHARE_WRITE in order to allow others to access the file while you have it open for read/write. You may have to periodically check whether the file size has changed in order to know when to read (I'm not 100% certain here).
Another option is a memory-mapped file.
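For the tail-style idea on *nix, here is a rough Perl sketch of a polling reader (file.txt is the file name from the question; the one-second polling interval is arbitrary):
use strict;
use warnings;

# Follow file.txt the way tail -f does: print whatever is there, then
# clear the EOF condition with a no-op seek and poll again.
open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!";
while (1) {
    while (my $line = <$fh>) {
        print $line;
    }
    sleep 1;          # arbitrary polling interval
    seek $fh, 0, 1;   # seek to the current position to reset the EOF flag
}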