Perl Net::Server Log Buffer cuts off at 4096 characters

I am using Perl Net::Server and calling the built-in log method like:
$self->log( 1, lc( $json->encode($callInfo) ) );
The issue I am having is that sometimes the data in $callInfo is larger than 4096 characters; it writes the first 4096 characters to the log file, then a child writes its own $callInfo to the log, and then the rest of the original $callInfo gets logged.
Example:
Assuming abcdef is over 4096 characters.
callinfo1 -> a child process tries to write 'abcdef' to the log; only 'abc' is written before the next child process interrupts.
callinfo2 -> another child process writes to the log, and then the remaining data 'def' from callinfo1 gets written.
I have tried adding the following to change the buffer size to 8192, but the issue remains:
sub post_configure {
    my $self = shift;
    my $prop = $self->{server};
    $prop->{log_level} = 1;

    if( $prop->{log_file} ){
        local $/ = 8192;
        open(_SERVER_LOG, ">>$prop->{log_file}") or die "Couldn't open log file \"$prop->{log_file}\" [$!].";
        _SERVER_LOG->autoflush(1);
        #open(our $logHandler, '>>', $prop->{log_file});
        #$logHandler->autoflush(1);
        $prop->{chown_log_file} = 1;
    }
}

sub log {
    my $self = shift;
    my $prop = $self->{server};
    my $level = shift;
    $self->write_to_log_hook($level, @_);
}

sub write_to_log_hook {
    my $self = shift;
    my $prop = $self->{server};
    my $level = shift;
    local $_ = shift || '';
    chomp;
    s/([^\n\ -\~])/sprintf("%%%02X",ord($1))/eg;

    if( $prop->{log_file} ){
        #if(substr($_, 0, 1) eq '{')
        #{
        print _SERVER_LOG $_, "\n";
        #print $logHandler $_, "\n";
        #}
    }
}
Any ideas on how to get the log buffer to finish before another child process logs?
Thanks in advance.

Perl doesn't send everything to the OS at once, even with autoflush, so it's possible for a single print to be interrupted by writes from other processes.
$ strace perl -e'STDOUT->autoflush; print "x" x 9999' 2>&1 >/dev/null | grep write
write(1, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(1, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 1807) = 1807
That said, the OS only guarantees that writes under a certain size are atomic, so large prints could still be interrupted even if Perl sent everything to the OS at once.
This means it's up to the processes to synchronize themselves using some form of mutual exclusion (e.g. by using a lock).
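A minimal sketch of one way to do that, assuming the log_file property from the post_configure above: re-open the file for each write (so every child has its own file description) and hold an exclusive flock around the print.
use Fcntl qw(:flock);

sub write_to_log_hook {
    my $self  = shift;
    my $prop  = $self->{server};
    my $level = shift;
    local $_ = shift || '';
    chomp;

    open(my $log, '>>', $prop->{log_file})
        or die "Couldn't open log file \"$prop->{log_file}\" [$!].";
    flock($log, LOCK_EX) or die "flock failed: $!";
    print {$log} $_, "\n";
    close($log);   # flushes the buffer, then releases the lock
}
Opening per write costs a couple of extra syscalls per log line, but it sidesteps a flock gotcha: children forked with an already-open handle share one file description, and flock locks belong to the description, not the process.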


How to resolve this warning in Perl

I asked this type of question previously but didn't provide the full code.
I am reading the file below, finding the maximum word width present in each column, and then writing the data to another file with proper alignment.
id0 id1 id2 batch
0 34 56 70
2 3647 58 72 566
4 39 616 75 98 78 78987 9876 7899 776
89 40 62 76
8 42 64 78
34 455 544 565
My code:
unlink "temp1.log";
use warnings;
use strict;
use feature 'say';
my $log1_file = "log1.log";
my $temp1 = "temp1.log";
open(IN1, "<$log1_file" ) or die "Could not open file $log1_file: $!";
my #col_lens;
while (my $line = <IN1>) {
my #fs = split " ", $line;
my #rows = #fs ;
#col_lens = map (length, #rows) if $.==1;
for my $col_idx (0..$#rows) {
my $col_len = length $rows[$col_idx];
if ($col_lens[$col_idx] < $col_len) {
$col_lens[$col_idx] = $col_len;
}
};
};
close IN1;
open(IN1, "<$log1_file" ) or die "Could not open file $log1_file: $!";
open(tempp1,"+>>$temp1") or die "Could not open file $temp1: $!";
while (my $line = <IN1>) {
my #fs = split " ", $line;
my #az;
for my $h (0..$#fs) {
my $len = length $fs[$h];
my $blk_len = $col_lens[$h]+1;
my $right = $blk_len - $len;
$az[$h] = (" ") . $fs[$h] . ( " " x $right );
}
say tempp1 (join "|",#az);
};
My warnings:
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 3.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
I am getting the output correctly but don't know how to remove these warnings.
$col_idx can be as large as the number of fields on a line, minus one. For the third line of the file, this exceeds the highest index of @col_lens, which holds at most four elements (one per header column). So doing the following makes no sense:
if ($col_lens[$col_idx] < $col_len) {
$col_lens[$col_idx] = $col_len;
}
Replace it with
if (!defined($col_lens[$col_idx]) || $col_lens[$col_idx] < $col_len) {
$col_lens[$col_idx] = $col_len;
}
With this, there's really no point checking for $. == 1 anymore.
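With both changes applied, the first reading loop shrinks to a sketch like this:
while (my $line = <IN1>) {
    my @fs = split " ", $line;
    for my $col_idx (0 .. $#fs) {
        my $col_len = length $fs[$col_idx];
        if (!defined($col_lens[$col_idx]) || $col_lens[$col_idx] < $col_len) {
            $col_lens[$col_idx] = $col_len;
        }
    }
}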
You're getting the uninitialized warnings because, when the condition $col_lens[$col_idx] < $col_len is evaluated, the left-hand operand $col_lens[$col_idx] is undef for columns beyond those seen on the first line.
Solution 1:
You can skip the comparison when there is nothing to compare against yet, using a next statement.
for my $col_idx (0..$#rows) {
    my $col_len = length $rows[$col_idx];
    unless (defined $col_lens[$col_idx]) {
        $col_lens[$col_idx] = $col_len;   # first value seen for this column
        next;
    }
    if ($col_lens[$col_idx] < $col_len) {
        $col_lens[$col_idx] = $col_len;
    }
}
Solution 2: (Not advised):
You can simply silence the Use of uninitialized value.. warnings with the pragma below. It is lexically scoped, so it disables uninitialized warnings only within the enclosing block (or, placed at the top of your script, for the whole file).
no warnings 'uninitialized';
For more info, please refer to the warnings pragma documentation (perldoc warnings).
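For example, scoped to just the comparison loop:
for my $col_idx (0..$#rows) {
    no warnings 'uninitialized';   # in effect only until the end of this block
    my $col_len = length $rows[$col_idx];
    if ($col_lens[$col_idx] < $col_len) {
        $col_lens[$col_idx] = $col_len;
    }
}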
The following code demonstrates one of many possible solutions to this task:
read line by line
get length of each field
compare with stored earlier
adjust to max length
form $format string for print
print formatted data
use strict;
use warnings;
use feature 'say';

my(@data,@length,$format);

while ( <DATA> ) {
    my @e = split ' ';
    my @l = map{ length } @e;
    $length[$_] = ($length[$_] // 0) < $l[$_] ? $l[$_] : $length[$_] for 0..$#e;
    push @data,\@e;
}

$format = join ' ', map{ '%'.$_.'s' } @length;
$format .= "\n";

for my $row ( @data ) {
    printf $format, map { $row->[$_] // '' } 0..$#length;
}
__DATA__
id0 id1 id2 batch
0 34 56 70
2 3647 58 72 566
4 39 616 75 98 78 78987 9876 7899 776
89 40 62 76
8 42 64 78
34 455 544 565
Output
id0  id1 id2 batch
  0   34  56    70
  2 3647  58    72 566
  4   39 616    75  98 78 78987 9876 7899 776
 89   40  62    76
  8   42  64    78
 34  455 544   565

Why is the output the way it is? - Splitting and chop

I'm having trouble understanding the output of the code below.
1. Why is the output J o, A l, C h and S a? Doesn't chop remove the last character of a string and return that character, so shouldn't the output be n, i, n and y?
2. What is the purpose of the $firstline=0; line in the code?
3. What exactly is happening at the lines
foreach (@data)
($name,$age)=split(//,$_);
print "$name $age \n";
The output of the following code is
Data in file is:
J o
A l
C h
S a
The file contents are:
NAME AGE
John 26
Ali 21
Chen 22
Sally 25
The code:
#!/usr/bin/perl
my ($firstline,
    @data,
    $data);
open (INFILE,"heading.txt") or die $.;
while (<INFILE>)
{
    if ($firstline)
    {
        $firstline=0;
    }
    else
    {
        chop(@data=<INFILE>);
    }
    print "Data in file is: \n";
    foreach (@data)
    {
        ($name,$age)=split(//,$_);
        print "$name $age\n";
    }
}
There are a few issues with this script, but first I will answer your points.
chop removes the last character of a string and returns the chopped character. In your data file "heading.txt" every line ends with \n, so chop is removing that \n. It is generally recommended to use chomp instead, since chomp only removes a trailing newline.
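A quick illustration of the difference, on made-up strings:
my $s = "John 26\n";
chomp $s;   # removes only the trailing newline -> "John 26"

my $t = "John 26";
chop $t;    # removes the last character unconditionally -> "John 2"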
You can verify what the last character of each line is by running the command below:
od -bc heading.txt
0000000 116 101 115 105 040 101 107 105 012 112 157 150 156 040 062 066
N A M E A G E \n J o h n 2 6
0000020 012 101 154 151 040 062 061 012 103 150 145 156 040 062 062 012
\n A l i 2 1 \n C h e n 2 2 \n
0000040 123 141 154 154 171 040 062 065 012
S a l l y 2 5 \n
0000051
You can see the trailing \n on each line.
There is no use for $firstline because it is never set to 1 (it starts out undef), so the if branch never runs and you can remove the if/else block.
The foreach line reads the elements of array @data one by one. The split(//,$_) line splits each element into individual characters, assigns the first two to the $name and $age variables, and discards the rest. The last line prints those two captured characters.
IMO, we should instead split on whitespace to actually capture the name and age.
So the final script should look like:
#!/usr/bin/perl
use strict;
use warnings;

my @data;
open (INFILE,"heading.txt") or die "Can't open heading.txt: $!";
while (<INFILE>) {
    chomp(@data = <INFILE>);
}
close(INFILE);

print "Data in file is: \n";
foreach (@data) {
    my ($name,$age) = split(/ /,$_);
    print "$name $age\n";
}
Output:
Data in file is:
John 26
Ali 21
Chen 22
Sally 25

Perl: read an array and calculate corresponding percentile

I am trying to write a Perl script that reads a text file containing a series of numbers, then calculates and prints the numbers that correspond to given percentiles. I do not have access to the usual statistical modules, so I'd like to stick with pure Perl. Thanks in advance!
The input text file looks like:
197
98
251
82
51
272
154
167
38
280
157
212
188
88
40
229
228
125
292
235
67
70
127
26
279
.... (and so on)
The code I have is:
#!/usr/bin/perl
use strict;
use warnings;

my @data;
open (my $fh, "<", "testing2.txt")
    or die "Cannot open: $!\n";
while (<$fh>){
    push @data, $_;
}
close $fh;

my %count;
foreach my $datum (@data) {
    ++$count{$datum};
}

my %percentile;
my $total = 0;
foreach my $datum (sort { $a <=> $b } keys %count) {
    $total += $count{$datum};
    $percentile{$datum} = $total / @data;
    # percentile subject to change
    if ($percentile{$datum} <= 0.10) {
        print "$datum : $percentile{$datum}\n\n";
    }
}
My desired output:
2 : 0.01
3 : 0.01333
4 : 0.01666
6 : 0.02
8 : 0.03
10 : 0.037
12 : 0.04
14 : 0.05
15 : 0.05333
16 : 0.06
18 : 0.06333
21 : 0.07333
22 : 0.08
25 : 0.09
26 : 0.09666
Where each output line is a number from the list, followed by its corresponding percentile.
To store the numbers in @data without a trailing newline, just add chomp; before pushing each line, or chomp @data; after you've read them all.
If your input file has MSWin-style newlines, convert it to *nix style using dos2unix or fromdos.
Also, try to indent your code consistently; it boosts readability. And consider renaming $total to $running_total, since you use the value while it is still changing.
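A minimal sketch of the first suggestion applied to the read loop from the question:
while (<$fh>) {
    chomp;            # strip the trailing newline before storing
    push @data, $_;
}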

should I commit in the following code?

My code:
122 #
123 my $hfpDbh = undef;
124 unless (
125 $hfpDbh = DBI->connect("dbi:Pg:host=....")#removed something
128 ) {
129 Log( ERROR, "" );
130 Exit( 1 )
131 }
132 $hfpDbh->{RaiseError} = 1;
133 $hfpDbh->{AutoCommit} = 0;
134
135 ( my $mydata, $msg ) = load_data( $hfpDbh, $DatFile );
136 unless ( defined($mydata) )
137 {
138 Log(INFO, "Calling exit...2");
139 }
140 #$hfpDbh->disconnect();
141 Exit( 0 );
142 Log(INFO, "Calling exit...4");
143
144
145 sub load_data
146 {
147 my ( $dbh, $DatFile ) = @_;
148 my $msg = '';
149 unless ( $dbh ) {
150 $msg = 'cannot load data, no DB handle';
151 Log( ERROR, $msg );
152 }
153 Log(INFO, "Call load_data...");
154 my $q = "SELECT ip as ip FROM rules WHERE active = 'true' AND isGood = 'true';";
155 my $stmt = undef;
156 unless ( $stmt = $dbh->prepare( $q ) ) {
157 $msg = "unable to prepare SQL query: $q";
158 Log( ERROR, $msg );
159 }
160
161 eval { $stmt->execute() };
162 if ( $@ ) {
163 $msg = "failed to execute SQL query: $@";
164 Log( ERROR, $msg );
165 }
166
167 my $data = {};
168 while ( my $row = $stmt->fetchrow_hashref() ) {
169 #Log(INFO, "testing row");
170 }
171 $stmt->finish();
172 return $data, $msg;
173 }
The warning is:
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle
If I add "$dbh->commit()" after Line 171, the above warn disappeared.
If I did not add "$dbh->commit()" after Line 171 but called "$hfpDbh->disconnect();" in Line 140, the above warn disappeared too.
My question is:
The warning means that there are uncommitted transactions? That is why I need to commit or disconnect explicitly to fix it? But there is only a SELECT operation in the code. What am I missing?
Thanks.
Since you aren't modifying the database at all, you don't need transactions, so don't set AutoCommit to zero. That way there's no need to call commit anywhere either, and the database will be disconnected automatically when the handle goes out of scope.
Since you're handling errors yourself, you shouldn't set RaiseError to 1. That would cause your program to die immediately if any error occurs, and your own handling code would never get executed.
There's no need to call finish. It won't do any harm here, but it's also pointless and should almost never be necessary.
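A minimal sketch of the connect call with those changes applied (the credentials are hypothetical placeholders, and the DSN stays elided as in the question):
my $hfpDbh = DBI->connect(
    "dbi:Pg:host=....",                     # DSN elided
    $user, $password,                       # hypothetical credentials
    { AutoCommit => 1, RaiseError => 0 },   # no transactions; errors handled by hand
) or do {
    Log( ERROR, "connect failed: $DBI::errstr" );
    Exit( 1 );
};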
The warning means that there are uncommitted transactions?
There is an uncommitted transaction, since you requested that transactions be used, but the warning actually notifies you that a rollback was performed implicitly. It tells you this because an implicit rollback may result in a loss of information. Obviously, it won't result in a loss of information in this case, but the check isn't smart enough to realize this.
What I am missing?
$hfpDbh->disconnect(); or $hfpDbh->rollback();

Fastest way to index and query huge tab delimited file

I have a 30 GB tab-delimited text file of numbers, and I need the fastest way to index it and query it by the first and second columns. I've tried MongoDB, but uploading the data to the database takes a huge amount of time; I've also tried mongoimport via a JSON file, and that takes a huge amount of time as well.
mongoimport --upsert --upsertFields A,B,S1,E1,S2,E2 -d DBName -c TableName data.json
Data file fragment:
504 246 91.92007 93 0 4657 5631 5911 0 39 1061 1162
813 469 92.14697 109 0 2057 2665 7252 1 363 961 1399
2388 987 92.20945 61 0 1183 1575 1824 0 66 560 5088
2388 2323 92.88472 129 0 75 1161 1824 1 2516 3592 12488
2729 1008 95.29058 47 0 435 1166 1193 1 76 654 1055
2757 76 94.25837 12 0 0 44 1946 0 51 68 247
2757 2089 92.63158 14 0 12 30 1946 0 14 30 211
What is the most efficient way to do this in minimum time? Any hints about the best database for it, or about Mongo upload speed optimisation?
Query examples:
objs = db.TableName.find({'A':2757})
objs = db.TableName.find({'B':76})
For each number in columns A and B there are up to 1000 matching rows, with a mean of 20.
Databases often have complex work to do in order to be more robust. If you use straight B-tree indexes, it is normally faster. Below you'll find an upload script in Perl.
#!/usr/bin/perl
use DB_File;
use Fcntl;

# $DB_BTREE->{'cachesize'} = 1000000;
$DB_BTREE->{'flags'} = R_DUP;

my (%h, %h1, %h2, $n);
my $x  = tie %h,  'DB_File', "bf.db", O_RDWR|O_CREAT|O_TRUNC, 0640, $DB_BTREE;
my $x1 = tie %h1, 'DB_File', "i1.db", O_RDWR|O_CREAT|O_TRUNC, 0640, $DB_BTREE;
my $x2 = tie %h2, 'DB_File', "i2.db", O_RDWR|O_CREAT|O_TRUNC, 0640, $DB_BTREE;

while (<>) {
    chomp;
    if (/(\d+)\s+(\d+)/) {
        $h{++$n} = $_;   ## add the tuple
        $h1{$1}  = $n;   ## add to index1
        $h2{$2}  = $n;   ## add to index2
    }
}
untie %h;
untie %h1;
untie %h2;
and a query:
#!/usr/bin/perl
use DB_File;
use Fcntl;

$DB_BTREE->{'flags'} = R_DUP;

my (%h, %h1, %h2, $n, @list);
my $x  = tie %h,  'DB_File', "bf.db", O_RDWR|O_CREAT, 0640, $DB_BTREE;
my $x1 = tie %h1, 'DB_File', "i1.db", O_RDWR|O_CREAT, 0640, $DB_BTREE;
my $x2 = tie %h2, 'DB_File', "i2.db", O_RDWR|O_CREAT, 0640, $DB_BTREE;

while (<>) {
    chomp;   # queries input format: A:number or B:number
    if (/A:(\d+)/) {
        @list = sort $x1->get_dup($1);
        for (@list) { print $h{$_}, "\n"; }
    }
    if (/B:(\d+)/) {
        @list = sort $x2->get_dup($1);
        for (@list) { print $h{$_}, "\n"; }
    }
}
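Assuming the two scripts are saved as load.pl and query.pl (hypothetical names), usage would look like:
$ perl load.pl data.tsv            # builds bf.db, i1.db and i2.db
$ echo 'A:2757' | perl query.pl    # prints every row whose first column is 2757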
Querying is very fast, but the upload took 20 s (user time) per 1,000,000 lines...
(If you run experiments with your own data, please share the times.)