I would like to read 100 KB from <>, do some testing on that, and then put the 100 KB back so that it will be read by <> later.
In pseudocode:
$data100kb = read(<>,100000);
testing($data100kb);
unget(<>,$data100kb);
while(<>) {
    do stuff;
}
I do not know in advance whether <> will supply an actual file, a pipe, or a concatenation of actual files. So it should work with:
cat bigfile_a bigfile_b | perl my_program
Assume the big files are 1000 times the size of RAM, so copying the input is prohibitively expensive.
It is acceptable if I can only read from STDIN.
Background
The first 100 KB tells me how to parse the rest of the input, but the parser needs that first 100 KB as well.
This seems to work for STDIN. It would be great if it could be done faster.
my @line_cache;    # holds lines that have been pushed back by unget()

read(STDIN, my $first, 100000);
unget($first);
compute($first);

while ($_ = get_line()) {
    # Similar to while(<>)
}

sub get_line {
    if (@line_cache) {
        my $line = shift @line_cache;
        if (@line_cache) {
            # not the last cached line
            return $line;
        } else {
            # last cached line - may be incomplete
            if (substr($line, -1, 1) eq $/) {
                # Line is complete
                return $line;
            } else {
                # complete the partial line from STDIN (unless STDIN is exhausted)
                my $rest = <STDIN>;
                return defined $rest ? $line . $rest : $line;
            }
        }
    } else {
        return scalar(<STDIN>);
    }
}

sub unget {
    for (@_) {
        # Split into lines, keeping the line terminator
        push @line_cache, split m:(?<=$/):;
    }
}
For posterity... I wrote FileHandle::Unget to address this problem.
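A rough sketch of how that module can be used for this problem (the method names are from memory of its documentation, so treat this as unverified; compute() is the analysis step from the question):
use FileHandle::Unget;

# Wrap STDIN so that data can be pushed back after it has been examined.
my $fh = FileHandle::Unget->new(\*STDIN);

# Collect roughly the first 100 KB line by line.
my $first = '';
$first .= <$fh> while length($first) < 100_000 && !$fh->eof;

compute($first);        # analyse the chunk
$fh->ungets($first);    # push the data back onto the handle

while (<$fh>) {
    # line-by-line processing now sees the whole stream again
}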
I don't know whether this satisfies your need. If you insist on using <>, then I guess you have to use tie.
# copy STDIN to another filehandle: $fh
my $fakefile = join '', <STDIN>;
open my $fh, '<', \$fakefile;

# read 100kb
read $fh, my $data100kb, 100_000;

# do something with the data
# $data100kb =~ y/a/b/;
# print $data100kb;

# reset $fh
seek $fh, 0, 0;

while (<$fh>) {
    print;    # do some stuff
}
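Following up on the tie suggestion: a rough, untested sketch of a tied handle with an unget buffer might look like the code below. The class name My::UngetHandle and its unget method are invented for illustration, and only line-oriented reading (READLINE/EOF) is implemented, so it supports <>-style reads but not read().
package My::UngetHandle;
use strict;
use warnings;

sub TIEHANDLE {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buf => '' }, $class;
}

# push data back so that later reads see it first
sub unget {
    my ($self, $data) = @_;
    $self->{buf} = $data . $self->{buf};
}

sub READLINE {
    my $self = shift;
    if (length $self->{buf}) {
        my $i = index $self->{buf}, $/;
        if ($i >= 0) {
            # a complete line is available in the buffer
            return substr $self->{buf}, 0, $i + length($/), '';
        }
        # the buffer ends mid-line: finish the line from the real handle
        my $rest = readline $self->{fh};
        my $line = $self->{buf} . (defined $rest ? $rest : '');
        $self->{buf} = '';
        return $line;
    }
    return readline $self->{fh};
}

sub EOF {
    my $self = shift;
    return length($self->{buf}) ? 0 : eof($self->{fh});
}

package main;

tie *INPUT, 'My::UngetHandle', \*STDIN;

# peek at roughly the first 100 KB, analyse it, then put it back
my $first = '';
$first .= <INPUT> while length($first) < 100_000 && !eof(INPUT);
compute($first);                  # the analysis step from the question
(tied *INPUT)->unget($first);

while (<INPUT>) {
    # normal line-by-line processing sees the whole stream again
}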
Related
I have an array of files. I need to cat each file and search it for a list of keywords stored in the file keywords.txt.
My keywords.txt contains the following:
AES
3DES
MD5
DES
SHA-1
SHA-256
SHA-512
10.*
http://
www.
@john.com
john.com
and I expect output like the following:
file jack.txt contains AES:5 (5 is the line number) http://:55
file new.txt contains 3DES:75 http://:105
Okay, here is my code:
use warnings;
use strict;
open STDOUT, '>>', "my_stdout_file.txt";
my $filename = $ARGV[2];
chomp ($filename);
open my $fh, q[<], shift or die $!;
my %keyword = map { chomp; $_ => 1 } <$fh>;
print "$fh\n";
while ( <> ) {
    chomp;
    my @words = split;
    for ( my $i = 0; $i <= $#words; $i++ ) {
        if ( $keyword{ $words[ $i ] } ) {
            print "Keyword Found for file:$filename\n";
            printf qq[$filename Line: %4d\tWord position: %4d\tKeyword: %s\n],
                $., $i, $words[ $i ];
        }
    }
}
But the problem is that it considers all the arguments and tries to open the file named in $ARGV[2] as well. I actually need to open only $ARGV[0] and $ARGV[1]; $ARGV[2] is kept for the output only.
I appreciate any responses.
Thanks.
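One way to keep <> away from the third argument is to remove it from @ARGV before the while (<>) loop runs, roughly like this (a sketch reusing the code above; the variable names are illustrative):
my $out_name = pop @ARGV;                  # was $ARGV[2]: the output file
open STDOUT, '>>', $out_name or die "Can't open $out_name: $!";

my $keyword_file = shift @ARGV;            # was $ARGV[0]: the keyword list
open my $fh, '<', $keyword_file or die $!;
my %keyword = map { chomp; $_ => 1 } <$fh>;

# @ARGV now holds only the data file(s), so while (<>) reads just those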
I think there may be multiple keywords in one line of those txt files. The code below is for your reference:
my @keywords;
open IN, "keywords.txt";
while (<IN>) {
    if (/^(.+)$/) {
        push(@keywords, $1);
    }
}
close(IN);

my @filelist = glob("*.txt");
foreach my $filename (@filelist) {
    if (open my $fh, '<', $filename) {
        my $ln = 1;
        while (my $line = <$fh>) {
            foreach my $kw (@keywords) {
                if ($line =~ /$kw/) {
                    print $filename . ':' . $ln . ':' . $kw . "\n";
                }
            }
            $ln++;
        }
        close($fh);
    }
}
OK, but one thing I noticed here: the keywords
http://
oracle.com
are being matched only as whole words, when they should also match as part of a longer string. Only these keywords need that; the others should be searched as described above.
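One possible way to get that mixed behaviour, as a sketch that plugs into the inner loop of the answer above (assumption: keywords made only of word characters are matched as whole words, everything else as a literal substring):
foreach my $kw (@keywords) {
    my $re = ($kw =~ /^\w+$/)
           ? qr/\b\Q$kw\E\b/    # AES, 3DES, MD5, ... : whole-word match
           : qr/\Q$kw\E/;       # http://, oracle.com, ... : literal substring
    # note: 10.* is treated as literal text here, not as a wildcard
    print "$filename:$ln:$kw\n" if $line =~ $re;
}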
I have it set up so far that it decodes a ROT25-encrypted message and displays it only if it has the word "the" in it. But I have to make it go through every ROT (1-25). I know it's probably a loop, but I'm not sure how to set it up.
use English;
my $file_name = shift;
sub decode
{
    return shift =~ tr/Z-ZA-Yz-za-y/A-Za-z/r;
}

open(my $file_handle, '<', $file_name)
    or die "Could not open file '$file_name' $!";

my $encoded = '';
{ # allow us to read entire file in as a string:
    local $INPUT_RECORD_SEPARATOR = undef;
    $encoded = <$file_handle>;
}
close $file_handle;

my $decoded = &decode($encoded);

if ($decoded =~ m/(^| )the/) # make this more robust!
{
    print($decoded);
} else {
    print("File does not contain the, not the secret file");
}
my $decoded = $encoded;
for my $rot (reverse 1..25) {
    $decoded = decode($decoded);
    ...
}
Try the following:
my $decoded = $encoded;
my $decodingFound;
for (1 .. 25) {
    $decoded = decode($decoded);
    if ($decoded =~ /\bthe\b/) {
        print($decoded);
        $decodingFound = 1;
        last;
    }
}
print("File does not contain the, not the secret file")
    unless $decodingFound;
Notes:
this tries all the ROTs from ROT1 to ROT25
it never checks the original (encoded) string; to do that, loop 26 instead of 25 times
as soon as it finds a rotation that contains "the", it stops looking
the regular expression is \bthe\b: the \b means a word boundary, so neither "there" nor "father" will match; it seems that's what you wanted, judging by your comment ("make this more robust!")
Also, your rotation may be shortened to tr/ZA-Yza-y/A-Za-z/r (Z is as good as Z-Z) and may be more legible as tr/A-Za-z/B-ZAb-za/r.
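To illustrate both points (the sample strings here are made up):
# \b is a word boundary, so "the" inside a longer word does not match:
print "match\n"    if 'he took the key' =~ /\bthe\b/;    # prints "match"
print "no match\n" unless 'their father' =~ /\bthe\b/;   # prints "no match"

# the two tr forms perform the same one-letter shift:
my $alphabet = join '', 'A' .. 'Z', 'a' .. 'z';
my $shift1 = $alphabet =~ tr/ZA-Yza-y/A-Za-z/r;
my $shift2 = $alphabet =~ tr/A-Za-z/B-ZAb-za/r;
print "$shift1\n$shift2\n";    # both lines read BCD...ZAbcd...za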
I am trying, in Perl, to pick each complete line containing "CURRENT_RUN_ID" out of a whole document. I have been using the code below to accomplish this, but I am unable to enter the while loop.
my $sSuccessString = "CURRENT_RUN_ID";
open(LOG, "$slogfile") or die("Can't open $slogfile\n");
my $sLines;
{
    local $/ = undef;
    $sLines = <LOG>;
}

my $spool = 0;
my @matchingLines;
while (<LOG>)
{
    print OUTLOG "in while loop\n";
    if (m/$sSuccessString/i) {
        print OUTLOG "in if loop\n";
        $spool = 1;
        print map { "$_ \n" } @matchingLines;
        @matchingLines = ();
    }
    if ($spool) {
        push (@matchingLines, $_);
    }
}
You are already done reading from the filehandle LOG after you have slurped it into $sLines. <LOG> in the head of the while will return undef because it has reached eof. You either have to use that variable $sLines in your while loop or get rid of it. You're not using it anyway.
If you only want to print the line that matches, all you need to do is this:
use strict;
use warnings;
open my $fh_in, '<', 'input_file' or die $!;
open my $fh_out, '>', 'output_file' or die $!;
while (my $line = <$fh_in>) {
    print $fh_out $line if $line =~ m/CURRENT_RUN_ID/;
}
close $fh_in;
close $fh_out;
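If you do want to keep the slurped copy, you can iterate over $sLines in memory instead of reading the handle a second time, for example (a sketch using the OUTLOG handle from the question; split /^/ breaks the string back into lines, keeping the newlines):
foreach my $line (split /^/, $sLines) {
    print OUTLOG $line if $line =~ /$sSuccessString/i;
}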
When you execute this code:
$sLines=<LOG>;
it reads all of the data from LOG into $sLines and it leaves the file pointer for LOG at the end of the file. So when you next try to read from that file handle with:
while (<LOG>)
nothing is returned as there is no more data to read.
If you want to read the file twice, then you will need to use the seek() function to reset the file pointer before your second read.
seek LOG, 0, 0;
But, given that you never do anything with $sLines I suspect that you can probably just remove that whole section of the code.
The whole thing with $spool and #matchingLines seems strange too. What were you trying to achieve there?
I think your code can be simplified to just:
my $sSuccessString = "CURRENT_RUN_ID";
open(LOG, $slogfile) or die("Can't open $slogfile\n");
while (<LOG>) {
    print OUTLOG if /$sSuccessString/i;
}
Personally, I'd make it even simpler, by reading from STDIN and writing to STDOUT.
my $sSuccessString = 'CURRENT_RUN_ID';
while (<>) {
    print if /$sSuccessString/i;
}
And then using Unix I/O redirection to connect up the correct files.
$ ./this_filter.pl < your_input.log > your_output.log
I want to make a tool to classify each line of an input file into one of several files, but there seems to be some problem with naming a filehandle, so I can't go ahead. How do I solve this?
Here is my program:
$ARGV[0] is the input file
$ARGV[1] is the number of classes
#!/usr/bin/perl
use POSIX;
use warnings;
# open input file
open(Raw,"<","./$ARGV[0]") or die "Can't open $ARGV[0] \n";
# create a directory class to store class files
system("mkdir","Class");
# create files for store class informations
for($i=1;$i<=$ARGV[1];$i++)
{
# it seems something wrong in here
open("Class$i",">","./Class/$i.class") or die "Can't create $i.class \n";
}
# read each line and random decide which class to store
while( eof(Raw) != 1)
{
$Line = readline(*Raw);
$Random_num = ceil(rand $ARGV[1]);
for($k=1;$k<=$ARGV[1];$k++)
{
if($Random_num == $k)
{
# Store to the file
print "Class$k" $Line;
last;
}
}
}
for($h=1;$h<=$ARGV[1];$h++)
{
close "Class$h";
}
close Raw;
thanks
Later I used the advice provided by Bill Ruppert.
I put the names of the filehandles into an array, but a syntax error appears that I can't correct.
I labelled the error with ######## A syntax error but it looks quite OK ########
Here is my code:
#!/usr/bin/perl
use POSIX;
use warnings;
use Data::Dumper;
# open input file
open(Raw,"<","./$ARGV[0]") or die "Can't open $ARGV[0] \n";
# create a directory class to store class files
system("mkdir","Class");
# put the names of the filehandles into an array
for ($i = 0; $i < $ARGV[1]; $i++)
{
    push(@Name, ("Class" . $i));
}
# create the class files
for ($i = 0; $i <= $#Name; $i++)
{
    $I = ($i + 1);
    open($Name[$i], ">", "./Class/$I.class") or die "Can't create $I.class \n";
}
# read each line and randomly decide which class to store it in
while (eof(Raw) != 1)
{
    $Line = readline(*Raw);
    $Random_num = ceil(rand $ARGV[1]);
    for ($k = 0; $k <= $#Name; $k++)
    {
        if ($Random_num == ($k + 1))
        {
            print $Name[$k] $Line; ######## A syntax error but it looks quite OK ########
            last;
        }
    }
}
for ($h = 0; $h <= $#Name; $h++)
{
    close $Name[$h];
}
close Raw;
thanks
To quote the Perl documentation on the print function:
If you're storing handles in an array or hash, or in general whenever you're using any expression more complex than a bareword handle or a plain, unsubscripted scalar variable to retrieve it, you will have to use a block returning the filehandle value instead, in which case the LIST may not be omitted:
print { $files[$i] } "stuff\n";
print { $OK ? STDOUT : STDERR } "stuff\n";
Thus, print $Name[$k] $Line; needs to be changed to print { $Name[$k] } $Line;.
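With the block form in place, the inner scan over @Name is not needed at all; the random class number can index the array directly, for example (a sketch):
my $Random_num = ceil(rand $ARGV[1]);       # 1 .. $ARGV[1], as in the original
print { $Name[$Random_num - 1] } $Line;     # pick the matching handle directly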
How about this one:
#! /usr/bin/perl -w
use strict;
use POSIX;
my $input_file = shift;
my $file_count = shift;
my %hash;
open(INPUT, "<$input_file") || die "Can't open file $input_file";
while(my $line = <INPUT>) {
my $num = ceil(rand($file_count));
$hash{$num} .= $line
}
foreach my $i (1..$file_count) {
open(OUTPUT, ">$i.txt") || die "Can't open file $i.txt";
print OUTPUT $hash{$i};
close OUTPUT;
}
close INPUT;
Is there any way to have a subroutine send data back while still processing? For instance (this example is used simply to illustrate): a subroutine reads a file, and while it is reading through the file, if some condition is met, it "returns" that line and keeps processing. I know there are those who will answer "why would you want to do that?" and "why don't you just ...?", but I really would like to know if this is possible.
A common way to implement this type of functionality is with a callback function:
{
    open my $log, '>', 'logfile' or die $!;
    sub log_line { print $log @_ }
}

sub process_file {
    my ($filename, $callback) = @_;
    open my $file, '<', $filename or die $!;
    local $_;
    while (<$file>) {
        if (/some condition/) {
            $callback->($_);
        }
        # whatever other processing you need ....
    }
}

process_file 'myfile.txt', \&log_line;
or without even naming the callback:
process_file 'myfile.txt', sub {print STDERR @_};
Some languages offer this sort of feature using "generators" or "coroutines", but Perl does not. The generator page linked above has examples in Python, C#, and Ruby (among others).
The Coro module looks like it would be useful for this problem, though I have no idea how it works and no idea whether it does what it advertises.
The easiest way to do this in Perl is probably with an iterator-type solution. For example, here we have a subroutine which forms a closure over a filehandle:
open my $fh, '<', 'some_file.txt' or die $!;
my $iter = sub {
    while ( my $line = <$fh> ) {
        return $line if $line =~ /foo/;
    }
    return;
};
The sub iterates over the lines until it finds one matching the pattern /foo/ and then returns it, or else returns nothing (undef in scalar context). Because the filehandle $fh is defined outside the scope of the sub, it remains resident in memory between calls. Most importantly, its state, including the current seek position in the file, is retained. So each call to the subroutine resumes reading the file where it last left off.
To use the iterator:
while ( defined( my $next_line = $iter->() ) ) {
    # do something with each line here
}
If you really want to do this, you can do it with threading. One option would be to fork a separate thread that reads the file and, when it finds a certain line, places it in an array that is shared between threads. The other thread could then take the lines as they are found and process them. Here is an example that reads a file, looks for an 'X' in a line, and performs an action when one is found.
use strict;
use threads;
use threads::shared;

my @ary : shared;

my $thr = threads->create('file_reader');
while (1) {
    my ($value);
    {
        lock(@ary);
        if ($#ary > -1) {
            $value = shift(@ary);
            print "Found a line to process: $value\n";
        }
        else {
            print "no more lines to process...\n";
        }
    }
    sleep(1);
    # process $value
}

sub file_reader {
    # file input
    open(INPUT, "<test.txt");
    while (<INPUT>) {
        my ($line) = $_;
        chomp($line);
        print "reading $line\n";
        if ($line =~ /X/) {
            print "pushing $line\n";
            lock(@ary);
            push @ary, $line;
        }
        sleep(4);
    }
    close(INPUT);
}
Try this as the contents of the test.txt file:
line 1
line 2X
line 3
line 4X
line 5
line 6
line 7X
line 8
line 9
line 10
line 11
line 12X
If your language supports closures, you may be able to do something like the following. Note, by the way, that the function would not keep processing the file; it runs only when you call it, so it may not be what you need.
(This is JavaScript-like pseudocode.)
function fileReader (filename) {
    var file = open(filename);
    return function () {
        while (s = file.read()) {
            if (condition) {
                return s;
            }
        }
        return null;
    }
}
a = fileReader("myfile");
line1 = a();
line2 = a();
line3 = a();
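In Perl the same shape is essentially the closure/iterator shown earlier; a sketch (the match condition is a placeholder):
sub file_reader {
    my ($filename) = @_;
    open my $fh, '<', $filename or die $!;
    return sub {
        while (my $line = <$fh>) {
            return $line if $line =~ /some condition/;
        }
        return;    # undef once the file is exhausted
    };
}

my $next = file_reader("myfile");
my $line1 = $next->();
my $line2 = $next->();
my $line3 = $next->();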
What about a recursive sub? Re-opening an existing filehandle does not reset the input line number, so it carries on from where it left off.
Here is an example where the process_file subroutine prints out blank-line-separated ("\n\n") paragraphs that contain foo.
sub process_file {
    my ($fileHandle) = @_;
    my $paragraph = '';
    while ( defined(my $line = <$fileHandle>) ) {
        $paragraph .= $line;
        last unless $line =~ /\S/;    # a blank line ends the paragraph
    }
    print $paragraph if $paragraph =~ /foo/;
    goto &process_file unless eof($fileHandle);
    # goto optimizes the tail recursion and prevents a stack overflow
}

open my $fileHandle, '<', 'file.txt' or die $!;
process_file($fileHandle);