Regarding perl tie::File Module

Regarding perl tie::File Module - perl

I have two queries about the tie::file module
I have used the tie::file module to do a search on a 55 MB file and set an memory of 20 MB in tie::file. When i tried to grep on the tied array for a search string it's taking a lot of time. Is there any work around for it?
Can tie::file used for reading a binary file. The tied array is delimited by "\n". How do i use a tie::file to read an binary file? Could you pls paste me some sample code.
/home/a814899> perl -e 'print "x\n"x27 for 1..1024*1024;' >a
/home/a814899> echo "hello world" >> a
Using Unix grep
/home/a814899> time grep "hello " a
hello world
real 0m8.280s
user 0m8.129s
sys 0m0.139s
Using the regex
/home/a814899> (time perl -e 'while (<>) { if (/hello/) { print "hello world"} }' a)
hello world
real 0m51.316s
user 0m51.087s
sys 0m0.189s
Using Perl Grep
#!/usr/bin/perl
print "executing\n";
my $outputFileDir="/home/a814899";
my $sFileName="a";
open my $fh, "<", $outputFileDir . "/" . $sFileName or do {
print "Could not open the file";
};
print "success in open" . "\n";
my #out=grep {/hello world/} <$fh> ;
print "#out" ;
close($fh)

Yes.
This is how you probably did it using Tie::File:
$ (
time perl -MTie::File -e'
tie #a, "Tie::File", $ARGV[0];
for (#a) { if (/y/) { } }
' a
) 2>&1 | grep real
real 2m44.333s
This is the "workaround":
$ (
time perl -e'
while (<>) { if (/y/) { } }
' a
) 2>&1 | grep real
real 0m0.644s
The data file was created using
$ perl -E'say "x"x54 for 1..1024*1024;' >a
Tie::File doesn't read files; Tie::File provides a means of mapping lines of a file to array elements. Since "binary" files have no lines, accessing one using Tie::File wouldn't make any sense.

Related

Why are Perl's $. and $ARGV behaving strangely after setting #ARGV and using <>

I wrote a perl program to take a regex from the command line and do a recursive search of the current directory for certain filenames and filetypes, grep each one for the regex, and output the results, including filename and line number. [ basic grep + find functionality that I can go in and customize as needed ]
cat <<'EOF' >perlgrep2.pl
#!/usr/bin/env perl
$expr = join ' ', #ARGV;
my #filetypes = qw(cpp c h m txt log idl java pl csv);
my #filenames = qw(Makefile);
my $find="find . ";
my $nfirst = 0;
foreach(#filenames) {
$find .= " -o " if $nfirst++;
$find .= "-name \"$_\"";
}
foreach(#filetypes) {
$find .= " -o " if $nfirst++;
$find .= "-name \\*.$_";
}
#files=`$find`;
foreach(#files) {
s#^\./##;
chomp;
}
#ARGV = #files;
foreach(<>) {
print "$ARGV($.): $_" if m/\Q$expr/;
close ARGV if eof;
}
EOF
cat <<'EOF' >a.pl
print "hello ";
$a=1;
print "there";
EOF
cat <<'EOF' >b.pl
print "goodbye ";
print "all";
$a=1;
EOF
chmod ugo+x perlgrep2.pl
./perlgrep2.pl print
If you copy and paste this into your terminal, you will see this:
perlgrep2.pl(36): print "hello ";
perlgrep2.pl(0): print "there";
perlgrep2.pl(0): print "goodbye ";
perlgrep2.pl(0): print "all";
perlgrep2.pl(0): print "$ARGV($.): $_" if m/\Q$expr/;
This is very surprising to me. The program appears to be working except that the $. and $ARGV variables do not have the values I expected. It appears from the state of the variables that perl has already read all three files (total of 36 lines) when it executes the first iteration of the loop over <>. What's going on ? How to fix ? This is Perl 5.12.4.

You're using foreach(<>) where you should be using while(<>). foreach(<>) will read every file in #ARGV into a temporary list before it starts iterating over it.

Splitting large file into small multiple files based on the column concatenation

I have an input file as following. I need to break them into multiple files based on the columns 2,3&5. The file has more columns but i have used cut command to get only the required columns.
12,Accounts,India,free,Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
12,HR,India,free,External
19,HR,China,used,Internal
33,Finance,Japan,free,Internal
39,Accounts,US,used,External
14,Accounts,Japan,used,External
11,Finance,India,used,External
11,HR,US,used,External
10,HR,India,used,External
Output files:
Accounts_India_Internal --
12,Accounts,India,free,Internal
Finance_China_Internal --
13,Finance,China,used,Internal
16,Finance,China,free,Internal
HR_India_External --
12,HR,India,free,External
10,HR,India,used,External
HR_China_Internal --
19,HR,China,used,Internal
and so on..
Please let me know how to achieve this.
As of now, I am thinking to sort the file based on these columns (2,3,5) and then run a loop on each record and start creating files. If a file does not exist, then create and add the record. Otherwise open the old file and add the record.
Is it possible to do this using shell scripting (bash)?

Is it possible to do this using shell scripting (bash)?
If you simply want to split the files based on fields 2, 3 and 5 you can do that quickly with awk:
awk -F, '{print >> $2"_"$3"_"$5}' infile.txt
That appends each line to a file whose name is made up of fields 2, 3 and 5.
Example:
[me#home]$ awk -F, '{print >> $2"_"$3"_"$5}' infile.txt
[me#home]$ cat Accounts_India_Internal
12,Accounts,India,free,Internal
[me#home]$ cat Finance_China_Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
If you do want output sorted, you can first run the file through sort.
sort -k2,3 -k5,5 -t, infile.txt | awk -F, '{print >> $2"_"$3"_"$5}'
That sorts the lines on fields 2, 3, and 5 before passing them on to the awk command.
Do note that the we're appending to the files so if you repeat the command without deleting the output files, you'll end up with duplicate data in the output files. To address this, as well as include your additional requirements (using first line as header for all new files) as mentioned in the chat, see this solution.

I suggest you keep a hash of file handles keyed by their corresponding file names
This program demonstrates. The input file is expected as a parameter on the command line
use strict;
use warnings;
my %fh;
while (<>) {
chomp;
my $filename = join '_', (split /,/)[1,2,4];
if (not $fh{$filename}) {
open $fh{$filename}, '>', $filename or die "Unable to open '$filename' for output: $!";
print "$filename created\n";
}
print { $fh{$filename} } $_, "\n";
}
output
Accounts_India_Internal created
Finance_China_Internal created
HR_India_External created
HR_China_Internal created
Finance_Japan_Internal created
Accounts_US_External created
Accounts_Japan_External created
Finance_India_External created
HR_US_External created

Note: To use the code, simply change <DATA> to <> and use the file name as argument. The Data::Dumper print is there only for demonstration purposes and can also be removed.
use strict;
use warnings;
use Data::Dumper;
my %h;
while (<DATA>) {
chomp;
my #data = split /,/;
my $file = join "_", #data[1,2,4];
push #{$h{$file}}, $_;
}
print Dumper \%h;
__DATA__
12,Accounts,India,free,Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
12,HR,India,free,External
19,HR,China,used,Internal
33,Finance,Japan,free,Internal
39,Accounts,US,used,External
14,Accounts,Japan,used,External
11,Finance,India,used,External
11,HR,US,used,External
10,HR,India,used,External
To print the files, you could use a subroutine like so:
for my $key (keys %h) {
print_file($key, $h{$key};
}
sub print_file {
my ($file, $data) = #_;
open my $fh, ">", $file or die $!;
print $fh "$_\n" for #$data;
}

save input text as foo, then:
cat foo | perl -nle '$k = join "_", (split ",", $_)[1,2,4]; $t{$k} = [#{$t{$k}}, $_]; END{for (keys %t){print join "\n", "$_ --", #{$t{$_}}, undef }}' | csplit -sz - '/^$/' {*}

Perl script that works the same as unix command "history | grep keyword"

in Unix, what I want to do is "history | grep keyword", just because it takes quite some steps if i wanna grep many types of keywords, so I want it to be automation, which I write a Perl script to do everything, instead of repeating the commands by just changing the keyword, so whenever I want to see those certain commands, I will just use the Perl script to do it for me.
The keyword that I would like to 'grep' is such as source, ls, cd, etc.
It can be printed out in any format, as long as to know how to do it.
Thanks! I appreciate any comments.

modified (thanks to #chas-owens)
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = ".bash.history";
open FILE, "<", $historyFile or die "could not open $historyFile: $!";
my #line = <FILE>;
print "Lines that matched $searchString\n";
for (#lines) {
if ($_ =~ /$searchString/) {
print "$_\n";
}
}
original
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = "<.bash.history";
open FILE, $historyFile;
my #line = <FILE>;
print "Lines that matched $searchString\n";
for (#lines) {
if ($_ =~ /$searchString/) {
print "$_\n";
}
}
to be honest ... history | grep whatever is clean and simple and nice ; )
note code may not be perfect

because it takes quite some steps if i wanna grep many types of keywords
history | grep -E 'ls|cd|source'
-P will switch on the Perl compatible regular expression library, if you have a new enough version of grep.

This being Perl, there are many ways to do it. The simplest is probably:
#!/usr/bin/perl
use strict;
use warnings;
my $regex = shift;
print grep { /$regex/ } `cat ~/.bash_history`;
This runs the shell command cat ~/.bash_history and returns the output as a list of lines. The list of lines is then consumed by the grep function. The grep function runs the code block for every item and only returns the ones that have a true return value, so it will only return lines that match the regex.
This code has several things wrong with it (it spawns a shell to run cat, it holds the entire file in memory, $regex could contain dangerous things, etc.), but in a safe environment where speed/memory isn't an issue, it isn't all that bad.
A better script would be
#!/usr/bin/perl
use strict;
use warnings;
use constant HISTORYFILE => "$ENV{HOME}/.bash_history";
my $regex = shift;
open my $fh, "<", HISTORYFILE
or die "could not open ", HISTORYFILE, ": $!";
while (<$fh>) {
next unless /$regex/;
print;
}
This script uses a constant to make it easier to change which history file it is using at a latter date. It opens the history file directly and reads it line by line. This means the whole file is never in memory. This can be very important if the file is very large. It still has the problem that $regex might contain a harmful regex, but so long as you are the person running it, you only have yourself to blame (but I wouldn't let outside users pass arguments to a command like this through, say a web application).

I think you are better off writing a perlscript which does you fancy matching (i.e. replaces the grep) but does not read the history file. I say this because the history does not appear to be flushed to the .bash_history file until I exit the shell. Now there are probably settings and/or environment variables to control this, but I don't know what they are. So if you just write a perl script which scanns STDIN for your favourite commands you can invoke it like
history | findcommands.pl
If its less typing you are after set up a shell function or alias to do this for you.
As requested by #keifer here is a sample perl script which searches for a specified (or default set of commands in your history). Onbiously you should change the dflt_cmds to whichever ones you search for most frequently.
#!/usr/bin/perl
my #dflt_cmds = qw( cd ls echo );
my $cmds = \#ARGV;
if( !scalar(#$cmds) )
{
$cmds = \#dflt_cmds;
}
while( my $line = <STDIN> )
{
my( $num, $cmd, #args ) = split( ' ', $line );
if( grep( $cmd eq $_ , #$cmds ) )
{
print join( ' ', $cmd, #args )."\n";
}
}

Perl: avoid greedy reading from stdin?

Consider the following perl script (read.pl):
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;
If this script is executed from the command line, it will get the first line of input, while cat gets everything else until the end of input (^D is pressed).
However, things are different when the input is piped from another process or read from a file:
$ echo "foo\nbar" | ./read.pl
Perl read: foo
And here's what cat gets:
Perl seems to greadily buffer the entire input somewhere, and processes called using backticks or system do no see any of the input.
The problem is that I'd like to unit test a script that mixes <STDIN> and calls to other processes. What would be the best way to do this? Can I turn off input buffering in perl? Or can I spool the data in a way that will "mimic" a terminal?

This is not a Perl problem. It is a UNIX/shell problem. When you run a command without pipes you are in line buffering mode, but when you redirect with pipes, you are in block buffering mode. You can see this by saying:
cat /usr/share/dict/words | ./read.pl | head
This C program has the same problem:
#include <stdio.h>
int main(int argc, char** argv) {
char line[4096];
FILE* cat;
fgets(line, 4096, stdin);
printf("C got: %s\ncat got:\n", line);
cat = popen("cat", "r");
while (fgets(line, 4096, cat)) {
printf("%s", line);
}
pclose(cat);
return 0;
}

I have good news and bad news.
The good news is a simple modification of read.pl allows you to give it fake input:
#! /usr/bin/perl
use warnings;
use strict;
binmode STDIN, "unix" or die "$0: binmode: $!";
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;
Sample run:
$ printf "A\nB\nC\nD\n" | ./read.pl
Perl read: A
And here's what cat gets: B
C
D
The bad news is you get a single switchover: if you try to repeat the read-then-cat, the first cat will starve all subsequent reads. To see this, consider
#! /usr/bin/perl
use warnings;
use strict;
binmode STDIN, "unix" or die "$0: binmode: $!";
my $line = <STDIN>;
print "1: Perl read: $line";
print "1: And here's what cat gets: ", `cat -`;
$line = <STDIN>;
$line = "<undefined>\n" unless defined $line;
print "2: Perl read: $line";
print "2: And here's what cat gets: ", `cat -`;
and then a sample run that produces
$ printf "A\nB\nC\nD\n" | ./read.pl
1: Perl read: A
1: And here's what cat gets: B
C
D
2: Perl read: <undefined>
2: And here's what cat gets:

Today I think I've found what I needed: Perl has a module called Expect which is perfect for such situations:
#!/usr/bin/perl
use strict;
use warnings;
use Expect;
my $exp = Expect->spawn('./read.pl');
$exp->send("First Line\n");
$exp->send("Second Line\n");
$exp->send("Third Line\n");
$exp->soft_close();
Works like a charm ;)

Here's a sub-optimal way that I've found:
use IPC::Run;
my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, \$output);
$process->pump() until $output =~ /Perl read:/;
$input .= "Second Line\n";
$process->finish();
print $output;
It's sub-optimal in the sense that one needs to know the "prompt" that the program will emit before waiting for more input.
Another sub-optimal solution is the following:
use IPC::Run;
my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, my $timer = IPC::Run::timer(1));
$process->pump() until $timer->is_expired();
$timer->start(1);
$input .= "Second Line\n";
$process->finish();
It does not require knowledge of any prompt, but is slow because it waits at least two seconds. Also, I don't understand why the second timer is needed (finish won't return otherwise).
Does anybody know better solutions?

Finally I ended up with the following solution. Still far from optimal, but it works. Even in situations like the one described by gbacon.
use Carp qw( confess );
use IPC::Run;
use Scalar::Util;
use Time::HiRes;
# Invokes the given program with the given input and argv, and returns stdout/stderr.
#
# The first argument provided is the input for the program. It is an arrayref
# containing one or more of the following:
#
# * A scalar is simply passed to the program as stdin
#
# * An arrayref in the form [ "prompt", "input" ] causes the function to wait
# until the program prints "prompt", then spools "input" to its stdin
#
# * An arrayref in the form [ 0.3, "input" ] waits 0.3 seconds, then spools
# "input" to the program's stdin
sub capture_with_input {
my ($program, $inputs, #argv) = #_;
my ($stdout, $stderr);
my $stdin = '';
my $process = IPC::Run::start( [$program, #argv], \$stdin, \$stdout, \$stderr );
foreach my $input (#$inputs) {
if (ref($input) eq '') {
$stdin .= $input;
}
elsif (ref($input) eq 'ARRAY') {
(scalar #$input == 2) or
confess "Input to capture_with_input must be of the form ['prompt', 'input'] or [timeout, 'input']!";
my ($prompt_or_timeout, $text) = #$input;
if (Scalar::Util::looks_like_number($prompt_or_timeout)) {
my $start_time = [ Time::HiRes::gettimeofday ];
$process->pump_nb() while (Time::HiRes::tv_interval($start_time) < $prompt_or_timeout);
}
else {
$prompt_or_timeout = quotemeta $prompt_or_timeout;
$process->pump until $stdout =~ m/$prompt_or_timeout/gc;
}
$stdin .= $text;
}
else {
confess "Unknown input type passed to capture_with_input!";
}
}
$process->finish();
return ($stdout, $stderr);
}
my $input = [
"First Line\n",
["Perl read:", "Second Line\n"],
[0.5, "Third Line\n"],
];
print "Executing process...\n";
my ($stdout, $stderr) = capture_with_input('./read.pl', $input);
print "done.\n";
print "STDOUT:\n", $stdout;
print "STDERR:\n", $stderr;
Usage example (with a slightly modified read.pl to test gbacon's case):
$ time ./spool_read4.pl
Executing process...
done.
STDOUT:
Perl read: First Line
And here's what head -n1 gets: Second Line
Perl read again: Third Line
STDERR:
./spool_read4.pl 0.54s user 0.02s system 102% cpu 0.547 total
Still, I'm open to better solutions...

How can I print only certain fields in a space separated file?

I have a file containing the following content 1000 line in the following format:
abc def ghi gkl
How can I write a Perl script to print only the first and the third fields?
abc ghi

perl -lane 'print "#F[0,2]"' file

If no answer is good for you yet, I'll try to get the bounty ;-)
#!/usr/bin/perl
# Lines beginning with a hash (#) denote optional comments,
# except the first line, which is required,
# see http://en.wikipedia.org/wiki/Shebang_(Unix)
use strict; # http://perldoc.perl.org/strict.html
use warnings; # http://perldoc.perl.org/warnings.html
# http://perldoc.perl.org/perlsyn.html#Compound-Statements
# http://perldoc.perl.org/functions/defined.html
# http://perldoc.perl.org/functions/my.html
# http://perldoc.perl.org/perldata.html
# http://perldoc.perl.org/perlop.html#I%2fO-Operators
while (defined(my $line = <>)) {
# http://perldoc.perl.org/functions/split.html
my #chunks = split ' ', $line;
# http://perldoc.perl.org/functions/print.html
# http://perldoc.perl.org/perlop.html#Quote-Like-Operators
print "$chunks[0] $chunks[2]\n";
}
To run this script, given that its name is script.pl, invoke it as
perl script.pl FILE
where FILE is the file that you want to parse. See also http://perldoc.perl.org/perlrun.html. Good luck! ;-)

That's really kind of a waste for something as powerful as perl, since you can do the same thing in one trivial line of awk.
awk '{ print $1 $3 }'

while ( <> ) {
my #fields = split;
print "#fields[0,2]\n";
}
and just for variety, on Windows:
C:\Temp> perl -pale "$_=qq{#F[0,2]}"
and on Unix
$ perl -pale '$_="#F[0,2]"'

As perl one-liner:
perl -ane 'print "#F[0,2]\n"' file
Or as executable script:
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'file' or die "Can't open file: $!\n";
while (<$fh>) {
my #fields = split;
print "#fields[0,2]\n";
}
Execute the script like this:
perl script.pl
or
chmod 755 script.pl
./script.pl

I'm sure I shouldn't get the bounty since the question asks for the result to be given in perl, but anyway:
In bash/ksh/ash/etc:
cut -d " " -f 1,3 "file"
In Windows/DOS:
for /f "tokens=1-4 delims= " %i in (file) do (echo %i %k)
Advantages: like others said, no need to learn Pearl, Awk, nothing, just knowing some tools. The result of both calls can be saved to the disk by using the ">" and the ">>" operator.

while(<>){
chomp;
#s = split ;
print "$s[0] $s[2]\n";
}
please start to go through the documentation as well

#!/usr/bin/env perl
open my$F, "<", "file" or die;
print join(" ",(split)[0,2])."\n" while(<$F>);
close $F

One easy way is:
(split)[0,2]
Example:
$_ = 'abc def ghi gkl';
print( (split)[0,2] , "\n");
print( join(" ", (split)[0,2] ),"\n");
Command line:
perl -e '$_="abc def ghi gkl";print(join(" ",(split)[0,2]),"\n")'

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Regarding perl tie::File Module - perl

Related

Why are Perl's $. and $ARGV behaving strangely after setting #ARGV and using <>

Splitting large file into small multiple files based on the column concatenation

Perl script that works the same as unix command "history | grep keyword"

Perl: avoid greedy reading from stdin?

How can I print only certain fields in a space separated file?

Categories

Resources