How to read a file that is gzipped and tarred in Perl

I have placed the text file "FilenameKeyword.txt" in the E:/Test folder. My Perl script traverses the folder, looks for a file whose name contains the string "Keyword", and prints the content of that file.
Now I wish to do the same for a file that sits inside a compressed tar archive.
Hypothetical file from which I am trying to extract the details:
E:\test.tar.gz
I want to know whether Perl can search for and read the file without decompressing the archive. If that is not possible, I could decompress it to temporary space, which should be deleted after the content has been extracted from the text file.
While searching the internet I found that it is possible to extract and read a gzip/tar file using Archive::Extract. Being new to Perl, I am confused about how to actually use it. Could you please help with this?
Input file: FilenameKeyword.txt
Script:
use warnings;
use strict;
my @dirs = ("E:\\Test\\");
my %seen;
while (my $pwd = shift @dirs) {
opendir(DIR,"$pwd") or die "Cannot open $pwd\n";
my @files = readdir(DIR);
closedir(DIR);
foreach my $file (@files)
{
if (-d $file and ($file !~ /^\.\.?$/) and !$seen{$file})
{
$seen{$file} = 1;
push @dirs, "$pwd/$file";
}
next if ($file !~ /Keyword/i);
my $mtime = (stat("$pwd/$file"))[9];
print "$pwd$file";
print "\n";
open (MYFILE, "$pwd$file");
while (my $line = <MYFILE>){
#print $line;
my ($date) = split(/,/,$line,2);
if ($line =~ s!<messageText>(.+?)</messageText>!!is){
print "$1";
}
}
}
}
Output (in the test program the file is placed under E:\Test):
E:\Test\FilenameKeyword.txt
1311 messages Picked from the Queue.
I am looking for help retrieving the content of the file when it is placed inside
E:\test.tar.gz
Desired Output:
E:\test.tar.gz\FilenameKeyword.txt
1311 messages Picked from the Queue.

I was stuck using the CPAN module. It didn't work for me because I have Oracle 10g Enterprise Edition on the same machine, and due to a software conflict ActiveState Perl was unable to compile and refer to the Perl lib for the CPAN module. I uninstalled Oracle from my machine to make this work:
#!/usr/local/bin/perl
use Archive::Tar;
my $tar = Archive::Tar->new;
$tar->read("test.tar.gz");
$tar->extract();
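The snippet above extracts every member of the archive to disk. If the goal is only to read one file whose name contains a keyword, a minimal sketch along the following lines may avoid writing anything out. It assumes Archive::Tar can open the gzipped archive directly (which relies on IO::Zlib being installed); the helper name matching_members is hypothetical and not part of the module.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: return [name, content] pairs for every archive
# member whose name matches $pattern, without extracting anything to disk.
sub matching_members {
    my ($archive, $pattern) = @_;

    # new() with a file name reads the archive into memory.
    my $tar = Archive::Tar->new($archive)
        or die "Cannot read $archive: " . Archive::Tar->error;

    my @found;
    for my $name ($tar->list_files) {
        next unless $name =~ $pattern;
        # get_content() returns the member's data as a string.
        push @found, [ $name, $tar->get_content($name) ];
    }
    return @found;
}
```

Calling matching_members('E:/test.tar.gz', qr/Keyword/i) and looping over the returned pairs would give the member name and its text, which can then be scanned for the messageText element as in the question. Note that the whole archive is held in memory, so this may not suit very large archives.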

If your file were gzipped only, you could read its contents in a "streamed" manner as outlined here (Piping to/from a child process without system or backtick - gzipped tar files). The article illustrates a technique that uses open and a fork to decompress the file and make it available to a Perl while loop, allowing you to iterate over it.
As tar basically concatenates files, it might be possible to adapt this to your scenario.
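For the gzip-only case, a rough sketch of streamed reading is shown below. It does not use the open-and-fork technique from the linked article; it swaps in the IO::Uncompress::Gunzip module instead, which is an assumption about what is acceptable here. The helper name gz_lines_matching is hypothetical.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: read a gzipped text file line by line and
# return the lines that match $pattern. The file is decompressed
# as it is read, so no uncompressed copy is written to disk.
sub gz_lines_matching {
    my ($path, $pattern) = @_;

    my $gz = IO::Uncompress::Gunzip->new($path)
        or die "Cannot gunzip $path: $GunzipError";

    my @lines;
    while (defined(my $line = $gz->getline)) {
        push @lines, $line if $line =~ $pattern;
    }
    $gz->close;
    return @lines;
}
```

Run against a .tar.gz this would see the raw tar stream, headers included, so it only applies as written to a plain gzipped text file.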

Related

Optimized way to print directory paths recursively without file comparison in perl

I have a directory which contains multiple levels of sub dirs. I want to print path for each and every directory.
Currently, I am using
use File::Find;
find(
{
wanted => \&findfiles,
}, $maindirectory);
sub findfiles
{
if (-d) {
push @arrayofdirs,$File::Find::dir;
}
}
But each subdirectory contains thousands of files at each level. The above code takes a lot of time to produce the result because it tests every file to see whether it is a directory. Is there a way to get the subdirectory paths without testing the files, or any other optimized method?
Edit: This issue got partially resolved but a new issue came up because of this solution. I have listed it here: Multiple File search in varying level of directories in perl
If you are on a UNIX/Linux platform then you can try reading output of find $maindirectory -type d command into your program (see this answer for a safe way to do that.). This command prints the names of directories in $maindirectory. It is faster because a compiled C program (find) does all the hard work. The following script should print all directory paths found.
Sample script:
use strict;
use warnings;
my $maindirectory = '.';
open my $fh, '-|', 'find', $maindirectory, '-type', 'd' or die "Can't open pipe: $!";
while( my $dir = <$fh>) {
print $dir;
}
close $fh or warn "can't close pipe: $!";
Note that there is no point in calling find through perl and then just printing its output without any processing. You can just as well run find $maindirectory -type d in shell itself.

Zipping a file with perl results in an invalid archive

I am currently trying to zip some files with perl. The resulting file is printed, so a user who calls the page which executes the script can download or open the zip file.
Looking at the size of the zip file it seems everything worked ok, but if I try to open the file on the server no contents are shown. If I open the file after downloading it, the archive is invalid.
Here's the code:
my $zip = Archive::Zip->new();
my $i;
foreach $i(@files)
{
my $fh = $zip->addFile("$directoryPath$i") if (-e "$directoryPath$i");
}
my $zipFilePath = "Test.zip";
die 'Cannot create $zip_file_name: $!\n' if $zip->writeToFileNamed("$zipFilePath") != AZ_OK;
open (DLFILE, "<$zipFilePath");
@fileholder = <DLFILE>;
close (DLFILE);
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$zipFilePath\n\n";
print @fileholder;
Can you please tell me where the error is?
I am running the code using xampp on my local windows machine.
Edit: The same happens when I use
use strict;
use warnings;
use autodie;
Edit: The first problem is solved by ysth, thanks for that. Now the archive is not invalid after downloading, but still no files are shown if I open it, while the zip-file's size seems to be correct.
You are corrupting it here:
open (DLFILE, "<$zipFilePath");
@fileholder = <DLFILE>;
close (DLFILE);
by opening it such that it translates "\r\n" to just "\n".
Try this:
open( DLFILE, '<:raw', $zipFilePath );
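Put together, a binary-safe version of the read-and-send step might look like the sketch below. The helper names slurp_raw and send_download are hypothetical. Calling binmode on the output handle is an assumption that goes beyond the fix above: on Windows the output side can apply the same newline translation as the input side.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file byte for byte, with no newline translation.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                 # slurp mode: read the whole file in one go
    my $bytes = <$fh>;
    close($fh);
    return $bytes;
}

# Hypothetical helper: write download headers and the raw bytes to $out.
sub send_download {
    my ($out, $name, $bytes) = @_;
    binmode($out);            # assumption: the output handle needs raw mode too
    print {$out} "Content-Type: application/x-download\n";
    print {$out} "Content-Disposition: attachment; filename=$name\n\n";
    print {$out} $bytes;
    return;
}
```

In the CGI script this would be used as send_download(\*STDOUT, $zipFilePath, slurp_raw($zipFilePath)).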

How do I create a directory in Perl?

I am new to Perl and trying to write text files. I can write text files to an existing directory no problem, but ultimately I would like to be able to create my own directories.
I am going to download files from my course works website and I want to put the files in a folder named after the course. I don't want to make a folder for each course manually beforehand, and I would also like to eventually share the script with others, so I need a way to make the directories and name them based on the course names from the HTML.
So far, I have been able to get this to work:
use strict;
my $content = "Hello world";
open MYFILE, ">C:/PerlFiles/test.txt";
print MYFILE $content;
close (MYFILE);
test.txt doesn't exist, but C:/PerlFiles/ does and supposedly typing > allows me to create files, great.
The following, however does not work:
use strict;
my $content = "area = pi*r^2";
open MYFILE, ">C:/PerlFiles/math_class/circle.txt";
print MYFILE $content;
close (MYFILE);
The directory C:/PerlFiles/math_class/ does not exist.
I also tried sysopen but I get an error when adding the flags:
use strict;
my $content = "area = pi*r^2";
sysopen (MYFILE, ">C:/PerlFiles/math_class/circle.txt", O_CREAT);
print MYFILE $content;
close (MYFILE);
I got this idea from the Perl Cookbook chapter 7.1. Opening a File. It doesn't work, and I get the error message Bareword "O_CREAT" not allowed while "strict subs" in use. Then again the book is from 1998, so perhaps O_CREAT is obsolete. At some point I think I will need to fork over the dough for an up-to-date version.
But still, what am I missing here? Or do the directories have to be created manually before creating a file in it?
Right, directories have to be created manually.
Use the mkdir function.
You can check whether a directory already exists with -d $dir (see perldoc -f -X).
Use File::Path to create arbitrarily deep paths. Use dirname to find out a file's containing directory.
Also, use lexical file handles and three-argument open:
open(my $fd, ">", $name) or die "Can't open $name: $!";
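Combining these suggestions, a minimal sketch could look like this. It uses make_path from File::Path and dirname from File::Basename; the helper name write_file_creating_dirs is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical helper: write $content to $name, creating any missing
# parent directories first.
sub write_file_creating_dirs {
    my ($name, $content) = @_;

    my $dir = dirname($name);
    make_path($dir) unless -d $dir;   # creates intermediate directories as needed

    open(my $fd, '>', $name) or die "Can't open $name: $!";
    print {$fd} $content;
    close($fd) or die "Can't close $name: $!";
    return $name;
}
```

For the example in the question, write_file_creating_dirs('C:/PerlFiles/math_class/circle.txt', 'area = pi*r^2') would create math_class if it is missing and then write the file.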

How to read multiple files from a directory, extract specific strings and output to an html file?

Greetings,
I have the following code and am stuck on how to modify it so that it asks for a directory, reads all files in the directory, then extracts specific strings and outputs them to an html file. Thanks in advance.
#!/usr/local/bin/perl
use warnings;
use strict;
use Cwd;
print "Enter filename: "; # Should be Enter directory
my $perlfile =STDIN;
open INPUT_FILE, $perlfile || die "Could not open file: $!";
open OUTPUT, '>out.html' || die "Could not open file: $!";
# Evaluates the file and imports it into an array.
my @comment_array = <INPUT_FILE>;
close(INPUT_FILE);
chomp @comment_array;
@comment_array = grep /^\s*#/g, @comment_array;
my $comment;
foreach $comment (@comment_array) {
$comment =~ /####/; #Pattern match to grab only #s
# Prints comments to screen
Print results in html format
# Writes comments to output.html
Writes results to html file
}
close (OUTPUT);
Take it one step at a time. You have a lot planned, but so far you haven't even changed your prompt string to ask for a directory.
To read the entered directory name, your:
my $perlfile =STDIN;
gives an error (under use strict;). Start by looking that error up (use diagnostics; automates this) and trying to figure out what you should be doing instead.
Once you can prompt for a directory name and print it out, then add code to open the directory and read the directory. Directories can be opened and read with opendir and readdir. Make sure you can read the directory and print out the filenames before going on to the next step.
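As a rough illustration of the directory-reading step only, the sketch below lists the plain files in a directory with opendir and readdir. The helper name plain_files is hypothetical, and the prompt handling is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of the plain files in $dir.
sub plain_files {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # readdir returns bare names, so prepend $dir before testing with -f.
    my @names = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    my @sorted = sort @names;
    return @sorted;
}
```

Printing each name returned by plain_files($dir) is a quick way to confirm the directory is being read correctly before adding the code that opens each file.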
A good starting point for learning about specific functions (from the command line):
perldoc -f opendir
However, your particular problem can be answered as follows. You can also use command-line programs and capture their output into a string to simplify file handling ('cat') and pattern matching ('grep').
#!/usr/bin/perl -w
use strict;
my $dir = "/tmp";
my $dh;
my @patterns;
my $file;
opendir($dh,$dir);
while ($file = readdir($dh)){
if (-f "$dir/$file"){
my $string = `cat $dir/$file | grep pattern123`;
push @patterns, $string;
}
}
closedir($dh);
my $html = join("<br>",@patterns);
open F, ">out.html";
print F $html;
close F;

How can I scan multiple log files to find which ones have a particular IP address in them?

Recently there have been a few attackers trying malicious things on my server so I've decided to somewhat "track" them even though I know they won't get very far.
Now, I have an entire directory containing the server logs and I need a way to search through every file in the directory, and return a filename if a string is found. So I thought to myself, what better of a language to use for text & file operations than Perl? So my friend is helping me with a script to scan all files for a certain IP, and return the filenames that contain the IP so I don't have to search for the attacker through every log manually. (I have hundreds)
#!/usr/bin/perl
$dir = ".";
opendir(DIR, "$dir");
@files = grep(/\.*$/,readdir(DIR));
closedir(DIR);
foreach $file(#files) {
open FILE, "$file" or die "Unable to open files";
while(<FILE>) {
print if /12.211.23.200/;
}
}
although it is giving me directory read errors. Any assistance is greatly appreciated.
EDIT: Code edited, still saying permission denied cannot open directory on line 10. I am just going to run the script from within the logs directory if you are questioning the directory change to "."
Mike.
Can you use grep instead?
To get all the lines with the IP, I would directly use grep, no need to show a list of files, it's a simple command:
grep '12\.211\.23\.200' *
I like to pipe it to another file and then open that file in an editor...
If you insist on wanting the filenames, it's also easy
grep -l '12\.211\.23\.200' *
grep is available on all Unix/Linux systems with the GNU tools, or on Windows using one of the many implementations (unxutils, cygwin, etc.)
You have to concatenate $dirname with $filname when using files found through readdir, remember you haven't chdir'ed into the directory where those files resides.
open FH, "<", "$dirname/$filname" or die "Cannot open $filname:$!";
Incidentally, why not just use grep -r to recursively search all subdirectories under your log dir for your string?
EDIT: I see your edits, and two things. First, this line:
@files = grep(/\.*$/,readdir(DIR));
Is not effective, because you are searching for zero or more . characters at the end of the string. Since it's zero or more, it'll match everything in the directory. If you're trying to exclude files ending in ., try this:
@files = grep(!/\.$/,readdir(DIR));
Note the ! sign for negation if you're trying to exclude those files. Otherwise (if you only want those files and I'm misunderstanding your intent), leave the ! out.
In any case, if you're getting your die message on line 10, most likely you're hitting a file that has permissions such that you can't read it. Try putting the filename in the die output so you can see which file it's failing on:
open FILE, "$file" or die "Unable to open file: $file";
But as with other answers, and to reiterate: Why not use grep? The unix command, not the Perl function.
This will get the file names you are looking for in perl, and probably do it much faster than running and doing a perl regex.
@files = `find ~/ServerLogs -name "*.log" | xargs grep -l "<ip address>"`;
Although, this will require a *nix compliant system, or Cygwin on Windows.
Firstly get a list of files within your source directory:
opendir(DIR, "$dir");
@files = grep(/\.log$/,readdir(DIR));
closedir(DIR);
And then loop through those files
foreach $file(@files)
{
# file processing code
}
My first suggestion would be to use grep instead. The right tool for the job, they say...
But to answer your question:
readdir just returns the filenames from the directory. You'll need to concatenate the directory name and filename together.
$path = "$dirname/$filname";
open FH, $path or die ...
Then you should ignore files that are actually directories, such as "." and "..". After getting the $path, check to see if it's a file.
if (-f $path) {
open FH, $path or die ...
while (<FH>)
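The fragments above can be assembled into a small routine like the following sketch. It joins the directory and file name, skips anything that is not a plain file, and reports the names of files that contain the address. The helper name files_containing is hypothetical, and using \Q...\E to make the dots literal is an addition to what the answer shows.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of plain files in $dir that
# have at least one line containing the literal string $ip.
sub files_containing {
    my ($dir, $ip) = @_;
    my $re = qr/\Q$ip\E/;          # \Q...\E quotes the dots so they match literally

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = sort readdir($dh);
    closedir($dh);

    my @hits;
    for my $name (@names) {
        my $path = "$dir/$name";   # readdir returns bare names only
        next unless -f $path;      # skips ".", ".." and subdirectories
        open(my $fh, '<', $path) or die "Cannot open $path: $!";
        while (my $line = <$fh>) {
            if ($line =~ $re) {
                push @hits, $name;
                last;              # one match is enough for this file
            }
        }
        close($fh);
    }
    return @hits;
}
```

For the question's case, print "$_\n" for files_containing('.', '12.211.23.200') would list the matching log files in the current directory.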
BTW, I thought I would throw in a mention for File::Next. To iterate over all files in a directory (recursively):
use Path::Class; # always useful.
use File::Next;
my $files = File::Next::files( dir(qw/path to files/) ); # look in path/to/files
while( defined ( my $file = $files->() ) ){
$file = file( $file );
say "Examining $file";
say "found foo" if $file->slurp =~ /foo/;
}
File::Next is taint-safe.
~ doesn't auto-expand in Perl.
opendir my $fh, '~/' or die("Doin It Wrong"); # Doing It Wrong.
opendir my $fh, glob('~/') and die( "Thats right!" );
Also, if you must use readdir(), make sure you guard the expression thus:
while (defined(my $filename = readdir(DH))) {
...
}
If you don't do the defined() test, the loop will terminate if it finds a file called '0'.
Have you looked on CPAN for log parsers? I searched with 'log parse' and it yielded over 200 hits. Some (probably many) won't be relevant - some may be. It depends, in part, on which web server you are using.
Am I reading this right? Your line 10 that gives you the error is
open FILE, "$file" or die "Unable to open files";
And the $file you are trying to read, according to line 6,
@files = grep(/\.*$/,readdir(DIR));
is a file that ends with zero or more dots. Is this what you really wanted? This basically matches every file in the directory, including "." and "..". Maybe you don't have enough permission to open the parent directory for reading?
EDIT: if you only want to read all files (including hidden ones), you might want to use something like the following:
opendir(DIR, ".");
@files = readdir(DIR);
closedir(DIR);
foreach $file (@files) {
if ($file ne "." and $file ne "..") {
open FILE, "$file" or die "cannot open $file\n";
# do stuff with FILE
}
}
Note that this doesn't take care of sub directories.
I know I am way late to this discussion (ran across it while searching for grep related posts) but I am going to answer anyway:
It isn't specified clearly if these are web server logs (Apache, IIS, W3SVC, etc.) but the best tool for mining those for data is the LogParser tool from Microsoft. See logparser.com for more info.
LogParser will allow you to write SQL-like statements against the log files. It is very flexible and very fast.
Use perl from the command line, like a better grep
perl -wnl -e '/12.211.23.200/ and print;' *.log > output.txt
the benefit here is that you can chain logic far easier
perl -wnl -e '(/12.211.23.20[1-11]/ or /denied/i ) and print;' *.log
if you are feeling wacky you can also use more advanced command line options to feed perl one liner result into other perl one liners.
You really need to read "Minimal Perl: For UNIX and Linux People", awesome book on this very sort of thing.
First, use grep.
But if you don't want to, here are two small improvements you can make that I haven't seen mentioned yet:
1) Change:
@files = grep(/\.*$/,readdir(DIR));
to
@files = grep({ !-d "$dir/$_" } readdir(DIR));
This way you will exclude not just "." and ".." but also any other subdirectories that may exist in the server log directory (which the open downstream would otherwise choke on).
2) Change:
print if /12.211.23.200/;
to
print if /12\.211\.23\.200/;
"." is a regex wildcard meaning "any character". Changing it to "\." will reduce the number of false positives (unlikely to change your results in practice but it's more correct anyway).