I want to extract the content, excluding the HTTP header, from TCP flow files.
The HTTP header ends at the first empty line, that is, a line containing only ^M.
The content looks like the following:
HTTP/1.1 200 OK^M
Last-Modified: Sat, 20 Mar 2010 09:43:12 GMT^M
Content-Type: video/x-flv^M
Date: Wed, 24 Oct 2012 14:34:13 GMT^M
Expires: Wed, 24 Oct 2012 14:34:13 GMT^M
Cache-Control: private, max-age=22124^M
Accept-Ranges: bytes^M
Content-Length: 29833281^M
Connection: close^M
X-Content-Type-Options: nosniff^M
Server: gvs 1.0^M
^M
FLV^A^E^#^#^# ^#^#^#^#^R^#^CK^#^#^#^#^#^#^#^B^#
onMetaData^H^#^#^#^O^#^Hduration^##i<97>
=p£×^# starttime^#^#^#^#^#^#^#^#^#^#^Mtotalduration^##i<97>
I run it as: extract.pl < tcp.flow
but the loop seems to be endless.
What is wrong with the code? Thanks!
My extraction code is as follows:
#!/usr/bin/perl
$start=0;
$data="";
while(<STDIN>)
{
if ( $start eq 0 && $_ =~ /^\r\n/) { $start = 1; }
elsif ( $start eq 1 ) { $data = $data . $_; }
}
open(FH, ">sample.flv");
print FH $data;
close(FH);
This is a one-liner. I see no reason for any endless loop, however.
perl -00 -lne '$i++ and print' file > sample.flv
Which deparsed looks like this:
>perl -MO=Deparse -00 -lne '$i++ and print' input.txt
BEGIN { $/ = ""; $\ = "\n\n"; } # from -l and -00
LINE: while (defined($_ = <ARGV>)) { # from -n
chomp $_; # from -l, removes "\n\n" now
print $_ if $i++; # skips the first record (the header paragraph)
}
-e syntax OK
If you need to clean your file up first, just do
perl -pi -le 's/[\r\n]+$//' input.txt
Call binmode() on STDIN before reading the data; it's possible that the binary contents of the file are interfering with line-based reading. You'll want to call it on FH as well before writing the data. HTH
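A minimal sketch of that binmode advice, assuming the whole flow fits in memory and that the header ends at the first CRLF CRLF pair. The helper name strip_http_header and the in-memory sample are hypothetical and only stand in for tcp.flow:

```perl
use strict;
use warnings;

# Return everything after the first CRLF CRLF of a raw HTTP response.
# Returns undef when no header terminator is found.
sub strip_http_header {
    my ($raw) = @_;
    my $pos = index($raw, "\r\n\r\n");
    return undef if $pos < 0;
    return substr($raw, $pos + 4);
}

# Hypothetical sample standing in for tcp.flow; the body contains
# a bare CR and NUL bytes, as binary video data would.
my $sample = "HTTP/1.1 200 OK\r\nContent-Type: video/x-flv\r\n\r\nFLV\x01\x05\x00\r\x00";

# ':raw' has the same effect as calling binmode() on the handle.
open my $in, '<:raw', \$sample or die "cannot open in-memory sample: $!";
my $raw = do { local $/; <$in> };   # slurp the whole stream in one read
close $in;

my $body = strip_http_header($raw);
printf "%d body bytes\n", length $body;
```

For the real file, read from STDIN after binmode(STDIN) instead of the in-memory handle, and write $body to a handle opened with '>:raw'.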
I would like to know what is the equivalent code that Perl runs when executed with the options perl -pi -e?
On some SO question I can read this:
while (<>) {
... # your script goes here
} continue {
print;
}
But this example does not show the part where the file is saved.
How does Perl determine the EOL? Does it touch the file when no changes occurred? For example, if I have an old Mac file (\r only), how does it deal with s/^foo/bar/gm?
I tried to use the Perl debugger but it doesn't really help. So I am just trying to guess:
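#!/usr/bin/env perl
# perl -pi -e PATTERN <files>...
use strict;
use warnings;

my $pattern = shift;
for my $file (@ARGV) {
    process($file);
}

sub process {
    my ($file) = @_;
    return unless -f $file;
    open my $fh, '<', $file or die "$file: $!";
    my $extract;
    read $fh, $extract, 1024;
    seek $fh, 0, 0;
    # Guess the line ending from the first 1024 bytes.
    local $/;
    if ($extract =~ /\r\n/) {
        $/ = "\r\n";
    } elsif ($extract =~ /\r[^\n]/) {
        $/ = "\r";
    } else {
        $/ = "\n";
    }
    my $out = '';
    my $changes = 0;   # declared outside the loop so it survives it
    while (<$fh>) {
        my $before = $_;
        eval $pattern;
        $changes = 1 if $_ ne $before;
        $out .= $_;
    }
    close $fh;
    if ($changes) {
        open my $out_fh, '>', $file or die "$file: $!";
        print $out_fh $out;
        close $out_fh;
    }
}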
You can inspect the code actually used by Perl with the core module B::Deparse. This compiler backend module is activated with the option -MO=Deparse.
$ perl -MO=Deparse -p -i -e 's/X/U/' ./*.txt
BEGIN { $^I = ""; }
LINE: while (defined($_ = <ARGV>)) {
s/X/U/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
Thus perl loops over the lines in the given files, executes the code with $_ set to each line, and prints the resulting $_.
The magic variable $^I is set to an empty string, which turns on in-place editing. In-place editing is explained in perldoc perlrun. There is no check whether the file is unchanged, so the modification time of the edited file is always updated. Apparently the modification time of the backup file is the same as that of the original file.
With the -0 flag you can set the input record separator; -015 (octal for "\r") handles your old Mac files.
$ perl -e "print qq{aa\raa\raa}" > t.txt
$ perl -015 -p -i.ori -e 's/a/b/' t.txt
$ cat t.txt
ba
$ perl -MO=Deparse -015 -p -i.ori -e 's/a/b/' t.txt
BEGIN { $^I = ".ori"; }
BEGIN { $/ = "\r"; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s/a/b/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
From the perlrun documentation:
-p assumes an input loop around your script. Lines are printed.
-i files processed by the <> construct are to be edited in place.
-e may be used to enter a single line of script. Multiple -e commands may be given to build up a multiline script.
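As for the part where the file is saved: perlrun describes -i as renaming the input file, opening an output file under the original name, and printing to it. A rough sketch of that sequence with a backup extension, assuming a hypothetical helper named edit_in_place and a scratch file created only for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: apply $code to each line of $file, keeping a backup
# under "$file$ext". This approximates `perl -p -i$ext -e ...`; it is not
# perl's actual implementation.
sub edit_in_place {
    my ($file, $ext, $code) = @_;
    my $backup = $file . $ext;
    rename $file, $backup or die "rename $file: $!";
    open my $in,  '<', $backup or die "open $backup: $!";
    open my $out, '>', $file   or die "open $file: $!";
    while (my $line = <$in>) {
        local $_ = $line;       # the code sees the line in $_, as with -p
        $code->();
        print {$out} $_ or die "-p destination: $!\n";
    }
    close $in;
    close $out or die "close $file: $!";
}

# Demonstration on a scratch file in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/t.txt";
open my $fh, '>', $file or die "open $file: $!";
print {$fh} "aXa\nbXb\n";
close $fh;

edit_in_place($file, '.ori', sub { s/X/U/ });

open $fh, '<', $file or die "open $file: $!";
my $edited = do { local $/; <$fh> };
close $fh;
print $edited;
```

Note that the output file is rewritten unconditionally, which matches the observation that the modification time always changes.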
I am trying to understand Perl on the command line. Please help me:
What are the Perl equivalents of these awk commands?
awk '{for(i=1;i<=NF;i++)printf i < NF ? $i OFS : $i RS}' file
awk '!x[$0]++' file
awk 'FNR==NR{A[$0];next}($0 in A)' file1 file2
awk 'FNR==NR{A[$1]=$5 OFS $6;next}($1 in A){print $0,A[$1];delete A[$1]}' file1 file1
Please someone help me...
Try the awk to perl translator. For example:
$ echo awk '!x[$0]++' file | a2p
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
while (<>) {
chomp; # strip record separator
print $_ if $awk;print $_ if !($X{$_}++ . $file);
}
You can ignore the boilerplate at the beginning and see the meat of the Perl in the while loop. The translation is seldom perfect (even in this simple example, the Perl code omits newlines), but it usually provides a reasonable approximation.
Another example (the one Peter is having trouble with in the comments):
$ echo '{for(i=1;i<=NF;i++)printf( i < NF ? ( $i OFS ) : ($i RS))}' | a2p
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
while (<>) {
chomp; # strip record separator
@Fld = split(' ', $_, -1);
for ($i = 1; $i <= ($#Fld+1); $i++) {
printf (($i < ($#Fld+1) ? ($Fld[$i] . $,) : ($Fld[$i] . $/)));
}
}
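For comparison with the machine translation, here is a hand-written sketch of two of the awk commands from the question. The sub names unique_lines and common_lines are made up for illustration, and the input is passed as lists of lines rather than read from files:

```perl
use strict;
use warnings;

# Equivalent of: awk '!x[$0]++' file
# Keep only the first occurrence of each line, preserving order.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Equivalent of: awk 'FNR==NR{A[$0];next}($0 in A)' file1 file2
# Keep the lines of the second list that also occur in the first.
sub common_lines {
    my ($first, $second) = @_;
    my %in_first;
    $in_first{$_} = 1 for @$first;
    return grep { $in_first{$_} } @$second;
}

my @lines = ("a\n", "b\n", "a\n", "c\n", "b\n");
print unique_lines(@lines);
print common_lines(["a\n", "c\n"], ["c\n", "d\n", "a\n"]);
```

On the command line, the first one shrinks to perl -ne 'print unless $seen{$_}++' file.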
Currently I'm using s3cmd ls s3://location/ > file.txt to get a list of the contents of my S3 bucket and save it to a text file. However, this returns dates, file sizes, paths and filenames.
For example:
2011-10-18 08:52 6148 s3://location//picture_1.jpg
I only need the filenames from the S3 bucket, so in the above example I only need picture_1.jpg.
Any suggestions?
Could this be done with a Perl one liner maybe after the initial export?
Use awk:
s3cmd ls s3://location/ | awk '{ print $4 }' > file.txt
If you have filenames with spaces, try:
s3cmd ls s3://location/ | awk '{ s = ""; for (i = 4; i <= NF; i++) s = s $i " "; print s }' > file.txt
File::Listing does not support this format because the designers of this listing format were stupid enough to not simply reuse an existing one. Let's parse it manually instead.
use URI;
my @ls = (
    "2011-10-18 08:52 6148 s3://location//picture_1.jpg\n",
    "2011-10-18 08:52 6148 s3://location//picture_2.jpg\n",
    "2011-10-18 08:52 6148 s3://location//picture_3.jpg\n",
);
for my $line (@ls) {
    chomp $line;
    # last whitespace-separated field is the URL; its last path segment is the name
    my $basename = (URI->new((split q( ), $line)[-1])->path_segments)[-1];
    print "$basename\n";
}
__END__
picture_1.jpg
picture_2.jpg
picture_3.jpg
As oneliner:
perl -mURI -lne 'print ((URI->new((split q( ), $_)[-1])->path_segments)[-1])' < input
I am sure a specific module is the safer option, but if the data is reliable, you can get away with a one-liner:
Assuming the input is:
2011-10-18 08:52 6148 s3://location//picture_1.jpg
2011-10-18 08:52 6148 s3://location//picture_2.jpg
2011-10-18 08:52 6148 s3://location//picture_3.jpg
...
The one-liner:
perl -lnwe 'print for m#(?<=//)([^/]+)$#'
-l chomps the input, and adds newline to end of print statements
-n adds a while(<>) loop around the script
(?<=//) lookbehind assertion finds a double slash
...followed by non-slashes to the end of the line
The for loop ensures that non-matches are not printed.
The benefit of the -n option is that this one-liner may be used in a pipe, or on a file.
command | perl -lnwe '...'
perl -lnwe '...' filename
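The same regex can be wrapped in a small sub when it needs to live in a script. This is only a sketch: the name s3_basename is hypothetical, and like the one-liner it assumes the file name directly follows a double slash:

```perl
use strict;
use warnings;

# Return the name following the last "//" of an `s3cmd ls` line,
# or undef (in scalar context) when the line does not end in such a name.
sub s3_basename {
    my ($line) = @_;
    chomp $line;
    return $line =~ m{(?<=//)([^/]+)$} ? $1 : ();
}

my @ls = (
    "2011-10-18 08:52 6148 s3://location//picture_1.jpg\n",
    "2011-10-18 08:52 6148 s3://location//my picture.jpg\n",
);
for my $line (@ls) {
    my $name = s3_basename($line);
    print "$name\n" if defined $name;
}
```

Because the capture runs to the end of the line, file names containing spaces come through intact.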
How can I write a Perl script to convert a text file to all upper case letters?
perl -ne "print uc" < input.txt
The -n wraps your command-line script (which is supplied by -e) in a while loop. uc returns the ALL-UPPERCASE version of the default variable $_, and what print does, well, you know it yourself. ;-)
The -p is just like -n, but it does a print in addition. Again, acting on the default variable $_.
To store that in a script file:
#!perl -n
print uc;
Call it like this:
perl uc.pl < in.txt > out.txt
$ perl -pe '$_= uc($_)' input.txt > output.txt
perl -pe '$_ = uc($_)' input.txt > output.txt
But then you don't even need Perl if you're using Linux (or *nix). Some other ways are:
awk:
awk '{ print toupper($0) }' input.txt >output.txt
tr:
tr '[:lower:]' '[:upper:]' < input.txt > output.txt
$ perl -Tpe " $_ = uc; " --
$ perl -MO=Deparse -Tpe " $_ = uc; " -- a s d f
LINE: while (defined($_ = <ARGV>)) {
$_ = uc $_;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
$ cat myprogram.pl
#!/usr/bin/perl -T --
LINE: while (defined($_ = <ARGV>)) {
$_ = uc $_;
}
continue {
die "-p destination: $!\n" unless print $_;
}
Perl newbie here. I have a log file from which I need to extract the "Backup succeeded" and any "Error:" entries. I tried parsing the log file by using Unix cat and piping it to grep. I got the information that I want, but I would like to try this in Perl and also have the option to pass a date parameter so that I get only the lines for the date I need.
Sample of log file output: (Backup succeeded)
Wed Jun 09 06:14:25 2010: db2.cal.mil.mad:backup:INFO: flush-logs-time=00:00:00
Wed Jun 09 06:14:25 2010: db2.cal.mil.mad:backup:INFO: backup-time=06:14:23
Wed Jun 09 06:14:25 2010: db2.cal.mil.mad:backup:INFO: backup-status=Backup succeeded
Wed Jun 09 06:14:25 2010: db2.cal.mil.mad:backup:INFO: Backup succeeded
Sample of log file output: (Error:)
Wed Jun 09 05:00:03 2010: rip1.mil.mad:backup:ERROR: mysql-zrm appears to be already running for this backupset
Wed Jun 09 05:00:03 2010: rip1.mil.mad:backup:ERROR: If you are sure mysql-zrm is not running, please remove the file /etc/mysql-zrm/rip1.mail.mad/.mysql-zrm.pid and restart mysql-zrm
I would like a text and/or email with this information, like so, but with the option to pass in the date I need:
Wed Jun 09 05:00:03 2010: rip1.mil.mad:backup:ERROR: mysql-zrm appears to be already running for this backupset
Wed Jun 09 05:00:03 2010: rip1.mil.mad:backup:ERROR: If you are sure mysql-zrm is not running, please remove the file /etc/mysql-zrm/rip1.mail.mad/.mysql-zrm.pid and restart mysql-zrm
Wed Jun 09 06:14:25 2010: db2.cal.mil.mad:backup:INFO: backup-status=Backup succeeded
If you could provide me with some Perl code and/or ideas to get started, I would appreciate the help. Thank you.
#!/usr/bin/perl
# usage example: <this script> Jun 09 2010 <logfile>
use strict;
use warnings;
my ($mon, $day, $year, $logfile) = @ARGV;
open(my $fh, '<', $logfile) or die "can't open log file $logfile: $!\n";
while (my $line = <$fh>) {
    # \Q...\E keeps any regex metacharacters in the arguments literal
    if ($line =~ /.* \Q$mon\E \Q$day\E \d{2}:\d{2}:\d{2} \Q$year\E:.*(ERROR:|Backup succeeded)/) {
        print $line;
    }
}
close($fh);
Here's a simple script. The file name to scan and the target date are hard-coded. Matches are printed to STDOUT.
BTW, this code is totally untested. I typed it into the text box in my browser.
use strict;
use warnings;
my $logpath = './bar/log';
my $target = 'Jun 09 2010';
open my $fh, '<', $logpath or die "Error opening $logpath $!\n";
while (my $line = <$fh> ) {
next unless date_match( $target, $line );
next unless my $result = got_error($line) // got_backup($line);
print $result;
}
sub got_backup {
my $line = shift;
return unless $line =~ /backup-status=Backup succeeded/;
return $line;
}
sub got_error {
my $line = shift;
return unless $line =~ /:ERROR:/;
return $line;
}
# Take a line and a target date. Compare the date derived from the line to
# the target, and returns true if they match.
# Also always returns true if target is not defined
sub date_match {
my $target = shift;
my $line = shift;
return 1 unless defined $target; # Always true if target is undefined.
# Where did that god-awful date format come from? Yech.
my $date = extract_date($line);
return $date eq $target;
}
# Simple extract of date using split and join with extra variables
# to make it newbie friendly.
# IMO, it would be a good idea to switch to using DateTime objects and
# DateTime::Format::Strptime
sub extract_date {
    my $line = shift;
    my @parts = split /:/, $line;
    my $date = join ':', @parts[0..2];   # e.g. "Wed Jun 09 06:14:25 2010"
    @parts = split /\s+/, $date;
    $date = join ' ', @parts[1,2,4];     # e.g. "Jun 09 2010"
    return $date;
}
You can use Getopt::Long to get a filename and target date.
It would be a good idea to use a more robust date/time parsing and comparison scheme. DateTime and friends are very good, powerful modules for date manipulation. Check them out.
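As a rough illustration of both suggestions, the sketch below uses Getopt::Long for the options and the core module Time::Piece in place of DateTime (a swapped-in choice, to keep the example free of CPAN dependencies). The option names --file and --date, the YYYY-MM-DD date format, and the sub names are all assumptions made for this example:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Time::Piece;

# Parse the hypothetical --file and --date options from an argument list.
sub parse_args {
    my @args = @_;
    my ($file, $date) = ('./bar/log', undef);
    GetOptionsFromArray(\@args, 'file=s' => \$file, 'date=s' => \$date)
        or die "usage: $0 [--file PATH] [--date YYYY-MM-DD]\n";
    return (file => $file, date => $date);
}

# Return the date of a log line as YYYY-MM-DD, or undef if the line does
# not start with a timestamp such as "Wed Jun 09 06:14:25 2010:".
sub line_date {
    my ($line) = @_;
    my ($stamp) = $line =~ /^(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}):/
        or return;
    my $t = eval { Time::Piece->strptime($stamp, '%a %b %d %H:%M:%S %Y') }
        or return;
    return $t->ymd;
}

my %opt  = parse_args('--date', '2010-06-09');
my $line = "Wed Jun 09 06:14:25 2010: db2.cal.mil.mad:backup:INFO: Backup succeeded\n";
print $line if !defined $opt{date} || (line_date($line) // '') eq $opt{date};
```

In a real script, parse_args(@ARGV) would replace the hard-coded argument list, and the final test would sit inside the while loop over the log file.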
If you are processing tons of data and need to be more efficient, you can avoid copying $line everywhere in a number of ways.
For future reference, if you post a little code, you'll get better responses.