Modify Perl search in file to include only specified directories - perl

I found the code sample below here. It searches for text in files, recursing through sub-directories, but I want to specify a subset of the first level of sub-directories to recurse through.
E.g. suppose I'm in directory C:\ which contains directories bin, src, and Windows, and I want to recursively search for .h and .c files containing the text "include", I'd run the following with the MWE below, where my code is in textsearch.pl:
perl textsearch.pl include "(\.)(h|c)($)"
How can I modify this program to only search in bin and src but not Windows, while at the same time still recursing into sub-directories of bin and src? I.e. I'd like to be able to do something like the following:
perl textsearch.pl include "(\.)(h|c)($)" src,bin
I thought File::Find::Rule would help, but I'm having trouble figuring out how to apply it here.
Also, if there's another much simpler way to do all this, I'd love to hear it.
MWE I found:
use strict;
use warnings;
use Cwd;
use File::Find;
use File::Basename;
my ($in_rgx,$in_files,$simple,$matches,$cwd);
sub trim($) {
my $string = shift;
$string =~ s/[\r\n]+//g;
$string =~ s/\s+$//;
return $string;
}
# 1: Get input arguments
if ($#ARGV == 0) { # *** ONE ARGUMENT *** (search pattern)
($in_rgx,$in_files,$simple) = ($ARGV[0],".",1);
}
elsif ($#ARGV == 1) { # *** TWO ARGUMENTS *** (search pattern + filename or flag)
if (($ARGV[1] eq '-e') || ($ARGV[1] eq '-E')) { # extended
($in_rgx,$in_files,$simple) = ($ARGV[0],".",0);
}
else { # simple
($in_rgx,$in_files,$simple) = ($ARGV[0],$ARGV[1],1);
}
}
elsif ($#ARGV == 2) { # *** THREE ARGUMENTS *** (search pattern + filename + flag)
($in_rgx,$in_files,$simple) = ($ARGV[0],$ARGV[1],0);
}
else { # *** HELP *** (either no arguments or more than three)
print "Usage: ".basename($0)." regexpattern [filepattern] [-E]\n\n" .
"Hints:\n" .
"*) If you need spaces in your pattern, put quotation marks around it.\n" .
"*) To do a case insensitive match, use (?i) preceding the pattern.\n" .
"*) Both patterns are regular expressions, allowing powerful searches.\n" .
"*) The file pattern is always case insensitive.\n";
exit;
}
if ($in_files eq '.') { # 2: Output search header
print basename($0).": Searching all files for \"${in_rgx}\"... (".(($simple) ? "simple" : "extended").")\n";
}
else {
print basename($0).": Searching files matching \"${in_files}\" for \"${in_rgx}\"... (".(($simple) ? "simple" : "extended").")\n";
}
if ($simple) { print "\n"; } # 3: Traverse directory tree using subroutine 'findfiles'
($matches,$cwd) = (0,cwd);
$cwd =~ s,/,\\,g;
find(\&findfiles, $cwd);
sub findfiles { # 4: Used to iterate through each result
my $file = $File::Find::name; # complete path to the file
$file =~ s,/,\\,g; # substitute all / with \
return unless -f $file; # process files (-f), not directories
return unless $_ =~ m/$in_files/io; # check if file matches input regex
# /io = case-insensitive, compiled
# $_ = just the file name, no path
# 5: Open file and search for matching contents
open F, $file or print "\n* Couldn't open ${file}\n\n" && return;
if ($simple) { # *** SIMPLE OUTPUT ***
while (<F>) {
if (m/($in_rgx)/o) { # /o = compile regex
# file matched!
$matches++;
print "---" . # begin printing file header
sprintf("%04d", $matches) . # file number, padded with 4 zeros
"--- ".$file."\n"; # file name, keep original name
# end of file header
last; # go on to the next file
}
}
} # *** END OF SIMPLE OUTPUT ***
else { # *** EXTENDED OUTPUT ***
my $found = 0; # used to keep track of first match
my $binary = (-B $file) ? 1 : 0; # don't show contents if file is bin
$file =~ s/^\Q$cwd//g; # remove current working directory
# \Q = quotemeta, escapes string
while (<F>) {
if (m/($in_rgx)/o) { # /o = compile regex
# file matched!
if (!$found) { # first matching line for the file
$found = 1;
$matches++;
print "\n---" . # begin printing file header
sprintf("%04d", $matches) . # file number, padded with 4 zeros
"--- ".uc($file)."\n"; # file name, converted to uppercase
# end of file header
if ($binary) { # file is binary, do not show content
print "Binary file.\n";
last;
}
}
print "[$.]".trim($_)."\n"; # print line number and contents
#last; # uncomment to only show first line
}
}
} # *** END OF EXTENDED OUTPUT ***
# 6: Close the file and move on to the next result
close F;
}
#7: Show search statistics
print "\nMatches: ${matches}\n";
# Search Engine Source: http://www.adp-gmbh.ch/perl/find.html
# Rewritten by Christopher Hilding, Dec 02 2006
# Formatting adjusted to my liking by Rene Nyffenegger, Dec 22 2006

The second parameter to find() can be a list of directories to scan. Replace $cwd with @some_list_of_directories (for example the directories split out of your third command-line argument) and you should be good to go.
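As a rough sketch of that idea (not a drop-in patch for the script above): the helper name find_matching_files and the comma-separated third argument are assumptions made for illustration; only find() and $File::Find::name come from File::Find itself.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every file under @dirs whose name matches
# $file_rgx. find() recurses into each directory it is given.
sub find_matching_files {
    my ($file_rgx, @dirs) = @_;
    my @found;
    find(sub {
        # $_ is the bare file name; $File::Find::name is the full path.
        push @found, $File::Find::name if -f $_ && $_ =~ /$file_rgx/i;
    }, @dirs);
    return sort @found;
}

# Assumed calling convention: perl textsearch.pl include "(\.)(h|c)($)" src,bin
if (@ARGV >= 3) {
    my @dirs = grep { -d $_ } split /,/, $ARGV[2];
    die "No valid directories in '$ARGV[2]'\n" unless @dirs;
    print "$_\n" for find_matching_files($ARGV[1], @dirs);
}
```

Because only bin and src are handed to find(), Windows is never visited, while everything below bin and src is still searched recursively.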

Related

Perl Script Not Liking Date Extension

Why do I receive this error complaining about the parenthesis?
sh: syntax error at line 1 : `)' unexpected
It appears when I add this date extension to the new file name: mv abc abc$(date +%Y%m%d%H%M%S)
It seems that it doesn't like the last parenthesis.
#!/usr/bin/perl
# =========================================== #
# Script to watch POEDIACK file size
#
# - Comments -
#
# script will check the file size of the POEDIACK file in
# $LAWDIR/$PLINE/edi/in.
# If it's > 1 gig, it will send notification via email
#
#
# =========================================== #
use strict;
use POSIX qw(strftime);
# get env vars from system
my $LAWDIR = $ENV{'LAWDIR'};
my $PLINE = $ENV{'PLINE'};
#my $email_file = "/lsf10/monitors/poediack.email";
my $curr_date = strftime('%m%d%Y', localtime);
my $ack_file = "$LAWDIR" . "/$PLINE" . "/edi/in/POEDIACK";
my $ack_location = "$LAWDIR" . "/$PLINE" . "/edi/in/";
my $mv_location = "$LAWDIR" . "/$PLINE" . "/edi/in/Z_files";
my $ack_file_limit = 10;
#my $ack_file_limit = 1000000000;
my $ack_file_size;
if( -e $ack_file)
{
$ack_file_size = -s $ack_file;
if ( $ack_file_size > $ack_file_limit )
{
`compress -vf $ack_file`;
`mv $mv_location\$ack_file.Z $mv_location\$ack_file.Z.$(date +%Y%m%d%H%M%S)`;
}
}
else
{
print "POEDIACK File not found: $ack_file\n";
}
### end perl script ###
$( is being interpreted as a variable. It is the group ID of the process. You need to escape it.
And you probably shouldn't escape $ack_file.
`mv $mv_location$ack_file.Z $mv_location$ack_file.Z.\$(date +%Y%m%d%H%M%S)`;
It's safer and faster to avoid complicated shell commands and use rename instead.
use autodie;
my $timestamp = strftime('%Y%m%d%H%M%S', localtime);
rename "$mv_location$ack_file.Z", "$mv_location$ack_file.Z.$timestamp";
Or use an existing log rotator.

Save contents of those files which contain a specific known string in an single .txt or .tmp file using perl

I'm trying to write a Perl script that saves the whole contents of those files which contain a specific string, 'PYAG_GENERATED', into a single .txt/.tmp file, one after another. The file names follow the pattern 'output_nnnn.txt', where nnnn is 0001, 0002 and so on, but I don't know how many files with such names are present.
I'm new to Perl and I don't know how to get the output correctly. Can anyone help me? Thanks in advance.
I've tried writing the script in different ways, but nothing ends up in the output file. Here is one of my attempts. 'new_1.txt' is the new file where I want to save the expected output, and "PYAG_GENERATED" is the specific string I'm looking for in the files.
open(NEW,">>new_1.txt") or die "could not open:$!";
$find2="PYAG_GENERATED";
$n='0001';
while('output_$n.txt'){
if(/find2/){
print NEW;
}
$n++;
}
close NEW;
I expect the output file 'new_1.txt' to contain the whole contents of the files (with filename pattern 'output_nnnn.txt') that contain the string 'PYAG_GENERATED' at least once.
Well, you tried I guess.
Welcome to the wonderful world of Perl where there are always a dozen ways of doing X :-) One possible way to achieve what you want. I put in a lot of comments I hope are helpful. It's also a bit verbose for the sake of clarity. I'm sure it could be golfed down to 5 lines of code.
use warnings; # Always start your Perl code with these two lines,
use strict; # and Perl will tell you about possible mistakes
use experimental 'signatures';
use File::Slurp;
# this is a subroutine/function, a block of code that can be called from
# somewhere else. it takes two arguments that the caller must provide
sub find_in_file( $filename, $what_to_look_for )
{
# the open function opens $filename for reading
# (that's what the "<" means, ">" stands for writing)
# if successful, we will have a "file handle" in the variable $in
# if not, open will return false ...
open( my $in, "<", $filename )
or die $!; # ... and the program will exit here. The variable $! will contain the error message
# now we read the file using a loop
# readline will give us the next line in the file
# or something false when there is nothing left to read
while ( my $line = readline($in) )
{
# now we test whether the current line contains what
# we are looking for.
# the index function gives us the zero-based position of a string within another string,
# or -1 if it is not found. for example index("abc", "c") will give us 2
if ( index( $line, $what_to_look_for ) >= 0 )
{
# we found what we were looking for
# so we don't need to keep looking in this file anymore
# so we must first close the file
close( $in );
# and then we indicate to the caller that the search was successful
# this will immediately end the subroutine
return 1;
}
}
# If we arrive here the search was unsuccessful
# so we tell that to the caller
return 0;
}
# Here starts the main program
# First we get a list of files
# we want to look at
my @possible_files = glob( "where/your/files/are/output_*.txt" );
# Here we will store the files that we are interested in, aka that contain PYAG_GENERATED
my @wanted_files;
# and now we can loop over the files and see if they contain what we are looking for
foreach my $filename ( @possible_files )
{
# here we use the function we defined earlier
if ( find_in_file( $filename, "PYAG_GENERATED" ) )
{
# with push we can add things to the end of an array
push @wanted_files, $filename;
}
}
# We are finished searching, now we can start adding the files together
# if we found any
if ( scalar @wanted_files > 0 )
{
# Now we could code that ourselves: open the files, loop through them and write out
# line by line. But we make life easy for ourselves and just
# use two functions from the module File::Slurp, which is a CPAN module
# rather than part of core Perl, so you may have to install it
foreach my $filename ( @wanted_files )
{
append_file( "new_1.txt", read_file( $filename ) );
}
print "Output created from " . (scalar @wanted_files) . " files\n";
}
else
{
print "No input files\n";
}
use strict;
use warnings;
my @a;
my $i=1;
my $find1="PYAG_GENERATED";
my $n=1;
my $total_files=47276; #got this no. of files by writing 'ls' command in the terminal
while($n<=$total_files){
open(NEW,"<output_$n.txt") or die "could not open:$!";
my $join=join('',<NEW>);
$a[$i]=$join;
#print "$a[10]";
$n++;
$i++;
}
close NEW;
for($i=1;$i<=$total_files;$i++){
if($a[$i]=~m/$find1/){
open(NEW1,">>new_1.tmp") or die "could not open:$!";
print NEW1 $a[$i];
}
}
close NEW1;

How to normalize a path in Perl? (without checking the filesystem)

I want the Perl's equivalent of Python's os.path.normpath():
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. […]
For instance, I want to convert '/a/../b/./c//d' into /b/c/d.
The path I'm manipulating does NOT represent a real directory in the local file tree. There are no symlinks involved. So a plain string manipulation works fine.
I tried Cwd::abs_path and File::Spec, but they don't do what I want.
my $path = '/a/../b/./c//d';
File::Spec->canonpath($path);
File::Spec->rel2abs($path, '/');
# Both return '/a/../b/c/d'.
# They don't remove '..' because it might change
# the meaning of the path in case of symlinks.
Cwd::abs_path($path);
# Returns undef.
# This checks for the path in the filesystem, which I don't want.
Cwd::fast_abs_path($path);
# Gives an error: No such file or directory
Possibly related link:
Normalized directory paths - perlmonks: people discuss several approaches.
Given that File::Spec is almost what I needed, I ended up writing a function that removes ../ from File::Spec->canonpath(). The full code including tests is available as a GitHub Gist.
use File::Spec;
sub path_normalize_by_string_manipulation {
my $path = shift;
# canonpath does string manipulation, but does not remove "..".
my $ret = File::Spec->canonpath($path);
# Let's remove ".." by using a regex.
while ($ret =~ s{
(^|/) # Either the beginning of the string, or a slash, save as $1
( # Followed by one of these:
[^/]| # * Any one character (except slash, obviously)
[^./][^/]| # * Two characters where
[^/][^./]| # they are not ".."
[^/][^/][^/]+ # * Three or more characters
) # Followed by:
/\.\./ # "/", followed by "../"
}{$1}x
) {
# Repeat this substitution until not possible anymore.
}
# Re-adding the trailing slash, if needed.
if ($path =~ m!/$! && $ret !~ m!/$!) {
$ret .= '/';
}
return $ret;
}
My use case was normalizing include paths inside files relative to another path. For example, I might have a file at '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng' that includes the following file relative to itself:
<include href="../../base/rng/topicMod.rng"/>
and I needed the absolute path of that included file. (The including file path might be absolute or relative.)
Path::Tiny was promising, but I can only use core modules.
I tried using chdir to the include file location then using File::Spec->rel2abs() to resolve the path, but that was painfully slow on my system.
I ended up writing a subroutine to implement a simple string-based method of evaporating '../' components:
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use File::Basename;
use File::Spec;
sub adjust_local_path {
my ($file, $relative_to) = @_;
return Cwd::realpath($file) if (($relative_to eq '.') || ($file =~ m!^\/!)); # handle the fast cases
$relative_to = dirname($relative_to) if (-f $relative_to);
$relative_to = Cwd::realpath($relative_to);
while ($file =~ s!^\.\./!!) { $relative_to =~ s!/[^/]+$!!; }
return File::Spec->catdir($relative_to, $file);
}
my $included_file = '/home/chrispy/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/topic.rng';
my $source_file = '.././base/rng/topicMod.rng';
print adjust_local_path($included_file, $source_file)."\n";
The result of the script above is
$ ./test.pl
/home/me/dita-ot-3.1.3/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/base/rng/topicMod.rng
Using realpath() had the nice side-effect of resolving symlinks, which I needed. In the example above, dita-ot/ is a link to dita-ot-3.1.3/.
You can provide either a file or a path as the second argument; if it's a file, the directory path of that file is used. (This was convenient for my own purposes.)
Fixing Tom van der Woerdt code:
foreach my $path ("/a/b/c/d/../../../e" , "/a/../b/./c//d") {
my @c= reverse split m#/#, $path;
my @c_new;
while (@c) {
my $component= shift @c;
next unless length($component);
if ($component eq ".") { next; }
if ($component eq "..") {
my $i=0;
# skip empty, "." and ".." components, but never run past the end of @c
while ($i < @c && $c[$i] =~ m/^\.{0,2}$/) {
$i++
}
splice(@c, $i, 1);
next
}
push @c_new, $component;
}
print "/".join("/", reverse @c_new) ."\n";
}
Removing '.' and '..' from paths is pretty straight-forward if you process the path right-to-left :
my $path= "/a/../b/./c//d";
my @c= reverse split m#/#, $path;
my @c_new;
while (@c) {
my $component= shift @c;
next unless length($component);
if ($component eq ".") { next; }
if ($component eq "..") { shift @c; next }
push #c_new, $component;
}
say "/".join("/", reverse #c_new);
(Assumes the path starts with a /)
Note that this violates the UNIX pathname resolution standards, specifically this part :
A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.
The Path::Tiny module does exactly this:
use strict;
use warnings;
use 5.010;
use Path::Tiny;
say path('/a/../b/./c//d');
Output:
/b/c/d
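For readers restricted to core Perl, here is a minimal left-to-right sketch of the same pure string normalization. The name normalize_path is hypothetical, and the behaviour is only assumed to mirror Python's os.path.normpath for the cases discussed here (collapsing //, . and ..); it never touches the filesystem and collapses leading double slashes too.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize a path purely as a string, left to right.
sub normalize_path {
    my ($path) = @_;
    my $absolute = ($path =~ m{^/}) ? 1 : 0;
    my @stack;
    for my $part (split m{/+}, $path) {
        next if $part eq '' || $part eq '.';      # drop empty and "." parts
        if ($part eq '..') {
            if (@stack && $stack[-1] ne '..') {
                pop @stack;                       # ".." cancels the previous part
            }
            elsif (!$absolute) {
                push @stack, '..';                # keep leading ".." on relative paths
            }
            # On an absolute path, ".." at the root is simply dropped.
            next;
        }
        push @stack, $part;
    }
    my $result = join '/', @stack;
    return "/$result" if $absolute;
    return length($result) ? $result : '.';
}

print normalize_path('/a/../b/./c//d'), "\n";    # prints /b/c/d
```

Keeping a stack of accepted components means each ".." only ever has to look at the most recent entry, so the two-level lookahead needed by the right-to-left versions goes away.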

New to Perl - Parsing file and replacing pattern with dynamic values

I am very new to Perl and I am currently trying to convert a bash script to Perl.
My script converts nmon files (nmon is an AIX / Linux performance monitoring tool). It takes the nmon files present in a directory, greps the specific section into a temp file, and greps the associated timestamps into another file.
Then it parses the data into a final CSV file that will be indexed by a third tool.
A sample NMON data looks like:
TOP,%CPU Utilisation
TOP,+PID,Time,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass
TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified
TOP,5365876,T0002,1.48,0.81,0.67,135,85032,132,84928,38165,1.159,0,db2sysc,Unclassified
TOP,5460056,T0002,0.32,0.27,0.05,1,5060,616,4704,1719,0.072,0,db2kmchan64.v9,Unclassified
The field "Time" (seen as T0002, and really called ZZZZ in NMON) is an NMON-specific timestamp tag. The real value of this timestamp appears later in the NMON file, in a dedicated section, and looks like:
ZZZZ,T0001,00:09:55,01-JAN-2014
ZZZZ,T0002,00:13:55,01-JAN-2014
ZZZZ,T0003,00:17:55,01-JAN-2014
ZZZZ,T0004,00:21:55,01-JAN-2014
ZZZZ,T0005,00:25:55,01-JAN-2014
The NMON format is very specific and can't be used directly without being parsed: each timestamp has to be associated with the corresponding value. (An NMON file is almost a concatenation of numerous different CSV files, each with a different format, different fields and so on.)
I wrote the following bash script to parse the section I'm interested in (the "TOP" section, which holds the per-host top process CPU, memory and I/O stats):
#!/bin/bash
# set -x
################################################################
# INFORMATION
################################################################
# nmon2csv_TOP.sh
# Convert TOP section of nmon files to csv
# CAUTION: This script is expected to be launched by the main workflow
# $DST and DST_CONVERTED_TOP are being exported by it, if not this script will exit at launch time
################################################################
# VARS
################################################################
# Location of NMON files
NMON_DIR=${DST}
# Location of generated files
OUTPUT_DIR=${DST_CONVERTED_TOP}
# Temp files
rawdatafile=/tmp/temp_rawdata.$$.temp
timestampfile=/tmp/temp_timestamp.$$.temp
# Main Output file
finalfile=${DST_CONVERTED_TOP}/NMON_TOP_processed_at_date_`date '+%F'`.csv
###########################
# BEGIN OF WORK
###########################
# Verify exported var are not null
if [ -z ${NMON_DIR} ]; then
echo -e "\nERROR: Var NMON_DIR is null!\n" && exit 1
elif [ -z ${OUTPUT_DIR} ]; then
echo -e "\nERROR: Var OUTPUT_DIR is null!\n" && exit 1
fi
# Check if temp and output files already exists
if [ -s ${rawdatafile} ]; then
rm -f ${rawdatafile}
elif [ -s ${timestampfile} ]; then
rm -f ${timestampfile}
elif [ -s ${finalfile} ]; then
rm -f ${finalfile}
fi
# Get current location
PWD=`pwd`
# Go to NMON files location
cd ${NMON_DIR}
# For each NMON file present:
# To restrict to only PROD env: `ls *.nmon | grep -E -i 'sp|gp|ge'`
for NMON_FILE in `ls *.nmon | grep -E -i 'sp|gp|ge'`; do
# Set Hostname identification
serialnum=`grep 'AAA,SerialNumber,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
hostname=`grep 'AAA,host,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
# Grep and redirect TOP Section
grep 'TOP' ${NMON_FILE} | grep -v 'AAA,version,TOPAS-NMON' | grep -v 'TOP,%CPU Utilisation' > ${rawdatafile}
# Grep and redirect associated timestamps (ZZZZ)
grep 'ZZZZ' ${NMON_FILE}> ${timestampfile}
# Begin of work
while IFS=, read TOP PID Time Pct_CPU Pct_Usr Pct_Sys Threads Size ResText ResData CharIO Pct_RAM Paging Command WLMclass
do
timestamp=`grep ${Time} ${timestampfile} | awk -F, '{print $4 " "$3}' OFS=,`
echo ${serialnum},${hostname},${timestamp},${Time},${PID},${Pct_CPU},${Pct_Usr},${Pct_Sys},${Threads},${Size},${ResText},${ResData},${CharIO},${Pct_RAM},${Paging},${Command},${WLMclass} \
| grep -v '+PID,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass' >> ${finalfile}
done < ${rawdatafile}
echo -e "INFO: Done for Serialnum: ${serialnum} Hostname: ${hostname}"
done
# Go back to initial location
cd ${PWD}
###########################
# END OF WORK
###########################
This works as wanted and generates a main CSV file (you'll see in the code that I deliberately don't keep the CSV header in the file), which is a concatenation of all parsed hosts.
But I have a very large number of hosts to process each day (around 3000). With the current code, in the worst cases, it can take a few minutes to generate data for one host; multiplied by the number of hosts, minutes easily become hours...
So this code is really not fast enough to deal with such an amount of data.
10 hosts represent around 200,000 lines, which ends up as around 20 MB of CSV.
That's not that much, but I think a shell script is probably not the best choice for such a process...
I guess that Perl would be much better at this task (even if the shell script could probably be improved), but my knowledge of Perl is (currently) very poor, which is why I'm asking for your help... I think the code should be quite simple to write in Perl, but I can't get it to work for now...
Someone developed a Perl script to manage NMON files and convert them to SQL files (to load the data into a database). I set it up to use this feature, and with the help of some shell scripts I process the SQL files to get my final CSV files.
But the TOP section was not integrated into that Perl script, and it can't be used for that without being redeveloped.
The code in question:
#!/usr/bin/perl
# Program name: nmon2mysql.pl
# Purpose - convert nmon.csv file(s) into mysql insert file
# Author - Bruce Spencer
# Disclaimer: this provided "as is".
# Date - March 2007
#
$nmon2mysql_ver="1.0. March 2007";
use Time::Local;
#################################################
## Your Customizations Go Here ##
#################################################
# Source directory for nmon csv files
my $NMON_DIR=$ENV{DST_TMP};
my $OUTPUT_DIR=$ENV{DST_CONVERTED_CPU_ALL};
# End "Your Customizations Go Here".
# You're on your own, if you change anything beyond this line :-)
####################################################################
############# Main Program ############
####################################################################
# Initialize common variables
&initialize;
# Process all "nmon" files located in the $NMON_DIR
# @nmon_files=`ls $NMON_DIR/*.nmon $NMON_DIR/*.csv`;
@nmon_files=`ls $NMON_DIR/*.nmon`;
if (@nmon_files eq 0 ) { die ("No \*.nmon or csv files found in $NMON_DIR\n"); }
@nmon_files=sort(@nmon_files);
chomp(@nmon_files);
foreach $FILENAME ( @nmon_files ) {
@cols= split(/\//,$FILENAME);
$BASEFILENAME= $cols[@cols-1];
unless (open(INSERT, ">$OUTPUT_DIR/$BASEFILENAME.sql")) {
die("Can not open /$OUTPUT_DIR/$BASEFILENAME.sql\n");
}
print INSERT ("# nmon version: $NMONVER\n");
print INSERT ("# AIX version: $AIXVER\n");
print INSERT ("use nmon;\n");
$start=time();
@now=localtime($start);
$now=join(":",@now[2,1,0]);
print ("$now: Begin processing file = $FILENAME\n");
# Parse nmon file, skip if unsuccessful
if (( &get_nmon_data ) gt 0 ) { next; }
$now=time();
$now=$now-$start;
print ("\t$now: Finished get_nmon_data\n");
# Static variables (number of fields always the same)
#@static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE","PROC");
#@static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE");
@static_vars=("CPU_ALL");
foreach $key (@static_vars) {
&mk_mysql_insert_static($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
} # end foreach
# Dynamic variables (variable number of fields)
#@dynamic_vars=("DISKBSIZE","DISKBUSY","DISKREAD","DISKWRITE","DISKXFER","ESSREAD","ESSWRITE","ESSXFER","IOADAPT","NETERROR","NET","NETPACKET");
@dynamic_vars=("");
foreach $key (@dynamic_vars) {
&mk_mysql_insert_variable($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
}
close(INSERT);
# system("gzip","$FILENAME");
}
exit(0);
############################################
############# Subroutines ############
############################################
##################################################################
## Extract CPU_ALL data for Static fields
##################################################################
sub mk_mysql_insert_static {
my($nmon_var)=@_;
my $table=lc($nmon_var);
my @rawdata;
my $x;
my @cols;
my $comma;
my $TS;
my $n;
@rawdata=grep(/^$nmon_var,/, @nmon);
if (@rawdata < 1) { return(1); }
@rawdata=sort(@rawdata);
@cols=split(/,/,$rawdata[0]);
$x=join(",",@cols[2..@cols-1]);
$x=~ s/\%/_PCT/g;
$x=~ s/\(MB\)/_MB/g;
$x=~ s/-/_/g;
$x=~ s/ /_/g;
$x=~ s/__/_/g;
$x=~ s/,_/,/g;
$x=~ s/_,/,/g;
$x=~ s/^_//;
$x=~ s/_$//;
print INSERT (qq|insert into $table (serialnum,hostname,mode,nmonver,time,ZZZZ,$x) values\n| );
$comma="";
$n=@cols;
$n=$n-1; # number of columns -1
for($i=1;$i<@rawdata;$i++){
$TS=$UTC_START + $INTERVAL*($i);
@cols=split(/,/,$rawdata[$i]);
$x=join(",",@cols[2..$n]);
$x=~ s/,,/,-1,/g; # replace missing data ",," with a ",-1,"
print INSERT (qq|$comma("$SN","$HOSTNAME","$MODE","$NMONVER",$TS,"$DATETIME{@cols[1]}",$x)| );
$comma=",\n";
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert
##################################################################
## Extract CPU_ALL data for variable fields
##################################################################
sub mk_mysql_insert_variable {
my($nmon_var)=@_;
my $table=lc($nmon_var);
my @rawdata;
my $x;
my $j;
my @cols;
my $comma;
my $TS;
my $n;
my @devices;
@rawdata=grep(/^$nmon_var,/, @nmon);
if ( @rawdata < 1) { return; }
@rawdata=sort(@rawdata);
$rawdata[0]=~ s/\%/_PCT/g;
$rawdata[0]=~ s/\(/_/g;
$rawdata[0]=~ s/\)/_/g;
$rawdata[0]=~ s/ /_/g;
$rawdata[0]=~ s/__/_/g;
$rawdata[0]=~ s/,_/,/g;
@devices=split(/,/,$rawdata[0]);
print INSERT (qq|insert into $table (serialnum,hostname,time,ZZZZ,device,value) values\n| );
$n=@rawdata;
$n--;
for($i=1;$i<@rawdata;$i++){
$TS=$UTC_START + $INTERVAL*($i);
$rawdata[$i]=~ s/,$//;
@cols=split(/,/,$rawdata[$i]);
print INSERT (qq|\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[2]",$cols[2])| );
for($j=3;$j<@cols;$j++){
print INSERT (qq|,\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[$j]",$cols[$j])| );
}
if ($i < $n) { print INSERT (","); }
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert_variable
########################################################
### Get an nmon setting from csv file ###
### finds first occurance of $search ###
### Return the selected column...$return_col ###
### Syntax: ###
### get_setting($search,$col_to_return,$separator)##
########################################################
sub get_setting {
my $i;
my $value="-1";
my ($search,$col,$separator)= @_; # search text, $col, $separator
for ($i=0; $i<@nmon; $i++){
if ($nmon[$i] =~ /$search/ ) {
$value=(split(/$separator/,$nmon[$i]))[$col];
$value =~ s/["']*//g; #remove non alphanum characters
return($value);
} # end if
} # end for
return($value);
} # end get_setting
#####################
## Clean up ##
#####################
sub clean_up_line {
# remove characters not compatible with nmon variable
# Max rrdtool variable length is 19 chars
# Variable can not contain special characters (% - () )
my ($x)=@_;
# print ("clean_up, before: $i\t$nmon[$i]\n");
$x =~ s/\%/Pct/g;
# $x =~ s/\W*//g;
$x =~ s/\/s/ps/g; # /s - ps
$x =~ s/\//s/g; # / - s
$x =~ s/\(/_/g;
$x =~ s/\)/_/g;
$x =~ s/ /_/g;
$x =~ s/-/_/g;
$x =~ s/_KBps//g;
$x =~ s/_tps//g;
$x =~ s/[:,]*\s*$//;
$retval=$x;
} # end clean up
##########################################
## Extract headings from nmon csv file ##
##########################################
sub initialize {
%MONTH2NUMBER = ("jan", 1, "feb",2, "mar",3, "apr",4, "may",5, "jun",6, "jul",7, "aug",8, "sep",9, "oct",10, "nov",11, "dec",12 );
@MONTH2ALPHA = ( "junk","jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec" );
} # end initialize
# Get data from nmon file, extract specific data fields (hostname, date, ...)
sub get_nmon_data {
my $key;
my $x;
my $category;
my %toc;
my @cols;
# Read nmon file
unless (open(FILE, $FILENAME)) { return(1); }
@nmon=<FILE>; # input entire file
close(FILE);
chomp(@nmon);
# Clean up nmon data: remove trailing commas and colons
for($i=0; $i<@nmon;$i++ ) {
$nmon[$i] =~ s/[:,]*\s*$//;
}
# Get nmon/server settings (search string, return column, delimiter)
$AIXVER =&get_setting("AIX",2,",");
$DATE =&get_setting("date",2,",");
$HOSTNAME =&get_setting("host",2,",");
$INTERVAL =&get_setting("interval",2,","); # nmon sampling interval
$MEMORY =&get_setting(qq|lsconf,"Good Memory Size:|,1,":");
$MODEL =&get_setting("modelname",3,'\s+');
$NMONVER =&get_setting("version",2,",");
$SNAPSHOTS =&get_setting("snapshots",2,","); # number of readings
$STARTTIME =&get_setting("AAA,time",2,",");
($HR, $MIN)=split(/\:/,$STARTTIME);
if ($AIXVER eq "-1") {
$SN=$HOSTNAME; # Probably a Linux host
} else {
$SN =&get_setting("systemid",4,",");
$SN =(split(/\s+/,$SN))[0]; # "systemid IBM,SN ..."
}
$TYPE =&get_setting("^BBBP.*Type",3,",");
if ( $TYPE =~ /Shared/ ) { $TYPE="SPLPAR"; } else { $TYPE="Dedicated"; }
$MODE =&get_setting("^BBBP.*Mode",3,",");
$MODE =(split(/: /, $MODE))[1];
# $MODE =~s/\"//g;
# Calculate UTC time (seconds since 1970)
# NMON V9 dd/mm/yy
# NMON V10+ dd-MMM-yyyy
if ( $DATE =~ /[a-zA-Z]/ ) { # Alpha = assume dd-MMM-yyyy date format
($DAY, $MMM, $YR)=split(/\-/,$DATE);
$MMM=lc($MMM);
$MON=$MONTH2NUMBER{$MMM};
} else {
($DAY, $MON, $YR)=split(/\//,$DATE);
$YR=$YR + 2000;
$MMM=$MONTH2ALPHA[$MON];
} # end if
## Calculate UTC time (seconds since 1970). Required format for the rrdtool.
## timelocal format
## day=1-31
## month=0-11
## year = x -1900 (time since 1900) (seems to work with either 2006 or 106)
$m=$MON - 1; # jan=0, feb=1, ...
$UTC_START=timelocal(0,$MIN,$HR,$DAY,$m,$YR);
$UTC_END=$UTC_START + $INTERVAL * $SNAPSHOTS;
@ZZZZ=grep(/^ZZZZ,/,@nmon);
for ($i=0;$i<@ZZZZ;$i++){
@cols=split(/,/,$ZZZZ[$i]);
($DAY,$MON,$YR)=split(/-/,$cols[3]);
$MON=lc($MON);
$MON="00" . $MONTH2NUMBER{$MON};
$MON=substr($MON,-2,2);
$ZZZZ[$i]="$YR-$MON-$DAY $cols[2]";
$DATETIME{$cols[1]}="$YR-$MON-$DAY $cols[2]";
} # end ZZZZ
return(0);
} # end get_nmon_data
It almost does the job (I say almost because with recent NMON versions it sometimes has issues when no data is present), and it does it much, much faster than my shell script would if I used it for these sections.
This is why I think Perl would be a perfect solution.
Of course, I'm not asking anyone to convert my shell script into something final in Perl, but at least to point me in the right direction :-)
Thanks in advance to anyone who can help!
Normally i am strongly opposed to questions like this but our production systems are down and until they are fixed i do not really have all that much to do...
Here is some code that might get you started. Please consider it pseudo code as it is completely untested and probably won't even compile (i always forget some parentheses or semicolons and as i said, the actual machines that can run code are unreachable) but i commented a lot and hopefully you will be able to modify it to your actual needs and get it to run.
use strict;
use warnings;
open INFILE, "<", "path/to/file.nmon" or die "Can't open input: $!"; # Open the file.
my @topLines; # Initialize variables.
my %timestamps;
while (<INFILE>) # This will walk over all the lines of the infile.
{ # Storing the current line in $_.
chomp $_; # Remove newline at the end.
if ($_ =~ m/^TOP,\d/) # If the line is a TOP data row (TOP followed by a numeric PID, so the two header rows are skipped)...
{
push @topLines, $_; # ...store it in the array for later use.
}
elsif ($_ =~ m/^ZZZZ/) # If it is in the ZZZZ section...
{
my @fields = split ',', $_; # ...split the line at commas...
my $timestamp = join ",", $fields[2], $fields[3]; # ...join the timestamp into a string as you wish...
$timestamps{$fields[1]} = $timestamp; # ...and store it in the hash with the Twhatever thing as key.
}
# This iteration could certainly be improved with more knowledge
# of how the file looks. For example the search could be cancelled
# after the ZZZZ section if the file is still long.
}
close INFILE;
open OUTFILE, ">", "path/to/output.csv" or die "Can't open output: $!"; # Open the file you want your output in.
foreach (@topLines) # Iterate through all elements of the array.
{ # Once again storing the current value in $_.
my @fields = split ',', $_; # Probably not necessary, depending on how output should be formatted.
my $outstring = join ',', $fields[0], $fields[1], $timestamps{$fields[2]}; # And whatever other fields you care for.
print OUTFILE $outstring, "\n"; # Print.
}
close OUTFILE;
print "Done.\n";
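To make the timestamp lookup concrete, here is a small self-contained sketch that runs on the sample rows from the question instead of a real file. The helper name top_rows_with_timestamps and the output column order (timestamp, Txxxx tag, PID, then the remaining fields, as in the bash script) are assumptions for illustration; serial number and hostname are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of one NMON file and returns the TOP
# data rows as CSV strings, with the real timestamp placed in front.
sub top_rows_with_timestamps {
    my @lines = @_;
    my (%timestamp_for, @top_rows);
    for my $line (@lines) {
        my @f = split /,/, $line;
        next unless @f;                                # skip empty lines
        if ($f[0] eq 'ZZZZ' && @f >= 4) {
            $timestamp_for{$f[1]} = "$f[3] $f[2]";     # date, then time
        }
        elsif ($f[0] eq 'TOP' && defined $f[1] && $f[1] =~ /^\d+$/) {
            push @top_rows, [@f];                      # data rows have a numeric PID
        }
    }
    my @out;
    for my $row (@top_rows) {
        my (undef, $pid, $tag, @rest) = @$row;
        my $ts = exists $timestamp_for{$tag} ? $timestamp_for{$tag} : '';
        push @out, join ',', $ts, $tag, $pid, @rest;
    }
    return @out;
}

my @sample = (
    'TOP,%CPU Utilisation',
    'TOP,5365876,T0002,1.48,0.81,0.67,135,85032,132,84928,38165,1.159,0,db2sysc,Unclassified',
    'ZZZZ,T0001,00:09:55,01-JAN-2014',
    'ZZZZ,T0002,00:13:55,01-JAN-2014',
);
print "$_\n" for top_rows_with_timestamps(@sample);
```

The rows are buffered first and joined afterwards because the ZZZZ section comes after the TOP rows in the file, so the hash is only complete once every line has been read. One hash lookup per row replaces the per-row grep of the timestamp file in the shell version.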

Perl Regular Expressions + delete line if it starts with #

How to delete lines if they begin with a "#" character using Perl regular expressions?
For example (need to delete the following examples)
line="#a"
line=" #a"
line="# a"
line=" # a"
...
the required syntax
$line =~ s/......../..
or skip loop if line begins with "#"
from my code:
open my $IN ,'<', $file or die "can't open '$file' for reading: $!";
while( defined( $line = <$IN> ) ){
.
.
.
You don't delete lines with s///. (In a loop, you probably want next;)
In the snippet you posted, it would be:
while (my $line = <$IN>) {
if ($line =~ /^\s*#/) { next; }
# will skip the rest of the code if a line matches
...
}
Shorter forms /^\s*#/ and next; and next if /^\s*#/; are possible.
perldoc perlre
/^\s*#/
^ - "the beginning of the line"
\s - "a whitespace character"
* - "0 or more times"
# - just a #
Based off Aristotle Pagaltzis's answer you could do:
perl -ni.bak -e'print unless m/^\s*#/' deletelines.txt
Here, the -n switch makes perl put a loop around the code you provide
which will read all the files you pass on the command line in
sequence. The -i switch (for “in-place”) says to collect the output
from your script and overwrite the original contents of each file with
it. The .bak parameter to the -i option tells perl to keep a backup of
the original file in a file named after the original file name with
.bak appended. For all of these bits, see perldoc perlrun.
deletelines.txt (initially):
#a
b
#a
# a
c
# a
becomes:
b
c
Program (Cut & paste whole thing including DATA section, adjust shebang line, run)
#!/usr/bin/perl
use strict;
use warnings;
while(<DATA>) {
next if /^\s*#/; # skip comments
print; # process data
}
__DATA__
# comment
data
# another comment
more data
Output
data
more data
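$text =~ s/^[ \t]*#.*\n//mg;
That deletes every line in $text that begins with # (optionally preceded by spaces or tabs), without requiring that you loop through each line of the text manually. The /m modifier is needed so that ^ matches at the start of every line, not only at the start of the string.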
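A minimal sketch of this whole-string approach, assuming the file contents are already held in one scalar. The helper name strip_comment_lines is hypothetical, and the pattern uses [ \t]* rather than \s* so that blank lines in front of a comment are not swallowed.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every line whose first non-blank character is "#".
sub strip_comment_lines {
    my ($text) = @_;
    # /m makes ^ match at the start of each line; /g repeats over the string.
    # \n? also covers a comment on the last line without a trailing newline.
    $text =~ s/^[ \t]*#.*\n?//mg;
    return $text;
}

my $text = "#a\nb\n  #a\n# a\nc\n # a\n";
print strip_comment_lines($text);    # prints "b" and "c" on separate lines
```

Without /m the substitution would only ever be tried at the very beginning of $text, so at most the first line could be removed.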