I want Perl's equivalent of Python's os.path.normpath():
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. […]
For instance, I want to convert '/a/../b/./c//d' into '/b/c/d'.
The path I'm manipulating does NOT represent a real directory in the local file tree. There are no symlinks involved. So a plain string manipulation works fine.
I tried Cwd::abs_path and File::Spec, but they don't do what I want.
my $path = '/a/../b/./c//d';
File::Spec->canonpath($path);
File::Spec->rel2abs($path, '/');
# Both return '/a/../b/c/d'.
# They don't remove '..' because it might change
# the meaning of the path in case of symlinks.
Cwd::abs_path($path);
# Returns undef.
# This checks for the path in the filesystem, which I don't want.
Cwd::fast_abs_path($path);
# Gives an error: No such file or directory
Possibly related link:
Normalized directory paths - perlmonks: people discuss several approaches.
Given that File::Spec was almost what I needed, I ended up writing a function that removes '../' components from the output of File::Spec->canonpath(). The full code, including tests, is available as a GitHub Gist.
use File::Spec;

sub path_normalize_by_string_manipulation {
    my $path = shift;

    # canonpath does string manipulation, but does not remove "..".
    my $ret = File::Spec->canonpath($path);

    # Let's remove ".." by using a regex.
    while ($ret =~ s{
        (^|/)              # Either the beginning of the string, or a slash, save as $1
        (                  # Followed by one of these:
            [^/]         | #  * Any one character (except slash, obviously)
            [^./][^/]    | #  * Two characters where
            [^/][^./]    | #    they are not ".."
            [^/][^/][^/]+  #  * Three or more characters
        )                  # Followed by:
        /\.\./             # "/", followed by "../"
    }{$1}x
    ) {
        # Repeat this substitution until not possible anymore.
    }

    # Re-adding the trailing slash, if needed.
    if ($path =~ m!/$! && $ret !~ m!/$!) {
        $ret .= '/';
    }

    return $ret;
}
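A quick sanity check (untested here, but assuming the sub above is in scope it should behave like os.path.normpath for these cases):

print path_normalize_by_string_manipulation('/a/../b/./c//d'), "\n"; # /b/c/d
print path_normalize_by_string_manipulation('A/foo/../B/'), "\n";    # A/B/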
My use case was normalizing include paths inside files relative to another path. For example, I might have a file at '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng' that includes the following file relative to itself:
<include href="../../base/rng/topicMod.rng"/>
and I needed the absolute path of that included file. (The including file path might be absolute or relative.)
Path::Tiny was promising, but I can only use core modules.
I tried using chdir to the include file location then using File::Spec->rel2abs() to resolve the path, but that was painfully slow on my system.
I ended up writing a subroutine to implement a simple string-based method of evaporating '../' components:
#!/usr/bin/perl
use strict;
use warnings;

use Cwd;
use File::Basename;
use File::Spec;

sub adjust_local_path {
    my ($file, $relative_to) = @_;

    # handle the fast cases
    return Cwd::realpath($file) if (($relative_to eq '.') || ($file =~ m!^\/!));

    $relative_to = dirname($relative_to) if (-f $relative_to);
    $relative_to = Cwd::realpath($relative_to);

    while ($file =~ s!^\.\./!!) { $relative_to =~ s!/[^/]+$!!; }

    return File::Spec->catdir($relative_to, $file);
}
my $included_file = '/home/chrispy/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/topic.rng';
my $source_file = '.././base/rng/topicMod.rng';
print adjust_local_path($included_file, $source_file)."\n";
The result of the script above is
$ ./test.pl
/home/me/dita-ot-3.1.3/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/base/rng/topicMod.rng
Using realpath() had the nice side-effect of resolving symlinks, which I needed. In the example above, dita-ot/ is a link to dita-ot-3.1.3/.
You can provide either a file or a path as the second argument; if it's a file, the directory path of that file is used. (This was convenient for my own purposes.)
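To match the include-resolution use case described above, a call might look like the following (hypothetical paths; note that the sub relies on -f and Cwd::realpath, so the including file has to exist on disk for the dirname() step and symlink resolution to apply):

my $href      = '../../base/rng/topicMod.rng';
my $including = '/some/plugin/rng/technicalContent/rng/concept.rng';
print adjust_local_path($href, $including), "\n";
# With those directories present on disk, this should print something like
# /some/plugin/rng/base/rng/topicMod.rng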
Fixing Tom van der Woerdt's code:
foreach my $path ("/a/b/c/d/../../../e", "/a/../b/./c//d") {
    my @c = reverse split m#/#, $path;
    my @c_new;
    while (@c) {
        my $component = shift @c;
        next unless length($component);
        if ($component eq ".") { next; }
        if ($component eq "..") {
            my $i = 0;
            while ($c[$i] =~ m/^\.{0,2}$/) {
                $i++;
            }
            splice(@c, $i, 1);
            next;
        }
        push @c_new, $component;
    }
    print "/" . join("/", reverse @c_new) . "\n";
}
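If I have traced it correctly, the loop above should print:

/a/e
/b/c/d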
Removing '.' and '..' from paths is pretty straightforward if you process the path right-to-left:
use feature 'say'; # or: use 5.010;

my $path = "/a/../b/./c//d";
my @c = reverse split m#/#, $path;
my @c_new;
while (@c) {
    my $component = shift @c;
    next unless length($component);
    if ($component eq ".") { next; }
    if ($component eq "..") { shift @c; next }
    push @c_new, $component;
}
say "/" . join("/", reverse @c_new);
(Assumes the path starts with a /)
Note that this violates the UNIX pathname resolution standard, specifically this part:
A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.
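If that edge case matters, one option is to detect exactly two leading slashes up front and re-attach them afterwards. A minimal, untested sketch (the wrapper name is made up), reusing path_normalize_by_string_manipulation() from the answer further up:

sub normalize_keep_leading_double_slash {
    my $path = shift;

    # POSIX: exactly two leading slashes are implementation-defined, so remember them;
    # three or more collapse to a single slash, which the normalization already does.
    my $had_double = ($path =~ m{^//(?!/)}) ? 1 : 0;

    my $ret = path_normalize_by_string_manipulation($path);
    $ret = '/' . $ret if $had_double && $ret !~ m{^//};
    return $ret;
}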
The Path::Tiny module does exactly this:
use strict;
use warnings;
use 5.010;
use Path::Tiny;
say path('/a/../b/./c//d');
Output:
/b/c/d
I am very new to Perl and I am currently trying to convert a bash script to Perl.
My script converts nmon files (from the AIX/Linux performance monitoring tool nmon): it takes the nmon files present in a directory, greps and redirects the specific section to a temp file, and greps and redirects the associated timestamps to another file.
Then it parses the data into a final CSV file that will be indexed and used by a third tool.
Sample NMON data looks like this:
TOP,%CPU Utilisation
TOP,+PID,Time,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass
TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified
TOP,5365876,T0002,1.48,0.81,0.67,135,85032,132,84928,38165,1.159,0,db2sysc,Unclassified
TOP,5460056,T0002,0.32,0.27,0.05,1,5060,616,4704,1719,0.072,0,db2kmchan64.v9,Unclassified
The field "Time" (Seen as T0002 and really called ZZZZ in NMON) is a specific NMON timestamp, the real value of this timestamp is present later (in a dedicated section) in the NMON file and looks like:
ZZZZ,T0001,00:09:55,01-JAN-2014
ZZZZ,T0002,00:13:55,01-JAN-2014
ZZZZ,T0003,00:17:55,01-JAN-2014
ZZZZ,T0004,00:21:55,01-JAN-2014
ZZZZ,T0005,00:25:55,01-JAN-2014
The NMON format is very specific and can't be used directly without being parsed: each timestamp has to be associated with its corresponding values. (An NMON file is almost a concatenation of numerous different CSV sections, each with a different format, different fields, and so on.)
I wrote the following bash script to parse the section I'm interested in (the "TOP" section, which holds per-process CPU, memory, and I/O stats per host):
#!/bin/bash
# set -x
################################################################
# INFORMATION
################################################################
# nmon2csv_TOP.sh
# Convert TOP section of nmon files to csv
# CAUTION: This script is expected to be launched by the main workflow
# $DST and DST_CONVERTED_TOP are being exported by it, if not this script will exit at launch time
################################################################
# VARS
################################################################
# Location of NMON files
NMON_DIR=${DST}
# Location of generated files
OUTPUT_DIR=${DST_CONVERTED_TOP}
# Temp files
rawdatafile=/tmp/temp_rawdata.$$.temp
timestampfile=/tmp/temp_timestamp.$$.temp
# Main Output file
finalfile=${DST_CONVERTED_TOP}/NMON_TOP_processed_at_date_`date '+%F'`.csv
###########################
# BEGIN OF WORK
###########################
# Verify exported var are not null
if [ -z ${NMON_DIR} ]; then
    echo -e "\nERROR: Var NMON_DIR is null!\n" && exit 1
elif [ -z ${OUTPUT_DIR} ]; then
    echo -e "\nERROR: Var OUTPUT_DIR is null!\n" && exit 1
fi
# Check if temp and output files already exists
if [ -s ${rawdatafile} ]; then
    rm -f ${rawdatafile}
elif [ -s ${timestampfile} ]; then
    rm -f ${timestampfile}
elif [ -s ${finalfile} ]; then
    rm -f ${finalfile}
fi
# Get current location
PWD=`pwd`
# Go to NMON files location
cd ${NMON_DIR}
# For each NMON file present:
# To restrict to only PROD env: `ls *.nmon | grep -E -i 'sp|gp|ge'`
for NMON_FILE in `ls *.nmon | grep -E -i 'sp|gp|ge'`; do

    # Set Hostname identification
    serialnum=`grep 'AAA,SerialNumber,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
    hostname=`grep 'AAA,host,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`

    # Grep and redirect TOP Section
    grep 'TOP' ${NMON_FILE} | grep -v 'AAA,version,TOPAS-NMON' | grep -v 'TOP,%CPU Utilisation' > ${rawdatafile}

    # Grep and redirect associated timestamps (ZZZZ)
    grep 'ZZZZ' ${NMON_FILE} > ${timestampfile}

    # Begin of work
    while IFS=, read TOP PID Time Pct_CPU Pct_Usr Pct_Sys Threads Size ResText ResData CharIO Pct_RAM Paging Command WLMclass
    do
        timestamp=`grep ${Time} ${timestampfile} | awk -F, '{print $4 " "$3}' OFS=,`
        echo ${serialnum},${hostname},${timestamp},${Time},${PID},${Pct_CPU},${Pct_Usr},${Pct_Sys},${Threads},${Size},${ResText},${ResData},${CharIO},${Pct_RAM},${Paging},${Command},${WLMclass} \
        | grep -v '+PID,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass' >> ${finalfile}
    done < ${rawdatafile}

    echo -e "INFO: Done for Serialnum: ${serialnum} Hostname: ${hostname}"

done
# Go back to initial location
cd ${PWD}
###########################
# END OF WORK
###########################
This works as intended and generates a main CSV file (you'll see in the code that I deliberately don't keep the CSV header in the file) which is a concatenation of all parsed hosts.
But I have a very large number of hosts to treat each day (around 3,000), and with the current code the worst cases can take a few minutes to generate the data for one host; multiplied by the number of hosts, minutes easily become hours...
So this code is really not performant enough to deal with such an amount of data.
10 hosts represent around 200,000 lines, which comes to roughly 20 MB of CSV.
That's not that much, but I think a shell script is probably not the best choice to manage such a process...
I guess Perl should be much better at this task (even if the shell script could probably be improved), but my knowledge of Perl is (currently) very poor, which is why I'm asking for your help... I think the code should be quite simple to write in Perl, but I can't get it to work so far...
Someone once developed a Perl script to manage NMON files and convert them to SQL files (to dump the data into a database). I reused it, and with the help of some shell scripts I post-process the SQL files to get my final CSV files.
But the TOP section was not integrated into that Perl script, so it can't be used for this without being reworked.
The code in question:
#!/usr/bin/perl
# Program name: nmon2mysql.pl
# Purpose - convert nmon.csv file(s) into mysql insert file
# Author - Bruce Spencer
# Disclaimer: this provided "as is".
# Date - March 2007
#
$nmon2mysql_ver="1.0. March 2007";
use Time::Local;
#################################################
## Your Customizations Go Here ##
#################################################
# Source directory for nmon csv files
my $NMON_DIR=$ENV{DST_TMP};
my $OUTPUT_DIR=$ENV{DST_CONVERTED_CPU_ALL};
# End "Your Customizations Go Here".
# You're on your own, if you change anything beyond this line :-)
####################################################################
############# Main Program ############
####################################################################
# Initialize common variables
&initialize;
# Process all "nmon" files located in the $NMON_DIR
# #nmon_files=`ls $NMON_DIR/*.nmon $NMON_DIR/*.csv`;
#nmon_files=`ls $NMON_DIR/*.nmon`;
if (#nmon_files eq 0 ) { die ("No \*.nmon or csv files found in $NMON_DIR\n"); }
#nmon_files=sort(#nmon_files);
chomp(#nmon_files);
foreach $FILENAME ( #nmon_files ) {
#cols= split(/\//,$FILENAME);
$BASEFILENAME= $cols[#cols-1];
unless (open(INSERT, ">$OUTPUT_DIR/$BASEFILENAME.sql")) {
die("Can not open /$OUTPUT_DIR/$BASEFILENAME.sql\n");
}
print INSERT ("# nmon version: $NMONVER\n");
print INSERT ("# AIX version: $AIXVER\n");
print INSERT ("use nmon;\n");
$start=time();
#now=localtime($start);
$now=join(":",#now[2,1,0]);
print ("$now: Begin processing file = $FILENAME\n");
# Parse nmon file, skip if unsuccessful
if (( &get_nmon_data ) gt 0 ) { next; }
$now=time();
$now=$now-$start;
print ("\t$now: Finished get_nmon_data\n");
# Static variables (number of fields always the same)
##static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE","PROC");
##static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE");
#static_vars=("CPU_ALL");
foreach $key (#static_vars) {
&mk_mysql_insert_static($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
} # end foreach
# Dynamic variables (variable number of fields)
##dynamic_vars=("DISKBSIZE","DISKBUSY","DISKREAD","DISKWRITE","DISKXFER","ESSREAD","ESSWRITE","ESSXFER","IOADAPT","NETERROR","NET","NETPACKET");
#dynamic_vars=("");
foreach $key (#dynamic_vars) {
&mk_mysql_insert_variable($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
}
close(INSERT);
# system("gzip","$FILENAME");
}
exit(0);
############################################
############# Subroutines ############
############################################
##################################################################
## Extract CPU_ALL data for Static fields
##################################################################
sub mk_mysql_insert_static {
    my($nmon_var)=@_;
    my $table=lc($nmon_var);

    my @rawdata;
    my $x;
    my @cols;
    my $comma;
    my $TS;
    my $n;

    @rawdata=grep(/^$nmon_var,/, @nmon);
    if (@rawdata < 1) { return(1); }

    @rawdata=sort(@rawdata);

    @cols=split(/,/,$rawdata[0]);
    $x=join(",",@cols[2..@cols-1]);
    $x=~ s/\%/_PCT/g;
    $x=~ s/\(MB\)/_MB/g;
    $x=~ s/-/_/g;
    $x=~ s/ /_/g;
    $x=~ s/__/_/g;
    $x=~ s/,_/,/g;
    $x=~ s/_,/,/g;
    $x=~ s/^_//;
    $x=~ s/_$//;

    print INSERT (qq|insert into $table (serialnum,hostname,mode,nmonver,time,ZZZZ,$x) values\n| );

    $comma="";
    $n=@cols;
    $n=$n-1; # number of columns -1

    for($i=1;$i<@rawdata;$i++){
        $TS=$UTC_START + $INTERVAL*($i);

        @cols=split(/,/,$rawdata[$i]);
        $x=join(",",@cols[2..$n]);
        $x=~ s/,,/,-1,/g; # replace missing data ",," with a ",-1,"

        print INSERT (qq|$comma("$SN","$HOSTNAME","$MODE","$NMONVER",$TS,"$DATETIME{@cols[1]}",$x)| );
        $comma=",\n";
    }

    print INSERT (qq|;\n\n|);
} # end mk_mysql_insert
##################################################################
## Extract CPU_ALL data for variable fields
##################################################################
sub mk_mysql_insert_variable {
    my($nmon_var)=@_;
    my $table=lc($nmon_var);

    my @rawdata;
    my $x;
    my $j;
    my @cols;
    my $comma;
    my $TS;
    my $n;
    my @devices;

    @rawdata=grep(/^$nmon_var,/, @nmon);
    if ( @rawdata < 1) { return; }

    @rawdata=sort(@rawdata);

    $rawdata[0]=~ s/\%/_PCT/g;
    $rawdata[0]=~ s/\(/_/g;
    $rawdata[0]=~ s/\)/_/g;
    $rawdata[0]=~ s/ /_/g;
    $rawdata[0]=~ s/__/_/g;
    $rawdata[0]=~ s/,_/,/g;

    @devices=split(/,/,$rawdata[0]);

    print INSERT (qq|insert into $table (serialnum,hostname,time,ZZZZ,device,value) values\n| );

    $n=@rawdata;
    $n--;

    for($i=1;$i<@rawdata;$i++){
        $TS=$UTC_START + $INTERVAL*($i);

        $rawdata[$i]=~ s/,$//;
        @cols=split(/,/,$rawdata[$i]);

        print INSERT (qq|\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[2]",$cols[2])| );

        for($j=3;$j<@cols;$j++){
            print INSERT (qq|,\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[$j]",$cols[$j])| );
        }

        if ($i < $n) { print INSERT (","); }
    }

    print INSERT (qq|;\n\n|);
} # end mk_mysql_insert_variable
########################################################
### Get an nmon setting from csv file ###
### finds first occurance of $search ###
### Return the selected column...$return_col ###
### Syntax: ###
### get_setting($search,$col_to_return,$separator)##
########################################################
sub get_setting {
    my $i;
    my $value="-1";
    my ($search,$col,$separator)= @_; # search text, $col, $separator

    for ($i=0; $i<@nmon; $i++){
        if ($nmon[$i] =~ /$search/ ) {
            $value=(split(/$separator/,$nmon[$i]))[$col];
            $value =~ s/["']*//g; # remove quote characters
            return($value);
        } # end if
    } # end for

    return($value);
} # end get_setting
#####################
## Clean up ##
#####################
sub clean_up_line {
    # remove characters not compatible with nmon variable
    # Max rrdtool variable length is 19 chars
    # Variable can not contain special characters (% - () )
    my ($x)=@_;

    # print ("clean_up, before: $i\t$nmon[$i]\n");
    $x =~ s/\%/Pct/g;
    # $x =~ s/\W*//g;
    $x =~ s/\/s/ps/g; # /s -> ps
    $x =~ s/\//s/g;   # /  -> s
    $x =~ s/\(/_/g;
    $x =~ s/\)/_/g;
    $x =~ s/ /_/g;
    $x =~ s/-/_/g;
    $x =~ s/_KBps//g;
    $x =~ s/_tps//g;
    $x =~ s/[:,]*\s*$//;

    $retval=$x;
} # end clean up
##########################################
## Extract headings from nmon csv file ##
##########################################
sub initialize {
    %MONTH2NUMBER = ("jan", 1, "feb",2, "mar",3, "apr",4, "may",5, "jun",6, "jul",7, "aug",8, "sep",9, "oct",10, "nov",11, "dec",12 );
    @MONTH2ALPHA  = ( "junk","jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec" );
} # end initialize
# Get data from nmon file, extract specific data fields (hostname, date, ...)
sub get_nmon_data {
    my $key;
    my $x;
    my $category;
    my %toc;
    my @cols;

    # Read nmon file
    unless (open(FILE, $FILENAME)) { return(1); }
    @nmon=<FILE>; # input entire file
    close(FILE);
    chomp(@nmon);

    # Cleanup nmon data: remove trailing commas and colons
    for($i=0; $i<@nmon;$i++ ) {
        $nmon[$i] =~ s/[:,]*\s*$//;
    }

    # Get nmon/server settings (search string, return column, delimiter)
    $AIXVER   =&get_setting("AIX",2,",");
    $DATE     =&get_setting("date",2,",");
    $HOSTNAME =&get_setting("host",2,",");
    $INTERVAL =&get_setting("interval",2,","); # nmon sampling interval
    $MEMORY   =&get_setting(qq|lsconf,"Good Memory Size:|,1,":");
    $MODEL    =&get_setting("modelname",3,'\s+');
    $NMONVER  =&get_setting("version",2,",");
    $SNAPSHOTS=&get_setting("snapshots",2,","); # number of readings
    $STARTTIME=&get_setting("AAA,time",2,",");

    ($HR, $MIN)=split(/\:/,$STARTTIME);

    if ($AIXVER eq "-1") {
        $SN=$HOSTNAME; # Probably a Linux host
    } else {
        $SN =&get_setting("systemid",4,",");
        $SN =(split(/\s+/,$SN))[0]; # "systemid IBM,SN ..."
    }

    $TYPE =&get_setting("^BBBP.*Type",3,",");
    if ( $TYPE =~ /Shared/ ) { $TYPE="SPLPAR"; } else { $TYPE="Dedicated"; }

    $MODE =&get_setting("^BBBP.*Mode",3,",");
    $MODE =(split(/: /, $MODE))[1];
    # $MODE =~s/\"//g;

    # Calculate UTC time (seconds since 1970)
    # NMON V9   dd/mm/yy
    # NMON V10+ dd-MMM-yyyy
    if ( $DATE =~ /[a-zA-Z]/ ) { # Alpha = assume dd-MMM-yyyy date format
        ($DAY, $MMM, $YR)=split(/\-/,$DATE);
        $MMM=lc($MMM);
        $MON=$MONTH2NUMBER{$MMM};
    } else {
        ($DAY, $MON, $YR)=split(/\//,$DATE);
        $YR=$YR + 2000;
        $MMM=$MONTH2ALPHA[$MON];
    } # end if

    ## Calculate UTC time (seconds since 1970). Required format for the rrdtool.
    ## timelocal format
    ##   day   = 1-31
    ##   month = 0-11
    ##   year  = x - 1900 (time since 1900) (seems to work with either 2006 or 106)
    $m=$MON - 1; # jan=0, feb=1, ...
    $UTC_START=timelocal(0,$MIN,$HR,$DAY,$m,$YR);
    $UTC_END=$UTC_START + $INTERVAL * $SNAPSHOTS;

    @ZZZZ=grep(/^ZZZZ,/,@nmon);
    for ($i=0;$i<@ZZZZ;$i++){
        @cols=split(/,/,$ZZZZ[$i]);
        ($DAY,$MON,$YR)=split(/-/,$cols[3]);

        $MON=lc($MON);
        $MON="00" . $MONTH2NUMBER{$MON};
        $MON=substr($MON,-2,2);

        $ZZZZ[$i]="$YR-$MON-$DAY $cols[2]";
        $DATETIME{$cols[1]}="$YR-$MON-$DAY $cols[2]";
    } # end ZZZZ

    return(0);
} # end get_nmon_data
It almost does the job (I say almost because with recent NMON versions it can sometimes have issues when no data is present), and it does it much, much faster than my shell script would if I used it for these sections.
This is why I think Perl could be a perfect solution.
Of course, I'm not asking anyone to convert my shell script into something final in Perl, but at least to point me in the right direction :-)
I really thank anyone in advance for your help!
Normally I am strongly opposed to questions like this, but our production systems are down and until they are fixed I do not really have all that much to do...
Here is some code that might get you started. Please consider it pseudo code as it is completely untested and probably won't even compile (I always forget some parentheses or semicolons and, as I said, the actual machines that can run code are unreachable), but I commented a lot and hopefully you will be able to modify it to your actual needs and get it to run.
use strict;
use warnings;

open INFILE, "<", "path/to/file.nmon" or die "Cannot open input: $!"; # Open the file.

my @topLines;                          # Initialize variables.
my %timestamps;

while (<INFILE>)                       # This will walk over all the lines of the infile,
{                                      # storing the current line in $_.
    chomp $_;                          # Remove newline at the end.

    if ($_ =~ m/^TOP/)                 # If the line starts with TOP...
    {
        push @topLines, $_;            # ...store it in the array for later use.
    }
    elsif ($_ =~ m/^ZZZZ/)             # If it is in the ZZZZ section...
    {
        my @fields = split ',', $_;    # ...split the line at commas...
        my $timestamp = join ",", $fields[2], $fields[3]; # ...join the timestamp into a string as you wish...
        $timestamps{$fields[1]} = $timestamp; # ...and store it in the hash with the Twhatever thing as key.
    }

    # This iteration could certainly be improved with more knowledge
    # of how the file looks. For example the search could be cancelled
    # after the ZZZZ section if the file is still long.
}
close INFILE;

open OUTFILE, ">", "path/to/output.csv" or die "Cannot open output: $!"; # Open the file you want your output in.

foreach (@topLines)                    # Iterate through all elements of the array,
{                                      # once again storing the current value in $_.
    my @fields = split ',', $_;        # Probably not necessary, depending on how output should be formatted.
    my $outstring = join ',', $fields[0], $fields[1], $timestamps{$fields[2]}; # And whatever other fields you care for.
    print OUTFILE $outstring, "\n";    # Print.
}
close OUTFILE;

print "Done.\n";