awk sed perl to merge rows based on keyword match - perl

I've been banging my head against the wall on this issue due to my limited awk/sed wizardry. I'm happy to use awk, sed, bash, Perl, or whatever to accomplish this text manipulation.
I have the following output and would like to merge lines based on a sort of key match:
Node: server1
Active Server: SECONDARY
Standby Server: PRIMARY
Primary 192.168.1.1
Secondary 192.168.1.2
Node: server2
Active Server: PRIMARY
Standby Server: SECONDARY
Primary 10.1.1.1
Secondary 10.1.1.2
Desired output:
Node: server1
Active Server: Secondary 192.168.1.2
Standby Server: Primary 192.168.1.1
Node: server2
Active Server: Primary 10.1.1.1
Standby Server: Secondary 10.1.1.2
So I need the lines to merge based on the words "primary" and "secondary". My first thought was to change "Primary" to "PRIMARY" so it would be easier to match.
My eventual goal is to have this:
server1,Active,192.168.1.2,Standby,192.168.1.1
server2,Active,10.1.1.1,Standby,10.1.1.2
(but I can figure this part out after help merging the rows)
Thanks for the help!

This Perl solution seems to do what you ask. It simply pulls the values into a hash line by line, and dumps the hash contents when all the required values are present.
Update: I've used any from List::Util in place of grep to make the code more legible.
use strict;
use warnings;
use autodie;
use List::Util 'any';
my @names = qw/ node active standby primary secondary /;
open my $fh, '<', 'myfile.txt';
my %server;
while (my $line = <$fh>) {
next unless my ($key, $val) = lc($line) =~ /(\w+).*\s+(\S+)/;
%server = () if $key eq 'node';
$server{$key} = $val;
unless ( any { not exists $server{$_} } @names ) {
printf "%s,Active,%s,Standby,%s\n", @server{'node', $server{active}, $server{standby}};
%server = ();
}
}
output
server1,Active,192.168.1.2,Standby,192.168.1.1
server2,Active,10.1.1.1,Standby,10.1.1.2

It is a dense and very ugly multi-liner:
perl -00 -nE'
s/ ^(\w+)\s+([\d.]+)\s* / $s{$1}=$2; ""/xmge;
($l=$_) =~ s! \s*\w+:\s*|\n !,!xg;
$l =~ s|\U$_|$s{$_}| for keys %s;
($_=$l) =~ s/^,|,$//g;
say
' file
output
server1,Active,192.168.1.2,Standby,192.168.1.1
server2,Active,10.1.1.1,Standby,10.1.1.2
Explanation
# -00 => instead of single line read lines into $_ until \n\n+
perl -00 -nE'
# read and remove 'Primary|Secondary IP' into $s{Primary} = IP
s/ ^(\w+)\s+([\d.]+)\s* / $s{$1}=$2; ""/xmge;
# replace 'something:' or new line by ','
($l=$_) =~ s! \s*\w+:\s*|\n !,!xg;
# replace SECONDARY|PRIMARY with actual IP address
$l =~ s|\U$_|$s{$_}| for keys %s;
# remove ',' at beginning and end of the string
($_=$l) =~ s/^,|,$//g;
# print result
say
' file

You can use tr to turn the newlines into spaces, then sed to put the record breaks back in the right place, and perl to get the output you want:
Input file:
tiago@dell:/tmp$ cat file
Node: server1
Active Server: SECONDARY
Standby Server: PRIMARY
Primary 192.168.1.1
Secondary 192.168.1.2
Node: server2
Active Server: PRIMARY
Standby Server: SECONDARY
Primary 10.1.1.1
Secondary 10.1.1.2
Script:
tiago@dell:/tmp$ cat test.sh
#! /bin/bash
tr '\n' ' ' < $1 | sed -r 's/(Node:)/\n\1/g' |\
perl -lne '
/^\s*$/ && next;
/Node:\s+(\S+)/ and $server=$1;
/Active Server:\s+(\S+)/ and $active=$1;
/Standby Server:\s+(\S+)/ and $standby=$1;
/Primary\s+(\S+)/ and $pri=$1;
/Secondary\s+(\S+)/ and $sec=$1;
if ( "$active" eq "PRIMARY" ){
$out="$server,Active,$pri,Standby,$sec";
}else{
$out="$server,Active,$sec,Standby,$pri";
}
print $out;
'
Execution:
tiago@dell:/tmp$ bash test.sh file
server1,Active,192.168.1.2,Standby,192.168.1.1
server2,Active,10.1.1.1,Standby,10.1.1.2

Or using a one-liner for the intermediate desired solution (final solution to follow):
perl -00 -lpe '
s/ Server: \K(\w+)(?=.*^(\1[^\n]*))/$2/ismg;
s/\n[^:]+$//;
' file.txt
Outputs:
Node: server1
Active Server: Secondary 192.168.1.2
Standby Server: Primary 192.168.1.1
Node: server2
Active Server: Primary 10.1.1.1
Standby Server: Secondary 10.1.1.2
Explanation:
Switches:
-00: process input in paragraph mode (separated by double returns)
-l: enable line ending processing
-p: assume "while (<>) { ...; print; }" loop around program
-e: evaluate perl code
Code:
Replace all Server values with a matching line that begins with the same key
Remove the server list at the bottom.
To get the final solution you want, the following one-liner will accomplish that goal.
There are some slight changes from the first solution, like using -n instead of -p, because we want to move from two newlines between records to one. However, the regex tools are the same:
perl -00 -ne'
s/ Server: (\w+)(?=.*^\1\s+(\S+))/:$2/ismg;
s/\n[^:]+$//;
s/^Node: //;
s/[\n:]/,/g;
print "$_\n";
' file.txt
Outputs:
server1,Active,192.168.1.2,Standby,192.168.1.1
server2,Active,10.1.1.1,Standby,10.1.1.2

awk '
$1 == "Active" {active = tolower($NF); next}
$1 == "Standby" {standby = tolower($NF); next}
$1 == "Primary" {ip["primary"] = $0; next}
$1 == "Secondary" {
ip["secondary"] = $0
print "Active Server:",ip[active]
print "Standby Server:",ip[standby]
next
}
1
'
This assumes the "Secondary" line is at the end of a "block".
To achieve your next output:
awk -v OFS="," '
$1 == "Node:" {node = $NF}
$1 == "Active" {active = tolower($NF)}
$1 == "Standby" {standby = tolower($NF)}
$1 == "Primary" {ip["primary"] = $2}
$1 == "Secondary" {
ip["secondary"] = $2;
print node, "Active",ip[active],"Standup",ip[standby]
}
'
Responding to jhill's comment:
awk -v RS="" -v OFS=, '{
node = active = standby = ""
delete ip
for (i=1; i<NF; i++) {
if ($i == "Node:") {node=$(++i)}
else if ($i == "Active") {active = tolower( $(i+=2) )}
else if ($i == "Standby") {standby = tolower( $(i+=2) )}
else if ($i == "Primary") {ip["primary"] = $(++i)}
else if ($i == "Secondary") {ip["secondary"] = $(++i)}
}
print node, "Active", ip[active], "Standup", ip[standby]
}'
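Against the sample input, both variants should produce the target CSV (expected output, not a captured run; note the RS="" version relies on records being separated by blank lines):
server1,Active,192.168.1.2,Standby,192.168.1.1
server2,Active,10.1.1.1,Standby,10.1.1.2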

A bit more verbose:
use strict;
use warnings;
use feature qw/say/;
my $struct;
local $/ = 'Node: ';
for my $record (<DATA>) {
next if $record =~ /^Node:/; # skip first
my ($node, @values) = split /\n\s*/, $record;
for my $line (@values) {
my ($intent, $actual, $ip);
if ( ($intent, $actual) = $line =~ /(Active|Standby) Server: (.*)$/ ) {
$struct->{$node}{lc($intent)} = lc($actual);
}
elsif ( ($actual, $ip) = $line =~ /(Primary|Secondary) (.*)$/ ) {
$struct->{$node}{lc($actual)} = $ip;
}
}
}
for my $node (sort keys %$struct) {
printf "Node: %s\n", $node;
printf "Active server: %s %s\n", ucfirst $struct->{$node}{active}, $struct->{$node}{$struct->{$node}{active}};
printf "Standby server: %s %s\n", ucfirst $struct->{$node}{standby}, $struct->{$node}{$struct->{$node}{standby}};
print "\n";
}
## Desired final output is simpler:
for my $node (sort keys %$struct) {
say join ',', $node, 'Active', $struct->{$node}{$struct->{$node}{active}}, 'Standby', $struct->{$node}{$struct->{$node}{standby}};
}
__DATA__
Node: server1
Active Server: SECONDARY
Standby Server: PRIMARY
Primary 192.168.1.1
Secondary 192.168.1.2
Node: server2
Active Server: PRIMARY
Standby Server: SECONDARY
Primary 10.1.1.1
Secondary 10.1.1.2

Here's an option in awk.
#!/usr/bin/awk -f
# Output processing goes in a function, as it's called from different places
function spew() {
split(servers[d["active"]], active);
split(servers[d["standby"]], standby);
printf("%s,%s,%s,%s,%s\n",
d["name"], active[1], active[2], standby[1], standby[2]);
}
# trim unnecessary (leading) whitespace
1 { $1=$1; }
# Store our references
$1=="Active" {
d["active"]=tolower($3);
}
#
$1=="Standby" {
d["standby"]=tolower($3);
}
# And store our data
/^ *[A-Za-z]+ [0-9.]+$/ {
servers[tolower($1)]=tolower($0);
}
# Then, if we hit a new record, process the last one.
$1=="Node:" && length(d["name"]) {
spew();
}
# And whenever we hit a new record, clear our workspace.
$1=="Node:" {
delete d;
delete servers;
d["name"]=$2;
}
# Finally, process the last record.
END {
spew();
}
An advantage of this over some of the other solutions is that it can handle names other than "primary" and "secondary". The idea is that if you have data like:
Node: serverN
Active Server: starfleet
Standby Server: babylon5
starfleet 172.16.0.1
babylon5 172.16.0.2
the Active/Standby lines will refer to a record by its index, rather than assuming "Primary" or "Secondary".
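With that input, the CSV line produced by spew() would be expected to read:
serverN,starfleet,172.16.0.1,babylon5,172.16.0.2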
I've normalized everything to lower case for easier handling, but you can of course adjust tolower() to suit.

awk ' s==0{print;s=1;next;}
s==1{i=$0;s=2;next;}
s==2{j=$0;s=3;next;}
s==3{r1=$0;s=4;next;}
s==4{r2=$0;
sub(/SECONDARY/,r2,i);sub(/PRIMARY/,r1,j);
sub(/SECONDARY/,r2,j);sub(/PRIMARY/,r1,i);
s=5; print i;print j;next}
s==5{s=0;print}' input.txt
Output:
Node: server1
Active Server: Secondary 192.168.1.2
Standby Server: Primary 192.168.1.1
Node: server2
Active Server: Primary 10.1.1.1
Standby Server: Secondary 10.1.1.2
It prints the first line of the current input section, stores the next four lines in variables, makes the replacements, and then prints the result. It then reads and prints the blank line (sections are assumed to be separated by a blank line) and starts again on the next section.

You can use this awk
awk -v RS="" '{$5=tolower($5);sub(".",substr(toupper($5),1,1),$5);$8=tolower($8);sub(".",substr(toupper($8),1,1),$8);print $1,$2"\n"$3,$4,$5,$10"\n",$6,$7,$8,$12}' file
Node: server1
Active Server: Secondary 192.168.1.2
Standby Server: Primary 192.168.1.1
Node: server2
Active Server: Primary 10.1.1.1
Standby Server: Secondary 10.1.1.2
By setting RS to the empty string, awk treats each blank-line-separated group of lines as a single record, so every field of the block can be addressed by position.

Related

New to Perl - Parsing file and replacing pattern with dynamic values

I am very new to Perl and I am currently trying to convert a bash script to Perl.
My script is used to convert nmon files (from the AIX / Linux perf monitoring tool): it takes the nmon files present in a directory, greps and redirects the specific section to a temp file, then greps and redirects the associated timestamps to another file.
Then it parses the data into a final csv file that will be indexed by a third tool to be exploited.
A sample NMON data looks like:
TOP,%CPU Utilisation
TOP,+PID,Time,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass
TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified
TOP,5365876,T0002,1.48,0.81,0.67,135,85032,132,84928,38165,1.159,0,db2sysc,Unclassified
TOP,5460056,T0002,0.32,0.27,0.05,1,5060,616,4704,1719,0.072,0,db2kmchan64.v9,Unclassified
The field "Time" (Seen as T0002 and really called ZZZZ in NMON) is a specific NMON timestamp, the real value of this timestamp is present later (in a dedicated section) in the NMON file and looks like:
ZZZZ,T0001,00:09:55,01-JAN-2014
ZZZZ,T0002,00:13:55,01-JAN-2014
ZZZZ,T0003,00:17:55,01-JAN-2014
ZZZZ,T0004,00:21:55,01-JAN-2014
ZZZZ,T0005,00:25:55,01-JAN-2014
The NMON format is very specific and can't be exploited directly without being parsed; each timestamp has to be associated with its corresponding values. (An NMON file is almost a concatenation of numerous different csv files, each with a different format, different fields, and so on.)
I wrote the following bash script to parse the section I'm interested in (the "TOP" section, which holds the top-process cpu, mem and io stats per host):
#!/bin/bash
# set -x
################################################################
# INFORMATION
################################################################
# nmon2csv_TOP.sh
# Convert TOP section of nmon files to csv
# CAUTION: This script is expected to be launched by the main workflow
# $DST and DST_CONVERTED_TOP are being exported by it, if not this script will exit at launch time
################################################################
# VARS
################################################################
# Location of NMON files
NMON_DIR=${DST}
# Location of generated files
OUTPUT_DIR=${DST_CONVERTED_TOP}
# Temp files
rawdatafile=/tmp/temp_rawdata.$$.temp
timestampfile=/tmp/temp_timestamp.$$.temp
# Main Output file
finalfile=${DST_CONVERTED_TOP}/NMON_TOP_processed_at_date_`date '+%F'`.csv
###########################
# BEGIN OF WORK
###########################
# Verify exported var are not null
if [ -z ${NMON_DIR} ]; then
echo -e "\nERROR: Var NMON_DIR is null!\n" && exit 1
elif [ -z ${OUTPUT_DIR} ]; then
echo -e "\nERROR: Var OUTPUT_DIR is null!\n" && exit 1
fi
# Check if temp and output files already exists
if [ -s ${rawdatafile} ]; then
rm -f ${rawdatafile}
elif [ -s ${timestampfile} ]; then
rm -f ${timestampfile}
elif [ -s ${finalfile} ]; then
rm -f ${finalfile}
fi
# Get current location
PWD=`pwd`
# Go to NMON files location
cd ${NMON_DIR}
# For each NMON file present:
# To restrict to only PROD env: `ls *.nmon | grep -E -i 'sp|gp|ge'`
for NMON_FILE in `ls *.nmon | grep -E -i 'sp|gp|ge'`; do
# Set Hostname identification
serialnum=`grep 'AAA,SerialNumber,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
hostname=`grep 'AAA,host,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
# Grep and redirect TOP Section
grep 'TOP' ${NMON_FILE} | grep -v 'AAA,version,TOPAS-NMON' | grep -v 'TOP,%CPU Utilisation' > ${rawdatafile}
# Grep and redirect associated timestamps (ZZZZ)
grep 'ZZZZ' ${NMON_FILE}> ${timestampfile}
# Begin of work
while IFS=, read TOP PID Time Pct_CPU Pct_Usr Pct_Sys Threads Size ResText ResData CharIO Pct_RAM Paging Command WLMclass
do
timestamp=`grep ${Time} ${timestampfile} | awk -F, '{print $4 " "$3}' OFS=,`
echo ${serialnum},${hostname},${timestamp},${Time},${PID},${Pct_CPU},${Pct_Usr},${Pct_Sys},${Threads},${Size},${ResText},${ResData},${CharIO},${Pct_RAM},${Paging},${Command},${WLMclass} \
| grep -v '+PID,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass' >> ${finalfile}
done < ${rawdatafile}
echo -e "INFO: Done for Serialnum: ${serialnum} Hostname: ${hostname}"
done
# Go back to initial location
cd ${PWD}
###########################
# END OF WORK
###########################
This works as wanted and generates a main csv file (you'll see in the code that I voluntarily don't keep the csv header in the file) which is a concatenation of all parsed hosts.
But I have a very large number of hosts to treat each day (around 3000 hosts); with the current code, in the worst cases, it can take a few minutes to generate the data for one host, and multiplied by the number of hosts, minutes easily become hours...
So this code is really not performant enough to deal with such an amount of data.
10 hosts represent around 200,000 lines, which finally makes around 20 MB of csv file.
That's not that much, but I think that a shell script is probably not the best choice to manage such a process...
I guess that Perl should be much better at this task (even if the shell script could probably be improved), but my knowledge of Perl is (currently) very poor; this is why I ask for your help... I think the code should be quite simple to do in Perl, but I can't get it to work for now...
Someone once developed a Perl script to manage NMON files and convert them to sql files (to dump the data into a database); I reused it for what it covers, and with the help of some shell scripts I process the sql files to get my final csv files.
But the TOP section was not integrated into that Perl script, so it can't be used for that without being redeveloped.
The code in question:
#!/usr/bin/perl
# Program name: nmon2mysql.pl
# Purpose - convert nmon.csv file(s) into mysql insert file
# Author - Bruce Spencer
# Disclaimer: this provided "as is".
# Date - March 2007
#
$nmon2mysql_ver="1.0. March 2007";
use Time::Local;
#################################################
## Your Customizations Go Here ##
#################################################
# Source directory for nmon csv files
my $NMON_DIR=$ENV{DST_TMP};
my $OUTPUT_DIR=$ENV{DST_CONVERTED_CPU_ALL};
# End "Your Customizations Go Here".
# You're on your own, if you change anything beyond this line :-)
####################################################################
############# Main Program ############
####################################################################
# Initialize common variables
&initialize;
# Process all "nmon" files located in the $NMON_DIR
# @nmon_files=`ls $NMON_DIR/*.nmon $NMON_DIR/*.csv`;
@nmon_files=`ls $NMON_DIR/*.nmon`;
if (@nmon_files eq 0 ) { die ("No \*.nmon or csv files found in $NMON_DIR\n"); }
@nmon_files=sort(@nmon_files);
chomp(@nmon_files);
foreach $FILENAME ( @nmon_files ) {
@cols= split(/\//,$FILENAME);
$BASEFILENAME= $cols[@cols-1];
unless (open(INSERT, ">$OUTPUT_DIR/$BASEFILENAME.sql")) {
die("Can not open /$OUTPUT_DIR/$BASEFILENAME.sql\n");
}
print INSERT ("# nmon version: $NMONVER\n");
print INSERT ("# AIX version: $AIXVER\n");
print INSERT ("use nmon;\n");
$start=time();
@now=localtime($start);
$now=join(":",@now[2,1,0]);
print ("$now: Begin processing file = $FILENAME\n");
# Parse nmon file, skip if unsuccessful
if (( &get_nmon_data ) gt 0 ) { next; }
$now=time();
$now=$now-$start;
print ("\t$now: Finished get_nmon_data\n");
# Static variables (number of fields always the same)
##static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE","PROC");
##static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE");
#static_vars=("CPU_ALL");
foreach $key (#static_vars) {
&mk_mysql_insert_static($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
} # end foreach
# Dynamic variables (variable number of fields)
##dynamic_vars=("DISKBSIZE","DISKBUSY","DISKREAD","DISKWRITE","DISKXFER","ESSREAD","ESSWRITE","ESSXFER","IOADAPT","NETERROR","NET","NETPACKET");
#dynamic_vars=("");
foreach $key (#dynamic_vars) {
&mk_mysql_insert_variable($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
}
close(INSERT);
# system("gzip","$FILENAME");
}
exit(0);
############################################
############# Subroutines ############
############################################
##################################################################
## Extract CPU_ALL data for Static fields
##################################################################
sub mk_mysql_insert_static {
my($nmon_var)=@_;
my $table=lc($nmon_var);
my @rawdata;
my $x;
my @cols;
my $comma;
my $TS;
my $n;
@rawdata=grep(/^$nmon_var,/, @nmon);
if (@rawdata < 1) { return(1); }
@rawdata=sort(@rawdata);
@cols=split(/,/,$rawdata[0]);
$x=join(",",@cols[2..@cols-1]);
$x=~ s/\%/_PCT/g;
$x=~ s/\(MB\)/_MB/g;
$x=~ s/-/_/g;
$x=~ s/ /_/g;
$x=~ s/__/_/g;
$x=~ s/,_/,/g;
$x=~ s/_,/,/g;
$x=~ s/^_//;
$x=~ s/_$//;
print INSERT (qq|insert into $table (serialnum,hostname,mode,nmonver,time,ZZZZ,$x) values\n| );
$comma="";
$n=@cols;
$n=$n-1; # number of columns -1
for($i=1;$i<@rawdata;$i++){
$TS=$UTC_START + $INTERVAL*($i);
@cols=split(/,/,$rawdata[$i]);
$x=join(",",@cols[2..$n]);
$x=~ s/,,/,-1,/g; # replace missing data ",," with a ",-1,"
print INSERT (qq|$comma("$SN","$HOSTNAME","$MODE","$NMONVER",$TS,"$DATETIME{@cols[1]}",$x)| );
$comma=",\n";
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert
##################################################################
## Extract CPU_ALL data for variable fields
##################################################################
sub mk_mysql_insert_variable {
my($nmon_var)=@_;
my $table=lc($nmon_var);
my @rawdata;
my $x;
my $j;
my @cols;
my $comma;
my $TS;
my $n;
my @devices;
@rawdata=grep(/^$nmon_var,/, @nmon);
if ( @rawdata < 1) { return; }
@rawdata=sort(@rawdata);
$rawdata[0]=~ s/\%/_PCT/g;
$rawdata[0]=~ s/\(/_/g;
$rawdata[0]=~ s/\)/_/g;
$rawdata[0]=~ s/ /_/g;
$rawdata[0]=~ s/__/_/g;
$rawdata[0]=~ s/,_/,/g;
@devices=split(/,/,$rawdata[0]);
print INSERT (qq|insert into $table (serialnum,hostname,time,ZZZZ,device,value) values\n| );
$n=@rawdata;
$n--;
for($i=1;$i<@rawdata;$i++){
$TS=$UTC_START + $INTERVAL*($i);
$rawdata[$i]=~ s/,$//;
@cols=split(/,/,$rawdata[$i]);
print INSERT (qq|\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[2]",$cols[2])| );
for($j=3;$j<@cols;$j++){
print INSERT (qq|,\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[$j]",$cols[$j])| );
}
if ($i < $n) { print INSERT (","); }
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert_variable
########################################################
### Get an nmon setting from csv file ###
### finds first occurance of $search ###
### Return the selected column...$return_col ###
### Syntax: ###
### get_setting($search,$col_to_return,$separator)##
########################################################
sub get_setting {
my $i;
my $value="-1";
my ($search,$col,$separator)= @_; # search text, $col, $separator
for ($i=0; $i<@nmon; $i++){
if ($nmon[$i] =~ /$search/ ) {
$value=(split(/$separator/,$nmon[$i]))[$col];
$value =~ s/["']*//g; #remove non alphanum characters
return($value);
} # end if
} # end for
return($value);
} # end get_setting
#####################
## Clean up ##
#####################
sub clean_up_line {
# remove characters not compatible with nmon variable
# Max rrdtool variable length is 19 chars
# Variable can not contain special characters (% - () )
my ($x)=@_;
# print ("clean_up, before: $i\t$nmon[$i]\n");
$x =~ s/\%/Pct/g;
# $x =~ s/\W*//g;
$x =~ s/\/s/ps/g; # /s - ps
$x =~ s/\//s/g; # / - s
$x =~ s/\(/_/g;
$x =~ s/\)/_/g;
$x =~ s/ /_/g;
$x =~ s/-/_/g;
$x =~ s/_KBps//g;
$x =~ s/_tps//g;
$x =~ s/[:,]*\s*$//;
$retval=$x;
} # end clean up
##########################################
## Extract headings from nmon csv file ##
##########################################
sub initialize {
%MONTH2NUMBER = ("jan", 1, "feb",2, "mar",3, "apr",4, "may",5, "jun",6, "jul",7, "aug",8, "sep",9, "oct",10, "nov",11, "dec",12 );
#MONTH2ALPHA = ( "junk","jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec" );
} # end initialize
# Get data from nmon file, extract specific data fields (hostname, date, ...)
sub get_nmon_data {
my $key;
my $x;
my $category;
my %toc;
my #cols;
# Read nmon file
unless (open(FILE, $FILENAME)) { return(1); }
#nmon=<FILE>; # input entire file
close(FILE);
chomp(#nmon);
# Cleanup nmon data remove trainig commas and colons
for($i=0; $i<#nmon;$i++ ) {
$nmon[$i] =~ s/[:,]*\s*$//;
}
# Get nmon/server settings (search string, return column, delimiter)
$AIXVER =&get_setting("AIX",2,",");
$DATE =&get_setting("date",2,",");
$HOSTNAME =&get_setting("host",2,",");
$INTERVAL =&get_setting("interval",2,","); # nmon sampling interval
$MEMORY =&get_setting(qq|lsconf,"Good Memory Size:|,1,":");
$MODEL =&get_setting("modelname",3,'\s+');
$NMONVER =&get_setting("version",2,",");
$SNAPSHOTS =&get_setting("snapshots",2,","); # number of readings
$STARTTIME =&get_setting("AAA,time",2,",");
($HR, $MIN)=split(/\:/,$STARTTIME);
if ($AIXVER eq "-1") {
$SN=$HOSTNAME; # Probably a Linux host
} else {
$SN =&get_setting("systemid",4,",");
$SN =(split(/\s+/,$SN))[0]; # "systemid IBM,SN ..."
}
$TYPE =&get_setting("^BBBP.*Type",3,",");
if ( $TYPE =~ /Shared/ ) { $TYPE="SPLPAR"; } else { $TYPE="Dedicated"; }
$MODE =&get_setting("^BBBP.*Mode",3,",");
$MODE =(split(/: /, $MODE))[1];
# $MODE =~s/\"//g;
# Calculate UTC time (seconds since 1970)
# NMON V9 dd/mm/yy
# NMON V10+ dd-MMM-yyyy
if ( $DATE =~ /[a-zA-Z]/ ) { # Alpha = assume dd-MMM-yyyy date format
($DAY, $MMM, $YR)=split(/\-/,$DATE);
$MMM=lc($MMM);
$MON=$MONTH2NUMBER{$MMM};
} else {
($DAY, $MON, $YR)=split(/\//,$DATE);
$YR=$YR + 2000;
$MMM=$MONTH2ALPHA[$MON];
} # end if
## Calculate UTC time (seconds since 1970). Required format for the rrdtool.
## timelocal format
## day=1-31
## month=0-11
## year = x -1900 (time since 1900) (seems to work with either 2006 or 106)
$m=$MON - 1; # jan=0, feb=1, ...
$UTC_START=timelocal(0,$MIN,$HR,$DAY,$m,$YR);
$UTC_END=$UTC_START + $INTERVAL * $SNAPSHOTS;
@ZZZZ=grep(/^ZZZZ,/,@nmon);
for ($i=0;$i<@ZZZZ;$i++){
@cols=split(/,/,$ZZZZ[$i]);
($DAY,$MON,$YR)=split(/-/,$cols[3]);
$MON=lc($MON);
$MON="00" . $MONTH2NUMBER{$MON};
$MON=substr($MON,-2,2);
$ZZZZ[$i]="$YR-$MON-$DAY $cols[2]";
$DATETIME{$cols[1]}="$YR-$MON-$DAY $cols[2]";
} # end ZZZZ
return(0);
} # end get_nmon_data
It almost does the job (I say almost, because with recent NMON versions it can sometimes have issues when no data is present), and it does it much, much faster than my shell script would if I used it for these sections.
This is why I think Perl should be a perfect solution.
Of course, I don't ask anyone to convert my shell script into something final in Perl, but at least to point me in the right direction :-)
I really thank anyone in advance for your help!
Normally I am strongly opposed to questions like this, but our production systems are down and until they are fixed I do not really have all that much to do...
Here is some code that might get you started. Please consider it pseudocode, as it is completely untested and probably won't even compile (I always forget some parentheses or semicolons and, as I said, the actual machines that can run code are unreachable), but I commented a lot and hopefully you will be able to modify it to your actual needs and get it to run.
use strict;
use warnings;
open INFILE, "<", "path/to/file.nmon"; # Open the file.
my @topLines; # Initialize variables.
my %timestamps;
while <INFILE> # This will walk over all the lines of the infile.
{ # Storing the current line in $_.
chomp $_; # Remove newline at the end.
if ($_ =~ m/^TOP/) # If the line starts with TOP...
{
push @topLines, $_; # ...store it in the array for later use.
}
elsif ($_ =~ m/^ZZZZ/) # If it is in the ZZZZ section...
{
my @fields = split ',', $_; # ...split the line at commas...
my $timestamp = join ",", $fields[2], $fields[3]; # ...join the timestamp into a string as you wish...
$timestamps{$fields[1]} = $timestamp; # ...and store it in the hash with the Twhatever thing as key.
}
# This iteration could certainly be improved with more knowledge
# of how the file looks. For example the search could be cancelled
# after the ZZZZ section if the file is still long.
}
close INFILE;
open OUTFILE, ">", "path/to/output.csv"; # Open the file you want your output in.
foreach (@topLines) # Iterate through all elements of the array.
{ # Once again storing the current value in $_.
my @fields = split ',', $_; # Probably not necessary, depending on how output should be formatted.
my $outstring = join ',', $fields[0], $fields[1], $timestamps{$fields[2]}; # And whatever other fields you care for.
print OUTFILE $outstring, "\n"; # Print.
}
close OUTFILE;
print "Done.\n";

Find matches in a log file based on the time and ID

I have a radius log file which is comma separated.
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
Is it possible through any Linux command line tool like awk to count the number of occurrences where the second column (the time) and the seventh column (the number) are the same, and a Start event follows a Stop event?
I want to find the occurrences where a Stop is followed by a Start at the same time for the same number.
There will be other entries as well with the same timestamp between these cases.
You don't say very clearly what kind of result you want, but you should use Perl with Text::CSV to process CSV files.
This program just prints the three relevant fields from all lines of the file where the event is Start or Stop and the time and the ID string are duplicated.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new;
open my $fh, '<', 'text.csv' or die $!;
my %data;
while (my $row = $csv->getline($fh)) {
my ($time, $event, $id) = #$row[1,3,6];
next unless $event eq 'Start' or $event eq 'Stop';
push #{ $data{"$time/$id"} }, $row;
}
for my $lines (values %data) {
next unless #$lines > 1;
print "#{$_}[1,3,6]\n" for #$lines;
print "\n";
}
output
00:52:23 Stop 15444111111
00:52:23 Start 15444111111
I have tried the following using GNU sed & awk
sed -n '/Stop/,/Start/{/Stop/{h};/Start/{H;x;p}}' text.csv \
| awk -F, 'NR%2 != 0 {prev=$0;time=$2;num=$7} \
NR%2 == 0 {if($2==time && $7==num){print prev,"\n", $0}}'
The sed part selects each pairing of a Stop line and the following Start line. There can be other lines between the two, and if there are multiple Stop lines before a Start line, the last Stop line is selected (this may not be necessary in this case...).
The awk part compares the pairs selected by the sed part; if the second and seventh columns are identical, the pair is printed out.
My test as below:
text.csv:
"1/3/2013","00:52:20","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","XXXX","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","XXXX","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
"1/3/2013","00:52:28","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:29","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
The output:
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
If the "stop" line is followed immediately by the "start" line, you could try the following:
awk -f cnt.awk input.txt
where cnt.awk is
BEGIN {
FS=","
}
$4=="\"Stop\"" {
key=($2 $5)
startl=$0
getline
if ($4=="\"Start\"") {
if (key==($2 $5)) {
print startl
print $0
}
}
}
Update
If there can be other lines between a "Start" and "Stop" line, you could try:
BEGIN {
FS=","
}
$4=="\"Stop\"" {
a[($2 $5)]=$0
next
}
$4=="\"Start\"" {
key=($2 $5)
if (key in a) {
sl[++i]=a[key]
el[i]=$0
}
}
END {
nn=i
for (i=1; i<=nn; i++) {
print sl[i]
print el[i]
}
}

Fast alternative to grep -f

file.contain.query.txt
ENST001
ENST002
ENST003
file.to.search.in.txt
ENST001 90
ENST002 80
ENST004 50
Because ENST003 has no entry in 2nd file and ENST004 has no entry in 1st file the expected output is:
ENST001 90
ENST002 80
To grep multiple queries in a particular file, we usually do the following:
grep -f file.contain.query <file.to.search.in >output.file
Since I have around 10,000 queries and almost 100,000 rows in file.to.search.in, it takes a very long time to finish (like 5 hours). Is there a faster alternative to grep -f?
If you want a pure Perl option, read your query file keys into a hash table, then check standard input against those keys:
#!/usr/bin/env perl
use strict;
use warnings;
# build hash table of keys
my $keyring;
open KEYS, "< file.contain.query.txt";
while (<KEYS>) {
chomp $_;
$keyring->{$_} = 1;
}
close KEYS;
# look up key from each line of standard input
while (<STDIN>) {
chomp $_;
my ($key, $value) = split("\t", $_); # assuming search file is tab-delimited; replace delimiter as needed
if (defined $keyring->{$key}) { print "$_\n"; }
}
You'd use it like so:
lookup.pl < file.to.search.txt
A hash table can take a fair amount of memory, but searches are much faster (hash table lookups are in constant time), which is handy since you have 10-fold more keys to lookup than to store.
If you have fixed strings, use grep -F -f. This is significantly faster than regex search.
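With the file names from the question, that would be simply:
grep -F -f file.contain.query.txt file.to.search.in.txt > output.file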
This Perl code may help you:
use strict;
open my $file1, "<", "file.contain.query.txt" or die $!;
open my $file2, "<", "file.to.search.in.txt" or die $!;
my %KEYS = ();
# Hash %KEYS marks the filtered keys by "file.contain.query.txt" file
while(my $line=<$file1>) {
chomp $line;
$KEYS{$line} = 1;
}
while(my $line=<$file2>) {
if( $line =~ /(\w+)\s+(\d+)/ ) {
print "$1 $2\n" if $KEYS{$1};
}
}
close $file1;
close $file2;
If the files are already sorted:
join file1 file2
if not:
join <(sort file1) <(sort file2)
If you are using perl version 5.10 or newer, you can join the 'query' terms into a regular expression with the query terms separated by the 'pipe' (like ENST001|ENST002|ENST003). Perl builds a 'trie' which, like a hash, does lookups in constant time. It should run as fast as the solution using a lookup hash. Just to show another way to do this.
#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;
my $query = join "|", map {chomp; $_} <QUERY>;
while (<RAW>) {
print if /^(?:$query)\s/;
}
__QUERY__
ENST001
ENST002
ENST003
__RAW__
ENST001 90
ENST002 80
ENST004 50
MySQL:
Importing the data into MySQL or a similar database will provide an immense improvement. Would this be feasible? You could see results in a few seconds.
mysql -e 'select search.* from search join contains using (keyword)' > outfile.txt
# but first you need to create the tables like this (only once off)
create table contains (
keyword varchar(255)
, primary key (keyword)
);
create table search (
keyword varchar(255)
,num bigint
,key (keyword)
);
# and load the data in:
load data infile 'file.contain.query.txt'
into table contains fields terminated by "add column separator here";
load data infile 'file.to.search.in.txt'
into table search fields terminated by "add column separator here";
use strict;
use warnings;
system("sort file.contain.query.txt > qsorted.txt");
system("sort file.to.search.in.txt > dsorted.txt");
open (QFILE, "<qsorted.txt") or die();
open (DFILE, "<dsorted.txt") or die();
while (my $qline = <QFILE>) {
my ($queryid) = ($qline =~ /ENST(\d+)/);
while (my $dline = <DFILE>) {
my ($dataid) = ($dline =~ /ENST(\d+)/);
if ($dataid == $queryid) { print $dline; }
elsif ($dataid > $queryid) { last; }
}
}
This may be a little dated, but is tailor-made for simple UNIX utilities. Given:
keys are fixed-length (here 7 chars)
files are sorted (true in the example) allowing the use of fast merge sort
Then:
$ sort -m file.contain.query.txt file.to.search.in.txt | tac | uniq -d -w7
ENST002 80
ENST001 90
Variants:
To strip the number printed after the key, remove the tac command:
$ sort -m file.contain.query.txt file.to.search.in.txt | uniq -d -w7
To keep sorted order, add an extra tac command at the end:
$ sort -m file.contain.query.txt file.to.search.in.txt | tac | uniq -d -w7 | tac

Perl / grep / awk -- splitting multiple results, if statement checking for second string

I am trying to grep a file for the first 2 matches of a string (there will only ever be a maximum 2 matches) including some context (grep -B 1 -A 5), split each set of 7 lines into two separate variables and write an if statement based on whether or not each set contains a different string.
In some cases, the file may contain only one match.
I know how to grep for the two matches, but not how to split them into separate variables. I can also write an if statement to check if the variable is empty (indicating a lack of a second match). I am not sure how to check each variable to see if it contains the second string. Any assistance would be helpful. Thanks!
Example:
grep -B1 -A5 "Resolution:" file.txt
Color LCD:
Resolution: 1440 x 900
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
LED Cinema Display:
Resolution: 1920 x 1200
Depth: 32-Bit Color
Core Image: Hardware Accelerated
Mirror: Off
Online: Yes
Quartz Extreme: Supported
Desired result based on whether or not each match set contains "Main Display":
$mainDisplay = Color LCD
$secondDisplay = LED Cinema Display (or null indicating no second match)
Your file is valid YAML, so if you have the YAML Perl module installed, here is a one-liner:
eval $(perl -MYAML -0777 -e '$r=Load(<>);map { exists($r->{$_}->{"Main Display"}) ? print "main=\"$_\";\n" : print "second=\"$_\";\n" } keys %$r' < filename.txt)
echo =$main= =$second=
so, after the eval, you have the shell variables main and second
or, exactly for your OS X, with the system_profiler command:
eval $(
system_profiler SPDisplaysDataType |\
grep -B1 -A5 'Resolution:' |\
perl -MYAML -0777 -e '$r=Load(<>);map { printf "%s=\"%s\"\n", exists($r->{$_}->{"Main Display"}) ? "main" : "second", $_ } keys %$r'
)
echo =$main=$second=
my($first, $second) = split /--\n/, qx/grep -B1 -A5 foo data.text/;
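From there, something along these lines (an untested sketch) would pick out which chunk is which; note each variable still holds a whole seven-line chunk, so the display name is the text before the first colon:
my ($mainDisplay) = grep { defined && /Main Display: Yes/ } $first, $second;
my ($secondDisplay) = grep { defined && !/Main Display: Yes/ } $first, $second;
If there is no second match, $secondDisplay stays undef, which gives you the "null" case from the question.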
awk:
awk -F : '
/^[^[:space:]]/ {current = $1; devices[$1]++}
$1 ~ /Main Display/ {main = current}
END {
for (d in devices)
if (d == main)
print "mainDisplay=\"" d "\""
else
print "secondDisplay=\"" d "\""
}
'
outputs
mainDisplay="Color LCD"
secondDisplay="LED Cinema Display"
which you can capture and eval in the shell.
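For instance, assuming the awk program above is saved as displays.awk (hypothetical name):
eval "$(awk -F : -f displays.awk file.txt)"
echo "$mainDisplay / $secondDisplay"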
Here's a perl solution. Use it like so: script.pl Resolution:. Default search is "Resolution:".
The values are stored in %values, for example:
$values{'Color LCD'}{'Resolution'} eq "1440 x 900";
use strict;
use warnings;
my $grep = shift || "Resolution:";
my %values;
my $pre;
while (my $line = <DATA>) {
chomp $line;
if ($line =~ /$grep/) {
my @data;
push @data, scalar <DATA> for (0 .. 4);
chomp @data;
for my $pair ($line, @data) {
if ($pair =~ /^([^:]+): (.*)$/) {
$values{$pre}{$1} = $2;
} else { die "Unexpected data: $pair" }
}
} else {
($pre = $line) =~ s/:$//; # strip the trailing colon so the display name is the hash key
}
}
use Data::Dumper;
print Dumper \%values;
__DATA__
Color LCD:
Resolution: 1440 x 900
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
LED Cinema Display:
Resolution: 1920 x 1200
Depth: 32-Bit Color
Core Image: Hardware Accelerated
Mirror: Off
Online: Yes
Quartz Extreme: Supported

How can I replace only complete IP addresses in a file using Perl?

I used the following Perl syntax in order to replace strings or IP addresses in a file:
OLD=aaa.bbb.ccc.ddd (old IP address)
NEW=yyy.zzz.www.qqq (new IP address)
export OLD
export NEW
perl -pe 'next if /^ *#/; s/\Q$ENV{OLD}\E/$1$ENV{NEW}$2/' file
example of problem:
I want to change the IP address in file from 1.1.1.1 to 5.5.5.5
But I get the following:
more file (before change)
11.1.1.10 machine_moon1
more file (after change)
15.5.5.50 machine_moon1
According to "after change example) the IP "11.1.1.10" must to stay as it is , because I want to change only the 1.1.1.1 and not 11.1.1.10
I need help about my perl one line syntax:
How to change my perl syntax only according to the following rule:
RULE: Not change the IP address if:left IP side or right IP side have number/s
Example
IP=1.1.1.1
IP=10.10.1.11
IP=yyy.yyy.yyy.yyy
[number]1.1.1.1[number] - then not replace
[number]10.10.1.11[number] - then not replace
[number]yyy.yyy.yyy.yyy[number] - then not replace
Other cases:
[any character beside number ]yyy.yyy.yyy.yyy[[any character beside number ]] - then replace
Here's what you start with:
OLD=1.1.1.1
NEW=5.5.5.5
export OLD
export NEW
~/sandbox/$ cat file
1.1.1.10 machine1
11.1.1.10 machine2
11.1.1.1 machine3
1.1.1.1 machine4
A1.1.1.1 machine5
A1.1.1.1Z machine6
1.1.1.1Z machine7
If you anchor the patterns to only match on word boundaries or non-digits (see perlre), you should only match a complete IP address:
~/sandbox/$ perl -pe 'next if /^ *#/; s/(\b|\D)$ENV{OLD}(\b|\D)/$1$ENV{NEW}$2/' file
1.1.1.10 machine1
11.1.1.10 machine2
11.1.1.1 machine3
5.5.5.5 machine4
A5.5.5.5 machine5
A5.5.5.5Z machine6
5.5.5.5Z machine7
You should use look-behind and look-ahead syntax; see a good article on PerlMonks: http://www.perlmonks.org/?node_id=518444
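For example, a sketch using the same OLD/NEW environment variables as above, replacing only when no digit is adjacent on either side:
perl -pe 'next if /^ *#/; s/(?<!\d)\Q$ENV{OLD}\E(?!\d)/$ENV{NEW}/g' file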
It might be easier to write a short script to do this.
use strict;
use autodie;
my $old_ip = '10.1.1.1'; # or $ENV{'OLD'}
my $new_ip = '50.5.5.5'; # or $ENV{'NEW'}
open my $infh, '<', $ARGV[0];
open my $outfh, '>', $ARGV[1];
while ( my $line = <$infh> ) {
chomp $line;
my @elems = split /\s+/, $line;
$elems[0] = $new_ip if $elems[0] eq $old_ip;
print $outfh join(" ", @elems) . "\n";
}
close $outfh;
close $infh;
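Usage would be something like (hypothetical script name):
perl replace_ip.pl oldfile newfile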