awk - Call external command and populate output before the first column - date

I have a file that contains some information about daily storage utilization. There are two columns - DD.MM date and usage in KB for every day.
I'm using awk to show the difference between each line (from the second onward) and the previous one, in GB, i.e. the daily increase in storage usage.
Example file:
20.09 10485760
21.09 20971520
22.09 26214400
23.09 27262976
My awk command:
awk 'NR > 1 {a=($2-prev)/1024^2" GB"} {prev=$2} {print $1,$2,a}' file
This outputs:
20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB
I would also like to add the weekday name before the first column. The date format in the file is always DD.MM, so, to make GNU date accept it as valid input and return the weekday name, I composed this pipeline:
echo '20.09.2022' | awk -v FS=. -v OFS=- '{print $3,$2,$1}' | date -f - +%a
It works, but I want to call it from the first awk for every processed line, passing the first-column date (with ".2022" appended so GNU date accepts it) as the argument, and put the output of this external pipeline (the weekday name) before the date in the first column.
Example output:
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
I looked at the system() option in awk, but I couldn't make it work with my pipeline and my first awk command.

1st solution: Using getline within awk, please try the following solution.
awk '
NR>1{
  a=($2-prev)/1024^2" GB"
}
{
  split($1,arr,".")
  value="2022-"arr[2]"-"arr[1]
  dateVal="date -d \"" value "\" +%a"
  newVal = ( (dateVal | getline line) > 0 ? line : "N/A" )
  close(dateVal)
  print newVal,$0,a
  prev=$2
}
' Input_file
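As a side note, the cmd | getline idiom above is easy to test on its own; a minimal sketch (GNU date assumed, no input file needed):
awk 'BEGIN {
  cmd = "date -d \"2022-09-20\" +%a"   # build the shell command as a string
  if ((cmd | getline day) > 0)         # read the first line of its output
    print day                          # -> Tue
  close(cmd)                           # close the pipe so it can be rerun later
}'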
2nd solution: With your shown samples, please try the following awk code. What the system command does in awk is run the given command in a separate shell, so you are effectively calling awk-->system-->shell-->command, and the command's output is printed by the shell rather than returned to awk, which means it can't be merged with awk's own output. So instead, compute all the weekday names with one awk pass (based on the 1st field of your Input_file) and pass that as input to another awk that does the actual space calculations and merges the two. We could also do it with a while loop, but IMHO doing it with awk should be faster.
awk '
FNR==NR{
  arr[FNR]=$0
  next
}
FNR>1{
  a=($2-prev)/1024^2" GB"
}
{
  print arr[FNR],$1,$2,a
  prev=$2
}
' <(awk '{split($1,arr,".");system("d=\"2022-" arr[2]"-"arr[1]"\";date -d \"$d\" +%a")}' Input_file) Input_file
Output with shown samples will be as follows:
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
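To see the limitation described above in isolation (a minimal sketch): system() returns only the command's exit status, while the command's output goes straight to the terminal, bypassing awk:
$ awk 'BEGIN { status = system("date -d 2022-09-20 +%a"); print "captured:", status }'
Tue
captured: 0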

Since you have GNU date you should also have GNU awk which has builtin time functions that'll be orders of magnitude faster than awk spawning a subshell to call date for each input line:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
    year = strftime("%Y")
}
NR > 1 {
    diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
}
{
    split($1,dayMth,/[.]/)
    secs = mktime(year " " dayMth[2] " " dayMth[1] " 12 0 0")
    day = strftime("%a",secs)
    print day, $0, diff
    prev = $2
}
' "${@:--}"
$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
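As a side note, the " 12 0 0" handed to mktime pins the time to noon, which sidesteps DST transitions where midnight may not exist. A quick standalone check of the mktime/strftime pair (GNU awk only):
$ gawk 'BEGIN { print strftime("%a", mktime("2022 09 20 12 0 0")) }'
Tue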
If for some reason you don't have GNU awk and can't get it then this 2-pass approach would work fairly efficiently using GNU date and any awk:
$ cat tst.sh
#!/usr/bin/env bash
awk -v year="$(date +'%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$@" |
date -f- +'%a' |
awk '
NR == FNR {
    days[NR] = $1
    next
}
FNR > 1 {
    diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
}
{
    print days[FNR], $0, diff
    prev = $2
}
' - "$@"
$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
The downside to that 2nd script is it couldn't read input from a stream, only from a file, since it has to read it twice. If that's an issue and your input isn't too massive to fit a copy on disk then you could always use a temp file, e.g.:
$ cat tst.sh
#!/usr/bin/env bash
tmp=$(mktemp) &&
trap 'rm -f "$tmp"; exit' 0 &&
cat "${@:--}" > "$tmp" || exit 1
awk -v year="$(date +'%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$tmp" |
date -f- +'%a' |
awk '
NR == FNR {
    days[NR] = $1
    next
}
FNR > 1 {
    diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
}
{
    print days[FNR], $0, diff
    prev = $2
}
' - "$tmp"
$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

date can process multiple newline-separated dates, therefore I propose the following solution; let the content of file.txt be
20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB
then
awk 'BEGIN{FS="[[:space:].]";OFS="-"}{print "2022",$2,$1}' file.txt | date -f - +%a | paste -d ' ' - file.txt
gives output
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
Explanation: I use GNU AWK to extract and prepare each date for consumption by date, so 20.09 becomes 2022-09-20 and so on; then date is used to compute the short name of the day of the week; then paste is used to put the columns side by side, separated by a space character. The 1st column is -, meaning standard input, and the 2nd column is the unchanged file.txt.
(tested in GNU Awk 5.0.1 and paste (GNU coreutils) 8.30)

Who says you can't use system() to get the weekday? This function also comes with automatic GNU-date vs. BSD-date detection (by way of GNU date's ability to return up to nanosecond precision, something BSD date lacks), and adjusts its calling syntax accordingly:
jot -w '2022-09-%d' 30 | gtail -n 12 |
mawk 'function ____(_) {
return \
substr("SunMonTueWedThuFriSat",(_=\
system("exit \140 date -" (\
system("exit \140date +\"%s%6N"\
"\" |grep -cF N\140") ? "j -f " \
"\"%Y-%m-%d\"":"d") " \""(_) \
"\" +%w \140")) +_+_+(_^=_<_),_+_+_)
} ($++NF=____($!_))^_'
2022-09-19 Mon
2022-09-20 Tue
2022-09-21 Wed
2022-09-22 Thu
2022-09-23 Fri
2022-09-24 Sat
2022-09-25 Sun
2022-09-26 Mon
2022-09-27 Tue
2022-09-28 Wed
2022-09-29 Thu
2022-09-30 Fri
system() can return an unsigned integer from 0 to 255 if you explicitly set the command's exit code to whatever value you desire, so as long as the range of values needed fits within 256 (or can be binned into it), one can leverage system() and get the result quicker than a full getline routine.
But since this workaround only returns numeric values, it can't directly use a text formatting code such as date +'%a'.
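For example, here is a readable (non-golfed) sketch of the same trick, assuming GNU date: smuggle date +%w (a weekday number, 0-6) out through the exit status, then decode it back into a name:
$ awk 'BEGIN {
  dow = system("exit $(date -d 2022-09-20 +%w)")   # the exit status carries the weekday number
  print substr("SunMonTueWedThuFriSat", dow*3 + 1, 3)
}'
Tue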

Related

sh: add command string output to array

In sh I want to create an array with one element that is the current date and time, formatted with spaces.
$ date +"%b %d %H:%m"
Jun 23 16:06
Here are some things that don't work:
$ date +"%b %d %H:%m"
Jun 23 16:06
$ d=`date +"%b %d %H:%m"`
$ echo $d
Jun 23 16:06
$ arr=($d)
$ echo ${#arr}
3
$ arr=("$d")
$ echo ${#arr}
12
$ arr=("`date +"%b %d %H:%m"`")
$ echo ${#arr}
12
$ arr=(`date +"%b %d %H:%m"`)
$ echo ${#arr}
3
$ echo ${arr[2]}
$
You have a working solution already, but merely misinterpreted the test:
$ d=$(date +"%b %d %H:%m")
$ a=("$d")
$ echo ${#a[*]}
1
$ echo ${#a} # Gives length of ${a[0]}, not number of elements in the array
12
$ a=("$d" "second entry")
$ echo ${a[0]}
Jun 23 16:06
$ echo ${a[1]}
second entry
$ echo ${#a[*]}
2
The strangeness is because these three things are sometimes different:
${#arr}
${#arr[*]}
${#arr[@]}
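A quick illustration of how they differ:
$ arr=("one two" "three")
$ echo ${#arr}       # length of the string ${arr[0]}
7
$ echo ${#arr[*]}    # number of elements
2
$ echo ${#arr[@]}    # also the number of elements
2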
Also, note that changing IFS can look like it failed: echo $IFS prints nothing either way, because the unquoted expansion is itself word-split on IFS. Quote it to see the value:
# IFS="."
# echo "$IFS"
.

How to convert an ASCII NULL (NUL) into a single space in a text file using a Unix command?

When I BCP the data in SQL Server, I get a NUL-like character in the output file, and I want to replace it with a single blank space.
When I used the below sed command, it removes the NUL character, but between those 2 delimiters there is no single space:
sed 's/\x0/ /g' output_file
Example: after the sed command I am getting an output file like below
PHMO||P00000005233
PHMO||P00000005752
But I need a single space in between those delimiters, as
PHMO| |P00000005233
PHMO| |P00000005752
The usual approach to this would be using tr. However, solutions with tr and sed are not portable. (The question is tagged "unix", so only portable solutions are interesting).
Here is a simple demo script
#!/bin/sh
date
tr '\000' ' ' <$0.in
date
sed -e 's/\x00/ /g' <$0.in
which I named foo, and its input (with the ASCII NUL shown here as ^@):
this is a null: "^@"
Running with GNU tr and sed:
Fri Apr 1 04:41:15 EDT 2016
this is a null: " "
Fri Apr 1 04:41:15 EDT 2016
this is a null: " "
With OSX:
Fri Apr 1 04:41:53 EDT 2016
this is a null: " "
Fri Apr 1 04:41:53 EDT 2016
this is a null: "^@"
With Solaris 10 (and 11, though there may be a recent change):
Fri Apr 1 04:38:08 EDT 2016
this is a null: ""
Fri Apr 1 04:38:08 EDT 2016
this is a null: ""
Bear in mind that sed is line-oriented, and that ASCII NUL is considered a binary (non-line) character. If you want a portable solution, then other tools such as Perl (which do not have that limitation) are useful. For that case one could add this to the script:
perl -np -e 's/\0/ /g' <$0.in
The intermediate tool awk is no better in this instance. Going to Solaris again, with these lines:
for awk in awk nawk mawk gawk
do
echo "** $awk:"
$awk '{ gsub("\0"," "); print; }' <$0.in
done
I see this output:
** awk:
awk: syntax error near line 1
awk: illegal statement near line 1
** nawk:
nawk: empty regular expression
source line number 1
context is
{ gsub("\0"," >>> ") <<<
** mawk:
this is a null: " "
** gawk:
this is a null: " "
Further reading:
sed - stream editor (POSIX)
tr - translate characters (POSIX), which notes
Unlike some historical implementations, this definition of the tr utility correctly processes NUL characters in its input stream. NUL characters can be stripped by using:
tr -d '\000'
perlrun - how to execute the Perl interpreter
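Regarding the tr note above, here is a quick check of both behaviors (translate vs. strip), using od -c to make the bytes visible:
$ printf 'a\0b\n' | tr '\000' ' ' | od -c
0000000   a       b  \n
0000004
$ printf 'a\0b\n' | tr -d '\000' | od -c
0000000   a   b  \n
0000003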
This is an easy job for sed. Let's start by creating a test file, since you didn't provide one:
$ echo -e "one,\x00,two,\x00,three" > a
$ echo -e "four,\x00,five,\x00,six" >> a
As you can see it contains ASCII 0:
$ od -c a
0000000   o   n   e   ,  \0   ,   t   w   o   ,  \0   ,   t   h   r   e
0000020   e  \n   f   o   u   r   ,  \0   ,   f   i   v   e   ,  \0   ,
0000040   s   i   x  \n
0000044
Now let's run sed:
$ sed 's/\x00/ /g' a > b
And check the output:
$ cat b
one, ,two, ,three
four, ,five, ,six
$ od -c b
0000000   o   n   e   ,       ,   t   w   o   ,       ,   t   h   r   e
0000020   e  \n   f   o   u   r   ,       ,   f   i   v   e   ,       ,
0000040   s   i   x  \n
0000044
It can be done quite easily with perl:
cat -v inputfile.txt
abc^@def^@ghij^@klmnop^@qrstuv^@wxyz
perl -np -e 's/\0/ /g' <inputfile.txt >outputfile.txt
cat -v outputfile.txt
abc def ghij klmnop qrstuv wxyz

Substring pattern matching in two files

I have an input flat file like this with many rows:
Apr 3 13:30:02 aag8-ca-acs01-en2 CisACS_01_PassedAuth p1n5ut5s 1 0 Message-Type=Authen OK,User-Name=joe7@it.test.com,NAS-IP-Address=4.196.63.55,Caller-ID=az-4d-31-89-92-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 aag8-ca-acs01-en2 CisACS_01_PassedAuth p1n6ut5s 1 0 Message-Type=Authen OK,User-Name=bobe@jg.test.com,NAS-IP-Address=4.197.43.55,Caller-ID=az-4d-4q-x8-92-80,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 abg8-ca-acs01-en2 CisACS_01_PassedAuth p1n4ut5s 1 0 Message-Type=Authen OK,User-Name=jerry777@it.test.com,NAS-IP-Address=7.196.63.55,Caller-ID=az-4d-n6-4e-y2-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 aca8-ca-acs01-en2 CisACS_01_PassedAuth p1n4ut5s 1 0 Message-Type=Authen OK,User-Name=frc777o.@it.test.com,NAS-IP-Address=4.196.263.55,Caller-ID=a4-4e-31-99-92-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 aag8-ca-acs01-en2 CisACS_01_PassedAuth p1n4ut5s 1 0 Message-Type=Authen OK,User-Name=frc77@xed.test.com,NAS-IP-Address=4.136.163.55,Caller-ID=az-4d-4w-b5-s2-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
I'm trying to grep the email addresses from input file to see if they already exist in the master file.
Master flat file looks like this:
a44e31999290;frc777o.@it.test.com;20150403
az4d4qx89280;bobe@jg.test.com;20150403
0dbgd0fed04t;rrfuf@us.test.com;20150403
28cbe9191d53;rttuu4en@us.test.com;20150403
az4d4wb5s290;frc77@xed.test.com;20150403
d89695174805;ccis6n@cn.test.com;20150403
If the email doesn't exist in master I want a simple count.
So using the examples I hope to see: count=3, because bobe@jg.test.com and frc77@xed.test.com already exist in master but the others don't.
I tried various combinations of grep (example below from my last tests) but it is not working. I'm using grep within a perl script to first capture the emails and then count them, but all I really need is the count of emails from the input file that don't exist in master.
grep -o -P '(?<=User-Name=\).*(?=,NAS-IP-)' $infile $mstr > $new_emails;
Any help would be appreciated, Thanks.
I would use this approach in awk:
$ awk 'FNR==NR {a[$2]; next}
       $4 in a {c++}
       END {print c}' FS=';' master FS='[,=]' file
3
This works by assigning a different field separator to each input file and storing / matching the emails, then printing the final count.
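The trailing FS=';' and FS='[,=]' assignments are processed as awk reaches them in the argument list, so each takes effect before the file that follows it is read; changing FS inside an action, by contrast, only affects records read afterwards. A tiny demonstration of the mechanism (throwaway file names, illustrative only):
$ printf 'a;b\n' > semi; printf 'a,b\n' > comma
$ awk '{print $2}' FS=';' semi FS=',' comma
b
b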
For master file we use ; and get the 2nd field:
$ awk -F";" '{print $2}' master
frc777o.@it.test.com
bobe@jg.test.com
rrfuf@us.test.com
rttuu4en@us.test.com
frc77@xed.test.com
ccis6n@cn.test.com
For file file (the one with all the info) we use either , or = and get the 4th field:
$ awk -F'[,=]' '{print $4}' file
joe7@it.test.com
bobe@jg.test.com
jerry777@it.test.com
frc777o.@it.test.com
frc77@xed.test.com
Think the below does what you want as a one-liner with diff and perl:
diff <( perl -F';' -anE 'say @F[1]' master | sort -u ) <( perl -pe 'm/User-Name=([^,]+),/; $_ = "$1\n"' data | sort -u ) | grep '^>' | perl -pe 's/> //;'
The diff <( command_a | sort -u ) <( command_b | sort -u ) | grep '^>' pattern lets you take the set difference of the two commands' outputs.
perl -F';' -anE 'say @F[1]' just splits each line of the file on ';' and prints the second field on its own line.
perl -pe 'm/User-Name=([^,]+),/; $_ = "$1\n"' extracts the specific field you wanted, ignoring the surrounding key=, and prints it on its own line implicitly.

How do I capture first tuesday in a month with zero padded in Unix

I am trying to capture the first Tuesday of every month into a variable and zero-pad it, without luck.
Below is the piece of code I was trying:
cal | sed -e 's/ \([1-9]\) /0\1 /g' -e 's/ \([1-9]\)$/0\1/' | awk 'NR>2{Sfields=7-NF; if (Sfields == 0 ) {printf "%d\n",$3;exit}}'
Can someone help me what I am missing here?
This awk should do:
cal | awk 'NR>2 && NF>4 {printf "%02d\n",$(NF-4);exit}'
03
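Why this works (a sketch, assuming a Sunday-first cal): there are 7 day columns with Tuesday 3rd from the left, hence always 5th from the right, so $(NF-4) is the Tuesday cell even when leading blanks shrink NF in the first week; NF>4 skips a first week that ends before the Tuesday column, and exit stops after the first hit. For example, September 2015 begins on a Tuesday:
$ cal 9 2015
   September 2015
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
...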
To confirm it's working:
for i in {1..12}; do cal -m $i | awk 'NR>2 && NF>4 {printf "%02d\n",$(NF-4);exit}' ; done
06
03
03
07
05
02
07
04
01
06
03
01
Or you can use ncal
ncal | awk '/Tu/ {printf "%02d\n",$2}'
03
If you like a version where you can specify the name of the weekday, and which works even when Monday is the first day of the week, then this gnu awk should do:
cal | awk 'NR==2 {for (i=1;i<=NF;i++) {sub(/ /,"",$i);a[$i]=i}} NR>2 {if ($a["Tu"]~/[0-9]/) {printf "%02d\n",$a["Tu"];exit}}' FIELDWIDTHS="3 3 3 3 3 3 3 3"
03
It uses FIELDWIDTHS to make sure empty columns at the start of the month do not change the output.
# for monday calendar
cal -m1 | sed -n '1,2b;/^.\{3\} \{0,1\}\([0-9]\{1,2\}\) .*/ {s//0\1/;s/.*\([0-9]\{2\}\)$/\1/p;q;}'
# for sunday calendar
cal -s1 01 01 2015 | sed -n '1,2b;/^.\{6\} \{0,1\}\([0-9]\{1,2\}\) .*/ {s//0\1/;s/.*\([0-9]\{2\}\)$/\1/p;q;}'
The cal options depend on the system (tested here on Red Hat 6.6): -m means Monday as the first day of the week, -s means Sunday, and the attached 1 asks for a one-month display. Take the line matching your system's cal output. What the sed script does:
don't print lines by default (-n)
leave lines 1 and 2 (the headers) alone
take the first line whose second (/third) column is non-empty
replace the whole line with that day number prefixed by a 0
keep only the last two digits and print them
quit (no other lines needed)
Thanks to @Jotne for all the remarks about the first wanted day sitting in the second week (4th line, not 3rd) and about the first day of the week.
I think I got the answer.
cal | awk 'NR>2{Sfields=7-NF; if (Sfields == 0 ) {printf "%02d\n",$3;exit}}'
The statement above does it: "%02d" is what gives the zero padding.
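For reference, the zero padding comes entirely from the printf format; "%02d" pads a number to two digits (quick check):
$ awk 'BEGIN { printf "%02d\n", 3 }'
03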
bash and date. May be slower than parsing cal:
y=2015
for m in {1..12}; do
for d in {01..07}; do
if [[ $(date -d "$y-$m-$d" +%w) -eq 2 ]]; then
echo $d
break
fi
done
done
Translating this into awk will be faster, as it doesn't have to call date multiple times:
gawk -v y=2015 '
BEGIN {
  for (m=1; m<=12; m++) {
    for (d=1; d<=7; d++) {
      t = mktime( y " " m " " d " 12 0 0" )
      if (strftime("%w", t) == 2) {
        printf "%02d\n", d
        break
      }
    }
  }
}
'

Pass "file name" from a text file to a command line where each line of a file is file name

I'm running the following code
git log --pretty=format: --numstat -- SOMEFILENAME |
perl -ane '$i += ($F[0]-$F[1]); END{print "changed: $i\n"}' \
>> random.txt
What this does is take a file named "SOMEFILENAME" and append the sum of the total added and removed lines to a text file called "random.txt".
I need to run this program on every file in the repository, and there are lots of them. What would be an easy way to do this?
If you want a total per file:
git log --pretty=format: --numstat |
perl -ane'
$c{$F[2]} += $F[0]-$F[1] if $F[2];
END { print "$_\t$c{$_}\n" for sort keys %c }
' >random.txt
If you want a single total:
git log --pretty=format: --numstat |
perl -ane'
$c += $F[0]-$F[1];
END { print "$c\n" }
' >random.txt
Their respective outputs are:
.gitignore 22
Build.PL 48
CHANGES.txt 0
Changes 25
LICENSE 132
LICENSE.txt 0
MANIFEST 18
MANIFEST.SKIP 9
README.txt 67
TODO.txt 1
lib/feature/qw_comments.pm 129
lib/feature/qw_comments.xs 250
t/00_load.t 13
t/01_basic.t 85
t/02_pragma.t 56
t/03_line_numbers.t 37
t/04_errors.t 177
t/05-unicode.t 39
t/devel-pod-coverage.t 26
t/pod.t 17
and
1151
Rather than use find, you can just let git give you all the files by using the name . (representing the current directory). With that, here's a version using awk that prints out stats per file:
git log --pretty=format: --numstat -- . |
awk '
NF == 3 {changed[$3] += $1 - $2}
END { for (name in changed) { printf("%s: %d changed\n", name, changed[name]); } }
'
And an even shorter one that prints a single overall changed line:
git log --pretty=format: --numstat -- . |
awk '
NF == 3 {changed += $1 - $2}
END { printf("%d changed\n", changed); }
'
(The NF == 3 is to account for the fact that git seems to print spurious blank lines in its output. I didn't try to figure out if there's a better git command.)
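The NF == 3 guard is easy to see in isolation (a quick sketch, independent of git; the filename is illustrative only):
$ printf '\n12\t3\tREADME.md\n\n' | awk 'NF == 3 { print $3, $1 - $2 }'
README.md 9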