I have a tab-delimited log file with the date and time in the format '2011-07-20 11:34:52' in the first two columns.
An example line from the log file is:
2011-07-20 11:34:15 LHR3 1488 111.111.111.111 GET djq2eo454b45f.cloudfront.net /1010.gif 200 - Mozilla/5.0%20(Windows%20NT%206.1;%20rv:5.0)%20Gecko/20100101%20Firefox/5.0 T=F&Event=SD&MID=67&AID=dc37bcff-70ec-419a-ad43-b92d6092c9a2&VID=8&ACID=36&ENV=demo-2&E=&P=Carousel&C=3&V=3
I'm trying to convert the date and time to epoch time using just awk:
cat logfile.log | grep 1010.gif | \
awk '{ print $1" "$2" UTC|"$5"|"$10"|"$11"|"$12 }' | \
awk 'BEGIN {FS="|"};{system ("date -d \""$1"\" +%s" ) | getline myvar}'
So this gets me some of the way: it produces the epoch time in seconds (i.e. without the three trailing zeros of a millisecond timestamp). However, I'm just getting the output of the system command printed, whereas I really want to substitute $1 with the epoch time.
I'm aiming for the following output:
<epoch time>|$5|$10|$11|$12
I've tried just using:
cat logfile.log | grep 1010.gif | awk '{ print d };' "d=$(date +%s -d"$1")"
But this just gives me blank rows.
Any thoughts?
Thanks
This assumes gawk -- mktime() can't do any timezone translation though, so it works strictly in local time.
... | gawk '
BEGIN {OFS = "|"}
{
split($1, d, "-")
split($2, t, ":")
epoch = mktime(d[1] " " d[2] " " d[3] " " t[1] " " t[2] " " t[3])
print epoch, $5, $10, $11, $12
}
'
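As a usage sketch (assuming the logfile.log name from the question; the /1010.gif/ pattern stands in for the separate grep stage, and the TZ=UTC prefix is my addition to make mktime() interpret the stamps as UTC), the sample timestamp 2011-07-20 11:34:15 would come out as 1311161655:
TZ=UTC gawk '
BEGIN {OFS = "|"}
/1010.gif/ {
    split($1, d, "-")
    split($2, t, ":")
    # build the "YYYY MM DD HH MM SS" string mktime() expects
    print mktime(d[1] " " d[2] " " d[3] " " t[1] " " t[2] " " t[3]), $5, $10, $11, $12
}' logfile.log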
I have the following shell command:
ssh user#host "df | grep /dev/ | \
awk 'BEGIN{print "DISK", "%USAGE", "STATUS"} {split($5, a, "%"); \
var="GREEN"; print $1, $5, var}' | column -t"
I need to run this over ssh, but I get a syntax error due to the nested double and single quotes.
I tried escaping the quotes with backslashes, but that did not solve the problem.
However, running this on the local system gives the following output:
$ df | grep /dev/ | \
awk 'BEGIN{print "DISK", "%USAGE", "STATUS"} {split($5, a, "%"); \
var="GREEN"; print $1, $5, var}' | column -t
DISK %USAGE STATUS
/dev/sda1 95% GREEN
A quoted heredoc lets you omit the outer quotes entirely; quoting the END delimiter stops the local shell from expanding anything inside, so the awk program reaches the remote host intact:
ssh user@host <<'END'
df | grep /dev/ | awk 'BEGIN{print "DISK", "%USAGE", "STATUS"} {split($5, a, "%"); var="GREEN"; print $1, $5, var}' | column -t
END
This is a case where a here-document comes in handy:
ssh -t -t user@host <<'EOF'
df | awk 'BEGIN{print "DISK", "%USAGE", "STATUS"} /dev/{split($5, a, "%"); var="GREEN"; print $1, $5, var}' | column -t
EOF
It's much simpler to just run df | grep remotely, and process the output locally with awk:
ssh user@host 'df | grep /dev' | awk '
BEGIN{print "DISK", "%USAGE", "STATUS"}
{split($5, a, "%"); var="GREEN"; print $1, $5, var}' | column -t
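Another option (a sketch, not one of the answers above): keep the whole pipeline remote by double-quoting it and backslash-escaping every " and $ that must survive the local shell:
ssh user@host "df | grep /dev/ | awk 'BEGIN{print \"DISK\", \"%USAGE\", \"STATUS\"} {split(\$5, a, \"%\"); var=\"GREEN\"; print \$1, \$5, var}' | column -t"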
I am looking for a way to convert all the dates in a CSV file row into this format: for example, I want to convert 23/1/17 to 23/01/2017.
I use Unix.
Thank you.
My file is like this:
23/1/17
17/08/18
1/1/2
5/6/03
18/05/2019
and I want this :
23/01/2017
17/08/2018
01/01/2002
05/06/2003
18/05/2019
I used date_samples.csv as my test data:
23/1/17,17/08/18,1/1/02,5/6/03,18/05/2019
cat date_samples.csv | tr "," "\n" | awk 'BEGIN{FS=OFS="/"}{print $2,$1,$3}' | \
while read CMD; do
date -d "$CMD" +%d/%m/%Y >> temp
done; cat temp | tr "\n" "," > converted_dates.csv ; rm temp; truncate -s-1 converted_dates.csv
Output:
23/01/2017,17/08/2018,01/01/2002,05/06/2003,18/05/2019
This portion of the code converts the commas to newlines and rearranges each input date from DD/MM/YY to MM/DD/YY, since the date command does not accept DD/MM/YY input. It then loops through the rearranged dates, converts them to DD/MM/YYYY format, and temporarily stores the results in temp.
cat date_samples.csv | tr "," "\n" | awk 'BEGIN{FS=OFS="/"}{print $2,$1,$3}' | \
while read CMD; do
date -d "$CMD" +%d/%m/%Y >> temp
done;
The line cat temp | tr "\n" "," > converted_dates.csv ; rm temp; truncate -s-1 converted_dates.csv converts the newlines back to commas, writes the result to converted_dates.csv, deletes temp, and finally truncate -s-1 strips the trailing comma.
Using awk:
awk -F, '{ for (i=1;i<=NF;i++) { split($i,map,"/");if (length(map[3])==1) { map[3]="0"map[3] } "date -d \""map[2]"/"map[1]"/"map[3]"\" \"+%d/%m/%Y\"" | getline dayte;close("date -d \""map[2]"/"map[1]"/"map[3]"\" \"+%d/%m/%Y\"");$i=dayte }OFS="," }1' file
Explanation:
awk -F, '{
for (i=1;i<=NF;i++) {
split($i,map,"/"); # Loop through each comma-separated field and split it into the array map using "/" as the field separator
if (length(map[3])==1) {
map[3]="0"map[3] # If the year is just one digit, pad out with prefix 0
}
"date -d \""map[2]"/"map[1]"/"map[3]"\" \"+%d/%m/%y\"" | getline dayte; # Run date command on day month and year and read result into variable dayte
close("date -d \""map[2]"/"map[1]"/"map[3]"\" \"+%d/%m/%y\""); # Close the date execution pipe
$i=dayte # Replace the field for the dayte variable
}
OFS="," # Set the output field seperator
}1' file
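If spawning date once per field is a concern, the same normalisation can be done in awk alone; here is a sketch (not from the original answer) that assumes two-digit years pivot at 69 the way GNU date does:
awk -F, 'BEGIN{OFS=","}
{
    for (i=1; i<=NF; i++) {
        n = split($i, d, "/")
        if (n == 3) {
            y = d[3] + 0
            # expand 1- and 2-digit years: 0-68 -> 20xx, 69-99 -> 19xx
            if (length(d[3]) <= 2) y += (y < 69 ? 2000 : 1900)
            # zero-pad day and month, e.g. 23/1/17 -> 23/01/2017
            $i = sprintf("%02d/%02d/%04d", d[1], d[2], y)
        }
    }
    print
}' file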
Hello, I have some CSV files like this:
"N.º","Fecha Tiempo, GMT-03:00","Temp, °C (LGR S/N: 10466185, SEN S/N: 10466185, LBL: Temperatura)","Acoplador separado (LGR S/N: 10466185)","Acoplador adjunto (LGR S/N: 10466185)","Host conectado (LGR S/N: 10466185)","Parado (LGR S/N: 10466185)","Final de archivo (LGR S/N: 10466185)"
1,03/03/14 01:00:00 PM,25.477,Registrado,,,,
2,03/03/14 02:00:00 PM,24.508,,,,,
3,03/03/14 03:00:00 PM,26.891,,,,,
4,03/03/14 04:00:00 PM,25.525,,,,,
5,03/03/14 05:00:00 PM,27.358,,,,,
I want to split the second date-time field into two fields: date and hour.
I'm fine with splitting out the date and hour, but when I try to convert the AM/PM hours into 24-hour format I fail.
For all the files I am using this command:
awk -F"," '{print $2}' *.csv|awk '{print $1","$2" "$3}'
In trying to convert the hours, I have arrived at this attempt in particular:
echo "11:04:44 PM" | awk -F, -v hora=$1 '{system("date --date=$hora +%T");print $hora}'
00:00:00
11:04:44 PM
The problem is the variable inside system("date ..."), because it comes back as 0 or empty.
So the question is how to do that.
And finally, how to insert those changes back into the file.
Thanks very much!
On my machine (Mac OS), the command you need is
echo "11:22:33 AM" | awk '{split($1,a,":"); if($2=="PM") {a[1]=a[1]+12;} print a[1] ":" a[2] ":" a[3]}'
This does the splitting of the time manually (rather than relying on date, which is a bit platform-dependent), adds 12 to the hour for PM times (leaving 12 PM alone), and maps 12 AM to 00.
So the whole thing becomes (the field numbers shift to $2/$3 because each extracted line now starts with the date):
awk -F"," '{print $2}' *.csv | awk '{split($2,a,":"); if($3=="PM" && a[1]!=12) a[1]+=12; if($3=="AM" && a[1]==12) a[1]=0; printf "%s,%02d:%s:%s\n", $1, a[1], a[2], a[3]}'
Although you really want to skip the header line of each file, so:
awk -F"," 'FNR>1{print $2}' *.csv | awk '{split($2,a,":"); if($3=="PM" && a[1]!=12) a[1]+=12; if($3=="AM" && a[1]==12) a[1]=0; printf "%s,%02d:%s:%s\n", $1, a[1], a[2], a[3]}'
Thanks!
Now, after many hours, I can convert the time using 'date'; the code is:
echo "11:04:44 PM" | awk -F, -v hora=$1 '{system("date --date=\""$hora"\" +%T");print $hora}'
With that you can get the 24-hour time from the AM/PM one.
The key detail was the '\"' before and after the '$hora' variable.
;)
Then, to convert the complete date-hour from the CSV file, you have to use:
awk -F"," '{if (FNR>=3) print $2}' *.csv | awk '{print $1","$2" "$3}'| awk -F, '{system("printf "$1", & date --date=\""$2"\" +%T")}'
Now I have to build a new file with the id and value columns....
I am improving a script listing duplicated files that I wrote last year (see the second script if you follow the link).
The record separator of the duplicated.log output is the zero byte instead of the newline \n. Example:
$> tr '\0' '\n' < duplicated.log
12 dir1/index.htm
12 dir2/index.htm
12 dir3/index.htm
12 dir4/index.htm
12 dir5/index.htm
32 dir6/video.m4v
32 dir7/video.m4v
(in this example, the five files dir1/index.htm, ... and dir5/index.htm have the same md5sum and their size is 12 bytes. The other two files dir6/video.m4v and dir7/video.m4v have the same md5sum and their content size (du) is 32 bytes.)
As each line is ended by a zero byte (\0) instead of the newline symbol (\n), blank lines are represented as two successive zero bytes (\0\0).
I use the zero byte as line separator because path names may contain the newline symbol.
But doing that, I am faced with this issue:
How to 'grep' all duplicates of a specified file from duplicated.log?
(e.g. How to retrieve duplicates of dir1/index.htm?)
I need:
$> ./youranswer.sh "dir1/index.htm" < duplicated.log | tr '\0' '\n'
12 dir1/index.htm
12 dir2/index.htm
12 dir3/index.htm
12 dir4/index.htm
12 dir5/index.htm
$> ./youranswer.sh "dir4/index.htm" < duplicated.log | tr '\0' '\n'
12 dir1/index.htm
12 dir2/index.htm
12 dir3/index.htm
12 dir4/index.htm
12 dir5/index.htm
$> ./youranswer.sh "dir7/video.m4v" < duplicated.log | tr '\0' '\n'
32 dir6/video.m4v
32 dir7/video.m4v
I was thinking about something like:
awk 'BEGIN { RS="\0\0" } #input record separator is double zero byte
/filepath/ { print $0 }' duplicated.log
...but filepath may contain slashes / and many other symbols (quotes, newlines...).
I may have to use perl to deal with this situation...
I am open to any suggestions, questions, other ideas...
You're almost there: use the matching operator ~:
awk -v RS='\0\0' -v pattern="dir1/index.htm" '$0~pattern' duplicated.log
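If the file name should be matched literally rather than as a regular expression (it may contain ., /, and other regex metacharacters), index() sidesteps the problem; a sketch along the same lines:
awk -v RS='\0\0' -v str="dir1/index.htm" 'index($0, str)' duplicated.log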
I have just realized that I could use the md5sum instead of the pathname because in my new version of the script I am keeping the md5sum information.
This is the new format I am currently using:
$> tr '\0' '\n' < duplicated.log
12 89e8a208e5f06c65e6448ddeb40ad879 dir1/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir2/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir3/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir4/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir5/index.htm
32 fc191f86efabfca83a94d33aad2f87b4 dir6/video.m4v
32 fc191f86efabfca83a94d33aad2f87b4 dir7/video.m4v
gawk and nawk give the wanted result:
$> awk 'BEGIN { RS="\0\0" }
/89e8a208e5f06c65e6448ddeb40ad879/ { print $0 }' duplicated.log |
tr '\0' '\n'
12 89e8a208e5f06c65e6448ddeb40ad879 dir1/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir2/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir3/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir4/index.htm
12 89e8a208e5f06c65e6448ddeb40ad879 dir5/index.htm
But I am still open about your answers :-)
(this current answer is just a workaround)
For the curious, below is the new (horrible) script under construction...
#!/bin/bash
fifo=$(mktemp -u)
fif2=$(mktemp -u)
dups=$(mktemp -u)
dirs=$(mktemp -u)
menu=$(mktemp -u)
numb=$(mktemp -u)
list=$(mktemp -u)
mkfifo $fifo $fif2
# run processing in background
find . -type f -printf '%11s %P\0' | #print size and filename
tee $fifo | #write in fifo for dialog progressbox
grep -vzZ '^ 0 ' | #ignore empty files
LC_ALL=C sort -z | #sort by size
uniq -Dzw11 | #keep files having same size
while IFS= read -r -d '' line
do #for each file compute md5sum
echo -en "${line:0:11}" "\t" $(md5sum "${line:12}") "\0"
#file size + md5sum + file name, null-terminated instead of '\n'
done | #keep the duplicates (same md5sum)
tee $fif2 |
uniq -zs12 -w46 --all-repeated=separate |
tee $dups |
#xargs -d '\n' du -sb 2<&- | #retrieve size of each file
gawk '
function tgmkb(size) {
if(size<1024) return int(size) ; size/=1024;
if(size<1024) return int(size) "K"; size/=1024;
if(size<1024) return int(size) "M"; size/=1024;
if(size<1024) return int(size) "G"; size/=1024;
return int(size) "T"; }
function dirname (path)
{ if(sub(/\/[^\/]*$/, "", path)) return path; else return "."; }
BEGIN { RS=ORS="\0" }
!/^$/ { sz=substr($0,0,11); name=substr($0,48); dir=dirname(name); sizes[dir]+=sz; files[dir]++ }
END { for(dir in sizes) print tgmkb(sizes[dir]) "\t(" files[dir] "\tfiles)\t" dir }' |
LC_ALL=C sort -zrshk1 > $dirs &
pid=$!
tr '\0' '\n' <$fifo |
dialog --title "Collecting files having same size..." --no-shadow --no-lines --progressbox $(tput lines) $(tput cols)
tr '\0' '\n' <$fif2 |
dialog --title "Computing MD5 sum" --no-shadow --no-lines --progressbox $(tput lines) $(tput cols)
wait $pid
DUPLICATES=$( grep -zac -v '^$' $dups) #total number of files concerned
UNIQUES=$( grep -zac '^$' $dups) #number of files, if all redundant are removed
DIRECTORIES=$(grep -zac . $dirs) #number of directories concerned
lins=$(tput lines)
cols=$(tput cols)
cat > $menu <<EOF
--no-shadow
--no-lines
--hline "After selection of the directory, you will choose the redundant files you want to remove"
--menu "There are $DUPLICATES duplicated files within $DIRECTORIES directories.\nThese duplicated files represent $UNIQUES unique files.\nChoose directory to proceed redundant file removal:"
$lins
$cols
$DIRECTORIES
EOF
tr '\n"' "_'" < $dirs |
gawk 'BEGIN { RS="\0" } { print FNR " \"" $0 "\" " }' >> $menu
dialog --file $menu 2> $numb
[[ $? -eq 1 ]] && exit
set -x
dir=$( grep -zam"$(< $numb)" . $dirs | tac -s'\0' | grep -zam1 . | cut -f4- )
md5=$( grep -zam"$(< $numb)" . $dirs | tac -s'\0' | grep -zam1 . | cut -f2 )
grep -zao "$dir/[^/]*$" "$dups" |
while IFS= read -r -d '' line
do
file="${line:47}"
awk 'BEGIN { RS="\0\0" } '"/$md5/"' { print $0 }' >> $list
done
echo -e "
fifo $fifo \t dups $dups \t menu $menu
fif2 $fif2 \t dirs $dirs \t numb $numb \t list $list"
#rm -f $fifo $fif2 $dups $dirs $menu $numb
I want to change the second column to upper case, and I want to do it in a shell script only (no one-liners!).
#!/bin/sh
# read file line by line
file="/pdump/country.000000.txt"
while read line
do
mycol=`echo $line | awk -F"," '{print $2}'`
mycol_new=`echo $mycol | tr "[:lower:]" [:upper:]`
echo $line | awk -F"," '{print $1 "," $mycol_new "," $3 "," $4 "," $5 "," $6 "," $7 "," $8}'
done < $file
I am not able to replace $2 with $mycol_new.
Any suggestions?
awk cannot see $mycol_new because it is a shell variable. Here is one way of passing a shell variable into awk using the -v flag:
echo $line | awk -v var="$mycol_new" -F"," '{print $1 "," var "," $3 "," $4 "," $5 "," $6 "," $7 "," $8}'
Here is an alternative method which lets the shell expand $mycol_new:
echo $line | awk -F"," '{print $1 ",'"$mycol_new"'," $3 "," $4 "," $5 "," $6 "," $7 "," $8}'
Why no one-liners? Doing homework?
$ cat file
one two three four
five six seven eight
$ awk '{$2=toupper($2)}1' file
one TWO three four
five SIX seven eight
If you want to do this all in the shell, then you don't need awk:
IFS=,
while read line; do
set -- $line
a="$1"
b="${2^^}" # assumes bash, use "tr" otherwise
shift 2
set -- "$a" "$b" "$#"
echo "$*"
done < "$file" > "$file.new"