Split results of du command by new line - sed

I have a list of the top 20 files/folders that take up the most space on my hard drive. I would like to print each entry on its own line in the form "size path/to/file". Below is what I have done so far.
I am using: var=$(du -a -g /folder/ | sort -n -r | head -n 20). It returns the following:
120 /path/to/file
115 /path/to/another/file
110 /file/path/
etc.
I have tried the following code to split it up into single lines.
for i in $(echo $var | sed "s/\n/ /g")
do
echo "$i"
done
The result I would like is as follows:
120 /path/to/file,
115 /path/to/another/file,
110 /file/path/,
etc.
This however is the result I am getting:
120,
/path/to/file,
115,
/path/to/another/file,
110,
/file/path/,
etc.

I think awk will be easier, and it can be appended to the original pipeline:
du -a -g /folder/ | sort -n -r | head -n 20 | awk '{ print $1, $2 "," }'
If you cannot use a single pipeline and have to go through $var, quote the variable so that its newlines are preserved:
echo "$var" | awk '{ print $1, $2 "," }'
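One caveat with printing $1 and $2: awk splits on whitespace, so a path containing spaces would be cut off after its first word. The for loop in the question fails for a related reason, since the unquoted $(...) is split on every space and newline. Below is a minimal sketch of a shell-only alternative, assuming bash (for the here-string). The sample paths and the helper name format_lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample standing in for the du | sort | head output.
var='120 /path/to/file
115 /path/to/another file
110 /file/path/'

# format_lines is an illustrative helper name, not part of the original answer.
# read stores the first field in size and the rest of the line in path,
# so spaces inside the path survive.
format_lines() {
    while read -r size path; do
        printf '%s %s,\n' "$size" "$path"
    done
}

format_lines <<< "$var"
```

The same function can be fed directly from the pipeline, e.g. du -a -g /folder/ | sort -n -r | head -n 20 | format_lines.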

Related

wc -c gives one more than I expected, why is that?

echo '2003'| wc -c
I thought it would give me 4, but it turned to be 5, what is that additional byte?
Because echo appends a newline to its output, and wc -c counts that byte too.
echo "2014" | wc -c
prints 5, whereas
printf "2014" | wc -c
prints 4, because printf does not add a newline.
echo also has a built-in switch, -n, that suppresses the trailing newline (bash's echo supports it, but POSIX does not guarantee it in every shell). So running:
echo -n "2021" | wc -c
Will output the expected 4.
echo adds a newline, which is what causes the extra byte.
As mentioned by "KyChen", you can use printf, or let awk report the length:
a="2014"
echo "$a" | awk '{print length}'
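For comparison, here is a small sketch that avoids both echo and awk. It assumes a POSIX-style shell such as bash; the variable names bytes and chars are only illustrative.

```shell
#!/bin/bash
a="2014"

# printf with %s writes the string as-is, with no trailing newline.
bytes=$(printf '%s' "$a" | wc -c)

# ${#a} is the length of the value in characters; no pipe is involved.
chars=${#a}

# $((bytes)) also strips the padding some wc implementations add.
echo "bytes=$((bytes)) chars=$chars"
```

For plain ASCII input such as this, both numbers are 4.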

Finding max value of a specific date awk

I have a file with several rows and with each row containing the following data-
name 20150801|1 20150802|4 20150803|6 20150804|7 20150805|7 20150806|8 20150807|11532 20150808|12399 2015089|12619 20150810|12773 20150811|14182 20150812|27856 20150813|81789 20150814|41168 20150815|28982 20150816|24500 20150817|22534 20150818|3 20150819|4 20150820|47773 20150821|33168 20150822|53541 20150823|46371 20150824|34664 20150825|32249 20150826|29181 20150827|38550 20150828|28843 20150829|3 20150830|23543 20150831|6
name2 20150801|1 20150802|4 20150803|6 20150804|7 20150805|7 20150806|8 20150807|11532 20150808|12399 2015089|12619 20150810|12773 20150811|14182 20150812|27856 20150813|81789 20150814|41168 20150815|28982 20150816|24500 20150817|22534 20150818|3 20150819|4 20150820|47773 20150821|33168 20150822|53541 20150823|46371 20150824|34664 20150825|32249 20150826|29181 20150827|38550 20150828|28843 20150829|3 20150830|23543 20150831|6
The pipe separated value indicates the value for each of the dates in the month.
Each row has the same format with same number of columns.
The first column is a unique name for the row. Each date is in yyyymmdd format, e.g. 20150818.
Given a specific date, how do I extract the name of the row that has the largest value on that day?
I think you mean this:
awk -v date=20150823 '{for(f=2;f<=NF;f++){split($f,a,"|");if(a[1]==date&&a[2]>max){max=a[2];name=$1}}}END{print name,max}' YourFile
So, you pass the date you are looking for in as a variable called date. You then iterate through all fields on the line, and split the date and value of each into an array using | as separator - a[1] has the date, a[2] has the value. If the date matches and the value is greater than any previously seen maximum, save this as the new maximum and save the first field from this line for printing at the end.
You couldn't have taken 5 seconds to give your sample input different values? Anyway, this may work when run against input that actually has different values for the dates:
$ cat tst.awk
BEGIN { FS="[|[:space:]]+" }
FNR==1 {
for (i=2;i<=NF;i+=2) {
if ( $i==tgt ) {
f = i+1
}
}
max = $f
}
$f >= max { max=$f; name=$1 }
END { print name }
$ awk -v tgt=20150801 -f tst.awk file
name2
As a quick and dirty solution, this can be done with the following Unix commands:
yourdatafile=<yourdatafile>
yourdate=<yourdate>
cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $1" "$2}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $1" "$2}' |sort -k 2n | tail -n 1
With the following sample data:
$ cat $yourdatafile
Alice 20150801|44 20150802|21 20150803|7 20150804|76 20150805|71
Bob 20150801|31 20150802|5 20150803|21 20150804|133 20150805|71
and yourdate=20150803 we get:
$ cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $1" "$2}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $1" "$2}' |sort -k 2n | tail -n 1
Bob 21
and for yourdate=20150802 we get:
$ cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $2" "$1}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $2" "$1}' | sort -k 2n | tail -n 1
Alice 21
The drawback is that only one line is printed even when the highest value of a day was reached by more than one name, as can be seen with:
$ yourdate=20150805; cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $2" "$1}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $2" "$1}' | sort -k 2n | tail -n 1
Bob 71
I hope that helps anyway.
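If ties matter, one possible approach is to collect every name that reaches the maximum. The sketch below is a variation on the awk loop from the first answer, not any of the posted solutions. The function name max_for_date is hypothetical, and bash is assumed for local.

```shell
#!/bin/bash
# max_for_date DATE [FILE...]: print all row names that share the
# maximum value on DATE, followed by that value. Reads stdin if no FILE.
max_for_date() {
    local date=$1
    shift
    awk -v date="$date" '
    {
        for (f = 2; f <= NF; f++) {
            split($f, a, "|")            # a[1] = date, a[2] = value
            if (a[1] != date) continue
            v = a[2] + 0                 # force a numeric comparison
            if (!seen || v > max) { max = v; names = $1; seen = 1 }
            else if (v == max)    { names = names " " $1 }
        }
    }
    END { if (seen) print names, max }
    ' "$@"
}

max_for_date 20150805 <<'EOF'
Alice 20150801|44 20150802|21 20150803|7 20150804|76 20150805|71
Bob 20150801|31 20150802|5 20150803|21 20150804|133 20150805|71
EOF
```

With the Alice/Bob sample this prints both names for 20150805, since they tie at 71.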

Remove eval in an existing code

I am working on an existing shell script that uses eval. I feel that the eval is unnecessary here and would like to remove it to avoid injection.
Could you please check the code and advise why the eval is there?
FILE_PATH=`echo $1 | awk '{ print $10 }' | cut -f2 -d'"'`
FILE_PATH=`(eval "echo ${FILE_PATH}")`
If $1 is something like ---"~/tttttttt.txt, then without eval
FILE_PATH will be ~/tttttttt.txt,
but with eval
FILE_PATH will be /home/user/tttttttt.txt.
The following script reproduces this:
#!/bin/bash
path='-----"~/tttttttt.txt'
FILE_PATH=`echo $path | awk '{ print $1 }' | cut -f2 -d'"'`
echo "${FILE_PATH}"
ls -lart ${FILE_PATH}
FILE_PATH=`(eval "echo ${FILE_PATH}")`
echo $FILE_PATH
ls -lart ${FILE_PATH}
If I run the script above, the output is:
~/tttttttt.txt
ls: cannot access ~/tttttttt.txt: No such file or directory
/home/user/tttttttt.txt
-rw-rw-r-- 1 user user 0 Aug 26 15:54 /home/user/tttttttt.txt
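The output suggests why the eval is there: the shell only performs tilde expansion on unquoted literal text, not on a ~ that arrives inside a variable. eval re-parses the string so the ~ gets expanded, but it would just as readily run any command embedded in that string. Below is a minimal sketch of one way to get the expansion without eval, assuming bash and assuming only the "~" and "~/..." forms need handling. The helper name expand_tilde is hypothetical.

```shell
#!/bin/bash
# expand_tilde is a hypothetical helper name. It handles only "~" and
# "~/...", not the "~otheruser/..." form; anything else passes through.
expand_tilde() {
    local rest
    case "$1" in
        "~")   printf '%s\n' "$HOME" ;;
        "~/"*) rest=${1#"~/"}            # strip the literal "~/" prefix
               printf '%s\n' "$HOME/$rest" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

path='-----"~/tttttttt.txt'
FILE_PATH=$(echo "$path" | awk '{ print $1 }' | cut -f2 -d'"')
FILE_PATH=$(expand_tilde "$FILE_PATH")
echo "$FILE_PATH"
```

Because the value is never re-parsed as shell code, characters such as ; or $( ) in the input stay plain text.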

Removing matching text from line

I have an example line cut down from a log file.
112 172.172.172.1#50912 (ssl.bing.com):
I would like to remove the # and the digits after it, as well as the (): around the hostname.
The result I would like is:
112 172.172.172.1 ssl.bing.com
Here is the sed oneliner I have been working on.
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com
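The -r flag above is a GNU sed extension. As a sketch, the same cleanup can be written with basic regular expressions only, which should behave the same on GNU and BSD sed. The function name strip_port is just an illustrative choice.

```shell
#!/bin/sh
# strip_port (hypothetical name) reads log lines on stdin.
strip_port() {
    # 1st expression: drop "#" and the non-space characters after it.
    # 2nd expression: drop every "(", ")" and ":" on the line.
    sed -e 's/#[^ ]*//' -e 's/[():]//g'
}

echo '112 172.172.172.1#50912 (ssl.bing.com):' | strip_port
```

Like the one-liner above, it removes every colon on the line, so it assumes the addresses are IPv4.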

Pattern extraction using SED or AWK

How do I extract 68 from v1+r0.68?
Using awk, returns everything after the last '.'
echo "v1+r0.68" | awk -F. '{print $NF}'
Using sed to get the number after the last dot:
echo 'v1+r0.68' | sed 's/.*[.]\([0-9][0-9]*\)$/\1/'
grep is good at extracting things:
kent$ echo " v1+r0.68"|grep -oE "[0-9]+$"
68
Match the digit string before the end of the line using grep:
$ echo 'v1+r0.68' | grep -Eo '[0-9]+$'
68
Or match any digits after a .
$ echo 'v1+r0.68' | grep -Po '(?<=\.)\d+'
68
Print everything after the . with awk:
echo "v1+r0.68" | awk -F. '{print $NF}'
68
Substitute everything before the . with sed:
echo "v1+r0.68" | sed 's/.*\.//'
68
Type man grep and you will see:
...
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
Then type echo 'v1+r0.68' | grep -o '68'
If you want the output saved to a file, redirect it:
echo 'v1+r0.68' | grep -o '68' > anyWhereSpecial.file_ending
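When the string is already in a shell variable, no external tool is needed. This is a small sketch using POSIX parameter expansion; the variable names s and minor are only illustrative.

```shell
#!/bin/sh
s='v1+r0.68'

# ${s##*.} removes the longest prefix matching "*.", i.e. everything
# up to and including the last dot.
minor=${s##*.}

echo "$minor"
```

Like the awk -F. version, this takes whatever follows the last dot, whether or not it is numeric.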