Currently, this shows only numbers.
sed 's/[^0-9]*//g')
How can I tell sed to display ONLY the largest number found, taking into account ONLY the line which contains the word "Page" ?
sed '/Page/!d; s/[^0-9]//g' | sort -n | tail -1
or
awk '/Page/ {gsub(/[^0-9]/,""); if ($0 > max) max = $0} END {print max}'
grep Page filename | awk '{print $2}' | sort -n | tail -n 1
This assumes the page number is the 2nd word of the line (if not, change the awk command as appropriate)
Related
I have a file with several rows and with each row containing the following data-
name 20150801|1 20150802|4 20150803|6 20150804|7 20150805|7 20150806|8 20150807|11532 20150808|12399 2015089|12619 20150810|12773 20150811|14182 20150812|27856 20150813|81789 20150814|41168 20150815|28982 20150816|24500 20150817|22534 20150818|3 20150819|4 20150820|47773 20150821|33168 20150822|53541 20150823|46371 20150824|34664 20150825|32249 20150826|29181 20150827|38550 20150828|28843 20150829|3 20150830|23543 20150831|6
name2 20150801|1 20150802|4 20150803|6 20150804|7 20150805|7 20150806|8 20150807|11532 20150808|12399 2015089|12619 20150810|12773 20150811|14182 20150812|27856 20150813|81789 20150814|41168 20150815|28982 20150816|24500 20150817|22534 20150818|3 20150819|4 20150820|47773 20150821|33168 20150822|53541 20150823|46371 20150824|34664 20150825|32249 20150826|29181 20150827|38550 20150828|28843 20150829|3 20150830|23543 20150831|6
The pipe separated value indicates the value for each of the dates in the month.
Each row has the same format with same number of columns.
The first column name indicates a unique name for the row e.g. 20150818 is yyyyddmm
Given a specific date, how do I extract the name of the row that has the largest value on that day?
I think you mean this:
awk -v date=20150823 '{for(f=2;f<=NF;f++){split($f,a,"|");if(a[1]==date&&a[2]>max){max=a[2];name=$1}}}END{print name,max}' YourFile
So, you pass the date you are looking for in as a variable called date. You then iterate through all fields on the line, and split the date and value of each into an array using | as separator - a[1] has the date, a[2] has the value. If the date matches and the value is greater than any previously seen maximum, save this as the new maximum and save the first field from this line for printing at the end.
You couldn't have taken 5 seconds to give your sample input different values? Anyway, this may work when run against input that actually has different values for the dates:
$ cat tst.awk
BEGIN { FS="[|[:space:]]+" }
FNR==1 {
for (i=2;i<=NF;i+=2) {
if ( $i==tgt ) {
f = i+1
}
}
max = $f
}
$f >= max { max=$f; name=$1 }
END { print name }
$ awk -v tgt=20150801 -f tst.awk file
name2
As a quick&dirty solution, we can perform this in following Unix commands:
yourdatafile=<yourdatafile>
yourdate=<yourdate>
cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $1" "$2}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $1" "$2}' |sort -k 2n | tail -n 1
With following sample data:
$ cat $yourdatafile
Alice 20150801|44 20150802|21 20150803|7 20150804|76 20150805|71
Bob 20150801|31 20150802|5 20150803|21 20150804|133 20150805|71
and yourdate=20150803 we get:
$ cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $1" "$2}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $1" "$2}' |sort -k 2n | tail -n 1
Bob 21
and for yourdate=20150802 we get:
$ cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $2" "$1}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $2" "$1}' | sort -k 2n | tail -n 1
Alice 21
The drawback is that only one line is printed the highest value of a day was achieved by more than one name as can be seen with:
$ yourdate=20150805; cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $2" "$1}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $2" "$1}' | sort -k 2n | tail -n 1
Bob 71
I hope that helps anyway.
I have a file containing:
L1
L2
L3
.
.
.
L512
I want to change its content to :
L1 | L2 | L3 | ... | L512
It seems so easy , but its now 1 hour Im sitting and trying to make it, I tried to do it by sed, but didn't get what I want. It seems that sed just inserts empty lines between the content, any suggestion please?
With sed this requires to read the whole input into a buffer and afterwards replace all newlines by |, like this:
sed ':a;N;$!ba;s/\n/ | /g' input.txt
Part 1 - buffering input
:a defines a label called 'a'
N gets the next line from input and appends it to the pattern buffer
$!ba jumps to a unless the end of input is reached
Part 2 - replacing newlines by |
s/\n/|/ execute the substitute command on the pattern buffern
As you can see, this is very inefficient since it requires to:
read the complete input into memory
operate three times on the input: 1. reading, 2. substituting, 3. printing
Therefore I would suggest to use awk which can do it in one loop:
awk 'NR==1{printf $0;next}{printf " | "$0}END{print ""}' input.txt
Here is one sed
sed ':a;N;s/\n/ | /g;ta' file
L1 | L2 | L3 | ... | L512
And one awk
awk '{printf("%s%s",sep,$0);sep=" | "} END {print ""}' file
L1 | L2 | L3 | ... | L512
perl -pe 's/\n/ |/g unless(eof)' file
if space between | is not mandatory
tr "\n" '|' YourFile
Several options, including those mentioned here:
paste -sd'|' file
sed ':a;N;s/\n/ | /g;ta' file
sed ':a;N;$!ba;s/\n/ | /g' file
perl -0pe 's/\n/ | /g;s/ \| $/\n/' file
perl -0nE 'say join " | ", split /\n/' file
perl -E 'chomp(#x=<>); say join " | ", #x' file
mapfile -t ary < file; (IFS="|"; echo "${ary[*]}")
awk '{printf("%s%s",sep,$0);sep=" | "} END {print ""}' file
I have a example cut down from a log file.
112 172.172.172.1#50912 (ssl.bing.com):
I would like some how to remove the # and numbers after and (): from the url.
Would like the result.
112 172.172.172.1 ssl.bing.com
Here is the sed oneliner I have been working on.
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com
I'm trying to turn a big list of data into a CSV. Its basically a giant list with no spaces, and the rows are separated by newlines. I have made a bash script that basically loops through the document, awks out the line, cuts the byte range, and then adds a comma and appends it to the end of the line. It looks like this:
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 1-12 | tr -d '\n' >> $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 13-17 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 18-22 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 23-34 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
The problem is this is EXTREMELY slow, and the data has about 400k rows. I know there must be a better way to accomplish this. Essentially I just need to add a comma after every 12/17/22/34 etc character of a line.
Any help is appreciated, thank you!
There are many many ways to do this with Perl. Here is one way:
perl -pe 's/(.{12})(.{5})(.{5})(.{12})/$1,$2,$3,$4,/' < input-file > output-file
The matching pattern in the substitution captures four groups of text from the beginning of each line with 12, 5, 5, and 12 arbitrary characters. The replacement pattern places a comma after each group.
With GNU awk, you could write
gawk 'BEGIN {FIELDWIDTHS="12 5 5 12"; OFS=","} {$1=$1; print}'
The $1=$1 part is to force awk to rewrite the like, incorporating the output field separator, without changing anything.
This is very much a job for substr.
use strict;
use warnings;
my #widths = (12, 5, 5, 12);
my $offset;
while (my $line = <DATA>) {
for my $width (#widths) {
$offset += $width;
substr $line, $offset, 0, ',';
++$offset;
}
print $line;
}
__DATA__
1234567890123456789012345678901234567890
output
123456789012,34567,89012,345678901234,567890
I am having trouble on stripping characters within a filename.
For example:
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
This is the output I want:
1326847080_MUNDO-1.xml
1326836220_PLANETACNN-3.xml
for i in *.xml
do
j=$(echo $i | sed -e s/-.*-/-/)
echo mv $i $j
done
or in one line:
for i in *.xml; do echo mv $i $(echo $i | sed -e s/-.*-/-/); done
remove echo to actually perform the mv command.
Or, without sed, using bash builtin pattern replacement:
for i in *.xml; do echo mv $i ${i//-*-/-}; done
rename to the rescue, with Perl regular expressions. This command will show which moves will be made; just remove -n to actually rename the files:
$ rename -n 's/([^-]+)-.*-([^-]+)/$1-$2/' *.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml renamed as 1326836220_PLANETACNN-3.xml
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml renamed as 1326847080_MUNDO-1.xml
The regular expression explained:
Save the part up to (but excluding) the first dash as match 1.
Save the part after the last dash as match 2.
Replace the part from the start of match 1 to the end of match 2 with match 1, a dash, and match 2.
sorry for the late reply , but i saw it today :( .
I think you are looking for the following
input file ::
cat > abc
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
code : (its a bit too basic , even for my liking)
while read line
do
echo $line ;
fname=`echo $line | cut -d"-" -f1`;
lfield=`echo $line | sed -n 's/\-/ /gp' | wc -w`;
lname=`echo $line | cut -d"-" -f${lfield}`;
new_name="${fname}-${lname}";
echo "new name is :: $new_name";
done < abc ;
output ::
1326847080_MUNDO-Cinco-Cosas-Que-Aprendimos-Del-Debate-De-Los-Republicanos-1.xml
new name is :: 1326847080_MUNDO-1.xml
1326836220_PLANETACNN-Una-Granja-De-Mariposas-Ayuda-A-Reducir-La-Tala-De-Bosques-En-Tanzania-3.xml
new name is :: 1326836220_PLANETACNN-3.xml