base64 decode to csv file, sed script - sed

I have the following script to extract the text inside the "reportBody" element, but I also need to base64-decode that text and write the result to a file. How can I do this?
Here's the script:
cat $1 | tr "\n" "|" | grep -o '<reportBody>.*</reportBody>' | sed 's/\(<reportBody>\|<\/reportBody>\)//g' | sed 's/|/\n/g' | sed '/^\s*$/d' > $2
I tried:
cat $1 | tr "\n" "|" | grep -o '<reportBody>.*</reportBody>' | sed 's/\(<reportBody>\|<\/reportBody>\)//g' | sed 's/|/\n/g' | sed '/^\s*$/d' | base64 -d $2 > $2
but it doesn't decode it.
Can I overwrite the same file, or at least save the decoded text in a new one, without calling additional modules from Python etc.?
Note: the file contains 20k+ characters to decode.
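One likely reason the attempt fails: `base64 -d $2 > $2` tells base64 to read from the file named `$2` rather than from the pipe, and the shell truncates that same file before base64 starts. A minimal sketch that decodes straight from the pipeline instead is shown here. The function name `extract_report` is hypothetical, and the `-d` spelling of the decode flag is assumed to match the `base64` already used in the question (some implementations spell it differently).

```shell
# Hypothetical helper: extract_report INPUT OUTPUT
# Pulls the text between <reportBody> tags and base64-decodes it.
extract_report() {
    tr '\n' '|' < "$1" |                             # join lines so the tags can be matched across lines
        grep -o '<reportBody>.*</reportBody>' |      # keep only the tagged span
        sed 's/<reportBody>//; s/<\/reportBody>//' | # drop the tags themselves
        tr -d '| \t\r' |                             # remove the placeholder and stray whitespace
        base64 -d > "$2"                             # decode from stdin, write to the output file
}
```

Usage would be `extract_report input.xml decoded.csv`. Because `|` is not part of the base64 alphabet, deleting it is safe, and there is no need to turn it back into newlines before decoding. Writing to a file that is not also being read avoids the truncation problem.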

Related

Finding max value of a specific date awk

I have a file with several rows, each containing data in the following format:
name 20150801|1 20150802|4 20150803|6 20150804|7 20150805|7 20150806|8 20150807|11532 20150808|12399 2015089|12619 20150810|12773 20150811|14182 20150812|27856 20150813|81789 20150814|41168 20150815|28982 20150816|24500 20150817|22534 20150818|3 20150819|4 20150820|47773 20150821|33168 20150822|53541 20150823|46371 20150824|34664 20150825|32249 20150826|29181 20150827|38550 20150828|28843 20150829|3 20150830|23543 20150831|6
name2 20150801|1 20150802|4 20150803|6 20150804|7 20150805|7 20150806|8 20150807|11532 20150808|12399 2015089|12619 20150810|12773 20150811|14182 20150812|27856 20150813|81789 20150814|41168 20150815|28982 20150816|24500 20150817|22534 20150818|3 20150819|4 20150820|47773 20150821|33168 20150822|53541 20150823|46371 20150824|34664 20150825|32249 20150826|29181 20150827|38550 20150828|28843 20150829|3 20150830|23543 20150831|6
Each pipe-separated pair gives a date and the value for that date in the month.
Each row has the same format with the same number of columns.
The first column is a unique name for the row. Dates are in yyyymmdd format, e.g. 20150818.
Given a specific date, how do I extract the name of the row that has the largest value on that day?
I think you mean this:
awk -v date=20150823 '{for(f=2;f<=NF;f++){split($f,a,"|");if(a[1]==date&&a[2]>max){max=a[2];name=$1}}}END{print name,max}' YourFile
So, you pass the date you are looking for in as a variable called date. You then iterate through all fields on the line, and split the date and value of each into an array using | as separator - a[1] has the date, a[2] has the value. If the date matches and the value is greater than any previously seen maximum, save this as the new maximum and save the first field from this line for printing at the end.
You couldn't have taken 5 seconds to give your sample input different values? Anyway, this may work when run against input that actually has different values for the dates:
$ cat tst.awk
BEGIN { FS="[|[:space:]]+" }
FNR==1 {
    for (i=2;i<=NF;i+=2) {
        if ( $i==tgt ) {
            f = i+1
        }
    }
    max = $f
}
$f >= max { max=$f; name=$1 }
END { print name }
$ awk -v tgt=20150801 -f tst.awk file
name2
As a quick and dirty solution, this can be done with the following Unix commands:
yourdatafile=<yourdatafile>
yourdate=<yourdate>
cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $1" "$2}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $1" "$2}' |sort -k 2n | tail -n 1
With following sample data:
$ cat $yourdatafile
Alice 20150801|44 20150802|21 20150803|7 20150804|76 20150805|71
Bob 20150801|31 20150802|5 20150803|21 20150804|133 20150805|71
and yourdate=20150803 we get:
$ cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $1" "$2}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $1" "$2}' |sort -k 2n | tail -n 1
Bob 21
and for yourdate=20150802 we get:
$ cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $2" "$1}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $2" "$1}' | sort -k 2n | tail -n 1
Alice 21
The drawback is that only one line is printed even if the highest value of a day was achieved by more than one name, as can be seen with:
$ yourdate=20150805; cat $yourdatafile | sed 's/|/_/g' | awk -F "${yourdate}_" '{print $2" "$1}' | sed 's/[0-9]*_[0-9]*//g' | awk '{print $2" "$1}' | sort -k 2n | tail -n 1
Bob 71
I hope that helps anyway.
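The tie case can also be handled in a single awk pass that remembers every name sharing the maximum. This is a minimal sketch, not a drop-in replacement for the pipeline above: the function name `max_on_date` is hypothetical, and it assumes each field after the name has the form date|value with numeric values.

```shell
# Hypothetical helper: max_on_date DATE FILE
# Prints every row name that reaches the maximum on DATE, then the maximum.
max_on_date() {
    awk -v date="$1" '
        {
            for (f = 2; f <= NF; f++) {
                split($f, a, "|")            # a[1] = date, a[2] = value
                if (a[1] != date) continue
                v = a[2] + 0                 # force numeric comparison
                if (!seen || v > max) { max = v; names = $1; seen = 1 }
                else if (v == max)    { names = names " " $1 }
            }
        }
        END { if (seen) print names, max }   # print nothing if the date never occurs
    ' "$2"
}
```

With the Alice/Bob sample data, `max_on_date 20150805 "$yourdatafile"` would list both names, since they share the value 71.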

How to put | between the lines of a text file?

I have a file containing:
L1
L2
L3
.
.
.
L512
I want to change its content to :
L1 | L2 | L3 | ... | L512
It seems so easy, but I have been trying for an hour now. I tried to do it with sed, but didn't get what I want: sed seems to just insert empty lines between the content. Any suggestions, please?
With sed, this requires reading the whole input into a buffer and then replacing all newlines with |, like this:
sed ':a;N;$!ba;s/\n/ | /g' input.txt
Part 1 - buffering input
:a defines a label called 'a'
N gets the next line from input and appends it to the pattern space
$!ba jumps to a unless the end of input is reached
Part 2 - replacing newlines by |
s/\n/ | /g replaces every newline in the pattern space with " | "
As you can see, this is very inefficient, since it requires sed to:
read the complete input into memory
operate three times on the input: 1. reading, 2. substituting, 3. printing
Therefore I would suggest to use awk which can do it in one loop:
awk 'NR==1{printf "%s", $0;next}{printf " | %s", $0}END{print ""}' input.txt
Here is one sed
sed ':a;N;s/\n/ | /g;ta' file
L1 | L2 | L3 | ... | L512
And one awk
awk '{printf("%s%s",sep,$0);sep=" | "} END {print ""}' file
L1 | L2 | L3 | ... | L512
perl -pe 's/\n/ | / unless(eof)' file
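If the spaces around | are not mandatory (note that tr reads standard input, and that this also turns the final newline into a trailing |):
tr "\n" '|' < YourFile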
Several options, including those mentioned here:
paste -sd'|' file
sed ':a;N;s/\n/ | /g;ta' file
sed ':a;N;$!ba;s/\n/ | /g' file
perl -0pe 's/\n/ | /g;s/ \| $/\n/' file
perl -0nE 'say join " | ", split /\n/' file
perl -E 'chomp(@x=<>); say join " | ", @x' file
mapfile -t ary < file; (IFS="|"; echo "${ary[*]}")
awk '{printf("%s%s",sep,$0);sep=" | "} END {print ""}' file

Removing matching text from line

I have an example cut down from a log file.
112 172.172.172.1#50912 (ssl.bing.com):
I would like to remove the # and the numbers after it, as well as the (): characters, from the URL.
The desired result is:
112 172.172.172.1 ssl.bing.com
Here is the sed oneliner I have been working on.
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com

Pattern extraction using SED or AWK

How do I extract 68 from v1+r0.68?
Using awk, returns everything after the last '.'
echo "v1+r0.68" | awk -F. '{print $NF}'
Using sed to get the number after the last dot:
echo 'v1+r0.68' | sed 's/.*[.]\([0-9][0-9]*\)$/\1/'
grep is good at extracting things:
kent$ echo " v1+r0.68"|grep -oE "[0-9]+$"
68
Match the digit string before the end of the line using grep:
$ echo 'v1+r0.68' | grep -Eo '[0-9]+$'
68
Or match any digits after a .
$ echo 'v1+r0.68' | grep -Po '(?<=\.)\d+'
68
Print everything after the . with awk:
echo "v1+r0.68" | awk -F. '{print $NF}'
68
Substitute everything before the . with sed:
echo "v1+r0.68" | sed 's/.*\.//'
68
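When the value is already in a shell variable, no external command is needed. A minimal sketch using POSIX parameter expansion follows; the variable names `s` and `minor` are made up for illustration.

```shell
s='v1+r0.68'
minor=${s##*.}     # strip the longest prefix ending in "."
echo "$minor"
```

`##` removes the longest match of the pattern `*.` from the front of the string, so only the text after the last dot remains. This is the same result as the `sed 's/.*\.//'` command, without starting a process.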
type man grep
and you will see
...
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
then type echo 'v1+r0.68' | grep -o '68'
If you want to save the result somewhere, redirect the output to a file:
echo 'v1+r0.68' | grep -o '68' > anyWhereSpecial.file_ending

Insert comma after certain byte range

I'm trying to turn a big list of data into a CSV. It's basically a giant list with no spaces, and the rows are separated by newlines. I have made a bash script that loops through the document, awks out the line, cuts the byte range, and then adds a comma and appends it to the end of the line. It looks like this:
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 1-12 | tr -d '\n' >> $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 13-17 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 18-22 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 23-34 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
The problem is that this is EXTREMELY slow, and the data has about 400k rows. I know there must be a better way to accomplish this. Essentially, I just need to add a comma after characters 12, 17, 22, 34, etc. of each line.
Any help is appreciated, thank you!
There are many many ways to do this with Perl. Here is one way:
perl -pe 's/(.{12})(.{5})(.{5})(.{12})/$1,$2,$3,$4,/' < input-file > output-file
The matching pattern in the substitution captures four groups of text from the beginning of each line with 12, 5, 5, and 12 arbitrary characters. The replacement pattern places a comma after each group.
With GNU awk, you could write
gawk 'BEGIN {FIELDWIDTHS="12 5 5 12"; OFS=","} {$1=$1; print}'
The $1=$1 part forces awk to rewrite the line, incorporating the output field separator, without changing anything else.
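Where GNU awk is not installed, the same splitting can be sketched with `substr` in any POSIX awk. This is only an illustration under stated assumptions: the function name `fixed_to_csv` is hypothetical, the widths are the four from the question, and whatever follows the last listed width is kept as a final field.

```shell
# Hypothetical helper: fixed_to_csv FILE...
# Inserts a comma after each fixed-width field (widths 12, 5, 5, 12).
fixed_to_csv() {
    awk -v widths='12 5 5 12' '
        BEGIN { n = split(widths, w, " ") }    # w[1..n] holds the field widths
        {
            out = ""; pos = 1
            for (i = 1; i <= n; i++) {
                out = out substr($0, pos, w[i]) ","
                pos += w[i]
            }
            print out substr($0, pos)          # append the rest of the line
        }
    ' "$@"
}
```

Running `fixed_to_csv PROP.txt > PROP.csv` processes the whole file in one pass, instead of re-reading it once per field and per row as the original loop does.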
This is very much a job for substr.
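use strict;
use warnings;
my @widths = (12, 5, 5, 12);
while (my $line = <DATA>) {
    my $offset = 0;                       # restart at column 0 for every line
    for my $width (@widths) {
        $offset += $width;
        substr $line, $offset, 0, ',';    # insert a comma at the field boundary
        ++$offset;                        # step over the comma just inserted
    }
    print $line;
}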
__DATA__
1234567890123456789012345678901234567890
output
123456789012,34567,89012,345678901234,567890