Escape percent sign (%) in hexdump format

Problem
I'm trying to emit a hex string like:
echo hello | hexdump -ve '/1 "_%02X"' ; echo
but with % instead of _.
Actual vs Expected
echo hello | hexdump -ve '/1 "%%%02X"' ; echo
fails with
hexdump: bad conversion character %%
Question
Is there any way to escape % in a hexdump format string?

I don't see any way to get hexdump to emit a '%' character directly. Perhaps you could continue to emit the '_' character and then pipe the result through sed to convert the '_' into a '%'. Something like this:
echo hello | hexdump -ve '/1 "_%02X"' | sed -e 's/_/%/g'
which produces:
%68%65%6C%6C%6F%0A
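If the placeholder character is a concern (for instance, when the format string has to contain a literal '_' elsewhere), the same output can probably be produced without hexdump. The following is only a minimal sketch, not part of the original answer: it assumes a POSIX od, tr, and sed, and the function name percent_hex is made up for illustration.

```shell
# percent_hex: hypothetical helper; reads stdin and writes each byte as %XX.
percent_hex() {
    od -An -v -tx1 |        # two-digit hex value per input byte, no offsets
        tr -d ' \n' |       # drop the spacing and line breaks that od adds
        sed 's/../%&/g' |   # prefix every hex pair with a literal %
        tr 'a-f' 'A-F'      # upper-case the hex digits, as %02X would
}

# As in the question, a final echo ends the output line.
echo hello | percent_hex; echo
```

Because sed only ever sees hex digits here, the % can be written directly in the replacement and nothing needs escaping.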

Related

How to replace newline characters with a string in sed

First, this is not a duplicate of, e.g., How can I replace each newline (\n) with a space using sed?
What I want is to exactly replace every newline (\n) in a string, like so:
printf '%s' $'' | sed '...; s/\n/\\&/g'
should result in the empty string
printf '%s' $'a' | sed '...; s/\n/\\&/g'
should result in a (not followed by a newline)
printf '%s' $'a\n' | sed '...; s/\n/\\&/g'
should result in
a\
(the trailing \n of the final line should be replaced, too)
A solution like :a;N;$!ba; s/\n/\\&/g from the other question doesn't do that properly:
printf '%s' $'' | sed ':a;N;$!ba; s/\n/\\&/g' | hd
works;
printf '%s' $'a' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 |a|
00000001
works;
printf '%s' $'a\nb' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 5c 0a 62 |a\.b|
00000004
works;
but when there's a trailing \n on the last line
printf '%s' $'a\nb\n' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 5c 0a 62 0a |a\.b.|
00000005
it doesn't get quoted.
It's easier to use perl than sed, since it has (by default, at least) a more straightforward treatment of the newlines in its input:
printf '%s' '' | perl -pe 's/\n/\\\n/' # Empty string
printf '%s' a | perl -pe 's/\n/\\\n/' # a
printf '%s\n' a | perl -pe 's/\n/\\\n/' # a\<newline>
printf '%s\n' a b | perl -pe 's/\n/\\\n/' # a\<newline>b\<newline>
# etc
If your inputs aren't huge, you could use
perl -0777 -pe 's/\n/\\\n/g'
instead, which reads the entire input at once rather than line by line and can be more efficient.
How to replace newline characters with a string in sed
It's not possible. From a sed script's point of view, whether the final newline is present or missing makes no difference and is undetectable.
Aaaanyway, use GNU sed with sed -z:
sed -z 's/\n/\\\n/g'
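To check that the trailing newline really is rewritten, the result can be dumped as hex. This is a small sketch that assumes GNU sed (the -z option is a GNU extension) and a POSIX od; the wrapper name escape_newlines is hypothetical.

```shell
# escape_newlines: hypothetical wrapper around the GNU sed command above.
escape_newlines() {
    # -z makes NUL the record separator, so every \n is ordinary data
    # inside one big record and can be matched by s///.
    sed -z 's/\n/\\\n/g'
}

# Dump the bytes so the final "5c 0a" (backslash, newline) is visible.
printf 'a\nb\n' | escape_newlines | od -An -tx1
```

Two caveats with this approach: the whole input is held in memory as a single record, and input that itself contains NUL bytes would be split at those bytes.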
GNU awk can use the RT variable to detect a missing record terminator:
$ printf 'a\nb\n' | gawk '{ORS=(RT != "" ? "\\" : "") RT} 1'
a\
b\
$ printf 'a\nb' | gawk '{ORS=(RT != "" ? "\\" : "") RT} 1'
a\
b$
This adds a "\" before each non-empty record terminator.
Using any awk:
$ printf 'a\nb\n\n' | awk '{printf "%s%s", sep, $0; sep="\\\n"}'
a\
b\
$ printf 'a\nb\n' | awk '{printf "%s%s", sep, $0; sep="\\\n"}'
a\
b$
Or { cat file; echo; } | awk ... – always add a newline to the input.

base64 decode to csv file, sed script

I have the following script to extract the text inside the "reportBody" element, but I also need to decode this text from base64 into a new file. How can I do this?
Here's a script:
cat $1 | tr "\n" "|" | grep -o '<reportBody>.*</reportBody>' | sed 's/\(<reportBody>\|<\/reportBody>\)//g' | sed 's/|/\n/g' | sed '/^\s*$/d' > $2
I tried:
cat $1 | tr "\n" "|" | grep -o '<reportBody>.*</reportBody>' | sed 's/\(<reportBody>\|<\/reportBody>\)//g' | sed 's/|/\n/g' | sed '/^\s*$/d' | base64 -d $2 > $2
but it doesn't decode it.
Can I overwrite the same file, or at least save the decoded text in a new one, without calling additional modules from Python etc.?
Note: File contains 20k+ symbols to decode.
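One likely cause of the failure is that base64 -d $2 > $2 tells base64 to read the file $2, which the shell has already truncated for output, instead of reading the pipe. The following is a minimal sketch of one possible fix, not a tested answer from the thread. It assumes GNU coreutils base64 and GNU grep, a single reportBody element, and a body that contains only base64 text and whitespace; the function name decode_report_body is hypothetical.

```shell
# decode_report_body IN OUT: hypothetical helper, not from the original post.
# IN and OUT must be different files.
decode_report_body() {
    tr '\n' '|' < "$1" |                              # join lines into one record
        grep -o '<reportBody>.*</reportBody>' |       # keep only the element
        sed 's/<reportBody>//; s/<\/reportBody>//' |  # strip the two tags
        tr -d '| \t\r' |                              # drop line markers and whitespace
        base64 -d > "$2"                              # decode stdin into OUT
}

# Example call (file names are placeholders):
#   decode_report_body report.xml report.csv
```

To end up with the decoded text under the original file name, one option is to write to a temporary file first and then move it over the original, since reading and writing the same file in one pipeline truncates it before it is read.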

Removing matching text from line

I have an example cut down from a log file.
112 172.172.172.1#50912 (ssl.bing.com):
I would like to somehow remove the # and the numbers after it, as well as the ( ) and : characters around the URL.
The desired result is:
112 172.172.172.1 ssl.bing.com
Here is the sed oneliner I have been working on.
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com

Pattern extraction using SED or AWK

How do I extract 68 from v1+r0.68?
Using awk, returns everything after the last '.'
echo "v1+r0.68" | awk -F. '{print $NF}'
Using sed to get the number after the last dot:
echo 'v1+r0.68' | sed 's/.*[.]\([0-9][0-9]*\)$/\1/'
grep is good at extracting things:
kent$ echo " v1+r0.68"|grep -oE "[0-9]+$"
68
Match the digit string before the end of the line using grep:
$ echo 'v1+r0.68' | grep -Eo '[0-9]+$'
68
Or match any digits after a .
$ echo 'v1+r0.68' | grep -Po '(?<=\.)\d+'
68
Print everything after the . with awk:
echo "v1+r0.68" | awk -F. '{print $NF}'
68
Substitute everything before the . with sed:
echo "v1+r0.68" | sed 's/.*\.//'
68
Type man grep and you will see:
...
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
Then type:
echo 'v1+r0.68' | grep -o '68'
If you want the output saved somewhere, redirect it to a file:
echo 'v1+r0.68' | grep -o '68' > anyWhereSpecial.file_ending
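When the value is already in a shell variable, the extraction can also be done without any external tool. This is a minimal sketch using POSIX parameter expansion; the function name after_last_dot is made up for illustration.

```shell
# after_last_dot: hypothetical helper; prints what follows the last "." in $1.
after_last_dot() {
    # ${1##*.} removes the longest prefix matching "*." from the argument.
    printf '%s\n' "${1##*.}"
}

after_last_dot 'v1+r0.68'
```

Like the sed 's/.*\.//' variant above, this returns the whole string unchanged when it contains no dot, so it does not validate that the result is numeric.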

Insert comma after certain byte range

I'm trying to turn a big list of data into a CSV. It's basically a giant list with no spaces, and the rows are separated by newlines. I have made a bash script that loops through the document, awks out the line, cuts the byte range, and then adds a comma and appends it to the end of the line. It looks like this:
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 1-12 | tr -d '\n' >> $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 13-17 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 18-22 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
awk -v n=$x 'NR==n { print;exit}' PROP.txt | cut -c 23-34 | tr -d '\n' | xargs -I {} sed -i '' -e 's~$~,{}~' $x.tmp
The problem is this is EXTREMELY slow, and the data has about 400k rows. I know there must be a better way to accomplish this. Essentially I just need to add a comma after every 12/17/22/34 etc character of a line.
Any help is appreciated, thank you!
There are many many ways to do this with Perl. Here is one way:
perl -pe 's/(.{12})(.{5})(.{5})(.{12})/$1,$2,$3,$4,/' < input-file > output-file
The matching pattern in the substitution captures four groups of text from the beginning of each line with 12, 5, 5, and 12 arbitrary characters. The replacement pattern places a comma after each group.
With GNU awk, you could write
gawk 'BEGIN {FIELDWIDTHS="12 5 5 12"; OFS=","} {$1=$1; print}'
The $1=$1 part forces awk to rebuild the line using the output field separator, without changing any field.
This is very much a job for substr.
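use strict;
use warnings;
# Widths of the fixed-width fields; a comma is inserted after each one.
my @widths = (12, 5, 5, 12);
while (my $line = <DATA>) {
    my $offset = 0;                       # start over for every line
    for my $width (@widths) {
        $offset += $width;
        substr $line, $offset, 0, ',';    # 4-argument substr inserts at $offset
        ++$offset;                        # step past the comma just added
    }
    print $line;
}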
__DATA__
1234567890123456789012345678901234567890
output
123456789012,34567,89012,345678901234,567890
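For comparison, the same split can be sketched with substr in POSIX awk, which avoids the gawk-only FIELDWIDTHS and processes the whole file in one pass. The column boundaries (12/17/22/34) are taken from the question; the function name add_commas is hypothetical.

```shell
# add_commas: hypothetical helper; reads fixed-width rows from stdin or
# from the files given as arguments, and prints them with commas inserted.
add_commas() {
    awk '{
        # substr(s, start, length) uses 1-based positions;
        # the last call has no length and returns the rest of the line.
        print substr($0, 1, 12) "," substr($0, 13, 5) "," \
              substr($0, 18, 5) "," substr($0, 23, 12) "," substr($0, 35)
    }' "$@"
}

printf '1234567890123456789012345678901234567890\n' | add_commas
```

Rows shorter than 34 characters would still receive all four commas, so this sketch assumes every row is at least that long.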