Can I get objdump --full-contents to only print the raw text?

I have an object file with a section containing proper ASCII (or Latin-1?) text. So, if I write:
$ objdump -s my_file.so --section=.rodata
the output looks like this (showing only a few lines from the middle; it's obviously very long):
070a80 656d3b0a 73697a65 5f742073 68617265 em;.size_t share
070a90 644d656d 50657242 6c6f636b 3b0a696e dMemPerBlock;.in
070aa0 74207265 67735065 72426c6f 636b3b0a t regsPerBlock;.
070ab0 696e7420 77617270 53697a65 3b0a7369 int warpSize;.si
070ac0 7a655f74 206d656d 50697463 683b0a69 ze_t memPitch;.i
070ad0 6e74206d 61785468 72656164 73506572 nt maxThreadsPer
070ae0 426c6f63 6b3b0a69 6e74206d 61785468 Block;.int maxTh
My question: Can I get objdump to just print the text, without the line indices and the hexadecimal values? And to print at least all the printing characters properly (e.g. a newline for 0x0a)? Or - must I perform a bunch of text processing to correlate the dots to their values, replace them with the proper characters, cut the line prefixes, drop the artificial newlines etc?

Use xxd
Using xxd avoids cutting and awking, and is your best option short of an objdump flag.
Save your formatted hexdump to a file named temp, then pipe it to xxd -r (which expects a hexdump formatted this way):
$ cat temp | xxd -r
em;
size_t sharedMemPerBlock;
int regsPerBlock;
int warpSize;
size_t memPitch;
int maxThreadsPerBlock;
int maxTh
If you instead need to pass in a plain hex string with no line numbers or ASCII column, use xxd -r -p.
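If xxd is not available, the same reversal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `objdump_to_bytes` is a hypothetical helper name, and it assumes each data line has the layout shown in the question (an address, then a hex field of up to four 8-digit groups, then the ASCII column). Header lines whose first token is not a hex number are skipped.

```python
def objdump_to_bytes(dump: str) -> bytes:
    """Rebuild raw section bytes from `objdump -s` style lines (assumed layout)."""
    out = bytearray()
    for line in dump.splitlines():
        line = line.strip()
        if not line:
            continue
        addr, _, rest = line.partition(" ")
        try:
            int(addr, 16)          # data lines start with a hex address
        except ValueError:
            continue               # e.g. "Contents of section .rodata:"
        # The hex field is at most 4 groups of 8 digits plus 3 spaces = 35 chars;
        # everything after it is the ASCII column, which we ignore.
        hex_field = rest[:35].replace(" ", "")
        out += bytes.fromhex(hex_field)
    return bytes(out)


sample = (
    "070a80 656d3b0a 73697a65 5f742073 68617265 em;.size_t share\n"
    "070a90 644d656d 50657242 6c6f636b 3b0a696e dMemPerBlock;.in\n"
)
print(objdump_to_bytes(sample).decode("latin-1"))
```

Unlike xxd -r, this sketch ignores the addresses entirely, so it simply concatenates the bytes in the order the lines appear.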

How to escape special char when use glib.string.escape()

According to the documentation of glib.string.escape():
Escapes the special characters '\b', '\f', '\n', '\r', '\t', '\v', '\' and '"' in the string source by inserting a '\' before them.
Additionally all characters in the range 0x01-0x1F (everything below SPACE) and in the range 0x7F-0xFF (all non-ASCII chars) are replaced with a '\' followed by their octal representation. Characters supplied in exceptions are not escaped.
I do not want to escape the characters in the range 0x7F-0xFF. How should I write the exceptions argument?
My example code does not work:
shellcmd = "bash -c \""+file.get_string(title,"List").escape("0x7F-0xFF")+"\"";
print("shellcmd: %s\n", shellcmd);
Process.spawn_command_line_sync (shellcmd,
out ls_stdout, out ls_stderr, out ls_status);
if(ls_status!=0){ list = ls_stderr.split("\n"); }
else{ list = ls_stdout.split("\n"); }
This works:
shellcmd = "bash -c \""+file.get_string(title,"Check").replace("\"","\\\"")+"\"";
You actually have to put the characters 0x7f to 0xff in the exceptions argument. So something like:
shellcmd = "bash -c \""+file.get_string(title,"List").escape("\x7F\x80\x81\x82…\xfe\xff")+"\"";
You would need to list them all manually.
Looking more generally at your code, you seem to be constructing a command to run. This is a very bad idea and you should never do it. It is wide open to code injection. Use Process.spawn_sync() and pass it an argument vector instead.
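The point about argument vectors is not specific to Vala. As a language-neutral illustration (Python here, not the GLib API; `run_with_argv` is a hypothetical helper), passing a list of arguments means no shell ever parses the untrusted text, so quotes and semicolons inside it stay plain data:

```python
import subprocess
import sys


def run_with_argv(untrusted: str) -> str:
    """Pass untrusted text to a child process as ONE argument, with no shell involved."""
    child_code = "import sys; print(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, untrusted],  # argument vector, not a command string
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# The embedded quote and "; echo injected" are never interpreted as shell syntax.
print(run_with_argv('x"; echo injected; "'))
```

Because there is no command string to assemble, there is also nothing to escape, which removes the original escaping problem altogether.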

Manually calculating JWT signature never outputs the real signature

I've been reading a lot of questions on Stack Overflow and the JWT docs.
From what I understand, this is what I should do to calculate a token:
header =
{
"alg": "HS256",
"typ": "JWT"
}
payload =
{
"sub": "1234567890",
"name": "JohnDoe",
"iat": 1516239022
}
secret = "test123"
Remove unnecessary spaces and line breaks from the header and payload, then encode both as base64url.
base64urlEncode(header)
// output: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
base64urlEncode(payload)
// output: eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG5Eb2UiLCJpYXQiOjE1MTYyMzkwMjJ9
Same output as on jwt.io, perfect.
Calculate the sha256 hmac using "test123" as secret.
sha256_hmac("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG5Eb2UiLCJpYXQiOjE1MTYyMzkwMjJ9", "test123")
// output: 3b59324118bcd59a5435194120c2cfcb7cf295f25a79149b79145696329ffb95
Convert the hash to a string and then base64url-encode it.
I use a hex-to-string converter for this part, then I encode the result using base64urlEncode and get the following output:
O1kyQRjCvMOVwppUNRlBIMOCw4_Di3zDssKVw7JaeRTCm3kUVsKWMsKfw7vClQ
Output from jwt.io
O1kyQRi81ZpUNRlBIMLPy3zylfJaeRSbeRRWljKf-5U
But if I go to this page ("From Hex, to Base64"), I get the correct output:
O1kyQRi81ZpUNRlBIMLPy3zylfJaeRSbeRRWljKf-5U
So what am I doing wrong? Why does converting the hex to a string and then encoding it produce a different result?
In case the online hex-to-string conversion is wrong, how can I convert this hex to a string (so that I can then encode it) in C++ without using any library? Am I correct that I should convert each byte (2 characters, because one hex digit = 4 bits) to an ASCII character and then encode?
Thanks in advance.
Your HMAC step is correct and produces the right output bytes (as shown in the comments below). The conversion problem is caused by non-printable characters in the intermediate string: the raw bytes do not survive being converted to text and copy-pasted from the first web page to the second.
To reproduce the exact output at each stage, you can use these commands below.
In terms of C++, you should try to operate on the raw bytes, rather than on the hex string. Take the raw bytes and run them through a base64 URL-safe encoder. Or, as in the example below, take the raw bytes, run them through a plain base64 encoder, and then fix the generated base64 string to be URL safe.
Construct the header
jwt_header=$(echo -n '{"alg":"HS256","typ":"JWT"}' | base64 | sed s/\+/-/g | sed 's/\//_/g' | sed -E s/=+$//)
# ans: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
Construct the payload
payload=$(echo -n '{"sub":"1234567890","name":"JohnDoe","iat":1516239022}' | base64 | sed s/\+/-/g |sed 's/\//_/g' | sed -E s/=+$//)
# ans: eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG5Eb2UiLCJpYXQiOjE1MTYyMzkwMjJ9
Raw password
secret="test123"
Convert secret to hex (not base64)
hexsecret=$(echo -n "$secret" | xxd -p | tr -d '\n')
# ans: 74657374313233
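Perform the HMAC and capture the raw bytes (caution: this is a non-printable string, and a shell variable cannot hold NUL bytes, so piping openssl directly into the next command is more robust in general)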
hmac_signature_rawbytes=$(echo -n "${jwt_header}.${payload}" | openssl dgst -sha256 -mac HMAC -macopt hexkey:$hexsecret -binary)
Dump the raw bytes as hex, for illustration only (matches OP output)
echo -n "${hmac_signature_rawbytes}" | xxd -p | tr -d '\n'
#ans: 3b59324118bcd59a5435194120c2cfcb7cf295f25a79149b79145696329ffb95
For the JWT signature, convert the raw bytes to base64url encoding
hmac_signature=$(echo -n "${hmac_signature_rawbytes}" | base64 | sed s/\+/-/g | sed 's/\//_/g' | sed -E s/=+$//)
#ans: O1kyQRi81ZpUNRlBIMLPy3zylfJaeRSbeRRWljKf-5U
Create the full token
jwt="${jwt_header}.${payload}.${hmac_signature}"
# ans: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG5Eb2UiLCJpYXQiOjE1MTYyMzkwMjJ9.O1kyQRi81ZpUNRlBIMLPy3zylfJaeRSbeRRWljKf-5U
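The advice to operate on raw bytes rather than on a text string can be sketched in Python (an illustration, not part of the original answer; `b64url` is a hypothetical helper name). The last part is an assumption about what went wrong in the question: if each digest byte is turned into a character and the result is then UTF-8 encoded, every byte above 0x7F becomes two bytes, and the base64 output changes accordingly.

```python
import base64
import hashlib
import hmac


def b64url(raw: bytes) -> str:
    """Base64url-encode raw bytes and drop the '=' padding, as JWT uses."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


header = b64url(b'{"alg":"HS256","typ":"JWT"}')
payload = b64url(b'{"sub":"1234567890","name":"JohnDoe","iat":1516239022}')
signing_input = f"{header}.{payload}".encode("ascii")

# Correct route: the raw digest bytes go straight into the encoder.
digest = hmac.new(b"test123", signing_input, hashlib.sha256).digest()
token = f"{header}.{payload}.{b64url(digest)}"
print(token)

# Illustration with the hex digest quoted in the question.
hex_digest = "3b59324118bcd59a5435194120c2cfcb7cf295f25a79149b79145696329ffb95"
raw = bytes.fromhex(hex_digest)
good = b64url(raw)                                    # bytes -> base64url
bad = b64url(raw.decode("latin-1").encode("utf-8"))   # bytes -> text -> UTF-8 -> base64url
print(good)
print(bad)
```

Here `good` is the signature jwt.io reports in the question, while `bad` is the longer string the question obtained through the hex-to-string converter, which suggests that the intermediate text step is where the bytes were altered.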

Extracting values from a single file

I have a file with multiple lines; but a specific line contains tons of information, with several repeated expressions. I'm trying to extract some specific values. I first tried some commands with sed, for instance, but with no success. So, I was wondering if you could give me some insights.
Here is a fragment of the single line I mentioned:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My aim is first to extract all the values between the braces after "habitat.set.prob={" and put them on a single line in a text file.
It is also important to extract the numbers that appear just before the expression "[&length_range=", which in this case are "6" and "10". They are the labels of the sets of numbers after "prob={".
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always right before "[&length_range=". What comes before the label is not a fixed expression, though; it is actually a random number.
The goal is to end up with a file with the following characteristics:
6 0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command at least to try to extract the set of numbers, but it didn't work
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
Well... I made some progress.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
Many thanks in advance.
Cheers,
Luiz
Something like this:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
The first part, '/.a./,/.b./', selects the range from pattern a to pattern b, which may be spread over multiple lines. The -n option tells sed not to print by default.
In '/.a./,/.b./{s/.c./.d./p; s/.e./.f./p}'
there are two substitution commands with p=print in curly braces.
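For comparison, here is a minimal Python sketch of the same extraction that also joins each label with its values on one line, as the question asks. The names `RECORD` and `extract_habitat_sets` are hypothetical, the sample text is made up to resemble the fragments in the question, and the sketch assumes the label is preceded by a non-digit character (otherwise adjacent digits would be captured as part of the label).

```python
import re

RECORD = re.compile(
    r"(\d+)\[&length_range="        # the label sits right before the annotation
    r".*?habitat\.set\.prob=\{"     # skip lazily to the next probability set
    r"([^}]*)\}",                   # capture everything up to the closing brace
    re.DOTALL,                      # let '.' cross newlines if the data is wrapped
)


def extract_habitat_sets(text: str) -> list[str]:
    """Return one 'label values' string per annotated record."""
    return [f"{label} {values}" for label, values in RECORD.findall(text)]


sample = (
    "x,6[&length_range={0.19,0.2},height=1,habitat.set.prob={0.01,0.03},"
    "DLOOP.rate_median=0.04,length=1]:0.5,"
    "10[&length_range={0.19,0.3},habitat.set.prob={0.21,0.33},DLOOP.rate_median=0.04]"
)
for row in extract_habitat_sets(sample):
    print(row)
```

Reading the file with `open(path).read()` and passing the result to `extract_habitat_sets` would handle the whole document in one pass, at the cost of holding it in memory.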
I am not sure how much you have dug into this yourself, so I am not providing the complete answer, but I hope this helps:
For the first part, getting the number (which you call the label): you didn't mention whether it follows any specific pattern, so try this (data is the file that contains the actual input). You will need to work on how to capture the whole number and tweak the regex a bit:
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part which gives the numericals between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now, try to take this as a starter and work on your output to get your desired result!
To explain a bit:
In the first command, I am trying to capture the digits between anything (.*) and (.*)length_range [you can escape the characters [ and & by putting \ in front of them].
In the second command, I am capturing the pattern between habitat.set.prob and DLOOP, and then using tr to remove the braces.
#include <cstddef>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string p = "1:2:3:4"; // input string: single digits separated by ':'
    int arr[4] = {};      // zero-initialized array that receives the digits
    // walk the string, skipping ':' and never writing past the end of arr
    for (size_t i = 0, j = 0; i < p.length() && j < 4; i++) {
        if (p[i] == ':') { continue; } // skip the separator
        arr[j] = p[i] - '0';           // convert the digit character to its integer value
        j++;
    }
    cout << "String={" << arr[0] << " " << arr[1] << " " << arr[2] << " " << arr[3] << "}"; // print the array
    return 0;
}

use sed to change a text report to csv

I have a report that looks like this:
par_a
  .xx
  .yy
par_b
  .zz
  .tt
I wish to convert this format into CSV format as below, using a sed one-liner:
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
Please help.
With awk:
awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}' File
Or to make it more generic:
awk '/^[^[:blank:]]/{v=$0;next} /^[[:blank:]]/{$0=v","$1;print}' File
When a line starts with par_, save the content to variable v. Now, when a line starts with space, change the line to content of v followed by , followed by the first field.
Output:
AMD$ awk '/^par_/{v=$0}/^ /{$0=v","$1;print}' File
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
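The logic of the awk version (remember the last unindented line, then prefix it to every indented line) can be sketched in Python as follows. `report_to_csv` is a hypothetical helper, and the sketch assumes, like the awk command, that detail lines start with whitespace and header lines do not.

```python
def report_to_csv(lines):
    """Prefix each indented detail line with the most recent unindented header line."""
    header = None
    rows = []
    for line in lines:
        if not line.strip():
            continue                      # ignore blank lines
        if line[0].isspace():
            # detail line: keep only its first field, like awk's $1
            rows.append(f"{header},{line.split()[0]}")
        else:
            header = line.rstrip()        # new paragraph header
    return rows


report = ["par_a", "  .xx", "  .yy", "par_b", "  .zz", "  .tt"]
print("\n".join(report_to_csv(report)))
```

Reading the lines from a file object instead of a list works the same way, since the function only iterates over them once.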
With sed:
sed '/^par_/ { h; d; }; G; s/^[[:space:]]*//; s/\(.*\)\n\(.*\)/\2,\1/' filename
This works as follows:
/^par_/ { # if a new paragraph begins
h # remember it
d # but don't print anything yet
}
# otherwise:
G # fetch the remembered paragraph line to the pattern space
s/^[[:space:]]*// # remove leading whitespace
s/\(.*\)\n\(.*\)/\2,\1/ # rearrange to desired CSV format
Depending on your actual input data, you may want to replace the /^par_/ with, say, /^[^[:space:]]/. It just has to be a pattern that recognizes the beginning line of a paragraph.
Addendum: Shorter version that avoids regex repetition when using the space pattern to recognize paragraphs:
sed -r '/^\s+/! { h; d; }; s///; G; s/(.*)\n(.*)/\2,\1/' filename
Or, if you have to use BSD sed (as comes with Mac OS X):
sed '/^[[:space:]]\{1,\}/! { h; d; }; s///; G; s/\(.*\)\n\(.*\)/\2,\1/' filename
The latter should be portable to all seds, but as you can see, writing portable sed involves some pain.

What is the fastest way to autobreak a line of gigabytes separated by keywords using bash shell?

For example, given a line a11b12c22d322 e... where the separators are numbers or spaces, we want to transform it into
a
b
c
d
e
...
sed needs to read the whole line into memory. For a line of several gigabytes this is not efficient, and the job cannot be done at all if we don't have sufficient memory.
EDIT:
Could anyone please explain how grep, tr, awk, perl, and python manage memory when reading a large file? What, and how much, do they read into memory at a time?
If you use gawk (which is the default awk on Linux, I believe), you can use the RS parameter to specify that multi-digit numbers or spaces are recognized as line terminators instead of a new-line.
awk '{print}' RS="[[:digit:]]+| +" file.txt
As to your second question, all of these programs need to read some fixed number of bytes and search an internal buffer for their idea of a line separator, to simulate the appearance of reading a single line at a time. To prevent a program from reading too much data while searching for the end of the line, you need to change the program's idea of what terminates a line.
Most languages allow you to do this, but only allow you to specify a single character. gawk makes it easy by allowing you to specify a regular expression to recognize an end-of-line character. This saves you from having to implement the fixed-size buffer and end-of-line search yourself.
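To make the fixed-size buffer idea concrete, here is a minimal Python sketch (not a benchmarked solution; `split_stream` and `SEP_RUN` are hypothetical names). It reads the input in chunks, replaces each run of digits or whitespace with a newline, and remembers whether the previous chunk ended inside a separator run, so a run that straddles a chunk boundary still yields only one newline. Memory use is bounded by the chunk size, not by the line length.

```python
import io
import re

SEP_RUN = re.compile(rb"[0-9\s]+")   # a run of digits and/or whitespace


def split_stream(src, dst, chunk_size=1 << 20):
    """Copy src to dst, turning every separator run into a single newline."""
    prev_ended_in_sep = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = SEP_RUN.sub(b"\n", chunk)
        # A run that started in the previous chunk already produced its newline.
        if prev_ended_in_sep and out.startswith(b"\n"):
            out = out[1:]
        prev_ended_in_sep = SEP_RUN.fullmatch(chunk[-1:]) is not None
        dst.write(out)


src = io.BytesIO(b"a11b12c22d322 e")
dst = io.BytesIO()
split_stream(src, dst, chunk_size=3)   # tiny chunks to exercise the boundary logic
print(dst.getvalue().decode("ascii"))
```

For real use, `src` and `dst` would be binary file objects such as `sys.stdin.buffer` and `sys.stdout.buffer`.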
Fastest... You can do it with the help of gcc. Here's a version that reads data from the file named on the command line if one is given, otherwise from stdin. If this is still too slow, you can try to make it faster by replacing getc() and putc() (which may be macros and should optimize very well) with your own buffering code. If we want to get ridiculous, for even more speed you could use three threads, so the kernel can copy the next block of data on one core while another core does the processing and a third core copies the processed output back to the kernel.
#!/bin/bash
set -e
BINNAME=$(mktemp)
trap 'rm -f "$BINNAME"' EXIT   # remove the temporary binary on exit
gcc -xc -O3 -o "$BINNAME" - <<"EOF"
#include <ctype.h>   /* isdigit, isspace */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int sep = 0;
    /* speed is a requirement, so let's reduce io overhead */
    const int bufsize = 1024*1024;
    setvbuf(stdin, malloc(bufsize), _IOFBF, bufsize);
    setvbuf(stdout, malloc(bufsize), _IOFBF, bufsize);
    /* above buffers intentionally not freed, it doesn't really matter here */
    int ch;
    while ((ch = getc(stdin)) != EOF) {
        if (isdigit(ch) || isspace(ch)) {
            /* emit one newline per run of separators */
            if (!sep) {
                if (putc('\n', stdout) == EOF) break;
                sep = 1;
            }
        } else {
            sep = 0;
            if (putc(ch, stdout) == EOF) break;
        }
    }
    /* flush should happen by on-exit handler, as buffer is not freed,
       but this will detect write errors, for program exit code */
    fflush(stdout);
    return ferror(stdin) || ferror(stdout);
}
EOF
if [ -z "$1" ] ; then
    "$BINNAME" <&0
else
    "$BINNAME" <"$1"
fi
Edit: I happened to look at the GNU/Linux stdio.h. Some notes: putchar/getchar are not macros, but putc/getc are, so using those instead might be a slight optimization, probably avoiding one function call; I changed the code to reflect this. I also added checks of putc's return code while at it.
With grep:
$ grep -o '[^0-9 ]\+' <<< "a11b12c22d322 e"
a
b
c
d
e
With sed:
$ sed 's/[0-9 ]\+/\n/g' <<< "a11b12c22d322 e"
a
b
c
d
e
With awk:
$ awk 'gsub(/[0-9 ]+/,"\n")' <<< "a11b12c22d322 e"
a
b
c
d
e
I'll let you benchmark.
Try with tr:
tr -s '[:digit:][:space:]' '\n' <<< "a11b12c22d322e"
That yields:
a
b
c
d
e