how to find offset of a pattern from binary file (without grep -b) - sh

I want to get a byte offset of a string pattern from a binary file on embedded linux platform.
If I can use "grep -b" option, It would be best way but It is not supported on my machine.
machine does not support
ADDR=`grep -oba <pattern string> <file path> | cut -d ":" -f1`
Here the manual of grep command on the machine.
root# grep --help
BusyBox v1.29.3 () multi-call binary.
Usage: grep \[-HhnlLoqvsriwFE\] \[-m N\] \[-A/B/C N\] PATTERN/-e PATTERN.../-f FILE \[FILE\]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Since that option isn't available, I'm looking for an alternative.
the combination of hexdump and grep can be also useful
such as
ADDR=`hexdump <file path> -C | grep <pattern string> | cut -d' ' -f1`
But if pattren spans multiple lines, it will not be found.
Is there a way to find the byte offset of a specific pattern with a Linux command?

Set the pattern as the record separator in awk. The offset of the occurrence is the length of the first record. BusyBox awk treats RS as an extended regular expression, so add backslashes before any of .[]\*+?^$ in the pattern string.
<myfile.bin awk -v RS='pattern' '{print length($0); exit}'
If the pattern contains a null byte, you need a little extra work. Use tr to exchange null bytes with some byte value that doesn't appear in the pattern. For example, if the pattern's hex dump is 00002a61:
<myfile.bin tr '\0!' '!\0' | awk -v RS='!!-A' '{print length($0); exit}'
If the pattern is not found, this prints the length of the whole file. So if you aren't sure whether the pattern is present, you need again some extra work. Append some text that can't be part of a pattern match to the file, so that you know that if there's a match, it won't be at the very end of the file. Then, if the pattern is present, the file will contain at least two records. But if the pattern is not present, the file only contains the first record (without a record separator after it).
{ cat myfile.bin; echo garbage; } |
awk -v RS='pattern' '
NR==1 {n = length($0)}
NR==2 {print n; found = 1; exit}
END {exit !found}
'

Something like this?
hexdump -C "$file" |
awk -v pattern="$pattern" 'residue { matched = ($0 ~ "\\|" residue)
if (matched) print $1; residue = ""; if (matched) next }
$0 ~ pattern { print $1 }
{ for(i=length(pattern)-1; i>0; i--)
if ($0 ~ substr(pattern, 1, i) "\\|$") { residue=substr(pattern, i+1); break } }'
The offset is just the first field from the hexdump output; if you need the precise location of the match, this requires some additional massaging to figure out the offset to add to the address, or subtract if it was wrapped.
Briefly tested in a clean-slate Busybox Docker container where hexdump -C output looks like this:
/ # hexdump -C /etc/resolv.conf
00000000 23 20 44 4e 53 20 72 65 71 75 65 73 74 73 20 61 |# DNS requests a|
00000010 72 65 20 66 6f 72 77 61 72 64 65 64 20 74 6f 20 |re forwarded to |
00000020 74 68 65 20 68 6f 73 74 2e 20 44 48 43 50 20 44 |the host. DHCP D|
00000030 4e 53 20 6f 70 74 69 6f 6e 73 20 61 72 65 20 69 |NS options are i|
00000040 67 6e 6f 72 65 64 2e 0a 6e 61 6d 65 73 65 72 76 |gnored..nameserv|
00000050 65 72 20 31 39 32 2e 31 36 38 2e 36 35 2e 35 0a |er 192.168.65.5.|
00000060 20 | |

Related

End-of-Transmission character as an IFS

I have a Bourne shell script which uses End-of-Transmission character as an IFS:
ASCII_EOT=`echo -e '\004'`
while IFS="$ASCII_EOT" read DEST PASSWORD; do
...
done
How does the EOT behave as an IFS? Or what kind of input might the read expect?
It's an ASCII character just like ,; it just isn't printable.
$ printf 'foo\004bar' > tmp.txt
$ hexdump -C tmp.txt
00000000 66 6f 6f 04 62 61 72 0a |foo.bar.|
00000008
$ IFS=$(printf '\004') read f1 f2 < tmp.txt
$ echo "$f1"
foo
$ echo "$f2"
bar

Extra computation steps on adding brackets?

Ok, so just a question popped up in my mind while writing some code.
Does the CPU compute this
gap = (gap*3) + 1;
with the same efficiency as the below expression without the brackets?
gap = gap*3 + 1;
Update to be more clarifying:
Often while writing arithmetic expressions, we tend to add brackets to make the code easier to read, even if the brackets do not change what the expression evaluates to. So the question is, does adding such brackets have any affect on the performance?
As mentioned above in the comments:
file test1.c:
int main(void)
{
int gap = 1;
gap = (gap*3) + 1;
return gap;
}
file test2.c:
int main(void)
{
int gap = 1;
gap = gap*3 + 1;
return gap;
}
Using gcc and its -S option ("Compile only; do not assemble or link"): gcc -S test1.c && gcc -S test2.c. Comparing those two files:
$ diff test1.s test2.s
1c1
< .file "test1.c"
---
> .file "test2.c"
(i.e. only the file names differ in the first line)
When you still don't believe me (for this particular example), then you can further compile & assemble: gcc test1.c -o test1 && gcc test2.c -o test2. Comparing those binary files (or rather ELF executables) will give you one byte difference, i.e. the file name again:
$ hexdump -C test1 > hex1.txt && hexdump -C test2 > hex2.txt
$ diff hex1.txt hex2.txt
387c387
< 00001fc0 79 5f 65 6e 74 72 79 00 74 65 73 74 31 2e 63 00 |y_entry.test1.c.|
---
> 00001fc0 79 5f 65 6e 74 72 79 00 74 65 73 74 32 2e 63 00 |y_entry.test2.c.|

join every 240 lines of a large file consisting of different numbers in cshell script

I have a large file containing 5,000,000 lines and 3 columns, and I want to merge every 240 lines.
I tried using sed in a cshell script for merging 3 lines: 'N;N;s/\n/ /g' filename. but if I want to use it for 240 lines I should write 240 n;n;n;n;n;n....(240times)! what is the best way to solve this problem?
awk to the rescue!
$ awk 'ORS=NR%240?FS:RS' filename
for example
$ seq 10 99 | awk 'ORS=NR%10?FS:RS'
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
Explanation
ORS=NR%10?FS:RS here the ternary operator sets output record separator if the line number is divisible by 10 to record separator (newline) or if not to field separator (space). Effectively adding a new line after each tenth record and space in between.
Something like this perhaps, which removes the newline from every line, and then prints it followed by a space or a newline as appropriate
perl -ne's/\s*\z//; print $_, eof || $. % 240 == 0 ? "\n" : " "' myfile
If I understand right, the
paste -d, $(printf "%0.s- " {1..240})
does the job. Assumes the field delimiter is ,.
Demo:
produce some test file
seq -f '%g,a,b' 2400 >demo_file
it contains lines like:
1,a,b
2,a,b
3,a,b
4,a,b
...
2398,a,b
2399,a,b
2400,a,b
the command
paste -d, $(printf "%0.s- " {1..240}) < demo_file | head -2
prints:
1,a,b,2,a,b,3,a,b,4,a,b,5,a,b,6,a,b,7,a,b,8,a,b,9,a,b,10,a,b,11,a,b,12,a,b,13,a,b,14,a,b,15,a,b,16,a,b,17,a,b,18,a,b,19,a,b,20,a,b,21,a,b,22,a,b,23,a,b,24,a,b,25,a,b,26,a,b,27,a,b,28,a,b,29,a,b,30,a,b,31,a,b,32,a,b,33,a,b,34,a,b,35,a,b,36,a,b,37,a,b,38,a,b,39,a,b,40,a,b,41,a,b,42,a,b,43,a,b,44,a,b,45,a,b,46,a,b,47,a,b,48,a,b,49,a,b,50,a,b,51,a,b,52,a,b,53,a,b,54,a,b,55,a,b,56,a,b,57,a,b,58,a,b,59,a,b,60,a,b,61,a,b,62,a,b,63,a,b,64,a,b,65,a,b,66,a,b,67,a,b,68,a,b,69,a,b,70,a,b,71,a,b,72,a,b,73,a,b,74,a,b,75,a,b,76,a,b,77,a,b,78,a,b,79,a,b,80,a,b,81,a,b,82,a,b,83,a,b,84,a,b,85,a,b,86,a,b,87,a,b,88,a,b,89,a,b,90,a,b,91,a,b,92,a,b,93,a,b,94,a,b,95,a,b,96,a,b,97,a,b,98,a,b,99,a,b,100,a,b,101,a,b,102,a,b,103,a,b,104,a,b,105,a,b,106,a,b,107,a,b,108,a,b,109,a,b,110,a,b,111,a,b,112,a,b,113,a,b,114,a,b,115,a,b,116,a,b,117,a,b,118,a,b,119,a,b,120,a,b,121,a,b,122,a,b,123,a,b,124,a,b,125,a,b,126,a,b,127,a,b,128,a,b,129,a,b,130,a,b,131,a,b,132,a,b,133,a,b,134,a,b,135,a,b,136,a,b,137,a,b,138,a,b,139,a,b,140,a,b,141,a,b,142,a,b,143,a,b,144,a,b,145,a,b,146,a,b,147,a,b,148,a,b,149,a,b,150,a,b,151,a,b,152,a,b,153,a,b,154,a,b,155,a,b,156,a,b,157,a,b,158,a,b,159,a,b,160,a,b,161,a,b,162,a,b,163,a,b,164,a,b,165,a,b,166,a,b,167,a,b,168,a,b,169,a,b,170,a,b,171,a,b,172,a,b,173,a,b,174,a,b,175,a,b,176,a,b,177,a,b,178,a,b,179,a,b,180,a,b,181,a,b,182,a,b,183,a,b,184,a,b,185,a,b,186,a,b,187,a,b,188,a,b,189,a,b,190,a,b,191,a,b,192,a,b,193,a,b,194,a,b,195,a,b,196,a,b,197,a,b,198,a,b,199,a,b,200,a,b,201,a,b,202,a,b,203,a,b,204,a,b,205,a,b,206,a,b,207,a,b,208,a,b,209,a,b,210,a,b,211,a,b,212,a,b,213,a,b,214,a,b,215,a,b,216,a,b,217,a,b,218,a,b,219,a,b,220,a,b,221,a,b,222,a,b,223,a,b,224,a,b,225,a,b,226,a,b,227,a,b,228,a,b,229,a,b,230,a,b,231,a,b,232,a,b,233,a,b,234,a,b,235,a,b,236,a,b,237,a,b,238,a,b,239,a,b,240,a,b
241,a,b,242,a,b,243,a,b,244,a,b,245,a,b,246,a,b,247,a,b,248,a,b,249,a,b,250,a,b,251,a,b,252,a,b,253,a,b,254,a,b,255,a,b,256,a,b,257,a,b,258,a,b,259,a,b,260,a,b,261,a,b,262,a,b,263,a,b,264,a,b,265,a,b,266,a,b,267,a,b,268,a,b,269,a,b,270,a,b,271,a,b,272,a,b,273,a,b,274,a,b,275,a,b,276,a,b,277,a,b,278,a,b,279,a,b,280,a,b,281,a,b,282,a,b,283,a,b,284,a,b,285,a,b,286,a,b,287,a,b,288,a,b,289,a,b,290,a,b,291,a,b,292,a,b,293,a,b,294,a,b,295,a,b,296,a,b,297,a,b,298,a,b,299,a,b,300,a,b,301,a,b,302,a,b,303,a,b,304,a,b,305,a,b,306,a,b,307,a,b,308,a,b,309,a,b,310,a,b,311,a,b,312,a,b,313,a,b,314,a,b,315,a,b,316,a,b,317,a,b,318,a,b,319,a,b,320,a,b,321,a,b,322,a,b,323,a,b,324,a,b,325,a,b,326,a,b,327,a,b,328,a,b,329,a,b,330,a,b,331,a,b,332,a,b,333,a,b,334,a,b,335,a,b,336,a,b,337,a,b,338,a,b,339,a,b,340,a,b,341,a,b,342,a,b,343,a,b,344,a,b,345,a,b,346,a,b,347,a,b,348,a,b,349,a,b,350,a,b,351,a,b,352,a,b,353,a,b,354,a,b,355,a,b,356,a,b,357,a,b,358,a,b,359,a,b,360,a,b,361,a,b,362,a,b,363,a,b,364,a,b,365,a,b,366,a,b,367,a,b,368,a,b,369,a,b,370,a,b,371,a,b,372,a,b,373,a,b,374,a,b,375,a,b,376,a,b,377,a,b,378,a,b,379,a,b,380,a,b,381,a,b,382,a,b,383,a,b,384,a,b,385,a,b,386,a,b,387,a,b,388,a,b,389,a,b,390,a,b,391,a,b,392,a,b,393,a,b,394,a,b,395,a,b,396,a,b,397,a,b,398,a,b,399,a,b,400,a,b,401,a,b,402,a,b,403,a,b,404,a,b,405,a,b,406,a,b,407,a,b,408,a,b,409,a,b,410,a,b,411,a,b,412,a,b,413,a,b,414,a,b,415,a,b,416,a,b,417,a,b,418,a,b,419,a,b,420,a,b,421,a,b,422,a,b,423,a,b,424,a,b,425,a,b,426,a,b,427,a,b,428,a,b,429,a,b,430,a,b,431,a,b,432,a,b,433,a,b,434,a,b,435,a,b,436,a,b,437,a,b,438,a,b,439,a,b,440,a,b,441,a,b,442,a,b,443,a,b,444,a,b,445,a,b,446,a,b,447,a,b,448,a,b,449,a,b,450,a,b,451,a,b,452,a,b,453,a,b,454,a,b,455,a,b,456,a,b,457,a,b,458,a,b,459,a,b,460,a,b,461,a,b,462,a,b,463,a,b,464,a,b,465,a,b,466,a,b,467,a,b,468,a,b,469,a,b,470,a,b,471,a,b,472,a,b,473,a,b,474,a,b,475,a,b,476,a,b,477,a,b,478,a,b,479,a,b,480,a,b
EDIT: Just noticed the "cshell"... Unfortunately, the above is for bash, use the perl solution. ;)
This might work for you (GNU sed):
sed -r ':a;$!{N;s/[^\n]+/&/240;Ta};s/\n/ /g' file
This keeps appending lines until the pattern space contains 240 lines then replaces all newlines by spaces.
Given a small test file like
a,b,c
d,e,f
g,h,i
j,k,l
m,n,o
p,q,r
s,t,u
v,w,x
y,z
Change the cnt=? argument to fold the number of lines. You should be able to use your target 240 without an issue.
awk -v cnt=3 'BEGIN{i=1}
{
printf $0 ","; i++;
while (i<=cnt){
getline
printf("%s%s", $0 ,(i!=cnt)?",":"")
i++
}
{ i=1
print ""
}
}' file
output
a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
s,t,u,v,w,x,y,z
One slight problem, for some values for cnt there will be extra ,s at the end of the list line, i.e.
awk -v cnt=4 'BEGIN{i=1}
{
printf $0 ","; i++;
while (i<=cnt){
getline
printf("%s%s", $0 ,(i!=cnt)?",":"")
i++
}
{ i=1
print ""
}
}' file
output
a,b,c,d,e,f,g,h,i,j,k,l
m,n,o,p,q,r,s,t,u,v,w,x
y,z,,,
You can clean these up by appending
awk .... file | sed '$/s/,*$//' > outFile
To the tail end of your process.
IHTH

combine one colum from two files into a thrid file

I have two files a.txt and b.txt which contains the following data.
$ cat a.txt
0x5212cb03caa111e0
0x5212cb03caa113c0
0x5212cb03caa115c0
0x5212cb03caa117c0
0x5212cb03caa119e0
0x5212cb03caa11bc0
0x5212cb03caa11dc0
0x5212cb03caa11fc0
0x5212cb03caa121c0
$ cat b.txt
36 65 fb 60 7a 5e
36 65 fb 60 7a 64
36 65 fb 60 7a 6a
36 65 fb 60 7a 70
36 65 fb 60 7a 76
36 65 fb 60 7a 7c
36 65 fb 60 7a 82
36 65 fb 60 7a 88
36 65 fb 60 7a 8e
I want to generate a third file c.txt that contains
0x5212cb03caa111e0 36 65 fb 60 7a 5e
0x5212cb03caa113c0 36 65 fb 60 7a 64
0x5212cb03caa115c0 36 65 fb 60 7a 6a
Can I achieve this using awk? How do I do this?
use paste command:
paste a.txt b.txt
paste is really the shortest solution, however if you're looking for awk solution as stated in question then:
awk 'FNR==NR{a[++i]=$0;next} {print a[FNR] "\t" $0}' a.txt b.txt
Here is an awk solution that only stores two lines in memory at a time:
awk '{ getline b < "b.txt"; print $0, b }' OFS='\t' a.txt
Lines from a.txt are implicitly stored in $0 and for each line in a.txt a line is read from b.txt by getline.

How can I convert a 48 hex string to bytes using Perl?

I have a hex string (length 48 chars) that I want to convert to raw bytes with the pack function in order to put it in a Win32 vector of bytes.
How I can do this with Perl?
my $bytes = pack "H*", $hex;
See perlpacktut for more information.
The steps are:
Extract pairs of hexadecimal characters from the string.
Convert each pair to a decimal number.
Pack the number as a byte.
For example:
use strict;
use warnings;
my $string = 'AA55FF0102040810204080';
my #hex = ($string =~ /(..)/g);
my #dec = map { hex($_) } #hex;
my #bytes = map { pack('C', $_) } #dec;
Or, expressed more compactly:
use strict;
use warnings;
my $string = 'AA55FF0102040810204080';
my #bytes = map { pack('C', hex($_)) } ($string =~ /(..)/g);
I have the string:
"61 62 63 64 65 67 69 69 6a"
which I want to interpret as hex values, and display those as ASCII chars (those values should reproduce the character string "abcdefghij").
Typically, I try to write something quick like this:
$ echo "61 62 63 64 65 67 69 69 6a" | perl -ne 'print "$_"; print pack("H2 "x10, $_)."\n";'
61 62 63 64 65 67 69 69 6a
a
... and then I wonder, why do I get only one character back :)
First, let me note down that the string I have, can also be represented as the hex values of bytes that it takes up in memory:
$ echo -n "61 62 63 64 65 67 68 69 6a" | hexdump -C
00000000 36 31 20 36 32 20 36 33 20 36 34 20 36 35 20 36 |61 62 63 64 65 6|
00000010 37 20 36 38 20 36 39 20 36 61 |7 68 69 6a|
0000001a
_(NB: Essentially, I want to "convert" the above byte values in memory as input, to these below ones, if viewed by hexdump:
$ echo -n "abcdefghij" | hexdump -C
00000000 61 62 63 64 65 66 67 68 69 6a |abcdefghij|
0000000a
... which is how the original values for the input hex string were obtained.
)_
Well, this Pack/Unpack Tutorial (AKA How the System Stores Data) turns out is the most helpful for me, as it mentions:
The pack function accepts a template string and a list of values [...]
$rec = pack( "l i Z32 s2", time, $emp_id, $item, $quan, $urgent);
It returns a scalar containing the list of values stored according to the formats specified in the template [...]
$rec would contain the following (first line in decimal, second in hex, third as characters where applicable). Pipe characters indicate field boundaries.
Offset Contents (increasing addresses left to right)
0 160 44 19 62| 41 82 3 0| 98 111 120 101 115 32 111 102
A0 2C 13 3E| 29 52 03 00| 62 6f 78 65 73 20 6f 66
| b o x e s o f
That is, in my case, $_ is a single string variable -- whereas pack expects as input a list of several such 'single' variables (in addition to a formatting template string); and outputs again a 'single' variable (which could, however, be a sizeable chunk of memory!). In my case, if the output 'single' variable contains the ASCII code in each byte in memory, then I'm all set (I could simply print the output variable directly, then).
Thus, in order to get a list of variables from the $_ string, I can simply split it at the space sign - however, note:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2", split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
a
... that amount of elements to be packed must be specified (otherwise again we get only one character back); then, either of these alternatives work:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2"x10, split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
abcdeghij
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("(H2)*", split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
abcdeghij