Modifying an input string using sed

Using sed, how can I change an input string like 9872 to 39 38 37 32, i.e. insert the digit 3 before and a space after each digit of the entered string?
Input string:
9872
Required output:
39 38 37 32

echo 9872 | sed 's/./3&\ /g'

And just for completeness, a more general way using regex references.
echo 9872 | sed -r 's/([[:digit:]])/3\1 /g'

$ echo "9872" | sed 's/[0-9]/3& /g'
39 38 37 32
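All three commands above leave a space after the last digit as well. If that trailing space matters, a second substitution can remove it. This is only a sketch; the function name prefix_digits is made up for illustration.

```shell
# prefix_digits is a hypothetical helper name, not part of sed.
prefix_digits() {
    # Put 3 before and a space after every digit, then drop the final space.
    printf '%s\n' "$1" | sed 's/[0-9]/3& /g; s/ $//'
}

prefix_digits 9872
```

Using [0-9] rather than . keeps any non-digit characters in the input unchanged.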


how to find offset of a pattern from binary file (without grep -b)

I want to get the byte offset of a string pattern in a binary file on an embedded Linux platform.
If I could use the grep -b option, that would be the best way, but it is not supported on my machine. This is the command that does not work there:
ADDR=`grep -oba <pattern string> <file path> | cut -d ":" -f1`
Here is the help output of the grep command on the machine:
root# grep --help
BusyBox v1.29.3 () multi-call binary.
Usage: grep [-HhnlLoqvsriwFE] [-m N] [-A/B/C N] PATTERN/-e PATTERN.../-f FILE [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Since that option isn't available, I'm looking for an alternative.
A combination of hexdump and grep can also be useful, such as:
ADDR=`hexdump <file path> -C | grep <pattern string> | cut -d' ' -f1`
But if the pattern spans multiple lines of the hexdump output, it will not be found.
Is there a way to find the byte offset of a specific pattern with a Linux command?
Set the pattern as the record separator in awk. The offset of the occurrence is the length of the first record. BusyBox awk treats RS as an extended regular expression, so add backslashes before any of .[]\*+?^$ in the pattern string.
<myfile.bin awk -v RS='pattern' '{print length($0); exit}'
If the pattern contains a null byte, you need a little extra work. Use tr to exchange null bytes with some byte value that doesn't appear in the pattern. For example, if the pattern's hex dump is 00002d41 (two null bytes followed by -A), swap null bytes with !:
<myfile.bin tr '\0!' '!\0' | awk -v RS='!!-A' '{print length($0); exit}'
If the pattern is not found, this prints the length of the whole file. So if you aren't sure whether the pattern is present, you need again some extra work. Append some text that can't be part of a pattern match to the file, so that you know that if there's a match, it won't be at the very end of the file. Then, if the pattern is present, the file will contain at least two records. But if the pattern is not present, the file only contains the first record (without a record separator after it).
{ cat myfile.bin; echo garbage; } |
awk -v RS='pattern' '
NR==1 {n = length($0)}
NR==2 {print n; found = 1; exit}
END {exit !found}'
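The steps above can be wrapped in a small function. This is a sketch under assumptions: the name find_offset is made up, the pattern is treated as an extended regular expression, and the awk in use is assumed to accept a multi-character RS (BusyBox awk, gawk and mawk do). LC_ALL=C is set so that length counts bytes rather than characters.

```shell
# find_offset is a hypothetical helper: find_offset PATTERN FILE
# Prints the byte offset of the first match and returns 0,
# or prints nothing and returns 1 when the pattern is absent.
find_offset() {
    { cat "$2"; echo garbage; } |
    LC_ALL=C awk -v RS="$1" '
        NR==1 {n = length($0)}             # bytes before the first separator
        NR==2 {print n; found = 1; exit}   # a second record proves a match
        END {exit !found}'
}

sample=$(mktemp)
printf 'hello world' > "$sample"
find_offset 'world' "$sample"
rm -f "$sample"
```

The exit status lets a caller distinguish "not found" from an offset without parsing the output.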
Something like this?
hexdump -C "$file" |
awk -v pattern="$pattern" 'residue { matched = ($0 ~ "\\|" residue)
if (matched) print $1; residue = ""; if (matched) next }
$0 ~ pattern { print $1 }
{ for(i=length(pattern)-1; i>0; i--)
if ($0 ~ substr(pattern, 1, i) "\\|$") { residue=substr(pattern, i+1); break } }'
The offset is just the first field from the hexdump output; if you need the precise location of the match, this requires some additional massaging to figure out the offset to add to the address, or subtract if it was wrapped.
Briefly tested in a clean-slate Busybox Docker container where hexdump -C output looks like this:
/ # hexdump -C /etc/resolv.conf
00000000 23 20 44 4e 53 20 72 65 71 75 65 73 74 73 20 61 |# DNS requests a|
00000010 72 65 20 66 6f 72 77 61 72 64 65 64 20 74 6f 20 |re forwarded to |
00000020 74 68 65 20 68 6f 73 74 2e 20 44 48 43 50 20 44 |the host. DHCP D|
00000030 4e 53 20 6f 70 74 69 6f 6e 73 20 61 72 65 20 69 |NS options are i|
00000040 67 6e 6f 72 65 64 2e 0a 6e 61 6d 65 73 65 72 76 |gnored..nameserv|
00000050 65 72 20 31 39 32 2e 31 36 38 2e 36 35 2e 35 0a |er 192.168.65.5.|
00000060 20 | |

sed: delete n lines after first match

I want to delete N number of lines after the first match in a text file using sed.
(I know most of these questions have been answered with "use awk", but I want to use sed, regardless of how much more powerful it is than awk. It's more a matter of which tool I'm most comfortable with using at the moment, within a certain time constraint)
The furthest I got is this:
sed -i "0,/pattern/{/pattern/,+Nd}" file.txt
The thought is that 0, denotes the first occurrence, where the curly brackets search the first line for the pattern and delete N lines after that occurrence.
sed '/pattern/{N;N;N;N;N;N;N;d;}' file.txt
The 0, construct and the relative line number addressing you tried to use are specific to GNU sed. Portable sed does not have these facilities.
This removes the matching line and the seven lines after it (one N per extra line) for every match. If you only want to remove the first occurrence and leave the rest of the file unchanged, add a separate loop that simply prints all remaining lines.
The problem with your attempt is that 0,/pattern/ restricts matching to the lines up through the first occurrence of /pattern/ but then that's the end of the range, so anything selected by this expression cannot operate on lines outside of that range.
Assuming your shell is bash (the question originally had a bash tag):
sed -f <(printf -v nsp '%*s' $n; printf '/%s/{x;/./!{s/^/./;h;%sd;};x;}\n' 'pattern' "${nsp// /N;}") file
Note that n is a shell variable holding the number of lines to delete (set it first, for example n=3), and that the constructed sed script is not GNU-specific.
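For shells without bash's printf -v and process substitution, the same script can be built in plain POSIX sh. This is a sketch rather than a drop-in replacement: the function name delete_first_block is made up, the pattern must not contain /, and input is read from standard input. The hold space is used as a "first match already handled" flag.

```shell
# delete_first_block is a hypothetical helper:
#   delete_first_block PATTERN COUNT < file
# Deletes the first line matching PATTERN plus the COUNT lines after it.
delete_first_block() {
    pattern=$1
    count=$2
    appends=
    i=0
    while [ "$i" -lt "$count" ]; do
        appends="${appends}N;"      # one N per extra line to swallow
        i=$((i + 1))
    done
    # On a match, swap in the hold space. If it is still empty this is the
    # first match: mark the hold space, pull in COUNT lines, delete them all.
    # Otherwise swap back and let the line print normally.
    sed "/$pattern/{x;/./!{s/^/./;h;${appends}d;};x;}"
}

printf '%s\n' a b c d e b f | delete_first_block b 2
```

Only the first b and the two lines after it are removed; the second b is printed. If fewer than COUNT lines follow the first match, the result depends on how the sed implementation handles N at end of input.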
This might work for you (GNU sed):
sed '0,/pattern/{//{:a;N;s/\n/&/N;Ta;d}}' file
Deletes the line containing pattern and then N lines after it once only.
sed '/pattern/{x;//{x;b};x;h;:a;N;s/\n/&/N;Ta;d}' file
N.B. The N following the substitution command refers to the nth occurrence of a newline in the pattern space.
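The numeric flag mentioned in the note is easiest to see in isolation. A minimal sketch, with a made-up function name, that replaces only the second newline in the pattern space:

```shell
# second_newline_demo is a hypothetical name used only for this illustration.
second_newline_demo() {
    # Load three lines into the pattern space, then replace only the
    # second embedded newline; the first one is left untouched.
    printf 'a\nb\nc\n' | sed 'N;N;s/\n/,/2'
}

second_newline_demo
```

In the answer's scripts the replacement is &, so the substitution changes nothing; it merely succeeds once that many newlines are present, which is what the following T command (a GNU extension) tests.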
UPDATE 1 : Example where sed solution above does not meet objective universally:
echo "\n input \${b} :: \n\n———————\n" \
"${b}\n--------------\n\n sed " \
"commands :: \n\n--------------\n " \
"${cmd}\n--------------\n\n GNU sed "\
"::\n\n$( gsed "${cmd}" <<< "${b}" )" \
"\n\n BSD sed ::\n\n$( sed "${cmd}" <<< "${b}" )\n\n"
input ${b} ::
84 77138=48001=P
85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
sed commands ::
GNU sed ::
84 77138=48001=P
85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
BSD sed ::
84 77138=48001=P
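When the input lacks sufficient rows past the pattern, this solution works on BSD sed but fails on GNU sed, which prints the remaining lines unchanged. The likely cause is the N command at end of input: GNU sed prints the pattern space before exiting when there is no next line, whereas BSD sed exits without printing it.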
Is sed a hard requirement? You can also do this as an awk one-liner
(the output below is intentionally verbose to show exactly which lines are matched and skipped):
# gawk profile, created Thu Apr 28 18:36:55 2022
# BEGIN rule(s)
1 printf "\n\t N :: %.f :: FS i.e. "\
"pattern :: %*s\n\n", N = +N, ++__, FS = pattern
# Rule(s)
87 NF *= -(_+=(_= __<NF ? -__-N :_)^!__)<+_ { # 45
45 print
1 77138=501=A
2 77138=3413=A
3 77138=3414=A
4 77138=8624=A
5 77138=19572=A
6 77138=22220=A
7 77138=23670=A
8 77138=25413=A
9 77138=26351=A
10 77138=27340=A
11 77138=29288=A
12 77138=121060=A
13 77138=123028=A
14 77138=132081=A
15 77138=135789=A
16 77138=154341=A
17 77138=155876=A
18 77138=170871=A
19 77138=178562=A
skipped :: 20 77138=185367=A
skipped :: 21 77138=196718=A
skipped :: 22 77138=196985=A
skipped :: 23 77138=200012=A
skipped :: 24 77138=207162=A
skipped :: 25 77138=228289=A
skipped :: 26 77138=244747=A
skipped :: 27 77138=284795=A
skipped :: 28 77138=294579=A
skipped :: 29 77138=299765=A
skipped :: 30 77138=317856=A
skipped :: 31 77138=318815=A
32 77138=324570=A
33 77138=408049=A
34 77138=514403=A
35 77138=1647865=A
36 77138=1738771=A
37 77138=3217183=A
skipped :: 38 77138=3222837=A
skipped :: 39 77138=3235292=A
skipped :: 40 77138=14957980=I
skipped :: 41 77138=1159=M
skipped :: 42 77138=1196=M
skipped :: 43 77138=1251=M
44 77138=1252=M
45 77138=4951=M
46 77138=16740=M
47 77138=71501=M
skipped :: 48 77138=137=P
skipped :: 49 77138=348=P
skipped :: 50 77138=518=P
skipped :: 51 77138=519=P
skipped :: 52 77138=520=P
skipped :: 53 77138=925=P
54 77138=1363=P
55 77138=1483=P
56 77138=1814=P
57 77138=2692=P
58 77138=3540=P
59 77138=3594=P
60 77138=3682=P
61 77138=3869=P
62 77138=3940=P
skipped :: 63 77138=3977=P
skipped :: 64 77138=4025=P
skipped :: 65 77138=4252=P
skipped :: 66 77138=4396=P
skipped :: 67 77138=9501=P
skipped :: 68 77138=13006=P
69 77138=18113=P
skipped :: 70 77138=20907=P
skipped :: 71 77138=31936=P
skipped :: 72 77138=34954=P
skipped :: 73 77138=37126=P
skipped :: 74 77138=37482=P
skipped :: 75 77138=40135=P
76 77138=40206=P
77 77138=41279=P
78 77138=41280=P
79 77138=46140=P
skipped :: 80 77138=46157=P
skipped :: 81 77138=46173=P
skipped :: 82 77138=46218=P
skipped :: 83 77138=47592=P
skipped :: 84 77138=48001=P
skipped :: 85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
N :: 5 :: FS i.e. pattern :: [7]=[AP]$
1 77138=501=A
2 77138=3413=A
3 77138=3414=A
4 77138=8624=A
5 77138=19572=A
6 77138=22220=A
7 77138=23670=A
8 77138=25413=A
9 77138=26351=A
10 77138=27340=A
11 77138=29288=A
12 77138=121060=A
13 77138=123028=A
14 77138=132081=A
15 77138=135789=A
16 77138=154341=A
17 77138=155876=A
18 77138=170871=A
19 77138=178562=A
32 77138=324570=A
33 77138=408049=A
34 77138=514403=A
35 77138=1647865=A
36 77138=1738771=A
37 77138=3217183=A
44 77138=1252=M
45 77138=4951=M
46 77138=16740=M
47 77138=71501=M
54 77138=1363=P
55 77138=1483=P
56 77138=1814=P
57 77138=2692=P
58 77138=3540=P
59 77138=3594=P
60 77138=3682=P
61 77138=3869=P
62 77138=3940=P
69 77138=18113=P
76 77138=40206=P
77 77138=41279=P
78 77138=41280=P
79 77138=46140=P
86 77138=78118=P
87 77138=79248=P
More concisely, it would be
mawk -v pattern='[7]=[AP]$' -v N='5' -- '
BEGIN { ++__; FS = pattern
} NF *= -(_+=(_=__<NF?-__-N:_)^!__) < +_'
or in awk one-liner style
mawk 'NF*=-(_+=(_=1<NF?-1-N:_)^0)<+_' FS='[7]=[AP]$' N=5
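The one-liners above are compact but hard to read. Judging by the sample output, they drop every matching line together with the N lines after it, restarting the count when another match occurs inside a skipped block. A more readable sketch of that behaviour follows; the function name skip_after_match is made up, and this is an interpretation of the output rather than a line-by-line translation.

```shell
# skip_after_match is a hypothetical helper:
#   skip_after_match PATTERN N < file
skip_after_match() {
    awk -v pat="$1" -v n="$2" '
        $0 ~ pat { skip = n + 1 }     # (re)start the countdown on every match
        skip > 0 { skip--; next }     # drop the match and the n lines after it
        { print }
    '
}

printf '%s\n' a X b c d | skip_after_match X 2
```

Unlike the sed answers, this version handles every match, not only the first one.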

End-of-Transmission character as an IFS

I have a Bourne shell script which uses End-of-Transmission character as an IFS:
ASCII_EOT=`echo -e '\004'`
How does the EOT behave as an IFS? Or what kind of input might the read expect?
It's an ASCII character just like ,; it just isn't printable.
$ printf 'foo\004bar\n' > tmp.txt
$ hexdump -C tmp.txt
00000000 66 6f 6f 04 62 61 72 0a |foo.bar.|
$ IFS=$(printf '\004') read f1 f2 < tmp.txt
$ echo "$f1"
foo
$ echo "$f2"
bar
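As a further sketch of what read expects with such an IFS: each EOT byte ends one field, and because EOT is not whitespace, spaces inside a field are preserved. The function name split_on_eot and the three-field layout are assumptions made for the example. printf is used to produce the byte because echo -e is not portable across Bourne-style shells.

```shell
# EOT holds the single byte 0x04.
EOT=$(printf '\004')

# split_on_eot is a hypothetical helper that reads EOT-separated
# records from standard input, one record per line.
split_on_eot() {
    while IFS=$EOT read -r first second third; do
        printf '[%s][%s][%s]\n' "$first" "$second" "$third"
    done
}

printf 'a b\004c d\004e\n' | split_on_eot
```

Setting IFS only on the read command keeps the change local to that command, so the rest of the script still splits on the default whitespace.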

join every 240 lines of a large file consisting of different numbers in cshell script

I have a large file containing 5,000,000 lines and 3 columns, and I want to merge every 240 lines into one.
I tried using sed in a C shell script to merge 3 lines: sed 'N;N;s/\n/ /g' filename. But for 240 lines I would have to write N;N;N;... 239 times. What is the best way to solve this problem?
awk to the rescue!
$ awk 'ORS=NR%240?FS:RS' filename
for example
$ seq 10 99 | awk 'ORS=NR%10?FS:RS'
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
In ORS=NR%10?FS:RS, the ternary operator sets the output record separator (ORS) to the record separator RS (a newline) when the line number is divisible by 10, and to the field separator FS (a space) otherwise. This effectively puts a newline after every tenth record and a space between the others.
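One detail of the ORS approach: when the number of lines is not a multiple of the group size, the last output line ends with a space and has no final newline. A slightly longer sketch that avoids this is shown here; the function name join_every is made up for illustration.

```shell
# join_every is a hypothetical helper: join_every N < file
join_every() {
    awk -v n="$1" '
        { printf "%s%s", sep, $0; sep = " " }   # space before all but the first line of a group
        NR % n == 0 { print ""; sep = "" }      # close the group after every n-th line
        END { if (NR % n) print "" }            # terminate a final, shorter group
    '
}

printf '%s\n' 1 2 3 4 5 6 7 | join_every 3
```

For the file in the question the call would be join_every 240 < filename.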
Something like this perhaps, which removes the newline from every line, and then prints it followed by a space or a newline as appropriate
perl -ne's/\s*\z//; print $_, eof || $. % 240 == 0 ? "\n" : " "' myfile
If I understand right, the
paste -d, $(printf "%0.s- " {1..240})
does the job. Assumes the field delimiter is ,.
Produce a test file:
seq -f '%g,a,b' 2400 >demo_file
It contains lines like:
1,a,b
2,a,b
3,a,b
Then run the command:
paste -d, $(printf "%0.s- " {1..240}) < demo_file | head -2
EDIT: Just noticed the "cshell"... Unfortunately, the above is for bash, use the perl solution. ;)
This might work for you (GNU sed):
sed -r ':a;$!{N;s/[^\n]+/&/240;Ta};s/\n/ /g' file
This keeps appending lines until the pattern space contains 240 lines then replaces all newlines by spaces.
Given a small test file like
Change the cnt argument to set how many lines are folded into one. You should be able to use your target of 240 without an issue.
awk -v cnt=3 'BEGIN{i=1}
{
  printf "%s,", $0; i++
  while (i<=cnt){
    if ((getline) <= 0) break   # stop at end of input
    printf("%s%s", $0 ,(i!=cnt)?",":"")
    i++
  }
  i=1
  print ""
}' file
One slight problem: for some values of cnt there will be extra ,s at the end of the last line, e.g.
awk -v cnt=4 'BEGIN{i=1}
{
  printf "%s,", $0; i++
  while (i<=cnt){
    if ((getline) <= 0) break   # stop at end of input
    printf("%s%s", $0 ,(i!=cnt)?",":"")
    i++
  }
  i=1
  print ""
}' file
You can clean these up by appending
awk .... file | sed '$s/,*$//' > outFile
to the tail end of your process.

Append the end of one line to the start of the next with sed

I'm looking for sed only solutions for the following:
Corrupted Input:
A 123 dgbsdgsbg
A 345 gsgsdgdgs A 23
2 afaffaaf
A 324 fsgdggsdg A 345 avsa
fasf
Expected output:
A 123 dgbsdgsbg
A 345 gsgsdgdgs
A 232 afaffaaf
A 324 fsgdggsdg
A 345 avsafasf
How can the trailing A [0-9].* be appended to the start of the next line? So far I have:
$ sed -r 's/ (A [0-9]+.*)/\n\1/' file
A 123 dgbsdgsbg
A 345 gsgsdgdgs
A 23
2 afaffaaf
A 324 fsgdggsdg
A 345 avsa
fasf
This might work for you (GNU sed):
sed -r '$!N;s/ (A[^\n]*)\n/\n\1/;P;D' file
Did you try:
sed -e :a -e '$!N;s/\([0-9]\)\n\([0-9]\)/\1\2/;ta' -e 'P;D'
$ cat input
abc 123
456
ghi 123
jkl 456
789
$ sed -e :a -e '$!N;s/\([0-9]\)\n\([0-9]\)/\1\2/;ta' -e 'P;D' input
abc 123456
ghi 123
jkl 456789
EDIT: You modified the example in the question later. For your modified input, try:
$ sed -e 's/ \(A .*\)/\n\1/' -e :a -e '$!N;s/\n\([^A]\)/\1/;ta' -e 'P;D' newinput
A 123 dgbsdgsbg
A 345 gsgsdgdgs
A 232 afaffaaf
A 324 fsgdggsdg
A 345 avsafasf
This can be an option:
$ sed -r ':a;$!N;s/ (A [0-9]+.*)\n(.*)/\n\1\2/;ta;P;D' file
A 123 dgbsdgsbg
A 345 gsgsdgdgs
A 232 afaffaaf
A 324 fsgdggsdg
A 345 avsafasf
It is an adaptation of the last example from How to match newlines in sed:
sed ':begin;$!N;s/FOO\nBAR/FOOBAR/;tbegin;P;D'
# if a line ends in FOO and the next starts with BAR, join them
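The N/P/D loop in that example can be tried on a tiny input. The sketch below spells the script out with separate -e options, which also works in sed implementations that do not accept a semicolon after a label; the wrapper name join_foo_bar is made up, and FOO and BAR are the placeholders from the example.

```shell
# join_foo_bar is a hypothetical wrapper around the FOO/BAR example.
join_foo_bar() {
    # :begin  label for the loop
    # $!N     append the next line unless this is the last one
    # s       join when the first line ends in FOO and the next starts with BAR
    # tbegin  after a successful join, try to join the following line too
    # P;D     print and delete the first line, keep the rest for the next cycle
    sed -e ':begin' -e '$!N' -e 's/FOO\nBAR/FOOBAR/' -e 'tbegin' -e 'P' -e 'D'
}

printf 'xFOO\nBARy\nz\n' | join_foo_bar
```

Because D restarts the cycle without reading new input, the pattern space always holds a two-line window, so every adjacent pair of lines is checked.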