sed: delete n lines after first match

I want to delete N number of lines after the first match in a text file using sed.
(I know most of these questions have been answered with "use awk", but I want to use sed, regardless of how much more powerful awk is. It's more a matter of which tool I'm most comfortable with at the moment, given a certain time constraint.)
The furthest I got is this:
sed -i "0,/pattern/{/pattern/,+Nd}" file.txt
The thought is that 0, denotes the first occurrence, the curly brackets search those lines for the pattern, and the command deletes N lines after that occurrence.

Try
sed '/pattern/{N;N;N;N;N;N;d;}' file.txt
The 0, construct and the relative line number addressing you tried to use are specific to GNU sed. Portable sed does not have these facilities.
This removes every line matching pattern together with the six lines after it. If you only want to act on the first occurrence and leave the rest of the file unchanged, you could add a separate loop that simply prints all remaining lines.
The problem with your attempt is that 0,/pattern/ restricts matching to the lines up through the first occurrence of /pattern/, but that is also the end of the range, so nothing selected by this expression can operate on lines outside of that range.
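For comparison, here is the same "drop the match and the N lines after it" logic as a Python sketch (not sed, and not part of the answer above; the function name and the first_only switch are hypothetical). With first_only=True only the first match triggers a deletion and everything after that window is printed unchanged; with first_only=False it behaves like /pattern/{N;...;d;}, including not re-testing the lines swallowed by a window. If fewer than N lines remain after a match, this sketch simply drops the rest.

```python
import re
from typing import Iterable, Iterator


def delete_after_match(lines: Iterable[str], pattern: str, n: int,
                       first_only: bool = True) -> Iterator[str]:
    """Yield lines, dropping a matching line plus the n lines after it.

    Lines inside a deletion window are dropped without being tested,
    like lines appended with N.
    """
    regex = re.compile(pattern)
    to_skip = 0        # lines still to drop in the current window
    used = False       # has a match already triggered a deletion?
    for line in lines:
        if to_skip:
            to_skip -= 1
            continue
        if regex.search(line) and not (first_only and used):
            used = True
            to_skip = n
            continue   # drop the matching line itself
        yield line


if __name__ == "__main__":
    text = ["a", "x1", "b", "c", "x2", "d", "e"]
    print(list(delete_after_match(text, "x", 1)))
    # ['a', 'c', 'x2', 'd', 'e']
    print(list(delete_after_match(text, "x", 1, first_only=False)))
    # ['a', 'c', 'e']
```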

Assuming your shell is bash (the question originally had a bash tag):
n=3
sed -f <(printf -v nsp '%*s' $n; printf '/%s/{x;/./!{s/^/./;h;%sd;};x;}\n' 'pattern' "${nsp// /N;}") file
Note that n is a variable (3 is just an example) and the constructed sed script is not GNU-specific.
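To see what the bash one-liner actually hands to sed, here is a hypothetical Python helper that builds the same script text for a given n (it assumes the pattern contains no / that would need escaping). The hold space starts out empty, so on the first match /./! succeeds: a dot is stored in the hold space as a "seen" flag, n lines are appended with N, and the whole pattern space is deleted. On later matches the flag is found, and the line passes through unchanged.

```python
def build_sed_script(pattern: str, n: int) -> str:
    """Return the sed script text that the bash printf pipeline generates.

    Assumes `pattern` needs no escaping inside /.../ (no '/' in it).
    """
    appends = "N;" * n  # same as "${nsp// /N;}" with n spaces in nsp
    return "/%s/{x;/./!{s/^/./;h;%sd;};x;}\n" % (pattern, appends)


if __name__ == "__main__":
    print(build_sed_script("pattern", 3), end="")
    # /pattern/{x;/./!{s/^/./;h;N;N;N;d;};x;}
```

Saving that output to a file and running sed -f on it should behave the same as the process-substitution version.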

This might work for you (GNU sed):
sed '0,/pattern/{//{:a;N;s/\n/&/N;Ta;d}}' file
Deletes the line containing pattern and the N lines after it, once only.
Alternative:
sed '/pattern/{x;//{x;b};x;h;:a;N;s/\n/&/N;Ta;d}' file
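N.B. The N following the substitution command is a number, not the N command: s/\n/&/N succeeds only once the pattern space holds at least N newlines, so Ta keeps branching back to append lines until the match and the N lines after it have been collected.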

UPDATE 1: Example where the sed solution above does not meet the objective universally:
cmd='/5=P$/{N;N;N;N;N;N;d;}'
echo "\n input \${b} :: \n\n--------------\n" \
"${b}\n--------------\n\n sed " \
"commands :: \n\n--------------\n " \
"${cmd}\n--------------\n\n GNU sed "\
"::\n\n$( gsed "${cmd}" <<< "${b}" )" \
"\n\n BSD sed ::\n\n$( sed "${cmd}" <<< "${b}" )\n\n"
input ${b} ::
--------------
84 77138=48001=P
85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
--------------
sed commands ::
--------------
/5=P$/{N;N;N;N;N;N;d;}
--------------
GNU sed ::
84 77138=48001=P
85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
BSD sed ::
84 77138=48001=P
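When the input lacks enough lines after the match, this solution works with BSD sed but fails completely with GNU sed.
This is documented GNU behaviour: when N is issued on the last line, GNU sed prints the pattern space before exiting (unless -n is given), whereas most other seds, BSD sed included, exit without printing it.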
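A small Python model of the two behaviours may make the difference concrete. It emulates /re/{N;...;d;} with n N commands on a list of lines; the gnu_eof flag (hypothetical) chooses what happens when N runs out of input. Run on the four input lines above with n = 6, it reproduces both outputs.

```python
import re
from typing import List


def run_n_delete(lines: List[str], pattern: str, n: int,
                 gnu_eof: bool) -> List[str]:
    """Model sed '/pattern/{N;...;d;}' with n N commands.

    gnu_eof=True: N on the last line prints the pattern space and exits
    (GNU sed). gnu_eof=False: N on the last line exits without printing
    (BSD sed and most other seds).
    """
    regex = re.compile(pattern)
    out: List[str] = []
    i = 0
    while i < len(lines):
        space = [lines[i]]          # pattern space for this cycle
        i += 1
        if regex.search(space[0]):
            for _ in range(n):      # each N appends the next line
                if i >= len(lines):
                    if gnu_eof:
                        out.extend(space)
                    return out
                space.append(lines[i])
                i += 1
            continue                # d: discard and start the next cycle
        out.extend(space)           # end of cycle: autoprint
    return out


if __name__ == "__main__":
    rows = ["84 77138=48001=P", "85 77138=48035=P",
            "86 77138=78118=P", "87 77138=79248=P"]
    print(run_n_delete(rows, r"5=P$", 6, gnu_eof=True))   # all four rows
    print(run_n_delete(rows, r"5=P$", 6, gnu_eof=False))  # only row 84
```

If the BSD result is wanted from GNU sed as well, writing each N as $!N should keep N from running on the last line, so d still deletes the partial group.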
============================
Is sed a must-have requirement? You can also do one-liners with awk (this one is intentionally verbose to show exactly which lines are matched and skipped):
# gawk profile, created Thu Apr 28 18:36:55 2022
# BEGIN rule(s)
BEGIN {
1 printf "\n\t N :: %.f :: FS i.e. "\
"pattern :: %*s\n\n", N = +N, ++__, FS = pattern
}
# Rule(s)
87 NF *= -(_+=(_= __<NF ? -__-N :_)^!__)<+_ { # 45
45 print
}
1 77138=501=A
2 77138=3413=A
3 77138=3414=A
4 77138=8624=A
5 77138=19572=A
6 77138=22220=A
7 77138=23670=A
8 77138=25413=A
9 77138=26351=A
10 77138=27340=A
11 77138=29288=A
12 77138=121060=A
13 77138=123028=A
14 77138=132081=A
15 77138=135789=A
16 77138=154341=A
17 77138=155876=A
18 77138=170871=A
19 77138=178562=A
skipped :: 20 77138=185367=A
skipped :: 21 77138=196718=A
skipped :: 22 77138=196985=A
skipped :: 23 77138=200012=A
skipped :: 24 77138=207162=A
skipped :: 25 77138=228289=A
skipped :: 26 77138=244747=A
skipped :: 27 77138=284795=A
skipped :: 28 77138=294579=A
skipped :: 29 77138=299765=A
skipped :: 30 77138=317856=A
skipped :: 31 77138=318815=A
32 77138=324570=A
33 77138=408049=A
34 77138=514403=A
35 77138=1647865=A
36 77138=1738771=A
37 77138=3217183=A
skipped :: 38 77138=3222837=A
skipped :: 39 77138=3235292=A
skipped :: 40 77138=14957980=I
skipped :: 41 77138=1159=M
skipped :: 42 77138=1196=M
skipped :: 43 77138=1251=M
44 77138=1252=M
45 77138=4951=M
46 77138=16740=M
47 77138=71501=M
skipped :: 48 77138=137=P
skipped :: 49 77138=348=P
skipped :: 50 77138=518=P
skipped :: 51 77138=519=P
skipped :: 52 77138=520=P
skipped :: 53 77138=925=P
54 77138=1363=P
55 77138=1483=P
56 77138=1814=P
57 77138=2692=P
58 77138=3540=P
59 77138=3594=P
60 77138=3682=P
61 77138=3869=P
62 77138=3940=P
skipped :: 63 77138=3977=P
skipped :: 64 77138=4025=P
skipped :: 65 77138=4252=P
skipped :: 66 77138=4396=P
skipped :: 67 77138=9501=P
skipped :: 68 77138=13006=P
69 77138=18113=P
skipped :: 70 77138=20907=P
skipped :: 71 77138=31936=P
skipped :: 72 77138=34954=P
skipped :: 73 77138=37126=P
skipped :: 74 77138=37482=P
skipped :: 75 77138=40135=P
76 77138=40206=P
77 77138=41279=P
78 77138=41280=P
79 77138=46140=P
skipped :: 80 77138=46157=P
skipped :: 81 77138=46173=P
skipped :: 82 77138=46218=P
skipped :: 83 77138=47592=P
skipped :: 84 77138=48001=P
skipped :: 85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
N :: 5 :: FS i.e. pattern :: [7]=[AP]$
1 77138=501=A
2 77138=3413=A
3 77138=3414=A
4 77138=8624=A
5 77138=19572=A
6 77138=22220=A
7 77138=23670=A
8 77138=25413=A
9 77138=26351=A
10 77138=27340=A
11 77138=29288=A
12 77138=121060=A
13 77138=123028=A
14 77138=132081=A
15 77138=135789=A
16 77138=154341=A
17 77138=155876=A
18 77138=170871=A
19 77138=178562=A
32 77138=324570=A
33 77138=408049=A
34 77138=514403=A
35 77138=1647865=A
36 77138=1738771=A
37 77138=3217183=A
44 77138=1252=M
45 77138=4951=M
46 77138=16740=M
47 77138=71501=M
54 77138=1363=P
55 77138=1483=P
56 77138=1814=P
57 77138=2692=P
58 77138=3540=P
59 77138=3594=P
60 77138=3682=P
61 77138=3869=P
62 77138=3940=P
69 77138=18113=P
76 77138=40206=P
77 77138=41279=P
78 77138=41280=P
79 77138=46140=P
86 77138=78118=P
87 77138=79248=P
More concisely, it would be:
mawk -v pattern='[7]=[AP]$' -v N='5' -- '
BEGIN {
++__
FS = pattern
} NF *= -(_+=(_=__<NF?-__-N:_)^!__) < +_'
Or, in awk one-liner style:
mawk 'NF*=-(_+=(_=1<NF?-1-N:_)^0)<+_' FS='[7]=[AP]$' N=5
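The awk expressions above are deliberately terse. Judging from the verbose output (for example lines 20 to 31, where the match on line 26 restarts the window), the effect is: drop every matching line and the N lines after it, and let a match inside the window start a fresh window. Here is that behaviour as a readable Python sketch; it is a reading of the sample output, not a line-by-line translation of the awk code, and the function name is hypothetical.

```python
import re
from typing import Iterable, Iterator


def skip_windows(lines: Iterable[str], pattern: str, n: int) -> Iterator[str]:
    """Drop each matching line and the n lines that follow it.

    A match inside an active window restarts the window.
    """
    regex = re.compile(pattern)
    remaining = 0  # lines left to drop, counting the current one
    for line in lines:
        if regex.search(line):
            remaining = n + 1  # the match itself plus n more
        if remaining:
            remaining -= 1
            continue
        yield line


if __name__ == "__main__":
    sample = [
        "19 77138=178562=A", "20 77138=185367=A", "21 77138=196718=A",
        "22 77138=196985=A", "23 77138=200012=A", "24 77138=207162=A",
        "25 77138=228289=A", "26 77138=244747=A", "27 77138=284795=A",
        "28 77138=294579=A", "29 77138=299765=A", "30 77138=317856=A",
        "31 77138=318815=A", "32 77138=324570=A",
    ]
    print(list(skip_windows(sample, r"[7]=[AP]$", 5)))
    # ['19 77138=178562=A', '32 77138=324570=A']
```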

how to find offset of a pattern from binary file (without grep -b)

I want to get the byte offset of a string pattern in a binary file on an embedded Linux platform.
If I could use the grep -b option, that would be the best way, but it is not supported on my machine.
This does not work on my machine:
ADDR=`grep -oba <pattern string> <file path> | cut -d ":" -f1`
Here is the grep help output on the machine:
root# grep --help
BusyBox v1.29.3 () multi-call binary.
Usage: grep [-HhnlLoqvsriwFE] [-m N] [-A/B/C N] PATTERN/-e PATTERN.../-f FILE [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Since that option isn't available, I'm looking for an alternative.
A combination of hexdump and grep can also be useful, such as:
ADDR=`hexdump <file path> -C | grep <pattern string> | cut -d' ' -f1`
But if the pattern spans multiple lines of the hexdump output, it will not be found.
Is there a way to find the byte offset of a specific pattern with a Linux command?
Set the pattern as the record separator in awk. The offset of the occurrence is the length of the first record. BusyBox awk treats RS as an extended regular expression, so add backslashes before any of .[]\*+?^$(){}| in the pattern string.
<myfile.bin awk -v RS='pattern' '{print length($0); exit}'
If the pattern contains a null byte, you need a little extra work. Use tr to exchange null bytes with some byte value that doesn't appear in the pattern. For example, if the pattern's hex dump is 00002d41 (two null bytes followed by -A), swapping \0 and ! turns it into !!-A:
<myfile.bin tr '\0!' '!\0' | awk -v RS='!!-A' '{print length($0); exit}'
If the pattern is not found, this prints the length of the whole file. So if you aren't sure whether the pattern is present, you need again some extra work. Append some text that can't be part of a pattern match to the file, so that you know that if there's a match, it won't be at the very end of the file. Then, if the pattern is present, the file will contain at least two records. But if the pattern is not present, the file only contains the first record (without a record separator after it).
{ cat myfile.bin; echo garbage; } |
awk -v RS='pattern' '
NR==1 {n = length($0)}
NR==2 {print n; found = 1; exit}
END {exit !found}
'
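Where Python 3 happens to be available (often not the case on a BusyBox system), the same lookup is a plain byte search, which sidesteps the regex escaping, the null-byte swap and the not-found ambiguity. A minimal sketch; both function names are hypothetical, and the file is read into memory in one go, which assumes it is of modest size.

```python
from typing import Optional


def find_offset(data: bytes, pattern: bytes) -> Optional[int]:
    """Byte offset of the first literal occurrence, or None if absent."""
    pos = data.find(pattern)
    return None if pos < 0 else pos


def find_offset_in_file(path: str, pattern: bytes) -> Optional[int]:
    """Read the whole file and search it for the pattern."""
    with open(path, "rb") as f:
        return find_offset(f.read(), pattern)
```

For the null-byte example, the call would be find_offset_in_file("myfile.bin", bytes.fromhex("00002d41")); None means the pattern is not present.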
Something like this?
hexdump -C "$file" |
awk -v pattern="$pattern" 'residue { matched = ($0 ~ "\\|" residue)
if (matched) print $1; residue = ""; if (matched) next }
$0 ~ pattern { print $1 }
{ for(i=length(pattern)-1; i>0; i--)
if ($0 ~ substr(pattern, 1, i) "\\|$") { residue=substr(pattern, i+1); break } }'
The offset is just the first field from the hexdump output; if you need the precise location of the match, this requires some additional massaging to figure out the offset to add to the address, or subtract if it was wrapped.
Briefly tested in a clean-slate Busybox Docker container where hexdump -C output looks like this:
/ # hexdump -C /etc/resolv.conf
00000000 23 20 44 4e 53 20 72 65 71 75 65 73 74 73 20 61 |# DNS requests a|
00000010 72 65 20 66 6f 72 77 61 72 64 65 64 20 74 6f 20 |re forwarded to |
00000020 74 68 65 20 68 6f 73 74 2e 20 44 48 43 50 20 44 |the host. DHCP D|
00000030 4e 53 20 6f 70 74 69 6f 6e 73 20 61 72 65 20 69 |NS options are i|
00000040 67 6e 6f 72 65 64 2e 0a 6e 61 6d 65 73 65 72 76 |gnored..nameserv|
00000050 65 72 20 31 39 32 2e 31 36 38 2e 36 35 2e 35 0a |er 192.168.65.5.|
00000060 20 | |
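To get the precise offset even when the pattern wraps across dump lines, one option is to turn the hexdump -C text back into bytes and search those. A Python sketch under two assumptions: every data line starts with an 8-digit hex offset followed by up to 16 hex byte pairs (as in the output above, starting at offset 0), and no lines were squeezed into a * line (util-linux hexdump has -v to prevent that; check whether your BusyBox build supports it). The names are hypothetical.

```python
import re
from typing import Optional

# 8 hex digits of offset, then 1 to 16 whitespace-separated hex byte pairs;
# the |ASCII| column that follows is not matched because it starts with '|'.
DUMP_LINE = re.compile(r"^[0-9a-fA-F]{8}((?:\s+[0-9a-fA-F]{2}\b){1,16})")


def offset_in_hexdump(dump: str, pattern: bytes) -> Optional[int]:
    """Rebuild the dumped bytes and return the pattern's exact offset."""
    data = bytearray()
    for line in dump.splitlines():
        m = DUMP_LINE.match(line)
        if m:
            data += bytes.fromhex("".join(m.group(1).split()))
    pos = bytes(data).find(pattern)
    return None if pos < 0 else pos
```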

Line break other than \n in MATLAB

The following b variable seems to have a last character corresponding to a line break, but it is not the classic '\n' character. This is very annoying when printing this variable:
K>> b
b =
toto_titi
K>> fprintf('This strange variable (%s) is very strange !!!\n', b);
This strange variable (toto_titi
) is very strange !!!
Indeed, the needed print would be:
This strange variable (toto_titi) is very strange !!!
Below are a few observations about this variable:
K>> class(b)
ans =
char
K>> [b 'end']
ans =
toto_titi
end
K>> b(end)
ans =
K>> regexp(b,'_', 'split')
ans =
'toto' 'titi…'
I confess that I do not exactly understand the three dots in 'titi…', but I suppose they are part of the explanation!
A quick and dirty method would be to remove the last (invisible) character that causes the line break, but if in another case the last character of b is a "real" character, we would lose it (e.g. toto_tit instead of toto_titi).
Does a MATLAB equivalent of Python's strip() exist?
What is this invisible character in the b variable?
EDIT 1:
A good idea, from rahnema1, is to determine the ASCII code of this invisible character:
K>> double(b)
ans =
116 111 116 111 95 116 105 116 105 13
K>> for i = 1 : 60
str = [num2str(i) ' ' char(i) ' '...
num2str(i+32) ' ' char(i+32) ' '...
num2str(i+64) ' ' char(i+64)];
disp(str)
end
1 33 ! 65 A
2 34 " 66 B
3 35 # 67 C
4 36 $ 68 D
5 37 % 69 E
6 38 & 70 F
7 39 ' 71 G
8 40 ( 72 H
9 41 ) 73 I
10
42 * 74 J
11 43 + 75 K
12 44 , 76 L
13
45 - 77 M
14 46 . 78 N
15 47 / 79 O
16 48 0 80 P
17 49 1 81 Q
18 50 2 82 R
19 51 3 83 S
20 52 4 84 T
21 53 5 85 U
22 54 6 86 V
23 55 7 87 W
24 56 8 88 X
25 57 9 89 Y
26 58 : 90 Z
27 59 ; 91 [
28 60 < 92 \
29 61 = 93 ]
30 62 > 94 ^
31 63 ? 95 _
32 64 @ 96 `
33 ! 65 A 97 a
34 " 66 B 98 b
35 # 67 C 99 c
36 $ 68 D 100 d
37 % 69 E 101 e
38 & 70 F 102 f
39 ' 71 G 103 g
40 ( 72 H 104 h
41 ) 73 I 105 i
42 * 74 J 106 j
43 + 75 K 107 k
44 , 76 L 108 l
45 - 77 M 109 m
46 . 78 N 110 n
47 / 79 O 111 o
48 0 80 P 112 p
49 1 81 Q 113 q
50 2 82 R 114 r
51 3 83 S 115 s
52 4 84 T 116 t
53 5 85 U 117 u
54 6 86 V 118 v
55 7 87 W 119 w
56 8 88 X 120 x
57 9 89 Y 121 y
58 : 90 Z 122 z
59 ; 91 [ 123 {
60 < 92 \ 124 |
EDIT 2:
Workarounds based on the ideas of rahnema1 and excaza:
If we know the character code, use strrep:
K>> fprintf('This strange variable (%s) is very strange !!!\n', strrep(b,char(13),''));
This strange variable (toto_titi) is very strange !!!
To remove leading and trailing whitespace, use strtrim:
fprintf('This strange variable (%s) is very strange !!!\n', strtrim(b));
This strange variable (toto_titi) is very strange !!!
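The same situation in Python terms, as a quick cross-check: code 13 is a carriage return, and a common source of a stray one is text with Windows (CRLF) line endings. Removing only that character (the strrep route) keeps a real last character intact, while a strip()-style trim, like strtrim, removes any surrounding whitespace. A sketch; the variable names are illustrative.

```python
b = "toto_titi\r"                   # same codes as double(b) above

codes = [ord(c) for c in b]         # ends with 13, the carriage return
no_cr = b.replace("\r", "")         # like strrep(b, char(13), '')
trimmed = b.strip()                 # like strtrim(b): all edge whitespace
safe = "toto_titi".rstrip("\r")     # no CR present, so nothing is lost

print("This strange variable (%s) is very strange !!!" % no_cr)
```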

join every 240 lines of a large file consisting of different numbers in cshell script

I have a large file containing 5,000,000 lines and 3 columns, and I want to merge every 240 lines.
I tried using sed in a cshell script to merge 3 lines: 'N;N;s/\n/ /g' filename. But to do the same for 240 lines I would have to write N;N;N;... (239 times)! What is the best way to solve this problem?
awk to the rescue!
$ awk 'ORS=NR%240?FS:RS' filename
for example
$ seq 10 99 | awk 'ORS=NR%10?FS:RS'
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
Explanation
ORS=NR%10?FS:RS: the ternary operator sets the output record separator to RS (a newline) when the line number is divisible by 10, and to FS (a space) otherwise. The assignment's value is a non-empty string, so it also acts as a true pattern, and every line is printed with the chosen separator. The effect is a newline after every tenth record and a space between the others.
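The same grouping rule, sketched in Python for clarity (the function name is hypothetical): collect lines into groups of k, join each group with a separator, and emit one output line per group. Unlike the awk one-liner, which leaves a trailing space and no final newline when the line count is not a multiple of k, this version keeps a short last group without padding or trailing separators.

```python
from typing import Iterable, List


def join_every(lines: Iterable[str], k: int, sep: str = " ") -> List[str]:
    """Join each run of k lines into one output line.

    A short final group is kept as-is, with no padding separators.
    """
    out: List[str] = []
    group: List[str] = []
    for line in lines:
        group.append(line.rstrip("\n"))
        if len(group) == k:
            out.append(sep.join(group))
            group = []
    if group:                      # leftover lines when count % k != 0
        out.append(sep.join(group))
    return out


if __name__ == "__main__":
    for row in join_every((str(n) for n in range(10, 100)), 10):
        print(row)                 # same rows as the seq 10 99 example
```

For the original question, join_every(open("filename"), 240) would give one line per 240 input lines.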
Something like this perhaps, which strips the newline (and any other trailing whitespace) from every line, then prints it followed by a space or a newline as appropriate:
perl -ne's/\s*\z//; print $_, eof || $. % 240 == 0 ? "\n" : " "' myfile
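If I understand correctly, the command
paste -d, $(printf "%0.s- " {1..240})
does the job. It assumes the field delimiter is a comma. (The printf expands to 240 - arguments, so paste takes 240 consecutive lines from standard input for each output line.)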
Demo:
Produce a test file:
seq -f '%g,a,b' 2400 >demo_file
It contains lines like:
1,a,b
2,a,b
3,a,b
4,a,b
...
2398,a,b
2399,a,b
2400,a,b
The command
paste -d, $(printf "%0.s- " {1..240}) < demo_file | head -2
prints:
1,a,b,2,a,b,3,a,b,4,a,b,5,a,b,6,a,b,7,a,b,8,a,b,9,a,b,10,a,b,11,a,b,12,a,b,13,a,b,14,a,b,15,a,b,16,a,b,17,a,b,18,a,b,19,a,b,20,a,b,21,a,b,22,a,b,23,a,b,24,a,b,25,a,b,26,a,b,27,a,b,28,a,b,29,a,b,30,a,b,31,a,b,32,a,b,33,a,b,34,a,b,35,a,b,36,a,b,37,a,b,38,a,b,39,a,b,40,a,b,41,a,b,42,a,b,43,a,b,44,a,b,45,a,b,46,a,b,47,a,b,48,a,b,49,a,b,50,a,b,51,a,b,52,a,b,53,a,b,54,a,b,55,a,b,56,a,b,57,a,b,58,a,b,59,a,b,60,a,b,61,a,b,62,a,b,63,a,b,64,a,b,65,a,b,66,a,b,67,a,b,68,a,b,69,a,b,70,a,b,71,a,b,72,a,b,73,a,b,74,a,b,75,a,b,76,a,b,77,a,b,78,a,b,79,a,b,80,a,b,81,a,b,82,a,b,83,a,b,84,a,b,85,a,b,86,a,b,87,a,b,88,a,b,89,a,b,90,a,b,91,a,b,92,a,b,93,a,b,94,a,b,95,a,b,96,a,b,97,a,b,98,a,b,99,a,b,100,a,b,101,a,b,102,a,b,103,a,b,104,a,b,105,a,b,106,a,b,107,a,b,108,a,b,109,a,b,110,a,b,111,a,b,112,a,b,113,a,b,114,a,b,115,a,b,116,a,b,117,a,b,118,a,b,119,a,b,120,a,b,121,a,b,122,a,b,123,a,b,124,a,b,125,a,b,126,a,b,127,a,b,128,a,b,129,a,b,130,a,b,131,a,b,132,a,b,133,a,b,134,a,b,135,a,b,136,a,b,137,a,b,138,a,b,139,a,b,140,a,b,141,a,b,142,a,b,143,a,b,144,a,b,145,a,b,146,a,b,147,a,b,148,a,b,149,a,b,150,a,b,151,a,b,152,a,b,153,a,b,154,a,b,155,a,b,156,a,b,157,a,b,158,a,b,159,a,b,160,a,b,161,a,b,162,a,b,163,a,b,164,a,b,165,a,b,166,a,b,167,a,b,168,a,b,169,a,b,170,a,b,171,a,b,172,a,b,173,a,b,174,a,b,175,a,b,176,a,b,177,a,b,178,a,b,179,a,b,180,a,b,181,a,b,182,a,b,183,a,b,184,a,b,185,a,b,186,a,b,187,a,b,188,a,b,189,a,b,190,a,b,191,a,b,192,a,b,193,a,b,194,a,b,195,a,b,196,a,b,197,a,b,198,a,b,199,a,b,200,a,b,201,a,b,202,a,b,203,a,b,204,a,b,205,a,b,206,a,b,207,a,b,208,a,b,209,a,b,210,a,b,211,a,b,212,a,b,213,a,b,214,a,b,215,a,b,216,a,b,217,a,b,218,a,b,219,a,b,220,a,b,221,a,b,222,a,b,223,a,b,224,a,b,225,a,b,226,a,b,227,a,b,228,a,b,229,a,b,230,a,b,231,a,b,232,a,b,233,a,b,234,a,b,235,a,b,236,a,b,237,a,b,238,a,b,239,a,b,240,a,b
241,a,b,242,a,b,243,a,b,244,a,b,245,a,b,246,a,b,247,a,b,248,a,b,249,a,b,250,a,b,251,a,b,252,a,b,253,a,b,254,a,b,255,a,b,256,a,b,257,a,b,258,a,b,259,a,b,260,a,b,261,a,b,262,a,b,263,a,b,264,a,b,265,a,b,266,a,b,267,a,b,268,a,b,269,a,b,270,a,b,271,a,b,272,a,b,273,a,b,274,a,b,275,a,b,276,a,b,277,a,b,278,a,b,279,a,b,280,a,b,281,a,b,282,a,b,283,a,b,284,a,b,285,a,b,286,a,b,287,a,b,288,a,b,289,a,b,290,a,b,291,a,b,292,a,b,293,a,b,294,a,b,295,a,b,296,a,b,297,a,b,298,a,b,299,a,b,300,a,b,301,a,b,302,a,b,303,a,b,304,a,b,305,a,b,306,a,b,307,a,b,308,a,b,309,a,b,310,a,b,311,a,b,312,a,b,313,a,b,314,a,b,315,a,b,316,a,b,317,a,b,318,a,b,319,a,b,320,a,b,321,a,b,322,a,b,323,a,b,324,a,b,325,a,b,326,a,b,327,a,b,328,a,b,329,a,b,330,a,b,331,a,b,332,a,b,333,a,b,334,a,b,335,a,b,336,a,b,337,a,b,338,a,b,339,a,b,340,a,b,341,a,b,342,a,b,343,a,b,344,a,b,345,a,b,346,a,b,347,a,b,348,a,b,349,a,b,350,a,b,351,a,b,352,a,b,353,a,b,354,a,b,355,a,b,356,a,b,357,a,b,358,a,b,359,a,b,360,a,b,361,a,b,362,a,b,363,a,b,364,a,b,365,a,b,366,a,b,367,a,b,368,a,b,369,a,b,370,a,b,371,a,b,372,a,b,373,a,b,374,a,b,375,a,b,376,a,b,377,a,b,378,a,b,379,a,b,380,a,b,381,a,b,382,a,b,383,a,b,384,a,b,385,a,b,386,a,b,387,a,b,388,a,b,389,a,b,390,a,b,391,a,b,392,a,b,393,a,b,394,a,b,395,a,b,396,a,b,397,a,b,398,a,b,399,a,b,400,a,b,401,a,b,402,a,b,403,a,b,404,a,b,405,a,b,406,a,b,407,a,b,408,a,b,409,a,b,410,a,b,411,a,b,412,a,b,413,a,b,414,a,b,415,a,b,416,a,b,417,a,b,418,a,b,419,a,b,420,a,b,421,a,b,422,a,b,423,a,b,424,a,b,425,a,b,426,a,b,427,a,b,428,a,b,429,a,b,430,a,b,431,a,b,432,a,b,433,a,b,434,a,b,435,a,b,436,a,b,437,a,b,438,a,b,439,a,b,440,a,b,441,a,b,442,a,b,443,a,b,444,a,b,445,a,b,446,a,b,447,a,b,448,a,b,449,a,b,450,a,b,451,a,b,452,a,b,453,a,b,454,a,b,455,a,b,456,a,b,457,a,b,458,a,b,459,a,b,460,a,b,461,a,b,462,a,b,463,a,b,464,a,b,465,a,b,466,a,b,467,a,b,468,a,b,469,a,b,470,a,b,471,a,b,472,a,b,473,a,b,474,a,b,475,a,b,476,a,b,477,a,b,478,a,b,479,a,b,480,a,b
EDIT: Just noticed the "cshell"... Unfortunately, the above is for bash; use the perl solution. ;)
This might work for you (GNU sed):
sed -r ':a;$!{N;s/[^\n]+/&/240;Ta};s/\n/ /g' file
This keeps appending lines until the pattern space contains 240 lines, then replaces all newlines with spaces. (Empty lines are not counted, because [^\n]+ needs at least one character.)
Given a small test file like
a,b,c
d,e,f
g,h,i
j,k,l
m,n,o
p,q,r
s,t,u
v,w,x
y,z
Change the cnt= value to set how many lines are joined. You should be able to use your target of 240 without an issue.
awk -v cnt=3 'BEGIN{i=1}
{
printf "%s,", $0; i++;   # %s keeps any % in the data from being read as a format
while (i<=cnt){
getline
printf("%s%s", $0 ,(i!=cnt)?",":"")
i++
}
{ i=1
print ""
}
}' file
output
a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
s,t,u,v,w,x,y,z
One slight problem: for some values of cnt there will be extra commas at the end of the last line, e.g.
awk -v cnt=4 'BEGIN{i=1}
{
printf "%s,", $0; i++;   # %s keeps any % in the data from being read as a format
while (i<=cnt){
getline
printf("%s%s", $0 ,(i!=cnt)?",":"")
i++
}
{ i=1
print ""
}
}' file
output
a,b,c,d,e,f,g,h,i,j,k,l
m,n,o,p,q,r,s,t,u,v,w,x
y,z,,,
You can clean these up by appending
awk .... file | sed '$s/,*$//' > outFile
to the tail end of your process.
IHTH

Function readmtx on matlab

I want to read a matrix from a file on my MATLAB path. I was using the function readmtx, but I don't know what to put for 'precision' (mtx = readmtx(fname,nrows,ncols,precision)).
I was wondering if you could help me with that, or suggest a better way to read the matrix.
You can read a matrix from a text file with the load command. If the first line contains text, it should start with %.
Note that each row of the text file should contain the values of one matrix row, separated by spaces, for example:
%C1 C2 C3
1 2 3
4 5 6
7 8 9
Then you can use the load command to read the text file into a matrix, something like:
myMatrix = load('textFileName.txt')
Now, let's talk about readmtx ;)
About precision, as described in the readmtx documentation:
Both binary and formatted data files can be read. If the file is binary, the precision argument is a format string recognized by fread. Repetition modifiers such as '40*char' are not supported. If the file is formatted, precision is a fscanf and sscanf-style format string of the form '%nX', where n is the number of characters within which the formatted data is found, and X is the conversion character such as 'g' or 'd'. Fortran-style double-precision output such as '0.0D00' can be read using a precision string such as '%nD', where n is the number of characters per element. This is an extension to the C-style format strings accepted by sscanf. Users unfamiliar with C should note that '%d' is preferred over '%i' for formatted integers. MATLAB syntax follows C in interpreting '%i' integers with leading zeros as octal. Formatted files with line endings need to provide the number of trailing bytes per row, which can be 1 for platforms with carriage returns or linefeed (Macintosh, UNIX®), or 2 for platforms with carriage returns and linefeeds (DOS).
Check this example also:
Write and read a binary matrix file:
fid = fopen('binmat','w');
fwrite(fid,1:100,'int16');
fclose(fid);
mtx = readmtx('binmat',10,10,'int16')
mtx =
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100
mtx = readmtx('binmat',10,10,'int16',[2 5],3:2:9)
mtx =
13 15 17 19
23 25 27 29
33 35 37 39
43 45 47 49
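The binary example can be cross-checked outside MATLAB. This Python sketch writes 1:100 as int16 and reads it back row by row into a 10 x 10 matrix, plus a 1-based row/column subset like the last call ([2 5] meaning rows 2 through 5). It assumes little-endian byte order (MATLAB's fwrite uses the machine's native format by default, which is little-endian on common PCs), the function names are hypothetical, and it only mirrors the behaviour shown above; it is not readmtx itself.

```python
import struct
from typing import List, Optional, Sequence


def write_int16(values: Sequence[int]) -> bytes:
    """Pack values as little-endian int16, like fwrite(fid, v, 'int16')."""
    return struct.pack("<%dh" % len(values), *values)


def read_mtx(buf: bytes, nrows: int, ncols: int,
             rows: Optional[Sequence[int]] = None,
             cols: Optional[Sequence[int]] = None) -> List[List[int]]:
    """Read an nrows x ncols int16 matrix stored row by row.

    rows/cols are 1-based indices, like readmtx's subset arguments.
    """
    count = nrows * ncols
    flat = struct.unpack("<%dh" % count, buf[:2 * count])
    matrix = [list(flat[r * ncols:(r + 1) * ncols]) for r in range(nrows)]
    if rows is None:
        rows = range(1, nrows + 1)
    if cols is None:
        cols = range(1, ncols + 1)
    return [[matrix[r - 1][c - 1] for c in cols] for r in rows]


if __name__ == "__main__":
    buf = write_int16(range(1, 101))
    for row in read_mtx(buf, 10, 10, rows=range(2, 6), cols=range(3, 10, 2)):
        print(*row)                # 13 15 17 19 ... 43 45 47 49
```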

Hex dump parsing in perl

I have a hex dump of a message in a file, and I want to get it into an array so I can perform the decoding logic on it.
I was wondering if there is an easier way to parse a message that looks like this:
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
Note that each row holds at most 16 bytes, but a row can also contain fewer (minimum: 1).
Is there a nice and elegant way to do this, rather than reading 2 chars at a time in Perl?
Perl has a hex operator that performs the decoding logic for you.
hex EXPR
hex
Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.
print hex '0xAf'; # prints '175'
print hex 'aF'; # same
Remember that the default behavior of split chops up a string at whitespace separators, so for example
$ perl -le '$_ = "a b c"; print for split'
a
b
c
For every line of the input, separate it into hex values, convert the values to numbers, and push them onto an array for later processing.
#! /usr/bin/perl
use warnings;
use strict;
my @values;
while (<>) {
push @values => map hex($_), split;
}
# for example
my $sum = 0;
$sum += $_ for @values;
print $sum, "\n";
Sample run:
$ ./sumhex mtanish-input
4196
I would read a line at a time, strip the whitespace, and use pack 'H*' to convert it. It's hard to be more specific without knowing what kind of "decoding logic" you're trying to apply. For example, here's a version that converts each byte to decimal:
while (<>) {
s/\s+//g;
my @bytes = unpack('C*', pack('H*', $_));
print "@bytes\n";
}
Output from your sample file:
55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
116 101 108 99 111 114 100 105
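For comparison, the same parsing in Python (the function name is hypothetical): splitting each line on whitespace makes the 1 to 16 bytes per row irrelevant, and int(tok, 16) plays the role of Perl's hex. On the sample above it yields the decimal values shown and a sum of 4196, matching the Perl run.

```python
from typing import List


def parse_hex_dump(text: str) -> List[int]:
    """Convert whitespace-separated hex byte pairs to a list of ints."""
    values: List[int] = []
    for line in text.splitlines():
        values.extend(int(tok, 16) for tok in line.split())
    return values


SAMPLE = """\
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
"""

if __name__ == "__main__":
    values = parse_hex_dump(SAMPLE)
    print(sum(values))  # 4196
    # bytes.fromhex does the same job when a bytes object is more useful
    assert list(bytes.fromhex("".join(SAMPLE.split()))) == values
```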
I think reading in two characters at a time is the appropriate way to parse a stream whose logical tokens are two-character units.
Is there some reason you think that's ugly?
If you're trying to extract a particular sequence, you could do that with whitespace-insensitive regular expressions.