sed to match and replace 2 consecutive lines? - sed

I want to match two consecutive lines and then replace them. For example, this pattern:
[0-9]\(.*\)\n[0-9]\(.*\)
That is, both lines start with a number.
Then replace it with:
\2\n\1
This swaps the contents of the two lines and drops the leading numbers.
Example file 1 (test0.txt):
adfa
adaf
dfd
1 1a
2 2b
3 3d
adfa
sdfa
4 4a
5 5b
6 6d
7 7k
Example file 2 (test1.txt):
adfa
adaf
dfd
aaa
1 1a
2 2b
3 3d
sdfa
4 4a
5 5b
6 6d
7 7k
Many answers online suggest using N; or 1!N;, but neither works.
The reason is that N; starts at the 1st line and then reads lines in pairs, while 1!N; starts at the 2nd line and then reads lines in pairs. What I want is to test each line together with the line after it. If lines N and N+1 match, replace those two lines and continue with lines N+2 and N+3; if they do not match, continue with lines N+1 and N+2, and so on.
The expected result for both example files is:
2b
1a
5b
4a
7k
6d
The failed commands:
sed 'N;s%[0-9]\(.*\)\n[0-9]\(.*\)%\2\n\1%;t;d' test0.txt
3d
2b
5b
4a
7k
6d
sed 'N;s%[0-9]\(.*\)\n[0-9]\(.*\)%\2\n\1%;t;d' test1.txt
2b
1a
5b
4a
7k
6d
sed '1!N;s%[0-9]\(.*\)\n[0-9]\(.*\)%\2\n\1%;t;d' test0.txt
2b
1a
6d
5b
7 7k
sed '1!N;s%[0-9]\(.*\)\n[0-9]\(.*\)%\2\n\1%;t;d' test1.txt
3d
2b
6d
5b
7 7k

If you use D instead of d (D deletes only the first line of the pattern space and restarts the cycle if the pattern space is not empty), then it works, e.g.:
sed -nE 'N; s/^[0-9] (.*)\n[0-9] (.*)/\2\n\1/; ta; D; :a; p' infile
Or even cleaner, as suggested by @potong:
sed 'N;/^[0-9] \(.*\)\n[0-9] \(.*\)/!D;s//\2\n\1/' infile
Output:
2b
1a
5b
4a
7k
6d
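A small variation on the D-based one-liner, offered as a sketch rather than a drop-in: with a plain N, GNU sed prints whatever is left in the pattern space when there is no next line to read, so an unmatched last line can leak into the output. Combining $!N with -n and the p flag avoids that. The function name swap_pairs is made up for illustration, and \n in the replacement assumes GNU sed.

```shell
# swap_pairs: read lines on stdin; wherever two consecutive lines both start
# with "digit space", print their remainders in swapped order and drop
# everything else. Assumes GNU sed; the function name is illustrative.
swap_pairs() {
    sed -n '$!N; /^[0-9] \(.*\)\n[0-9] \(.*\)/!D; s//\2\n\1/p'
}

# Example file 1 from the question, plus an unmatched trailing line (zzz).
printf '%s\n' adfa adaf dfd '1 1a' '2 2b' '3 3d' adfa sdfa \
    '4 4a' '5 5b' '6 6d' '7 7k' zzz | swap_pairs
```

When the pair does not match, D drops only the first of the two lines, so the second line is tested again together with the line that follows it.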

An imperfect way to achieve it:
sed -nr '/^[0-9]/{N;s/.*([0-9][[:alpha:]]).*([0-9][[:alpha:]])/\2\n\1/p}' infile
We use N only if the line starts with a number.
A regular expression then does the rest.
This regexp is tailored to your example files and should be adjusted if the format of your file changes.

Related

how to find offset of a pattern from binary file (without grep -b)

I want to get the byte offset of a string pattern in a binary file on an embedded Linux platform.
Using "grep -b" would be the best way, but that option is not supported on my machine.
The machine does not support this:
ADDR=`grep -oba <pattern string> <file path> | cut -d ":" -f1`
Here is the help output of the grep command on the machine.
root# grep --help
BusyBox v1.29.3 () multi-call binary.
Usage: grep [-HhnlLoqvsriwFE] [-m N] [-A/B/C N] PATTERN/-e PATTERN.../-f FILE [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Since that option isn't available, I'm looking for an alternative.
A combination of hexdump and grep can also be useful,
such as
ADDR=`hexdump <file path> -C | grep <pattern string> | cut -d' ' -f1`
But if the pattern spans multiple hexdump lines, it will not be found.
Is there a way to find the byte offset of a specific pattern with a Linux command?
Set the pattern as the record separator in awk. The offset of the occurrence is the length of the first record. BusyBox awk treats RS as an extended regular expression, so add backslashes before any of .[]\*+?^$ in the pattern string.
<myfile.bin awk -v RS='pattern' '{print length($0); exit}'
If the pattern contains a null byte, you need a little extra work. Use tr to exchange null bytes with some byte value that doesn't appear in the pattern. For example, if the pattern's hex dump is 00002d41 (two null bytes followed by -A):
<myfile.bin tr '\0!' '!\0' | awk -v RS='!!-A' '{print length($0); exit}'
If the pattern is not found, this prints the length of the whole file. So if you aren't sure whether the pattern is present, you need some more extra work. Append some text that can't be part of a pattern match to the file, so that you know that if there's a match, it won't be at the very end of the file. Then, if the pattern is present, the file will contain at least two records. But if the pattern is not present, the file only contains the first record (without a record separator after it).
{ cat myfile.bin; echo garbage; } |
awk -v RS='pattern' '
NR==1 {n = length($0)}
NR==2 {print n; found = 1; exit}
END {exit !found}
'
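The command above can be wrapped in a small helper, sketched here under a few assumptions: the name offset_of is hypothetical, the awk in use accepts a regular expression as RS (BusyBox awk, gawk and mawk do), and LC_ALL=C is added so that length() counts bytes rather than multibyte characters.

```shell
# offset_of PATTERN: read data on stdin and print the 0-based byte offset of
# the first match of PATTERN (an extended regex); fail if there is no match.
# The function name and the LC_ALL=C setting are additions for illustration.
offset_of() {
    { cat; echo garbage; } |
    LC_ALL=C awk -v RS="$1" '
        NR==1 {n = length($0)}
        NR==2 {print n; found = 1; exit}
        END {exit !found}
    '
}

printf 'hello world\nfoo bar' | offset_of 'foo'    # prints 12
printf 'hello world\nfoo bar' | offset_of 'zzz' || echo 'not found'
```

The exit status makes the helper usable in a condition, e.g. ADDR=$(offset_of 'foo' < file) || echo "pattern missing".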
Something like this?
hexdump -C "$file" |
awk -v pattern="$pattern" 'residue { matched = ($0 ~ "\\|" residue)
if (matched) print $1; residue = ""; if (matched) next }
$0 ~ pattern { print $1 }
{ for(i=length(pattern)-1; i>0; i--)
if ($0 ~ substr(pattern, 1, i) "\\|$") { residue=substr(pattern, i+1); break } }'
The offset is just the first field from the hexdump output; if you need the precise location of the match, this requires some additional massaging to figure out the offset to add to the address, or subtract if it was wrapped.
Briefly tested in a clean-slate Busybox Docker container where hexdump -C output looks like this:
/ # hexdump -C /etc/resolv.conf
00000000 23 20 44 4e 53 20 72 65 71 75 65 73 74 73 20 61 |# DNS requests a|
00000010 72 65 20 66 6f 72 77 61 72 64 65 64 20 74 6f 20 |re forwarded to |
00000020 74 68 65 20 68 6f 73 74 2e 20 44 48 43 50 20 44 |the host. DHCP D|
00000030 4e 53 20 6f 70 74 69 6f 6e 73 20 61 72 65 20 69 |NS options are i|
00000040 67 6e 6f 72 65 64 2e 0a 6e 61 6d 65 73 65 72 76 |gnored..nameserv|
00000050 65 72 20 31 39 32 2e 31 36 38 2e 36 35 2e 35 0a |er 192.168.65.5.|
00000060 20 | |

How to read currency symbol in a perl script

I have a Perl script that reads data from a .csv file containing several different currency symbols. When we read that file and write out the content, I can see it printing
Get <A3>50 or <80>50 daily
The actual value is
Get £50 or €50 daily
With the dollar sign it works fine, but with any other currency symbol it does not.
I tried
open my $in, '<:encoding(UTF-8)', 'input-file-name' or die $!;
open my $out, '>:encoding(latin1)', 'output-file-name' or die $!;
while ( <$in> ) {
print $out $_;
}
$ od -t x1 input-file-name
0000000 47 65 74 20 c2 a3 35 30 20 6f 72 20 e2 82 ac 35
0000020 30 20 64 61 69 6c 79 0a
0000030
od -t x1 output-file-name
0000000 47 65 74 20 a3 35 30 20 6f 72 20 5c 78 7b 32 30
0000020 61 63 7d 35 30 20 64 61 69 6c 79 0a
0000034
But that does not help either. The output I am getting is
Get \xA350 or \x8050 daily
Unicode Code Point | Glyph | UTF-8    | Input File | ISO-8859-1 | Output File
U+00A3 POUND SIGN  | £     | C2 A3    | C2 A3      | A3         | A3
U+20AC EURO SIGN   | €     | E2 82 AC | E2 82 AC   | N/A        | 5C 78 7B 32 30 61 63 7D
("LATIN1" is an alias for "ISO-8859-1".)
There are no problems with the input file.
£ is correctly encoded in your input file.
€ is correctly encoded in your input file.
As for the output file,
£ is correctly encoded in your output file.
€ isn't found in the latin1 charset, so \x{20ac} is used instead.
Your program is working as expected.
You say you see <A3> instead of £. That's probably because the program you are using is expecting a file encoded using UTF-8, but you provided a file encoded using ISO-8859-1.
You also say you see <80> instead of €. But there's no way you'd see that for the file you provided.
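The byte values in the table can be reproduced from the shell. This is only a sketch: it assumes iconv and od are installed, and hex_bytes is a made-up helper name. For comparison it also shows that Windows-1252 assigns the euro sign the byte 80; whether that encoding played any part in what the asker saw cannot be told from the post.

```shell
# hex_bytes: dump stdin as one compact lowercase hex string (illustrative helper).
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# Pound sign U+00A3, written as its UTF-8 bytes C2 A3 via octal escapes.
printf '\302\243' | hex_bytes; echo                                  # c2a3
printf '\302\243' | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes; echo   # a3

# Euro sign U+20AC (UTF-8: E2 82 AC). ISO-8859-1 has no code for it,
# but Windows-1252 encodes it as the single byte 80.
printf '\342\202\254' | iconv -f UTF-8 -t WINDOWS-1252 | hex_bytes; echo   # 80
```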

join every 240 lines of a large file consisting of different numbers in cshell script

I have a large file containing 5,000,000 lines and 3 columns, and I want to merge every 240 lines.
I tried using sed in a C shell script to merge 3 lines: sed 'N;N;s/\n/ /g' filename. But to use it for 240 lines I would have to write N;N;N;N;N;N... (239 times)! What is the best way to solve this problem?
awk to the rescue!
$ awk 'ORS=NR%240?FS:RS' filename
for example
$ seq 10 99 | awk 'ORS=NR%10?FS:RS'
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
Explanation
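ORS=NR%10?FS:RS: here the ternary operator sets the output record separator (ORS) to the record separator RS (a newline) if the line number is divisible by 10, and to the field separator FS (a space) otherwise. This effectively adds a newline after every tenth record and a space between the others. The assignment is used as the pattern, and since its value is a non-empty string it is always true, so every record is printed.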
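One detail of the ORS approach: when the number of input lines is not a multiple of the group size, the last output line ends in a space instead of a newline. A possible variant, given only as a sketch (the function name join_lines is hypothetical), prints the separator before each record and closes the final line in END:

```shell
# join_lines N: join every N lines of stdin into one space-separated line.
# The separator is written before each record, so the last group ends cleanly.
join_lines() {
    awk -v n="$1" '
        { sep = (NR == 1) ? "" : ((NR - 1) % n ? " " : "\n")
          printf "%s%s", sep, $0 }
        END { if (NR) print "" }   # terminate the last line, if any input was read
    '
}

seq 1 7 | join_lines 3
# 1 2 3
# 4 5 6
# 7
```

For the question's case this would be called as join_lines 240 < filename.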
Something like this perhaps, which removes the newline from every line, and then prints it followed by a space or a newline as appropriate
perl -ne's/\s*\z//; print $_, eof || $. % 240 == 0 ? "\n" : " "' myfile
If I understand correctly,
paste -d, $(printf "%0.s- " {1..240})
does the job. This assumes the field delimiter is a comma.
Demo:
produce some test file
seq -f '%g,a,b' 2400 >demo_file
it contains lines like:
1,a,b
2,a,b
3,a,b
4,a,b
...
2398,a,b
2399,a,b
2400,a,b
the command
paste -d, $(printf "%0.s- " {1..240}) < demo_file | head -2
prints:
1,a,b,2,a,b,3,a,b,4,a,b,5,a,b,6,a,b,7,a,b,8,a,b,9,a,b,10,a,b,11,a,b,12,a,b,13,a,b,14,a,b,15,a,b,16,a,b,17,a,b,18,a,b,19,a,b,20,a,b,21,a,b,22,a,b,23,a,b,24,a,b,25,a,b,26,a,b,27,a,b,28,a,b,29,a,b,30,a,b,31,a,b,32,a,b,33,a,b,34,a,b,35,a,b,36,a,b,37,a,b,38,a,b,39,a,b,40,a,b,41,a,b,42,a,b,43,a,b,44,a,b,45,a,b,46,a,b,47,a,b,48,a,b,49,a,b,50,a,b,51,a,b,52,a,b,53,a,b,54,a,b,55,a,b,56,a,b,57,a,b,58,a,b,59,a,b,60,a,b,61,a,b,62,a,b,63,a,b,64,a,b,65,a,b,66,a,b,67,a,b,68,a,b,69,a,b,70,a,b,71,a,b,72,a,b,73,a,b,74,a,b,75,a,b,76,a,b,77,a,b,78,a,b,79,a,b,80,a,b,81,a,b,82,a,b,83,a,b,84,a,b,85,a,b,86,a,b,87,a,b,88,a,b,89,a,b,90,a,b,91,a,b,92,a,b,93,a,b,94,a,b,95,a,b,96,a,b,97,a,b,98,a,b,99,a,b,100,a,b,101,a,b,102,a,b,103,a,b,104,a,b,105,a,b,106,a,b,107,a,b,108,a,b,109,a,b,110,a,b,111,a,b,112,a,b,113,a,b,114,a,b,115,a,b,116,a,b,117,a,b,118,a,b,119,a,b,120,a,b,121,a,b,122,a,b,123,a,b,124,a,b,125,a,b,126,a,b,127,a,b,128,a,b,129,a,b,130,a,b,131,a,b,132,a,b,133,a,b,134,a,b,135,a,b,136,a,b,137,a,b,138,a,b,139,a,b,140,a,b,141,a,b,142,a,b,143,a,b,144,a,b,145,a,b,146,a,b,147,a,b,148,a,b,149,a,b,150,a,b,151,a,b,152,a,b,153,a,b,154,a,b,155,a,b,156,a,b,157,a,b,158,a,b,159,a,b,160,a,b,161,a,b,162,a,b,163,a,b,164,a,b,165,a,b,166,a,b,167,a,b,168,a,b,169,a,b,170,a,b,171,a,b,172,a,b,173,a,b,174,a,b,175,a,b,176,a,b,177,a,b,178,a,b,179,a,b,180,a,b,181,a,b,182,a,b,183,a,b,184,a,b,185,a,b,186,a,b,187,a,b,188,a,b,189,a,b,190,a,b,191,a,b,192,a,b,193,a,b,194,a,b,195,a,b,196,a,b,197,a,b,198,a,b,199,a,b,200,a,b,201,a,b,202,a,b,203,a,b,204,a,b,205,a,b,206,a,b,207,a,b,208,a,b,209,a,b,210,a,b,211,a,b,212,a,b,213,a,b,214,a,b,215,a,b,216,a,b,217,a,b,218,a,b,219,a,b,220,a,b,221,a,b,222,a,b,223,a,b,224,a,b,225,a,b,226,a,b,227,a,b,228,a,b,229,a,b,230,a,b,231,a,b,232,a,b,233,a,b,234,a,b,235,a,b,236,a,b,237,a,b,238,a,b,239,a,b,240,a,b
241,a,b,242,a,b,243,a,b,244,a,b,245,a,b,246,a,b,247,a,b,248,a,b,249,a,b,250,a,b,251,a,b,252,a,b,253,a,b,254,a,b,255,a,b,256,a,b,257,a,b,258,a,b,259,a,b,260,a,b,261,a,b,262,a,b,263,a,b,264,a,b,265,a,b,266,a,b,267,a,b,268,a,b,269,a,b,270,a,b,271,a,b,272,a,b,273,a,b,274,a,b,275,a,b,276,a,b,277,a,b,278,a,b,279,a,b,280,a,b,281,a,b,282,a,b,283,a,b,284,a,b,285,a,b,286,a,b,287,a,b,288,a,b,289,a,b,290,a,b,291,a,b,292,a,b,293,a,b,294,a,b,295,a,b,296,a,b,297,a,b,298,a,b,299,a,b,300,a,b,301,a,b,302,a,b,303,a,b,304,a,b,305,a,b,306,a,b,307,a,b,308,a,b,309,a,b,310,a,b,311,a,b,312,a,b,313,a,b,314,a,b,315,a,b,316,a,b,317,a,b,318,a,b,319,a,b,320,a,b,321,a,b,322,a,b,323,a,b,324,a,b,325,a,b,326,a,b,327,a,b,328,a,b,329,a,b,330,a,b,331,a,b,332,a,b,333,a,b,334,a,b,335,a,b,336,a,b,337,a,b,338,a,b,339,a,b,340,a,b,341,a,b,342,a,b,343,a,b,344,a,b,345,a,b,346,a,b,347,a,b,348,a,b,349,a,b,350,a,b,351,a,b,352,a,b,353,a,b,354,a,b,355,a,b,356,a,b,357,a,b,358,a,b,359,a,b,360,a,b,361,a,b,362,a,b,363,a,b,364,a,b,365,a,b,366,a,b,367,a,b,368,a,b,369,a,b,370,a,b,371,a,b,372,a,b,373,a,b,374,a,b,375,a,b,376,a,b,377,a,b,378,a,b,379,a,b,380,a,b,381,a,b,382,a,b,383,a,b,384,a,b,385,a,b,386,a,b,387,a,b,388,a,b,389,a,b,390,a,b,391,a,b,392,a,b,393,a,b,394,a,b,395,a,b,396,a,b,397,a,b,398,a,b,399,a,b,400,a,b,401,a,b,402,a,b,403,a,b,404,a,b,405,a,b,406,a,b,407,a,b,408,a,b,409,a,b,410,a,b,411,a,b,412,a,b,413,a,b,414,a,b,415,a,b,416,a,b,417,a,b,418,a,b,419,a,b,420,a,b,421,a,b,422,a,b,423,a,b,424,a,b,425,a,b,426,a,b,427,a,b,428,a,b,429,a,b,430,a,b,431,a,b,432,a,b,433,a,b,434,a,b,435,a,b,436,a,b,437,a,b,438,a,b,439,a,b,440,a,b,441,a,b,442,a,b,443,a,b,444,a,b,445,a,b,446,a,b,447,a,b,448,a,b,449,a,b,450,a,b,451,a,b,452,a,b,453,a,b,454,a,b,455,a,b,456,a,b,457,a,b,458,a,b,459,a,b,460,a,b,461,a,b,462,a,b,463,a,b,464,a,b,465,a,b,466,a,b,467,a,b,468,a,b,469,a,b,470,a,b,471,a,b,472,a,b,473,a,b,474,a,b,475,a,b,476,a,b,477,a,b,478,a,b,479,a,b,480,a,b
EDIT: Just noticed the "cshell"... Unfortunately, the above is for bash, use the perl solution. ;)
This might work for you (GNU sed):
sed -r ':a;$!{N;s/[^\n]+/&/240;Ta};s/\n/ /g' file
This keeps appending lines until the pattern space contains 240 lines then replaces all newlines by spaces.
Given a small test file like
a,b,c
d,e,f
g,h,i
j,k,l
m,n,o
p,q,r
s,t,u
v,w,x
y,z
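Change the cnt=? argument to set how many lines are folded into one. You should be able to use your target of 240 without an issue.
awk -v cnt=3 'BEGIN{i=1}
{
    printf "%s,", $0; i++        # first line of the group
    while (i<=cnt){
        getline                  # read the next line of the group into $0
        printf("%s%s", $0 ,(i!=cnt)?",":"")
        i++
    }
    i=1                          # reset the counter and end the output line
    print ""
}' file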
output
a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
s,t,u,v,w,x,y,z
One slight problem: for some values of cnt there will be extra commas at the end of the last line, i.e.
awk -v cnt=4 'BEGIN{i=1}
{
    printf "%s,", $0; i++        # first line of the group
    while (i<=cnt){
        getline                  # read the next line of the group into $0
        printf("%s%s", $0 ,(i!=cnt)?",":"")
        i++
    }
    i=1                          # reset the counter and end the output line
    print ""
}' file
output
a,b,c,d,e,f,g,h,i,j,k,l
m,n,o,p,q,r,s,t,u,v,w,x
y,z,,,
You can clean these up by appending
awk .... file | sed '$s/,*$//' > outFile
to the tail end of your process.
IHTH

WM-Bus extended layer decoding

I am trying to decrypt a wM-Bus telegram from a Kamstrup Multical21 in C1 mode with an Extended Link Layer (ELL).
The payload together with the ELL info is as follows:
23 44 2D 2C 45 45 71 63 1B 16 8D 20 6A 31 FB 7C 20 39 A3 79 60 4B 90 BD FC BE 8D D8 CB 18 CE 77 DC 41 CE 8C
Analysing CI = 8D, I found that there is an ELL with the following data:
CI (1 byte) | CC (1 byte) | ACC (1 byte) | SN (4 bytes) | CRC (2 bytes)
8D          | 20          | 6A           | 31 FB 7C 20  | 39 A3
The documentation says that the buffer to be decrypted must include the CRC from the ELL, i.e.:
39 A3 79 60 4B 90 BD FC BE 8D D8 CB 18 CE 77 DC 41 CE 8C
I have got the AES key from the Manufacturer:
B9 7A 6D 4E C2 74 A4 6D 87 0E 31 27 D9 A0 AF 63
The initialization vector for the ELL shall be:
M-field | A-field           | CC-field | SN-field    | FN    | BC
2D 2C   | 45 45 71 63 1B 16 | 20       | 31 FB 7C 20 | 00 00 | 00
After decrypting, I get the following result:
08 3a 5f ce b2 8d 51 97 94 a2 5b fb 61 ab 2e c0
e4 20 c8 2a 43 ff 3a 75 6f 93 d0 ac 8c 79 b7 a1
Since there is no 2F 2F in the beginning, something is wrong!
Can somebody help me and tell what I have done wrong?
Thanks in advance.
I had a look in the latest Kamstrup docs ("Wireless M-Bus Communication Kamstrup Water Meters - MULTICAL® 21 and flowIQ® water meters Mode C1 according to EN 13757-4:2013")
When I decrypt your packet I find:
25877968217E8E01000000000000000000
Firstly, it seems that Kamstrup's decrypted packets do not start with 2F 2F.
The first 2 bytes of the decrypted packet are supposedly the PLCRC (I can't confirm that right now, as I don't have immediate access to the standard that defines the CRC polynomial), the next byte is 79, which means it is a Compact Frame, the next 4 bytes are 2 more CRCs, and the next 2 bytes, 0100, are probably the Info field, which is manufacturer specific and which I don't know how to interpret yet.
This meter is probably R type 1, right? (On the faceplate, the 3rd-last digit of the "Con.:" parameter should be a 1.) So its format would be [Info][Volume][Target Volume], i.e. 2 bytes, 4 bytes, 4 bytes. I am partly assuming that, since this is a compact packet and I don't have the format that the long packet would carry (e.g. the number of decimals, which you'd normally need). But your values are zeroes, so the decimals don't matter. (The 'long' packet is presumably sent every 6th packet or so.)
The IV I get is:
2D2C454571631B162031FB7C20000000
which is exactly the same as yours.
The encrypted packet I use is:
39A379604B90BDFCBE8DD8CB18CE77DC41
so I exclude the CE and 8C that you had at the end of yours.
When I put them in, the decrypted packet becomes:
25877968217E8E01000000000000000000BB49
which is pretty much the same packet with some more CRC bytes at the back, I suspect. So I really don't see how you decrypt, since your result is completely different.
OK, maybe you use AES/CBC/NoPadding, as in OpenMUC.
Kamstrup uses AES/CTR/NoPadding. That is why the data does not have to be a multiple of 16-byte blocks. In my Java code that looks as follows:
Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
The hints here are very helpful. There's one obstacle I stumbled across with the given message: the length field is wrong and there are 2 bytes of garbage at the end.
I guess the original message was encoded in frame format B. That means the length field includes the frame CRCs and should be corrected after the CRCs are removed. After correcting the length to 0x21 (33 bytes + L-Field), I get the correct message and also can verify that the first 2 bytes of the decoded message contain the CRC16 of the remaining message.
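To make the byte positions discussed in this thread concrete, here is a sketch that cuts the IV and the ciphertext out of the telegram from the question. It only slices hex characters; the helper name hex_slice is made up, and the trailing comment about openssl and xxd is an untested assumption about tools that may not be installed.

```shell
# Telegram from the question as one hex string (spaces removed).
telegram=$(printf '%s' '23 44 2D 2C 45 45 71 63 1B 16 8D 20 6A 31 FB 7C 20 39 A3 79 60 4B 90 BD FC BE 8D D8 CB 18 CE 77 DC 41 CE 8C' | tr -d ' ')

# hex_slice FIRST LAST: 1-based character range of $telegram (2 chars per byte).
hex_slice() {
    printf '%s\n' "$telegram" | cut -c "$1-$2"
}

m_and_a=$(hex_slice 5 20)          # M-field + A-field: 2D2C 454571631B16
cc=$(hex_slice 23 24)              # CC-field: 20
sn=$(hex_slice 27 34)              # SN-field: 31FB7C20
iv="${m_and_a}${cc}${sn}000000"    # FN (2 bytes) and BC (1 byte) are zero

# Ciphertext: from the ELL CRC up to, but not including, the last 2 bytes.
len=${#telegram}
cipher=$(hex_slice 35 $((len - 4)))

echo "IV:         $iv"
echo "ciphertext: $cipher"

# Assumption: with openssl and xxd available, AES-128-CTR decryption could be
# tried along these lines (not verified here):
#   printf '%s' "$cipher" | xxd -r -p |
#     openssl enc -d -aes-128-ctr -K B97A6D4EC274A46D870E3127D9A0AF63 -iv "$iv" | xxd -p
```

The two values printed match the IV and the encrypted packet quoted in the answer above.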

combine one column from two files into a third file

I have two files, a.txt and b.txt, which contain the following data.
$ cat a.txt
0x5212cb03caa111e0
0x5212cb03caa113c0
0x5212cb03caa115c0
0x5212cb03caa117c0
0x5212cb03caa119e0
0x5212cb03caa11bc0
0x5212cb03caa11dc0
0x5212cb03caa11fc0
0x5212cb03caa121c0
$ cat b.txt
36 65 fb 60 7a 5e
36 65 fb 60 7a 64
36 65 fb 60 7a 6a
36 65 fb 60 7a 70
36 65 fb 60 7a 76
36 65 fb 60 7a 7c
36 65 fb 60 7a 82
36 65 fb 60 7a 88
36 65 fb 60 7a 8e
I want to generate a third file c.txt that contains
0x5212cb03caa111e0 36 65 fb 60 7a 5e
0x5212cb03caa113c0 36 65 fb 60 7a 64
0x5212cb03caa115c0 36 65 fb 60 7a 6a
Can I achieve this using awk? How do I do this?
Use the paste command:
paste a.txt b.txt
paste is really the shortest solution; however, if you're looking for an awk solution as stated in the question, then:
awk 'FNR==NR{a[++i]=$0;next} {print a[FNR] "\t" $0}' a.txt b.txt
Here is an awk solution that only stores two lines in memory at a time:
awk '{ getline b < "b.txt"; print $0, b }' OFS='\t' a.txt
Lines from a.txt are implicitly stored in $0 and for each line in a.txt a line is read from b.txt by getline.
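The getline variant can be made a little more defensive, as sketched below: the return value of getline is checked so that a shorter second file yields an empty column instead of repeating its last line. The function name combine is hypothetical, and the default OFS (a single space) is kept to match the c.txt layout shown in the question.

```shell
# combine FILE_A FILE_B: print each line of FILE_A followed by the matching
# line of FILE_B, separated by a space. The function name is illustrative.
combine() {
    awk -v other="$2" '{
        if ((getline b < other) <= 0) b = ""   # FILE_B has run out of lines
        print $0, b
    }' "$1"
}

# Demo with two throwaway files holding the first lines from the question.
dir=$(mktemp -d)
printf '%s\n' 0x5212cb03caa111e0 0x5212cb03caa113c0 > "$dir/a.txt"
printf '%s\n' '36 65 fb 60 7a 5e' '36 65 fb 60 7a 64' > "$dir/b.txt"
combine "$dir/a.txt" "$dir/b.txt"
paste -d ' ' "$dir/a.txt" "$dir/b.txt"   # same output when both files are equally long
```

Redirecting either command to c.txt produces the third file.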