join every 240 lines of a large file consisting of different numbers in cshell script - perl

I have a large file containing 5,000,000 lines and 3 columns, and I want to merge every 240 lines into one.
I tried sed in a cshell script to merge 3 lines: sed 'N;N;s/\n/ /g' filename. But to do the same for 240 lines I would have to write N;N;N;... with N repeated 239 times! What is the best way to solve this problem?

awk to the rescue!
$ awk 'ORS=NR%240?FS:RS' filename
for example
$ seq 10 99 | awk 'ORS=NR%10?FS:RS'
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
Explanation
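ORS=NR%10?FS:RS: the ternary operator sets the output record separator (ORS) to the record separator RS (a newline) when the line number is divisible by 10, and to the field separator FS (a space) otherwise. The effect is a newline after every tenth record and a space between the others. The assignment doubles as the pattern: its value is a non-empty string, which is always true, so every record is printed.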
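One caveat of the one-liner: when the number of lines is not a multiple of the group size, the last output row ends with a space and no newline. A minimal sketch of a variant that writes the separator before each record instead; the function name join_every and its argument are made up for illustration:

```shell
# join_every N: read lines on stdin and print them N per row, space-separated.
join_every() {
    awk -v n="$1" '
        NR > 1 { printf "%s", ((NR - 1) % n ? " " : "\n") }  # separator goes before the record
               { printf "%s", $0 }
        END    { if (NR) print "" }                           # always finish with a newline
    '
}
```

It would be called as join_every 240 < filename.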

Something like this perhaps, which removes the newline from every line, and then prints it followed by a space or a newline as appropriate
perl -ne's/\s*\z//; print $_, eof || $. % 240 == 0 ? "\n" : " "' myfile

If I understand right, the
paste -d, $(printf "%0.s- " {1..240})
does the job. Assumes the field delimiter is ,.
Demo:
produce some test file
seq -f '%g,a,b' 2400 >demo_file
it contains lines like:
1,a,b
2,a,b
3,a,b
4,a,b
...
2398,a,b
2399,a,b
2400,a,b
the command
paste -d, $(printf "%0.s- " {1..240}) < demo_file | head -2
prints:
1,a,b,2,a,b,3,a,b,4,a,b,5,a,b,6,a,b,7,a,b,8,a,b,9,a,b,10,a,b,11,a,b,12,a,b,13,a,b,14,a,b,15,a,b,16,a,b,17,a,b,18,a,b,19,a,b,20,a,b,21,a,b,22,a,b,23,a,b,24,a,b,25,a,b,26,a,b,27,a,b,28,a,b,29,a,b,30,a,b,31,a,b,32,a,b,33,a,b,34,a,b,35,a,b,36,a,b,37,a,b,38,a,b,39,a,b,40,a,b,41,a,b,42,a,b,43,a,b,44,a,b,45,a,b,46,a,b,47,a,b,48,a,b,49,a,b,50,a,b,51,a,b,52,a,b,53,a,b,54,a,b,55,a,b,56,a,b,57,a,b,58,a,b,59,a,b,60,a,b,61,a,b,62,a,b,63,a,b,64,a,b,65,a,b,66,a,b,67,a,b,68,a,b,69,a,b,70,a,b,71,a,b,72,a,b,73,a,b,74,a,b,75,a,b,76,a,b,77,a,b,78,a,b,79,a,b,80,a,b,81,a,b,82,a,b,83,a,b,84,a,b,85,a,b,86,a,b,87,a,b,88,a,b,89,a,b,90,a,b,91,a,b,92,a,b,93,a,b,94,a,b,95,a,b,96,a,b,97,a,b,98,a,b,99,a,b,100,a,b,101,a,b,102,a,b,103,a,b,104,a,b,105,a,b,106,a,b,107,a,b,108,a,b,109,a,b,110,a,b,111,a,b,112,a,b,113,a,b,114,a,b,115,a,b,116,a,b,117,a,b,118,a,b,119,a,b,120,a,b,121,a,b,122,a,b,123,a,b,124,a,b,125,a,b,126,a,b,127,a,b,128,a,b,129,a,b,130,a,b,131,a,b,132,a,b,133,a,b,134,a,b,135,a,b,136,a,b,137,a,b,138,a,b,139,a,b,140,a,b,141,a,b,142,a,b,143,a,b,144,a,b,145,a,b,146,a,b,147,a,b,148,a,b,149,a,b,150,a,b,151,a,b,152,a,b,153,a,b,154,a,b,155,a,b,156,a,b,157,a,b,158,a,b,159,a,b,160,a,b,161,a,b,162,a,b,163,a,b,164,a,b,165,a,b,166,a,b,167,a,b,168,a,b,169,a,b,170,a,b,171,a,b,172,a,b,173,a,b,174,a,b,175,a,b,176,a,b,177,a,b,178,a,b,179,a,b,180,a,b,181,a,b,182,a,b,183,a,b,184,a,b,185,a,b,186,a,b,187,a,b,188,a,b,189,a,b,190,a,b,191,a,b,192,a,b,193,a,b,194,a,b,195,a,b,196,a,b,197,a,b,198,a,b,199,a,b,200,a,b,201,a,b,202,a,b,203,a,b,204,a,b,205,a,b,206,a,b,207,a,b,208,a,b,209,a,b,210,a,b,211,a,b,212,a,b,213,a,b,214,a,b,215,a,b,216,a,b,217,a,b,218,a,b,219,a,b,220,a,b,221,a,b,222,a,b,223,a,b,224,a,b,225,a,b,226,a,b,227,a,b,228,a,b,229,a,b,230,a,b,231,a,b,232,a,b,233,a,b,234,a,b,235,a,b,236,a,b,237,a,b,238,a,b,239,a,b,240,a,b
241,a,b,242,a,b,243,a,b,244,a,b,245,a,b,246,a,b,247,a,b,248,a,b,249,a,b,250,a,b,251,a,b,252,a,b,253,a,b,254,a,b,255,a,b,256,a,b,257,a,b,258,a,b,259,a,b,260,a,b,261,a,b,262,a,b,263,a,b,264,a,b,265,a,b,266,a,b,267,a,b,268,a,b,269,a,b,270,a,b,271,a,b,272,a,b,273,a,b,274,a,b,275,a,b,276,a,b,277,a,b,278,a,b,279,a,b,280,a,b,281,a,b,282,a,b,283,a,b,284,a,b,285,a,b,286,a,b,287,a,b,288,a,b,289,a,b,290,a,b,291,a,b,292,a,b,293,a,b,294,a,b,295,a,b,296,a,b,297,a,b,298,a,b,299,a,b,300,a,b,301,a,b,302,a,b,303,a,b,304,a,b,305,a,b,306,a,b,307,a,b,308,a,b,309,a,b,310,a,b,311,a,b,312,a,b,313,a,b,314,a,b,315,a,b,316,a,b,317,a,b,318,a,b,319,a,b,320,a,b,321,a,b,322,a,b,323,a,b,324,a,b,325,a,b,326,a,b,327,a,b,328,a,b,329,a,b,330,a,b,331,a,b,332,a,b,333,a,b,334,a,b,335,a,b,336,a,b,337,a,b,338,a,b,339,a,b,340,a,b,341,a,b,342,a,b,343,a,b,344,a,b,345,a,b,346,a,b,347,a,b,348,a,b,349,a,b,350,a,b,351,a,b,352,a,b,353,a,b,354,a,b,355,a,b,356,a,b,357,a,b,358,a,b,359,a,b,360,a,b,361,a,b,362,a,b,363,a,b,364,a,b,365,a,b,366,a,b,367,a,b,368,a,b,369,a,b,370,a,b,371,a,b,372,a,b,373,a,b,374,a,b,375,a,b,376,a,b,377,a,b,378,a,b,379,a,b,380,a,b,381,a,b,382,a,b,383,a,b,384,a,b,385,a,b,386,a,b,387,a,b,388,a,b,389,a,b,390,a,b,391,a,b,392,a,b,393,a,b,394,a,b,395,a,b,396,a,b,397,a,b,398,a,b,399,a,b,400,a,b,401,a,b,402,a,b,403,a,b,404,a,b,405,a,b,406,a,b,407,a,b,408,a,b,409,a,b,410,a,b,411,a,b,412,a,b,413,a,b,414,a,b,415,a,b,416,a,b,417,a,b,418,a,b,419,a,b,420,a,b,421,a,b,422,a,b,423,a,b,424,a,b,425,a,b,426,a,b,427,a,b,428,a,b,429,a,b,430,a,b,431,a,b,432,a,b,433,a,b,434,a,b,435,a,b,436,a,b,437,a,b,438,a,b,439,a,b,440,a,b,441,a,b,442,a,b,443,a,b,444,a,b,445,a,b,446,a,b,447,a,b,448,a,b,449,a,b,450,a,b,451,a,b,452,a,b,453,a,b,454,a,b,455,a,b,456,a,b,457,a,b,458,a,b,459,a,b,460,a,b,461,a,b,462,a,b,463,a,b,464,a,b,465,a,b,466,a,b,467,a,b,468,a,b,469,a,b,470,a,b,471,a,b,472,a,b,473,a,b,474,a,b,475,a,b,476,a,b,477,a,b,478,a,b,479,a,b,480,a,b
EDIT: Just noticed the "cshell"... Unfortunately, the above is for bash, use the perl solution. ;)
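Since the brace expansion {1..240} is what ties the paste command to bash, here is a sketch that builds the list of dashes with awk instead, so it should run under a plain POSIX sh. The function name paste_every is hypothetical. csh has no $(...) syntax, so from a cshell script this would have to live in a separate sh script.

```shell
# paste_every N DELIM: join every N lines of stdin with DELIM.
paste_every() {
    # Each "-" tells paste to read one more line from stdin for the same output row.
    dashes=$(awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "- " }')
    # $dashes is intentionally unquoted so that it splits into N separate arguments.
    paste -d "$2" $dashes
}
```

For the demo file above the call would be paste_every 240 , < demo_file.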

This might work for you (GNU sed):
sed -r ':a;$!{N;s/[^\n]+/&/240;Ta};s/\n/ /g' file
This keeps appending lines until the pattern space contains 240 lines then replaces all newlines by spaces.

Given a small test file like
a,b,c
d,e,f
g,h,i
j,k,l
m,n,o
p,q,r
s,t,u
v,w,x
y,z
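Change the cnt=? argument to set how many lines are folded into one. You should be able to use your target of 240 without an issue.
awk -v cnt=3 'BEGIN{i=1}
{
  printf "%s,", $0; i++;   # use a format string so a % in the data is printed literally
  while (i<=cnt){
    if ((getline) <= 0) $0 = ""   # past end of input: pad with an empty field
    printf("%s%s", $0 ,(i!=cnt)?",":"")
    i++
  }
  { i=1
    print ""
  }
}' file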
output
a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
s,t,u,v,w,x,y,z
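One slight problem: for some values of cnt there will be extra commas at the end of the last line, e.g.
awk -v cnt=4 'BEGIN{i=1}
{
  printf "%s,", $0; i++;   # use a format string so a % in the data is printed literally
  while (i<=cnt){
    if ((getline) <= 0) $0 = ""   # past end of input: pad with an empty field
    printf("%s%s", $0 ,(i!=cnt)?",":"")
    i++
  }
  { i=1
    print ""
  }
}' file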
output
a,b,c,d,e,f,g,h,i,j,k,l
m,n,o,p,q,r,s,t,u,v,w,x
y,z,,,
You can clean these up by appending a sed command
awk .... file | sed '$s/,*$//' > outFile
to the tail end of your process.
IHTH

how to find offset of a pattern from binary file (without grep -b)

I want to get the byte offset of a string pattern in a binary file on an embedded Linux platform.
The "grep -b" option would be the best way, but it is not supported on my machine.
That is, the machine does not support
ADDR=`grep -oba <pattern string> <file path> | cut -d ":" -f1`
Here is the help text of the grep command on the machine.
root# grep --help
BusyBox v1.29.3 () multi-call binary.
Usage: grep [-HhnlLoqvsriwFE] [-m N] [-A/B/C N] PATTERN/-e PATTERN.../-f FILE [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Since that option isn't available, I'm looking for an alternative.
A combination of hexdump and grep can also be useful,
such as
ADDR=`hexdump <file path> -C | grep <pattern string> | cut -d' ' -f1`
But if the pattern spans multiple lines of the dump, it will not be found.
Is there a way to find the byte offset of a specific pattern with a Linux command?
Set the pattern as the record separator in awk. The offset of the occurrence is the length of the first record. BusyBox awk treats RS as an extended regular expression, so add backslashes before any of .[]\*+?^$ in the pattern string.
<myfile.bin awk -v RS='pattern' '{print length($0); exit}'
If the pattern contains a null byte, you need a little extra work. Use tr to exchange null bytes with some byte value that doesn't appear in the pattern. For example, if the pattern's hex dump is 00002d41 (two null bytes followed by -A):
<myfile.bin tr '\0!' '!\0' | awk -v RS='!!-A' '{print length($0); exit}'
If the pattern is not found, this prints the length of the whole file. So if you aren't sure whether the pattern is present, you need again some extra work. Append some text that can't be part of a pattern match to the file, so that you know that if there's a match, it won't be at the very end of the file. Then, if the pattern is present, the file will contain at least two records. But if the pattern is not present, the file only contains the first record (without a record separator after it).
{ cat myfile.bin; echo garbage; } |
awk -v RS='pattern' '
NR==1 {n = length($0)}
NR==2 {print n; found = 1; exit}
END {exit !found}
'
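The command above can be wrapped in a small function so that the exit status says whether the pattern was found. This is only a sketch: the name find_offset is made up, and LC_ALL=C is my own addition so that length() counts bytes in locale-aware awks such as gawk. It assumes an awk that accepts a regular expression as RS, and the caveat about the appended text still applies.

```shell
# find_offset PATTERN FILE: print the byte offset of the first match; exit 1 if there is none.
find_offset() {
    { cat "$2"; echo garbage; } |
    LC_ALL=C awk -v RS="$1" '
        NR == 1 { n = length($0) }               # text before the first separator
        NR == 2 { print n; found = 1; exit }     # a second record means RS matched
        END     { exit !found }
    '
}
```

A call would look like ADDR=$(find_offset 'pattern' myfile.bin) || echo "not found".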
Something like this?
hexdump -C "$file" |
awk -v pattern="$pattern" 'residue { matched = ($0 ~ "\\|" residue)
if (matched) print $1; residue = ""; if (matched) next }
$0 ~ pattern { print $1 }
{ for(i=length(pattern)-1; i>0; i--)
if ($0 ~ substr(pattern, 1, i) "\\|$") { residue=substr(pattern, i+1); break } }'
The offset is just the first field from the hexdump output; if you need the precise location of the match, this requires some additional massaging to figure out the offset to add to the address, or subtract if it was wrapped.
Briefly tested in a clean-slate Busybox Docker container where hexdump -C output looks like this:
/ # hexdump -C /etc/resolv.conf
00000000 23 20 44 4e 53 20 72 65 71 75 65 73 74 73 20 61 |# DNS requests a|
00000010 72 65 20 66 6f 72 77 61 72 64 65 64 20 74 6f 20 |re forwarded to |
00000020 74 68 65 20 68 6f 73 74 2e 20 44 48 43 50 20 44 |the host. DHCP D|
00000030 4e 53 20 6f 70 74 69 6f 6e 73 20 61 72 65 20 69 |NS options are i|
00000040 67 6e 6f 72 65 64 2e 0a 6e 61 6d 65 73 65 72 76 |gnored..nameserv|
00000050 65 72 20 31 39 32 2e 31 36 38 2e 36 35 2e 35 0a |er 192.168.65.5.|
00000060 20 | |

sed: delete n lines after first match

I want to delete N number of lines after the first match in a text file using sed.
(I know most of these questions have been answered with "use awk", but I want to use sed, regardless of how much more powerful awk is. It's more a matter of which tool I'm most comfortable with at the moment, under a certain time constraint.)
The furthest I got is this:
sed -i "0,/pattern/{/pattern/,+Nd}" file.txt
The thought is that 0,/pattern/ selects everything up to the first occurrence, and the command in curly brackets finds the pattern there and deletes the N lines after that occurrence.
Try
sed '/pattern/{N;N;N;N;N;N;N;d;}' file.txt
The 0, construct and the relative line number addressing you tried to use are specific to GNU sed. Portable sed does not have these facilities.
This removes the matching line together with the next seven lines (one for each N), and does so after every match. If you only want to remove the first occurrence and leave the rest of the file unchanged, maybe add a separate loop to simply print all remaining lines.
The problem with your attempt is that 0,/pattern/ restricts matching to the lines up through the first occurrence of /pattern/ but then that's the end of the range, so anything selected by this expression cannot operate on lines outside of that range.
Assuming your shell is bash (the question originally had a bash tag):
n=3
sed -f <(printf -v nsp '%*s' $n; printf '/%s/{x;/./!{s/^/./;h;%sd;};x;}\n' 'pattern' "${nsp// /N;}") file
Note that n is a variable (3 is just an example) and the constructed sed script is not GNU-specific.
This might work for you (GNU sed):
sed '0,/pattern/{//{:a;N;s/\n/&/N;Ta;d}}' file
This deletes the first line containing pattern and the N lines after it, once only.
Alternative:
sed '/pattern/{x;//{x;b};x;h;:a;N;s/\n/&/N;Ta;d}' file
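N.B. The N at the end of the substitution command is a placeholder for a number: replace it with the count of lines to delete. As a flag it selects the nth occurrence of a newline in the pattern space, so the substitution succeeds only once that many lines have been appended.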
UPDATE 1 : Example where sed solution above does not meet objective universally:
cmd='/5=P$/{N;N;N;N;N;N;d;}'
echo "\n input \${b} :: \n\n--------------\n" \
"${b}\n--------------\n\n sed " \
"commands :: \n\n--------------\n " \
"${cmd}\n--------------\n\n GNU sed "\
"::\n\n$( gsed "${cmd}" <<< "${b}" )" \
"\n\n BSD sed ::\n\n$( sed "${cmd}" <<< "${b}" )\n\n"
input ${b} ::
--------------
84 77138=48001=P
85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
--------------
sed commands ::
--------------
/5=P$/{N;N;N;N;N;N;d;}
--------------
GNU sed ::
84 77138=48001=P
85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
BSD sed ::
84 77138=48001=P
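When the input lacks sufficient rows past the pattern,
this solution works on BSD sed
but fails on GNU sed. The difference lies in what N does when there is no next line: POSIX sed (BSD sed included) quits without printing the pattern space, whereas GNU sed prints the pattern space before exiting unless POSIXLY_CORRECT is set, so the lines that should have been deleted are output.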
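One hedged way to get the same result from both implementations when the match is close to the end of the input is to guard each N with the $! address, so that N only runs while more input is left. The sketch below builds such a script for an arbitrary count; the function name delete_match_and_next is hypothetical, and like the N;N;... version it acts on every match, not only the first.

```shell
# delete_match_and_next REGEX N: delete each matching line and up to N lines after it.
delete_match_and_next() {
    # Repeat "$!N;" N times; $!N appends the next line only when not on the last line.
    guarded=$(awk -v n="$2" 'BEGIN { for (i = 0; i < n; i++) printf "$!N;" }')
    sed "/$1/{${guarded}d;}"
}
```

With the input from the example above, delete_match_and_next '5=P$' 6 should leave only line 84 under either sed.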
============================
Is sed a must-have requirement? You can also do one-liners with awk
(the output below is intentionally verbose to show exactly which lines are matched and which are skipped):
# gawk profile, created Thu Apr 28 18:36:55 2022
# BEGIN rule(s)
BEGIN {
1 printf "\n\t N :: %.f :: FS i.e. "\
"pattern :: %*s\n\n", N = +N, ++__, FS = pattern
}
# Rule(s)
87 NF *= -(_+=(_= __<NF ? -__-N :_)^!__)<+_ { # 45
45 print
}
1 77138=501=A
2 77138=3413=A
3 77138=3414=A
4 77138=8624=A
5 77138=19572=A
6 77138=22220=A
7 77138=23670=A
8 77138=25413=A
9 77138=26351=A
10 77138=27340=A
11 77138=29288=A
12 77138=121060=A
13 77138=123028=A
14 77138=132081=A
15 77138=135789=A
16 77138=154341=A
17 77138=155876=A
18 77138=170871=A
19 77138=178562=A
skipped :: 20 77138=185367=A
skipped :: 21 77138=196718=A
skipped :: 22 77138=196985=A
skipped :: 23 77138=200012=A
skipped :: 24 77138=207162=A
skipped :: 25 77138=228289=A
skipped :: 26 77138=244747=A
skipped :: 27 77138=284795=A
skipped :: 28 77138=294579=A
skipped :: 29 77138=299765=A
skipped :: 30 77138=317856=A
skipped :: 31 77138=318815=A
32 77138=324570=A
33 77138=408049=A
34 77138=514403=A
35 77138=1647865=A
36 77138=1738771=A
37 77138=3217183=A
skipped :: 38 77138=3222837=A
skipped :: 39 77138=3235292=A
skipped :: 40 77138=14957980=I
skipped :: 41 77138=1159=M
skipped :: 42 77138=1196=M
skipped :: 43 77138=1251=M
44 77138=1252=M
45 77138=4951=M
46 77138=16740=M
47 77138=71501=M
skipped :: 48 77138=137=P
skipped :: 49 77138=348=P
skipped :: 50 77138=518=P
skipped :: 51 77138=519=P
skipped :: 52 77138=520=P
skipped :: 53 77138=925=P
54 77138=1363=P
55 77138=1483=P
56 77138=1814=P
57 77138=2692=P
58 77138=3540=P
59 77138=3594=P
60 77138=3682=P
61 77138=3869=P
62 77138=3940=P
skipped :: 63 77138=3977=P
skipped :: 64 77138=4025=P
skipped :: 65 77138=4252=P
skipped :: 66 77138=4396=P
skipped :: 67 77138=9501=P
skipped :: 68 77138=13006=P
69 77138=18113=P
skipped :: 70 77138=20907=P
skipped :: 71 77138=31936=P
skipped :: 72 77138=34954=P
skipped :: 73 77138=37126=P
skipped :: 74 77138=37482=P
skipped :: 75 77138=40135=P
76 77138=40206=P
77 77138=41279=P
78 77138=41280=P
79 77138=46140=P
skipped :: 80 77138=46157=P
skipped :: 81 77138=46173=P
skipped :: 82 77138=46218=P
skipped :: 83 77138=47592=P
skipped :: 84 77138=48001=P
skipped :: 85 77138=48035=P
86 77138=78118=P
87 77138=79248=P
N :: 5 :: FS i.e. pattern :: [7]=[AP]$
1 77138=501=A
2 77138=3413=A
3 77138=3414=A
4 77138=8624=A
5 77138=19572=A
6 77138=22220=A
7 77138=23670=A
8 77138=25413=A
9 77138=26351=A
10 77138=27340=A
11 77138=29288=A
12 77138=121060=A
13 77138=123028=A
14 77138=132081=A
15 77138=135789=A
16 77138=154341=A
17 77138=155876=A
18 77138=170871=A
19 77138=178562=A
32 77138=324570=A
33 77138=408049=A
34 77138=514403=A
35 77138=1647865=A
36 77138=1738771=A
37 77138=3217183=A
44 77138=1252=M
45 77138=4951=M
46 77138=16740=M
47 77138=71501=M
54 77138=1363=P
55 77138=1483=P
56 77138=1814=P
57 77138=2692=P
58 77138=3540=P
59 77138=3594=P
60 77138=3682=P
61 77138=3869=P
62 77138=3940=P
69 77138=18113=P
76 77138=40206=P
77 77138=41279=P
78 77138=41280=P
79 77138=46140=P
86 77138=78118=P
87 77138=79248=P
more concisely, it would be
mawk -v pattern='[7]=[AP]$' -v N='5' -- '
BEGIN {
++__
FS = pattern
} NF *= -(_+=(_=__<NF?-__-N:_)^!__) < +_'
or in awk one-liner style
mawk 'NF*=-(_+=(_=1<NF?-1-N:_)^0)<+_' FS='[7]=[AP]$' N=5

Logical issue in Perl numerically iterating over 10 directories and populating them with numbers?

My basic goal is to take the integers 1 to 100,
and split them into 10 parts, where the initial part is integers 1 to 10,
and the next part is 11 to 20, each part having 10 numbers each.
I want the first part to go into the directory NUMBERS_1
containing the numbers 1 to 10 in a file called FILE_1.txt
and the next part to go into directory NUMBERS_2
containing the numbers 11 to 20 in a text file: FILE_2.txt
and so forth.
How I approached this problem was that I initialized an array with 100 numbers,
and then created an array reference by destructively splicing the array into 10 parts composed of no more than 10 integers each.
Then I created 10 folders NUMBERS_1 to NUMBERS_10 on a for-loop.
As I was doing this for-loop, I created a directory list of all the directories I created.
Then some problem occurs as I iterate over the directories.
So I attempted to iterate over the directories in a foreach loop,
and then I try to open each of the directories in this directory list one at a time,
create a text file, fill it with a quantity of ten integers, close the file, and then close the directory. It doesn't seem like I'm opening the directories, but I'm not getting any error messages, and I have multiple open or die statements so shouldn't I be getting some errors?
ISSUE:
My problem is that my 10 text files are being created in my current working directory, not one in each of the 10 directories that I created, and I just can't see the error in my logic.
#!/usr/bin/env perl
# The objective of this program is to split an array of 100 numbers into 10 parts
# and write each part into its own file with ten numbers each.
use strict;
use warnings;
use feature 'say';
my @numbers = (1 .. 100);
my $partition_size = 10;
my @number_groups = ();
my $num_elements;
my @directories;
my $directory_handle;
my $incrementer; # Incrementing over the directories ... not the same as my use of $i
my $i;
# This right here is very powerful
# I must confess that I received some help from zdim at stackoverflow:
# https://stackoverflow.com/questions/45158306/splitting-an-array-into-n-accessible-parts-within-perl
# splice is destructive so numbers will be empty,
# but at that cost the array reference @number_groups will have 10 sections filled with 10 numbers each from 1 to 100
push @number_groups, [splice(@numbers, 0, $partition_size)] while @numbers;
$num_elements = scalar(@number_groups); # Retrieving size of array reference.
# Why is it not being treated like an array reference, but like an array?;
# Here I'm saying every item of the array reference @number_groups.
say "Let's take a look at this array reference containing each of the pieces of numbers 1 to 100";
say "@$_\n" for @number_groups;
# Now let's make folders containing the numbers 1 to 100, with 10 numbers in each folder.
# And the folders properly labeled.
say "I will now create $num_elements folders";
for(my $i = 1; $i <= $num_elements; $i++)
{
mkdir "NUMBERS_$i" or warn "Could not create folder $_, probably because it already exists";
push @directories, "NUMBERS_$i";
}
# I know that the script is misbehaving somewhere below this line.
$incrementer = 0; # The incrementer is at zero because the first item in the list of directories is the zero-ith item.
$i = 1; # This incrementer "$i" is for the 10 logical slices of the numbers from 1 to 100
foreach(@directories)
{
opendir($directory_handle, $_) or die "Could not open directory $_";
my $file = "FILE_$i.txt"; my $filehandle;
say "\nThe incrementer is at $i in directory $_";
open($filehandle, '>', $file) or die "Could not open file $_";
while(my $line = <$filehandle>)
{
chomp $line;
foreach(@{$number_groups[$incrementer]}){print "$_\t";}
}
close $filehandle or die "Could not close file $_";
closedir($directory_handle) or die "Could not close directory $_";
$incrementer++;
$i++;
}
Running this script produces the following output:
Let's take a look at this array reference containing each of the pieces of numbers 1 to 100
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100
I will now create 10 folders
The incrementer is at 1 in directory NUMBERS_1
1 2 3 4 5 6 7 8 9 10
The incrementer is at 2 in directory NUMBERS_2
11 12 13 14 15 16 17 18 19 20
The incrementer is at 3 in directory NUMBERS_3
21 22 23 24 25 26 27 28 29 30
The incrementer is at 4 in directory NUMBERS_4
31 32 33 34 35 36 37 38 39 40
The incrementer is at 5 in directory NUMBERS_5
41 42 43 44 45 46 47 48 49 50
The incrementer is at 6 in directory NUMBERS_6
51 52 53 54 55 56 57 58 59 60
The incrementer is at 7 in directory NUMBERS_7
61 62 63 64 65 66 67 68 69 70
The incrementer is at 8 in directory NUMBERS_8
71 72 73 74 75 76 77 78 79 80
The incrementer is at 9 in directory NUMBERS_9
81 82 83 84 85 86 87 88 89 90
The incrementer is at 10 in directory NUMBERS_10
91 92 93 94 95 96 97 98 99 100
The standard output being sent to the screen is totally logical, but the text files are not in their appropriate directories, and somehow the files are empty.
What's going on?
Thanks!
my $file = "FILE_$i.txt";
should be
my $file = "$_/FILE_$i.txt";
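And get rid of that opendir; you never use the handle. The files stay empty because the print inside the loop writes to STDOUT rather than to $filehandle, so nothing is ever written to them.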
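For comparison, here is the same directory layout sketched in shell rather than Perl, only to show the directory being part of the output path. The function name make_number_dirs and its base-directory argument are made up for this example.

```shell
# make_number_dirs BASE: create BASE/NUMBERS_1..10, each holding FILE_i.txt with ten numbers.
make_number_dirs() {
    i=1
    while [ "$i" -le 10 ]; do
        dir="$1/NUMBERS_$i"
        mkdir -p "$dir" || return 1
        first=$(( (i - 1) * 10 + 1 ))
        # The directory is part of the output path; a bare FILE_$i.txt would land in the cwd.
        awk -v s="$first" 'BEGIN {
            for (k = s; k < s + 10; k++) printf "%d%s", k, (k < s + 9 ? "\t" : "\n")
        }' > "$dir/FILE_$i.txt"
        i=$((i + 1))
    done
}
```

Running make_number_dirs . would create the ten folders under the current directory.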

Hex dump parsing in perl

I have a hex dump of a message in a file, which I want to read into an array
so I can perform the decoding logic on it.
I was wondering if there is an easier way to parse a message that looks like this.
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
Note that a row holds at most 16 bytes, but any row can contain fewer (minimum: 1).
Is there a nice and elegant way to do this in Perl, rather than reading 2 chars at a time?
Perl has a hex operator that performs the decoding logic for you.
hex EXPR
hex
Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.
print hex '0xAf'; # prints '175'
print hex 'aF'; # same
Remember that the default behavior of split chops up a string at whitespace separators, so for example
$ perl -le '$_ = "a b c"; print for split'
a
b
c
For every line of the input, separate it into hex values, convert the values to numbers, and push them onto an array for later processing.
#! /usr/bin/perl
use warnings;
use strict;
my @values;
while (<>) {
push @values => map hex($_), split;
}
# for example
my $sum = 0;
$sum += $_ for @values;
print $sum, "\n";
Sample run:
$ ./sumhex mtanish-input
4196
I would read a line at a time, strip the whitespace, and use pack 'H*' to convert it. It's hard to be more specific without knowing what kind of "decoding logic" you're trying to apply. For example, here's a version that converts each byte to decimal:
while (<>) {
s/\s+//g;
my @bytes = unpack('C*', pack('H*', $_));
print "@bytes\n";
}
Output from your sample file:
55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
116 101 108 99 111 114 100 105
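The same row-by-row conversion can be sketched in plain shell, which may help when checking the Perl output by hand. It relies on printf accepting 0x-prefixed constants for %d; the function name hex_to_dec is hypothetical.

```shell
# hex_to_dec: read rows of whitespace-separated hex bytes on stdin,
# print one row of decimal values per input row.
hex_to_dec() {
    while read -r line; do
        out=
        for h in $line; do                      # unquoted on purpose: split on whitespace
            out="$out${out:+ }$(printf '%d' "0x$h")"
        done
        printf '%s\n' "$out"
    done
}
```

For example, the row 3B 32 31 comes out as 59 50 49.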
I think reading in two characters at a time is the appropriate way to parse a stream whose logical tokens are two-character units.
Is there some reason you think that's ugly?
If you're trying to extract a particular sequence, you could do that with whitespace-insensitive regular expressions.

How can I convert a 48 hex string to bytes using Perl?

I have a hex string (48 characters long) that I want to convert to raw bytes with the pack function in order to put it in a Win32 vector of bytes.
How can I do this with Perl?
my $bytes = pack "H*", $hex;
See perlpacktut for more information.
The steps are:
Extract pairs of hexadecimal characters from the string.
Convert each pair to a decimal number.
Pack the number as a byte.
For example:
use strict;
use warnings;
my $string = 'AA55FF0102040810204080';
my @hex = ($string =~ /(..)/g);
my @dec = map { hex($_) } @hex;
my @bytes = map { pack('C', $_) } @dec;
Or, expressed more compactly:
use strict;
use warnings;
my $string = 'AA55FF0102040810204080';
my @bytes = map { pack('C', hex($_)) } ($string =~ /(..)/g);
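The three steps (take a pair of hex digits, convert it to a number, emit it as one byte) can also be sketched in shell. This is an illustration, not a replacement for pack: the function name hex_to_bytes is made up, and each byte is written through an octal escape because that is the escape form printf handles portably.

```shell
# hex_to_bytes HEXSTRING: write the raw bytes for an even-length hex string to stdout.
hex_to_bytes() {
    case $1 in *[!0-9A-Fa-f]*) return 1 ;; esac    # reject non-hex characters
    [ $(( ${#1} % 2 )) -eq 0 ] || return 1         # reject an odd number of digits
    rest=$1
    while [ -n "$rest" ]; do
        pair=${rest%"${rest#??}"}                  # first two characters
        rest=${rest#??}                            # drop them from the remainder
        printf "\\$(printf '%03o' "0x$pair")"      # hex pair -> octal escape -> one byte
    done
}
```

For example, hex_to_bytes 616263 writes the three bytes abc.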
I have the string:
"61 62 63 64 65 67 68 69 6a"
which I want to interpret as hex values, and display those as ASCII chars (those values should reproduce the character string "abcdefghij").
Typically, I try to write something quick like this:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2 "x10, $_)."\n";'
61 62 63 64 65 67 68 69 6a
a
... and then I wonder, why do I get only one character back :)
First, let me note down that the string I have, can also be represented as the hex values of bytes that it takes up in memory:
$ echo -n "61 62 63 64 65 67 68 69 6a" | hexdump -C
00000000 36 31 20 36 32 20 36 33 20 36 34 20 36 35 20 36 |61 62 63 64 65 6|
00000010 37 20 36 38 20 36 39 20 36 61 |7 68 69 6a|
0000001a
(NB: Essentially, I want to "convert" the above byte values in memory as input, to these below ones, if viewed by hexdump:
$ echo -n "abcdefghij" | hexdump -C
00000000 61 62 63 64 65 66 67 68 69 6a |abcdefghij|
0000000a
... which is how the original values for the input hex string were obtained.
)
Well, this Pack/Unpack Tutorial (AKA How the System Stores Data) turned out to be the most helpful for me, as it mentions:
The pack function accepts a template string and a list of values [...]
$rec = pack( "l i Z32 s2", time, $emp_id, $item, $quan, $urgent);
It returns a scalar containing the list of values stored according to the formats specified in the template [...]
$rec would contain the following (first line in decimal, second in hex, third as characters where applicable). Pipe characters indicate field boundaries.
Offset Contents (increasing addresses left to right)
0 160 44 19 62| 41 82 3 0| 98 111 120 101 115 32 111 102
A0 2C 13 3E| 29 52 03 00| 62 6f 78 65 73 20 6f 66
| b o x e s o f
That is, in my case, $_ is a single string variable -- whereas pack expects as input a list of several such 'single' variables (in addition to a formatting template string); and outputs again a 'single' variable (which could, however, be a sizeable chunk of memory!). In my case, if the output 'single' variable contains the ASCII code in each byte in memory, then I'm all set (I could simply print the output variable directly, then).
Thus, in order to get a list of variables from the $_ string, I can simply split it at the space sign - however, note:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2", split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
a
... the number of elements to be packed must be specified (otherwise we again get only one character back); either of these alternatives works:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2"x10, split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
abcdeghij
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("(H2)*", split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
abcdeghij