Extract data from a binary file - perl

I have a binary file. I want to extract all the data from $marker + $step up to the next $marker (or the end of the file).
Data example:
23 40 92 34 32 09 84 39 02 89 30 fe 90 38 01 02 03 f1 f2 00 00 00 22 33 44 56 77 22 aa bb cc dd ee ff 00 11 ff dd cc cc cc 22 80 ee 01 02 03 f1 f2 00 00 00 22 33 44 56 23 40 92 34 32 dd cc cc 22 33 44 22 33 44 01 02 03 f1 f2 00 00 00 22 33 44 56 77 22 FF FF FF 52 FF FF 52 00 00 00 00 00 00 00
It contains three blocks. I need:
1
00 00 00 22 33 44 56 77 22 aa bb cc dd ee ff 00 11 ff dd cc cc cc 22 80 ee
2
00 00 00 22 33 44 56 23 40 92 34 32 dd cc cc 22 33 44 22 33 44
3
00 00 00 22 33 44 56 77 22 FF FF FF 52 FF FF 52 00 00 00 00 00 00 00
I have never worked with binary files in Perl.
$filename = $ARGV[0];
$marker = \x01\x02\x03\xf1\xf2;
$step = 3;
$count = 0;
open $file
while <$file> {
seek $marker;
Go to forward +$step bytes;
$count++
print EXTFILE_.$count.'.dat' $_
# Until do not seek new $marker or EOF
}
close file
As a result, I should get three .dat files.
How can I implement this pseudocode? What would a simple example look like?

Perl regular expressions are just as happy with binary data as with readable text, and binary files can be opened with the :raw layer so that line endings are not translated.
Here's a solution that reads the whole file into memory and scans it for the marker string.
use strict;
use warnings;
my $filename = shift;
my $binary = do {
open my $fh, '<:raw', $filename or die $!;
local $/;
<$fh>;
};
my $marker = "\x01\x02\x03\xf1\xf2";
while ( $binary =~ /$marker(.*?)(?=$marker|\z)/sg ) {
my @hex = map { sprintf '%02X', $_ } unpack 'C*', $1;
print "@hex\n";
}
Output
00 00 00 22 33 44 56 77 22 AA BB CC DD EE FF 00 11 FF DD CC CC CC 22 80 EE
00 00 00 22 33 44 56 23 40 92 34 32 DD CC CC 22 33 44 22 33 44
00 00 00 22 33 44 56 77 22 FF FF FF 52 FF FF 52 00 00 00 00 00 00 00
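One caveat with the regex approach: the marker is interpolated into the pattern, so a marker containing bytes that happen to be regex metacharacters (for example 0x2E, which is '.') would be treated as pattern syntax. The marker in the question contains none, so the program above works as it stands. A minimal sketch of a more defensive variant follows; the sub name blocks_after_marker and the demo marker are hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return every block that follows a marker,
# up to the next marker or the end of the data.
sub blocks_after_marker {
    my ($binary, $marker) = @_;

    # \Q...\E quotes the marker so that its bytes always match literally
    my @blocks = $binary =~ /\Q$marker\E(.*?)(?=\Q$marker\E|\z)/sg;
    return @blocks;
}

# Demo marker chosen to contain metacharacters: the two bytes '.' and '*'
my $demo_marker = "\x2e\x2a";
my $demo_data   = "ab" . $demo_marker . "cd" . $demo_marker . "ef";
print join('|', blocks_after_marker($demo_data, $demo_marker)), "\n";
```

Without the \Q...\E quoting, the demo marker would be read as the pattern '.*' rather than as two literal bytes.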
If the file is huge, or if you simply prefer the idea, you could set the input record separator $/ to the marker string. Each readline on the file then fetches everything up to and including the next occurrence of the marker. This means that each record is read together with the marker that begins the next record, but chomp removes whatever $/ is set to, so it doesn't matter.
use strict;
use warnings;
my $filename = shift;
my $marker = "\x01\x02\x03\xf1\xf2";
open my $fh, '<:raw', $filename or die $!;
local $/ = $marker;
<$fh>; # Drop the data up to and including the first marker
while (<$fh>) {
chomp; # Remove the marker string from the end, if any
my @hex = map { sprintf '%02X', $_ } unpack 'C*';
print "@hex\n";
}
The output is identical to that of the program above.
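The record-separator technique can be tried without a file on disk by reading from an in-memory filehandle. This is only a sketch: the sub name records_between_markers and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into the records that follow each marker,
# using the input record separator instead of a regex.
sub records_between_markers {
    my ($data, $marker) = @_;

    # Open a read-only filehandle on the string itself
    open my $fh, '<:raw', \$data or die "Cannot open in-memory handle: $!";

    local $/ = $marker;     # a "line" now ends at each marker
    my $lead = <$fh>;       # discard everything up to and including the first marker

    my @records;
    while (defined(my $record = <$fh>)) {
        chomp $record;      # chomp removes a trailing copy of $/, i.e. the marker
        push @records, $record;
    }
    return @records;
}

my $demo_marker = "\x01\x02\x03\xf1\xf2";
my $demo_data   = "junk" . $demo_marker . "abc" . $demo_marker . "de";
print join('|', records_between_markers($demo_data, $demo_marker)), "\n";
```

Because $/ is localized inside the sub, the caller's line-reading behaviour is unaffected once it returns.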
Neither of those programs produces the output files that the question asks for, though. This program uses the second technique above but writes each record to its own EXTFILE_n.dat file instead of dumping the hex data. Note that the output files must be opened with the :raw layer as well.
use strict;
use warnings;
my $filename = shift // 'file.bin';
my $marker = "\x01\x02\x03\xf1\xf2";
open my $fh, '<:raw', $filename or die $!;
local $/ = $marker;
<$fh>; # Drop the data up to and including the first marker
my $count;
while (my $record = <$fh>) {
chomp $record;
my $outfile = sprintf 'EXTFILE_%d.dat', ++$count;
open my $out_fh, '>:raw', $outfile or die $!;
print $out_fh $record;
close $out_fh or die $!;
}
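The question's pseudocode also mentions skipping $step bytes after each marker, which none of the programs above does (the expected output in the question starts directly after the marker). If the skip is wanted, one possible sketch uses index and substr on data that has already been read with the :raw layer. The sub name extract_blocks and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the data after each marker, skipping the
# first $step bytes of every block (0 by default).
sub extract_blocks {
    my ($data, $marker, $step) = @_;
    $step //= 0;

    my @blocks;
    my $pos = index($data, $marker);
    while ($pos >= 0) {
        my $start = $pos + length($marker);           # first byte after the marker
        my $next  = index($data, $marker, $start);    # next marker, or -1 if none
        my $end   = $next >= 0 ? $next : length($data);

        my $from = $start + $step;
        $from = $end if $from > $end;                 # block is shorter than $step
        push @blocks, substr($data, $from, $end - $from);

        $pos = $next;
    }
    return @blocks;
}

my $demo_marker = "\x01\x02\x03\xf1\xf2";
my $demo_data   = "junk" . $demo_marker . "\x00\x00\x00" . "abc"
                . $demo_marker . "\x00\x00\x00" . "de";
print join('|', extract_blocks($demo_data, $demo_marker, 3)), "\n";
```

Each returned block could then be written to its own file in the same way as the program above.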

Related

use of join command to edit file contents in perl

I am doing the following to edit a particular line in a file and write the whole contents to another file after editing.
The contents of my input file are:
;first set
00 01 05 10 10 11 22 55 66
;second set
00 00 00 01 10 11 11 11 11
;third set
00 01 05 10 ff 11 22 55 66
;fourth set
00 00 00 01 10 11 11 11 11
In the row after ;third set, I want to replace the fifth element, ff, with 5f and then write the whole contents of this file to another file.
I have written code that replaces the fifth element with 5f, but the next row gets concatenated onto the edited row in the output file.
The output file is as follows:
;first set
00 01 05 10 10 11 22 55 66
;second set
00 00 00 01 10 11 11 11 11
;third set
00 01 05 10 5f 11 22 55 66;fourth set
00 00 00 01 10 11 11 11 11
my $parameter = "third";
my $inputfile = $ARGV[0];
my $outputfile = "Extract"."_".$inputfile;
my $check = 0;
open(INPUT, "<$inputfile") or die $!;
open(OUT, ">$outputfile") or die $!;
while (<INPUT>)
{
if($check == 1)
{
my $line = $_;
my @chunks = split ' ', $line;
$chunks[4] = "5f";
$check = 0;
print OUT join (" ", @chunks);
}
else
{
print OUT $_;
}
if($_ =~ m/$parameter/gi)
{
$check = 1;
}
}
close(OUT);
close(INPUT);
Your split ' ', $line call discards all whitespace, including the trailing newline, and returns only the data fields. It is the same as my @chunks = $line =~ /\S+/g. So you have to add a newline back when you print the joined fields.
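The effect is easy to check in isolation. The following sketch wraps the split, edit and join steps in a small sub; set_field is a hypothetical name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: replace one whitespace-separated field of a line
# and return the line with a single trailing newline restored.
sub set_field {
    my ($line, $index, $value) = @_;

    # split ' ' skips leading whitespace and drops the trailing "\n"
    my @chunks = split ' ', $line;
    $chunks[$index] = $value;

    # join does not add a line ending, so append it explicitly
    return join(' ', @chunks) . "\n";
}

print set_field("00 01 05 10 ff 11 22 55 66\n", 4, '5f');
```

Leaving off the final "\n" reproduces the problem in the question, where the following line runs on from the edited one.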
This is how I would code a solution
use strict;
use warnings;
my $parameter = 'third';
my ($inputfile) = @ARGV;
my $outputfile = "Extract_$inputfile";
open my $in_fh, '<', $inputfile or die $!;
open my $out_fh, '>', $outputfile or die $!;
select $out_fh;
while ( <$in_fh> ) {
print;
if ( /$parameter/ ) {
my @chunks = split ' ', <$in_fh>;
$chunks[4] = '5f';
print "@chunks\n";
}
}
output
;first set
00 01 05 10 10 11 22 55 66
;second set
00 00 00 01 10 11 11 11 11
;third set
00 01 05 10 5f 11 22 55 66
;fourth set
00 00 00 01 10 11 11 11 11

How do I print the second-column elements in a row, separated by commas, when the first-column element is the same?

The input I am handling is as follows.
Q9NRG9 15
Q9NRG9 160
Q9NRG9 56
Q9NRG9 89
Q16613 26
Q16613 63
Q16613 102
O95477 19
O95477 91
O95477 78
O95477 86
O95477 16
O95477 203
O95477 66
P78363 18
P78363 159
P78363 88
I want output as
Q9NRG9 15,160,56,89
Q16613 26,63,102
O95477 78,86,16,203,66
I tried with a Perl program, but I couldn't get the output I want.
Using perl from the command line:
perl -lane '
push @{ $h{$F[0]} }, $F[1]
}{
$" = ",";
print "$_ @{ $h{$_} }" for keys %h
' file
O95477 19,91,78,86,16,203,66
Q9NRG9 15,160,56,89
P78363 18,159,88
Q16613 26,63,102
To maintain the order, you can do:
perl -lane '
$k{$F[0]}++ or push @r, $F[0];
push @{ $h{$F[0]} }, $F[1]
}{
$" = ",";
print "$_ @{ $h{$_} }" for @r
' file
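For readers unfamiliar with the switches used in these one-liners: -n wraps the code in a loop over the input lines, -a splits each line into @F, -l handles the line endings, and the }{ closes that implicit loop so that the remaining statements run once after all input has been read. A rough script-form equivalent of the order-preserving version might look like this; the sub name group_values is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: group second-column values by first-column key,
# keeping keys in the order in which they first appear.
sub group_values {
    my @lines = @_;

    my (%values, @order);
    for my $line (@lines) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;                   # skip blank or short lines
        push @order, $key unless exists $values{$key};
        push @{ $values{$key} }, $value;              # hash of arrays
    }

    my @rows;
    for my $key (@order) {
        push @rows, "$key " . join(',', @{ $values{$key} });
    }
    return @rows;
}

my @demo = ("Q9NRG9 15\n", "Q9NRG9 160\n", "Q16613 26\n", "Q16613 63\n");
print "$_\n" for group_values(@demo);
```

In a real script the lines would come from a filehandle rather than from a hard-coded list.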
Try this:
open (FILE, "text.txt") or die "cannot open file".$!;
my %data;
while(<FILE>){
chomp($_);
my ($key, $value) = split(/\s+/,$_);
push(@{$data{$key}}, $value);
}
foreach (keys %data){
print $_." ".join(",",@{$data{$_}})."\n";
}

Why is the following code not printing the common unique values from the two files? [duplicate]

I have two files, myresult and annotation. The data in the two files look like ranges but they are not, which is why I can't store them in an array directly and need to use the split operator so that I can use them in a for loop and compare. Now I need to print all the common values from $i (myresult) and $j (annotation) without repetition (unique). I can't work out which condition to use, or how to implement it, to get the desired output. I tried using a %hash but was not able to make it work.
myresult
0..351
12..363
24..375
36..387
48..399
60..411
.
.
annotation
272..1042
1649..2629
3436..4752
4793..4975
5408..6022
6025..6252
.
.
CODE:
#!/usr/bin/perl
open( $inp0, "<myresult" ) or die "not found";
open( $inp2, "<annotation" ) or die "not found";
open( $out, ">output" );
my @arr2 = <$inp0>;
my @arr4 = <$inp2>;
my $sum1 = 0;
foreach my $line1 (@arr2) {
my ( $from1, $to1 ) = split( /\.\./, $line1 );
foreach my $line2 (@arr4) {
my ( $from2, $to2 ) = split( /\.\./, $line2 );
for ( my $i = $from1; $i <= $to1; $i++ ) {
for ( my $j = $from2; $j <= $to2; $j++ ) {
if ( $i == $j ) {
print $out "$i \n";
$sum1++;
}
}
}
}
}
print "Unique values = $sum1";
You don't need to iterate over both the arrays. If the starting points of the ranges are ascending, you can use the following code:
#!/usr/bin/perl
use warnings;
use strict;
my @result = qw( 4..20 8..12 14..22 22..29 27..29 28..35 40..50 );
my @annot = qw( 1..5 11..13 25..37 45..55 );
my $from = (split /\.\./, $result[0])[0];
my $to = (split /\.\./, $result[-1])[1];
for my $i ($from .. $to) {
print "$i\n" if grep inside($i, $_), @result
and grep inside($i, $_), @annot;
}
sub inside {
my ($i, $range) = @_;
my ($from, $to) = split /\.\./, $range;
return ($from <= $i and $i <= $to)
}
Translate each data set into an array of values.
Then use a hash to count the matched uniq values from both lists:
#!/usr/bin/perl -w
use strict;
use warnings;
use autodie;
use List::MoreUtils qw(uniq);
my @result = do {
#open my $fh, '<', "myresult";
open my $fh, '<', \ "0..351\n12..363\n24..375\n36..387\n48..399\n60..411\n";
map { my ( $min, $max ) = /\d+/g; ( $min .. $max ) } <$fh>;
};
my @annot = do {
#open my $fh, '<', "annotation";
open my $fh, '<', \ "272..1042\n1649..2629\n3436..4752\n4793..4975\n5408..6022\n6025..6252\n";
map { my ( $min, $max ) = /\d+/g; ( $min .. $max ) } <$fh>;
};
my %count;
$count{$_}++ for uniq(@result), uniq(@annot);
print join( ' ', sort { $a <=> $b } grep { $count{$_} == 2 } keys %count ), "\n";
Outputs:
272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411
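The same idea can be written with core Perl only, since hash keys are unique by construction. This is a sketch rather than a tuned solution: it still expands every range in memory, and the sub name common_values is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given two lists of "from..to" strings, return the
# sorted, de-duplicated integers covered by both lists.
sub common_values {
    my ($ranges_a, $ranges_b) = @_;

    # Record every value covered by the first set of ranges
    my %in_a;
    for my $range (@$ranges_a) {
        my ($from, $to) = $range =~ /(\d+)\.\.(\d+)/ or next;   # skip malformed lines
        $in_a{$_} = 1 for $from .. $to;
    }

    # Keep values from the second set that were also seen in the first;
    # using hash keys means each value is kept only once
    my %common;
    for my $range (@$ranges_b) {
        my ($from, $to) = $range =~ /(\d+)\.\.(\d+)/ or next;
        for my $value ($from .. $to) {
            $common{$value} = 1 if $in_a{$value};
        }
    }

    return sort { $a <=> $b } keys %common;
}

my @demo_common = common_values([ "0..5\n", "3..8\n" ], [ "4..10\n", "20..30\n" ]);
print "@demo_common\n";
```

The sub is meant to be called in list context, as in the demo line.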

Merging multiple text files based on one column in perl

I am new to Perl and this is my first question here.
I have some text files (10-18) in a folder. I want to read all of them and merge the rows whose value in the Name column is common to all the files, along with their Area column from each file.
For example :
file 1.txt
Name sim Area Cas
aa 12 54 222
ab 23 2 343
aaa 32 34 34
bba 54 76 65
file 2.txt
Name Sim Area Cas
ab 45 45 56
abc 76 87 98
bba 54 87 87
aaa 33 43 54
file 3.txt
Name Sim Area Cas
aaa 43 54 65
ab 544 76 87
ac 54 65 76
Output should be
Name Area1 Area2 area3
aaa 32 43 54
ab 23 45 76
Can anyone help with this? I am very new to Perl and struggling to use hashes.
I have tried this so far
use strict;
use warnings;
my $input_dir = 'C:/Users/Desktop/mr/';
my $output_dir = 'C:/Users/Desktop/test_output/';
opendir SD, $input_dir || die 'cannot open the input directory $!';
my @files_list = readdir(SD);
closedir(SD);
foreach my $each_file(@files_list)
{
if ($each_file!~/^\./)
{
#print "$each_file\n"; exit;
open (IN, $input_dir.$each_file) || die 'cannot open the inputfile $!';
open (OUT, ">$output_dir$each_file") || die 'cannot open the outputfile $!';
print OUT "Name\tArea\n";
my %hash; my %area; my %remaning_data;
while(my $line=<IN>){
chomp $line;
my @line_split=split(/\t/,$line);
# print $_,"\n" foreach(@line_split);
my $name=$line_split[0];
my $area=$line_split[1];
}
}
}
Can anyone provide guidance on how to complete this?
Thanks in advance.
perl -lane '$X{$F[0]}.=" $F[2]";END{foreach(keys %X){if(scalar(split / /,$X{$_})==4){print $_,$X{$_}}}}' file1 file2 file3
tested:
> perl -lane '$X{$F[0]}.=" $F[2]";END{foreach(keys %X){if(scalar(split / /,$X{$_})==4){print $_,$X{$_}}}}' file1 file2 file3
ab 2 45 76
aaa 34 43 54
#!/usr/bin/perl
use strict;
use warnings;
my $inputDir = '/tmp/input';
my $outputDir = '/tmp/out';
opendir my $readdir, $inputDir or die "cannot open the input directory: $!";
my @files_list = readdir($readdir);
closedir($readdir);
my %areas;
foreach my $file (@files_list) {
next if $file =~ /^\.+$/; # skip . and ..
open (my $fh, '<', "$inputDir/$file") or die "cannot open $inputDir/$file: $!";
while (my $s = <$fh>) {
if ($s =~ /(\w+)\s+[\d\.]+\s+([\d\.]+)/) {
my ($name,$area) = ($1, $2); # parse name and area
push(@{$areas{$name}}, $area); # add area to the hash of arrays
}
}
close ($fh);
}
open (my $out, '>', "$outputDir/outfile") or die "cannot open $outputDir/outfile: $!";
foreach my $key (keys %areas) {
print $out "$key ";
print $out join " ", @{$areas{$key}};
print $out "\n";
}
close ($out);
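Note that the script above prints every name it finds, whereas the question asks only for names that occur in all of the files. One way to add that filter is sketched below, using in-memory copies of the question's three sample files instead of reading a directory. The sub name merge_areas is hypothetical, and the code assumes the Area value is always the third whitespace-separated column.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is an array ref holding the lines of
# one file. Returns "name area1 area2 ..." rows for names found in every file.
sub merge_areas {
    my @files = @_;

    my (%areas, @order);
    my $file_no = 0;
    for my $lines (@files) {
        for my $line (@$lines) {
            my ($name, undef, $area) = split ' ', $line;
            next unless defined $area;                # blank or short line
            next if lc $name eq 'name';               # header row
            push @order, $name unless $areas{$name};  # remember first-seen order
            $areas{$name}[$file_no] = $area;          # one slot per file
        }
        $file_no++;
    }

    my @rows;
    for my $name (@order) {
        my @vals  = @{ $areas{$name} };
        my $found = grep { defined $_ } @vals;        # number of files containing $name
        next unless $found == @files;
        push @rows, join(' ', $name, @vals);
    }
    return @rows;
}

my @file1 = ("Name sim Area Cas", "aa 12 54 222", "ab 23 2 343", "aaa 32 34 34", "bba 54 76 65");
my @file2 = ("Name Sim Area Cas", "ab 45 45 56", "abc 76 87 98", "bba 54 87 87", "aaa 33 43 54");
my @file3 = ("Name Sim Area Cas", "aaa 43 54 65", "ab 544 76 87", "ac 54 65 76");
print "$_\n" for merge_areas(\@file1, \@file2, \@file3);
```

Storing each area in a per-file slot, rather than pushing onto a list, keeps the columns aligned with the files even when a name is missing from one of them.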