Taking only values which form a continuous range - perl

I have a file with 3 columns ->
A1 0 9
A1 4 14
A1 16 24
A1 25 54
A1 64 84
A1 74 84
A2 15 20
A2 19 50
For lines with the same col1 value, I want to check whether the range in each line (col2 to col3) is already covered by, overlaps, or directly follows the range of the previous line, and merge such lines into one range.
The desired output is ->
A1 0 14
A1 16 54
A1 64 84
A2 15 50
I have tried ->
@ARGV or die "No input file specified";
open $first, '<',$ARGV[0] or die "Unable to open input file: $!";
#open $second,'<', $ARGV[1] or die "Unable to open input file: $!";
$k=0;
while (<$first>)
{
    if($k==0)
    {
        @cols = split /\s+/;
        $p0=$cols[0];
        $p1=$cols[1];
        $p2=$cols[2];
        $p3=$cols[2]+1;
    }
    else{
        @new = split /\s+/;
        if ($new[0] eq $p0){
            if ($new[1]>$p3)
            {
                print join("\t", @new),"\n";
                $p0=$new[0];
                $p1=$new[1];
                $p2=$new[2];
                $p3=$new[2]+1;
            }
            elsif ($new[2]>=$p2)
            {
                print $p0,"\t",$p1,"\t",$new[2],"\n";
                $p2=$new[2];
                $p3=$new[2]+1;
            }
            else
            {
                $p5=1;
            }
        }
        else
        {
            print join("\t", @new),"\n";
            $p0=$new[0];
            $p1=$new[1];
            $p2=$new[2];
            $p3=$new[2]+1;
        }}
    $k=1;
}
and output I am getting is ->
A1 0 14
A1 16 24
A1 16 54
A1 64 84
A1 64 84
A2 15 20
A2 22 50
I am not able to understand why I am getting this wrong output. Also if there is any way that I can erase(or overwrite) the last printed line, then it will be very easy.

First of all, it would be much simpler to help you if you
used strict and warnings, and declared all your variables close to their first use with my
indented your code properly to show the structure
The reason your code fails is that you are printing data under too many conditions. For example you output A1 16 24 when you find it cannot be joined with the previous range A1 4 14 without waiting for it to be extended by the subsequent A1 25 54 (when you correctly extend the range and print it again). A1 64 84 is output twice for the same reason: first because it cannot be merged with A1 25 54, and again because it has been "extended" with A1 74 84. Finally A2 15 20 is output straight away because it has a new first column, even though it is merged with the next line and output again.
You need to output a range only when you have found that it cannot be extended again. That happens when
a new record is found that doesn't overlap the existing data
the end of the file is reached
This code prints output only in those cases and appears to do what you need.
use strict;
use warnings;
my @data;
while (<DATA>) {
    if (not @data) {
        @data = split;
        next;
    }
    my @new = split;
    if ($new[0] eq $data[0] and $new[1] <= $data[2] + 1) {
        # Extend the current range, but never shrink it if the
        # new range lies entirely inside the existing one
        $data[2] = $new[2] if $new[2] > $data[2];
    }
    else {
        print join("\t", @data), "\n";
        @data = @new;
    }
    print join("\t", @data), "\n" if eof DATA;
}
__DATA__
A1 0 9
A1 4 14
A1 16 24
A1 25 54
A1 52 57
A1 59 62
A1 64 84
A1 74 84
A2 15 20
A2 19 50
OUTPUT
A1 0 14
A1 16 57
A1 59 62
A1 64 84
A2 15 50

You need to have some variables describing the currently accumulated contiguous region. For each line of input, flush the previously accumulated region if the new input has a new column 1 label, or has the same label but is non-contiguous, or if end-of-file is reached. If it has the same label and is contiguous, you update the min and max values.
This assumes that columns 1 and 2 are sorted.
The rest is left as an exercise for the reader.
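The accumulate-and-flush idea described above can be sketched as follows. This is only a minimal illustration, assuming the input is already sorted and that "contiguous" means the new start is at most one past the current end; the name merge_ranges and the array-of-arrays input are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: merge sorted [label, start, end] records.
# Ranges with the same label that overlap or are adjacent
# (start <= current end + 1) are joined into one region.
sub merge_ranges {
    my @records = @_;
    my @merged;
    my @cur;    # currently accumulated region: (label, min, max)
    for my $rec (@records) {
        my ($label, $min, $max) = @$rec;
        if (@cur and $label eq $cur[0] and $min <= $cur[2] + 1) {
            $cur[2] = $max if $max > $cur[2];    # extend, never shrink
        }
        else {
            push @merged, [@cur] if @cur;        # flush the finished region
            @cur = ($label, $min, $max);
        }
    }
    push @merged, [@cur] if @cur;                # flush at end of input
    return @merged;
}

my @input = map { [split] } (
    "A1 0 9",   "A1 4 14",  "A1 16 24", "A1 25 54",
    "A1 64 84", "A1 74 84", "A2 15 20", "A2 19 50",
);
print join("\t", @$_), "\n" for merge_ranges(@input);
```

With the sample data from the question this prints the four desired lines (A1 0 14, A1 16 54, A1 64 84, A2 15 50). Collecting the regions in a list instead of printing them directly keeps the flush logic in one place.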

How to resolve this warning in Perl

I asked this type of question previously but didn't provide the full code.
I am reading below file and checking the max word width present in each column and then write it to another file with proper alignment.
id0 id1 id2 batch
0 34 56 70
2 3647 58 72 566
4 39 616 75 98 78 78987 9876 7899 776
89 40 62 76
8 42 64 78
34 455 544 565
My code:
unlink "temp1.log";
use warnings;
use strict;
use feature 'say';
my $log1_file = "log1.log";
my $temp1 = "temp1.log";
open(IN1, "<$log1_file" ) or die "Could not open file $log1_file: $!";
my @col_lens;
while (my $line = <IN1>) {
    my @fs = split " ", $line;
    my @rows = @fs ;
    @col_lens = map (length, @rows) if $.==1;
    for my $col_idx (0..$#rows) {
        my $col_len = length $rows[$col_idx];
        if ($col_lens[$col_idx] < $col_len) {
            $col_lens[$col_idx] = $col_len;
        }
    };
};
close IN1;
open(IN1, "<$log1_file" ) or die "Could not open file $log1_file: $!";
open(tempp1,"+>>$temp1") or die "Could not open file $temp1: $!";
while (my $line = <IN1>) {
    my @fs = split " ", $line;
    my @az;
    for my $h (0..$#fs) {
        my $len = length $fs[$h];
        my $blk_len = $col_lens[$h]+1;
        my $right = $blk_len - $len;
        $az[$h] = (" ") . $fs[$h] . ( " " x $right );
    }
    say tempp1 (join "|",@az);
};
My warning
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 3.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
Use of uninitialized value in numeric lt (<) at new.pl line 25, <IN1> line 4.
I am getting the output correctly but don't know how to remove this warnings.
$col_idx can be up to the number of fields on a line, minus one. For the third line, this is more than the highest index of @col_lens, which at that point contains only the 4 elements taken from the header line. So doing the following makes no sense:
if ($col_lens[$col_idx] < $col_len) {
$col_lens[$col_idx] = $col_len;
}
Replace it with
if (!defined($col_lens[$col_idx]) || $col_lens[$col_idx] < $col_len) {
$col_lens[$col_idx] = $col_len;
}
With this, there's really no point checking for $. == 1 anymore.
You're getting the uninitialized warning because $col_lens[$col_idx] is undef when the $col_lens[$col_idx] < $col_len condition is checked for a column that has not been seen before.
Solution 1:
You can skip the comparison for such a column with a next statement, after recording its width.
for my $col_idx (0..$#rows) {
    my $col_len = length $rows[$col_idx];
    # First time this column is seen: store its width and skip the comparison
    unless (defined $col_lens[$col_idx]) {
        $col_lens[$col_idx] = $col_len;
        next;
    }
    if ($col_lens[$col_idx] < $col_len) {
        $col_lens[$col_idx] = $col_len;
    }
}
Solution 2 (not advised):
You can simply silence the "Use of uninitialized value" warnings by putting this line at the top of your script, or at the top of a block to disable them only within that block.
no warnings 'uninitialized';
For more info, please refer to this link
The following code demonstrates one of many possible solutions to this task:
read line by line
get length of each field
compare with stored earlier
adjust to max length
form $format string for print
print formatted data
use strict;
use warnings;
use feature 'say';
my(@data,@length,$format);
while ( <DATA> ) {
    my @e = split ' ';
    my @l = map{ length } @e;
    $length[$_] = ($length[$_] // 0) < $l[$_] ? $l[$_] : $length[$_] for 0..$#e;
    push @data,\@e;
}
$format = join ' ', map{ '%'.$_.'s' } @length;
$format .= "\n";
for my $row ( @data ) {
    printf $format, map { $row->[$_] // '' } 0..$#length;
}
__DATA__
id0 id1 id2 batch
0 34 56 70
2 3647 58 72 566
4 39 616 75 98 78 78987 9876 7899 776
89 40 62 76
8 42 64 78
34 455 544 565
Output
id0  id1 id2 batch
  0   34  56    70
  2 3647  58    72 566
  4   39 616    75  98 78 78987 9876 7899 776
 89   40  62    76
  8   42  64    78
 34  455 544   565

I have data in hex dump but don't know the encoding. Eg. 0x91 0x05 = 657

I have some data in hexdump code.
The left-hand values are decimal and the right-hand values are the hex dump bytes.
16 = 10
51 = 33
164 = A4 01
388 = 84 03
570 = BA 04
657 = 91 05
1025 = 81 08
246172 = 9C 83 0F
How can I convert any such hex dump to decimal?
In Perl, I tried to use ord() but it doesn't work.
Update
I don't know what it is called. It looks like 7-bit data. I tried to build a formula in Excel that looks like this:
DEC = hex2dec(X) + (128^1 * hex2dec(Y-1)) + (128^2 * hex2dec(Z-1)) + ...
What you have is a variable-length encoding. The length is encoded using a form of sentinel value: each byte of the encoded number except the last has its high bit set. The remaining 7 bits of each byte form the binary encoding of the number, in little-endian order (least significant group first).
0xxxxxxx ⇒ 0xxxxxxx
1xxxxxxx 0yyyyyyy ⇒ 00yyyyyy yxxxxxxx
1xxxxxxx 1yyyyyyy 0zzzzzzz ⇒ 000zzzzz zzyyyyyy yxxxxxxx
etc
The following can be used to decode a stream:
use strict;
use warnings;
use feature qw( say );
sub extract_first_num {
    $_[0] =~ s/^([\x80-\xFF]*[\x00-\x7F])//
        or return;
    my $encoded_num = $1;
    my $num = 0;
    for (reverse unpack 'C*', $encoded_num) {
        $num = ( $num << 7 ) | ( $_ & 0x7F );
    }
    return $num;
}
my $stream_buf = "\x10\x33\xA4\x01\x84\x03\xBA\x04\x91\x05\x81\x08\x9C\x83\x0F";
while ( my ($num) = extract_first_num($stream_buf) ) {
    say $num;
}
die("Bad data") if length($stream_buf);
Output:
16
51
164
388
570
657
1025
246172
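As a cross-check of the byte layout described above, here is a minimal sketch of the reverse direction. It assumes non-negative integers only; the name encode_num is made up for this example and is not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a non-negative integer using 7 bits per
# byte, lowest group first, with the high bit set on every byte
# except the last.
sub encode_num {
    my ($num) = @_;
    my @bytes;
    while ($num > 0x7F) {
        push @bytes, ($num & 0x7F) | 0x80;   # low 7 bits plus continuation bit
        $num >>= 7;
    }
    push @bytes, $num;                       # final byte, high bit clear
    return pack 'C*', @bytes;
}

# Print a few of the question's values next to their encoded bytes in hex.
for my $n (16, 657, 246172) {
    printf "%d = %s\n", $n,
        join ' ', map { sprintf '%02X', $_ } unpack 'C*', encode_num($n);
}
```

For 657, the low 7 bits are 17 (0x11), which becomes 0x91 once the high bit is set, and the remaining value 5 becomes the final byte 0x05, matching the 91 05 pair in the question.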

Why is the output the way it is? -Splitting and chop

I'm having trouble understanding the output of the code below.
1. Why is the output Jo Al Ch and Sa? Doesn't chop remove the last character of a string and return that character, so shouldn't the output be n, i, n and y?
2. What is the purpose of the $firstline=0; line in the code?
3. What exactly is happening at the lines
foreach (@data)
{
($name,$age)=split(//,$_);
print "$name $age \n";
The output of the following code is
Data in file is:
J o
A l
C h
S a
The file contents are:
NAME AGE
John 26
Ali 21
Chen 22
Sally 25
The code:
#!/usr/bin/perl
my ($firstline,
    @data,
    $data);
open (INFILE,"heading.txt") or die $.;
while (<INFILE>)
{
    if ($firstline)
    {
        $firstline=0;
    }
    else
    {
        chop(@data=<INFILE>);
    }
    print "Data in file is: \n";
    foreach (@data)
    {
        ($name,$age)=split(//,$_);
        print "$name $age\n";
    }
}
There are a few issues with this script, but first I will answer your points.
chop removes the last character of a string and returns the character chopped. In your data file "heading.txt" every line ends with \n, so chop removes that \n rather than the last letter. It is always recommended to use chomp instead, which removes only a trailing newline.
You can verify what is the last character of the line by running the command below:
od -bc heading.txt
0000000 116 101 115 105 040 101 107 105 012 112 157 150 156 040 062 066
N A M E A G E \n J o h n 2 6
0000020 012 101 154 151 040 062 061 012 103 150 145 156 040 062 062 012
\n A l i 2 1 \n C h e n 2 2 \n
0000040 123 141 154 154 171 040 062 065 012
S a l l y 2 5 \n
0000051
You can see the \n at the end of each line.
There is no use for $firstline because it is never set to 1, so you can remove the if/else block. The header line is skipped only because while (<INFILE>) has already read it before @data=<INFILE> slurps the remaining lines.
The first line loops over the elements of the array @data one by one. The second line splits the current element into individual characters, assigns the first 2 characters to $name and $age, and discards the rest. The last line prints those two captured characters, which is why you see J o, A l, and so on.
IMO, in line 2 we should split on a space to actually capture the name and age.
So the final script should look like:
#!/usr/bin/perl
use strict;
use warnings;
my @data;
open (INFILE,"heading.txt") or die "Can't open heading.txt: $!";
while (<INFILE>) {
    chomp(@data= <INFILE>);
}
close(INFILE);
print "Data in file is: \n";
foreach (@data) {
    my ($name,$age)=split(/ /,$_);
    print "$name $age\n";
}
Output:
Data in file is:
John 26
Ali 21
Chen 22
Sally 25
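To see in isolation why the original script printed only two characters per line, the two split calls can be compared on a single line. This is a small sketch using one sample line from the file; the variable names $c1 and $c2 are made up for the example.

```perl
use strict;
use warnings;

my $line = "John 26";

# An empty pattern splits between every character; only the first
# two pieces fit into the two variables, the rest are discarded.
my ($c1, $c2) = split //, $line;
print "$c1 $c2\n";        # J o

# Splitting on whitespace yields the two fields.
my ($name, $age) = split ' ', $line;
print "$name $age\n";     # John 26
```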

Generate all combinations of up to N digits, including repeating digits, in Perl

What's the best way to generate all combinations of 1 to N digits, where digits could be repeated in the combination? E.g, given array 0..2, the result should be:
0
1
2
00
01
02
10
11
12
20
21
22
000
001
002
010
011
etc.
I've played with Algorithm::Permute, but it looks like it can generate only unique combinations of N numbers:
for( my $a = 0; $a < 3; $a++ ) {
    for( my $b = 0; $b < 3; $b++ ) {
        my @array = $a..$b;
        Algorithm::Permute::permute {
            my $Num = join("", @array);
            print $Num;
            sleep 1;
        } @array;
    }
}
Thank you.
As its name suggests, Algorithm::Permute offers permutations. There are many mathematical variations on selecting k items from a population of N: with and without replacement, with and without repetition, ignoring order or not.
It's hard to be certain, but you probably want Algorithm::Combinatorics.
Here's some example code that reproduces at least the part of your expected data that you have shown. It's pretty much the same as zdim's solution but there may be something extra useful to you here.
use strict;
use warnings 'all';
use feature 'say';
use Algorithm::Combinatorics 'variations_with_repetition';
my @data = 0 .. 2;
for my $k ( 1 .. @data ) {
    say @$_ for variations_with_repetition(\@data, $k);
}
output
0
1
2
00
01
02
10
11
12
20
21
22
000
001
002
010
011
012
020
021
022
100
101
102
110
111
112
120
121
122
200
201
202
210
211
212
220
221
222
my @digits = 0..2;
my $len = 3;
my @combinations = map glob("{@{[join ',', @digits]}}" x $_), 1..$len;
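If neither a CPAN module nor the glob trick is wanted, the same list can be built up length by length in plain Perl. The following is only a sketch of that idea; the function name strings_up_to is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return every string of 1..$max_len symbols
# drawn from @$symbols, repetition allowed, shortest strings first.
sub strings_up_to {
    my ($symbols, $max_len) = @_;
    my @all;
    my @prev = ('');              # strings of the previous length
    for (1 .. $max_len) {
        my @next;
        for my $prefix (@prev) {
            # Extend each shorter string by every symbol in turn
            push @next, map { $prefix . $_ } @$symbols;
        }
        push @all, @next;
        @prev = @next;
    }
    return @all;
}

print "$_\n" for strings_up_to([0 .. 2], 3);
```

For the digits 0..2 and a maximum length of 3 this yields 3 + 9 + 27 = 39 strings, in the same order as the listing in the question.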

how to parse a table using perl

Name Mark1 Mark2 Mark3
Student 1 41 51 61
Student 2 42 52 62
Student 3 43 53 63
Student 4 44 54 64
Student 5 45 55 65
When I give a Name as input, I need to output the three Mark columns. How can I do this?
Assuming this is an array called @arr, where each entry is a line, and assuming the number of the student you're looking for is in $num, you can use:
foreach (@arr) {
    if (/^Student \b$num\b\s+(\d.*\d)/) {
        print "$1\n";
    }
}
This iterates over all the entries in the array. It looks for lines that:
begin with "Student"
are followed by the exact number $num (the \bs around it specify word boundaries, so this can't be part of another number)
are followed by some whitespace
have a pattern beginning and ending with a number that is as long as possible.
If so, the pattern beginning and ending with a number is captured and printed. In this case, it corresponds exactly to Mark1, Mark2 & Mark3.
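A self-contained way to try the pattern described above is to wrap it in a small function and run it against the table. This is only a sketch; the function name marks_for is made up for the example, and the table is hard-coded rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the marks (as one string) for student
# number $num, or nothing if no line matches.
sub marks_for {
    my ($lines, $num) = @_;
    for (@$lines) {
        return $1 if /^Student \b$num\b\s+(\d.*\d)/;
    }
    return;
}

my @arr = (
    "Name Mark1 Mark2 Mark3",
    "Student 1 41 51 61",
    "Student 2 42 52 62",
    "Student 3 43 53 63",
    "Student 4 44 54 64",
    "Student 5 45 55 65",
);

my $marks = marks_for(\@arr, 3);
print "$marks\n" if defined $marks;    # 43 53 63
```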
#!/usr/bin/perl
use warnings;
use strict;
my %hash = ();
print "Student No:"; #Eg:Student 1
chomp ( my $input = <>);
while (<DATA>) {
    next if /^Name/;
    chomp;
    my ($student, $no, @marks) = split;
    $hash{ "$student " . "$no" } = \@marks;
}
# Fall back to an empty list if the name was not found
print join " ", @{ $hash{ "$input" } // [] };
__DATA__
Name Mark1 Mark2 Mark3
Student 1 41 51 61
Student 2 42 52 62
Student 3 43 53 63
Student 4 44 54 64
Student 5 45 55 65