Perl: read file and re-arrange into columns

I have a file that I want to read in, which has the following structure:
EDIT: I made the example a bit more specific to clarify what I need.
HEADER
MORE HEADER
POINTS 2
x1 y1 z1
x2 y2 z2
VECTORS velocities
u1 v1 w1
u2 v2 w2
VECTORS displacements
a1 b1 c1
a2 b2 c2
The number of blocks containing data is arbitrary, and so is their order.
I want to read only the data under "POINTS" and under "VECTORS displacements" and rearrange it in the following format:
x1 y1 z1 a1 b1 c1
x2 y2 z2 a2 b2 c2
I managed to read the xyz and abc blocks into separate arrays, but my problem is combining them into one.
I should mention that I am a Perl newbie. Can somebody help me?

This is made very simple using the range operator. The expression
/DATA-TO-READ/ .. /DATA-NOT-TO-READ/
evaluates to 1 on the first line of the range (the DATA-TO-READ line), 2 on the second, and so on. On the last line (the DATA-NOT-TO-READ line) the string "E0" is appended to the count, so the value is numerically the same but can also be tested for being the last line. On lines outside the range it evaluates to a false value (the empty string).
This program accumulates the data in the array @output and prints it when the end of the input is reached. It expects the path to the input file as a parameter on the command line.
use strict;
use warnings;
my @output;
while (<>) {
    my $index = /DATA-TO-READ/ .. /DATA-NOT-TO-READ/;
    # skip the opening marker line (1) and the closing marker line (nE0)
    if ($index and $index > 1 and $index !~ /E/) {
        push @{ $output[$index-2] }, split;
    }
}
print "@$_\n" for @output;
output
x1 y1 z1 a1 b1 c1
x2 y2 z2 a2 b2 c2
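To see the values the range operator actually returns, here is a small sketch (not part of the answer above) that prints the flip-flop value for each line of a made-up input, using START and STOP as stand-ins for the marker lines:

```perl
use strict;
use warnings;

# Hypothetical sample input: a single block delimited by START and STOP.
my @lines = ('header', 'START', 'row 1', 'row 2', 'STOP', 'trailer');

my @seq;
for (@lines) {
    # In scalar context, .. acts as the flip-flop operator: it returns a
    # sequence number inside the range and "" outside it.
    my $n = /START/ .. /STOP/;
    push @seq, $n;
    printf "%-8s => '%s'\n", $_, $n;
}
```

The values come out as '', '1', '2', '3', '4E0', ''. The "E0" suffix on the closing line is what the test $index !~ /E/ in the program above relies on.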

I used only one array, to remember the first three columns. The second part of the data can then be printed directly as it is processed.
#!/usr/bin/perl
use strict;
use warnings;
my @first;      # To store the first 3 columns.
my $reading;    # Flag: are we reading the data?
while (<>) {
    next unless $reading or /DATA-TO-READ/;   # Skip the header.
    $reading = 1, next unless $reading;       # Skip the DATA-TO-READ line, enter the
                                              # reading mode.
    last if /DATA-NOT-TO-READ/;               # End of the first part.
    chomp;                                    # Remove the newline.
    push @first, $_;                          # Remember the line.
}
undef $reading;                               # Reset the flag.
while (<>) {
    next unless $reading or /DATA-TO-READ/;
    $reading = 1, next unless $reading;
    last if /DATA-NOT-TO-READ/;
    print shift @first, " $_";   # Print the remembered columns + current line.
}
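Both answers above match DATA-TO-READ / DATA-NOT-TO-READ marker lines rather than the POINTS and VECTORS displacements headers of the edited example. The sketch below is one possible way to handle the edited format. It assumes that every block header line starts with an upper-case letter and that data lines never do (true of the example, but an assumption about real files), and that both blocks have the same number of rows. It reads an in-memory copy of the example so that it runs as-is; in practice you would open the real file instead.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $input = <<'END';
HEADER
MORE HEADER
POINTS 2
x1 y1 z1
x2 y2 z2
VECTORS velocities
u1 v1 w1
u2 v2 w2
VECTORS displacements
a1 b1 c1
a2 b2 c2
END

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

# Assumption: a line starting with an upper-case letter (HEADER, POINTS,
# VECTORS ...) begins a new block; all other non-blank lines are data rows.
my (@points, @disp);
my $target;    # ref to the array collecting the current block, or undef

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^[A-Z]/) {
        $target = $line =~ /^POINTS\b/                  ? \@points
                : $line =~ /^VECTORS\s+displacements\b/ ? \@disp
                :                                         undef;
        next;
    }
    push @$target, $line if $target and $line =~ /\S/;
}

die "POINTS and displacements have different row counts\n"
    unless @points == @disp;

# Row i of the output = row i of POINTS followed by row i of displacements.
my @rows = map { "$points[$_] $disp[$_]" } 0 .. $#points;
print "$_\n" for @rows;
```

Because the destination array is chosen from each block's header line, the number and order of the blocks in the file do not matter.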

Related

Splitting an Array into n accessible parts within perl?

My goal is to take an array of letters and cut it up into "n" parts, in this case with no more than 10 letters per piece. I want these pieces stored in an array reference that I can access with a counter.
For example, the following script splits an array of the letters of the English alphabet into one array of 10 letters. But since the English alphabet has 26 letters, I need two more arrays, accessible through the array reference.
#!/usr/bin/env perl
#split an array into parts.
use strict;
use warnings;
use feature 'say';
my @letters = ('A' .. 'Z');
say "These are my letters:";
for(@letters){print "$_ ";}
my @letters_selected = splice(@letters, 0, 10);
say "\nThese are my selected letters:";
for(@letters_selected){print "$_ ";}
The output is this:
These are my letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
These are my selected letters:
A B C D E F G H I J
This little script gives me only one piece of 10 letters. But I want all three pieces (of at most 10 letters each), so I would like to know how to achieve this:
Goal:
Have an array reference called letters_selected that contains all the letters A to Z, but lets me access each of the three pieces (of at most 10 letters) like this:
foreach(@{$letters_selected[0]}){say "$_ ";}
returns: A B C D E F G H I J # These are the first 10 letters of the alphabet.
foreach(@{$letters_selected[1]}){say "$_ ";}
returns: K L M N O P Q R S T # The next 10 after that.
foreach(@{$letters_selected[2]}){say "$_ ";}
returns: U V W X Y Z # The remaining letters (no more than 10).
Since splice is destructive to its target, you can keep applying it:
use warnings;
use strict;
use feature 'say';
my @letters = 'A'..'Z';
my @letter_groups;
push @letter_groups, [ splice @letters, 0, 10 ] while @letters;
say "@$_" for @letter_groups;
After this, @letters is empty, so make a copy of it and work with the copy if you still need the original.
Each time through the loop, splice removes and returns up to 10 elements from @letters, and [ ] makes an anonymous array of that list. This reference is pushed onto @letter_groups.
When fewer than 10 elements remain, splice removes and returns all of them, so @letters becomes empty and the while loop terminates.
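If @letters has to survive, one alternative (a sketch, not taken from the answer above) builds each piece from an array slice instead of splicing, so the original array is left intact. $size is an illustrative name for the maximum piece length.

```perl
use strict;
use warnings;
use feature 'say';

my @letters = ('A' .. 'Z');
my $size    = 10;    # maximum piece length

# Build one anonymous array per piece from a slice of @letters.
# @letters itself is not modified.
my @letter_groups;
for (my $start = 0; $start < @letters; $start += $size) {
    my $end = $start + $size - 1;
    $end = $#letters if $end > $#letters;    # last piece may be shorter
    push @letter_groups, [ @letters[$start .. $end] ];
}

say "@$_" for @letter_groups;
say scalar(@letters), " letters still in \@letters";
```

This prints the same three groups as the splice version, followed by "26 letters still in @letters".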

Perl script to check another array values depending on current array index

I'm working on a Perl assignment that has three arrays, @array_A, @array_B and @array_C, with some values in them. I grep for the string "CAT" in @array_A and fetch its indices too:
my @index = grep { $#array_A[$_] =~ 'CAT' } 0..$#array_A;
print "Index : @index\n";
Output: Index : 2 5
I have to take these indices as input, check the values of the other two arrays at indices 2 and 5, and print them to a file.
The trick is that the position of the string "CAT" varies (the indices might be 5, 7 and 9).
I'm not quite getting the logic here; I'm looking for some help with it.
Here's an overly verbose example of how to extract the values you want, to show what's happening while hopefully leaving some room for you to investigate further. Note that it's idiomatic Perl to use regex delimiters when using =~, e.g. $name =~ /steve/.
use warnings;
use strict;
my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);
# note the difference in the next line: $a1[$_], not $#a1[$_] ($#a1 is the last index)
my @indexes = grep { $a1[$_] =~ /CAT/ } 0..$#a1;
for my $index (@indexes){
    my $a2_value = $a2[$index];
    my $a3_value = $a3[$index];
    print "a1 index: $index\n" .
          "a2 value: $a2_value\n" .
          "a3 value: $a3_value\n" .
          "\n";
}
Output:
a1 index: 2
a2 value: c
a3 value: 3
a1 index: 5
a2 value: f
a3 value: 6
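The question also asks for the results to be written to a file. A minimal sketch of that step, assuming a tab-separated layout and a hypothetical output file name cat_matches.txt, could look like this. It uses eq instead of /CAT/, on the assumption that only elements exactly equal to CAT should count; /CAT/ would also match an element such as SCAT.

```perl
use strict;
use warnings;

my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);

# Hypothetical output file name.
my $out_file = 'cat_matches.txt';

# 'eq' compares the whole element; /CAT/ would also match e.g. "SCAT".
my @indexes = grep { $a1[$_] eq 'CAT' } 0 .. $#a1;

open my $out, '>', $out_file or die "Cannot open $out_file: $!";
for my $i (@indexes) {
    # One tab-separated line per match: index, a2 value, a3 value.
    print {$out} join("\t", $i, $a2[$i], $a3[$i]), "\n";
}
close $out or die "Cannot close $out_file: $!";
```

With the sample arrays, the file gets one line for index 2 (c, 3) and one for index 5 (f, 6).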

Use keys and pairing elements Perl

My data looks like this:
G1 G2 G3 G4
Pf1 NO B1 NO D1
Pf2 NO NO C1 D1
Pf3 A1 B1 NO D1
Pf4 A1 NO C1 D2
Pf5 A3 B2 C2 D3
Pf6 NO B3 NO D3
My goal is to check, in each column, whether an element (other than the "NO" cases) appears twice (like A1 in column 2, for example) and only twice (if it appears three or more times I don't want it in the output), and, if so, write the corresponding elements of the first column. So the desired output looks like this:
Pf3 Pf4 A1
Pf1 Pf3 B1
Pf2 Pf4 C1
Pf5 Pf6 D3
I'm trying to write a Perl script, but I need some help focusing on the different steps. This is what I've done so far:
open (HAN, "< $file_in") || die "Impossible open the in_file";
@r = <HAN>;
close (HAN);
for ($i=0; $i<=$#r; $i++){
    chomp ($r[$i]);
    ($Ids, @v) = split (/\t/, $r[$i]);
}
But I cannot go on in any direction!
(My Perl knowledge needs a push from you!)
The key points on my mind are:
how do I compare elements from the same column (or anywhere in the same file)?
how can I associate the elements of the first column with those of the other columns (maybe with keys)?
Any help is absolutely necessary and welcome!
use strict;
use warnings;
use Data::Dumper;
my %hash;
while (<DATA>) {
    next if $.==1;
    chomp;
    my ($first,@others) = (split /\s+/);
    for (@others){
        $hash{$_}.=' '.$first;
    }
}
print Dumper \%hash;
__DATA__
G1 G2 G3 G4
Pf1 NO B1 NO D1
Pf2 NO NO C1 D1
Pf3 A1 B1 NO D1
Pf4 A1 NO C1 D2
Pf5 A3 B2 C2 D3
Pf6 NO B3 NO D3
What is used here (tricks):
while (<DATA>) {BLOCK} reads data from the __DATA__ section at the end of the Perl script file. (Yes, you can put test data there if you want. But don't store everything there; it is not a bin!)
next if $.==1 skips the header line: $. is a special variable that holds the current input line number.
chomp; removes the trailing newline from the line that while(<DATA>) just read.
Some variables in Perl are implicit. In subroutines, @_ is the array of input parameters. And Perl programmers always like to use $_, the default variable.
while(<DATA>) is really a hidden while(defined($_ = <DATA>)).
chomp operates on $_ by default and removes a trailing \n.
split /REGEX/ also splits $_ by default.
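The implicit uses of $_ and $. described above can be checked with a small sketch. It reads two made-up lines from an in-memory file handle, and each comment gives the explicit equivalent of the implicit form:

```perl
use strict;
use warnings;

# Two made-up sample lines in an in-memory file.
my $text = "Pf1 NO B1\nPf2 NO NO\n";
open my $fh, '<', \$text or die "Cannot open in-memory input: $!";

my @seen;
while (<$fh>) {             # really: while (defined($_ = <$fh>))
    chomp;                  # same as: chomp($_)
    my @fields = split;     # same as: split(' ', $_)
    # $. is the line number of the last line read from $fh
    push @seen, join(': ', $., "@fields");
}
print "$_\n" for @seen;
```

It prints "1: Pf1 NO B1" and "2: Pf2 NO NO". Note that split with no arguments splits on ' ', which also drops leading whitespace, whereas split /\s+/ would produce an empty first field for an indented line.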
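A Perl one-liner :) (-a autosplits each line into @F; the }{ closes the loop that -n wraps around the code, so the last statement runs once, after all input has been read):
perl -anE '
/^\S/ or next;
$k = shift @F;
push @{$t{$_}}, $k for @F;
}{
@$_-1==2 and say join" ",@$_ for map [@{$t{$_}},$_], sort keys%t;
' file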
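Both answers key on the cell value alone, which works here because each column uses its own letter. If the same value could appear in different columns, a column-aware sketch that keeps the column index in the key, skips NO explicitly, and keeps only values seen exactly twice might look like this. The %seen structure and the in-memory copy of the data are illustrative choices, not taken from the answers.

```perl
use strict;
use warnings;

my $input = <<'END';
G1 G2 G3 G4
Pf1 NO B1 NO D1
Pf2 NO NO C1 D1
Pf3 A1 B1 NO D1
Pf4 A1 NO C1 D2
Pf5 A3 B2 C2 D3
Pf6 NO B3 NO D3
END

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

# $seen{column index}{value} = [ row ids in which that value appears ]
my %seen;
while (my $line = <$fh>) {
    next if $. == 1;                  # skip the G1..G4 header line
    my ($id, @cols) = split ' ', $line;
    for my $c (0 .. $#cols) {
        next if $cols[$c] eq 'NO';    # "NO" is never reported
        push @{ $seen{$c}{ $cols[$c] } }, $id;
    }
}

# Report, column by column, the values that occur exactly twice.
my @result;
for my $c (sort { $a <=> $b } keys %seen) {
    for my $value (sort keys %{ $seen{$c} }) {
        my @ids = @{ $seen{$c}{$value} };
        push @result, "@ids $value" if @ids == 2;
    }
}
print "$_\n" for @result;
```

For the sample data this produces the four lines of the desired output (Pf3 Pf4 A1, Pf1 Pf3 B1, Pf2 Pf4 C1, Pf5 Pf6 D3); D1 is dropped because it occurs three times.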

Sed, awk, Perl or other for de-interleaving text file

I would like a relatively compact command to perform line-by-line de-interleaving of a text file, i.e.
a1
a2
a3
a4
b1
b2
b3
b4
c1
c2
c3
c4
d1
d2
d3
d4
maps to
a1
b1
c1
d1
a2
b2
c2
d2
a3
b3
c3
d3
a4
b4
c4
d4
The interleaving depth should be adjustable. The lines themselves do not contain any useful structure to assist with the process, and the example above is just a toy example for demonstration purposes. What tool can I use to do this?
sort can do it!
$ sort -k1.2 your_file
-k1.2 sorts by the first field, starting from its 2nd character. (This relies on the toy lines carrying their position in the second character.)
Output:
a1
b1
c1
d1
a2
b2
c2
d2
a3
b3
c3
d3
a4
b4
c4
d4
Basically, what you're looking at doing is reading your data into a 2D array. As you read it in, you can (for example) put the data into the array row by row.
Then when you write the data out, you traverse the array column by column. Adjusting the (de-)interleaving you do just requires a different size of array (or at least that you use a different amount of it, though you could leave the array size itself fixed, if you chose).
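A minimal sketch of that 2D-array idea, assuming the whole input fits in memory and using the toy lines from the question (@grid is an illustrative name): rows are filled with $depth consecutive lines, then read back column by column.

```perl
use strict;
use warnings;

my $depth = 4;    # interleaving depth (adjustable)
my @lines = map { "$_\n" } qw(a1 a2 a3 a4 b1 b2 b3 b4 c1 c2 c3 c4 d1 d2 d3 d4);

# Fill a 2D array row by row: each row holds $depth consecutive lines.
my @grid;
for my $i (0 .. $#lines) {
    push @{ $grid[ int($i / $depth) ] }, $lines[$i];
}

# Read it back column by column; skip holes in a short last row.
my @out;
for my $col (0 .. $depth - 1) {
    for my $row (@grid) {
        push @out, $row->[$col] if defined $row->[$col];
    }
}
print @out;
```

Changing $depth changes the row length, which is the only adjustment needed for a different interleaving depth.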
According to your new requirements, reordering elements based on their position in the file:
use strict;
use warnings;
my @sorted;
my $depth = 4;   # the adjustable interleaving depth
while (<DATA>) {
    my $num = ($. - 1) % $depth;   # $. is the input line number (starts at 1)
    push @{ $sorted[$num] }, $_;
}
for (@sorted) {
    print @$_;
}
__DATA__
a1
a2
a3
a4
b1
b2
b3
b4
c1
c2
c3
c4
d1
d2
d3
d4
Note that the script can be tested on an input file by changing <DATA> to <> and running:
perl script.pl input.txt
Update
Having finally understood your question, thanks to TLP, I suggest this solution. It expects the depth and the input file name on the command line:
$ perl deinter.pl 4 interleaved.txt
and prints the reordered data to STDOUT.
use strict;
use warnings;
my $depth = shift;
my @data = <>;
for my $start (0 .. $depth-1) {
    for (my $i = $start; $i < @data; $i += $depth) {
        print $data[$i];
    }
}
output
a1
b1
c1
d1
a2
b2
c2
d2
a3
b3
c3
d3
a4
b4
c4
d4
Previous solution
Here is a technique that reads the whole file into memory, builds a set of keys for comparison, and sorts the indices of the data so that they can be printed in the new order.
It can be adapted to your purposes by modifying the regular expression that extracts the key fields and changing the sort block so that the sorted order is correct.
If your file is enormous then it may be necessary to build only the array of keys in memory, and leave the rest of the data on file to be read as it is output.
use strict;
use warnings;
open my $fh, '<', 'interleaved.txt' or die $!;
my @data = <$fh>;
my @keys = map [ /^(.)(.)/ ], @data;
my @sorted = sort {
    $keys[$a][1] <=> $keys[$b][1] or
    $keys[$a][0] cmp $keys[$b][0]
} 0 .. $#keys;
print $data[$_] for @sorted;
This might work for you (GNU sed and sort):
sed '1{x;s/^/1/;x};G;s/\n/\t/p;x;y/1234/2341/;x;d' file|sort -sk2|sed 's/\t.*//'
I'd like to credit Borodin and TLP for their input and answers, which inspired this solution. It's ugly, but I like it:
awk 'BEGIN{v=4}{now=(NR-1)%v; STOR[now] = STOR[now] "\n" $0;} END {for (v in STOR) print STOR[v]}'
It also has the flaw of printing spurious newlines (the ones prepended to the start of each stored group), but I can deal with that.
EDIT:
Solution for the newlines:
awk 'BEGIN{v=4}{now=(NR-1)%v; STOR[now] = STOR[now] "\n" $0;} END {for (v in STOR) print substr(STOR[v],2)}'

How do I calculate the difference between each element in two arrays?

I have a text file with numbers, which I have grouped as follows, separated by blank lines:
42.034 41.630 40.158 26.823 26.366 25.289 23.949
34.712 35.133 35.185 35.577 28.463 28.412 30.831
33.490 33.839 32.059 32.072 33.425 33.349 34.709
12.596 13.332 12.810 13.329 13.329 13.569 11.418
Note: the groups are always of equal length and can be arranged in more than one line long, if the group is large, say 500 numbers long.
I was thinking of putting the groups in arrays and iterate along the length of the file.
My first question is: how should I subtract each element of array 1 from the corresponding element of array 2, array 2 from array 3, and so on, for the first element, the second element, and onward to the end of the group?
i.e.:
34.712-42.034,35.133-41.630,35.185-40.158 ...till the end of each group
33.490-34.712,33.839-35.133 ..................
and then save the differences of the first element in one group (second question: how?), and so on to the end,
i.e.:
34.712-42.034 ; 33.490-34.712 ; and so on in one group
35.133-41.630 ; 33.839-35.133 ; ........
I am a beginner so any suggestions would be helpful.
Assuming you have your file open, the following is a quick sketch:
use List::MoreUtils qw<pairwise>;
...
my @list1 = split ' ', <$file_handle>;
my @list2 = split ' ', <$file_handle>;
my @diff = pairwise { $a - $b } @list1, @list2;
pairwise is the simplest way.
Otherwise there is the old standby:
# construct a list using each index in @list1 ( 0..$#list1 )
# consisting of the difference at each slot.
my @diff = map { $list1[$_] - $list2[$_] } 0..$#list1;
Here's the rest of the infrastructure to make Axeman's code work:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw<pairwise>;
my (@prev_line, @this_line, @diff);
while (<>) {
    next if /^\s+$/;   # skip leading blank lines, if any
    @prev_line = split;
    last;
}
# get the rest of the lines, calculating and printing the difference,
# then saving this line's values in the previous line's values for the next
# set of differences
while (<>) {
    next if /^\s+$/;   # skip embedded blank lines
    @this_line = split;
    @diff = pairwise { $a - $b } @this_line, @prev_line;
    print join(" ; ", @diff), "\n\n";
    @prev_line = @this_line;
}
So given the input:
1 1 1
3 2 1
2 2 2
You'll get:
2 ; 1 ; 0
-1 ; 0 ; 1
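The code above compares consecutive lines, but the question also mentions groups that wrap over several lines and asks for the differences to be collected by element position. A sketch of both, assuming groups are separated by blank lines: Perl's paragraph mode (local $/ = '') returns one whole group per read, and @by_pos is a hypothetical name for the per-position results. The toy input reuses the numbers from the example above, with the second group wrapped over two lines.

```perl
use strict;
use warnings;

# Toy input: groups separated by blank lines; a group may span several lines.
my $input = <<'END';
1 1 1

3 2
1

2 2 2
END

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my @groups;
{
    local $/ = '';    # paragraph mode: each read returns one group
    while (my $para = <$fh>) {
        # newlines inside a group are just whitespace to split
        push @groups, [ split ' ', $para ];
    }
}

# $by_pos[$j] collects group[k][j] - group[k-1][j] for each consecutive pair.
my @by_pos;
for my $k (1 .. $#groups) {
    my ($prev, $this) = @groups[$k - 1, $k];
    push @{ $by_pos[$_] }, $this->[$_] - $prev->[$_] for 0 .. $#$this;
}

print join(' ; ', @$_), "\n" for @by_pos;
```

Each output line holds all the differences for one position: "2 ; -1", "1 ; 0" and "0 ; 1" for the toy input.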