Use perl script to fill in values in data set - perl

I have a series of numbers (in a text file) from 90,000 to 1,000,000 that correspond to files that I am missing. I would like to take this set and mark these files as "1" as a second column next to a complete series of numbers 90,000 to 1,000,000. For example for series 1to13 (which would correspond to the "missing" files):
3
7
10
12
I would like to create a data set:
1 0
2 0
3 1
4 0
5 0
6 0
7 1
8 0
9 0
10 1
11 0
12 1
13 0
I would like to be able to execute this in perl.

Assuming they are sorted:
use strict;
use warnings;
my $last = 89999;
while (my $next = <>) {
chomp($next);
print $last, " 0\n" while ++$last < $next;
print "$next 1\n";
}
print $last, " 0\n" while ++$last <= 1000000;

Related

Looping through a perl array

I am trying to:
Populate 10 elements of the array with the numbers 1 through 10.
Add all of the numbers contained in the array by looping through the values contained in the array.
For example,
it would start off as 1, then the second number would be 3 (1 plus 2), and then the next would be 6 (the existing 3 plus the new 3)
This is my current code
#!/usr/bin/perl
use warnings;
use strict;
my #b = (1..10);
for(#b){
$_ = $_ *$_ ;
}
print ("The total is: #b\n")
and this is the result
The total is: 1 4 9 16 25 36 49 64 81 100
What im looking for is:
The total is: 1 3 6 10 etc..
The shown sequence has for each element: its index + 1 + value at the previous index
perl -wE'#b = 1..10; #r = 1; $r[$_] = $_+1 + $r[$_-1] for 1..$#b; say "#r"'
The syntax $#name is for the last index in the array #name.
If the array is changed in place, as shown, then there is no need to initialize
perl -wE'#b = 1..10; $b[$_] = $_+1 + $b[$_-1] for 1..$#b; say "#b"'
Both print
1 3 6 10 15 21 28 36 45 55
As a script
use warnings;
use strict;
use feature 'say';
my #seq = 1..10;
for my $i (1..$#seq) {
$seq[$i] = $i+1 + $seq[$i-1];
}
say "#seq";
$ perl -E'say "The total is: ",join" ",map$sum+=$_,1..10'
The total is: 1 3 6 10 15 21 28 36 45 55

extracted data file containing range function is not active

i have extracted the following data in a file
0..5
8..10
12..16
but these are not working as range function. i have stored these in an array.
#arr=('0..5,8..10,12..16');
after printing the array it gives
0..5
8..10
12..16
but i need output as
0 1 2 3 4 5
8 9 10
12 13 14 15 16
am not getting where is the problem. why the stored data (..) is not working as range. function
You're starting with string representations of ranges, not actual perl ranges.
To get a perl array, you must convert your data. You could use eval like others have recommended. However, that's like using a machete to perform a haircut.
Instead, I'd advise using more precision tools to extract the range boundaries from the string and then build your new data structure. Using split or a regex could easily pull the values. The following does so using the latter:
use strict;
use warnings;
while (<DATA>) {
chomp;
my ($start, $end) = /(\d+)/g;
my #array = ($start .. $end);
print "#array\n";
}
__DATA__
0..5
8..10
12..16
Outputs:
0 1 2 3 4 5
8 9 10
12 13 14 15 16
Addendum for multiple entries on a row
The following allows for multiple ranges to be on a single row. Note, I'm using split in this version for the sake of variety, although I could have easily used a regex as well:
use strict;
use warnings;
while (<DATA>) {
chomp;
my #array;
for my $range (split ' ') {
my ($start, $end) = split /\.{2}/, $range, 2;
push #array, ($start .. $end);
}
print "#array\n";
}
__DATA__
0..5
4..9 14..18
8..10
12..16
Outputs:
0 1 2 3 4 5
4 5 6 7 8 9 14 15 16 17 18
8 9 10
12 13 14 15 16
Data is data. Perl does not evaluate data as Perl (i.e. expand .. range operator) unless you explicitly tell it to with eval. The following debug session should clarify things for you.
$ perl -de0
Loading DB routines from perl5db.pl version 1.33
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): 0
DB<1> #arr = ('0..5,8..10,12..16')
DB<2> p #arr
0..5,8..10,12..16
DB<3> eval "#arr = ('0..5,8..10,12..16')"
DB<4> p #arr
0..5,8..10,12..16
DB<5> #arr = ('0..5','8..10','12..16')
DB<6> p #arr
0..58..1012..16
DB<7> #arr = eval "(0..5,8..10,12..16);"
DB<8> x #arr
0 0
1 1
2 2
3 3
4 4
5 5
6 8
7 9
8 10
9 12
10 13
11 14
12 15
13 16
DB<9>
If you want Perl to expand string ranges into Perl ranges, you must eval that data.
use strict;
use warnings;
use feature qw(say);
my #arr=('0..5','8..10','12..16');
foreach my $range (#arr) {
say join ' ', eval ($range);
}
__END__
0 1 2 3 4 5
8 9 10
12 13 14 15 16
Try this to store the values in the array:
#arr=((0..5),(8..10),(12..16));

Iteration of an algorithm

I wrote a program that load the data from a 2 columns file, made an algorithm calculation and then write the pair of elements in the file that have this coefficient and put them into an array called #blackPair. I would like to iterate N times the algorithm taking the datas from the file but not those that are in the #blackPair array.
I thought of something like this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $iter;
my $startNode;
my $endNode;
my %k;
my %end;
my %node;
my %edge;
my #blackPair=();
my $counts=0;
my $inputfile = "file3";
################# DATA ABSORTION
open(DAT,$inputfile) || die("Could not open file!");
while(<DAT>)
{
my ($entry) = $_;
chomp($entry);
my ($startNode, $endNode) = split(/ /,$entry);
$k{$endNode}++;
$k{$startNode}++;
$edge{$startNode}{$endNode}=1;
$edge{$endNode}{$startNode}=1;
}
################# ALGORITHM
my $minCentrality=2;
foreach my $i (keys %edge) {
foreach my $j (keys %{$edge{$i}}){
my #couple =($j,$i);
if($i<$j){
if (($k{$i}-1) !=0 && ($k{$j}-1) !=0){
my $triangleCount=0;
#couple=($i,$j) if ($k{$i}<$k{$j});
foreach (keys %{$edge{$couple[0]}}){
$triangleCount++ if exists $edge{$couple[1]}{$_};
}
my $centrality=($triangleCount+1)/($k{$couple[0]}-1);
if ($centrality<$minCentrality){
$minCentrality=$centrality;
#blackPair=#couple;
}
}
}
}
}
foreach (#blackPair){
say;
}
Close(DAT);
The file is the following:
1 2
1 3
1 4
1 5
1 6
1 9
2 3
4 5
5 9
6 7
6 8
6 16
7 8
9 10
9 11
10 11
10 12
10 14
11 12
11 13
12 13
12 14
14 15
16 17
16 18
17 18
17 19
18 19
18 20
19 20
The first pair that appear in the #blackPair are the 6 and 1. After found them I would like that the program restart the search but avoiding to charge into the array the pairs 1 and 6. Doing that the second pair would be 6 and 16. I would like to repeat this process N times (for example N = 4). I thought to put before the while(<DAT>) in the "DATA ABSORTION" another while(counts<=4){ and inside the while(<DAT>) an if(<DATA> != #blackPair){. There is what I thought
while(counts <= 4) {
while(<DAT>)
{
if(<DAT> != #blackPair){
my ($entry) = $_;
chomp($entry);
.....
}
#### ALGORITHM
counts++;
}
But it doesn't work. Any help?
After 4 iteration, in the #blackPair there should be the following pairs:
6 1
16 6
9 1
9 5
<DAT> != #blackPair is definitely not what you want.
!= is for numerical comparison. You want to do either string comparison (the ne operator) or maybe use the smart match operator to check for list membership (~~ \#blackPair)
but using the right operator won't really help you, because #blackPair already has mangled the input data (#blackPair might contain the elements (6,1), corresponding to an original input line of "1 6\n")
Instead, how about updating your graph in each iteration?
for my $count (1..4) {
my $minCentrality = 2;
...
say join " ", #blackPair;
# now update the graph
delete $edge{$blackPair[0]}{$blackPair[1]};
delete $edge{$blackPair[1]}{$blackPair[0]};
$k{$blackPair[0]}--;
$k{$blackPair[1]}--;
} # next iteration

Iterate a program erasing datas in the original set

I am trying to study an algorithm in which, given a list of numbers I must calculate a coefficient given by the ratio between the number of triangles founded in the data list and the minimum number of neighbors that a number has; for example, given the first two rows of the file:
1 2 3 4 5 6 9
2 1 3
...
if the first element of a row appears in the other rows and if the first element of the subsequent rows appear in the row taken in exam then I found a link;
if the "link" exists, then I want to count how many times the other elements in the row taken in exam appear in the row where the link is present and print "I have found z triangles".
For example in this case when the program compare the first row and the second row and find that "the link 1 2" exists and found that there is 1 triangle made by the vertex 1,2,3.
In the algorithm I have to divide the number of triangles + 1 by the minimum number of element in each row - 2 ( in this case the minimum number come from the second line and the value is 3-2=1); the coefficient that I am looking for is then (1+1)/1 = 2;
The output file will be written as:
1 2 1
in which in the first two columns I find the element that makes a link and in the 3rd column the value of the coefficient;
Here is the code I've written so far:
use strict;
use warnings;
use 5.010;
use List::Util;
my $filename = "data";
open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $output_file = "output_example";
open my $fi, ">", $output_file or die "Error during $output_file opening: $!";
my %vector;
while (<$fh>) {
my #fields = split;
my $root = shift #fields;
$vector{$root} = { map { $_ => 1} #fields };
}
my #roots = sort { $a <=> $b } keys %vector;
for my $i (0 .. $#roots) {
my $aa = $roots[$i];
my $n_element_a = scalar (keys %{$vector{$aa}})-1;
for my $j ($i+1 .. $#roots) {
my $minimum;
my $bb = $roots[$j];
my $n_element_b = scalar (keys %{$vector{$bb}})-1;
next unless $vector{$aa}{$bb} and $vector{$bb}{$aa};
if ($n_element_a < $n_element_b){
$minimum = $n_element_a;
}else {
$minimum = $n_element_b;
}
my $triangles = 0;
for my $cc ( keys %{$vector{$aa}} ) {
next if $cc == $aa or $cc == $bb;
if ($vector{$bb}{$cc}) {
$triangles++;
}
}
my $coeff;
my #minimum_list;
if ($minimum == 0){
$coeff = ($triangles +1)/($minimum+0.00000000001);
} else {
$coeff = ($triangles +1)/($minimum);
}
say $fi "$aa $bb $coeff";
}
}
__DATA__
1 2 3 4 5 6 9
2 1 3
3 1 2
4 1 5
5 1 4
6 1 7 8
8 6 7
9 1 10 11
10 9 11 12 14
11 9 10 12 13
12 10 13 14
13 11 12
14 10 12 15
15 14
I put the entire dataset at the end. The output file gives:
__OUTPUT__
1 2 2
1 3 2
1 4 2
1 5 2
1 6 0.5
1 9 0.5
2 3 2
4 5 2
6 8 2
9 10 1
9 11 1
10 11 1
10 12 1
10 14 1
11 13 2
12 13 1
12 14 1
14 15 100000000000
Now I would like to find the minimum value of the coefficient, identify the link(s) that present this lower value, erase this elements in the original dataset and repeat the same program on the "new" dataset.
For example in this case the links that present the minimum values are the 1 6 and the 1 9 with a coefficient of 0.5. So now the program should delete in the file data the element "6" in the row that start with "1" and vice-versa and the same with the 9. So now the "new" dataset would be:
1 2 3 4 5
2 1 3
3 1 2
4 1 5
5 1 4
6 7 8
8 6 7
9 10 11
10 9 11 12 14
11 9 10 12 13
12 10 13 14
13 11 12
14 10 12 15
15 14
What I am looking for is:
How can I erase the elements that present the minimum coefficient's value from the dataset contained in the data file?
How can I iterate the processes N times?
To find the minimum from the output file I thought to add at the end of the program these lines:
my $file1 = "output_example";
open my $fg, "<", $file1 or die "Error during $file1 opening: $!";
my #minimum_vector;
while (<$fg>) {
push #minimum_vector, [ split ];
}
my $minima=$minimum_vector[0][2];
for my $i (0 .. $#minimum_vector){
if($minima >= $minimum_vector[$i][2] ){
$minima=$minimum_vector[$i][2];
}
}
say $minima;
close $file1;
But it gives me an error with the $minima because I think I can't read from the same file that I have just created (in this case the output_example file). It runs if I compile in a different program.
The best way to iterate would probably be to break your code up into subroutines. This will also help clarify the code and track down exactly where problems might be occuring.
use strict;
use warnings;
use 5.010;
use List::Utils qw/min/;
sub load_initial_data {
# open and read file, load it into an arrayref and return it.
}
sub find_coefficients {
my $data = shift;
my #results;
foreach my $row (#$data) {
# do stuff to calculate $aa, $bb, $coeff
push #results, [$aa, $bb, $coeff];
}
return \#results;
}
sub filter_data {
my $data = shift;
my $coefficients = shift;
my $min = min map { $_->[2] } #$coefficients;
my #deletions = grep { $min == $_->[2] } #$coefficients;
foreach my $del (#deletions) {
delete( $data->{$del->[0]}{$del->[1]} );
}
}
# doing the actual work:
my $data = load_initial_data();
my $coeffs;
foreach my $pass (0 .. $N) {
$coeffs = find_coefficients( $data );
$data = filter_data( $data, $coeffs );
# You could write $data and/or $coeffs out to a file here
# if you need to keep the intermediate stages
}

Find common elements in a file

The program that I would like to write has the same aim of the File row confrontation. This time the file I have is put in a different way:
1 2
1 3
1 4
2 1
2 3
2 4
2 5
3 1
...
8 6
8 7
8 9
9 8
I want to find:
when the first element of a row appears in the second position of the other rows and if the first element of the subsequent rows appear alongside the row taken in exam;
if it found then I want to print "I have found the link x y";
if the "link" exists, then I want to count how many "neighbours" they share, where by eighbours I mean how many elements in the second column they have in common and print "I found z triangles".
The file is sorted.
In this case the program will start founding the first "couple" 1 2 in the file but reversed and it will find it at the 4th row (2 1). Then it looks if the 3 ( second row and neighbour of 1) is also present in 2 ( and it is the case because it exists 2 3) and so on. At the end it will found that the "there is the link 1 2" and it "found 2 triangles" (1 - 2 - 3 and 1 - 2 - 4). I think the answer sould not be so different from the answer in the upper link, but I don't know how to arrange the files from a file made like this.
The first part of the problem is to find only the index of the inverted matching pairs? While reading this problem yesterday I had the feeling that grep may be of use;
#!usr/bin/perl
use warnings;
use strict;
my #parry;
while (<DATA>){
push #parry, [split(' ',$_)];
}
##remind is reverse matched indices;
my #remind = grep {
my $ind = $_;
grep { #reverse #{$parry[$_]} == #{parry[$ind]} did not appear to work.
#{$parry[$_]}[0] == #{$parry[$ind]}[1] &&
#{$parry[$_]}[1] == #{$parry[$ind]}[0];
} 0..$#parry
} 0..$#parry;
grep { print $_,': ',#{$parry[$_]},$/ } #remind;
__END__
1 2
1 3
1 4
2 1
2 3
2 4
2 5
3 1
8 6
8 7
8 9
9 8
output is
0: 12
1: 13
3: 21
7: 31
10: 89
11: 98
from here you then want to find say for
7[0] 7[1] (3 1) with neighbour row 6 and 8 with col 2?
6[1]
7[1] (1 5) and/or
7[1] (1 6) exist in the original set (in #parry)?
8[1]
Which they do not so no triangle.