SPARQL MAX(): Can I access a variable that is not in the GROUP BY projection? - group-by

I have a simple activity based model:
BASE <https://example.org/>
PREFIX paf: <https://paf.link/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<1> paf:consistsOf <1/activity/1>, <1/activity/2>, <1/activity/3> .
<1/activity/1> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-01-01"^^xsd:date .
<1/activity/2> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-02-02"^^xsd:date .
<1/activity/3> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-03-03"^^xsd:date .
<2> paf:consistsOf <2/activity/1>, <2/activity/2>, <2/activity/3> .
<2/activity/1> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-01-01"^^xsd:date .
<2/activity/2> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-02-02"^^xsd:date .
<2/activity/3> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-03-03"^^xsd:date .
<3> paf:consistsOf <3/activity/1>, <3/activity/2>, <3/activity/3> .
<3/activity/1> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-01-01"^^xsd:date .
<3/activity/2> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-02-02"^^xsd:date .
<3/activity/3> a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:text "Deutsch"#de, "Französisch"#fr, "Italienisch"#it ;
paf:hasStartDate "2020-03-03"^^xsd:date .
I want to query the most recent activity as only this one matters for what I do next:
PREFIX paf: <https://paf.link/>
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
{
SELECT ?processIri (max(?activityDate) AS ?latestActivity) WHERE {
?processIri paf:consistsOf ?activityIri .
?activityIri a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:hasStartDate ?activityDate ;
} GROUP BY ?processIri
}
?processIri paf:consistsOf ?activityIri .
?activityIri a paf:Activity ;
paf:activityType <https://paf.link/text> ;
paf:hasStartDate ?latestActivity ;
paf:text ?text
}
This works all fine and I can use that. The only thing that bugs me is that I have to do the join again outside of the inner SELECT because I can't put the ?activityIri in the GROUP BY (would not do what I want) and I can't reference it when it's not in the GROUP BY projection.
Or would there be another way of doing that?

Related

How convert matrix (pairwise comparison) to columns using Perl?

A while ago I used a script from Perl to convert a matrix into columns.
The matrix (pairwise comparison matrix) looks like this (727 * 727 = 528.529 values):
ID (column name)
NP_073551
QJY77946 0.3872 (0.0293 0.0757)
QJY77954 0.3668 (0.0273 0.0745) 0.4851 (0.0041 0.0085)
QJY77962 0.3668 (0.0273 0.0745) 0.3767 (0.0041 0.0109)-1.0000 (0.0000 0.0024)
...
Where the empty value is the comparison with the same ID and the other values are equal to this = dN/dS (dN, dS)
To do this, I use this script first:
use strict;
use warnings;
my $file = $ARGV[0] || "file.txt" ;
open (F, $file) or die;
while (my $linea = <F>) {
$linea =~ s/\n// ;
$linea =~ s/\r// ;
$linea =~ s/\s+/\t/g ;
print $linea;
print "\n";
}
close F;
exit;
Then, to generate the columns I used this:
use strict;
use warnings;
my $file = $ARGV[0] || "file_01.txt" ;
my #content;
#### Reading and storing file into an array
open (F, $file) or die;
while (my $linea = <F>) {
$linea =~ s/\n// ;
$linea =~ s/\r// ;
next if ($linea =~ /^\t/);
push #content, $linea;
}
close F;
##### Analyzing content
#### 1. storing ids in an array
my #ids;
foreach my $term (#content) {
next if (length($term) < 1);
my #partition = split ("\t", $term);
my $idd = $partition[0];
next if (length($idd) < 2);
push #ids, $idd;
}
#### 2. using ids to create the pairs
foreach my $term (#content) {
next if (length($term) < 1);
my #partition2 = split ("\t", $term);
my $a = scalar #partition2;
my $id1 = $partition2[0];
my $nn = 0;
foreach my $term2 (#partition2) {
my $id2 = $ids[$nn];
if ($id1 eq $id2) {
print "$id1\t$id2\tNA\n";
} else {
my $nk = $nn+1;
my $value = $partition2[$nk];
print "$id1\t$id2\t$value\n";
}
$nn++;
}
#print "$id1\n";
}
exit;
The result for this script:
NP_073551 NP_073551 NA
QJY77946 NP_073551 0.3872
QJY77946 QJY77946 NA
QJY77946 QJY77954 0.0757)
QJY77946 QJY77962
QJY77954 NP_073551 0.3668
QJY77954 QJY77946 (0.0273
QJY77954 QJY77954 NA
QJY77954 QJY77962 0.4851
QJY77954 QJY77970 (0.0041
QJY77954 QJY77978 0.0085)
QJY77954 QEO75985
But I need this (generate columns):
ID_1 ID_2 Omega dN dS
NP_073551 NP_073551 NA
NP_073551 QJY77946 0.3872 0.0293 0.0757
QJY77954 NP_073551 0.3668 0.0273 0.0745
QJY77962 QJY77954 0.3668 0.0273 0.0745
...
Use hashes to group records.
my %grouped;
for (...) {
...
my $row = $grouped{$id1}{$id2} //= { id1 => $id1, id => $id2 };
$row->{$key} = $value;
}
for (values(%grouped)) {
for my $row (values(%$_)) {
my ($id1, $id2, $omega, $dn, $ds) = $row->#{qw( id1 id2 omega dn ds )};
...
}
}

perl referencing a hash outside of the scope that populated it (histogram)

:Hello Everybody.
I am trying to parse this data into a histogram.
When I parse up the file and substring out the values i get and nice little dataset.
#!/usr/bin/perl
use strict ;
use warnings ;
open (my $fh_tmp, '<', "/tmp/gap_output") ;
while(<$fh_tmp>) {
my ($date, $time, $host) = split ;
my $host_length = 12;
my $time_length = 5;
my $host_slice = substr $host, 0 , $host_length ;
my $time_slice = substr $time, 0 , $time_length ;
print "$host_slice, $time_slice\n" ;
}
see - this is great.
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 03:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 07:10
trip7829host, 03:10
trip7829host, 03:10
trip7830host, 07:30
trip7830host, 07:30
trip7831host, 07:30
trip7831host, 07:30
trip7832host, 07:30
ultimately what I would need to get is something like this .
uniqe host, unique time and a number of times this error (which is n0t shown ) occured per minute.
Which occurs to me now - I am probably going to need a multilevel hash (ugg). HOwever until that problem -
i can't even populate a regular hash.
trip7829host, 03:10 10 ##########
trip7829host, 07:10 10 ##########
trip7830host, 07:30 2 ##
trip7831host, 07:30 2 ##
trip7832host, 07:30 1 #
I tried using a bunch of my's and our's. does not really matter though.
I kinda thought that once the gap_ids was populated, that I would not have
to declare it again - that it would live outside the scope of the while loop -
but it does not, becuase the use strict wants me to declare it again. I tried importing
those values with 'our'.
casper#trip0170pap:~/walt/historgram$ cat gap_histgram
#!/usr/bin/perl
use strict ;
use warnings ;
open (my $fh_tmp, '<', "/tmp/gap_output") ;
while(<$fh_tmp>) {
my ($date, $time, $host) = split ;
our %gap_ids ;
my $host_length = 12;
my $time_length = 5;
my $host_slice = substr $host, 0 , $host_length ;
my $time_slice = substr $time, 0 , $time_length ;
our ($key, $value) = ($host_slice, $time_slice) ;
#print "$key , $value\n" ;
$gap_ids{$key} = $value ;
}
while(($key, $value) =each %gap_ids) {
printf ("%-40s %-6s", $key, $value) ;
for (my $index =1; $index <= $value; $index++) {
print "#" ;
}
print "\n" ;
}
-rwxr-xr-x 1 casper casper 643 Nov 24 16:03 gap_histgram*
casper#trip0170host:~/walt/historgram$ ./gap_histgram
Variable "$key" is not imported at ./gap_histgram line 18.
Variable "$value" is not imported at ./gap_histgram line 18.
Variable "%gap_ids" is not imported at ./gap_histgram line 18.
Variable "$key" is not imported at ./gap_histgram line 19.
Variable "$value" is not imported at ./gap_histgram line 19.
Variable "$value" is not imported at ./gap_histgram line 20.
Global symbol "$key" requires explicit package name at ./gap_histgram line 18.
Global symbol "$value" requires explicit package name at ./gap_histgram line 18.
Global symbol "%gap_ids" requires explicit package name at ./gap_histgram line 18.
Global symbol "$key" requires explicit package name at ./gap_histgram line 19.
Global symbol "$value" requires explicit package name at ./gap_histgram line 19.
Global symbol "$value" requires explicit package name at ./gap_histgram line 20.
Execution of ./gap_histgram aborted due to compilation errors.
these error are just mystifying.
is there any reason why you have to use global variable declaration
Your problem seems to be solvable by declaring %gap_ids outside of the first while loop. and this will hash will be accessible in the second while loop
so your code should be :
#!/usr/bin/perl
use strict ;
use warnings ;
open (my $fh_tmp, '<', "/tmp/gap_output") ;
my %gap_ids;
while(<$fh_tmp>) {
my ($date, $time, $host) = split ;
our %gap_ids ;
my $host_length = 12;
my $time_length = 5;
my $host_slice = substr $host, 0 , $host_length ;
my $time_slice = substr $time, 0 , $time_length ;
my ($key, $value) = ($host_slice, $time_slice) ;
#print "$key , $value\n" ;
$gap_ids{$key} = $value ;
}
while(my ($key, $value) = each %gap_ids) {
printf ("%-40s %-6s", $key, $value) ;
for (my $index =1; $index <= $value; $index++) {
print "#" ;
}
print "\n" ;
}
Note that I have shift %gap_ids out. and change our ($key, $value) in your first loop.
my declares a variable that is local to the curly braces that it's inside.
use strict;
my $visible_to_the_whole_file = 7;
{
my $only_visible_in_here = 21;
}
print "$visible_to_whole_file\n"; # works ok
print "$only_visible_in_here\n"; # WILL FAIL, there is no variable any more.
You can ignore our entirely since you're not using package anywhere, but it is also scoped to curly braces. A pair of curly braces is called a "block".
Given that, you can see why you need to declare your variables, including %gap_ids outside both of the while loops, if you want the values to be seen by the code inside both of the while loops. (assuming use strict)
{
my $some_variable = 21;
}
{
print $some_variable; # will FAIL, is no longer visible
}
This is what you want to do if you want the variable visible inside both blocks.
my $some_variable;
{
$some_variable = 21;
}
{
print $some_variable; # yay, it's still 21!
}
What's more, if you my a variable inside two different blocks, you end up with two different variables.
{
my $foo = 21;
}
{
my $foo;
print "foo is $foo\n"; # won't print 21, it's not 21 any more
}
Hope that's helpful!

Perl cannot open file; from Ploteig script

I am a total novice when it comes to computer programming and perl, so please forgive me if this question is simple!
I am trying to run a perl script (called ploteig, a component of a free genetics program download, Eigenstrat:Eigenstrat software) that works fine until I get to line 96-
open (YY, ">$dfile") || die "can't open $dfile\n" ;
I am given the error that the file is unable to be opened and the script dies.
Below, I have provided the entire code for you (since honestly, I have no idea what part of the code could be influencing the inability to open the file). The code uses input from a file created previously with Eigenstrat, example of 4 rows, 12 columns:
#eigvals: 20.388 7.503 4.033 2.929 2.822 2.726 2.700 2.590 2.451 2.365
GREY_BI_011_COMSTOCK_11 0.0164 0.0164 0.0382 -0.1283 -0.0658 0.0406 0.0322 0.0105 -0.0851 -0.0625 Case
GREY_BI_014_COMSTOCK_14 0.0191 0.0094 0.0567 -0.0250 0.0804 -0.0531 -0.0165 0.0321 0.1130 -0.0025 Control
GREY_BI_015_COMSTOCK_15 0.0221 -0.0042 -0.0031 0.0091 0.1448 0.0351 0.0430 0.0359 0.0049 0.0791 Control
(rows represent individual sample pca scores, columns specific pcas. First column sample names, last column case or control status)
Additionally, I call the code as follows:
perl ploteig –i combogreyout.pca.evec –p Case:Control –s Out –c 1:2 –x –o utploteig.xtxt –k
I am really unsure where to go from here. I tried changing the file permissions and ensuring it was in the working directory, but it wouldn’t allow me to change permissions and everything pointed to being in the correct directory. However, I am unsure if either of these are the real problem.
I would very much appreciate any help anyone can give me!
Thank you SO much!
> #!/usr/bin/perl -w
### ploteig -i eigfile -p pops -c a:b [-t title] [-s stem] [-o outfile] [-x] [-k] [-y]
[-z sep] [-f fixgreen]
use Getopt::Std ;
use File::Basename ;
## pops : separated -x = make postscript and pdf -z use another separator
## -k keep intermediate files
## NEW if pops is a file names are read one per line
getopts('i:o:p:c:s:d:z:t:xkyf',\%opts) ;
$postscmode = $opts{"x"} ;
$oldkeystyle = $opts{"y"} ;
$kflag = $opts{"k"} ;
$keepflag = 1 if ($kflag) ;
$keepflag = 1 unless ($postscmode) ;
$dofixgreen = ( exists $opts{"f"} ? $opts{"f"} : 0 );
$zsep = ":" ;
if (defined $opts{"z"}) {
$zsep = $opts{"z"} ;
$zsep = "\+" if ($zsep eq "+") ;
}
$title = "" ;
if (defined $opts{"t"}) {
$title = $opts{"t"} ;
}
if (defined $opts{"i"}) {
$infile = $opts{"i"} ;
}
else {
usage() ;
exit 0 ;
}
open (FF, $infile) || die "can't open $infile\n" ;
#L = (<FF>) ;
chomp #L ;
$nf = 0 ;
foreach $line (#L) {
next if ($line =~ /\#/) ;
#Z = split " ", $line ;
$x = #Z ;
$nf = $x if ($nf < $x) ;
}
printf "## number of fields: %d\n", $nf ;
$popcol = $nf-1 ;
if (defined $opts{"p"}) {
$pops = $opts{"p"} ;
}
else {
die "p parameter compulsory\n" ;
}
$popsname = setpops ($pops) ;
print "$popsname\n" ;
$c1 = 1; $c2 =2 ;
if (defined $opts{"c"}) {
$cols = $opts{"c"} ;
($c1, $c2) = split ":", $cols ;
die "bad c param: $cols\n" unless (defined $cols) ;
}
$stem = "$infile.$c1:$c2" ;
if (defined $opts{"s"}) {
$stem = $opts{"s"} ;
}
$gnfile = "$stem.$popsname.xtxt" ;
if (defined $opts{"o"}) {
$gnfile = $opts{"o"} ;
}
#T = () ; ## trash
open (GG, ">$gnfile") || die "can't open $gnfile\n" ;
print GG "## " unless ($postscmode) ;
print GG "set terminal postscript color\n" ;
print GG "set title \"$title\" \n" ;
print GG "set key outside\n" unless ($oldkeystyle) ;
print GG "set xlabel \"eigenvector $c1\" \n" ;
print GG "set ylabel \"eigenvector $c2\" \n" ;
print GG "plot " ;
$np = #P ;
$lastpop = $P[$np-1] ;
$d1 = $c1+1 ;
$d2 = $c2+1 ;
foreach $pop (#P) {
$dfile = "$stem:$pop" ;
push #T, $dfile ;
print GG " \"$dfile\" using $d1:$d2 title \"$pop\" " ;
print GG ", \\\n" unless ($pop eq $lastpop) ;
chomp $dfile;
open (YY, ">$dfile") || die "can't open $dfile\n" ;
foreach $line (#L) {
next if ($line =~ /\#/) ;
#Z = split " ", $line ;
next unless (defined $Z[$popcol]) ;
next unless ($Z[$popcol] eq $pop) ;
print YY "$line\n" ;
}
close YY ;
}
print GG "\n" ;
print GG "## " if ($postscmode) ;
print GG "pause 9999\n" ;
close GG ;
if ($postscmode) {
$psfile = "$stem.ps" ;
if ($gnfile =~ /xtxt/) {
$psfile = $gnfile ;
$psfile =~ s/xtxt/ps/ ;
}
system "gnuplot < $gnfile > $psfile" ;
if ( $dofixgreen ) {
system "fixgreen $psfile" ;
}
system "ps2pdf $psfile " ;
}
unlink (#T) unless $keepflag ;
sub usage {
print "ploteig -i eigfile -p pops -c a:b [-t title] [-s stem] [-o outfile] [-x] [-k]\n" ;
print "-i eigfile input file first col indiv-id last col population\n" ;
print "## as output by smartpca in outputvecs \n" ;
print "-c a:b a, b columns to plot. 1:2 would be common and leading 2 eigenvectors\n" ;
print "-p pops Populations to plot. : delimited. eg -p Bantu:San:French\n" ;
print "## pops can also be a filename. List populations 1 per line\n" ;
print "[-s stem] stem will start various output files\n" ;
print "[-o ofile] ofile will be gnuplot control file. Should have xtxt suffix\n";
print "[-x] make ps and pdf files\n" ;
print "[-k] keep various intermediate files although -x set\n" ;
print "## necessary if .xtxt file is to be hand edited\n" ;
print "[-y] put key at top right inside box (old mode)\n" ;
print "[-t] title (legend)\n" ;
print "[-f] fix green and yellow colors\n";
print "The xtxt file is a gnuplot file and can be easily hand edited. Intermediate files
needed if you want to make your own plot\n" ;
}
sub setpops {
my ($pops) = #_ ;
local (#a, $d, $b, $e) ;
if (-e $pops) {
open (FF1, $pops) || die "can't open $pops\n" ;
#P = () ;
foreach $line (<FF1>) {
($a) = split " ", $line ;
next unless (defined $a) ;
next if ($a =~ /\#/) ;
push #P, $a ;
}
$out = join ":", #P ;
print "## pops: $out\n" ;
($b, $d , $e) = fileparse($pops) ;
return $b ;
}
#P = split $zsep, $pops ;
return $pops ;
}

How to print unmatched data after comparison of two files?

I have two files here (file 1 & file 2). I would like to match name in bold as shown below from both files. However I need to print those unmatched data in file 1 format. I have been trying the code below but it is not the result I want. How to print those unmatched data in file 1 format after matching?
file 1
ID **alan135/xkr** $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996
ID **John06/ext** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **lily099/poli** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **sam012/pp** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **lily099/poli** $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **Steve9018** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
file 2
Accept => **John06/ext** Max
Accept => **vivian788/ppr** Maxcap
Accept => **suzan645/pp** Min
Accept => **lily099/poli** Max
Accept => **Nick5670/uu** Max
Accept => **Anne309/pej** Min
code
my ($line1,$line2,#arr1,#arr2,#arr3,#emptyarr);
#arr1 = <FILE1>;
#arr2 = <FILE2>;
foreach $line2 (#arr2) {
if ($line2 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)/) {
#arr3 = #emptyarr;
my $cname2 = "$2";
push (#arr3, $cname2);
}
}
foreach $line2 (#arr3) {
foreach $line1 (#arr1) {
if ($line1 =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
my $cname1 = "$2";
if ($cname1 ne $line3) {
print NL "$cname1\n";
}
}
}
}
expected result:
ID alan135/xkr $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996
ID sam012/pp $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID Steve9018 $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
This is working for me. You can replace the split('\n', ...) with your arrays.
use strict;
use warnings;
my $file1 = <<'FILE';
ID **alan135/xkr** $work(b05bfn00un0c3)/b05bfn00un0c3 ; #<= b05bfn00un0d0 Size:5848.270996
ID **John06/ext** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **lily099/poli** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **sam012/pp** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **lily099/poli** $wwrk(b05bfn00ld0p8)/b05bfn00ld0p8 ; #<= b05bfn00ld0s0 Size:INFINITY
ID **Steve9018** $work(b05bfn00ld0p7)/b05bfn00ld0p7 ; #<= b05bfn00ld0s0 Size:INFINITY
FILE
my $file2 = <<'FILE';
Accept => **John06/ext** Max
Accept => **vivian788/ppr** Maxcap
Accept => **suzan645/pp** Min
Accept => **lily099/poli** Max
Accept => **Nick5670/uu** Max
Accept => **Anne309/pej** Min
FILE
for (split("\n", $file2)) {
/.*\*\*(.*)\*\*.*./;
my $id = $1;
for (split("\n", $file1)) {
if ( /${id}/ ) {
print $_ . "\n";
}
}
}
To run this script in DEBUG mode, type DEBUG=1 script.pl
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# Put the file names in variables so they can be easily changed.
my $file_name_1 = 'data1.tmp';
my $file_name_2 = 'data2.tmp';
# Read the key names from the second file.
my %key_names = ();
open my $fh2, '<', $file_name_2 or die "could not open $file_name_2: $OS_ERROR\n";
while( my $line = <$fh2> ){
# names are after 'Accept => '
if( my ( $name ) = $line =~ m{ \b Accept \s* \=\> \s* (\S+) }msx ){
$key_names{$name} ++;
}
}
close $fh2 or die "could not close $file_name_2: $OS_ERROR\n";
print Dumper \%key_names if DEBUG;
# Read the first file and store the _unmatched_ names.
my %lines_of_unmatched = ();
open my $fh1, '<', $file_name_1 or die "could not open $file_name_1: $OS_ERROR\n";
while( my $line = <$fh1> ){
# name after 'ID'
if( my ( $name ) = $line =~ m{ \b ID \s+ (\S+) }msx ){
if( ! exists $key_names{$name} ){
push #{ $lines_of_unmatched{$name} }, $line;
}
}
}
close $fh1 or die "could not close $file_name_1: $OS_ERROR\n";
print Dumper \%lines_of_unmatched if DEBUG;
use List::Compare;
my #diffs = List::Compare->new(
[map {/\*\*([^\*]+)\*\*/} <FILE1>],
[map {/\*\*([^\*]+)\*\*/} <FILE2>]
)->get_symmetric_difference;
print join("\n", #diffs)."\n";

PERL calculate and output ratio adjusted time series ( OHLC ) from two input files [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
I wish to merge two one minute time
series files with PERL and adjusting
the input data to output a ratio
adjusted time series.
I am having troubles with reading two
files at the same time and designing the control (loop?) architecture.
I need to able to read one line from FileA and one line from fileB and
compare the date and time values. From
there will either calculate the
adjusted time series.
my $cntA = 1 ;
my $cntB = 1 ;
if ($cntA == 1 ) {
unlink ("$rATS");
#open files
open(FA, $fileA) or die "Error opening data file: $fileA $!\n";
open(FB, $fileB) or die "Error opening data file: $fileB $!\n";
open(MYOUTFILE, ">> $rATS") || die("unable to open $fileName");
#header
print MYOUTFILE "Date,Time,Open,High,Low,Close\n";
}
=item
#possible controlling loop
foreach my $SymA (#filesA){
foreach my $SumB (#filesB){
print "$filesASym $filesBSym\n";
=cut
while (<FA>) {
my #fieldsA = split /,/,$_;
#if ($fieldsA[0] != $DateA) { $closeYA = $CloseA ;}
my $DateA = $fieldsA[0] ;
my $TimeA = $fieldsA[1] ;
my $OpenA = $fieldsA[2] ;
my $HighA = $fieldsA[3] ;
my $LowA = $fieldsA[4] ;
my $CloseA = $fieldsA[5] ;
my $VolumeA = $fieldsA[6] ;
my $OpenIntA = $fieldsA[7] ;
my $lineNumA = $. ;
print "line num A: $lineNumA\n\n";
$cntA++;
last if $cntA != 0 ;
}
while (<FB>) {
my #fieldsB = split /,/,$_;
#if ($fieldsB[0] != $DateB) { $closeYB = $CloseB ;}
my $DateB = $fieldsB[0] ;
my $TimeB = $fieldsB[1] ;
my $OpenB = $fieldsB[2] ;
my $HighB = $fieldsB[3] ;
my $LowB = $fieldsB[4] ;
my $CloseB = $fieldsB[5] ;
my $VolumeB = $fieldsB[6] ;
my $OpenIntB = $fieldsB[7] ;
$cntB++;
last if $cntB != 0 ;
}
close(FA) || die("unable to close $fileA") ;
close(FB) || die("unable to close $fileB") ;
close(MYOUTFILE) || die("unable to close $fileName") ;
=item
sub ratio ($$) {
my $ratioAB;
if ($_[0] > $_[1]) { $ratioAB = eval { $_[1] / $_[0]; } ; } warn $# if $# ;
if ($_[0] <= $_[0]) { $ratioAB = eval { $_[0] / $_[1]; } ; } warn $# if $# ;
return sprintf("%.2f", $ratioAB) ;
}
sub calcOHLC {
if ($closeYA > $closeYB) {
$open = sprintf("%.2f",$ratio * $openA - $openB);
$close = sprintf("%.2f",$ratio * $closeA - $closeB);
$high = sprintf("%.2f",$ratio * $highA - $highB);
$low = sprintf("%.2f",$ratio * $lowA - $lowB);
}
if ($closeYA <= $closeYB ) {
$open = sprintf("%.2f",$openA - $ratio * $openB);
$close = sprintf("%.2f",$closeA - $ratio * $closeB);
$high = sprintf("%.2f",$highA - $ratio * $highB);
$low = sprintf("%.2f",$lowA - $ratio * $lowB);
}
return undef;
}
sub outputFile{
print MYOUTFILE "$Date" . "," . "$Time" . "," . "$Open" . "," . "$High" . "," . "$Low" . "," . "$Close" . "\n";
}
=cut
Sample Data:
CVX File:
1/28/2011 957 94.21 94.21 94 94 83424 1357498
1/28/2011 958 94.02 94.11 94.02 94.1 41351 1398849
1/28/2011 959 94.1 94.11 94.06 94.1 27715 1426564
1/28/2011 1000 94.1 94.11 94.06 94.1 27715 1426564
1/28/2011 1001 94.18 94.2 94.04 94.07 61584 1523943
1/28/2011 1002 94.07 94.2 94.04 94.06 67352 1591295
1/28/2011 1003 94.07 94.2 94.04 94.06 67352 1591295
1/28/2011 1004 94.09 94.16 94.02 94.12 42852 1684278
XOM File:
1/28/2011 957 79.59 79.59 79.53 79.55 78759 1997094
1/28/2011 958 79.59 79.59 79.53 79.55 78759 1997094
1/28/2011 959 79.62 79.64 79.58 79.58 77559 2107813
1/28/2011 1000 79.58 79.6 79.58 79.6 87640 2195453
1/28/2011 1001 79.6 79.61 79.54 79.55 88442 2283895
1/28/2011 1002 79.6 79.61 79.54 79.55 88442 2283895
1/28/2011 1003 79.57 79.59 79.55 79.57 54073 2408315
1/28/2011 1004 79.57 79.58 79.5 79.52 118655 2526970
ratio as of 1/27/2011 = 79.88/94.75 = .84
since CVX is higher XOM/CVX
(CVX * .84) - XOM for output on 1/28/2011
Output file
Date time open high low close
1/28/2011 957 -0.45 -0.45 -0.57 -0.59
1/28/2011 958 -0.61 -0.54 -0.55 -0.51
1/28/2011 959 -0.58 -0.59 -0.57 -0.54
1/28/2011 1000 -0.54 -0.55 -0.57 -0.56
1/28/2011 1001 -0.49 -0.48 -0.55 -0.53
1/28/2011 1002 -0.58 -0.48 -0.55 -0.54
1/28/2011 1003 -0.55 -0.46 -0.56 -0.56
1/28/2011 1004 -0.53 -0.49 -0.52 -0.46
It's a little bit unclear what you mean by "in order to process fileA and fileB and have both a line from each to calc, yet keep some sort of master loop iterating".
if you mean that you wish to advance in 2 files in parallel, you can do it fairly easily - keep 1 loop, and read 1 line from each file per iteration. My code below is a bit vague since I don't know what you want to do
my ($done, $done_with_A, $done_with_B, $lineA, $lineB) = (0, 0, 0);
while (!$done_with_A || !$done_with_B) {
if (!$done_with_A) {
$lineA = <FA>;
}
if (!$done_with_B) {
$lineB = <FB>;
}
($done_with_A, $done_with_B) = are_we_done($lineA, $lineB);
# $lineA, $lineB are undef when the files are done.
process($lineA, $lineB);
}
If you need to be able to possibly process several lines from fileA for 1 line for fileB (or vice versa), its a bit more complicated - you accumulate lines in buffers:
my ($done, $read_from_A, $read_from_B, $done_with_A, $done_with_B, $lineA, $lineB) = (0, 1, 1, 0, 0);
my (#buffer_A, #buffer_B);
while (!$done_with_A || !$done_with_B) {
if (!$done_with_A && $need_to_read_from_A) {
$lineA = <FA>;
}
if (!$done_with_B && $need_to_read_from_B) {
$lineB = <FB>;
}
($done_with_A, $done_with_B) = are_we_done($lineA, $lineB);
# $lineA, $lineB are undef when the files are done.
if ( need_more_lines_from_A($lineA, $lineB) ) {
$read_from_A = 1;
$read_from_B = 0; # $lineB stays the same
push #buffer_A, $lineA;
next;
}
if ( need_more_lines_from_B($lineA, $lineB) ) {
$read_from_A = 0; # $lineA stays the same
$read_from_B = 1;
push #buffer_B, $lineB;
next;
}
push #buffer_A, $lineA;
push #buffer_B, $lineB;
process(#buffer_A, #buffer_A);
#buffer_B = (); #buffer_B = (); # Reset the buffers
$read_from_A = 0; $read_from_B = 0; # Read next batch.
}