Perl 6: how to match 1 to 10000 times, except a prime number of times?

What is the best way to match a string that occurs anywhere from 1 to 10000 times, except a prime number of times?
say so "xyz" ~~ m/ <[x y z]> ** <[ 1..10000] - [ all prime numbers ]> /
Thanks !!!

Not necessarily the best way (in particular, it will create up to 10_000 submatch objects), but a way:
$ perl6 -e 'say "$_ ", so <x y z>.roll x $_ ~~ /^ (<[xyz]>) ** 1..10_000 <!{$0.elems.is-prime}> $/ for 1..10'
1 True
2 False
3 False
4 True
5 False
6 True
7 False
8 True
9 True
10 True
If the substring of interest has fixed length, you could also capture the repetition as a whole and check its length, avoiding submatch creation.
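As a rough illustration of the "capture the repetition as a whole and check its length" idea, here is a minimal sketch written in Perl 5 rather than Perl 6. The helper names is_prime and matches_nonprime_run are made up for this example; only the character class [xyz] and the 1..10000 range come from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: trial-division primality test, fine for n <= 10_000.
sub is_prime {
    my ($n) = @_;
    return 0 if $n < 2;
    for (my $d = 2; $d * $d <= $n; $d++) {
        return 0 if $n % $d == 0;
    }
    return 1;
}

# True if $str is made of 1..10_000 characters from [xyz]
# and the number of characters is not prime.
sub matches_nonprime_run {
    my ($str) = @_;
    # One capture group for the whole run, so no per-character submatches.
    return 0 unless $str =~ /\A([xyz]{1,10000})\z/;
    return is_prime(length $1) ? 0 : 1;
}

for my $n (1 .. 10) {
    my $s = 'x' x $n;
    printf "%d %s\n", $n, matches_nonprime_run($s) ? 'True' : 'False';
}
```

Because each repetition is a single character here, the length of the capture equals the repetition count; for a longer fixed-length substring the length would have to be divided by the substring length first.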

How to count the numbers of elements in parts of a text file using a loop in Perl?

I'm looking for a way to write a Perl script that counts the elements in my text file, part by part. For example, my text file has this form:
ID Position Potential Jury agreement NGlyc result
(PART 1)
NP_073551.1_HCoV229Egp2 23 NTSY 0.5990 (8/9) +
NP_073551.1_HCoV229Egp2 62 NTSS 0.7076 (9/9) ++
NP_073551.1_HCoV229Egp2 171 NTTI 0.5743 (5/9) +
...
(PART 2)
QJY77946.1_NA 20 NGTN 0.7514 (9/9) +++
QJY77946.1_NA 23 NTSH 0.5368 (5/9) +
QJY77946.1_NA 51 NFSF 0.7120 (9/9) ++
QJY77946.1_NA 62 NTSS 0.6947 (9/9) ++
...
(PART 3)
QJY77954.1_NA 20 NGTN 0.7694 (9/9) +++
QJY77954.1_NA 23 NTSH 0.5398 (5/9) +
QJY77954.1_NA 51 NFSF 0.7121 (9/9) ++
...
(PART N°...)
As you can see, the ID is the same within each part (one ID for PART 1, another for PART 2, and so on). Only the columns Position, Potential, Jury agreement, and NGlyc result change. My main goal is to count the lines with Potential >= 0.7.
With this in mind, I'm looking for output like this:
Part 1:
1 (one value 0.7 >=)
Part 2:
2 (two values 0.7 >=)
Part 3:
2 (two values 0.7 >=)
Part N°:
X numbers of values 0.7 >=
This output tells me the number of positive values (0.7 >=) for each ID.
The pseudocode I believe would be something like this:
foreach ID in LIST
    foreach LINE in FILE
        if (ID is in LINE)
            ... count the line ...
    end foreach LINE
end foreach ID
I'm looking for any suggestion (for a package or script idea) or comment to create a better script.
Thanks! Best!
To count the number of lines, for each part, that match some condition on a certain column, you can just loop over the lines, skip the header, parse the part number, and use an array to count the number of lines matching for each part.
After this you can just loop over the counts recorded in the array and print them out in your specific format.
#!/usr/bin/perl
use strict;
use warnings;

my $part = 0;
my @cnt_part;    # $cnt_part[$n] = number of matching lines in part $n

while (my $line = <STDIN>) {
    if ($. == 1) {
        next;    # skip the header line
    } elsif ($line =~ m{^\(PART (\d+)\)}) {
        $part = $1;    # start of a new part
    } else {
        my @cols = split(m{\s+}, $line);
        if (@cols == 6) {
            my $potential = $cols[3];
            if (0.7 <= $potential) {
                $cnt_part[$part]++;
            }
        }
    }
}

for (my $i = 1; $i <= $#cnt_part; $i++) {
    my $count = $cnt_part[$i] // 0;    # parts without matches count as 0
    print "Part $i:\n";
    print "$count (values 0.7 <=)\n";
}
To run it, just pipe the entire file through the Perl script:
cat in.txt | perl count.pl
and you get an output like this:
Part 1:
1 (values 0.7 <=)
Part 2:
2 (values 0.7 <=)
Part 3:
2 (values 0.7 <=)
If you also want to display the counts as words, you can use Lingua::EN::Numbers, and you get an output very similar to the one in your post:
Part 1:
1 (one values 0.7 <=)
Part 2:
2 (two values 0.7 <=)
Part 3:
2 (two values 0.7 <=)

How to convert output of Emboss:Palindrome into gff/bed file (perl)

I am sorry to ask this kind of stupid question, but I could not find the answer by myself... I learned Perl a while ago and I am a little lost.
I want to convert this kind of output :
Palindromes of: seq1
Sequence length is: 24
Start at position: 1
End at position: 24
Minimum length of Palindromes is: 6
Maximum length of Palindromes is: 12
Maximum gap between elements is: 6
Number of mismatches allowed in Palindrome: 0
Palindromes:
1 aaaaaaaaaaa 11
|||||||||||
24 ttttttttttt 14
Palindromes of: seq2
Sequence length is: 15
Start at position: 1
End at position: 15
Minimum length of Palindromes is: 6
Maximum length of Palindromes is: 12
Maximum gap between elements is: 6
Number of mismatches allowed in Palindrome: 0
Palindromes:
1 aaaaaac 7
|||||||
15 ttttttg 9
Into a gff or bed file :
seq1 1 24
seq2 1 15
I found a perl module to do it : https://metacpan.org/pod/Bio::Tools::GFF
This is my little script :
#!/usr/bin/perl
use strict;
use warnings 'all';
use Bio::Tools::EMBOSS::Palindrome;
use Bio::Tools::GFF;

my $filename = "truc.pal";

# a simple script to turn palindrome output into GFF3
my $parser = Bio::Tools::EMBOSS::Palindrome->new(-file => $filename);
my $out    = Bio::Tools::GFF->new(-gff_version => 3,
                                  -file        => ">$filename.gff");
while ( my $seq = $parser->next_seq ) {
    for my $feat ( $seq->get_SeqFeatures ) {
        $out->write_feature($feat);
    }
}
This is the result :
##gff-version 3
seq1 palindrome similarity 14 24 . - 1 allowed_mismatches=0;end=24;maximum gap=6;maximum_length=12;minimum_length=6;seqlength=24;start=1
seq2 palindrome similarity 9 15 . - 1 allowed_mismatches=0;end=15;maximum gap=6;maximum_length=12;minimum_length=6;seqlength=15;start=1
The issue is: I want the result to contain the start and the end of the whole palindrome, with the specific positions in the last column.
Example of what I want:
##gff-version 3
seq1 palindrome similarity 1 24 . - 1 mismatches=0;gap_positions=11-14;gap_size=3
seq2 palindrome similarity 1 15 . - 1 mismatches=0;gap_positions=7-9;gap_size=2
Thank you in advance.

Split File into chunks keeping complete lines in solaris

How can I split a file into 3 with equal (or almost equal) number of lines without breaking a line.
for example split a file of 25 lines into 3 files of 9,8 and 8 lines each.
I know of split -n l/3, but it does not work on Solaris 10.
I tried some things I found online, but they did not give the desired result, for example:
#!/usr/bin/ksh
fspec=~/input.list
num_files=3
total_lines=$(wc -l <${fspec})
((lines_per_file = (total_lines + num_files - 1) / num_files))
split -l ${lines_per_file} ${fspec} files.
Here is a generic solution in awk:
awk '{a[NR]=$0} END {t=int (NR/s);r=((NR/s-t)*s);while (n<s) for (i=t*n+++1;i<=t*n;i++) print a[i] > "file"n;while (i++<=NR) print a[i-1] > "file"n}' s=3 infile
This splits infile into s files. If you set s=3, you get file1, file2, and file3.
Lines left over when the line count does not divide evenly end up in the last file.
Example
cat number
1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine
10 ten
awk '{a[NR]=$0} END {t=int (NR/s);r=((NR/s-t)*s);while (n<s) for (i=t*n+++1;i<=t*n;i++) print a[i] > "file"n;while (i++<=NR) print a[i-1] > "file"n}' s=3 number
cat file1
1 one
2 two
3 three
cat file2
4 four
5 five
6 six
cat file3
7 seven
8 eight
9 nine
10 ten
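If the leftover lines should be spread over the first files instead (9, 8 and 8 lines for a 25-line file, as asked in the question), one possible sketch in Perl follows. The names chunk_sizes and split_file and the output prefix "file" are assumptions made for this example, and the whole input is read into memory, just as in the awk version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the line counts for splitting $total lines into $parts chunks,
# giving the first ($total % $parts) chunks one extra line each.
sub chunk_sizes {
    my ($total, $parts) = @_;
    my $base  = int($total / $parts);
    my $extra = $total % $parts;
    return map { $base + ($_ < $extra ? 1 : 0) } 0 .. $parts - 1;
}

# Write the lines of $infile to <prefix>1, <prefix>2, ... without ever
# breaking a line; returns the number of lines written to each file.
sub split_file {
    my ($infile, $parts, $prefix) = @_;
    open my $in, '<', $infile or die "Cannot read $infile: $!";
    my @lines = <$in>;
    close $in;

    my @sizes = chunk_sizes(scalar @lines, $parts);
    for my $i (0 .. $#sizes) {
        my $name = $prefix . ($i + 1);
        open my $out, '>', $name or die "Cannot write $name: $!";
        print {$out} splice(@lines, 0, $sizes[$i]);
        close $out or die "Cannot close $name: $!";
    }
    return @sizes;
}

# Usage: perl split_even.pl infile 3
if (@ARGV == 2) {
    my @sizes = split_file($ARGV[0], $ARGV[1], 'file');
    print "Wrote chunks of @sizes lines\n";
}
```

For the 10-line example above, this would put 4 lines in file1 and 3 lines each in file2 and file3.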

Output from calculation is messed in perl one-liner

I'm trying to do some calculations on the columns of a tab delimited file using this perl one-liner:
perl -ape 'if (/^\d/) { s/$F[2]/$F[2]\/$F[4]/e && s/$F[3]/$F[3]\/$F[4]/e}' infile
the idea is to get A and B columns divided by C column
infile:
X Y A B C
5001 3 1.03333 0.652549 4215
6001 4 1.2 0.723137 4870
7001 2 1 0.807843 5153
8001 2 1 0.807843 5355
9001 2 1 0.807843 5389
10001 2 1 0.807843 4955
11001 7 1.7671 1.05573 4966
12001 17 8.18802 4.72554 5124
But the output is this:
X Y A B C
5001 3 0.000245155397390273 0.000154815895610913 4215
6001 4 0.000246406570841889 0.000148488090349076 4870
7000.000194061711624297 2 1 0.000156771395303707 5153
8000.000186741363211951 2 1 0.000150857703081232 5355
9000.000185563184264242 2 1 0.000149905919465578 5389
0.0002018163471241170001 2 1 0.000163035923309788 4955
11001 7 0.000355839710028192 0.000212591623036649 4966
12001 17 0.00159797423887588 0.000922236533957845 5124
What is going on in the 3rd to 6th lines? How can I fix this?
Thanks.
EDIT:
I removed the /e option from the substitute command and it seems that the calculation is being performed on the wrong column.
perl -ape 'if (/^\d/) { s/$F[2]/$F[2]\/$F[4]/ && s/$F[3]/$F[3]\/$F[4]/}' infile
X Y A B C
5001 3 1.03333/4215 0.652549/4215 4215
6001 4 1.2/4870 0.723137/4870 4870
7001/5153 2 1 0.807843/5153 5153
8001/5355 2 1 0.807843/5355 5355
9001/5389 2 1 0.807843/5389 5389
1/49550001 2 1 0.807843/4955 4955
11001 7 1.7671/4966 1.05573/4966 4966
12001 17 8.18802/5124 4.72554/5124 5124
13001 30 13.8763/5138 8.05385/5138 5138
After substitution and evaluation, you have something like s/1/0.000194061711624297/. So the s operator looks for a 1 and finds it as part of the first column. Whoops. If we add some \b word-boundary markers, we can force the match part of the s operators to match a complete column, never just part of a column:
perl -ape 'if (/^\d/) { s/\b$F[2]\b/$F[2]\/$F[4]/e && s/\b$F[3]\b/$F[3]\/$F[4]/e}' infile
But that's still going to run into issues if it's possible for column X to equal column A or B. Better to just do the calculations and then replace the entire line by assigning to $_ (the -l switch puts back the trailing newline, which is lost when the line is rebuilt from @F):
perl -lape 'if (/^\d/) { $F[2] /= $F[4]; $F[3] /= $F[4]; $_ = join(" ", @F); }' infile
Use sprintf instead of join if you want a particular format to the output.
Your basic problem is that you are substituting the values in columns 3 and 4 wherever they first appear in the whole line. For row 3, for example, you are doing s/1/1\/5153/e, which affects the first occurrence of the digit 1 in the line, not necessarily the 1 that happens to be in column 3.
Try this:
perl -lane 'if ($F[4] =~ /[1-9]/) { $F[2] /= $F[4]; $F[3] /= $F[4] } print join "\t", @F' infile
If you want to limit the precision, do something like $F[2] = sprintf "%f", $F[2]/$F[4]; ...
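For anything longer than a one-liner, the same approach (split the line, do the arithmetic on the fields, rebuild the line) could be written as a small script. This is only a sketch: the sub name divide_columns and the '%.6g' format are assumptions chosen for illustration, and the input is assumed to be tab-delimited with exactly five columns, as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Divide columns A and B (fields 2 and 3) by column C (field 4) and
# return the rebuilt line; header and other non-data lines pass through.
sub divide_columns {
    my ($line) = @_;
    chomp $line;
    my @F = split /\t/, $line;
    # Only touch rows with five fields whose divisor is a non-zero number.
    if (@F == 5 && $F[4] =~ /^\d+(?:\.\d+)?$/ && $F[4] != 0) {
        $F[$_] = sprintf '%.6g', $F[$_] / $F[4] for 2, 3;
    }
    return join("\t", @F) . "\n";
}

# Usage: perl divide.pl infile
if (@ARGV) {
    print divide_columns($_) while <>;
}
```

Since the fields are modified by position rather than by pattern matching, a value in column X can never be mistaken for the value in column A or B.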

I dont understand this little perl code (if ...)

Can someone explain this short Perl code to me?
$batstr2 = "empty" if( $status2 & 4 );
What does the if statement say?
This has been answered many times already, but in case you don't know what bitwise AND is, here is a small example:
perl -e 'print "dec\t bin\t&4\n";printf "%d\t%8b\t%-8b\n", $_, $_, ($_ & 4) for (0..8);'
prints:
dec bin &4
0 0 0
1 1 0
2 10 0
3 11 0
4 100 100
5 101 100
6 110 100
7 111 100
8 1000 0
As you can see, when the 3rd bit from the right is 1, $num & 4 is true.
That's using the if as a statement modifier. It's roughly the same as
if ($status2 & 4) {
$batstr2 = "empty";
}
and exactly the same as
($status2 & 4) and ($batstr2 = "empty");
A variety of constructs can be used as statement modifiers, including if, unless, while, until, for, and when. These modifiers can't be stacked (foo() if $bar for @baz won't work); you are limited to one modifier per simple statement.
That's a bitwise and - http://perldoc.perl.org/perlop.html#Bitwise-And . $status2 is being used as a bit mask and it sets $batstr2 to 'empty' if the bit is set.
It sets $batstr2 to "empty" if the 3rd least significant bit of $status2 is set; the 4 acts as a bitwise AND mask.
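To make the mask idea concrete, here is a small self-contained sketch. The flag names and the "ok" default are invented for illustration; only the value 4 and the string "empty" come from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical flag names; each constant has exactly one bit set.
use constant {
    FLAG_CHARGING => 1,    # bit 0
    FLAG_LOW      => 2,    # bit 1
    FLAG_EMPTY    => 4,    # bit 2, the one tested by `$status2 & 4`
};

# Return "empty" when bit 2 of $status is set, otherwise "ok".
sub battery_string {
    my ($status) = @_;
    my $str = "ok";
    # Statement-modifier form, as in the question.
    $str = "empty" if ($status & FLAG_EMPTY);
    return $str;
}

printf "%d (%04b) -> %s\n", $_, $_, battery_string($_) for 0, 3, 4, 5, 8, 12;
```

The other bits of the status value do not matter: 4, 5 and 12 all report "empty" because bit 2 is set in each, while 0, 3 and 8 do not.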