Print the missing number in a unique sequential list with an arbitrary starting range or starting from 1 - perl

This question is similar to How can I find the missing integers in a unique and sequential list (one per line) in a unix terminal?.
The difference is that I want to know whether it is possible to specify a starting value for the list.
I have noted the following provided solutions:
awk '{for(i=p+1; i<$1; i++) print i} {p=$1}' file1
and
perl -nE 'say for $a+1 .. $_-1; $a=$_' file1
file1 is as below:
5
6
7
8
15
16
17
20
Running both solutions gives the following output:
1
2
3
4
9
10
11
12
13
14
18
19
Note that the output starts printing from 1.
The question is how to pass an arbitrary starting (minimum) number and, if none is provided, assume 1 as the starting number. For example, starting from the smallest number in the list (5), the output should be:
9
10
11
12
13
14
18
19
Sometimes you will want the starting number to be 1, and sometimes you will want it to be the smallest number in the list.

You can use your awk script, slightly modified, and pass it an initial p value with the -v option:
$ awk 'BEGIN{p=p<1?1:p} {for(i=p; i<$1; i++) print i} {p=p<=$1?$1+1:p}' file1
1
2
3
4
9
10
11
12
13
14
18
19
$ awk -v p=10 'BEGIN{p=p<1?1:p} {for(i=p; i<$1; i++) print i} {p=p<=$1?$1+1:p}' file1
10
11
12
13
14
18
19
The BEGIN block initializes p to 1 if it is not specified or is set to 0 or a negative value. The loop starts at p instead of p+1, and the last block assigns $1+1 to p (instead of $1) if and only if p is less than or equal to $1.
This assumes that the default (1) is the minimum starting number you would want. If you would like to start from 0 or even from a negative number, just replace BEGIN{p=p<1?1:p} with BEGIN{p=(p==""?1:p)}:
$ awk -v p=-2 'BEGIN{p=(p==""?1:p)} {for(i=p; i<$1; i++) print i} {p=p<=$1?$1+1:p}' file1
-2
-1
0
1
...
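Since the question asks about Perl, the same "next expected number" logic can be sketched as a small Perl sub. This is a minimal sketch, assuming the numbers are already sorted, unique, and held in an array; missing_from is a hypothetical name, and $next plays the role of p in the awk script.

```perl
use strict;
use warnings;

# Return the integers missing from a sorted, unique list of integers,
# counting up from $start (default 1) to the largest value in the list.
# $next mirrors p in the awk version: the next number we expect to see.
sub missing_from {
    my ($start, @nums) = @_;
    my $next = $start // 1;
    my @missing;
    for my $n (@nums) {
        push @missing, $next .. $n - 1;    # empty range when $next >= $n
        $next = $n + 1 if $next <= $n;     # only move forward, never back
    }
    return @missing;
}

my @list = (5, 6, 7, 8, 15, 16, 17, 20);
print join(' ', missing_from(undef, @list)), "\n";     # 1 2 3 4 9 10 11 12 13 14 18 19
print join(' ', missing_from(10, @list)), "\n";        # 10 11 12 13 14 18 19
print join(' ', missing_from($list[0], @list)), "\n";  # 9 10 11 12 13 14 18 19
```

Because the list is sorted, passing $list[0] as the start gives the "start at the smallest number" behaviour, and a negative start works without any special casing.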

Slight variations of those one-liners to include a start point:
awk
# Optionally include start=NN before the first filename
$ awk 'BEGIN { start= 1 }
$1 < start { next }
!seen++ { p = start - 1 }
{ for (i = p + 1; i < $1; i++) print i; p = $1}' start=5 file1
9
10
11
12
13
14
18
19
$ awk 'BEGIN { start= 1 }
$1 < start { next }
!seen++ { p = start - 1 }
{ for (i = p + 1; i < $1; i++) print i; p = $1}' file1
1
2
3
4
9
10
11
12
13
14
18
19
perl
# Optionally include -start=NN before the first file and after the --
$ perl -snE 'BEGIN { $start //= 1 }
if ($_ < $start) { next }
$a //= $start - 1;
say for $a+1 .. $_-1; $a=$_' -- -start=5 file1
9
10
11
12
13
14
18
19
$ perl -snE 'BEGIN { $start //= 1 }
if ($_ < $start) { next }
$a //= $start - 1;
say for $a+1 .. $_-1; $a=$_' -- file1
1
2
3
4
9
10
11
12
13
14
18
19

Using Raku (formerly known as Perl 6)
raku -e 'my @a=lines.map: *.Int; .put for (@a.Set (^) @a.minmax.Set).sort.map: *.key;'
Sample Input:
5
6
7
8
15
16
17
20
Sample Output:
9
10
11
12
13
14
18
19
Here's an answer coded in Raku, a member of the Perl family of programming languages. No, it doesn't address the OP's request for a user-definable starting point. Instead, the code above is a general solution: it computes the input's minimum Int, counts up from there, and returns any missing Ints up to the input's maximum Int.
Really need a user-defined lower limit? Try the following code, which allows you to set a $init variable:
~$ raku -e 'my @a=lines.map: *.Int; my $init = 1; .put for (@a.Set (^) ($init..@a.max).Set).sort.map: *.key;'
1
2
3
4
9
10
11
12
13
14
18
19
For explanation and shorter code (including single-line return and/or return without sort), see the link below.
https://stackoverflow.com/a/72221301/7270649
https://raku.org
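The Raku idea (treat the input as a set and compare it with a range) can also be sketched in Perl, with a hash standing in for the set. A minimal sketch, assuming integer input; missing_in_range is a hypothetical name, and min and max come from the core List::Util module. It uses a plain difference (range minus input) rather than the symmetric difference (^), so input values below $init are simply ignored instead of being printed.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Keep every integer in [$init, max(input)] that is not in the input set.
# $init defaults to the smallest input value, like the minmax version.
# The input does not need to be sorted.
sub missing_in_range {
    my ($init, @nums) = @_;
    my %seen = map { $_ => 1 } @nums;    # hash used as a set
    $init //= min(@nums);
    return grep { !$seen{$_} } $init .. max(@nums);
}

my @nums = (5, 6, 7, 8, 15, 16, 17, 20);
print join(' ', missing_in_range(undef, @nums)), "\n";  # 9 10 11 12 13 14 18 19
print join(' ', missing_in_range(1, @nums)), "\n";      # 1 2 3 4 9 10 11 12 13 14 18 19
```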

Not as elegant as I hoped:
< file mawk '
BEGIN { _= int(_)^(\
( ORS = "")<_)
} { ___[ __= $0 ] }
END {
do {
print _ in ___ \
? "" : _ "\n"
} while(++_ < __) }' \_=10
10
11
12
13
14
18
19

Related

Looping through a perl array

I am trying to:
Populate 10 elements of the array with the numbers 1 through 10.
Add all of the numbers contained in the array by looping through the values contained in the array.
For example,
it would start off as 1, then the second number would be 3 (1 plus 2), and then the next would be 6 (the existing 3 plus the new 3)
This is my current code
#!/usr/bin/perl
use warnings;
use strict;
my @b = (1..10);
for(@b){
$_ = $_ *$_ ;
}
print ("The total is: @b\n")
and this is the result
The total is: 1 4 9 16 25 36 49 64 81 100
What I'm looking for is:
The total is: 1 3 6 10 etc..
Each element of the sequence shown is its index + 1, plus the value at the previous index:
perl -wE'@b = 1..10; @r = 1; $r[$_] = $_+1 + $r[$_-1] for 1..$#b; say "@r"'
The syntax $#name gives the last index of the array @name.
If the array is changed in place, as follows, there is no need to initialize a second array:
perl -wE'@b = 1..10; $b[$_] = $_+1 + $b[$_-1] for 1..$#b; say "@b"'
Both print
1 3 6 10 15 21 28 36 45 55
As a script
use warnings;
use strict;
use feature 'say';
my @seq = 1..10;
for my $i (1..$#seq) {
$seq[$i] = $i+1 + $seq[$i-1];
}
say "@seq";
$ perl -E'say "The total is: ",join" ",map$sum+=$_,1..10'
The total is: 1 3 6 10 15 21 28 36 45 55
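The index-based formula relies on the array holding exactly 1..10. If the array can hold arbitrary values, a running sum over the values themselves (the idea behind the last one-liner) is the general version. A sketch, with running_totals as a hypothetical name:

```perl
use strict;
use warnings;

# Running (cumulative) totals: each output element is the sum of all
# input elements up to and including that position. No assumption is
# made about what the values are.
sub running_totals {
    my $sum = 0;
    return map { $sum += $_ } @_;
}

my @b = (1 .. 10);
print "The total is: ", join(' ', running_totals(@b)), "\n";
# The total is: 1 3 6 10 15 21 28 36 45 55
```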

I want to add new line after every 4 spaces

I am generating 20 numbers and then shuffling them:
perl -e 'foreach(1..20){print ",$_ "}' | perl -MList::Util=shuffle -F',' -lane 'print shuffle @F'
and the output is:
19 15 11 9 8 13 18 4 2 7 5 20 10 14 3 16 1 17 6 12
Now I want output like this:
19 15 11 9
8 13 18 4
2 7 5 20
...
Any help will be appreciated
Doing that in several steps on the command line is ... strange. You can just do it in one program.
use strict;
use warnings;
use List::Util 'shuffle';
my $count = 1;
foreach my $i ( shuffle 1 .. 20) {
print "$i ";
print "\n\n" unless $count++ % 4;
}
This shuffles the list of 1 to 20 directly and then prints each item, adding two line breaks after every fourth item. The % is the modulo operator, which returns the remainder of a division by 4. So whenever $count is divisible by 4, it returns 0 and the print kicks in. On the command line it would be like this:
$ perl -MList::Util=shuffle -e '$c=1; for (shuffle 1..20) { print"$_ "; print "\n\n" unless $c++%4}'
Here's the output:
11 20 8 17
10 18 19 6
1 14 7 5
13 16 4 3
9 2 15 12
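The counter-and-modulo approach can also be wrapped in a sub that builds the output as a string, which makes it easy to check. A sketch; rows_of is a hypothetical name, and it uses a single newline per row (as in the question's desired output) and leaves no trailing space on each line:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Join a list into lines of $per_line items: a newline goes after every
# $per_line-th item (and after the last item), a space otherwise.
sub rows_of {
    my ($per_line, @items) = @_;
    my $out = '';
    for my $i (1 .. @items) {
        $out .= $items[$i - 1];
        $out .= ($i % $per_line == 0 || $i == @items) ? "\n" : ' ';
    }
    return $out;
}

print rows_of(4, shuffle 1 .. 20);
```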
You could also use splice to chop the shuffled list into chunks of four and print them that way, if you don't want to code an explicit counter. Something like this:
perl -MList::Util=shuffle -e '@list=shuffle(1..20); while (@ret_line = splice(@list, 0, 4)) {print "@ret_line\n\n"}'
I'd put the numbers into an array and use splice to remove them in blocks of four:
use strict;
use warnings 'all';
use List::Util 'shuffle';
my @nums = shuffle 1 .. 20;
print join(" ", splice @nums, 0, 4), "\n\n" while @nums;

Merge Overlapping coordinate in a table

I have a table with five columns. I want to merge the start and end columns of rows whose ranges overlap and that have the same RNAiclone and target_mRNA name. Adjacent ranges count as overlapping: (A) 1-10 and 11-20 overlap, while (B) 1-10 and 12-20 do not. RNAilength(nt) is the same for rows with the same RNAiclone.
input.txt
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 10
siRNA1 10 mRNA1 11 20
siRNA1 10 mRNA1 17 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 10
siRNA2 20 mRNA2 9 100
expected output.txt
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 100
program.awk
BEGIN{
i=0;
s="";
m="";
OFS="\t";
}
{
if (s!=$1 && m!=$3){
if (s != "" && m!= ""){
combine(chr,s,m,i);
}
i=0;
s="";
}
s=$1;
m=$3;
chr[i,0]=$4;
chr[i,1]=$5;
i++
}
END{
combine(chr,s,m,i);
}
function combine(arr,s,m,i) {
j=0;
new[j,0]=arr[0,0];
new[j,1]=arr[0,1];
for (k=1;k<i;k++)
{
if ((arr[k,0]<=new[j,1])&&(arr[k,1]>=new[j,1])){
new[j,1]=arr[k,1];
}
else if (arr[k,0]>new[j,1]){
j++;
new[j,0]=arr[k,0];
new[j,1]=arr[k,1];
}
}
for (n=0;n<=j;n++){
print s,m,new[n,0],new[n,1]
}
}
I am running the script with the command "awk -f program.awk input.txt > output.txt", but I am not getting the expected result. Could you kindly help me correct the script? Thank you very much.
It's easier for me to try a reboot than to debug your code. Here's an alternative in awk. Put the following into an executable awk file:
#!/usr/bin/awk -f
BEGIN { OFS="\t" }
/^RNA/ { print; next }
{
key = $1 OFS $2 OFS $3
# new key or new range, print last line
if( key != l_key || l_end+1 < $4 ) {
if( FNR>1 && l_key ) { print l_key, s[l_key], l_end }
s[key]=$4
}
l_key=key
l_end=$5
}
END { print l_key, s[l_key], l_end } # print final range
Saving that file as awko (and making it executable with chmod +x awko), then running awko input.txt gives the following:
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 100
which can be re-aligned with awko input.txt | column -t:
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 100
so the final command would be
awko data | column -t > output.txt
This makes at least the following assumptions:
data is sorted by columns 1-4 as in the example input.txt
the ranges behave as if end always >= start (I didn't test anything else)
You've already gotten your awk answer, but here is a version that does the same in perl:
use strict;
use warnings;
my $last;
while (<>) {
my @cols = split;
my $key = join ' ', splice @cols, 0, 3;
my ($start, $end) = @cols;
if ($last) {
if ($last->[0] ne $key || $last->[2] < $start - 1) {
print "@$last\n"
} else {
$start = $last->[1];
$end = $last->[2] if $end < $last->[2];
}
}
$last = [$key, $start, $end];
}
print "@$last\n";
Executing this program perl merge_range.pl file.dat gives the following results:
RNAiclone RNAilength(nt) target_mRNA start end
siRNA1 10 mRNA1 1 30
siRNA1 10 mRNA2 18 19
siRNA2 20 mRNA2 1 100
Assumes data is sorted like in your example data. To format the columns, just pipe through column -t.
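The awk version sets l_end=$5 on every row, so a row lying entirely inside the previous range (e.g. 1-30 followed by 5-10) would shrink it; the Perl version guards against that with its $end < $last->[2] check. Here is a sketch of the same merge as a reusable sub over pre-split records. Assumptions: each record is a [key, start, end] array ref, records are sorted by key and then by start, merge_ranges is a hypothetical name, and reading/splitting the file (and the header) is left out.

```perl
use strict;
use warnings;

# Merge overlapping or adjacent ranges that share a key. Adjacent ranges
# (end + 1 == next start) are merged, matching the question's definition.
sub merge_ranges {
    my @merged;
    for my $rec (@_) {
        my ($key, $start, $end) = @$rec;
        my $last = $merged[-1];
        if ($last && $last->[0] eq $key && $start <= $last->[2] + 1) {
            # only grow the end, so a row nested inside the range cannot shrink it
            $last->[2] = $end if $end > $last->[2];
        }
        else {
            push @merged, [$key, $start, $end];
        }
    }
    return @merged;
}

my @records = (
    ['siRNA1 10 mRNA1', 1, 10],
    ['siRNA1 10 mRNA1', 11, 20],
    ['siRNA1 10 mRNA1', 17, 30],
    ['siRNA1 10 mRNA2', 18, 19],
    ['siRNA2 20 mRNA2', 1, 10],
    ['siRNA2 20 mRNA2', 9, 100],
);
print "@$_\n" for merge_ranges(@records);
```

With the sample records this prints the three merged rows from the expected output.txt (without the header).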

Extraction of rows which have a value > 50

How do I select the lines that have a value > 50 from a large matrix of 21 columns and 150 rows? For example:
miRNameIDs degradome AGO LKM......till 21
osa-miR159a 0 42 42
osa-miR396e 0 7 9
vun-miR156a 121 77 4
ppt-miR156a 12 7 4
gma-miR6300 118 2 0
bna-miR156a 0 114 48
gma-miR156k 0 46 1
osa-miR1882e 0 7 0
.
.
.
Desired output is:-
miRNameIDs degradome AGO LKM......till 21
vun-miR156a 121 77 4
gma-miR6300 118 2 0
bna-miR156a 0 114 48
.
.
.
till 150 rows
Using a perl one-liner
perl -ane 'print if $. == 1 || grep {$_ > 50} @F[1..$#F]' file.txt
Explanation:
Switches:
-a: Splits each line on whitespace and loads the fields into the array @F.
-n: Wraps the code in a while(<>){...} loop, so it runs once for each line of the input file.
-e: Tells perl to execute the code given on the command line.
Code:
$. == 1: Checks whether the current line is line number 1 (the header).
grep {$_ > 50} @F[1..$#F]: Checks every field except the first (the ID) for a value greater than 50; in boolean context, grep is true if at least one field matches.
||: Logical OR operator. If either condition is true, the line is printed.
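To see what the switches add, here is roughly the loop that -n and -a wrap around the one-liner's code, written as a sub over an array of lines so it can be tried without a file. A sketch: rows_above is a hypothetical name, and a simple line counter stands in for $.

```perl
use strict;
use warnings;

# Roughly what `perl -ane '...'` runs: loop over the lines, split each one
# on whitespace into @F (the -a part), then apply the one-liner's test.
sub rows_above {
    my ($threshold, @lines) = @_;
    my @kept;
    my $line_no = 0;
    for (@lines) {
        $line_no++;                  # stands in for $.
        my @F = split ' ', $_;       # what -a does
        push @kept, $_ if $line_no == 1 || grep { $_ > $threshold } @F[1 .. $#F];
    }
    return @kept;
}

my @data = (
    "miRNameIDs degradome AGO LKM\n",
    "osa-miR159a 0 42 42\n",
    "vun-miR156a 121 77 4\n",
    "gma-miR6300 118 2 0\n",
    "osa-miR1882e 0 7 0\n",
);
print rows_above(50, @data);    # header, vun-miR156a and gma-miR6300 rows
```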

Iteration of an algorithm

I wrote a program that loads the data from a two-column file, runs an algorithm on it, and stores the pair of elements with the lowest coefficient in an array called @blackPair. I would like to iterate the algorithm N times, taking the data from the file but excluding the pairs that are already in @blackPair.
I thought of something like this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $iter;
my $startNode;
my $endNode;
my %k;
my %end;
my %node;
my %edge;
my @blackPair=();
my $counts=0;
my $inputfile = "file3";
################# DATA ABSORPTION
open(DAT,$inputfile) || die("Could not open file!");
while(<DAT>)
{
my ($entry) = $_;
chomp($entry);
my ($startNode, $endNode) = split(/ /,$entry);
$k{$endNode}++;
$k{$startNode}++;
$edge{$startNode}{$endNode}=1;
$edge{$endNode}{$startNode}=1;
}
################# ALGORITHM
my $minCentrality=2;
foreach my $i (keys %edge) {
foreach my $j (keys %{$edge{$i}}){
my @couple =($j,$i);
if($i<$j){
if (($k{$i}-1) !=0 && ($k{$j}-1) !=0){
my $triangleCount=0;
@couple=($i,$j) if ($k{$i}<$k{$j});
foreach (keys %{$edge{$couple[0]}}){
$triangleCount++ if exists $edge{$couple[1]}{$_};
}
my $centrality=($triangleCount+1)/($k{$couple[0]}-1);
if ($centrality<$minCentrality){
$minCentrality=$centrality;
@blackPair=@couple;
}
}
}
}
}
foreach (@blackPair){
say;
}
close(DAT);
The file is the following:
1 2
1 3
1 4
1 5
1 6
1 9
2 3
4 5
5 9
6 7
6 8
6 16
7 8
9 10
9 11
10 11
10 12
10 14
11 12
11 13
12 13
12 14
14 15
16 17
16 18
17 18
17 19
18 19
18 20
19 20
The first pair that appears in @blackPair is 6 and 1. After finding it, I would like the program to restart the search while skipping the pair 1 and 6. Doing that, the second pair would be 6 and 16. I would like to repeat this process N times (for example N = 4). I thought of putting another while(counts<=4){ before the while(<DAT>) in the "DATA ABSORPTION" section, and an if(<DAT> != @blackPair){ inside the while(<DAT>). Here is what I thought of:
while(counts <= 4) {
while(<DAT>)
{
if(<DAT> != @blackPair){
my ($entry) = $_;
chomp($entry);
.....
}
#### ALGORITHM
counts++;
}
But it doesn't work. Any help?
After 4 iterations, the following pairs should have been found in @blackPair:
6 1
16 6
9 1
9 5
<DAT> != @blackPair is definitely not what you want.
!= is for numerical comparison. You want either string comparison (the ne operator) or perhaps the experimental smart match operator to check for list membership (~~ \@blackPair).
But using the right operator won't really help you, because @blackPair has already mangled the input data (@blackPair might contain the elements (6,1), corresponding to an original input line of "1 6\n"). Also, the <DAT> inside that condition reads another line from the file on every pass, so every other line would be consumed without being processed.
Instead, how about updating your graph in each iteration?
for my $count (1..4) {
my $minCentrality = 2;
...
say join " ", @blackPair;
# now update the graph
delete $edge{$blackPair[0]}{$blackPair[1]};
delete $edge{$blackPair[1]}{$blackPair[0]};
$k{$blackPair[0]}--;
$k{$blackPair[1]}--;
} # next iteration
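If you would rather keep the original idea of skipping blacklisted pairs while reading the data, the comparison has to be independent of pair order, since @blackPair may hold (6,1) while the file has "1 6". A minimal sketch: store each removed pair under a normalized key and test input lines with exists. pair_key is a hypothetical helper, and the sample lines stand in for reading DAT.

```perl
use strict;
use warnings;

# Order-independent key for an undirected edge: (1,6) and (6,1) both map to "1 6".
sub pair_key {
    my ($x, $y) = @_;
    return $x < $y ? "$x $y" : "$y $x";
}

my %black;
$black{ pair_key(6, 1) } = 1;     # e.g. the first @blackPair found
$black{ pair_key(16, 6) } = 1;    # e.g. the second one

my @kept;
for my $line ("1 2\n", "1 6\n", "6 16\n", "6 7\n") {
    chomp(my $entry = $line);
    my ($startNode, $endNode) = split / /, $entry;
    next if exists $black{ pair_key($startNode, $endNode) };   # skip removed edges
    push @kept, $entry;
}
print "$_\n" for @kept;    # 1 2, then 6 7
```

Each iteration would then rebuild %k and %edge from the kept lines before rerunning the algorithm; updating the graph in place, as in the loop above, avoids rereading the file.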