concat columns in perl


Each iteration of my Perl code generates a vector of 5 elements.
The output of the first iteration is:
out1
1
2
3
4
5
The second iteration generates a vector of the same length:
out2
10
20
30
40
50
and so on, up to the nth iteration:
out n
100
200
300
400
500
I want to merge these columns and get the final output in a tabular (or, if you like, matrix) format:
out1 out2 ... outn
1 10 100
2 20 200
3 30 300
4 40 400
5 50 500
I tried splitting and then using push, but it prints "(101" and only does it once, not for all 20. I also have no idea where the "(101" comes from.
Any suggestions?

First, put all those output lists into a list. Second, iterate over that list: in the first pass, output the first element of each inner list; in the second pass, output the second element of each inner list; and so on.
For example
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @lists;
# Build one list (column) per iteration and keep a reference to it
for my $i (1..10) {
    my @list;
    push @list, $_ * $i for (1..5);
    push @lists, \@list;
}
$Data::Dumper::Indent = 0;
print Dumper(\@lists), "\n\n";
# Each pass prints (and removes) the first remaining element of every list
while (@{$lists[0]}) {
    for my $list (@lists) {
        print shift @$list, "\t";
    }
    print "\n";
}
Output:
$ perl t.pl
$VAR1 = [
[1,2,3,4,5],
[2,4,6,8,10],
[3,6,9,12,15],
[4,8,12,16,20],
[5,10,15,20,25],
[6,12,18,24,30],
[7,14,21,28,35],
[8,16,24,32,40],
[9,18,27,36,45],
[10,20,30,40,50]
];
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
Note: The output of Data::Dumper has been edited to make it more compact.

Save your vector information to an array of arrays as you do your processing. Then you can output the rows using a simple join:
use strict;
use warnings;
my @rows;
for my $i (1..10) {
    my @vector = map {$i * $_} (1..5);
    # Append element $_ of this vector to row $_
    push @{$rows[$_]}, $vector[$_] for (0..$#vector);
}
for my $row (@rows) {
    print join(" ", map {sprintf "%-3s", $_} @$row), "\n";
}
Outputs:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
Note: It'd be a lot easier to advise if you provided code and actual data.
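A minimal sketch of the same transpose packaged as a reusable sub, which also prints the out1 ... outn header row the question asks for. It assumes every column has the same length; `columns_to_rows` is a hypothetical name, and the three generated columns are only a stand-in for the real per-iteration vectors.

```perl
use strict;
use warnings;

# Transpose a list of column vectors (array refs) into rows.
# Assumes all columns have the same length, as in the question.
sub columns_to_rows {
    my @cols = @_;
    my @rows;
    for my $c (0 .. $#cols) {
        for my $r (0 .. $#{ $cols[$c] }) {
            $rows[$r][$c] = $cols[$c][$r];
        }
    }
    return @rows;
}

# Hypothetical stand-in for the real loop: column $i is (1..5) * 10**($i-1)
my @columns;
for my $i (1 .. 3) {
    my $factor = 10 ** ($i - 1);
    push @columns, [ map { $_ * $factor } 1 .. 5 ];
}

# Header row out1 out2 ... outN, then one tab-separated line per row
print join("\t", map { "out$_" } 1 .. @columns), "\n";
print join("\t", @$_), "\n" for columns_to_rows(@columns);
```

Building the rows in one pass (instead of shifting elements off the inner lists) leaves the collected columns intact, so they can still be reused after printing.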

Related

Looping through a perl array

I am trying to:
Populate 10 elements of the array with the numbers 1 through 10.
Add all of the numbers contained in the array by looping through the values contained in the array.
For example, it would start off as 1, then the second number would be 3 (1 plus 2), and then the next would be 6 (the existing 3 plus the new 3).
This is my current code:
#!/usr/bin/perl
use warnings;
use strict;
my @b = (1..10);
for (@b) {
    $_ = $_ * $_;
}
print("The total is: @b\n");
and this is the result
The total is: 1 4 9 16 25 36 49 64 81 100
What I'm looking for is:
The total is: 1 3 6 10 etc..

In the sequence shown, each element is its index + 1 plus the value at the previous index:
perl -wE'@b = 1..10; @r = 1; $r[$_] = $_+1 + $r[$_-1] for 1..$#b; say "@r"'
The syntax $#name gives the last index of the array @name.
If the array is changed in place, as below, there is no need to initialize a second array:
perl -wE'@b = 1..10; $b[$_] = $_+1 + $b[$_-1] for 1..$#b; say "@b"'
Both print
1 3 6 10 15 21 28 36 45 55
As a script
use warnings;
use strict;
use feature 'say';
my @seq = 1..10;
for my $i (1..$#seq) {
    $seq[$i] = $i+1 + $seq[$i-1];
}
say "@seq";

$ perl -E'say "The total is: ",join" ",map$sum+=$_,1..10'
The total is: 1 3 6 10 15 21 28 36 45 55
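The first answer's formula adds index + 1, which equals the element value only because the array is exactly 1..10. A minimal sketch that accumulates the actual values instead, so it also works for arbitrary numbers; `running_totals` is an illustrative name, not from the original answers.

```perl
use strict;
use warnings;

# Running totals: element $i of the result is the sum of input
# elements 0..$i, whatever the values are.
sub running_totals {
    my $sum = 0;
    my @totals;
    for my $n (@_) {
        $sum += $n;
        push @totals, $sum;    # push stores a copy of the current total
    }
    return @totals;
}

my @b = (1 .. 10);
my @totals = running_totals(@b);
print "The total is: @totals\n";    # The total is: 1 3 6 10 15 21 28 36 45 55
```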

Counting nucleotide frequency using perl script [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have the Perl script below to calculate sequence lengths and their frequencies, along with nucleotide frequencies (A, T, G and C). The script works fine for a file with a large number of sequences, but it does not give the right result for a small file like this one:
infile.fasta
>NC_013116_1051_1114
TTGTCCCTTTGAGTCTCTGG
>NC_013116_1051_1114
GCGCAGCCGATATGGATAA
>NC_013116_1051_1114
TCGAGACTTTGTAATGTTTGGG
>NC_013116_1051_1114
TATTCCACGTCAGGTGCTTTT
>NC_013116_1051_1114
TAGAGCCGATTCCAGACTGTTCC
>NC_013116_1051_1114
TACAGGACCAAGCTCTTCACTC
>NC_013116_1051_1114
CCGTCAAGTTCAGCTCCAATAA
>NC_013116_6_301
CCACGCAACGGACAATCAAACA
>NC_013116_6_301
GGACACTTCCAACTATAAATA
>NC_013116_6_301
CCACGCAACGGACAATCAAACA
>NC_013116_1051_1114
GCTCTTCACTCTTCCTCGTCT
>NC_013116_1051_1114
TTGGGAAAAAGAAGTTGCTGCAGC
>NC_013116_1051_1114
TCGCAGTATCTCTGAAGTTG
count.pl
#!/usr/bin/perl -w
#usage ./count.pl infile min_length max_length
#usage ./count.pl infile 18 34
my $min_len = $ARGV[1];
my $max_len = $ARGV[2];
my $read_len = 0;
my @lines = ("header1","sequence","header2","quality");
my @lray = ();
my $count = 0;
my $total = 0;
my $i = 0;
my @Aray = ();
my @Cray = ();
my @Gray = ();
my @Tray = ();
my $FN = "";
for ($i=$min_len; $i<=$max_len; $i++){
$lray[$i] = 0;
}
open (INFILE, "<$ARGV[0]") || die "couldn't open input file!";
while (<INFILE>) {
$lines[$count] = $_;
chomp($lines[$count]);
$count++;
if($count eq 4){
$read_len = length($lines[1]);
# print "$read_len $lines[1]\n";
$FN = substr $lines[1], 0, 1;
$lray[$read_len]++;
if ($FN eq "T") { $Tray[$read_len]++;}
else {
if ($FN eq "A"){ $Aray[$read_len]++;}
else {
if ($FN eq "C"){ $Cray[$read_len]++;}
else {
if ($FN eq "G"){ $Gray[$read_len]++;}
}
}
}
$count = 0;
}
}
print "length\tnumber\tA\tC\tG\tT\n";
for ($i=$min_len; $i<=$max_len; $i++){
print "$i\t$lray[$i]\t$Aray[$i]\t$Cray[$i]\t$Gray[$i]\t$Tray[$i]\n";
}
exit;
This is the type of result I get from a big file with many sequences.
length number A C G T
18 4473 542 710 471 2750
19 12647 990 1680 1103 8874
20 31194 3010 3354 2743 22087
21 61214 6288 7196 5784 41946
22 128642 14596 11902 12518 89626
23 65190 6859 6525 7773 44033
24 10012 611 1401 1112 6888
25 1406 231 192 435 548
26 661 169 91 105 296
27 407 126 81 65 135
28 602 391 49 68 94
29 520 54 30 370 66
30 175 26 93 18 38
31 156 35 28 29 64
32 106 22 16 24 44
33 97 45 17 16 19
34 0
I would really appreciate it if you could help me correct this code. Thanks

Trying not to reinvent the wheel, I used the FAST module and got:
use 5.014;
use warnings;
use FAST::Bio::SeqIO;
my $fasta = FAST::Bio::SeqIO->new(-file => "infile.fasta", -format => 'Fasta');
my $seqnum=0;
while ( my $seq = $fasta->next_seq() ) {
my $stats;
$stats->{len} = length($seq->seq);
$stats->{$_}++ for split //, $seq->seq;
say ++$seqnum, " @$stats{qw(len A C G T)}";
}
For your demo infile.fasta, the above prints:
1 20 1 5 5 9
2 19 6 4 6 3
3 22 4 2 7 9
4 21 3 5 4 9
5 23 5 7 5 6
6 22 6 8 3 5
7 22 7 7 3 5
8 22 10 8 3 1
9 21 9 5 2 5
10 22 10 8 3 1
11 21 1 9 2 9
12 24 8 3 8 5
13 20 4 4 5 7
Or, to group the statistics by sequence length:
use 5.014;
use warnings;
use FAST::Bio::SeqIO;
my $fasta = FAST::Bio::SeqIO->new(-file => "infile.fasta", -format => 'Fasta');
my $stats;
while ( my $seq = $fasta->next_seq() ) {
my $len = length($seq->seq);
$stats->{$len}{count}++;
$stats->{$len}{$_}++ for split //, $seq->seq;
}
say "Length $_ ($stats->{$_}->{count} times) Letters freq: @{$stats->{$_}}{qw(A C G T)}" for sort { $a <=> $b } keys %$stats;
produces:
Length 19 (1 times) Letters freq: 6 4 6 3
Length 20 (2 times) Letters freq: 5 9 10 16
Length 21 (3 times) Letters freq: 13 19 8 23
Length 22 (5 times) Letters freq: 37 33 19 21
Length 23 (1 times) Letters freq: 5 7 5 6
Length 24 (1 times) Letters freq: 8 3 8 5
and so on...
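A likely cause of the wrong results: count.pl groups input lines in fours (its @lines placeholders are header1, sequence, header2, quality, the FASTQ layout), while infile.fasta has two lines per record, so only every second sequence is counted. A minimal plain-Perl sketch that keeps count.pl's output (counts per length and per first base) but reads two-line FASTA records; it assumes each sequence sits on a single line, and `tally_fasta` and the script name in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Count reads per length and per first base, like count.pl, but reading
# two-line FASTA records: a ">" header line followed by ONE sequence line.
# Assumption: sequences are not wrapped over several lines.
sub tally_fasta {
    my ($fh, $min_len, $max_len) = @_;
    my %by_len;    # length => { number => n, A => n, C => n, G => n, T => n }
    for my $len ($min_len .. $max_len) {
        $by_len{$len} = { number => 0, A => 0, C => 0, G => 0, T => 0 };
    }
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '' || $line =~ /^>/;   # skip blank and header lines
        my $len = length $line;
        next unless exists $by_len{$len};       # outside min_len..max_len
        $by_len{$len}{number}++;
        my $first = uc substr($line, 0, 1);     # first nucleotide
        $by_len{$len}{$first}++ if $first =~ /^[ACGT]$/;
    }
    return \%by_len;
}

# Usage (hypothetical script name): ./count_fasta.pl infile 18 34
if (@ARGV == 3) {
    my ($file, $min_len, $max_len) = @ARGV;
    open my $fh, '<', $file or die "couldn't open $file: $!";
    my $stats = tally_fasta($fh, $min_len, $max_len);
    print join("\t", qw(length number A C G T)), "\n";
    for my $len ($min_len .. $max_len) {
        print join("\t", $len, @{ $stats->{$len} }{qw(number A C G T)}), "\n";
    }
}
```

Pre-filling every length with zeros also avoids the empty cells (and "uninitialized value" warnings) that count.pl prints for lengths with no reads, such as the 34 row above.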

I want to add new line after every 4 spaces

I am generating 20 numbers and then shuffling them:
perl -e 'foreach(1..20){print ",$_ "}' | perl -MList::Util=shuffle -F',' -lane 'print shuffle @F'
and the output is:
19 15 11 9 8 13 18 4 2 7 5 20 10 14 3 16 1 17 6 12
Now I want the output to look like this:
19 15 11 9
8 13 18 4
2 7 5 20
...
Any help will be appreciated

Doing that in several steps on the command line is ... strange. You can just do it in one program.
use strict;
use warnings;
use List::Util 'shuffle';
my $count = 1;
foreach my $i ( shuffle 1 .. 20) {
print "$i ";
print "\n\n" unless $count++ % 4;
}
This shuffles the list 1 to 20 directly and prints each item, adding two line breaks after every fourth one. The % is the modulo operator, which returns the remainder of a division, here by 4. So whenever $count is divisible by 4, the expression returns 0, and the print kicks in. On the command line it would look like this:
$ perl -MList::Util=shuffle -e '$c=1; for (shuffle 1..20) { print "$_ "; print "\n\n" unless $c++%4 }'
Here's the output:
11 20 8 17
10 18 19 6
1 14 7 5
13 16 4 3
9 2 15 12

You could also use splice to chop the shuffled list into groups and print it that way, if you don't want to code an explicit counter. Something like this:
perl -MList::Util=shuffle -e '@list=shuffle(1..20); while (@ret_line = splice(@list, 0, 4)) {print "@ret_line\n\n"}'

I'd put the numbers into an array and use splice to remove them in blocks of four:
use strict;
use warnings 'all';
use List::Util 'shuffle';
my @nums = shuffle 1 .. 20;
# Remove and print four numbers at a time until the array is empty
print join(" ", splice @nums, 0, 4), "\n\n" while @nums;
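The splice approach can be wrapped in a small helper that takes the group size as a parameter. A sketch only: `group_lines` is a hypothetical name, it works on a copy so the caller's array is untouched, and it emits one line break per group rather than two.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Split a list into lines of $n space-separated items.
# @items is a copy of the arguments, so splicing it is harmless.
sub group_lines {
    my ($n, @items) = @_;
    my @lines;
    push @lines, join(' ', splice(@items, 0, $n)) while @items;
    return @lines;
}

print "$_\n" for group_lines(4, shuffle 1 .. 20);
```

If the list length is not a multiple of the group size, the last line simply holds the leftover items.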

Modifying Script to include the Count of each time a name appears from a table

I have the script below, which reads FILE1 and prints the rows of FILE2 whose column 10 matches a name in the first column of FILE1. So it prints out the rows I need, and this part works great. The part I am having difficulty with is adding a count to the output. The goal of the script is to take column 10 at the end and produce an output. My list has 12 names, and I want the count of each name. For the example below, I have used four names.
FILE1:
name1 15
name2 15
name2 30
name5 15
name4 10
name2 5
name2 5
FILE2:
23 15 5.4 1.3 5 55 128 21799 + 32 name2 1 77 0 1
23 20 5.4 1.3 5 55 128 7998 + 18 name4 1 77 0 1
23 20 5.4 1.3 6 55 128 9984 + 13 name4 1 77 1 1
23 20 5.4 1.3 7 55 128 7998 + 14 name5 1 77 2 1
23 20 5.4 1.3 6 55 128 994 + 14 name1 1 77 3
23 20 5.4 1.3 9 55 128 984 + 5 name7 1 77 4 1
23 20 5.4 1.3 5 55 128 99 + 5 name8 1 77 5 1
Expected Output
$VAR1 = {
'name1' => 1,
'name2' => 4,
'name4' => 1,
'name5' => 1,
};
5 55 128 21799 32 name2 77 0 1
5 55 128 7998 18 name4 77 0 1
6 55 128 9984 13 name4 77 1 1
7 55 128 7998 14 name5 77 2 1
6 55 128 994 14 name1 77 3 1
name1 1
name2 1
name4 2
name5 1
You can test the script; it works. The part I am having difficulty with is inserting the count of each name based on the output. The print Dumper(\%x) is a way of checking whether my original list was really used, as I am working with a much larger data set. If someone could point me in the right direction on how to modify my script without changing it drastically, that would be great. I feel like this script fulfills most of my needs, even if it is not the most efficient way of doing it.
use strict;
use Data::Dumper;
my %x;
open(FILE1, $ARGV[0]) or die "Cannot open the file: $!";
while (my $line = <FILE1>) {
    my @array = split(" ", $line);
    $x{$array[0]}++;
}
close FILE1;
print Dumper( \%x );
my %count;
open(FILE2, $ARGV[1]) or die "Cannot open the file: $!";
while (my $line = <FILE2>) {
    my @name = split(" ", $line);
    my $y = $name[9];
    if ( $x{ $y } ) {
        print join(" ", @name[4,5,6,7,9,11,12,13]), "\n";
        $count{$y}++;
    }
}
print Dumper (\%count);
close FILE2;
exit;
Script now counts. Just need to debug.

The "minimal" change would be to set the elements of %x to 0 in the FILE1 loop, then check for exists $x{$y} in the FILE2 loop and do ++$x{$y} inside the condition body. At the end, %x then holds the number of occurrences of every name.
The usual way (as mentioned in the comments on the question) would be to declare an additional %count and perform the same ++$count{$y} inside the if block, as in the method above.
The first approach has the advantage or disadvantage (depending on your needs) of reporting a count even for names with zero occurrences.
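A minimal sketch of the first ("minimal change") approach, using lexical filehandles. One assumption to check against the real data: in the sample FILE2 the names are the 11th whitespace-separated field (0-based index 10), whereas the question's script reads $name[9], so the column index is a parameter here. `count_matches` is a hypothetical name.

```perl
use strict;
use warnings;

# Seed every FILE1 name with 0, then count matching FILE2 rows in the
# same hash. $name_idx is the 0-based FILE2 column holding the name.
sub count_matches {
    my ($fh1, $fh2, $name_idx) = @_;
    my %x;
    while (my $line = <$fh1>) {
        my @array = split ' ', $line;
        $x{ $array[0] } = 0 if @array;     # seed with 0 instead of ++
    }
    while (my $line = <$fh2>) {
        my @name = split ' ', $line;
        my $y = $name[$name_idx];
        next unless defined $y && exists $x{$y};
        # (print the selected columns of the matching row here, as before)
        ++$x{$y};                          # count this occurrence
    }
    return \%x;
}

# Usage: perl script.pl FILE1 FILE2
if (@ARGV == 2) {
    open my $fh1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $fh2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    my $counts = count_matches($fh1, $fh2, 10);
    print "$_ $counts->{$_}\n" for sort keys %$counts;
}
```

Names from FILE1 that never appear in FILE2 are still reported, with a count of 0.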

Iteration of an algorithm

I wrote a program that loads the data from a two-column file, runs an algorithm on it, and then puts the pair of elements from the file that has the minimum coefficient into an array called @blackPair. I would like to iterate the algorithm N times, taking the data from the file but excluding the pairs that are in @blackPair.
I thought of something like this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $iter;
my $startNode;
my $endNode;
my %k;
my %end;
my %node;
my %edge;
my @blackPair=();
my $counts=0;
my $inputfile = "file3";
################# DATA ABSORTION
open(DAT,$inputfile) || die("Could not open file!");
while(<DAT>)
{
my ($entry) = $_;
chomp($entry);
my ($startNode, $endNode) = split(/ /,$entry);
$k{$endNode}++;
$k{$startNode}++;
$edge{$startNode}{$endNode}=1;
$edge{$endNode}{$startNode}=1;
}
################# ALGORITHM
my $minCentrality=2;
foreach my $i (keys %edge) {
foreach my $j (keys %{$edge{$i}}){
my @couple =($j,$i);
if($i<$j){
if (($k{$i}-1) !=0 && ($k{$j}-1) !=0){
my $triangleCount=0;
@couple=($i,$j) if ($k{$i}<$k{$j});
foreach (keys %{$edge{$couple[0]}}){
$triangleCount++ if exists $edge{$couple[1]}{$_};
}
my $centrality=($triangleCount+1)/($k{$couple[0]}-1);
if ($centrality<$minCentrality){
$minCentrality=$centrality;
@blackPair=@couple;
}
}
}
}
}
foreach (@blackPair){
say;
}
close(DAT);
The file is the following:
1 2
1 3
1 4
1 5
1 6
1 9
2 3
4 5
5 9
6 7
6 8
6 16
7 8
9 10
9 11
10 11
10 12
10 14
11 12
11 13
12 13
12 14
14 15
16 17
16 18
17 18
17 19
18 19
18 20
19 20
The first pair that appears in @blackPair is 6 and 1. After finding it, I would like the program to restart the search while excluding the pair 1 and 6. The second pair would then be 6 and 16. I would like to repeat this process N times (for example N = 4). I thought of putting another while(counts<=4){ before the while(<DAT>) in the "DATA ABSORTION" section, and an if(<DAT> != @blackPair){ inside the while(<DAT>). Here is what I had in mind:
while(counts <= 4) {
while(<DAT>)
{
if(<DAT> != @blackPair){
my ($entry) = $_;
chomp($entry);
.....
}
#### ALGORITHM
counts++;
}
But it doesn't work. Any help?
After 4 iterations, the pairs found in @blackPair should be the following:
6 1
16 6
9 1
9 5

<DAT> != @blackPair is definitely not what you want.
!= is for numerical comparison. You want either string comparison (the ne operator) or perhaps the smartmatch operator to check for list membership (~~ \@blackPair); note that smartmatch has long been experimental and was deprecated in Perl 5.38.
But using the right operator won't really help you, because @blackPair no longer holds the input data in its original form: it might contain the elements (6,1), corresponding to an original input line of "1 6\n". Also, each <DAT> in that condition reads and discards another line of the file.
Instead, how about updating your graph in each iteration?
for my $count (1..4) {
my $minCentrality = 2;
...
say join " ", @blackPair;
# now update the graph
delete $edge{$blackPair[0]}{$blackPair[1]};
delete $edge{$blackPair[1]}{$blackPair[0]};
$k{$blackPair[0]}--;
$k{$blackPair[1]}--;
} # next iteration
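A sketch that puts the pieces together: it builds the graph once, runs the question's centrality search four times, and deletes the chosen edge after each pass, as the answer suggests. Two assumptions beyond the original: the edge list from file3 is embedded in the script, and keys are visited in sorted numeric order, because Perl's hash order is random and some passes have ties (on this data the third pass has several edges at centrality 2/3, so unsorted runs can pick different pairs). `find_black_pairs` is a hypothetical name.

```perl
use strict;
use warnings;

# Edge list copied from the question's file3.
my @EDGES = (
    [1,2],[1,3],[1,4],[1,5],[1,6],[1,9],[2,3],[4,5],[5,9],[6,7],
    [6,8],[6,16],[7,8],[9,10],[9,11],[10,11],[10,12],[10,14],[11,12],
    [11,13],[12,13],[12,14],[14,15],[16,17],[16,18],[17,18],[17,19],
    [18,19],[18,20],[19,20],
);

# Run the question's minimum-centrality search $n times, removing the
# chosen edge from the graph after each pass. Returns [a, b] pairs.
sub find_black_pairs {
    my ($n, @edges) = @_;
    my (%k, %edge);
    for my $e (@edges) {
        my ($s, $t) = @$e;
        $k{$s}++;
        $k{$t}++;
        $edge{$s}{$t} = 1;
        $edge{$t}{$s} = 1;
    }
    my @found;
    for my $iter (1 .. $n) {
        my $minCentrality = 2;
        my @blackPair;
        # Sorted keys make tie-breaking deterministic.
        for my $i (sort { $a <=> $b } keys %edge) {
            for my $j (sort { $a <=> $b } keys %{ $edge{$i} }) {
                next unless $i < $j;
                next if $k{$i} - 1 == 0 || $k{$j} - 1 == 0;
                my @couple = ($j, $i);
                @couple = ($i, $j) if $k{$i} < $k{$j};
                # grep in scalar context returns the number of shared neighbours
                my $triangleCount = grep { exists $edge{ $couple[1] }{$_} }
                                    keys %{ $edge{ $couple[0] } };
                my $centrality = ($triangleCount + 1) / ($k{ $couple[0] } - 1);
                if ($centrality < $minCentrality) {
                    $minCentrality = $centrality;
                    @blackPair     = @couple;
                }
            }
        }
        last unless @blackPair;    # no eligible edge left
        push @found, [@blackPair];
        # Update the graph before the next pass.
        delete $edge{ $blackPair[0] }{ $blackPair[1] };
        delete $edge{ $blackPair[1] }{ $blackPair[0] };
        $k{$_}-- for @blackPair;
    }
    return @found;
}

print "@$_\n" for find_black_pairs(4, @EDGES);
```

With sorted keys this prints 6 1, 16 6, 9 1 and 9 5, the four pairs the question expects.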
