How to randomly pair items in a list - perl

I have a list of Accession numbers that I want to pair randomly using a Perl script below:
#!/usr/bin/perl -w
use List::Util qw(shuffle);
my $file = 'randomseq_acc.txt';
my #identifiers = map { (split /\n/)[1] } <$file>;
chomp #identifiers;
#Shuffle them and put in a hash
#identifiers = shuffle #identifiers;
my %pairs = (#identifiers);
#print the pairs
for (keys %pairs) {
print "$_ and $pairs{$_} are partners\n";
but keep getting errors.
The accession numbers in the file randomseq_acc.txt are:
1094711
1586007
2XFX_C
Q27031.2
P22497.2
Q9TVU5.1
Q4N4N8.1
P28547.2
P15711.1
AAC46910.1
AAA98602.1
AAA98601.1
AAA98600.1
EAN33235.2
EAN34465.1
EAN34464.1
EAN34463.1
EAN34462.1
EAN34461.1
EAN34460.1

I needed to add the closing right curly brace to be able to compile the script.
As arrays are indexed from 0, (split /\n/)[1] returns the second field, i.e. what follows newline on each line (i.e. nothing). Change it to [0] to make it work:
my #identifiers = map { (split /\n/)[0] } <$file>; # Still wrong.
The diamond operator needs a file handle, not a file name. Use open to associate the two:
open my $FH, '<', $file or die $!;
my #identifiers = map { (split /\n/)[0] } <$FH>;
Using split to remove a newline is not common. I'd probably use something else:
map { /(.*)/ } <$FH>
# or
map { chomp; $_ } <$FH>
# or, thanks to ikegami
chomp(my #identifiers = <$FH>);
So, the final result would be something like the following:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(shuffle);
my $filename = '...';
open my $FH, '<', $filename or die $!;
chomp(my #identifiers = <$FH>);
my %pairs = shuffle(#identifiers);
print "$_ and $pairs{$_} are partners\n" for keys %pairs;

Related

Perl: Read columns and convert to array

I am new to perl, trying to read a file with columns and creating an array.
I am having a file with following columns.
file.txt
A 15
A 20
A 33
B 20
B 45
C 32
C 78
I wanted to create an array for each unique item present in A with its values assigned from second column.
eg:
#A = (15,20,33)
#B = (20,45)
#C = (32,78)
Tried following code, only for printing 2 columns
use strict;
use warnings;
my $filename = $ARGV[0];
open(FILE, $filename) or die "Could not open file '$filename' $!";
my %seen;
while (<FILE>)
{
chomp;
my $line = $_;
my #elements = split (" ", $line);
my $row_name = join "\t", #elements[0,1];
print $row_name . "\n" if ! $seen{$row_name}++;
}
close FILE;
Thanks
Firstly some general Perl advice. These days, we like to use lexical variables as filehandles and pass three arguments to open().
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
And then...
while (<$fh>) { ... }
But, given that you have your filename in $ARGV[0], another tip is to use an empty file input operator (<>) which will return data from the files named in #ARGV without you having to open them. So you can remove your open() line completely and replace the while with:
while (<>) { ... }
Second piece of advice - don't store this data in individual arrays. Far better to store it in a more complex data structure. I'd suggest a hash where the key is the letter and the value is an array containing all of the numbers matching that letter. This is surprisingly easy to build:
use strict;
use warnings;
use feature 'say';
my %data; # I'd give this a better name if I knew what your data was
while (<>) {
chomp;
my ($letter, $number) = split; # splits $_ on whitespace by default
push #{ $data{$letter} }, $number;
}
# Walk the hash to see what we've got
for (sort keys %data) {
say "$_ : #{ $data{$_ } }";
}
Change the loop to be something like:
while (my $line = <FILE>)
{
chomp($line);
my #elements = split (" ", $line);
push(#{$seen{$elements[0]}}, $elements[1]);
}
This will create/append a list of each item as it is found, and result in a hash where the keys are the left items, and the values are lists of the right items. You can then process or reassign the values as you wish.

How do I properly find double entries in two files in Perl?

Let's say I have two files with lists of ip-addresses. Lines in the first file are unique. Lines in the second may or may not be the same as in the first one.
What I need is to compare two files, and remove possible doubles from the second file in order to merge it with the base file later.
I've managed to write the following code and it seems to work properly, but I have a solid feeling that this code can be improved or I may be totally missing some important concept.
Are there any ways to solve the task without using complex data structures, i.e. hashrefs?
#!/usr/bin/perl
use strict;
use warnings;
my $base = shift #ARGV;
my $input = shift #ARGV;
my $res = 'result.txt';
open ("BASE","<","$base");
open ("INP","<","$input");
open ("RES", ">", "$res");
my $rhash = {}; # result hash
while (my $line = <BASE>) {chomp($line); $rhash->{$line}{'res'} = 1;} # create uniq table
while (my $line = <INP>) { chomp($line); $rhash->{$line}{'res'}++; $rhash->{$line}{'new'} = 1; } # create compare table marking it's entries as new and incrementing double keys
close BASE;
close INP;
for my $line (sort keys %$rhash) {
next if $line =~ /\#/; # removing comments
printf("%-30s%3s%1s", $line, $rhash->{$line}{'res'}, "\n") if $rhash->{$line}{'res'} > 1; # kinda diagnosti output of doubles
if (($rhash->{$line}{'new'}) and ($rhash->{$line}{'res'} < 2)) {
print RES "$line\n"; # printing new uniq entries to result file
}
}
close RES;
If I understand correctly file1 and file2 each contain ips (unique in each file) And you want to get ips in file2 not in file1. If so, then maybe the following code achieves your goal.
Although it seems your code will do it, this might be clearer.
#!/usr/bin/perl
use strict;
use warnings;
my $base = shift #ARGV;
my $input = shift #ARGV;
my $res = 'result.txt';
open ("BASE","<","$base") or die $!;
open ("INP","<","$input") or die $!;
open ("RES", ">", "$res") or die $!;
my %seen;
while (my $line = <BASE>) {
chomp $line;
$seen{$line}++;
}
close BASE or die $!;
while (my $line = <INP>) {
chomp $line;
print RES "$line\n" unless $seen{$line}; # only in file2 not in file1
}
close INP or die $!;
close RES or die $!;

Selecting records from a file based on keys from a second file

My first file looks like:
CHR id position
1 rs58108140 10583
1 rs189107123 10611
1 rs180734498 13302
1 rs144762171 13327
1 chr1:13957:D 13957
And my second file looks like:
CHR SNP POS RiskAl OTHER_ALLELE RAF logOR Pval
10 rs1999138 110140096 T C 0.449034245446375 0.0924443 1.09e-06
6 rs7741604 20839503 C A 0.138318264238111 0.127947 1.1e-06
8 rs1486006 82553172 G C 0.833130882716561 0.147456 1.12727730194884e-06
My script reads in the first file and stores it in an array, and then I would like to find rsIDs from column 2 of the first file that are in column 2 in the second file. I think I am having a problem with how I'm matching the expressions. Here is my script:
#! perl -w
use strict;
use warnings;
my $F = shift #ARGV;
my #snps;
open IN, "$F";
while (<IN>) {
next if m/CHR/;
my #L = split;
push #snps, [$L[0], $L[1], $L[2]] if $L[0] !~ m/[XY]/;
}
close IN;
open IN, "DIAGRAMv3sansWTCCCqc0clumpd_noTCF7L2regOrLeadOrPlt1em6clumps- CHR_SNP_POS_RiskAl_OtherAl_RAF_logOR_Pval.txt";
while (<IN>) {
my #L = split;
next if m/CHR/;
foreach (#snps) {
next if ($L[0] != ${$_}[0]);
# if not on same chromosome
if ($L[0] = ${$_}[0]) {
# if on same chromosome
if ($L[1] =~ ${$_}[1]) {
print "$L[0] $L[1] ${$_}[2]\n";
last;
}
}
}
}
Your code doesn't seem to correspond to your description. You are comparing both the first and second columns of the file rather than just the second.
The main problems are:
You use $L[0] = ${$_}[0] to compare the first columns. This will do an assigmment instead of a comparison. You should use $L[0] == ${$_}[0] instead or, better, $L[0] == $_->[0]
You use $L[1] =~ ${$_}[1] to compare the second columns. This will check whether ${$_}[1] is a substring of $L[1]. You could use anchors like $L[1] =~ /^${$_}[1]$/ but it's much better to just do a string comparison as $L[1] eq $_->[1]
The easiest way is to read the second file first so as to build a list of values that you want included from the first file. I have written it so that it does what your code looks like it's supposed to do, i.e. match the first two columns.
That would look like this
use strict;
use warnings;
use autodie;
my ($f1, $f2) = #_;
my %include;
open my $fh2, '<', $f2;
while (<$fh2>) {
my #fields = split;
my $key = join '|', #fields[0,1];
++$include{$key};
}
close $fh2;
open my $fh1, '<', $f1;
while (<$fh1>) {
my #fields = split;
my $key = join '|', #fields[0,1];
print "#fields[0,1,2]\n" if $include{$key};
}
close $fh1;
output
Unfortunately your choice of sample data doesn't include any records in the first file that have matching keys in the second, so there is no output!
Update
This is a corrected version of your own program. It should work, but it is far more efficient and concise to use hashes, as above
use strict;
use warnings;
use autodie;
my ($filename) = #ARGV;
my #snps;
open my $in_fh, '<', $filename;
<$in_fh>; # Discard header line
while (<$in_fh>) {
my #fields = split;
push #snps, \#fields unless $fields[0] =~ /[XY]/;
}
close $in_fh;
open $in_fh, '<', 'DIAGRAMv3sansWTCCCqc0clumpd_noTCF7L2regOrLeadOrPlt1em6clumps- CHR_SNP_POS_RiskAl_OtherAl_RAF_logOR_Pval.txt';
<$in_fh>; # Discard header line
while (<$in_fh>) {
my #fields = split;
for my $snp (#snps) {
next unless $fields[0] == $snp->[0] and $fields[1] eq $snp->[1];
print "$fields[0] $fields[1] $snp->[2]\n";
last;
}
}
close $in_fh;

Extraction and printing of key-value pair from a text file using Perl

I have a text file temp.txt which contains entries like,
cinterim=3534
cstart=517
cstop=622
ointerim=47
ostart=19
ostop=20
Note: key-value pairs may be arranged in new line or all at once in one line separated by space.
I am trying to print and store these values in DB for corresponding keys using Perl. But I am getting many errors and warnings. Right now I am just trying to print those values.
use strict;
use warnings;
open(FILE,"/root/temp.txt") or die "Unable to open file:$!\n";
while (my $line = <FILE>) {
# optional whitespace, KEY, optional whitespace, required ':',
# optional whitespace, VALUE, required whitespace, required '.'
$line =~ m/^\s*(\S+)\s*:\s*(.*)\s+\./;
my #pairs = split(/\s+/,$line);
my %hash = map { split(/=/, $_, 2) } #pairs;
printf "%s,%s,%s\n", $hash{cinterim}, $hash{cstart}, $hash{cstop};
}
close(FILE);
Could somebody provide help to refine my program.
use strict;
use warnings;
open my $fh, '<', '/root/temp.txt' or die "Unable to open file:$!\n";
my %hash = map { split /=|\s+/; } <$fh>;
close $fh;
print "$_ => $hash{$_}\n" for keys %hash;
What this code does:
<$fh> reads a line from our file, or in list context, all lines and returns them as an array.
Inside map we split our line into an array using the regexp /= | \s+/x. This means: split when you see a = or a sequence of whitespace characters. This is just a condensed and beautified form of your original code.
Then, we cast the list resulting from map to the hash type. We can do that because the item count of the list is even. (Input like key key=value or key=value=valuewill throw an error at this point).
After that, we print the hash out. In Perl, we can interpolate hash values inside strings directly and don't have to use printf and friends except for special formatting.
The for loop iterates over all keys (returned in the $_ special variable), and $hash{$_} is the corresponding value. This could also have been written as
while (my ($key, $val) = each %hash) {
print "$key => $val\n";
}
where each iterates over all key-value pairs.
Try this
use warnings;
my %data = ();
open FILE, '<', 'file1.txt' or die $!;
while(<FILE>)
{
chomp;
$data{$1} = $2 while /\s*(\S+)=(\S+)/g;
}
close FILE;
print $_, '-', $data{$_}, $/ for keys %data;
The simplest way is to slurp the entire file into memory and assign key/value pairs to the hash using a regular expression.
This program shows the technique
use strict;
use warnings;
my %data = do {
open my $fh, '<', '/root/temp.txt' or die $!;
local $/;
<$fh> =~ /(\w+)\s*=\s*(\w+)/g;
};
use Data::Dump;
dd \%data;
output
{
cinterim => 3534,
cstart => 517,
cstop => 622,
ointerim => 47,
ostart => 19,
ostop => 20,
}

Displaying duplicate records

I've a code as below to parse a text file. Display all words after "Enter:" keyword on all lines of the text file. I'm getting displayed all words after "Enter:" keyword, but i wan't duplicated should not be repeated but its repeating. Please guide me as to wht is wrong in my code.
#! /usr/bin/perl
use strict;
use warnings;
$infile "xyz.txt";
open (FILE, $infile) or die ("can't open file:$!");
if(FILE =~ /ENTER/ ){
#functions = substr($infile, index($infile, 'Enter:'));
#functions =~/#functions//;
%seen=();
#unique = grep { ! $seen{$_} ++ } #array;
while (#unique != ''){
print '#unique\n';
}
}
close (FILE);
Here is a way to do the job, it prints unique words found on each line that begins with the keyword Enter:
#!/usr/bin/perl
use strict;
use warnings;
my $infile = "xyz.txt";
# use 3 arg open with lexical file handler
open my $fh, '<', $infile or die "unable to open '$infile' for reading: $!";
# loop thru all lines
while(my $line = <$fh) {
# remove linefeed;
chomp($line);
# if the line begins with "Enter:"
# remove the keyword "Enter:"
if ($line =~ s/^Enter:\s+//) {
# split the line on whitespaces
# and populate the array with all words found
my #words = split(/\s+/, $line);
# create a hash where the keys are the words found
my %seen = map { $_ => 1 }#words;
# display unique words
print "$_\t" for(keys %seen);
print "\n";
}
}
If I understand you correctly, one problem is that your 'grep' only counts the occurrences of each word. I think you want to use 'map' so that '#unique' only contains the unique words from '#array'. Something like this:
#unique = map {
if (exists($seen{$_})) {
();
} else {
$seen{$_}++; $_;
}
} #array;