I want to read a file (file1) that contains a numeric representation for every word, for example:
clinton 279
capital 553
fond|fonds 1410
I then read a second file, and every time I find a number I replace it with the corresponding word. Below is an example of the second file:
279 695 696 152 - 574
553 95 74 96 - 444
1410 74 95 96 - 447
The problem with my code is that it executes the subroutine only once, and it only shows:
279 clinton
Normally, in this example, it should show 3 words. When I add print $b; in the subroutine, it shows the different numbers.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my @a,my $i,my $k;
my $j;
my $fich_in = "C:\\charm\\ats\\4.con";
my $fich_nom = "C:\\charm\\ats\\ats_dict.txt";
open(FICH1, "<$fich_in")|| die "Problème d'ouverture : $!";
open my $fh, '<', $fich_nom;
#here i put the file into a table
while (<FICH1>) {
my $ligne=$_;
chomp $ligne;
my @numb=split(/-/,$ligne);
my $b=$numb[0];
$k=$#uniq+1;
print "$b\n";
my_handle($fh,$b);
}
sub my_handle {
my ($handle,$b) = @_;
my $content = '';
#print "$b\n";
## or line wise
while (my $line = <$handle>){
my @liste2=split(/\s/,$line);
if($liste2[1]==$b){
$i=$liste2[0];
print "$b $i";}
}
return $i;
}
close $fh;
close(FICH1);
The common approach to similar problems is to hash the "dictionary" first, then iterate over the second file and look up replacements in the hash table:
#!/usr/bin/perl
use warnings;
use strict;
my $fich_in = '4.con';
my $fich_nom = 'ats_dict.txt';
# $F1 is the dictionary (word code), $F2 is the file with the lines of numbers
open my $F1, '<', $fich_nom or die "Problème d'ouverture $fich_nom : $!";
open my $F2, '<', $fich_in or die "Problème d'ouverture $fich_in : $!";
my %to_word;
while (<$F1>) {
my ($word, $code) = split;
$to_word{$code} = $word;
}
while (<$F2>) {
my ($number_string, $final_num) = split / - /;
my @words = split ' ', $number_string;
$words[0] = $to_word{ $words[0] } || $words[0];
print "@words - $final_num";
}
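The code above replaces only the first number on each line. If every number that has a dictionary entry should be replaced, a variation could look roughly like the sketch below. Note the assumptions: translate_line is a name made up for this example, the dictionary is inlined from the question's sample data rather than read from a file, and tokens are assumed to be separated by whitespace as in the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample dictionary taken from the question (code => word).
my %to_word = (279 => 'clinton', 553 => 'capital', 1410 => 'fond|fonds');

# translate_line is a hypothetical helper: it swaps every token that has
# a dictionary entry for its word and leaves all other tokens untouched.
sub translate_line {
    my ($line, $dict) = @_;
    chomp $line;
    return join ' ', map { exists $dict->{$_} ? $dict->{$_} : $_ } split ' ', $line;
}

print translate_line("279 695 696 152 - 574\n", \%to_word), "\n";
```

Using exists rather than || keeps a token unchanged only when it is truly absent from the hash, which also behaves sensibly if a dictionary word could ever be "0" or empty.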
I have two files. One file has a list of values like so
NC_SNPStest.txt
250
275
375
The other file has space delimited information. Column one is the first value of a range, Column two has the second value of a range, Column 5 has the name of the range, and Column eight has what acts on that range.
promoterstest.txt
20 100 yaaX F yaaX 5147 5.34 Sigma70 99
200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35
350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51
I am trying to write a script that takes the first line from file 1 and then parses file 2 line by line to see whether that value falls in the range between the first two columns.
When the first match is found, I want to print the value from file 1 and then the values in file 2 for columns 5 and 8 from the line with the match. If no match is found in File 2 then just print the value from File 1 and move on.
It seems like it should be a simple enough task, but I'm having an issue cycling through both files.
This is what I have written:
#!/usr/bin/perl
use warnings;
use strict;
open my $PromoterFile, '<', 'promoterstest.txt' or die $!;
open my $SNPSFile, '<', 'NC_SNPtest.txt' or die $!;
open (FILE, ">PromoterMatchtest.txt");
while (my $SNPS = <$SNPSFile>) {
chomp ($SNPS);
while (my $Cord = <$PromoterFile>) {
chomp ($Cord);
my @CordFile =split(/\s/, $Cord);
my $Lend = $CordFile[0];
my $Rend = $CordFile[1];
my $Promoter = $CordFile[4];
my $SigmaFactor = $CordFile[7];
foreach $a ($SNPS)
{
if ($a >= $Lend && $a <= $Rend)
{
print FILE "$a\t$CordFile[4]\t$CordFile[7]\n";
}
else
{
print FILE "$a\n";
}
}
}
}
close FILE;
close $PromoterFile;
close $SNPSFile;
exit;
So far my output looks like so:
250
250 yaaAp1 Sigma70
250
Where the first line of file 1 is being called and file 2 is being cycled through. But the else command is being used on each line of file 2 and the script never cycles through the other lines of file 1.
Your problem is that you're not resetting your position in the second file. You read one line from $SNPSFile and check it against every line in the second file.
But when you start over, you're already at the end of file, so:
while (my $Cord = <$PromoterFile>) {
Doesn't have anything to read.
A quick fix for this would be to add a seek call in there, but that makes for inefficient code. I'd suggest instead reading the second file into an array and referencing that.
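For reference, the seek quick fix mentioned above could look roughly like this. It is only a sketch: the two files are replaced by short in-memory strings so that it runs on its own, and the names $snp_fh, $prom_fh and @seen are invented for the example. The inner file is still re-read once per SNP, which is why it does not scale well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-ins for the two files, so the sketch runs on its own.
my $snps      = "250\n275\n";
my $promoters = "20 100 yaaX\n200 300 yaaAp1\n";

open my $snp_fh,  '<', \$snps      or die $!;
open my $prom_fh, '<', \$promoters or die $!;

my @seen;
while (my $snp = <$snp_fh>) {
    chomp $snp;
    # Rewind the inner file before each pass; without this the inner
    # loop only runs for the first SNP because the handle stays at EOF.
    seek $prom_fh, 0, 0 or die "seek failed: $!";
    while (my $row = <$prom_fh>) {
        my ($lend, $rend, $name) = split ' ', $row;
        push @seen, "$snp $name" if $snp >= $lend && $snp <= $rend;
    }
}
print "$_\n" for @seen;
```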
Here's a first draft rewrite that may help.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
open my $PromoterFile, '<', 'promoterstest.txt' or die $!;
open my $SNPSFile, '<', 'NC_SNPtest.txt' or die $!;
open my $output, ">", "PromoterMatchtest.txt" or die $!;
my @data;
while (<$PromoterFile>) {
chomp;
my #CordFile = split;
my $Lend = $CordFile[0];
my $Rend = $CordFile[1];
my $Promoter = $CordFile[4];
my $SigmaFactor = $CordFile[7];
push(
@data,
{ lend => $CordFile[0],
rend => $CordFile[1],
promoter => $CordFile[4],
sigmafactor => $CordFile[7]
}
);
}
print Dumper \@data;
foreach my $value (<$SNPSFile>) {
chomp $value;
my $found = 0;
foreach my $element (@data) {
if ( $value >= $element->{lend}
and $value <= $element->{rend} )
{
#print "Found $value\n";
print {$output} join( "\t",
$value, $element->{promoter}, $element->{sigmafactor} ),
"\n";
$found++;
last;
}
}
if ( not $found ) {
print {$output} $value,"\n";
}
}
close $output;
close $PromoterFile;
close $SNPSFile;
First - we open file2, read in the stuff in it to an array of hashes. (If any of the elements there are unique, we could key off that instead.)
Then we read through SNPSfile one line at a time, looking for each key - printing it if it exists (at least once, on the first hit) and printing just the key if it doesn't.
This generates the output:
250 yaaAp1 Sigma70
275 yaaAp1 Sigma70
375 yaaAp2 Sigma70
Was that what you were aiming for?
That is aside from the 'Dumper' statement, which outputs the content of @data like this:
$VAR1 = [
{
'sigmafactor' => 'Sigma70',
'promoter' => 'yaaX',
'lend' => '20',
'rend' => '100'
},
{
'sigmafactor' => 'Sigma70',
'promoter' => 'yaaAp1',
'rend' => '300',
'lend' => '200'
},
{
'promoter' => 'yaaAp2',
'sigmafactor' => 'Sigma70',
'rend' => '400',
'lend' => '350'
}
];
Here's my take on a programming solution. It's important to
Use lexical file handles and the three-parameter form of open
Keep to lower-case letters, digits and underscores for local variables
I have also used the autodie pragma to remove the need to test the status of open explicitly, and the first function from the core library List::Util to make the code clearer and more concise
use strict;
use warnings;
use 5.010;
use autodie;
use List::Util 'first';
my @promoters;
{
open my $fh, '<', 'promoterstest.txt';
while ( <$fh> ) {
my @fields = split;
push @promoters, [ @fields[0,1,4,7] ];
}
}
open my $fh, '<', 'NC_SNPStest.txt';
open my $out_fh, '>', 'PromoterMatchtest.txt';
select $out_fh;
while ( <$fh> ) {
my ($num) = split;
my $match = first { $num >= $_->[0] and $num <= $_->[1] } @promoters;
if ( $match ) {
print join("\t", $num, @{$match}[2,3]), "\n";
}
else {
print $num, "\n";
}
}
output
250 yaaAp1 Sigma70
275 yaaAp1 Sigma70
375 yaaAp2 Sigma70
I'm trying to improve my script in which I hope to match characters in input.txt (column 4: H1, 2HB, CA, HB3) to dictionary.txt and replace with appropriate characters from dictionary.txt (column 2: H, HB, C, 3HB). Using dictionary.txt as a dictionary:
input.txt
1 N 22 H1 MET
1 H 32 2HB MET
1 C 40 CA MET
2 H 35 HB3 ASP
dictionary.txt
MET H H1
MET HB 2HB
MET C CA
ASP 3HB HB3
output
1 N 22 H MET
1 H 32 HB MET
1 C 40 C MET
2 H 35 3HB ASP
I'm trying to approach this by first matching the word in input.txt (MET) and dictionary.txt (MET) and then performing the substitution. This is what I've written so far:
#!/usr/bin/perl
use strict;
use warnings;
my %dictionary;
open my $dic_fh, '<', 'dictionary.txt' or die "Can't open file: $!";
while (my $ref = <$dic_fh>) {
chomp $ref;
my @columns = split(/\t/, $ref);
my $res_name = $columns[0];
my $ref_nuc = $columns[1];
$dictionary{$res_name} = {$ref_nuc};
open my $in_fh, '<', 'input.txt' or die "Can't open file: $!";
while (my $line = <$in_fh>) {
chomp $line;
my @columns = split(/\t/, $line);
my @name = $columns[3];
if (my $name eq $res_name) {
my $line = $_;
foreach my $res_name (keys %dictionary) {
$line =~ s/$name/$dictionary{$ref_nuc}/;
}
print $line;
}
}
}
The problem seems to be that you are assigning the single field $columns[3] to the array @name and then expecting to find it in $name, which is a separate variable altogether. You even declare $name at the point of the comparison.
You are also executing the statement
$line =~ s/$name/$dictionary{$ref_nuc}/;
once for each key in the hash. That is unnecessary: it needs to be done only once. It is also better to change the value of $columns[3] to $dictionary{$columns[3]} instead of doing a search and replace on the whole line, as the target string may appear in other columns that you don't want to modify
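To illustrate that last point, here is a small sketch. The input line is hypothetical (it is not from the question's data) and was chosen so that the atom name also appears in an earlier column; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical line in which the atom name "CA" also appears in column 2.
my $line = join "\t", 1, 'CA', 40, 'CA', 'MET';
my %dict = (CA => 'C');

# Whole-line substitution: the first "CA" found is in column 2.
(my $by_regex = $line) =~ s/CA/$dict{CA}/;

# Field-wise replacement: only column 4 (index 3) is touched.
my @fields = split /\t/, $line;
$fields[3] = $dict{ $fields[3] } if exists $dict{ $fields[3] };
my $by_field = join "\t", @fields;

print "regex: $by_regex\n";
print "field: $by_field\n";
```

The regex version changes the wrong column, while the field-wise version leaves everything except the fourth field alone.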
It is very simple to do by building a dictionary hash and replacing the fourth field of the input file with its dictionary lookup
use strict;
use warnings;
use 5.010;
use autodie;
open my $fh, '<', 'dictionary.txt';
my %dict;
while ( <$fh> ) {
my ($k, $v) = (split)[2,1];
$dict{$k} = $v;
}
open $fh, '<', 'input.txt';
while ( <$fh> ) {
my @fields = split;
$fields[3] = $dict{$fields[3]};
say join "\t", @fields;
}
output
1 N 22 H MET
1 H 32 HB MET
1 C 40 C MET
2 H 35 3HB ASP
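The question also wanted the residue name (MET, ASP) to match before substituting. If the same atom name could map to different replacements depending on the residue, a nested hash keyed by residue and then by old name is one way to handle it. This is a sketch under that assumption; the sample rows are held in arrays instead of being read from the files, and the variable names are invented here.

```perl
use strict;
use warnings;

# Sample rows copied from the question; a real script would read the files.
my @dictionary = ("MET H H1", "MET HB 2HB", "MET C CA", "ASP 3HB HB3");
my @input      = ("1 N 22 H1 MET", "1 H 32 2HB MET", "1 C 40 CA MET", "2 H 35 HB3 ASP");

# Two-level lookup: residue name, then old atom name => new atom name.
my %dict;
for my $row (@dictionary) {
    my ($residue, $new, $old) = split ' ', $row;
    $dict{$residue}{$old} = $new;
}

my @output;
for my $row (@input) {
    my @fields = split ' ', $row;
    my ($old, $residue) = @fields[3, 4];
    # Keep the original name when the residue/name pair is not listed.
    $fields[3] = $dict{$residue}{$old}
        if exists $dict{$residue} && exists $dict{$residue}{$old};
    push @output, join "\t", @fields;
}
print "$_\n" for @output;
```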
I have a tab-delimited file1:
20 50 80 110
520 590 700 770
410 440 20 50
300 340 410 440
read and put them into an array:
while(<INPUT>)
{
chomp;
push @inputarray, $_;
}
Now I'm looping through another file2:
20, 410, 700
80, 520
300
For each number on each line of file2, I want to search @inputarray for the number. If it exists, I want to grab the corresponding number that follows. For instance, for number 20, I want to grab the number 50. I assume that they are still separated by a tab in the string that exists as an array element in @inputarray.
while(my $line = <INPUT2>)
{
chomp $line;
my @linearray = split("\t", $line);
foreach my $start (@linearray)
{
if (grep ($start, @inputarray))
{
#want to grab the corresponding number
}
}
}
Once grep finds it, I don't know how to grab that array element so that I can find the position of the number and extract the corresponding number, perhaps using the substr function. How do I grab the array element that grep found?
A desired output would be:
line1:
20 50
410 440
700 770
line2:
80 110
520 590
line3:
300 340
IMHO, it would be best to store the numbers from file1 in a hash. Referring to the example content of file1 that you provided above, you can have something like the following:
{
'20' => '50',
'80' => '110',
'520'=> '590',
'700'=> '770',
'410'=> '440',
'20' => '50',
'300'=> '340',
'410' => '440'
}
A sample piece of code will be like
my %inputarray;
while(<INPUT>)
{
chomp;
my @numbers = split;    # split $_ on whitespace
# Walk the fields in pairs: each number maps to the one that follows it
for (my $i = 0; $i + 1 < @numbers; $i += 2)
{
$inputarray{$numbers[$i]} = $numbers[$i+1];
}
}
A demonstration of the above loop:
index: 0 1 2 3
numbers: 20 50 80 110
first iteration: $i=0
$inputarray{$numbers[0]} = $numbers[1];
$i = 2; #$i += 2;
second iteration: $i=2
$inputarray{$numbers[2]} = $numbers[3];
And then while parsing file2, you just need to treat the number as the key of %inputarray.
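The lookup step could be sketched as follows. The hash is filled in by hand with the pairs from the question's file1, and the file2 lines are kept in an array, so nothing is read from disk; @file2 and @report are names invented for this example, and the "(no match)" text is just a placeholder choice.

```perl
use strict;
use warnings;

# Pairs from file1 in the question, already loaded as start => following number.
my %inputarray = (20 => 50, 80 => 110, 520 => 590, 700 => 770, 410 => 440, 300 => 340);

# Lines of file2 from the question (comma-separated start numbers).
my @file2 = ("20, 410, 700", "80, 520", "300");

my @report;
my $lineno = 0;
for my $line (@file2) {
    $lineno++;
    push @report, "line$lineno:";
    for my $start (split /\s*,\s*/, $line) {
        # exists distinguishes "no such key" from a stored value of 0.
        push @report, exists $inputarray{$start}
            ? "$start $inputarray{$start}"
            : "$start (no match)";
    }
}
print "$_\n" for @report;
```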
I believe this gets you close to what you want.
#!/usr/bin/perl -w
my %follows;
open my $file1, "<", $ARGV[0] or die "could not open $ARGV[0]: $!\n";
while (<$file1>)
{
chomp;
my $prev = undef;
foreach my $curr ( split /\s+/ )
{
$follows{$prev} = $curr if (defined $prev && length $prev);
$prev = $curr;
}
}
close $file1;
open my $file2, "<", $ARGV[1] or die "could not open $ARGV[1]: $!\n";
my $lineno = 1;
while (<$file2>)
{
chomp;
print "line $lineno\n";
$lineno++;
foreach my $val ( split /,\s+/, $_ )
{
print $val, " ", ($follows{$val} // "no match"), "\n";
}
print "\n";
}
If you only want to consider numbers from file1 in pairs, as opposed to seeing which numbers follow what other numbers without taking pair boundaries into account, then you need to change the logic in the first while loop slightly.
#!/usr/bin/perl -w
my %follows;
open my $file1, "<", $ARGV[0] or die "could not open $ARGV[0]: $!\n";
while (<$file1>)
{
chomp;
my $line = $_;
while ( $line =~ s/(\S+)\s+(\S+)\s*// )
{
$follows{$1} = $2;
}
}
close $file1;
open my $file2, "<", $ARGV[1] or die "could not open $ARGV[1]: $!\n";
my $lineno = 1;
while (<$file2>)
{
chomp;
print "line $lineno\n";
$lineno++;
foreach my $val ( split /,\s+/, $_ )
{
print $val, " ", ($follows{$val} // "no match"), "\n";
}
print "\n";
}
If you want to read the input once but check for numbers many times, you might be better off splitting each input line into individual numbers and adding each number as a key to a hash, with the following number as its value. That makes reading slower and takes more memory, but the second part, where you check for following numbers, will be a breeze thanks to exists and the nature of hashes.
If I understood your question correctly, you could use just one big hash. That assumes, of course, that every number is always followed by the same number.
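Whether that assumption holds can be checked while the hash is being built. The sketch below takes the fields in pairs and records any number that shows up with two different followers. The last row ("20 99") is hypothetical and was added only to trigger the check; %follows and @conflicts are names chosen for the example.

```perl
use strict;
use warnings;

# file1 rows from the question plus one hypothetical conflicting row
# (20 followed by 99) added purely to trigger the check.
my @rows = ("20 50 80 110", "520 590 700 770", "410 440 20 50", "300 340 410 440", "20 99");

my (%follows, @conflicts);
for my $row (@rows) {
    my @numbers = split ' ', $row;
    # Take the fields two at a time: (start, following number).
    while (my ($start, $next) = splice @numbers, 0, 2) {
        last unless defined $next;    # ignore a trailing unpaired field
        if (exists $follows{$start} && $follows{$start} != $next) {
            push @conflicts, "$start: $follows{$start} vs $next";
            next;
        }
        $follows{$start} = $next;
    }
}
print "conflict $_\n" for @conflicts;
```

If conflicts are expected in real data, a hash of arrays (one list of followers per number) would be the natural next step.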
Code :
#!/usr/bin/perl
my $file = $ARGV[0];
my $position = $ARGV[1]; # POSITION OF THE RESIDUE
open (FILE, $file);
while (<FILE>) {
my @f = split;
if (($f[0] == "ANNOT_RESID_NO") && ($f[1] == $position)){
push @line, $_;
}
}
print @line;
close(FILE);
INPUT :
ANNOT_TYPE[1] 0
ANNOT_TYPE_NAME[1] CATRES
ANNOT_NUMBER[1][1] 1
ANNOT_NAME[1][1] 3.1.3.16
ANNOT_DESC[1][1] Phosphoprotein phosphatase.
ANNOT_RESID_NO[1][1][1] 91
ANNOT_RESID_NAME[1][1][1] ASP
ANNOT_RESID_NUM[1][1][1] 95
ANNOT_RESID_NO[1][1][2] 92
ANNOT_RESID_NAME[1][1][2] ARG
ANNOT_NRESID[1][1] 6
ANNOT_NUMBER[1][2] 2
ANNOT_NAME[1][2] 3.1.3.53
ANNOT_DESC[1][2] [Myosin-light-chain] phosphatase.
ANNOT_RESID_NO[1][2][1] 91
ANNOT_RESID_NAME[1][2][1] ASP
ANNOT_RESID_NUM[1][2][1] 95
ANNOT_RESID_NO[1][2][2] 92
ANNOT_RESID_NAME[1][2][2] ARG
Question :
I am printing each line that starts with "ANNOT_RESID_NO" and has $position (for example 91). Along with this line, I also want @line to contain, each time, the first line above the match that contains "ANNOT_DESC". This "ANNOT_DESC" line is not necessarily the line just above my matched line.
Try (complete code):
#!/usr/bin/perl
use strict;
use warnings;
my $file = $ARGV[0];
my $position = $ARGV[1];
open (FILE, $file) or die $!;
my $desc;
my @line;
while (<FILE>) {
my @f = split " ";
if ( $f[0] =~ /^ANNOT_DESC/ ) {
# Remember the most recent description line
$desc = $_;
next;
}
if ( $f[0] =~ /^ANNOT_RESID_NO/ and $f[1] == $position ) {
push @line, $desc, $_;
}
}
print @line;
close(FILE);
output:
ANNOT_DESC[1][1] Phosphoprotein phosphatase.
ANNOT_RESID_NO[1][1][1] 91
ANNOT_DESC[1][2] [Myosin-light-chain] phosphatase.
ANNOT_RESID_NO[1][2][1] 91
With a data set that small, you can push the lines from the file onto an array (e.g. @file_data), iterate over @file_data, and push the values you want onto your @line array.
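A rough sketch of that array-based idea: hold the lines in @file_data, and for each matching ANNOT_RESID_NO line walk backwards to the nearest ANNOT_DESC line. Only a few lines of the question's input are included here, hard-coded so the example runs without a file, and $position is fixed at 91 instead of being taken from the command line.

```perl
use strict;
use warnings;

# A few lines from the question's input, held in memory for the sketch.
my @file_data = (
    "ANNOT_DESC[1][1] Phosphoprotein phosphatase.\n",
    "ANNOT_RESID_NO[1][1][1] 91\n",
    "ANNOT_RESID_NAME[1][1][1] ASP\n",
    "ANNOT_RESID_NO[1][1][2] 92\n",
    "ANNOT_DESC[1][2] [Myosin-light-chain] phosphatase.\n",
    "ANNOT_RESID_NO[1][2][1] 91\n",
);
my $position = 91;

my @line;
for my $i (0 .. $#file_data) {
    my ($tag, $value) = split ' ', $file_data[$i];
    next unless $tag =~ /^ANNOT_RESID_NO/ && $value == $position;
    # Walk backwards from the match to the nearest ANNOT_DESC line.
    for (my $j = $i - 1; $j >= 0; $j--) {
        if ($file_data[$j] =~ /^ANNOT_DESC/) {
            push @line, $file_data[$j];
            last;
        }
    }
    push @line, $file_data[$i];
}
print @line;
```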
I am new to Perl and trying to learn it. I have two files, 'file1' and 'file2', I need to find which symbols in 'file1' are not in 'file2' for companyA and departments B and C.
File1
GTY
TTY
UJK
TRE
File2
departmentA_companyA.try=675 UJK 88 KKR
departmentA_companyB.try=878 UJK 37 TAR
departmentA_companyC.try=764 UJK 92 PAM
departmentB_companyA.try=675 UJK 88 KKR
departmentB_companyB.try=878 UJK 37 TAR
departmentB_companyC.try=764 UJK 92 PAM
departmentC_companyA.try=675 UJK 88 KKR
departmentC_companyB.try=878 UJK 37 TAR
departmentC_companyC.try=764 UJK 92 PAM
Create a list of all the symbols in file1
Go through file2. If the criteria matches, delete the symbol from the list.
In this case, I'd suggest you use the keys of a hash to store this list ($symbols{$symbol} = 1;). This is because it's easy and cheap to delete from a hash (delete $symbols{$symbol};).
Spoiler:
use strict;
use warnings;
use feature qw( say );
my %symbols;
{
open(my $fh, '<', 'file1')
or die("Can't open file1: $!\n");
while (<$fh>) {
chomp;
++$symbols{$_};
}
}
{
open(my $fh, '<', 'file2')
or die("Can't open file2: $!\n");
while (<$fh>) {
chomp;
my ($key, $val) = split /=/;
my ($dept, $co) = split /[_\.]/, $key;
if ($co eq 'companyA' && ($dept eq 'departmentB' || $dept eq 'departmentC')) {
my @symbols = split ' ', $val;
delete @symbols{@symbols};
}
}
}
say for keys %symbols;
You can use a hash to count the number of times each symbol appears in the file, then print the ones that have a count of 0.
use strict;
open SYMS, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
open INFILE, '<', $ARGV[1] or die "Can't open $ARGV[1]: $!";
my %symbols;
while (<SYMS>) {
chomp;
$symbols{$_} = 0;
}
while (<INFILE>) {
my @F=split;
next unless $F[0] =~ /companyA/;
next unless $F[0] =~ /department[BC]/;
++$symbols{$F[1]} if (defined $symbols{$F[1]});
++$symbols{$F[3]} if (defined $symbols{$F[3]});
}
for my $symbol (keys %symbols) {
print "$symbol\n" if $symbols{$symbol} == 0;
}