Script conversion from Perl to shell [closed]

The following is the code in Perl.
Can we write the same thing as a shell script?
If yes, how?
I have used associative arrays but have been unable to achieve what this code does.
open MYFILE, "<", "$ARGV[0]" or die "Can't open $ARGV[0] file \n";
############ to retrieve the info and put it in an associative array ##############
$line = <MYFILE>;
@line1 = split(/,/ , $line);
$length = @line1;
$count = 0;
while($count < $length)
{
$line1[$count] =~ s/^\"//;
$line1[$count] =~ s/\"$//;
$count++;
}
$line = <MYFILE>;
@line2 = split(/,/ , $line);
$length = @line2;
$count = 0;
while($count < $length)
{
$line2[$count] =~ s/^\"//;
$line2[$count] =~ s/\"$//;
$count++;
}
$count = 0;
while($count < $length)
{
$array{$line1[$count]}=$line2[$count];
$count++;
}

Of course you can translate that to a shell script: Just wrap the perl script in a here-doc, pass it to perl, and put #!/bin/sh at the top…
#!/bin/sh
perl - <<'END' "$1"
...
END
But more seriously, you might achieve enlightenment by rewriting the code in a different fashion. What you are doing is reading a line, splitting it at commata, and removing quotation marks at the beginning and end of each field:
sub get_fields {
map { s/^"//; s/"$//; $_ } split /,/, $_[0];
}
my @keys = get_fields scalar <>; # 1st line
my @vals = get_fields scalar <>; # 2nd line
my %hash;
@hash{ @keys } = @vals;
Except for the slice operation at the end, you can now more easily rewrite the code because it uses data flow instead of structured programming as the predominant paradigm. Not to mention that my code is shorter by an order of magnitude (in base 3).
If you are writing code for production purposes, don't do this. It will break. I assume you are processing CSV. Stick with Perl, and use Text::CSV. Then:
use strict; use warnings; use autodie;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 });
open my $fh, "<:utf8", $ARGV[0];
my $keys = $csv->getline($fh);
my $vals = $csv->getline($fh);
my %hash;
@hash{@$keys} = @$vals;
It isn't much longer, and it is far less likely to break, because it doesn't split on commas inside quotes.
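To illustrate the point about commas inside quotes, here is a minimal sketch that uses only core modules. It compares the naive split with parse_line from the core Text::ParseWords module. The sample line and the helper names naive_fields and quoted_fields are made up for this example, and Text::CSV remains the better choice for real CSV data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample line: the second field contains a comma inside quotes.
my $line = 'id,"Smith, John",42';

# Naive approach: split on every comma, then strip surrounding quotes.
sub naive_fields {
    my ($text) = @_;
    return map { my $f = $_; $f =~ s/^"//; $f =~ s/"$//; $f } split /,/, $text;
}

# Quote-aware approach: parse_line(delimiter, keep_quotes, line).
sub quoted_fields {
    my ($text) = @_;
    return parse_line(',', 0, $text);
}

my @naive  = naive_fields($line);
my @quoted = quoted_fields($line);

print scalar(@naive),  " fields: ", join(' | ', @naive),  "\n";
print scalar(@quoted), " fields: ", join(' | ', @quoted), "\n";
```

The naive version reports four fields because it cuts "Smith, John" in two, while the quote-aware version reports three.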

Related

How do I read strings into a hash in Perl [closed]

I have a file containing a series of random A's, G's, C's and T's that looks like this:
>Mary
ACGTACGTACGTAC
>Jane
CCCGGCCCCTA
>Arthur
AAAAAAAAAAT
I took those letters and concatenated them to end up with ACGTACGTACGTACCCCGGCCCCTAAAAAAAAAAT. I now have a series of positions within that concatenated sequence that are of interest to me, and I want to find the names associated with those positions (coordinates). I'm using the Perl function length to calculate the length of each sequence, and then associating the cumulative length with the name in a hash.
So far I have:
#! /usr/bin/perl -w
use strict;
my $seq_input = $ARGV[0];
my $coord_input = $ARGV[1];
my %idSeq; #Stores sequence and associated ID's.
open (my $INPUT, "<$seq_input") or die "unable to open $seq_input";
open (my $COORD, "<$coord_input") or die "unable to open $fcoord_input";
while (<$INPUT>) {
if ($_ = /^[AGCT/) {
$idSeq{$_
my $id = ( /^[>]/)
#put information into a hash
#loop through hash looking for coordinates that are lower than the culmulative length
foreach $id
$totallength = $totallength + length($seq)
$lengthId{$totalLength} = $id
foreach $position
foreach $length
if ($length >= $position) { print; last }
close $fasta_input;
close $coord_input;
print "Done!\n";
So far I'm having trouble reading the file into a hash. Also would I need an array to print the hash?
It's not completely clear what you want; maybe this:
my $seq;
my %idSeq;
while ( my $line = <$INPUT> ) {
if ( my ($name) = $line =~ /^>(.*)/ ) {
$idSeq{$name} = length $seq || 0;
}
else {
chomp $line;
$seq .= $line;
}
}
which produces:
$seq = 'ACGTACGTACGTACCCCGGCCCCTAAAAAAAAAAAT';
%idSeq = (
'Mary' => 0,
'Jane' => 14,
'Arthur' => 25,
);
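With the start offsets above, a position can be mapped back to a name by finding the last sequence whose start offset is not greater than the position. The following is a minimal sketch, assuming 0-based positions into the concatenated string. The helper name_for_position and the $total_length variable are hypothetical, and the hard-coded hash simply mirrors the output shown above.

```perl
use strict;
use warnings;

# Start offsets as produced above (name => 0-based start in the concatenation).
my %idSeq = ( Mary => 0, Jane => 14, Arthur => 25 );
my $total_length = 36;    # assumed: 14 + 11 + 11 for the sample file

# Return the name whose sequence covers $pos, or undef if $pos is out of range.
sub name_for_position {
    my ($starts, $total, $pos) = @_;
    return if $pos < 0 || $pos >= $total;
    my $found;
    for my $name (sort { $starts->{$a} <=> $starts->{$b} } keys %$starts) {
        last if $starts->{$name} > $pos;
        $found = $name;
    }
    return $found;
}

for my $pos (0, 13, 14, 30) {
    my $name = name_for_position(\%idSeq, $total_length, $pos);
    print "$pos => ", $name // 'none', "\n";
}
```

If your coordinates are 1-based, subtract one from each position before the lookup.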

can any one help me to fix the error from this script? [closed]

The code below shows errors. Kindly help me fix them.
package My::count
use Exporter qw(import);
our @Export_ok=qw(line_count);
sub line_count {
my $line=@_;
return $line;
}
I saved the above code in count.pm
use My::count qw(line_count);
open INPUT,"<filename.txt";
$line++;
print line count is $line \n";
I saved this file as a.pl
Let's look at this code in some detail.
# There's a typo on the line below. It should be /usr/bin/perl
#!usr/bin/perl
# Never comment out these lines.
# Instead, take the time to fix the problems.
# Oh, and it's "warnings", not "warning".
#use strict;
#use warning;
# Always check the return value from open()
# Please use lexical filehandles.
# Please use three-arguments to open().
# open my $ip_fh, '<', 'test1.txt' or die $!;
open IP,"<test1.txt";
my ($line_count,$word_count)=(0,0);
# You're rather fighting against Perl here.
# If you do things in a Perlish manner then it all becomes easier
while(my $line=<IP>) {
$line_count++;
my @words_on_this_line= split(" ",$line);
$word_count+= scalar(@words_on_this_line);
}
print"This file contains $line_count lines\n";
print"this file contains $word_count words\n";
# It all goes a bit wrong here. You don't have an array called @IP.
# I think you wanted to iterate across the file again.
# Either use seek() to return to the start of the file, or store
# the lines in @IP as you're reading them.
# Also, you need to declare $line.
foreach $line(@IP) {
if ($line =~ /^>/) {
print $line;
}
}
close IP;
I would do something like this.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010; # for say()
my $filename = shift || die "Must give file name\n";
open my $fh, '<', $filename or die "Can't open $filename: $!\n";
my ($word_count, @matches) = (0); # start the count at 0 so an empty file doesn't warn
while (<$fh>) {
# By default, split() splits $_ on whitespace
my @words_on_line = split;
$word_count += @words_on_line;
push @matches, $_ if /^>/;
}
# No need to count lines, Perl does that for us (in $.)
say "This file contains $. lines";
say "This file contains $word_count words";
print @matches;
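As for the module in the question itself, the visible problems are the missing semicolon after the package statement, the misspelled @EXPORT_OK (Exporter only looks at the upper-case name), and the missing true value at the end of count.pm. The sketch below shows one way the pieces could fit together. It inlines the package into a single file so it can be run as-is, and it assumes that line_count is meant to count the lines readable from a filehandle, which the question does not actually state.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inlined so the sketch runs as one file; normally this would live in My/count.pm
# and end with "1;".
BEGIN {
    package My::count;
    use Exporter qw(import);
    our @EXPORT_OK = qw(line_count);    # must be upper-case for Exporter to see it

    # Assumed behaviour: count the lines readable from a filehandle.
    sub line_count {
        my ($fh) = @_;
        my $count = 0;
        while (my $line = <$fh>) { $count++ }
        return $count;
    }

    $INC{'My/count.pm'} = __FILE__;     # tell "use" the module is already loaded
}

use My::count qw(line_count);

# In-memory stand-in for filename.txt (contents are made up).
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
print "line count is ", line_count($fh), "\n";
close $fh;
```

With a real count.pm under a My/ directory, drop the BEGIN wrapper and the %INC line, and make sure the directory containing My/ is in @INC (for example with use lib).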

How to match numbers that lie outside the range [closed]

I am trying to print the values from the ranges in @arr3 that lie outside the ranges in @arr4 (that is, not included in any range of @arr4), but I am not getting the desired output. Please suggest modifications to the following code so that it outputs 1,2,8,13 (without repeating any values).
File 1: result
1..5
5..10
10..15
File 2: annotation
3..7
9..12
14..17
Code:
#!/usr/bin/perl
open($inp1, "<result") or die "not found";
open($inp2, "<annotation") or die "not found";
my @arr3 = <$inp1>;
my @arr4 = <$inp2>;
foreach my $line1 (@arr4) {
foreach my $line2 (@arr3) {
my ($from1, $to1) = split(/\.\./, $line1);
my ($from2, $to2) = split(/\.\./, $line2);
for (my $i = $from1 ; $i <= $to1 ; $i++) {
for (my $j = $from2 ; $j <= $to2 ; $j++) {
$res = grep(/$i/, @result); #to avoid repetition
if ($i != $j && $res == 0) {
print "$i \n";
push(@result, $i);
}
}
}
}
}
Try this:
#!/usr/bin/perl
use strict;
open (my $inp1,"<result.txt") or die "not found";
open (my $inp2,"<annotation.txt") or die "not found";
my @result;
my @annotation;
foreach my $line2 (<$inp2>) {
my ($from2,$to2)=split(/\.\./,$line2);
@annotation = (@annotation, $from2..$to2);
}
print join(",",@annotation),"\n";
my %in_range = map {$_=> 1} @annotation;
foreach my $line1 (<$inp1>) {
my ($from1,$to1)=split(/\.\./,$line1);
@result = (@result, $from1..$to1);
}
print join(",",@result),"\n";
my %tmp_hash = map {$_=> 1} @result;
my @unique = sort {$a <=> $b} keys %tmp_hash;
print join(",",@unique),"\n";
my @out_of_range = grep {!$in_range{$_}} @unique;
print join(",",@out_of_range),"\n";
The print statements are temporary, of course, to help show what's going on when you run this. The basic idea is you use one hash to eliminate duplicate numbers in your "result", and another hash to indicate which ones are in the "annotations".
If you used pattern-matching rather than split then I think it would be a little easier to make this ignore extra lines of input that aren't ranges of numbers, in case you ever have input files with a few "extra" lines that you need to skip over.
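The pattern-matching idea could look something like the sketch below. The helpers parse_range and numbers_in are hypothetical names, and the in-line sample data (including the comment line and the blank line) is made up to show that non-range lines are skipped.

```perl
use strict;
use warnings;

# Return (from, to) for a line like "3..7", or an empty list otherwise.
sub parse_range {
    my ($line) = @_;
    return $line =~ /^\s*(\d+)\.\.(\d+)\s*$/ ? ($1, $2) : ();
}

# Expand every valid range line into a set (hash ref) of the numbers it covers.
sub numbers_in {
    my (@lines) = @_;
    my %set;
    for my $line (@lines) {
        my ($from, $to) = parse_range($line) or next;    # skip non-range lines
        $set{$_} = 1 for $from .. $to;
    }
    return \%set;
}

# Sample input with a few "extra" lines mixed in.
my @result     = ("1..5\n", "# a comment line\n", "5..10\n", "10..15\n");
my @annotation = ("3..7\n", "\n", "9..12\n", "14..17\n");

my $wanted   = numbers_in(@result);
my $excluded = numbers_in(@annotation);
my @out = grep { !$excluded->{$_} } sort { $a <=> $b } keys %$wanted;
print join(',', @out), "\n";    # prints 1,2,8,13
```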
If the contents of the files are under your control, you can use eval to parse them. If the files might contain anything other than what you specified, however, the following is dangerous to use.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Data::Dumper;
open my $inc, '<', 'result';
open my $exc, '<', 'annotation';
my (%include, %exclude, @result);
while (<$inc>) { $include{$_} = 1 for eval $_ }
while (<$exc>) { $exclude{$_} = 1 for eval $_ }
for (sort {$a <=> $b} keys %include) {
push @result, $_ unless $exclude{$_}
}
print Dumper \@result;
Returns:
$VAR1 = [ 1, 2, 8, 13 ];
The only major tool you need is a %seen style hash as represented in perlfaq4 - How can I remove duplicate elements from a list or array?
The following opens filehandles to string references, but obviously these could be replaced with the appropriate file names:
use strict;
use warnings;
use autodie;
my %seen;
open my $fh_fltr, '<', \ "3..7\n9..12\n14..17\n";
while (<$fh_fltr>) {
my ($from, $to) = split /\.{2}/;
$seen{$_}++ for $from .. $to;
}
my @result;
open my $fh_src, '<', \ "1..5\n5..10\n10..15\n";
while (<$fh_src>) {
my ($from, $to) = split /\.{2}/;
push @result, $_ for grep {!$seen{$_}++} $from .. $to;
}
print "@result\n";
Outputs:
1 2 8 13

Analyzing a txt list in perl [closed]

I am trying to analyze a list of coordinates. The txt file is set up like this:
ID START END
A 10 20
B 15 17
C 20 40
How would I check whether START and END fall within a user-specified region, e.g. START=10 END=15?
Any help greatly appreciated
// edit //
if(@AGRV != 4) {
print STDOUT "Searches genomic data for CNV within range. \n";
print STDOUT "CNV FILE FORMAT: <ID><CHR>BPS><BPE><AGE><etc...> \n";
print STDOUT "USAGE: [CNVLIST][CHR][BPS][BPE][OUTFILE] \n";
exit;
}
open(CNVLIST,"<$ARGV[0]");
open(OUTFILE,">$ARGV[3]");
$BPS = $ARGV[1];
$BPE = $ARGV[2];
#put CNV file in hash table
$line = <CNVFILE>;
while($line = <CNVFILE>) {
chomp $line;
($Cchr,$CS,$CE,$CID) = split(/\t/,$line);
}
I need to look through each line and find if the start/end lies within the user specified range.
It is unclear whether you can assume that an ID never appears more than once, but assuming it doesn't, you can use a hash to store the lines that are within the range. If an ID might be repeated, I think you can push @{$result{$id}}, [$start, $end]; instead, but that makes the data structure a bit more complicated.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $in_file = "input.txt";
# User-specified region
my $range_start = 10;
my $range_end = 15;
open my $fh, '<', $in_file or die $!;
my %result;
while (<$fh>) {
my ($id, $start, $end) = split " ", $_;
next unless $start =~ /\d/;
# Swap if START is larger than END
($start, $end) = ($end, $start) if $start > $end;
$result{$id} = [$start, $end]
if $start >= $range_start and $end <= $range_end;
}
print Dumper(\%result);
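If an ID can appear more than once, the array-of-pairs variant might look like the following sketch. The helper regions_within is a hypothetical name, and the sample lines (with ID "A" repeated) are made up. The filter itself is the same "region lies inside the range" test used above.

```perl
use strict;
use warnings;

# Collect every (start, end) pair per ID that lies inside the given range.
sub regions_within {
    my ($range_start, $range_end, @lines) = @_;
    my %result;
    for my $line (@lines) {
        my ($id, $start, $end) = split ' ', $line;
        # Skip the header and any malformed line.
        next unless defined($end) && $start =~ /^\d+$/ && $end =~ /^\d+$/;
        ($start, $end) = ($end, $start) if $start > $end;
        push @{ $result{$id} }, [ $start, $end ]
            if $start >= $range_start && $end <= $range_end;
    }
    return \%result;
}

# Hypothetical data in which ID "A" appears twice.
my @lines = ("ID START END", "A 10 12", "B 15 17", "A 13 15", "C 20 40");
my $hits = regions_within(10, 15, @lines);
for my $id (sort keys %$hits) {
    print "$id: ", join(', ', map { "$_->[0]-$_->[1]" } @{ $hits->{$id} }), "\n";
}
```

Each hash value is now a reference to an array of [start, end] pairs, so both matching "A" regions are kept instead of the second overwriting the first.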
You can split() each line and check second and third field:
#!/usr/bin/env perl
use strict;
use warnings;
my ($start, $end) = (shift, shift);
die if $start > $end;
## Skip header
<>;
while ( <> ) {
chomp;
my #f = split;
if ( $f[1] <= $start && $f[2] >= $end ) {
printf qq|%s\n|, $_;
}
}
It accepts three arguments: the start of the region, the end of the region, and the file to process. It prints every line that passes the condition.
Run it like:
perl script.pl 10 15 infile
That yields:
A 10 20
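Note that the two answers test different things: the first keeps regions that lie inside the user's range, while the second keeps regions that contain it. If "included" is meant as any overlap, a third test is needed. The sketch below puts the three conditions side by side, assuming inclusive endpoints. The predicate names is_within, contains and overlaps are hypothetical.

```perl
use strict;
use warnings;

# Each predicate takes (region_start, region_end, range_start, range_end).
sub is_within { my ($s, $e, $rs, $re) = @_; return $s >= $rs && $e <= $re }  # region inside range
sub contains  { my ($s, $e, $rs, $re) = @_; return $s <= $rs && $e >= $re }  # region covers range
sub overlaps  { my ($s, $e, $rs, $re) = @_; return $s <= $re && $e >= $rs }  # any shared position

# Sample data from the question, with the user range START=10 END=15.
my @regions = ( [ A => 10, 20 ], [ B => 15, 17 ], [ C => 20, 40 ] );
my ($range_start, $range_end) = (10, 15);

for my $region (@regions) {
    my ($id, $start, $end) = @$region;
    my @flags = map { $_->($start, $end, $range_start, $range_end) ? 1 : 0 }
                (\&is_within, \&contains, \&overlaps);
    printf "%s within=%d contains=%d overlaps=%d\n", $id, @flags;
}
```

For the sample data, no region lies within 10..15, only A contains it, and both A and B overlap it.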

Common elements from two Perl arrays [duplicate]

Possible Duplicate:
Comparing Two Arrays Using Perl
I am trying to find the elements that are common to both files.
Below is my code. Please tell me what mistake I am doing.
open IN, "New_CLDB.txt" or die "couldn't locate input file";
open IN1, "New_adherent.txt" or die "couldn't locate input file";
use Data::Dumper;
@array = ();
while (<IN>) {
$line = $_;
chomp $line;
$a[$i] = $line;
++$i;
}
while (<IN1>) {
$line1 = $_;
chomp $line1;
$b[$m] = $line1;
++$m;
}
for ( $k = 0; $k < $i; ++$k ) {
for ( $f = 0; $f < $m; ++$f ) {
if ( $a[$k] ne $b[$f] ) {
push( @array, $a[$k] );
}
}
}
print @array, "\n";
Please tell me what mistake I am doing.
From superficially eyeballing your code, here's a list:
not using the strict pragma
not having a precise spec of what you want to achieve
attempting to do too much at once
Take a step away from the code and think about it in plain English. What would you need to do?
read files - open, read, close
read file data into an array - how exactly?
use a function not to repeat yourself for file A and file B
compare arrays
Do each task in isolation, always using strict. Always. Only then compose the single steps to a larger script.
You could also take a look at this other SO question.
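The steps listed above can be sketched as two small functions that are composed at the end. This is only one possible decomposition: read_lines and common_elements are hypothetical names, and the in-memory strings stand in for New_CLDB.txt and New_adherent.txt with made-up contents.

```perl
use strict;
use warnings;

# Read all lines from an open filehandle, without trailing newlines.
sub read_lines {
    my ($fh) = @_;
    chomp(my @lines = <$fh>);
    return @lines;
}

# Return the elements of @$first that also occur in @$second, each reported once.
sub common_elements {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } @$second;
    my %seen;
    return grep { $in_second{$_} && !$seen{$_}++ } @$first;
}

# In-memory stand-ins for the two input files.
my ($text_a, $text_b) = ("apple\nbanana\ncherry\nbanana\n", "banana\ncherry\ndate\n");
open my $fh_a, '<', \$text_a or die "Can't open first input: $!";
open my $fh_b, '<', \$text_b or die "Can't open second input: $!";

my @a_lines = read_lines($fh_a);
my @b_lines = read_lines($fh_b);
close $fh_a;
close $fh_b;

print "$_\n" for common_elements(\@a_lines, \@b_lines);
```

Because each step is its own function, read_lines serves both files and common_elements can be checked on small hand-written arrays before any file is involved.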
If there are no duplicates in the second set:
my %set1;
while (<$fh1>) {
chomp;
++$set1{$_};
}
while (<$fh2>) {
chomp;
print("$_ is common to both sets\n")
if $set1{$_};
}
If there are possibly duplicates in the second set:
my %set1;
while (<$fh1>) {
chomp;
++$set1{$_};
}
my %set2;
while (<$fh2>) {
chomp;
print("$_ is common to both sets\n")
if $set1{$_} && !$set2{$_}++;
}
There are several things one should improve:
Always use strict; and use warnings;
Use the three argument version of open
Use lexical filehandles
Use meaningful formatting/indentation
Append to an array with push @array, $value;
And for SO questions: state exactly what your problem is and what you expect.