Is there a better way of writing this code to avoid redundancy? - perl

I have a segment of code in a program, which accepts GNU style input from a pipe (which is a list of file names). If STDIN does not contain data, I need to accept input from a predetermined text file containing file names.
I find myself needing to write redundant code. Is it possible to simplify this bit of code to avoid redundancy?
sub downloadlinkgen {
my $fh;
print "Printing links\n";
if ($getfilelist==1) {
open $fh, '<', "fuzzyfile" or die $!;
while (<$fh>) {
chomp ($_);
(my $fname,my $path, my $suffix) = fileparse($_);
my ($name, $ext) = $fname =~ /(.*)\.(.*)/;
my $newfile=$path.$name.".$ext";
$newfile =~ s/\s/%20/g;
$newfile =~ s/\/root/http:\/\/myip/;
print $newfile."\n";
}
} else {
while (<>) {
chomp ($_);
(my $fname,my $path, my $suffix) = fileparse($_);
my ($name, $ext) = $fname =~ /(.*)\.(.*)/;
my $newfile=$path.$name.".$ext";
$newfile =~ s/\s/%20/g;
$newfile =~ s/\/root/http:\/\/myip/;
print $newfile."\n";
}
}
}

Yes, just make the default ARGV filehandle open the file:
sub downloadlinkgen {
#ARGV = 'fuzzyfile' if $getfilelist == 1;
print "Printing links\n";
while (<>) {
chomp ($_);
(my $fname,my $path, my $suffix) = fileparse($_);
my ($name, $ext) = $fname =~ /(.*)\.(.*)/;
my $newfile=$path.$name.".$ext";
$newfile =~ s/\s/%20/g;
$newfile =~ s/\/root/http:\/\/myip/;
print $newfile."\n";
}
}

sub downloadlinkgen {
# default file handle
my $fh = \*ARGV;
print "Printing links\n";
if ($getfilelist==1) {
open $fh, '<', "fuzzyfile" or die $!;
}
while (<$fh>) {
chomp ($_);
(my $fname,my $path, my $suffix) = fileparse($_);
my ($name, $ext) = $fname =~ /(.*)\.(.*)/;
my $newfile=$path.$name.".$ext";
$newfile =~ s/\s/%20/g;
$newfile =~ s/\/root/http:\/\/myip/;
print $newfile."\n";
}
}
From perldoc -f readline
Reads from the filehandle whose typeglob is contained in EXPR (or from *ARGV if EXPR is not provided)
so \*ARGV is reference to file handle used when reading from <>, and you can use $fh in both cases.

Even if you don't know about ARGV, you could do something simple like this:
sub downloadlinkgen {
my $fh;
print "Printing links\n";
if ($getfilelist==1) {
open $fh, '<', "fuzzyfile" or die $!;
while (<$fh>) {
process_line($_);
}
} else {
while (<>) {
process_line($_);
}
}
}
sub process_line {
my $line = shift;
chomp ($line);
(my $fname,my $path, my $suffix) = fileparse($line);
my ($name, $ext) = $fname =~ /(.*)\.(.*)/;
my $newfile=$path.$name.".$ext";
$newfile =~ s/\s/%20/g;
$newfile =~ s/\/root/http:\/\/myip/;
print $newfile."\n";
}

Related

Nested if statements: Swapping headers and sequences in fasta files

I am opening a directory and processing each file. A sample file looks like this when opened:
>AAAAA
TTTTTTTTTTTAAAAATTTTTTTTTT
>BBBBB
TTTTTTTTTTTTTTTTTTBBBBBTTT
>CCCCC
TTTTTTTTTTTTTTTTCCCCCTTTTT
For the above sample file, I am trying to make them look like this:
>TAAAAAT
AAAAA
>TBBBBBT
BBBBB
>TCCCCCT
CCCCC
I need to find the "header" in next line sequence, take flanks on either side of the match, and then flip them. I want to print each file's worth of contents to another separate file.
Here is my code so far. It runs without errors, but doesn't generate any output. My guess is this is probably related to the nested if statements. I have never worked with those before.
#!/usr/bin/perl
use strict;
use warnings;
my ($directory) = #ARGV;
my $dir = "$directory";
my #ArrayofFiles = glob "$dir/*";
my $count = 0;
open(OUT, ">", "/path/to/output_$count.txt") or die $!;
foreach my $file(#ArrayofFiles){
open(my $fastas, $file) or die $!;
while (my $line = <$fastas>){
$count++;
if ($line =~ m/(^>)([a-z]{5})/i){
my $header = $2;
if ($line !~ /^>/){
my $sequence .= $line;
if ($sequence =~ m/(([a-z]{1})($header)([a-z]{1}))/i){
my $matchplusflanks = $1;
print OUT ">", $matchplusflanks, "\n", $header, "\n";
}
}
}
}
}
How can I fix this code? Thanks.
Try this
foreach my $file(#ArrayofFiles)
{
open my $fh," <", $file or die"error opening $!\n";
while(my $head=<$fh>)
{
chomp $head;
$head=~s/>//;
my $next_line = <$fh>;
my($extract) = $next_line =~m/(.$head.)/;
print ">$extract\n$head\n";
}
}
There are several mistakes in your code but the main problem is:
if ($line =~ m/(^>)([a-z]{5})/i) {
my $header = $2;
if ($line !~ /^>/) {
# here you write to the output file
Because the same line can't start and not start with > at the same time, your output files are never written. The second if statement always fails and its block is never executed.
open(OUT, ">", "/path/to/output_$count.txt") or die $!; and $count++ are misplaced. Since you want to produce an output file (with a new name) for each input file, you need to put them in the foreach block, not outside or in the while loop.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my ($dir) = #ARGV;
my #files = glob "$dir/*";
my $count;
my $format = ">%s\n%s\n";
foreach my $file (#files) {
open my $fhi, '<', $file
or die "Can't open file '$file': $!";
$count++;
my $output_path = "/path/to/output_$count.txt";
open my $fho, '>', $output_path
or die "Can't open file '$output_path': $!";
my ($header, $seq);
while(<$fhi>) {
chomp;
if (/^>([a-z]{5})/i) {
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
($header, $seq) = ($1, '');
} else { $seq .= $_; }
}
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
}
close $fhi;
close $fho;

RTF to TEXT conversion using perl

Can somebody tell me how can we convert the rtf file into text with all the tags, tables and formatted data using perl programming language ?
#Ahmad Bilal , #petersergeant : I have been using the below code for RTF to TXT conversion and i am able to convert into text. But the problem is i am unable to capture table or image formats and even all the entities in the inputfile are not captured using the program.
use 5.8.0;
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
use RTF::HTMLConverter;
#-------------------------------------------------------------------
#Variable Declarions
#-------------------------------------------------------------------
my $tempfile = "";
my $Outfile = "";
my $txtfile = "";
my $URL = "";
my $Format = "";
my $TreeBuilder = "";
my $Parsed = "";
my $line = "";
my %opts;
GetOptions(
"help|h|?" => \$opts{help},
"man|m" => \$opts{man},
"dom=s" => \$opts{dom},
"noimages|n" => \$opts{noimages},
"imagedir|d=s" => \$opts{imagedir},
"imageuri|u=s" => \$opts{imageuri},
"encoding|e=s" => \$opts{encoding},
"indented|i=i" => \$opts{indented},
);
pod2usage(-verbose => 1, -exitval => 0) if $opts{help};
pod2usage(-verbose => 2, -exitval => 0) if $opts{man};
my %params;
if($opts{dom}){
eval "require $opts{dom}";
die $# if $#;
$params{DOMImplementation} = $opts{dom};
}else{
eval { require XML::GDOME };
if($#){
eval { require XML::DOM };
die "Can't load either XML::GDOME or XML::DOM\n" if $#;
$params{DOMImplementation} = 'XML::DOM';
}
}
if($opts{noimages}){
$params{discard_images} = 1;
}else{
$params{image_dir} = $opts{imagedir} if defined $opts{imagedir};
$params{image_uri} = $opts{imageuri} if defined $opts{imageuri};
}
$params{codepage} = $opts{encoding} if $opts{encoding};
$params{formatting} = $opts{indented} if defined $opts{indented};
#-----------------------------------------------
# Converting RTF to HTML
#-----------------------------------------------
if(defined $ARGV[0]){
open(FR, "< $ARGV[0]") or die "Can't open '$ARGV[0]': $!!\n";
$params{in} = \*FR;
$tempfile = $ARGV[0];
$tempfile =~ /^(.*?)rtf/;
$Outfile = $1."html";
$txtfile = $1."txt";
open(FW, "> $Outfile") or die "Can't open '$Outfile': $!!\n";
$params{out} = \*FW;
print "\n$Outfile - HTML Created\n"
}
my $parser = RTF::HTMLConverter->new(%params);
$parser->parse();
close FW;
#-----------------------------------------------
# Opening HTML and TXT files
#-----------------------------------------------
open (FILE1, ">$txtfile") or die "Can't open '$txtfile': $!!\n";
open (FILE2, "$Outfile") or die "Can't open '$Outfile': $!!\n";
#-----------------------------------------------
# Converting HTML to TXT file
#-----------------------------------------------
local $/ = undef;
while ($line = <FILE2>) {
$line =~ s/\n//g;
$line =~ s/(<!DOCTYPE HTML.*><html><head>.*<\/style>)/<sectd>/;
$line =~ s/<font.*?>//g;
$line =~ s/<\/font>//g;
$line =~ s/<table .*?>/\n<table>\n/g;
$line =~ s/<\/table>/\n<\/table>/g;
$line =~ s/<td .*?>/\n<td>/g;
$line =~ s/<tr>/\n<tr>/g;
$line =~ s/<\/tr>/\n<\/tr>/g;
$line =~ s/<ul.*?>/\n<ul>/g;
$line =~ s/<li.*?>/\n<li>/g;
$line =~ s/<\/ul>/\n<\/ul>/g;
$line =~ s/<\/body><\/html>//g;
$line =~ s/<p.*?>/\n<p>/g;
$line =~ s/<p>( |\*|\s)+<\/p>//g;
$line =~ s/ //g;
$line =~ s/(<sectd>\n?.*?)<\/head><body>/$1/g;
#-------------------
# Entity Conversion
#-------------------
$line =~ s/’/‘/g;
$line =~ s/“/“/g;
$line =~ s/”/”/g;
$line =~ s/¶/¶/g;
print FILE1 $line;
}
print "$txtfile - TXT file Created \n";
close FILE1;
close FILE2;
unlink ("$Outfile");
I am the author of the linked module. Don't use it. If at all possible, shell out to a real RTF to text convertor like Pandoc.
you need to use a module like this:
http://search.cpan.org/~sargie/RTF-Parser-1.12/lib/RTF/TEXT/Converter.pm

Can i collect the output of find(\&wanted, #directories) in an array

I am writing a script which will traverse the directory(including subdir also) and push the desired file in an array so that i can work on each file.
Here is my code:
use strict;
use warnings;
use File::Find;
my $path = $ARGV[0];
find({ wanted => \&GetappropriateFile }, $path);
sub GetappropriateFile
{
my $file = $_;
my #all_file;
# print "$file\n";
if ( -f and /traces[_d+]/)
{
#print "$file\n";
open(my $fh, "<", $file) or die "cannot open file:$!\n";
while( my $line = <$fh>){
$line =~ /Cmd\sline:\s+com.android*/;
push(#all_file,$file);
#print "$file\n";
}
close($fh);
#print"#all_file\n";
}
}
Problem Area : my $file = $_;
Instead of using " $file" if i could get a way to use an array here then i can easily read those files one by one and filter it.
Here what i am tring to do is : I have to open each file and check for the string "Cmd line: com.android" as soon as i get this string in the file i have to push this current file in an array and start reading the another file.
It would be better to avoid global vars.
use strict;
use warnings;
use File::Find qw( find );
sub IsAppropriateFile {
my ($file) = #_;
if (-f $file && $file =~ /traces[_d+]/) {
open(my $fh, "<", $file) or die "cannot open file:$!\n";
while ( my $line = <$fh> ) {
if ($line =~ /Cmd\sline:\s+com.android*/) {
return 1;
}
}
}
return 0;
}
{
my $path = $ARGV[0];
my #matching_files;
find({
wanted => sub {
push #matching_files, $_ if IsAppropriateFile($_);
},
}, $path);
print("$_\n") for #matching_files; # Or whatever.
}
Put declaration of #all_file outside of function, and use it after find() finishes,
my #all_file;
sub GetappropriateFile
{
..
}
You could also stop with file reading after successful match,
if ($line =~ /Cmd\sline:\s+com.android*/) {
push(#all_file, $file);
last;
}

Extracting multiple lines of record/data using a subroutine or functions

Can you show me how to create a subroutine or function using this code?
Basically I want to make my code into a subroutine so I'll be able to re-use it without making my script too long.
Here is my script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my ($tmp_var, $rec_type, $country, $header, $Combline, $records, $line);
my $filename = 'data5.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'drp1.txt' or die $!;
open my $OUTPUTB, ">", 'drp2.txt' or die $!;
while (<$input_fh>) {
$line = _trim($_);
#fields = split (/\|/, $line);
$rec_type = $fields[0];
$country = $fields[1];
my $string = substr $fields[1], 0, 1;
$header = $line if(/^INVHDR/);
if ($rec_type eq 'INVDET') {
if ($string eq 'I') {
$records = $header . $line;
print $OUTPUTA $records, scalar <$input_fh>;
}
else {
$records = $header . $line;
print $OUTPUTB $records, scalar <$input_fh>;
}
}
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
my $word = shift;
if ( $word ) {
$word =~ s/\s*\|/\|/g; #remove trailing spaces
$word =~ s/"//g; #remove double quotes
}
return $word;
}
This is the part of the script that I wanted to put in a subroutine or function:
$line = _trim($_);
#fields = split (/\|/, $line);
$rec_type = $fields[0];
$country = $fields[1];
my $string = substr $fields[1], 0, 1;
$header = $line if (/^INVHDR/);
if ($rec_type eq 'INVDET') {
if ($string eq 'I') {
$records = $header . $line;
print $OUTPUTA $records, scalar <$input_fh>;
}
else {
$records = $header . $line;
print $OUTPUTB $records, scalar <$input_fh>;
}
}
I would suggest breaking it out a little differently and expand on your _trim function, turning it into a parse function:
use strict;
use warnings;
open( my $input_fh, '<', 'data5.txt' ) or die "Can't open $filename: $!";
open( my $OUTPUTA, '>', 'drp1.txt' ) or die $!;
open( my $OUTPUTB, '>', 'drp2.txt' ) or die $!;
my $header = '';
while (<$input_fh>) {
if ($_ =~ /^INVHDR/) {
$header = $_;
}
if ($_ =~ /^INVDET/) {
my #data = parse($_);
my $line = $header . join('|', #data);
# scalar <$input_fh> is almost certainly not doing what you expect,
# though I'm not sure what you're try to accomplish with it
if ( $data[1] =~ /^I/ ) {
print $OUTPUTA $line;
} else {
print $OUTPUTB $line;
}
}
}
sub parse {
my $input = shift || return;
my $input =~ s/"//g; # remove double quotes
# Here I've combined the removal of trailing spaces with the split.
my #fields = split( m{\s*\|}, $input );
return #fields;
}

Using perl, how do I search a text file for _NN (at the end of a word) and print the word in front?

This gives the whole line:
#!/usr/bin/perl
$file = 'output.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /_NN/;
}
close(txt);
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
binmode(STDOUT, ":utf8") || die;
my $file = "output.txt";
open(TEXT, "< :utf8", $file) || die "Can't open $file: $!";
while(<TEXT>) {
print "$1\n" while /(\w+)_NN\b/g;
}
close(TEXT) || die "Can't close $file: $!";
Your answer script reads a bit awkwardly, and has a couple of potential errors. I'd rewrite the main logic loop like so:
foreach my $line (grep { /expend_VB/ } #sentences) {
my #nouns = grep { /_NN/ } split /\s+/, $line;
foreach my $word (#nouns) {
$word =~ s/_NN//;
print "$word\n";
}
print "$line\n" if scalar(#nouns);
}
You need to put the my declaration inside the loop - otherwise it will persist longer than you want it to, and could conceivably cause problems later.
foreach is a more common perl idiom for iterating over a list.
print "$1" if $line =~ /(\S+)_NN/;
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my #sentences = <$tag_corpus>; # This breaks up each line into list
my #words;
for (my $i=0; $i <= #sentences; $i++) {
if ( defined( $sentences[$i] ) and $sentences[$i] =~ /($search_key)_VB.*/i) {
#words = split /\s/,$sentences[$i]; ## \s is a whitespace
for (my $j=0; $j <= #words; $j++) {
#FILTER if word is noun:
if ( defined( $words[$j] ) and $words[$j] =~ /_NN/) {
#PRINT word and sentence:
print "**",split(/_\S+/,$words[$j]),"**", "\n";
print split(/_\S+/,$sentences[$i]), "\n"
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";