I want to read html files entered from STDIN perform some function and then write another html file to STDOUT. My problem is I have to give file in the stated manner. I have tried many things but nothing is working good. Nothing is getting printed
my command line prompt
perl universe.pl<sun.html>galaxy.html
my code -
open(my $in, "<&STDIN") or die "Can't dup STDIN: $!";
open(my $out, ">&STDOUT") or die "Can't dup STDOUT: $!";
my #lines = <$in>;
foreach(#lines) {
push(#newlines,$_);
say "Lines pushing:", $_;
}
You don't need to open STDIN or STDOUT; they're always ready opened.
You don't need to slurp the whole file into memory as you do with:
my #lines = <$in>;
You never use $out which should be indicative of a problem.
while (<>)
{
print mapping_func($_);
}
where mapping_func() is your function that does the relevant transform on its input string and returns the mapped result:
sub mapping_func
{
my($string) = #_;
$string =~ s/html/HTML/gi;
return $string;
}
Using the magic diamond operator <>, you will able to do what you asked. But please to provide some more search efforts next time.
use strict; use warnings;
while (my $line = <>) {
# do something with $line
}
Last but not least; if you have to parse HTML the good way, have a look to HTML::TreeBuilder::XPath or just HTML::TreeBuilder
I had this problem with a script which was run from inside another Perl script.
The solution was to set $| in the second script (the one which executes the script which reads from STDIN).
Related
So I am quite new to perl programming. I have two txt files, combined_gff.txt and pegs.txt.
I would like to check if each line of pegs.txt is a substring for any of the lines in combined_gff.txt and output only those lines from combined_gff.txt in a separate text file called output.txt
However my code returns empty. Any help please ?
P.S. I should have mentioned this. Both the contents of the combined_gff and pegs.txt are present as rows. One row has a string. second row has another string. I just wish to pickup the rows from combined_gff whose substrings are present in pegs.txt
#!/usr/bin/perl -w
use strict;
open (FILE, "<combined_gff.txt") or die "error";
my #gff = <FILE>;
close FILE;
open (DATA, "<pegs.txt") or die "error";
my #ext = <DATA>;
close DATA;
my $str = ''; #final string
foreach my $gffline (#gff) {
foreach my $extline (#ext) {
if ( index($gffline, $extline) != -1) {
$str=$str.$gffline;
$str=$str."\n";
exit;
}
}
}
open (OUT, ">", "output.txt");
print OUT $str;
close (OUT);
The first problem is exit. The output file is never created if a substring is found.
The second problem is chomp: you don't remove newlines from the lines, so the only way how a substring can be found is when a string from pegs.txt is a suffix of a string from combined_gff.txt.
Even after fixing these two problems, the algorithm will be very slow, as you're comparing each line from one file to each line of the second file. It will also print a line multiple times if it contains several different substrings (not sure if that's what you want).
Here's a different approach: First, read all the lines from pegs.txt and assemble them into a regex (quotemeta is needed so that special characters in substrings are interpreted literally in the regex). Then, read combined_gff.txt line by line, if the regex matches the line, print it.
#!/usr/bin/perl
use warnings;
use strict;
open my $data, '<', 'pegs.txt' or die $!;
chomp( my #ext = <$data> );
my $regex = join '|', map quotemeta, #ext;
open my $file, '<', 'combined_gff.txt' or die $!;
open my $out, '>', 'output.txt' or die $!;
while (<$file>) {
print {$out} $_ if /$regex/;
}
close $out;
I also switched to 3 argument version of open with lexical filehandles as it's the canonical way (3 argument version is safe even for files named >file or rm *| and lexical filehandles aren't global and are easier to pass as arguments to subroutines). Also, showing the actual error is more helpful than just dying with "error".
As choroba says you don't need the "exit" inside the loop since it ends the complete execution of the script and you must remove the line forwards (LF you do it by chomp lines) to find the matches.
Following the logic of your script I made one with the corrections and it worked fine.
#!/usr/bin/perl -w
use strict;
open (FILE, "<combined_gff.txt") or die "error";
my #gff = <FILE>;
close FILE;
open (DATA, "<pegs.txt") or die "error";
my #ext = <DATA>;
close DATA;
my $str = ''; #final string
foreach my $gffline (#gff) {
chomp($gffline);
foreach my $extline (#ext) {
chomp($extline);
print $extline;
if ( index($gffline, $extline) > -1) {
$str .= $gffline ."\n";
}
}
}
open (OUT, ">", "output.txt");
print OUT $str;
close (OUT);
Hope it works for you.
Welcho
I have a file with the format below
locale,English,en_AU,6251
locale,French,fr_BE,25477
charmap,English,EN,5423
And I would like to use perl to print out something with the option "-a" follows by the file and outputs something like
Available locales:
en_Au
fr_BE
EN
To do that, I have the perl script below
$o = $ARGV[0];
$f = $ARGV[1];
open (INFILE, "<$f") or die "error";
my $line = <INFILE>;
my #fields = split(',', $line);
if($o eq "-a"){
if(!$fields[2]){print "No locales available\n";}
else{print "Available locales: \n";
while($fields[2]){print "$fields[2]\n";}
}
}
close(INFILE);
And I have three questions here.
1. my script will only print the first locale "en_Au" forever.
2. it should be able to test if a file is empty, but if a file is purely empty, it outputs nothing, but if I type in two empty lines in the file, it prints two lines of "No locales available" instead.
3.In fact in the (!$filed[2]) part I should verify if the file is empty or no available locales exist, if so do I need to put some regular expression here to verify if it is a locale as well??
Hope someone could help me figure these out! Many thanks!!!
The biggest missing thing is a loop over lines from the file, in which you then process one line at a time. Comments follow the code.
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
#my ($opt, $file) = #ARGV; # better use a module
my ($opt, $file);
Getoptions( 'a' => \$opt, 'file=s' => \$file ) or usage();
usage() if not $file; # mandatory argument
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>) {
chomp $line;
my #fields = split /,/, $line;
next if not $fields[2];
if ($opt) {
say $fields[2];
}
}
close $fh;
sub usage {
say STDERR "Usage: $0 [-a] --file filename";
exit 1;
}
This prints the desired output. (Is that simple condition on $fields[2] really all you need?)
Comments
Always have use warnings; and use strict; at the beginning
I do not recommend single-letter variable names. One forgets what they mean, it makes the code harder to follow, and it's way too easy to make silly mistakes
The #ARGV can be assigned to variables in a list. Much better, use Getopt::Long module, which checks invocation and allows for far easier interface changes. I set the -a option to act as a "flag," so it just sets a variable ($opt) if it's given. If that should have possible values instead, use 'a=s' => \$opt and check for a value.
Use lexical filehandles and the three-argument open, open my $fh, '<', $file ...
When die-ing print the error, die "... $!";, using $! variable
The "diamond" (angle) operator, <$fh>, reads one line from a file opened with $fh when used in scalar context, as in $line = <$fh>. It advances a pointer in the file as it reads a line so the next time it's used it returns the next line. If you use it in list context then it returns all lines, but when you process a file you normally want to go line by line.
Some of the described logic and requirements aren't clear to me, but hopefully the code above is going to be easier to adjust as needed.
I'm incredibly new to Perl, and never have been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded, or multi-facted. Anyway, my question today is about a boggle I cannot get over when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of txt in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have issue making things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
print "$_";
print "$searchString\n";
if (($_ =~ /$searchString/) and ($count == "0")) {
++ $count;
print FILE2 $_;
} else {
print "This isn't working\n";
}
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness, is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
open my $in_fh, '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";
my %seen;
while (<$in_fh>) {
print $out_fh unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
If your file is not very large, you can store each line readed from the input file as a key in a hash variable. And then, print the hash keys (ordered). Something like that:
my %lines = ();
my $order = 1;
open my $fhi, "<", $file2 or die "Cannot open file: $!";
while( my $line = <$fhi> ) {
$lines {$line} = $order++;
}
close $fhi;
open my $fho, ">", $file3 or die "Cannot open file: $!";
#Sort the keys, only if needed
my #ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);
for my $key( #ordered_lines ) {
print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;
open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '<', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";
my %seen;
while (my $line = <$fh_in>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $fh_out $line;
}
# remember this line
$seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;
my %seen;
while (my $line = <DATA>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $line;
}
# remember this line
$seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows a faster than linear, but your data structure is very flat.
File "/root/actual" is not getting over written with content of "/root/temp" via perl script. If manually edited "/root/actual" is getting modified.
copy("/root/actual","/root/temp") or die "Copy failed: $!";
open(FILE, "</root/temp") || die "File not found";
my #lines = <FILE>;
close(FILE);
my #newlines;
foreach(#lines) {
$_ =~ s/$aref1[0]/$profile_name/;
push(#newlines,$_);
}
open(FILE, ">/root/actual") || die "File not found";
print FILE #newlines;
close(FILE);
File "/root/actual" is not getting over written with content of "/root/temp" via perl script. If manually edited "/root/actual" is getting modified.
Do you mean that /root/temp isn't being replaced by /root/actual? Or is /root/temp being modified as you wish, but it's not copying over /root/acutual at the end of your program?
I suggest that you read up on modern Perl programming practices. You need to have use warnings; and use strict; in your program. In fact, many people on this forum won't bother answering Perl questions unless use strict; and use warnings; are used.
Where is $aref1[0] coming from? I don't see #aref1 declared anywhere in your program. Or, for that matter $profile_name.
If you're reading in the entire file into a regular expression, there's no reason to copy it over to a temporary file first.
I rewrote what you had in a more modern syntax:
use strict;
use warnings;
use autodie;
use constant {
FILE_NAME => 'test.txt',
};
my $profile_name = "bar"; #Taking a guess
my #aref1 = qw(foo ??? ??? ???); #Taking a guess
open my $input_fh, "<", FILE_NAME;
my #lines = <$input_fh>;
close $input_fh;
for my $line ( #lines ) {
$line =~ s/$aref1[0]/$profile_name/;
}
open my $output_fh, ">", FILE_NAME;
print ${output_fh} #lines;
close $output_fh;
This works.
Notes:
use autodie; means you don't have to check whether files opened.
When I use a for loop, I can do inplace replacing in an array. Each item is a pointer to that entry in the array.
No need for copy or a temporary file since you're replacing the original file anyway.
I didn't use it here since you didn't, but map { s/$aref1[0]/$profile_name/ } #lines; can replace that for loop. See map.
I'm trying to wrap my head around IPC::Run to be able to do the following. For a list of files:
my #list = ('/my/file1.gz','/my/file2.gz','/my/file3.gz');
I want to execute a program that has built-in decompression, does some editing and filtering to them, and prints to stdout, giving some stats to stderr:
~/myprogram options $file
I want to append the stdout of the execution for all the files in the list to one single $out file, and be able to parse and store a couple of lines in each stderr as variables, while letting the rest be written out into separate fileN.log files for each input file.
I want stdout to all go into a ">>$all_into_one_single_out_file", it's the err that I want to keep in different logs.
After reading the manual, I've gone so far as to the code below, where the commented part I don't know how to do:
for $file in #list {
my #cmd;
push #cmd, "~/myprogram options $file";
IPC::Run::run \#cmd, \undef, ">>$out",
sub {
my $foo .= $_[0];
#check if I want to keep my line, save value to $mylog1 or $mylog2
#let $foo and all the other lines be written into $file.log
};
}
Any ideas?
First things first. my $foo .= $_[0] is not necessary. $foo is a new (empty) value, so appending to it via .= doesn't do anything. What you really want is a simple my ($foo) = #_;.
Next, you want to have output go to one specific file for each command while also (depending on some conditional) putting that same output to a common file.
Perl (among other languages) has a great facility to help in problems like this, and it is called closure. Whichever variables are in scope at the time of a subroutine definition, those variables are available for you to use.
use strict;
use warnings;
use IPC::Run qw(run new_chunker);
my #list = qw( /my/file1 /my/file2 /my/file3 );
open my $shared_fh, '>', '/my/all-stdout-goes-here' or die;
open my $log1_fh, '>', '/my/log1' or die "Cannot open /my/log1: $!\n";
open my $log2_fh, '>', '/my/log2' or die "Cannot open /my/log2: $!\n";
foreach my $file ( #list ) {
my #cmd = ( "~/myprogram", option1, option2, ..., $file );
open my $log_fh, '>', "$file.log"
or die "Cannot open $file.log: $!\n";
run \#cmd, '>', $shared_fh,
'2>', new_chunker, sub {
# $out contains each line of stderr from the command
my ($out) = #_;
if ( $out =~ /something interesting/ ) {
print $log1_fh $out;
}
if ( $out =~ /something else interesting/ ) {
print $log2_fh $out;
}
print $log_fh $out;
return 1;
};
}
Each of the output file handles will get closed when they're no longer referenced by anything -- in this case at the end of this snippet.
I fixed your #cmd, though I don't know what your option1, option2, ... will be.
I also changed the way you are calling run. You can call it with a simple > to tell it the next thing is for output, and the new_chunker (from IPC::Run) will break your output into one-line-at-a-time instead of getting all the output all-at-once.
I also skipped over the fact that you're outputting to .gz files. If you want to write to compressed files, instead of opening as:
open my $fh, '>', $file or die "Cannot open $file: $!\n";
Just open up:
open my $fh, '|-', "gzip -c > $file" or die "Cannot startup gzip: $!\n";
Be careful here as this is a good place for command injection (e.g. let $file be /dev/null; /sbin/reboot. How to handle this is given in many, many other places and is beyond the scope of what you're actually asking.
EDIT: re-read problem a bit more, and changed answer to more closely reflect the actual problem.
EDIT2:: Updated per your comment. All stdout goes to one file, and the stderr from command is fed to the inline subroutine. Also fixed a stupid typo (for syntax was pseudo code not Perl).