Getstore to Buffer, not using temporary files - perl

I've started Perl recently and mixed quite a bit of things to get what I want.
My script gets the content of a webpage, writes it to a file.
Then I open a filehandle on the file report.html and parse it.
I write every line I encounter to a new file, except lines containing a specific color.
It works, but I'd like to try another way which doesn't require me to create a "report.html" temporary file.
Furthermore, I'd like to print my result directly in a file, I don't want to have to use a system redirection '>'. That'd mean my script has to be called by another .sh script, and I don't want that.
use strict;
use warnings;
use LWP::Simple;
my $report = "report.html";
getstore('http://test/report.php', 'report.html') or die 'Unable to get page\n';
open my $fh2, "<$report" or die("could not open report file : $!\n");
while (<$fh2>)
{
print if (!(/<td style="background-color:#71B53A;"/ .. //));
}
close($fh2);
Thanks for your help

If you have got the html content into a variable, you can use a open call on this variable. Like:
my $var = "your html content\ncomes here\nstored into this variable";
open my $fh, '<', \$var;
# .. just do the things you like to $fh
You can fetch the content with the get function from the LWP::Simple module ;)
For your second question, open the output file for writing, like open $fh, '>', $filepath, and print to that handle. You can use perldoc -f open to see more info.
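Putting both suggestions together, a minimal sketch could look like this. The helper name filter_report, the output file name filtered.html and the sample HTML are assumptions made for illustration; in the real script the string would come from get() in LWP::Simple, as shown in the comment. It skips every line containing the colour, which is a simplification of the range operator used in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line of $html to $out_path,
# skipping lines that mention the unwanted background colour.
sub filter_report {
    my ($html, $out_path) = @_;

    # Read from the string as if it were a file, write to a real file.
    open my $in,  '<', \$html    or die "Cannot open in-memory handle: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        next if $line =~ /background-color:#71B53A;/;
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# In the real script the content would come from LWP::Simple:
#   use LWP::Simple;
#   my $html = get('http://test/report.php');
#   die "Unable to get page\n" unless defined $html;
# Sample data (an assumption) so the sketch runs without a network:
my $html = qq{<tr>\n<td style="background-color:#71B53A;">ok</td>\n<td>keep me</td>\n</tr>\n};
filter_report($html, 'filtered.html');
```

No report.html is created, and no shell redirection is needed because the script opens the output file itself.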

Related

Printing a text file using perl

I am completely new to this and this should be the easiest thing to do but for some reason I cannot get my local text file to print. After trying multiple times with different code I came to use the following code but it doesn't print.
I have searched for days on various threads to solve this and have had no luck. Please help. Here is my code:
#!/usr/bin/perl
$newfile = "file.txt";
open (FH, $newfile);
while ($file = <FH>) {
print $file;
}
I updated my code to the following:
#!/usr/bin/perl
use strict; # Always use strict
use warnings; # Always use warnings.
open(my $fh, "<", "file.txt") or die "unable to open file.txt: $!";
# Above we open the file using the 3-argument form of open,
# or die with an error if unable to open it.
while (<$fh>) { # While in the file.
print $_; # Print each line
}
close $fh; # Close the file
system('C:\Users\RSS\file.txt');
It returns the following: my first report generated by perl. I do not know where this is coming from. Nowhere do I have a print "my first report generated by perl."; statement and it definitely is not in my text file.
My text file is full of various emails, addresses, phone numbers and snippets of emails.
Thank you all for your help. I figured out my problem. I somehow managed to kick myself out of my directory and did not realize it.
This is most likely a combination of a failure to open the file, and a failure to check the return value of open.
If you are completely new to perl, I warmly recommend reading the excellent "perlintro" man page, using either man perlintro or perldoc perlintro on the command line, or taking a look here: https://perldoc.perl.org/perlintro.html.
The "Files and I/O" section there gives a good and concise way of doing this:
open(my $in, "<", "input.txt") or die "Can't open input.txt: $!";
while (<$in>) { # assigns each line in turn to $_
print "Just read in this line: $_";
}
This version will give you an explanation and abort if anything goes wrong while trying to open the file. For example, if there is no file named file.txt in the current working directory, your version will quietly fail to open the file, and afterwards it will quietly fail to read from the closed file handle.
Also, always adding at least one of these to your perl scripts will save you a lot of trouble in the long run:
use warnings; # or use the -w command line switch to turn warnings on globally
use diagnostics;
These won't catch the failure to open the file, but will alert on the failed read.
In the first example here you can see that without the diagnostics module, the code fails without any error messages. The second example shows how the diagnostics module changes this.
$ perl -le 'open FH, "nonexistent.txt"; while(<FH>){print "foo"}'
$ perl -le 'use diagnostics; open FH, "nonexistent.txt"; while(<FH>){print "foo"}'
readline() on closed filehandle FH at -e line 1 (#1)
(W closed) The filehandle you're reading from got itself closed sometime
before now. Check your control flow.
By the way, the legendary "Camel Book" is basically the perl man pages formatted for paper printing, so reading the perldocs in the order listed in perldoc perl will give you a high level of understanding of the language in a reasonably accessible and inexpensive manner.
Happy hacking!
This is a simple version, with explanations.
use strict; # Always use strict
use warnings; # Always use warnings.
open(my $fh, "<", "file.txt") or die "unable to open file.txt: $!";
# Above we open the file using the 3-argument form of open,
# or die with an error if unable to open it.
while (<$fh>) { # While in the file.
print $_; # Print each line
}
close $fh; # Close the file
There is also the case where the file is not in the location where you think it is. So consider using the full path if the file is not in the same directory.
open(my $fh, "<", 'F:\Workdir\file.txt') or die "unable to open F:\\Workdir\\file.txt: $!";
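To check which directory a relative name such as file.txt is resolved against, a small sketch like the following may help. It uses only the core Cwd module; the file name is the one from the question, and the messages are illustrative.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

my $name = 'file.txt';    # relative name, resolved against the current directory
my $cwd  = getcwd();

print "Current working directory: $cwd\n";
if (-e $name) {
    # -s returns the size in bytes, or a false value for an empty file
    printf "%s exists here and is %d bytes long\n", $name, (-s $name) || 0;
}
else {
    print "$name does not exist in $cwd\n";
}
```

If the second message appears, either change directory before running the script or pass the full path to open.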
EDIT: After your comments, it seems that you are opening an empty file. Please add this at the bottom of that same script and rerun it. It will open the file in C:\Users\RSS so you can make sure it actually contains data.
system('C:\Users\RSS\file.txt');
First of all, as you are starting out, it is better to enable all warnings with 'use warnings' and to forbid constructs that lead to uncertain behavior or are difficult to debug with the pragma 'use strict'.
As you are dealing with a file stream, always check whether you were able to open it. Use croak (from the Carp module) or die; both terminate the program with a given message.
Instead of reading inside the while condition, I would recommend checking for end of file, so the loop ends when the end is reached. Usually you read a line in order to process it further, so it is a good idea to remove the line ending with chomp.
A sample for reading a file in perl can be as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $newfile = "file.txt";
open (my $fh, '<', $newfile) or die "Could not open file '$newfile' $!";
while (!eof($fh))
{
my $line=<$fh>;
chomp($line);
print $line , "\n";
}

Perl: Open a file from a URL

I wanted to know how to open a file from a URL rather than a local file and I found the following answer on another thread:
use IO::String;
my $handle = IO::String->new(get("google.com"));
my @lines = <$handle>;
close $handle;
This works perfectly... on my PC...
But when I transferred the code over to my hosted server it complains that it can't find the IO module. So is there another way to open a file from an URL, that doesn't require any external modules (or uses one that is pretty much installed on every server)...?
You can install PerlIO::http, which will give you an input layer for opening a filehandle from a URL via open. This thing is not included in the Perl core, but it will work with Perls as early as 5.8.9.
Once you've installed it, all you need to do is open with a layer :http in the mode argument. There is nothing to use here. That happens automatically.
open my $fh, '<:http', 'https://metacpan.org/recent';
You can then read from $fh like a regular file. Under the hood it will take care of getting the data over the wire.
while (my $line = <$fh>) { ... }
There is no way to "open a file from a URL" as you ask. Well, I suppose you could throw something together using the progress() callback from LWP::UserAgent, but even then I don't think it would work how you want it to.
But you can make something that looks like it's doing what you want pretty easily. Actually, what we're really doing is pulling all the data back from the URL and then opening a filehandle on a string that contains that data.
use LWP::Simple;
my $data = get('https://google.com');
open my $url_fh, '<', \$data or die $!;
# Now $url_fh is a filehandle wrapped around your data.
# Treat it like any other filehandle.
while (<$url_fh>) {
print;
}
Your problem was that IO::String wasn't installed. But there's no need to install it, as it's simple enough to do what it does with standard Perl features (simply open a filehandle on a reference to a string).
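As a sketch of that replacement, the snippet from the question can be written without IO::String as follows. The string in $data is a stand-in (an assumption) for whatever get($url) returns, so the example runs without a network connection.

```perl
use strict;
use warnings;

# Stand-in for the string that get($url) would return.
my $data = "first line\nsecond line\nthird line\n";

# Open a read handle on a reference to the string (core Perl, no modules).
open my $handle, '<', \$data or die "Cannot open in-memory handle: $!";
my @lines = <$handle>;    # list context: one element per line, newlines kept
close $handle;

chomp @lines;             # drop the trailing newlines
print scalar(@lines), " lines read\n";
```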
Update: IO::String is completely unnecessary here. Not only because you can do what it does very simply, by just opening a filehandle on a reference to your string, but also because all you want to do is to read a file from a web site into an array. And in that case, your code is simply:
use LWP::Simple;
my $url = 'something';
my @records = split /\n/, get($url);
You might even consider adding some error handling.
use LWP::Simple;
my $url = 'something';
my $data = get($url);
die "No data found\n" unless defined $data;
my @array = split /\n/, $data;

perl: cannot open file within a loop

I am trying to read in a bunch of similar files and process them one by one. Here is the code I have. But somehow the perl script doesn't read in the files correctly. I'm not sure how to fix it. The files are definitely readable and writable by me.
#!/usr/bin/perl
use strict;
use warnings;
my @olap_f = `ls /full_dir_to_file/*txt`;
foreach my $file (@olap_f){
my %traits_h;
open(IN,'<',$file) || die "cannot open $file";
while(<IN>){
chomp;
my @array = split /\t/;
my $trait = $array[4];
$traits_h{$trait} ++;
}
close IN;
}
When I run it, the error message (something like below) showed up:
cannot open /full_dir_to_file/a.txt
You have newlines at the end of each filename:
my @olap_f = `ls ~dir_to_file/*txt`;
chomp @olap_f; # Remove newlines
Better yet, use glob to avoid launching a new process (and having to trim newlines):
my @olap_f = glob "~dir_to_file/*txt";
Also, use $! to find out why a file couldn't be opened:
open(IN,'<',$file) || die "cannot open $file: $!";
This would have told you
cannot open /full_dir_to_file/a.txt
: No such file or directory
which might have made you recognize the unwanted newline.
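A self-contained sketch of the corrected loop, using glob and lexical filehandles, might look like this. The temporary directory and the sample rows are assumptions so that the example can run anywhere; the fifth-column count follows the question. Note that glob splits its argument on whitespace, so a directory name containing spaces would need extra quoting.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with one tab-separated file (illustrative data).
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/a.txt" or die "cannot create $dir/a.txt: $!";
print {$out} join("\t", qw(c1 c2 c3 c4 height)), "\n";
print {$out} join("\t", qw(c1 c2 c3 c4 height)), "\n";
print {$out} join("\t", qw(c1 c2 c3 c4 weight)), "\n";
close $out or die "cannot close $dir/a.txt: $!";

my %traits_h;
for my $file (glob "$dir/*txt") {    # glob returns names without trailing newlines
    open my $in, '<', $file or die "cannot open $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line;
        $traits_h{ $fields[4] }++;   # fifth column, as in the question
    }
    close $in;
}
print "$_: $traits_h{$_}\n" for sort keys %traits_h;
```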
I'll add a quick plug for IO::All here. It's important to know what's going on under the hood but it's convenient sometimes to be able to do:
use IO::All;
my #olap_f = io->dir('/full_dir_to_file/')->glob('*txt');
In this case it's not shorter than @cjm's use of glob but IO::All does have a few other convenient methods for working with files as well.

backtick vs native way of doing things in PERL

Consider these 2 snippets :
#!/bin/bash/perl
open(DATA,"<input.txt");
while(<DATA>)
{
print($_) ;
}
and
$abcd = `cat input.txt`;
print $abcd;
Both will print the content of file input.txt as output
Question: Is there any standard as to which one (backticks or the native method) should be preferred in a particular case, or are both always equal?
The reason I am asking is that I find the cat method easier than opening a file the native Perl way, so I wonder: if I can achieve something with backticks, should I go with that, or prefer the native way of doing it?
I checked this thread too : What's the difference between Perl's backticks, system, and exec? but it went a different route than my doubt!!
Use builtin functions wherever possible:
They are more portable: open works on Windows, while `cat input.txt` will not.
They have less overhead: Using backticks will fork, exec a shell which parses the command, which execs the cat program. This unnecessarily loads two programs. This is in contrast to open which is a builtin Perl function.
They make error handling easier. The open function will return a false value on error, which allows you to take different actions, e.g. like terminating the program with an error message:
open my $fh, "<", "input.txt" or die "Couldn't open input.txt: $!";
They are more flexible. For example, you can add encoding layers if your data isn't Latin-1 text:
open my $fh, "<:utf8", "input.txt" or die "Couldn't open input.txt: $!";
open my $fh, "<:raw", "input.bin" or die "Couldn't open input.bin: $!";
If you want a “just read this file into a scalar” function, look at the File::Slurp module:
use File::Slurp;
my $data = read_file "input.txt";
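If installing a module is not an option, the same "read the whole file into a scalar" effect can be sketched with core Perl only. The helper name slurp and the demo file slurp_demo.txt are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole file as one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Couldn't open $path: $!";
    local $/;                 # undefined record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Create a small demo file so the sketch is runnable on its own.
my $path = 'slurp_demo.txt';
open my $out, '>', $path or die "Couldn't create $path: $!";
print {$out} "line 1\nline 2\n";
close $out or die "Couldn't close $path: $!";

my $data = slurp($path);
print $data;
unlink $path;
```

This is about as short to call as the backtick version, but it stays inside Perl and reports a failed open.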
Using the back tick operators to call cat is highly inefficient, because:
It spawns a separate process (or maybe more than one if a shell is used) which does nothing more than read the file, which perl could do itself.
You are reading the whole file into memory instead of processing it one line at a time. OK for a small file, not so good for a large one.
The back tick method is ok for a quick and dirty script but I would not use it for anything serious.

Perl: Substitute text string with value from list (text file or scalar context)

I am a perl novice, but have read the "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the perl program to each of these folders so the perl program can make new .
I have gotten so far ...:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
/gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
$_ = File::Spec->catfile($current_dir, $_);
push(@fastqfiles,$_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(@secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(@secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitute as this can not be done, and then rather insert the list of files ($fastqfilenames) in the template.txt file? Instead of the $fastqfilenames, can I substitute with content of file (e.g. s/A/{r file.txt ...). Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
s/#/$fastqfilenames/g;
push @secontrol, $_;
}
As both answers pointed out, $fastqfilenames was a filehandle.
Replacing this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
made it all good. Thanks both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the @fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all filenames together? If so, you will probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template. And you will need something other than a simple replacement, because the replacement will change the original string on the first time through. If you are on Perl 5.14 or later, you could use the /r option to replace non-destructively: push(@secontrol,s/#/$file_name/gr); Otherwise, you should copy to another variable before doing the replacement.
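Both readings can be sketched in a few lines. The template lines, the file names and the comma separator below are illustrative assumptions, not taken from the question, and the /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Illustrative stand-ins for the template lines and the collected file names.
my @secontrol_template = ("reads = #\n", "mode = paired\n");
my @fastqfiles         = ('/data/run1/a.fastq.gz', '/data/run1/b.fastq.gz');

# Case 1: every '#' is replaced by all file names, joined with commas.
my $all_names = join ',', @fastqfiles;
my @secontrol = map { s/#/$all_names/gr } @secontrol_template;
print @secontrol;

# Case 2: one filled-in copy of the template per file name.
# /r returns the modified copy and leaves the template lines untouched.
for my $file_name (@fastqfiles) {
    my @copy = map { s/#/$file_name/gr } @secontrol_template;
    print @copy;
}
```

Because /r never modifies @secontrol_template, the same template can be reused for every file name in the second loop.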
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the use of Text::Template module in order to do this kind of work (file text substitution).