Perl Get the web content then writing it as a text file


I'm trying to write a script that fetches a log file from a website and writes its content to a text file, but I get an error when use strict is present:
Can't use string ("/home/User/Downloads/text") as a symbol ref while "strict refs" in use at ./scriptname line 92.
If I remove use strict, I get a different error:
File name too long at ./scriptname line 91.
I tried the answer to "Perl: Read web text file and "open" it",
but it did not work for me. I am also new to Perl and confused by its syntax.
Are there any suggestions?
Note: the code greps every line that contains RoomOutProcessTT and displays those lines together with the number of times the string appears.
Here is the code:
my $FOutput = get "http://website/Logs/Log_number.ini";
my $FInput = "/home/User/Downloads/text";
open $FInput, '<', $FOutput or die "could not open $FInput: $!";
my $ctr;
my @results;
my @words = <$FInput>;
@results = grep /RoomOutProcessTT/, @words;
print "@results\n";
close $FInput;
open $FInput, '<', $FOutput or die "could not open $FInput: $!";
while(<$FInput>){
$ctr = grep /RoomOutProcessTT/, split ' ' , $_;
$ctr += $ctr;
}
print "RoomOutProcessTT Count: $ctr\n";
close $FInput;

The first argument to open is the filehandle name, not the actual name of the file. That comes later in the open function.
Change your code to:
use LWP::Simple; # provides get()
my $FOutput = get "http://website/Logs/Log_number.ini"; # the downloaded content is stored in this
# variable; you need to write it to your output file.
defined $FOutput or die "could not fetch the log file"; # get() returns undef on failure
my $FInput = "/home/User/Downloads/text";
open OUTPUT_FILEHANDLE, '>', $FInput or die "could not open $FInput: $!"; # give a name to the file
# handle, then supply the file name itself after the mode specifier.
# You want to WRITE data to this file, so open it with '>'
my $ctr = 0;
my @results;
my @words = split /\r?\n/, $FOutput; # create an array of 'lines' from the content of the logfile
# (splitting on /(\r|\n)/ would also return the separators, because of the capturing group)
# here, you want to print the results of your grep to the output file
@results = grep /RoomOutProcessTT/, @words;
print OUTPUT_FILEHANDLE "$_\n" for @results; # print each matching line to your output file
# close the output file here, since you re-open it in the next few lines.
close OUTPUT_FILEHANDLE;
# not sure why you're re-opening the file here... but that's up to your design I suppose
open INPUT_FILEHANDLE, '<', $FInput or die "could not open $FInput: $!"; # open it for read
while(<INPUT_FILEHANDLE>){
$ctr += grep /RoomOutProcessTT/, split ' ' , $_; # add this line's matches to the running total
}
print "RoomOutProcessTT Count: $ctr\n"; # print to stdout
close INPUT_FILEHANDLE; # close your file handle
I might suggest switching the terms you use to identify "input and output", as it's somewhat confusing. The input in this case is actually the file you pull from the web, output being your text file. At least that's how I interpret it. You may want to address that in your final design.
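The write-then-count flow can also be done in a single pass, without re-opening the output file. The following is only a minimal sketch, assuming the content has already been fetched (for example with get from LWP::Simple). The helper name save_matches and the sample log text are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write every line of $content that matches $pattern
# to $path and return the total number of matches found in the content.
sub save_matches {
    my ($content, $path, $pattern) = @_;
    open my $out, '>', $path or die "could not open $path: $!";
    my $count = 0;
    for my $line (split /\r?\n/, $content) {
        my $hits = () = $line =~ /$pattern/g;   # number of matches on this line
        next unless $hits;
        print {$out} "$line\n";
        $count += $hits;
    }
    close $out or die "could not close $path: $!";
    return $count;
}

# Made-up sample standing in for the downloaded log file.
my $content = "a RoomOutProcessTT b\nnothing here\nRoomOutProcessTT RoomOutProcessTT\n";
my (undef, $path) = tempfile(UNLINK => 1);
my $total = save_matches($content, $path, qr/RoomOutProcessTT/);
print "RoomOutProcessTT Count: $total\n";
```

Using a lexical handle ($out) instead of a bareword keeps the handle scoped to the helper, which also avoids the naming confusion between "input" and "output" mentioned above.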

Related

Scan a large .gz file, split its contents at a known word (which is repeated in the file) and save all the pieces in a .txt file

I'm trying to write a Perl script that opens and reads a .gz file, splits its contents at a known word ('.EOM') that is repeated many times in the file, and saves all the pieces in a .txt or .tmp file. The .gz file is very large (several GB). I've tried many different approaches, but every time the following error is shown at the end:
"panic:sv_setpvn called with negative strlen at perl_gz1.pl line 7, line 38417185 "
Here 'perl_gz1.pl' is my Perl file name and 'line 101' is the line where I've written the following line of code: my @spl=split('.EOM',$join);
I don't know what kind of error this is or how to resolve it. Can anyone help? Is there another way to do the same thing without getting this error? Thanks in advance.
I've attached my full code.
I've tried the following code:
use strict ;
use warnings;
my $file = "/nfs/iind/disks/saptak/dsbnatrgd.scntcl.gz";
open(IN, "gzcat $file |",) or die "gunzip $file: $!";
my $join = join('',<IN>);
#print $join;
my @spl=split('.EOM',$join);
print @spl;
close IN;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
my $input = "/nfs/iind/disks/cpc_disk0025/saptak/dsbnatrgd.scntcl.gz";
my $output = "NEW1.tmp";
gunzip $input => $output or die "gunzip failed: $GunzipError\n";
my $data = join("", "NEW1.tmp");
#use File::Slurp;
#my $data = read_file("NEW1.tmp");
my @spl=split(/.EOM/,$data)
and
use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
use IO::File ;
my $input = new IO::File "</nfs/iind/disks/cpc_disk0025/saptak/dsbnatrgd.scntcl.gz" or die "Cannot open 'file1.txt.gz': $!\n" ;
my $buffer ;
gunzip $input => \$buffer or die "gunzip failed: $GunzipError\n";
print $buffer;
my @spl=split(".EOM",$buffer);
But the same error appears every time.
I expect the array @spl to hold the pieces of the file, split at every occurrence of the specified word/string, so that I can continue working with it. Instead no output is produced and the error "panic:sv_setpvn called with negative strlen at perl_gz1.pl line 7, line 38417185 " is shown on the screen.

This might be how I would do it if it was a one time job:
zcat dsbnatrgd.scntcl.gz | perl -ne'sub newf{$n||="0000";$n++;open($fh,">","output_$n.txt")||die}$fh||newf();/(.*)\.EOM(.*)/ and print {$fh} $1 and newf() and print {$fh} $2 or print {$fh} $_'
This gives you a new file output_nnnn.txt each time an .EOM is seen somewhere, where nnnn is 0001, 0002 and so on. The .EOM can appear in the middle of a line as well: the text before .EOM is then kept as the last string in the previous file and the text after it as the first string in the next file.
The oneliner explained:
sub newf{
$n||="0000";
$n++; # increase the filename counter
open($fh,">","output_$n.txt")||die # open a new output filehandle
}
$fh||newf(); # 1st input line: create the $fh filehandle if it doesn't exist yet
/(.*)\.EOM(.*)/ # if the input line has an .EOM mark, grab what's before and after it
and print {$fh} $1 # ...and print the text before it to the current file
and newf() # ...and open a new file
and print {$fh} $2 # ...and print the text after .EOM to the new file
or print {$fh} $_ # or, if there is no .EOM on the current line, just print it to the current output file
(Or did you mean the .EOM mark was uncompressed inside the .gz file? In that case the .gz file is probably invalid)
The reason your approach doesn't work might be the very large input. You mentioned that the .gz file is several GB, and the uncompressed input is probably several times bigger than that. My approach doesn't attempt to keep everything in memory at once, so it doesn't matter how big your file is.
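The same streaming idea can be written as a script by setting Perl's input record separator $/ to the marker, so that each read returns one chunk ending in .EOM instead of one line. This is a minimal sketch, assuming the marker is the literal string .EOM. The in-memory sample stands in for the decompressed stream, and split_records is a hypothetical helper name. Note that split('.EOM', ...) treats the dot as a regex wildcard, whereas $/ is always matched as a literal string.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh one record at a time, where a record ends
# with the literal $marker, and call $callback with (record number, record).
sub split_records {
    my ($fh, $marker, $callback) = @_;
    local $/ = $marker;          # each read returns text up to and including the marker
    my $n = 0;
    while (my $record = <$fh>) {
        chomp $record;           # chomp removes the current $/ (the marker), if present
        $callback->(++$n, $record);
    }
    return $n;
}

# Made-up sample data; a real script might instead read from a pipe, e.g.
#   open my $fh, '-|', 'zcat', $file or die "zcat $file: $!";
my $sample = "first part\nmore.EOMsecond.EOMlast";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @pieces;
my $count = split_records($fh, '.EOM', sub { push @pieces, $_[1] });
close $fh;
print "$count records\n";
```

In a real run the callback would write each record to its own file or process it immediately, rather than collecting everything in an array, so that memory use stays bounded by the largest single record.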

Why is this Perl foreach loop executing only once?

I am trying to copy the content of three separate .vect files into one. I want to do this for all 5,000 files in the $fromdir directory.
When I run this program it generates just a single modified .vect file in the output directory. If I include the close(DATA) calls after individual while loops inside the foreach loop, I get the same behavior: a single output file in the output directory instead of the wanted 5,000 files.
I have done some reading, and at first thought I may not be opening the files. But if I print($vectfile) in the foreach loop every file name in the directory is printed.
My second thought was that it was how I was closing the files, but I get the same behavior whether I close the file handles inside or outside the foreach loop.
My final thought was maybe I don't have write permission to the file or directory, but I don't know how to change this.
How can I get this loop to run all 5,000 times and not just once?
use strict;
use warnings;
use feature qw(say);
my $dir = "D:\\Downloads";
# And M3.1 and P3.1
my $subfolder = "A0.1";
my $fromdir = $dir . "\\" . $subfolder;
my @files = <$fromdir/*vect>;
# Top of file
my $readfiletop = "C:\\Users\\Owner\\Documents\\MoreKnotVis\\ScriptsForAdditionalDataSets\\VectFileHeader.vect";
# Bottom of file
my $readfilebottom = "C:\\Users\\Owner\\Documents\\MoreKnotVis\\ScriptsForAdditionalDataSets\\VectFileCloser.vect";
foreach my $vectfile ( @files ) {
say("$vectfile");
my $count = 0;
my $readfilebody = $vectfile;
my $out_file = "D:\\Downloads\\ColorsA0.1\\" . "$count" . ".vect";
$count++;
# open top part of each file
open(DATA1, "<", $readfiletop) or die "Can't open '$readfiletop': $!";
# open bottom part of each file
open(DATA3, "<", $readfilebottom) or die "Can't open '$readfilebottom': $!";
# open a file to read
open(DATA2, "<", $vectfile) or die "Can't open '$vectfile': $!";
# open a file to write to
open(DATA4, ">" ,$out_file) or die "Can't open '$out_file': $!";
# Copy data from VectFileTop file to another.
while ( <DATA1> ) {
print DATA4 $_;
}
# Copy the data from VectFileBody to another.
while ( <DATA2> ) {
print DATA4 $_, $_ if 8..12;
}
# Copy the data from VectFileBottom to another.
while ( <DATA3> ) {
print DATA4 $_;
}
}
close( DATA1 );
close( DATA2 );
close( DATA3 );
close( DATA4 );
print("quit\n");

You construct the output file name with $count in it.
But note what you do with this variable:
inside the loop you set it to 0,
the output file name is constructed with 0 in it,
then you increment it, but this has no effect, because the variable
is set to 0 again in the next iteration of the loop.
The effect is that:
the loop executes the required number of times,
but the output file name contains 0 as the "number" every time,
so you keep overwriting the same file with new content.
Move the my $count = 0; statement before the loop and everything
should be OK.
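The effect of moving the initialisation is easy to check in isolation. This is a small sketch with made-up input names in place of the real .vect files. It only builds the output names and does not touch the file system.

```perl
use strict;
use warnings;

# Made-up input names standing in for the .vect files in the directory.
my @inputs = ('a.vect', 'b.vect', 'c.vect');

my $count = 0;                       # initialised once, before the loop
my @out_names;
foreach my $vectfile (@inputs) {
    push @out_names, "$count.vect";  # a distinct name on every iteration
    $count++;
}
print "$_\n" for @out_names;
```

With my $count = 0; inside the loop body instead, every entry in @out_names would be 0.vect.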

You seem to be clinging to a specific form of code in fear of everything falling apart if you change a single thing. I recommend that you dare to stray a little more from the formula so that the code is more concise and readable.
The problem is that you reset your $count to zero before processing each input file, so all the output files have the same name and overwrite one another. The remaining output file contains only the data from the last input file.
Here's a refactoring of your code. I can't guarantee that it will run correctly, but it looks right and does compile.
I've added use autodie to avoid having to check the status of every IO operation.
I've used the same lexical file handle $fh for all the input files. Opening another file on a file handle that is already open will close it first, and a lexical file handle will be closed by perl when it goes out of scope at the end of the block. Bear in mind that such an implicit close on reopen does not reset the line counter $., which the 8..12 test relies on, so an explicit close before reusing the handle is the safer choice.
I've used a while loop to iterate over the input file names instead of reading the whole list into an array, which unnecessarily uses an additional variable @files and wastes space.
I've used forward slashes instead of backslashes in all the file paths. This is fine in library calls on Windows: it is only a problem if they appear in command line input.
I hope you'll agree that this form is more readable. I think you would have stood a much better chance of finding the problem if your code were in this form.
use strict;
use warnings;
use autodie;
use feature qw/ say /;
my $indir = 'D:/Downloads';
my $subdir = 'A0.1'; # And M3.1 and P3.1
my $extrasdir = 'C:/Users/Owner/Documents/MoreKnotVis/ScriptsForAdditionalDataSets';
my $outdir = "$indir/Colors$subdir";
my $topfile = "$extrasdir/VectFileHeader.vect";
my $bottomfile = "$extrasdir/VectFileCloser.vect";
my $filenum;
while ( my $vectfile = glob "$indir/$subdir/*.vect" ) {
say qq/Processing "$vectfile"/;
$filenum++;
open my $outfh, '>', "$outdir/$filenum.vect";
my $fh;
open $fh, '<', $topfile;
print { $outfh } $_ while <$fh>;
close $fh; # an explicit close resets $. so that 8..12 below counts from line 1
open $fh, '<', $vectfile;
while ( <$fh> ) {
print { $outfh } $_, $_ if 8..12;
}
close $fh;
open $fh, '<', $bottomfile;
print { $outfh } $_ while <$fh>;
}
say 'DONE';

Reading and writing to the same file

I'm using this code I found online to read a properties file in my Perl script:
open (CONFIG, "myfile.properties");
while (<CONFIG>){
chomp; #no new line
s/#.*//; #no comments
s/^\s+//; #no leading white space
s/\s+$//; #no trailing white space
next unless length;
my ($var, $value) = split (/\s* = \s*/, $_, 2);
$$var = $value;
}
Is it possible to also write to the text file inside this while loop? Let's say the text file looks like this:
#Some comments
a_variale = 5
a_path = /home/user/path
write_to_this_variable = ""
How can I put some text in write_to_this_variable?

It is not really practical to overwrite text files where you have variable length records (lines). It is normal to copy the file, something like this:
my $filename = 'myfile.properties';
open(my $in, '<', $filename) or die "Unable to open '$filename' for read: $!";
my $newfile = "$filename.new";
open(my $out, '>', $newfile) or die "Unable to open '$newfile' for write: $!";
while (<$in>) {
s/(write_to_this_variable =) ""/$1 "some text"/;
print $out;
}
close $in;
close $out;
rename $newfile,$filename or die "unable to rename '$newfile' to '$filename': $!";
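You might have to sanitise the text you are writing if it contains non-alphanumerics. Note that \Q quotes metacharacters on the pattern side; for the replacement it is simpler to put the text in a variable and interpolate that.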
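One way to sidestep quoting problems in the replacement text is to keep it in a variable, because an interpolated variable is inserted as-is and not re-parsed. This is a minimal sketch of that idea operating on a single line. The helper name set_property and the sample value are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return $line with the value of property $name
# replaced by $value in double quotes; other lines are returned unchanged.
sub set_property {
    my ($line, $name, $value) = @_;
    # \Q...\E quotes metacharacters in the *pattern*; the replacement is an
    # interpolated string, so the contents of $value are inserted literally.
    $line =~ s/^(\s*\Q$name\E\s*=\s*).*$/$1"$value"/;
    return $line;
}

# The value deliberately contains a backslash, a dollar sign and a slash.
my $new = set_property(qq{write_to_this_variable = ""\n},
                       'write_to_this_variable', 'C:\temp\$x/y');
print $new;
```

Inside the copy loop this would replace the hard-coded substitution, for example $_ = set_property($_, 'write_to_this_variable', $text); before the print.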

This is an example of a program that uses the Config::Std module to read and write a simple config file like yours. As far as I know it is the only module that will preserve any comments in the original file.
There are two points to note:
The first hash key in $props{''}{write_to_this_variable} forms the name of the config file section that will contain the value. If there are no sections, as for your file, then you must use an empty string here.
If you need quotes around a value then you must add these explicitly when you are assigning to the hash element, as I do here with '"Some text"'.
I think the rest of the program is self-explanatory.
use strict;
use warnings;
use Config::Std { def_sep => ' = ' };
my %props;
read_config 'myfile.properties', %props;
$props{''}{write_to_this_variable} = '"Some text"';
write_config %props;
output
#Some comments
a_variale = 5
a_path = /home/user/path
write_to_this_variable = "Some text"

How do I copy a CSV file, but skip the first line?

I want to write a script that takes a CSV file, deletes its first row and creates a new output csv file.
This is my code:
use Text::CSV_XS;
use strict;
use warnings;
my $csv = Text::CSV_XS->new({sep_char => ','});
my $file = $ARGV[0];
open(my $data, '<', $file) or die "Could not open '$file'\n";
my $csvout = Text::CSV_XS->new({binary => 1, eol => $/});
open my $OUTPUT, '>', "file.csv" or die "Can't able to open file.csv\n";
my $tmp = 0;
while (my $line = <$data>) {
# if ($tmp==0)
# {
# $tmp=1;
# next;
# }
chomp $line;
if ($csv->parse($line)) {
my @fields = $csv->fields();
$csvout->print($OUTPUT, \@fields);
} else {
warn "Line could not be parsed: $line\n";
}
}
On the perl command line I write: c:\test.pl csv.csv and it doesn't create the file.csv output, but when I double click the script it creates a blank CSV file. What am I doing wrong?

Your program isn't ideally written, but I can't tell why it doesn't work if you pass the CSV file on the command line as you have described. Do you get the errors Could not open 'csv.csv' or Can't able to open file.csv? If not then the file must be created in your current directory. Perhaps you are looking in the wrong place?
If all you need to do is to drop the first line then there is no need to use a module to process the CSV data - you can handle it as a simple text file.
If the file is specified on the command line, as in c:\test.pl csv.csv, you can read from it without explicitly opening it using the <> operator.
This program reads the lines from the input file and prints them to the output only if the line counter (the $. variable) isn't equal to one.
use strict;
use warnings;
open my $out, '>', 'file.csv' or die $!;
while (my $line = <>) {
print $out $line unless $. == 1;
}

You don't need any modules for this task, since CSV (comma-separated values) files are simply text files: just open the file and iterate over its lines, writing every line to the output except the ones you want to skip (e.g. the first). Skipping the first line is so simple that a command-line one-liner is probably a better fit than a dedicated script.
A quick search turns up numerous tutorials about Perl input/output operations; see e.g. this link for an example:
http://learn.perl.org/examples/read_write_file.html
PS. Perl scripts (programs) are usually not "compiled" into a binary file. They are compiled, but on the fly, which is why /usr/bin/perl is called an "interpreter" rather than a "compiler" like gcc or g++. I guess what you're looking for is an editor with syntax highlighting and other development features; you could try Eclipse with the Perl plugin for that (cross-platform).
http://www.eclipse.org/downloads/
http://www.epic-ide.org/download.php/
This one-liner
user@localhost:~$ cat blabla.csv | perl -ne 'print $_ if $x++; '
skips the first line (it prints only when $x, which is incremented after each test, is already greater than zero).

You are missing your first (and only) argument due to Windows.
I think this question will help you: @ARGV is empty using ActivePerl in Windows 7

update a column in input file by taking value from Database in perl

input file:
1,a,USA,,
2,b,UK,,
3,c,USA,,
I want to update the 4th column in the input file with values taken from one of the database tables.
My code looks like this:
my $number_dbh = DBI->connect("DBI:Oracle:$INST", $USER, $PASS ) or die "Couldn't connect to database $INST";
my $num_smh;
print "connected \n ";
open FILE , "+>>$input_file" or die "can't open the input file";
print "echo \n";
while(my $line=<FILE>)
{
my @line_a=split(/\,/,$line);
$num_smh = $number_dbh->prepare("SELECT phone_no from book where number = $line_a[0]");
$num_smh->execute() or die "Couldn't execute stmt, error : $DBI::errstr";
my $number = $num_smh->fetchrow_array();
$line_a[3]=$number;
}

Looks like your data is in CSV format. You may want to use Parse::CSV.

+>> doesn't do what you think it does. In fact, in testing it doesn't seem to do anything at all. Further, +< does something very strange:
% cat file.txt
1,a,USA,,
2,b,UK,,
3,c,USA,,
% cat update.pl
#!perl
use strict;
use warnings;
open my $fh, '+<', 'file.txt' or die "$!";
while ( my $line = <$fh> ) {
$line .= "hello\n";
print $fh $line;
}
% perl update.pl
% cat file.txt
1,a,USA,,
1,a,USA,,
hello
,,
,,
hello
%
+> appears to truncate the file.
Really, what you want to do is to write to a new file, then copy that file over the old one. Opening a file for simultaneous read/write looks like you'd be entering a world of hurt.
As an aside, you should use the three-argument form of open() (safer for "weird" filenames) and use lexical filehandles (they're not global, and when they go out of scope your file automatically closes for you).
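The write-to-a-new-file approach could look roughly like the sketch below. It assumes the simple comma-separated layout from the question (no quoted fields). The %phone_for hash is a made-up stand-in for the database lookup, and update_fourth_column is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the database lookup (id => phone number).
my %phone_for = (1 => '555-0101', 3 => '555-0103');

# Write an updated copy of $file, then rename it over the original.
sub update_fourth_column {
    my ($file, $lookup) = @_;
    my $tmp = "$file.new";
    open my $in,  '<', $file or die "can't open '$file': $!";
    open my $out, '>', $tmp  or die "can't open '$tmp': $!";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /,/, $line, -1;           # -1 keeps trailing empty fields
        $fields[3] = $lookup->{ $fields[0] } // '';  # leave empty when nothing is found
        print {$out} join(',', @fields), "\n";
    }
    close $in;
    close $out or die "can't close '$tmp': $!";
    rename $tmp, $file or die "can't rename '$tmp' to '$file': $!";
}

# Demonstration on a temporary copy of the sample input.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/input.csv";
open my $fh, '>', $file or die "can't create '$file': $!";
print {$fh} "1,a,USA,,\n2,b,UK,,\n3,c,USA,,\n";
close $fh;

update_fourth_column($file, \%phone_for);
open $fh, '<', $file or die "can't open '$file': $!";
print while <$fh>;
close $fh;
```

With a real database, the hash lookup would be replaced by a DBI query. A common pattern is to prepare the statement once before the loop with a ? placeholder and pass the id to execute on each iteration, rather than interpolating the value into the SQL string.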
